Dashboards can be customized.
Note that each Kubernetes cluster has its own OpenTelemetry stack by default. If you need to share an OTEL stack across clusters, contact your DuploCloud support team and they can set it up for you.
The OpenTelemetry part of the AOS dashboard has 5 cards that point to Grafana dashboards. These cards depend on the links configured under Administrator -> System Settings -> System Config. Search for "otel" and you will find a list of settings of type otel with links, as shown in the picture below:
Each entry maps to a card on the dashboard. An entry that starts with <infraname>/ applies to the cards for that Infrastructure. Note that in the admin dashboard, all cards are in the context of an Infrastructure, which you select from the Infrastructure drop-down.
Also note that the settings can contain placeholders like [[TENANT_NAME]], which the platform replaces dynamically when the user clicks the respective button.
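The lookup and placeholder replacement described above can be pictured with a minimal sketch. This is not DuploCloud's implementation: the function name, the Infrastructure name prod-infra, and the example URLs are all hypothetical, and only the <infraname>/<card> key shape and the [[TENANT_NAME]] placeholder come from the text.

```python
import re

# Hypothetical settings shaped like the "otel" entries described above.
# The Infrastructure name and URLs are made up for illustration.
OTEL_SETTINGS = {
    "prod-infra/logs": "https://grafana.example.com/explore?var-tenant=[[TENANT_NAME]]",
    "prod-infra/metrics": "https://grafana.example.com/d/red?var-tenant=[[TENANT_NAME]]",
}

# Matches placeholders of the form [[NAME]].
PLACEHOLDER = re.compile(r"\[\[([A-Z_]+)\]\]")


def resolve_card_link(settings, infra_name, card, values):
    """Return the link for a card with [[NAME]] placeholders filled in.

    Returns None when no setting exists for this Infrastructure and card.
    """
    template = settings.get(f"{infra_name}/{card}")
    if template is None:
        return None
    # Placeholders without a known value are left untouched rather than guessed.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


link = resolve_card_link(OTEL_SETTINGS, "prod-infra", "logs", {"TENANT_NAME": "dev01"})
print(link)
```

Leaving unknown placeholders in place, instead of substituting an empty string, makes a misconfigured link easier to spot.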
Link external data sources to AOS dashboards by adding custom links to data cards.
From the DuploCloud Portal, navigate to Administrator -> Observability -> Advanced -> Dashboard.
Select the Admin tab to add a custom link only to the Administrator AOS Dashboard, or the Common tab to include the custom link on the Tenant AOS Dashboard.
Click Add. The Add profiles Custom Link pane displays.
Enter a Name, URL, and Description for the custom link.
Click Submit. The custom link is added to the data card on the AOS Dashboard(s).
From the DuploCloud Portal, navigate to Observability -> Advanced -> Dashboard.
Click Add. The Add profiles Custom Link pane displays.
Enter a Name, URL, and Description for the custom link.
Click Submit. The custom link is added to the data card on the Tenant AOS Dashboard.
You can get the admin credentials for the Grafana deployment from the Tenant where the OTel stack is deployed. Find the Service called grafana-ui and click Edit on it; the credentials are listed in the Service's environment variables.
How the Advanced Observability Suite and OpenTelemetry integrate with DuploCloud
Advanced Observability Suite (AOS) is based on OpenTelemetry. The following graphic shows the various components.
The OTel stack consists of 50 or more components and hundreds of configurations. If you need to change your OpenTelemetry configuration, contact your DuploCloud support team.
To view the complete deployment of the OpenTelemetry stack:
In the DuploCloud Portal, navigate to Administrator -> Observability -> Advanced -> Dashboard.
In the Observability area, click the K8s/Docker card button. The Grafana K8s Resource Monitoring dashboard launches, giving you a detailed view of resources and monitoring for Kubernetes nodes, Docker containers, and Pods.
Your OpenTelemetry data is stored in S3 Buckets in a Tenant that DuploCloud preconfigures for you during Onboarding. The name of this Tenant may vary depending on your preferences.
In the DuploCloud documentation, the OpenTelemetry Tenant is referred to as OpenTelemetry_Tenant (in bold italics) to indicate that this Tenant name is a variable (the name you chose during the Onboarding setup).
The OpenTelemetry data is stored in S3 Buckets, which you can view.
In the DuploCloud Portal, select the OpenTelemetry_Tenant from the Tenant list box at the top of the Portal.
Navigate to Cloud Services -> Storage and select the S3 tab to view the buckets that hold the data. This setup is deployed and managed via Flux Helm release infrastructure.
To view a complete list of Kubernetes deployments, containers, and S3 buckets in an OpenTelemetry deployment, select the OpenTelemetry_Tenant from the Tenant list box at the top of the Portal and navigate to Kubernetes -> Services.
To view a complete list of Docker Containers in an OpenTelemetry deployment, select the OpenTelemetry_Tenant from the Tenant list box at the top of the Portal and navigate to Kubernetes -> Containers.
The Tenant AOS Dashboard provides cost and observability data by Tenant for granular infrastructure management. This dashboard is accessible to non-administrators.
To access the Tenant AOS Dashboard, navigate to Observability -> Advanced -> Dashboard.
Use the Tenant list box at the top of the Tenant AOS Dashboard to select the Tenant for which you wish to view metrics.
The Cloud Spend area, on the left side of the Tenant AOS Dashboard, offers a comprehensive view of expenses for the selected Tenant. It includes the following expenditure categories:
Current Month: Displays the current month’s spend for the selected Tenant.
Spend By Service: Displays a breakdown of cloud spending by Service.
The Observability area, on the right side of the Tenant AOS Dashboard, gives health and performance data for a selected Tenant.
Grafana: The Grafana button, in the Observability header, opens the Grafana console where you can add, customize, or edit your AOS dashboards, query your logs, metrics, and traces, and more. For additional information, see the Grafana documentation.
Under the Observability header are data cards displaying various metrics.
Resources: Lists the type and number of DuploCloud resources, such as Services, containers, and Ingresses, in the selected Tenant.
K8s/Docker: Kubernetes and Docker metrics specific to the Tenant, assisting in container workload management.
Logs: Access Tenant-specific logs for tracking, troubleshooting, and compliance.
Metrics: Displays performance metrics relevant to the Tenant’s resources.
Traces: View traces specific to the Tenant for performance and latency monitoring of the Tenant’s applications.
Profiles: Access profiling data for in-depth application insights and performance tuning.
For Grafana-generated metrics (e.g., K8s/Docker, Logs, Metrics, Traces, Profiles), you can click on the card (header or visual data) to open the corresponding detailed view in the Grafana console. Additionally, you can add custom links to the data cards.
Working with the AOS Administrator Dashboard
Navigate to Administrator -> Observability -> Advanced -> Dashboard.
The Cloud Spend area, on the left side of the Administrator AOS Dashboard, offers a comprehensive, real-time view of expenses across all resources.
Fin Ops: The Fin Ops button, in the Cloud Spend header, opens the DuploCloud Billing dashboard, which displays billing details including billing summaries by month or Tenant, billing alerts, and DuploCloud license usage information.
Current Month: Displays cloud expenditures for the current month.
Monthly Spend: Displays spending by month. Use the Monthly Spend list box to display spending by week or day.
Spend By Service: Displays a breakdown of spend by Cloud Service.
Spend By Tenant: Highlights expenditures by Tenant.
The Observability section, on the right side of the Administrator AOS Dashboard, gives real-time health and usage data across resources.
Infrastructure: In the Observability header, the Infrastructure list box allows you to select the Infrastructure for which you wish to view observability details.
Grafana: The Grafana button, in the Observability header, opens the Grafana console where you can add, customize, or edit your dashboards, query your logs, metrics, and traces, and more. For additional information, see the Grafana documentation.
Under the Observability header are data cards displaying the following metrics:
Resources: Lists the type and number of DuploCloud resources, such as Tenants, Services, etc.
K8s/Docker: Shows Kubernetes and Docker metrics, providing visibility into containerized workloads.
Logs: Displays logs for troubleshooting and compliance checks across all resources.
Metrics: Displays rate, errors, and duration metrics across Services.
Traces: View traces to monitor request flows and latency, supporting application performance analysis.
Profiles: Access profiling data for in-depth application insights, allowing performance tuning.
For Grafana-generated metrics (e.g., K8s/Docker, Logs, Metrics, Traces, Profiles), you can click on the card (visual data or header) to open the corresponding detailed view in the Grafana console. Additionally, you can add custom links to the data cards.
Loki is the backend for the logging setup, with Grafana as the visualization tool. Alloy is the collector that gathers the logs.
DuploCloud orchestrates the configuration so that metadata such as Tenant name, namespace, container, and host is inserted automatically.
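Because this metadata is attached to each log stream, logs can be narrowed down with a LogQL stream selector in Grafana. The sketch below builds such a query string. The label names (tenant, namespace) and their values are assumptions for illustration; check the labels actually present in your Loki data source before relying on them.

```python
def build_logql(labels, contains=None):
    """Build a LogQL log query: a stream selector plus an optional line filter."""

    def quote(value):
        # Escape backslashes and double quotes inside a LogQL string literal.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    # Sort label names so the generated query is deterministic.
    matchers = ", ".join(
        f"{name}={quote(value)}" for name, value in sorted(labels.items())
    )
    query = "{" + matchers + "}"
    if contains:
        # "|=" keeps only the log lines that contain the given text.
        query += " |= " + quote(contains)
    return query


# Hypothetical label names and values.
query = build_logql(
    {"tenant": "dev01", "namespace": "duploservices-dev01"},
    contains="error",
)
print(query)
```

The resulting string can be pasted into the query editor of the Logs view in the Grafana console.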
Links can be set up to go from logs to traces and then to metrics.
Pyroscope is the backend for profiles, which show the CPU and memory profile for various parts of the application.
Tempo is the backend for tracing. Alloy and Beyla are the collectors, which use eBPF technology to collect traces without requiring any instrumentation.
You can navigate from traces to metrics.
Mimir is the backend for the metrics setup, with Grafana as the visualization tool. Alloy and Beyla are the collectors that gather the metrics.
The following picture shows the RED (Rate, Errors, Duration) dashboard.
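As a rough illustration of what the three RED numbers mean, the sketch below computes them from a handful of request records. It is purely explanatory: in AOS these values come from metrics gathered by the collectors and stored in Mimir, not from code like this. The record shape, the choice of HTTP 5xx as "error", and the 95th percentile are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float
    status: int  # HTTP status code


def red_summary(records, window_seconds):
    """Compute rate, error ratio, and p95 duration for one time window."""
    total = len(records)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in records if r.status >= 500)
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank 95th percentile; 0.0 when the window is empty.
    p95 = durations[math.ceil(0.95 * total) - 1] if total else 0.0
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": p95,
    }


# Ten requests observed in a 5-second window; the slowest one failed.
records = [
    RequestRecord(duration_ms=10.0 * i, status=500 if i == 10 else 200)
    for i in range(1, 11)
]
summary = red_summary(records, window_seconds=5)
print(summary)
```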
An extensive set of Kubernetes dashboards is also included.
DuploCloud's Advanced Observability Suite, available as a product add-on
DuploCloud's Advanced Observability Suite (AOS) is an add-on service to boost your monitoring and troubleshooting abilities. Built on OpenTelemetry, AOS delivers robust, real-time observability for your cloud infrastructure. It includes real-time anomaly detection and customizable alerts, and it unifies metrics, traces, logs, and profiles so you can track all aspects of your environment easily.
The Grafana-powered dashboards can be customized to spotlight key metrics and visualize trends in real time, empowering swift, data-driven decisions. DuploCloud’s AOS simplifies troubleshooting and enhances system health by providing a holistic view of your application and infrastructure performance.
OpenTelemetry is an open-source project from the Cloud Native Computing Foundation (CNCF) that supports multiple programming languages and environments. It provides a flexible, vendor-neutral framework for monitoring and analyzing application performance. Designed for high scalability, OpenTelemetry is ideal for distributed applications and helps reduce costs by using open-source components instead of expensive proprietary systems. See the OpenTelemetry documentation for more.
System Health Check: Track real-time metrics like CPU usage, memory consumption, network traffic, and error rates across services to ensure that applications are healthy.
Latency Tracking: Monitor request latencies to identify performance bottlenecks, particularly in services where latency spikes might indicate an issue.
Error Rate Analysis: Track error counts and types to ensure services operate as expected and identify critical failure points.
Request Flow Visualization: Visualize the path of requests through different services to understand interdependencies and identify which services may contribute to slowdowns or failures.
Service Dependency Mapping: See which services interact with each other, allowing for a clear view of critical service dependencies and helping identify potential cascading failures.
Root Cause Analysis (RCA): Trace issues back to the source by identifying slow, failed, or error-prone transactions and drilling down to pinpoint problematic services or infrastructure.
Identify Performance Bottlenecks: Detect slow components, such as long-running queries, network delays, or overloaded services, and make data-driven decisions to optimize them.
Capacity Planning: Use metrics to analyze usage patterns, forecast demand, and optimize resource allocation, helping avoid over-provisioning or under-resourcing.
Compare Release Impact: Measure the impact of new releases on system performance by comparing metrics before and after deployments.
Set Alerts for Key Metrics: Define thresholds and set alerts for anomalies in critical metrics, such as high error rates or slow response times, so that teams can act quickly.
Incident Response and Remediation: Enable responders to access context-specific insights that accelerate incident response, diagnostics, and resolution times.
Service-Level Agreement (SLA) Monitoring: Track SLA compliance metrics, such as uptime and latency, to ensure the system meets contractual obligations.
Track User Journey Metrics: Analyze performance metrics specific to different user journeys (e.g., checkout flows) to understand the impact of backend performance on user experience.
End-to-End Latency Perceptions: Collect latency and success metrics for critical user actions to help understand where improvements could enhance user satisfaction.
Detect Anomalous Behavior: Set up alerts on unusual patterns or outliers in metrics that could indicate potential security incidents.
Audit Logging: Track API call/sensitive event logs (e.g., login attempts) to assist in compliance reporting and security auditing.
Monitor Business KPIs: Track custom metrics like transaction success rates, revenue per minute, or user engagement rates to tie technical performance to business outcomes.
Cost Optimization: Identify underused resources or inefficient services to optimize operational costs and improve infrastructure utilization.
Request new capabilities via DuploCloud, or build them using the OpenTelemetry open-source API.
Working with the Advanced Observability Suite (AOS) dashboards in DuploCloud
The DuploCloud AOS dashboards are a gateway to the detailed Grafana dashboards, serving two purposes:
SSO and Authentication Proxy: The Grafana dashboards reside on a private network. DuploCloud acts as an authentication layer, connecting the same single sign-on (the DuploCloud login) to a Grafana session.
Summarizing Links: While AOS contains many pre-configured Grafana dashboards, you can create quick links with descriptions of the ones you use most frequently. See the procedure for adding custom links for more information.
Depending on your role (Administrator or User), you can access the Advanced Observability Suite dashboard from two locations in the DuploCloud Portal.
The Administrator AOS Dashboard displays cloud data across all resources and allows you to select DuploCloud Infrastructures. To use it, navigate to Administrator -> Observability -> Advanced -> Dashboard in the DuploCloud Portal.
The Tenant AOS Dashboard displays data for specific Tenants. To use it, navigate to Observability -> Advanced -> Dashboard in the DuploCloud Portal.
Setting Name | Card | Dashboard Type |
---|---|---|
<infraname>/proxyurl | Grafana button on the dashboard | Admin |
<infraname>/logs | Logs button | Admin |
<infraname>/metrics | Metrics button | Admin |
<infraname>/traces | Traces button | Admin |
<infraname>/k8s | K8s button | Admin |
Click the link icon in the header of the card to which you wish to add a custom link. The All Admin Custom Links pane displays.
From the Grafana console, you can perform various tasks to customize your observability experience and proactively manage system health. For additional information, see the Grafana documentation.
Custom metrics from applications can be fetched and ingested into the stack. The following is the process:
Grafana's native Alertmanager is used for alerting. Alerts can be set up based on log strings, metrics, and so on.
The following is an example set of alerts for nodes in Kubernetes. DuploCloud comes with a default set of alerts based on best practices and compliance standards.
Set up alerts for key observability metrics (e.g., high CPU or memory usage) to be proactively notified of potential issues.
From the DuploCloud Portal, navigate to the Admin AOS Dashboard (Administrator -> Observability -> Advanced -> Dashboard) or the Tenant AOS Dashboard (Observability -> Advanced -> Dashboard).
Click the Alert icon near the upper right in the Observability header.
The Grafana Alert rules page displays, allowing you to view, add, delete, or modify alerts. For more, see the Grafana alert documentation.
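To make the idea of a threshold alert (for example, high CPU usage) concrete, here is a simplified model of how a rule with a pending period behaves. It is a sketch for intuition only, not Grafana's implementation: the function name, the state names, and the use of evaluation counts instead of a time duration are all assumptions.

```python
def alert_states(samples, threshold, pending_evaluations):
    """Return one state per evaluation: 'normal', 'pending', or 'firing'.

    The alert fires only after the threshold has been breached for more than
    `pending_evaluations` consecutive evaluations, which filters short spikes.
    """
    states = []
    breaches = 0  # consecutive evaluations above the threshold
    for value in samples:
        if value > threshold:
            breaches += 1
            states.append("firing" if breaches > pending_evaluations else "pending")
        else:
            # Any sample back under the threshold resets the streak.
            breaches = 0
            states.append("normal")
    return states


# CPU usage (%) at successive evaluations, alerting above 80%.
cpu_samples = [70, 85, 90, 95, 60]
states = alert_states(cpu_samples, threshold=80, pending_evaluations=2)
print(states)
```

A longer pending period reduces noise from brief spikes at the cost of slower notification.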