
                        Dashboard Templates

                        Sysdig provides a number of predefined dashboards to help users monitor their environments and applications. Dashboard templates are immutable: they cannot be edited, and their scope is fixed. They are useful as is for a quick overview of your infrastructure, and you can also copy a template and customize the copy.

                        This section outlines the main dashboards that are available out-of-the-box.

                        1 -

                        Application Dashboards

                        Dashboard

                        Description

                        Use Cases

                        Elasticsearch

                        This view lists eight important metrics for node and document counts, shards, indexing time and query latency.

                        • Track the node count, as this can impact query times.

                        HAProxy

                        This view reports metrics for host CPU use and proxy throughput.

                        Redis

                        This view reports seven metrics for host resource usage and application performance.

                        Cassandra By Node

                        This view shows how every node in a Cassandra cluster is performing, by mixing key system metrics with Cassandra-specific metrics such as request volume and compactions.

                        • Use this view on a group containing the entire Cassandra cluster when you have already identified that there is a problem with a metric (using the "Cassandra Overview" view), and you need to see which node is causing the problem.

                        • Spot issues such as imbalances between the size of data held in each node, nodes going down and generating a lot of hinted handoffs, or disk bottlenecks by looking at the pending compactions.

                        Cassandra Overview

                        This view shows how a Cassandra cluster is performing, by mixing key system metrics with Cassandra-specific metrics such as request volume and compactions.

                        • Use this view on a group containing the entire Cassandra cluster as a starting point to troubleshoot the overall health of your database.

                        • Inspect typical system metrics to make sure the cluster is not being overloaded.

                        • Correlate the information displayed with important advanced Cassandra metrics such as pending compactions or JVM metrics to identify critical problems.

                        HTTP Top Requests

                        This view details the top requested URLs to your web server, including the total number of requests, average and maximum times to service the requests, and the amount of traffic contained in the requests and responses.

                        MongoDB

                        This view shows how busy the MongoDB service is, which collections are in highest demand and which have the slowest performance.

                        • Use this view to spot collections that may benefit from query and index performance tuning.

                        HTTP

                        This view provides a basic understanding of the health of your web server by showing the load being put on it and the server's ability to service requests in a timely manner.

                        • Gauge the overall busyness of the server.

                        • Identify correlations between the Top URLs and Slowest URLs panels to find opportunities to increase performance.

                        MySQL/PostgreSQL

                        This view shows the overall load and performance status of your SQL database transactions with metrics for the number of requests and how quickly they are handled.

                        • Determine whether performance can be improved.

                        MySQL/PostgreSQL Top

                        This view shows the top SQL queries by displaying metrics for the number of queries received and the amount of traffic sent and received for the query.

                        • Identify the most requested, highest-traffic, or slowest-processing queries.

                        2 -

                        AWS CloudWatch Dashboards

                        Dashboard | Description
                        --- | ---
                        ALB Overview | Displays information such as unhealthy host count, response time, HTTP response count, active and new connections, and so on.
                        DynamoDB Overview | Provides information such as user errors and consumed Read and Write capacity units.
                        DynamoDB Overview By Operation | Shows the count of HTTP operations performed on DynamoDB.
                        EC2 Overview | Displays CPU, disk, and network operations in a selected window.
                        ECS Projects | Provides the resource count and usage percentage in each cluster.
                        ECS Overview | Highlights the containers and services per host, request count, and highest resource consumption in containers.
                        ECS Services | Displays information including container and request count per service and resource usage.
                        ECS Task Families | Displays container and request count per task family and resource usage.
                        ElastiCache Overview | Highlights resource usage in ElastiCache.
                        ELB Overview | Highlights resource usage in ELB.
                        RDS Overview | Highlights resource usage in RDS.
                        SQS Overview | Displays information such as the number of messages sent, received, and deleted in SQS.

                        3 -

                        Capacity and Resource Management Dashboards

                        Dashboard | Description
                        --- | ---
                        Available Resources Calculator | Ensure there is sufficient capacity in a cluster to deploy a new application.
                        Cluster Capacity Planning | Monitor the capacity of Kubernetes clusters ensuring they’re correctly sized to support new applications when they’re deployed.
                        Pod Scheduling Troubleshooting | If a pod cannot be scheduled due to insufficient resources, use this dashboard to identify where the resource bottleneck is.
                        Pod Rightsizing & Capacity Optimization | Optimize your infrastructure and better control cluster spend by ensuring pods are sized correctly. Understand if you can free up resources by reducing memory and/or CPU requests.

                        4 -

                        Compliance & Security Dashboards

                        Dashboard

                        Description

                        Use Cases

                        Compliance (Docker)

                        Provides an overview of the available compliance metrics for Docker.

                        • Review the Docker configuration after running CIS Docker benchmark tests.

                        Compliance (Kubernetes)

                        Provides an overview of the available compliance metrics for Kubernetes.

                        • Review the Kubernetes Cluster configuration after running CIS Kubernetes benchmark tests.

                        Sysdig Secure Summary

                        The summary dashboard provides a complete overview of the Sysdig Secure environment, including the number of active agents, the number of defined policies and how many have been enabled, and summary policy event information.

                        5 -

                        Containers Dashboards

                        Dashboard

                        Description

                        Use Cases

                        Container Resource Usage

                        Displays resource usage statistics, including CPU, file bytes, memory and network bytes, for containers running within the defined scope.

                        • Monitor this view to identify which containers are using disproportionate amounts of resources.

                        • Determine whether an application should be moved to a more capable host.

                        • Identify which file systems are filling up or being underutilized.

                        Container File System Usage

                        This table view shows directory mount points, file system devices, and capacity and usage information for the file systems mounted on the instance. When groups are selected, metrics are averages for similar filesystem mount points.

                        Container CPU & Memory Limits

                        Shows CPU and memory limits across the environment, and the percentages currently used.

                        Container Network Traffic & Bandwidth

                        Highlights network bytes usage, connection count, errors, and queue length.

                        6 -

                        Hosts Infrastructure Dashboards

                        Dashboard

                        Description

                        Use Cases

                        Host Resource Usage

                        Displays resource usage statistics, including CPU, file bytes, memory and network bytes, for hosts running within the defined scope.

                        • Use this view to identify when a host is overutilized or underutilized within a group of hosts with similar job functions.

                        • Identify which file systems are filling up or being underutilized.

                        Disk and File System

                        This table view shows directory mount points, file system devices, and capacity and usage information for the file systems mounted on the instance. When groups are selected, metrics are averages for similar filesystem mount points.

                        Remotely mounted file systems are not listed by default. To enable, add the remotefs = true entry to the /opt/draios/bin/dragent.properties file on each instance.
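The edit described above can be scripted. The following is a minimal sketch, assuming the file uses plain `key = value` lines; the helper name `ensure_property` is hypothetical and not part of any Sysdig tooling, and only the `remotefs = true` entry and the file path come from the text.

```python
from pathlib import Path


def ensure_property(path, key, value):
    """Set `key = value` in a properties-style file.

    Returns True if the file was changed, False if the entry was already there.
    Hypothetical helper for illustration only.
    """
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    wanted = f"{key} = {value}"

    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key/value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            if stripped == wanted:
                return False  # Already set as requested.
            lines[i] = wanted  # Replace a differing value in place.
            break
    else:
        # Key not found anywhere: append it at the end.
        lines.append(wanted)

    file.write_text("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    # Path taken from the text above; run on each instance with suitable permissions.
    ensure_property("/opt/draios/bin/dragent.properties", "remotefs", "true")
```

The function is idempotent, so it can be run repeatedly (for example from a provisioning script) without duplicating the entry.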

                        Memory Usage

                        Displays the memory and swap usage and page faults.

                        Network Traffic & Bandwidth

                        Provides an overview of network traffic on the host, including throughput, queue length, and errors.

                        Sysdig Agent Health and Status

                        This view reports the number of Sysdig agents deployed in your environment and their versions.

                        7 -

                        Kubernetes Dashboards

                        The Kubernetes * Health dashboards break down resource and performance metrics by logical entity, which allows in-depth analysis and helps you identify and isolate critical issues. Each dashboard is built around the Golden Signals approach to monitoring: Latency, Traffic, Errors, and Saturation. Resource utilization metrics are oriented toward health and performance, and cover aspects such as CPU, memory, network, and storage usage by Kubernetes object. kube-state-metrics, in contrast, report the status or count of Kubernetes objects. By pairing kube-state-metrics with resource utilization metrics, each dashboard provides a comprehensive picture of what’s happening in your Kubernetes environment.
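To make the four Golden Signals concrete, here is a minimal sketch that derives them from a window of request samples. The `Request` record, the `golden_signals` function, and the choice of median latency and CPU-based saturation are illustrative assumptions; they are not how the Sysdig dashboards compute their panels.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    duration_ms: float  # Time taken to serve the request.
    status: int         # HTTP status code returned.


def golden_signals(requests, window_s, cpu_used_cores, cpu_limit_cores):
    """Summarize one observation window as the four Golden Signals.

    Hypothetical helper: the inputs stand in for whatever the monitoring
    backend collects.
    """
    n = len(requests)
    return {
        # Latency: median service time in the window.
        "latency_ms": median(r.duration_ms for r in requests) if n else 0.0,
        # Traffic: requests per second.
        "traffic_rps": n / window_s,
        # Errors: share of requests that failed with a 5xx status.
        "error_rate": sum(r.status >= 500 for r in requests) / n if n else 0.0,
        # Saturation: fraction of the CPU limit currently in use.
        "saturation": cpu_used_cores / cpu_limit_cores,
    }
```

For example, four requests over a two-second window with one 5xx response give a traffic of 2 requests per second and an error rate of 0.25.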

                        Dashboard

                        Description

                        Use Cases

                        Kubernetes Horizontal Pod Autoscaler

                        Highlights minimum, maximum, current, and desired replicas.

                        • Identify performance bottlenecks.

                        • Identify whether there are enough available pods compared to the desired pods.

                        • Use usage percentages over time to better estimate expansion capacity.

                        • Locate logical entities that are consuming too many cluster resources, or that are rapidly trending upwards towards unsustainable levels.

                        • Dive deeper into specific entities to identify the root cause of problems.

                        • Use usage percentages over time to better estimate expansion capacity.

                        For example:

                        • A deployment with no available pods indicates that the corresponding app is not serving requests. Having a dashboard for this condition means you can visualize the metrics and spring into action to find and resolve the issue quickly.

                        • A number of available pods that drops and remains below the desired number indicates that your application performance is degraded or that the application is not running with the required redundancy. With these metrics represented on the dashboard, you can see at a glance how severely your app's user experience is affected.

                        • Fewer running replicas than desired over an extended period of time is a symptom of entities not working properly, such as node or resource unavailability, Kubernetes or Docker Engine failure, broken Docker images, and so on. No replicas for a deployment object could mean that the app is down.

                        • A continuous loop of pod restarts (CrashLoopBackOff) might be associated with missing dependencies, unmet requirements, or insufficient resources. In CrashLoopBackOff, pods never get into ready status and are therefore counted as unavailable and down.

                        • Use these three dashboards to provide a high-level overview of all aspects of the Kubernetes environment's performance and resource saturation status.

                        • Set high-level alerts to narrow down areas of concern, before moving to the more in-depth dashboards.

                        • Quickly identify major performance issues within each type of entity.
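The deployment examples in the list above can be expressed as a small classification rule. This is a sketch under stated assumptions: the function name `deployment_status`, the status labels, and the input shape (a series of available-pod counts, oldest to newest) are hypothetical and do not correspond to a Sysdig or Kubernetes API.

```python
def deployment_status(desired, available_history):
    """Classify a deployment from its desired and observed available pod counts.

    available_history: available pod counts over time, oldest to newest.
    Hypothetical helper for illustration only.
    """
    current = available_history[-1]
    if desired > 0 and current == 0:
        # No available pods: the app is not serving requests.
        return "down"
    if all(count < desired for count in available_history):
        # Below the desired count for the whole period: degraded performance
        # or reduced redundancy.
        return "degraded"
    if current < desired:
        # Below the desired count right now, but not for the whole period.
        return "transient"
    return "healthy"
```

For instance, `deployment_status(3, [2, 2, 1])` returns `"degraded"`, while `deployment_status(3, [3, 3, 0])` returns `"down"`. Separating a sustained shortfall from a momentary dip mirrors the distinction the examples draw between "remains below the desired number" and a single drop.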

                        Kubernetes Resource Quota

                        Provides an overview of resource limits and requests, and the number of replication controllers, services, service ports, service load balancers, ConfigMaps, and secrets.

                        Kubernetes Memory Allocation Optimization

                        Highlights Memory allocation optimization.

                        Kubernetes CPU Allocation Optimization

                        Displays CPU utilization of your Kubernetes environment.

                        Kubernetes Cluster Overview

                        Provides an overview of your Kubernetes cluster.

                        Kubernetes DaemonSet Overview

                        Overview of DaemonSet objects.

                        Kubernetes Deployment Overview

                        Highlights whether each deployment has a sufficient number of available pods and resources, and indicates the number of pods that are running, desired, or updated.

                        Kubernetes Job Overview

                        Overview of all the jobs and the performance information.

                        Kubernetes Namespace Overview

                        Displays metrics such as resource requests and resource limits at the namespace level; identifies the performance of Kubernetes entities such as pods, deployments, DaemonSets, StatefulSets, and jobs, and compliance with replicaSet specs. Highlights the number of services, deployments, replicaSets, and jobs per namespace.

                        Kubernetes Node Overview

                        Highlights the number of nodes that are ready, unavailable, or out of disk; the number of nodes that are under memory, disk, or network pressure; compares allocatable capacity with requested capacity on the node; provides the number of pod resources of a node that are available for scheduling and the available capacity to serve the pods running on the nodes.

                        Kubernetes Pod Overview

                        Helps identify potential bottlenecks by graphing the number of container restarts, the number of pods waiting to be scheduled, resource utilization of containers within each pod and available capacity to serve pod requests, the number of available pods compared to the desired pods, and the number of pods in available state and ready to serve requests.

                        Kubernetes ReplicaSet Overview

                        Provides details such as the number of pods per replicaSet, the desired number of pods per replicaSet, and pods per replicaSet that are in a ready state.

                        Kubernetes StatefulSet Overview

                        Overview of the StatefulSet objects in your environment.

                        Kubernetes Cluster and Node Capacity

                        Provides a comprehensive overview of the performance of the hosts or nodes that form the Kubernetes cluster, including CPU, memory, and file system usage, and network traffic.

                        Before analyzing the dashboard, consider the following guidelines related to resource usage:

                        • If Resource Limits is undefined for a container, Kubernetes does not default to a value.

                        • If Resource Requests is unspecified for a container, Kubernetes defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Limits do not default to any value.

                        • If Resource Limits and Resource Requests are not specified, kube-state-metrics (and hence Sysdig Monitor) reports zero, regardless of the value Kubernetes defaulted to. Therefore, only user-defined requests are reported by the kubernetes.pod.resourceRequests.memByte metric.

                        • The memory used by a container (the value returned by memory.used.bytes) can be greater than the memory requested by a pod (the value returned by kubernetes.pod.resourceRequests.memByte). This is permissible in Kubernetes because the Requests value determines the minimum amount of resources required.

                        For these reasons, it can be deduced that:

                        • In some cases, the value of Used Resources will be more than that of Resource Requests and Resource Limits, and the value of Resource Requests could be more than that of Resource Limits.

                        • Otherwise, the expected relationship is kubernetes.pod.resourceRequests.memByte <= memory.used.bytes <= kubernetes.pod.resourceLimits.memByte.
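The guidelines above can be modeled in a few lines. This sketch only encodes the rules as stated in this section (unspecified values are reported as zero, and an unspecified request falls back to the limit inside Kubernetes); the function names are hypothetical, and the code does not query Kubernetes or Sysdig.

```python
from typing import List, Optional


def effective_request(request: Optional[int], limit: Optional[int]) -> Optional[int]:
    """Value Kubernetes works with: the request, else the limit if one is set.

    Returns None when neither is set (implementation-defined per the text).
    """
    return request if request is not None else limit


def reported_bytes(user_defined: Optional[int]) -> int:
    """Value shown by the metric: the user-defined value, or zero if unset."""
    return user_defined if user_defined is not None else 0


def describe_usage(used: int, request: Optional[int] = None,
                   limit: Optional[int] = None) -> List[str]:
    """List the ways the reported values deviate from request <= used <= limit."""
    req, lim = reported_bytes(request), reported_bytes(limit)
    notes = []
    if used < req:
        notes.append("used below reported request")
    if used > lim:
        notes.append("used above reported limit")
    if req > lim:
        notes.append("reported request above reported limit")
    return notes
```

A container with a 200 MiB request and no limit that uses 300 MiB yields "used above reported limit" and "reported request above reported limit", because the missing limit is reported as zero. That is the kind of case the first deduction describes.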

                        Kubernetes Health Overview

                        Provides a comprehensive overview of the performance of the entire Kubernetes environment, broken down by various logical entities and underlying resource availability and usage. This dashboard breaks down resource and performance kube-state-metrics by logical Kubernetes entities, such as pods, namespaces, deployments, replicaSets, containers, and so on.

                        Kubernetes Service Health

                        Displays the count, resource usage, performance, and limitations of services running in the Kubernetes environment. The dashboard provides an overview of what resources each service is using, their response times, the container and request counts, and how the response times measure up against the resource utilization.

                        Kubernetes Workloads CPU Usage and Allocation

                        Displays resource utilization of your workloads. This dashboard helps you review the CPU usage of your workloads, making sure that the CPU is properly allocated in the Kubernetes environment. All the numbers in this dashboard are expressed in CPU cores.

                        Kubernetes Workloads Memory Usage and Allocation

                        Helps you review the memory usage of your workloads, making sure that the memory is properly allocated in the Kubernetes environment.

                        Kubernetes Service Golden Signals

                        Highlights the latency, traffic, errors, and saturation in your Kubernetes environment.

                        8 -

                        Marathon Dashboards

                        Dashboard | Description
                        --- | ---
                        Applications | Displays the container count and resource usage.
                        Overview | Highlights the overall performance of Marathon applications. The dashboard provides container count, top resource-consuming containers and file systems, request count by application, and so on.
                        Groups | Displays the container count and resource usage in each group.

                        9 -

                        Mesos Dashboards

                        Dashboard | Description
                        --- | ---
                        Frameworks | Highlights container count and resource consumption.
                        Overview | Provides container count, top resource-consuming containers and file systems, and request count within the defined scope.
                        Tasks | Shows the resource usage and performance of Mesos tasks.

                        10 -

                        Platform Application & Troubleshooting Dashboards

                        Dashboard | Description
                        --- | ---
                        Application Status & Overview | Understand the status of applications (workloads) running in a cluster by monitoring performance, pod health, and resource usage.
                        Pod Status & Overview | Monitor the health, resource usage, and network statistics for pods running as part of workloads.
                        Container Resource Usage & Troubleshooting | Understand the performance of the different containers running in pods across your infrastructure and identify any that are behaving anomalously.
                        Node Status & Overview | Monitor the health, resource usage, and network statistics for nodes running in clusters.

                        11 -

                        Troubleshooting Dashboards

                        Dashboard

                        Description

                        Use Cases

                        Process Resource

                        Highlights the resource consumption for processes (for example, httpd, java, and ntpd).

                        • Identify the top consuming processes in an environment where the same process is spawned multiple times.

                        • Monitor this view to identify which processes are using disproportionate amounts of resources.

                        • Use this view to spot collections that may benefit from query and index performance tuning.

                        • Identify the most requested, highest-traffic, or slowest-processing queries.

                        • Determine whether performance can be improved.

                        MongoDB Troubleshooting

                        Displays the performance of the MongoDB cluster. This view shows how busy the MongoDB service is, which collections are in highest demand, and which have the slowest performance.

                        Network Connections Table

                        Displays a full list of the environment’s local and remote endpoints, and all network traffic resource statistics relevant to those endpoints.

                        SQL Troubleshooting

                        Shows the top SQL queries by displaying metrics for the number of queries received and the amount of traffic sent and received for the query.

                        Top Processes

                        Lists the top processes running in the Kubernetes environment. Displays resource usage statistics, including CPU, file bytes, memory, and network bytes, for the top processes running within the defined scope.