

                    Metrics are quantitative values or measures that can be grouped/divided by labels. Sysdig Monitor metrics are divided into two groups: default metrics (out-of-the-box metrics associated with the system, orchestrator, and network infrastructure), and custom metrics (JMX, StatsD, and multiple other integrated application metrics).

                    Sysdig automatically collects all types of metrics, and auto-labels them. Custom metrics can also have custom (user-defined) labels.

                    Out of the box, when an agent is deployed on a host, Sysdig Monitor automatically begins collecting and reporting on a wide array of metrics. The sections below describe how those metrics are conceptualized within the system.

                    The following sections also describe the metric types and the data aggregation techniques supported by Sysdig Monitor.

                    1 -

                    Grouping, Scoping, and Segmenting Metrics

                    Data aggregation and filtering in Sysdig Monitor are done through the use of assigned labels. The sections below explain how labels work, the ways they can be used, and how to work with groupings, scopes, and segments.


                    Labels are used to identify and differentiate characteristics of a metric, allowing them to be aggregated or filtered for Explore module views, dashboards, alerts, and captures. Labels can be used in different ways:

                    • To group infrastructure objects into logical hierarchies displayed on the Explore tab (called groupings). For more information, refer to Groupings.

                    • To split aggregated data into segments. For more information, refer to Segments.

                    There are two types of labels:

                    • Infrastructure labels

                    • Metric descriptor labels

                    Infrastructure Labels

                    Infrastructure labels are used to identify objects or entities within the infrastructure that a metric is associated with, including hosts, containers, and processes. An example label is shown below:

                    Sysdig Notation


                    Prometheus Notation


                    The table below outlines what each part of the label represents:

                    Example Label Component    Description
                    kubernetes                 The infrastructure type.
                    pod                        The object.
                    name                       The label key.

                    Infrastructure labels are obtained from the infrastructure (including from orchestrators, platforms, and the runtime processes), and Sysdig automatically builds a relationship model using the labels. This allows users to create logical hierarchical groupings to better aggregate the infrastructure objects in the Explore module.

                    For more information on groupings, refer to Groupings.

                    Metric Descriptor Labels

                    Metric descriptor labels are custom descriptors or key-value pairs applied directly to metrics, obtained from integrations like StatsD, Prometheus, and JMX. Sysdig automatically collects custom metrics from these integrations, and parses the labels from them. Unlike infrastructure labels, these labels can be arbitrary, and do not necessarily map to any entity or object.

                    Metric descriptor labels can only be used for segmenting, not grouping or scoping.

                    An example metric descriptor label is shown below:

                    website_failedRequests:20|region='Asia', customer_ID='abc'

                    The table below outlines what each part of the label represents:

                    Example Label Component             Description
                    website_failedRequests              The metric name.
                    20                                  The metric value.
                    region='Asia', customer_ID='abc'    The metric descriptor labels. Multiple key-value pairs can be assigned using a comma-separated list.
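To make the three components concrete, here is a minimal sketch that splits the example line above into its name, value, and labels. The function name `parse_metric_line` is hypothetical, and the parsing rules are inferred only from the single example shown; this is not how the Sysdig agent itself parses metrics.

```python
def parse_metric_line(line: str):
    """Split "name:value|k1='v1', k2='v2'" into (name, value, labels).

    Illustrative only: the format is inferred from the one example in the text.
    """
    head, _, label_part = line.partition("|")
    name, _, raw_value = head.partition(":")
    labels = {}
    for pair in label_part.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, val = pair.partition("=")
        # Drop the surrounding quotes from the label value.
        labels[key.strip()] = val.strip().strip("'\"")
    return name.strip(), float(raw_value), labels


name, value, labels = parse_metric_line(
    "website_failedRequests:20|region='Asia', customer_ID='abc'"
)
```

Here `name` holds the metric name, `value` the numeric value, and `labels` a dictionary with the `region` and `customer_ID` descriptor labels.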

                    Sysdig recommends not using labels to store dimensions with high cardinalities (numerous different label values), such as user IDs, email addresses, URLs, or other unbounded sets of values. Each unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored.
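The effect of cardinality can be sketched by counting distinct combinations of metric name and label set, since each combination is its own time series. The helper `count_time_series` and the sample data are hypothetical and only illustrate the counting argument.

```python
def count_time_series(samples):
    """Count distinct (metric name, label set) combinations.

    Each distinct combination corresponds to one stored time series.
    """
    return len({(name, frozenset(labels.items())) for name, labels in samples})


# A bounded label: repeated values collapse into the same series.
bounded = [
    ("website_failedRequests", {"region": region})
    for region in ["Asia", "Europe", "Asia", "Americas"]
]

# An unbounded label (hypothetical user_id): every new value adds a series.
unbounded = [
    ("website_failedRequests", {"user_id": str(i)})
    for i in range(1000)
]

bounded_series = count_time_series(bounded)
unbounded_series = count_time_series(unbounded)
```

With these inputs the bounded label yields three series, while the per-user label yields one series per user.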


                    Groupings are hierarchical organizations of labels, allowing users to organize their infrastructure views on the Explore tab in a logical hierarchy. An example grouping is shown below:

                    The example above groups the infrastructure into four levels. This results in a tree view in the Explore module with four levels, with rows for each infrastructure object applicable to each level.

                    As each label is selected, Sysdig Monitor automatically filters out labels for the next selection that no longer fit the hierarchy, to ensure that only logical groupings are created.

                    The example below shows the logical hierarchy structure for Kubernetes:

                    • Clusters: Cluster > Namespace > Replicaset > Pod

                    • Namespace: Cluster > Namespace > HorizontalPodAutoscaler > Deployment > Pod

                    • Daemonsets: Cluster > Namespace > Daemonsets > Pod

                    • Services: Cluster > Namespace > Service > StatefulSet > Pod

                    • Job: Cluster > Namespace > Job > Pod

                    • ReplicationController: Cluster > Namespace > ReplicationController > Pod

                    The default groupings are immutable: They cannot be modified or deleted. However, you can make a copy of them that you can modify.

                    Unified Workload Labels

                    Sysdig provides the following labels to help improve your infrastructure organization and make troubleshooting easier.

                    • kubernetes_workload_name: Displays all the Kubernetes workloads and indicates what type and name of workload resource (deployment, daemonSet, replicaSet, and so on) it is.

                    • kubernetes_workload_type: Indicates what type of workload resource (deployment, daemonSet, replicaSet, and so on) it is.

                    The availability of these labels also simplifies groupings. You do not need a different grouping for each type of workload; instead, you can use a single grouping for all workloads.

                    The labels allow you to segment metrics, such as sysdig_host_cpu_cores_used_percent, by kubernetes_workload_name to see CPU core usage for all the workloads, instead of having a separate query for segmenting by kubernetes_deployment_name, kubernetes_replicaSet_name, and so on.



                    A scope is a collection of labels that are used to filter out or define the boundaries of a group of data points when creating dashboards, dashboard panels, alerts, and teams. An example scope is shown below:

                    In the example above, the scope is defined by two labels with operators and values defined. The table below defines each of the available operators.

                    Operator            Description
                    is                  The value matches the defined label value exactly.
                    is not              The value does not match the defined label value exactly.
                    in                  The value is among the comma-separated values entered.
                    not in              The value is not among the comma-separated values entered.
                    contains            The label value contains the defined value.
                    does not contain    The label value does not contain the defined value.

                    The scope editor provides dynamic filtering capabilities. It restricts the scope of the selection for subsequent filters by rendering valid values that are specific to the previously selected label. Expand the list to view unfiltered suggestions. At run time, users can also supply custom values to achieve more granular filtering. The custom values are preserved. Note that changing a label higher up in the hierarchy might render the subsequent labels incompatible. For example, changing the kubernetes_namespace_name > kubernetes_deployment_name hierarchy to swarm_service_name > kubernetes_deployment_name is invalid as these entities belong to different orchestrators and cannot be logically grouped.
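As a rough illustration of the operators in the table, the sketch below evaluates a scope against the labels of a single entity. It assumes that all conditions in a scope must hold at once and that a missing label behaves like an empty string; both are assumptions made for this example, and `in_scope` and `OPERATORS` are hypothetical names rather than part of any Sysdig API.

```python
# One predicate per scope operator: (label value, operand) -> bool.
OPERATORS = {
    "is": lambda value, operand: value == operand,
    "is not": lambda value, operand: value != operand,
    "in": lambda value, operand: value in operand,
    "not in": lambda value, operand: value not in operand,
    "contains": lambda value, operand: operand in value,
    "does not contain": lambda value, operand: operand not in value,
}


def in_scope(labels, scope):
    """Return True if the entity's labels satisfy every scope condition.

    Assumption: conditions are combined with AND, and a missing label is
    treated as an empty string.
    """
    for key, operator, operand in scope:
        value = labels.get(key, "")
        if not OPERATORS[operator](value, operand):
            return False
    return True


scope = [
    ("kubernetes_namespace_name", "in", ["prod", "staging"]),
    ("kubernetes_deployment_name", "contains", "web"),
]

web_prod = {"kubernetes_namespace_name": "prod",
            "kubernetes_deployment_name": "web-frontend"}
web_dev = {"kubernetes_namespace_name": "dev",
           "kubernetes_deployment_name": "web-frontend"}
api_staging = {"kubernetes_namespace_name": "staging",
               "kubernetes_deployment_name": "api"}
```

Only `web_prod` falls inside this scope: `web_dev` fails the namespace condition and `api_staging` fails the deployment condition.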

                    Dashboards and Panels

                    Dashboard scopes define the criteria for what metric data will be included in the dashboard’s panels. The current dashboard’s scope can be seen at the top of the dashboard:

                    By default, all dashboard panels abide by the scope of the overall dashboard. However, an individual panel scope can be configured for a different scope than the rest of the dashboard.

                    For more information on Dashboards and Panels, refer to the Dashboards documentation.


                    Alert scopes are defined during the creation process, and specify what areas within the infrastructure the alert is applicable for. In the example alerts below, the first alert has a scope defined, whereas the second alert does not have a custom scope defined. If no scope is defined, the alert is applicable to the entire infrastructure.

                    For more information on Alerts, refer to the Alerts documentation.


                    A team’s scope determines the highest level of data that team members have visibility for:

                    • If a team’s scope is set to Host, team members can see all host-level and container-level information.

                    • If a team’s scope is set to Container, team members can only see container-level information.

                    A team’s scope only applies to that team. Users that are members of multiple teams may have different visibility depending on which team is active.

                    For more information on teams and configuring team scope, refer to the Manage Teams and Roles documentation.


                    Aggregated data can be split into smaller sections by segmenting the data with labels. This allows for the creation of multi-series comparisons and multiple alerts. In the first image, the metric is not segmented:

                    In the second image, the same metric has been segmented by container_id:

                    Line and Area panels can display any number of segments for any given metric. The example image below displays the sysdig_connection_net_in_bytes metric segmented by both container_id and host_hostname:

                    For more information regarding segmentation in dashboard panels, refer to the Configure Panels documentation. For more information regarding configuring alerts, refer to the Alerts documentation.
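Segmentation can be thought of as splitting one aggregated value into one series per distinct label value. The sketch below uses made-up samples and sums values per segment purely for illustration; the `segment` helper is hypothetical, and labelling entities that lack a label as n/a is a simplification of the behaviour described in the next section.

```python
from collections import defaultdict


def segment(samples, segment_by):
    """Sum sample values per distinct combination of the segment_by labels.

    Entities that lack one of the labels are collected under "n/a".
    """
    series = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels.get(label, "n/a") for label in segment_by)
        series[key] += value
    return dict(series)


# Hypothetical sysdig_connection_net_in_bytes samples for one interval.
samples = [
    ({"host_hostname": "host-a", "container_id": "c1"}, 100.0),
    ({"host_hostname": "host-a", "container_id": "c2"}, 50.0),
    ({"host_hostname": "host-b", "container_id": "c3"}, 25.0),
    ({"host_hostname": "host-b"}, 10.0),  # host-level process, no container
]

unsegmented = segment(samples, [])
by_host = segment(samples, ["host_hostname"])
by_container = segment(samples, ["container_id"])
```

Without segmentation there is a single series; segmenting by `host_hostname` produces two, and segmenting by `container_id` produces one per container plus an n/a entry.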

                    The Meaning of n/a

                    Sysdig Monitor imports data related to entities such as hosts, containers, processes, and so on, and reports them in tables or panels on the Explore and Dashboards UI, as well as in events, so across the UI you see varieties of data. The term n/a can appear anywhere on the UI where some form of data is displayed.

                    n/a indicates data that is not available or that does not apply to a particular instance. In Sysdig parlance, the term signifies one or more entities for which a particular label, such as hostname or Kubernetes service, is not valid. In other words, n/a collectively represents entities whose metadata is not relevant to the aggregation and filtering techniques: grouping, scoping, and segmenting. For instance, a list of Kubernetes services might display all the services as well as an n/a entry that includes all the containers without the metadata describing a Kubernetes service.

                    You might encounter n/a sporadically in Explore UI as well as in drill-down panels or dashboards, events, and likely elsewhere on the Sysdig Monitor UI when no relevant metadata is available for that particular display. How n/a should be treated depends on the nature of your deployment. The deployment will not be affected by the entities marked n/a.

                    The following are some of the cases that yield n/a on the UI:

                    • Labels are partially available or not available. For example, a host has entities that are not associated with a monitored Kubernetes deployment, or a monitored host has an unmonitored Kubernetes deployment running.

                    • Labels that do not apply to the grouping criteria or at the hierarchy level. For example:

                      • Containers that are not managed by Kubernetes. The containers managed by Kubernetes are identified with their container_name labels.

                      • In certain groupings by DaemonSet, Deployments render n/a and vice versa. Not all containers belong to both DaemonSet and Deployment objects concurrently. Likewise, a Kubernetes ReplicaSet grouping with the kubernetes_replicaset_name label will not show StatefulSets.

                      • In a kubernetes_cluster_name > kubernetes_namespace_name > kubernetes_deployment_name grouping, the entities without the kubernetes_cluster_name label yield n/a.

                    • Entities are incorrectly labeled in the infrastructure.

                    • Kubernetes features that are not yet in sync with Sysdig Monitor.

                    • The format is not applicable to a particular record in the database.

                    2 -

                    Understanding Default, Custom, and Missing Metrics

                    Default Metrics

                    Default metrics include various kinds of metadata which Sysdig Monitor automatically knows how to label, segment, and display.

                    For example:

                    • System metrics for hosts, containers, and processes (CPU used, etc.)

                    • Orchestrator metrics (collected from Kubernetes, Mesos, etc.)

                    • Network metrics (e.g. network traffic)

                    • HTTP

                    • Platform metrics (in some cases)

                    Default metrics are collected mainly from two sources: syscalls and Kubernetes.

                    Custom Metrics

                    About Custom Metrics

                    Custom metrics generally refer to any metrics that the Sysdig Agent collects from some third-party integration. The type of infrastructure and applications integrated determine the custom metrics that the Agent collects and reports to Sysdig Monitor. The supported custom metrics are:

                    Each metric comes with a set of custom labels, and additional labels can be user-created. Sysdig Monitor simply collects and reports them with minimal or no internal processing. The limit currently enforced is 3000 metrics per host. Use the metrics_filter option in the dragent.yaml file to remove unwanted metrics or to choose the metrics to report when hosts exceed this limit. For more information on editing the dragent.yaml file, see Understanding the Agent Config Files.

                    Unit for Custom Metrics

                    Sysdig Monitor detects the default unit of custom metrics automatically with the delimiter suffix in the metrics name. For example, custom_expvar_time_seconds results in a base unit set to seconds. The supported base units are byte, percent, and time. Custom metrics name should carry one of the following delimiter suffixes in order for Sysdig Monitor to identify and configure the accurate unit type.

                    • second

                    • seconds

                    • byte

                    • bytes

                    • total (represents accumulating count)

                    • percent

                    Custom metrics will not be auto-detected and the unit will be incorrect unless this naming convention is followed. For instance, custom_byte_expvar will not yield the correct unit, that is MiB.
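The naming convention can be sketched as a lookup on the last segment of the metric name. This assumes the delimiter is an underscore, as in the examples above; `detect_unit` is a hypothetical helper, and the "count" label for the total suffix is a name chosen for this example, not a documented Sysdig unit.

```python
# Suffixes listed above, mapped to the base unit they imply.
SUFFIX_TO_UNIT = {
    "second": "time",
    "seconds": "time",
    "byte": "byte",
    "bytes": "byte",
    "percent": "percent",
    "total": "count",  # accumulating count; label chosen for illustration
}


def detect_unit(metric_name, delimiter="_"):
    """Return the unit implied by the metric name's suffix, or None."""
    suffix = metric_name.rsplit(delimiter, 1)[-1]
    return SUFFIX_TO_UNIT.get(suffix)


seconds_unit = detect_unit("custom_expvar_time_seconds")
misplaced_unit = detect_unit("custom_byte_expvar")
```

The first name ends in a recognized suffix and resolves to a time unit. The second contains `byte` in the middle of the name rather than at the end, so no unit is detected.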

                    Editing the Unit Scale

                    You can change the unit scale either by editing the panel on the Dashboard or in Explore.


                    From the Search Metrics and Dashboard drop-down, select the custom metrics you want to edit the unit selection for, then click More Options. Select the desired unit scale from the Metric Format drop-down and click Save.


                    Select the Dashboard Panel associated with the custom metrics you want to modify. Select the desired unit scale from the Metrics drop-down and click Save.

                    Display Missing Data

                    Data can be missing for a few different reasons:

                    • Problems such as faulty network connectivity in the communication channel between your infrastructure and the Sysdig metrics store.

                    • Metrics or StatsD batch jobs are submitted sporadically.

                    Sysdig Monitor allows you to configure the behavior of missing data in Dashboards. Though metric type determines the default behavior, you can configure how to visualize missing data and define it at the per-query level. Use the No Data Display drop-down in the Options menu in the panel configuration, and the No Data Message text box under the Panel tab. See Create a New Panel for more information.

                    Consider the following guidelines:

                    • Use the No Data Message text box under the Panel tab to enter a custom message when no data is available to render on the panels. This custom message, which can include links in markdown format and line breaks, is shown when queries return no data and report no errors.

                    • The No Data Display drop-down has only two options for the Stacked Area timechart: gap and show as zero.

                    • For form-based timechart panels, the default option for a metrics selection that does not contain a StatsD metric is gap.

                    • Adding a StatsD metric to a query in a form-based timechart panel sets the No Data Display type to show as zero, which is the default option for form-based StatsD metrics. You can change this selection to any other type.

                    • The default display option is gap for PromQL Timechart panels.

                    The options for No Data Display are:

                    • gap: The default option for form-based timechart panels where the query's metric selection does not contain a StatsD metric. gap is the best visualization type for most use cases because missing data, which often indicates a problem, is easy to spot.

                    • show as zero: The best option for StatsD metrics which are only submitted sporadically. For example, batch jobs and count of errors. This is the default display option for StatsD metrics in form-based panels.

                      We do not recommend this option as setting zero could be misleading. For example, this setting will report the value for free disk space as 0% when the disk or host disappears, but in reality, the value is unknown.

                      Prometheus best practices recommend avoiding missing metrics.

                    • connect - solid: Use for measuring the value of a metric, typically a gauge, where you want to visualize the missing samples flattened.

                      The leftmost and rightmost visible data points can be connected as Sysdig does not perform the interpolation.

                    • connect - dotted: Use it for measuring the value of a metric, typically a gauge, where you want to visualize the missing samples flattened.

                      The leftmost and rightmost visible data points can be connected as Sysdig does not perform the interpolation.
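The difference between the first two options can be sketched with a short series in which `None` marks a missing sample. The function `apply_no_data_display` is hypothetical and only mimics what a chart would plot; the two connect options are left out because the text does not specify how the connecting line is computed.

```python
def apply_no_data_display(samples, mode):
    """Return the values a chart would plot; None marks a gap."""
    if mode == "gap":
        # Missing samples stay missing, so the line is visibly broken.
        return list(samples)
    if mode == "show as zero":
        # Missing samples are drawn as 0, which may not reflect reality.
        return [0.0 if value is None else value for value in samples]
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical free disk space (%) with two samples lost mid-series.
free_disk_percent = [42.0, 41.5, None, None, 41.0]

as_gap = apply_no_data_display(free_disk_percent, "gap")
as_zero = apply_no_data_display(free_disk_percent, "show as zero")
```

In `as_zero` the two missing samples appear as 0% free disk space, which is the misleading reading the warning above describes; in `as_gap` they remain unknown.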

                    3 -

                    Metric Limits

                    Sysdig ensures that you see the metric information most relevant to your monitored environment. To achieve this, limits are enforced on the number of metrics that the datastore can store. Different limits apply to different metric types and agent versions.

                    The default metric limits per agent are different from the subscription limit imposed on the custom time series entitlement. Your entitlement limits per agent could be lower than the metric limits. For more information, see Time Series Billing.

                    View Metric Limits

                    The metric limits are automatically set by the Sysdig backend components based on your plan, agent version, and backend configuration.

                    Use the Sysdig Agent Health & Status dashboard under Host Infrastructure templates to view the metric limits for your account and the current usage per host for each metric type.

                    The metric limits are exposed to the UI through the following agent metrics.

                    Metric                                         Description
                    statsd_dragent_metricCount_limit_appCheck      The maximum number of unique appCheck timeseries that are allowed in an individual sample from the agent per node.
                    statsd_dragent_metricCount_limit_statsd        The maximum number of unique statsd timeseries that are allowed in an individual sample from the agent per node.
                    statsd_dragent_metricCount_limit_jmx           The maximum number of unique JMX timeseries that are allowed in an individual sample from the agent per node.
                    statsd_dragent_metricCount_limit_prometheus    The maximum number of unique Prometheus timeseries that are allowed in an individual sample from the agent per node.


                    4 -

                    Sysdig Info Metrics

                    Sysdig provides Prometheus-compatible info metrics to show infrastructure (sysdig_*_info) and Kubernetes (kube_*_info) labels. Info metrics are gauges with a value of 1 and carry the _info suffix.

                    For example, querying sysdig_host_info in PromQL Query will provide all labels associated with the host, such as:

                    • agent_id
                    • agent_tag_cluster
                    • host_hostname
                    • domain
                    • host
                    • host_domain
                    • host_mac
                    • instance_id

                    Although info metrics are available, all the metrics that are ingested by Sysdig agents are automatically enriched with the metadata, so you don't need to do PromQL joins. For more information, see Run PromQL Queries Faster with Extended Label Set.

                    5 -

                    Manage Metric Scale

                    Sysdig provides several knobs for managing metric scale.

                    There are three primary ways to include or exclude metrics, should you run into metric limits.

                    1. Include/exclude custom metrics by name filters.

                      See Include/Exclude Custom Metrics.

                    2. Include/exclude metrics emitted by certain containers, Kubernetes annotations, or any other container label at collection time.

                      See Prioritize/Include/Exclude Designated Containers.

                    3. Exclude metrics from unwanted ports.

                      See Blacklist Ports.

                    6 -

                    Data Aggregation

                    Sysdig Monitor allows users to adjust the aggregation settings when graphing or creating alerts for a metric, informing how Sysdig rolls up the available data samples in order to create the chart or evaluate the alert. There are two forms of aggregation used for metrics in Sysdig: time aggregation and group aggregation.

                    Time aggregation is always performed before group aggregation.

                    Time Aggregation

                    Time aggregation comes into effect in two overlapping situations:

                    • Charts can only render a limited number of data points. To look at a wide range of data, Sysdig Monitor may need to aggregate granular data into larger samples for visualization.

                    • Sysdig Monitor rolls up historical data over time.

                      Sysdig retains rollups based on each aggregation type, to allow users to choose which data points to utilize when evaluating older data.

                    Sysdig agents collect 1-second samples and report data at 10-second resolution. The data is stored and reported every 10 seconds with the available aggregations (average, rate, min, max, sum) to make them available via the Sysdig Monitor UI and the API. For time series charts covering five minutes or less, data points are drawn at this 10-second resolution, and any time aggregation selections will have no effect. When an amount of time greater than five minutes is displayed, data points are drawn as an aggregate for an appropriate time interval. For example, for a chart covering one hour, each data point would reflect a one-minute interval.

                    At time intervals of one minute and above, charts can be configured to display different aggregates for the 10-second metrics used to calculate each datapoint.

                    Aggregation Type    Description
                    average             The average of the retrieved metric values across the time period.
                    rate                The sum of the retrieved metric values divided by the number of samples expected in the time period evaluated.
                    maximum             The highest value during the time period evaluated.
                    minimum             The lowest value during the time period evaluated.
                    sum                 The combined sum of the metric across the time period evaluated.

                    In the example images below, the kubernetes_deployment_replicas_available metric first uses the average for time aggregation:

                    Then uses the sum for time aggregation:

                    • Rate and average are very similar and often provide the same result. However, the calculation of each is different.

                      • If time aggregation is set to one minute, the agent is supposed to retrieve six samples (one every 10 seconds).

                      • In some cases, samples may not be there, due to disconnections or other circumstances. For this example, four samples are available. If this was the case, the average would be calculated by dividing by four, while the rate would be calculated by dividing by six.

                    • Most metrics are sampled once for each time interval, resulting in average and rate returning the same value. However, the two differ for any metric that is not reported at every time interval, such as some custom StatsD metrics.

                    • Rate is currently referred to as timeAvg in the Sysdig Monitor API and advanced alerting language.

                    • By default, average is used when displaying data points for a time interval.
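The difference between average and rate described above can be worked through with the one-minute example: six samples are expected, but only four arrive. The function names below are hypothetical, the sample values are made up, and the sketch only mirrors the arithmetic in the text.

```python
def time_average(samples):
    """Divide the sum by the number of samples that actually arrived."""
    present = [s for s in samples if s is not None]
    return sum(present) / len(present)


def time_rate(samples, expected_count):
    """Divide the sum by the number of samples expected in the interval."""
    present = [s for s in samples if s is not None]
    return sum(present) / expected_count


# One minute at 10-second resolution: six slots, two samples missing.
one_minute = [6.0, 6.0, None, 6.0, None, 6.0]
expected = 60 // 10

average_value = time_average(one_minute)   # 24 / 4
rate_value = time_rate(one_minute, expected)  # 24 / 6
```

With all six samples present, both functions would return the same value, which is why average and rate usually agree.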

                    Group Aggregation

                    Metrics applied to a group of items (for example, several containers, hosts, or nodes) are averaged between the members of the group by default. For example, three hosts report different CPU usage for one sample interval. The three values will be averaged, and reported on the chart as a single datapoint for that metric.

                    There are several different types of group aggregation:

                    Aggregation Type    Description
                    average             The average value of the interval's samples.
                    maximum             The maximum value of the interval's samples.
                    minimum             The minimum value of the interval's samples.
                    sum                 The combined value of all of the interval's samples.

                    If a chart or alert is segmented, the group aggregation settings will be utilized for both aggregations across the whole group, and aggregation within each individual segmentation.

                    For example, the image below shows a chart for CPU% across the infrastructure:

                    When segmented by proc_name, the chart shows one CPU% line for each process:

                    Each line provides the average value for every process with the same name. To see the difference, change the group aggregation type to sum:

                    The metric aggregation value shown beside the metric name is for the time aggregation. While the screenshot shows AVG, the group aggregation is set to SUM.

                    Aggregation Examples

                    The tables below provide an example of how each type of aggregation works. The first table provides the metric data, while the second displays the resulting value for each type of aggregation.

                    In the example below, the CPU% metric is applied to a group of servers called webserver. The first chart shows metrics using average aggregation for both time and group. The second chart shows the metrics using maximum aggregation for both time and group.

                    For each one minute interval, the second chart renders the highest CPU usage value found from the servers in the webserver group and from all of the samples reported during the one minute interval. This view can be useful when searching for transient spikes in metrics over long periods of time, that would otherwise be missed with average aggregation.
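The webserver scenario can be sketched with made-up numbers: three servers, each with six 10-second CPU% samples in a one-minute interval. Time aggregation is applied to each server first and group aggregation second, following the order stated earlier. The `aggregate` helper and the server names are hypothetical.

```python
from statistics import mean

AGGREGATIONS = {"average": mean, "maximum": max, "minimum": min, "sum": sum}


def aggregate(series_by_member, time_agg="average", group_agg="average"):
    """Apply time aggregation per member, then group aggregation across members."""
    per_member = [
        AGGREGATIONS[time_agg](samples) for samples in series_by_member.values()
    ]
    return AGGREGATIONS[group_agg](per_member)


# Hypothetical CPU% samples for the webserver group over one minute.
webserver = {
    "web-1": [10.0, 20.0, 30.0, 20.0, 10.0, 30.0],  # average 20, maximum 30
    "web-2": [40.0, 40.0, 40.0, 40.0, 40.0, 40.0],  # average 40, maximum 40
    "web-3": [0.0, 0.0, 90.0, 0.0, 0.0, 0.0],       # average 15, maximum 90
}

avg_avg = aggregate(webserver, "average", "average")
max_max = aggregate(webserver, "maximum", "maximum")
```

The short spike on `web-3` is flattened to a group value of 25 with average aggregation, but surfaces as 90 when maximum is used for both time and group aggregation.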

                    The group aggregation type depends on the segmentation. For a view showing metrics for a group of items, the current group aggregation setting reverts to the default setting if the Segment By selection is changed.

                    7 -

                    Deprecated Metrics and Labels

                    Below is the list of metrics and labels that are discontinued with the introduction of the new metric store. We made an effort not to deprecate any metrics or labels that are used in existing alerts, but if you encounter any issues, contact Sysdig Support.

                    We have applied automatic mapping of all net.*.request.time.worst metrics to net.*.request.time, because the maximum aggregation gives equivalent results and it was almost exclusively used in combination with these metrics.
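The mapping amounts to dropping the trailing .worst segment. The sketch below assumes that the * in net.*.request.time.worst stands for exactly one name segment, which is an interpretation of the pattern rather than documented behaviour; `map_deprecated_metric` is a hypothetical helper, not part of Sysdig.

```python
import re

# Assumption: "*" in net.*.request.time.worst matches a single dot-free segment.
_WORST_PATTERN = re.compile(r"^(net\.[^.]+\.request\.time)\.worst$")


def map_deprecated_metric(name):
    """Map net.<x>.request.time.worst to net.<x>.request.time; leave others as-is."""
    match = _WORST_PATTERN.match(name)
    return match.group(1) if match else name


mapped = map_deprecated_metric("net.http.request.time.worst")
untouched = map_deprecated_metric("net.request.time.worst.out")
```

A query that previously used the .worst variant would then use the base metric with the maximum aggregation to get equivalent results.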

                    Deprecated Metrics

                    The following metrics are no longer supported.

                    • net.request.time.file
                    • net.request.time.file.percent
                    • net.request.time.local
                    • net.request.time.local.percent
                    • net.request.time.nextTiers
                    • net.request.time.nextTiers.percent
                    • net.request.time.processing
                    • net.request.time.processing.percent
                    • net.request.time.worst.out
                    • net.http.request.time.worst
                    • net.mongodb.request.time.worst
                    • net.sql.request.time.worst

                    Deprecated Labels

                    The following labels are no longer supported:

                    • net.connection.client
                    • net.connection.direction
                    • net.connection.endpoint.tcp
                    • net.connection.udp.inverted
                    • net.connection.errorCode
                    • net.connection.l4proto
                    • net.connection.server
                    • net.connection.state
                    • net.role
                    • cloudProvider.resource.endPoint
                    • host.container.mappings
                    • host.ip.all
                    • host.ip.private
                    • host.ip.public
                    • host.server.port
                    • host.isClientServer
                    • host.isInstrumented
                    • host.isInternal
                    • host.procList.main
                    • program.environment
                    • program.usernames
                    • mesos_cluster
                    • mesos_node
                    • mesos_pid

                    In addition to this list, composite labels ending with the ‘.label’ string will no longer be supported. For example, kubernetes.service.label will be deprecated, but kubernetes.service.label.* labels remain supported.
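
The composite-label rule can be checked mechanically. The sketch below is illustrative only: the helper name `is_deprecated_composite_label` is hypothetical, and `kubernetes.service.label.app` is a made-up instance of the kubernetes.service.label.* family.

```python
def is_deprecated_composite_label(label: str) -> bool:
    """Return True for a composite label whose name ends with '.label'."""
    return label.endswith(".label")


print(is_deprecated_composite_label("kubernetes.service.label"))      # True
print(is_deprecated_composite_label("kubernetes.service.label.app"))  # False
```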

                    8 -

                    Troubleshooting Metrics

                    Troubleshooting metrics include program metrics, connection-level network metrics, and Kubernetes troubleshooting metrics. They are reported at a granular 10-second resolution and are stored for 4 days. Below is the list of troubleshooting metrics and the labels that you can use to segment them.

                    Program Level Metrics

                    The following metrics are program metrics:

                    • sysdig_program_cpu_cores_used
                    • sysdig_program_cpu_cores_used_percent
                    • sysdig_program_cpu_used_percent
                    • sysdig_program_memory_used_bytes
                    • sysdig_program_net_in_bytes
                    • sysdig_program_net_out_bytes
                    • sysdig_program_net_connection_in_count
                    • sysdig_program_net_connection_out_count
                    • sysdig_program_net_connection_total_count
                    • sysdig_program_net_error_count
                    • sysdig_program_net_request_count
                    • sysdig_program_net_request_in_count
                    • sysdig_program_net_request_out_count
                    • sysdig_program_net_request_time
                    • sysdig_program_net_request_in_time
                    • sysdig_program_net_tcp_queue_len
                    • sysdig_program_proc_count
                    • sysdig_program_thread_count
                    • sysdig_program_up

                    In addition to user-defined labels and the standard set of labels that Sysdig provides, you can use the following labels to segment program metrics: program_cmd_line, program_name.
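
To illustrate what segmenting by one of these labels means, the following Python sketch sums a metric per distinct label value. It does not call any Sysdig API: the sample values are made up, and `segment_sum` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical samples of sysdig_program_memory_used_bytes; values are made up.
samples = [
    {"program_name": "nginx", "value": 120_000_000},
    {"program_name": "nginx", "value": 80_000_000},
    {"program_name": "redis", "value": 50_000_000},
]


def segment_sum(samples, label):
    """Sum sample values per distinct value of the given label."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[label]] += sample["value"]
    return dict(totals)


print(segment_sum(samples, "program_name"))
```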

                    Connection-Level Network Metrics

                    The following metrics are connection metrics:

                    • sysdig_connection_net_in_bytes
                    • sysdig_connection_net_out_bytes
                    • sysdig_connection_net_total_bytes
                    • sysdig_connection_net_connection_in_count
                    • sysdig_connection_net_connection_out_count
                    • sysdig_connection_net_connection_total_count
                    • sysdig_connection_net_request_in_count
                    • sysdig_connection_net_request_out_count
                    • sysdig_connection_net_request_count
                    • sysdig_connection_net_request_in_time
                    • sysdig_connection_net_request_out_time
                    • sysdig_connection_net_request_time

                    In addition to user-defined labels and the standard set of labels that Sysdig provides, you can use the following labels to segment connection-level metrics: net_local_service, net_remote_service, net_local_endpoint, net_remote_endpoint, net_client_ip, net_server_ip, net_protocol.

                    Kubernetes Troubleshooting Metrics

                    The following metrics are Kubernetes troubleshooting metrics:

                    • kube_workload_status_replicas_misscheduled
                    • kube_workload_status_replicas_scheduled
                    • kube_workload_status_replicas_updated
                    • kube_pod_container_status_last_terminated_reason
                    • kube_pod_container_status_ready
                    • kube_pod_container_status_restarts_total
                    • kube_pod_container_status_running
                    • kube_pod_container_status_terminated
                    • kube_pod_container_status_terminated_reason
                    • kube_pod_container_status_waiting
                    • kube_pod_container_status_waiting_reason
                    • kube_pod_init_container_status_last_terminated_reason
                    • kube_pod_init_container_status_ready
                    • kube_pod_init_container_status_restarts_total
                    • kube_pod_init_container_status_running
                    • kube_pod_init_container_status_terminated
                    • kube_pod_init_container_status_terminated_reason
                    • kube_pod_init_container_status_waiting
                    • kube_pod_init_container_status_waiting_reason

                    9 -

                    Prometheus Metrics Types

                    Sysdig Monitor transforms Prometheus metrics into usable, actionable entries in two ways:

                    Calculated Metrics

                    The Prometheus metrics that are scraped by the Sysdig agent and transformed into the traditional StatsD model are called calculated metrics. For calculated metrics, Sysdig stores the delta relative to the previous value. This delta is what Sysdig uses on the classic backend for metric analysis and visualization. When calculated metrics are generated, gauge metrics are kept as they are, but counter metrics are transformed.

                    Prometheus calculated metrics cannot be used in PromQL.

                    Histogram and summary metrics are transformed into a different format, called Prometheus histogram and Prometheus summary metrics respectively. The transformations include:

                    • Each quantile is transformed into a separate metric, with the quantile added as a suffix.

                    • The count and sum of these summary metrics are exposed as separate metrics with slightly changed names: the underscore (_) in the name is replaced with a period (.). For more information, see Mapping Classic Metrics and PromQL Metrics.

                    Prometheus calculated metrics (legacy metrics) are scheduled to be deprecated in the coming months.

                    Raw Metrics

                    In Sysdig parlance, the Prometheus metrics that are scraped (by the Sysdig agent), collected, sent, stored, visualized, and presented exactly as Prometheus exposes them are called raw metrics. Raw metrics are used with PromQL.

                    A Sysdig counter is a StatsD-type counter: the difference in value is kept, but not the raw value of the counter. Prometheus raw counter metrics, by contrast, are always monotonically increasing. A rate function must be applied to Prometheus raw metrics to make sense of them.
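
The difference between the two counter models can be sketched in a few lines of Python. This is an illustration under assumptions, not Sysdig's implementation: the sample values are hypothetical, `counter_deltas` is a made-up helper, and counter resets are not handled.

```python
def counter_deltas(raw_values):
    """Convert monotonically increasing counter samples into per-interval deltas."""
    return [later - earlier for earlier, later in zip(raw_values, raw_values[1:])]


raw = [100, 130, 130, 175]   # hypothetical raw counter samples
print(counter_deltas(raw))   # [30, 0, 45]
```

The raw series only ever grows, so its absolute value says little on its own; the deltas show how much happened in each interval.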

                    Time Aggregations Over Prometheus Metrics

                    The following time aggregations are supported for both metric types:

                    • Average: Returns an average of a set of data points, keeping all the labels.

                    • Maximum and Minimum: Returns the maximum or minimum value, keeping all the labels.

                    • Sum: Returns a sum of the values of data points, keeping all the labels.

                    • Rate (timeAvg): Returns the sum of changes to the counter across data points in a given time period, divided by the time, keeping all the labels as they are. For Prometheus raw metrics, timeAvg is calculated by taking the difference and dividing it by time.
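
As a rough illustration of the aggregations listed above, the following Python sketch computes each one over a single set of data points. It makes several assumptions: the data points are equally spaced, labels are not modeled, the values are hypothetical, and `time_aggregate` is not a Sysdig function.

```python
from statistics import mean


def time_aggregate(points, interval_seconds):
    """Aggregate equally spaced data points that cover one time period.

    For counters, each point is taken to be the change during its interval.
    """
    period_seconds = interval_seconds * len(points)
    return {
        "avg": mean(points),
        "max": max(points),
        "min": min(points),
        "sum": sum(points),
        # Sum of changes divided by the length of the period.
        "timeAvg": sum(points) / period_seconds,
    }


points = [30, 0, 45]  # hypothetical values, one per 10-second interval
print(time_aggregate(points, 10))
```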

                    Prometheus Calculated Metrics

                    Prometheus calculated metrics are treated as gauges by Sysdig, and therefore the following time aggregations are available:

                    • Average

                    • Sum

                    • Minimum

                    • Maximum

                    Rate (timeAvg) is not available because it is not applicable to gauge metrics.

                    Prometheus Raw Metrics

                    For the gauge type, the following time aggregations are available:

                    • Average

                    • Minimum

                    • Maximum

                    For the counter type, the following time aggregations are available:

                    • Rate: Calculates the first derivative of the counter (change over time).

                    • Sum: Calculates a complete change of the counter over a period of time.
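
For a raw counter, the two aggregations can be approximated from the first and last samples in a window. The Python sketch below is a simplification under assumptions: timestamps are in seconds, the sample values are hypothetical, counter resets are ignored, and the helper names are made up.

```python
def counter_sum(samples):
    """Total change of the counter between the first and last sample."""
    return samples[-1][1] - samples[0][1]


def counter_rate(samples):
    """Average change per second over the window (change over time)."""
    elapsed_seconds = samples[-1][0] - samples[0][0]
    return counter_sum(samples) / elapsed_seconds


# Hypothetical (timestamp_seconds, counter_value) pairs.
samples = [(0, 100), (10, 130), (20, 130), (30, 175)]
print(counter_sum(samples))   # 75
print(counter_rate(samples))  # 2.5
```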