Working with Metrics

Sysdig Monitor metrics are divided into two groups: default metrics (out-of-the-box metrics associated with the system, orchestrator, and network infrastructure), and custom metrics (JMX, StatsD, and multiple other integrated application metrics).

Sysdig automatically collects all types of metrics, and auto-labels them. Custom metrics can also have custom (user-defined) labels.

Out of the box, when an agent is deployed on a host, Sysdig Monitor automatically begins collecting and reporting on a wide array of metrics. The sections below describe how those metrics are conceptualized within the system.

1 - Types of Metrics

This topic introduces you to the types of metrics in Sysdig Monitor.

Default Metrics

Default metrics include various kinds of metadata which Sysdig Monitor automatically knows how to label, segment, and display.

For example:

  • System metrics for hosts, containers, and processes (CPU used, etc.)

  • Orchestrator metrics (collected from Kubernetes, Mesos, etc.)

  • Network metrics (e.g. network traffic)

  • HTTP

  • Platform metrics (in some cases)

Default metrics are collected mainly from two sources: syscalls and Kubernetes.

Custom Metrics

About Custom Metrics

Custom metrics generally refer to any metrics that the Sysdig Agent collects from a third-party integration. The type of infrastructure and applications integrated determines which custom metrics the Agent collects and reports to Sysdig Monitor. Supported custom metrics include JMX, StatsD, Prometheus, and other integrated application metrics.

Each metric comes with a set of custom labels, and additional user-defined labels can be created. Sysdig Monitor collects and reports these metrics with minimal or no internal processing. Use the metrics_filter option in the dragent.yaml file to remove unwanted metrics, or to choose which metrics to report when hosts exceed the metric limit. For more information on editing the dragent.yaml file, see Understanding the Agent Config Files.
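As a sketch of how such filtering can look in dragent.yaml (the metric name patterns below are hypothetical; filter rules are typically evaluated in order, with the first matching rule taking effect, so verify the semantics against your agent version):

```yaml
metrics_filter:
  - include: test.*             # keep all metrics beginning with "test."
  - exclude: haproxy.backend.*  # drop a noisy subtree
  - include: haproxy.*          # keep the remaining haproxy metrics
  - exclude: redis.*            # drop all redis metrics
```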

Unit for Custom Metrics

Sysdig Monitor automatically detects the default unit of a custom metric from the delimiter suffix in the metric name. For example, custom_expvar_time_seconds results in a base unit of seconds. The supported base units are byte, percent, and time. Custom metric names should carry one of the following delimiter suffixes so that Sysdig Monitor can identify and configure the correct unit type:

  • second

  • seconds

  • byte

  • bytes

  • total (represents accumulating count)

  • percent

If this naming convention is not followed, the unit is not auto-detected and will be incorrect. For instance, custom_byte_expvar does not yield the correct unit (MiB), because byte is not the trailing suffix.
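The convention can be illustrated with a small sketch. The suffix-to-unit mapping below is an assumption based on the delimiter list above, not Sysdig's actual implementation:

```python
# Hypothetical helper illustrating the suffix convention described above.
# The suffix-to-unit mapping is an assumption, not Sysdig's implementation.
SUFFIX_UNITS = {
    "second": "time",
    "seconds": "time",
    "byte": "byte",
    "bytes": "byte",
    "total": "count",   # "total" represents an accumulating count
    "percent": "percent",
}

def detect_unit(metric_name):
    """Return the base unit implied by the trailing suffix, or None."""
    suffix = metric_name.rsplit("_", 1)[-1]
    return SUFFIX_UNITS.get(suffix)

print(detect_unit("custom_expvar_time_seconds"))  # time
print(detect_unit("custom_byte_expvar"))          # None: "byte" is not trailing
```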

Editing the Unit Scale

You have the flexibility to change the unit scale either by editing the panel on the Dashboard or in Explore.

Explore

From the Search Metrics and Dashboard drop-down, select the custom metrics you want to edit the unit selection for, then click More Options. Select the desired unit scale from the Metric Format drop-down and click Save.

Dashboard

Select the Dashboard Panel associated with the custom metrics you want to modify. Select the desired unit scale from the Metrics drop-down and click Save.

Display Missing Data

Data can be missing for a few different reasons:

  • Problems such as faulty network connectivity in the communication channel between your infrastructure and the Sysdig metrics store.

  • Metrics or StatsD batch jobs are submitted sporadically.

Sysdig Monitor allows you to configure the behavior of missing data in Dashboards. Though metric type determines the default behavior, you can configure how to visualize missing data and define it at the per-query level. Use the No Data Display drop-down in the Options menu in the panel configuration, and the No Data Message text box under the Panel tab. See Create a New Panel for more information.

Consider the following guidelines:

  • Use the No Data Message text box under the Panel tab to enter a custom message when no data is available to render on the panels. This custom message, which can include line breaks and links in markdown format, is shown when queries return no data and report no errors.

  • The No Data Display drop-down has only two options for the Stacked Area timechart: gap and show as zero.

  • For form-based timechart panels, the default option for a metrics selection that does not contain a StatsD metric is gap.

  • Adding a StatsD metric to a query in a form-based timechart panel defaults the selected No Data Display type to show as zero, which is the default option for form-based StatsD metrics. You can change this selection to any other type.

  • The default display option is gap for PromQL Timechart panels.

The options for No Data Display are:

  • gap: The default option for form-based timechart panels where the query's metric selection does not contain a StatsD metric. gap is the best visualization type for most use cases, because gaps are easy to spot and typically indicate a problem.

  • show as zero: The best option for StatsD metrics that are submitted only sporadically, such as batch jobs and error counts. This is the default display option for StatsD metrics in form-based panels.

    Use this option with care, as showing zero can be misleading. For example, this setting reports the value for free disk space as 0% when the disk or host disappears, when in reality the value is unknown.

Prometheus best practices recommend avoiding missing metrics.

  • connect - solid: Use for measuring the value of a metric, typically a gauge, where you want missing samples flattened into a solid line.

    Sysdig does not interpolate missing values; the leftmost and rightmost visible data points are connected directly.

  • connect - dotted: Same as connect - solid, except that missing samples are bridged with a dotted line.

2 - Using Labels

Data aggregation and filtering in Sysdig Monitor are done through the use of assigned labels. The sections below explain how labels work, the ways they can be used, and how to work with groupings, scopes, and segments.

Labels are used to identify and differentiate characteristics of a metric, allowing them to be aggregated or filtered for Explore module views, dashboards, alerts, and captures. Labels can be used in different ways:

  • To group infrastructure objects into logical hierarchies displayed on the Explore tab (called groupings). For more information, refer to Groupings.

  • To split aggregated data into segments. For more information, refer to Segments.

There are two types of labels:

  • Infrastructure labels

  • Metric descriptor labels

Infrastructure Labels

Infrastructure labels are used to identify objects or entities within the infrastructure that a metric is associated with, including hosts, containers, and processes. An example label is shown below:

Sysdig Notation

kubernetes.pod.name

Prometheus Notation

kubernetes_pod_name

The table below outlines what each part of the label represents:

Example Label Component | Description
kubernetes | The infrastructure type.
pod | The object.
name | The label key.

Infrastructure labels are obtained from the infrastructure (including from orchestrators, platforms, and the runtime processes), and Sysdig automatically builds a relationship model using the labels. This allows users to create logical hierarchical groupings to better aggregate the infrastructure objects in the Explore module.

For more information on groupings, refer to Groupings.

Metric Descriptor Labels

Metric descriptor labels are custom descriptors or key-value pairs applied directly to metrics, obtained from integrations like StatsD, Prometheus, and JMX. Sysdig automatically collects custom metrics from these integrations, and parses the labels from them. Unlike infrastructure labels, these labels can be arbitrary, and do not necessarily map to any entity or object.

Metric descriptor labels can only be used for segmenting, not grouping or scoping.

An example metric descriptor label is shown below:

website_failedRequests:20|region='Asia', customer_ID='abc'

The table below outlines what each part of the label represents:

Example Label Component | Description
website_failedRequests | The metric name.
20 | The metric value.
region='Asia', customer_ID='abc' | The metric descriptor labels. Multiple key-value pairs can be assigned as a comma-separated list.
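A minimal sketch of emitting such a metric from an application, mirroring the label format shown above. The helper, the payload layout, and the agent address are illustrative assumptions (the agent conventionally listens for StatsD on UDP port 8125), not an official client:

```python
import socket

# Illustrative only: build and emit a StatsD line carrying descriptor
# labels in the format shown above. The payload layout and the default
# agent address/port are assumptions, not an official StatsD client.
def send_statsd(metric, value, labels, host="127.0.0.1", port=8125):
    tags = ", ".join(f"{k}='{v}'" for k, v in labels.items())
    payload = f"{metric}:{value}|{tags}"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode(), (host, port))  # fire-and-forget UDP
    return payload

print(send_statsd("website_failedRequests", 20,
                  {"region": "Asia", "customer_ID": "abc"}))
```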

Sysdig recommends not using labels to store dimensions with high cardinalities (numerous different label values), such as user IDs, email addresses, URLs, or other unbounded sets of values. Each unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored.

Groupings

Groupings are hierarchical organizations of labels, allowing users to organize their infrastructure views on the Explore tab in a logical hierarchy. An example grouping is shown below:

The example above groups the infrastructure into four levels. This results in a tree view in the Explore module with four levels, with rows for each infrastructure object applicable to each level.

As each label is selected, Sysdig Monitor automatically filters out labels for the next selection that no longer fit the hierarchy, to ensure that only logical groupings are created.

The example below shows the logical hierarchy structure for Kubernetes:

  • Clusters: Cluster > Namespace > Replicaset > Pod

  • Namespace: Cluster > Namespace > HorizontalPodAutoscaler > Deployment > Pod

  • Daemonsets: Cluster > Namespace > Daemonsets > Pod

  • Services: Cluster > Namespace > Service > StatefulSet > Pod

  • Job: Cluster > Namespace > Job > Pod

  • ReplicationController: Cluster > Namespace > ReplicationController > Pod

The default groupings are immutable: They cannot be modified or deleted. However, you can make a copy of them that you can modify.

Unified Workload Labels

Sysdig provides the following labels to help you organize your infrastructure and make troubleshooting easier.

  • kubernetes_workload_name: Displays all the Kubernetes workloads, indicating both the type and the name of each workload resource (deployment, daemonSet, replicaSet, and so on).

  • kubernetes_workload_type: Indicates what type of workload resource (deployment, daemonSet, replicaSet, and so on) it is.

The availability of these labels also simplifies groupings: instead of a different grouping for each type of workload, you can use a single grouping for all workloads.

The labels also let you segment metrics, such as sysdig_host_cpu_cores_used_percent, by kubernetes_workload_name to see CPU core usage for all workloads, instead of writing a separate query for each segmentation by kubernetes_deployment_name, kubernetes_replicaSet_name, and so on.
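As a sketch, such a segmentation can be expressed in PromQL along these lines (verify metric and label availability in your environment):

```
avg by (kubernetes_workload_name) (sysdig_host_cpu_cores_used_percent)
```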

Scopes

A scope is a collection of labels that are used to filter out or define the boundaries of a group of data points when creating dashboards, dashboard panels, alerts, and teams. An example scope is shown below:

In the example above, the scope is defined by two labels with operators and values defined. The table below defines each of the available operators.

OperatorDescription
isThe value matches the defined label value exactly.
is notThe value does not match the defined label value exactly.
inThe value is among the comma separated values entered.
not inThe value is not among the comma separated values entered.
containsThe label value contains the defined value.
does not containThe label value does not contain the defined value.
starts withThe label value starts with the defined value.

The scope editor provides dynamic filtering capabilities. It restricts the scope of the selection for subsequent filters by rendering valid values that are specific to the previously selected label. Expand the list to view unfiltered suggestions. At run time, users can also supply custom values to achieve more granular filtering. The custom values are preserved. Note that changing a label higher up in the hierarchy might render the subsequent labels incompatible. For example, changing the kubernetes_namespace_name > kubernetes_deployment_name hierarchy to swarm_service_name > kubernetes_deployment_name is invalid as these entities belong to different orchestrators and cannot be logically grouped.

Dashboards and Panels

Dashboard scopes define the criteria for what metric data will be included in the dashboard’s panels. The current dashboard’s scope can be seen at the top of the dashboard:

By default, all dashboard panels abide by the scope of the overall dashboard. However, an individual panel can be configured with a different scope than the rest of the dashboard.

For more information on Dashboards and Panels, refer to the Dashboards documentation.

Alerts

Alert scopes are defined during the creation process, and specify what areas within the infrastructure the alert is applicable for. In the example alerts below, the first alert has a scope defined, whereas the second alert does not have a custom scope defined. If no scope is defined, the alert is applicable to the entire infrastructure.

For more information on Alerts, refer to the Alerts documentation.

Teams

A team’s scope determines the highest level of data that team members have visibility for:

  • If a team’s scope is set to Host, team members can see all host-level and container-level information.

  • If a team’s scope is set to Container, team members can only see container-level information.

A team’s scope only applies to that team. Users that are members of multiple teams may have different visibility depending on which team is active.

For more information on teams and configuring team scope, refer to the Manage Teams and Roles documentation.

Segments

Aggregated data can be split into smaller sections by segmenting the data with labels. This allows for the creation of multi-series comparisons and multiple alerts. In the first image, the metric is not segmented:

In the second image, the same metric has been segmented by container_id:

Line and Area panels can display any number of segments for any given metric. The example image below displays the sysdig_connection_net_in_bytes metric segmented by both container_id and host_hostname:

For more information regarding segmentation in dashboard panels, refer to the Configure Panels documentation. For more information regarding configuring alerts, refer to the Alerts documentation.

The Meaning of n/a

Sysdig Monitor imports data related to entities such as hosts, containers, and processes, and reports it in tables and panels on the Explore and Dashboards UI, as well as in events, so you see many kinds of data across the UI. The term n/a can appear anywhere in the UI where data is displayed.

n/a indicates data that is not available or that does not apply to a particular instance. In Sysdig parlance, the term signifies one or more entities, defined by a particular label such as hostname or Kubernetes service, for which the label is invalid. In other words, n/a collectively represents entities whose metadata is not relevant to the aggregation and filtering techniques: grouping, scoping, and segmenting. For instance, a list of Kubernetes services might display all the services as well as an n/a entry that includes all the containers without metadata describing a Kubernetes service.

You might encounter n/a sporadically in the Explore UI, in drill-down panels, dashboards, events, and elsewhere in Sysdig Monitor when no relevant metadata is available for a particular display. How n/a should be treated depends on the nature of your deployment; the deployment itself is not affected by entities marked n/a.

The following are some of the cases that yield n/a on the UI:

  • Labels are partially available or not available. For example, a host has entities that are not associated with a monitored Kubernetes deployment, or a monitored host has an unmonitored Kubernetes deployment running.

  • Labels that do not apply to the grouping criteria or at the hierarchy level. For example:

    • Containers that are not managed by Kubernetes. The containers managed by Kubernetes are identified with their container_name labels.

    • In certain groupings by DaemonSet, Deployments render n/a, and vice versa, because not all containers belong to both DaemonSet and Deployment objects concurrently. Likewise, a Kubernetes ReplicaSet grouping with the kubernetes_replicaset_name label will not show StatefulSets.

    • In a kubernetes_cluster_name > kubernetes_namespace_name > kubernetes_deployment_name grouping, the entities without the kubernetes_cluster_name label yield n/a.

  • Entities are incorrectly labeled in the infrastructure.

  • Kubernetes features that are not yet in sync with Sysdig Monitor.

  • The format is not applicable to a particular record in the database.

3 - Data Aggregation

Sysdig Monitor allows you to adjust the aggregation settings when graphing or creating alerts for a metric, informing how Sysdig rolls up the available data samples to create the chart or evaluate the alert. This topic familiarizes you with aggregation concepts and settings, and explains some of the mechanics Sysdig uses to allow for efficient query performance and data retention.

Data Aggregation Concepts

Data Sampling

Sysdig agents collect 1-second samples and report data at 10-second resolution, which is the lowest resolution at which the backend stores data. To achieve this, the agent downsamples the 1-second samples into 10-second samples.

Note: This is true for all metrics except Prometheus metrics, for which data is still sampled every 1 second, but the value reported for each 10-second interval is the latest value rather than a downsample.

Samples are initially stored at the lowest supported resolution of 10 seconds, after which they are periodically rolled up into higher (downsampled) timelines as new data arrives. For example, data registered every 10 seconds is rolled up into 1-minute blocks, and the data stored in 1-minute blocks is in turn rolled up into 10-minute blocks.

Downsampling

Downsampling refers to the process of aggregating multiple samples over a defined time interval into a set of values that estimate the aggregated time range. In Sysdig parlance, downsampling is simply the data aggregation performed by the backend before the data is exposed as time aggregation on the UI or through the API. In effect, the data available for time aggregation during query evaluation is not the raw data, but values that represent estimates for the given time range.

Reducing the number of samples also helps reduce data retention costs and improves query performance by reducing the amount of data loaded during query evaluation.

Downsampled data is used only for longer time ranges. If you are viewing recent data, such as the last 10 minutes or the last hour, raw data is used for evaluation.

Data Rollup

Sysdig Monitor rolls up historical data over time.

Sysdig downsampling produces data rollups of aggregated samples. In each data rollup, Sysdig calculates and records four values: maximum, minimum, sum, and count. These values allow the following time aggregations to be exposed on the UI and through the APIs: max, min, sum, count, avg, rate, and rateOfChange.

For example, the data collected every 10 seconds is aggregated and rolled up into 1-minute blocks. From the values recorded in the 1-minute rollups, data is rolled up again into 10-minute blocks.
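The rollup described above can be sketched as follows (illustrative only; the actual backend storage format is not shown here):

```python
# Illustrative sketch: six 10-second samples are collapsed into one
# 1-minute rollup recording max, min, sum, and count, from which the
# other aggregations (such as avg) can be derived at query time.
def rollup(samples):
    return {
        "max": max(samples),
        "min": min(samples),
        "sum": sum(samples),
        "count": len(samples),
    }

ten_second_samples = [40, 42, 38, 45, 41, 44]  # one minute of 10-second data
minute_block = rollup(ten_second_samples)
avg = minute_block["sum"] / minute_block["count"]  # derived, not stored
print(minute_block)   # {'max': 45, 'min': 38, 'sum': 250, 'count': 6}
print(round(avg, 2))  # 41.67
```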

Data Resolution

Data resolution is the frequency with which the data is displayed. Sysdig Monitor supports the data resolution of 10 seconds, 1 minute, 10 minutes, 1 hour, and 1 day.

Time and Group Aggregations

There are two forms of aggregation used for metrics in Sysdig: time aggregation and group aggregation. Time aggregation is always performed before group aggregation.

Time Aggregation

Time aggregation comes into effect in two situations (that can sometimes overlap):

  • Aggregation: Graphs can only render a limited number of data points. To display a wide range of data, Sysdig Monitor aggregates granular data into larger blocks of samples for visualization, as described in Downsampling.
  • Data Rollup: Sysdig retains rollups for each aggregation type, so users can choose which data points to use when evaluating older data.

Aggregation Types

Aggregation Type | Description
average | The average of the retrieved metric values across the time period evaluated.
rate | The sum of the retrieved metric values divided by the number of expected samples in the time period evaluated (see the comparison with average below).
maximum | The highest value during the time period evaluated.
minimum | The lowest value during the time period evaluated.
sum | The combined sum of the metric across the time period evaluated.

Difference Between Rate and Average

  • Rate and average are very similar and often provide the same result. However, the calculation of each is different.

    • If time aggregation is set to one minute, the agent is expected to report six samples (one every 10 seconds).

    • In some cases, samples may be missing due to disconnections or other circumstances. Suppose only four samples are available: the average is then calculated by dividing by four, while the rate is still calculated by dividing by six.

  • Most metrics are sampled once for each time interval, so average and rate return the same value. However, there is a distinction for any metric not reported at every time interval, for example, some custom StatsD metrics.

  • Rate is currently referred to as timeAvg in the Sysdig Monitor API and advanced alerting language.

  • By default, average is used when displaying data points for a time interval.
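The distinction between the two can be sketched with the numbers from the example above (illustrative values):

```python
# Worked version of the example above: six samples are expected per
# minute (one every 10 seconds), but only four were actually reported.
samples = [30, 50, 40, 60]   # the four samples actually received
expected_samples = 6         # samples expected in a 1-minute interval

average = sum(samples) / len(samples)   # divide by samples received
rate = sum(samples) / expected_samples  # divide by samples expected
print(average)  # 45.0
print(rate)     # 30.0
```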

Time Aggregation on the UI

On the Sysdig Monitor UI, you select the time aggregation from the Metric drop-down.

Depending on the time range you have selected, how old the data is, and what the resolution is, panels display data at a granularity of 10 seconds, 1 minute, 10 minutes, 1 hour, or 1 day.

Data drawn at 10-second resolution is reported every 10 seconds with the available aggregations (average, rate, min, max, sum), making them available through the Sysdig Monitor UI and the API. For time series panels covering 5 minutes or less, data points are drawn at this 10-second resolution, and any time aggregation selection has no effect.

When a panel displays an amount of time greater than 5 minutes, data points are drawn as an aggregate for an appropriate time interval. For example, for a panel covering 1 hour, each data point would reflect a 1-minute interval.

At time intervals of 1-minute and above, charts can be configured to display different aggregates for the 10-second metrics used to calculate each datapoint.

Time Aggregation and Time Range Mapping on the UI

Aggregation Interval | Time Range
10 seconds | 10 Minutes
1 minute | 1 Hour
10 minutes | 6 Hours, 12 Hours
1 hour | 1 Day, 4 Days, 1 Week
1 day | 2 Weeks

Group Aggregation

Metrics applied to a group of items (for example, several containers, hosts, or nodes) are averaged across the members of the group by default. For example, if three hosts report different CPU usage for one sample interval, the three values are averaged and reported on the chart as a single data point for that metric.

There are several different types of group aggregation:

Aggregation Type | Description
average | The average value of the interval’s samples.
maximum | The maximum value of the interval’s samples.
minimum | The minimum value of the interval’s samples.
sum | The sum of the values of all of the interval’s samples.
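The default averaging behavior described above can be sketched with three hypothetical hosts reporting CPU usage for the same sample interval:

```python
# Sketch of group aggregation: three hypothetical hosts report CPU usage
# for the same sample interval; by default the group value is their average.
host_cpu = {"host-a": 20.0, "host-b": 40.0, "host-c": 60.0}

group_avg = sum(host_cpu.values()) / len(host_cpu)  # default aggregation
group_max = max(host_cpu.values())
group_min = min(host_cpu.values())
group_sum = sum(host_cpu.values())
print(group_avg, group_max, group_min, group_sum)  # 40.0 60.0 20.0 120.0
```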

If a chart or alert is segmented, the group aggregation settings are used both for aggregation across the whole group and for aggregation within each individual segment.

For example, the image below shows a chart for CPU% across the infrastructure:

When segmented by proc_name, the chart shows one CPU% line for each process:

Each line provides the average value for every process with the same name. To see the difference, change the group aggregation type to sum:

The metric aggregation value shown beside the metric name refers to the time aggregation. While the screenshot shows AVG, the group aggregation is set to SUM.

Aggregation Examples

The tables below provide an example of how each type of aggregation works. The first table provides the metric data, while the second displays the resulting value for each type of aggregation.

In the example below, the CPU% metric is applied to a group of servers called webserver. The first chart shows metrics using average aggregation for both time and group. The second chart shows the metrics using maximum aggregation for both time and group.

For each one-minute interval, the second chart renders the highest CPU usage value found among the servers in the webserver group and among all of the samples reported during that interval. This view can be useful when searching for transient spikes in metrics over long periods of time that would otherwise be missed with average aggregation.

The group aggregation type depends on the segmentation. For a view showing metrics for a group of items, the current group aggregation setting reverts to the default if the Segment By selection is changed.

4 - Metric Limits

Metric limits determine the number of custom time series ingested by the Sysdig agent. While this is primarily a tool to limit the total time series ingested, and thus each user's cost exposure, it also affects the total number of time series available for tracking metrics.

The Sysdig agent metric limit is different from the entitlement limit imposed on custom time series. Your time series entitlement could be lower than agent metric limits. For more information, see Time Series Billing.

View Metric Limits

The metric limits are automatically defined by Sysdig backend components based on your plan, agent version, and backend configuration. Metric limits are set per category, and when aggregated, the per-category limits define your overall metric limit per agent. Metric limits are global per account, and the same limits apply to each agent within a Sysdig account.

Use the Sysdig Agent Health & Status dashboard under Host Infrastructure templates to view per-category metric limits for your account, along with the current usage per host for each metric type.

Contact Sysdig Support to adjust metric limits for any category.

Metric | Description
statsd_dragent_metricCount_limit_appCheck | The maximum number of unique appCheck time series allowed in an individual sample from the agent, per node.
statsd_dragent_metricCount_limit_statsd | The maximum number of unique StatsD time series allowed in an individual sample from the agent, per node.
statsd_dragent_metricCount_limit_jmx | The maximum number of unique JMX time series allowed in an individual sample from the agent, per node.
statsd_dragent_metricCount_limit_prometheus | The maximum number of unique Prometheus time series allowed in an individual sample from the agent, per node.

5 - Manage Metric Scale

Sysdig provides several knobs for managing metric scale. This topic introduces the primary ways you can include or exclude metrics should you run up against metric limits.
  1. Include/exclude custom metrics by name filters. See Include/Exclude Custom Metrics.

  2. Include/exclude metrics emitted by certain containers, Kubernetes annotations, or any other container label at collection time. See Prioritize/Include/Exclude Designated Containers.

  3. Exclude metrics from unwanted ports. See Blacklist Ports.

6 - Metrics Library

The Sysdig metrics dictionary lists all the metrics, both in Sysdig legacy and Prometheus-compatible notation, supported by the Sysdig product suite, as well as kube state and cloud provider metrics. The Metrics Dictionary is a living document and is updated as new metrics are added to the product.

6.1 - Metrics and Labels Mapping

This topic outlines the mapping between the metrics and label naming conventions in the Sysdig legacy datastore and the new Sysdig datastore.

6.1.1 - Mapping Classic Metrics with Context-Specific PromQL Metrics

Sysdig classic metrics such as cpu.used.percent previously returned values from a process, container, or host depending on the query segmentation or scope. You can now use context-explicit metrics, which align with the flat model and resource-specific semantics of the Prometheus naming schema. Your existing dashboards and alerts are automatically migrated to the new naming convention.

Sysdig Classic Metric | Context-Specific Metrics in Prometheus Notation
cpu.cores.used | sysdig_container_cpu_cores_used, sysdig_host_cpu_cores_used, sysdig_program_cpu_cores_used
cpu.cores.used.percent | sysdig_container_cpu_cores_used_percent, sysdig_host_cpu_cores_used_percent, sysdig_program_cpu_cores_used_percent
cpu.used.percent | sysdig_container_cpu_used_percent, sysdig_host_cpu_used_percent, sysdig_program_cpu_used_percent
fd.used.percent | sysdig_container_fd_used_percent, sysdig_host_fd_used_percent, sysdig_program_fd_used_percent
file.bytes.in | sysdig_container_file_in_bytes, sysdig_host_file_in_bytes, sysdig_program_file_in_bytes
file.bytes.out | sysdig_container_file_out_bytes, sysdig_host_file_out_bytes, sysdig_program_file_out_bytes
file.bytes.total | sysdig_container_file_total_bytes, sysdig_host_file_total_bytes, sysdig_program_file_total_bytes
file.error.open.count | sysdig_container_file_error_open_count, sysdig_host_file_error_open_count, sysdig_program_file_error_open_count
file.error.total.count | sysdig_container_file_error_total_count, sysdig_host_file_error_total_count, sysdig_program_file_error_total_count
file.iops.in | sysdig_container_file_in_iops, sysdig_host_file_in_iops, sysdig_program_file_in_iops
file.iops.out | sysdig_container_file_out_iops, sysdig_host_file_out_iops, sysdig_program_file_out_iops
file.iops.total | sysdig_container_file_total_iops, sysdig_host_file_total_iops, sysdig_program_file_total_iops
file.open.count | sysdig_container_file_open_count, sysdig_host_file_open_count, sysdig_program_file_open_count
file.time.in | sysdig_container_file_in_time, sysdig_host_file_in_time, sysdig_program_file_in_time
file.time.out | sysdig_container_file_out_time, sysdig_host_file_out_time, sysdig_program_file_out_time
file.time.total | sysdig_container_file_total_time, sysdig_host_file_total_time, sysdig_program_file_total_time
fs.bytes.free | sysdig_container_fs_free_bytes, sysdig_fs_free_bytes, sysdig_host_fs_free_bytes
fs.bytes.total | sysdig_container_fs_total_bytes, sysdig_fs_total_bytes, sysdig_host_fs_total_bytes
fs.bytes.used | sysdig_container_fs_used_bytes, sysdig_fs_used_bytes, sysdig_host_fs_used_bytes
fs.free.percent | sysdig_container_fs_free_percent, sysdig_fs_free_percent, sysdig_host_fs_free_percent
fs.inodes.total.count | sysdig_container_fs_inodes_total_count, sysdig_fs_inodes_total_count, sysdig_host_fs_inodes_total_count
fs.inodes.used.count | sysdig_container_fs_inodes_used_count, sysdig_fs_inodes_used_count, sysdig_host_fs_inodes_used_count
fs.inodes.used.percent | sysdig_container_fs_inodes_used_percent, sysdig_fs_inodes_used_percent, sysdig_host_fs_inodes_used_percent
fs.largest.used.percent | sysdig_container_fs_largest_used_percent, sysdig_host_fs_largest_used_percent
fs.root.used.percent | sysdig_container_fs_root_used_percent, sysdig_host_fs_root_used_percent
fs.used.percent | sysdig_container_fs_used_percent, sysdig_fs_used_percent, sysdig_host_fs_used_percent
host.error.count | sysdig_container_syscall_error_count, sysdig_host_syscall_error_count
info | sysdig_agent_info, sysdig_container_info, sysdig_host_info
memory.bytes.total | sysdig_host_memory_total_bytes
memory.bytes.used | sysdig_container_memory_used_bytes, sysdig_host_memory_used_bytes, sysdig_program_memory_used_bytes
memory.bytes.virtual | sysdig_container_memory_virtual_bytes, sysdig_host_memory_virtual_bytes
memory.swap.bytes.used | sysdig_container_memory_swap_used_bytes, sysdig_host_memory_swap_used_bytes
memory.used.percent | sysdig_container_memory_used_percent, sysdig_host_memory_used_percent
net.bytes.in | sysdig_connection_net_in_bytes, sysdig_container_net_in_bytes, sysdig_host_net_in_bytes, sysdig_program_net_in_bytes
net.bytes.out | sysdig_connection_net_out_bytes, sysdig_container_net_out_bytes, sysdig_host_net_out_bytes, sysdig_program_net_out_bytes
net.bytes.total | sysdig_connection_net_total_bytes, sysdig_container_net_total_bytes, sysdig_host_net_total_bytes, sysdig_program_net_total_bytes
net.connection.count.in | sysdig_connection_net_connection_in_count, sysdig_container_net_connection_in_count, sysdig_host_net_connection_in_count, sysdig_program_net_connection_in_count
net.connection.count.out | sysdig_connection_net_connection_out_count, sysdig_container_net_connection_out_count, sysdig_host_net_connection_out_count, sysdig_program_net_connection_out_count
net.connection.count.total | sysdig_connection_net_connection_total_count, sysdig_container_net_connection_total_count, sysdig_host_net_connection_total_count, sysdig_program_net_connection_total_count
net.request.count | sysdig_connection_net_request_count, sysdig_container_net_request_count, sysdig_host_net_request_count, sysdig_program_net_request_count
net.error.count | sysdig_container_net_error_count, sysdig_host_net_error_count, sysdig_program_net_error_count
net.request.count.in | sysdig_connection_net_request_in_count, sysdig_container_net_request_in_count, sysdig_host_net_request_in_count, sysdig_program_net_request_in_count
net.request.count.out | sysdig_connection_net_request_out_count
sysdig_container_net_request_out_count
sysdig_host_net_request_out_count
sysdig_program_net_request_out_count
net.request.timesysdig_connection_net_request_time
sysdig_container_net_request_time
sysdig_host_net_request_time
sysdig_program_net_request_time
net.request.time.insysdig_connection_net_request_in_time
sysdig_container_net_request_in_time
sysdig_host_net_request_in_time
sysdig_program_net_request_in_time
net.request.time.outsysdig_connection_net_request_out_time
sysdig_container_net_request_out_time
sysdig_host_net_request_out_time
sysdig_program_net_request_out_time
net.server.bytes.insysdig_container_net_server_in_bytes
sysdig_host_net_server_in_bytes
net.server.bytes.outsysdig_container_net_server_out_bytes
sysdig_host_net_server_out_bytes
net.server.bytes.totalsysdig_container_net_server_total_bytes
sysdig_host_net_server_total_bytes
net.sql.error.countsysdig_container_net_sql_error_count
sysdig_host_net_sql_error_count
net.sql.request.countsysdig_container_net_sql_request_count
sysdig_host_net_sql_request_count
net.tcp.queue.lensysdig_container_net_tcp_queue_len
sysdig_host_net_tcp_queue_len
sysdig_program_net_tcp_queue_len
proc.countsysdig_container_proc_count
sysdig_host_proc_count
sysdig_program_proc_count
thread.countsysdig_container_thread_count
sysdig_host_thread_count
sysdig_program_thread_count
uptimesysdig_container_up
sysdig_host_up
sysdig_program_up

6.1.2 - Mapping Classic Metrics with PromQL Metrics

Starting with SaaS v3.2.6, Sysdig classic metrics and labels have been renamed to align with the Prometheus naming convention. Sysdig classic metrics use a dot-separated hierarchy, whereas Prometheus organizes metrics with labels. The table below helps you identify the Prometheus metrics and labels and the corresponding ones in the Sysdig classic system.
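As a quick illustration of the renaming, a classic-to-PromQL lookup can be sketched as below. Note that the rename is not a mechanical dot-to-underscore rule, so the table rows on this page are the source of truth; the few entries shown here are copied from it.

```python
# Illustrative only: a handful of classic-to-PromQL host metric renames
# taken from the mapping table on this page.
CLASSIC_TO_PROMQL = {
    "cpu.used.percent": "sysdig_host_cpu_used_percent",
    "memory.bytes.used": "sysdig_host_memory_used_bytes",
    "net.bytes.in": "sysdig_host_net_in_bytes",
    "system.uptime": "sysdig_host_system_uptime",
}

def to_promql(classic: str) -> str:
    """Look up the documented PromQL name for a classic host metric."""
    return CLASSIC_TO_PROMQL[classic]

print(to_promql("memory.bytes.used"))  # sysdig_host_memory_used_bytes
```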

Entity

Type

PromQL Metric Name

Classic Metric Name

Label

Classic Label

host

info

sysdig_host_info

Not exposed

  • host_mac

  • host

  • instance_id

  • agent_tag_{*}

  • host.mac

  • host.hostName

  • host.instanceId

  • agent.tag.{*}

sysdig_cloud_provider_info

  • host_mac

  • provider_id

  • account_id

  • region

  • availability_zone

  • instance_type

  • tag_{*}

  • security_groups

  • host_ip_public

  • host_ip_private

  • host_name

  • name

  • host.mac

  • cloudProvider.id

  • cloudProvider.account.id

  • cloudProvider.region

  • cloudProvider.availabilityZone

  • cloudProvider.instance.type

  • cloudProvider.tag.{*}

  • cloudProvider.securityGroups

  • cloudProvider.host.ip.public

  • cloudProvider.host.ip.private

  • cloudProvider.host.name

  • cloudProvider.name

data

sysdig_host_cpu_used_percent

cpu.used.percent

  • host_mac

  • host

  • host.mac

  • host.hostname

sysdig_host_cpu_cores_used

cpu.cores.used

sysdig_host_cpu_user_percent

cpu.user.percent

sysdig_host_cpu_idle_percent

cpu.idle.percent

sysdig_host_cpu_iowait_percent

cpu.iowait.percent

sysdig_host_cpu_nice_percent

cpu.nice.percent

sysdig_host_cpu_stolen_percent

cpu.stolen.percent

sysdig_host_cpu_system_percent

cpu.system.percent

sysdig_host_fd_used_percent

fd.used.percent

sysdig_host_file_error_open_count

file.error.open.count

sysdig_host_file_error_total_count

file.error.total.count

sysdig_host_file_in_bytes

file.bytes.in

sysdig_host_file_in_iops

file.iops.in

sysdig_host_file_in_time

file.time.in

sysdig_host_file_open_count

file.open.count

sysdig_host_file_out_bytes

file.bytes.out

sysdig_host_file_out_iops

file.iops.out

sysdig_host_file_out_time

file.time.out

sysdig_host_load_average_15m

load.average.15m

sysdig_host_load_average_1m

load.average.1m

sysdig_host_load_average_5m

load.average.5m

sysdig_host_memory_available_bytes

memory.bytes.available

sysdig_host_memory_total_bytes

memory.bytes.total

sysdig_host_memory_used_bytes

memory.bytes.used

sysdig_host_memory_swap_available_bytes

memory.swap.bytes.available

sysdig_host_memory_swap_total_bytes

memory.swap.bytes.total

sysdig_host_memory_swap_used_bytes

memory.swap.bytes.used

sysdig_host_memory_virtual_bytes

memory.bytes.virtual

sysdig_host_net_connection_in_count

net.connection.count.in

sysdig_host_net_connection_out_count

net.connection.count.out

sysdig_host_net_error_count

net.error.count

sysdig_host_net_in_bytes

net.bytes.in

sysdig_host_net_out_bytes

net.bytes.out

sysdig_host_net_tcp_queue_len

net.tcp.queue.len

sysdig_host_proc_count

proc.count

sysdig_host_system_uptime

system.uptime

sysdig_host_thread_count

thread.count

container

info

sysdig_container_info

Not exposed

container_id

container_id

container_full_id

none

host_mac

host.mac

container

container.name

container_type

container.type

image

container.image

image_id

container.image.id

mesos_task_id

container.mesosTaskId

Only available with the Mesos orchestrator.

cluster

kubernetes.cluster.name

Present only if the container is part of Kubernetes.

pod

kubernetes.pod.name

Present only if the container is part of Kubernetes.

namespace

kubernetes.namespace.name

Present only if the container is part of Kubernetes.

data

sysdig_container_cpu_used_percent

cpu.used.percent

  • host_mac

  • container_id

  • container_type

  • container

  • host.mac

  • container.id

  • container.type

  • container.name

sysdig_container_cpu_cores_used

cpu.cores.used

sysdig_container_cpu_cores_used_percent

cpu.cores.used.percent

sysdig_container_cpu_quota_used_percent

cpu.quota.used.percent

sysdig_container_cpu_shares

cpu.shares.count

sysdig_container_cpu_shares_used_percent

cpu.shares.used.percent

sysdig_container_fd_used_percent

fd.used.percent

sysdig_container_file_error_open_count

file.error.open.count

sysdig_container_file_error_total_count

file.error.total.count

sysdig_container_file_in_bytes

file.bytes.in

sysdig_container_file_in_iops

file.iops.in

sysdig_container_file_in_time

file.time.in

sysdig_container_file_open_count

file.open.count

sysdig_container_file_out_bytes

file.bytes.out

sysdig_container_file_out_iops

file.iops.out

sysdig_container_file_out_time

file.time.out

sysdig_container_memory_limit_bytes

memory.limit.bytes

sysdig_container_memory_limit_used_percent

memory.limit.used.percent

sysdig_container_memory_swap_available_bytes

memory.swap.bytes.available

sysdig_container_memory_swap_total_bytes

memory.swap.bytes.total

sysdig_container_memory_swap_used_bytes

memory.swap.bytes.used

sysdig_container_memory_used_bytes

memory.bytes.used

sysdig_container_memory_virtual_bytes

memory.bytes.virtual

sysdig_container_net_connection_in_count

net.connection.count.in

sysdig_container_net_connection_out_count

net.connection.count.out

sysdig_container_net_error_count

net.error.count

sysdig_container_net_in_bytes

net.bytes.in

sysdig_container_net_out_bytes

net.bytes.out

sysdig_container_net_tcp_queue_len

net.tcp.queue.len

sysdig_container_proc_count

proc.count

sysdig_container_swap_limit_bytes

swap.limit.bytes

sysdig_container_thread_count

thread.count

Process / Program

Info

sysdig_program_info

Not exposed

program

proc.name

cmd_line

proc.commandLine

host_mac

host.mac

container_id

container.id

container_type

container.type

data

sysdig_program_cpu_used_percent

cpu.used.percent

host_mac

host.mac

container_id

container.id

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

sysdig_program_memory_used_bytes

memory.bytes.used

host_mac

host.mac

container_id

container.id

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

sysdig_program_net_in_bytes

net.bytes.in

container_id

container.id

host_mac

host.mac

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

sysdig_program_net_out_bytes

net.bytes.out

host_mac

host.mac

container_id

container.id

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

sysdig_program_proc_count

proc.count

host_mac

host.mac

container_id

container.id

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

sysdig_program_thread_count

thread.count

host_mac

host.mac

container_id

container.id

container_type

container.type

program

proc.name

cmd_line

proc.commandLine

fs

info

sysdig_fs_info

Not exposed

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

mount_dir

fs.mountDir

type

fs.type

data

sysdig_fs_free_bytes

fs.bytes.free

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

sysdig_fs_inodes_total_count

fs.inodes.total.count

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

sysdig_fs_inodes_used_count

fs.inodes.used.count

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

sysdig_fs_total_bytes

fs.bytes.total

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

sysdig_fs_used_bytes

fs.bytes.used

host_mac

host.mac

container_id

container.id

container_type

container.type

device

fs.device

6.1.3 - Mapping Legacy Sysdig Kubernetes Metrics with Prometheus Metrics

In Kubernetes parlance, these Prometheus metrics are Kube State Metrics. They are available in Sysdig PromQL and can be mapped to existing Sysdig Kubernetes metrics.

For descriptions on Kubernetes State Metrics, see Kubernetes State Metrics.

Resource

Sysdig Metrics

Kubernetes State Metrics

Label

Example / More Information

Pod

kubernetes.pod.containers.waiting

kube_pod_container_status_waiting

  • container=<container-name>

  • pod=<pod-name>

  • namespace=<pod-namespace>

kubernetes.pod.resourceLimits.cpuCores

kubernetes.pod.resourceLimits.memBytes

kube_pod_container_resource_limits

kube_pod_sysdig_resource_limits_memory_bytes

kube_pod_sysdig_resource_limits_cpu_cores

  • resource=<resource-name>

  • unit=<resource-unit>

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • node=< node-name>

{namespace="default",pod="pod0",container="pod1_con1",resource="cpu",unit="core"}

{namespace="default",pod="pod0",container="pod1_con1",resource="memory",unit="byte"}

kubernetes.pod.resourceRequests.cpuCores

kubernetes.pod.resourceRequests.memBytes

kube_pod_container_resource_requests

kube_pod_sysdig_resource_requests_cpu_cores

kube_pod_sysdig_resource_requests_memory_bytes

  • resource=<resource-name>

  • unit=<resource-unit>

  • container=<container-name>

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • node=< node-name>

{namespace="default",pod="pod0",container="pod1_con1",resource="cpu",unit="core"}

{namespace="default",pod="pod0",container="pod1_con1",resource="memory",unit="byte"}

kubernetes.pod.status.ready

kube_pod_status_ready

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • condition=<true|false|unknown>

kube_pod_info

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • host_ip=<host-ip>

  • pod_ip=<pod-ip>

  • node=<node-name>

  • uid=<pod-uid>

{namespace="default",pod="pod0",host_ip="1.1.1.1",pod_ip="1.2.3.4",uid="abc-0",node="node1",created_by_kind="<none>",created_by_name="<none>",priority_class=""}

kube_pod_owner

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • owner_kind=<owner kind>

  • owner_name=<owner name>

{namespace="default",pod="pod0",owner_kind="<none>",owner_name="<none>",owner_is_controller="<none>"}

kube_pod_labels

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • label_POD_LABEL=<POD_LABEL>

{namespace="default",pod="pod0", label_app="myApp"}

kube_pod_container_info

  • pod=<pod-name>

  • namespace=<pod-namespace>

  • container_id=<containerid>

{namespace="default",pod="pod0",container="container2",image="k8s.gcr.io/hyperkube2",image_id="docker://sha256:bbb",container_id="docker://cd456"}

node

kubernetes.node.allocatable.cpuCores

kube_node_status_allocatable_cpu_cores

  • node=<node-address>

  • resource=<resource-name>

  • unit=<resource-unit>

resource/unit have one of the values: (cpu, core); (memory, byte); (pods, integer). Sysdig currently supports only CPU, pods, and memory resources for kube_node_status_capacity metrics.

# HELP kube_node_status_capacity The capacity for different resources of a node.
kube_node_status_capacity{node="k8s-master",resource="hugepages_1Gi",unit="byte"} 0
kube_node_status_capacity{node="k8s-master",resource="hugepages_2Mi",unit="byte"} 0
kube_node_status_capacity{node="k8s-master",resource="memory",unit="byte"} 4.16342016e+09
kube_node_status_capacity{node="k8s-master",resource="pods",unit="integer"} 110
kube_node_status_capacity{node="k8s-node1",resource="pods",unit="integer"} 110
kube_node_status_capacity{node="k8s-node1",resource="cpu",unit="core"} 2
kube_node_status_capacity{node="k8s-node1",resource="hugepages_1Gi",unit="byte"} 0
kube_node_status_capacity{node="k8s-node1",resource="hugepages_2Mi",unit="byte"} 0
kube_node_status_capacity{node="k8s-node1",resource="memory",unit="byte"} 6.274154496e+09
kube_node_status_capacity{node="k8s-node2",resource="hugepages_1Gi",unit="byte"} 0
kube_node_status_capacity{node="k8s-node2",resource="hugepages_2Mi",unit="byte"} 0
kube_node_status_capacity{node="k8s-node2",resource="memory",unit="byte"} 6.274154496e+09
kube_node_status_capacity{node="k8s-node2",resource="pods",unit="integer"} 110
kube_node_status_capacity{node="k8s-node2",resource="cpu",unit="core"} 2

kubernetes.node.allocatable.memBytes

kube_node_status_allocatable_memory_bytes

kubernetes.node.allocatable.pods

kube_node_status_allocatable_pods

kubernetes.node.capacity.cpuCores

kube_node_status_capacity_cpu_cores

  • node=<node-address>

  • resource=<resource-name>

  • unit=<resource-unit>

kubernetes.node.capacity.memBytes

kube_node_status_capacity_memory_bytes

kubernetes.node.capacity.pod

kube_node_status_capacity_pods

kubernetes.node.diskPressure

kube_node_status_condition

  • node=<node-address>

  • condition=<node-condition>

  • status=<true|false|unknown>

kubernetes.node.memoryPressure

kubernetes.node.networkUnavailable

kubernetes.node.outOfDisk

kubernetes.node.ready

kubernetes.node.unschedulable

kube_node_spec_unschedulable

  • node=<node-address>

kube_node_info

  • node=<node-address>

kube_node_labels

  • node=<node-address>

  • label_NODE_LABEL=<NODE_LABEL>

Deployment

kubernetes.deployment.replicas.available

kube_deployment_status_replicas_available

  • deployment=<deployment-name>

  • namespace=<deployment-namespace>

kubernetes.deployment.replicas.desired

kube_deployment_spec_replicas

kubernetes.deployment.replicas.paused

kube_deployment_spec_paused

kubernetes.deployment.replicas.running

kube_deployment_status_replicas

kubernetes.deployment.replicas.unavailable

kube_deployment_status_replicas_unavailable

kubernetes.deployment.replicas.updated

kube_deployment_status_replicas_updated

kube_deployment_labels

job

kubernetes.job.completions

kube_job_spec_completions

  • job_name=<job-name>

  • namespace=<job-namespace>

kubernetes.job.numFailed

kube_job_failed

kubernetes.job.numSucceeded

kube_job_complete

kubernetes.job.parallelism

kube_job_spec_parallelism

kube_job_status_active

kube_job_info

kube_job_owner

  • job_name=<job-name>

  • namespace=<job-namespace>

  • owner_kind=<owner kind>

  • owner_name=<owner name>

kube_job_labels

  • job_name=<job-name>

  • namespace=<job-namespace>

  • label_job_label=<job_label>

daemonSet

kubernetes.daemonSet.pods.desired

kube_daemonset_status_desired_number_scheduled

  • daemonset=<daemonset-name>

  • namespace=<daemonset-namespace>

kubernetes.daemonSet.pods.misscheduled

kube_daemonset_status_number_misscheduled

kubernetes.daemonSet.pods.ready

kube_daemonset_status_number_ready

kubernetes.daemonSet.pods.scheduled

kube_daemonset_status_current_number_scheduled

kube_daemonset_labels

  • daemonset=<daemonset-name>

  • namespace=<daemonset-namespace>

  • label_daemonset_label=<daemonset_label>

replicaSet

kubernetes.replicaSet.replicas.fullyLabeled

kube_replicaset_status_fully_labeled_replicas

  • replicaset=<replicaset-name>

  • namespace=<replicaset-namespace>

kubernetes.replicaSet.replicas.ready

kube_replicaset_status_ready_replicas

kubernetes.replicaSet.replicas.running

kube_replicaset_status_replicas

kubernetes.replicaSet.replicas.desired

kube_replicaset_spec_replicas

kube_replicaset_owner

  • replicaset=<replicaset-name>

  • namespace=<replicaset-namespace>

  • owner_kind=<owner kind>

  • owner_name=<owner name>

kube_replicaset_labels

  • label_replicaset_label=<replicaset_label>

  • replicaset=<replicaset-name>

  • namespace=<replicaset-namespace>

statefulset

kubernetes.statefulset.replicas

kube_statefulset_replicas

  • statefulset=<statefulset-name>

  • namespace=<statefulset-namespace>

kubernetes.statefulset.status.replicas

kube_statefulset_status_replicas

kubernetes.statefulset.status.replicas.current

kube_statefulset_status_replicas_current

kubernetes.statefulset.status.replicas.ready

kube_statefulset_status_replicas_ready

kubernetes.statefulset.status.replicas.updated

kube_statefulset_status_replicas_updated

kube_statefulset_labels

hpa

kubernetes.hpa.replicas.min

kube_horizontalpodautoscaler_spec_min_replicas

  • hpa=<hpa-name>

  • namespace=<hpa-namespace>

kubernetes.hpa.replicas.max

kube_horizontalpodautoscaler_spec_max_replicas

kubernetes.hpa.replicas.current

kube_horizontalpodautoscaler_status_current_replicas

kubernetes.hpa.replicas.desired

kube_horizontalpodautoscaler_status_desired_replicas

kube_horizontalpodautoscaler_labels

resourcequota

kubernetes.resourcequota.configmaps.hard

kubernetes.resourcequota.configmaps.used

kubernetes.resourcequota.limits.cpu.hard

kubernetes.resourcequota.limits.cpu.used

kubernetes.resourcequota.limits.memory.hard

kubernetes.resourcequota.limits.memory.used

kubernetes.resourcequota.persistentvolumeclaims.hard

kubernetes.resourcequota.persistentvolumeclaims.used

kubernetes.resourcequota.cpu.hard

kubernetes.resourcequota.memory.hard

kubernetes.resourcequota.pods.hard

kubernetes.resourcequota.pods.used

kubernetes.resourcequota.replicationcontrollers.hard

kubernetes.resourcequota.replicationcontrollers.used

kubernetes.resourcequota.requests.cpu.hard

kubernetes.resourcequota.requests.cpu.used

kubernetes.resourcequota.requests.memory.hard

kubernetes.resourcequota.requests.memory.used

kubernetes.resourcequota.requests.storage.hard

kubernetes.resourcequota.requests.storage.used

kubernetes.resourcequota.resourcequotas.hard

kubernetes.resourcequota.resourcequotas.used

kubernetes.resourcequota.secrets.hard

kubernetes.resourcequota.secrets.used

kubernetes.resourcequota.services.hard

kubernetes.resourcequota.services.used

kubernetes.resourcequota.services.loadbalancers.hard

kubernetes.resourcequota.services.loadbalancers.used

kubernetes.resourcequota.services.nodeports.hard

kubernetes.resourcequota.services.nodeports.used

kube_resourcequota

  • resourcequota=<quota-name>

  • namespace=<namespace>

  • resource=<ResourceName>

  • type=<quota-type>

namespace

kube_namespace_labels

  • namespace=<namespace-name>

  • label_ns_label=<ns_label>

replicationcontroller

kubernetes.replicationcontroller.replicas.desired

kube_replicationcontroller_spec_replicas

  • replicationcontroller=<replicationcontroller-name>

  • namespace=<replicationcontroller-namespace>

kubernetes.replicationcontroller.replicas.running

kube_replicationcontroller_status_replicas

kube_replicationcontroller_status_fully_labeled_replicas

kube_replicationcontroller_status_ready_replicas

kube_replicationcontroller_status_available_replicas

kube_replicationcontroller_status_observed_generation

kube_replicationcontroller_metadata_generation

kube_replicationcontroller_created

kube_replicationcontroller_owner

  • replicationcontroller=<replicationcontroller-name>

  • namespace=<replicationcontroller-namespace>

  • owner_kind=<owner kind>

  • owner_name=<owner name>

service

kube_service_info

  • service=<service-name>

  • namespace=<service-namespace>

  • cluster_ip=<service cluster ip>

  • external_name=<service external name>

  • load_balancer_ip=<service load balancer ip>

kube_service_labels

  • service=<service-name>

  • namespace=<service-namespace>

  • label_service_label=<service_label>

persistentvolume

kubernetes.persistentvolume.storage

kube_persistentvolume_capacity_bytes

  • persistentvolume=<pv-name>

kube_persistentvolume_info

  • persistentvolume=<pv-name>

kube_persistentvolume_labels

  • persistentvolume=<pv-name>

  • label_persistentvolume_label=<persistentvolume_label>

persistentvolumeclaim

kubernetes.persistentvolumeclaim.requests.storage

kube_persistentvolumeclaim_resource_requests_storage_bytes

  • namespace=<persistentvolumeclaim-namespace>

  • persistentvolumeclaim=<persistentvolumeclaim-name>

kube_persistentvolumeclaim_info

kube_persistentvolumeclaim_labels

  • persistentvolumeclaim=<persistentvolumeclaim-name>

  • namespace=<persistentvolumeclaim-namespace>

  • label_persistentvolumeclaim_label=<persistentvolumeclaim_label>

6.2 - Metrics and Labels in Prometheus Format

The Prometheus metrics library lists the metrics in Prometheus format supported by the Sysdig product suite, as well as kube state and cloud provider metrics.

The metrics listed in this section follow the StatsD-compatible Sysdig naming convention. To see a mapping between Prometheus notation and Sysdig notation, see Metrics and Label Mapping.

Overview

Each metric in the dictionary has several pieces of metadata listed to provide greater context for how the metric can be used within Sysdig products. An example layout is displayed below:

Metric Name

Metric definition. For some metrics, the equation for how the value is determined is provided.

Metadata

Definition

Metric Type

Metric type determines whether the metric value is a counter or a gauge. Sysdig Monitor offers two metric types:

Counter: A metric whose value only increases over time, with each new value building on the previous one. Counters record how many times something has happened, for example, user logins.

Gauge: A single numerical value that can arbitrarily fluctuate over time. Each value is an instantaneous measurement, for example, CPU usage.
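The distinction matters when charting or alerting: counters are normally converted to rates, while gauges are read directly. A minimal sketch (illustrative only, not Sysdig code):

```python
def counter_rate(samples):
    """Per-second rate from (timestamp, cumulative_value) counter samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Counter, e.g. user logins: 120 total at t=0s, 180 total at t=60s.
print(counter_rate([(0, 120), (60, 180)]))  # 1.0 (one login per second)

# Gauge, e.g. CPU usage: each sample stands on its own, so take the latest.
cpu_gauge = [42.0, 37.5, 55.2]
print(cpu_gauge[-1])  # 55.2
```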

Value Type

The type of value the metric can have. The possible values are:

  • Percent (%)

  • Byte

  • Date

  • Double

  • Integer (int)

  • relativeTime

  • String

Segment By

The levels within the infrastructure that the metric can be segmented at:

  • Host

  • Container

  • Process

  • Kubernetes

  • Mesos

  • Swarm

  • CloudProvider

Default Time Aggregation

The default time aggregation format for the metric.

Available Time Aggregation Formats

The time aggregation formats the metric can be aggregated by:

  • Average (Avg)

  • Rate

  • Sum

  • Minimum (Min)

  • Maximum (Max)

Default Group Aggregation

The default group aggregation format for the metric.

Available Group Aggregation Formats

The group aggregation formats the metric can be aggregated by:

  • Average (Avg)

  • Sum

  • Minimum (Min)

  • Maximum (Max)

6.2.1 - Agent

sysdig_agent_info

Prometheus ID: sysdig_agent_info
Legacy ID: info
Metric Type: gauge
Unit: number
Description: This metric always has the value 1.

sysdig_agent_timeseries_count_appcheck

Prometheus ID: sysdig_agent_timeseries_count_appcheck
Legacy ID: metricCount.appCheck
Metric Type: gauge
Unit: number
Description: The total number of time series received from appcheck integrations.

sysdig_agent_timeseries_count_jmx

Prometheus ID: sysdig_agent_timeseries_count_jmx
Legacy ID: metricCount.jmx
Metric Type: gauge
Unit: number
Description: The total number of time series received from JMX integrations.

sysdig_agent_timeseries_count_prometheus

Prometheus ID: sysdig_agent_timeseries_count_prometheus
Legacy ID: metricCount.prometheus
Metric Type: gauge
Unit: number
Description: The total number of time series received from Prometheus integrations.

sysdig_agent_timeseries_count_statsd

Prometheus ID: sysdig_agent_timeseries_count_statsd
Legacy ID: metricCount.statsd
Metric Type: gauge
Unit: number
Description: The total number of time series received from StatsD integrations.

6.2.2 - Containers

sysdig_container_count

Prometheus ID: sysdig_container_count
Legacy ID: container.count
Metric Type: gauge
Unit: number
Description: The number of containers.
Additional Notes: This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) containers of a certain type in a certain group or node; try segmenting by container.image, .id, or .name. See also: host.count.

sysdig_container_cpu_cgroup_used_percent

Prometheus ID: sysdig_container_cpu_cgroup_used_percent
Legacy ID: cpu.cgroup.used.percent
Metric Type: gauge
Unit: percent
Description: The percentage of a container's cgroup limit that is actually used, measured against the minimum of the underlying cgroup limits: cpuset.limit and quota.limit.

sysdig_container_cpu_cores_cgroup_limit

Prometheus ID: sysdig_container_cpu_cores_cgroup_limit
Legacy ID: cpu.cores.cgroup.limit
Metric Type: gauge
Unit: number
Description: The number of CPU cores assigned to a container. This is the minimum of the cgroup limits: cpuset.limit and quota.limit.

sysdig_container_cpu_cores_quota_limit

Prometheus ID: sysdig_container_cpu_cores_quota_limit
Legacy ID: cpu.cores.quota.limit
Metric Type: gauge
Unit: number
Description: The number of CPU cores assigned to a container, derived from the container's cgroup quota and period. This is a way of creating a CPU limit for a container.

sysdig_container_cpu_cores_used

Prometheus ID: sysdig_container_cpu_cores_used
Legacy ID: cpu.cores.used
Metric Type: gauge
Unit: number
Description: The CPU core usage of each container is obtained from cgroups and is equal to the number of cores used by the container. For example, if a container uses two of an available four cores, the value of sysdig_container_cpu_cores_used is two.

sysdig_container_cpu_cores_used_percent

Prometheus ID: sysdig_container_cpu_cores_used_percent
Legacy ID: cpu.cores.used.percent
Metric Type: gauge
Unit: percent
Description: The CPU core usage percent for each container is obtained from cgroups and is equal to the number of cores used multiplied by 100. For example, if a container uses three cores, the value of sysdig_container_cpu_cores_used_percent is 300%.

sysdig_container_cpu_quota_used_percent

Prometheus ID: sysdig_container_cpu_quota_used_percent
Legacy ID: cpu.quota.used.percent
Metric Type: gauge
Unit: percent
Description: The percentage of a container's CPU quota that is actually used. CPU quotas are a common way of creating a CPU limit for a container. Quotas are based on a percentage of time: a container can only spend its quota of time on CPU cycles across a given period (the default period is 100ms). Note that, unlike CPU shares, a CPU quota is a hard limit on the amount of CPU the container can use, so this metric, CPU Quota %, should not exceed 100%.
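The quota arithmetic described above can be sketched as follows (illustrative only, not Sysdig code; the 100ms default period is the assumption stated in the description):

```python
def cpu_quota_used_percent(cpu_time_used_ms: float,
                           quota_ms: float,
                           period_ms: float = 100.0) -> float:
    """Percent of the container's CPU quota consumed within one period."""
    # cpu_time_used_ms is the CPU time the container spent during period_ms.
    assert cpu_time_used_ms <= period_ms
    return cpu_time_used_ms / quota_ms * 100

# A container with a 50ms quota that used 25ms of CPU in a 100ms period:
print(cpu_quota_used_percent(25, 50))  # 50.0
```

Because the quota is a hard limit, the kernel throttles the container before usage can exceed the quota, which is why the metric should stay at or below 100%.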

sysdig_container_cpu_shares_count

Prometheus ID: sysdig_container_cpu_shares_count
Legacy ID: cpu.shares.count
Metric Type: gauge
Unit: number
Description: The number of CPU shares assigned to a container (technically, the container's cgroup); this is a common way of creating a CPU limit for a container. CPU shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren't being consumed by the container they were originally allocated to.
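The share arithmetic described above can be sketched as follows (illustrative only, not Sysdig code):

```python
def cpu_cycle_fraction(my_shares, all_shares):
    """Fraction of CPU cycles guaranteed to a container: its share count
    divided by the total shares claimed by all containers."""
    return my_shares / sum(all_shares)

# Three containers with the default 1024 shares each get 1/3 apiece.
shares = [1024, 1024, 1024]
print(cpu_cycle_fraction(1024, shares))  # 0.3333333333333333
```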

sysdig_container_cpu_shares_used_percent

Prometheus ID: sysdig_container_cpu_shares_used_percent
Legacy ID: cpu.shares.used.percent
Metric Type: gauge
Unit: percent
Description: The percentage of a container's allocated CPU shares that are actually used. CPU shares are a common way of creating a CPU limit for a container. CPU shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren't being consumed by the container they were originally allocated to, so this metric, CPU Shares %, can actually exceed 100%.

sysdig_container_cpu_used_percent

Prometheus ID: sysdig_container_cpu_used_percent
Legacy ID: cpu.used.percent
Metric Type: gauge
Unit: percent
Description: The CPU usage for each container is obtained from cgroups and normalized by dividing by the number of cores to determine an overall percentage. For example, if the environment contains six cores on a host, and the container or processes are assigned two cores, Sysdig reports CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and processes.
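The normalization in the example above can be sketched as follows (illustrative only, not Sysdig code):

```python
def container_cpu_used_percent(cores_used: float, host_cores: int) -> float:
    """Normalize container CPU usage by the host's core count."""
    return cores_used / host_cores * 100

# Two cores used on a six-core host, as in the example above:
print(round(container_cpu_used_percent(2, 6), 2))  # 33.33
```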

sysdig_container_fd_used_percent

Prometheus ID: sysdig_container_fd_used_percent
Legacy ID: fd.used.percent
Metric Type: gauge
Unit: percent
Description: The percentage of used file descriptors out of the maximum available.
Additional Notes: Usually, when a process reaches its FD limit it will stop operating properly and possibly crash. As a consequence, this is a metric you want to monitor carefully or, even better, use for alerts.

sysdig_container_file_error_open_count

Prometheus IDsysdig_container_file_error_open_count
Legacy IDfile.error.open.count
Metric Typecounter
Unitnumber
DescriptionThe number of errors in opening files.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_error_total_count

Prometheus IDsysdig_container_file_error_total_count
Legacy IDfile.error.total.count
Metric Typecounter
Unitnumber
DescriptionThe number of errors caused by file access.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_in_bytes

Prometheus IDsysdig_container_file_in_bytes
Legacy IDfile.bytes.in
Metric Typecounter
Unitdata
DescriptionThe number of bytes read from files.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_in_iops

Prometheus IDsysdig_container_file_in_iops
Legacy IDfile.iops.in
Metric Typecounter
Unitnumber
DescriptionThe number of file read operations per second.
Additional NotesThis is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_container_file_in_time

Prometheus IDsysdig_container_file_in_time
Legacy IDfile.time.in
Metric Typecounter
Unittime
DescriptionThe time spent in file reading.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_open_count

Prometheus IDsysdig_container_file_open_count
Legacy IDfile.open.count
Metric Typecounter
Unitnumber
DescriptionThe number of times files have been opened.
Additional Notes

sysdig_container_file_out_bytes

Prometheus IDsysdig_container_file_out_bytes
Legacy IDfile.bytes.out
Metric Typecounter
Unitdata
DescriptionThe number of bytes written to files.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_out_iops

Prometheus IDsysdig_container_file_out_iops
Legacy IDfile.iops.out
Metric Typecounter
Unitnumber
DescriptionThe number of file write operations per second.
Additional NotesThis is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_container_file_out_time

Prometheus IDsysdig_container_file_out_time
Legacy IDfile.time.out
Metric Typecounter
Unittime
DescriptionThe time spent in file writing.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_total_bytes

Prometheus IDsysdig_container_file_total_bytes
Legacy IDfile.bytes.total
Metric Typecounter
Unitdata
DescriptionThe number of bytes read from and written to files.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_file_total_iops

Prometheus IDsysdig_container_file_total_iops
Legacy IDfile.iops.total
Metric Typecounter
Unitnumber
DescriptionThe number of read and write file operations per second.
Additional NotesThis is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_container_file_total_time

Prometheus IDsysdig_container_file_total_time
Legacy IDfile.time.total
Metric Typecounter
Unittime
DescriptionThe time spent in file I/O.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_fs_free_bytes

Prometheus IDsysdig_container_fs_free_bytes
Legacy IDfs.bytes.free
Metric Typegauge
Unitdata
DescriptionThe available space in the filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.
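One way to avoid the double counting described above is to aggregate by the underlying device rather than by mount. A minimal sketch, assuming a hypothetical list of `(device, size_bytes)` pairs as input:

```python
def total_fs_bytes(mounts):
    """Sum filesystem sizes while counting each underlying device
    only once, so a filesystem mounted into several containers is
    not double counted. `mounts` is a list of (device, size_bytes)
    tuples (hypothetical shape, for illustration)."""
    seen = {}
    for device, size in mounts:
        seen[device] = size  # repeated device -> counted once
    return sum(seen.values())

# The same device mounted into two containers is summed once.
total = total_fs_bytes([("sda1", 100), ("sda1", 100), ("sdb1", 50)])
```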

sysdig_container_fs_free_percent

Prometheus IDsysdig_container_fs_free_percent
Legacy IDfs.free.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of free space in the filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_inodes_total_count

Prometheus IDsysdig_container_fs_inodes_total_count
Legacy IDfs.inodes.total.count
Metric Typegauge
Unitnumber
DescriptionThe total number of inodes in the filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_inodes_used_count

Prometheus IDsysdig_container_fs_inodes_used_count
Legacy IDfs.inodes.used.count
Metric Typegauge
Unitnumber
DescriptionThe number of inodes used in the filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_inodes_used_percent

Prometheus IDsysdig_container_fs_inodes_used_percent
Legacy IDfs.inodes.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of inodes usage in the filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_largest_used_percent

Prometheus IDsysdig_container_fs_largest_used_percent
Legacy IDfs.largest.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of the largest filesystem in use.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_root_used_percent

Prometheus IDsysdig_container_fs_root_used_percent
Legacy IDfs.root.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of the root filesystem in use in the container.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_total_bytes

Prometheus IDsysdig_container_fs_total_bytes
Legacy IDfs.bytes.total
Metric Typegauge
Unitdata
DescriptionThe total size of the container filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_used_bytes

Prometheus IDsysdig_container_fs_used_bytes
Legacy IDfs.bytes.used
Metric Typegauge
Unitdata
DescriptionThe used space in the container filesystem.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_fs_used_percent

Prometheus IDsysdig_container_fs_used_percent
Legacy IDfs.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of the sum of all filesystems in use in the container.
Additional NotesContainer Filesystem metrics report data on filesystems mounted to containers. These are the most useful metrics for stateful containers which have dedicated file storage mounted. Use these metrics with appropriate scoping. Care should be taken when aggregating filesystem metrics to ensure that there is no “double counting” of filesystems that are mounted to multiple containers. Additionally, the metrics from overlay type file systems are generally not reported, so these metrics typically will not show the actual space consumed by a container.

sysdig_container_info

Prometheus IDsysdig_container_info
Legacy IDinfo
Metric Typegauge
Unitnumber
DescriptionThe info metric always has a value of 1.
Additional Notes

sysdig_container_memory_limit_bytes

Prometheus IDsysdig_container_memory_limit_bytes
Legacy IDmemory.limit.bytes
Metric Typegauge
Unitdata
DescriptionThe memory limit in bytes assigned to a container.
Additional Notes

sysdig_container_memory_limit_used_percent

Prometheus IDsysdig_container_memory_limit_used_percent
Legacy IDmemory.limit.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of memory limit used by a container.
Additional Notes
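The calculation behind this percentage can be sketched as follows (the function name and the handling of an unset limit are assumptions, for illustration only):

```python
def memory_limit_used_percent(used_bytes, limit_bytes):
    """Percentage of a container's memory limit in use, mirroring
    memory.limit.used.percent. Returns None when no limit is set
    (illustrated here as a limit of 0)."""
    if limit_bytes == 0:
        return None
    return used_bytes / limit_bytes * 100

# 256 MiB used out of a 512 MiB limit is 50%.
pct = memory_limit_used_percent(256 * 2**20, 512 * 2**20)
```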

sysdig_container_memory_used_bytes

Prometheus IDsysdig_container_memory_used_bytes
Legacy IDmemory.bytes.used
Metric Typegauge
Unitdata
DescriptionThe amount of physical memory currently in use.
Additional NotesBy default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_container_memory_used_percent

Prometheus IDsysdig_container_memory_used_percent
Legacy IDmemory.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of physical memory in use.
Additional NotesBy default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_memory_virtual_bytes

Prometheus IDsysdig_container_memory_virtual_bytes
Legacy IDmemory.bytes.virtual
Metric Typegauge
Unitdata
DescriptionThe virtual memory size of the process, in bytes. This value is obtained from Sysdig events.
Additional Notes

sysdig_container_net_connection_in_count

Prometheus IDsysdig_container_net_connection_in_count
Legacy IDnet.connection.count.in
Metric Typecounter
Unitnumber
DescriptionThe number of currently established client (inbound) connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_container_net_connection_out_count

Prometheus IDsysdig_container_net_connection_out_count
Legacy IDnet.connection.count.out
Metric Typecounter
Unitnumber
DescriptionThe number of currently established server (outbound) connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_container_net_connection_total_count

Prometheus IDsysdig_container_net_connection_total_count
Legacy IDnet.connection.count.total
Metric Typecounter
Unitnumber
DescriptionThe number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_container_net_error_count

Prometheus IDsysdig_container_net_error_count
Legacy IDnet.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of network errors.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_net_http_error_count

Prometheus IDsysdig_container_net_http_error_count
Legacy IDnet.http.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of failed HTTP requests as counted from 4xx/5xx status codes.
Additional Notes
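The 4xx/5xx classification mentioned above can be sketched as follows (the function name is hypothetical, for illustration only):

```python
def is_http_error(status_code):
    """True for the 4xx/5xx status codes counted by
    net.http.error.count."""
    return 400 <= status_code <= 599

# Two of these five responses count as errors (404 and 500).
errors = sum(is_http_error(c) for c in [200, 301, 404, 500, 204])
```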

sysdig_container_net_http_request_count

Prometheus IDsysdig_container_net_http_request_count
Legacy IDnet.http.request.count
Metric Typecounter
Unitnumber
DescriptionThe count of HTTP requests.
Additional Notes

sysdig_container_net_http_request_time

Prometheus IDsysdig_container_net_http_request_time
Legacy IDnet.http.request.time
Metric Typecounter
Unittime
DescriptionThe average time taken for HTTP requests.
Additional Notes

sysdig_container_net_http_statuscode_error_count

Prometheus IDsysdig_container_net_http_statuscode_error_count
Legacy IDnet.http.statuscode.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of HTTP error codes returned.
Additional Notes

sysdig_container_net_http_statuscode_request_count

Prometheus IDsysdig_container_net_http_statuscode_request_count
Legacy IDnet.http.statuscode.request.count
Metric Typecounter
Unitnumber
DescriptionThe number of HTTP requests, segmented by status code.
Additional Notes

sysdig_container_net_http_url_error_count

Prometheus IDsysdig_container_net_http_url_error_count
Legacy IDnet.http.url.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_http_url_request_count

Prometheus IDsysdig_container_net_http_url_request_count
Legacy IDnet.http.url.request.count
Metric Typecounter
Unitnumber
DescriptionThe number of HTTP requests, segmented by URL.
Additional Notes

sysdig_container_net_http_url_request_time

Prometheus IDsysdig_container_net_http_url_request_time
Legacy IDnet.http.url.request.time
Metric Typecounter
Unittime
DescriptionThe time taken for HTTP requests, segmented by URL.
Additional Notes

sysdig_container_net_in_bytes

Prometheus IDsysdig_container_net_in_bytes
Legacy IDnet.bytes.in
Metric Typecounter
Unitdata
DescriptionThe number of inbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_net_mongodb_error_count

Prometheus IDsysdig_container_net_mongodb_error_count
Legacy IDnet.mongodb.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of failed MongoDB requests.
Additional Notes

sysdig_container_net_mongodb_request_count

Prometheus IDsysdig_container_net_mongodb_request_count
Legacy IDnet.mongodb.request.count
Metric Typecounter
Unitnumber
DescriptionThe total number of MongoDB requests.
Additional Notes

sysdig_container_net_out_bytes

Prometheus IDsysdig_container_net_out_bytes
Legacy IDnet.bytes.out
Metric Typecounter
Unitdata
DescriptionThe number of outbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_net_request_count

Prometheus IDsysdig_container_net_request_count
Legacy IDnet.request.count
Metric Typecounter
Unitnumber
DescriptionThe total number of network requests. Note that this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.
Additional Notes

sysdig_container_net_request_in_count

Prometheus IDsysdig_container_net_request_in_count
Legacy IDnet.request.count.in
Metric Typecounter
Unitnumber
DescriptionThe number of inbound network requests.
Additional Notes

sysdig_container_net_request_in_time

Prometheus IDsysdig_container_net_request_in_time
Legacy IDnet.request.time.in
Metric Typecounter
Unittime
DescriptionThe average time to serve an inbound request.
Additional Notes

sysdig_container_net_request_out_count

Prometheus IDsysdig_container_net_request_out_count
Legacy IDnet.request.count.out
Metric Typecounter
Unitnumber
DescriptionThe number of outbound network requests.
Additional Notes

sysdig_container_net_request_out_time

Prometheus IDsysdig_container_net_request_out_time
Legacy IDnet.request.time.out
Metric Typecounter
Unittime
DescriptionThe average time spent waiting for an outbound request.
Additional Notes

sysdig_container_net_request_time

Prometheus IDsysdig_container_net_request_time
Legacy IDnet.request.time
Metric Typecounter
Unittime
DescriptionThe average time to serve a network request.
Additional Notes

sysdig_container_net_server_connection_in_count

Prometheus IDsysdig_container_net_server_connection_in_count
Legacy IDnet.server.connection.count.in
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_server_in_bytes

Prometheus IDsysdig_container_net_server_in_bytes
Legacy IDnet.server.bytes.in
Metric Typecounter
Unitdata
Description
Additional Notes

sysdig_container_net_server_out_bytes

Prometheus IDsysdig_container_net_server_out_bytes
Legacy IDnet.server.bytes.out
Metric Typecounter
Unitdata
Description
Additional Notes

sysdig_container_net_server_total_bytes

Prometheus IDsysdig_container_net_server_total_bytes
Legacy IDnet.server.bytes.total
Metric Typecounter
Unitdata
Description
Additional Notes

sysdig_container_net_sql_error_count

Prometheus IDsysdig_container_net_sql_error_count
Legacy IDnet.sql.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of failed SQL requests.
Additional Notes

sysdig_container_net_sql_query_error_count

Prometheus IDsysdig_container_net_sql_query_error_count
Legacy IDnet.sql.query.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_sql_query_request_count

Prometheus IDsysdig_container_net_sql_query_request_count
Legacy IDnet.sql.query.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_sql_query_request_time

Prometheus IDsysdig_container_net_sql_query_request_time
Legacy IDnet.sql.query.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_container_net_sql_querytype_error_count

Prometheus IDsysdig_container_net_sql_querytype_error_count
Legacy IDnet.sql.querytype.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_sql_querytype_request_count

Prometheus IDsysdig_container_net_sql_querytype_request_count
Legacy IDnet.sql.querytype.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_container_net_sql_querytype_request_time

Prometheus IDsysdig_container_net_sql_querytype_request_time
Legacy IDnet.sql.querytype.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_container_net_sql_request_count

Prometheus IDsysdig_container_net_sql_request_count
Legacy IDnet.sql.request.count
Metric Typecounter
Unitnumber
DescriptionThe number of SQL requests.
Additional Notes

sysdig_container_net_sql_request_time

Prometheus IDsysdig_container_net_sql_request_time
Legacy IDnet.sql.request.time
Metric Typecounter
Unittime
DescriptionThe average time to complete an SQL request.
Additional Notes

sysdig_container_net_sql_table_error_count

Prometheus IDsysdig_container_net_sql_table_error_count
Legacy IDnet.sql.table.error.count
Metric Typecounter
Unitnumber
DescriptionThe total number of SQL errors returned.
Additional Notes

sysdig_container_net_sql_table_request_count

Prometheus IDsysdig_container_net_sql_table_request_count
Legacy IDnet.sql.table.request.count
Metric Typecounter
Unitnumber
DescriptionThe total number of SQL table requests.
Additional Notes

sysdig_container_net_sql_table_request_time

Prometheus IDsysdig_container_net_sql_table_request_time
Legacy IDnet.sql.table.request.time
Metric Typecounter
Unittime
DescriptionThe average time to serve an SQL table request.
Additional Notes

sysdig_container_net_tcp_queue_len

Prometheus IDsysdig_container_net_tcp_queue_len
Legacy IDnet.tcp.queue.len
Metric Typecounter
Unitnumber
DescriptionThe length of the TCP request queue.
Additional Notes

sysdig_container_net_total_bytes

Prometheus IDsysdig_container_net_total_bytes
Legacy IDnet.bytes.total
Metric Typecounter
Unitdata
DescriptionThe total number of network bytes, including inbound and outbound connections.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_proc_count

Prometheus IDsysdig_container_proc_count
Legacy IDproc.count
Metric Typecounter
Unitnumber
DescriptionThe number of processes on the host or in the container.
Additional Notes

sysdig_container_swap_limit_bytes

Prometheus IDsysdig_container_swap_limit_bytes
Legacy IDswap.limit.bytes
Metric Typegauge
Unitdata
DescriptionThe swap limit in bytes assigned to a container.
Additional Notes

sysdig_container_swap_limit_used_percent

Prometheus IDsysdig_container_swap_limit_used_percent
Legacy IDswap.limit.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of swap limit used by the container.
Additional Notes

sysdig_container_syscall_count

Prometheus IDsysdig_container_syscall_count
Legacy IDsyscall.count
Metric Typegauge
Unitnumber
DescriptionThe total number of syscalls seen.
Additional NotesSyscalls are resource intensive. This metric tracks how many have been made by a given process or container.

sysdig_container_syscall_error_count

Prometheus IDsysdig_container_syscall_error_count
Legacy IDhost.error.count
Metric Typecounter
Unitnumber
DescriptionThe number of system call errors.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_container_thread_count

Prometheus IDsysdig_container_thread_count
Legacy IDthread.count
Metric Typecounter
Unitnumber
DescriptionThe number of threads running in a container.
Additional Notes

sysdig_container_timeseries_count_appcheck

Prometheus IDsysdig_container_timeseries_count_appcheck
Legacy IDmetricCount.appCheck
Metric Typegauge
Unitnumber
DescriptionThe number of appcheck custom metrics.
Additional Notes

sysdig_container_timeseries_count_jmx

Prometheus IDsysdig_container_timeseries_count_jmx
Legacy IDmetricCount.jmx
Metric Typegauge
Unitnumber
DescriptionThe number of JMX custom metrics.
Additional Notes

sysdig_container_timeseries_count_prometheus

Prometheus IDsysdig_container_timeseries_count_prometheus
Legacy IDmetricCount.prometheus
Metric Typegauge
Unitnumber
DescriptionThe number of Prometheus custom metrics.
Additional Notes

sysdig_container_timeseries_count_statsd

Prometheus IDsysdig_container_timeseries_count_statsd
Legacy IDmetricCount.statsd
Metric Typegauge
Unitnumber
DescriptionThe number of StatsD custom metrics.
Additional Notes

sysdig_container_up

Prometheus IDsysdig_container_up
Legacy IDuptime
Metric Typegauge
Unitnumber
DescriptionThe percentage of time the selected entity was down during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down.
Additional Notes

6.2.3 - Metric Labels

_sysdig_datasource

Prometheus ID_sysdig_datasource
Legacy ID_sysdig_datasource
OSS KSM ID-
CategorySysdig
DescriptionIndicates the ingestion data source for the metric.
Additional Notes

agent_id

Prometheus IDagent_id
Legacy IDagent.id
OSS KSM ID-
CategoryAgent
DescriptionThe unique ID of the agent that sent the metric time series from the host.
Additional Notes

agent_mode

Prometheus IDagent_mode
Legacy IDagent.mode
OSS KSM ID-
CategoryAgent
Description
Additional Notes

agent_version

Prometheus IDagent_version
Legacy IDagent.version
OSS KSM ID-
CategoryAgent
DescriptionThe installed version of the Sysdig Agent.
Additional Notes

cloud_provider_account_id

Prometheus IDcloud_provider_account_id
Legacy IDcloudProvider.account.id
OSS KSM ID-
CategoryCloud Provider
DescriptionThe account number related to your AWS account - useful when you have multiple AWS accounts linked with Sysdig Monitor.
Additional Notes

cloud_provider_availability_zone

Prometheus IDcloud_provider_availability_zone
Legacy IDcloudProvider.availabilityZone
OSS KSM ID-
CategoryCloud Provider
DescriptionThe AWS Availability Zone where the entity or entities are located. Each Availability zone is an isolated subsection of an AWS region (see cloudProvider.region).
Additional Notes

cloud_provider_host_ip_private

Prometheus IDcloud_provider_host_ip_private
Legacy IDcloudProvider.host.ip.private
OSS KSM ID-
CategoryCloud Provider
DescriptionThe private IP address allocated by the cloud provider for the instance. This address can be used for communication between instances in the same network.
Additional Notes

cloud_provider_host_ip_public

Prometheus IDcloud_provider_host_ip_public
Legacy IDcloudProvider.host.ip.public
OSS KSM ID-
CategoryCloud Provider
DescriptionPublic IP addresses of the selected host.
Additional Notes

cloud_provider_host_name

Prometheus IDcloud_provider_host_name
Legacy IDcloudProvider.host.name
OSS KSM ID-
CategoryCloud Provider
DescriptionThe name of the host as reported by the cloud provider (e.g. AWS).
Additional Notes

cloud_provider_id

Prometheus IDcloud_provider_id
Legacy IDcloudProvider.id
OSS KSM ID-
CategoryCloud Provider
DescriptionID number as assigned and reported by the cloud provider.
Additional Notes

cloud_provider_instance_type

Prometheus IDcloud_provider_instance_type
Legacy IDcloudProvider.instance.type
OSS KSM ID-
CategoryCloud Provider
DescriptionThe type of AWS instance.
Additional NotesThis metric is extremely useful to segment instances and compare their resource usage and saturation. You can use it as a grouping criteria for the explore table to quickly explore AWS usage on a per-instance-type basis. You can also use it to compare things like CPU usage, number of requests or network utilization for different instance types.

cloud_provider_name

Prometheus IDcloud_provider_name
Legacy IDcloudProvider.name
OSS KSM ID-
CategoryCloud Provider
DescriptionName of the cloud service provider (AWS, etc.).
Additional Notes

cloud_provider_region

Prometheus IDcloud_provider_region
Legacy IDcloudProvider.region
OSS KSM ID-
CategoryCloud Provider
DescriptionThe AWS region where the host (or group of hosts) is located.
Additional NotesUse this grouping criterion in conjunction with the host.count metric to easily report how many instances you have in each region.

cloud_provider_resource_endpoint

Prometheus IDcloud_provider_resource_endpoint
Legacy IDcloudProvider.resource.endPoint
OSS KSM ID-
CategoryCloud Provider
DescriptionDNS name at which the resource can be accessed.
Additional Notes

cloud_provider_resource_name

Prometheus IDcloud_provider_resource_name
Legacy IDcloudProvider.resource.name
OSS KSM ID-
CategoryCloud Provider
DescriptionThe AWS service name (e.g. EC2, RDS, ELB).
Additional Notes

cloud_provider_resource_type

Prometheus IDcloud_provider_resource_type
Legacy IDcloudProvider.resource.type
OSS KSM ID-
CategoryCloud Provider
DescriptionThe service type (e.g. INSTANCE, LOAD_BALANCER, DATABASE).
Additional Notes

cloud_provider_security_groups

Prometheus IDcloud_provider_security_groups
Legacy IDcloudProvider.securityGroups
OSS KSM ID-
CategoryCloud Provider
DescriptionSecurity Groups Name.
Additional Notes

cloud_provider_status

Prometheus IDcloud_provider_status
Legacy IDcloudProvider.status
OSS KSM ID-
CategoryCloud Provider
DescriptionResource status.
Additional Notes

container_full_id

Prometheus IDcontainer_full_id
Legacy IDcontainer.full.id
OSS KSM ID-
CategoryContainer
DescriptionThe full UID of the running container as retrieved from the container runtime.
Additional Notes

container_id

Prometheus IDcontainer_id
Legacy IDcontainer.id
OSS KSM ID-
CategoryContainer
DescriptionThe short ID of the running container, obtained by truncating the full ID. For Docker, this is a 12-character hexadecimal string.
Additional Notes

container_image

Prometheus IDcontainer_image
Legacy IDcontainer.image
OSS KSM ID-
CategoryContainer
DescriptionThe name of the image used to run the container.
Additional Notes

container_image_digest

Prometheus IDcontainer_image_digest
Legacy IDcontainer.image.digest
OSS KSM ID-
CategoryContainer
DescriptionThe digest of the image used to run the container.
Additional Notes

container_image_id

Prometheus IDcontainer_image_id
Legacy IDcontainer.image.id
OSS KSM ID-
CategoryContainer
DescriptionThe ID of the image used to run the container.
Additional Notes

container_image_repo

Prometheus IDcontainer_image_repo
Legacy IDcontainer.image.repo
OSS KSM ID-
CategoryContainer
DescriptionThe repository from which the image used to run the container was retrieved. Empty if the image wasn’t retrieved from a remote repository.
Additional Notes

container_image_tag

Prometheus IDcontainer_image_tag
Legacy IDcontainer.image.tag
OSS KSM ID-
CategoryContainer
DescriptionThe tag of the image used to run the container.
Additional Notes

container_label_io_kubernetes_container_name

Prometheus IDcontainer_label_io_kubernetes_container_name
Legacy IDcontainer.label.io.kubernetes.container.name
OSS KSM ID-
CategoryContainer
DescriptionLabel set on the container in the container runtime when running in a Kubernetes environment. This label will match the container name set in the Kubernetes manifest for the Pod.
Additional Notes

container_label_io_kubernetes_pod_name

Prometheus IDcontainer_label_io_kubernetes_pod_name
Legacy IDcontainer.label.io.kubernetes.pod.name
OSS KSM ID-
CategoryContainer
DescriptionLabel set on the container in the container runtime when running in a Kubernetes environment. This label will match the Pod name set in the Kubernetes manifest for the Pod.
Additional Notes

container_label_io_kubernetes_pod_namespace

Prometheus IDcontainer_label_io_kubernetes_pod_namespace
Legacy IDcontainer.label.io.kubernetes.pod.namespace
OSS KSM ID-
CategoryContainer
DescriptionLabel set on the container in the container runtime when running in a Kubernetes environment. This label will match the Pod namespace set in the Kubernetes manifest for the Pod.
Additional Notes

container_label_io_prometheus_path

Prometheus IDcontainer_label_io_prometheus_path
Legacy IDcontainer.label.io.prometheus.path
OSS KSM ID-
CategoryContainer
DescriptionContainer label specifying the URL path from which Prometheus metrics are scraped.
Additional Notes

container_label_io_prometheus_port

Prometheus IDcontainer_label_io_prometheus_port
Legacy IDcontainer.label.io.prometheus.port
OSS KSM ID-
CategoryContainer
DescriptionContainer label specifying the port on which Prometheus metrics are exposed for scraping.
Additional Notes

container_label_io_prometheus_scrape

Prometheus IDcontainer_label_io_prometheus_scrape
Legacy IDcontainer.label.io.prometheus.scrape
OSS KSM ID-
CategoryContainer
DescriptionContainer label indicating whether the container should be scraped for Prometheus metrics.
Additional Notes

container_name

Prometheus IDcontainer_name
Legacy IDcontainer.name
OSS KSM ID-
CategoryContainer
DescriptionThe name of a running container.
Additional Notes

container_type

Prometheus IDcontainer_type
Legacy IDcontainer.type
OSS KSM ID-
CategoryContainer
DescriptionThe type of container runtime (e.g. docker).
Additional Notes

cpu_core

Prometheus IDcpu_core
Legacy IDcpu.core
OSS KSM ID-
CategoryHost
DescriptionCPU core number
Additional Notes

ecs_cluster_name

Prometheus IDecs_cluster_name
Legacy IDecs.clusterName
OSS KSM ID-
CategoryECS
DescriptionAmazon ECS cluster name
Additional Notes

ecs_service_name

Prometheus IDecs_service_name
Legacy IDecs.serviceName
OSS KSM ID-
CategoryECS
DescriptionAmazon ECS service name
Additional Notes

ecs_task_family_name

Prometheus IDecs_task_family_name
Legacy IDecs.taskFamilyName
OSS KSM ID-
CategoryECS
DescriptionAmazon ECS task family name
Additional Notes

file_mount

Prometheus IDfile_mount
Legacy IDfile.mount
OSS KSM ID-
CategoryFile Stats
DescriptionFile stats mount path
Additional Notes

file_name

Prometheus IDfile_name
Legacy IDfile.name
OSS KSM ID-
CategoryFile Stats
DescriptionFile stats file name including its path
Additional Notes

fs_device

Prometheus IDfs_device
Legacy IDfs.device
OSS KSM ID-
CategoryFile System
DescriptionFile system device name
Additional Notes

fs_mount_dir

Prometheus IDfs_mount_dir
Legacy IDfs.mountDir
OSS KSM ID-
CategoryFile System
DescriptionFile system mounted dir
Additional Notes

fs_type

Prometheus IDfs_type
Legacy IDfs.type
OSS KSM ID-
CategoryFile System
DescriptionFile system type (e.g. EXT, NTFS)
Additional Notes

host_domain

Prometheus IDhost_domain
Legacy IDhost.domain
OSS KSM ID-
CategoryHost
DescriptionThe domain name for external websites.
Additional NotesThis label has been deprecated.

host_hostname

Prometheus IDhost_hostname
Legacy IDhost.hostName
OSS KSM ID-
CategoryHost
DescriptionHost name as defined in the /etc/hostname file.
Additional Notes

host_instance_id

Prometheus IDhost_instance_id
Legacy IDhost.instanceId
OSS KSM ID-
CategoryHost
DescriptionInstance ID of the host as assigned by the cloud provider.
Additional Notes

host_ip_private

Prometheus IDhost_ip_private
Legacy IDhost.ip.private
OSS KSM ID-
CategoryHost
DescriptionPrivate machine IP address.
Additional Notes

host_ip_public

Prometheus IDhost_ip_public
Legacy IDhost.ip.public
OSS KSM ID-
CategoryHost
DescriptionPublic machine IP address.
Additional Notes

host_mac

Prometheus IDhost_mac
Legacy IDhost.mac
OSS KSM ID-
CategoryHost
DescriptionMedia Access Control address of the host.
Additional Notes

kube_cluster_id

Prometheus IDkube_cluster_id
Legacy IDkubernetes.cluster.id
OSS KSM IDid
CategoryKubernetes
DescriptionUniquely identifying ID for a cluster
Additional NotesAs there is no concept of a cluster ID in Kubernetes, this label is populated with the UID of the “default” namespace in the cluster

kube_cluster_name

Prometheus IDkube_cluster_name
Legacy IDkubernetes.cluster.name
OSS KSM IDcluster
CategoryKubernetes
DescriptionUser-defined name for the cluster
Additional NotesThe cluster name is set by the user via the “k8s_cluster_name” configuration parameter in the Agent or by adding an Agent tag with a key called “cluster”. If the user doesn’t set it, this label will not exist.
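As the note above mentions, the cluster name can be set via the Agent’s “k8s_cluster_name” configuration parameter or an Agent tag with the key “cluster”. A minimal dragent.yaml fragment might look like the following; the cluster name value is an example, not a required name:

```yaml
# dragent.yaml — user-defined cluster name reported as kube_cluster_name
k8s_cluster_name: prod-cluster   # example value; choose your own

# Alternatively, set an Agent tag with a key called "cluster":
# tags: cluster:prod-cluster
```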

concurrency_policy

Prometheus IDconcurrency_policy
Legacy IDkubernetes.cronjob.concurrencyPolicy
OSS KSM ID-
CategoryKubernetes
DescriptionSpecifies how to treat concurrent executions created by this Cron Job. Value can be “Allow”, “Forbid”, or “Replace”
Additional Notes

kube_cronjob_name

Prometheus IDkube_cronjob_name
Legacy IDkubernetes.cronjob.name
OSS KSM IDcronjob
CategoryKubernetes
DescriptionName of the Cron Job as retrieved from the API server.
Additional Notes

schedule

Prometheus IDschedule
Legacy IDkubernetes.cronjob.schedule
OSS KSM ID-
CategoryKubernetes
DescriptionThe schedule on which the Cron Job runs, expressed as a Cron format string.
Additional Notes

kube_daemonset_name

Prometheus IDkube_daemonset_name
Legacy IDkubernetes.daemonSet.name
OSS KSM IDdaemonset
CategoryKubernetes
DescriptionName of the DaemonSet as retrieved from the API server.
Additional Notes

kube_daemonset_uid

Prometheus IDkube_daemonset_uid
Legacy IDkubernetes.daemonSet.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the DaemonSet as retrieved from the API server.
Additional Notes

kube_deployment_name

Prometheus IDkube_deployment_name
Legacy IDkubernetes.deployment.name
OSS KSM IDdeployment
CategoryKubernetes
DescriptionName of the Deployment as retrieved from the API server.
Additional Notes

kube_deployment_uid

Prometheus IDkube_deployment_uid
Legacy IDkubernetes.deployment.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Deployment as retrieved from the API server.
Additional Notes

kube_hpa_name

Prometheus IDkube_hpa_name
Legacy IDkubernetes.hpa.name
OSS KSM IDhpa
CategoryKubernetes
DescriptionName of the HPA as retrieved from the API server.
Additional Notes

kube_hpa_uid

Prometheus IDkube_hpa_uid
Legacy IDkubernetes.hpa.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the HPA as retrieved from the API server.
Additional Notes

kube_job_name

Prometheus IDkube_job_name
Legacy IDkubernetes.job.name
OSS KSM IDjob_name
CategoryKubernetes
DescriptionName of the Job as retrieved from the API server.
Additional Notes

kube_job_owner_is_controller

Prometheus IDkube_job_owner_is_controller
Legacy IDkubernetes.job.owner.isController
OSS KSM IDowner_is_controller
CategoryKubernetes
DescriptionDesignates whether the Job is created by a higher-level controller object
Additional Notes

kube_job_owner_kind

Prometheus IDkube_job_owner_kind
Legacy IDkubernetes.job.owner.kind
OSS KSM IDowner_kind
CategoryKubernetes
DescriptionThe workload resource type of the object that created the Job if owned by a higher-level controller object
Additional Notes

kube_job_owner_name

Prometheus IDkube_job_owner_name
Legacy IDkubernetes.job.owner.name
OSS KSM IDowner_name
CategoryKubernetes
DescriptionThe name of the object that created the Job if owned by a higher-level controller object
Additional Notes

kube_job_uid

Prometheus IDkube_job_uid
Legacy IDkubernetes.job.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Job as retrieved from the API server.
Additional Notes

kube_namespace_name

Prometheus IDkube_namespace_name
Legacy IDkubernetes.namespace.name
OSS KSM IDnamespace
CategoryKubernetes
DescriptionName of the Namespace as retrieved from the API server.
Additional Notes

kube_namespace_uid

Prometheus IDkube_namespace_uid
Legacy IDkubernetes.namespace.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Namespace as retrieved from the API server.
Additional Notes

kube_node_condition

Prometheus IDkube_node_condition
Legacy IDkubernetes.node.condition
OSS KSM IDcondition
CategoryKubernetes
DescriptionDescribes the status of the Node. Can be Ready, DiskPressure, OutOfDisk, MemoryPressure, or Unschedulable.
Additional Notes

kube_node_name

Prometheus IDkube_node_name
Legacy IDkubernetes.node.name
OSS KSM IDnode
CategoryKubernetes
DescriptionName of the Node as retrieved from the API server.
Additional Notes

kube_node_resource

Prometheus IDkube_node_resource
Legacy IDkubernetes.node.resource
OSS KSM IDresource
CategoryKubernetes
DescriptionIndicates the capacity or allocatable limit for the different resources of a node
Additional Notes

kube_node_status

Prometheus IDkube_node_status
Legacy IDkubernetes.node.status
OSS KSM IDstatus
CategoryKubernetes
DescriptionUsed in combination with the kube_node_condition label to indicate the boolean value of that label
Additional Notes
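The kube_node_condition and kube_node_status labels work as a pair: each series carries a condition name plus a boolean-like status value. The sketch below, using hypothetical sample data, shows how a consumer might decode the pair to find nodes whose Ready condition is not true:

```python
# Sketch: decoding the kube_node_condition / kube_node_status label pair.
# Sample data is hypothetical; real series are produced per node and condition.
samples = [
    {"kube_node_name": "node-a", "kube_node_condition": "Ready", "kube_node_status": "true"},
    {"kube_node_name": "node-a", "kube_node_condition": "MemoryPressure", "kube_node_status": "false"},
    {"kube_node_name": "node-b", "kube_node_condition": "Ready", "kube_node_status": "false"},
]

def unready_nodes(samples):
    """Return node names whose Ready condition is not reported as true."""
    return sorted(
        s["kube_node_name"]
        for s in samples
        if s["kube_node_condition"] == "Ready" and s["kube_node_status"] != "true"
    )

print(unready_nodes(samples))  # ['node-b']
```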

kube_node_uid

Prometheus IDkube_node_uid
Legacy IDkubernetes.node.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Node as retrieved from the API server.
Additional Notes

kube_node_unit

Prometheus IDkube_node_unit
Legacy IDkubernetes.node.unit
OSS KSM IDunit
CategoryKubernetes
DescriptionUsed in combination with the kube_node_resource label to indicate the unit of that label
Additional Notes

name

Prometheus IDname
Legacy IDkubernetes.persistentvolume.claim.ref.name
OSS KSM ID-
CategoryKubernetes
DescriptionName of the Persistent Volume’s claimRef as retrieved from the API server.
Additional Notes

claim_namespace

Prometheus IDclaim_namespace
Legacy IDkubernetes.persistentvolume.claim.ref.namespace
OSS KSM ID-
CategoryKubernetes
DescriptionNamespace of the Persistent Volume’s claimRef as retrieved from the API server.
Additional Notes

kube_persistentvolume_name

Prometheus IDkube_persistentvolume_name
Legacy IDkubernetes.persistentvolume.name
OSS KSM IDpersistentvolume
CategoryKubernetes
DescriptionName of the Persistent Volume as retrieved from the API server.
Additional Notes

kube_persistentvolume_uid

Prometheus IDkube_persistentvolume_uid
Legacy IDkubernetes.persistentvolume.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Persistent Volume as retrieved from the API server.
Additional Notes

access_mode

Prometheus IDaccess_mode
Legacy IDkubernetes.persistentvolumeclaim.accessMode
OSS KSM ID-
CategoryKubernetes
DescriptionAccess mode of the PVC as retrieved from the API server.
Additional Notes

status

Prometheus IDstatus
Legacy IDkubernetes.persistentvolumeclaim.condition.status
OSS KSM ID-
CategoryKubernetes
DescriptionUsed in combination with the type label to indicate the boolean value of that label
Additional Notes

type

Prometheus IDtype
Legacy IDkubernetes.persistentvolumeclaim.condition.type
OSS KSM ID-
CategoryKubernetes
DescriptionThe type of the condition that the PVC is in
Additional Notes

kube_persistentvolumeclaim_name

Prometheus IDkube_persistentvolumeclaim_name
Legacy IDkubernetes.persistentvolumeclaim.name
OSS KSM IDpersistentvolumeclaim
CategoryKubernetes
DescriptionName of the PVC as retrieved from the API server.
Additional Notes

phase

Prometheus IDphase
Legacy IDkubernetes.persistentvolumeclaim.phase
OSS KSM ID-
CategoryKubernetes
DescriptionThe phase that the PVC is in. Will be Pending, Bound, or Lost.
Additional Notes

kube_persistentvolumeclaim_uid

Prometheus IDkube_persistentvolumeclaim_uid
Legacy IDkubernetes.persistentvolumeclaim.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the PVC as retrieved from the API server.
Additional Notes

kube_pod_condition

Prometheus IDkube_pod_condition
Legacy IDkubernetes.pod.condition
OSS KSM IDcondition
CategoryKubernetes
DescriptionThe condition that the Pod is in. Will be PodScheduled, ContainersReady, Initialized, or Ready
Additional Notes

kube_pod_container_full_id

Prometheus IDkube_pod_container_full_id
Legacy IDkubernetes.pod.container.full.id
OSS KSM IDcontainer_full_id
CategoryKubernetes
DescriptionThe full UID of the container in the Pod
Additional Notes

kube_pod_container_id

Prometheus IDkube_pod_container_id
Legacy IDkubernetes.pod.container.id
OSS KSM IDcontainer_id
CategoryKubernetes
DescriptionA short ID obtained by truncating the full UID of the container in the Pod
Additional Notes

kube_pod_container_name

Prometheus IDkube_pod_container_name
Legacy IDkubernetes.pod.container.name
OSS KSM IDcontainer
CategoryKubernetes
DescriptionThe name of the container in the Pod
Additional Notes

kube_pod_container_reason

Prometheus IDkube_pod_container_reason
Legacy IDkubernetes.pod.container.reason
OSS KSM IDreason
CategoryKubernetes
DescriptionThe reason that the container is in the state that it is in.
Additional Notes

kube_pod_internal_ip

Prometheus IDkube_pod_internal_ip
Legacy IDkubernetes.pod.internalIp
OSS KSM IDinternal_ip
CategoryKubernetes
DescriptionThe IP address associated with the Pod
Additional Notes

kube_pod_name

Prometheus IDkube_pod_name
Legacy IDkubernetes.pod.name
OSS KSM IDpod
CategoryKubernetes
DescriptionName of the Pod as retrieved from the API server.
Additional Notes

kube_pod_node

Prometheus IDkube_pod_node
Legacy IDkubernetes.pod.node
OSS KSM IDnode
CategoryKubernetes
DescriptionThe Node on which the Pod is running.
Additional Notes

kube_pod_owner_is_controller

Prometheus IDkube_pod_owner_is_controller
Legacy IDkubernetes.pod.owner.isController
OSS KSM IDowner_is_controller
CategoryKubernetes
DescriptionDesignates whether the Pod is created by a higher-level controller object
Additional Notes

kube_pod_owner_kind

Prometheus IDkube_pod_owner_kind
Legacy IDkubernetes.pod.owner.kind
OSS KSM IDowner_kind
CategoryKubernetes
DescriptionThe workload resource type of the object that created the Pod if owned by a higher-level controller object
Additional Notes

kube_pod_owner_name

Prometheus IDkube_pod_owner_name
Legacy IDkubernetes.pod.owner.name
OSS KSM IDowner_name
CategoryKubernetes
DescriptionThe name of the object that created the Pod if owned by a higher-level controller object
Additional Notes

kube_pod_persistentvolumeclaim

Prometheus IDkube_pod_persistentvolumeclaim
Legacy IDkubernetes.pod.persistentvolumeclaim
OSS KSM IDpersistentvolumeclaim
CategoryKubernetes
DescriptionThe name of the PVC associated with the Pod
Additional Notes

kube_pod_phase

Prometheus IDkube_pod_phase
Legacy IDkubernetes.pod.phase
OSS KSM IDphase
CategoryKubernetes
DescriptionThe phase that the Pod is in. Can be Pending, Running, Succeeded, Failed, or Unknown.
Additional Notes

kube_pod_pod_ip

Prometheus IDkube_pod_pod_ip
Legacy IDkubernetes.pod.pod.ip
OSS KSM IDpod_ip
CategoryKubernetes
DescriptionThe IP address associated with the Pod
Additional Notes

kube_pod_reason

Prometheus IDkube_pod_reason
Legacy IDkubernetes.pod.reason
OSS KSM IDreason
CategoryKubernetes
DescriptionThe reason the Pod is in the phase that it is in.
Additional Notes

kube_pod_resource

Prometheus IDkube_pod_resource
Legacy IDkubernetes.pod.resource
OSS KSM IDresource
CategoryKubernetes
DescriptionThe Pod’s resource limits and requests. Individual labels are created for memory limits, memory requests, CPU limits, and CPU requests
Additional Notes

kube_pod_uid

Prometheus IDkube_pod_uid
Legacy IDkubernetes.pod.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Pod as retrieved from the API server.
Additional Notes

kube_pod_unit

Prometheus IDkube_pod_unit
Legacy IDkubernetes.pod.unit
OSS KSM IDunit
CategoryKubernetes
DescriptionUsed in combination with the kube_pod_resource label to indicate the unit of the resource limit or request
Additional Notes

kube_pod_volume

Prometheus IDkube_pod_volume
Legacy IDkubernetes.pod.volume
OSS KSM IDvolume
CategoryKubernetes
DescriptionName of the volume associated with the Pod.
Additional Notes

kube_replicaset_name

Prometheus IDkube_replicaset_name
Legacy IDkubernetes.replicaSet.name
OSS KSM IDreplicaset
CategoryKubernetes
DescriptionName of the ReplicaSet as retrieved from the API server.
Additional Notes

kube_replicaset_owner_is_controller

Prometheus IDkube_replicaset_owner_is_controller
Legacy IDkubernetes.replicaSet.owner.isController
OSS KSM IDowner_is_controller
CategoryKubernetes
DescriptionDesignates whether the ReplicaSet is created by a higher-level controller object
Additional Notes

kube_replicaset_owner_kind

Prometheus IDkube_replicaset_owner_kind
Legacy IDkubernetes.replicaSet.owner.kind
OSS KSM IDowner_kind
CategoryKubernetes
DescriptionThe workload resource type of the object that created the ReplicaSet if owned by a higher-level controller object
Additional Notes

kube_replicaset_owner_name

Prometheus IDkube_replicaset_owner_name
Legacy IDkubernetes.replicaSet.owner.name
OSS KSM IDowner_name
CategoryKubernetes
DescriptionThe name of the object that created the ReplicaSet if owned by a higher-level controller object
Additional Notes

kube_replicaset_uid

Prometheus IDkube_replicaset_uid
Legacy IDkubernetes.replicaSet.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the ReplicaSet as retrieved from the API server.
Additional Notes

kube_replicationcontroller_name

Prometheus IDkube_replicationcontroller_name
Legacy IDkubernetes.replicationController.name
OSS KSM IDreplicationcontroller
CategoryKubernetes
DescriptionName of the Replication Controller as retrieved from the API server.
Additional Notes

kube_replicationcontroller_owner_is_controller

Prometheus IDkube_replicationcontroller_owner_is_controller
Legacy IDkubernetes.replicationController.owner.isController
OSS KSM IDowner_is_controller
CategoryKubernetes
DescriptionDesignates whether the Replication Controller is created by a higher-level controller object
Additional Notes

kube_replicationcontroller_owner_kind

Prometheus IDkube_replicationcontroller_owner_kind
Legacy IDkubernetes.replicationController.owner.kind
OSS KSM IDowner_kind
CategoryKubernetes
DescriptionThe workload resource type of the object that created the Replication Controller if owned by a higher-level controller object
Additional Notes

kube_replicationcontroller_owner_name

Prometheus IDkube_replicationcontroller_owner_name
Legacy IDkubernetes.replicationController.owner.name
OSS KSM IDowner_name
CategoryKubernetes
DescriptionThe name of the object that created the Replication Controller if owned by a higher-level controller object
Additional Notes

kube_replicationcontroller_uid

Prometheus IDkube_replicationcontroller_uid
Legacy IDkubernetes.replicationController.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Replication Controller as retrieved from the API server.
Additional Notes

kube_resourcequota_name

Prometheus IDkube_resourcequota_name
Legacy IDkubernetes.resourcequota.name
OSS KSM IDresourcequota
CategoryKubernetes
DescriptionName of the Resource Quota as retrieved from the API server.
Additional Notes

kube_resourcequota_namespace

Prometheus IDkube_resourcequota_namespace
Legacy IDkubernetes.resourcequota.namespace
OSS KSM IDnamespace
CategoryKubernetes
DescriptionNamespace in which the Resource Quota is being enforced
Additional Notes

kube_resourcequota_resource

Prometheus IDkube_resourcequota_resource
Legacy IDkubernetes.resourcequota.resource
OSS KSM IDresource
CategoryKubernetes
DescriptionThe resource and the amount of it in which the Resource Quota is being enforced
Additional Notes

kube_resourcequota_resourcequota

Prometheus IDkube_resourcequota_resourcequota
Legacy IDkubernetes.resourcequota.resourcequota
OSS KSM IDresourcequota
CategoryKubernetes
DescriptionName of the Resource Quota as retrieved from the API server.
Additional Notes

kube_resourcequota_type

Prometheus IDkube_resourcequota_type
Legacy IDkubernetes.resourcequota.type
OSS KSM IDtype
CategoryKubernetes
DescriptionUsed in combination with kube_resourcequota_resource to designate whether the amount is Used or is the Hard limit
Additional Notes
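Because kube_resourcequota_type distinguishes the Used amount from the Hard limit for a given kube_resourcequota_resource, the two can be combined to compute quota utilization. A sketch with hypothetical sample data (the lowercase "used"/"hard" values here are illustrative):

```python
# Sketch: combining kube_resourcequota_type ("used" vs "hard") with
# kube_resourcequota_resource to compute quota utilization.
samples = [
    {"kube_resourcequota_resource": "pods", "kube_resourcequota_type": "hard", "value": 50},
    {"kube_resourcequota_resource": "pods", "kube_resourcequota_type": "used", "value": 35},
]

def quota_utilization(samples, resource):
    """Return the used/hard ratio for one resource in a Resource Quota."""
    by_type = {
        s["kube_resourcequota_type"]: s["value"]
        for s in samples
        if s["kube_resourcequota_resource"] == resource
    }
    return by_type["used"] / by_type["hard"]

print(quota_utilization(samples, "pods"))  # 0.7
```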

kube_resourcequota_uid

Prometheus IDkube_resourcequota_uid
Legacy IDkubernetes.resourcequota.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Resource Quota as retrieved from the API server.
Additional Notes

kube_service_cluster_ip

Prometheus IDkube_service_cluster_ip
Legacy IDkubernetes.service.clusterIp
OSS KSM IDcluster_ip
CategoryKubernetes
DescriptionThe IP address associated with the Service
Additional Notes

kube_service_name

Prometheus IDkube_service_name
Legacy IDkubernetes.service.name
OSS KSM IDservice
CategoryKubernetes
DescriptionName of the Service as retrieved from the API server.
Additional Notes

kube_service_service_ip

Prometheus IDkube_service_service_ip
Legacy IDkubernetes.service.service.ip
OSS KSM IDservice_ip
CategoryKubernetes
DescriptionThe IP address associated with the Service
Additional Notes

kube_service_uid

Prometheus IDkube_service_uid
Legacy IDkubernetes.service.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Service as retrieved from the API server.
Additional Notes

kube_statefulset_name

Prometheus IDkube_statefulset_name
Legacy IDkubernetes.statefulSet.name
OSS KSM IDstatefulset
CategoryKubernetes
DescriptionName of the StatefulSet as retrieved from the API server.
Additional Notes

kube_statefulset_uid

Prometheus IDkube_statefulset_uid
Legacy IDkubernetes.statefulSet.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the StatefulSet as retrieved from the API server.
Additional Notes

kube_storageclass_name

Prometheus IDkube_storageclass_name
Legacy IDkubernetes.storageclass.name
OSS KSM IDstorageclass
CategoryKubernetes
DescriptionName of the Storage Class as retrieved from the API server.
Additional Notes

provisioner

Prometheus IDprovisioner
Legacy IDkubernetes.storageclass.provisioner
OSS KSM ID-
CategoryKubernetes
DescriptionThe Provisioner of the Storage Class as retrieved from the API server.
Additional Notes

reclaim_policy

Prometheus IDreclaim_policy
Legacy IDkubernetes.storageclass.reclaimPolicy
OSS KSM ID-
CategoryKubernetes
DescriptionThe reclaim policy for the Storage Class as retrieved from the API server.
Additional Notes

kube_storageclass_uid

Prometheus IDkube_storageclass_uid
Legacy IDkubernetes.storageclass.uid
OSS KSM IDuid
CategoryKubernetes
DescriptionUnique ID of the Storage Class as retrieved from the API server.
Additional Notes

volume_binding_mode

Prometheus IDvolume_binding_mode
Legacy IDkubernetes.storageclass.volumeBindingMode
OSS KSM ID-
CategoryKubernetes
DescriptionThe volume binding mode for the Storage Class as retrieved from the API server.
Additional Notes

kube_workload_name

Prometheus IDkube_workload_name
Legacy IDkubernetes.workload.name
OSS KSM IDworkload_name
CategoryKubernetes
DescriptionThe name of the Kubernetes workload resource object
Additional Notes

kube_workload_type

Prometheus IDkube_workload_type
Legacy IDkubernetes.workload.type
OSS KSM IDworkload_type
CategoryKubernetes
DescriptionThe type of the Kubernetes workload resource, e.g. Deployment, DaemonSet, or Job.
Additional Notes

marathon_app_id

Prometheus IDmarathon_app_id
Legacy IDmarathon.app.id
OSS KSM ID-
CategoryMarathon
DescriptionID of the Marathon application.
Additional Notes

marathon_app_name

Prometheus IDmarathon_app_name
Legacy IDmarathon.app.name
OSS KSM ID-
CategoryMarathon
DescriptionName of the Marathon application.
Additional Notes

marathon_group_id

Prometheus IDmarathon_group_id
Legacy IDmarathon.group.id
OSS KSM ID-
CategoryMarathon
DescriptionID of the Marathon application group.
Additional Notes

marathon_group_name

Prometheus IDmarathon_group_name
Legacy IDmarathon.group.name
OSS KSM ID-
CategoryMarathon
DescriptionName of the Marathon application group.
Additional Notes

mesos_cluster_id

Prometheus IDmesos_cluster_id
Legacy IDmesos.cluster.id
OSS KSM ID-
CategoryMesos
DescriptionID of the Mesos cluster.
Additional Notes

mesos_cluster_name

Prometheus IDmesos_cluster_name
Legacy IDmesos.cluster.name
OSS KSM ID-
CategoryMesos
DescriptionName of the Mesos cluster.
Additional Notes

mesos_framework_id

Prometheus IDmesos_framework_id
Legacy IDmesos.framework.id
OSS KSM ID-
CategoryMesos
DescriptionID of the Mesos framework.
Additional Notes

mesos_framework_name

Prometheus IDmesos_framework_name
Legacy IDmesos.framework.name
OSS KSM ID-
CategoryMesos
DescriptionName of the Mesos framework.
Additional Notes

mesos_slave_id

Prometheus IDmesos_slave_id
Legacy IDmesos.slave.id
OSS KSM ID-
CategoryMesos
DescriptionID of the Mesos slave (agent) node.
Additional Notes

mesos_slave_name

Prometheus IDmesos_slave_name
Legacy IDmesos.slave.name
OSS KSM ID-
CategoryMesos
DescriptionName of the Mesos slave (agent) node.
Additional Notes

mesos_task_id

Prometheus IDmesos_task_id
Legacy IDmesos.task.id
OSS KSM ID-
CategoryMesos
DescriptionID of the Mesos task.
Additional Notes

mesos_task_name

Prometheus IDmesos_task_name
Legacy IDmesos.task.name
OSS KSM ID-
CategoryMesos
DescriptionName of the Mesos task.
Additional Notes

net_client_ip

Prometheus IDnet_client_ip
Legacy IDnet.client.ip
OSS KSM ID-
CategoryNetwork
DescriptionClient IP address.
Additional Notes

net_http_method

Prometheus IDnet_http_method
Legacy IDnet.http.method
OSS KSM ID-
CategoryNetwork
DescriptionHTTP request method.
Additional Notes

net_http_statuscode

Prometheus IDnet_http_statuscode
Legacy IDnet.http.statusCode
OSS KSM ID-
CategoryNetwork
DescriptionHTTP response status code.
Additional Notes

net_http_url

Prometheus IDnet_http_url
Legacy IDnet.http.url
OSS KSM ID-
CategoryNetwork
DescriptionURL from an HTTP request.
Additional Notes

net_local_endpoint

Prometheus IDnet_local_endpoint
Legacy IDnet.local.endpoint
OSS KSM ID-
CategoryNetwork
DescriptionIP address of a local node.
Additional Notes

net_local_service

Prometheus IDnet_local_service
Legacy IDnet.local.service
OSS KSM ID-
CategoryNetwork
DescriptionService (port number) of a local node.
Additional Notes

net_mongodb_collection

Prometheus IDnet_mongodb_collection
Legacy IDnet.mongodb.collection
OSS KSM ID-
CategoryNetwork
DescriptionMongoDB collection.
Additional Notes

net_mongodb_operation

Prometheus IDnet_mongodb_operation
Legacy IDnet.mongodb.operation
OSS KSM ID-
CategoryNetwork
DescriptionMongoDB operation.
Additional Notes

net_protocol

Prometheus IDnet_protocol
Legacy IDnet.protocol
OSS KSM ID-
CategoryNetwork
DescriptionThe network protocol of a request (e.g. HTTP, MySQL).
Additional Notes

net_remote_endpoint

Prometheus IDnet_remote_endpoint
Legacy IDnet.remote.endpoint
OSS KSM ID-
CategoryNetwork
DescriptionIP address of a remote node.
Additional Notes

net_remote_service

Prometheus IDnet_remote_service
Legacy IDnet.remote.service
OSS KSM ID-
CategoryNetwork
DescriptionService (port number) of a remote node.
Additional Notes

net_server_ip

Prometheus IDnet_server_ip
Legacy IDnet.server.ip
OSS KSM ID-
CategoryNetwork
DescriptionServer IP address.
Additional Notes

net_server_port

Prometheus IDnet_server_port
Legacy IDnet.server.port
OSS KSM ID-
CategoryNetwork
DescriptionTCP/UDP Server Port number.
Additional Notes

net_sql_query

Prometheus IDnet_sql_query
Legacy IDnet.sql.query
OSS KSM ID-
CategoryNetwork
DescriptionThe full SQL query.
Additional Notes

net_sql_querytype

Prometheus IDnet_sql_querytype
Legacy IDnet.sql.query.type
OSS KSM ID-
CategoryNetwork
DescriptionSQL query type (SELECT, INSERT, DELETE, etc.).
Additional Notes

net_sql_table

Prometheus IDnet_sql_table
Legacy IDnet.sql.table
OSS KSM ID-
CategoryNetwork
DescriptionSQL query table name.
Additional Notes

program_name

Prometheus IDprogram_name
Legacy IDproc.client.name
OSS KSM ID-
CategoryProgram
DescriptionName of the Client process.
Additional Notes

program_cmd_line

Prometheus IDprogram_cmd_line
Legacy IDproc.commandLine
OSS KSM ID-
CategoryProgram
DescriptionCommand line used to start the process.
Additional Notes

program_name

Prometheus IDprogram_name
Legacy IDproc.name
OSS KSM ID-
CategoryProgram
DescriptionName of the process.
Additional Notes

program_name

Prometheus IDprogram_name
Legacy IDproc.server.name
OSS KSM ID-
CategoryProgram
DescriptionName of the server process.
Additional Notes

swarm_cluster_id

Prometheus IDswarm_cluster_id
Legacy IDswarm.cluster.id
OSS KSM ID-
CategorySwarm
DescriptionID of the Docker Swarm cluster.
Additional Notes

swarm_cluster_name

Prometheus IDswarm_cluster_name
Legacy IDswarm.cluster.name
OSS KSM ID-
CategorySwarm
DescriptionName of the Docker Swarm cluster.
Additional Notes

swarm_manager_reachability

Prometheus IDswarm_manager_reachability
Legacy IDswarm.manager.reachability
OSS KSM ID-
CategorySwarm
DescriptionReachability of the Swarm manager node (e.g. reachable, unreachable).
Additional Notes

swarm_node_availability

Prometheus IDswarm_node_availability
Legacy IDswarm.node.availability
OSS KSM ID-
CategorySwarm
DescriptionAvailability of the Swarm node (active, pause, or drain).
Additional Notes

swarm_node_id

Prometheus IDswarm_node_id
Legacy IDswarm.node.id
OSS KSM ID-
CategorySwarm
DescriptionID of the Swarm node.
Additional Notes

swarm_node_ip_address

Prometheus IDswarm_node_ip_address
Legacy IDswarm.node.ip_address
OSS KSM ID-
CategorySwarm
DescriptionIP address of the Swarm node.
Additional Notes

swarm_node_name

Prometheus IDswarm_node_name
Legacy IDswarm.node.name
OSS KSM ID-
CategorySwarm
DescriptionName of the Swarm node.
Additional Notes

swarm_node_role

Prometheus IDswarm_node_role
Legacy IDswarm.node.role
OSS KSM ID-
CategorySwarm
DescriptionRole of the Swarm node (manager or worker).
Additional Notes

swarm_node_state

Prometheus IDswarm_node_state
Legacy IDswarm.node.state
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_node_version

Prometheus IDswarm_node_version
Legacy IDswarm.node.version
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_service_id

Prometheus IDswarm_service_id
Legacy IDswarm.service.id
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_service_name

Prometheus IDswarm_service_name
Legacy IDswarm.service.name
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_container_id

Prometheus IDswarm_task_container_id
Legacy IDswarm.task.container_id
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_id

Prometheus IDswarm_task_id
Legacy IDswarm.task.id
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_name

Prometheus IDswarm_task_name
Legacy IDswarm.task.name
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_node_id

Prometheus IDswarm_task_node_id
Legacy IDswarm.task.node_id
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_service_id

Prometheus IDswarm_task_service_id
Legacy IDswarm.task.service_id
OSS KSM ID-
CategorySwarm
Description
Additional Notes

swarm_task_state

Prometheus IDswarm_task_state
Legacy IDswarm.task.state
OSS KSM ID-
CategorySwarm
Description
Additional Notes

6.2.4 - File

sysdig_filestats_host_file_error_total_count

Prometheus ID: sysdig_filestats_host_file_error_total_count
Legacy ID: file.error.total.count
Metric Type: counter
Unit: number
Description: Number of errors caused by file access.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_filestats_host_file_in_bytes

Prometheus ID: sysdig_filestats_host_file_in_bytes
Legacy ID: file.bytes.in
Metric Type: counter
Unit: data
Description: Number of bytes read from files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_filestats_host_file_open_count

Prometheus ID: sysdig_filestats_host_file_open_count
Legacy ID: file.open.count
Metric Type: counter
Unit: number
Description: Number of times the file has been opened.
Additional Notes: -

sysdig_filestats_host_file_out_bytes

Prometheus ID: sysdig_filestats_host_file_out_bytes
Legacy ID: file.bytes.out
Metric Type: counter
Unit: data
Description: Number of bytes written to files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_filestats_host_file_total_bytes

Prometheus ID: sysdig_filestats_host_file_total_bytes
Legacy ID: file.bytes.total
Metric Type: counter
Unit: data
Description: Number of bytes read from and written to files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_filestats_host_file_total_time

Prometheus ID: sysdig_filestats_host_file_total_time
Legacy ID: file.time.total
Metric Type: counter
Unit: time
Description: Time spent in file I/O.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_fs_free_bytes

Prometheus ID: sysdig_fs_free_bytes
Legacy ID: fs.bytes.free
Metric Type: gauge
Unit: data
Description: Filesystem available space.
Additional Notes: -

sysdig_fs_free_percent

Prometheus ID: sysdig_fs_free_percent
Legacy ID: fs.free.percent
Metric Type: gauge
Unit: percent
Description: Percentage of filesystem free space.
Additional Notes: -

sysdig_fs_inodes_total_count

Prometheus ID: sysdig_fs_inodes_total_count
Legacy ID: fs.inodes.total.count
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_fs_inodes_used_count

Prometheus ID: sysdig_fs_inodes_used_count
Legacy ID: fs.inodes.used.count
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_fs_inodes_used_percent

Prometheus ID: sysdig_fs_inodes_used_percent
Legacy ID: fs.inodes.used.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_fs_total_bytes

Prometheus ID: sysdig_fs_total_bytes
Legacy ID: fs.bytes.total
Metric Type: gauge
Unit: data
Description: Filesystem size.
Additional Notes: -

sysdig_fs_used_bytes

Prometheus ID: sysdig_fs_used_bytes
Legacy ID: fs.bytes.used
Metric Type: gauge
Unit: data
Description: Filesystem used space.
Additional Notes: -

sysdig_fs_used_percent

Prometheus ID: sysdig_fs_used_percent
Legacy ID: fs.used.percent
Metric Type: gauge
Unit: percent
Description: Percentage of the sum of all filesystems in use.
Additional Notes: -

6.2.5 - Host

sysdig_host_container_count

Prometheus ID: sysdig_host_container_count
Legacy ID: container.count
Metric Type: gauge
Unit: number
Description: Count of the number of containers.
Additional Notes: This metric is perfect for dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) containers of a certain type in a certain group or node - try segmenting by container.image, .id, or .name. See also: host.count.

sysdig_host_container_start_count

Prometheus ID: sysdig_host_container_start_count
Legacy ID: host.container.start.count
Metric Type: counter
Unit: number
Description: -
Additional Notes: -

sysdig_host_count

Prometheus ID: sysdig_host_count
Legacy ID: host.count
Metric Type: gauge
Unit: number
Description: Count of the number of hosts.
Additional Notes: This metric is perfect for dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) machines of a certain type in a certain group - try segmenting by tag or hostname. See also: container.count.

sysdig_host_cpu_cores_used

Prometheus ID: sysdig_host_cpu_cores_used
Legacy ID: cpu.cores.used
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_host_cpu_cores_used_percent

Prometheus ID: sysdig_host_cpu_cores_used_percent
Legacy ID: cpu.cores.used.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpu_idle_percent

Prometheus ID: sysdig_host_cpu_idle_percent
Legacy ID: cpu.idle.percent
Metric Type: gauge
Unit: percent
Description: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_cpu_iowait_percent

Prometheus ID: sysdig_host_cpu_iowait_percent
Legacy ID: cpu.iowait.percent
Metric Type: gauge
Unit: percent
Description: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_cpu_nice_percent

Prometheus ID: sysdig_host_cpu_nice_percent
Legacy ID: cpu.nice.percent
Metric Type: gauge
Unit: percent
Description: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_cpu_stolen_percent

Prometheus ID: sysdig_host_cpu_stolen_percent
Legacy ID: cpu.stolen.percent
Metric Type: gauge
Unit: percent
Description: CPU steal time is the percentage of time that a virtual machine’s CPU is in a state of involuntary wait because the physical CPU is shared among virtual machines. In calculating steal time, the operating system kernel detects when it has work available but does not have access to the physical CPU to perform that work.
Additional Notes: If the percentage of steal time is consistently high, you may want to stop and restart the instance (since it will most likely start on different physical hardware) or upgrade to a virtual machine with more CPU power. Also see the metric ‘capacity total percent’ to see how steal time directly impacts the number of server requests that could not be handled. On AWS EC2, steal time does not depend on the activity of other virtual machine neighbours. EC2 is simply making sure your instance is not using more CPU cycles than paid for.

sysdig_host_cpu_system_percent

Prometheus ID: sysdig_host_cpu_system_percent
Legacy ID: cpu.system.percent
Metric Type: gauge
Unit: percent
Description: Percentage of CPU utilization that occurred while executing at the system level (kernel).
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_cpu_used_percent

Prometheus ID: sysdig_host_cpu_used_percent
Legacy ID: cpu.used.percent
Metric Type: gauge
Unit: percent
Description: The CPU usage for each container is obtained from cgroups and normalized by dividing by the number of cores to determine an overall percentage. For example, if the environment contains six cores on a host, and the container or processes are assigned two cores, Sysdig will report CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and processes.
Additional Notes: -
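
The normalization described for this metric can be sketched in a few lines. This is an illustrative helper, not Sysdig code; the function name and inputs are assumptions:

```python
def normalized_cpu_percent(cores_in_use: float, host_cores: int) -> float:
    """Hypothetical helper mirroring the cgroup-based calculation described
    above: CPU cores in use divided by the host's total cores, as a percent."""
    return cores_in_use / host_cores * 100

# A container using 2 of a host's 6 cores is reported as 2/6 * 100% = 33.33%.
print(round(normalized_cpu_percent(2, 6), 2))  # 33.33
```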

sysdig_host_cpu_user_percent

Prometheus ID: sysdig_host_cpu_user_percent
Legacy ID: cpu.user.percent
Metric Type: gauge
Unit: percent
Description: Percentage of CPU utilization that occurred while executing at the user level (application).
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_cpucore_idle_percent

Prometheus ID: sysdig_host_cpucore_idle_percent
Legacy ID: cpucore.idle.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_iowait_percent

Prometheus ID: sysdig_host_cpucore_iowait_percent
Legacy ID: cpucore.iowait.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_nice_percent

Prometheus ID: sysdig_host_cpucore_nice_percent
Legacy ID: cpucore.nice.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_stolen_percent

Prometheus ID: sysdig_host_cpucore_stolen_percent
Legacy ID: cpucore.stolen.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_system_percent

Prometheus ID: sysdig_host_cpucore_system_percent
Legacy ID: cpucore.system.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_used_percent

Prometheus ID: sysdig_host_cpucore_used_percent
Legacy ID: cpucore.used.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_cpucore_user_percent

Prometheus ID: sysdig_host_cpucore_user_percent
Legacy ID: cpucore.user.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_fd_used_percent

Prometheus ID: sysdig_host_fd_used_percent
Legacy ID: fd.used.percent
Metric Type: gauge
Unit: percent
Description: Percentage of used file descriptors out of the maximum available.
Additional Notes: Usually, when a process reaches its FD limit, it will stop operating properly and possibly crash. As a consequence, this is a metric you want to monitor carefully, or even better, use for alerts.

sysdig_host_file_error_open_count

Prometheus ID: sysdig_host_file_error_open_count
Legacy ID: file.error.open.count
Metric Type: counter
Unit: number
Description: Number of errors in opening files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_error_total_count

Prometheus ID: sysdig_host_file_error_total_count
Legacy ID: file.error.total.count
Metric Type: counter
Unit: number
Description: Number of errors caused by file access.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_in_bytes

Prometheus ID: sysdig_host_file_in_bytes
Legacy ID: file.bytes.in
Metric Type: counter
Unit: data
Description: Number of bytes read from files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_in_iops

Prometheus ID: sysdig_host_file_in_iops
Legacy ID: file.iops.in
Metric Type: counter
Unit: number
Description: Number of file read operations per second.
Additional Notes: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.
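
The IOPS metrics are counters, so a per-second rate comes from the difference between two samples divided by the sampling interval. A minimal sketch (the helper name and inputs are assumptions, not Sysdig internals):

```python
def iops_rate(prev_count: int, curr_count: int, interval_seconds: float) -> float:
    """Per-second operation rate derived from two counter samples."""
    return (curr_count - prev_count) / interval_seconds

# 600 additional read operations observed over a 60-second interval:
print(iops_rate(1000, 1600, 60))  # 10.0 operations/second
```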

sysdig_host_file_in_time

Prometheus ID: sysdig_host_file_in_time
Legacy ID: file.time.in
Metric Type: counter
Unit: time
Description: Time spent in file reading.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_open_count

Prometheus ID: sysdig_host_file_open_count
Legacy ID: file.open.count
Metric Type: counter
Unit: number
Description: Number of times the file has been opened.
Additional Notes: -

sysdig_host_file_out_bytes

Prometheus ID: sysdig_host_file_out_bytes
Legacy ID: file.bytes.out
Metric Type: counter
Unit: data
Description: Number of bytes written to files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_out_iops

Prometheus ID: sysdig_host_file_out_iops
Legacy ID: file.iops.out
Metric Type: counter
Unit: number
Description: Number of file write operations per second.
Additional Notes: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_host_file_out_time

Prometheus ID: sysdig_host_file_out_time
Legacy ID: file.time.out
Metric Type: counter
Unit: time
Description: Time spent in file writing.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_total_bytes

Prometheus ID: sysdig_host_file_total_bytes
Legacy ID: file.bytes.total
Metric Type: counter
Unit: data
Description: Number of bytes read from and written to files.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_file_total_iops

Prometheus ID: sysdig_host_file_total_iops
Legacy ID: file.iops.total
Metric Type: counter
Unit: number
Description: Number of read and write file operations per second.
Additional Notes: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_host_file_total_time

Prometheus ID: sysdig_host_file_total_time
Legacy ID: file.time.total
Metric Type: counter
Unit: time
Description: Time spent in file I/O.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_fs_free_bytes

Prometheus ID: sysdig_host_fs_free_bytes
Legacy ID: fs.bytes.free
Metric Type: gauge
Unit: data
Description: Filesystem available space.
Additional Notes: -

sysdig_host_fs_free_percent

Prometheus ID: sysdig_host_fs_free_percent
Legacy ID: fs.free.percent
Metric Type: gauge
Unit: percent
Description: Percentage of filesystem free space.
Additional Notes: -

sysdig_host_fs_inodes_total_count

Prometheus ID: sysdig_host_fs_inodes_total_count
Legacy ID: fs.inodes.total.count
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_host_fs_inodes_used_count

Prometheus ID: sysdig_host_fs_inodes_used_count
Legacy ID: fs.inodes.used.count
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_host_fs_inodes_used_percent

Prometheus ID: sysdig_host_fs_inodes_used_percent
Legacy ID: fs.inodes.used.percent
Metric Type: gauge
Unit: percent
Description: -
Additional Notes: -

sysdig_host_fs_largest_used_percent

Prometheus ID: sysdig_host_fs_largest_used_percent
Legacy ID: fs.largest.used.percent
Metric Type: gauge
Unit: percent
Description: Percentage of the largest filesystem in use.
Additional Notes: -

sysdig_host_fs_root_used_percent

Prometheus ID: sysdig_host_fs_root_used_percent
Legacy ID: fs.root.used.percent
Metric Type: gauge
Unit: percent
Description: Percentage of the root filesystem in use.
Additional Notes: -

sysdig_host_fs_total_bytes

Prometheus ID: sysdig_host_fs_total_bytes
Legacy ID: fs.bytes.total
Metric Type: gauge
Unit: data
Description: Filesystem size.
Additional Notes: -

sysdig_host_fs_used_bytes

Prometheus ID: sysdig_host_fs_used_bytes
Legacy ID: fs.bytes.used
Metric Type: gauge
Unit: data
Description: Filesystem used space.
Additional Notes: -

sysdig_host_fs_used_percent

Prometheus ID: sysdig_host_fs_used_percent
Legacy ID: fs.used.percent
Metric Type: gauge
Unit: percent
Description: Percentage of the sum of all filesystems in use.
Additional Notes: -
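
Since fs.used.percent is described as a percentage of the sum of all filesystems, it aggregates used and total bytes across mounts before dividing, rather than averaging per-filesystem percentages. A sketch of that aggregation (the helper and its input shape are assumptions for illustration):

```python
def fs_used_percent(filesystems: list[tuple[int, int]]) -> float:
    """Percentage of the sum of all filesystems in use.

    `filesystems` is a list of (used_bytes, total_bytes) pairs, one per mount.
    """
    used = sum(u for u, _ in filesystems)
    total = sum(t for _, t in filesystems)
    return used / total * 100

# Two mounts: 50/100 bytes and 150/300 bytes used -> 200/400 = 50% overall.
print(fs_used_percent([(50, 100), (150, 300)]))  # 50.0
```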

sysdig_host_info

Prometheus ID: sysdig_host_info
Legacy ID: info
Metric Type: gauge
Unit: number
Description: -
Additional Notes: -

sysdig_host_load_average_15m

Prometheus ID: sysdig_host_load_average_15m
Legacy ID: load.average.15m
Metric Type: gauge
Unit: number
Description: The 15 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 15 minutes for all cores. The value should correspond to the third (and last) load average value displayed by the ‘uptime’ command.
Additional Notes: -

sysdig_host_load_average_1m

Prometheus ID: sysdig_host_load_average_1m
Legacy ID: load.average.1m
Metric Type: gauge
Unit: number
Description: The 1 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 1 minute for all cores. The value should correspond to the first (of three) load average values displayed by the ‘uptime’ command.
Additional Notes: -

sysdig_host_load_average_5m

Prometheus ID: sysdig_host_load_average_5m
Legacy ID: load.average.5m
Metric Type: gauge
Unit: number
Description: The 5 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 5 minutes for all cores. The value should correspond to the second (of three) load average values displayed by the ‘uptime’ command.
Additional Notes: -

sysdig_host_load_average_percpu_15m

Prometheus ID: sysdig_host_load_average_percpu_15m
Legacy ID: load.average.percpu.15m
Metric Type: gauge
Unit: number
Description: The 15 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 15 minutes, divided by the number of system CPUs.
Additional Notes: -

sysdig_host_load_average_percpu_1m

Prometheus ID: sysdig_host_load_average_percpu_1m
Legacy ID: load.average.percpu.1m
Metric Type: gauge
Unit: number
Description: The 1 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 1 minute, divided by the number of system CPUs.
Additional Notes: -

sysdig_host_load_average_percpu_5m

Prometheus ID: sysdig_host_load_average_percpu_5m
Legacy ID: load.average.percpu.5m
Metric Type: gauge
Unit: number
Description: The 5 minute system load average represents the average number of jobs in (1) the CPU run queue or (2) waiting for disk I/O, averaged over 5 minutes, divided by the number of system CPUs.
Additional Notes: -
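
The percpu variants are simply the plain load averages divided by the CPU count, which makes them comparable across hosts of different sizes (a per-CPU load near 1.0 means the machine is roughly saturated regardless of core count). A sketch using the standard library (Unix-only; the function name is an assumption):

```python
import os

def load_average_percpu() -> tuple[float, float, float]:
    """1/5/15 minute load averages divided by the number of system CPUs,
    mirroring the load.average.percpu.* metrics described above."""
    cpus = os.cpu_count() or 1
    return tuple(load / cpus for load in os.getloadavg())
```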

sysdig_host_memory_available_bytes

Prometheus ID: sysdig_host_memory_available_bytes
Legacy ID: memory.bytes.available
Metric Type: gauge
Unit: data
Description: The available memory for a host is obtained from /proc/meminfo. For environments using Linux kernel version 3.12 and later, the available memory is obtained using the MemAvailable field in /proc/meminfo. For environments using earlier kernel versions, the formula is MemFree + Cached + Buffers.
Additional Notes: -
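
The fallback logic above can be sketched as a small parser over /proc/meminfo contents. This is an illustrative reimplementation under the stated assumptions, not agent code; /proc/meminfo reports values in kB:

```python
def available_memory_bytes(meminfo_text: str) -> int:
    """Available memory from /proc/meminfo contents: prefer MemAvailable when
    the kernel exposes it, otherwise MemFree + Cached + Buffers."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key.strip()] = int(rest.split()[0]) * 1024  # kB -> bytes
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    return fields["MemFree"] + fields["Cached"] + fields["Buffers"]

sample = "MemTotal: 2048 kB\nMemFree: 512 kB\nBuffers: 64 kB\nCached: 256 kB\n"
print(available_memory_bytes(sample))  # 851968 bytes (512 + 64 + 256 kB)
```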

sysdig_host_memory_swap_available_bytes

Prometheus ID: sysdig_host_memory_swap_available_bytes
Legacy ID: memory.swap.bytes.available
Metric Type: gauge
Unit: data
Description: Available amount of swap memory.
Additional Notes: Sum of free and cached swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_memory_swap_total_bytes

Prometheus ID: sysdig_host_memory_swap_total_bytes
Legacy ID: memory.swap.bytes.total
Metric Type: gauge
Unit: data
Description: Total amount of swap memory.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_memory_swap_used_bytes

Prometheus ID: sysdig_host_memory_swap_used_bytes
Legacy ID: memory.swap.bytes.used
Metric Type: gauge
Unit: data
Description: Used amount of swap memory.
Additional Notes: The amount of used swap memory is calculated by subtracting available from total swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_memory_swap_used_percent

Prometheus ID: sysdig_host_memory_swap_used_percent
Legacy ID: memory.swap.used.percent
Metric Type: gauge
Unit: percent
Description: Used percent of swap memory.
Additional Notes: The percentage of used swap memory is calculated as the ratio of used to total swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.
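
Combining the two definitions above (used = total - available, percent = used / total), the swap metrics relate as follows. This is a sketch under those stated formulas, with a guard added for hosts that have no swap configured:

```python
def swap_used_percent(total_bytes: int, available_bytes: int) -> float:
    """Used swap percent: (total - available) / total * 100.

    Returns 0.0 when no swap is configured to avoid dividing by zero.
    """
    if total_bytes == 0:
        return 0.0
    used = total_bytes - available_bytes
    return used / total_bytes * 100

# 4 GiB of swap with 3 GiB available -> 1 GiB used -> 25%.
print(swap_used_percent(4 * 1024**3, 3 * 1024**3))  # 25.0
```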

sysdig_host_memory_total_bytes

Prometheus IDsysdig_host_memory_total_bytes
Legacy IDmemory.bytes.total
Metric Typegauge
Unitdata
DescriptionThe total memory of a host, in bytes. This value is obtained from /proc.
Additional Notes

sysdig_host_memory_used_bytes

Prometheus IDsysdig_host_memory_used_bytes
Legacy IDmemory.bytes.used
Metric Typegauge
Unitdata
DescriptionThe amount of physical memory currently in use.
Additional NotesBy default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_host_memory_used_percent

Prometheus IDsysdig_host_memory_used_percent
Legacy IDmemory.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of physical memory in use.
Additional NotesBy default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_memory_virtual_bytes

Prometheus IDsysdig_host_memory_virtual_bytes
Legacy IDmemory.bytes.virtual
Metric Typegauge
Unitdata
DescriptionThe virtual memory size of the process, in bytes. This value is obtained from Sysdig events.
Additional Notes

sysdig_host_net_connection_in_count

Prometheus IDsysdig_host_net_connection_in_count
Legacy IDnet.connection.count.in
Metric Typecounter
Unitnumber
DescriptionNumber of currently established client (inbound) connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_host_net_connection_out_count

Prometheus IDsysdig_host_net_connection_out_count
Legacy IDnet.connection.count.out
Metric Typecounter
Unitnumber
DescriptionNumber of currently established server (outbound) connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_host_net_connection_total_count

Prometheus IDsysdig_host_net_connection_total_count
Legacy IDnet.connection.count.total
Metric Typecounter
Unitnumber
DescriptionNumber of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_host_net_error_count

Prometheus IDsysdig_host_net_error_count
Legacy IDnet.error.count
Metric Typecounter
Unitnumber
DescriptionNumber of network errors.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_net_http_error_count

Prometheus IDsysdig_host_net_http_error_count
Legacy IDnet.http.error.count
Metric Typecounter
Unitnumber
DescriptionNumber of failed HTTP requests as counted from 4xx/5xx status codes.
Additional Notes

sysdig_host_net_http_request_count

Prometheus IDsysdig_host_net_http_request_count
Legacy IDnet.http.request.count
Metric Typecounter
Unitnumber
DescriptionCount of HTTP requests.
Additional Notes

sysdig_host_net_http_request_time

Prometheus IDsysdig_host_net_http_request_time
Legacy IDnet.http.request.time
Metric Typecounter
Unittime
DescriptionAverage time for HTTP requests.
Additional Notes

sysdig_host_net_http_statuscode_error_count

Prometheus IDsysdig_host_net_http_statuscode_error_count
Legacy IDnet.http.statuscode.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_http_statuscode_request_count

Prometheus IDsysdig_host_net_http_statuscode_request_count
Legacy IDnet.http.statuscode.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_http_url_error_count

Prometheus IDsysdig_host_net_http_url_error_count
Legacy IDnet.http.url.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_http_url_request_count

Prometheus IDsysdig_host_net_http_url_request_count
Legacy IDnet.http.url.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_http_url_request_time

Prometheus IDsysdig_host_net_http_url_request_time
Legacy IDnet.http.url.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_host_net_mongodb_collection_error_count

Prometheus IDsysdig_host_net_mongodb_collection_error_count
Legacy IDnet.mongodb.collection.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_collection_request_count

Prometheus IDsysdig_host_net_mongodb_collection_request_count
Legacy IDnet.mongodb.collection.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_collection_request_time

Prometheus IDsysdig_host_net_mongodb_collection_request_time
Legacy IDnet.mongodb.collection.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_host_net_mongodb_error_count

Prometheus IDsysdig_host_net_mongodb_error_count
Legacy IDnet.mongodb.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_operation_error_count

Prometheus IDsysdig_host_net_mongodb_operation_error_count
Legacy IDnet.mongodb.operation.error.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_operation_request_count

Prometheus IDsysdig_host_net_mongodb_operation_request_count
Legacy IDnet.mongodb.operation.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_operation_request_time

Prometheus IDsysdig_host_net_mongodb_operation_request_time
Legacy IDnet.mongodb.operation.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_host_net_mongodb_request_count

Prometheus IDsysdig_host_net_mongodb_request_count
Legacy IDnet.mongodb.request.count
Metric Typecounter
Unitnumber
Description
Additional Notes

sysdig_host_net_mongodb_request_time

Prometheus IDsysdig_host_net_mongodb_request_time
Legacy IDnet.mongodb.request.time
Metric Typecounter
Unittime
Description
Additional Notes

sysdig_host_net_in_bytes

Prometheus IDsysdig_host_net_in_bytes
Legacy IDnet.bytes.in
Metric Typecounter
Unitdata
DescriptionInbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_net_out_bytes

Prometheus IDsysdig_host_net_out_bytes
Legacy IDnet.bytes.out
Metric Typecounter
Unitdata
DescriptionOutbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_net_request_count

Prometheus ID: sysdig_host_net_request_count
Legacy ID: net.request.count
Metric Type: counter
Unit: number
Description: Total number of network requests. Note, this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.
Additional Notes:

sysdig_host_net_request_in_count

Prometheus ID: sysdig_host_net_request_in_count
Legacy ID: net.request.count.in
Metric Type: counter
Unit: number
Description: Number of inbound network requests.
Additional Notes:

sysdig_host_net_request_in_time

Prometheus ID: sysdig_host_net_request_in_time
Legacy ID: net.request.time.in
Metric Type: counter
Unit: time
Description: Average time to serve an inbound request.
Additional Notes:

sysdig_host_net_request_out_count

Prometheus ID: sysdig_host_net_request_out_count
Legacy ID: net.request.count.out
Metric Type: counter
Unit: number
Description: Number of outbound network requests.
Additional Notes:

sysdig_host_net_request_out_time

Prometheus ID: sysdig_host_net_request_out_time
Legacy ID: net.request.time.out
Metric Type: counter
Unit: time
Description: Average time spent waiting for an outbound request.
Additional Notes:

sysdig_host_net_request_time

Prometheus ID: sysdig_host_net_request_time
Legacy ID: net.request.time
Metric Type: counter
Unit: time
Description: Average time to serve a network request.
Additional Notes:

sysdig_host_net_server_connection_in_count

Prometheus ID: sysdig_host_net_server_connection_in_count
Legacy ID: net.server.connection.count.in
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_server_in_bytes

Prometheus ID: sysdig_host_net_server_in_bytes
Legacy ID: net.server.bytes.in
Metric Type: counter
Unit: data
Description:
Additional Notes:

sysdig_host_net_server_out_bytes

Prometheus ID: sysdig_host_net_server_out_bytes
Legacy ID: net.server.bytes.out
Metric Type: counter
Unit: data
Description:
Additional Notes:

sysdig_host_net_server_total_bytes

Prometheus ID: sysdig_host_net_server_total_bytes
Legacy ID: net.server.bytes.total
Metric Type: counter
Unit: data
Description:
Additional Notes:

sysdig_host_net_sql_error_count

Prometheus ID: sysdig_host_net_sql_error_count
Legacy ID: net.sql.error.count
Metric Type: counter
Unit: number
Description: Number of failed SQL requests.
Additional Notes:

sysdig_host_net_sql_query_error_count

Prometheus ID: sysdig_host_net_sql_query_error_count
Legacy ID: net.sql.query.error.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_query_request_count

Prometheus ID: sysdig_host_net_sql_query_request_count
Legacy ID: net.sql.query.request.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_query_request_time

Prometheus ID: sysdig_host_net_sql_query_request_time
Legacy ID: net.sql.query.request.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

sysdig_host_net_sql_querytype_error_count

Prometheus ID: sysdig_host_net_sql_querytype_error_count
Legacy ID: net.sql.querytype.error.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_querytype_request_count

Prometheus ID: sysdig_host_net_sql_querytype_request_count
Legacy ID: net.sql.querytype.request.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_querytype_request_time

Prometheus ID: sysdig_host_net_sql_querytype_request_time
Legacy ID: net.sql.querytype.request.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

sysdig_host_net_sql_request_count

Prometheus ID: sysdig_host_net_sql_request_count
Legacy ID: net.sql.request.count
Metric Type: counter
Unit: number
Description: Number of SQL requests.
Additional Notes:

sysdig_host_net_sql_request_time

Prometheus ID: sysdig_host_net_sql_request_time
Legacy ID: net.sql.request.time
Metric Type: counter
Unit: time
Description: Average time to complete a SQL request.
Additional Notes:

sysdig_host_net_sql_table_error_count

Prometheus ID: sysdig_host_net_sql_table_error_count
Legacy ID: net.sql.table.error.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_table_request_count

Prometheus ID: sysdig_host_net_sql_table_request_count
Legacy ID: net.sql.table.request.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_net_sql_table_request_time

Prometheus ID: sysdig_host_net_sql_table_request_time
Legacy ID: net.sql.table.request.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

sysdig_host_net_tcp_queue_len

Prometheus ID: sysdig_host_net_tcp_queue_len
Legacy ID: net.tcp.queue.len
Metric Type: counter
Unit: number
Description: Length of the TCP request queue.
Additional Notes:

sysdig_host_net_total_bytes

Prometheus ID: sysdig_host_net_total_bytes
Legacy ID: net.bytes.total
Metric Type: counter
Unit: data
Description: Total network bytes, inbound and outbound.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_proc_count

Prometheus ID: sysdig_host_proc_count
Legacy ID: proc.count
Metric Type: counter
Unit: number
Description: Number of processes on the host or container.
Additional Notes:

sysdig_host_syscall_count

Prometheus ID: sysdig_host_syscall_count
Legacy ID: syscall.count
Metric Type: gauge
Unit: number
Description: Total number of syscalls seen.
Additional Notes: Syscalls are resource intensive. This metric tracks how many have been made by a given process or container.

sysdig_host_syscall_error_count

Prometheus ID: sysdig_host_syscall_error_count
Legacy ID: host.error.count
Metric Type: counter
Unit: number
Description: Number of system call errors.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_host_system_uptime

Prometheus ID: sysdig_host_system_uptime
Legacy ID: system.uptime
Metric Type: gauge
Unit: time
Description: This metric is sent by the agent and represents the number of seconds since host boot time. It is not available at container granularity.
Additional Notes:

sysdig_host_thread_count

Prometheus ID: sysdig_host_thread_count
Legacy ID: thread.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

sysdig_host_timeseries_count_appcheck

Prometheus ID: sysdig_host_timeseries_count_appcheck
Legacy ID: metricCount.appCheck
Metric Type: gauge
Unit: number
Description:
Additional Notes:

sysdig_host_timeseries_count_jmx

Prometheus ID: sysdig_host_timeseries_count_jmx
Legacy ID: metricCount.jmx
Metric Type: gauge
Unit: number
Description:
Additional Notes:

sysdig_host_timeseries_count_prometheus

Prometheus ID: sysdig_host_timeseries_count_prometheus
Legacy ID: metricCount.prometheus
Metric Type: gauge
Unit: number
Description:
Additional Notes:

sysdig_host_timeseries_count_statsd

Prometheus ID: sysdig_host_timeseries_count_statsd
Legacy ID: metricCount.statsd
Metric Type: gauge
Unit: number
Description:
Additional Notes:

sysdig_host_up

Prometheus ID: sysdig_host_up
Legacy ID: uptime
Metric Type: gauge
Unit: number
Description: The percentage of time the selected entity was down during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down.
Additional Notes:

6.2.6 - JMX/JVM

jmx_jvm_class_loaded

Prometheus ID: jmx_jvm_class_loaded
Legacy ID: jvm.class.loaded
Metric Type: gauge
Unit: number
Description: The number of classes that are currently loaded in the JVM.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_class_unloaded

Prometheus ID: jmx_jvm_class_unloaded
Legacy ID: jvm.class.unloaded
Metric Type: gauge
Unit: number
Description:
Additional Notes:

jmx_jvm_gc_ConcurrentMarkSweep_count

Prometheus ID: jmx_jvm_gc_ConcurrentMarkSweep_count
Legacy ID: jvm.gc.ConcurrentMarkSweep.count
Metric Type: counter
Unit: number
Description: The number of times the Concurrent Mark-Sweep garbage collector has run.
Additional Notes:

jmx_jvm_gc_ConcurrentMarkSweep_time

Prometheus ID: jmx_jvm_gc_ConcurrentMarkSweep_time
Legacy ID: jvm.gc.ConcurrentMarkSweep.time
Metric Type: counter
Unit: time
Description: The amount of time the Concurrent Mark-Sweep garbage collector has run.
Additional Notes:

jmx_jvm_gc_Copy_count

Prometheus ID: jmx_jvm_gc_Copy_count
Legacy ID: jvm.gc.Copy.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

jmx_jvm_gc_Copy_time

Prometheus ID: jmx_jvm_gc_Copy_time
Legacy ID: jvm.gc.Copy.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

jmx_jvm_gc_G1_Old_Generation_count

Prometheus ID: jmx_jvm_gc_G1_Old_Generation_count
Legacy ID: jvm.gc.G1_Old_Generation.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

jmx_jvm_gc_G1_Old_Generation_time

Prometheus ID: jmx_jvm_gc_G1_Old_Generation_time
Legacy ID: jvm.gc.G1_Old_Generation.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

jmx_jvm_gc_G1_Young_Generation_count

Prometheus ID: jmx_jvm_gc_G1_Young_Generation_count
Legacy ID: jvm.gc.G1_Young_Generation.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

jmx_jvm_gc_G1_Young_Generation_time

Prometheus ID: jmx_jvm_gc_G1_Young_Generation_time
Legacy ID: jvm.gc.G1_Young_Generation.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

jmx_jvm_gc_MarkSweepCompact_count

Prometheus ID: jmx_jvm_gc_MarkSweepCompact_count
Legacy ID: jvm.gc.MarkSweepCompact.count
Metric Type: counter
Unit: number
Description:
Additional Notes:

jmx_jvm_gc_MarkSweepCompact_time

Prometheus ID: jmx_jvm_gc_MarkSweepCompact_time
Legacy ID: jvm.gc.MarkSweepCompact.time
Metric Type: counter
Unit: time
Description:
Additional Notes:

jmx_jvm_gc_PS_MarkSweep_count

Prometheus ID: jmx_jvm_gc_PS_MarkSweep_count
Legacy ID: jvm.gc.PS_MarkSweep.count
Metric Type: counter
Unit: number
Description: The number of times the parallel scavenge Mark-Sweep old generation garbage collector has run.
Additional Notes:

jmx_jvm_gc_PS_MarkSweep_time

Prometheus ID: jmx_jvm_gc_PS_MarkSweep_time
Legacy ID: jvm.gc.PS_MarkSweep.time
Metric Type: counter
Unit: time
Description: The amount of time the parallel scavenge Mark-Sweep old generation garbage collector has run.
Additional Notes:

jmx_jvm_gc_PS_Scavenge_count

Prometheus ID: jmx_jvm_gc_PS_Scavenge_count
Legacy ID: jvm.gc.PS_Scavenge.count
Metric Type: counter
Unit: number
Description: The number of times the parallel eden/survivor space garbage collector has run.
Additional Notes:

jmx_jvm_gc_PS_Scavenge_time

Prometheus ID: jmx_jvm_gc_PS_Scavenge_time
Legacy ID: jvm.gc.PS_Scavenge.time
Metric Type: counter
Unit: time
Description: The amount of time the parallel eden/survivor space garbage collector has run.
Additional Notes:

jmx_jvm_gc_ParNew_count

Prometheus ID: jmx_jvm_gc_ParNew_count
Legacy ID: jvm.gc.ParNew.count
Metric Type: counter
Unit: number
Description: The number of times the parallel garbage collector has run.
Additional Notes:

jmx_jvm_gc_ParNew_time

Prometheus ID: jmx_jvm_gc_ParNew_time
Legacy ID: jvm.gc.ParNew.time
Metric Type: counter
Unit: time
Description: The amount of time the parallel garbage collector has run.
Additional Notes:

jmx_jvm_heap_committed

Prometheus ID: jmx_jvm_heap_committed
Legacy ID: jvm.heap.committed
Metric Type: counter
Unit: number
Description: The amount of memory that is currently allocated to the JVM for heap memory. Heap memory is the storage area for Java objects. The JVM may release memory to the system and Heap Committed could decrease below Heap Init; but Heap Committed can never increase above Heap Max.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_heap_init

Prometheus ID: jmx_jvm_heap_init
Legacy ID: jvm.heap.init
Metric Type: counter
Unit: number
Description: The initial amount of memory that the JVM requests from the operating system for heap memory during startup (defined by the -Xms option). The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Heap Init may be undefined.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_heap_max

Prometheus ID: jmx_jvm_heap_max
Legacy ID: jvm.heap.max
Metric Type: counter
Unit: number
Description: The maximum size allocation of heap memory for the JVM (defined by the -Xmx option). Any memory allocation attempt that would exceed this limit will cause an OutOfMemoryError exception to be thrown.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_heap_used

Prometheus ID: jmx_jvm_heap_used
Legacy ID: jvm.heap.used
Metric Type: counter
Unit: number
Description: The amount of allocated heap memory (i.e., Heap Committed) currently in use. Heap memory is the storage area for Java objects. An object in the heap that is referenced by another object is 'live', and will remain in the heap as long as it continues to be referenced. Objects that are no longer referenced are garbage and will be cleared out of the heap to reclaim space.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_heap_used_percent

Prometheus ID: jmx_jvm_heap_used_percent
Legacy ID: jvm.heap.used.percent
Metric Type: gauge
Unit: percent
Description: The ratio between Heap Used and Heap Committed.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.
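Because this gauge already expresses utilization as a percentage, it can be used directly in threshold queries. A minimal PromQL sketch, assuming a Prometheus-compatible backend; the `process` label is an assumption and may differ in your environment:

```promql
# Processes whose average heap utilization exceeds 90%
avg by (process) (jmx_jvm_heap_used_percent) > 90
```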

jmx_jvm_nonHeap_committed

Prometheus ID: jmx_jvm_nonHeap_committed
Legacy ID: jvm.nonHeap.committed
Metric Type: counter
Unit: number
Description: The amount of memory that is currently allocated to the JVM for non-heap memory. Non-heap memory is used by Java to store loaded classes and other meta-data. The JVM may release memory to the system and Non-Heap Committed could decrease below Non-Heap Init; but Non-Heap Committed can never increase above Non-Heap Max.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_nonHeap_init

Prometheus ID: jmx_jvm_nonHeap_init
Legacy ID: jvm.nonHeap.init
Metric Type: counter
Unit: number
Description: The initial amount of memory that the JVM requests from the operating system for non-heap memory during startup. The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Non-Heap Init may be undefined.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_nonHeap_max

Prometheus ID: jmx_jvm_nonHeap_max
Legacy ID: jvm.nonHeap.max
Metric Type: counter
Unit: number
Description: The maximum size allocation of non-heap memory for the JVM. This memory is used by Java to store loaded classes and other meta-data.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_nonHeap_used

Prometheus ID: jmx_jvm_nonHeap_used
Legacy ID: jvm.nonHeap.used
Metric Type: counter
Unit: number
Description: The amount of allocated non-heap memory (i.e., Non-Heap Committed) currently in use. Non-heap memory is used by Java to store loaded classes and other meta-data.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_nonHeap_used_percent

Prometheus ID: jmx_jvm_nonHeap_used_percent
Legacy ID: jvm.nonHeap.used.percent
Metric Type: gauge
Unit: percent
Description: The ratio between Non-Heap Used and Non-Heap Committed.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_thread_count

Prometheus ID: jmx_jvm_thread_count
Legacy ID: jvm.thread.count
Metric Type: gauge
Unit: number
Description: The current number of live daemon and non-daemon threads.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

jmx_jvm_thread_daemon

Prometheus ID: jmx_jvm_thread_daemon
Legacy ID: jvm.thread.daemon
Metric Type: gauge
Unit: number
Description: The current number of live daemon threads. Daemon threads are used for background supporting tasks and are only needed while normal threads are executing.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

6.2.7 - Kubernetes

kube_certificatesigningrequest_created

Prometheus ID: kube_certificatesigningrequest_created
Legacy ID:
Metric Type: gauge
Unit: -
Description: Timestamp of when the CSR object was created.
Additional Notes: The timestamp is in Unix epoch time.

kube_certificatesigningrequest_condition

Prometheus ID: kube_certificatesigningrequest_condition
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores whether the CSR was approved or denied.
Additional Notes: The metric will be 1 if the condition occurred and 0 if it didn't.

kube_certificatesigningrequest_labels

Prometheus ID: kube_certificatesigningrequest_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_certificatesigningrequest_cert_length

Prometheus ID: kube_certificatesigningrequest_cert_length
Legacy ID:
Metric Type: gauge
Unit: number
Description: The number of characters in the certificate.
Additional Notes:

kube_daemonset_labels

Prometheus ID: kube_daemonset_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_daemonset_status_current_number_scheduled

Prometheus ID: kube_daemonset_status_current_number_scheduled
Legacy ID: kubernetes.daemonSet.pods.scheduled
Metric Type: gauge
Unit: number
Description: The number of nodes that are running at least one daemon Pod and are supposed to.
Additional Notes:

kube_daemonset_status_desired_number_scheduled

Prometheus ID: kube_daemonset_status_desired_number_scheduled
Legacy ID: kubernetes.daemonSet.pods.desired
Metric Type: gauge
Unit: number
Description: The number of nodes that should be running the daemon Pod.
Additional Notes:

kube_daemonset_status_number_misscheduled

Prometheus ID: kube_daemonset_status_number_misscheduled
Legacy ID: kubernetes.daemonSet.pods.misscheduled
Metric Type: gauge
Unit: number
Description: The number of nodes running a daemon Pod that are not supposed to.
Additional Notes:

kube_daemonset_status_number_ready

Prometheus ID: kube_daemonset_status_number_ready
Legacy ID: kubernetes.daemonSet.pods.ready
Metric Type: gauge
Unit: number
Description: The number of nodes that should be running the daemon Pod and have one or more of the daemon Pod running and ready.
Additional Notes:

kube_deployment_labels

Prometheus ID: kube_deployment_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_deployment_spec_paused

Prometheus ID: kube_deployment_spec_paused
Legacy ID: kubernetes.deployment.replicas.paused
Metric Type: gauge
Unit: number
Description: The number of paused Pods per deployment. These Pods will not be processed by the deployment controller.
Additional Notes:

kube_deployment_spec_replicas

Prometheus ID: kube_deployment_spec_replicas
Legacy ID: kubernetes.deployment.replicas.desired
Metric Type: gauge
Unit: number
Description: The number of desired Pods per deployment.
Additional Notes:

kube_deployment_status_replicas

Prometheus ID: kube_deployment_status_replicas
Legacy ID: kubernetes.deployment.replicas.running
Metric Type: gauge
Unit: number
Description: The number of running Pods per deployment.
Additional Notes:

kube_deployment_status_replicas_available

Prometheus ID: kube_deployment_status_replicas_available
Legacy ID: kubernetes.deployment.replicas.available
Metric Type: gauge
Unit: number
Description: The number of available Pods per deployment.
Additional Notes:

kube_deployment_status_replicas_unavailable

Prometheus ID: kube_deployment_status_replicas_unavailable
Legacy ID: kubernetes.deployment.replicas.unavailable
Metric Type: gauge
Unit: number
Description: The number of unavailable Pods per deployment.
Additional Notes:
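The deployment replica gauges can be combined into simple health queries. A minimal PromQL sketch, assuming a Prometheus-compatible backend; the `deployment` label is an assumption and may differ in your environment:

```promql
# Deployments that currently have unavailable Pods
kube_deployment_status_replicas_unavailable > 0

# Fraction of desired Pods that are available, per deployment
sum by (deployment) (kube_deployment_status_replicas_available)
  / sum by (deployment) (kube_deployment_spec_replicas)
```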

kube_deployment_status_replicas_updated

Prometheus ID: kube_deployment_status_replicas_updated
Legacy ID: kubernetes.deployment.replicas.updated
Metric Type: gauge
Unit: number
Description: The number of updated Pods per deployment.
Additional Notes:

kube_hpa_labels

Prometheus ID: kube_hpa_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_hpa_spec_max_replicas

Prometheus ID: kube_hpa_spec_max_replicas
Legacy ID: kubernetes.hpa.replicas.max
Metric Type: gauge
Unit: number
Description: Upper limit for the number of Pods that can be set by the autoscaler.
Additional Notes:

kube_hpa_spec_min_replicas

Prometheus ID: kube_hpa_spec_min_replicas
Legacy ID: kubernetes.hpa.replicas.min
Metric Type: gauge
Unit: number
Description: Lower limit for the number of Pods that can be set by the autoscaler.
Additional Notes:

kube_hpa_status_current_replicas

Prometheus ID: kube_hpa_status_current_replicas
Legacy ID: kubernetes.hpa.replicas.current
Metric Type: gauge
Unit: number
Description: Current number of replicas of Pods managed by this autoscaler.
Additional Notes:

kube_hpa_status_desired_replicas

Prometheus ID: kube_hpa_status_desired_replicas
Legacy ID: kubernetes.hpa.replicas.desired
Metric Type: gauge
Unit: number
Description: Desired number of replicas of Pods managed by this autoscaler.
Additional Notes:

kube_ingress_info

Prometheus ID: kube_ingress_info
Legacy ID:
Metric Type: gauge
Unit: number
Description: The labels on the metric store information about the object.
Additional Notes: The value of the metric will always be 1.

kube_ingress_labels

Prometheus ID: kube_ingress_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_ingress_created

Prometheus ID: kube_ingress_created
Legacy ID:
Metric Type: gauge
Unit: -
Description: Timestamp of when the ingress object was created.
Additional Notes: The timestamp is in Unix epoch time.

kube_ingress_path

Prometheus ID: kube_ingress_path
Legacy ID:
Metric Type: gauge
Unit: number
Description: Information about the path of the ingress object is stored as labels on the metric.
Additional Notes: The value of the metric will always be 1.

kube_ingress_tls

Prometheus ID: kube_ingress_tls
Legacy ID:
Metric Type: gauge
Unit: number
Description: Information about the TLS configuration of the ingress object is stored as labels on the metric.
Additional Notes: The value of the metric will always be 1.

kube_job_complete

Prometheus ID: kube_job_complete
Legacy ID: kubernetes.job.numSucceeded
Metric Type: gauge
Unit: number
Description: The number of Pods which reached Phase Succeeded.
Additional Notes:

kube_job_failed

Prometheus ID: kube_job_failed
Legacy ID: kubernetes.job.numFailed
Metric Type: gauge
Unit: number
Description: The number of Pods which reached Phase Failed.
Additional Notes:

kube_job_info

Prometheus ID: kube_job_info
Legacy ID:
Metric Type: gauge
Unit: number
Description: The labels on the metric store information about the object.
Additional Notes: The value of the metric will always be 1.

kube_job_labels

Prometheus ID: kube_job_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_job_owner

Prometheus ID: kube_job_owner
Legacy ID:
Metric Type: gauge
Unit: number
Description: Information about the owner of the job is stored as labels on the metric.
Additional Notes: The value of the metric will always be 1.

kube_job_spec_completions

Prometheus ID: kube_job_spec_completions
Legacy ID: kubernetes.job.completions
Metric Type: gauge
Unit: number
Description: The desired number of successfully finished Pods that the job should be run with.
Additional Notes:

kube_job_spec_parallelism

Prometheus ID: kube_job_spec_parallelism
Legacy ID: kubernetes.job.parallelism
Metric Type: gauge
Unit: number
Description: The maximum desired number of Pods that the job should run at any given time.
Additional Notes:

kube_job_status_active

Prometheus ID: kube_job_status_active
Legacy ID: kubernetes.job.status.active
Metric Type: gauge
Unit: number
Description: The number of actively running Pods.
Additional Notes:
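The job gauges lend themselves to simple failure and progress checks. A hedged PromQL sketch, assuming a Prometheus-compatible backend:

```promql
# Jobs with at least one Pod in Phase Failed
kube_job_failed > 0

# Jobs that have not yet reached their desired number of completions
kube_job_spec_completions - kube_job_complete > 0
```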

kube_namespace_labels

Prometheus ID: kube_namespace_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_namespace_sysdig_count

Prometheus ID: kube_namespace_sysdig_count
Legacy ID: kubernetes.namespace.count
Metric Type: gauge
Unit: number
Description: The number of namespaces.
Additional Notes:

kube_namespace_sysdig_deployment_count

Prometheus ID: kube_namespace_sysdig_deployment_count
Legacy ID: kubernetes.namespace.deployment.count
Metric Type: gauge
Unit: number
Description: The number of deployments per namespace.
Additional Notes:

kube_namespace_sysdig_hpa_count

Prometheus ID: kube_namespace_sysdig_hpa_count
Legacy ID: kubernetes.namespace.hpa.count
Metric Type: gauge
Unit: number
Description: The number of HPAs per namespace.
Additional Notes:

kube_namespace_sysdig_job_count

Prometheus ID: kube_namespace_sysdig_job_count
Legacy ID: kubernetes.namespace.job.count
Metric Type: gauge
Unit: number
Description: The number of jobs per namespace.
Additional Notes:

kube_namespace_sysdig_persistentvolumeclaim_count

Prometheus ID: kube_namespace_sysdig_persistentvolumeclaim_count
Legacy ID: kubernetes.namespace.persistentvolumeclaim.count
Metric Type: gauge
Unit: number
Description: The number of persistent volume claims per namespace.
Additional Notes:

kube_namespace_sysdig_pod_available_count

Prometheus ID: kube_namespace_sysdig_pod_available_count
Legacy ID: kubernetes.namespace.pod.available.count
Metric Type: gauge
Unit: number
Description: The number of available Pods per namespace.
Additional Notes:

kube_namespace_sysdig_pod_desired_count

Prometheus ID: kube_namespace_sysdig_pod_desired_count
Legacy ID: kubernetes.namespace.pod.desired.count
Metric Type: gauge
Unit: number
Description: The number of desired Pods per namespace.
Additional Notes:

kube_namespace_sysdig_pod_running_count

Prometheus ID: kube_namespace_sysdig_pod_running_count
Legacy ID: kubernetes.namespace.pod.running.count
Metric Type: gauge
Unit: number
Description: The number of Pods running per namespace.
Additional Notes:

kube_namespace_sysdig_replicaset_count

Prometheus ID: kube_namespace_sysdig_replicaset_count
Legacy ID: kubernetes.namespace.replicaSet.count
Metric Type: gauge
Unit: number
Description: The number of ReplicaSets per namespace.
Additional Notes:

kube_namespace_sysdig_resourcequota_count

Prometheus ID: kube_namespace_sysdig_resourcequota_count
Legacy ID: kubernetes.namespace.resourcequota.count
Metric Type: gauge
Unit: number
Description: The number of resource quotas per namespace.
Additional Notes:

kube_namespace_sysdig_service_count

Prometheus ID: kube_namespace_sysdig_service_count
Legacy ID: kubernetes.namespace.service.count
Metric Type: gauge
Unit: number
Description: The number of services per namespace.
Additional Notes:

kube_namespace_sysdig_statefulset_count

Prometheus ID: kube_namespace_sysdig_statefulset_count
Legacy ID: kubernetes.namespace.statefulSet.count
Metric Type: gauge
Unit: number
Description: The number of StatefulSets per namespace.
Additional Notes:

kube_node_info

Prometheus ID: kube_node_info
Legacy ID:
Metric Type: gauge
Unit: number
Description: The labels on the metric store information about the object.
Additional Notes: The value of the metric will always be 1.

kube_node_labels

Prometheus ID: kube_node_labels
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the labels associated with the object as labels on the metric.
Additional Notes: The value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_node_spec_unschedulable

Prometheus ID: kube_node_spec_unschedulable
Legacy ID: kubernetes.node.unschedulable
Metric Type: gauge
Unit: number
Description: The number of nodes unavailable to schedule new Pods.
Additional Notes:

kube_node_spec_taint

Prometheus ID: kube_node_spec_taint
Legacy ID:
Metric Type: gauge
Unit: number
Description: Stores the taint's key, value, and effect as labels on the metric.
Additional Notes: The value of the metric will always be 1.

kube_node_status_allocatable

Prometheus ID: kube_node_status_allocatable
Legacy ID:
Metric Type: gauge
Unit: number
Description: The amount of a resource on a node that is freely available.
Additional Notes: The type and unit of the resource are stored as labels on the metric.

kube_node_status_allocatable_cpu_cores

Prometheus ID: kube_node_status_allocatable_cpu_cores
Legacy ID: kubernetes.node.allocatable.cpuCores
Metric Type: gauge
Unit: number
Description: The CPU resources of a node that are available for scheduling.
Additional Notes:

kube_node_status_allocatable_memory_bytes

Prometheus ID: kube_node_status_allocatable_memory_bytes
Legacy ID: kubernetes.node.allocatable.memBytes
Metric Type: gauge
Unit: data
Description: The memory resources of a node that are available for scheduling.
Additional Notes:

kube_node_status_allocatable_pods

Prometheus ID: kube_node_status_allocatable_pods
Legacy ID: kubernetes.node.allocatable.pods
Metric Type: gauge
Unit: number
Description: The Pod resources of a node that are available for scheduling.
Additional Notes:

kube_node_status_capacity

Prometheus IDkube_node_status_capacity
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe total amount of a resource on a node.
Additional NotesThe type and unit of the resource are stored as labels on the metric.

kube_node_status_capacity_cpu_cores

Prometheus IDkube_node_status_capacity_cpu_cores
Legacy IDkubernetes.node.capacity.cpuCores
Metric Typegauge
Unitnumber
DescriptionThe maximum CPU resources of the node.
Additional Notes

kube_node_status_capacity_memory_bytes

Prometheus IDkube_node_status_capacity_memory_bytes
Legacy IDkubernetes.node.capacity.memBytes
Metric Typegauge
Unitdata
DescriptionThe maximum memory resources of the node.
Additional Notes

kube_node_status_capacity_pods

Prometheus IDkube_node_status_capacity_pods
Legacy IDkubernetes.node.capacity.pods
Metric Typegauge
Unitnumber
DescriptionThe maximum number of Pods of the node.
Additional Notes
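The capacity and allocatable metrics above are related: allocatable is what remains for scheduling after the node's system and kubelet reservations. A sketch of that arithmetic, using made-up example values:

```python
# Illustrative arithmetic only: the relationship between the capacity and
# allocatable metrics above. The values are made-up examples.
capacity_cpu_cores = 4.0      # kube_node_status_capacity_cpu_cores
allocatable_cpu_cores = 3.6   # kube_node_status_allocatable_cpu_cores

# The gap is what the node reserves for the system and kubelet.
reserved_cores = capacity_cpu_cores - allocatable_cpu_cores
reserved_percent = reserved_cores / capacity_cpu_cores * 100
print(f"{reserved_cores:.1f} cores reserved ({reserved_percent:.0f}%)")
```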

kube_node_status_condition

Prometheus IDkube_node_status_condition
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the condition of the node as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_node_sysdig_disk_pressure

Prometheus IDkube_node_sysdig_disk_pressure
Legacy IDkubernetes.node.diskPressure
Metric Typegauge
Unitnumber
DescriptionThe number of nodes with disk pressure.
Additional Notes

kube_node_sysdig_host

Prometheus IDkube_node_sysdig_host
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the hostname of the node as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_node_sysdig_memory_pressure

Prometheus IDkube_node_sysdig_memory_pressure
Legacy IDkubernetes.node.memoryPressure
Metric Typegauge
Unitnumber
DescriptionThe number of nodes with memory pressure.
Additional Notes

kube_node_sysdig_network_unavailable

Prometheus IDkube_node_sysdig_network_unavailable
Legacy IDkubernetes.node.networkUnavailable
Metric Typegauge
Unitnumber
DescriptionThe number of nodes with network unavailable.
Additional Notes

kube_node_sysdig_ready

Prometheus IDkube_node_sysdig_ready
Legacy IDkubernetes.node.ready
Metric Typegauge
Unitnumber
DescriptionThe number of nodes that are ready.
Additional Notes

kube_persistentvolume_capacity_bytes

Prometheus IDkube_persistentvolume_capacity_bytes
Legacy IDkubernetes.persistentvolume.storage
Metric Typegauge
Unitdata
DescriptionThe persistent volume’s capacity.
Additional Notes

kube_persistentvolume_claim_ref

Prometheus IDkube_persistentvolume_claim_ref
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the claim’s name and namespace as labels on the metric.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolume_info

Prometheus IDkube_persistentvolume_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolume_labels

Prometheus IDkube_persistentvolume_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_persistentvolume_status_phase

Prometheus IDkube_persistentvolume_status_phase
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the phase of the PV as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolumeclaim_access_mode

Prometheus IDkube_persistentvolumeclaim_access_mode
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the access mode of the PVC as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolumeclaim_info

Prometheus IDkube_persistentvolumeclaim_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolumeclaim_labels

Prometheus IDkube_persistentvolumeclaim_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_persistentvolumeclaim_resource_requests_storage_bytes

Prometheus IDkube_persistentvolumeclaim_resource_requests_storage_bytes
Legacy IDkubernetes.persistentvolumeclaim.requests.storage
Metric Typegauge
Unitdata
DescriptionThe number of bytes that the PVC has requested.
Additional Notes

kube_persistentvolumeclaim_status_phase

Prometheus IDkube_persistentvolumeclaim_status_phase
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the phase of the PVC as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_persistentvolumeclaim_sysdig_storage

Prometheus IDkube_persistentvolumeclaim_sysdig_storage
Legacy IDkubernetes.persistentvolumeclaim.storage
Metric Typegauge
Unitnumber
DescriptionThe actual resources of the underlying volume.
Additional Notes

kube_pod_container_info

Prometheus IDkube_pod_container_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_pod_container_resource_limits

Prometheus IDkube_pod_container_resource_limits
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe amount of the resource limit for a container in a pod.
Additional Notes

kube_pod_container_resource_requests

Prometheus IDkube_pod_container_resource_requests
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe amount of the resource request for a container in a pod.
Additional Notes

kube_pod_container_status_last_terminated_reason

Prometheus IDkube_pod_container_status_last_terminated_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason for the last terminated state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_container_status_ready

Prometheus IDkube_pod_container_status_ready
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of containers in the Pod in the ready state.
Additional Notes

kube_pod_container_status_restarts_total

Prometheus IDkube_pod_container_status_restarts_total
Legacy ID
Metric Typecounter
Unitnumber
DescriptionThe number of times that containers in the Pod have restarted.
Additional Notes
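Because this metric is a counter, a per-second restart rate is derived from the change between two samples. A minimal sketch of that derivation (sample values are made up):

```python
# Illustrative: derive a per-second rate from two samples of a counter such as
# kube_pod_container_status_restarts_total. The sample values are made up.

def counter_rate(prev_value, prev_ts, cur_value, cur_ts):
    """Per-second increase between two counter samples.

    A counter only ever increases; a drop means the counter was reset
    (e.g. the container was recreated), in which case the increase since
    the reset is simply the current value.
    """
    increase = cur_value - prev_value if cur_value >= prev_value else cur_value
    return increase / (cur_ts - prev_ts)

print(counter_rate(10, 0, 16, 60))   # 6 restarts over 60s -> 0.1/s
```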

kube_pod_container_status_running

Prometheus IDkube_pod_container_status_running
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of containers in the Pod in the running state.
Additional Notes

kube_pod_container_status_terminated

Prometheus IDkube_pod_container_status_terminated
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of containers in the Pod in the terminated state.
Additional Notes

kube_pod_container_status_terminated_reason

Prometheus IDkube_pod_container_status_terminated_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason that the container is in the terminated state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_container_status_waiting

Prometheus IDkube_pod_container_status_waiting
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of containers in the Pod in the waiting state.
Additional Notes

kube_pod_container_status_waiting_reason

Prometheus IDkube_pod_container_status_waiting_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason that the container is in the waiting state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_info

Prometheus IDkube_pod_info
Legacy IDkubernetes.pod.info
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_pod_init_container_resource_limits

Prometheus IDkube_pod_init_container_resource_limits
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe amount of the resource limit for an init container in a pod.
Additional Notes

kube_pod_init_container_resource_requests

Prometheus IDkube_pod_init_container_resource_requests
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe amount of the resource request for an init container in a pod.
Additional Notes

kube_pod_init_container_status_last_terminated_reason

Prometheus IDkube_pod_init_container_status_last_terminated_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason for the last terminated state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_init_container_status_ready

Prometheus IDkube_pod_init_container_status_ready
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of init containers in the Pod in the ready state.
Additional Notes

kube_pod_init_container_status_restarts_total

Prometheus IDkube_pod_init_container_status_restarts_total
Legacy ID
Metric Typecounter
Unitnumber
DescriptionThe number of times that init containers in the Pod have restarted.
Additional Notes

kube_pod_init_container_status_running

Prometheus IDkube_pod_init_container_status_running
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of init containers in the Pod in the running state.
Additional Notes

kube_pod_init_container_status_terminated

Prometheus IDkube_pod_init_container_status_terminated
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of init containers in the Pod in the terminated state.
Additional Notes

kube_pod_init_container_status_terminated_reason

Prometheus IDkube_pod_init_container_status_terminated_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason that the init container is in the terminated state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_init_container_status_waiting

Prometheus IDkube_pod_init_container_status_waiting
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe number of init containers in the Pod in the waiting state.
Additional Notes

kube_pod_init_container_status_waiting_reason

Prometheus IDkube_pod_init_container_status_waiting_reason
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the reason that the init container is in the waiting state as a label on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_labels

Prometheus IDkube_pod_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_pod_owner

Prometheus IDkube_pod_owner
Legacy ID
Metric Typegauge
Unitnumber
DescriptionInformation about the owner of the pod is stored as labels on the metric.
Additional NotesThe value of the metric will always be 1.

kube_pod_spec_volumes_persistentvolumeclaims_info

Prometheus IDkube_pod_spec_volumes_persistentvolumeclaims_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores information about the PVC specified in a Pod’s spec.
Additional NotesThe value of the metric will always be 1.

kube_pod_spec_volumes_persistentvolumeclaims_readonly

Prometheus IDkube_pod_spec_volumes_persistentvolumeclaims_readonly
Legacy ID
Metric Typegauge
Unitnumber
DescriptionDescribes whether a PVC is mounted read-only.
Additional NotesThe value of the metric will be 1 if the PVC is read-only and 0 if not.

kube_pod_sysdig_containers_waiting

Prometheus IDkube_pod_sysdig_containers_waiting
Legacy IDkubernetes.pod.containers.waiting
Metric Typegauge
Unitnumber
DescriptionThe number of containers in the Pod in the waiting state.
Additional Notes

kube_pod_sysdig_resource_limits_cpu_cores

Prometheus IDkube_pod_sysdig_resource_limits_cpu_cores
Legacy IDkubernetes.pod.resourceLimits.cpuCores
Metric Typegauge
Unitnumber
DescriptionThe limit on CPU cores to be used by containers in the Pod.
Additional Notes

kube_pod_sysdig_resource_limits_memory_bytes

Prometheus IDkube_pod_sysdig_resource_limits_memory_bytes
Legacy IDkubernetes.pod.resourceLimits.memBytes
Metric Typegauge
Unitdata
DescriptionThe limit on memory, in bytes, to be used by containers in the Pod.
Additional Notes

kube_pod_sysdig_resource_requests_cpu_cores

Prometheus IDkube_pod_sysdig_resource_requests_cpu_cores
Legacy IDkubernetes.pod.resourceRequests.cpuCores
Metric Typegauge
Unitnumber
DescriptionThe number of CPU cores requested by containers in the Pod.
Additional Notes

kube_pod_sysdig_resource_requests_memory_bytes

Prometheus IDkube_pod_sysdig_resource_requests_memory_bytes
Legacy IDkubernetes.pod.resourceRequests.memBytes
Metric Typegauge
Unitdata
DescriptionThe number of memory bytes requested by containers in the Pod.
Additional Notes

kube_pod_sysdig_restart_count

Prometheus IDkube_pod_sysdig_restart_count
Legacy IDkubernetes.pod.restart.count
Metric Typegauge
Unitnumber
DescriptionThe number of container restarts for the Pod.
Additional Notes

kube_pod_sysdig_restart_rate

Prometheus IDkube_pod_sysdig_restart_rate
Legacy IDkubernetes.pod.restart.rate
Metric Typegauge
Unitnumber
DescriptionThe number of times the Pod has been restarted per second.
Additional Notes

kube_pod_sysdig_status_ready

Prometheus IDkube_pod_sysdig_status_ready
Legacy IDkubernetes.pod.status.ready
Metric Typegauge
Unitnumber
DescriptionThe number of pods ready to serve requests.
Additional Notes

kube_replicaset_labels

Prometheus IDkube_replicaset_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_replicaset_owner

Prometheus IDkube_replicaset_owner
Legacy ID
Metric Typegauge
Unitnumber
DescriptionInformation about the owner of the pod is stored as labels on the metric.
Additional NotesThe value of the metric will always be 1.

kube_replicaset_spec_replicas

Prometheus IDkube_replicaset_spec_replicas
Legacy IDkubernetes.replicaSet.replicas.desired
Metric Typegauge
Unitnumber
DescriptionThe number of desired Pods per replicaSet.
Additional Notes

kube_replicaset_status_fully_labeled_replicas

Prometheus IDkube_replicaset_status_fully_labeled_replicas
Legacy IDkubernetes.replicaSet.replicas.fullyLabeled
Metric Typegauge
Unitnumber
DescriptionThe number of fully labeled Pods per replicaSet.
Additional Notes

kube_replicaset_status_ready_replicas

Prometheus IDkube_replicaset_status_ready_replicas
Legacy IDkubernetes.replicaSet.replicas.ready
Metric Typegauge
Unitnumber
DescriptionThe number of ready Pods per replicaSet.
Additional Notes

kube_replicaset_status_replicas

Prometheus IDkube_replicaset_status_replicas
Legacy IDkubernetes.replicaSet.replicas.running
Metric Typegauge
Unitnumber
DescriptionThe number of running Pods per replicaSet.
Additional Notes

kube_resourcequota

Prometheus IDkube_resourcequota
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe amount of a resource that the resource quota is configured for.
Additional NotesThe resource type and whether the quota is hard or soft are stored as labels on the metric.

kube_resourcequota_sysdig_limits_cpu_hard

Prometheus IDkube_resourcequota_sysdig_limits_cpu_hard
Legacy IDkubernetes.resourcequota.limits.cpu.hard
Metric Typegauge
Unitnumber
DescriptionEnforced CPU Limit quota per namespace.
Additional Notes

kube_resourcequota_sysdig_limits_cpu_used

Prometheus IDkube_resourcequota_sysdig_limits_cpu_used
Legacy IDkubernetes.resourcequota.limits.cpu.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed CPU limit usage per namespace.
Additional Notes

kube_resourcequota_sysdig_limits_memory_hard

Prometheus IDkube_resourcequota_sysdig_limits_memory_hard
Legacy IDkubernetes.resourcequota.limits.memory.hard
Metric Typegauge
Unitnumber
DescriptionEnforced memory limit quota per namespace.
Additional Notes

kube_resourcequota_sysdig_limits_memory_used

Prometheus IDkube_resourcequota_sysdig_limits_memory_used
Legacy IDkubernetes.resourcequota.limits.memory.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed memory limit usage per namespace.
Additional Notes

kube_resourcequota_sysdig_persistentvolumeclaims_hard

Prometheus IDkube_resourcequota_sysdig_persistentvolumeclaims_hard
Legacy IDkubernetes.resourcequota.persistentvolumeclaims.hard
Metric Typegauge
Unitnumber
DescriptionEnforced PersistentVolumeClaim quota per namespace.
Additional Notes

kube_resourcequota_sysdig_persistentvolumeclaims_used

Prometheus IDkube_resourcequota_sysdig_persistentvolumeclaims_used
Legacy IDkubernetes.resourcequota.persistentvolumeclaims.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed PersistentVolumeClaim usage per namespace.
Additional Notes

kube_resourcequota_sysdig_pods_hard

Prometheus IDkube_resourcequota_sysdig_pods_hard
Legacy IDkubernetes.resourcequota.pods.hard
Metric Typegauge
Unitnumber
DescriptionEnforced Pod quota per namespace.
Additional Notes

kube_resourcequota_sysdig_pods_used

Prometheus IDkube_resourcequota_sysdig_pods_used
Legacy IDkubernetes.resourcequota.pods.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed Pod usage per namespace.
Additional Notes

kube_resourcequota_sysdig_requests_cpu_hard

Prometheus IDkube_resourcequota_sysdig_requests_cpu_hard
Legacy IDkubernetes.resourcequota.requests.cpu.hard
Metric Typegauge
Unitnumber
DescriptionEnforced CPU request quota per namespace.
Additional Notes

kube_resourcequota_sysdig_requests_cpu_used

Prometheus IDkube_resourcequota_sysdig_requests_cpu_used
Legacy IDkubernetes.resourcequota.requests.cpu.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed CPU request usage per namespace.
Additional Notes

kube_resourcequota_sysdig_requests_memory_hard

Prometheus IDkube_resourcequota_sysdig_requests_memory_hard
Legacy IDkubernetes.resourcequota.requests.memory.hard
Metric Typegauge
Unitnumber
DescriptionEnforced memory request quota per namespace.
Additional Notes

kube_resourcequota_sysdig_requests_memory_used

Prometheus IDkube_resourcequota_sysdig_requests_memory_used
Legacy IDkubernetes.resourcequota.requests.memory.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed memory request usage per namespace.
Additional Notes

kube_resourcequota_sysdig_services_hard

Prometheus IDkube_resourcequota_sysdig_services_hard
Legacy IDkubernetes.resourcequota.services.hard
Metric Typegauge
Unitnumber
DescriptionEnforced service quota per namespace.
Additional Notes

kube_resourcequota_sysdig_services_used

Prometheus IDkube_resourcequota_sysdig_services_used
Legacy IDkubernetes.resourcequota.services.used
Metric Typegauge
Unitnumber
DescriptionCurrent observed service usage per namespace.
Additional Notes

kube_service_info

Prometheus IDkube_service_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_service_labels

Prometheus IDkube_service_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_statefulset_labels

Prometheus IDkube_statefulset_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_statefulset_replicas

Prometheus IDkube_statefulset_replicas
Legacy IDkubernetes.statefulSet.replicas
Metric Typegauge
Unitnumber
DescriptionDesired number of replicas of the given Template.
Additional Notes

kube_statefulset_status_replicas

Prometheus IDkube_statefulset_status_replicas
Legacy IDkubernetes.statefulSet.status.replicas
Metric Typegauge
Unitnumber
DescriptionNumber of Pods created by the StatefulSet controller.
Additional Notes

kube_statefulset_status_replicas_current

Prometheus IDkube_statefulset_status_replicas_current
Legacy IDkubernetes.statefulSet.status.replicas.current
Metric Typegauge
Unitnumber
DescriptionThe number of Pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision.
Additional Notes

kube_statefulset_status_replicas_ready

Prometheus IDkube_statefulset_status_replicas_ready
Legacy IDkubernetes.statefulSet.status.replicas.ready
Metric Typegauge
Unitnumber
DescriptionNumber of Pods created by the StatefulSet controller that have a Ready Condition.
Additional Notes

kube_statefulset_status_replicas_updated

Prometheus IDkube_statefulset_status_replicas_updated
Legacy IDkubernetes.statefulSet.status.replicas.updated
Metric Typegauge
Unitnumber
DescriptionNumber of Pods created by the StatefulSet controller from the StatefulSet version indicated by updateRevision.
Additional Notes

kube_storageclass_created

Prometheus IDkube_storageclass_created
Legacy ID
Metric Typegauge
Unit-
DescriptionUnix epoch time when the storageclass was created.
Additional Notes

kube_storageclass_info

Prometheus IDkube_storageclass_info
Legacy ID
Metric Typegauge
Unitnumber
DescriptionThe labels on the metric store information about the object.
Additional NotesThe value of the metric will always be 1.

kube_storageclass_labels

Prometheus IDkube_storageclass_labels
Legacy ID
Metric Typegauge
Unitnumber
DescriptionStores the labels associated with the object as labels on the metric.
Additional NotesThe value of the metric will always be 1. The labels will be prepended with 'label_'.

kube_workload_pods_status_phase

Prometheus IDkube_workload_pods_status_phase
Legacy IDkubernetes.workload.pods.status.phase
Metric Typegauge
Unitnumber
DescriptionThe number of Pods in a particular phase for the workload.
Additional NotesStores the phase as a label on the metric.

kube_workload_status_replicas_misscheduled

Prometheus IDkube_workload_status_replicas_misscheduled
Legacy IDkubernetes.workload.status.replicas.misscheduled
Metric Typegauge
Unitnumber
DescriptionThe number of running Pods for a workload that are not supposed to be running.
Additional Notes

kube_workload_status_replicas_scheduled

Prometheus IDkube_workload_status_replicas_scheduled
Legacy IDkubernetes.workload.status.replicas.scheduled
Metric Typegauge
Unitnumber
DescriptionThe number of Pods scheduled to be run for a workload.
Additional Notes

kube_workload_status_replicas_updated

Prometheus IDkube_workload_status_replicas_updated
Legacy IDkubernetes.workload.status.replicas.updated
Metric Typegauge
Unitnumber
DescriptionThe number of updated Pods per workload.
Additional Notes

kube_workload_status_running

Prometheus IDkube_workload_status_running
Legacy IDkubernetes.workload.status.running
Metric Typegauge
Unitnumber
DescriptionThe number of running Pods for a workload.
Additional Notes

kube_workload_status_unavailable

Prometheus IDkube_workload_status_unavailable
Legacy IDkubernetes.workload.status.unavailable
Metric Typegauge
Unitnumber
DescriptionThe number of unavailable Pods per workload.
Additional Notes

6.2.8 - Network

sysdig_connection_net_connection_in_count

Prometheus IDsysdig_connection_net_connection_in_count
Legacy IDnet.connection.count.in
Metric Typecounter
Unitnumber
DescriptionThe number of currently established client (inbound) connections.
Additional NotesThe net_connection* metrics are especially useful when segmented by protocol, port, or process. These are TCP-level connection counts; Sysdig also uses heuristics to present UDP packets as connections, based on their source and destination IPs. These calculations depend on syscalls such as connect and accept, which makes them different from the net_request* metrics. Those are calculated by parsing the read/write buffers associated with read/write and send/receive syscalls and classifying the data into requests and responses. Even though the buffers are not evaluated for protocol-level information, Sysdig can determine that a request has been made (for example, a server process has received a read syscall, or a client process has sent a write syscall) and that an associated response has been sent. Using this information, Sysdig generates the metrics without protocol-level segmentation. Latency is determined using the time delta between a request and its response.

sysdig_connection_net_connection_out_count

Prometheus IDsysdig_connection_net_connection_out_count
Legacy IDnet.connection.count.out
Metric Typecounter
Unitnumber
DescriptionThe number of currently established server (outbound) connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_connection_net_connection_total_count

Prometheus IDsysdig_connection_net_connection_total_count
Legacy IDnet.connection.count.total
Metric Typecounter
Unitnumber
DescriptionThe number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.
Additional NotesThis metric is especially useful when segmented by protocol, port or process.

sysdig_connection_net_in_bytes

Prometheus IDsysdig_connection_net_in_bytes
Legacy IDnet.bytes.in
Metric Typecounter
Unitdata
DescriptionThe number of inbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_connection_net_out_bytes

Prometheus IDsysdig_connection_net_out_bytes
Legacy IDnet.bytes.out
Metric Typecounter
Unitdata
DescriptionThe number of outbound network bytes.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_connection_net_request_count

Prometheus IDsysdig_connection_net_request_count
Legacy IDnet.request.count
Metric Typecounter
Unitnumber
DescriptionThe total number of network requests. Note, this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.
Additional Notes

sysdig_connection_net_request_in_count

Prometheus IDsysdig_connection_net_request_in_count
Legacy IDnet.request.count.in
Metric Typecounter
Unitnumber
DescriptionThe number of inbound network requests.
Additional Notes

sysdig_connection_net_request_in_time

Prometheus IDsysdig_connection_net_request_in_time
Legacy IDnet.request.time.in
Metric Typecounter
Unittime
DescriptionThe average time to serve an inbound request.
Additional Notes

sysdig_connection_net_request_out_count

Prometheus IDsysdig_connection_net_request_out_count
Legacy IDnet.request.count.out
Metric Typecounter
Unitnumber
DescriptionThe number of outbound network requests.
Additional Notes

sysdig_connection_net_request_out_time

Prometheus IDsysdig_connection_net_request_out_time
Legacy IDnet.request.time.out
Metric Typecounter
Unittime
DescriptionThe average time spent waiting for an outbound request.
Additional Notes

sysdig_connection_net_request_time

Prometheus IDsysdig_connection_net_request_time
Legacy IDnet.request.time
Metric Typecounter
Unittime
DescriptionThe average time to serve a network request.
Additional Notes

sysdig_connection_net_total_bytes

Prometheus IDsysdig_connection_net_total_bytes
Legacy IDnet.bytes.total
Metric Typecounter
Unitdata
DescriptionThe total network bytes, including both inbound and outbound connections.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

6.2.9 - Program

sysdig_program_cpu_cores_used

Prometheus IDsysdig_program_cpu_cores_used
Legacy IDcpu.cores.used
Metric Typegauge
Unitnumber
DescriptionThe CPU core usage of each program is obtained from cgroups, and is equal to the number of cores used by the program. For example, if a program uses two of an available four cores, the value of sysdig_program_cpu_cores_used will be two.
Additional Notes

sysdig_program_cpu_cores_used_percent

Prometheus IDsysdig_program_cpu_cores_used_percent
Legacy IDcpu.cores.used.percent
Metric Typegauge
Unitpercent
DescriptionThe CPU core usage percent for each program is obtained from cgroups, and is equal to the number of cores used multiplied by 100. For example, if a program uses three cores, the value of sysdig_program_cpu_cores_used_percent would be 300%.
Additional Notes

sysdig_program_cpu_used_percent

Prometheus IDsysdig_program_cpu_used_percent
Legacy IDcpu.used.percent
Metric Typegauge
Unitpercent
DescriptionThe CPU usage for each program is obtained from cgroups, and normalized by dividing by the number of cores to determine an overall percentage. For example, if the environment contains six cores on a host, and the processes are assigned two cores, Sysdig will report CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and containers.
Additional Notes
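The normalization described above is a single division. A sketch of the arithmetic, using the example figures from the description:

```python
# Illustrative: the normalization described above for
# sysdig_program_cpu_used_percent. cores_used comes from cgroup accounting;
# host_cores is the host's total core count.
def normalized_cpu_percent(cores_used, host_cores):
    return cores_used / host_cores * 100

# Two cores in use on a six-core host, as in the example above:
print(f"{normalized_cpu_percent(2, 6):.2f}%")   # 33.33%
```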

sysdig_program_fd_used_percent

Prometheus IDsysdig_program_fd_used_percent
Legacy IDfd.used.percent
Metric Typegauge
Unitpercent
DescriptionThe percentage of used file descriptors out of the maximum available.
Additional NotesUsually, when a process reaches its FD limit it will stop operating properly and possibly crash. As a consequence, this is a metric you want to monitor carefully, or even better use for alerts.

sysdig_program_file_error_open_count

Prometheus IDsysdig_program_file_error_open_count
Legacy IDfile.error.open.count
Metric Typecounter
Unitnumber
DescriptionThe number of errors caused by opening files.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_error_total_count

Prometheus IDsysdig_program_file_error_total_count
Legacy IDfile.error.total.count
Metric Typecounter
Unitnumber
DescriptionThe number of errors caused by file access.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_in_bytes

Prometheus IDsysdig_program_file_in_bytes
Legacy IDfile.bytes.in
Metric Typecounter
Unitdata
DescriptionThe number of bytes read from file.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_in_iops

Prometheus IDsysdig_program_file_in_iops
Legacy IDfile.iops.in
Metric Typecounter
Unitnumber
DescriptionThe number of file read operations per second.
Additional NotesThis is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_program_file_in_time

Prometheus IDsysdig_program_file_in_time
Legacy IDfile.time.in
Metric Typecounter
Unittime
DescriptionThe time spent in file reading.
Additional NotesBy default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_open_count

Prometheus IDsysdig_program_file_open_count
Legacy IDfile.open.count
Metric Typecounter
Unitnumber
DescriptionThe number of time the file has been opened.
Additional Notes

sysdig_program_file_out_bytes

Prometheus ID: sysdig_program_file_out_bytes
Legacy ID: file.bytes.out
Metric Type: counter
Unit: data
Description: The number of bytes written to file.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_out_iops

Prometheus ID: sysdig_program_file_out_iops
Legacy ID: file.iops.out
Metric Type: counter
Unit: number
Description: The number of file write operations per second.
Additional Notes: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_program_file_out_time

Prometheus ID: sysdig_program_file_out_time
Legacy ID: file.time.out
Metric Type: counter
Unit: time
Description: The time spent in file writing.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_total_bytes

Prometheus ID: sysdig_program_file_total_bytes
Legacy ID: file.bytes.total
Metric Type: counter
Unit: data
Description: The number of bytes read from and written to file.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_file_total_iops

Prometheus ID: sysdig_program_file_total_iops
Legacy ID: file.iops.total
Metric Type: counter
Unit: number
Description: The number of read and write file operations per second.
Additional Notes: This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.

sysdig_program_file_total_time

Prometheus ID: sysdig_program_file_total_time
Legacy ID: file.time.total
Metric Type: counter
Unit: time
Description: The time spent in file I/O.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_info

Prometheus ID: sysdig_program_info
Legacy ID: info
Metric Type: gauge
Unit: number
Description:
Additional Notes:

sysdig_program_memory_used_bytes

Prometheus ID: sysdig_program_memory_used_bytes
Legacy ID: memory.bytes.used
Metric Type: gauge
Unit: data
Description: The amount of physical memory currently in use.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.

sysdig_program_memory_used_percent

Prometheus ID: sysdig_program_memory_used_percent
Legacy ID: memory.used.percent
Metric Type: gauge
Unit: percent
Description: The percentage of physical memory in use.
Additional Notes: By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_net_connection_in_count

Prometheus ID: sysdig_program_net_connection_in_count
Legacy ID: net.connection.count.in
Metric Type: counter
Unit: number
Description: The number of currently established client (inbound) connections.
Additional Notes: This metric is especially useful when segmented by protocol, port, or process.

sysdig_program_net_connection_out_count

Prometheus ID: sysdig_program_net_connection_out_count
Legacy ID: net.connection.count.out
Metric Type: counter
Unit: number
Description: The number of currently established server (outbound) connections.
Additional Notes: This metric is especially useful when segmented by protocol, port, or process.

sysdig_program_net_connection_total_count

Prometheus ID: sysdig_program_net_connection_total_count
Legacy ID: net.connection.count.total
Metric Type: counter
Unit: number
Description: The number of currently established connections. This value may exceed the sum of the inbound and outbound metrics, since it represents client and server inter-host connections as well as internal-only connections.
Additional Notes: This metric is especially useful when segmented by protocol, port, or process.

sysdig_program_net_error_count

Prometheus ID: sysdig_program_net_error_count
Legacy ID: net.error.count
Metric Type: counter
Unit: number
Description: The total number of network errors that occurred per second.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_net_in_bytes

Prometheus ID: sysdig_program_net_in_bytes
Legacy ID: net.bytes.in
Metric Type: counter
Unit: data
Description: The number of inbound network bytes.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_net_out_bytes

Prometheus ID: sysdig_program_net_out_bytes
Legacy ID: net.bytes.out
Metric Type: counter
Unit: data
Description: The number of outbound network bytes.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_net_request_count

Prometheus ID: sysdig_program_net_request_count
Legacy ID: net.request.count
Metric Type: counter
Unit: number
Description: The total number of network requests. Note that this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.
Additional Notes:

sysdig_program_net_request_in_count

Prometheus ID: sysdig_program_net_request_in_count
Legacy ID: net.request.count.in
Metric Type: counter
Unit: number
Description: The number of inbound network requests.
Additional Notes:

sysdig_program_net_request_in_time

Prometheus ID: sysdig_program_net_request_in_time
Legacy ID: net.request.time.in
Metric Type: counter
Unit: time
Description: The average time to serve an inbound request.
Additional Notes:

sysdig_program_net_request_out_count

Prometheus ID: sysdig_program_net_request_out_count
Legacy ID: net.request.count.out
Metric Type: counter
Unit: number
Description: The number of outbound network requests.
Additional Notes:

sysdig_program_net_request_out_time

Prometheus ID: sysdig_program_net_request_out_time
Legacy ID: net.request.time.out
Metric Type: counter
Unit: time
Description: The average time spent waiting for an outbound request.
Additional Notes:

sysdig_program_net_request_time

Prometheus ID: sysdig_program_net_request_time
Legacy ID: net.request.time
Metric Type: counter
Unit: time
Description: The average time to serve a network request.
Additional Notes:

sysdig_program_net_tcp_queue_len

Prometheus ID: sysdig_program_net_tcp_queue_len
Legacy ID: net.tcp.queue.len
Metric Type: counter
Unit: number
Description: The length of the TCP request queue.
Additional Notes:

sysdig_program_net_total_bytes

Prometheus ID: sysdig_program_net_total_bytes
Legacy ID: net.bytes.total
Metric Type: counter
Unit: data
Description: The total network bytes, including inbound and outbound connections, in a program.
Additional Notes: By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.

sysdig_program_proc_count

Prometheus ID: sysdig_program_proc_count
Legacy ID: proc.count
Metric Type: counter
Unit: number
Description: The number of processes on a host or container.
Additional Notes:

sysdig_program_syscall_count

Prometheus ID: sysdig_program_syscall_count
Legacy ID: syscall.count
Metric Type: gauge
Unit: number
Description: The total number of syscalls seen.
Additional Notes: Syscalls are resource intensive. This metric tracks how many have been made by a given process or container.

sysdig_program_thread_count

Prometheus ID: sysdig_program_thread_count
Legacy ID: thread.count
Metric Type: counter
Unit: number
Description: The total number of threads running in a program.
Additional Notes:

sysdig_program_timeseries_count_appcheck

Prometheus ID: sysdig_program_timeseries_count_appcheck
Legacy ID: metricCount.appCheck
Metric Type: gauge
Unit: number
Description: The number of app check custom metrics.
Additional Notes:

sysdig_program_timeseries_count_jmx

Prometheus ID: sysdig_program_timeseries_count_jmx
Legacy ID: metricCount.jmx
Metric Type: gauge
Unit: number
Description: The number of JMX custom metrics.
Additional Notes:

sysdig_program_timeseries_count_prometheus

Prometheus ID: sysdig_program_timeseries_count_prometheus
Legacy ID: metricCount.prometheus
Metric Type: gauge
Unit: number
Description: The number of Prometheus custom metrics.
Additional Notes:

sysdig_program_up

Prometheus ID: sysdig_program_up
Legacy ID: uptime
Metric Type: gauge
Unit: number
Description: The percentage of time the selected entity was down during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down.
Additional Notes:

6.2.10 - Provider

sysdig_cloud_provider_info

Prometheus ID: sysdig_cloud_provider_info
Legacy ID: info
Metric Type: gauge
Unit: number
Description: This metric always has the value 1.
Additional Notes:

6.3 - Metrics in Sysdig Legacy Format

The Sysdig legacy metrics dictionary lists the default legacy metrics supported by the Sysdig product suite, as well as kube state and cloud provider metrics.

The metrics listed in this section follow the statsd-compatible Sysdig naming convention. To see the mapping between Prometheus notation and Sysdig notation, see Metrics and Label Mapping.

Overview

Each metric in the dictionary has several pieces of metadata listed to provide greater context for how the metric can be used within Sysdig products. An example layout is displayed below:

Metric Name

Metric definition. For some metrics, the equation for how the value is determined is provided.

Metadata

Definition

Metric Type

The metric type determines whether the metric value is a counter or a gauge. Sysdig Monitor offers two metric types:

Counter: A metric whose value only ever increases and builds on previous values. It records how many times something has happened, for example, a user login.

Gauge: Represents a single numerical value that can arbitrarily fluctuate over time. Each value is an instantaneous measurement, for example, CPU usage.
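The behavioral difference between the two types can be illustrated with a minimal sketch (this is not Sysdig's implementation, just the semantics):

```python
class Counter:
    """Cumulative metric: the value only ever increases."""
    def __init__(self):
        self.value = 0
    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount

class Gauge:
    """Instantaneous metric: the value can move up or down arbitrarily."""
    def __init__(self):
        self.value = 0.0
    def set(self, value):
        self.value = value

logins = Counter()
logins.inc()     # a user logged in
logins.inc()     # another login
cpu = Gauge()
cpu.set(71.5)    # current CPU usage
cpu.set(12.0)    # a gauge may drop; a counter never does
print(logins.value, cpu.value)  # 2 12.0
```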

Value Type

The type of value the metric can have. The possible values are:

  • Percent (%)

  • Byte

  • Date

  • Double

  • Integer (int)

  • relativeTime

  • String

Segment By

The levels within the infrastructure that the metric can be segmented at:

  • Host

  • Container

  • Process

  • Kubernetes

  • Mesos

  • Swarm

  • CloudProvider

Default Time Aggregation

The default time aggregation format for the metric.

Available Time Aggregation Formats

The time aggregation formats the metric can be aggregated by:

  • Average (Avg)

  • Rate

  • Sum

  • Minimum (Min)

  • Maximum (Max)
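As an illustration of how these formats reduce a metric's samples over a time window, here is a simplified sketch (the sample values and window length are hypothetical, and "rate" is taken here as the total divided by the window's seconds; Sysdig's actual rollups also account for the sampling interval):

```python
samples = [10, 20, 30, 40]   # hypothetical per-interval values
window_seconds = 40          # hypothetical window covered by the samples

aggregations = {
    "avg": sum(samples) / len(samples),
    "rate": sum(samples) / window_seconds,
    "sum": sum(samples),
    "min": min(samples),
    "max": max(samples),
}
print(aggregations)  # {'avg': 25.0, 'rate': 2.5, 'sum': 100, 'min': 10, 'max': 40}
```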

Default Group Aggregation

The default group aggregation format for the metric.

Available Group Aggregation Formats

The group aggregation formats the metric can be aggregated by:

  • Average (Avg)

  • Sum

  • Minimum (Min)

  • Maximum (Max)

6.3.1 - Agent

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous statsd-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between Sysdig legacy and Prometheus naming conventions.

dragent.analyzer

dragent is the main process in the agent; it collects and collates data from multiple sources, including syscall events from the kernel, to generate metrics. The analyzer module that runs in the dragent process does much of the work involved in generating metrics. These internal metrics are used to troubleshoot the health of the analyzer component.

Sysdig Monitor provides the following analyzer metrics:

| Metrics | Type | Minimum Agent Version | Description |
|---|---|---|---|
| dragent.analyzer.processes | gauge | 0.80.0 or above | The number of processes found by the analyzer. |
| dragent.analyzer.threads | | | The number of threads found by the analyzer. |
| dragent.analyzer.threads.dropped | counter | | The number of threads not reported due to thread limits. |
| dragent.analyzer.containers | gauge | | The number of containers found by the analyzer. |
| dragent.analyzer.javaprocs | | | The number of Java processes found by the analyzer. |
| dragent.analyzer.appchecks | | | The number of application checks reporting to the analyzer. |
| dragent.analyzer.mesos.autodetect | | | If the agent is configured to autodetect a Mesos environment, the value is 1; otherwise, 0. |
| dragent.analyzer.mesos.detected | | | If the agent actually found a Mesos environment, the value is 1; otherwise, 0. |
| dragent.analyzer.fp.pct100 | | | The analyzer flush CPU % (0-100). |
| dragent.analyzer.fl.ms | | | The analyzer flush duration (milliseconds). |
| dragent.analyzer.sr | | | The current sampling ratio (1 = all events analyzed, 2 = half of events, 4 = one fourth of events, and so on). |
| dragent.analyzer.n_evts | | | The number of events processed. |
| dragent.analyzer.n_drops | | | The number of events dropped. |
| dragent.analyzer.n_drops_buffer | | | The number of events dropped due to the buffer being full. |
| dragent.analyzer.n_preemptions | | | The number of driver preemptions. |
| dragent.analyzer.n_command_lines | | | The number of command lines collected and sent to the collector. |
| dragent.analyzer.command_line_cats.n_none | | | |
| dragent.analyzer.n_container_healthcheck_command_lines | | 0.80.1 or above | The number of command lines identified as container health checks. This metric does not change even if healthcheck command lines are not sent to the collector. |

6.3.2 - Applications

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous statsd-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between Sysdig legacy and Prometheus naming conventions.

The metrics in this section are collected from either default or customized agent configurations for integrated applications. See also: Integrate Applications (Default App Checks).


6.3.2.1 - Apache Metrics

See Application Integrations for more information.

apache.conns_async_closing

The number of asynchronous closing connections.

apache.conns_async_keep_alive

The number of asynchronous keep-alive connections.

apache.conns_async_writing

The number of asynchronous write connections.

apache.conns_total

The total number of connections handled.

apache.net.bytes

The total number of bytes served.

apache.net.bytes_per_s

The number of bytes served per second.
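A per-second metric like this one is the rate of its cumulative counterpart (here, apache.net.bytes) between two samples; a minimal sketch with hypothetical readings:

```python
def per_second_rate(prev_total, curr_total, interval_seconds):
    """Derive a per-second rate from two readings of a cumulative counter."""
    return (curr_total - prev_total) / interval_seconds

# Two hypothetical readings of apache.net.bytes taken 10 seconds apart.
print(per_second_rate(1_000_000, 1_250_000, 10))  # 25000.0 bytes/s
```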

apache.net.hits

The total number of requests performed.

apache.net.request_per_s

The number of requests performed per second.

apache.performance.busy_workers

The number of workers currently serving requests.

apache.performance.cpu_load

The percentage of CPU used.

apache.performance.idle_workers

The number of idle workers in the instance.

apache.performance.uptime

The amount of time the server has been running in seconds.

6.3.2.2 - Apache Kafka Metrics


6.3.2.2.1 - Apache Kafka Consumer Metrics

See Application Integrations for more information.

kafka.broker_offset

The current message offset value on the broker.

kafka.consumer_lag

The lag in messages between the consumer and the broker.

kafka.consumer_offset

The current message offset value on the consumer.
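The relationship between the three offset metrics above can be sketched as follows (the offset values are hypothetical):

```python
def consumer_lag(broker_offset: int, consumer_offset: int) -> int:
    """Messages the consumer still has to read: kafka.consumer_lag."""
    return broker_offset - consumer_offset

# Hypothetical offsets: the broker is at 15,000, the consumer has read 14,200.
print(consumer_lag(broker_offset=15_000, consumer_offset=14_200))  # 800
```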

6.3.2.2.2 - Apache Kafka JMX Metrics

See Application Integrations for more information.

The kafka.consumer.* and kafka.producer.* metrics are only available with JMX customization as documented in Integrate JMX Metrics from Java Virtual Machines.

kafka.consumer.bytes_consumed

The average number of bytes consumed for a specific topic per second.

kafka.consumer.bytes_in

The rate of bytes coming in to the consumer.

kafka.consumer.delayed_requests

The number of delayed consumer requests.

kafka.consumer.expires_per_second

The rate of delayed consumer request expiration.

kafka.consumer.fetch_rate

The minimum rate at which the consumer sends fetch requests to a broker.

kafka.consumer.fetch_size_avg

The average number of bytes fetched for a specific topic per request.

kafka.consumer.fetch_size_max

The maximum number of bytes fetched for a specific topic per request.

kafka.consumer.kafka_commits

The rate of offset commits to Kafka.

kafka.consumer.max_lag

The maximum consumer lag.

kafka.consumer.messages_in

The rate of consumer message consumption.

kafka.consumer.records_consumed

The average number of records consumed per second for a specific topic.

kafka.consumer.records_per_request_avg

The average number of records in each request for a specific topic.

kafka.consumer.zookeeper_commits

The rate of offset commits to ZooKeeper.

kafka.expires_sec

The rate of delayed producer request expiration.

kafka.follower.expires_per_second

The rate of request expiration on followers.

kafka.log.flush_rate

The log flush rate.

kafka.messages_in

The incoming message rate.

kafka.net.bytes_in

The incoming byte rate.

kafka.net.bytes_out

The outgoing byte rate.

kafka.net.bytes_rejected

The rejected byte rate.

kafka.producer.available_buffer_bytes

The total amount of buffer memory that is not being used, either unallocated or in the free list.

kafka.producer.batch_size_avg

The average number of bytes sent per partition per-request.

kafka.producer.batch_size_max

The maximum number of bytes sent per partition per-request.

kafka.producer.buffer_bytes_total

The maximum amount of buffer memory the client can use.

kafka.producer.bufferpool_wait_time

The fraction of time an appender waits for space allocation.

kafka.producer.bytes_out

The rate of bytes going out for the producer.

kafka.producer.compression_rate

The average compression rate of record batches for a topic.

kafka.producer.compression_rate_avg

The average compression rate of record batches.

kafka.producer.delayed_requests

The number of producer requests delayed.

kafka.producer.expires_per_seconds

The rate of producer request expiration.

kafka.producer.io_wait

The producer I/O wait time.

kafka.producer.message_rate

The producer message rate.

kafka.producer.metadata_age

The age of the current producer metadata being used, in seconds.

kafka.producer.record_error_rate

The average number of retried record sends for a topic per second.

kafka.producer.record_queue_time_avg

The average time that record batches spent in the record accumulator, in milliseconds.

kafka.producer.record_queue_time_max

The maximum amount of time record batches can spend in the record accumulator, in milliseconds.

kafka.producer.record_retry_rate

The average number of retried record sends for a topic per second.

kafka.producer.record_send_rate

The average number of records sent per second for a topic.

kafka.producer.record_size_avg

The average record size.

kafka.producer.record_size_max

The maximum record size.

kafka.producer.records_per_request

The average number of records sent per second.

kafka.producer.request_latency_avg

The average request latency of the producer.

kafka.producer.request_latency_max

The maximum request latency in milliseconds.

kafka.producer.request_rate

The number of producer requests per second.

kafka.producer.requests_in_flight

The current number of in-flight requests awaiting a response.

kafka.producer.response_rate

The number of producer responses per second.

kafka.producer.throttle_time_avg

The average time a request was throttled by a broker, in milliseconds.

kafka.producer.throttle_time_max

The maximum time a request was throttled by a broker, in milliseconds.

kafka.producer.waiting_threads

The number of user threads blocked waiting for buffer memory to enqueue their records.

kafka.replication.isr_expands

The rate of replicas joining the ISR pool.

kafka.replication.isr_shrinks

The rate of replicas leaving the ISR pool.

kafka.replication.leader_elections

The leader election rate.

kafka.replication.unclean_leader_elections

The unclean leader election rate.

kafka.replication.under_replicated_partitions

The number of unreplicated partitions.

kafka.request.fetch.failed

The number of client fetch request failures.

kafka.request.fetch.failed_per_second

The rate of client fetch request failures per second.

kafka.request.fetch.time.99percentile

The time for fetch requests for the 99th percentile.

kafka.request.fetch.time.avg

The average time per fetch request.

kafka.request.handler.avg.idle.pct

The average fraction of time the request handler threads are idle.

kafka.request.metadata.time.99percentile

The time for metadata requests for the 99th percentile.

kafka.request.metadata.time.avg

The average time for a metadata request.

kafka.request.offsets.time.99percentile

The time for offset requests for the 99th percentile.

kafka.request.offsets.time.avg

The average time for an offset request.

kafka.request.produce.failed

The number of failed produce requests.

kafka.request.produce.failed_per_second

The rate of failed produce requests per second.

kafka.request.produce.time.99percentile

The time for produce requests for the 99th percentile.

kafka.request.produce.time.avg

The average time for a produce request.

kafka.request.update_metadata.time.99percentile

The time for update metadata requests for the 99th percentile.

kafka.request.update_metadata.time.avg

The average time for a request to update metadata.

6.3.2.3 - Consul Metrics


6.3.2.3.1 - Base Consul Metrics

See Application Integrations for more information.

consul.catalog.nodes_critical

Number of nodes with service status `critical` from those registered.

consul.catalog.nodes_passing

Number of nodes with service status `passing` from those registered.

consul.catalog.nodes_up

Number of nodes.

consul.catalog.nodes_warning

Number of nodes with service status `warning` from those registered.

consul.catalog.services_critical

Total critical services on nodes.

consul.catalog.services_passing

Total passing services on nodes.

consul.catalog.services_up

Total services registered on nodes.

consul.catalog.services_warning

Total warning services on nodes.

consul.catalog.total_nodes

Number of nodes registered in the consul cluster.

consul.net.node.latency.max

Maximum latency from this node to all others.

consul.net.node.latency.median

Median latency from this node to all others.

consul.net.node.latency.min

Minimum latency from this node to all others.

consul.net.node.latency.p25

p25 latency from this node to all others.

consul.net.node.latency.p75

p75 latency from this node to all others.

consul.net.node.latency.p90

p90 latency from this node to all others.

consul.net.node.latency.p95

p95 latency from this node to all others.

consul.net.node.latency.p99

p99 latency from this node to all others.
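The latency summary metrics above (min, median, max, and the p25-p99 percentiles) can be reproduced from raw samples; a sketch using Python's statistics module with hypothetical latencies:

```python
import statistics

# Hypothetical round-trip latencies (ms) from this node to its peers.
latencies = [12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 60.0, 90.0]

summary = {
    "min": min(latencies),
    "median": statistics.median(latencies),
    "max": max(latencies),
}
# quantiles with n=100 yields the cut points p1..p99.
cuts = statistics.quantiles(latencies, n=100)
for p in (25, 75, 90, 95, 99):
    summary[f"p{p}"] = cuts[p - 1]
print(summary)
```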

consul.peers

Number of peers in the peer set.

6.3.2.3.2 - Consul StatsD Metrics

See Application Integrations for more information.

consul.memberlist.msg.suspect

Number of times an agent suspects another as failed while probing during gossip protocol.

consul.raft.apply

Number of raft transactions occurring.

consul.raft.commitTime.95percentile

The p95 time it takes to commit a new entry to the raft log on the leader.

consul.raft.commitTime.avg

The average time it takes to commit a new entry to the raft log on the leader.

consul.raft.commitTime.count

The number of samples of raft.commitTime.

consul.raft.commitTime.max

The max time it takes to commit a new entry to the raft log on the leader.

consul.raft.commitTime.median

The median time it takes to commit a new entry to the raft log on the leader.

consul.raft.leader.dispatchLog.95percentile

The p95 time it takes for the leader to write log entries to disk.

consul.raft.leader.dispatchLog.avg

The average time it takes for the leader to write log entries to disk.

consul.raft.leader.dispatchLog.count

The number of samples of raft.leader.dispatchLog.

consul.raft.leader.dispatchLog.max

The max time it takes for the leader to write log entries to disk.

consul.raft.leader.dispatchLog.median

The median time it takes for the leader to write log entries to disk.

consul.raft.leader.lastContact.95percentile

P95 time elapsed since the leader was last able to check its lease with followers.

consul.raft.leader.lastContact.avg

Average time elapsed since the leader was last able to check its lease with followers.

consul.raft.leader.lastContact.count

The number of samples of raft.leader.lastContact.

consul.raft.leader.lastContact.max

Max time elapsed since the leader was last able to check its lease with followers.

consul.raft.leader.lastContact.median

Median time elapsed since the leader was last able to check its lease with followers.

consul.raft.state.candidate

The number of initiated leader elections.

consul.raft.state.leader

Number of completed leader elections.

consul.runtime.alloc_bytes

Current bytes allocated by the Consul process.

consul.runtime.free_count

Cumulative count of heap objects freed.

consul.runtime.heap_objects

Number of objects allocated on the heap.

consul.runtime.malloc_count

Cumulative count of heap objects allocated.

consul.runtime.num_goroutines

Number of running goroutines.

consul.runtime.sys_bytes

Total size of the virtual address space reserved by the Go runtime.

consul.runtime.total_gc_pause_ns

Cumulative nanoseconds in GC stop-the-world pauses since Consul started.

consul.runtime.total_gc_runs

Number of completed GC cycles.

consul.serf.events

Incremented when an agent processes a serf event.

consul.serf.member.flap

Number of times an agent is marked dead and then quickly recovers.

consul.serf.member.join

Incremented when an agent processes a join event.

6.3.2.4 - Couchbase Metrics

See Application Integrations for more information.

couchbase.by_bucket.avg_bg_wait_time

The average background wait time.

couchbase.by_bucket.avg_disk_commit_time

The average disk commit time.

couchbase.by_bucket.avg_disk_update_time

The average disk update time.

couchbase.by_bucket.bg_wait_total

The total background wait time.

couchbase.by_bucket.bytes_read

The number of bytes read.

couchbase.by_bucket.bytes_written

The number of bytes written.

couchbase.by_bucket.cas_badval

The number of compare and swap bad values.

couchbase.by_bucket.cas_hits

The number of compare and swap hits.

couchbase.by_bucket.cas_misses

The number of compare and swap misses.

couchbase.by_bucket.cmd_get

The number of get operations.

couchbase.by_bucket.cmd_set

The number of set operations.

couchbase.by_bucket.couch_docs_actual_disk_size

The size of the couchbase docs on disk.

couchbase.by_bucket.couch_docs_data_size

The data size of the couchbase docs.

couchbase.by_bucket.couch_docs_disk_size

Couch docs total size in bytes.

couchbase.by_bucket.couch_docs_fragmentation

The percentage of couchbase docs fragmentation.

couchbase.by_bucket.couch_spatial_data_size

The size of object data for spatial views.

couchbase.by_bucket.couch_spatial_disk_size

The amount of disk space occupied by spatial views.

couchbase.by_bucket.couch_spatial_ops

Spatial operations.

couchbase.by_bucket.couch_total_disk_size

The total disk size for couchbase.

couchbase.by_bucket.couch_views_data_size

The size of object data for views.

couchbase.by_bucket.couch_views_disk_size

The amount of disk space occupied by views.

couchbase.by_bucket.couch_views_fragmentation

The view fragmentation.

couchbase.by_bucket.couch_views_ops

View operations.

couchbase.by_bucket.cpu_idle_ms

CPU idle milliseconds.

couchbase.by_bucket.cpu_utilization_rate

CPU utilization percentage.

couchbase.by_bucket.curr_connections

Current bucket connections.

couchbase.by_bucket.curr_items

Number of active items in memory.

couchbase.by_bucket.curr_items_tot

Total number of items.

couchbase.by_bucket.decr_hits

Decrement hits.

couchbase.by_bucket.decr_misses

Decrement misses.

couchbase.by_bucket.delete_hits

Delete hits.

couchbase.by_bucket.delete_misses

Delete misses.

couchbase.by_bucket.disk_commit_count

Disk commits.

couchbase.by_bucket.disk_update_count

Disk updates.

couchbase.by_bucket.disk_write_queue

Disk write queue depth.

couchbase.by_bucket.ep_bg_fetched

Disk reads per second.

couchbase.by_bucket.ep_cache_miss_rate

Cache miss rate.

couchbase.by_bucket.ep_cache_miss_ratio

Cache miss ratio.

couchbase.by_bucket.ep_dcp_2i_backoff

Number of backoffs for indexes DCP connections.

couchbase.by_bucket.ep_dcp_2i_count

Number of indexes DCP connections.

couchbase.by_bucket.ep_dcp_2i_items_remaining

Number of indexes items remaining to be sent.

couchbase.by_bucket.ep_dcp_2i_items_sent

Number of indexes items sent.

couchbase.by_bucket.ep_dcp_2i_producer_count

Number of indexes producers

couchbase.by_bucket.ep_dcp_2i_total_bytes

The number of bytes per second being sent for indexes DCP connections.

couchbase.by_bucket.ep_dcp_fts_backoff

Number of backoffs for fts DCP connections.

couchbase.by_bucket.ep_dcp_fts_count

Number of fts DCP connections.

couchbase.by_bucket.ep_dcp_fts_items_remaining

Number of fts items remaining to be sent.

couchbase.by_bucket.ep_dcp_fts_items_sent

Number of fts items sent.

couchbase.by_bucket.ep_dcp_fts_producer_count

Number of fts producers.

couchbase.by_bucket.ep_dcp_fts_total_bytes

Number of bytes per second being sent for fts DCP connections.

couchbase.by_bucket.ep_dcp_other_backoff

Number of backoffs for other DCP connections.

couchbase.by_bucket.ep_dcp_other_count

Number of other DCP connections.

couchbase.by_bucket.ep_dcp_other_items_remaining

Number of other items remaining to be sent.

couchbase.by_bucket.ep_dcp_other_items_sent

Number of other items sent.

couchbase.by_bucket.ep_dcp_other_producer_count

Number of other producers.

couchbase.by_bucket.ep_dcp_other_total_bytes

Number of bytes per second being sent for other DCP connections.

couchbase.by_bucket.ep_dcp_replica_backoff

Number of backoffs for replica DCP connections.

couchbase.by_bucket.ep_dcp_replica_count

Number of replica DCP connections.

couchbase.by_bucket.ep_dcp_replica_items_remaining

Number of replica items remaining to be sent.

couchbase.by_bucket.ep_dcp_replica_items_sent

Number of replica items sent.

couchbase.by_bucket.ep_dcp_replica_producer_count

Number of replica producers.

couchbase.by_bucket.ep_dcp_replica_total_bytes

Number of bytes per second being sent for replica DCP connections.

couchbase.by_bucket.ep_dcp_views_backoff

Number of backoffs for views DCP connections.

couchbase.by_bucket.ep_dcp_views_count

Number of views DCP connections.

couchbase.by_bucket.ep_dcp_views_items_remaining

Number of views items remaining to be sent.

couchbase.by_bucket.ep_dcp_views_items_sent

Number of views items sent.

couchbase.by_bucket.ep_dcp_views_producer_count

Number of views producers.

couchbase.by_bucket.ep_dcp_views_total_bytes

Number of bytes per second being sent for views DCP connections.

couchbase.by_bucket.ep_dcp_xdcr_backoff

Number of backoffs for xdcr DCP connections.

couchbase.by_bucket.ep_dcp_xdcr_count

Number of xdcr DCP connections.

couchbase.by_bucket.ep_dcp_xdcr_items_remaining

Number of xdcr items remaining to be sent.

couchbase.by_bucket.ep_dcp_xdcr_items_sent

Number of xdcr items sent.

couchbase.by_bucket.ep_dcp_xdcr_producer_count

Number of xdcr producers.

couchbase.by_bucket.ep_dcp_xdcr_total_bytes

Number of bytes per second being sent for xdcr DCP connections.

couchbase.by_bucket.ep_diskqueue_drain

Total drained items on the disk queue.

couchbase.by_bucket.ep_diskqueue_fill

Total enqueued items on disk queue.

couchbase.by_bucket.ep_diskqueue_items

Total number of items waiting to be written to disk.

couchbase.by_bucket.ep_flusher_todo

Number of items currently being written.

couchbase.by_bucket.ep_item_commit_failed

Number of times a transaction failed to commit due to storage errors.

couchbase.by_bucket.ep_kv_size

Total amount of user data cached in RAM in this bucket.

couchbase.by_bucket.ep_max_size

The maximum amount of memory this bucket can use.

couchbase.by_bucket.ep_mem_high_wat

Memory usage high water mark for auto-evictions.

couchbase.by_bucket.ep_mem_low_wat

Memory usage low water mark for auto-evictions.

couchbase.by_bucket.ep_meta_data_memory

Total amount of item metadata consuming RAM in this bucket.

couchbase.by_bucket.ep_num_non_resident

Number of non-resident items.

couchbase.by_bucket.ep_num_ops_del_meta

Number of delete operations per second for this bucket as the target for XDCR.

couchbase.by_bucket.ep_num_ops_del_ret_meta

Number of delRetMeta operations per second for this bucket as the target for XDCR.

couchbase.by_bucket.ep_num_ops_get_meta

Number of read operations per second for this bucket as the target for XDCR.

couchbase.by_bucket.ep_num_ops_set_meta

Number of set operations per second for this bucket as the target for XDCR.

couchbase.by_bucket.ep_num_ops_set_ret_meta

Number of setRetMeta operations per second for this bucket as the target for XDCR.

couchbase.by_bucket.ep_num_value_ejects

Number of times item values were ejected from memory to disk.

couchbase.by_bucket.ep_oom_errors

Number of times unrecoverable OOMs happened while processing operations.

couchbase.by_bucket.ep_ops_create

Create operations.

couchbase.by_bucket.ep_ops_update

Update operations.

couchbase.by_bucket.ep_overhead

Extra memory used by transient data like persistence queues or checkpoints.

couchbase.by_bucket.ep_queue_size

Number of items queued for storage.

couchbase.by_bucket.ep_resident_items_rate

Percentage of resident items.

couchbase.by_bucket.ep_tap_replica_queue_drain

Total drained items in the replica queue.

couchbase.by_bucket.ep_tap_total_queue_drain

Total drained items in the queue.

couchbase.by_bucket.ep_tap_total_queue_fill

Total enqueued items in the queue.

couchbase.by_bucket.ep_tap_total_total_backlog_size

Number of remaining items for replication.

couchbase.by_bucket.ep_tmp_oom_errors

Number of times recoverable OOMs happened while processing operations.

couchbase.by_bucket.ep_vb_total

Total number of vBuckets for this bucket.

couchbase.by_bucket.evictions

Number of evictions.

couchbase.by_bucket.get_hits

Number of get hits.

couchbase.by_bucket.get_misses

Number of get misses.

couchbase.by_bucket.hibernated_requests

Number of streaming requests now idle.

couchbase.by_bucket.hibernated_waked

Rate of streaming request wakeups.

couchbase.by_bucket.hit_ratio

Hit ratio.

couchbase.by_bucket.incr_hits

Number of increment hits.

couchbase.by_bucket.incr_misses

Number of increment misses.

couchbase.by_bucket.mem_actual_free

Free memory.

couchbase.by_bucket.mem_actual_used

Used memory.

couchbase.by_bucket.mem_free

Free memory.

couchbase.by_bucket.mem_total

Total available memory.

couchbase.by_bucket.mem_used (deprecated)

Engine’s total memory usage.

couchbase.by_bucket.mem_used_sys

System memory usage.

couchbase.by_bucket.misses

Total number of misses.

couchbase.by_bucket.ops

Total number of operations.

couchbase.by_bucket.page_faults

Number of page faults.

couchbase.by_bucket.replication_docs_rep_queue

couchbase.by_bucket.replication_meta_latency_aggr

couchbase.by_bucket.rest_requests

Number of HTTP requests.

couchbase.by_bucket.swap_total

Total amount of swap available.

couchbase.by_bucket.swap_used

Amount of swap used.

couchbase.by_bucket.vb_active_eject

Number of items per second being ejected to disk from active vBuckets.

couchbase.by_bucket.vb_active_itm_memory

Amount of active user data cached in RAM in this bucket.

couchbase.by_bucket.vb_active_meta_data_memory

Amount of active item metadata consuming RAM in this bucket.

couchbase.by_bucket.vb_active_num

Number of active items.

couchbase.by_bucket.vb_active_num_non_resident

Number of non-resident vBuckets in the active state for this bucket.

couchbase.by_bucket.vb_active_ops_create

New items per second being inserted into active vBuckets in this bucket.

couchbase.by_bucket.vb_active_ops_update

Number of items updated on active vBucket per second for this bucket.

couchbase.by_bucket.vb_active_queue_age

Sum of disk queue item age in milliseconds.

couchbase.by_bucket.vb_active_queue_drain

Total drained items in the queue.

couchbase.by_bucket.vb_active_queue_fill

Number of active items per second being put on the active item disk queue.

couchbase.by_bucket.vb_active_queue_size

Number of active items in the queue.

couchbase.by_bucket.vb_active_resident_items_ratio

Percentage of resident active items.

couchbase.by_bucket.vb_avg_active_queue_age

Average age in seconds of active items in the active item queue.

couchbase.by_bucket.vb_avg_pending_queue_age

Average age in seconds of pending items in the pending item queue.

couchbase.by_bucket.vb_avg_replica_queue_age

Average age in seconds of replica items in the replica item queue.

couchbase.by_bucket.vb_avg_total_queue_age

Average age of items in the queue.

couchbase.by_bucket.vb_pending_curr_items

Number of items in pending vBuckets.

couchbase.by_bucket.vb_pending_eject

Number of items per second being ejected to disk from pending vBuckets.

couchbase.by_bucket.vb_pending_itm_memory

Amount of pending user data cached in RAM in this bucket.

couchbase.by_bucket.vb_pending_meta_data_memory

Amount of pending item metadata consuming RAM in this bucket.

couchbase.by_bucket.vb_pending_num

Number of pending items.

couchbase.by_bucket.vb_pending_num_non_resident

Number of non-resident vBuckets in the pending state for this bucket.

couchbase.by_bucket.vb_pending_ops_create

Number of pending create operations.

couchbase.by_bucket.vb_pending_ops_update

Number of items updated on pending vBucket per second for this bucket.

couchbase.by_bucket.vb_pending_queue_age

Sum of disk pending queue item age in milliseconds.

couchbase.by_bucket.vb_pending_queue_drain

Total drained pending items in the queue.

couchbase.by_bucket.vb_pending_queue_fill

Total enqueued pending items on disk queue.

couchbase.by_bucket.vb_pending_queue_size

Number of pending items in the queue.

couchbase.by_bucket.vb_pending_resident_items_ratio

Percentage of resident pending items.

couchbase.by_bucket.vb_replica_curr_items

Number of in-memory items.

couchbase.by_bucket.vb_replica_eject

Number of items per second being ejected to disk from replica vBuckets.

couchbase.by_bucket.vb_replica_itm_memory

Amount of replica user data cached in RAM in this bucket.

couchbase.by_bucket.vb_replica_meta_data_memory

Total metadata memory.

couchbase.by_bucket.vb_replica_num

Number of replica vBuckets.

couchbase.by_bucket.vb_replica_num_non_resident

Number of non-resident vBuckets in the replica state for this bucket.

couchbase.by_bucket.vb_replica_ops_create

Number of replica create operations.

couchbase.by_bucket.vb_replica_ops_update

Number of items updated on replica vBucket per second for this bucket.

couchbase.by_bucket.vb_replica_queue_age

Sum of disk replica queue item age in milliseconds.

couchbase.by_bucket.vb_replica_queue_drain

Total drained replica items in the queue.

couchbase.by_bucket.vb_replica_queue_fill

Total enqueued replica items on disk queue.

couchbase.by_bucket.vb_replica_queue_size

Replica items in disk queue.

couchbase.by_bucket.vb_replica_resident_items_ratio

Percentage of resident replica items.

couchbase.by_bucket.vb_total_queue_age

Sum of disk queue item age in milliseconds.

couchbase.by_bucket.xdc_ops

Number of cross-datacenter replication operations.

couchbase.by_node.couch_docs_actual_disk_size

Couch docs total size on disk in bytes.

couchbase.by_node.couch_docs_data_size

Couch docs data size in bytes.

couchbase.by_node.couch_views_actual_disk_size

Couch views total size on disk in bytes.

couchbase.by_node.couch_views_data_size

Couch views data size on disk in bytes.

couchbase.by_node.curr_items

Number of active items in memory.

couchbase.by_node.curr_items_tot

Total number of items.

couchbase.by_node.vb_replica_curr_items

Number of in-memory items.

couchbase.hdd.free

Free hard disk space.

couchbase.hdd.quota_total

Hard disk quota.

couchbase.hdd.total

Total hard disk space.

couchbase.hdd.used

Used hard disk space.

couchbase.hdd.used_by_data

Hard disk used for data.

couchbase.query.cores

couchbase.query.cpu_sys_percent

couchbase.query.cpu_user_percent

couchbase.query.gc_num

couchbase.query.gc_pause_percent

couchbase.query.gc_pause_time

couchbase.query.memory_system

couchbase.query.memory_total

couchbase.query.memory_usage

couchbase.query.request_active_count

couchbase.query.request_completed_count

couchbase.query.request_per_sec_15min

couchbase.query.request_per_sec_1min

couchbase.query.request_per_sec_5min

couchbase.query.request_prepared_percent

couchbase.query.request_time_80percentile

couchbase.query.request_time_95percentile

couchbase.query.request_time_99percentile

couchbase.query.request_time_mean

couchbase.query.request_time_median

couchbase.query.total_threads

couchbase.ram.quota_total

RAM quota.

couchbase.ram.total

The total RAM available.

couchbase.ram.used

The amount of RAM in use.

couchbase.ram.used_by_data

The amount of RAM used for data.
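
With this many per-bucket series, Couchbase can dominate an agent's custom-metric budget. As a sketch (the exact filter syntax is documented under Understanding the Agent Config Files, and the metric names below are only examples), the `metrics_filter` option in `dragent.yaml` can drop the series you do not chart:

```yaml
# dragent.yaml fragment -- a sketch; verify the syntax against your agent version.
metrics_filter:
  - include: couchbase.by_bucket.ops           # keep the overall operation rate
  - include: couchbase.by_bucket.curr_items*   # keep item counts
  - exclude: couchbase.by_bucket.*             # drop the remaining per-bucket series
```

Rules are applied in order, so a metric is kept or dropped by the first pattern it matches.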

6.3.2.5 - Elasticsearch Metrics

See Application Integrations for more information.

All Elasticsearch metrics have the type gauge.

elasticsearch.active_primary_shards

The number of active primary shards in the cluster.

elasticsearch.active_shards

The number of active shards in the cluster.

elasticsearch.breakers.fielddata.estimated_size_in_bytes

The estimated size in bytes of the field data circuit breaker.

elasticsearch.breakers.fielddata.overhead

The constant multiplier for byte estimations of the field data circuit breaker.

elasticsearch.breakers.fielddata.tripped

The number of times the field data circuit breaker has tripped.

elasticsearch.breakers.parent.estimated_size_in_bytes

The estimated size in bytes of the parent circuit breaker.

elasticsearch.breakers.parent.overhead

The constant multiplier for byte estimations of the parent circuit breaker.

elasticsearch.breakers.parent.tripped

The number of times the parent circuit breaker has tripped.

elasticsearch.breakers.request.estimated_size_in_bytes

The estimated size in bytes of the request circuit breaker.

elasticsearch.breakers.request.overhead

The constant multiplier for byte estimations of the request circuit breaker.

elasticsearch.breakers.request.tripped

The number of times the request circuit breaker has tripped.

elasticsearch.breakers.inflight_requests.tripped

The number of times the inflight circuit breaker has tripped.

elasticsearch.breakers.inflight_requests.overhead

The constant multiplier for byte estimations of the inflight circuit breaker.

elasticsearch.breakers.inflight_requests.estimated_size_in_bytes

The estimated size in bytes of the inflight circuit breaker.

elasticsearch.cache.field.evictions

The total number of evictions from the field data cache.

elasticsearch.cache.field.size

The size of the field cache.

elasticsearch.cache.filter.count

The number of items in the filter cache.

elasticsearch.cache.filter.evictions

The total number of evictions from the filter cache.

elasticsearch.cache.filter.size

The size of the filter cache.

elasticsearch.cluster_status

The Elasticsearch cluster health as a number: red = 0, yellow = 1, green = 2.
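
Because the status is encoded numerically, it can be compared and alerted on directly. A small sketch of the same mapping in Python, assuming a payload shaped like the response of Elasticsearch's `/_cluster/health` API (the function name is illustrative, not part of any Sysdig API):

```python
# Numeric convention used by the elasticsearch.cluster_status metric.
STATUS_VALUES = {"red": 0, "yellow": 1, "green": 2}

def cluster_status(health: dict) -> int:
    """Return 0 (red), 1 (yellow), or 2 (green) for a /_cluster/health payload."""
    return STATUS_VALUES[health["status"]]

health = {"status": "yellow", "unassigned_shards": 5}
print(cluster_status(health))  # 1 -- an alert might fire whenever this drops below 2
```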

elasticsearch.docs.count

The total number of documents in the cluster across all shards.

elasticsearch.docs.deleted

The total number of documents deleted from the cluster across all shards.

elasticsearch.fielddata.evictions

The total number of evictions from the fielddata cache.

elasticsearch.fielddata.size

The size of the fielddata cache.

elasticsearch.flush.total

The total number of index flushes to disk since start.

elasticsearch.flush.total.time

The total time spent flushing the index to disk.

elasticsearch.fs.total.available_in_bytes

The total number of bytes available to this Java virtual machine on this file store.

elasticsearch.fs.total.disk_io_op

The total I/O operations on the file store.

elasticsearch.fs.total.disk_io_size_in_bytes

Total bytes used for all I/O operations on the file store.

elasticsearch.fs.total.disk_read_size_in_bytes

The total bytes read from the file store.

elasticsearch.fs.total.disk_reads

The total number of reads from the file store.

elasticsearch.fs.total.disk_write_size_in_bytes

The total bytes written to the file store.

elasticsearch.fs.total.disk_writes

The total number of writes to the file store.

elasticsearch.fs.total.free_in_bytes

The total number of unallocated bytes in the file store.

elasticsearch.fs.total.total_in_bytes

The total size in bytes of the file store.

elasticsearch.get.current

The number of get requests currently running.

elasticsearch.get.exists.time

The total time spent on get requests where the document existed.

elasticsearch.get.exists.total

The total number of get requests where the document existed.

elasticsearch.get.missing.time

The total time spent on get requests where the document was missing.

elasticsearch.get.missing.total

The total number of get requests where the document was missing.

elasticsearch.get.time

The total time spent on get requests.

elasticsearch.get.total

The total number of get requests.

elasticsearch.http.current_open

The number of current open HTTP connections.

elasticsearch.http.total_opened

The total number of opened HTTP connections.

elasticsearch.id_cache.size

The size of the ID cache.

elasticsearch.indexing.delete.current

The number of documents currently being deleted from an index.

elasticsearch.indexing.delete.time

The total time spent deleting documents from an index.

elasticsearch.indexing.delete.total

The total number of documents deleted from an index.

elasticsearch.indexing.index.current

The number of documents currently being indexed to an index.

elasticsearch.indexing.index.time

The total time spent indexing documents to an index.

elasticsearch.indexing.index.total

The total number of documents indexed to an index.

elasticsearch.indices.count

The number of indices in the cluster.

elasticsearch.indices.indexing.index_failed

The number of failed indexing operations.

elasticsearch.indices.indexing.throttle_time

The total time indexing waited due to throttling.

elasticsearch.indices.query_cache.evictions

The number of query cache evictions.

elasticsearch.indices.query_cache.hit_count

The number of query cache hits.

elasticsearch.indices.query_cache.memory_size_in_bytes

The memory used by the query cache.

elasticsearch.indices.query_cache.miss_count

The number of query cache misses.

elasticsearch.indices.recovery.current_as_source

The number of ongoing recoveries for which a shard serves as a source.

elasticsearch.indices.recovery.current_as_target

The number of ongoing recoveries for which a shard serves as a target.

elasticsearch.indices.recovery.throttle_time

The total time recoveries waited due to throttling.

elasticsearch.indices.request_cache.evictions

The number of request cache evictions.

elasticsearch.indices.request_cache.hit_count

The number of request cache hits.

elasticsearch.indices.request_cache.memory_size_in_bytes

The memory used by the request cache.

elasticsearch.indices.request_cache.miss_count

The number of request cache misses.

elasticsearch.indices.segments.count

The number of segments in an index shard.

elasticsearch.indices.segments.doc_values_memory_in_bytes

The memory used by doc values.

elasticsearch.indices.segments.fixed_bit_set_memory_in_bytes

The memory used by fixed bit set.

elasticsearch.indices.segments.index_writer_max_memory_in_bytes

The maximum memory used by the index writer.

elasticsearch.indices.segments.index_writer_memory_in_bytes

The memory used by the index writer.

elasticsearch.indices.segments.memory_in_bytes

The memory used by index segments.

elasticsearch.indices.segments.norms_memory_in_bytes

The memory used by norms.

elasticsearch.indices.segments.stored_fields_memory_in_bytes

The memory used by stored fields.

elasticsearch.indices.segments.term_vectors_memory_in_bytes

The memory used by term vectors.

elasticsearch.indices.segments.terms_memory_in_bytes

The memory used by terms.

elasticsearch.indices.segments.version_map_memory_in_bytes

The memory used by the segment version map.

elasticsearch.indices.translog.operations

The number of operations in the transaction log.

elasticsearch.indices.translog.size_in_bytes

The size of the transaction log.

elasticsearch.initializing_shards

The number of shards that are currently initializing.

elasticsearch.merges.current

The number of currently active segment merges.

elasticsearch.merges.current.docs

The number of documents across segments currently being merged.

elasticsearch.merges.current.size

The size of the segments currently being merged.

elasticsearch.merges.total

The total number of segment merges.

elasticsearch.merges.total.docs

The total number of documents across all merged segments.

elasticsearch.merges.total.size

The total size of all merged segments.

elasticsearch.merges.total.time

The total time spent on segment merging.

elasticsearch.number_of_data_nodes

The number of data nodes in the cluster.

elasticsearch.number_of_nodes

The total number of nodes in the cluster.

elasticsearch.pending_tasks_priority_high

The number of high priority pending tasks.

elasticsearch.pending_tasks_priority_urgent

The number of urgent priority pending tasks.

elasticsearch.pending_tasks_time_in_queue

The average time spent by tasks in the queue.

elasticsearch.pending_tasks_total

The total number of pending tasks.

elasticsearch.process.open_fd

The number of opened file descriptors associated with the current process, or -1 if not supported.

elasticsearch.refresh.total

The total number of index refreshes.

elasticsearch.refresh.total.time

The total time spent on index refreshes.

elasticsearch.relocating_shards

The number of shards that are relocating from one node to another.

elasticsearch.search.fetch.current

The number of search fetches currently running.

elasticsearch.search.fetch.open_contexts

The number of active searches.

elasticsearch.search.fetch.time

The total time spent on the search fetch.

elasticsearch.search.fetch.total

The total number of search fetches.

elasticsearch.search.query.current

The number of currently active queries.

elasticsearch.search.query.time

The total time spent on queries.

elasticsearch.search.query.total

The total number of queries.

elasticsearch.store.size

The total size in bytes of the store.

elasticsearch.thread_pool.bulk.active

The number of active threads in the bulk pool.

elasticsearch.thread_pool.bulk.queue

The number of queued threads in the bulk pool.

elasticsearch.thread_pool.bulk.threads

The total number of threads in the bulk pool.

elasticsearch.thread_pool.bulk.rejected

The number of rejected threads in the bulk pool.

elasticsearch.thread_pool.fetch_shard_started.active

The number of active threads in the fetch shard started pool.

elasticsearch.thread_pool.fetch_shard_started.threads

The total number of threads in the fetch shard started pool.

elasticsearch.thread_pool.fetch_shard_started.queue

The number of queued threads in the fetch shard started pool.

elasticsearch.thread_pool.fetch_shard_started.rejected

The number of rejected threads in the fetch shard started pool.

elasticsearch.thread_pool.fetch_shard_store.active

The number of active threads in the fetch shard store pool.

elasticsearch.thread_pool.fetch_shard_store.threads

The total number of threads in the fetch shard store pool.

elasticsearch.thread_pool.fetch_shard_store.queue

The number of queued threads in the fetch shard store pool.

elasticsearch.thread_pool.fetch_shard_store.rejected

The number of rejected threads in the fetch shard store pool.

elasticsearch.thread_pool.flush.active

The number of active threads in the flush queue.

elasticsearch.thread_pool.flush.queue

The number of queued threads in the flush pool.

elasticsearch.thread_pool.flush.threads

The total number of threads in the flush pool.

elasticsearch.thread_pool.flush.rejected

The number of rejected threads in the flush pool.

elasticsearch.thread_pool.force_merge.active

The number of active threads for force merge operations.

elasticsearch.thread_pool.force_merge.threads

The total number of threads for force merge operations.

elasticsearch.thread_pool.force_merge.queue

The number of queued threads for force merge operations.

elasticsearch.thread_pool.force_merge.rejected

The number of rejected threads for force merge operations.

elasticsearch.thread_pool.generic.active

The number of active threads in the generic pool.

elasticsearch.thread_pool.generic.queue

The number of queued threads in the generic pool.

elasticsearch.thread_pool.generic.threads

The total number of threads in the generic pool.

elasticsearch.thread_pool.generic.rejected

The number of rejected threads in the generic pool.

elasticsearch.thread_pool.get.active

The number of active threads in the get pool.

elasticsearch.thread_pool.get.queue

The number of queued threads in the get pool.

elasticsearch.thread_pool.get.threads

The total number of threads in the get pool.

elasticsearch.thread_pool.get.rejected

The number of rejected threads in the get pool.

elasticsearch.thread_pool.index.active

The number of active threads in the index pool.

elasticsearch.thread_pool.index.queue

The number of queued threads in the index pool.

elasticsearch.thread_pool.index.threads

The total number of threads in the index pool.

elasticsearch.thread_pool.index.rejected

The number of rejected threads in the index pool.

elasticsearch.thread_pool.listener.active

The number of active threads in the listener pool.

elasticsearch.thread_pool.listener.queue

The number of queued threads in the listener pool.

elasticsearch.thread_pool.listener.threads

The total number of threads in the listener pool.

elasticsearch.thread_pool.listener.rejected

The number of rejected threads in the listener pool.

elasticsearch.thread_pool.management.active

The number of active threads in the management pool.

elasticsearch.thread_pool.management.queue

The number of queued threads in the management pool.

elasticsearch.thread_pool.management.threads

The total number of threads in the management pool.

elasticsearch.thread_pool.management.rejected

The number of rejected threads in the management pool.

elasticsearch.thread_pool.merge.active

The number of active threads in the merge pool.

elasticsearch.thread_pool.merge.queue

The number of queued threads in the merge pool.

elasticsearch.thread_pool.merge.threads

The total number of threads in the merge pool.

elasticsearch.thread_pool.merge.rejected

The number of rejected threads in the merge pool.

elasticsearch.thread_pool.percolate.active

The number of active threads in the percolate pool.

elasticsearch.thread_pool.percolate.queue

The number of queued threads in the percolate pool.

elasticsearch.thread_pool.percolate.threads

The total number of threads in the percolate pool.

elasticsearch.thread_pool.percolate.rejected

The number of rejected threads in the percolate pool.

elasticsearch.thread_pool.refresh.active

The number of active threads in the refresh pool.

elasticsearch.thread_pool.refresh.queue

The number of queued threads in the refresh pool.

elasticsearch.thread_pool.refresh.threads

The total number of threads in the refresh pool.

elasticsearch.thread_pool.refresh.rejected

The number of rejected threads in the refresh pool.

elasticsearch.thread_pool.search.active

The number of active threads in the search pool.

elasticsearch.thread_pool.search.queue

The number of queued threads in the search pool.

elasticsearch.thread_pool.search.threads

The total number of threads in the search pool.

elasticsearch.thread_pool.search.rejected

The number of rejected threads in the search pool.

elasticsearch.thread_pool.snapshot.active

The number of active threads in the snapshot pool.

elasticsearch.thread_pool.snapshot.queue

The number of queued threads in the snapshot pool.

elasticsearch.thread_pool.snapshot.threads

The total number of threads in the snapshot pool.

elasticsearch.thread_pool.snapshot.rejected

The number of rejected threads in the snapshot pool.

elasticsearch.thread_pool.write.active

The number of active threads in the write pool.

elasticsearch.thread_pool.write.queue

The number of queued threads in the write pool.

elasticsearch.thread_pool.write.threads

The total number of threads in the write pool.

elasticsearch.thread_pool.write.rejected

The number of rejected threads in the write pool.

elasticsearch.transport.rx_count

The total number of packets received in cluster communication.

elasticsearch.transport.rx_size

The total size of data received in cluster communication.

elasticsearch.transport.server_open

The number of connections opened for cluster communication.

elasticsearch.transport.tx_count

The total number of packets sent in cluster communication.

elasticsearch.transport.tx_size

The total size of data sent in cluster communication.

elasticsearch.unassigned_shards

The number of shards that are unassigned to a node.

elasticsearch.delayed_unassigned_shards

The number of shards whose allocation has been delayed.

jvm.gc.collection_count

The total number of garbage collections run by the JVM.

jvm.gc.collection_time

The total time spent on garbage collection in the JVM.

jvm.gc.collectors.old.collection_time

The total time spent in major GCs in the JVM that collect old generation objects.

jvm.gc.collectors.old.count

The total count of major GCs in the JVM that collect old generation objects.

jvm.gc.collectors.young.collection_time

The total time spent in minor GCs in the JVM that collect young generation objects.

jvm.gc.collectors.young.count

The total count of minor GCs in the JVM that collect young generation objects.

jvm.gc.concurrent_mark_sweep.collection_time

The total time spent on “concurrent mark & sweep” GCs in the JVM.

jvm.gc.concurrent_mark_sweep.count

The total count of “concurrent mark & sweep” GCs in the JVM.

jvm.gc.par_new.collection_time

The total time spent on “parallel new” GCs in the JVM.

jvm.gc.par_new.count

The total count of “parallel new” GCs in the JVM.

jvm.mem.heap_committed

The amount of memory guaranteed to be available to the JVM heap.

jvm.mem.heap_in_use

The fraction of the JVM heap currently in use, expressed as a value between 0 and 1.

jvm.mem.heap_max

The maximum amount of memory that can be used by the JVM heap.

jvm.mem.heap_used

The amount of memory in bytes currently used by the JVM heap.

jvm.mem.non_heap_committed

The amount of memory guaranteed to be available to the JVM non-heap.

jvm.mem.non_heap_used

The amount of memory in bytes currently used by the JVM non-heap.

jvm.mem.pools.young.used

The amount of memory in bytes currently used by the Young Generation heap region.

jvm.mem.pools.young.max

The maximum amount of memory that can be used by the Young Generation heap region.

jvm.mem.pools.old.used

The amount of memory in bytes currently used by the Old Generation heap region.

jvm.mem.pools.old.max

The maximum amount of memory that can be used by the Old Generation heap region.

jvm.mem.pools.survivor.used

The amount of memory in bytes currently used by the Survivor Space.

jvm.mem.pools.survivor.max

The maximum amount of memory that can be used by the Survivor Space.

jvm.threads.count

The number of active threads in the JVM.

jvm.threads.peak_count

The peak number of threads used by the JVM.

elasticsearch.index.health

The status of the index.

elasticsearch.index.docs.count

The number of documents in the index.

elasticsearch.index.docs.deleted

The number of deleted documents in the index.

elasticsearch.index.primary_shards

The number of primary shards in the index.

elasticsearch.index.replica_shards

The number of replica shards in the index.

elasticsearch.index.primary_store_size

The store size of primary shards in the index.

elasticsearch.index.store_size

The store size of primary and replica shards in the index.

6.3.2.6 - etcd Metrics

See Application Integrations for more information.

etcd.leader.counts.fail

Rate of failed Raft RPC requests.

etcd.leader.counts.success

Rate of successful Raft RPC requests.

etcd.leader.latency.avg

Average latency to each peer in the cluster.

etcd.leader.latency.current

Current latency to each peer in the cluster.

etcd.leader.latency.max

Maximum latency to each peer in the cluster.

etcd.leader.latency.min

Minimum latency to each peer in the cluster.

etcd.leader.latency.stddev

Standard deviation latency to each peer in the cluster.

etcd.self.recv.appendrequest.count

Rate of append requests this node has processed.

etcd.self.recv.bandwidthrate

Rate of bytes received.

etcd.self.recv.pkgrate

Rate of packets received.

etcd.self.send.appendrequest.count

Rate of append requests this node has sent.

etcd.self.send.bandwidthrate

Rate of bytes sent.

etcd.self.send.pkgrate

Rate of packets sent.

etcd.store.compareanddelete.fail

Rate of failed compare-and-delete requests.

etcd.store.compareanddelete.success

Rate of successful compare-and-delete requests.

etcd.store.compareandswap.fail

Rate of failed compare-and-swap requests.

etcd.store.compareandswap.success

Rate of successful compare-and-swap requests.

etcd.store.create.fail

Rate of failed create requests.

etcd.store.create.success

Rate of successful create requests.

etcd.store.delete.fail

Rate of failed delete requests.

etcd.store.delete.success

Rate of successful delete requests.

etcd.store.expire.count

Rate of expired keys.

etcd.store.gets.fail

Rate of failed get requests.

etcd.store.gets.success

Rate of successful get requests.

etcd.store.sets.fail

Rate of failed set requests.

etcd.store.sets.success

Rate of successful set requests.

etcd.store.update.fail

Rate of failed update requests.

etcd.store.update.success

Rate of successful update requests.

etcd.store.watchers

The number of watchers.

6.3.2.7 - fluentd Metrics

See Application Integrations for more information.

fluentd.buffer_queue_length

The length of the buffer queue for this plugin.

fluentd.buffer_total_queued_size

The size of the buffer queue for this plugin.

fluentd.retry_count

The number of retries for this plugin.

6.3.2.8 - Go Metrics

See Application Integrations for more information.

go_expvar.memstats.alloc

The number of bytes allocated and not yet freed.

go_expvar.memstats.frees

The cumulative count of heap objects freed.

go_expvar.memstats.heap_alloc

The number of bytes of allocated heap objects.

go_expvar.memstats.heap_idle

The number of bytes in idle spans.

go_expvar.memstats.heap_inuse

The number of bytes in non-idle spans.

go_expvar.memstats.heap_objects

The total number of allocated objects.

go_expvar.memstats.heap_released

The number of bytes released to the OS.

go_expvar.memstats.heap_sys

The number of bytes obtained from the system.

go_expvar.memstats.lookups

The number of pointer lookups.

go_expvar.memstats.mallocs

The number of mallocs.

go_expvar.memstats.num_gc

The number of garbage collections.

go_expvar.memstats.pause_ns.avg

The average of recent GC pause durations.

go_expvar.memstats.pause_ns.count

The number of submitted GC pause durations.

go_expvar.memstats.pause_ns.max

The max GC pause duration.

go_expvar.memstats.pause_ns.median

The median GC pause duration.

go_expvar.memstats.pause_total_ns

The total GC pause duration over the lifetime of process.
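The pause_ns rollups above can be derived from the raw GC pause samples that the Go runtime exposes (via `runtime.MemStats.PauseNs`). A toy sketch, not the agent's actual implementation; the dictionary keys simply mirror the metric suffixes:

```python
import statistics

def pause_summary(pause_ns: list[int]) -> dict:
    """Summarize recent GC pause durations the way the pause_ns.* metrics do."""
    return {
        "count": len(pause_ns),                 # pause_ns.count
        "avg": sum(pause_ns) / len(pause_ns),   # pause_ns.avg
        "max": max(pause_ns),                   # pause_ns.max
        "median": statistics.median(pause_ns),  # pause_ns.median
    }
```

For example, `pause_summary([100, 200, 300])` gives a count of 3, an average of 200.0, a max of 300, and a median of 200.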

go_expvar.memstats.total_alloc

The bytes allocated (even if freed).

6.3.2.9 - HTTP Metrics

See Application Integrations for more information.

http.ssl.days_left

The number of days until the SSL certificate expires.
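To make the metric concrete, the expiry math can be sketched with the standard library. This is a hypothetical helper, not the agent's actual check:

```python
import datetime
import socket
import ssl

def days_left(not_after: str, now: datetime.datetime) -> int:
    """Days until expiry, given a certificate's notAfter string (OpenSSL format)."""
    expires = datetime.datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    return (expires - now).days

def ssl_days_left(host: str, port: int = 443) -> int:
    """Fetch the peer certificate over TLS and report the days until it expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return days_left(not_after, datetime.datetime.utcnow())
```

A certificate with `notAfter` of "Jun 1 12:00:00 2030 GMT" checked at noon on May 1, 2030 yields 31 days left.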

network.http.response_time

The response time of an HTTP request to a specified URL.

6.3.2.10 - HAProxy Metrics

See Application Integrations for more information.

haproxy.backend_hosts

The number of backend hosts.

haproxy.backend.bytes.in_rate

The rate of bytes in on backend hosts.

haproxy.backend.bytes.out_rate

The rate of bytes out on backend hosts.

haproxy.backend.connect.time

The average connect time over the last 1024 requests.

haproxy.backend.denied.req_rate

The rate of requests denied due to security concerns.

haproxy.backend.denied.resp_rate

The rate of responses denied due to security concerns.

haproxy.backend.errors.con_rate

The rate of requests that encountered an error trying to connect to a backend server.

haproxy.backend.errors.resp_rate

The rate of responses aborted due to error.

haproxy.backend.queue.current

The number of requests without an assigned backend.

haproxy.backend.queue.time

The average queue time over the last 1024 requests.

haproxy.backend.response.1xx

The backend HTTP responses with 1xx code.

haproxy.backend.response.2xx

The backend HTTP responses with 2xx code.

haproxy.backend.response.3xx

The backend HTTP responses with 3xx code.

haproxy.backend.response.4xx

The backend HTTP responses with 4xx code.

haproxy.backend.response.5xx

The backend HTTP responses with 5xx code.

haproxy.backend.response.other

The backend HTTP responses with another code (protocol error).

haproxy.backend.response.time

The average response time over the last 1024 requests (0 for TCP).

haproxy.backend.session.current

The number of active backend sessions.

haproxy.backend.session.limit

The configured backend session limit.

haproxy.backend.session.pct

The percentage of sessions in use. The formula used for this metric is backend.session.current / backend.session.limit * 100.
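The formula reads directly as a small helper (the function name is illustrative only):

```python
def session_pct(current: int, limit: int) -> float:
    """backend.session.current / backend.session.limit * 100.

    Returns 0.0 when no session limit is configured, since the
    percentage is undefined without a ceiling.
    """
    if limit <= 0:
        return 0.0
    return current / limit * 100
```

For example, 150 active sessions against a limit of 200 gives 75.0. The same arithmetic applies to haproxy.frontend.session.pct with the frontend counters.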

haproxy.backend.session.rate

The number of backend sessions created per second.

haproxy.backend.session.time

The average total session time over the last 1024 requests.

haproxy.backend.uptime

The number of seconds since the last UP<->DOWN transition.

haproxy.backend.warnings.redis_rate

The rate at which requests were redispatched to another server.

haproxy.backend.warnings.retr_rate

The rate at which connections to a server were retried.

haproxy.count_per_status

The number of hosts by status (UP/DOWN/NOLB/MAINT).

haproxy.frontend.bytes.in_rate

The rate of bytes in on frontend hosts.

haproxy.frontend.bytes.out_rate

The rate of bytes out on frontend hosts.

haproxy.frontend.denied.req_rate

The rate of requests denied due to security concerns.

haproxy.frontend.denied.resp_rate

The rate of responses denied due to security concerns.

haproxy.frontend.errors.req_rate

The rate of request errors.

haproxy.frontend.requests.rate

The number of HTTP requests per second.

haproxy.frontend.response.1xx

The frontend HTTP responses with 1xx code.

haproxy.frontend.response.2xx

The frontend HTTP responses with 2xx code.

haproxy.frontend.response.3xx

The frontend HTTP responses with 3xx code.

haproxy.frontend.response.4xx

The frontend HTTP responses with 4xx code.

haproxy.frontend.response.5xx

The frontend HTTP responses with 5xx code.

haproxy.frontend.response.other

The frontend HTTP responses with another code (protocol error).

haproxy.frontend.session.current

The number of active frontend sessions.

haproxy.frontend.session.limit

The configured frontend session limit.

haproxy.frontend.session.pct

The percentage of sessions in use. The formula used for this metric is frontend.session.current / frontend.session.limit * 100.

haproxy.frontend.session.rate

The number of frontend sessions created per second.

Agent 9.6.0 Additional HAProxy Metrics

  • haproxy.backend.requests.tot_rate

    Rate of total number of HTTP requests

  • haproxy.frontend.connections.rate

    Number of connections per second

  • haproxy.frontend.connections.tot_rate

    Rate of total number of connections

  • haproxy.frontend.requests.intercepted

    Number of intercepted requests per second

  • haproxy.frontend.requests.tot_rate

    Rate of total number of HTTP requests

6.3.2.11 - Jenkins Metrics

See Application Integrations for more information.

jenkins.job.duration

The duration of a job, measured in seconds.

jenkins.job.success

The status of a successful job.

jenkins.job.failure

The status of a failed job.

6.3.2.12 - Lighttpd Metrics

See Application Integrations for more information.

lighttpd.net.bytes

The total number of bytes sent and received.

lighttpd.net.bytes_per_s

The number of bytes sent and received per second.

lighttpd.net.hits

The total number of hits since the start.

lighttpd.net.request_per_s

The number of requests per second.

lighttpd.performance.busy_servers

The number of active connections.

lighttpd.performance.idle_server

The number of idle connections.

lighttpd.performance.uptime

The amount of time the server has been up and running.

6.3.2.13 - Memcached Metrics

See Application Integrations for more information.

memcache.avg_item_size

The average size of an item.

memcache.bytes

The current number of bytes used by this server to store items.

memcache.bytes_read_rate

The rate of bytes read from the network by this server.

memcache.bytes_written_rate

The rate of bytes written to the network by this server.

memcache.cas_badval_rate

The rate at which keys are compared and swapped where the comparison (original) value did not match the supplied value.

memcache.cas_hits_rate

The rate at which keys are compared and swapped and found present.

memcache.cas_misses_rate

The rate at which keys are compared and swapped and not found present.
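The three CAS counters above partition every compare-and-swap attempt by outcome. A toy model (not the real memcached protocol) showing which counter each outcome increments:

```python
def cas_outcome(store: dict, key: str, expected_cas: int, value) -> str:
    """Classify a CAS attempt; `store` maps key -> (cas_id, value)."""
    if key not in store:
        return "cas_misses"           # key absent -> memcache.cas_misses_rate
    cas_id, _ = store[key]
    if cas_id != expected_cas:
        return "cas_badval"           # stale CAS id -> memcache.cas_badval_rate
    store[key] = (cas_id + 1, value)  # a successful swap bumps the CAS id
    return "cas_hits"                 # -> memcache.cas_hits_rate
```

A client that read an item at CAS id 3 succeeds if it writes back before anyone else does; a second writer still holding id 3 lands in cas_badval.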

memcache.cmd_flush_rate

The rate of flush_all commands.

memcache.cmd_get_rate

The rate of get commands.

memcache.cmd_set_rate

The rate of set commands.

memcache.connection_structures

The number of connection structures allocated by the server.

memcache.curr_connections

The number of open connections to this server.

memcache.curr_items

The current number of items stored by the server.

memcache.delete_hits_rate

The rate at which delete commands result in items being removed.

memcache.delete_misses_rate

The rate at which delete commands result in no items being removed.

memcache.evictions_rate

The rate at which valid items are removed from cache to free memory for new items.

memcache.fill_percent

The amount of memory being used by the server for storing items as a percentage of the max allowed.

memcache.get_hit_percent

The percentage of requested keys that are found present since the start of the Memcached server.

memcache.get_hits_rate

The rate at which keys are requested and found present.

memcache.get_misses_rate

The rate at which keys are requested and not found.

memcache.items.age

The age of the oldest item in the LRU.

memcache.items.crawler_reclaimed_rate

The rate at which items are freed by the LRU crawler.

memcache.items.direct_reclaims_rate

The rate at which worker threads had to directly pull LRU tails to find memory for a new item.

memcache.items.evicted_nonzero_rate

The rate at which items with an explicit nonzero expire time had to be evicted from the LRU before expiring.

memcache.items.evicted_rate

The rate at which items had to be evicted from the LRU before expiring.

memcache.items.evicted_time

The number of seconds since the last access for the most recent item evicted from this class.

memcache.items.evicted_unfetched_rate

The rate at which valid items were evicted from the LRU without ever being touched after being set.

memcache.items.expired_unfetched_rate

The rate at which expired items were reclaimed from the LRU without ever being touched after being set.

memcache.items.lrutail_reflocked_rate

The rate at which items were found to be refcount-locked in the LRU tail.

memcache.items.moves_to_cold_rate

The rate at which items were moved from HOT or WARM into COLD.

memcache.items.moves_to_warm_rate

The rate at which items were moved from COLD to WARM.

memcache.items.moves_within_lru_rate

The rate at which active items were bumped within HOT or WARM.

memcache.items.number

The number of items presently stored in this slab class.

memcache.items.number_cold

The number of items presently stored in the COLD LRU.

memcache.items.number_hot

The number of items presently stored in the HOT LRU.

memcache.items.number_noexp

The number of items presently stored in the NOEXP class.

memcache.items.number_warm

The number of items presently stored in the WARM LRU.

memcache.items.outofmemory_rate

The rate at which the underlying slab class was unable to store a new item.

memcache.items.reclaimed_rate

The rate at which entries were stored using memory from an expired entry.

memcache.items.tailrepairs_rate

The rate at which Memcached self-healed a slab with a refcount leak.

memcache.limit_maxbytes

The number of bytes this server is allowed to use for storage.

memcache.listen_disabled_num_rate

The rate at which the server has reached the max connection limit.

memcache.pointer_size

The default size of pointers on the host OS (generally 32 or 64).

memcache.rusage_system_rate

The fraction of time the CPU spent executing kernel code on behalf of this server process.

memcache.rusage_user_rate

The fraction of time the CPU spent executing user code for this server process.

memcache.slabs.active_slabs

The total number of slab classes allocated.

memcache.slabs.cas_badval_rate

The rate at which CAS commands failed to modify a value due to a bad CAS ID.

memcache.slabs.cas_hits_rate

The rate at which CAS commands modified this slab class.

memcache.slabs.chunk_size

The amount of space each chunk uses.

memcache.slabs.chunks_per_page

The number of chunks that exist within one page.

memcache.slabs.cmd_set_rate

The rate at which set requests stored data in this slab class.

memcache.slabs.decr_hits_rate

The rate at which decrs commands modified this slab class.

memcache.slabs.delete_hits_rate

The rate at which delete commands succeeded in this slab class.

memcache.slabs.free_chunks

The number of chunks not yet allocated to items or freed via delete.

memcache.slabs.free_chunks_end

The number of free chunks at the end of the last allocated page.

memcache.slabs.get_hits_rate

The rate at which get requests were serviced by this slab class.

memcache.slabs.incr_hits_rate

The rate at which incrs commands modified this slab class.

memcache.slabs.mem_requested

The number of bytes requested to be stored in this slab.

memcache.slabs.total_chunks

The total number of chunks allocated to the slab class.

memcache.slabs.total_malloced

The total amount of memory allocated to slab pages.

memcache.slabs.total_pages

The total number of pages allocated to the slab class.

memcache.slabs.touch_hits_rate

The rate of touches serviced by this slab class.

memcache.slabs.used_chunks

The number of chunks that have been allocated to items.

memcache.slabs.used_chunks_rate

The rate at which chunks have been allocated to items.

memcache.threads

The number of threads used by the current Memcached server process.

memcache.total_connections_rate

The rate at which connections to this server are opened.

memcache.total_items

The total number of items stored by this server since it started.

memcache.uptime

The number of seconds this server has been running.

6.3.2.14 - Mesos/Marathon Metrics

Contents

6.3.2.14.1 - Mesos Agent Metrics

See Application Integrations for more information.

mesos.slave.cpus_percent

The percentage of CPUs allocated to the slave.

mesos.slave.cpus_total

The total number of CPUs.

mesos.slave.cpus_used

The number of CPUs allocated to the slave.

mesos.slave.disk_percent

The percentage of disk space allocated to the slave.

mesos.slave.disk_total

The total disk space available.

mesos.slave.disk_used

The amount of disk space allocated to the slave.

mesos.slave.executors_registering

The number of executors registering.

mesos.slave.executors_running

The number of executors currently running.

mesos.slave.executors_terminated

The number of terminated executors.

mesos.slave.executors_terminating

The number of terminating executors.

mesos.slave.frameworks_active

The number of active frameworks.

mesos.slave.invalid_framework_messages

The number of invalid framework messages.

mesos.slave.invalid_status_updates

The number of invalid status updates.

mesos.slave.mem_percent

The percentage of memory allocated to the slave.

mesos.slave.mem_total

The total memory available.

mesos.slave.mem_used

The amount of memory allocated to the slave.

mesos.slave.recovery_errors

The number of errors encountered during slave recovery.

mesos.slave.tasks_failed

The number of failed tasks.

mesos.slave.tasks_finished

The number of finished tasks.

mesos.slave.tasks_killed

The number of killed tasks.

mesos.slave.tasks_lost

The number of lost tasks.

mesos.slave.tasks_running

The number of running tasks.

mesos.slave.tasks_staging

The number of staging tasks.

mesos.slave.tasks_starting

The number of starting tasks.

mesos.slave.valid_framework_messages

The number of valid framework messages.

mesos.slave.valid_status_updates

The number of valid status updates.

mesos.state.task.cpu

The CPU resources allocated to the task.

mesos.state.task.disk

The disk space available for the task.

mesos.state.task.mem

The amount of memory used by the task.

mesos.stats.registered

Indicates whether this slave is registered with a master.

mesos.stats.system.cpus_total

The total number of CPUs available.

mesos.stats.system.load_1min

The average load for the last minute.

mesos.stats.system.load_5min

The average load for the last five minutes.

mesos.stats.system.load_15min

The average load for the last 15 minutes.

mesos.stats.system.mem_free_bytes

The amount of free memory.

mesos.stats.system.mem_total_bytes

The total amount of memory.

mesos.stats.uptime_secs

The current uptime for the slave.

6.3.2.14.2 - Mesos Master Metrics

See Application Integrations for more information.

mesos.cluster.cpus_percent

The percentage of CPUs allocated to the cluster.

mesos.cluster.cpus_total

The total number of CPUs.

mesos.cluster.cpus_used

The number of CPUs used by the cluster.

mesos.cluster.disk_percent

The percentage of disk space allocated to the cluster.

mesos.cluster.disk_total

The total amount of disk space.

mesos.cluster.disk_used

The amount of disk space used by the cluster.

mesos.cluster.dropped_messages

The number of dropped messages.

mesos.cluster.event_queue_dispatches

The number of dispatches in the event queue.

mesos.cluster.event_queue_http_requests

The number of HTTP requests in the event queue.

mesos.cluster.event_queue_messages

The number of messages in the event queue.

mesos.cluster.frameworks_active

The number of active frameworks.

mesos.cluster.frameworks_connected

The number of connected frameworks.

mesos.cluster.frameworks_disconnected

The number of disconnected frameworks.

mesos.cluster.frameworks_inactive

The number of inactive frameworks.

mesos.cluster.gpus_total

The total number of GPUs.

mesos.cluster.invalid_framework_to_executor_messages

The number of invalid messages between the framework and the executor.

mesos.cluster.invalid_status_update_acknowledgements

The number of invalid status update acknowledgements.

mesos.cluster.invalid_status_updates

The number of invalid status updates.

mesos.cluster.mem_percent

The percentage of memory allocated to the cluster.

mesos.cluster.mem_total

The total amount of memory available.

mesos.cluster.mem_used

The amount of memory the cluster is using.

mesos.cluster.outstanding_offers

The number of outstanding resource offers.

mesos.cluster.slave_registrations

The number of slaves able to rejoin the cluster after a disconnect.

mesos.cluster.slave_removals

The number of slaves that have been removed for any reason, including maintenance.

mesos.cluster.slave_reregistrations

The number of slaves that have re-registered.

mesos.cluster.slave_shutdowns_canceled

The number of scheduled slave shutdowns that were cancelled.

mesos.cluster.slave_shutdowns_scheduled

The number of slaves that have failed health checks and are scheduled for removal.

mesos.cluster.slaves_active

The number of active slaves.

mesos.cluster.slaves_connected

The number of connected slaves.

mesos.cluster.slaves_disconnected

The number of disconnected slaves.

mesos.cluster.slaves_inactive

The number of inactive slaves.

mesos.cluster.tasks_error

The number of cluster tasks that resulted in an error.

mesos.cluster.tasks_failed

The number of failed cluster tasks.

mesos.cluster.tasks_finished

The number of completed cluster tasks.

mesos.cluster.tasks_killed

The number of killed cluster tasks.

mesos.cluster.tasks_lost

The number of lost cluster tasks.

mesos.cluster.tasks_running

The number of cluster tasks currently running.

mesos.cluster.tasks_staging

The number of cluster tasks currently staging.

mesos.cluster.tasks_starting

The number of cluster tasks starting.

mesos.cluster.valid_framework_to_executor_messages

The number of valid messages between the framework and the executor.

mesos.cluster.valid_status_update_acknowledgements

The number of valid status update acknowledgements.

mesos.cluster.valid_status_updates

The number of valid status updates.

mesos.framework.cpu

The CPU of the Mesos framework.

mesos.framework.disk

The total disk space of the Mesos framework, measured in mebibytes.

mesos.framework.mem

The total memory of the Mesos framework, measured in mebibytes.

mesos.registrar.queued_operations

The number of queued operations.

mesos.registrar.registry_size_bytes

The size of the Mesos registry in bytes.

mesos.registrar.state_fetch_ms

The Mesos registry’s read latency, in milliseconds.

mesos.registrar.state_store_ms

The Mesos registry’s write latency, in milliseconds.

mesos.registrar.state_store_ms.count

The number of Mesos registry writes.

mesos.registrar.state_store_ms.max

The maximum write latency for the registry, in milliseconds.

mesos.registrar.state_store_ms.min

The minimum write latency for the registry, in milliseconds.

mesos.registrar.state_store_ms.p50

The median registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p90

The 90th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p95

The 95th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p99

The 99th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p999

The 99.9th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p9999

The 99.99th percentile registry write latency, in milliseconds.

mesos.role.cpu

The CPU capacity of the configured role.

mesos.role.disk

The total disk space available to the Mesos role, in mebibytes.

mesos.role.mem

The total memory available to the Mesos role, in mebibytes.

mesos.stats.elected

Indicates whether this is the elected master.

mesos.stats.system.cpus_total

The total number of CPUs in the system.

mesos.stats.system.load_1min

The average load for the last minute.

mesos.stats.system.load_5min

The average load for the last five minutes.

mesos.stats.system.load_15min

The average load for the last fifteen minutes.

mesos.stats.system.mem_free_bytes

The total amount of free system memory, in bytes.

mesos.stats.system.mem_total_bytes

The total cluster memory in bytes.

mesos.stats.uptime_secs

The current uptime of the cluster.

6.3.2.14.3 - Marathon Metrics

See Application Integrations for more information.

marathon.apps

The total number of applications.

marathon.backoffFactor

The multiplication factor for the delay between each consecutive failed task. This value is multiplied by the value of marathon.backoffSeconds each time the task fails until the maximum delay is reached, or the task succeeds.

marathon.backoffSeconds

The period of time between attempts to run a failed task. This value is multiplied by marathon.backoffFactor for each consecutive task failure, until either the task succeeds or the maximum delay is reached.
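Taken together, the two settings above describe a geometric backoff. A sketch of the resulting delay; the `max_delay` cap stands in for Marathon's configured maximum delay and is an assumption here:

```python
def launch_delay(backoff_seconds: float, backoff_factor: float,
                 consecutive_failures: int, max_delay: float = 3600.0) -> float:
    """Delay before relaunching a task after repeated failures."""
    delay = backoff_seconds * (backoff_factor ** consecutive_failures)
    return min(delay, max_delay)  # capped at the configured maximum delay
```

With backoff_seconds=1 and backoff_factor=2, the third consecutive failure waits 8.0 seconds; after enough failures the delay sits at the cap until the task succeeds.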

marathon.cpus

The number of CPUs configured for each application instance.

marathon.disk

The amount of disk space configured for each application instance.

marathon.instances

The number of instances of a specific application.

marathon.mem

The total amount of configured memory for each instance of a specific application.

marathon.tasksRunning

The number of tasks running for a specific application.

marathon.tasksStaged

The number of tasks staged for a specific application.

6.3.2.15 - MongoDB Metrics

See Application Integrations for more information.

Metrics Introduced with Agent v9.7.0

The following metrics are supported by Sysdig Agent v9.7.0 and above.

| Metric Name | Description |
|---|---|
| mongodb.tcmalloc.generic.current_allocated_bytes | The number of bytes used by the application. |
| mongodb.tcmalloc.generic.heap_size | Bytes of system memory reserved by TCMalloc. |
| mongodb.tcmalloc.tcmalloc.aggressive_memory_decommit | Status of aggressive memory de-commit mode. |
| mongodb.tcmalloc.tcmalloc.central_cache_free_bytes | The number of free bytes in the central cache. |
| mongodb.tcmalloc.tcmalloc.current_total_thread_cache_bytes | The number of bytes used across all thread caches. |
| mongodb.tcmalloc.tcmalloc.max_total_thread_cache_bytes | The upper limit on the total number of bytes stored across all per-thread caches. |
| mongodb.tcmalloc.tcmalloc.pageheap_free_bytes | The number of bytes in free mapped pages in the page heap. |
| mongodb.tcmalloc.tcmalloc.pageheap_unmapped_bytes | The number of bytes in free unmapped pages in the page heap. |
| mongodb.tcmalloc.tcmalloc.spinlock_total_delay_ns | The spinlock delay time. |
| mongodb.tcmalloc.tcmalloc.thread_cache_free_bytes | The number of free bytes in thread caches. |
| mongodb.tcmalloc.tcmalloc.transfer_cache_free_bytes | The number of free bytes waiting to be transferred between the central cache and a thread cache. |

mongodb.asserts.msgps

Number of message assertions raised per second.

mongodb.asserts.regularps

Number of regular assertions raised per second.

mongodb.asserts.rolloversps

Number of times per second that the assertion counters roll over. The counters roll over to zero after every 2^30 assertions.

mongodb.asserts.userps

Number of user assertions raised per second.

mongodb.asserts.warningps

Number of warnings raised per second.

mongodb.backgroundflushing.average_ms

Average time for each flush to disk.

mongodb.backgroundflushing.flushesps

Number of times the database has flushed all writes to disk.

mongodb.backgroundflushing.last_ms

Amount of time that the last flush operation took to complete.

mongodb.backgroundflushing.total_ms

Total amount of time that the `mongod` process has spent writing (that is, flushing) data to disk.

mongodb.connections.available

Number of unused available incoming connections the database can provide.

mongodb.connections.current

Number of connections to the database server from clients.

mongodb.connections.totalcreated

Total number of connections created.

mongodb.cursors.timedout

Total number of cursors that have timed out since the server process started.

mongodb.cursors.totalopen

Number of cursors that MongoDB is maintaining for clients.

mongodb.dbs

Total number of existing databases.

mongodb.dur.commits

Number of transactions written to the journal during the last journal group commit interval.

mongodb.dur.commitsinwritelock

Count of the commits that occurred while a write lock was held.

mongodb.dur.compression

Compression ratio of the data written to the journal.

mongodb.dur.earlycommits

Number of times MongoDB requested a commit before the scheduled journal group commit interval.

mongodb.dur.journaledmb

Amount of data written to journal during the last journal group commit interval.

mongodb.dur.timems.commits

Amount of time spent for commits.

mongodb.dur.timems.commitsinwritelock

Amount of time spent for commits that occurred while a write lock was held.

mongodb.dur.timems.dt

Amount of time over which MongoDB collected the `dur.timeMS` data.

mongodb.dur.timems.preplogbuffer

Amount of time spent preparing to write to the journal.

mongodb.dur.timems.remapprivateview

Amount of time spent remapping copy-on-write memory mapped views.

mongodb.dur.timems.writetodatafiles

Amount of time spent writing to data files after journaling.

mongodb.dur.timems.writetojournal

Amount of time spent writing to the journal.

mongodb.dur.writetodatafilesmb

Amount of data written from journal to the data files during the last journal group commit interval.

mongodb.extra_info.page_faultsps

Number of page faults per second that require disk operations.

mongodb.fsynclocked

Number of fsyncLock operations performed on a mongod instance.

mongodb.globallock.activeclients.readers

Count of the active client connections performing read operations.

mongodb.globallock.activeclients.total

Total number of active client connections to the database.

mongodb.globallock.activeclients.writers

Count of active client connections performing write operations.

mongodb.globallock.currentqueue.readers

Number of operations that are currently queued and waiting for the read lock.

mongodb.globallock.currentqueue.total

Total number of operations queued waiting for the lock.

mongodb.globallock.currentqueue.writers

Number of operations that are currently queued and waiting for the write lock.

mongodb.globallock.locktime

The amount of time the globalLock has been held since the database last started.

mongodb.globallock.ratio

Ratio of the time that the globalLock has been held to the total time since it was created.

mongodb.globallock.totaltime

Time since the database last started and created the global lock.
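The three globalLock timing metrics above are related by a simple division; a sketch of the relation (Sysdig reports the ratio directly, so this is illustrative):

```python
def globallock_ratio(locktime: float, totaltime: float) -> float:
    """mongodb.globallock.ratio = locktime / totaltime.

    Fraction of time since startup that the global lock was held;
    returns 0.0 before any time has elapsed.
    """
    if totaltime <= 0:
        return 0.0
    return locktime / totaltime
```

For example, 250 microseconds of lock time over 1000 microseconds of uptime yields a ratio of 0.25.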

mongodb.indexcounters.accessesps

Number of times that operations have accessed indexes per second.

mongodb.indexcounters.hitsps

Number of times per second that an index has been accessed and mongod is able to return the index from memory.

mongodb.indexcounters.missesps

Number of times per second that an operation attempted to access an index that was not in memory.

mongodb.indexcounters.missratio

Ratio of index hits to misses.

mongodb.indexcounters.resetsps

Number of times per second the index counters have been reset.

mongodb.locks.collection.acquirecount.exclusiveps

Number of times the collection lock type was acquired in the Exclusive (X) mode.

mongodb.locks.collection.acquirecount.intent_exclusiveps

Number of times the collection lock type was acquired in the Intent Exclusive (IX) mode.

mongodb.locks.collection.acquirecount.intent_sharedps

Number of times the collection lock type was acquired in the Intent Shared (IS) mode.

mongodb.locks.collection.acquirecount.sharedps

Number of times the collection lock type was acquired in the Shared (S) mode.

mongodb.locks.collection.acquirewaitcount.exclusiveps

Number of times the collection lock type acquisition in the Exclusive (X) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.collection.acquirewaitcount.sharedps

Number of times the collection lock type acquisition in the Shared (S) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.collection.timeacquiringmicros.exclusiveps

Wait time for the collection lock type acquisitions in the Exclusive (X) mode.

mongodb.locks.collection.timeacquiringmicros.sharedps

Wait time for the collection lock type acquisitions in the Shared (S) mode.

mongodb.locks.database.acquirecount.exclusiveps

Number of times the database lock type was acquired in the Exclusive (X) mode.

mongodb.locks.database.acquirecount.intent_exclusiveps

Number of times the database lock type was acquired in the Intent Exclusive (IX) mode.

mongodb.locks.database.acquirecount.intent_sharedps

Number of times the database lock type was acquired in the Intent Shared (IS) mode.

mongodb.locks.database.acquirecount.sharedps

Number of times the database lock type was acquired in the Shared (S) mode.

mongodb.locks.database.acquirewaitcount.exclusiveps

Number of times the database lock type acquisition in the Exclusive (X) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.database.acquirewaitcount.intent_exclusiveps

Number of times the database lock type acquisition in the Intent Exclusive (IX) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.database.acquirewaitcount.intent_sharedps

Number of times the database lock type acquisition in the Intent Shared (IS) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.database.acquirewaitcount.sharedps

Number of times the database lock type acquisition in the Shared (S) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.database.timeacquiringmicros.exclusiveps

Wait time for the database lock type acquisitions in the Exclusive (X) mode.

mongodb.locks.database.timeacquiringmicros.intent_exclusiveps

Wait time for the database lock type acquisitions in the Intent Exclusive (IX) mode.

mongodb.locks.database.timeacquiringmicros.intent_sharedps

Wait time for the database lock type acquisitions in the Intent Shared (IS) mode.

mongodb.locks.database.timeacquiringmicros.sharedps

Wait time for the database lock type acquisitions in the Shared (S) mode.

mongodb.locks.global.acquirecount.exclusiveps

Number of times the global lock type was acquired in the Exclusive (X) mode.

mongodb.locks.global.acquirecount.intent_exclusiveps

Number of times the global lock type was acquired in the Intent Exclusive (IX) mode.

mongodb.locks.global.acquirecount.intent_sharedps

Number of times the global lock type was acquired in the Intent Shared (IS) mode.

mongodb.locks.global.acquirecount.sharedps

Number of times the global lock type was acquired in the Shared (S) mode.

mongodb.locks.global.acquirewaitcount.exclusiveps

Number of times the global lock type acquisition in the Exclusive (X) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.global.acquirewaitcount.intent_exclusiveps

Number of times the global lock type acquisition in the Intent Exclusive (IX) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.global.acquirewaitcount.intent_sharedps

Number of times the global lock type acquisition in the Intent Shared (IS) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.global.acquirewaitcount.sharedps

Number of times the global lock type acquisition in the Shared (S) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.global.timeacquiringmicros.exclusiveps

Wait time for the global lock type acquisitions in the Exclusive (X) mode.

mongodb.locks.global.timeacquiringmicros.intent_exclusiveps

Wait time for the global lock type acquisitions in the Intent Exclusive (IX) mode.

mongodb.locks.global.timeacquiringmicros.intent_sharedps

Wait time for the global lock type acquisitions in the Intent Shared (IS) mode.

mongodb.locks.global.timeacquiringmicros.sharedps

Wait time for the global lock type acquisitions in the Shared (S) mode.

mongodb.locks.metadata.acquirecount.exclusiveps

Number of times the metadata lock type was acquired in the Exclusive (X) mode.

mongodb.locks.metadata.acquirecount.sharedps

Number of times the metadata lock type was acquired in the Shared (S) mode.

mongodb.locks.mmapv1journal.acquirecount.intent_exclusiveps

Number of times the MMAPv1 storage engine lock type was acquired in the Intent Exclusive (IX) mode.

mongodb.locks.mmapv1journal.acquirecount.intent_sharedps

Number of times the MMAPv1 storage engine lock type was acquired in the Intent Shared (IS) mode.

mongodb.locks.mmapv1journal.acquirewaitcount.intent_exclusiveps

Number of times the MMAPv1 storage engine lock type acquisition in the Intent Exclusive (IX) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.mmapv1journal.acquirewaitcount.intent_sharedps

Number of times the MMAPv1 storage engine lock type acquisition in the Intent Shared (IS) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.mmapv1journal.timeacquiringmicros.intent_exclusiveps

Wait time for the MMAPv1 storage engine lock type acquisitions in the Intent Exclusive (IX) mode.

mongodb.locks.mmapv1journal.timeacquiringmicros.intent_sharedps

Wait time for the MMAPv1 storage engine lock type acquisitions in the Intent Shared (IS) mode.

mongodb.locks.oplog.acquirecount.intent_exclusiveps

Number of times the oplog lock type was acquired in the Intent Exclusive (IX) mode.

mongodb.locks.oplog.acquirecount.sharedps

Number of times the oplog lock type was acquired in the Shared (S) mode.

mongodb.locks.oplog.acquirewaitcount.intent_exclusiveps

Number of times the oplog lock type acquisition in the Intent Exclusive (IX) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.oplog.acquirewaitcount.sharedps

Number of times the oplog lock type acquisition in the Shared (S) mode encountered waits because the locks were held in a conflicting mode.

mongodb.locks.oplog.timeacquiringmicros.intent_exclusiveps

Wait time for the oplog lock type acquisitions in the Intent Exclusive (IX) mode.

mongodb.locks.oplog.timeacquiringmicros.sharedps

Wait time for the oplog lock type acquisitions in the Shared (S) mode.
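
Taken together, each acquirecount/acquirewaitcount pair above supports a useful derived ratio: the fraction of lock acquisitions that had to wait. A minimal Python sketch, with made-up sample values rather than real agent output:

```python
# Sketch: for each lock mode, the fraction of acquisitions that had to wait,
# computed from the acquirecount / acquirewaitcount counter pairs.
# Keys mirror the metric name suffixes; the sample values are made up.

def lock_wait_ratios(acquire: dict, wait: dict) -> dict:
    """Fraction of acquisitions that encountered a wait, per lock mode."""
    return {
        mode: (wait.get(mode, 0) / count if count else 0.0)
        for mode, count in acquire.items()
    }

acquire = {"exclusiveps": 200.0, "sharedps": 50.0, "intent_sharedps": 1000.0}
wait = {"exclusiveps": 10.0, "sharedps": 0.0}
print(lock_wait_ratios(acquire, wait))
# {'exclusiveps': 0.05, 'sharedps': 0.0, 'intent_sharedps': 0.0}
```

A persistently high ratio in a given mode points at lock contention for that resource level.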

mongodb.mem.bits

Size of the in-memory storage engine.

mongodb.mem.mapped

Amount of mapped memory by the database.

mongodb.mem.mappedwithjournal

The amount of mapped memory, including the memory used for journaling.

mongodb.mem.resident

Amount of memory currently used by the database process.

mongodb.mem.virtual

Amount of virtual memory used by the mongod process.

mongodb.metrics.commands.count.failed

Number of times count failed.

mongodb.metrics.commands.count.total

Number of times count executed.

mongodb.metrics.commands.createIndexes.failed

Number of times createIndexes failed.

mongodb.metrics.commands.createIndexes.total

Number of times createIndexes executed.

mongodb.metrics.commands.delete.failed

Number of times delete failed.

mongodb.metrics.commands.delete.total

Number of times delete executed.

mongodb.metrics.commands.eval.failed

Number of times eval failed.

mongodb.metrics.commands.eval.total

Number of times eval executed.

mongodb.metrics.commands.findAndModify.failed

Number of times findAndModify failed.

mongodb.metrics.commands.findAndModify.total

Number of times findAndModify executed.

mongodb.metrics.commands.insert.failed

Number of times insert failed.

mongodb.metrics.commands.insert.total

Number of times insert executed.

mongodb.metrics.commands.update.failed

Number of times update failed.

mongodb.metrics.commands.update.total

Number of times update executed.

mongodb.metrics.cursor.open.notimeout

Number of open cursors with the option `DBQuery.Option.noTimeout` set to prevent timeout after a period of inactivity.

mongodb.metrics.cursor.open.pinned

Number of pinned open cursors.

mongodb.metrics.cursor.open.total

Number of cursors that MongoDB is maintaining for clients.

mongodb.metrics.cursor.timedoutps

Number of cursors that time out, per second.

mongodb.metrics.document.deletedps

Number of documents deleted per second.

mongodb.metrics.document.insertedps

Number of documents inserted per second.

mongodb.metrics.document.returnedps

Number of documents returned by queries per second.

mongodb.metrics.document.updatedps

Number of documents updated per second.

mongodb.metrics.getlasterror.wtime.numps

Number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.

mongodb.metrics.getlasterror.wtime.totalmillisps

Fraction of time (ms/s) that the mongod has spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation.

mongodb.metrics.getlasterror.wtimeoutsps

Number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError.

mongodb.metrics.operation.fastmodps

Number of update operations per second that neither cause documents to grow nor require updates to the index.

mongodb.metrics.operation.idhackps

Number of queries per second that contain the _id field.

mongodb.metrics.operation.writeconflictsps

Number of times per second that write concern operations have encountered a conflict.

mongodb.metrics.operation.scanandorderps

Number of queries per second that return sorted results that cannot perform the sort operation using an index.

mongodb.metrics.queryexecutor.scannedps

Number of index items scanned per second during queries and query-plan evaluation.

mongodb.metrics.record.movesps

Number of times per second documents move within the on-disk representation of the MongoDB data set.

mongodb.metrics.repl.apply.batches.numps

Number of batches applied across all databases per second.

mongodb.metrics.repl.apply.batches.totalmillisps

Fraction of time (ms/s) the mongod has spent applying operations from the oplog.

mongodb.metrics.repl.apply.opsps

Number of oplog operations applied per second.

mongodb.metrics.repl.buffer.count

Number of operations in the oplog buffer.

mongodb.metrics.repl.buffer.maxsizebytes

Maximum size of the buffer.

mongodb.metrics.repl.buffer.sizebytes

Current size of the contents of the oplog buffer.

mongodb.metrics.repl.network.bytesps

Amount of data read from the replication sync source per second.

mongodb.metrics.repl.network.getmores.numps

Number of getmore operations per second.

mongodb.metrics.repl.network.getmores.totalmillisps

Fraction of time (ms/s) required to collect data from getmore operations.

mongodb.metrics.repl.network.opsps

Number of operations read from the replication source per second.

mongodb.metrics.repl.network.readerscreatedps

Number of oplog query processes created per second.

mongodb.metrics.repl.preload.docs.numps

Number of documents loaded during the pre-fetch stage of replication.

mongodb.metrics.repl.preload.docs.totalmillisps

Amount of time spent loading documents as part of the pre-fetch stage of replication.

mongodb.metrics.repl.preload.indexes.numps

Number of index entries loaded by members before updating documents as part of the pre-fetch stage of replication.

mongodb.metrics.repl.preload.indexes.totalmillisps

Amount of time spent loading documents as part of the pre-fetch stage of replication.

mongodb.metrics.ttl.deleteddocumentsps

Number of documents deleted from collections with a ttl index per second.

mongodb.metrics.ttl.passesps

Number of times per second the background process removes documents from collections with a ttl index.

mongodb.network.bytesinps

The number of bytes that reflects the amount of network traffic received by this database.

mongodb.network.bytesoutps

The number of bytes that reflects the amount of network traffic sent from this database.

mongodb.network.numrequestsps

Number of distinct requests that the server has received.

mongodb.opcounters.commandps

Total number of commands per second issued to the database.

mongodb.opcounters.deleteps

Number of delete operations per second.

mongodb.opcounters.getmoreps

Number of getmore operations per second.

mongodb.opcounters.insertps

Number of insert operations per second.

mongodb.opcounters.queryps

Total number of queries per second.

mongodb.opcounters.updateps

Number of update operations per second.

mongodb.opcountersrepl.commandps

Total number of replicated commands issued to the database per second.

mongodb.opcountersrepl.deleteps

Number of replicated delete operations per second.

mongodb.opcountersrepl.getmoreps

Number of replicated getmore operations per second.

mongodb.opcountersrepl.insertps

Number of replicated insert operations per second.

mongodb.opcountersrepl.queryps

Total number of replicated queries per second.

mongodb.opcountersrepl.updateps

Number of replicated update operations per second.

mongodb.oplog.logsizemb

Total size of the oplog.

mongodb.oplog.timediff

Oplog window: difference between the first and last operation in the oplog.

mongodb.oplog.usedsizemb

Total amount of space used by the oplog.

mongodb.replset.health

Member health value of the replica set: conveys if the member is up (i.e. 1) or down (i.e. 0).

mongodb.replset.replicationlag

Delay between a write operation on the primary and its copy to a secondary.

mongodb.replset.state

State of a replica that reflects its disposition within the set.

mongodb.replset.votefraction

Fraction of votes a server will cast in a replica set election.

mongodb.replset.votes

The number of votes a server will cast in a replica set election.

mongodb.stats.datasize

Total size of the data held in this database including the padding factor.

mongodb.stats.indexes

Total number of indexes across all collections in the database.

mongodb.stats.indexsize

Total size of all indexes created on this database.

mongodb.stats.objects

Number of objects (documents) in the database across all collections.

mongodb.stats.storagesize

Total amount of space allocated to collections in this database for document storage.

mongodb.uptime

Number of seconds that the mongos or mongod process has been active.

mongodb.wiredtiger.cache.bytes_currently_in_cache

Size of the data currently in cache.

mongodb.wiredtiger.cache.failed_eviction_of_pages_exceeding_the_in_memory_maximumps

Number of failed eviction of pages that exceeded the in-memory maximum, per second.

mongodb.wiredtiger.cache.in_memory_page_splits

In-memory page splits.

mongodb.wiredtiger.cache.maximum_bytes_configured

Maximum cache size.

mongodb.wiredtiger.cache.maximum_page_size_at_eviction

Maximum page size at eviction.

mongodb.wiredtiger.cache.modified_pages_evicted

Number of modified pages evicted from the cache.

mongodb.wiredtiger.cache.pages_currently_held_in_cache

Number of pages currently held in the cache.

mongodb.wiredtiger.cache.pages_evicted_by_application_threadsps

Number of pages evicted by application threads per second.

mongodb.wiredtiger.cache.pages_evicted_exceeding_the_in_memory_maximumps

Number of pages evicted because they exceeded the cache in-memory maximum, per second.

mongodb.wiredtiger.cache.tracked_dirty_bytes_in_cache

Size of the dirty data in the cache.

mongodb.wiredtiger.cache.unmodified_pages_evicted

Number of unmodified pages evicted from the cache.
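
The WiredTiger cache byte metrics above can be combined into utilization ratios. A minimal sketch, assuming utilization is simply bytes in cache (or dirty bytes) over the configured maximum; the byte values are illustrative:

```python
# Sketch: cache utilization ratios from the WiredTiger byte metrics.
# Assumes utilization = bytes in cache / configured maximum; values illustrative.

def cache_utilization(current_bytes: int, max_bytes: int, dirty_bytes: int):
    if max_bytes <= 0:
        return 0.0, 0.0
    return current_bytes / max_bytes, dirty_bytes / max_bytes

used, dirty = cache_utilization(
    current_bytes=3_221_225_472,  # ...cache.bytes_currently_in_cache
    max_bytes=4_294_967_296,      # ...cache.maximum_bytes_configured
    dirty_bytes=429_496_729,      # ...cache.tracked_dirty_bytes_in_cache
)
print(f"used: {used:.0%}, dirty: {dirty:.0%}")  # used: 75%, dirty: 10%
```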

mongodb.wiredtiger.concurrenttransactions.read.available

Number of available read tickets (concurrent transactions) remaining.

mongodb.wiredtiger.concurrenttransactions.read.out

Number of read tickets (concurrent transactions) in use.

mongodb.wiredtiger.concurrenttransactions.read.totaltickets

Total number of read tickets (concurrent transactions) available.

mongodb.wiredtiger.concurrenttransactions.write.available

Number of available write tickets (concurrent transactions) remaining.

mongodb.wiredtiger.concurrenttransactions.write.out

Number of write tickets (concurrent transactions) in use.

mongodb.wiredtiger.concurrenttransactions.write.totaltickets

Total number of write tickets (concurrent transactions) available.

mongodb.collection.size

The total size in bytes of the data in the collection plus the size of every index on the collection.

mongodb.collection.avgObjSize

The size of the average object in the collection in bytes.

mongodb.collection.count

Total number of objects in the collection.

mongodb.collection.capped

Whether or not the collection is capped.

mongodb.collection.max

Maximum number of documents in a capped collection.

mongodb.collection.maxSize

Maximum size of a capped collection in bytes.

mongodb.collection.storageSize

Total storage space allocated to this collection for document storage.

mongodb.collection.nindexes

Total number of indices on the collection.

mongodb.collection.indexSizes

Size of index in bytes.

mongodb.collection.indexes.accesses.ops

Number of times the index was used.

mongodb.usage.commands.countps

Number of commands per second.

mongodb.usage.commands.count

Number of commands since server start (deprecated).

mongodb.usage.commands.time

Total time spent performing commands in microseconds.

mongodb.usage.getmore.countps

Number of getmore per second.

mongodb.usage.getmore.count

Number of getmore since server start (deprecated).

mongodb.usage.getmore.time

Total time spent performing getmore in microseconds.

mongodb.usage.insert.countps

Number of inserts per second.

mongodb.usage.insert.count

Number of inserts since server start (deprecated).

mongodb.usage.insert.time

Total time spent performing inserts in microseconds.

mongodb.usage.queries.countps

Number of queries per second.

mongodb.usage.queries.count

Number of queries since server start (deprecated).

mongodb.usage.queries.time

Total time spent performing queries in microseconds.

mongodb.usage.readLock.countps

Number of read locks per second.

mongodb.usage.readLock.count

Number of read locks since server start (deprecated).

mongodb.usage.readLock.time

Total time spent performing read locks in microseconds.

mongodb.usage.remove.countps

Number of removes per second.

mongodb.usage.remove.count

Number of removes since server start (deprecated).

mongodb.usage.remove.time

Total time spent performing removes in microseconds.

mongodb.usage.total.countps

Number of operations per second.

mongodb.usage.total.count

Number of operations since server start (deprecated).

mongodb.usage.total.time

Total time spent performing operations in microseconds.

mongodb.usage.update.countps

Number of updates per second.

mongodb.usage.update.count

Number of updates since server start (deprecated).

mongodb.usage.update.time

Total time spent performing updates in microseconds.

mongodb.usage.writeLock.countps

Number of write locks per second.

mongodb.usage.writeLock.count

Number of write locks since server start (deprecated).

mongodb.usage.writeLock.time

Total time spent performing write locks in microseconds.
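
The per-second ("ps"/"countps") metrics in this section are rates derived from MongoDB's cumulative counters. A sketch of how such a rate can be derived from two samples of a counter; the 10-second interval and counter values are illustrative:

```python
# Sketch: deriving a per-second metric from two samples of a cumulative counter.
# The sampling interval and counter values are made up for illustration.

def per_second(prev_value: float, curr_value: float, interval_s: float) -> float:
    """Rate of change between two cumulative counter samples."""
    if interval_s <= 0 or curr_value < prev_value:  # counter reset / restart
        return 0.0
    return (curr_value - prev_value) / interval_s

prev, curr = 120_000, 121_500   # e.g. an insert counter, sampled 10 s apart
print(per_second(prev, curr, 10.0))  # 150.0 -> e.g. mongodb.usage.insert.countps
```

Guarding against a decreasing counter matters because a server restart resets the cumulative values.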

6.3.2.16 - MySQL Metrics

See Application Integrations for more information.

mysql.galera.wsrep_cluster_size

The current number of nodes in the Galera cluster.

mysql.innodb.buffer_pool_free

The number of free pages in the InnoDB Buffer Pool.

mysql.innodb.buffer_pool_total

The total number of pages in the InnoDB Buffer Pool.

mysql.innodb.buffer_pool_used

The number of used pages in the InnoDB Buffer Pool.

mysql.innodb.buffer_pool_utilization

The utilization of the InnoDB Buffer Pool.

mysql.innodb.current_row_locks

The number of current row locks.

mysql.innodb.data_reads

The rate of data reads.

mysql.innodb.data_writes

The rate of data writes.

mysql.innodb.mutex_os_waits

The rate of mutex OS waits.

mysql.innodb.mutex_spin_rounds

The rate of mutex spin rounds.

mysql.innodb.mutex_spin_waits

The rate of mutex spin waits.

mysql.innodb.os_log_fsyncs

The rate of fsync writes to the log file.

mysql.innodb.row_lock_time

The fraction of time (ms/s) spent acquiring row locks.

mysql.innodb.row_lock_waits

The number of times per second a row lock had to be waited for.

mysql.net.connections

The rate of connections to the server.

mysql.net.max_connections

The maximum number of connections that have been in use simultaneously since the server started.

mysql.performance.com_delete

The rate of delete statements.

mysql.performance.com_delete_multi

The rate of delete-multi statements.

mysql.performance.com_insert

The rate of insert statements.

mysql.performance.com_insert_select

The rate of insert-select statements.

mysql.performance.com_replace_select

The rate of replace-select statements.

mysql.performance.com_select

The rate of select statements.

mysql.performance.com_update

The rate of update statements.

mysql.performance.com_update_multi

The rate of update-multi.

mysql.performance.created_tmp_disk_tables

The rate of internal on-disk temporary tables created per second by the server while executing statements.

mysql.performance.created_tmp_files

The rate of temporary files created per second.

mysql.performance.created_tmp_tables

The rate of internal temporary tables created per second by the server while executing statements.

mysql.performance.kernel_time

The percentage of CPU time spent in kernel space by MySQL.

mysql.performance.key_cache_utilization

The key cache utilization ratio.

mysql.performance.open_files

The number of open files.

mysql.performance.open_tables

The number of tables that are open.

mysql.performance.qcache_hits

The rate of query cache hits.

mysql.performance.queries

The rate of queries.

mysql.performance.questions

The rate of statements executed by the server.

mysql.performance.slow_queries

The rate of slow queries.

mysql.performance.table_locks_waited

The total number of times that a request for a table lock could not be granted immediately and a wait was needed.

mysql.performance.table_locks_waited.gauge

The number of times that a request for a table lock could not be granted immediately and a wait was needed, shown as a gauge.

mysql.performance.threads_connected

The number of currently open connections.

mysql.performance.threads_running

The number of threads that are not sleeping.

mysql.performance.user_time

The percentage of CPU time spent in user space by MySQL.

mysql.replication.seconds_behind_master

The lag in seconds between the master and the slave.

mysql.replication.slave_running

A boolean showing if this server is a replication slave that is connected to a replication master.

mysql.replication.slaves_connected

The number of slaves connected to a replication master.
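
The InnoDB buffer pool metrics above are related: used pages are total minus free, and utilization is used over total. A sketch with illustrative page counts (MySQL exposes the raw counters as the Innodb_buffer_pool_pages_free and Innodb_buffer_pool_pages_total status variables):

```python
# Sketch: relating the InnoDB buffer pool metrics. Assumes
# used = total - free and utilization = used / total; page counts illustrative.

def buffer_pool(free_pages: int, total_pages: int):
    used = total_pages - free_pages
    return used, (used / total_pages if total_pages else 0.0)

used, utilization = buffer_pool(free_pages=2_048, total_pages=8_192)
print(used, f"{utilization:.0%}")  # 6144 75%
```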

6.3.2.17 - NGINX and NGINX Plus Metrics

6.3.2.17.1 - NGINX Metrics

See Application Integrations for more information.

nginx.net.conn_dropped_per_s

The rate of connections dropped.

nginx.net.conn_opened_per_s

The rate of connections opened.

nginx.net.connections

The total number of active connections.

nginx.net.reading

The number of connections reading client requests.

nginx.net.request_per_s

The rate of requests processed.

nginx.net.waiting

The number of keep-alive connections waiting for work.

nginx.net.writing

The number of connections waiting on upstream responses and/or writing responses back to the client.
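
The nginx.net.* metrics above correspond to fields on NGINX's stub_status page. A sketch that parses a sample stub_status payload into these metric names; the numbers in the payload are illustrative:

```python
import re

# Sketch: mapping an NGINX stub_status page onto the nginx.net.* metric names.
# The sample payload below is illustrative.
STATUS = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text: str) -> dict:
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    reading, writing, waiting = (
        int(n)
        for n in re.search(
            r"Reading:\s*(\d+)\s*Writing:\s*(\d+)\s*Waiting:\s*(\d+)", text
        ).groups()
    )
    return {
        "nginx.net.connections": active,
        "nginx.net.reading": reading,
        "nginx.net.writing": writing,
        "nginx.net.waiting": waiting,
    }

print(parse_stub_status(STATUS))
```

The rate metrics (conn_opened_per_s, conn_dropped_per_s, request_per_s) are then derived from successive samples of the cumulative accepts/handled/requests counters.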

6.3.2.17.2 - NGINX Plus Metrics

See Application Integrations for more information.

nginx.plus.cache.bypass.bytes

The total number of bytes read from the proxied server.

nginx.plus.cache.bypass.bytes_written

The total number of bytes written to the cache.

nginx.plus.cache.bypass.responses

The total number of responses from the cache.

nginx.plus.cache.bypass.responses_written

The total number of responses written to the cache.

nginx.plus.cache.cold

Boolean. Defines whether the cache loader process is still loading data from the disk into the cache or not.

nginx.plus.cache.expired.bytes

The total number of bytes read from the proxied server.

nginx.plus.cache.expired.bytes_written

The total number of bytes written to the cache.

nginx.plus.cache.expired.responses

The total number of responses not taken from the cache.

nginx.plus.cache.expired.responses_written

The total number of responses written to the cache.

nginx.plus.cache.hit.bytes

The total number of bytes read from the cache.

nginx.plus.cache.hit.responses

The total number of responses read from the cache.

nginx.plus.cache.max_size

The limit on the maximum size of the cache specified in the configuration.

nginx.plus.cache.miss.bytes

The total number of bytes read from the proxied server.

nginx.plus.cache.miss.bytes_written

The total number of bytes written to the cache.

nginx.plus.cache.miss.responses

The total number of responses not taken from the cache.

nginx.plus.cache.miss.responses_written

The total number of responses written to the cache.

nginx.plus.cache.revalidated.bytes

The total number of bytes read from the cache.

nginx.plus.cache.revalidated.response

The total number of responses read from the cache.

nginx.plus.cache.size

The current size of the cache.

nginx.plus.cache.stale.bytes

The total number of bytes read from the cache.

nginx.plus.cache.stale.responses

The total number of responses read from the cache.

nginx.plus.cache.updating.bytes

The total number of bytes read from the cache.

nginx.plus.cache.updating.responses

The total number of responses read from the cache.

nginx.plus.connections.accepted

The total number of accepted client connections.

nginx.plus.connections.active

The current number of active client connections.

nginx.plus.connections.dropped

The total number of dropped client connections.

nginx.plus.connections.idle

The current number of idle client connections.

nginx.plus.generation

The total number of configuration reloads.

nginx.plus.load_timestamp

Time of the last reload of configuration (time since Epoch).

nginx.plus.pid

The ID of the worker process that handled status request.

nginx.plus.upstream.peers.fails

The total number of unsuccessful attempts to communicate with the server.

nginx.plus.ppid

The ID of the master process that started the worker process.

nginx.plus.processes.respawned

The total number of abnormally terminated and re-spawned child processes.

nginx.plus.requests.current

The current number of client requests.

nginx.plus.requests.total

The total number of client requests.

nginx.plus.server_zone.discarded

The total number of requests completed without sending a response.

nginx.plus.server_zone.processing

The number of client requests that are currently being processed.

nginx.plus.server_zone.received

The total amount of data received from clients.

nginx.plus.server_zone.requests

The total number of client requests received from clients.

nginx.plus.server_zone.responses.1xx

The number of responses with 1xx status code.

nginx.plus.server_zone.responses.2xx

The number of responses with 2xx status code.

nginx.plus.server_zone.responses.3xx

The number of responses with 3xx status code.

nginx.plus.server_zone.responses.4xx

The number of responses with 4xx status code.

nginx.plus.server_zone.responses.5xx

The number of responses with 5xx status code.

nginx.plus.server_zone.responses.total

The total number of responses sent to clients.

nginx.plus.server_zone.sent

The total amount of data sent to clients.

nginx.plus.slab.pages.free

The current number of free memory pages

nginx.plus.slab.pages.used

The current number of used memory pages

nginx.plus.slab.slots.fails

The number of unsuccessful attempts to allocate memory of specified size

nginx.plus.slab.slots.free

The current number of free memory slots

nginx.plus.slab.slots.reqs

The total number of attempts to allocate memory of specified size

nginx.plus.slab.slots.used

The current number of used memory slots

nginx.plus.ssl.handshakes

The total number of successful SSL handshakes.

nginx.plus.ssl.handshakes_failed

The total number of failed SSL handshakes.

nginx.plus.ssl.session_reuses

The total number of session reuses during SSL handshake.

nginx.plus.stream.server_zone.connections

The total number of connections accepted from clients.

nginx.plus.stream.server_zone.discarded

The total number of requests completed without sending a response.

nginx.plus.stream.server_zone.processing

The number of client requests that are currently being processed.

nginx.plus.stream.server_zone.received

The total amount of data received from clients.

nginx.plus.stream.server_zone.sent

The total amount of data sent to clients.

nginx.plus.stream.server_zone.sessions.1xx

The number of responses with 1xx status code.

nginx.plus.stream.server_zone.sessions.2xx

The number of responses with 2xx status code.

nginx.plus.stream.server_zone.sessions.3xx

The number of responses with 3xx status code.

nginx.plus.stream.server_zone.sessions.4xx

The number of responses with 4xx status code.

nginx.plus.stream.server_zone.sessions.5xx

The number of responses with 5xx status code.

nginx.plus.stream.server_zone.sessions.total

The total number of responses sent to clients.

nginx.plus.stream.upstream.peers.active

The current number of connections.

nginx.plus.stream.upstream.peers.backup

A boolean value indicating whether the server is a backup server.

nginx.plus.stream.upstream.peers.connections

The total number of client connections forwarded to this server.

nginx.plus.stream.upstream.peers.downstart

The time (since Epoch) when the server became “unavail”, “checking”, or “unhealthy”.

nginx.plus.stream.upstream.peers.downtime

Total time the server was in the “unavail” or “checking” or “unhealthy” states.

nginx.plus.stream.upstream.peers.fails

The total number of unsuccessful attempts to communicate with the server.

nginx.plus.stream.upstream.peers.health_checks.checks

The total number of health check requests made.

nginx.plus.stream.upstream.peers.health_checks.fails

The number of failed health checks.

nginx.plus.stream.upstream.peers.health_checks.last_passed

Boolean indicating if the last health check request was successful and passed tests.

nginx.plus.stream.upstream.peers.health_checks.unhealthy

How many times the server became unhealthy (state “unhealthy”).

nginx.plus.stream.upstream.peers.id

The ID of the server.

nginx.plus.stream.upstream.peers.received

The total number of bytes received from this server.

nginx.plus.stream.upstream.peers.selected

The time (time since Epoch) when the server was last selected to process a connection.

nginx.plus.stream.upstream.peers.sent

The total number of bytes sent to this server.

nginx.plus.stream.upstream.peers.unavail

How many times the server became unavailable for client connections (state “unavail”).

nginx.plus.stream.upstream.peers.weight

Weight of the server.

nginx.plus.stream.upstream.zombies

The current number of servers removed from the group but still processing active client connections.

nginx.plus.timestamp

Current time since Epoch.

nginx.plus.upstream.keepalive

The current number of idle keepalive connections.

nginx.plus.upstream.peers.active

The current number of active connections.

nginx.plus.upstream.peers.backup

A boolean value indicating whether the server is a backup server.

nginx.plus.upstream.peers.downstart

The time (since Epoch) when the server became “unavail” or “unhealthy”.

nginx.plus.upstream.peers.downtime

Total time the server was in the “unavail” and “unhealthy” states.

nginx.plus.upstream.peers.health_checks.checks

The total number of health check requests made.

nginx.plus.upstream.peers.health_checks.fails

The number of failed health checks.

nginx.plus.upstream.peers.health_checks.last_passed

Boolean indicating if the last health check request was successful and passed tests.

nginx.plus.upstream.peers.health_checks.unhealthy

How many times the server became unhealthy (state “unhealthy”).

nginx.plus.upstream.peers.id

The ID of the server.

nginx.plus.upstream.peers.received

The total amount of data received from this server.

nginx.plus.upstream.peers.requests

The total number of client requests forwarded to this server.

nginx.plus.upstream.peers.responses.1xx

The number of responses with 1xx status code.

nginx.plus.upstream.peers.responses.1xx_count

The number of responses with 1xx status code (shown as count).

nginx.plus.upstream.peers.responses.2xx

The number of responses with 2xx status code.

nginx.plus.upstream.peers.responses.2xx_count

The number of responses with 2xx status code (shown as count).

nginx.plus.upstream.peers.responses.3xx

The number of responses with 3xx status code.

nginx.plus.upstream.peers.responses.3xx_count

The number of responses with 3xx status code (shown as count).

nginx.plus.upstream.peers.responses.4xx

The number of responses with 4xx status code.

nginx.plus.upstream.peers.responses.4xx_count

The number of responses with 4xx status code (shown as count).

nginx.plus.upstream.peers.responses.5xx

The number of responses with 5xx status code.

nginx.plus.upstream.peers.responses.5xx_count

The number of responses with 5xx status code (shown as count).

nginx.plus.upstream.peers.responses.total

The total number of responses obtained from this server.

nginx.plus.upstream.peers.selected

The time (since Epoch) when the server was last selected to process a request (1.7.5).

nginx.plus.upstream.peers.sent

The total amount of data sent to this server.

nginx.plus.upstream.peers.unavail

How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold.

nginx.plus.upstream.peers.weight

The weight of the server.

nginx.plus.version

The NGINX version.

6.3.2.18 - NTP Metrics

See Application Integrations for more information.

ntp.offset

The time difference between the local clock and the NTP reference clock, in seconds.
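For reference, this offset follows the standard NTP clock-offset formula computed from the four handshake timestamps. A minimal illustrative sketch (not the agent's implementation):

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP clock offset, in seconds.

    t1: client transmit time, t2: server receive time,
    t3: server transmit time, t4: client receive time.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0

# A server clock running 0.5 s ahead of the client yields a +0.5 s offset:
print(ntp_offset(t1=100.0, t2=100.6, t3=100.7, t4=100.3))  # 0.5
```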

6.3.2.19 - PGBouncer Metrics

See Application Integrations for more information.

pgbouncer.pools.cl_active

The number of client connections linked to a server connection and able to process queries.

pgbouncer.pools.cl_waiting

The number of client connections waiting on a server connection.

pgbouncer.pools.maxwait

The age of the oldest unserved client connection.

pgbouncer.pools.sv_active

The number of server connections linked to a client connection.

pgbouncer.pools.sv_idle

The number of server connections idle and ready for a client query.

pgbouncer.pools.sv_login

The number of server connections currently in the process of logging in.

pgbouncer.pools.sv_tested

The number of server connections currently running either server_reset_query or server_check_query.

pgbouncer.pools.sv_used

The number of server connections idle more than server_check_delay, needing server_check_query.

pgbouncer.stats.avg_query

The average query duration.

pgbouncer.stats.avg_recv

The average amount of client network traffic received.

pgbouncer.stats.avg_req

The average number of requests per second in the last stat period.

pgbouncer.stats.avg_sent

The average amount of client network traffic sent.

pgbouncer.stats.bytes_received_per_second

The total network traffic received.

pgbouncer.stats.bytes_sent_per_second

The total network traffic sent.

pgbouncer.stats.requests_per_second

The request rate.

pgbouncer.stats.total_query_time

The time spent by PgBouncer actively querying PostgreSQL.
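The avg_* statistics above are per-interval averages derived from PgBouncer's cumulative counters. An illustrative sketch of that derivation from two successive samples — the field names (total_query_time in microseconds, total_requests) are assumptions based on PgBouncer's SHOW STATS output, not Sysdig code:

```python
def avg_query_us(prev: dict, cur: dict) -> float:
    """Average query duration over one stats interval, in microseconds.

    `prev`/`cur` are two successive samples of cumulative counters
    (assumed fields: total_query_time in microseconds, total_requests).
    """
    requests = cur["total_requests"] - prev["total_requests"]
    if requests == 0:
        return 0.0
    return (cur["total_query_time"] - prev["total_query_time"]) / requests

prev = {"total_query_time": 1_000_000, "total_requests": 500}
cur = {"total_query_time": 1_600_000, "total_requests": 800}
print(avg_query_us(prev, cur))  # 2000.0 microseconds per query this interval
```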

6.3.2.20 - PHP-FPM Metrics

See Application Integrations for more information.

php_fpm.listen_queue.size

The size of the socket queue of pending connections.

php_fpm.processes.active

The total number of active processes.

php_fpm.processes.idle

The total number of idle processes.

php_fpm.processes.max_reached

The number of times the process limit has been reached.

php_fpm.processes.total

The total number of processes.

php_fpm.requests.accepted

The total number of accepted requests.

php_fpm.requests.slow

The total number of slow requests.
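The active, idle, and total process counts above are commonly read together as pool utilization (busy workers over total workers). An illustrative sketch:

```python
def pool_utilization(active: int, total: int) -> float:
    """Fraction of PHP-FPM worker processes currently busy (0.0 to 1.0)."""
    return active / total if total else 0.0

# 7 of 10 workers busy:
print(pool_utilization(active=7, total=10))  # 0.7
```

A sustained utilization near 1.0, together with a growing php_fpm.processes.max_reached, suggests the pool's process limit is too low for the load.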

6.3.2.21 - PostgreSQL Metrics

See Application Integrations for more information.

| Metric Name | Type | Description |
| --- | --- | --- |
| postgresql.seq_scans | gauge | The number of sequential scans initiated on this table. |
| postgresql.index_scans | gauge | The number of index scans initiated on this table. |
| postgresql.index_rows_fetched | gauge | The number of live rows fetched by index scans. |
| postgresql.rows_hot_updated | gauge | The number of rows HOT updated, meaning no separate index update was needed. |
| postgresql.live_rows | gauge | The estimated number of live rows. |
| postgresql.dead_rows | gauge | The estimated number of dead rows. |
| postgresql.index_rows_read | gauge | The number of index entries returned by scans on this index. |
| postgresql.table_size | gauge | The total disk space used by the specified table. Includes TOAST, free space map, and visibility map. Excludes indexes. |
| postgresql.index_size | gauge | The total disk space used by indexes attached to the specified table. |
| postgresql.total_size | gauge | The total disk space used by the table, including indexes and TOAST data. |
| postgresql.heap_blocks_read | gauge | The number of disk blocks read from this table. |
| postgresql.heap_blocks_hit | gauge | The number of buffer hits in this table. |
| postgresql.index_blocks_read | gauge | The number of disk blocks read from all indexes on this table. |
| postgresql.index_blocks_hit | gauge | The number of buffer hits in all indexes on this table. |
| postgresql.toast_blocks_read | gauge | The number of disk blocks read from this table’s TOAST table. |
| postgresql.toast_blocks_hit | gauge | The number of buffer hits in this table’s TOAST table. |
| postgresql.toast_index_blocks_read | gauge | The number of disk blocks read from this table’s TOAST table index. |
| postgresql.toast_index_blocks_hit | gauge | The number of buffer hits in this table’s TOAST table index. |
| postgresql.active_queries | gauge | The number of active queries in this database. |
| postgresql.archiver.archived_count | gauge | The number of WAL files that have been successfully archived. |
| postgresql.archiver.failed_count | gauge | The number of failed attempts for archiving WAL files. |
| postgresql.before_xid_wraparound | gauge | The number of transactions that can occur until a transaction wraparound. |
| postgresql.index_rel_rows_fetched | rate | The number of live rows fetched by index scans. |
| postgresql.transactions.idle_in_transaction | gauge | The number of ‘idle in transaction’ transactions in this database. |
| postgresql.transactions.open | gauge | The number of open transactions in this database. |
| postgresql.waiting_queries | gauge | The number of waiting queries in this database. |
| postgresql.bgwriter.buffers_alloc | gauge | The number of buffers allocated. |
| postgresql.bgwriter.buffers_backend | gauge | The number of buffers written directly by a backend. |
| postgresql.bgwriter.buffers_backend_fsync | gauge | The number of times a backend had to execute its own fsync call instead of the background writer. |
| postgresql.bgwriter.buffers_checkpoint | gauge | The number of buffers written during checkpoints. |
| postgresql.bgwriter.buffers_clean | gauge | The number of buffers written by the background writer. |
| postgresql.bgwriter.checkpoints_requested | gauge | The number of requested checkpoints that were performed. |
| postgresql.bgwriter.checkpoints_timed | gauge | The number of scheduled checkpoints that were performed. |
| postgresql.bgwriter.maxwritten_clean | gauge | The number of times the background writer stopped a cleaning scan due to writing too many buffers. |
| postgresql.bgwriter.sync_time | gauge | The total amount of checkpoint processing time spent synchronizing files to disk. |
| postgresql.bgwriter.write_time | gauge | The total amount of checkpoint processing time spent writing files to disk. |
| postgresql.buffer_hit | gauge | The number of times disk blocks were found in the buffer cache, preventing the need to read from the database. |
| postgresql.commits | gauge | The number of transactions that have been committed in this database. |
| postgresql.connections | gauge | The number of active connections to this database. |
| postgresql.database_size | gauge | The disk space used by this database. |
| postgresql.deadlocks | gauge | The number of deadlocks detected in this database. |
| postgresql.disk_read | gauge | The number of disk blocks read in this database. |
| postgresql.locks | gauge | The number of locks active for this database. |
| postgresql.max_connections | gauge | The maximum number of client connections allowed to this database. |
| postgresql.percent_usage_connections | gauge | The number of connections to this database as a fraction of the maximum number of allowed connections. |
| postgresql.replication_delay | gauge | The current replication delay in seconds. Only available with PostgreSQL 9.1 and newer. |
| postgresql.replication_delay_bytes | gauge | The current replication delay in bytes. Only available with PostgreSQL 9.2 and newer. |
| postgresql.rollbacks | gauge | The number of transactions that have been rolled back in this database. |
| postgresql.rows_deleted | gauge | The number of rows deleted by queries in this database. The metrics can be segmented by ‘db’ or ‘table’ and can be viewed per-relation. |
| postgresql.rows_fetched | gauge | The number of rows fetched by queries in this database. |
| postgresql.rows_inserted | gauge | The number of rows inserted by queries in this database. The metrics can be segmented by ‘db’ or ‘table’ and can be viewed per-relation. |
| postgresql.rows_returned | gauge | The number of rows returned by queries in this database. The metrics can be segmented by ‘db’ or ‘table’ and can be viewed per-relation. |
| postgresql.rows_updated | gauge | The number of rows updated by queries in this database. |
| postgresql.table.count | gauge | The number of user tables in this database. |
| postgresql.temp_bytes | gauge | The amount of data written to temporary files by queries in this database. |
| postgresql.temp_files | gauge | The number of temporary files created by queries in this database. |
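Two of the derived values in this table can be reproduced from the raw counters: percent_usage_connections, and a buffer cache hit ratio built from postgresql.buffer_hit and postgresql.disk_read. An illustrative sketch:

```python
def buffer_hit_ratio(buffer_hit: float, disk_read: float) -> float:
    """Share of block reads served from the buffer cache rather than disk."""
    total = buffer_hit + disk_read
    return buffer_hit / total if total else 0.0

def percent_usage_connections(connections: int, max_connections: int) -> float:
    """Connections as a fraction of the configured maximum."""
    return connections / max_connections

print(buffer_hit_ratio(buffer_hit=9_900, disk_read=100))               # 0.99
print(percent_usage_connections(connections=25, max_connections=100))  # 0.25
```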

6.3.2.22 - RabbitMQ Metrics

See Application Integrations for more information.

rabbitmq.connections

The number of current connections to a given rabbitmq vhost. Each connection is tagged as rabbitmq_vhost:<vhost_name>.

rabbitmq.connections.state

The number of connections in the specified connection state.

rabbitmq.exchange.messages.ack.count

The number of messages delivered to clients and acknowledged.

rabbitmq.exchange.messages.ack.rate

The rate of messages delivered to clients and acknowledged per second.

rabbitmq.exchange.messages.confirm.count

The number of messages confirmed.

rabbitmq.exchange.messages.confirm.rate

The rate of messages confirmed per second.

rabbitmq.exchange.messages.deliver_get.count

The sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.

rabbitmq.exchange.messages.deliver_get.rate

The rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.
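As described above, deliver_get is the sum of four delivery counters. An illustrative sketch; the field names mirror the RabbitMQ management API's message_stats keys and are an assumption about naming, not Sysdig's code:

```python
def deliver_get_count(deliver: int, deliver_no_ack: int,
                      get: int, get_no_ack: int) -> int:
    """deliver_get: deliveries to consumers in ack and no-ack mode, plus
    basic.get fetches in ack and no-ack mode."""
    return deliver + deliver_no_ack + get + get_no_ack

print(deliver_get_count(deliver=120, deliver_no_ack=30, get=5, get_no_ack=2))  # 157
```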

rabbitmq.exchange.messages.publish_in.count

The number of messages published from channels into this exchange.

rabbitmq.exchange.messages.publish_in.rate

The rate of messages published from channels into this exchange per second.

rabbitmq.exchange.messages.publish_out.count

The number of messages published from this exchange into queues.

rabbitmq.exchange.messages.publish_out.rate

The rate of messages published from this exchange into queues per second.

rabbitmq.exchange.messages.publish.count

The number of messages published.

rabbitmq.exchange.messages.publish.rate

The rate of messages published per second.

rabbitmq.exchange.messages.redeliver.count

The number of messages in deliver_get which had the redelivered flag set.

rabbitmq.exchange.messages.redeliver.rate

The rate per second of messages in deliver_get which had the redelivered flag set.

rabbitmq.exchange.messages.return_unroutable.count

The number of messages returned to the publisher as unroutable.

rabbitmq.exchange.messages.return_unroutable.rate

The rate of messages returned to the publisher as unroutable per second.

rabbitmq.node.disk_alarm

Defines whether the node has a disk alarm configured.

rabbitmq.node.disk_free

The current free disk space.

rabbitmq.node.fd_used

Used file descriptors.

rabbitmq.node.mem_alarm

Defines whether the node has a memory alarm configured.

rabbitmq.node.mem_used

The total memory used in bytes.

rabbitmq.node.partitions

The number of network partitions this node is seeing.

rabbitmq.node.run_queue

The average number of Erlang processes waiting to run.

rabbitmq.node.running

Defines whether the node is running or not.

rabbitmq.node.sockets_used

The number of file descriptors used as sockets.

rabbitmq.overview.messages.ack.count

The number of messages delivered to clients and acknowledged.

rabbitmq.overview.messages.ack.rate

The rate of messages delivered to clients and acknowledged per second.

rabbitmq.overview.messages.confirm.count

The number of messages confirmed.

rabbitmq.overview.messages.confirm.rate

The rate of messages confirmed per second.

rabbitmq.overview.messages.deliver_get.count

The sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.

rabbitmq.overview.messages.deliver_get.rate

The rate per second of the sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.

rabbitmq.overview.messages.publish_in.count

The number of messages published from channels into this overview.

rabbitmq.overview.messages.publish_in.rate

The rate of messages published from channels into this overview per second.

rabbitmq.overview.messages.publish_out.count

The number of messages published from this overview into queues.

rabbitmq.overview.messages.publish_out.rate

The rate of messages published from this overview into queues per second.

rabbitmq.overview.messages.publish.count

The number of messages published.

rabbitmq.overview.messages.publish.rate

The rate of messages published per second.

rabbitmq.overview.messages.redeliver.count

The number of messages in deliver_get which had the redelivered flag set.

rabbitmq.overview.messages.redeliver.rate

The rate per second of messages in deliver_get which had the redelivered flag set.

rabbitmq.overview.messages.return_unroutable.count

The number of messages returned to the publisher as unroutable.

rabbitmq.overview.messages.return_unroutable.rate

The rate of messages returned to the publisher as unroutable per second.

rabbitmq.overview.object_totals.channels

The total number of channels.

rabbitmq.overview.object_totals.connections

The total number of connections.

rabbitmq.overview.object_totals.consumers

The total number of consumers.

rabbitmq.overview.object_totals.queues

The total number of queues.

rabbitmq.overview.queue_totals.messages_ready.count

The number of messages ready for delivery.

rabbitmq.overview.queue_totals.messages_ready.rate

The rate of messages ready for delivery.

rabbitmq.overview.queue_totals.messages_unacknowledged.count

The number of unacknowledged messages.

rabbitmq.overview.queue_totals.messages_unacknowledged.rate

The rate of unacknowledged messages.

rabbitmq.overview.queue_totals.messages.count

The total number of messages (ready plus unacknowledged).

rabbitmq.overview.queue_totals.messages.rate

The rate of messages (ready plus unacknowledged).

rabbitmq.queue.active_consumers

The number of active consumers, consumers that can immediately receive any messages sent to the queue.

rabbitmq.queue.bindings.count

The number of bindings for a specific queue.

rabbitmq.queue.consumer_utilisation

The ratio of time that a queue’s consumers can take new messages.

rabbitmq.queue.consumers

The number of consumers.

rabbitmq.queue.memory

The number of bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures.

rabbitmq.queue.messages

The total number of messages in the queue.

rabbitmq.queue.messages_ready

The number of messages ready to be delivered to clients.

rabbitmq.queue.messages_ready.rate

The number of messages ready to be delivered to clients per second.

rabbitmq.queue.messages_unacknowledged

The number of messages delivered to clients but not yet acknowledged.

rabbitmq.queue.messages_unacknowledged.rate

The number of messages delivered to clients but not yet acknowledged per second.

rabbitmq.queue.messages.ack.count

The number of messages delivered to clients and acknowledged.

rabbitmq.queue.messages.ack.rate

The number of messages delivered to clients and acknowledged per second.

rabbitmq.queue.messages.deliver_get.count

The sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get.

rabbitmq.queue.messages.deliver_get.rate

The sum of messages delivered in acknowledgement mode to consumers, in no-acknowledgement mode to consumers, in acknowledgement mode in response to basic.get, and in no-acknowledgement mode in response to basic.get per second.

rabbitmq.queue.messages.deliver.count

The number of messages delivered in acknowledgement mode to consumers.

rabbitmq.queue.messages.deliver.rate

The number of messages delivered in acknowledgement mode to consumers per second.

rabbitmq.queue.messages.publish.count

The number of messages published.

rabbitmq.queue.messages.publish.rate

The rate of messages published per second.

rabbitmq.queue.messages.rate

The total number of messages in the queue per second.

rabbitmq.queue.messages.redeliver.count

The number of messages in deliver_get which had the redelivered flag set.

rabbitmq.queue.messages.redeliver.rate

The rate per second of messages in deliver_get which had the redelivered flag set.

6.3.2.23 - Supervisord Metrics

See Application Integrations for more information.

supervisord.process.count

The number of supervisord monitored processes.

supervisord.process.uptime

The process uptime.

6.3.2.24 - TCP Metrics

See Application Integrations for more information.

network.tcp.response_time

The response time of a given host and TCP port.
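This check is essentially a timed TCP handshake. A rough, self-contained analogue (not Sysdig's implementation) that times socket.create_connection against a throwaway local listener:

```python
import socket
import threading
import time

def tcp_response_time(host: str, port: int, timeout: float = 5.0) -> float:
    """Seconds taken to complete a TCP handshake with host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start

# Demo against a local listener so the example runs anywhere:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=lambda: server.accept(), daemon=True).start()
rtt = tcp_response_time("127.0.0.1", port)
print(f"handshake completed in {rtt:.4f}s")
server.close()
```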

6.3.2.25 - Varnish Metrics

See Application Integrations for more information.

All Varnish metrics have the type gauge except varnish.n_purgesps, which has the type rate.

varnish.accept_fail

Accept failures. This metric is only provided by varnish 3.x.

varnish.backend_busy

Maximum number of connections to a given backend.

varnish.backend_conn

Successful connections to a given backend.

varnish.backend_fail

Failed connections for a given backend.

varnish.backend_recycle

Backend connections with keep-alive that are returned to the pool of connections.

varnish.backend_req

Backend requests.

varnish.backend_retry

Backend connection retries.

varnish.backend_reuse

Recycled connections that were reused.

varnish.backend_toolate

Backend connections closed because they were idle too long.

varnish.backend_unhealthy

Backend connections not tried because the backend was unhealthy.

varnish.bans

Bans in system, including bans superseded by newer bans and bans already checked by the ban-lurker. This metric is only provided by varnish 4.x.

varnish.bans_added

Bans added to ban list. This metric is only provided by varnish 4.x.

varnish.bans_completed

Bans which are no longer active, either because they got checked by the ban-lurker or superseded by newer identical bans. This metric is only provided by varnish 4.x.

varnish.bans_deleted

Bans deleted from ban list. This metric is only provided by varnish 4.x.

varnish.bans_dups

Bans replaced by later identical bans. This metric is only provided by varnish 4.x.

varnish.bans_lurker_contention

Times the ban-lurker waited for lookups. This metric is only provided by varnish 4.x.

varnish.bans_lurker_obj_killed

Objects killed by ban-lurker. This metric is only provided by varnish 4.x.

varnish.bans_lurker_tested

Bans and objects tested against each other by the ban-lurker. This metric is only provided by varnish 4.x.

varnish.bans_lurker_tests_tested

Tests and objects tested against each other by the ban-lurker. ‘ban req.url == foo && req.http.host == bar’ counts as one in ‘bans_tested’ and as two in ‘bans_tests_tested’. This metric is only provided by varnish 4.x.

varnish.bans_obj

Bans which use obj.* variables. These bans can possibly be washed by the ban-lurker. This metric is only provided by varnish 4.x.

varnish.bans_obj_killed

Objects killed by bans during object lookup. This metric is only provided by varnish 4.x

varnish.bans_persisted_bytes

Bytes used by the persisted ban lists. This metric is only provided by varnish 4.x.

varnish.bans_persisted_fragmentation

Extra bytes accumulated through dropped and completed bans in the persistent ban lists. This metric is only provided by varnish 4.x.

varnish.bans_req

Bans which use req.* variables. These bans can not be washed by the ban-lurker. This metric is only provided by varnish 4.x.

varnish.bans_tested

Bans and objects tested against each other during hash lookup. This metric is only provided by varnish 4.x.

varnish.bans_tests_tested

Tests and objects tested against each other during lookup. ‘ban req.url == foo && req.http.host == bar’ counts as one in ‘bans_tested’ and as two in ‘bans_tests_tested’. This metric is only provided by varnish 4.x.

varnish.busy_sleep

Requests sent to sleep without a worker thread because they found a busy object. This metric is only provided by varnish 4.x.

varnish.busy_wakeup

Requests taken off the busy object sleep list and rescheduled. This metric is only provided by varnish 4.x.

varnish.cache_hit

Requests served from the cache.

varnish.cache_hitpass

Requests passed to a backend where the decision to pass them was found in the cache.

varnish.cache_miss

Requests fetched from a backend server.
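cache_hit, cache_hitpass, and cache_miss are commonly combined into a cache hit ratio; one conventional definition excludes hitpass traffic. An illustrative sketch:

```python
def cache_hit_ratio(cache_hit: int, cache_miss: int) -> float:
    """Fraction of cacheable lookups served from cache (hitpass excluded)."""
    lookups = cache_hit + cache_miss
    return cache_hit / lookups if lookups else 0.0

print(cache_hit_ratio(cache_hit=950, cache_miss=50))  # 0.95
```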

varnish.client_conn

Client connections accepted. This metric is only provided by varnish 3.x.

varnish.client_drop

Client connection dropped, no session. This metric is only provided by varnish 3.x.

varnish.client_drop_late

Client connection dropped late. This metric is only provided by varnish 3.x.

varnish.client_req

Parseable client requests seen.

varnish.client_req_400

Requests that were malformed in some drastic way. This metric is only provided by varnish 4.x.

varnish.client_req_411

Requests that were missing a Content-Length: header. This metric is only provided by varnish 4.x.

varnish.client_req_413

Requests that were too big. This metric is only provided by varnish 4.x.

varnish.client_req_417

Requests with a bad Expect: header. This metric is only provided by varnish 4.x.

varnish.dir_dns_cache_full

DNS director full DNS cache. This metric is only provided by varnish 3.x.

varnish.dir_dns_failed

DNS director failed lookup. This metric is only provided by varnish 3.x.

varnish.dir_dns_hit

DNS director cached lookup hit. This metric is only provided by varnish 3.x.

varnish.dir_dns_lookups

DNS director lookups. This metric is only provided by varnish 3.x.

varnish.esi_errors

Edge Side Includes (ESI) parse errors.

varnish.esi_warnings

Edge Side Includes (ESI) parse warnings.

varnish.exp_mailed

Objects mailed to expiry thread for handling. This metric is only provided by varnish 4.x.

varnish.exp_received

Objects received by expiry thread for handling. This metric is only provided by varnish 4.x.

varnish.fetch_1xx

Back end response with no body because of 1XX response (Informational).

varnish.fetch_204

Back end response with no body because of 204 response (No Content).

varnish.fetch_304

Back end response with no body because of 304 response (Not Modified).

varnish.fetch_bad

Back end response’s body length could not be determined and/or had bad headers.

varnish.fetch_chunked

Back end response bodies that were chunked.

varnish.fetch_close

Fetch wanted close.

varnish.fetch_eof

Back end response bodies with EOF.

varnish.fetch_failed

Back end response fetches that failed.

varnish.fetch_head

Back end HEAD requests.

varnish.fetch_length

Back end response bodies with Content-Length.

varnish.fetch_no_thread

Back end fetches that failed because no thread was available. This metric is only provided by varnish 4.x.

varnish.fetch_oldhttp

The number of responses served by backends with HTTP < 1.1.

varnish.fetch_zero

Number of responses that have zero length.

varnish.hcb_insert

HCB inserts.

varnish.hcb_lock

HCB lookups with lock.

varnish.hcb_nolock

HCB lookups without lock.

varnish.LCK.backend.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.backend.creat

Created locks.

varnish.LCK.backend.destroy

Destroyed locks.

varnish.LCK.backend.locks

Lock operations.

varnish.LCK.ban.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.ban.creat

Created locks.

varnish.LCK.ban.destroy

Destroyed locks.

varnish.LCK.ban.locks

Lock operations.

varnish.LCK.busyobj.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.busyobj.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.busyobj.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.cli.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.cli.creat

Created locks.

varnish.LCK.cli.destroy

Destroyed locks.

varnish.LCK.cli.locks

Lock operations.

varnish.LCK.exp.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.exp.creat

Created locks.

varnish.LCK.exp.destroy

Destroyed locks.

varnish.LCK.exp.locks

Lock operations.

varnish.LCK.hcb.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.hcb.creat

Created locks.

varnish.LCK.hcb.destroy

Destroyed locks.

varnish.LCK.hcb.locks

Lock operations.

varnish.LCK.hcl.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.hcl.creat

Created locks.

varnish.LCK.hcl.destroy

Destroyed locks.

varnish.LCK.hcl.locks

Lock operations.

varnish.LCK.herder.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.herder.creat

Created locks.

varnish.LCK.herder.destroy

Destroyed locks.

varnish.LCK.herder.locks

Lock operations.

varnish.LCK.hsl.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.hsl.creat

Created locks.

varnish.LCK.hsl.destroy

Destroyed locks.

varnish.LCK.hsl.locks

Lock operations.

varnish.LCK.lru.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.lru.creat

Created locks.

varnish.LCK.lru.destroy

Destroyed locks.

varnish.LCK.lru.locks

Lock operations.

varnish.LCK.mempool.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.mempool.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.mempool.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.nbusyobj.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.nbusyobj.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.nbusyobj.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.objhdr.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.objhdr.creat

Created locks.

varnish.LCK.objhdr.destroy

Destroyed locks.

varnish.LCK.objhdr.locks

Lock operations.

varnish.LCK.pipestat.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.pipestat.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.pipestat.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.sess.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.sess.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.sess.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.sessmem.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.sessmem.creat

Created locks.

varnish.LCK.sessmem.destroy

Destroyed locks.

varnish.LCK.sessmem.locks

Lock operations.

varnish.LCK.sma.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.sma.creat

Created locks.

varnish.LCK.sma.destroy

Destroyed locks.

varnish.LCK.sma.locks

Lock operations.

varnish.LCK.smf.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.smf.creat

Created locks.

varnish.LCK.smf.destroy

Destroyed locks.

varnish.LCK.smf.locks

Lock operations.

varnish.LCK.smp.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.smp.creat

Created locks.

varnish.LCK.smp.destroy

Destroyed locks.

varnish.LCK.smp.locks

Lock operations.

varnish.LCK.sms.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.sms.creat

Created locks.

varnish.LCK.sms.destroy

Destroyed locks.

varnish.LCK.sms.locks

Lock operations.

varnish.LCK.stat.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.stat.creat

Created locks. This metric is only provided by varnish 3.x.

varnish.LCK.stat.destroy

Destroyed locks. This metric is only provided by varnish 3.x.

varnish.LCK.stat.locks

Lock operations. This metric is only provided by varnish 3.x.

varnish.LCK.vbe.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.vbe.creat

Created locks. This metric is only provided by varnish 3.x.

varnish.LCK.vbe.destroy

Destroyed locks. This metric is only provided by varnish 3.x.

varnish.LCK.vbe.locks

Lock operations. This metric is only provided by varnish 3.x.

varnish.LCK.vbp.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.vbp.creat

Created locks.

varnish.LCK.vbp.destroy

Destroyed locks.

varnish.LCK.vbp.locks

Lock operations.

varnish.LCK.vcapace.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.vcapace.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.vcapace.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.vcl.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.vcl.creat

Created locks.

varnish.LCK.vcl.destroy

Destroyed locks.

varnish.LCK.vcl.locks

Lock operations.

varnish.LCK.vxid.creat

Created locks. This metric is only provided by varnish 4.x.

varnish.LCK.vxid.destroy

Destroyed locks. This metric is only provided by varnish 4.x.

varnish.LCK.vxid.locks

Lock operations. This metric is only provided by varnish 4.x.

varnish.LCK.wq.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.wq.creat

Created locks.

varnish.LCK.wq.destroy

Destroyed locks.

varnish.LCK.wq.locks

Lock operations.

varnish.LCK.wstat.colls

Collisions. This metric is only provided by varnish 3.x.

varnish.LCK.wstat.creat

Created locks.

varnish.LCK.wstat.destroy

Destroyed locks.

varnish.LCK.wstat.locks

Lock operations.

varnish.losthdr

HTTP header overflows.

varnish.MEMPOOL.busyobj.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.busyobj.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req0.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.req1.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess0.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.sess1.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.allocs

Allocations. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.frees

Frees. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.live

In use. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.pool

In pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.randry

Pool ran dry. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.recycle

Recycled from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.surplus

Too many for pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.sz_needed

Size allocated. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.sz_wanted

Size requested. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.timeout

Timed out from pool. This metric is only provided by varnish 4.x.

varnish.MEMPOOL.vbc.toosmall

Too small to recycle. This metric is only provided by varnish 4.x.

varnish.MGT.child_died

Child processes that died due to signals. This metric is only provided by varnish 4.x.

varnish.MGT.child_dump

Child processes that produced core dumps. This metric is only provided by varnish 4.x.

varnish.MGT.child_exit

Child processes that were cleanly stopped. This metric is only provided by varnish 4.x.

varnish.MGT.child_panic

Child processes that panicked. This metric is only provided by varnish 4.x.

varnish.MGT.child_start

Child processes that started. This metric is only provided by varnish 4.x.

varnish.MGT.child_stop

Child processes that exited with an unexpected return code. This metric is only provided by varnish 4.x.

varnish.MGT.uptime

Management process uptime. This metric is only provided by varnish 4.x.

varnish.n_backend

Number of backends.

varnish.n_ban

Active bans. This metric is only provided by varnish 3.x.

varnish.n_ban_add

New bans added. This metric is only provided by varnish 3.x.

varnish.n_ban_dups

Duplicate bans removed. This metric is only provided by varnish 3.x.

varnish.n_ban_obj_test

Objects tested. This metric is only provided by varnish 3.x.

varnish.n_ban_re_test

Regexps tested against. This metric is only provided by varnish 3.x.

varnish.n_ban_retire

Old bans deleted. This metric is only provided by varnish 3.x.

varnish.n_expired

Objects that expired from cache because of TTL.

varnish.n_gunzip

Gunzip operations.

varnish.n_gzip

Gzip operations.

varnish.n_lru_moved

Move operations done on the LRU list.

varnish.n_lru_nuked

Objects forcefully evicted from storage to make room for new objects.

varnish.n_obj_purged

Purged objects. This metric is only provided by varnish 4.x.

varnish.n_object

object structs made.

varnish.n_objectcore

objectcore structs made.

varnish.n_objecthead

objecthead structs made.

varnish.n_objoverflow

Objects overflowing workspace. This metric is only provided by varnish 3.x.

varnish.n_objsendfile

Objects sent with sendfile. This metric is only provided by varnish 3.x.

varnish.n_objwrite

Objects sent with write. This metric is only provided by varnish 3.x.

varnish.n_purges

Purges executed. This metric is only provided by varnish 4.x.

varnish.n_sess

sess structs made. This metric is only provided by varnish 3.x.

varnish.n_sess_mem

sess_mem structs made. This metric is only provided by varnish 3.x.

varnish.n_vampireobject

Unresurrected objects.

varnish.n_vbc

vbc structs made. This metric is only provided by varnish 3.x.

varnish.n_vcl

Total VCLs loaded.

varnish.n_vcl_avail

Available VCLs.

varnish.n_vcl_discard

Discarded VCLs.

varnish.n_waitinglist

waitinglist structs made.

varnish.n_wrk

Worker threads. This metric is only provided by varnish 3.x.

varnish.n_wrk_create

Worker threads created. This metric is only provided by varnish 3.x.

varnish.n_wrk_drop

Dropped work requests. This metric is only provided by varnish 3.x.

varnish.n_wrk_failed

Worker threads not created. This metric is only provided by varnish 3.x.

varnish.n_wrk_lqueue

Work request queue length. This metric is only provided by varnish 3.x.

varnish.n_wrk_max

Worker threads limited. This metric is only provided by varnish 3.x.

varnish.n_wrk_queued

Queued work requests. This metric is only provided by varnish 3.x.

varnish.pools

Thread pools. This metric is only provided by varnish 4.x.

varnish.s_bodybytes

Total body size. This metric is only provided by varnish 3.x.

varnish.s_fetch

Backend fetches.

varnish.s_hdrbytes

Total header size. This metric is only provided by varnish 3.x.

varnish.s_pass

Passed requests.

varnish.s_pipe

Pipe sessions seen.

varnish.s_pipe_hdrbytes

Total request bytes received for piped sessions. This metric is only provided by varnish 4.x.

varnish.s_pipe_in

Total number of bytes forwarded from clients in pipe sessions. This metric is only provided by varnish 4.x.

varnish.s_pipe_out

Total number of bytes forwarded to clients in pipe sessions. This metric is only provided by varnish 4.x.

varnish.s_req

Requests.

varnish.s_req_bodybytes

Total request body bytes received. This metric is only provided by varnish 4.x.

varnish.s_req_hdrbytes

Total request header bytes received. This metric is only provided by varnish 4.x.

varnish.s_resp_bodybytes

Total response body bytes transmitted. This metric is only provided by varnish 4.x.

varnish.s_resp_hdrbytes

Total response header bytes transmitted. This metric is only provided by varnish 4.x.

varnish.s_sess

Client connections.

varnish.s_synth

Synthetic responses made. This metric is only provided by varnish 4.x.

varnish.sess_closed

Client connections closed.

varnish.sess_conn

Client connections accepted. This metric is only provided by varnish 4.x.

varnish.sess_drop

Client connections dropped due to lack of worker thread. This metric is only provided by varnish 4.x.

varnish.sess_dropped

Client connections dropped due to a full queue. This metric is only provided by varnish 4.x.

varnish.sess_fail

Failures to accept a TCP connection. Either the client changed its mind, or the kernel ran out of some resource like file descriptors. This metric is only provided by varnish 4.x.

varnish.sess_herd

varnish.sess_linger

This metric is only provided by varnish 3.x.

varnish.sess_pipe_overflow

This metric is only provided by varnish 4.x.

varnish.sess_pipeline

varnish.sess_queued

Client connections queued to wait for a thread. This metric is only provided by varnish 4.x.

varnish.sess_readahead

varnish.shm_cont

SHM MTX contention.

varnish.shm_cycles

SHM cycles through buffer.

varnish.shm_flushes

SHM flushes due to overflow.

varnish.shm_records

SHM records.

varnish.shm_writes

SHM writes.

varnish.SMA.s0.c_bytes

Total space allocated by this storage.

varnish.SMA.s0.c_fail

Times the storage has failed to provide a storage segment.

varnish.SMA.s0.c_freed

Total space returned to this storage.

varnish.SMA.s0.c_req

Times the storage has been asked to provide a storage segment.

varnish.SMA.s0.g_alloc

Storage allocations outstanding.

varnish.SMA.s0.g_bytes

Space allocated from the storage.

varnish.SMA.s0.g_space

Space left in the storage.

varnish.SMA.Transient.c_bytes

Total space allocated by this storage.

varnish.SMA.Transient.c_fail

Times the storage has failed to provide a storage segment.

varnish.SMA.Transient.c_freed

Total space returned to this storage.

varnish.SMA.Transient.c_req

Times the storage has been asked to provide a storage segment.

varnish.SMA.Transient.g_alloc

Storage allocations outstanding.

varnish.SMA.Transient.g_bytes

Space allocated from the storage.

varnish.SMA.Transient.g_space

Space left in the storage.

varnish.sms_balloc

SMS space allocated.

varnish.sms_bfree

SMS space freed.

varnish.sms_nbytes

SMS outstanding space.

varnish.sms_nobj

SMS outstanding allocations.

varnish.sms_nreq

SMS allocator requests.

varnish.thread_queue_len

Length of session queue waiting for threads. This metric is only provided by varnish 4.x.

varnish.threads

Number of threads. This metric is only provided by varnish 4.x.

varnish.threads_created

Threads created. This metric is only provided by varnish 4.x.

varnish.threads_destroyed

Threads destroyed. This metric is only provided by varnish 4.x.

varnish.threads_failed

Threads that failed to get created. This metric is only provided by varnish 4.x.

varnish.threads_limited

Threads that were needed but couldn’t be created because of a thread pool limit. This metric is only provided by varnish 4.x.

varnish.uptime

Child process uptime.

varnish.vmods

Loaded VMODs. This metric is only provided by varnish 4.x.

varnish.vsm_cooling

Space which will soon (max 1 minute) be freed in the shared memory used to communicate with tools like varnishstat, varnishlog etc. This metric is only provided by varnish 4.x.

varnish.vsm_free

Free space in the shared memory used to communicate with tools like varnishstat, varnishlog etc. This metric is only provided by varnish 4.x.

varnish.vsm_overflow

Data which does not fit in the shared memory used to communicate with tools like varnishstat, varnishlog etc. This metric is only provided by varnish 4.x.

varnish.vsm_overflowed

Total data which did not fit in the shared memory used to communicate with tools like varnishstat, varnishlog etc. This metric is only provided by varnish 4.x.

varnish.vsm_used

Used space in the shared memory used to communicate with tools like varnishstat, varnishlog etc. This metric is only provided by varnish 4.x.

varnish.n_purgesps

Purges executed per second. This metric is only provided by varnish 4.x.
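
The varnish.* metrics above correspond directly to the counters exposed by the varnishstat utility; the agent collects them and reports them under the varnish. prefix. The sketch below illustrates the idea with a hand-written, abridged sample of `varnishstat -j` output (the counter names are real, the values are invented); it is not the agent's actual implementation.

```python
import json

# Abridged, hand-written sample of `varnishstat -j` output (values are invented).
sample = """
{
  "MAIN.sess_conn": {"value": 12045, "description": "Sessions accepted"},
  "MAIN.s_req": {"value": 98231, "description": "Good client requests received"}
}
"""

for name, info in json.loads(sample).items():
    # Strip the "MAIN." section prefix and report under the varnish.* namespace,
    # e.g. MAIN.sess_conn -> varnish.sess_conn.
    metric = "varnish." + name.split(".", 1)[1]
    print(f"{metric} = {info['value']}")
```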

6.3.3 - Benchmarks and Compliance

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous StatsD-compatible one. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the legacy Sysdig and Prometheus naming conventions.

Compliance metrics are generated from scheduled CIS Benchmark scans that occur in Sysdig Secure. These metrics cover aggregate results of the various CIS Benchmark sections, as well as granular details about how many running containers are failing specific run-time compliance checks.
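
As a reader's aid (a sketch, not Sysdig code), the per-section aggregates listed throughout this section obey a simple relationship: each *.tests_total metric is the sum of the corresponding *.tests_pass and *.tests_fail, and *.pass_pct is the passing share expressed as a percentage.

```python
def pass_pct(tests_pass: int, tests_fail: int) -> float:
    """Recompute a *.pass_pct value from its *.tests_pass / *.tests_fail pair."""
    tests_total = tests_pass + tests_fail  # matches the *.tests_total metric
    if tests_total == 0:
        return 0.0  # avoid division by zero when a section has no tests
    return 100.0 * tests_pass / tests_total

print(pass_pct(27, 3))  # → 90.0
```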


6.3.3.1 - Docker/CIS Benchmarks

compliance.docker-bench.container-images-and-build-file.pass_pct

The percentage of successful Docker benchmark tests run on the container images and build files.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-images-and-build-file.tests_fail

The number of failed Docker benchmark tests run against the container images and build file.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-images-and-build-file.tests_pass

The number of successful Docker benchmark tests run against the container images and build file.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-images-and-build-file.tests_total

The total number of tests run against the container images and build file.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-runtime.pass_pct

The percentage of successful container runtime Docker benchmark tests.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-runtime.tests_fail

The number of failed container runtime benchmark tests.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-runtime.tests_pass

The number of successful container runtime Docker benchmark tests.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.container-runtime.tests_total

The total number of Docker benchmark tests run against container runtimes.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-caps-added

The number of containers running with added kernel capabilities instead of the restricted default set.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-maxretry-not-set

The number of containers whose restart policy does not limit the maximum number of restart attempts.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-mount-prop-shared

The number of containers that use shared mount propagation.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-networking-host

The number of containers that share the host’s network namespace.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-apparmor

The number of containers running without an AppArmor profile.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-cpu-limits

The number of containers running with no CPU limits configured.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-health-check

The number of containers that have no HEALTHCHECK instruction configured.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-mem-limits

The number of containers configured to run without memory limitations.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-pids-cgroup-limit

The number of containers running without a PIDs cgroup limit.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-restricted-privs

The number of containers running without a restriction on acquiring additional privileges.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-seccomp

The number of containers that disable the default seccomp profile.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-securityopts

The number of containers running without SELinux options configured.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-no-ulimit-override

The number of containers running that do not override the default ulimit.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-privileged-ports

The number of containers that have privileged ports mapped into them.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-root-mounted-rw

The number of containers that mount the host’s root filesystem with read/write privileges.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-running-privileged

The number of containers running with the --privileged configuration option set.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sensitive-dirs

The number of containers that have mounted a sensitive directory from the host.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-docker-sock

The number of containers that share the host’s Docker socket.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-host-devs

The number of containers that share one or more host devices.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-host-ipc-ns

The number of containers that share the host’s IPC namespace.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-host-pid-ns

The number of containers that share the host’s PID namespace.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-host-user-ns

The number of containers that share the host’s user namespace.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sharing-host-uts-ns

The number of containers that share the host’s UTS namespace.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-sshd-docker-exec-failures

The number of containers running an SSH daemon.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-unexpected-cgroup

The number of containers running in an unexpected cgroup.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-using-docker0-net

The number of containers using the default Docker bridge network, docker0.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.c-wildcard-bound-port

The number of containers with ports bound to the wildcard interface rather than a specific interface.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration.pass_pct

The percentage of successful Docker benchmark tests run against the Docker daemon configuration.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration.tests_fail

The number of benchmark tests run against the Docker daemon configuration that failed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration.tests_pass

The number of benchmark tests run against the Docker daemon configuration that passed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration.tests_total

The total number of benchmark tests run against the Docker daemon configuration.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration-files.pass_pct

The percentage of successful Docker benchmark tests run against the Docker daemon configuration files.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration-files.tests_fail

The number of benchmark tests run against the Docker daemon configuration files that failed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration-files.tests_pass

The number of benchmark tests run against the Docker daemon configuration files that passed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-daemon-configuration-files.tests_total

The total number of benchmark tests run against the Docker daemon configuration files.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-security-operations.pass_pct

The percentage of benchmark tests run against Docker security operations that were successful.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-security-operations.tests_fail

The number of benchmark tests run against Docker security operations that failed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-security-operations.tests_pass

The number of benchmark tests run against Docker security operations that passed.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-security-operations.tests_total

The total number of benchmark tests run against Docker security operations.

Metric Type: Gauge
Value Type: Integer
Segment By: Container
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-swarm-configuration.pass_pct

The percentage of benchmark tests run against the Docker swarm configuration that were successful.

Metric Type: Gauge
Value Type: %
Segment By: Container
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

compliance.docker-bench.docker-swarm-configuration.tests_fail

The number of benchmark tests run against the Docker swarm configuration that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.docker-swarm-configuration.tests_pass

The number of benchmark tests run against the Docker swarm configuration that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.docker-swarm-configuration.tests_total

The total number of benchmark tests run against the Docker swarm configuration.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.docker-users

The number of user accounts with permission to access the Docker daemon socket.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.host-configuration.pass_pct

The percentage of benchmark tests run against the host configuration that were successful.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.host-configuration.tests_fail

The number of benchmark tests run against the host configuration that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.host-configuration.tests_pass

The number of benchmark tests run against the host configuration that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.host-configuration.tests_total

The total number of benchmark tests run against the host configuration.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.img-images-using-add

The number of images that use the ADD instruction rather than the COPY instruction in their Dockerfile.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.img-no-healthcheck

The number of images with no HEALTHCHECK instruction configured.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.img-running-root

The number of images that use the root user.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.img-update-insts-found

The number of images that run a package update step without a package installation step.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.pass_pct

The percentage of Docker benchmark tests run that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.score

The current pass/fail score for Docker benchmark tests run. The score starts at zero, increments by one for every test that passes, and decrements by one for every test that returns a WARN result or worse.

MetadataDescription
Metric TypeCounter
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max
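The scoring rule described above can be sketched as a small helper. This is a minimal illustration, not Sysdig's implementation; the function name and the string representation of test results are hypothetical:

```python
def docker_bench_score(results):
    """Compute a pass/fail score: +1 for every passing test,
    -1 for every test that returns WARN or worse.

    `results` is a list of result strings (hypothetical representation);
    anything other than "PASS" counts against the score.
    """
    score = 0
    for result in results:
        if result == "PASS":
            score += 1
        else:  # WARN, FAIL, and so on all decrement the score
            score -= 1
    return score

# Two passes, one WARN, one FAIL: +2 - 2 = 0
print(docker_bench_score(["PASS", "PASS", "WARN", "FAIL"]))  # → 0
```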

compliance.docker-bench.tests_fail

The total number of Docker benchmark tests that have failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.tests_pass

The total number of Docker benchmark tests that have passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.docker-bench.tests_total

The total number of Docker benchmark tests that have been run.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

6.3.3.2 - Kubernetes Benchmarks

compliance.k8s-bench.api-server.pass_pct

The percentage of Kubernetes benchmark tests run on the API server that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.api-server.tests_fail

The number of Kubernetes benchmark tests run on the API server that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.api-server.tests_pass

The number of Kubernetes benchmark tests run on the API server that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.api-server.tests_total

The total number of Kubernetes benchmark tests run on the API server.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.api-server.tests_warn

The number of Kubernetes benchmark tests run on the API server that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configuration-files.pass_pct

The percentage of Kubernetes benchmark tests run on the configuration files of non-master nodes that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configuration-files.tests_fail

The number of Kubernetes benchmark tests run on the configuration files of non-master nodes that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configuration-files.tests_pass

The number of Kubernetes benchmark tests run on the configuration files that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configuration-files.tests_total

The total number of Kubernetes benchmark tests run on the configuration files of non-master nodes.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configuration-files.tests_warn

The number of Kubernetes benchmark tests run on the configuration files of non-master nodes that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configure-files.pass_pct

The percentage of Kubernetes benchmark tests run on the master node configuration files that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configure-files.tests_fail

The number of Kubernetes benchmark tests run on the master node configuration files that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configure-files.tests_pass

The number of Kubernetes benchmark tests run on the master node configuration files that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configure-files.tests_total

The total number of Kubernetes benchmark tests run on the master node configuration files.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.configure-files.tests_warn

The number of Kubernetes benchmark tests run on the master node configuration files that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.controller-manager.pass_pct

The percentage of Kubernetes benchmark tests run on the controller manager that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.controller-manager.tests_fail

The number of Kubernetes benchmark tests run on the controller manager that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.controller-manager.tests_pass

The number of Kubernetes benchmark tests run on the controller manager that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.controller-manager.tests_total

The total number of Kubernetes benchmark tests run on the controller manager.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.controller-manager.tests_warn

The number of Kubernetes benchmark tests run on the controller manager that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.etcd.pass_pct

The percentage of Kubernetes benchmark tests run on the etcd key value store that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.etcd.tests_fail

The number of Kubernetes benchmark tests run on the etcd key value store that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.etcd.tests_pass

The number of Kubernetes benchmark tests run on the etcd key value store that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.etcd.tests_total

The total number of Kubernetes benchmark tests run on the etcd key value store.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.etcd.tests_warn

The number of Kubernetes benchmark tests run on the etcd key value store that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.general-security-primitives.pass_pct

The percentage of Kubernetes benchmark tests run on the security primitives that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.general-security-primitives.tests_fail

The number of Kubernetes benchmark tests run on the security primitives that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.general-security-primitives.tests_pass

The number of Kubernetes benchmark tests run on the security primitives that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.general-security-primitives.tests_total

The total number of Kubernetes benchmark tests run on the security primitives.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.general-security-primitives.tests_warn

The number of Kubernetes benchmark tests run on the security primitives that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.kubelet.pass_pct

The percentage of Kubernetes benchmark tests run on the non-master node Kubernetes agent that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.kubelet.tests_fail

The number of Kubernetes benchmark tests run on the non-master node Kubernetes agent that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.kubelet.tests_pass

The number of Kubernetes benchmark tests run on the non-master node Kubernetes agent that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.kubelet.tests_total

The total number of Kubernetes benchmark tests run on the non-master node Kubernetes agent.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.kubelet.tests_warn

The number of Kubernetes benchmark tests run on the non-master node Kubernetes agent that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.pass_pct

The percentage of Kubernetes benchmark tests that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.scheduler.pass_pct

The percentage of Kubernetes benchmark tests run on the scheduler that passed.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByContainer
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.scheduler.tests_fail

The number of Kubernetes benchmark tests run on the scheduler that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.scheduler.tests_pass

The number of Kubernetes benchmark tests run on the scheduler that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.scheduler.tests_total

The total number of Kubernetes benchmark tests run on the scheduler.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.scheduler.tests_warn

The number of Kubernetes benchmark tests run on the scheduler that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.tests_fail

The number of Kubernetes benchmark tests that failed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.tests_pass

The number of Kubernetes benchmark tests that passed.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.tests_total

The total number of Kubernetes benchmark tests run.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

compliance.k8s-bench.tests_warn

The number of Kubernetes benchmark tests that returned a result of WARN.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByContainer
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationSum
Available Group Aggregation FormatsAvg, Sum, Min, Max

6.3.4 - Containers

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

This topic introduces you to the Container metrics.

container.count

The number of containers in the infrastructure.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

container.id

The container’s identifier.

For Docker containers, this value is a 12-character hexadecimal string.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByContainer
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

container.image

The name of the image used to run the container.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByContainer
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

container.name

The name of the container.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByContainer
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

container.type

The type of container (for example, Docker, LXC, or Mesos).

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByContainer
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

cpu.quota.used.percent

The percentage of CPU quota a container actually used over a defined period of time.

CPU quotas are a common way of creating a CPU limit for a container. A container can only spend its quota of time on CPU cycles across a given time period. The default time period is 100ms.

Unlike CPU shares, the CPU quota is a hard limit on the amount of CPU a container can use. For this reason, CPU quota usage should not exceed 100% for an extended period, although a container may briefly burst above its quota.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max
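The quota calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Sysdig's implementation; the function and parameter names are hypothetical:

```python
def cpu_quota_used_percent(cpu_time_used_us, quota_us):
    """Percentage of the CPU quota consumed in one enforcement period.

    cpu_time_used_us: CPU time the container consumed during the period,
                      in microseconds.
    quota_us:         the container's CPU quota per period, in microseconds
                      (e.g. 50000 for half a CPU with the default 100 ms period).
    Values above 100 indicate the container briefly exceeded its quota.
    """
    return 100.0 * cpu_time_used_us / quota_us

# A container with a 50 ms quota that used 40 ms of CPU in a 100 ms period:
print(cpu_quota_used_percent(40_000, 50_000))  # → 80.0
```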

cpu.shares.count

The amount of CPU shares assigned to the container’s cgroup. CPU shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. Each container receives its own allocation of CPU cycles, based on the ratio of share allocation for the container versus the total share allocation for all containers. For example, if an environment has three containers, each with 1024 shares, then each will receive 1/3 of the CPU cycles.

The default value for a container is 1024.

Defining a CPU shares count is a common way to create a CPU limit for a container.

The CPU shares count is not a hard limit. A container can consume more than its allocation, as long as the CPU has cycles that are not being consumed by the container they were originally allocated to.

MetadataDescription
Metric TypeGauge
Value TypeInteger
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max
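The share-ratio arithmetic described above can be sketched as follows. This is a minimal illustration of how shares translate into a fraction of CPU cycles under contention; the function name is hypothetical:

```python
def cpu_share_fraction(container_shares, all_shares):
    """Fraction of CPU cycles a container receives under contention,
    based on its shares relative to the total shares of all containers.

    container_shares: the shares assigned to this container's cgroup.
    all_shares:       list of share counts for every container on the host,
                      including this one.
    """
    return container_shares / sum(all_shares)

# Three containers, each with the default 1024 shares: each receives one third.
print(cpu_share_fraction(1024, [1024, 1024, 1024]))
```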

cpu.shares.used.percent

The percentage of a container’s allocated CPU shares that are used. CPU shares are a common way of creating a CPU limit for a container, as they represent a relative weight used by the kernel to distribute CPU cycles across different containers. Each container receives its own allocation of CPU cycles, according to the ratio of share count vs the total number of shares claimed by all containers. For example, in an infrastructure with three containers, each with 1024 shares, each container receives 1/3 of the CPU cycles.

A container can use more CPU cycles than allocated if the CPU has cycles that are not being consumed by the container they were originally allocated to. This means that the value of cpu.shares.used.percent can exceed 100%.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

memory.limit.bytes

The RAM limit assigned to a container. The default value is 0.

MetadataDescription
Metric TypeGauge
Value TypeByte
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

memory.limit.used.percent

The percentage of the memory limit used by a container.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

swap.limit.bytes

The swap limit assigned to a container.

MetadataDescription
Metric TypeGauge
Value TypeByte
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationRate
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

swap.limit.used.percent

The percentage of swap limit used by the container.

MetadataDescription
Metric TypeGauge
Value Type%
Segment ByHost, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time AggregationAverage
Available Time Aggregation FormatsAvg, Rate, Sum, Min, Max
Default Group AggregationAverage
Available Group Aggregation FormatsAvg, Sum, Min, Max

6.3.5 - Cloud Provider

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

At this time, all cloudProvider metrics are AWS-related.

cloudProvider.account.id

The cloud provider instance account number.

This metric is useful if there are multiple accounts linked with Sysdig Monitor.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByCloudProvider
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

cloudProvider.availabilityZone

The AWS Availability Zone where the entity or entities are located. Each availability zone is an isolated subsection of an AWS region. See cloudProvider.region.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByCloudProvider
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

cloudProvider.host.ip.private

The private IP address allocated by the cloud provider for the instance. This address can be used for communication between instances in the same network.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByCloudProvider
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

cloudProvider.host.ip.public

Public IP address of the selected host.

MetadataDescription
Metric TypeGauge
Value TypeString
Segment ByCloudProvider
Default Time AggregationN/A
Available Time Aggregation FormatsN/A
Default Group AggregationN/A
Available Group Aggregation FormatsN/A

cloudProvider.host.name

The name of the host as reported by the cloud provider.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.id

The ID number as assigned and reported by the cloud provider.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.instance.type

The instance type as reported by the cloud provider (for example, an EC2 instance type such as m5.large).

This metric is extremely useful for segmenting instances and comparing their resource usage and saturation. You can use it as a grouping criterion in the Explore table to quickly examine AWS usage on a per-instance-type basis, or to compare metrics such as CPU usage, number of requests, or network utilization across instance types.

Use this grouping criterion in conjunction with the host.count metric to easily create a report on how many instances of each type you have.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A
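
As an illustration of the grouping described above, this Python sketch counts hosts per instance type the way a host.count report segmented by cloudProvider.instance.type would. The host records and instance-type values are hypothetical examples, not output from any Sysdig API.

```python
from collections import Counter

# Hypothetical per-host label data; the hostnames and instance types
# below are ordinary AWS examples, not values from a real environment.
hosts = [
    {"host.hostName": "web-1", "cloudProvider.instance.type": "m5.large"},
    {"host.hostName": "web-2", "cloudProvider.instance.type": "m5.large"},
    {"host.hostName": "db-1",  "cloudProvider.instance.type": "r5.xlarge"},
]

# Group by instance type and count hosts, mirroring host.count
# segmented by cloudProvider.instance.type.
counts = Counter(h["cloudProvider.instance.type"] for h in hosts)
print(dict(counts))  # {'m5.large': 2, 'r5.xlarge': 1}
```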

cloudProvider.name

The name of the cloud provider (for example, AWS or Rackspace).

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.region

The region the cloud provider host (or group of hosts) is located in.

Use this grouping criterion in conjunction with the host.count metric to easily create a report on how many instances you have in each region.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.resource.endPoint

The DNS name for which the resource can be accessed.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.resource.name

The cloud provider service name (for example, Amazon EC2 or Amazon ELB).

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.resource.type

The cloud provider service type (for example, INSTANCE, LOAD_BALANCER, DATABASE).

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cloudProvider.status

Resource status.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

6.3.5.1.1 - ElastiCache

Amazon ElastiCache is a cloud-caching service that increases the performance, speed, and redundancy with which applications can retrieve data by providing an in-memory database caching system.

aws.elasticache.CPUUtilization

The percentage of CPU utilization.

If utilization is high and your main workload comes from read requests, scale your cache cluster out by adding read replicas. If the main workload comes from write requests, scale up by using a larger cache instance type.

For more information, refer to the ElastiCache documentation.

Metric Type: Gauge
Value Type: %
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elasticache.FreeableMemory

The amount of memory considered free, or that could be made available, for use by the node.

For more information, refer to the ElastiCache documentation.

Metric Type: Gauge
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elasticache.NetworkBytesIn

The number of bytes the host has read from the network.

For more information, refer to the ElastiCache documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elasticache.NetworkBytesOut

The number of bytes the host has written to the network.

For more information, refer to the ElastiCache documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elasticache.SwapUsage

The amount of swap space used on the host.

If swap is being utilized, the node probably needs more memory than is available and cache performance may be negatively impacted. Consider adding more nodes or using larger ones to reduce or eliminate swapping.

For more information, refer to the ElastiCache documentation.

Metric Type: Gauge
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.2 - Elastic Application Load Balancing (ALB)

Application Load Balancer is best suited for load balancing of HTTP and HTTPS traffic and provides advanced request routing targeted at the delivery of modern application architectures, including microservices and containers. For more information, refer to the Elastic Application Load Balancer documentation.

aws.alb.ActiveConnectionCount

The total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to the targets.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.ClientTLSNegotiationErrorCount

The number of TLS connections initiated by the client that did not establish a session with the load balancer.

Possible causes include a mismatch of ciphers or protocols.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.ConsumedLCUs

The number of load balancer capacity units (LCU) used by the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_ELB_4XX_Count

The number of HTTP 4XX client error codes that originate from the load balancer. Client errors are generated when requests are malformed or incomplete. These requests have not been received by the target.

This count does not include any response codes generated by the targets.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_ELB_5XX_Count

The number of HTTP 5XX server error codes that originate from the load balancer. Server errors are generated by the load balancer itself, for example when no healthy targets are available to receive the request.

This count does not include any response codes generated by the targets.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_Target_2XX_Count

The number of HTTP 2XX response codes generated by the target.

This count does not include any response codes generated by the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_Target_3XX_Count

The number of HTTP 3XX response codes generated by the target.

This count does not include any response codes generated by the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_Target_4XX_Count

The number of HTTP 4XX response codes generated by the target.

This count does not include any response codes generated by the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HTTPCode_Target_5XX_Count

The number of HTTP 5XX response codes generated by the target.

This count does not include any response codes generated by the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.HealthyHostCount

The number of targets that are considered healthy.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.IPv6ProcessedBytes

The total number of bytes processed by the load balancer over IPv6.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.IPv6RequestCount

The number of requests received by the load balancer over IPv6.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.NewConnectionCount

The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.ProcessedBytes

The total number of bytes processed by the load balancer over IPv4 and IPv6.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.RejectedConnectionCount

The number of connections that were rejected because the load balancer had reached its maximum number of connections.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.RequestCount

The number of requests processed over IPv4 and IPv6. This count only includes the requests with a response generated by a target of the load balancer.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.RequestCountPerTarget

The average number of requests received by each target in a target group.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.RuleEvaluations

The number of rules processed by the load balancer given a request rate averaged over an hour.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.TargetConnectionErrorCount

The number of connections that were not successfully established between the load balancer and target.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.TargetResponseTime

The time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.TargetTLSNegotiationErrorCount

The number of TLS connections initiated by the load balancer that did not establish a session with the target.

Possible causes include a mismatch of ciphers or protocols.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.alb.UnHealthyHostCount

The number of targets that are considered unhealthy.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.3 - Elastic Compute Cloud (EC2)

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.

aws.ec2.CPUCreditBalance

The CPU credit balance of an instance, based on what has accrued since it started. For more information, refer to the Elastic Compute Cloud metric definition table.

Metric Type: Gauge
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.CPUCreditUsage

The CPU credit usage by the instance. For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Gauge
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.CPUUtilization

The percentage of allocated EC2 compute units currently in use on the instance. For more information, refer to the Elastic Compute Cloud metric definition documentation.

This metric identifies the processing power required to run an application upon a selected instance.

Metric Type: Gauge
Value Type: %
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.DiskReadBytes

The total bytes read from all ephemeral disks available to the instance. This metric measures the volume of data the application reads from disk and can be used to gauge the speed of the application.

The number reported is the number of bytes read during the specified period. For basic (five-minute) monitoring, divide this number by 300 to get bytes/second. For detailed (one-minute) monitoring, divide it by 60.

For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max
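
The period-to-rate conversion described above can be sketched in Python; the function name and sample values are illustrative, not part of any AWS or Sysdig API.

```python
def bytes_per_second(period_sum_bytes: float, detailed_monitoring: bool = False) -> float:
    """Convert a CloudWatch period sum (e.g. DiskReadBytes) to bytes/second.

    Basic monitoring reports five-minute (300 s) periods; detailed
    monitoring reports one-minute (60 s) periods.
    """
    period_seconds = 60 if detailed_monitoring else 300
    return period_sum_bytes / period_seconds

# A 30 MB sum over a basic five-minute period is 100,000 bytes/second.
print(bytes_per_second(30_000_000))       # 100000.0
# A 6 MB sum over a detailed one-minute period is the same rate.
print(bytes_per_second(6_000_000, True))  # 100000.0
```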

aws.ec2.DiskReadOps

Total completed read operations from all ephemeral disks available to the instance in a specified period of time. For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.DiskWriteBytes

The total bytes written to all ephemeral disks available to the instance. This metric measures the volume of data the application writes to disk and can be used to gauge the speed of the application.

The number reported is the number of bytes written during the specified period. For basic (five-minute) monitoring, divide this number by 300 to get bytes/second. For detailed (one-minute) monitoring, divide it by 60.

For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.DiskWriteOps

The completed write operations to all ephemeral disks available to the instance in a specified period of time. If your instance uses Amazon EBS volumes, see Amazon EBS Metrics. For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.NetworkIn

The number of bytes received on all network interfaces by the instance. For more information, refer to the Elastic Compute Cloud metric definition documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.ec2.NetworkOut

The number of bytes sent out on all network interfaces by the instance. For more information, refer to the Elastic Compute Cloud metric definition documentation.

This metric identifies the volume of outgoing network traffic to an application on a single instance.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.4 - Elastic Container Service (ECS)

Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS. Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.

ecs.clusterName

The name of the cluster. For more information, refer to the AWS CloudFormation documentation.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

ecs.serviceName

The name of the Elastic Container Service (Amazon ECS) service. For more information, refer to the AWS CloudFormation documentation.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

ecs.taskFamilyName

The name of the task definition family. For more information, refer to the AWS CloudFormation documentation.

Metric Type: Gauge
Value Type: String
Segment By: CloudProvider
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

6.3.5.1.5 - Elastic Load Balancing (ELB)

Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions.

aws.elb.BackendConnectionErrors

The number of errors encountered by the load balancer while attempting to connect to your application.

A high error count indicates that the ELB is having problems connecting to your back-end servers; look for network-related issues or verify that the servers are operating correctly.

For more information, refer to the Elastic Load Balancing documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HealthyHostCount

The number of healthy instances bound to the load balancer.

Hosts are declared healthy if they meet the threshold for the number of consecutive successful health checks. Hosts that fail more health checks than the unhealthy threshold are considered unhealthy. If cross-zone load balancing is enabled, the number of healthy instances is calculated across all Availability Zones.

For more information, refer to the Elastic Load Balancing documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max
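
The consecutive-health-check rule described above can be modeled with a toy Python sketch. The thresholds and state machine here are a simplified illustration, not ELB's actual implementation.

```python
def is_healthy(check_results, healthy_threshold=2, unhealthy_threshold=2):
    """Toy model of the consecutive-check rule.

    check_results is a list of booleans, oldest first. A host flips to
    healthy after `healthy_threshold` consecutive successes and to
    unhealthy after `unhealthy_threshold` consecutive failures.
    Hosts start out healthy in this simplified model.
    """
    healthy = True
    streak_ok = streak_fail = 0
    for ok in check_results:
        if ok:
            streak_ok += 1
            streak_fail = 0
            if streak_ok >= healthy_threshold:
                healthy = True
        else:
            streak_fail += 1
            streak_ok = 0
            if streak_fail >= unhealthy_threshold:
                healthy = False
    return healthy

print(is_healthy([True, False, False]))        # False: two consecutive failures
print(is_healthy([False, False, True, True]))  # True: recovered with two successes
```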

aws.elb.HTTPCode_Backend_2XX

The number of HTTP 2XX response codes generated by back-end instances. This metric does not include any response codes generated by the load balancer.

The 2XX class status codes represent successful actions (e.g., 200-OK, 201-Created, 202-Accepted, 203-Non-Authoritative Info).

For more information, refer to the Elastic Load Balancing documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HTTPCode_Backend_3XX

The number of HTTP 3XX response codes generated by back-end instances. This metric does not include any response codes generated by the load balancer.

The 3XX class status code indicates that the user agent requires action (e.g., 301-Moved Permanently, 302-Found, 305-Use Proxy, 307-Temporary Redirect).

For more information, refer to the Elastic Load Balancing documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HTTPCode_Backend_4XX

The number of HTTP 4XX response codes generated by back-end instances. This metric does not include any response codes generated by the load balancer. For more information, refer to the Elastic Load Balancing documentation.

The 4XX class status code represents client errors (e.g., 400-Bad Request, 401-Unauthorized, 403-Forbidden, 404-Not Found).

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HTTPCode_Backend_5XX

The number of HTTP 5XX response codes generated by back-end instances. This metric does not include any response codes generated by the load balancer. For more information, refer to the Elastic Load Balancing documentation.

The 5XX class status code represents back-end server errors (e.g., 500-Internal Server Error, 501-Not Implemented, 503-Service Unavailable).

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HTTPCode_ELB_4XX

The number of HTTP 4XX client error codes generated by the load balancer when the listener is configured to use the HTTP or HTTPS protocol. For more information, refer to the Elastic Load Balancing documentation.

Client errors are generated when a request is malformed or is incomplete.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.HTTPCode_ELB_5XX

The number of HTTP 5XX server error codes generated by the load balancer when the listener is configured to use the HTTP or HTTPS protocol. This metric does not include any responses generated by back-end instances. For more information, refer to the Elastic Load Balancing documentation.

The metric is reported if there are no back-end instances that are healthy or registered to the load balancer, or if the request rate exceeds the capacity of the instances or the load balancers.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.Latency

The time, in seconds, that back-end requests take to process. For more information, refer to the Elastic Load Balancing documentation.

Latency metrics from the ELB are good indicators of the overall performance of your application.

Metric Type: Counter
Value Type: relativeTime
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.RequestCount

The number of requests handled by the load balancer. For more information, refer to the Elastic Load Balancing documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.SpilloverCount

The total number of requests that were rejected because the surge queue was full. For more information, refer to the Elastic Load Balancing documentation.

Positive numbers indicate some requests are not being forwarded to any server. Clients are not notified that their request was dropped.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.SurgeQueueLength

The total number of requests that are pending submission to a registered instance. For more information, refer to the Elastic Load Balancing documentation.

Positive numbers indicate clients are waiting for their requests to be forwarded to a server for processing.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.elb.UnHealthyHostCount

The number of unhealthy instances bound to the load balancer. For more information, refer to the Elastic Load Balancing documentation.

Hosts are declared healthy if they meet the threshold for the number of consecutive successful health checks. Hosts that fail more health checks than the unhealthy threshold are considered unhealthy.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.6 - DynamoDB

DynamoDB is a fully managed proprietary NoSQL database service that supports key-value and document data structures and is offered by Amazon as part of the Amazon Web Services portfolio. Amazon CloudWatch aggregates the DynamoDB metrics at one-minute intervals.

In DynamoDB, provisioned throughput requirements are specified in terms of capacity units: read capacity units and write capacity units. One read capacity unit represents one strongly consistent read per second for items up to 4 KB in size. One write capacity unit represents one write per second for items up to 1 KB in size. Larger items require more capacity. You can calculate the number of read or write capacity units by estimating the number of reads or writes required per second and multiplying by the item size rounded up to the nearest KB.

For more information, see the Amazon DynamoDB documentation.
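
Following the sizing rules stated above (one strongly consistent read per second per 4 KB, one write per second per 1 KB, item sizes rounded up), a minimal Python sketch of the capacity-unit estimate might look like this. The function names are hypothetical, for illustration only.

```python
import math

def read_capacity_units(items_per_second: int, item_size_bytes: int) -> int:
    """One strongly consistent read per second per 4 KB of item size.

    Item size is rounded up to the nearest KB, then to 4 KB read units.
    """
    size_kb = math.ceil(item_size_bytes / 1024)
    return items_per_second * math.ceil(size_kb / 4)

def write_capacity_units(items_per_second: int, item_size_bytes: int) -> int:
    """One write per second per 1 KB of item size, rounded up."""
    return items_per_second * math.ceil(item_size_bytes / 1024)

# 80 reads/s of 6 KB items: 6 KB rounds up to two 4 KB units -> 160 RCU.
print(read_capacity_units(80, 6 * 1024))  # 160
# 100 writes/s of 1.5 KB items: rounds up to 2 KB -> 200 WCU.
print(write_capacity_units(100, 1536))    # 200
```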

aws.dynamodb.ConditionalCheckFailedRequests

The number of failed attempts to perform conditional writes.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ConsumedReadCapacityUnits

The amount of read capacity units consumed over the defined time period. Amazon CloudWatch aggregates the metrics at one-minute intervals. Use the Sum aggregation to calculate the consumed throughput. For example, get the Sum value over a span of one minute, and divide it by the number of seconds in a minute (60) to calculate the average ConsumedReadCapacityUnits per second.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ConsumedWriteCapacityUnits

The amount of write capacity units consumed over the specified time interval. Amazon CloudWatch aggregates the metrics at one-minute intervals. Use the Sum aggregation to calculate the consumed throughput. For example, get the Sum value over a span of one minute, and divide it by the number of seconds in a minute (60) to calculate the average ConsumedWriteCapacityUnits per second.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ProvisionedReadCapacityUnits

The number of read capacity units provisioned for a table or a global secondary index.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ProvisionedWriteCapacityUnits

The number of write capacity units provisioned for a table or a global secondary index.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ReadThrottleEvents

The number of DynamoDB requests that exceed the amount of read capacity units provisioned.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ReturnedBytes.GetRecords

The number of bytes returned by GetRecords operations during the specified time period.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ReturnedItemCount

The number of items returned by query or scan operations during the specified time period.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ReturnedRecordsCount.GetRecords

The number of stream records returned by GetRecords operations during the specified time period.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.SuccessfulRequestLatency

The latency of successful requests to DynamoDB or Amazon DynamoDB Streams during the specified time period, measured in milliseconds.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.SystemErrors

The number of requests made to DynamoDB or Amazon DynamoDB Streams that resulted in an HTTP 500 status code during the specified time period. HTTP 500 usually indicates an internal service error.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.ThrottledRequests

The number of requests to DynamoDB that exceed the provisioned throughput limits on a resource, such as a table or an index. ThrottledRequests is incremented by one if any event within a request exceeds a provisioned throughput limit.

If any individual read or write event within a batch is throttled, ReadThrottleEvents or WriteThrottleEvents is incremented accordingly.
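The increment rules described above can be sketched as follows (a simplified model; the event tuples and helper name are invented for illustration):

```python
def throttle_deltas(events):
    # events: (kind, throttled) pairs, one per read/write event in a single
    # request; kind is "read" or "write".
    read_events = sum(1 for kind, throttled in events if throttled and kind == "read")
    write_events = sum(1 for kind, throttled in events if throttled and kind == "write")
    # ThrottledRequests rises by at most 1 per request, however many events
    # within it were throttled; the per-kind counters rise once per event.
    throttled_requests = 1 if (read_events + write_events) else 0
    return {
        "ThrottledRequests": throttled_requests,
        "ReadThrottleEvents": read_events,
        "WriteThrottleEvents": write_events,
    }
```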

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.UserErrors

The number of requests to DynamoDB or Amazon DynamoDB Streams that returned an HTTP 400 status code during the specified time period. HTTP 400 usually indicates a client-side error.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.dynamodb.WriteThrottleEvents

The number of requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.7 - Relational Database Service (RDS)

Amazon Relational Database Service (Amazon RDS) is a managed SQL database service provided by Amazon Web Services (AWS). Amazon RDS supports an array of database engines to store and organize data and helps with database management tasks, such as migration, backup, recovery, and patching.

aws.rds.BinLogDiskUsage

The amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.CPUUtilization

The percentage of CPU utilization.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Gauge
Value Type: %
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.DatabaseConnections

The number of database connections in use.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.DiskQueueDepth

The number of outstanding I/Os (read/write requests) waiting to access the disk.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.FreeableMemory

The amount of available random access memory, in bytes.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Gauge
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.FreeStorageSpace

The amount of available storage space in bytes.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Gauge
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.NetworkReceiveThroughput

The incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. The metric is measured in bytes per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.NetworkTransmitThroughput

The outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. The metric is measured in bytes per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.ReadIOPS

The average number of read I/O operations per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.ReadLatency

The average time taken per read I/O operation, in seconds.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: relativeTime
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.ReadThroughput

The average number of bytes read from disk per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.ReplicaLag

The amount of time, in nanoseconds, a Read Replica DB instance lags behind the source DB instance.

This metric applies to MySQL read replicas.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: relativeTime
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.SwapUsage

The amount of swap space used by the database, in bytes.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Gauge
Value Type: Byte
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.WriteIOPS

The average number of write I/O operations per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.WriteLatency

The average amount of time taken per write I/O operation.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: relativeTime
Segment By: CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.rds.WriteThroughput

The average number of bytes written to disk per second.

For more information, refer to the Amazon Relational Database (RDS) documentation.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.5.1.8 - Simple Queue Service (SQS)

Amazon Simple Queue Service (Amazon SQS) is a pay-per-use web service for storing messages in transit between computers. Developers use SQS to build distributed applications with decoupled components without having to deal with the overhead of creating and maintaining message queues. For more information, see Amazon SQS Resources.

aws.sqs.ApproximateNumberOfMessagesDelayed

The number of messages in the queue that are delayed and currently unavailable for reading. Messages are delayed when the queue is configured as a delay queue or when a message is sent with a delay parameter.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Avg
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.ApproximateNumberOfMessagesNotVisible

The number of messages that are in flight: sent to a client but not yet deleted and not yet past the end of their visibility window.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Avg
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.ApproximateNumberOfMessagesVisible

The number of messages available for retrieval from the queue. These are the messages which have not yet been locked by an SQS worker.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Avg
Available Group Aggregation Formats: Avg, Sum, Min, Max
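Taken together, the three approximate gauges cover disjoint message states (available, in flight, and delayed), so they can be summed into a single backlog figure. A minimal sketch (the helper name is invented):

```python
def queue_backlog(visible, not_visible, delayed):
    # visible: ApproximateNumberOfMessagesVisible (retrievable now)
    # not_visible: ApproximateNumberOfMessagesNotVisible (in flight)
    # delayed: ApproximateNumberOfMessagesDelayed (waiting out a delay)
    # The states are disjoint, so a plain sum approximates the total
    # number of messages in the queue.
    return visible + not_visible + delayed
```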

aws.sqs.NumberOfEmptyReceives

The number of ReceiveMessage API calls that did not return a message. This metric is populated every 5 minutes.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.NumberOfMessagesDeleted

The number of messages deleted from the queue. Amazon SQS considers every successful deletion that uses a valid receipt handle, including duplicate deletions, to generate the NumberOfMessagesDeleted metric. Therefore, this number could include duplicate deletions.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.NumberOfMessagesReceived

The number of messages returned by calls to the ReceiveMessage API action.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.NumberOfMessagesSent

The number of messages added to a queue.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

aws.sqs.SentMessageSize

The size, in bytes, of messages added to a queue. SentMessageSize does not appear as an available metric in the CloudWatch console until at least one message is sent to the corresponding queue.

Metric Type: Counter
Value Type: Integer
Segment By: CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.6 - Deprecated Metrics and Labels

Below is the list of metrics and labels that are discontinued with the introduction of the new metric store. We made an effort not to deprecate any metrics or labels that are used in existing alerts, but if you encounter any issues, contact Sysdig Support.

All net.*.request.time.worst metrics are automatically mapped to net.*.request.time, because the Max aggregation gives equivalent results and these metrics were almost exclusively used with it.
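The renaming rule amounts to stripping the trailing .worst segment from these metric names; a minimal sketch (the helper name is invented):

```python
import re

def map_deprecated_name(metric):
    # net.<protocol>.request.time.worst -> net.<protocol>.request.time;
    # any other metric name passes through unchanged.
    return re.sub(r"^(net\.\w+\.request\.time)\.worst$", r"\1", metric)
```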

Deprecated Metrics

The following metrics are no longer supported.

  • net.request.time.file
  • net.request.time.file.percent
  • net.request.time.local
  • net.request.time.local.percent
  • net.request.time.net
  • net.request.time.net.percent
  • net.request.time.nextTiers
  • net.request.time.nextTiers.percent
  • net.request.time.processing
  • net.request.time.processing.percent
  • net.request.time.worst.in
  • net.request.time.worst.out
  • net.incomplete.connection.count.total
  • net.http.request.time.worst
  • net.mongodb.request.time.worst
  • net.sql.request.time.worst
  • net.link.clientServer.bytes
  • net.link.delay.perRequest
  • net.link.serverClient.bytes
  • capacity.estimated.request.stolen.count
  • capacity.estimated.request.total.count
  • capacity.stolen.percent
  • capacity.total.percent
  • capacity.used.percent

Deprecated Labels

The following labels are no longer supported:

  • net.connection.client
  • net.connection.client.pid
  • net.connection.direction
  • net.connection.endpoint.tcp
  • net.connection.udp.inverted
  • net.connection.errorCode
  • net.connection.l4proto
  • net.connection.server
  • net.connection.server.pid
  • net.connection.state
  • net.role
  • cloudProvider.resource.endPoint
  • host.container.mappings
  • host.ip.all
  • host.ip.private
  • host.ip.public
  • host.server.port
  • host.isClientServer
  • host.isInstrumented
  • host.isInternal
  • host.procList.main
  • proc.id
  • proc.name.client
  • proc.name.server
  • program.environment
  • program.usernames
  • mesos_cluster
  • mesos_node
  • mesos_pid

In addition to this list, composite labels ending with the ‘.label’ suffix will no longer be supported. For example, kubernetes.service.label is deprecated, but kubernetes.service.label.* labels are still supported.

6.3.7 - File

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

file.bytes.in

The number of bytes read from the file. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: Byte
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.bytes.out

The number of bytes written to the file. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: Byte
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.bytes.total

The total number of bytes written to, and read from, the file. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: Byte
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.error.open.count

The number of errors that occurred when opening files. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.error.total.count

The number of errors encountered by file system calls, such as open(), close(), and create(). By default, this metric displays the total value for the defined scope. For example, if the scope is defined as a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.iops.in

The number of file read operations per second. This metric is calculated by measuring the actual number of read requests made by a process. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

The value of file.iops.in can differ from the values reported by other tools, which typically interpolate IOPS from the number of bytes read and written to the file system.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max
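The distinction drawn above, counting actual requests versus interpolating from byte counts, can be illustrated like this (hypothetical helpers, with an assumed block size):

```python
def iops_by_count(read_requests, interval_seconds):
    # What file.iops.in reports: the actual number of read requests
    # observed, divided by the interval length.
    return read_requests / interval_seconds

def iops_by_interpolation(bytes_read, assumed_block_size, interval_seconds):
    # What many other tools report: bytes read divided by an assumed
    # block size. A process issuing many small reads (or a few large
    # ones) makes the two figures diverge.
    return (bytes_read / assumed_block_size) / interval_seconds
```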

file.iops.out

The number of file write operations per second. This metric is calculated by measuring the actual number of write requests made by a process. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

The value of file.iops.out can differ from the values reported by other tools, which typically interpolate IOPS from the number of bytes read and written to the file system.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.iops.total

The number of file read and write operations per second. This metric is calculated by measuring the actual number of read/write requests made by a process. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

The value of file.iops.total can differ from the values reported by other tools, which typically interpolate IOPS from the number of bytes read and written to the file system.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.name

The name of the file.

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

file.open.count

The number of times the file has been opened.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.time.in

The time spent reading the file. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: relativeTime
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.time.out

The time spent writing in the file. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: relativeTime
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

file.time.total

The time spent during file I/O. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metric Type: Counter
Value Type: relativeTime
Segment By: Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

6.3.8 - File System

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

fs.used.percent

The percentage of the file system that has been used.

Metric Type: Gauge
Value Type: Percent
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.free.percent

The percentage of the file system that is free.

Metric Type: Gauge
Value Type: Percent
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.bytes.free

The number of bytes free in the file system.

Metric Type: Gauge
Value Type: Byte
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.bytes.used

The number of bytes used in the file system.

Metric Type: Gauge
Value Type: Byte
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.bytes.total

The size of the file system.

Metric Type: Gauge
Value Type: Byte
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum
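fs.used.percent relates to the byte metrics above in the obvious way; a minimal sketch (the helper name is invented):

```python
def fs_used_percent(fs_bytes_used, fs_bytes_total):
    # Guard against an empty or unreported file system size.
    if fs_bytes_total == 0:
        return 0.0
    # fs.used.percent = used bytes as a percentage of total bytes.
    return 100.0 * fs_bytes_used / fs_bytes_total
```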

fs.inodes.total.count

The number of inodes in the file system.

Metric Type: Gauge
Value Type: Integer
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.inodes.used.count

The number of inodes used in the file system.

Metric Type: Gauge
Value Type: Integer
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.inodes.used.percent

The percentage of file system inodes in use.

Metric Type: Gauge
Value Type: Percent
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.root.used.percent

The percentage of the root file system in use.

Metric Type: Gauge
Value Type: Percent
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

fs.largest.used.percent

The usage percentage of the largest file system.

Metric Type: Gauge
Value Type: Percent
Scope: Host, Container
Segment By: agent.tag, cloudProvider.account.id, cloudProvider.availabilityZone, cloudProvider.region, cloudProvider.tag, container.id, container.image, container.name, ecs.clusterName, ecs.serviceName, ecs.taskFamilyName, fs.device, fs.mountDir, fs.type, host.hostName, host.mac
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Minimum, Maximum
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Minimum, Maximum

6.3.9 - Host

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous StatsD-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between Sysdig legacy and Prometheus naming conventions.

agent.id

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

agent.mode

For more information on agent modes, see Configure Agent Modes.

Metric Type: String
Value Type: String
Segment By: Host
Default Time Aggregation: concat
Available Time Aggregation Formats: concat, distinct, count
Default Group Aggregation: concat
Available Group Aggregation Formats: concat, distinct, count

agent.version

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

cpu.core

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.container.mappings

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.count

Metric Type: Gauge
Value Type: Integer
Segment By: Host, CloudProvider
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

host.domain

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.hostName

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.ip.all

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.ip.private

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.ip.public

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.isClientServer

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.isInstrumented

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.isInternal

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.mac

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.procList.main

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

host.uname

host.uname provides the following system information:

  • kernel name

  • kernel release number

  • kernel version

  • machine hardware name

Agents send this metric along with a number of labels that map to the uname information. host.uname is supported on agent versions 10.1 and above.

Metrics Details

Metric Type: Gauge
Value Type: Integer
Segment By: See Segmentation Details.
Default Time Aggregation: Average
Available Time Aggregation Formats: Average, Rate, Sum, Min, Max, Rate of Change
Default Group Aggregation: Average
Available Group Aggregation Formats: Average, Sum, Min, Max

Segmentation Details

The labels, their mapping to the uname tooling, and example values are given below:

host.uname.kernel.name: The kernel name (uname -s). Example: Linux
host.uname.kernel.release: The kernel release (uname -r). Example: 5.4.0-31-generic
host.uname.kernel.version: The kernel version (uname -v). Example: #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020
host.machine: The hardware name of the machine (uname -m). Example: x86_64

Example: Kernel Versions in the Infrastructure

The image depicts host.uname being segmented by host.uname.kernel.version. The resulting dashboard gives the distribution of kernel versions in the infrastructure.

Count Limits Metrics

The count limit metrics report the upper limit on the number of metrics of each type. These limits can be changed by modifying the dragent.yaml file.

metricCount.limit.appCheck: configured by app_checks_limit (default 500)
metricCount.limit.statsd: configured by statsd.limit (default 100)
metricCount.limit.jmx: configured by jmx.limit (default 500)
metricCount.limit.prometheus: configured by prometheus.max_metrics (default 3000)

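Based on the configuration parameter names listed above, a dragent.yaml excerpt that raises these limits might look as follows. The values shown are illustrative only, not recommendations:

```yaml
# Hypothetical dragent.yaml excerpt raising the per-type metric limits.
app_checks_limit: 600     # metricCount.limit.appCheck
statsd:
  limit: 200              # metricCount.limit.statsd
jmx:
  limit: 750              # metricCount.limit.jmx
prometheus:
  max_metrics: 5000       # metricCount.limit.prometheus
```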
metricCount.appCheck

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

metricCount.jmx

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

metricCount.statsd

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

metricCount.prometheus

Metric Type: Gauge
Value Type: String
Segment By: Host
Default Time Aggregation: N/A
Available Time Aggregation Formats: N/A
Default Group Aggregation: N/A
Available Group Aggregation Formats: N/A

6.3.10 - JVM

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous StatsD-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between Sysdig legacy and Prometheus naming conventions.

jvm.class.loaded

The number of classes currently loaded in the JVM. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

jvm.class.unloaded

jvm.gc.ConcurrentMarkSweep.count

The number of times the Concurrent Mark-Sweep garbage collector has run.

jvm.gc.ConcurrentMarkSweep.time

The total time the Concurrent Mark-Sweep garbage collector has run.

jvm.gc.Copy.count

jvm.gc.Copy.time

jvm.gc.G1_Old_Generation.count

jvm.gc.G1_Old_Generation.time

jvm.gc.G1_Young_Generation.count

jvm.gc.G1_Young_Generation.time

jvm.gc.global.time

The total time the garbage collection has run.

jvm.gc.MarkSweepCompact.count

jvm.gc.MarkSweepCompact.time

jvm.gc.PS_MarkSweep.count

The number of times the parallel scavenge Mark-Sweep old generation garbage collector has run.

jvm.gc.PS_MarkSweep.time

The total time the parallel scavenge Mark-Sweep old generation garbage collector has run.

jvm.gc.PS_Scavenge.count

The number of times the parallel eden/survivor space garbage collector has run.

jvm.gc.PS_Scavenge.time

The total time the parallel eden/survivor space garbage collector has run.

jvm.gc.ParNew.count

The number of times the parallel garbage collector has run.

jvm.gc.ParNew.time

The total time the parallel garbage collector has run.

jvm.gc.scavenge.time

The total time the scavenge collector has run.

jvm.heap.committed

The amount of memory that is currently allocated to the JVM for heap memory. Heap memory is the storage area for Java objects. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

The JVM may release memory to the system, and Heap Committed could decrease below Heap Init, but Heap Committed can never increase above Heap Max.

jvm.heap.init

The initial amount of memory that the JVM requests from the operating system for heap memory during startup (defined by the -Xms option). The value of Heap Init may be undefined. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

The JVM may request additional memory from the operating system and may also release memory to the system over time.

jvm.heap.max

The maximum size allocation of heap memory for the JVM (defined by the -Xmx option). By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

Any memory allocation attempt that would exceed this limit will cause an OutOfMemoryError exception to be thrown.

jvm.heap.used

The amount of allocated heap memory (i.e., Heap Committed) currently in use. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

Heap memory is the storage area for Java objects.

An object in the heap that is referenced by another object is 'live', and will remain in the heap as long as it continues to be referenced. Objects that are no longer referenced are garbage and will be cleared out of the heap to reclaim space.

jvm.heap.used.percent

The ratio between Heap Used and Heap Committed. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

jvm.nonHeap.committed

The amount of memory that is currently allocated to the JVM for non-heap memory. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

Non-heap memory is used by Java to store loaded classes and other meta-data.

The JVM may release memory to the system, and Non-Heap Committed could decrease below Non-Heap Init, but Non-Heap Committed can never increase above Non-Heap Max.

jvm.nonHeap.init

The initial amount of memory that the JVM requests from the operating system for non-heap memory during startup. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

The value of Non-Heap Init may be undefined.

The JVM may request additional memory from the operating system and may also release memory to the system over time.

jvm.nonHeap.max

The maximum size allocation of non-heap memory for the JVM. This memory is used by Java to store loaded classes and other meta-data. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

jvm.nonHeap.used

The amount of allocated non-heap memory (Non-Heap Committed) currently in use. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

Non-heap memory is used by Java to store loaded classes and other meta-data.

jvm.nonHeap.used.percent

The ratio between Non-Heap Used and Non-Heap Committed. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

jvm.thread.count

The current number of live daemon and non-daemon threads. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

jvm.thread.daemon

The current number of live daemon threads. By default, this metric shows the total value of the selected scope. For example, if applied to a group of machines, the value will be the total value for the whole group.

Daemon threads are used for background supporting tasks and are only needed while normal threads are executing.

6.3.11 - Prometheus Metrics Types

Sysdig Monitor transforms Prometheus metrics into usable, actionable entries in two ways:

Calculated Metrics

The Prometheus metrics that are scraped by the Sysdig agent and transformed into the traditional StatsD model are called calculated metrics. For calculated metrics, the delta from the previous value is stored, and this delta is what Sysdig uses on the classic backend for metrics analysis and visualization. While generating the calculated metrics, gauge metrics are kept as they are, but counter metrics are transformed.

Prometheus calculated metrics cannot be used in PromQL.

Histogram and Summary metrics are transformed into different formats, called Prometheus histogram and Prometheus summary metrics respectively. The transformations include:

  • Each quantile is transformed into a separate metric, with the quantile added as a suffix.

  • The count and sum of these summary metrics are exposed as separate metrics with slightly changed names: each _ (underscore) in the name is replaced with a period (.). For more information, see Mapping Classic Metrics and PromQL Metrics.
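The renaming rule described above can be sketched as a small helper. This is an illustration only, not Sysdig code; see Mapping Classic Metrics and PromQL Metrics for the authoritative mapping:

```python
# Illustrative sketch of the naming transformation: underscores become
# periods, and the quantile (when present) is appended as a suffix.

def to_legacy_name(prometheus_name, quantile=None):
    # Replace every underscore in the Prometheus name with a period
    name = prometheus_name.replace("_", ".")
    # Append the quantile as a suffix, producing a separate metric per quantile
    if quantile is not None:
        name = f"{name}.{quantile}"
    return name

print(to_legacy_name("http_request_duration_seconds", quantile="0.99"))
# http.request.duration.seconds.0.99
print(to_legacy_name("http_request_duration_seconds_sum"))
# http.request.duration.seconds.sum
```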

Prometheus calculated metrics (legacy metrics) are scheduled to be deprecated in the coming months.

Raw Metrics

In Sysdig parlance, the Prometheus metrics that are scraped (by the Sysdig agent), collected, sent, stored, visualized, and presented exactly as Prometheus exposes them are called raw metrics. Raw metrics are used with PromQL.

A Sysdig counter is a StatsD-style counter, where the difference in value is kept rather than the raw value of the counter, whereas Prometheus raw metrics are counters that are always monotonically increasing. A rate function needs to be applied to Prometheus raw metrics to make sense of them.

Time Aggregations Over Prometheus Metrics

The following time aggregations are supported for both the metric types:

  • Average: Returns an average of a set of data points, keeping all the labels.

  • Maximum and Minimum: Returns a maximal or minimal value, keeping all the labels.

  • Sum: Returns a sum of the values of data points, keeping all the labels.

  • Rate (timeAvg): Returns a sum of changes to the counter across data points in a given time period and divides by time, keeping all the labels as they are. For Prometheus raw metrics, timeAvg is calculated by taking the difference and dividing it by time.

Prometheus Calculated Metrics

Prometheus calculated metrics are treated as gauges by Sysdig, and therefore the following time aggregations are available:

  • Average

  • Sum

  • Minimum

  • Maximum

Rate (timeAvg) is not available because it is not applicable to gauge metrics.

Prometheus Raw Metrics

For the gauge type, the following time aggregations are available:

  • Average

  • Minimum

  • Maximum

For the counter type, the following time aggregations are available:

  • Rate: Calculates the first derivative of the counter (change over time).

  • Sum: Calculates a complete change of the counter over a period of time.
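The relationship between a raw counter and these two aggregations can be sketched as follows. This is an illustration of the arithmetic described above, not Sysdig's internal implementation:

```python
# Illustrative sketch: deriving the "Sum" and "Rate" time aggregations from
# raw samples of a monotonically increasing counter.

def aggregate_sum(samples):
    """Complete change of the counter over the window: last value minus first."""
    return samples[-1][1] - samples[0][1]

def aggregate_rate(samples):
    """Change per second over the window (timeAvg): total change / elapsed time."""
    elapsed = samples[-1][0] - samples[0][0]
    return aggregate_sum(samples) / elapsed

# A raw counter sampled every 10 seconds, as (timestamp, value) pairs
samples = [(0, 100), (10, 160), (20, 160), (30, 220)]
print(aggregate_sum(samples))   # 120
print(aggregate_rate(samples))  # 4.0
```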

6.3.12 - Kubernetes

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous StatsD-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between Sysdig legacy and Prometheus naming conventions.

Contents

6.3.12.1 - Kubernetes State

kubernetes.hpa.replicas.min

The lower limit for the number of pods that can be set by the Horizontal Pod Autoscaler. The default value is 1.

The lower limit determines the minimum number of replicas that the autoscaler can periodically adjust in a replication controller or deployment to the target specified by the user in order to match the observed average CPU utilization.

Metric Type: Gauge

Segmented by:

  • kubernetes.hpa.name

  • kubernetes.cluster.id

  • kubernetes.cluster.name

  • kubernetes.namespace.name

kubernetes.hpa.replicas.max

The upper limit for the number of pods that can be set by the Horizontal Pod Autoscaler. This value cannot be smaller than that of kubernetes.hpa.replicas.min.

The upper limit determines the maximum number of replicas that the autoscaler can periodically adjust in a replication controller or deployment to the target specified by the user in order to match the observed average CPU utilization.

Metric Type: Gauge

Segmented by:

  • kubernetes.hpa.name

  • kubernetes.cluster.id

  • kubernetes.cluster.name

  • kubernetes.namespace.name

kubernetes.hpa.replicas.current

The current number of replicas of pods managed by the Horizontal Pod Autoscaler.

Metric Type: Gauge

Segmented by:

  • kubernetes.hpa.name

  • kubernetes.cluster.id

  • kubernetes.cluster.name

  • kubernetes.namespace.name

kubernetes.hpa.replicas.desired

The desired number of replicas of pods managed by the Horizontal Pod Autoscaler.

Metric Type: Gauge

Segmented by:

  • kubernetes.hpa.name

  • kubernetes.cluster.id

  • kubernetes.cluster.name

  • kubernetes.namespace.name
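The four HPA metrics above correspond to fields of a HorizontalPodAutoscaler object. A hypothetical manifest, annotated with the metric each field feeds (the field names are standard Kubernetes; the metric annotations are inferred from the descriptions above):

```yaml
# Hypothetical HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa           # reported as kubernetes.hpa.name
  namespace: production   # reported as kubernetes.namespace.name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2          # kubernetes.hpa.replicas.min
  maxReplicas: 10         # kubernetes.hpa.replicas.max
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

kubernetes.hpa.replicas.current and kubernetes.hpa.replicas.desired are read from the object's status as the autoscaler adjusts the target workload.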

kubernetes.resourcequota.configmaps.hard

The number of config maps that can be created in each Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.configmaps.used

The current number of config maps in each Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.limits.cpu.hard

The total CPU limit across all pods in a non-terminal state in the cluster, determined by adding each pod’s CPU limit together.

Metric Type: Gauge - Integer

kubernetes.resourcequota.limits.cpu.used

The current amount of CPU used across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.limits.memory.hard

The total memory limit across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.limits.memory.used

The current amount of memory used across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.persistentvolumeclaims.hard

The maximum number of persistent volume claims that can exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.persistentvolumeclaims.used

The current number of persistent volume claims that exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.cpu.hard

The maximum number of CPU cores assigned in the namespace or at the resource quota scope level. Across all the pods in a non-terminal state, the sum of CPU requests cannot exceed this value.

Metric Type: Gauge - Integer

Segmented by:

  • kubernetes.cluster

  • kubernetes.namespace

  • kubernetes.resourcequota

kubernetes.resourcequota.memory.hard

The maximum memory assigned in the namespace or at the resource quota scope level. Across all the pods in a non-terminal state, the sum of memory requests cannot exceed this value.

Metric Type: Gauge - Integer

Segmented by:

  • kubernetes.cluster

  • kubernetes.namespace

  • kubernetes.resourcequota

kubernetes.resourcequota.pods.hard

The maximum number of pods in a non-terminal state that can exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.pods.used

The current number of pods in a non-terminal state that exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.replicationcontrollers.hard

The maximum number of replication controllers that can exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.replicationcontrollers.used

The current number of replication controllers that exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.cpu.hard

The maximum number of CPU requests allowed across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.cpu.used

The current number of CPU requests across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.memory.hard

The maximum number of memory requests allowed across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.memory.used

The current total number of memory requests across all cluster pods in a non-terminal state.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.storage.hard

The maximum number of storage requests allowed across all persistent volume claims in the cluster.

Metric Type: Gauge - Integer

kubernetes.resourcequota.requests.storage.used

The current total number of storage requests across all persistent volume claims.

Metric Type: Gauge - Integer

kubernetes.resourcequota.resourcequotas.hard

The maximum number of resource quotas that can exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.resourcequotas.used

The current number of resource quotas that exist in the Kubernetes namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.secrets.hard

The maximum number of secrets that can exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.secrets.used

The current number of secrets that exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.hard

The maximum number of services that can exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.used

The current number of services that exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.loadbalancers.hard

The maximum number of load balancer services that can exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.loadbalancers.used

The current number of load balancer services that exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.nodeports.hard

The maximum number of node port services that can exist in the namespace.

Metric Type: Gauge - Integer

kubernetes.resourcequota.services.nodeports.used

The current number of node port services that exist in the namespace.

Metric Type: Gauge - Integer
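The kubernetes.resourcequota.* metrics above correspond to entries of a Kubernetes ResourceQuota object: each entry under spec.hard surfaces as the matching *.hard metric, and current consumption as the matching *.used metric. A hypothetical manifest:

```yaml
# Hypothetical ResourceQuota; resource names are standard Kubernetes,
# the metric annotations are inferred from the descriptions above.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    pods: "50"                    # kubernetes.resourcequota.pods.hard
    requests.cpu: "20"            # kubernetes.resourcequota.requests.cpu.hard
    requests.memory: 64Gi         # kubernetes.resourcequota.requests.memory.hard
    limits.cpu: "40"              # kubernetes.resourcequota.limits.cpu.hard
    limits.memory: 128Gi          # kubernetes.resourcequota.limits.memory.hard
    persistentvolumeclaims: "10"  # kubernetes.resourcequota.persistentvolumeclaims.hard
    services.loadbalancers: "2"   # kubernetes.resourcequota.services.loadbalancers.hard
```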

kubernetes.daemonSet.pods.desired

The number of nodes that should be running the daemon pod.

kubernetes.daemonSet.pods.misscheduled

The number of nodes that are running a daemon pod but are not supposed to.

kubernetes.daemonSet.pods.ready

The number of nodes that should be running the daemon pod and have one or more instances of the daemon pod running and ready.

kubernetes.daemonSet.pods.scheduled

The number of nodes that are running at least one daemon pod and are supposed to be.

kubernetes.deployment.replicas.available

The number of available pods per deployment.

kubernetes.deployment.replicas.desired

The number of desired pods per deployment.

kubernetes.deployment.replicas.paused

The number of paused pods per deployment. These pods will not be processed by the deployment controller.

kubernetes.deployment.replicas.running

The number of running pods per deployment.

kubernetes.deployment.replicas.unavailable

The number of unavailable pods per deployment.

kubernetes.deployment.replicas.updated

The number of updated pods per deployment.

kubernetes.job.completions

The desired number of successfully finished pods that the job should be run with.

kubernetes.job.numFailed

The number of pods which reached Phase Failed.

kubernetes.job.numSucceeded

The number of pods which reached Phase Succeeded.

kubernetes.job.parallelism

The maximum desired number of pods that the job should run at any given time.

kubernetes.job.status.active

The number of actively running pods.

kubernetes.namespace.count

The number of namespaces.

kubernetes.namespace.deployment.count

The number of deployments per namespace.

kubernetes.namespace.job.count

The number of jobs per namespace.

kubernetes.namespace.pod.status.count

Supported by Sysdig Agent 9.5.0 and above.

The metric gives the number of pods in each aggregate state per Namespace. This is the value that the kubectl get pods command returns in the STATUS column. This metric does not represent the pod condition or the pod phase.

Segmentable by kubernetes.namespace.name and kubernetes.namespace.pod.status.name.

Due to performance implications, Sysdig Monitor shows only a subset of the pod aggregate statuses. The statuses displayed on the UI are:

  • Evicted

  • DeadlineExceeded

  • Error

  • ContainerCreating

  • CrashLoopBackOff

  • Pending

  • Running

To view other statuses, override the default list by adding the following property to dragent.yaml:

k8s_pod_status_reason_strings:
  - Pending
  - ImagePullBackOff

kubernetes.namespace.pod.running.count

Required: agent 9.6.0+

The number of all the running pods in a Namespace. The metric also takes free pods into account, that is, pods that do not belong to any controller. Therefore, its value is not the sum of (statefulset|daemonset|deployment).pod.running.count.

Metric Type: Gauge

Segmented by: Namespace

kubernetes.namespace.replicaSet.count

The number of replicaSets per namespace.

kubernetes.namespace.service.count

The number of services per namespace.

kubernetes.node.allocatable.cpuCores

The CPU resources of a node that are available for scheduling.

kubernetes.node.allocatable.memBytes

The memory resources of a node that are available for scheduling.

kubernetes.node.allocatable.pods

The pod resources of a node that are available for scheduling.

kubernetes.node.capacity.cpuCores

The maximum CPU resources of the node.

kubernetes.node.capacity.memBytes

The maximum memory resources of the node.

kubernetes.node.capacity.pods

The maximum number of pods of the node.

kubernetes.node.diskPressure

The number of nodes with disk pressure.

kubernetes.node.memoryPressure

The number of nodes with memory pressure.

kubernetes.node.networkUnavailable

The number of nodes with network unavailable.

kubernetes.node.outOfDisk

The number of nodes that are out of disk space.

kubernetes.node.ready

The number of nodes that are ready.

kubernetes.node.unschedulable

The number of nodes unavailable to schedule new pods.

kubernetes.pod.containers.waiting

The number of containers in the waiting state per pod.

kubernetes.pod.resourceLimits.cpuCores

The limit on CPU cores to be used by a container.

kubernetes.pod.resourceLimits.memBytes

The limit on memory to be used by a container in bytes.

kubernetes.pod.resourceRequests.cpuCores

The number of CPU cores requested by containers in the pod.

kubernetes.pod.resourceRequests.memBytes

The number of memory bytes requested by containers in the pod.

kubernetes.pod.status.ready

The number of pods ready to serve requests.

kubernetes.replicaSet.replicas.fullyLabeled

The number of fully labeled pods per ReplicaSet.

kubernetes.replicaSet.replicas.ready

The number of ready pods per ReplicaSet.

kubernetes.statefulset.replicas

The desired number of pods per StatefulSet.

kubernetes.statefulset.status.replicas

The total number of pods created by the StatefulSet.

kubernetes.statefulset.status.replicas.current

The number of pods created by the current version of the StatefulSet.

kubernetes.statefulset.status.replicas.ready

The number of ready pods created by this StatefulSet.

kubernetes.statefulset.status.replicas.updated

The number of pods updated to the new version of this StatefulSet.

6.3.12.2 - Resource Usage

Compatibility Mapping

Before using Kubernetes resource metrics, review their compatibility with Sysdig components. The newly supported Kubernetes metrics are not available to older versions of Sysdig Agent.

Note also that you must edit the agent config file, dragent.yaml, to enable these metrics. See Enable Kube State Metrics Collection with K8s_extra_resources.
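A sketch of what that dragent.yaml change might look like. The k8s_extra_resources key is named in the note above, but the exact syntax and resource list here are assumptions; follow Enable Kube State Metrics Collection with K8s_extra_resources for the authoritative configuration:

```yaml
# Hypothetical dragent.yaml excerpt enabling extra Kubernetes resources.
k8s_extra_resources:
  include:
    - resourcequotas
    - persistentvolumeclaims
    - horizontalpodautoscalers
```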

PVC metrics: Agent 0.89.3 and beyond; Platform Release 2172
Resource Quota metrics: Agent 0.87.1 and beyond; Platform Release 2172
HPA metrics: Agent 0.79.0 and beyond; Platform Release 2172

Kubernetes Resource Metrics

Metric Name

Metric Description

Metric Type

Segment By

kubernetes.persistentvolumeclaim.storage

The storage capacity requested by the persistent volume claim.

kubernetes.persistentvolumeclaim.storage provides Sysdig users with a single overarching metric for persistent volume claims (PVCs), rather than a series of metrics that often repeat or duplicate information. Each Kubernetes PVC attribute is mapped to a kubernetes.persistentvolumeclaim label, which can then be used to segment the overarching metric.

See Using Labels for more information on segmenting metrics.

Gauge

  • kubernetes.namespace.name

  • kubernetes.persistentvolumeclaim.label.accessmode

  • kubernetes.persistentvolumeclaim.label.app

  • kubernetes.persistentvolumeclaim.label.status.phase

  • kubernetes.persistentvolumeclaim.label.storage

  • kubernetes.persistentvolumeclaim.label.storageclassname

  • kubernetes.persistentvolumeclaim.label.volumename

kubernetes.pod.restart.count

The cumulative number of container restarts for the pod over its lifetime.

This metric is not useful for alerts. Sysdig recommends using kubernetes.pod.restart.rate instead.

Counter - Integer

Kubernetes

kubernetes.pod.restart.rate

The number of container restarts for the pod within the defined scope/time period.

Gauge - Integer

Kubernetes

kubernetes.replicaSet.replicas.desired

The number of replica pods the replicaSet is configured to maintain.

Gauge - Integer

Kubernetes

kubernetes.replicaSet.replicas.running

The current number of replica pods running in the replicaSet.

Gauge - Integer

Kubernetes

kubernetes.replicationController.replicas.desired

The number of replica pods the replicationController is configured to maintain.

Gauge - Integer

Kubernetes

kubernetes.replicationController.replicas.running

The current number of replica pods running in the replication controller.

Gauge - Integer

Kubernetes

6.3.13 - Network

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

net.bytes.in

Inbound network bytes. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metadata | Description
Metric Type | Counter
Value Type | Byte
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.bytes.out

Outbound network bytes. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metadata | Description
Metric Type | Counter
Value Type | Byte
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.bytes.total

Total network bytes. By default, this metric displays the total value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the total value for the whole group.

Metadata | Description
Metric Type | Counter
Value Type | Byte
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.client.ip

The client IP address.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.connection.count.in

The number of currently established client (inbound) connections.

This metric is especially useful when segmented by port, process, or protocol.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Protocol, Port, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.connection.count.out

The number of currently established server (outbound) connections.

This metric is especially useful when segmented by port, process, or protocol.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Port, Protocol, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.connection.count.total

The number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.

This metric is especially useful when segmented by port, process, or protocol.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Port, Protocol, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.error.count

The number of errors encountered by network system calls, such as connect(), send(), and recv(). By default, this metric displays the total value for the defined scope. For example, if the scope is defined as a group of machines, the metric value will be the total value for the whole group.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.http.error.count

net.http.error.count is a heuristic metric.

The number of failed HTTP requests, determined by the total number of 4xx/5xx status codes.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max
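To make the heuristic concrete: a response with a 4xx or 5xx status code counts as a failed request. The sketch below is an illustration of that counting rule, not Sysdig's implementation.

```python
def count_http_errors(status_codes):
    """Count responses whose status class is 4xx or 5xx (client or server error)."""
    return sum(1 for code in status_codes if 400 <= code <= 599)

# 404, 500, and 503 are errors; 200 and 301 are not.
print(count_http_errors([200, 301, 404, 500, 503]))  # -> 3
```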

net.http.method

The HTTP request method.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.http.request.count

net.http.request.count is a heuristic metric.

HTTP request count.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.http.request.time

net.http.request.time is a heuristic metric.

Average HTTP request time.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.http.request.time.worst

The maximum time for HTTP requests.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.http.statusCode

The HTTP response status code.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.http.url

The HTTP request URL.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.link.clientServer.bytes

The number of bytes passing through the link from client to server.

Metadata | Description
Metric Type | Counter
Value Type | Byte
Segment By | Host
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.link.delay.perRequest

Average delay in the network link per request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.link.serverClient.bytes

The number of bytes passing through the link from server to client.

Metadata | Description
Metric Type | Counter
Value Type | Byte
Segment By | Host
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.local.endpoint

The local endpoint for a connection. This metric is resolved to a user-friendly host name, if available.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.local.service

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.mongodb.collection

The MongoDB collection.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.mongodb.error.count

net.mongodb.error.count is a heuristic metric.

The number of failed MongoDB requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.mongodb.operation

The MongoDB operation.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.mongodb.request.count

net.mongodb.request.count is a heuristic metric.

The total number of MongoDB requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.mongodb.request.time

net.mongodb.request.time is a heuristic metric.

The average time to complete a MongoDB request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.mongodb.request.time.worst (deprecated)

The maximum time to complete a MongoDB request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.protocol

The network protocol of a request (for example, HTTP or MySQL).

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.remote.endpoint

The remote endpoint of a connection. This metric automatically resolves as a user-friendly host name, if available.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.remote.service

The service (port number) of a remote node.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.request.count

net.request.count is a heuristic metric.

Total number of network requests.

This value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.count.in

net.request.count.in is a heuristic metric.

Number of inbound network requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.count.out

Number of outbound network requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time

net.request.time is a heuristic metric.

A measure of response time that includes application and network latency. On the server side, it is purely a measure of application latency. It is calculated as the interval between the arrival of the last request buffer and the departure of the first response buffer.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max
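The measurement described above, from the arrival of the last request buffer to the departure of the first response buffer, can be sketched as a simple timestamp difference. The timestamps below are hypothetical.

```python
def request_time(last_request_buffer_ts, first_response_buffer_ts):
    """Elapsed time between seeing the last request buffer arrive and the
    first response buffer depart (both timestamps in seconds)."""
    return first_response_buffer_ts - last_request_buffer_ts

# A request whose last buffer arrived at t=10.002s and whose first
# response buffer departed at t=10.047s took about 45 ms to serve.
print(request_time(10.002, 10.047))
```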

net.request.time.file (deprecated)

The amount of time for serving a request that is spent doing file I/O. See also net.request.time.net (network I/O time) and net.request.time.processing (CPU processing time).

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.file.percent

net.request.time.file.percent is a heuristic metric.

The percentage of time for serving a request that is spent doing file I/O. See also net.request.time.net (network I/O time) and net.request.time.processing (CPU processing time).

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.in

net.request.time.in is a heuristic metric.

Average time to serve an inbound request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.local (deprecated)

Average per request delay introduced by this node when it serves requests coming from the previous tiers. In other words, this is the time spent serving incoming requests minus the time spent waiting for outgoing requests to complete.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.local.percent

net.request.time.local.percent is a heuristic metric.

The percentage of time spent in the local node versus the next tiers, when serving requests that come from previous tiers.

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.net (deprecated)

The amount of time for serving a request that is spent doing network I/O. See also net.request.time.file (file I/O time) and net.request.time.processing (CPU processing time).

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.net.percent

net.request.time.net.percent is a heuristic metric.

The percent of time for serving a request that is spent doing network I/O. See also net.request.time.file (file I/O time) and net.request.time.processing (CPU processing time).

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.nextTiers (deprecated)

Delay introduced by the successive tiers when serving requests.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.nextTiers.percent

net.request.time.nextTiers.percent is a heuristic metric.

The percentage of time spent in the next tiers versus the local node, when serving requests that come from previous tiers.

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.out

net.request.time.out is a heuristic metric.

Average time spent waiting for an outbound request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.processing (deprecated)

The amount of time for serving a request that is spent doing CPU processing. See also net.request.time.file (file I/O time) and net.request.time.net (network I/O time).

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.processing.percent

net.request.time.processing.percent is a heuristic metric.

The percent of time for serving a request that is spent doing CPU processing. See also net.request.time.file (file I/O time) and net.request.time.net (network I/O time).

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.worst.in

net.request.time.worst.in is a heuristic metric.

Maximum time to serve an inbound request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.request.time.worst.out

net.request.time.worst.out is a heuristic metric.

Maximum time spent waiting for an outbound request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.role

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.server.ip

The server IP address.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.server.port

The TCP/UDP server port number.

Metadata | Description
Metric Type | Gauge
Value Type | Integer
Segment By | Host
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.sql.error.count

net.sql.error.count is a heuristic metric.

The number of failed SQL requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.sql.query

The full SQL query. If the query string is longer than 512 characters, it will be truncated to 512 characters.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.sql.query.type

The SQL query type (for example, SELECT, INSERT, or DELETE).

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.sql.request.count

net.sql.request.count is a heuristic metric.

The number of SQL requests.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.sql.request.time

net.sql.request.time is a heuristic metric.

The average time to complete an SQL request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.sql.request.time.worst (deprecated)

The maximum time to complete an SQL request.

Metadata | Description
Metric Type | Counter
Value Type | relativeTime
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

net.sql.table

The SQL query table name.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Host
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

net.tcp.queue.len

The length of the TCP request queue.

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max

6.3.14 - Process

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

fd.used.percent

The percentage of used file descriptors out of the maximum available. By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group.

Monitor this metric carefully and use it for alerts: when a process reaches its file descriptor limit, it stops operating correctly and may crash.

Metadata | Description
Metric Type | Gauge
Value Type | %
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Average
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Average
Available Group Aggregation Formats | Avg, Sum, Min, Max
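The percentage itself is simply used descriptors over the descriptor limit. The sketch below shows the arithmetic with hypothetical counts; it is not how the agent gathers the underlying values.

```python
def fd_used_percent(used_fds, fd_limit):
    """Percentage of file descriptors in use out of the maximum available."""
    return 100.0 * used_fds / fd_limit

# A process with 512 descriptors open against a limit of 1024 is at 50%.
print(fd_used_percent(512, 1024))  # -> 50.0
```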

proc.commandLine

The command line used to start the process.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Process
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

proc.count

The number of processes on the host or container, excluding processes that have no executable path or command-line parameters in the process table. These are typically kernel- or system-level processes, identified by square brackets (for example, [kthreadd]).

Because some processes are excluded, the host-level proc.count value will be lower than the value reported by the ps -ef command on the host.

Metadata | Description
Metric Type | Counter
Value Type | Integer
Segment By | Host, Container, Process, Kubernetes, Mesos, Swarm, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max
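The bracket-name exclusion described for proc.count can be sketched as a simple filter over process names. This is an illustration of the rule, not the agent's implementation.

```python
def countable_processes(names):
    """Drop kernel/system entries, which appear bracketed in the process
    table (e.g. [kthreadd]) and carry no executable path or command line."""
    return [n for n in names if not (n.startswith("[") and n.endswith("]"))]

procs = ["[kthreadd]", "[rcu_sched]", "nginx", "bash"]
print(len(countable_processes(procs)))  # -> 2
```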

proc.name

Name of the process.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Process
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

proc.name.client

Name of the client process.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Process
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

proc.name.server

Name of the server process.

Metadata | Description
Metric Type | Gauge
Value Type | String
Segment By | Process
Default Time Aggregation | N/A
Available Time Aggregation Formats | N/A
Default Group Aggregation | N/A
Available Group Aggregation Formats | N/A

proc.start.count

The number of process starts on the host or container.

Metadata | Description
Metric Type | Gauge
Value Type | Integer
Segment By | Host, CloudProvider
Default Time Aggregation | Rate
Available Time Aggregation Formats | Avg, Rate, Sum, Min, Max
Default Group Aggregation | Sum
Available Group Aggregation Formats | Avg, Sum, Min, Max

6.3.15 - RedisDB Metrics

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the legacy statsd-compatible Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

See RedisDB integration information.

redis.aof.buffer_length

The size of the AOF buffer.

redis.aof.last_rewrite_time

The duration of the last AOF rewrite.

redis.aof.rewrite

A flag indicating that an AOF rewrite operation is ongoing.

redis.clients.biggest_input_buf

The biggest input buffer among current client connections.

redis.clients.blocked

The number of connections waiting on a blocking call.

redis.clients.longest_output_list

The longest output list among current client connections.

redis.command.calls

The number of times a redis command has been called. The commands are tagged with command (for example, command:append).

redis.command.usec_per_call

The CPU time consumed per redis command call. The commands are tagged with command (for example, command:append).

redis.cpu.sys

The system CPU consumed by the Redis server.

redis.cpu.sys_children

The system CPU consumed by the background processes.

redis.cpu.user

The user CPU consumed by the Redis server.

redis.cpu.user_children

The user CPU consumed by the background processes.

redis.expires

The number of keys that have expired.

redis.expires.percent

The percentage of total keys that have been expired.

redis.info.latency_ms

The latency of the redis INFO command.

redis.key.length

The number of elements in a given key. The metric is tagged with the key name (for example, key:mykeyname).

redis.keys

The total number of keys.

redis.keys.evicted

The total number of keys evicted due to the maxmemory limit.

redis.keys.expired

The total number of keys expired from the database.

redis.mem.fragmentation_ratio

The ratio between used_memory_rss and used_memory.
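Since this metric is defined as a ratio of the two memory values, it can be illustrated with a one-line calculation. The byte counts below are hypothetical.

```python
def mem_fragmentation_ratio(used_memory_rss, used_memory):
    """redis.mem.fragmentation_ratio = used_memory_rss / used_memory."""
    return used_memory_rss / used_memory

# RSS of 1.5 MB against 1.0 MB of logically allocated memory gives 1.5.
print(mem_fragmentation_ratio(1_500_000, 1_000_000))  # -> 1.5
```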

redis.mem.lua

The amount of memory used by the Lua engine.

redis.mem.maxmemory

The maximum amount of memory allotted to the RedisDB system.

redis.mem.overhead

The sum of all overheads allocated by Redis to manage its internal data structures.

Supported by Sysdig Agent v9.7.0 and above.

redis.mem.peak

The peak amount of memory used by Redis.

redis.mem.startup

The amount of memory consumed by Redis while initializing.

Supported by Sysdig Agent v9.7.0 and above.

redis.mem.rss

The amount of memory that Redis allocated as seen by the operating system.

redis.mem.used

The amount of memory allocated by Redis.

redis.net.clients

The number of connected clients (excluding slaves).

redis.net.commands

The number of commands processed by the server.

redis.net.commands.instantaneous_ops_per_sec

The number of commands processed by the server per second.

redis.net.rejected

The number of rejected connections.

redis.net.slaves

The number of connected slaves.

redis.perf.latest_fork_usec

The duration of the latest fork.

redis.persist

The number of keys persisted. The formula for this metric is redis.keys - redis.expires.

redis.persist.percent

The percentage of total keys that are persisted.
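Since both metrics derive from redis.keys and redis.expires, they reduce to simple arithmetic. A sketch with hypothetical values:

```python
# Hypothetical per-database totals, as redis.keys and redis.expires would report them
keys = 1500    # redis.keys: total number of keys
expires = 300  # redis.expires: keys with an expiration

persist = keys - expires                  # redis.persist = redis.keys - redis.expires
persist_percent = 100.0 * persist / keys  # redis.persist.percent
print(persist, persist_percent)  # 1200 80.0
```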

redis.pubsub.channels

The number of active pubsub channels.

redis.pubsub.patterns

The number of active pubsub patterns.

redis.rdb.bgsave

Determines whether a bgsave is in progress. The value is one if a bgsave is in progress, and zero at all other times.

redis.rdb.changes_since_last

The number of changes since the last background save.

redis.rdb.last_bgsave_time

The duration of the last bgsave operation.

redis.replication.backlog_histlen

The amount of data in the backlog sync buffer.

redis.replication.delay

The replication delay in offsets.

redis.replication.last_io_seconds_ago

The amount of time since the last interaction with master.

redis.replication.master_link_down_since_seconds

The amount of time that the master link has been down.

redis.replication.master_repl_offset

The replication offset reported by the master.

redis.replication.slave_repl_offset

The replication offset reported by the slave.

redis.replication.sync

Determines whether a sync is in progress. The value is one if a sync is in progress, and zero at all other times.

redis.replication.sync_left_bytes

The amount of data left before syncing is complete.

redis.slowlog.micros.95percentile

The 95th percentile of the duration of queries reported in the slow log.

redis.slowlog.micros.avg

The average duration of queries reported in the slow log.

redis.slowlog.micros.count

The rate of queries reported in the slow log.

redis.slowlog.micros.max

The maximum duration of queries reported in the slow log.

redis.slowlog.micros.median

The median duration of queries reported in the slow log.

redis.stats.keyspace_hits

The total number of successful lookups in the database.

redis.stats.keyspace_misses

The total number of missed lookups in the database.
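A common quantity derived from these two counters is the keyspace hit rate, the fraction of lookups that found the key. A sketch with hypothetical counter values:

```python
# Hypothetical values for redis.stats.keyspace_hits / redis.stats.keyspace_misses
keyspace_hits = 900
keyspace_misses = 100

# Hit rate: fraction of all lookups that succeeded
hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses)
print(hit_rate)  # 0.9
```

A consistently low hit rate can indicate that keys are being evicted or expired faster than clients expect.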

6.3.16 - Security Policy Metrics

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous statsd-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

security.evts.k8s_audit

The total number of policy events from a Kubernetes audit policy.

Type: Gauge
Segmented by: host.mac, host.hostname
Minimum Agent Version: 0.86.0

security.policy_evts.syscall

The total number of policy events from a syscall policy.

security.policies.enabled

The number of security policies enabled for a user.

security.policies.total

The number of security policies that exist for a user.

security.policy_evts.container

The total number of policy events from a container policy.

security.policy_evts.falco

The total number of policy events from a Falco policy.

security.policy_evts.filesystem

The total number of policy events from a filesystem policy.

security.policy_evts.high

The number of policy events from a policy with high severity.

security.policy_evts.low

The number of policy events from a policy with low severity.

security.policy_evts.medium

The number of policy events from a policy with medium severity.

security.policy_evts.network

The total number of policy events from a network policy.

security.policy_evts.process

The total number of policy events from a process policy.

security.policy_evts.total

The total number of policy events across all policy types.

security_policy_evts.by_name

The number of events triggered with segment name available.

Segmented by: name, host.mac, host.hostname

6.3.17 - System

Note: Sysdig follows the Prometheus-compatible naming convention for both metrics and labels, as opposed to the previous statsd-compatible, legacy Sysdig naming convention. However, this page still shows metrics in the legacy Sysdig naming convention. Until this page is updated, see Metrics and Label Mapping for the mapping between the Sysdig legacy and Prometheus naming conventions.

capacity.estimated.request.stolen.count (deprecated)

The number of requests the node cannot serve due to CPU steal time. This metric is calculated by measuring the current number of requests the machine is serving and calculating how many more requests could be served if there were no steal time.

This metric can be used to understand how steal time impacts the ability to serve user requests.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Process
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max
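As an illustration only (Sysdig does not publish the exact internal formula), the kind of estimation described above can be sketched by assuming throughput scales linearly with available CPU time:

```python
def estimated_stolen_requests(current_rps: float, steal_pct: float) -> float:
    """Illustrative sketch, not Sysdig's internal formula: estimate how many
    additional requests/s could be served if CPU steal time were zero,
    assuming throughput scales linearly with available CPU time."""
    if not 0 <= steal_pct < 100:
        raise ValueError("steal_pct must be in [0, 100)")
    full_capacity_rps = current_rps / (1 - steal_pct / 100)
    return full_capacity_rps - current_rps

# A node serving 90 req/s with 10% steal could serve ~100 req/s steal-free,
# so roughly 10 req/s are lost to steal time.
print(estimated_stolen_requests(90.0, 10.0))  # ~10.0
```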

capacity.estimated.request.total.count (deprecated)

The estimated number of requests the node could serve at full capacity. This metric is calculated by measuring the number of requests the machine is serving and the resources each request uses, then projecting how many requests the machine could serve in total.

This metric can help users determine if/when the infrastructure capacity should be increased.

Metric Type: Counter
Value Type: Integer
Segment By: Host, Process
Default Time Aggregation: Rate
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Sum
Available Group Aggregation Formats: Avg, Sum, Min, Max

capacity.stolen.percent (deprecated)

The lost service request capacity due to stolen CPU. This metric reflects the impact on other resource usage capabilities, including disk I/O and network I/O.

capacity.stolen.percent is non-zero only if cpu.stolen.percent is also non-zero.

Metric Type: Gauge
Value Type: %
Segment By: Host, Process
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

capacity.total.percent (deprecated)

The estimated current capacity usage, based on CPU and disk/network utilization, with CPU stolen time added back in.

capacity.total.percent can be used to show how the system would perform with dedicated CPU usage.

Metric Type: Gauge
Value Type: %
Segment By: Host, Process
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

capacity.used.percent (deprecated)

The estimated current capacity usage, based on CPU and disk/network utilization. This metric is calculated by summing the resources used by each request arriving at the machine, producing a score that indicates how saturated the machine's resources are.

Metric Type: Gauge
Value Type: %
Segment By: Host, Process
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.idle.percent

The percentage of time that the CPUs were idle and the system did not have an outstanding disk I/O request. By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group.

Metric Type: Gauge
Value Type: %
Segment By: Host, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.iowait.percent

The percentage of time that the CPUs were idle during which the system had an outstanding disk I/O request. By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group.

Metric Type: Gauge
Value Type: %
Segment By: Host, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.nice.percent

The percentage of CPU utilization that occurred while executing at the user level with Nice priority. By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group.

Metric Type: Gauge
Value Type: %
Segment By: Host, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.stolen.percent

The percentage of time that a virtual machine's CPU is in a state of involuntary wait because the physical CPU is shared among virtual machines. To calculate steal time, the operating system kernel detects when it has work available but cannot access the physical CPU to perform that work.

If the percent of steal time is consistently high, you may want to stop and restart the instance (since it will most likely start on different physical hardware) or upgrade to a virtual machine with more CPU power. Also see capacity.total.percent to see how steal time directly impacts the number of server requests that could not be handled. On AWS EC2, steal time does not depend on the activity of other virtual machine neighbors. EC2 is simply making sure your instance is not using more CPU cycles than paid for.

Metric Type: Gauge
Value Type: %
Segment By: Host, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.system.percent

The percentage of CPU utilization that occurred while executing at the system level (kernel). By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group.

Metric Type: Gauge
Value Type: %
Segment By: Host, CloudProvider
Default Time Aggregation: Average
Available Time Aggregation Formats: Avg, Rate, Sum, Min, Max
Default Group Aggregation: Average
Available Group Aggregation Formats: Avg, Sum, Min, Max

cpu.cores.used

The CPU core usage of each container is obtained from cgroups, and is equal to