Using PromQL

The Prometheus Query Language (PromQL) is the de facto standard for querying Prometheus metric data. PromQL is designed to let you select and aggregate time-series data.

PromQL is available only in Sysdig SaaS editions. The feature is not yet supported by Sysdig on-premises installations.

Sysdig Monitor’s PromQL support includes all of the features, functions, and aggregations in standard open-source PromQL. The PromQL language is documented at Prometheus Query Basics.

For new functionality released as part of agent v10.0.0, see Collect Prometheus Metrics.

Built-In Variables

The PromQL query field supports the following built-in variables:

$__range

Represents the time range currently selected in the time navigation. Use it to adapt operations, such as calculating an average, to the selected time range. In Live mode, the value is continuously updated to reflect the new time range.

$__range_sec

Same as $__range but this variable will be replaced with an absolute value in seconds.

$__interval

Represents a time interval and is automatically configured based on the time range. Use it within a PromQL query to apply the most appropriate sampling for the time range you have selected. Using it ensures that the most granular data is accessible for aggregation over long time ranges, which in turn helps panels load quickly.

You currently have no control over the sampling visualized in a time chart. Sysdig determines the best sampling and the maximum number of samples for aggregation from what’s currently available in the data store while maintaining good performance. Therefore, $__interval offers a way to avoid referencing an explicit, static sampling, such as 1 minute, and instead allows runtime substitution with the sampling picked by Sysdig.

$__interval_sec

Same as $__interval but this variable will be replaced with an absolute value in seconds.

Time builtins summary

$__interval and $__range are replaced with a time duration, such as 10s, 1m, or 10m, whereas $__interval_sec and $__range_sec are replaced with the equivalent number of seconds and can be used anywhere in the query.

In the example below, if $__interval is 10m, $__interval_sec will be 600:

http_requests_total{job="prometheus"}[$__interval]

The $__interval and $__range variables can be used in the range vector selector.
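The relationship between the duration-style and seconds-style variables can be sketched in Python. This is only an illustration of the substitution described above, not Sysdig's implementation: the helper names `duration_to_seconds` and `substitute` are hypothetical, and only single-unit durations such as 10s or 10m are handled.

```python
import re

# Seconds per duration unit (single-unit durations only, by assumption).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(duration: str) -> int:
    """Convert a duration such as '10m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def substitute(query: str, interval: str) -> str:
    """Fill in $__interval_sec first so the shorter name does not clobber it."""
    seconds = str(duration_to_seconds(interval))
    return query.replace("$__interval_sec", seconds).replace("$__interval", interval)


expanded = substitute('http_requests_total{job="prometheus"}[$__interval]', "10m")
```

With an interval of 10m, `expanded` contains the range selector `[10m]`, and any `$__interval_sec` in a query would become 600.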

Normally, you cannot use the rate function with gauge metrics because rate should only be used with counters. For example, sysdig_host_cpu_used_percent is a gauge, so rate does not apply to it.

In such cases, you can use $__interval_sec or $__range_sec to compute the rate as follows:

  • sum_over_time(sysdig_container_cpu_used_percent[$__interval]) / $__interval_sec
  • sum_over_time(sysdig_container_cpu_used_percent[$__range]) / $__range_sec
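To make the arithmetic concrete, the following Python sketch mimics dividing a sum_over_time result by the window length in seconds. The sample values and the function names are hypothetical, and the window selection is a simplification of how PromQL evaluates range selectors.

```python
def sum_over_time(samples, end, window_sec):
    """Sum the values whose timestamps fall in (end - window_sec, end]."""
    return sum(value for ts, value in samples if end - window_sec < ts <= end)


def per_second(samples, end, window_sec):
    """Equivalent of sum_over_time(metric[window]) / window_in_seconds."""
    return sum_over_time(samples, end, window_sec) / window_sec


# Hypothetical gauge samples as (unix_seconds, value), one every 10 seconds.
samples = [(10, 20.0), (20, 30.0), (30, 40.0), (40, 50.0), (50, 60.0), (60, 40.0)]

# A 1m window corresponds to 60 seconds.
result = per_second(samples, end=60, window_sec=60)
```

Here the six samples sum to 240, so dividing by the 60-second window gives 4.0.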

$__scope

Represents the selected scope that is applied to a PromQL query. The defined scope is applied by using the filter functionality of PromQL, similar to how scope variables are applied. It allows you to apply the whole dashboard scope to the queries, instead of applying each scope variable individually. You can place this builtin anywhere within the query expression. Using multiple $__scope variables in a single expression is not allowed.

See Using $__scope.

Construct a PromQL Query

In the Dashboard Panel, select the PromQL type to query data using PromQL.

Display Options

  • Type: Select the type of chart.

    PromQL is currently not supported for the Histogram visualization type.

  • Query Display Name: A meaningful display name for the legend. The text you enter replaces the metric name displayed in the legend. The default legend title is the metric name. The default legend title is the query itself.

  • Timeseries Name: A display name of the time series for the query using text and any label values returned with the metric.

Query

sum(rate(sysdig_container_net_in_bytes{$__scope}[$__interval])) by (container_id,agent_id)

Specify the following:

Metrics: Search for the desired metric. The field supports auto-complete: as you type, matching metric names are suggested so you can filter the list easily. In the example: sysdig_container_net_in_bytes.

Segmentation: This is the process of categorizing aggregated data with labels to provide precise control over the data. Choose an appropriate value for segmenting the aggregated PromQL data. In this example, container_id and agent_id.

Query Options

  • Unit and Y-Axes: Specify the unit of scale and display format.

  • No Data Display: Determine how to display null data on the dashboard.

  • Min. Interval: Specify the minimum interval to replace $__interval. The value must be expressed as a time duration, for example: 10s, 1m, 1h, and must be between 10s and 1d.

  • Compare To: Optionally, you can compare the data against historical data. This is helpful for comparing current usage to past usage when determining the conditions of your infrastructure. To compare the current data with that from a time range in the past, click Enable and select the Range Offset and time unit. The supported time units are Hour, Day, Week, and Month.
    When segmentation is applied, comparing time series against historical data is not supported.

  • Axes: Determine scale, unit, display format, and gauge for the Y-axes.

  • Legend: Determine the position of the legend in the Dashboard.

  • Panel: Specify a name and add details about the panel. See Create a New Panel for details.

Build PromQL Panels from Form Query

You can use the Translate to PromQL option to quickly build a PromQL-based panel from form queries. To do so:

  1. Build a form query, as given in Building a Form-Based Query. For example, let us build a Toplist for the metric, sysdig_program_cpu_cores_used, segmented by program_name and container_name.

  2. For Sorting, choose Top.

  3. Click Translate to PromQL.

    If a PromQL query is already defined, you will see a message asking you to confirm. In this scenario, you are overriding manually created or manually modified queries in the PromQL tab.

  4. Click Continue to proceed.

    The PromQL Toplist panel will be displayed on screen.

Apply a Dashboard Scope to a PromQL Query

The dashboard scope is automatically applied only to form-based panels. To scope a panel built from a PromQL query, you must use a scope variable within the query. The variable will take the value of the referenced scope parameter, and the PromQL panel will change accordingly.

The easiest way to apply the full dashboard scope to a PromQL query is to use $__scope.

sysdig_container_cpu_used_percent{$__scope}

The following example shows how to use scope variables within PromQL queries when finer control is required (for example, when you want to apply a scope entry to specific metrics).

Example: CPU Used Percent

The following query returns the CPU used percent for all the hosts, regardless of the scope configured at the dashboard level, as a moving average over the time span defined.

avg_over_time(sysdig_host_cpu_used_percent[$__interval])

To scope this query, you must set up an appropriate scope variable. A key step is to provide a variable name that is referenced as part of the query.

In this example, hostname is used as the variable name. The host can then be referenced using $hostname as follows:

avg_over_time(sysdig_host_cpu_used_percent{host_name=$hostname}[$__interval])

Depending on the operator specified while configuring scope values, you might need to use a different operator within the query. If you are not using the correct operator for the scope type, the system will perform the query but will show a warning as the results may not be the expected ones.

| Scope Operator | PromQL Filter Operator | Example |
|---|---|---|
| is foo, is not foo | = : Select labels that are exactly equal to the provided string. != : Select labels that are not equal to the provided string. | sysdig_host_cpu_used_percent{host_name=$hostname} |
| in foo,bar, not in foo,bar | =~ : Select labels that regex-match the provided string. !~ : Select labels that do not regex-match the provided string. | sysdig_host_cpu_used_percent{host_name=~$hostname} |
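The operator mapping above can be written down as a small lookup. The Python sketch below is purely illustrative: the `selector` helper is hypothetical and only assembles the query text; how Sysdig expands the scope variable at run time is not modeled.

```python
# Scope operator -> PromQL label-matching operator, per the mapping above.
SCOPE_TO_PROMQL = {
    "is": "=",
    "is not": "!=",
    "in": "=~",
    "not in": "!~",
}


def selector(metric: str, label: str, scope_operator: str, variable: str) -> str:
    """Build a selector that uses the matcher fitting the scope operator."""
    op = SCOPE_TO_PROMQL[scope_operator]
    return f"{metric}{{{label}{op}${variable}}}"


single = selector("sysdig_host_cpu_used_percent", "host_name", "is", "hostname")
multi = selector("sysdig_host_cpu_used_percent", "host_name", "in", "hostname")
```

A scope defined with `is` yields an equality matcher, while a scope defined with `in` yields a regex matcher.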

Enrich Metrics with Labels

By default, PromQL queries in Sysdig Monitor return only a minimum set of labels. To enrich the results of PromQL queries with additional labels, such as the Kubernetes cluster name, you need to use a vector matching operation. The vector matching operation in Prometheus is similar to an SQL join.

Info Metrics

Prometheus returns different information metrics that have a value of 1 with several labels. The information that the info metrics return might not be useful as it is. However, joining the labels of an info metric with a non-info metric can provide useful information, such as the value of metric X across an infrastructure/application/deployment.

Vector Matching Operation

The vector matching operation is similar to an SQL join. You use a vector matching operation to build a PromQL query that can return metrics with information from your infrastructure. Vector matching helps filter and enrich labels, usually adding information labels to the metrics you are trying to visualize.

See Mapping Between Classic Metrics and PromQL Metrics for a list of info metrics.

Example 1: Return a Metric Filtered by Cluster

This example shows a metric returned by an application, say myapp_gauge, running on Kubernetes. The query returns an aggregated value for a cluster, with one cluster selected in the scope. It assumes that you have previously set a $cluster variable in your scope.

To do so, run the following query to return the myapp_gauge metric:

sum (myapp_gauge * on (container_id) kube_pod_container_info{cluster=$cluster})

The query performs the following operations, not necessarily in this order:

  • The kube_pod_container_info info metric is filtered, selecting only those time series and the associated cluster values you want to see. The selection is based on the cluster label.

  • The myapp_gauge metric is matched with the kube_pod_container_info metric where the container_id label has the same value, multiplying both the values. Because the info metric has the value 1, multiplying the values doesn’t change the result. As the info metric has already been filtered by a cluster, only those values associated with the cluster will be kept.

  • The resultant timeseries with the value of myapp_gauge are then aggregated with the sum function and the result is returned.
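The three operations above can be mimicked with plain Python data structures. This is a rough model of one-to-one vector matching, not how Prometheus evaluates the query; the sample series, label values, and helper names are all hypothetical.

```python
# Hypothetical sample data: each series is (labels, value).
myapp_gauge = [
    ({"container_id": "c1"}, 5.0),
    ({"container_id": "c2"}, 7.0),
    ({"container_id": "c3"}, 11.0),
]
kube_pod_container_info = [
    ({"container_id": "c1", "cluster": "prod"}, 1.0),
    ({"container_id": "c2", "cluster": "prod"}, 1.0),
    ({"container_id": "c3", "cluster": "dev"}, 1.0),
]


def filter_by(series, label, value):
    """Keep only the series whose label equals the given value."""
    return [(labels, v) for labels, v in series if labels.get(label) == value]


def multiply_on(left, right, key):
    """One-to-one match on a single label; unmatched series are dropped."""
    right_by_key = {labels[key]: v for labels, v in right}
    return [
        ({key: labels[key]}, v * right_by_key[labels[key]])
        for labels, v in left
        if labels[key] in right_by_key
    ]


info = filter_by(kube_pod_container_info, "cluster", "prod")  # {cluster=$cluster}
matched = multiply_on(myapp_gauge, info, "container_id")      # * on (container_id)
total = sum(v for _, v in matched)                             # sum(...)
```

Only the two containers in the hypothetical "prod" cluster survive the match, and because the info metric is 1 the gauge values pass through unchanged before being summed.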

Example 2: Calculate the GC Latency

This example shows how to calculate the GC latency of a Go application deployed in a specific Kubernetes namespace.

To calculate the GC latency, run the following query:

go_gc_duration_seconds * on (container_id,host_mac) group_left(pod,namespace) kube_pod_container_info{namespace=~$namespace}

The query is performing the following operations:

  • The kube_pod_container_info info metrics are filtered based on the namespace variable.

  • The metrics associated with go_gc_duration_seconds are matched in a many-to-one way with the filtered kube_pod_container_info.

    The pod and namespace labels are added from the kube_pod_container_info metric to the result. The query keeps only those metrics that have the matching container_id and host_mac labels on both sides.

  • The values are multiplied and the resulting metrics are returned. The new metrics will only have the values associated with go_gc_duration_seconds because the info metric value is always 1.

You can use any Prometheus metric in the query. For example, the query above can be rewritten for a sample Apache metric as follows:

appinfo_apache_net_bytes * on (container_id) group_left(pod, namespace) kube_pod_container_info{namespace=~$namespace}
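The many-to-one behavior of group_left can be sketched in Python as follows. It is a simplified model under stated assumptions: the sample series, label values, and the `group_left_multiply` helper are hypothetical, and the info metric is assumed to be already filtered by namespace.

```python
def group_left_multiply(left, right, on, extra):
    """Many-to-one match: several left series may map to one right series.

    Each result keeps the left-hand labels and copies the `extra` labels
    from the matching right-hand series.
    """
    index = {}
    for labels, value in right:
        key = tuple(labels[k] for k in on)
        if key in index:
            raise ValueError("duplicate series on the 'one' side of the match")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(k) for k in on)
        if key not in index:
            continue  # no matching info series, so the sample is dropped
        right_labels, right_value = index[key]
        merged = dict(labels)
        merged.update({k: right_labels[k] for k in extra})
        result.append((merged, value * right_value))
    return result


# Hypothetical data: two quantile series for container c1, one for c2.
go_gc_duration_seconds = [
    ({"container_id": "c1", "host_mac": "aa", "quantile": "0.5"}, 0.002),
    ({"container_id": "c1", "host_mac": "aa", "quantile": "1"}, 0.010),
    ({"container_id": "c2", "host_mac": "bb", "quantile": "0.5"}, 0.004),
]
# Info metric after the namespace filter: only c1 remains.
info = [
    ({"container_id": "c1", "host_mac": "aa", "pod": "web-0", "namespace": "shop"}, 1.0),
]

enriched = group_left_multiply(
    go_gc_duration_seconds, info,
    on=("container_id", "host_mac"), extra=("pod", "namespace"),
)
```

Both quantile series of the hypothetical container c1 are kept and gain the pod and namespace labels, while the series for c2 is dropped because no info series matches it.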

Example 3: Calculate Average CPU Used Percent in AWS Hosts

This example shows calculating the average CPU used percent per AWS account and region, having the hosts filtered by account and region.

avg by(region,account_id) (sysdig_host_cpu_used_percent * on (host_mac) group_left(region,account_id) sysdig_cloud_provider_info{account_id=~$AWS_account, region=~$AWS_region})

The query performs the following operations:

  • Filters the sysdig_cloud_provider_info metric based on the account_id and region labels that come from the dashboard scope as variables.

  • Matches the sysdig_host_cpu_used_percent metrics with sysdig_cloud_provider_info. Only those metrics with the same host_mac label on both sides are extracted, adding region and account_id labels to the resulting metrics.

  • Calculates the average of the new metrics by account_id and region.

Example 4: Calculate Total CPU Usage in Deployments

This example shows calculating the total CPU usage per deployment. The value can also be filtered by cluster, namespace, and deployment by using the dashboard scope.

sum by(cluster,namespace,owner_name) ((sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace})

  • sysdig_container_cpu_cores_used can be replaced by any metric that has the container_id label.

  • To connect the sysdig_container_cpu_cores_used metric with the pod, use kube_pod_container_info, and then use kube_pod_owner to connect the pod to other Kubernetes objects.

The query performs the following:

  • sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info:

    • The sysdig_container_cpu_cores_used metric value is multiplied by kube_pod_container_info (which has the value of 1), matching on container_id and keeping the pod, namespace, and cluster labels as they are.

      __name__='sysdig_container_cpu_cores_used',container='<label>',container_id='<label>',container_type='DOCKER',host_mac='<label>'
      
    • The new metrics will be:

      cluster='<label>',container='<label>',container_id='<label>',container_type='DOCKER',host_mac='<label>',namespace='<label>',pod='<label>'
      
  • The value extracted from the previous result is multiplied by kube_pod_owner (which has the value of 1), matching on the pod, namespace, and cluster labels and keeping the owner name from kube_pod_owner. The owner can be a deployment, replicaset, service, daemonset, or statefulset object.

    • The name of the deployment to filter upon is extracted from the kube_pod_owner metrics.

    • The pod, namespace, and cluster names are extracted from the kube_pod_container_info metrics.

  • The new metrics will be:

    cluster='<matched_label>',container='<matched_container_label>',container_id='<label>',container_type='DOCKER',host_mac='<label>',namespace='<label>',owner_name='<label>',pod='<label>'
    
  • The kube_pod_owner will have a label owner_name that is the name of the object that owns the pod. This value is extracted by filtering:

    kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace}
    

    The owner_kind filter restricts the owners to deployments, and the value of owner_name comes from the dashboard scope.

  • The sum aggregation is applied and the time series are aggregated by cluster, namespace, and deployment.

The following table shows which labels are present after each step of the query:

| Query | __name__ | container_id | container | container_type | host_mac | pod | namespace | cluster | owner_name |
|---|---|---|---|---|---|---|---|---|---|
| sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| (sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace} | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| sum by(cluster,namespace,owner_name) ((sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace}) | No | No | No | No | No | No | Yes | Yes | Yes |

Formatting

Sysdig Monitor supports percentages only as 0-100 values. For calculated ratios, you can skip multiplying the whole query by 100 by selecting percentage as a 0-1 value.

Learn More