Run PromQL Queries Faster with Extended Label Set

Sysdig allows you to run PromQL queries smoother and faster with the extended label set. The extended label set is created by augmenting the incoming data with the rich metadata associated with your infrastructure and making it available in PromQL.

With this, you can troubleshoot a problem or building Dashboards and Alerts without the need to write complex queries. Sysdig automatically enriches your metrics with Kubernetes and application context without the need to instrument additional labels in your environment. This reduces operational complexity and cost—the enrichment takes place in Sysdig metric ingestion pipeline after time series have been sent to the backend.

Calculate Memory Usage by Deployment in a Cluster

Using the vector matching operation, you could run the following query and calculate the memory usage by deployment in a cluster:

sum by(cluster,namespace,owner_name) ((sysdig_container_memory_used_bytes * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~".+",cluster=~".+",namespace=~".+"})

To get the result, you need to write a query to perform a join (vector match) of various metrics, usually in the following order:

• Grab a metric you need that is defined on a container level. For example, a Prometheus metric or some of the Sysdig provided metrics, such as sysdig_container_memory_used_byte.

• Perform a vector match on container ID with the metric kube_pod_container_info to get the pod metadata.

• Perform a vector match on the pod, namespace, and cluster with the kube_pod_owner metric.

In the case of Sysdig’s extended label set for PromQL, all the metrics inherit the metadata, so that necessary container, host, and Kubernetes metadata are set on all the metrics. This simplifies the query so you can build and run it quickly.

Likewise, the above query can be simplified as follows:

sum by (kube_cluster_name,kube_namespace_name,kube_deployment_name) (sysdig_container_memory_used_bytes{kube_cluster_name!="",kube_namespace_name!="",kube_deployment_name!=""})

The advantages of using a simplified query are:

• Complex vector matching operations (the group_left and group_right operators) are no longer required. All the labels are already available on each of the metrics, and therefore, any filtering can be performed directly on the metric itself.

• The metrics now will have a huge amount of labels. You can use PromQL Explorer to deal with this rich metadata.

• The metadata is distinguishable from user-defined labels. For example, Kubernetes metadata labels start with kube_. For instance, cluster is replaced with kube_cluster_name.

• Create a dashboard panel or an alert from the PromQL query you run in the PromQL Query Explore.

• Filter data by applying the comparison operators on the label values given in the table.

Examples for Simplifying Queries

Given below are some of the examples of using the extended label set to simplify complex query operations.

Memory Usage in a Kubernetes Cluster

Query with core label set:

avg by (agent_tag_cluster) ((sysdig_host_memory_used_bytes/sysdig_host_memory_total_bytes) * on(host,agent_tag_cluster) sysdig_host_info{agent_tag_cluster=~".+"}) * 100

Query with the extended label set:

avg by (agent_tag_cluster) (sysdig_host_memory_used_bytes/sysdig_host_memory_total_bytes) * 100

CPU Usage in Containers

Query with the core label set:

sum by (cluster,namespace)(sysdig_container_cpu_cores_used * on (container_id) group_left(cluster,pod,namespace) kube_pod_container_info{cluster=~".+"})

Simplified query with the extended label set:

sum by (kube_cluster_name,kube_namespace_name)(sysdig_container_cpu_cores_used{kube_cluster_name=~".+"})

Memory Usage in Daemonset

Query with the core label set:

sum by(cluster,namespace,owner_name) (sum by(pod) (label_replace(sysdig_container_memory_used_bytes * on(container_id,host_mac) group_left(label_io_kubernetes_pod_namespace,label_io_kubernetes_pod_name,label_io_kubernetes_container_name) sysdig_container_info{label_io_kubernetes_pod_namespace=~".*",cluster=~".*"},"pod","$1","label_io_kubernetes_pod_name","(.*)")) * on(pod) group_right sum by(cluster,namespace,owner_name,pod) (kube_pod_owner{owner_kind=~"DaemonSet",owner_name=~".*",cluster=~".*",namespace=~".*"})) Simplified query with the extended label set: sum by(kube_cluster_name,kube_namespace_name,kube_daemonset_name) (sysdig_container_memory_used_bytes{kube_daemonset_name=~".*",kube_cluster_name=~".*",kube_namespace_name=~".*"}) Pod Restarts in a Kubernetes Cluster Query with the core label set: sum by(cluster,namespace,owner_name)(changes(kube_pod_status_ready{condition="true",cluster=~$cluster,namespace=~$namespace}[$__interval]) * on(cluster,namespace,pod) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~".+",cluster=~$cluster,namespace=~$namespace})

Simplified query with the extended label set:

sum by (kube_cluster_name,kube_namespace_name,kube_deployment_name)(changes(kube_pod_status_ready{condition="true",kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_deployment_name=~".+"}[$__interval])) Containers per Image Query with the core label set: count by (owner_name,image,cluster,namespace)((sysdig_container_info{cluster=~$cluster,namespace=~$namespace}) * on(pod,namespace,cluster) group_left(owner_name) max by (pod,namespace,cluster,owner_name)(kube_pod_owner{owner_kind="Deployment",owner_name=~".+"})) Simplified query with the extended label set: count by (kube_deployment_name,image,kube_cluster_name,kube_namespace_name)(sysdig_container_info{kube_deployment_name=~".+",kube_cluster_name=~$cluster,kube_namespace_name=~$namespace}) Average TCP Queue per Node Query with the core label set: avg by (agent_tag_cluster,host)( sysdig_host_net_tcp_queue_len * on (host_mac) group_left(agent_tag_cluster,host) sysdig_host_info{agent_tag_cluster=~$cluster,host=~".+"})

Simplified query with the extended label set:

avg by (agent_tag_cluster,host_hostname) (sysdig_host_net_tcp_queue_len{agent_tag_cluster =~ \$cluster})