Enhanced Metric Store

Sysdig has launched its next-generation metric store, which introduces a number of new features, changes some existing features, and removes others in Sysdig Monitor. This document covers the major enhancements and changes introduced by the metric store.

    June 2022

    New Features and Enhancements

    Prometheus-Compatible Naming Conventions for Metrics & Labels

    In prior versions of Sysdig Monitor, metric names were inconsistent between PromQL and Form querying. This behavior has been changed. Metrics are now unified: all metrics are presented in a Prometheus-compatible naming convention instead of the previous statsd-compatible naming convention. For example, underscores are used instead of dot notation:

    kubernetes.node.allocatable.cpuCores will be mapped to kube_node_status_allocatable_cpu_cores, and kubernetes.namespace.name to kube_namespace_name.

    Your existing dashboards, alerts and notifications will be automatically migrated to the new naming convention. Sysdig APIs support metrics and labels in both old and new naming conventions. Note that for the initial release, labels will not be migrated to the new naming convention in the old explore, events, and team settings.

    Notifications sent via alerts (webhooks, PagerDuty, etc.) will use the new label and metric convention. If you perform further processing to parse the metric or label names within these notification messages, please update your scripts as appropriate.

    If you have any concerns or questions regarding this mapping, or you feel you need more time to adjust your notification tools, please contact Sysdig Support.
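A script that parses notification payloads may see classic names in older messages and Prometheus-style names in newer ones. The following is a minimal sketch of one way to handle both, assuming you maintain your own lookup table built from the Metrics and Label Mapping page. The table and the function name `normalize_metric_name` are hypothetical; the single entry shown is the example pair given in this document.

```python
# Hypothetical lookup table: classic (statsd-style) name -> new name.
# Only the pair quoted in this document is listed; extend it from the
# official Metrics and Label Mapping page.
CLASSIC_TO_PROMETHEUS = {
    "kubernetes.node.allocatable.cpuCores": "kube_node_status_allocatable_cpu_cores",
}


def normalize_metric_name(name):
    """Return the new-convention name for a classic name.

    Names that are not in the table (including names that are already
    in the new convention) are returned unchanged.
    """
    return CLASSIC_TO_PROMETHEUS.get(name, name)
```

Because unknown names pass through unchanged, the same function can be applied to payloads produced before and after the migration.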

    For metrics mapping, see Metrics and Label Mapping.

    Context-Specific Metrics

    Previously, a metric such as cpu.used.percent would show values from a process, container, or host, depending on your query segmentation or scope. This has been improved by introducing sets of context-specific metrics whose names follow the Prometheus naming convention and state the resource they describe. For example:

    Classic Metrics | New Metrics

    Similarly, network metrics previously showed values from a host, container, program, or connection, depending on your query segmentation or scope. This has been improved by creating a new set of context-explicit metrics, which in this case also includes per-connection metrics:

    Classic Metrics | New Metrics

    Your existing dashboards, alerts and notifications will be automatically migrated to the new naming convention. Sysdig APIs support metrics in both old and new naming conventions.

    For the complete list of context-specific metrics, see Mapping Classic Metrics with Context-Specific PromQL Metrics.

    Faster Query Performance

    Queries executed in Sysdig Monitor are now noticeably faster and can handle larger volumes of data.

    Single Stat Panels Display Latest Value

    Number panels, tables, histograms, and toplist panels can now show the latest value for an entity. This can be done without having to aggregate multiple values over the time selection.

    Overview Displays Latest Data

    Overview pages now show the latest data, as opposed to a value aggregated over the selected time window for each widget. Time navigation has been removed to focus this view on the live (latest) status of your infrastructure.

    Scope Variable in PromQL Dashboard

    You can easily reference a dashboard scope in PromQL queries. To do so, use the reserved $__scope variable in your query.

    Under the hood, $__scope is substituted with the expression specified in the dashboard scope. This is achieved by leveraging Sysdig ServiceVision technology, which automatically enriches metrics with Kubernetes and application context. Learn more about ServiceVision.
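To illustrate the substitution described above, here is a small sketch that mimics it on the client side. It is not Sysdig's implementation: the real expansion happens inside Sysdig Monitor. The function name `expand_scope`, the placement of `$__scope` inside the label selector braces, and the `kube_cluster_name` label are assumptions made for the example.

```python
def expand_scope(query, scope):
    """Replace $__scope in a PromQL query with label matchers.

    `scope` is a dict of label name -> required value, standing in for
    the dashboard scope. This only imitates the documented behavior.
    """
    matchers = ",".join(
        '{}="{}"'.format(label, value) for label, value in scope.items()
    )
    return query.replace("$__scope", matchers)


# Hypothetical query and scope, used only for illustration.
example_query = "sum(kube_node_status_allocatable_cpu_cores{$__scope})"
example_scope = {"kube_cluster_name": "prod"}
expanded = expand_scope(example_query, example_scope)
```

Writing the query once against `$__scope` means the same panel follows whatever scope is selected on the dashboard, with no label matchers hard-coded in the query.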

    Mixed-Metric Granularity

    Sysdig Monitor can now display metrics scraped at different intervals, for example 10s and 1m, on the same graph.

    Improved Granularity for PromQL Panels

    Granularity of graphs has been improved for PromQL panels. For example, a 1-hour selection now shows metrics at 10-second intervals. In prior versions, a 1-hour selection in Dashboards showed metrics at 1-minute intervals.

    Removed Re-Alignment

    Previously, Sysdig Monitor would re-align time selections in graphs due to certain performance limitations. This time re-alignment has been removed to show more up-to-date metrics.

    Troubleshooting Metrics

    Troubleshooting metrics (program metrics, connection-level network metrics, and Kubernetes troubleshooting metrics) are reported at 10-second granularity and stored for 4 days. For the list of troubleshooting metrics and the labels that you can use to segment them, see Troubleshooting Metrics.

    Discontinued Features

    Discontinued Metrics and Labels

    Below is the list of metrics and labels that are being discontinued. We made an effort not to deprecate any metrics or labels used in existing alerts, but if you encounter any issues, please contact us.

    It is important to note that we have applied automatic mapping of all net.*.request.time.worst metrics to net.*.request.time, because max aggregation gives equivalent results and was almost exclusively the aggregation used with these metrics.
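If you keep your own scripts or saved queries that refer to these metrics, the same rename can be applied on your side. The sketch below is one possible reading of the net.*.request.time.worst pattern, treating `*` as exactly one name segment (for example `http`). The function name `map_worst_metric` is hypothetical, and this is not the code Sysdig uses for the automatic mapping.

```python
import re

# Matches net.<segment>.request.time.worst, where <segment> is a single
# dot-free name such as "http" or "sql". Treating "*" as one segment is
# an assumption about the pattern given in this document.
_WORST_PATTERN = re.compile(r"^(net\.[^.]+\.request\.time)\.worst$")


def map_worst_metric(name):
    """Drop the trailing .worst from net.*.request.time.worst names.

    Any other name is returned unchanged. Remember to use max
    aggregation with the mapped metric to get equivalent results.
    """
    match = _WORST_PATTERN.match(name)
    return match.group(1) if match else name
```

Names that do not fit the pattern, such as net.request.time.worst.out from the list below, are left untouched by this sketch.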

    Discontinued Metrics

    The following metrics are no longer supported:

    • net.request.time.file
    • net.request.time.file.percent
    • net.request.time.local
    • net.request.time.local.percent
    • net.request.time.nextTiers
    • net.request.time.nextTiers.percent
    • net.request.time.processing
    • net.request.time.processing.percent
    • net.request.time.worst.out
    • net.http.request.time.worst
    • net.mongodb.request.time.worst
    • net.sql.request.time.worst

    Discontinued Labels

    The following labels are no longer supported:

    • net.connection.client
    • net.connection.direction
    • net.connection.endpoint.tcp
    • net.connection.udp.inverted
    • net.connection.errorCode
    • net.connection.l4proto
    • net.connection.server
    • net.connection.state
    • net.role
    • cloudProvider.resource.endPoint
    • host.container.mappings
    • host.ip.all
    • host.ip.private
    • host.ip.public
    • host.server.port
    • host.isClientServer
    • host.isInstrumented
    • host.isInternal
    • host.procList.main
    • host_domain
    • program.environment
    • program.usernames
    • mesos_cluster
    • mesos_node
    • mesos_pid

    In addition, composite labels ending with the ‘.label’ string will no longer be supported. For example, kubernetes.service.label will no longer be supported, but kubernetes.service.label.* labels will continue to be supported.

    Removed Features
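The distinction between a composite label and its individual members is easy to miss when auditing saved queries. As a minimal sketch, assuming label names are plain strings, a check like the following separates the two cases. The function name and the `app` key in the usage note are hypothetical.

```python
def is_discontinued_composite_label(label_name):
    """True for composite labels that end with '.label'.

    kubernetes.service.label      -> discontinued composite label
    kubernetes.service.label.<x>  -> individual label, still supported
    """
    return label_name.endswith(".label")
```

For instance, `is_discontinued_composite_label("kubernetes.service.label")` is true, while the same check on `kubernetes.service.label.app` is false.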

    Topology Maps

    Topology Maps will be deprecated because they are incompatible with the new data store and had limitations at scale for certain users.

    Agent Percentiles

    Agent-derived percentiles will be deprecated. If you have been using these, your queries will stop working and you will have to manually migrate them to use Prometheus histograms and PromQL functions such as histogram_quantile to achieve more precise results.
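For readers migrating from agent percentiles, it may help to see how a quantile is estimated from histogram buckets. The sketch below is a simplified Python rendering of the linear interpolation that Prometheus documents for histogram_quantile. It is not Sysdig's or Prometheus's actual code, it assumes cumulative buckets sorted by upper bound with a final +Inf bucket, and it only accepts 0 < q <= 1.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted
    by upper_bound; the last upper_bound must be math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last one +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations

    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Quantile falls in the +Inf bucket: report the highest
                # finite upper bound instead of interpolating.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan  # unreachable for well-formed input
```

The estimate is only as fine as the bucket boundaries, so choosing bounds close to the latencies you care about matters more than the interpolation itself.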

    Change in Functionality

    Usage of Labels in Table Panel

    Querying labels as metrics is limited to infrastructure labels. For example, you can use all the host-level labels (for example, agent tags), AWS tags (for example, region), and the Kubernetes labels (for example, workload) to build table panels.

    Aggregated Data for Non-Timecharts

    Due to the underlying changes we made to our core metric ingestion engine, charts that are not Timecharts (for example, Number panels) will sometimes not display aggregated data for the full requested time range. In this case, we will:

    1. Aggregate a portion of the data. In all cases, this portion spans no less than 2 weeks.
    2. Clearly state in the warning message the time range for which we were able to aggregate.

    It is important to note that this is a transient side effect and will become less likely to happen over time.

    Contact Us

    If you have any questions or comments about these changes, feel free to contact Sysdig Support or your Sysdig representative.