Form and PromQL Editors

This topic describes the differences between Form and PromQL modes when building dashboards.

The Dashboards module lets you build data queries in two ways: with the Form editor or with the PromQL editor. Use the Form/PromQL switch to choose between the two. Queries built with the Form editor can also be translated automatically into equivalent PromQL queries by clicking the Translate to PromQL button, located next to the switch.

Using PromQL as the Single Data Query Mechanism

This feature is being progressively rolled out to the SaaS users. See the release notes for more details.

Before this enhancement, queries built with the Form editor were sent to a legacy API, while queries built with the PromQL editor (or translated into equivalent PromQL queries with the Translate to PromQL button) were sent to the Sysdig Prometheus API.

With this enhancement, Sysdig Monitor fully embraces PromQL and the Form editor becomes a full PromQL query builder.

When you use the Form editor, the configuration in the Form is automatically translated into an equivalent PromQL query, using the same translation logic as the Translate to PromQL button, and the query is then sent to the Sysdig Prometheus API server. Whether you build a query with the Form editor or the PromQL editor, the system therefore retrieves data through the Sysdig Prometheus API.

Implications

All possible configurations in the Form editor are translated as faithfully as possible into equivalent PromQL queries. Some scenarios require adaptation because of the inherent differences between the two models. The following sections outline these cases and describe the relevant differences between data produced by the legacy API and data produced by the Sysdig Prometheus API for equivalent PromQL queries.

Improved Granularity

The most important difference between data queried through the legacy API and data produced by the Sysdig Prometheus API is granularity. Data produced by queries configured in the Form editor now has the same granularity as data produced by queries written in the PromQL editor.

For example, a 1-hour time selection now displays metrics with 10-second granularity, whereas before this enhancement you would only get 1-minute granularity.

Translating Aggregate Functions

Time Aggregation

When configuring a query using the Form editor and defining a time aggregation, Sysdig automatically translates the chosen time aggregation function into the equivalent Prometheus aggregation function according to the following tables.

Note that the chosen Prometheus aggregation function depends on the type of the selected metric:

  • Gauge metrics represent a single numerical value that can arbitrarily fluctuate over time, for example, CPU usage.
  • Counter metrics record how many times something has happened, for example, a user login.

Gauge Metrics

Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API
avg                            | avg_over_time
sum                            | sum_over_time
min                            | min_over_time
max                            | max_over_time
rate                           | sum_over_time / $__interval_sec
roc (rate of change)           | deriv (see the Prometheus documentation)

Counter Metrics

Sysdig distinguishes two types of counter metrics, which differ in the way they store values:

  • Prometheus counter metrics, which Sysdig refers to as prom counters.

    Prom counters are monotonically increasing cumulative metrics that report the total number of events since the event reporter started. The value always increases, except when the reporter restarts or reboots. This is called a reset.

  • StatsD-style counter metrics, which Sysdig refers to as delta counters.

    Delta counters report the number of events in the current time window.

As an example, the following table shows what a delta counter and a prom counter would store for the same sequence of event occurrences:

Time  | 10 | 20 | 30 | 40 | 50 | 60
delta |  1 |  2 |  2 |  4 |  4 |  3
prom  |  1 |  3 |  5 |  4 |  8 | 11

To verify that both counters carry the same information, consider the following question: how many events occurred after t=10, up to and including t=60?

  • For delta counters you sum the numbers: 2+2+4+4+3 = 15
  • For prom counters you have to perform: (3-1)+(5-3)+4+(8-4)+(11-8) = 15
    • where (3-1) is the number of events at t=20, (5-3) is the number of events at t=30, and so on.
    • The prom counter resets between t=30 and t=40 (its value decreases from 5 to 4), so the number of events at t=40 is just the value at t=40, which is 4.
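
The arithmetic above can be checked with a short Python sketch. The function names are illustrative and are not part of any Sysdig or Prometheus API. The reset handling follows the rule described in the list: when the value decreases, the counter is assumed to have restarted, so the new value counts in full.

```python
from typing import List


def events_from_delta(samples: List[int]) -> int:
    """Events after the first sample: delta counters are simply summed."""
    return sum(samples[1:])


def events_from_prom(samples: List[int]) -> int:
    """Events after the first sample for a cumulative (prom) counter.

    Consecutive differences are added up. A decrease is treated as a
    reset, in which case the new value itself is the number of events.
    """
    total = 0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # reset: counting restarted from zero
    return total


# Values from the table above, sampled at t=10, 20, ..., 60.
delta = [1, 2, 2, 4, 4, 3]
prom = [1, 3, 5, 4, 8, 11]

print(events_from_delta(delta))  # 15
print(events_from_prom(prom))    # 15
```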

Because these two counter types store values differently, Sysdig translates the rate and sum time aggregations in two different ways, depending on the counter type.

For prom counter metrics:

Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API
rate                           | rate
sum                            | increase

For delta counter metrics:

Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API
rate                           | sum_over_time / $__interval_sec
sum                            | sum_over_time

For additional details about the Prometheus functions, see Query Functions.
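
The time aggregation mappings in this section can be summarized as a lookup, sketched here in Python. The mappings themselves come from the tables in this section. The function name, the metric type keys, and the exact query text (in particular, wrapping the metric in a [$__interval] range selector) are assumptions made for illustration; the query Sysdig actually generates may differ.

```python
# Mapping of (metric type, legacy time aggregation) to a PromQL template.
# The templates are illustrative; only the function choices come from the
# documented translation.
TIME_AGGREGATION = {
    ("gauge", "avg"): "avg_over_time({m}[$__interval])",
    ("gauge", "sum"): "sum_over_time({m}[$__interval])",
    ("gauge", "min"): "min_over_time({m}[$__interval])",
    ("gauge", "max"): "max_over_time({m}[$__interval])",
    ("gauge", "rate"): "sum_over_time({m}[$__interval]) / $__interval_sec",
    ("gauge", "roc"): "deriv({m}[$__interval])",
    ("prom_counter", "rate"): "rate({m}[$__interval])",
    ("prom_counter", "sum"): "increase({m}[$__interval])",
    ("delta_counter", "rate"): "sum_over_time({m}[$__interval]) / $__interval_sec",
    ("delta_counter", "sum"): "sum_over_time({m}[$__interval])",
}


def translate_time_aggregation(metric_type: str, aggregation: str, metric: str) -> str:
    """Return a PromQL expression for a legacy time aggregation (hypothetical helper)."""
    try:
        template = TIME_AGGREGATION[(metric_type, aggregation)]
    except KeyError:
        raise ValueError(f"no mapping for {aggregation!r} on {metric_type!r} metrics")
    return template.format(m=metric)


print(translate_time_aggregation("prom_counter", "sum", "my_metric_name"))
print(translate_time_aggregation("delta_counter", "rate", "my_metric_name"))
```

The same legacy aggregation (for example, sum) maps to different Prometheus functions depending on the counter type, which is why the lookup key includes the metric type.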

Group Aggregation

When configuring a query using the Form editor and defining a Group Aggregation, Sysdig automatically translates the chosen group aggregation function into the equivalent Prometheus aggregation function. The names and meanings of Group Aggregation functions do not change. For example, avg stays avg in PromQL and has exactly the same meaning.

Using top(k) / bottom(k)

When a group aggregation is defined, you can explicitly select a set of aggregation labels. When this happens, data tends to become bulky and less readable on the charts. For this reason, when an aggregation label is configured, the Form editor automatically selects and returns the top 10 time series. How this selection happens in the Prometheus system is what makes the two models different.

As an example, imagine your data looks like the chart below, where you have four time series, represented using the colors green, blue, red, and orange.

When applying top(2), which corresponds to the Prometheus topk aggregation operator with k=2, Prometheus independently selects the top 2 time series for each point in time on the graph. Each point in time on the graph therefore has its own set of top 2 time series.

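     | t0    | t1    | t2    | t3     | t4    | t5    | t6     | t7     | t8     | t9    | t10    | t11
top1 | blue  | green | green | green  | green | blue  | orange | orange | orange | blue  | green  | orange
top2 | green | blue  | blue  | orange | blue  | green | green  | green  | green  | green | orange | green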

The output of the top(2) function applied to the time series above will therefore be represented as follows.

Note that:

  • The green time series stays the same because its points are part of all top(2) sets.
  • The red time series disappears because it is not part of any top(2) set.
  • The blue and orange time series get some gaps and some isolated points, according to their presence in the various top(2) sets.
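
The per-point selection described above can be sketched in Python. This is a simplified model of the behavior, not Prometheus code: the function name is hypothetical and the sample values are made up for illustration (they are not the values from the chart).

```python
from typing import Dict, List, Optional


def topk_per_step(series: Dict[str, List[float]], k: int) -> Dict[str, List[Optional[float]]]:
    """Keep, at each time step, only the k series with the highest values.

    Points that are not selected become None (gaps). Series that are never
    selected are dropped from the result.
    """
    names = list(series)
    steps = len(next(iter(series.values())))
    out: Dict[str, List[Optional[float]]] = {n: [None] * steps for n in names}
    for i in range(steps):
        # Rank the series independently at every step.
        ranked = sorted(names, key=lambda n: series[n][i], reverse=True)
        for n in ranked[:k]:
            out[n][i] = series[n][i]
    return {n: pts for n, pts in out.items() if any(p is not None for p in pts)}


# Made-up values for four series over four time steps.
data = {
    "green": [5, 6, 6, 5],
    "blue": [6, 5, 3, 2],
    "red": [1, 1, 1, 1],
    "orange": [2, 2, 4, 6],
}

for name, points in topk_per_step(data, 2).items():
    print(name, points)
```

In this made-up dataset, green is kept at every step, red is dropped entirely, and blue and orange are left with gaps, which matches the three cases listed above.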

Using the Latest Displayed Value with Sparse Metric

In the Form Number, Toplist, and Table panels you can set the Displayed Value option to Latest or Entire range. When set to Latest, the panel shows the latest non-null value in the previous 5 minutes.

After the enhancement, the translated query will look like this:

avg(avg_over_time(my_metric_name[$__interval]))

Sysdig uses a range vector, my_metric_name[$__interval], so Prometheus only takes into account the data points that fall within $__interval.

When a sparse metric (for example, one that reports a value every 2 minutes) is displayed with a small time range, such as 10m, 1h, or 6h, the panel might show No Data because $__interval does not contain any non-null values. Before the enhancement, Form panels used a static interval of 5 minutes.

To see data in such cases you can:

  • Translate the Form panel to PromQL and set the Min. interval to an appropriate value, for example, 5m.
  • Switch from Latest to Entire range. This will apply the time aggregation to all points within the selected time range.
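
The No Data case can be illustrated with a simplified Python model of a range-vector average. This is a sketch under stated assumptions, not Prometheus code: the function name is hypothetical, the sample values are made up, and the window is modeled as every sample in the interval (end - window, end]. The 10-second window stands in for a small $__interval and the 300-second window for a 5m Min. interval.

```python
from typing import Dict, Optional


def avg_over_window(samples: Dict[int, float], end: int, window: int) -> Optional[float]:
    """Average of the samples whose timestamp falls in (end - window, end].

    Returns None when the window contains no samples, which is the
    situation in which a panel would show No Data.
    """
    in_window = [v for t, v in samples.items() if end - window < t <= end]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


# A sparse metric reporting every 2 minutes (timestamps in seconds).
samples = {0: 1.0, 120: 2.0, 240: 4.0}

print(avg_over_window(samples, end=300, window=10))   # None
print(avg_over_window(samples, end=300, window=300))  # 3.0
```

With the 10-second window, no sample falls between t=290 and t=300, so nothing is returned. Widening the window to 5 minutes picks up the samples at t=120 and t=240.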