Form and PromQL Editors
Automatic Query Translation
When you use the Form editor, Form queries are automatically translated into an equivalent PromQL query (applying the same translation logic triggered by the Translate to PromQL button), before Sysdig forwards them to the Sysdig Prometheus API server. Regardless of whether you are building a query by using the Form or the PromQL editor, the system will therefore retrieve data by using the Sysdig Prometheus APIs.
Legacy API and Sysdig Prometheus API
Before automatic query translation was introduced, Form queries were sent to the legacy API, and only PromQL queries would be sent to the Sysdig Prometheus API.
The most important difference between data queried through the legacy API and data produced by the Sysdig Prometheus API is granularity. Data returned by queries built in the Form editor now has the same granularity as data returned by queries written in the PromQL editor.
For example, a 1-hour time selection now displays metrics with 10-second granularity, whereas before this enhancement you would only get 1-minute granularity.
Translation Limitations
Sysdig translates Form queries to PromQL as faithfully as possible. However, some scenarios pose problems because of the inherent differences between the two models. The following sections outline such cases and describe relevant differences between data produced by the legacy API and data produced by the Sysdig Prometheus API for equivalent PromQL queries.
Click Translate to PromQL to make further edits to the query in PromQL.
Translating Aggregate Functions
Time Aggregation
When configuring a query using the Form editor and defining a time aggregation, Sysdig automatically translates the chosen time aggregation function into the equivalent Prometheus aggregation function according to the following tables.
The Prometheus aggregation function varies according to the type of the selected metric:
- Gauge metrics represent a single numerical value that can arbitrarily fluctuate over time, for example, CPU usage.
- Counter metrics record how many times something has happened, for example, a user login.
See Metric Types.
Gauge Metrics
Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API |
---|---|
avg | avg_over_time |
sum | sum_over_time |
min | min_over_time |
max | max_over_time |
rate | sum_over_time / $__interval_sec |
roc (rate of change) | deriv |
Counter Metrics
Sysdig distinguishes two types of counter metrics, which differ in the way they store values:
- Prometheus counter metrics, which Sysdig refers to as `prom` counters. Prom counters are monotonically increasing cumulative metrics and report the total number of events since the event reporter started. The value always increases except when the reporter restarts or reboots. This is called a reset.
- StatsD-style counter metrics, which Sysdig refers to as `delta` counters. Delta counters report the number of events in the current time window.
As an example, the following table shows what a delta counter and a prom counter would store for the same sequence of events:
Time | 10 | 20 | 30 | 40 | 50 | 60 |
---|---|---|---|---|---|---|
delta | 1 | 2 | 2 | 4 | 4 | 3 |
prom | 1 | 3 | 5 | 4 | 8 | 11 |
To verify that both counters store the same information, consider the following question: how many events occurred after `t=10`, up to and including `t=60`?
- For delta counters, you sum the values: `2+2+4+4+3 = 15`
- For prom counters, you sum the increases between consecutive values: `(3-1)+(5-3)+4+(8-4)+(11-8) = 15`
  - where `(3-1)` is the number of events at `t=20`, `(5-3)` is the number of events at `t=30`, and so on.
  - The prom counter resets between `t=30` and `t=40` (its value decreases), so the number of events at `t=40` is just the value at `t=40`, which is `4`.
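The counting rule above can be sketched in a few lines of Python. This is only an illustration using the values from the table; the function names are hypothetical, and the reset handling (treat any decrease as a reset and count the new value in full) is a simplified assumption, not Sysdig's or Prometheus's actual implementation.

```python
def events_from_delta(samples):
    """Each delta sample is already a per-window event count, so just sum them."""
    return sum(samples)


def events_from_prom(samples):
    """Sum the increases between consecutive cumulative samples."""
    total = 0
    for prev, curr in zip(samples, samples[1:]):
        # A decrease means the counter was reset, so the new value counts in full.
        total += curr - prev if curr >= prev else curr
    return total


# Values stored at t=10, 20, 30, 40, 50, 60 (from the table above).
delta = [1, 2, 2, 4, 4, 3]
prom = [1, 3, 5, 4, 8, 11]

# Events after t=10, up to and including t=60.
delta_events = events_from_delta(delta[1:])
prom_events = events_from_prom(prom)
print(delta_events, prom_events)
```

Both computations give the same count, which is why the two counter types need different PromQL functions to answer the same question.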
Because of the difference in the way these two counter types store values, Sysdig translates the `rate` and `sum` time aggregations in two different ways based on the counter type.
For prom counter metrics:
Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API |
---|---|
rate | rate |
sum | increase |
For delta counter metrics:
Time Aggregation in Legacy API | Time Aggregation in Sysdig Prometheus API |
---|---|
rate | sum_over_time / $__interval_sec |
sum | sum_over_time |
For additional details about the Prometheus functions, see Query Functions.
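The translation tables above can be read as a lookup keyed by metric type and Form time aggregation. The following Python sketch encodes them that way. The helper name `translate_time_aggregation` and the exact shape of the generated expression (function applied to `metric[$__interval]`) are assumptions made for illustration; Sysdig's real translation logic is not shown in this page and may differ.

```python
# Lookup built from the tables above: metric type -> Form aggregation -> PromQL function.
TIME_AGGREGATIONS = {
    "gauge": {
        "avg": "avg_over_time",
        "sum": "sum_over_time",
        "min": "min_over_time",
        "max": "max_over_time",
        "rate": "sum_over_time",  # result is divided by $__interval_sec
        "roc": "deriv",
    },
    "prom": {"rate": "rate", "sum": "increase"},
    "delta": {"rate": "sum_over_time", "sum": "sum_over_time"},
}


def translate_time_aggregation(metric, metric_type, aggregation):
    """Build a PromQL expression for a Form time aggregation (illustrative only)."""
    func = TIME_AGGREGATIONS[metric_type][aggregation]
    expr = f"{func}({metric}[$__interval])"
    # For gauges and delta counters, rate is a sum divided by the interval length.
    if aggregation == "rate" and metric_type in ("gauge", "delta"):
        expr += " / $__interval_sec"
    return expr


print(translate_time_aggregation("my_metric_name", "prom", "sum"))
print(translate_time_aggregation("my_metric_name", "delta", "rate"))
```

Only the combinations listed in the tables are covered; any other combination raises a `KeyError` in this sketch.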
Group Aggregation
When configuring a query using the Form editor and defining a Group Aggregation, Sysdig automatically translates the chosen group aggregation function into the equivalent Prometheus aggregation function. Group Aggregation function names and meanings don’t change. For example, `avg` stays `avg` in PromQL and has exactly the same meaning.
Using top(k) / bottom(k)
When a group aggregation is defined, you can explicitly select a set of aggregation labels. When this happens, data tends to become bulky and less readable on the charts. For this reason, when an aggregation label is configured, the Form editor automatically selects and returns the top 10 time series. How this selection happens in the Prometheus system is what makes the two models different.
As an example, imagine your data looks like the chart below, where you have four time series, represented using the colors green, blue, red, and orange.
When applying the Prometheus function `top(2)`, Prometheus independently selects the top 2 time series for each point in time on the graph. Each point in time on the graph will have its own set of top 2 time series.
Rank | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
top1 | blue | green | green | green | green | blue | orange | orange | orange | blue | green | orange |
top2 | green | blue | blue | orange | blue | green | green | green | green | green | orange | green |
The output of the `top(2)` function applied to the time series above will therefore be represented as follows.
Note that:
- The green time series stays the same because its points are part of all `top(2)` sets.
- The red time series disappears because it is not part of any `top(2)` set.
- The blue and orange time series get some gaps and some isolated points, according to their presence in the various `top(2)` sets.
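The per-point selection described above can be sketched in Python as follows. The sample values are hypothetical, chosen only so that the ranking matches the first four columns (t0 to t3) of the table; the helper `topk_per_timestamp` is likewise an illustrative name, not a Prometheus or Sysdig API.

```python
def topk_per_timestamp(series, k):
    """Return, for each timestamp index, the names of the k largest series."""
    length = len(next(iter(series.values())))
    selected = []
    for i in range(length):
        # Rank the series by their value at this single point in time.
        ranked = sorted(series, key=lambda name: series[name][i], reverse=True)
        selected.append(ranked[:k])
    return selected


# Hypothetical values for t0..t3, consistent with the ranking in the table.
series = {
    "green": [5, 6, 6, 6],
    "blue": [6, 5, 5, 3],
    "red": [1, 1, 1, 1],
    "orange": [2, 2, 3, 5],
}

top2 = topk_per_timestamp(series, 2)
for i, names in enumerate(top2):
    print(f"t{i}: {names}")
```

Because the ranking is recomputed at every timestamp, a series such as blue can be selected at t2 and dropped at t3, which is what produces the gaps and isolated points on the chart.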
Using the Latest Displayed Value with Sparse Metric
Translated sparse metric queries look like this:
avg(avg_over_time(my_metric_name[$__interval]))
Sysdig uses a range vector, `my_metric_name[$__interval]`, and therefore Prometheus only takes into account the data points that fall within the `$__interval`.
When displaying a sparse metric (for example, one reporting values every 2 minutes) with a small time range, such as 10m, 1h, or 6h, the panel might display No Data because `$__interval` does not include any non-null values, whereas previous Form panels had a static interval of 5 minutes.
To see data in such cases you can:
- Translate the Form panel to PromQL and set the Min. interval to an appropriate value. For example, 5m.
- Switch from Latest to Entire range. This will apply the time aggregation to all points within the selected time range.
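To see why a short interval can produce No Data for a sparse metric, the following Python sketch checks which evaluation windows contain at least one sample. It assumes a metric reporting every 2 minutes over a 10-minute range and compares a hypothetical 10-second interval with a 5-minute one. The window semantics (samples with `t - interval < s <= t`) and the function name are simplifying assumptions for illustration, not the exact Prometheus evaluation rules.

```python
def windows_with_data(sample_times, start, end, interval):
    """For each evaluation step, report whether its lookback window holds a sample."""
    results = []
    t = start + interval
    while t <= end:
        # The window covers the interval immediately preceding the step.
        results.append(any(t - interval < s <= t for s in sample_times))
        t += interval
    return results


# One sample every 2 minutes (120 s) over a 10-minute range; times in seconds.
sample_times = list(range(0, 601, 120))

narrow = windows_with_data(sample_times, 0, 600, 10)   # 10-second interval
wide = windows_with_data(sample_times, 0, 600, 300)    # 5-minute interval

print(f"10s interval: {narrow.count(False)} of {len(narrow)} windows are empty")
print(f"5m interval: {wide.count(False)} of {len(wide)} windows are empty")
```

With the short interval most windows are empty, so the latest window is likely to hold no value; with a 5-minute interval every window contains at least one sample.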