Configure Alerts
Create an Alert
In addition to using the Alert Editor, you can create alerts from the following places:
- From Metrics Explorer, select Create Alert.
- From an existing Dashboard, select the More Options (three dots) icon for a panel, and select Create Alert.
- From any Event panel, select Create Alert from Event.
Alert Types
Use the Alert Editor to create alerts of the following types:
- Threshold (Previously Metric): Monitor your infrastructure by comparing any metric against user-defined thresholds.
- Prometheus (Previously PromQL): Monitor your infrastructure with PromQL queries, maintaining full compatibility with OSS Prometheus.
- Event: Monitor your infrastructure by tracking specific events, and alert if the total number of occurrences exceeds a user-defined threshold.
- Group Outlier: Monitor unusual patterns by detecting deviations from expected group behavior.
- Percentage of Change: Compare the percentage of change of a metric over two specific timeframes, such as comparing the last 5 minutes to the previous hour.
- Downtime: Monitor any type of entity (host, container, process, service, etc.) and alert when the entity goes down.
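As an illustration of the Percentage of Change type, the comparison can be sketched as follows. This is a minimal sketch, not Sysdig's implementation: it assumes each timeframe is summarized by its average, and the function name `percentage_of_change` and the sample values are hypothetical.

```python
from statistics import mean

def percentage_of_change(recent, baseline):
    """Percent change of the recent window's average relative to the baseline window's average.

    Assumption: both windows are reduced with a plain average before comparing.
    """
    baseline_avg = mean(baseline)
    if baseline_avg == 0:
        raise ValueError("baseline average is zero; percentage change is undefined")
    return (mean(recent) - baseline_avg) / baseline_avg * 100.0

# Hypothetical samples: one value per minute.
recent_cpu = [80.0, 90.0, 85.0, 95.0, 100.0]  # last 5 minutes, average 90
baseline_cpu = [60.0] * 60                     # previous hour, average 60

change = percentage_of_change(recent_cpu, baseline_cpu)
print(f"{change:.1f}% change")
```

An alert of this type would then compare `change` against the user-defined threshold.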
Settings
- Alert Severity: Select a priority: High, Medium, Low, or Info.
- Alert Name: Specify a meaningful name that uniquely represents the alert you are creating. For example, Production Cluster Failed Scheduling pods.
- Description (optional): Add additional alert context.
- Group (optional): Group alerts by assigning them to a specific group name. Alerts that have no group name will be added to the Default Group.
- Orphaned Alerts: Automatically deactivate orphaned alert occurrences and eliminate noise caused by outdated alerts triggered for entities that are no longer reporting data.
- Link to Dashboard: Select a dashboard that you might want to include in the alert notification.
- Link to Runbook: Specify the URL of a runbook.
Consistent Alert Preview for Alert Rule Evaluation
Threshold Alerts and Prometheus Alerts provide an alert preview that accurately reflects alert rule checks. This aligns the data points in the alert preview with the actual alert evaluation intervals, ensuring a realistic representation of alert behavior, since each point in the alert preview corresponds to an actual alert rule check.
For scenarios where you need to view data in a different format from the alert rule checks, such as data in 10s intervals or a week-long alert preview, switch to Explore Mode. This mode provides the flexibility to view data at different granularities or over extended periods, even if these do not correspond to the specific intervals of the alert checks.
Evaluation Interval for Threshold Alerts and Prometheus Alerts
By default, alerts are evaluated every minute. However, if an alert has a range of 3 hours or more, it will be evaluated every 10 minutes instead. For instance, if you have configured a Threshold Alert to look at data “over the last 3 hours,” it will be evaluated every 10 minutes. The same applies to Prometheus Alerts like sum(rate(errors_total[3h])) > 100, which will also be evaluated every 10 minutes. This also means that re-notifications for these alerts can be sent at most once every 10 minutes.
Please note that Threshold Alerts and Prometheus Alerts with ranges of 60 days or more are not supported.
Query Range | Check Interval |
---|---|
up to 3h | 1m |
up to 1d | 10m |
up to 7d | 1h |
up to 60d | 1d |
60d or more | Not Supported |
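The table can be read as a simple lookup. The sketch below is illustrative only: it assumes every "up to" bound is exclusive, which matches the statements above that a 3-hour range is checked every 10 minutes and that ranges of 60 days or more are not supported. The 1d and 7d boundaries are assumed to behave the same way, and the function name `check_interval` is hypothetical.

```python
from datetime import timedelta
from typing import Optional

# (exclusive upper bound of the query range, check interval), in table order.
_INTERVALS = [
    (timedelta(hours=3), timedelta(minutes=1)),
    (timedelta(days=1), timedelta(minutes=10)),
    (timedelta(days=7), timedelta(hours=1)),
    (timedelta(days=60), timedelta(days=1)),
]

def check_interval(query_range: timedelta) -> Optional[timedelta]:
    """Return the check interval for a query range, or None if the range is not supported."""
    for upper_bound, interval in _INTERVALS:
        if query_range < upper_bound:
            return interval
    return None  # 60 days or more

print(check_interval(timedelta(minutes=5)))  # short range
print(check_interval(timedelta(hours=3)))    # e.g. errors_total[3h]
print(check_interval(timedelta(days=60)))    # not supported
```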
Alert Notifications
Notification Channel
After setting up a Notification Channel, the channel will appear on the Notification Channel drop-down list. You can configure alerts for forwarding to multiple notification channels when the alert condition is met.
Resolution Notification
Notification Channels that receive alert notifications can also receive resolution notifications when the alert condition is no longer met. Toggle Get Notified under When Resolved to forward resolution notifications so that incidents can be automatically closed in incident management channels such as PagerDuty or Opsgenie.
This setting allows an alert to override the notification channel’s default notification settings. If an override is not configured, the alert will inherit the default settings from the notification channel.
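The inheritance rule described above amounts to "use the alert's override if one is set, otherwise fall back to the channel default". A minimal sketch, in which the function and parameter names are hypothetical and `None` stands for "no override configured":

```python
from typing import Optional

def notify_on_resolve(alert_override: Optional[bool], channel_default: bool) -> bool:
    """Decide whether a resolution notification is sent for an alert on a given channel."""
    # No override on the alert: inherit the notification channel's default setting.
    if alert_override is None:
        return channel_default
    # An explicit override on the alert wins, whether it enables or disables the notification.
    return alert_override

print(notify_on_resolve(None, True))    # inherits the channel default
print(notify_on_resolve(False, True))   # alert-level override disables it
```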
Customize Notifications
Configure Notification Template
Optionally customize alert notifications using custom text and dynamic variables.
Dynamic Variables
Dynamic variables are placeholders that take the current value of the item they refer to. That value can change each time the alert is evaluated.
The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host_hostName and container_name, the corresponding variables will be {{host_hostName}} and {{container_name}} respectively.
The variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.
Variables are case-sensitive.
Notification subjects will not show up on the Event feed.
Using a variable that is not a part of the segment will trigger an error.
When a variable is not resolved, the output will be “N/a”. No error will be reported.
Supported variables are:
- {{__alert_name__}}: The unique name of the alert.
- {{__alert_status__}}: The status of the alert, either Triggered or Resolved.
- {{$value}}: The metric value that made the alert trigger or resolve.
- Any label that you can find in the corresponding Scope section of the matching Event in the Events feed, as shown below. This can be any of the labels used in the segment or scope, as well as any of the expanded labels you are able to obtain, used with the syntax {{labelName}}.
An Alert Event in the Events Feed
No other segment variables are allowed in the notification subject and body.
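The rules above (double curly braces, case sensitivity, and a fixed set of allowed names) can be checked before saving a template. The following is a rough sketch rather than Sysdig's validation logic: the regular expression, the function name `unknown_variables`, and the assumption that no whitespace appears inside the braces are all illustrative.

```python
import re

# Built-in variables listed in this section.
BUILT_INS = {"__alert_name__", "__alert_status__", "$value"}

# Matches {{name}} with no whitespace inside the braces (an assumption of this sketch).
VARIABLE = re.compile(r"\{\{([^{}\s]+)\}\}")

def unknown_variables(template, labels):
    """Return the variables in the template that are neither built-in nor a known label.

    The membership test is an exact string comparison, so it is case-sensitive.
    """
    allowed = BUILT_INS | set(labels)
    return [name for name in VARIABLE.findall(template) if name not in allowed]

labels = ["host_hostName", "container_name"]
subject = "{{host_hostName}} / {{Container_Name}}: {{__alert_name__}} = {{$value}}"
print(unknown_variables(subject, labels))  # Container_Name differs in case from container_name
```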
The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.
Example
The following example shows a notification template created to alert you on Failing Prometheus Jobs. Adding {{kube_cluster_name}}: {{job}} - {{__alert_name__}} is {{__alert_status__}} to the subject line helps you identify the problem area at a glance without having to read the entire notification body.
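To show how such a subject line might look once the variables are filled in, here is a small rendering sketch. It is an approximation for illustration: the `render` function, the regular expression, and the sample values are hypothetical, and the "N/a" fallback follows the note above about unresolved variables.

```python
import re

VARIABLE = re.compile(r"\{\{([^{}\s]+)\}\}")

def render(template, values):
    """Replace each {{name}} with its value; unresolved names become "N/a"."""
    return VARIABLE.sub(lambda match: str(values.get(match.group(1), "N/a")), template)

subject = "{{kube_cluster_name}}: {{job}} - {{__alert_name__}} is {{__alert_status__}}"

# Hypothetical values for one alert occurrence.
values = {
    "kube_cluster_name": "prod-eu",
    "job": "node-exporter",
    "__alert_name__": "Failing Prometheus Jobs",
    "__alert_status__": "Triggered",
}

rendered = render(subject, values)
print(rendered)
```

With these sample values the subject reads "prod-eu: node-exporter - Failing Prometheus Jobs is Triggered".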
Deactivate Orphaned Alert Occurrences Automatically
Sysdig Monitor can automatically deactivate alert occurrences triggered by entities like hosts or containers that are no longer reporting data. This enhancement curbs noise from potentially outdated alert occurrences, ensuring that your alert notifications remain relevant. Alert occurrences are deactivated rather than resolved since the resolution condition can no longer be achieved for disconnected entities.
Entities may cease to report data during scaling events or on dynamic workloads. By automatically deactivating orphaned alert occurrences, users can eliminate false positives from their alert occurrences and ensure that only alert occurrences from entities that actually exist are reported in the system.
Prometheus Alerts and Event Alerts do not support this feature.
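Conceptually, deactivation marks a triggered occurrence as deactivated once its entity has stopped reporting. The sketch below only illustrates that idea: the `Occurrence` type, the state names, and the timeout are assumptions, and the actual criteria Sysdig Monitor uses to decide that an entity is orphaned are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Occurrence:
    entity: str
    state: str = "triggered"  # "triggered", "resolved", or "deactivated"

def deactivate_orphans(occurrences, last_seen, now, timeout):
    """Mark triggered occurrences as deactivated when their entity has not reported within the timeout."""
    for occurrence in occurrences:
        seen = last_seen.get(occurrence.entity)
        orphaned = seen is None or now - seen > timeout
        if occurrence.state == "triggered" and orphaned:
            # Deactivated, not resolved: the resolution condition can no longer be evaluated.
            occurrence.state = "deactivated"
    return occurrences

now = datetime(2024, 1, 1, 12, 0)
last_seen = {"host-a": now - timedelta(minutes=1), "host-b": now - timedelta(hours=1)}
occurrences = [Occurrence("host-a"), Occurrence("host-b")]

deactivate_orphans(occurrences, last_seen, now, timeout=timedelta(minutes=10))
print([(o.entity, o.state) for o in occurrences])
```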
Supported Aggregation Functions
The table below displays supported time aggregation functions, group aggregation functions, and relational operators:
Time Aggregation Function | Group Aggregation Function | Relational Operator |
---|---|---|
timeAvg() | avg() | = |
min() | min() | < |
max() | max() | > |
sum() | sum() | <= |
not applicable | not applicable | >= |
not applicable | not applicable | != |
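The three columns combine into a single condition. As a rough sketch, assume that time aggregation is applied to each entity's samples first, group aggregation is then applied across entities, and the result is compared with the threshold. That ordering, the `breaches` function, and the sample data are assumptions made for illustration.

```python
import operator
from statistics import mean

TIME_AGG = {"timeAvg": mean, "min": min, "max": max, "sum": sum}
GROUP_AGG = {"avg": mean, "min": min, "max": max, "sum": sum}
OPERATORS = {
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
}

def breaches(series, time_agg, group_agg, op, threshold):
    """Evaluate: group_agg(time_agg(samples) for each entity) <op> threshold."""
    per_entity = [TIME_AGG[time_agg](samples) for samples in series.values()]
    return OPERATORS[op](GROUP_AGG[group_agg](per_entity), threshold)

# Hypothetical CPU samples per host over the alert window.
cpu = {"host-a": [70.0, 90.0], "host-b": [40.0, 60.0]}

print(breaches(cpu, "timeAvg", "max", ">", 75.0))  # per-host averages are 80 and 50; max is 80
print(breaches(cpu, "timeAvg", "avg", ">", 75.0))  # average of 80 and 50 is 65
```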
Captures
Optionally, configure a Sysdig capture. Specify the following:
- Capture Enabled: Click the slider to enable Capture.
- Capture Duration: The period of time captured. The default is 15 seconds. The capture starts from the time the alert threshold was breached.
- Capture Storage: The storage location for the capture files.
- Capture Name: The name of the capture file.
- Capture Filter: Restricts the amount of trace information collected.
Sysdig capture files are not available for Event Alerts and Prometheus Alerts. See Captures for more information.