Configure Alerts

Configure alerts based on specific metrics, conditions, and thresholds.

Create an Alert

In addition to using the Alert Editor, you can create alerts in the following ways:

  • From Metrics Explorer, select Create Alert.
  • From an existing Dashboard, select the More Options (three dots) icon for a panel, and select Create Alert.
  • From any Event panel, select Create Alert from Event.

Alert Types

The Alert Editor supports the following alert types:

  • Metric: Select a time-series metric and be alerted when it violates a user-defined threshold.
  • PromQL: Enter the PromQL query and duration to define an alert condition.
  • Event: Alert on events using the event name, tag, description, and source.
  • Downtime: Select the entity to monitor.

Settings

  • Alert Severity: Select a priority: High, Medium, Low, or Info.
  • Alert Name: Specify a meaningful name that uniquely identifies the alert you are creating. For example, Production Cluster Failed Scheduling pods.
  • Description (optional): Add additional context for the alert.
  • Group (optional): Group alerts by assigning them to a specific group name. Alerts that have no group name will be added to the Default Group.
  • Orphaned Alerts: Automatically deactivate orphaned alert occurrences and eliminate noise caused by outdated alerts triggered for entities that are no longer reporting data.
  • Link to Dashboard: Select a dashboard to include in the alert notification.
  • Link to Runbook: Specify the URL of a runbook.

Evaluation Interval for Metric Alerts and PromQL Alerts

By default, alerts are evaluated every minute. However, an alert with a range of 3 hours or more is evaluated every 10 minutes instead. For instance, a metric alert configured to look at data “over the last 3 hours” is evaluated every 10 minutes. The same applies to PromQL alerts such as sum(rate(errors_total[3h])) > 100, which is also evaluated every 10 minutes. As a result, re-notifications for these alerts can be sent at most once every 10 minutes.

Please note that Metric Alerts and PromQL Alerts with ranges of 60 days or more are not supported.

Query Range | Check Interval
up to 3h | 1m
up to 1d | 10m
up to 7d | 1h
up to 60d | 1d
60d | Not Supported
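
The interval table above can be read as a simple lookup. The following Python sketch is illustrative only and is not Sysdig's implementation: the function name check_interval is hypothetical, and every "up to" boundary is assumed to be exclusive, which matches the statements that a range of 3 hours or more is evaluated every 10 minutes and that ranges of 60 days or more are not supported.

```python
from datetime import timedelta
from typing import Optional

# (exclusive upper bound of the query range, check interval), from the table above.
# Treating every bound as exclusive is an assumption for the 1d and 7d tiers.
_TIERS = [
    (timedelta(hours=3), timedelta(minutes=1)),
    (timedelta(days=1), timedelta(minutes=10)),
    (timedelta(days=7), timedelta(hours=1)),
    (timedelta(days=60), timedelta(days=1)),
]


def check_interval(query_range: timedelta) -> Optional[timedelta]:
    """Return the evaluation interval for a query range, or None if unsupported."""
    for upper_bound, interval in _TIERS:
        if query_range < upper_bound:
            return interval
    return None  # 60 days or more: not supported


print(check_interval(timedelta(hours=3)))  # 0:10:00
```

Because re-notifications cannot be more frequent than evaluations, the returned interval is also the shortest useful re-notification period for that range.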

Alert Notifications

Notification Channel

After setting up a Notification Channel, the channel will appear on the Notification Channel drop-down list. You can configure alerts for forwarding to multiple notification channels when the alert condition is met.

Resolution Notification

Notification Channels that receive alert notifications can also receive resolution notifications when the alert condition is no longer met. Toggle Get Notified under When Resolved to forward resolution notifications, so that incidents can be closed automatically in incident management channels such as PagerDuty or Opsgenie.

This setting allows an alert to override the notification channel’s default notification settings. If an override is not configured, the alert will inherit the default settings from the notification channel.

Customize Notifications

Configure Notification Template

Optionally customize alert notifications using custom text and dynamic variables.

Dynamic Variables

A dynamic variable is a placeholder that takes the current value of the item it refers to. The value can change each time the alert is evaluated.

  • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host_hostName and container_name, the corresponding variables will be {{host_hostName}} and {{container_name}} respectively.

  • The variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.

  • Variables are case-sensitive.

  • Notification subjects will not show up on the Event feed.

  • Using a variable that is not a part of the segment will trigger an error.

  • When a variable is not resolved, the output will be “N/a”. No error will be reported.

  • Supported variables are:

      • {{__alert_name__}}: The unique name of the alert.

      • {{__alert_status__}}: The status of the alert, either Triggered or Resolved.

      • {{$value}}: The metric value that caused the alert to trigger or resolve.

      • Any label shown in the Scope section of the matching event in the Events feed. This can be any label used in the segment or scope, as well as any expanded label that is available, written with the syntax {{labelName}}.

No other segment variables are allowed in the notification subject and body.

The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

Example

The following example shows a notification template created to alert you on Failing Prometheus Jobs. Adding {{kube_cluster_name}}: {{job}} - {{__alert_name__}} is {{__alert_status__}} to the subject line helps you identify the problem area at a glance without having to read the entire notification body.
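
The substitution rules for dynamic variables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sysdig's implementation: the function name render and the sample label values are hypothetical, and the sketch models only the documented behaviors that variables are enclosed in double curly braces, are case-sensitive, and render as "N/a" when they cannot be resolved.

```python
import re

# Matches {{name}}; names may contain letters, digits, underscores, and "$"
# so that {{__alert_name__}} and {{$value}} are both recognized.
_VARIABLE = re.compile(r"\{\{([$\w]+)\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{name}} with its value; unresolved names become 'N/a'."""
    def substitute(match: re.Match) -> str:
        # Dictionary lookup is case-sensitive, like the variables themselves.
        return str(values.get(match.group(1), "N/a"))

    return _VARIABLE.sub(substitute, template)


# Hypothetical values for an alert segmented by host_hostName.
values = {
    "__alert_name__": "Failing Prometheus Jobs",
    "__alert_status__": "Triggered",
    "host_hostName": "node-1",
}
subject = "{{__alert_name__}} is {{__alert_status__}} on {{host_hostName}}"
print(render(subject, values))
```

With these sample values, a template that referenced {{Host_HostName}} would render as N/a, because the lookup is case-sensitive.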

Deactivate Orphaned Alert Occurrences Automatically

Sysdig Monitor can automatically deactivate alert occurrences triggered by entities like hosts or containers that are no longer reporting data. This enhancement curbs noise from potentially outdated alert occurrences, ensuring that your alert notifications remain relevant. Alert occurrences are deactivated rather than resolved since the resolution condition can no longer be achieved for disconnected entities.

Entities may cease to report data during scaling events or on dynamic workloads. By automatically deactivating orphaned alert occurrences, users can eliminate false positives from their alert occurrences and ensure that only alert occurrences from entities that actually exist are reported in the system.

PromQL and Event alerts do not support this feature.
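
The difference between deactivating and resolving an occurrence can be illustrated with a short Python sketch. Everything here is hypothetical (the Occurrence class, the state names, and the deactivate_orphans function); it only models the idea that an occurrence whose entity has stopped reporting data is marked deactivated rather than resolved.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Occurrence:
    entity: str               # for example a host or container name
    state: str = "triggered"  # hypothetical states: triggered, resolved, deactivated


def deactivate_orphans(occurrences: Iterable[Occurrence],
                       reporting_entities: Set[str]) -> List[Occurrence]:
    """Mark triggered occurrences as deactivated when their entity no longer reports."""
    result = list(occurrences)
    for occurrence in result:
        if occurrence.state == "triggered" and occurrence.entity not in reporting_entities:
            # The resolution condition can never be met for a missing entity,
            # so the occurrence is deactivated instead of resolved.
            occurrence.state = "deactivated"
    return result


occurrences = [Occurrence("host-a"), Occurrence("host-b")]
updated = deactivate_orphans(occurrences, reporting_entities={"host-a"})
print([(o.entity, o.state) for o in updated])
```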

Supported Aggregation Functions

The table below displays supported time aggregation functions, group aggregation functions, and relational operators:

Time Aggregation Function | Group Aggregation Function | Relational Operator
timeAvg() | avg() | =
min() | min() | <
max() | max() | >
sum() | sum() | <=
not applicable | not applicable | >=
not applicable | not applicable | !=
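
As a rough illustration of how the three columns might combine in a metric alert condition, the Python sketch below first applies a time aggregation to each entity's samples, then a group aggregation across entities, and finally compares the result with a threshold. The order of operations, the function name condition_met, and the sample data are assumptions made for illustration; this is not Sysdig's evaluation engine.

```python
import operator
from statistics import mean

# Names mirror the table above; the Python functions are stand-ins.
TIME_AGGREGATIONS = {"timeAvg": mean, "min": min, "max": max, "sum": sum}
GROUP_AGGREGATIONS = {"avg": mean, "min": min, "max": max, "sum": sum}
OPERATORS = {
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
}


def condition_met(series_by_entity, time_agg, group_agg, op, threshold):
    """Aggregate over time per entity, then across entities, then compare."""
    per_entity = [TIME_AGGREGATIONS[time_agg](samples)
                  for samples in series_by_entity.values()]
    combined = GROUP_AGGREGATIONS[group_agg](per_entity)
    return OPERATORS[op](combined, threshold)


# Hypothetical CPU samples for two hosts over the alert range.
cpu = {"host-a": [10, 20, 30], "host-b": [50, 70, 90]}
print(condition_met(cpu, "timeAvg", "max", ">", 60))
```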

Captures

Optionally, configure a Sysdig capture. Specify the following:

  • Capture Enabled: Click the slider to enable Capture.
  • Capture Duration: The period of time captured. The default is 15 seconds. The capture starts from the time the alert threshold was breached.
  • Capture Storage: The storage location for the capture files.
  • Capture Name: The name of the capture file.
  • Capture Filter: Restricts the amount of trace information collected.

Sysdig capture files are not available for Event and PromQL Alerts. See Captures for more information.