Configure Alerts

Use the Alert Editor to create or edit alerts.

Different Ways To Create An Alert

In addition to the Alert Editor, you can create alerts from other modules:

  • From Metrics Explorer, select Create Alert.
  • From an existing Dashboard, select the More Options (three dots) icon for a panel, and then select Create Alert.
  • From any Event panel, select Create Alert from Event.

Create An Alert from the Editor

Configure notification channels before you begin, so the channels are available to assign to the alert. Optionally, you can add a custom subject and body to individual alert notifications.

Enter Basic Alert Information

Configuration differs slightly for each Alert type. See the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.

To configure an alert, open the Alert Editor and set the following parameters:

Alert Types

Select the desired Alert Type:

  • Downtime: Select the entity to monitor.
  • Metric: Select a time-series metric to be alerted on if it violates user-defined thresholds.
  • PromQL: Enter the PromQL query and duration to define an alert condition.
  • Event: Filter the custom event to be alerted on by using the name, tag, description, and a source tag.

Metric and Condition

  • Scope: Select Entire Infrastructure, or one or more labels to apply a limited scope and filter a specific metric.

  • Metric: Select the metric that this alert will monitor. Selecting a metric from the list automatically adds its name to the threshold expression being edited. Define how the data is aggregated over time (Time aggregation), such as average, maximum, minimum, or sum. Time aggregation rolls up the historical data over a selected period.

  • Group By: Metrics are applied to a group of items (Group Aggregation). If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.

  • Segment by: Select one or more labels for segmentation. This allows for the creation of multi-series comparisons and multiple alerts. Multiple alerts will be triggered for each segment you specify. For more information, see Metric Alerts.
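The ordering of the two aggregation steps can be illustrated with a short Python sketch. It is not how Sysdig Monitor computes values internally; the sample data, the label names, and the `time_avg` and `aggregate` helpers are all hypothetical. Each series is first rolled up over the evaluation window (time aggregation), and the rolled-up values are then combined within each segment (group aggregation applied outside time aggregation).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (host, container) -> values collected over the window.
samples = {
    ("host-a", "web"): [40.0, 60.0],
    ("host-a", "db"): [80.0, 100.0],
    ("host-b", "web"): [10.0, 30.0],
}

def time_avg(values):
    """Time aggregation: roll a single series up over the window."""
    return mean(values)

def aggregate(samples, group_fn=mean):
    """Time-aggregate each series, then group-aggregate per host segment."""
    per_segment = defaultdict(list)
    for (host, _container), values in samples.items():
        per_segment[host].append(time_avg(values))
    # One resulting value per segment; each segment is evaluated on its own.
    return {host: group_fn(rolled) for host, rolled in per_segment.items()}

result = aggregate(samples)
print(result)
```

Passing a different `group_fn`, such as `max` or `sum`, changes only the outer step; the inner time aggregation stays the same.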

Multiple Thresholds

In addition to an alert threshold, a warning threshold can be configured for Metric Alerts and Event Alerts. Warning thresholds and alert thresholds can be associated with different notification channels. For example, a user may want to send both warning and alert notifications to Slack, but also page the on-call team on Pagerduty if the alert threshold is met.

  • Notify when resolved: To prevent a Pagerduty incident from resolving automatically once the alert threshold is no longer met, toggle Notify when Resolved off so that the on-call team can triage the incident. This setting allows an alert to override the notification channel’s default notification settings. If an override is not configured, the alert inherits the default settings from the notification channel.

If both warning and alert thresholds are associated with the same notification channel, a metric that immediately exceeds the alert threshold will ignore the warning threshold and trigger only the alert threshold.
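The threshold and override behavior described above can be sketched in Python. This is a simplified model under stated assumptions, not Sysdig Monitor's actual logic: the function names, the channel names, and the "greater than" comparison are hypothetical choices made for illustration.

```python
def evaluate(value, warning, alert, warning_channels, alert_channels):
    """Return (level, channels) for one evaluation of a 'greater than' condition.

    The alert threshold is checked first, so a value that jumps straight past
    it triggers only the alert level, never the warning level as well.
    """
    if value > alert:
        return "alert", alert_channels
    if value > warning:
        return "warning", warning_channels
    return "ok", []

def notify_when_resolved(alert_override, channel_default):
    """Use the alert-level override if set; otherwise inherit the channel default."""
    return channel_default if alert_override is None else alert_override

# Hypothetical setup: warnings go to Slack, alerts go to Slack and Pagerduty.
warning_channels = ["slack"]
alert_channels = ["slack", "pagerduty"]

print(evaluate(95, 80, 90, warning_channels, alert_channels))
print(evaluate(85, 80, 90, warning_channels, alert_channels))
print(notify_when_resolved(False, True))
print(notify_when_resolved(None, True))
```

In this model, an override value of `None` stands for "not configured", which is why the channel default is returned in that case.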

Notification

  • Notification Channel: Select from the configured notification channels in the list. Supported channels are:

    • Email

    • Slack

    • Amazon SNS Topic

    • Opsgenie

    • Pagerduty

    • VictorOps

    • Webhook

    You can view the list of notification channels configured for each alert on the Alerts page.

  • Configure Notification Template: If applicable, add the following message format details and click Apply Template.

    • Notification Subject & Event Title: Customize using variables, such as {{__alert_name__}} is {{__alert_status__}} for {{agent_id}}
    • Notification Body: Add the text for the notification you are creating. See Customize Notifications.

Settings

  • Alert Severity: Select a priority: High, Medium, Low, or Info.
  • Alert Name: Specify a meaningful name that can uniquely represent the Alert you are creating. For example, the entity that an alert targets, such as Production Cluster Failed Scheduling pods.
  • Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.
  • Group (optional): Specify a meaningful group name for the alert you are creating. Alerts that have no group name will be added to the Default Group.
  • Link to Dashboard: Select a dashboard that you might want to include in the alert notification. You can view the specified dashboard link in the event feed associated with the alert.
  • Link to Runbook: Specify the URL of a runbook. The link to the runbook appears in the event feed.

Captures

Optionally, configure a Sysdig capture. Specify the following:

  • Capture Enabled: Click the slider to enable Capture.
  • Capture Duration: The period of time captured. The default time is 15 seconds. The capture time starts from the time the alert threshold was breached.
  • Capture Storage: The storage location for the capture files.
  • Capture Name: The name of the capture file.
  • Capture Filter: Restricts the amount of trace information collected.

Sysdig capture files are not available for Event and PromQL Alerts. See Captures for more information.

Optional: Customize Notifications

You can optionally customize individual notifications to provide context for the errors that triggered the alert. All the notification channels support this added contextual information and customization flexibility.

Modify the subject, body, or both of the alert notification with the following:

  • Plaintext: A custom message stating the problem. For example, Stalled Deployment.

  • Hyperlink: For example, URL to a Dashboard.

  • Dynamic Variable: For example, a hostname. Note the conventions:

    • All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.
    • Variables are case sensitive.
    • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host_hostName and container_name, the corresponding variables are {{host_hostName}} and {{container_name}}, respectively. In addition to these segment variables, {{__alert_name__}} and {{__alert_status__}} are supported. No other variables are allowed in the notification subject and body.
    • Notification subjects will not show up on the Event feed.
    • Using a variable that is not a part of the segment will trigger an error.
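The variable conventions above can be modeled with a small Python sketch. It only imitates the documented rules (double curly braces, case sensitivity, segment variables plus the two built-in variables, and an error for anything else); the `render` function and the sample values are hypothetical and do not represent Sysdig Monitor's template engine.

```python
import re

# Matches {{variable_name}}; names consist of letters, digits, and underscores.
VARIABLE = re.compile(r"\{\{(\w+)\}\}")

def render(template, segment_values, alert_name, alert_status):
    """Substitute segment variables and the two built-in alert variables."""
    allowed = dict(segment_values)
    allowed["__alert_name__"] = alert_name
    allowed["__alert_status__"] = alert_status

    def substitute(match):
        name = match.group(1)
        # Lookup is case sensitive; anything outside the segment is an error.
        if name not in allowed:
            raise ValueError(f"unknown variable: {name}")
        return str(allowed[name])

    return VARIABLE.sub(substitute, template)

subject = render(
    "{{host_hostName}}: {{__alert_name__}} is {{__alert_status__}}",
    {"host_hostName": "web-01", "container_name": "nginx"},
    alert_name="High CPU",
    alert_status="Triggered",
)
print(subject)
```

In this sketch, a template containing `{{host_hostname}}` (lowercase "n") raises `ValueError`, because the name does not match the segment label exactly.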

The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

You can send a customized alert notification to the following channels:

  • Email
  • Slack
  • Amazon SNS Topic
  • Opsgenie
  • Pagerduty
  • VictorOps
  • Webhook

The following example shows a notification template created to alert you on Failing Prometheus Jobs. Adding {{kube_cluster_name}}: {{job}} - {{__alert_name__}} is {{__alert_status__}} to the subject line helps you identify the problem area at a glance without having to read the entire notification body.

Supported Aggregation Functions

The table below displays supported time aggregation functions, group aggregation functions, and relational operators:

Time Aggregation Function | Group Aggregation Function | Relational Operator
timeAvg()                 | avg()                      | =
min()                     | min()                      | <
max()                     | max()                      | >
sum()                     | sum()                      | <=
not applicable            | not applicable             | >=
not applicable            | not applicable             | !=
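As a rough illustration of how the relational operators in the table compare an aggregated value against a threshold, the sketch below maps each operator symbol to a function from Python's standard `operator` module. The `condition_met` helper is hypothetical and is not part of any Sysdig API.

```python
import operator

# Relational operators from the table, mapped to standard comparison functions.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
}

def condition_met(aggregated_value, op, threshold):
    """Return True if the aggregated value satisfies the threshold condition."""
    return OPERATORS[op](aggregated_value, threshold)

print(condition_met(70.0, ">", 60))
print(condition_met(70.0, "<=", 60))
```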