Legacy Alerts Editor

If the new Sysdig metric store is not enabled, the latest Alert Editor features are unavailable. You continue to use the legacy Alerts Editor to create and edit alert notifications.

Alert Types

The types of alerts available in Sysdig Monitor:

  • Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.

  • Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.

  • PromQL: Monitor metrics through a PromQL query.

  • Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.

  • Anomaly Detection: Monitor hosts based on their historical behaviors, and alert when they deviate from the expected pattern.

  • Group Outlier: Monitor a group of hosts and be notified when one acts differently from the rest. Group Outlier Alert is supported only on hosts.

Alert Tools

The following tools help with alert creation:

  • Alert Library: Sysdig Monitor provides a set of alerts by default. Use them as they are or as templates to create your own.

  • Sysdig API: Use Sysdig’s Python client to create, list, delete, update and restore alerts. See examples.

Guidelines for Creating Alerts

Steps

Description

Decide what to monitor

Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem.

Define how it will be monitored

Specify exactly what behavior triggers a violation. For example, Marathon App is down on the Kubernetes Cluster named Production for ten minutes.

Decide where to monitor

Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope.

Define when to notify

Define the threshold and time window for assessing the alert condition.

Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.

Multiple Alerts include all the segments you specified to uniquely identify the location, and thus provide a full qualification of where the problem occurred. The higher the number of segments, the easier it is to uniquely identify the affected entities.

A good analogy for multiple alerts is alerting on cities. For example, a multiple alert that triggers on San Francisco would also include the country the city belongs to (the USA) and the continent (North America).

Trigger gives you control over how notifications are created. For example, you may want to receive a notification for every violation, or want only a single notification for a series of consecutive violations.

Decide how notifications are sent

Alerts support customizable notification channels, including email, mobile push notifications, Opsgenie, Slack, and more. For the list of supported services, see Set Up Notification Channels.

To create an alert:

  1. Choose an alert type.

  2. Configure alert parameters.

  3. Configure the notification channels you want to use for alert notification.

Sysdig sometimes deprecates outdated metrics. Alerts that use these metrics will not be modified or disabled, but will no longer be updated. See Deprecated Metrics and Labels.

Configure Alerts

Use the Alert wizard to create or edit alerts.

Open the Alert Wizard

There are multiple ways to access the Alert wizard:

From Explore

Do one of the following:

  • Select New Alert next to an entity.

  • Click More Options (three dots), and select Create a new alert.

From Dashboards

Click the More Options (three dots) icon for a panel, and select Create Alert.

From Alerts

Do one of the following:

  • Click Add Alerts.

  • Select an existing alert and click Edit.

From Overview

From the Events panel on the Overview screen, select a custom or an Infrastructure type event. From the event description screen, click Create Alert from Event.

Create an Alert

Configure notification channels before you begin, so the channels are available to assign to the alert. Optionally, you can add a custom subject and body to individual alert notifications.

Enter Basic Alert Information

Configuration differs slightly for each alert type. See the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.

To configure an alert, open the Alert wizard and set the following parameters:

  • Create the alert:

    • Type: Select the desired alert type. See Alert Types.

      Each type has different parameters, but they follow the same pattern:

      • Name: Specify a meaningful name that uniquely identifies the alert you are creating. For example, include the entity that the alert targets, such as Production Cluster Failed Scheduling pods.

      • Group (optional): Specify a meaningful group name for the alert you are creating. A group name helps you narrow down the problem area and focus on the infrastructure view that needs your attention. For example, you can enter Redis for alerts related to Redis services. When the alert triggers, you will know which service in your workload requires inspection. Alerts that have no group name are added to the Default Group. To change the group name later, edit the alert.

        An alert can belong to only one group. An alert created from an alert template will have the group already configured by the Monitor Integrations. You can see the existing alert groups on the Alerts details page.

        See Groupings for more information on how Sysdig handles infrastructure views.

      • Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.

      • Priority: Select a priority: High, Medium, Low, or Info. You can later sort alerts by severity by using the top navigation pane.

      • Specify the parameters in the Define, Notify, and Act sections.

  • (1) Define

    Based on the alert type, define the parameters.

    • Downtime: Select the entity to monitor. For more information, see Downtime Alert.

    • Metric: Select a metric that this alert will monitor. You also define how the data is aggregated, such as average, maximum, minimum, or sum. Metrics are applied to a group of items (group aggregation). For more information, see Metric Alerts.

    • PromQL: Enter the PromQL query and duration. For more information, see PromQL Alerts.

    • Event: Filter the custom event to be alerted on by using the name, tag, description, and one or more event sources. For more information, see Event Alerts.

    • Anomaly Detection: Specify the metrics to be monitored for anomalies. For more information, see Anomaly Detection Alerts.

    • Group Outlier: Specify the metrics to be monitored for outliers. For more information, see Group Outlier Alerts.

To alert on multiple metrics using boolean logic, click Create multi-condition alerts. See Multi-Condition Alerts.

  • Scope: Everywhere, or a more limited scope to filter a specific component of the infrastructure monitored, such as a Kubernetes deployment, a Sysdig Agent, or a specific service.

  • Trigger: Boundaries for assessing the alert condition, and whether to send a single alert or multiple alerts. Supported time scales are minute, hour, or day.

    • Single alert: Fires one alert for your entire scope.

    • Multiple alerts: Fires if any or every segment breaches the threshold at once.

      Multiple alerts are triggered for each segment you specify. The specified segments are represented in the alerts. The higher the number of segments, the easier it is to uniquely identify the affected entities.

For a detailed description, see the respective sections on Alert Types.

  • (2) Notify

    • Notification Channel: Select from the configured notification channels in the list. Supported channels are:

      • Email

      • Slack

      • Amazon SNS Topic

      • Opsgenie

      • PagerDuty

      • VictorOps

      • Webhook

      You can view the list of notification channels configured for each alert on the Alerts page.

    • Notification Options: Set the time interval at which multiple alerts should be sent.

    • Format Message: If applicable, add message format details. See Customize Notifications.

  • (3) Act

    • (Optional): Configure a Sysdig capture. See also Captures.

      Sysdig capture files are not available for Event Alerts.

  • Click Create.
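
The difference between the Single alert and Multiple alerts trigger options described above can be illustrated with a minimal Python sketch. It is not how Sysdig evaluates alerts: the sample values, the segment names, the use of a scope-wide average, and the function names single_alert and multiple_alerts are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: one CPU value per (cluster, namespace) segment.
Segment = Tuple[str, str]


def single_alert(values: Dict[Segment, float], threshold: float) -> List[str]:
    """Fire at most one alert for the entire scope.

    Assumption: the scope is judged by the average across all segments.
    """
    average = sum(values.values()) / len(values)
    return ["scope"] if average > threshold else []


def multiple_alerts(values: Dict[Segment, float], threshold: float) -> List[str]:
    """Fire one alert per segment that breaches the threshold."""
    return [
        "/".join(segment)
        for segment, value in sorted(values.items())
        if value > threshold
    ]


cpu_by_segment = {
    ("production", "web"): 92.0,
    ("production", "db"): 40.0,
    ("staging", "web"): 35.0,
}

# The scope-wide average stays below the threshold, so no single alert fires.
print(single_alert(cpu_by_segment, 80.0))
# Each breaching segment is named in its own alert.
print(multiple_alerts(cpu_by_segment, 80.0))
```

With more segments in the key, each alert entry carries more context about where the breach happened, which mirrors the point that more segments make the affected entity easier to identify.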

Optional: Customize Notifications

You can optionally customize individual notifications to provide context for the errors that triggered the alert. All the notification channels support this added contextual information and customization flexibility.

Modify the subject, body, or both of the alert notification with the following:

  • Plaintext: A custom message stating the problem. For example, Stalled Deployment.

  • Hyperlink: For example, a URL to a dashboard.

  • Dynamic Variable: For example, a hostname. Note the conventions:

    • All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.

    • Variables are case sensitive.

    • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host.hostName and container.name, the corresponding variables will be {{host.hostName}} and {{container.name}} respectively. In addition to these segment variables, __alert_name__ and __alert_status__ are supported. No other segment variables are allowed in the notification subject and body.

    • Notification subjects will not show up on the Event feed.

    • Using a variable that is not a part of the segment will trigger an error.

    • The segment variables used in an alert are resolved to the current system values when the alert is sent.
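
The variable conventions above can be sketched in Python as follows. This is an illustration only, not Sysdig's implementation: the render function, the regular expression, and the sample values are hypothetical, and the sketch assumes that __alert_name__ and __alert_status__ are also written inside double curly braces.

```python
import re
from typing import Dict, Iterable

# Names the text lists as supported in addition to segment variables.
BUILT_INS = {"__alert_name__", "__alert_status__"}

# A variable is any name enclosed in double curly braces.
VARIABLE = re.compile(r"\{\{([^{}]+)\}\}")


def render(template: str, segments: Iterable[str], values: Dict[str, str]) -> str:
    """Replace {{variable}} placeholders with current values.

    Names are compared case-sensitively, and any variable that is not a
    segment of the alert (or a built-in) raises an error.
    """
    allowed = set(segments) | BUILT_INS

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in allowed:
            raise ValueError(f"variable is not part of the segment: {name}")
        return values[name]

    return VARIABLE.sub(substitute, template)


subject = render(
    "{{__alert_name__}} on {{host.hostName}}",
    segments=["host.hostName", "container.name"],
    values={"__alert_name__": "High CPU", "host.hostName": "web-01"},
)
print(subject)
```

Because the comparison is case sensitive, a template that uses {{host.hostname}} instead of {{host.hostName}} is rejected in this sketch rather than silently left unresolved.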

The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

You can send a customized alert notification to the following channels:

  • Email

  • Slack

  • Amazon SNS Topic

  • Opsgenie

  • PagerDuty

  • VictorOps

  • Webhook

Multi-Condition Alerts

Multi-condition alerts are advanced alerts whose thresholds are built on complex conditions. You define the alert threshold as a custom boolean expression that can involve multiple conditions. Click Create multi-condition alerts to enable adding conditions as boolean expressions.

These advanced alerts require specific syntax, as described in the examples below.

Format and Operations

Each condition has five parts:

  • Metric name: Use the exact metric names. To avoid typos, click the HELP link to access the drop-down list of available metrics. Selecting a metric from the list automatically adds the name to the threshold expression being edited.

  • Group aggregation (optional): If no group aggregation type is selected, the appropriate default for the metric is applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.

  • Time aggregation: The historical data rolled up over a selected period of time.

  • Operator: Both logical and relational operators are supported.

  • Value: A static numerical value against which a condition is evaluated.

The table below displays supported time aggregation functions, group aggregation functions, and relational operators:

Time Aggregation Function    Group Aggregation Function    Relational Operator
timeAvg()                    avg()                         =
min()                        min()                         <
max()                        max()                         >
sum()                        sum()                         <=
                                                           >=
                                                           !=

The format is:

condition1 AND condition2
condition1 OR condition2
NOT condition1

The order of operations can also be altered with parentheses:

NOT (condition1 AND (condition2 OR condition3))

Conditions take the following form:

groupAggregation(timeAggregation(metric.name)) operator value
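
As a rough illustration of this form, the following Python sketch assembles a condition string and rejects functions or operators that are not listed in the table above. The build_condition helper is hypothetical and not part of any Sysdig tool. It only builds text and does not check that the metric name exists.

```python
from typing import Optional, Union

# Supported functions and operators, taken from the table in this section.
TIME_AGGREGATIONS = {"timeAvg", "min", "max", "sum"}
GROUP_AGGREGATIONS = {"avg", "min", "max", "sum"}
RELATIONAL_OPERATORS = {"=", "<", ">", "<=", ">=", "!="}


def build_condition(
    metric: str,
    time_agg: str,
    operator: str,
    value: Union[int, float],
    group_agg: Optional[str] = None,
) -> str:
    """Return 'groupAggregation(timeAggregation(metric)) operator value'."""
    if time_agg not in TIME_AGGREGATIONS:
        raise ValueError(f"unsupported time aggregation: {time_agg}")
    if group_agg is not None and group_agg not in GROUP_AGGREGATIONS:
        raise ValueError(f"unsupported group aggregation: {group_agg}")
    if operator not in RELATIONAL_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")

    term = f"{time_agg}({metric})"
    if group_agg is not None:
        # Group aggregation is applied outside of time aggregation.
        term = f"{group_agg}({term})"
    return f"{term} {operator} {value}"


print(build_condition("cpu.used.percent", "timeAvg", ">", 50))
print(build_condition("cpu.used.percent", "min", "<=", 30, group_agg="min"))
```

When group_agg is omitted, the sketch leaves it out of the expression, matching the rule that a default group aggregation is applied when none is selected.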

Example Expressions

Several examples of advanced alerts are given below:

timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75
timeAvg(cpu.used.percent) > 50 OR timeAvg(memory.used.percent) > 75
timeAvg(container.count) != 10
min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
sum(file.bytes.total) > 0 OR sum(net.bytes.total) > 0
timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
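
To see how such expressions combine, the following Python sketch evaluates an expression against hypothetical sample values. It is a simplification and not Sysdig's evaluator: it assumes each metric has a single, already-aggregated value (so the aggregation functions are recognized but ignored), and it relies on Python's own precedence for and, or, and not. The evaluate function and the sample numbers are made up for illustration.

```python
import operator
import re
from typing import Dict

OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# One condition: [groupAgg(] timeAgg(metric) [)] operator value
CONDITION = re.compile(
    r"(?:(?:avg|min|max|sum)\()?"            # optional group aggregation
    r"(?:timeAvg|min|max|sum)\(([\w.]+)\)"   # time aggregation and metric name
    r"\)?\s*(<=|>=|!=|=|<|>)\s*"             # closing parenthesis, operator
    r"(-?\d+(?:\.\d+)?)"                     # static numerical value
)


def evaluate(expression: str, samples: Dict[str, float]) -> bool:
    """Evaluate a multi-condition expression against sample metric values."""

    def to_bool(match: "re.Match[str]") -> str:
        metric, op, value = match.groups()
        return str(OPS[op](samples[metric], float(value)))

    # Replace every condition with True or False, then map the operators.
    text = CONDITION.sub(to_bool, expression)
    text = text.replace("AND", "and").replace("OR", "or").replace("NOT", "not")
    # Only boolean literals, operators, and parentheses may remain.
    if not re.fullmatch(r"(?:True|False|and|or|not|[()\s])+", text):
        raise ValueError(f"unsupported expression: {expression}")
    return eval(text, {"__builtins__": {}})


samples = {
    "cpu.used.percent": 60,
    "memory.used.percent": 80,
    "mysql.net.connections": 10,
}
print(evaluate(
    "timeAvg(cpu.used.percent) > 50 AND "
    "(timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)",
    samples,
))
```

Because the sketch borrows Python's operator precedence, use parentheses whenever the intended grouping matters.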