Alerts

Alerts are the responsive component of Sysdig Monitor. They notify you when an event or issue that requires attention occurs. Events and issues are identified based on changes in the metric values collected by Sysdig Monitor. The Alerts module displays out-of-the-box alerts and provides a wizard for creating and editing alerts as needed.

About Sysdig Alert

Sysdig Monitor can generate notifications based on conditions or events you configure. Using alerts, you can keep tabs on your infrastructure and find out about problems as they happen, or even before they happen, through the alert conditions you define. In Sysdig Monitor, metrics serve as the central configuration artifact for alerts. An alert ties one or more conditions or events to the measures to take when a condition is met or an event happens. Alerts work across Sysdig modules, including Explore, Dashboard, Events, and Overview.

Alert Types

The types of alerts available in Sysdig Monitor:

  • Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.

  • Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.

  • PromQL: Monitor metrics through a PromQL query.

  • Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.

  • Anomaly Detection: Monitor hosts based on their historical behaviors, and alert when they deviate from the expected pattern.

  • Group Outlier: Monitor a group of hosts and be notified when one acts differently from the rest. Group Outlier Alert is supported only on hosts.

  • Out-of-the-box: Sysdig Monitor provides a set of alerts by default. Use them as they are or as templates to create your own.

  • Sysdig API: Use Sysdig's Python client to create, list, delete, update and restore alerts. See examples.

Guidelines for Creating Alerts

Follow these steps when creating an alert:

  1. Decide what to monitor.

     Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem.

  2. Define how it will be monitored.

     Specify exactly what behavior triggers a violation. For example, a Marathon app is down on the Kubernetes cluster named Production for ten minutes.

  3. Decide where to monitor.

     Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope.

  4. Define when to notify.

     Define the threshold and time window for assessing the alert condition.

     Single Alert fires one alert for your entire scope, while Multiple Alerts fire if any or every segment breaches the threshold at once.

     Multiple Alerts include all the segments you specified, which uniquely identifies the location and thus fully qualifies where the problem occurred. The more segments you specify, the easier it is to uniquely identify the affected entities.

     A good analogy for multiple alerts is alerting on cities. For example, a multiple alert triggered for San Francisco would include information such as its country (USA) and its continent (North America).

     Trigger gives you control over how notifications are created. For example, you may want to receive a notification for every violation, or only a single notification for a series of consecutive violations.

  5. Decide how notifications are sent.

     Alerts support customizable notification channels, including email, mobile push notifications, Opsgenie, Slack, and more. For the supported services, see Set Up Notification Channels.
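To make the difference between Single Alert and Multiple Alerts concrete, here is a small Python sketch. It is not Sysdig's implementation: it assumes the metric has already been time-aggregated per segment, that a Single Alert compares the scope-wide average against the threshold, and that the segment label names and values shown are only illustrative.

```python
from statistics import mean

# Hypothetical illustration of Single Alert vs. Multiple Alerts.
# Assumption: values are already time-aggregated per segment, and a Single
# Alert evaluates the average across the whole scope (not Sysdig's engine).


def single_alert(values_by_segment, threshold):
    """Return one alert for the entire scope, or None if it does not fire."""
    scope_value = mean(list(values_by_segment.values()))
    if scope_value > threshold:
        return {"scope": "entire scope", "value": scope_value}
    return None


def multiple_alerts(values_by_segment, segment_labels, threshold):
    """Return one alert per segment whose value breaches the threshold."""
    alerts = []
    for segment, value in values_by_segment.items():
        if value > threshold:
            # Every segment label is attached, so the location is fully qualified.
            alerts.append({"segment": dict(zip(segment_labels, segment)), "value": value})
    return alerts


# Illustrative segment labels and CPU values (hypothetical data).
labels = ("kubernetes.cluster.name", "kubernetes.namespace.name", "kubernetes.deployment.name")
cpu = {
    ("production", "payments", "api"): 92.0,
    ("production", "payments", "worker"): 40.0,
    ("production", "web", "frontend"): 35.0,
}

print(single_alert(cpu, threshold=80))             # None: the scope average is about 55.7
print(multiple_alerts(cpu, labels, threshold=80))  # one alert, for the api deployment
```

With these numbers the scope-wide average stays below 80, so the Single Alert stays quiet, while the Multiple Alerts variant still pinpoints the api deployment along with its cluster and namespace.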

To create an alert:

  1. Choose an alert type.

  2. Configure alert parameters.

  3. Configure the notification channels you want to use for alert notification.
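The three steps above can be pictured as filling in one alert definition. The sketch below is a hypothetical model, not Sysdig's API or export format: the AlertDefinition class, its field names, and the channel names are invented for illustration, while the allowed alert types and priorities come from this page.

```python
from dataclasses import dataclass, field

# Types and priorities are taken from this page; the rest is a hypothetical model.
ALERT_TYPES = {"Downtime", "Metric", "PromQL", "Event", "Anomaly Detection", "Group Outlier"}
PRIORITIES = {"High", "Medium", "Low", "Info"}


@dataclass
class AlertDefinition:
    alert_type: str                               # step 1: choose an alert type
    name: str                                     # step 2: configure parameters
    priority: str
    description: str = ""
    scope: str = "everywhere"
    channels: list = field(default_factory=list)  # step 3: notification channels

    def validate(self, configured_channels):
        """Return a list of problems; an empty list means the definition is usable."""
        errors = []
        if self.alert_type not in ALERT_TYPES:
            errors.append(f"unknown alert type: {self.alert_type}")
        if not self.name.strip():
            errors.append("a name is required")
        if self.priority not in PRIORITIES:
            errors.append(f"unknown priority: {self.priority}")
        # Channels have to exist before they can be assigned to an alert.
        for channel in self.channels:
            if channel not in configured_channels:
                errors.append(f"channel not configured: {channel}")
        return errors


alert = AlertDefinition("Metric", "Production Cluster Failed Scheduling pods", "High",
                        channels=["ops-slack"])
print(alert.validate(configured_channels={"ops-slack", "oncall-email"}))  # []
```

The channel check mirrors the advice to configure notification channels before creating the alert.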

Note

Sysdig sometimes deprecates outdated metrics. Alerts that use these metrics will not be modified or disabled, but will no longer be updated. See Heuristic and Deprecated Metrics.

Configure Alerts

Use the Alert wizard to create or edit alerts.

Open the Alert Wizard

There are multiple ways to access the Alert wizard:

From Explore

Do one of the following:

  • Select New Alert (the bell icon) beside an entity.

  • Click More Options (three dots), and select Create a new alert.

From Dashboards

Click the More Options (three dots) icon for a panel, and select Create Alert.

From Alerts

Do one of the following:

  • Click Add Alerts.

  • Select an existing alert and click Edit.

From Overview

From the Events panel on the Overview screen, select a custom or an Infrastructure type event. From the event description screen, click Create Alert from Event.

Create an Alert

Configure notification channels before you begin, so the channels are available to assign to the alert. Optionally, you can add a custom subject and body to individual alert notifications.

Enter Basic Alert Information

Configuration differs slightly for each alert type; see the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.

To configure an alert, open the Alert wizard and set the following parameters:

  • Create the alert:

    • Type: Select the desired alert type.

      Each type has different parameters, but they follow the same pattern:

      • Name: Specify a meaningful name that uniquely represents the alert you are creating, for example, one that names the entity the alert targets, such as Production Cluster Failed Scheduling pods.

      • Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.

      • Priority: Select a priority: High, Medium, Low, or Info. You can later sort alerts by severity using the top navigation pane.

      • Specify the parameters in the Define, Notify, and Act sections.

  • Define:

    Based on the alert type, define the parameters.

    • Downtime: Select the entity to monitor. For more information, see Downtime Alert.

    • Metric: Select a metric that this alert will monitor. You also define how the data is aggregated, such as average, maximum, minimum, or sum. Metrics are applied to a group of items (group aggregation). For more information, see Metric Alerts.

    • PromQL: Enter the PromQL query and duration. For more information, see PromQL Alerts.

    • Event: Filter the custom event to be alerted on by using the name, tag, description, and a source tag. For more information, see Event Alerts.

    • Anomaly Detection: Specify the metrics to be monitored for anomalies. For more information, see Anomaly Detection Alerts.

    • Group Outlier: Specify the metrics to be monitored for outliers. For more information, see Group Outlier Alerts.

    To alert on multiple metrics using boolean logic, click Create multi-condition alerts. See Multi-Condition Alerts.

    • Scope: Everywhere, or a more limited scope to filter a specific component of the monitored infrastructure, such as a Kubernetes deployment, a Sysdig Agent, or a specific service.

    • Trigger: Boundaries for assessing the alert condition, and whether to send a single alert or multiple alerts. Supported time scales are minute, hour, or day.

      • Single alert: Single Alert fires one alert for your entire scope.

      • Multiple alerts: Multiple Alerts fire if any or every segment breaches the threshold at once.

        Multiple alerts are triggered for each segment you specify, and the specified segments are included in the alerts. The more segments you specify, the easier it is to uniquely identify the affected entities.

    For a detailed description, see the respective sections on Alert Types.

  • Notify:

    • Notification Channel: Select from the configured notification channels in the list. Supported channels are:

      • Email

      • Slack

      • Amazon SNS Topic

      • Opsgenie

      • PagerDuty

      • VictorOps

      • Webhook

      You can view the list of notification channels configured for each alert on the Alerts page.

    • Notification Options: Set the time interval at which multiple alerts should be sent.

    • Format Message: If applicable, add message format details. See Customize Notifications.

  • Act:

    • Capture (optional): Configure a Sysdig capture. See also Captures.

      Sysdig capture files are not available for Event Alerts.

  • Click Create.
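The Trigger settings decide how a condition is assessed over a time window. As a rough illustration only, the sketch below assumes samples arrive as (timestamp, value) pairs, rolls up the samples inside the window with the chosen time aggregation, and compares the result with the threshold. The function condition_met and the unit table are hypothetical names, and treating an empty window as "not met" is an assumption rather than documented behavior.

```python
from statistics import mean

# Hypothetical sketch of assessing a condition over a trigger window.
# Assumptions: samples are (timestamp_in_seconds, value) pairs, the chosen time
# aggregation is applied to the samples inside the window, and an empty window
# counts as "condition not met". This is not Sysdig's evaluation engine.
TIME_AGGREGATIONS = {"avg": mean, "min": min, "max": max, "sum": sum}
SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400}


def condition_met(samples, now, window, unit, aggregation, threshold):
    start = now - window * SECONDS_PER_UNIT[unit]
    in_window = [value for timestamp, value in samples if start < timestamp <= now]
    if not in_window:
        return False
    return TIME_AGGREGATIONS[aggregation](in_window) > threshold


# One CPU sample per minute for 15 minutes: 10% for the first five, then 95%.
samples = [(t, 95.0 if t > 300 else 10.0) for t in range(60, 901, 60)]

print(condition_met(samples, now=900, window=10, unit="minute", aggregation="avg", threshold=80))  # True
print(condition_met(samples, now=900, window=15, unit="minute", aggregation="avg", threshold=80))  # False
```

Widening the window from 10 to 15 minutes pulls the earlier low samples into the average, so the same data no longer violates the threshold.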

Optional: Customize Notifications

You can optionally customize individual notifications to provide context for the errors that triggered the alert. All the notification channels support this added contextual information and customization flexibility.

Modify the subject, body, or both of the alert notification with the following:

  • Plaintext: A custom message stating the problem. For example, Stalled Deployment.

  • Hyperlink: For example, a URL to a dashboard.

  • Dynamic Variable: For example, a hostname. Note the conventions:

    • All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.

    • Variables are case sensitive.

    • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host.hostName and container.name, the corresponding variables are {{host.hostName}} and {{container.name}}, respectively. In addition to these segment variables, __alert_name__ and __alert_status__ are supported. No other segment variables are allowed in the notification subject and body.

    • Notification subjects will not show up on the Event feed.

    • Using a variable that is not a part of the segment will trigger an error.

    • The segment variables used in an alert are replaced with their current values when the alert is sent.
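The variable rules above can be summarized in a small renderer. This is a hypothetical re-implementation written for illustration, not Sysdig's template engine; the function name render, the example host and container values, and the status string "Triggered" are assumptions.

```python
import re

# Hypothetical renderer that applies the documented rules; not Sysdig's engine.
# Placeholders look like {{name}}, names are matched case-sensitively, and only
# the alert's segment labels plus the two built-in variables are allowed.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")
BUILT_INS = ("__alert_name__", "__alert_status__")


def render(template, segment_by, current_values, alert_name, alert_status):
    allowed = set(segment_by) | set(BUILT_INS)
    values = {**current_values, "__alert_name__": alert_name, "__alert_status__": alert_status}

    def substitute(match):
        name = match.group(1)
        if name not in allowed:  # exact, case-sensitive comparison
            raise ValueError(f"variable {name!r} is not a segment of this alert")
        return str(values[name])  # current value at the time the alert is sent

    return PLACEHOLDER.sub(substitute, template)


subject = render(
    "{{__alert_status__}}: {{__alert_name__}} on {{host.hostName}} / {{container.name}}",
    segment_by=["host.hostName", "container.name"],
    current_values={"host.hostName": "ip-10-0-0-12", "container.name": "nginx"},
    alert_name="High CPU",
    alert_status="Triggered",
)
print(subject)  # Triggered: High CPU on ip-10-0-0-12 / nginx
```

Writing {{host.hostname}} instead of {{host.hostName}} would raise an error here, mirroring the case-sensitivity and segment-only rules.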

The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

You can send a customized alert notification to the following channels:

  • Email

  • Slack

  • Amazon SNS Topic

  • Opsgenie

  • PagerDuty

  • VictorOps

  • Webhook

Manage Alerts

Alerts can be managed individually or as a group by using the checkboxes on the left side of the Alerts UI and the customization bar. You can also configure the table columns to show the data you need for your use cases. Select a group of alerts to perform batch operations such as filtering, deleting, enabling, disabling, or exporting to a JSON object. Select an individual alert to perform tasks such as creating a copy for a different team.
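The batch operations described above can be sketched on plain Python dictionaries. The alert records, their field names, and the helper functions are hypothetical and do not reflect the JSON schema that Sysdig Monitor actually exports; the priority order follows the High, Medium, Low, and Info levels described earlier.

```python
import json

# Hypothetical alert records; field names are illustrative, not Sysdig's schema.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2, "Info": 3}

alerts = [
    {"name": "Host down", "type": "Downtime", "priority": "High", "enabled": True},
    {"name": "Disk almost full", "type": "Metric", "priority": "Medium", "enabled": True},
    {"name": "Pod restarts", "type": "Event", "priority": "Low", "enabled": True},
    {"name": "CPU outlier", "type": "Group Outlier", "priority": "Info", "enabled": False},
]


def select(alerts, predicate):
    """Filter: return the alerts that match the predicate."""
    return [a for a in alerts if predicate(a)]


def set_enabled(selection, enabled):
    """Batch enable or disable every alert in the selection."""
    for alert in selection:
        alert["enabled"] = enabled


def export_json(selection):
    """Export the selection as a JSON array, sorted from High to Info priority."""
    ordered = sorted(selection, key=lambda a: PRIORITY_ORDER[a["priority"]])
    return json.dumps(ordered, indent=2)


# Disable every alert below High priority, then export the ones still enabled.
set_enabled(select(alerts, lambda a: a["priority"] != "High"), False)
print(export_json(select(alerts, lambda a: a["enabled"])))
```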

Multi-Condition Alerts

Multi-condition alerts are advanced alerts whose thresholds are built on complex conditions: you define alert thresholds as custom boolean expressions that can involve multiple conditions. Click Create multi-condition alerts to enable adding conditions as boolean expressions.

These advanced alerts require specific syntax, as described in the examples below.

Format and Operations

Each condition has five parts:

  • Metric Name: Use the exact metric names. To avoid typos, click the HELP link to access the drop-down list of available metrics. Selecting a metric from the list automatically adds the name to the threshold expression being edited.

  • Group Aggregation (optional): If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.

  • Time Aggregation: The historical data rolled up over a selected period of time.

  • Operator: Both logical and relational operators are supported.

  • Value: A static numerical value against which a condition is evaluated.
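The nesting rule for group and time aggregation can be illustrated with a short sketch: the inner time aggregation rolls up each entity's samples over the period, and the outer group aggregation combines the per-entity results. The host names, sample values, and the evaluate helper are hypothetical.

```python
from statistics import mean

# Hypothetical sketch of groupAggregation(timeAggregation(metric)).
# Inner: roll up each entity's samples over time. Outer: combine the entities.
TIME_AGG = {"timeAvg": mean, "min": min, "max": max, "sum": sum}
GROUP_AGG = {"avg": mean, "min": min, "max": max, "sum": sum}


def evaluate(group_agg, time_agg, samples_by_entity):
    per_entity = [TIME_AGG[time_agg](samples) for samples in samples_by_entity.values()]
    return GROUP_AGG[group_agg](per_entity)


# Illustrative cpu.used.percent samples for two hosts.
cpu_samples = {
    "host-a": [20.0, 30.0, 40.0],
    "host-b": [70.0, 90.0, 80.0],
}

print(evaluate("max", "max", cpu_samples))      # 90.0, as in max(max(cpu.used.percent))
print(evaluate("avg", "timeAvg", cpu_samples))  # 55.0: time averages are 30.0 and 80.0
print(evaluate("min", "min", cpu_samples))      # 20.0, as in min(min(cpu.used.percent))
```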

The supported time aggregation functions, group aggregation functions, and relational operators are:

  • Time aggregation functions: timeAvg(), min(), max(), sum()

  • Group aggregation functions: avg(), min(), max(), sum()

  • Relational operators: =, <, >, <=, >=, !=

The format is:

condition1 AND condition2
condition1 OR condition2
NOT condition1

The order of operations can also be altered with parentheses:

NOT (condition1 AND (condition2 OR condition3))

Conditions take the following form:

groupAggregation(timeAggregation(metric.name)) operator value

Example Expressions

Several examples of advanced alerts are given below:

timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75
timeAvg(cpu.used.percent) > 50 OR timeAvg(memory.used.percent) > 75
timeAvg(container.count) != 10  
min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
sum(file.bytes.total) > 0 OR sum(net.bytes.total) > 0
timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
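The following Python sketch parses and evaluates expressions in this format. It is an illustration, not Sysdig's parser: it assumes each aggregated term (such as timeAvg(cpu.used.percent)) has already been computed and is looked up by its exact text, and it applies the conventional precedence NOT before AND before OR, which this page does not state explicitly. The Evaluator class and the sample values are hypothetical.

```python
import operator
import re

# Hypothetical evaluator for multi-condition expressions (not Sysdig's parser).
# Assumptions: aggregated terms are precomputed and looked up by exact text,
# and precedence is NOT > AND > OR unless parentheses say otherwise.
TOKEN = re.compile(r"\s*(<=|>=|!=|=|<|>|\(|\)|\d+(?:\.\d+)?|[A-Za-z_][A-Za-z0-9_.]*)")
COMPARE = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
           ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Evaluator:
    def __init__(self, expression, values):
        self.tokens, self.pos, self.values = tokenize(expression), 0, values

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'more input'}, got {token!r}")
        self.pos += 1
        return token

    def evaluate(self):
        result = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return result

    def or_expr(self):
        result = self.and_expr()
        while self.peek() == "OR":
            self.take()
            rhs = self.and_expr()  # always parse the right-hand side
            result = result or rhs
        return result

    def and_expr(self):
        result = self.not_expr()
        while self.peek() == "AND":
            self.take()
            rhs = self.not_expr()
            result = result and rhs
        return result

    def not_expr(self):
        if self.peek() == "NOT":
            self.take()
            return not self.not_expr()
        if self.peek() == "(":
            self.take("(")
            result = self.or_expr()
            self.take(")")
            return result
        return self.condition()

    def term(self):
        # Rebuilds "groupAggregation(timeAggregation(metric.name))" as a lookup key.
        name = self.take()
        if self.peek() != "(":
            return name
        self.take("(")
        inner = self.term()
        self.take(")")
        return f"{name}({inner})"

    def condition(self):
        key = self.term()
        op = self.take()
        if op not in COMPARE:
            raise ValueError(f"expected a relational operator, got {op!r}")
        threshold = float(self.take())
        return COMPARE[op](self.values[key], threshold)


values = {
    "timeAvg(cpu.used.percent)": 62.0,
    "timeAvg(memory.used.percent)": 40.0,
    "timeAvg(mysql.net.connections)": 25.0,
}
expr = ("timeAvg(cpu.used.percent) > 50 AND "
        "(timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)")
print(Evaluator(expr, values).evaluate())  # True
```

A recursive-descent parser keeps each precedence level in its own method, which makes the effect of parentheses easy to follow.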