Sysdig Documentation

Alerts

Alerts are the responsive component of Sysdig Monitor. They notify you when an event or issue occurs that requires attention. Events and issues are identified based on changes in the metric values collected by Sysdig Monitor. The Alerts module displays out-of-the-box alerts and provides a wizard for creating and editing alerts as needed.

About Sysdig Alert

Sysdig Monitor can generate notifications based on conditions or events that you configure. Using the alert feature, you can keep tabs on your infrastructure and find out about problems as they happen, or even before they happen, with the alert conditions you define. In Sysdig Monitor, metrics serve as the central configuration artifact for alerts. A metric ties one or more conditions or events to the measures to take when a condition is met or an event happens. Alerts work across Sysdig modules, including Explore, Dashboard, Events, and Overview.

Alert Types

The types of alerts available in Sysdig Monitor:

  • Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.

  • Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.

  • Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.

  • Anomaly Detection: Monitor hosts based on their historical behaviors, and alert when they deviate from the expected pattern.

  • Group Outlier: Monitor a group of hosts and be notified when one acts differently from the rest. Group Outlier Alert is supported only on hosts.

  • Out-of-the-box: Sysdig Monitor provides a set of alerts by default. Use them as they are or as templates to create your own.

  • Sysdig API: Use Sysdig's Python client to create, list, delete, update and restore alerts. See examples.

Guidelines for Creating Alerts

Steps

Description

Decide what to monitor

Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem.

Define how it will be monitored

Specify exactly what behavior triggers a violation. For example, Marathon App is down on the Kubernetes Cluster named Production for ten minutes.

Decide where to monitor

Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope.

Define when to notify

Define the threshold and time window for assessing the alert condition.

Single Alert fires one alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.

Multiple Alerts include all the segments you specified to uniquely identify the location, and thus provide full qualification of where the problem occurred. The more segments you specify, the easier it is to uniquely identify the affected entities.

A good analogy for multiple alerts is alerting on cities. For example, a multiple alert that triggers on San Francisco would also include the country it is part of (the USA) and its continent (North America).

Trigger gives you control over how notifications are created. For example, you may want to receive a notification for every violation, or want only a single notification for a series of consecutive violations.

Decide how notifications are sent

Alerts support customizable notification channels, including email, mobile push notifications, Opsgenie, Slack, and more. For the supported services, see Set Up Notification Channels.
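The difference between Single Alert, Multiple Alert, and the Trigger notification behavior can be sketched in a few lines of Python. This is a simplified model for illustration only, not Sysdig's implementation: the sample values, the segment names, and the functions `single_alert`, `multiple_alerts`, and `notify_points` are all hypothetical.

```python
from statistics import mean

# Hypothetical sample data: average CPU % per (cluster, namespace) segment.
samples = {
    ("production", "checkout"): 92.0,
    ("production", "search"): 41.0,
    ("production", "billing"): 35.0,
}
THRESHOLD = 80.0


def single_alert(samples, threshold):
    """Evaluate the whole scope once: aggregate first, then compare."""
    scope_value = mean(samples.values())
    if scope_value > threshold:
        return [{"scope": "entire scope", "value": scope_value}]
    return []


def multiple_alerts(samples, threshold, segment_names):
    """Evaluate each segment separately; each result carries its segment labels."""
    fired = []
    for labels, value in samples.items():
        if value > threshold:
            fired.append(dict(zip(segment_names, labels), value=value))
    return fired


def notify_points(violations, every_violation):
    """Return the evaluation indexes at which a notification would be sent.

    every_violation=True  -> notify on each violating evaluation.
    every_violation=False -> notify once per run of consecutive violations.
    """
    points = []
    previous = False
    for index, violated in enumerate(violations):
        if violated and (every_violation or not previous):
            points.append(index)
        previous = violated
    return points
```

In this model the scope average (56.0) stays under the threshold, so `single_alert` reports nothing, while `multiple_alerts(samples, THRESHOLD, ("cluster", "namespace"))` reports the checkout namespace together with its cluster label. That mirrors the point above: more segments give a more precise location for the problem.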

To create alerts, simply:

  1. Choose an Alert Type.

  2. Configure alert parameters.

  3. Configure the notification channels you want to use for alert notification.

Note

Sysdig sometimes deprecates outdated metrics. Alerts that use these metrics will not be modified or disabled, but will no longer be updated. See Heuristic and Deprecated Metrics.

Configure Alerts

Use the Alert wizard to create or edit alerts.

Open the Alert Wizard

There are multiple ways to access the Alert wizard:

From the Explore Table

  • Select the Alert (bell) icon beside an entity:

    384336285.png
  • Click the More Options (three dots) icon for the table, and select Create a New Alert:

    384336280.png

From Dashboards Panel

Click the More Options (three dots) icon for a panel, and select Create Alert.

384336275.png

From Alerts Module

  • Click the Add Alert button:

    384336265.png
  • Select an existing alert (click directly or select the checkbox beside the alert) and click the Edit button:

    384336260.png

From Overview (Beta)

Select a custom or an Infrastructure type event from the Events panel on the Overview screen. From the event description screen, click Create Alert from Event.

384336239.png

Create an Alert

Configure notification channels before you begin, so the channels are available to assign to the alert. Optionally, you can add a custom subject and body to individual alert notifications.

Enter Basic Alert Information

Configuration differs slightly for each Alert type. See the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.

To configure an alert, open the Alert Wizard and set the following parameters:

  • (Setup):

    • Type: Select the desired Alert Type.

      384336255.png

      Each type has different parameters, but they follow the same pattern:

      • Name: Specify a meaningful name that uniquely represents the alert you are creating. For example, name the entity that the alert targets, such as Production Cluster Failed Scheduling pods.

      • Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.

      • Priority: High, Medium, Low, and Info are reflected in the Events list, where you can sort by the severity of the Event/Alert.

      • Parameters in Define, Notify, and Act sections

      384336245.png
  • (1) Define:

    • Metric: Select a metric or entity that this alert will monitor. You also define how the data is aggregated, such as avg, max, min or sum. Metrics are applied to a group of items (group aggregation).

      To alert on multiple metrics using boolean logic, click Create multi-condition alerts. See Multi-Condition Alerts.

      384336227.png
    • Scope: Everywhere, or a more limited scope to filter a specific component of the monitored infrastructure, such as a Kubernetes deployment, a Sysdig Agent, or a specific service.

    • Trigger: Boundaries for assessing the alert condition, and whether to send a single alert or multiple alerts. Supported time scales are minute, hour, or day.

      • Single alert: Fires one alert for your entire scope.

      • Multiple alerts: Fires if any or every segment breaches the threshold at once.

        Multiple alerts are triggered for each segment you specify. The specified segments are represented in the alerts. The more segments you specify, the easier it is to uniquely identify the affected entities.

      For a detailed description, see the respective sections on Alert Types.

  • (2) Notify

    • Notification Channel: Select from the configured notification channels in the list. Supported channels are:

      • Email

      • Slack

      • Amazon SNS Topic

      • Opsgenie

      • PagerDuty

      • VictorOps

      • Webhook

    • Notification Options: Set the time interval at which multiple alerts should be sent.

    • Format Message: If applicable, add message format details. See Customize Notifications.

  • (3) Act

    • (Optional): Configure a Sysdig capture. See also Captures.

      Sysdig capture files are not available for Event Alerts.

      384336250.png
  • Click Create or Save.
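As a way to keep the wizard fields straight, the parameters above can be modeled as a small data structure with basic checks. This is only a sketch: the class `AlertDraft`, its field names, and its defaults are hypothetical and do not reflect the schema used by the Sysdig API or the Python client. The rule that multiple alerts need at least one segment is an assumption drawn from the description of segments above.

```python
from dataclasses import dataclass, field

# Values taken from the wizard description above.
PRIORITIES = ("High", "Medium", "Low", "Info")
TIME_SCALES = ("minute", "hour", "day")


@dataclass
class AlertDraft:
    """Hypothetical in-memory model of the wizard fields (not a Sysdig schema)."""
    name: str
    alert_type: str
    metric: str
    priority: str = "Medium"
    description: str = ""
    scope: str = "Everywhere"
    time_scale: str = "minute"
    multiple_alerts: bool = False
    segments: list = field(default_factory=list)
    channels: list = field(default_factory=list)

    def problems(self):
        """Return a list of human-readable issues; an empty list means none found."""
        found = []
        if not self.name.strip():
            found.append("name is required")
        if self.priority not in PRIORITIES:
            found.append(f"priority must be one of {PRIORITIES}")
        if self.time_scale not in TIME_SCALES:
            found.append(f"time scale must be one of {TIME_SCALES}")
        # Assumption: multiple alerts are only meaningful with at least one segment.
        if self.multiple_alerts and not self.segments:
            found.append("multiple alerts need at least one segment")
        return found
```

A draft such as `AlertDraft(name="Production Cluster Failed Scheduling pods", alert_type="Metric", metric="cpu.used.percent", priority="High")` passes these checks, while a blank name or a priority outside the four listed values is reported by `problems()`.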

Optional: Customize Notifications

You can optionally customize individual notifications to provide context for the errors that triggered the alert. All the notification channels support this added contextual information and customization flexibility.

Modify the subject, body, or both of the alert notification with the following:

  • Plaintext: A custom message stating the problem. For example, Stalled Deployment.

  • Hyperlink: For example, URL to a Dashboard.

  • Dynamic Variable: For example, a hostname. Note the conventions:

    • All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.

    • Variables are case sensitive.

    • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host.hostName and container.name, the corresponding variables will be {{host.hostName}} and {{container.name}} respectively, and no other segment variables are allowed in the notification subject and body.

    • Notification subjects will not show up on the Event feed.

    • Using a variable that is not a part of the segment will trigger an error.

    • The segment variables used in an alert are resolved to the current system values when the alert is sent.
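The variable conventions above can be illustrated with a small substitution routine. This is a sketch of the behavior described, not Sysdig's implementation: the function `render`, the regular expression, and the sample values are hypothetical.

```python
import re

# Matches {{name}}: a variable enclosed in double curly braces.
VARIABLE = re.compile(r"\{\{([^{}]+)\}\}")


def render(template, segment_values):
    """Replace each {{variable}} with its current value.

    Lookups are case sensitive, and a variable that is not one of the
    alert's segments raises an error, as described in the conventions above.
    """
    def substitute(match):
        name = match.group(1)
        if name not in segment_values:
            raise ValueError(f"{name} is not a segment of this alert")
        return str(segment_values[name])

    return VARIABLE.sub(substitute, template)


# Hypothetical current values for an alert segmented by host.hostName and container.name.
current = {"host.hostName": "web-01", "container.name": "nginx"}
subject = render("Stalled Deployment on {{host.hostName}} / {{container.name}}", current)
```

With these sample values, `subject` becomes `Stalled Deployment on web-01 / nginx`. Writing `{{host.hostname}}` instead would raise an error, because the lookup is case sensitive.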

The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

You can send a customized alert notification to the following channels:

  • Email

  • Slack

  • Amazon SNS Topic

  • Opsgenie

  • PagerDuty

  • VictorOps

  • Webhook

Manage Alerts

Alerts can be managed individually, or as a group, by using the checkboxes on the left side of the Alert UI, and the customization bar at the bottom. The columns of the table can also be configured, to provide you with the necessary data for your use cases. Select a group of alerts and perform several batch operations, such as deleting, enabling, disabling, or exporting to a JSON object. Select individual alerts to perform tasks such as creating a copy for a different team.

Enable/Disable Alerts

Alerts can be enabled or disabled using the customization bar. You can perform these operations on a single alert or on multiple alerts as a batch operation.

  1. From the Alerts module, check the boxes beside the relevant alerts.

  2. Click Enable or Disable as necessary.

The Enable/Disable buttons are only visible if a relevant alert is selected. For example, in the image below, only the Disable button is visible, as the selected alert is currently enabled:

384336315.png

In the image below, both buttons are visible, as an enabled alert and a disabled alert are both selected:

384336320.png

Export Alert JSON

A JSON file can be exported to a local machine, containing JSON snippets for each selected alert:

  1. Click the checkboxes beside the relevant alerts to be exported.

  2. Click the Export JSON button on the customization bar:

    384336330.png

Copy an Alert

Alerts can be copied within the current team to allow for similar alerts to be created quickly, or copied to a different team to share alerts.

Copy an Alert to the Same Team

To copy an alert within the current team:

  1. Click the checkbox beside the alert to be copied.

  2. Click the Copy button on the customization bar:

    384336340.png
  3. Check that the Current Team option is selected.

  4. Rename the alert, and click the Copy and Open button to save the changes.

Copy an Alert to a Different Team

To copy an alert to a different team:

  1. Click the checkbox beside the alert to be copied.

  2. Click the Copy button on the customization bar.

  3. Select the Other Team(s) option.

  4. Open the Select Team drop-down menu, and select the teams that the alert should be copied to:

    384336335.png
  5. Rename the alert, and click the Send Copy button to save the changes.

Delete Alerts

To delete one or more alerts:

  1. Click the checkboxes beside the relevant alerts to be deleted.

  2. Click the Delete button on the customization bar.

  3. Click the Yes, Delete Alerts button to confirm the changes.

Configure the Alerts Table Columns

To configure the visible columns:

  1. From the Alerts module, click the Table Columns Configuration (three dots) icon.

    384336360.png
  2. Check the boxes beside each desired column.

  3. Click the Apply button to save the changes, the Restore button to return the table to the original configuration, or the Cancel button to revert to the previous configuration.

Search for an Alert

The Alerts table can be searched using partial or full strings. For example, the search below displays only alerts that contain kubernetes:

384336355.png

Edit an Existing Alert

To edit an existing alert:

  1. Click the checkbox beside the alert:

    384336350.png
  2. Click the Edit button on the customization bar:

    384336345.png
  3. Edit the alert, and click the Save button to confirm the changes.

Multi-Condition Alerts

Multi-condition alerts are advanced alerts whose thresholds are built on complex conditions. To create one, define the alert threshold as a custom boolean expression that can involve multiple conditions. Click Create multi-condition alerts to enable adding conditions as boolean expressions.

384336221.png

These advanced alerts require specific syntax, as described in the examples below.

Format and Operations

Each condition has five parts:

  • Metric Name: Use the exact metric names. To avoid typos, click the HELP link to access the drop-down list of available metrics. Selecting a metric from the list will automatically add the name to the threshold expression being edited.

  • Group Aggregation (optional): If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.

  • Time Aggregation: The historical data rolled up over a selected period of time.

  • Operator: Both logical and relational operators are supported.

  • Value: A static numerical value against which a condition is evaluated.

The table below displays supported time aggregation functions, group aggregation functions, and relational operators:

Time Aggregation Function | Group Aggregation Function | Relational Operator
timeAvg()                 | avg()                      | =
min()                     | min()                      | <
max()                     | max()                      | >
sum()                     | sum()                      | <=
                          |                            | >=
                          |                            | !=

The format is:

condition1 AND condition2
condition1 OR condition2
NOT condition1

The order of operations can also be altered with parentheses:

NOT (condition1 AND (condition2 OR condition3))

Conditions take the following form:

groupAggregation(timeAggregation(metric.name)) operator value
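To avoid typos when writing conditions by hand, the form above can be assembled programmatically. The helper below is a hypothetical convenience, not part of any Sysdig tool: the function name `condition` and its parameters are assumptions, and the allowed names are copied from the functions and operators listed earlier in this section.

```python
# Function and operator names as listed in this section.
TIME_AGGREGATIONS = {"timeAvg", "min", "max", "sum"}
GROUP_AGGREGATIONS = {"avg", "min", "max", "sum"}
RELATIONAL_OPERATORS = {"=", "<", ">", "<=", ">=", "!="}


def condition(metric, time_agg, operator, value, group_agg=None):
    """Build 'groupAggregation(timeAggregation(metric.name)) operator value'.

    The group aggregation is optional and, when given, wraps the time
    aggregation, matching the rule that it is applied on the outside.
    """
    if time_agg not in TIME_AGGREGATIONS:
        raise ValueError(f"unsupported time aggregation: {time_agg}")
    if group_agg is not None and group_agg not in GROUP_AGGREGATIONS:
        raise ValueError(f"unsupported group aggregation: {group_agg}")
    if operator not in RELATIONAL_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")

    expression = f"{time_agg}({metric})"
    if group_agg is not None:
        expression = f"{group_agg}({expression})"
    return f"{expression} {operator} {value}"


# Combine two conditions with a logical operator.
cpu = condition("cpu.used.percent", "timeAvg", ">", 50)
memory = condition("memory.used.percent", "timeAvg", ">", 75)
threshold = f"{cpu} AND {memory}"
```

Here `threshold` holds `timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75`, and `condition("cpu.used.percent", "min", "<=", 30, group_agg="min")` yields `min(min(cpu.used.percent)) <= 30`.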

Example Expressions

Several examples of advanced alerts are given below:

timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75
timeAvg(cpu.used.percent) > 50 OR timeAvg(memory.used.percent) > 75
timeAvg(container.count) != 10  
min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
sum(file.bytes.total) > 0 OR sum(net.bytes.total) > 0
timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
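To see how such expressions combine, the sketch below evaluates them against a set of already-aggregated sample values. It is an illustrative model only, not Sysdig's evaluator: the functions `tokenize` and `evaluate` and the sample values are hypothetical, and the precedence used (NOT binds tightest, then AND, then OR) is an assumption based on common convention, since this page only states that parentheses alter the order.

```python
import operator
import re

RELATIONAL = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
              "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# A token is a logical keyword, a grouping parenthesis, or a whole condition.
TOKEN = re.compile(r"""\s*(?:
      (?P<kw>AND|OR|NOT)\b
    | (?P<paren>[()])
    | (?P<lhs>\w+\([\w.()]*\))\s*(?P<op><=|>=|!=|=|<|>)\s*(?P<value>-?\d+(?:\.\d+)?)
    )""", re.VERBOSE)


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"cannot parse at: {expression[pos:]!r}")
        pos = match.end()
        if match.group("kw") or match.group("paren"):
            tokens.append(match.group("kw") or match.group("paren"))
        else:
            tokens.append((match.group("lhs"), match.group("op"),
                           float(match.group("value"))))
    return tokens


def evaluate(expression, values):
    """Evaluate an expression; `values` maps each left-hand side to a number."""
    tokens, pos = tokenize(expression), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            advance()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = advance()
        if token == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if isinstance(token, tuple):
            lhs, op, threshold = token
            return RELATIONAL[op](values[lhs], threshold)
        raise ValueError(f"unexpected token {token!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# Hypothetical aggregated values for one evaluation window.
window = {
    "timeAvg(cpu.used.percent)": 62.0,
    "timeAvg(memory.used.percent)": 70.0,
    "timeAvg(mysql.net.connections)": 12.0,
}
```

With these sample values, the first example expression above (CPU AND memory) does not trigger because memory is below 75, the second (CPU OR memory) does, and the last example does not trigger because neither condition inside the parentheses holds.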