Alerts
Alerts are the responsive component of Sysdig Monitor. They notify you when an event or issue that requires attention occurs. Events and issues are identified based on changes in the metric values collected by Sysdig Monitor. The Alerts module displays out-of-the-box alerts and a wizard for creating and editing alerts as needed.
Alert Types
The types of alerts available in Sysdig Monitor:
- Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.
- Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.
- PromQL: Monitor metrics through a PromQL query.
- Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.
The following tools help with alert creation:
- Alert Library: Sysdig Monitor provides a set of alerts by default. Use it as is or as a template to create your own.
- Sysdig API: Use Sysdig’s Python client to create, list, delete, update, and restore alerts. See examples.
- Import Prometheus Rules: Sysdig Monitor allows you to import Prometheus rules or create new rules on the fly and add them to the existing list of alerts.
Create Alerts for CloudWatch Metrics
CloudWatch metric queries are displayed as no data in the Alerts Editor because the Sysdig metric store does not currently store CloudWatch metrics; the UI therefore shows the missing metrics as no data. However, you can still successfully create alerts using these metrics.
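For example, a PromQL alert condition on a CloudWatch metric can be saved even though the editor preview shows no data. The metric name below is hypothetical and stands in for whatever metric your CloudWatch integration collects:
# hypothetical CloudWatch-sourced metric; the editor preview shows no data, but the alert can still be created
avg_over_time(aws_rds_free_storage_space_average[10m]) < 10000000000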
Guidelines for Creating Alerts
Guideline | Description |
---|---|
Decide what to monitor | Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem. |
Define how it will be monitored | Specify exactly what behavior triggers a violation. For example, Marathon App is down on the Kubernetes Cluster named Production for ten minutes. |
Decide where to monitor | Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope. |
Define when to notify | Define the threshold and time window for assessing the alert condition. Setting up a warning threshold allows you to be notified of incidents earlier. For example, a database using 60% disk may trigger a warning to Slack, while the same database using 80% disk may page the on-call team. |
Decide how notifications are sent | Alerts support customizable notification channels, including email, mobile push notifications, OpsGenie, Slack, and more. To see supported services, see Set Up Notification Channels. |
To create alerts, simply:
1. Choose an alert type.
2. Configure alert parameters.
3. Configure the notification channels you want to use for alert notification.
Sysdig sometimes deprecates outdated metrics. Alerts that use these
metrics will not be modified or disabled, but will no longer be updated.
See Deprecated Metrics and Labels.
1 - Configure Alerts
Different Ways To Create An Alert
Beyond the ability to use the Alert Editor, you can create alerts from different modules.
- From Metrics Explorer, select Create Alert.
- From an existing Dashboard, select the More Options (three dots) icon for a panel, and select Create Alert.
- From any Event panel, select Create Alert from Event.
Create An Alert from the Editor
Configure notification channels before you begin, so the channels are available to assign to the alert. Optionally, you can add a custom subject and body information into individual alert notifications.
Configuration slightly differs for each Alert type. See respective pages
to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.
To configure an alert, open the Alert Editor and set the following
parameters:
Alert Types
Select the desired Alert Type:
- Downtime: Select the entity to monitor.
- Metric: Select a time-series metric to be alerted on if it violates user-defined thresholds.
- PromQL: Enter the PromQL query and duration to define an alert condition.
- Event: Filter the custom event to be alerted on by using the name, tag, description, and a source tag.
Metric and Condition
Scope: Select Entire Infrastructure, or one or more labels to apply a limited scope and filter a specific metric.
Metric: Select a metric that this alert will monitor. Selecting a metric from the list will automatically add the name to the threshold expression being edited. Define how the data is aggregated (Time aggregation), such as average, maximum, minimum, or sum. It’s the historical data rolled up over a selected period.
Group By: Metrics are applied to a group of items (Group Aggregation). If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or
average). Group aggregation functions must be applied outside of time aggregation functions.
Segment by: Select one or more labels for segmentation. This allows for the creation of multi-series comparisons and multiple alerts. Multiple alerts will be triggered for each segment you specify. For more information, see Metric Alerts.
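As a rough PromQL counterpart of these settings (a sketch only; the native metric name sysdig_host_cpu_used_percent is an assumption), time aggregation maps to a range function such as avg_over_time, group aggregation to an outer aggregation such as max by, and Segment by to the labels listed in the by clause:
# group aggregation (max by) applied outside time aggregation (avg_over_time), segmented by cluster
max by (kube_cluster_name) (avg_over_time(sysdig_host_cpu_used_percent[10m])) > 75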
Multiple Thresholds
In addition to an alert threshold, a warning threshold can be configured for Metric Alerts and Event Alerts. Warning thresholds and alert thresholds can be associated with different notification channels. In the following example, a user may want to send a warning and alert notification to Slack, but also page the on-call team on Pagerduty if an alert threshold is met.

- Notify when resolved: To prevent a Pagerduty incident from automatically resolving once the alert threshold is no longer met, toggle Notify when Resolved off so that the on-call team can triage the incident. This setting allows an alert to override the notification channel’s default notification settings. If an override is not configured, the alert will inherit the default settings from the notification channel.

If both warning and alert thresholds are associated with the same notification channel, a metric immediately exceeding the alert threshold ignores the warning threshold and triggers only the alert threshold.
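Expressed as PromQL-style conditions (a sketch only, assuming a native metric name such as sysdig_fs_used_percent), the database disk example from the guidelines above amounts to two thresholds on the same expression, each routed to its own channel:
# warning threshold, for example routed to Slack
avg_over_time(sysdig_fs_used_percent[5m]) > 60
# alert threshold, for example routed to Pagerduty
avg_over_time(sysdig_fs_used_percent[5m]) > 80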
Notification
Notification Channel: Select from the configured notification channels in the list. Supported channels are:
Email
Slack
Amazon SNS Topic
Opsgenie
Pagerduty
VictorOps
Webhook
You can view the list of notification channels configured for each alert on the Alerts page.
Configure Notification Template: If applicable, add the following message format details and click Apply Template.
- Notification Subject & Event Title: Customize using variables, such as {{__alert_name__}} is {{__alert_status__}} for {{agent_id}}.
- Notification Body: Add the text for the notification you are creating. See Customize Notifications.
Settings
- Alert Severity: Select a priority: High, Medium, Low, or Info.
- Alert Name: Specify a meaningful name that can uniquely represent the alert you are creating, for example, the entity that the alert targets, such as Production Cluster Failed Scheduling pods.
- Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.
- Group (optional): Specify a meaningful group name for the alert you are creating. Alerts that have no group name will be added to the Default Group.
- Link to Dashboard: Select a dashboard that you might want to include in the alert notification. You can view the specified dashboard link in the event feed associated with the alert.
- Link to Runbook: Specify the URL of a runbook. The link to the runbook appears in the event feed.
Captures
Optionally, configure a Sysdig capture. Specify the following:
- Capture Enabled: Click the slider to enable Capture.
- Capture Duration: The period of time captured. The default time is 15 seconds. The capture time starts from the time the alert threshold was breached.
- Capture Storage: The storage location for the capture files.
- Capture Name: The name of the capture file.
- Capture Filter: Restricts the amount of trace information collected.
Sysdig capture files are not available for Event and PromQL Alerts. See Captures for more information.
Optional: Customize Notifications
You can optionally customize individual notifications to provide context
for the errors that triggered the alert. All the notification channels
support this added contextual information and customization flexibility.
Modify the subject, body, or both of the alert notification with the
following:
Plaintext: A custom message stating the problem. For example,
Stalled Deployment.
Hyperlink: For example, URL to a Dashboard.
Dynamic Variable: For example, a hostname. Note the conventions:
- All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.
- Variables are case sensitive.
- The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host_hostName and container_name, the corresponding variables will be {{host_hostName}} and {{container_name}} respectively. In addition to these segment variables, __alert_name__ and __alert_status__ are supported. No other segment variables are allowed in the notification subject and body.
- Notification subjects will not show up on the Event feed.
- Using a variable that is not a part of the segment will trigger an error.
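For example, a sketch of a customized subject and body that uses only the segment variables from the example above plus the two built-in variables:
Subject: {{__alert_name__}} is {{__alert_status__}} on {{host_hostName}}
Body: Container {{container_name}} on host {{host_hostName}} breached the configured threshold.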
The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the
template.
You can send a customized alert notification to the following channels:
- Email
- Slack
- Amazon SNS Topic
- Opsgenie
- Pagerduty
- VictorOps
- Webhook
The following example shows a notification template created to alert you on failing Prometheus jobs. Adding {{kube_cluster_name}}: {{job}} - {{__alert_name__}} is {{__alert_status__}} to the subject line helps you identify the problem area at a glance without having to read the entire notification body.

Supported Aggregation Functions
The table below displays supported time aggregation functions, group
aggregation functions, and relational operators:
Time Aggregation Function | Group Aggregation Function | Relational Operator |
---|---|---|
timeAvg() | avg() | = |
min() | min() | < |
max() | max() | > |
sum() | sum() | <= |
not applicable | not applicable | >= |
not applicable | not applicable | != |
2 - Manage Alerts
Alerts can be managed individually, or as a group, by using the checkboxes on the left side of the Alert UI and the customization bar.
The columns of the table can also be configured, to provide you with the
necessary data for your use cases.

Select a group of alerts and perform several batch operations, such as
filtering, deleting, enabling, disabling, or exporting to a JSON object.
Select individual alerts to perform tasks such as creating a copy for a
different team.
View Alert Details
The bell button next to an alert indicates that you have not resolved
the corresponding events. The Activity Over Last Two Weeks column
visually notifies you with an event chart showing the number of events
that were triggered over the last two weeks. The color of the event chart represents the severity level of those events.
To view alert details, click the corresponding alert row. The slider
with the alert details will appear. Click an individual event to Take
Action. You can do one of the following:
Acknowledge: Mark that the event has been acknowledged by the
intended recipient.
Create Silence from Event: If you no longer want to be notified,
use this option. You can choose the scope for alert
silence. When silenced,
alerts will still be triggered but will not send you any
notifications.
Explore: Use this option to troubleshoot by using the PromQL Query Explorer.
The event feed will be empty and the Activity Over Last Two Weeks column will have no event chart if no events have been reported in the past two weeks.
Enable/Disable Alerts
Alerts can be enabled or disabled using the slider or the customization
bar. You can perform these operations on a single alert or on multiple
alerts as a batch operation.
From the Alerts module, check the boxes beside the relevant alerts.
Click Enable Selected or Disable Selected as necessary.
Use the slider beside the alert to disable or enable individual alerts.

Edit an Existing Alert
To edit an existing alert:
Do one of the following:
Click the Edit button beside the alert.

Click an alert to open the detail view, then click Edit on
the top right corner.

Edit the alert, and click Save to confirm the changes.
Copy an Alert
Alerts can be copied within the current team to allow for similar alerts
to be created quickly, or copied to a different team to share alerts.
Copy an Alert to the Current Team
To copy an alert within the current team:
Highlight the alert to be copied.
The detail view is displayed.

Click Copy.
The Copy Alert screen is displayed.
Select Current from the drop-down.
Click Copy and Open.
The alert appears in edit mode.
Make necessary changes and save the alert.
Copy an Alert to a Different Team
Highlight the alert to be copied.
The detail view is displayed.
Click Copy.
The Copy Alert screen is displayed.
Select the teams that the alert should be copied to.

Click Send Copy.
Search for an Alert
Search Using Strings
The Alerts table can be searched using partial or full strings. For example, the search below displays only alerts that contain kubernetes:

Filter Alerts
The alert feed can be filtered in multiple ways to drill down into the environment’s history and refine the alerts displayed. The feed can be filtered by severity or status. Examples of each are shown below.
The example below shows only high and medium severity:

The example below shows the alerts that are invalid:

Export Alerts as JSON
A JSON file can be exported to a local machine, containing JSON snippets
for each selected alert:
Click the checkboxes beside the relevant alerts to be exported.
Click Export JSON.

Delete Alerts
Open the Alert page and use one of the following methods to delete alerts:
Hover on a specific alert and click Delete.

Hover on one or more alerts, click the checkbox, then click
Delete on the bulk-action toolbar.

Click an alert to see the detailed view, then click Delete on
the top right corner.

3 - Alert Types
Sysdig Monitor can generate notifications based on certain conditions or events you configure. Using the alert feature, you can keep tabs on your infrastructure and find out about problems as they happen, or even before they happen, with the alert conditions you define. In Sysdig Monitor, metrics serve as the central configuration artifact for alerts. An alert ties one or more conditions or events to the measures to take when the condition is met or an event happens. Alerts work across Sysdig modules including Explore, Dashboard, Events, and Overview.
The types of alerts available in Sysdig Monitor:
- Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.
- Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.
- PromQL: Monitor metrics through a PromQL query.
- Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.
3.1 - Downtime Alert
Sysdig Monitor continuously surveils different types of entities in your infrastructure, such as a host, a container, or a process, and sends notifications when the monitored entity is not available or responding. Downtime alerts focus mainly on unscheduled downtime of programs, containers, and hosts in your infrastructure.
In this example, the downtime of containers is monitored. When one or more containers in the given scope go down in the 1-minute time window, notifications will be sent with the necessary information on both the containers and the agents.
The lines shown in the preview chart represent the values for the segments selected to monitor. The popup is a color-coded legend that shows which segment (or combination of segments, if there is more than one) each line represents. You can also deselect segment lines to prevent them from showing in the chart. Note that the preview chart shows at most 10 lines. For downtime alerts, the segments correspond to what you select for the Alert if any of option.
About Up Metrics
To monitor the downtime of entities, the following up metrics are used: sysdig_host_up, sysdig_container_up, and sysdig_program_up. They indicate whether the agent is able to communicate with the collector. A value of 1 indicates that the entity is up and the agent is sending this information to the collector. A value of 0 indicates that the entity is down, meaning the agent has not reported the entity to the collector.
When an alert is configured based on an up metric, two data API queries are performed during the alert check. One query retrieves the current values and the other retrieves the values from the previous alert check interval. For any entity that was present in the previous interval but is not present in the current interval, the metric is marked as 0.
An aggregated value of the up metric is displayed on the chart in the Alert Editor, and therefore you might see a value between 0 and 1.
Define a Downtime Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Severity: Set a severity level for your alert. The
Priority—High, Medium, Low, and Info—are reflected
in the Alert list, where you can sort by the severity of the Alert.
You can use severity as a criterion when creating alerts, for
example: if there are more than 10 high severity events, notify.
Specify multiple segments: Selecting a single segment might not
always supply enough information to troubleshoot. Enrich the
selected entity with related information by adding additional
related segments. Enter hierarchical entities so you have a top-down picture of what went wrong and where. For example,
specifying a Kubernetes cluster alone does not provide the context
necessary to troubleshoot. In order to narrow down the issue, add
further contextual information, such as Kubernetes Namespace,
Kubernetes Deployment, and so on.
Scope
Filter the environment on which this alert will apply. For example, an alert will
fire when a container associated with the agent 197288
goes down. The alert will be triggered for each container name and agent ID.
Use in or contain operators to match multiple different possible
values to apply scope.
The contain and not contain operators help you retrieve values if you know part of the value. For example, us retrieves values that contain the string “us”, such as “us-east-1b”, “us-west-2b”, and so on.
The in and not in operators help you filter multiple values.
You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
Metric
Select an up metric associated with the entity whose downtime you want to monitor. You can select one of the following entities: host, container, program.
Entity
Specify additional segments by using the Alert if any of option.
The specified entities are used for segmentation and appear in the default notification template as well as in the Preview. In this example, data is segmented on container name and agent ID. When a container is affected, the notification will include not only the affected container details but also the associated agent IDs.
Trigger
Define the threshold and time window for assessing the alert condition.
Supported time scales are minute, hour, or day.
If the monitored program is not available or not
responding for the last 1 minute, recipients will be notified.
You can set any value for the percentage and any value greater than 1 for the time window. For example, if you choose 50% instead of 100%, a notification will be triggered when the entity is down for 5 minutes within the selected time window of 10 minutes.
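For reference, a rough PromQL analogue of this trigger (a sketch only, not how the Downtime alert type is evaluated internally): because the up metrics report 1 when the entity is up and 0 when it is down, being down for at least 50% of a 10-minute window corresponds to the average of the up metric over that window falling to 0.5 or below.
avg_over_time(sysdig_program_up[10m]) <= 0.5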
Use Cases
- Your e-commerce website is down during the peak hours of the Black Friday, Christmas, or New Year season.
- Production servers in your data center experience a critical outage.
- A MySQL database is unreachable.
- File upload does not work on your marketing website.
3.2 - PromQL Alerts
Sysdig Monitor enables you to use PromQL to define metric expressions that you can alert on. You define the alert conditions using a PromQL-based metric expression. This way, you can combine different metrics and alert on cases like a service-level agreement breach, running out of disk space within a day, and so on.
Examples
For PromQL alerts, you can use any metric that is available in PromQL,
including Sysdig native metrics. For more details
see the various integrations available on
promcat.io.
Low Disk Space Alert
Warn if disk space is predicted to fall below a specified quantity. For example, disk space dropping below 10 GB within the next 24 hours:
predict_linear(sysdig_fs_free_bytes{fstype!~"tmpfs"}[1h], 24*3600) < 10000000000
Slow Etcd Requests
Notify if etcd requests are slow. This example uses the promcat.io integration.
histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m])) > 0.15
High Heap Usage
Warn when the heap usage in ElasticSearch is more than 80%. This example
uses the promcat.io
integration.
(elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 80
Guidelines
Sysdig Monitor does not currently support the following:
- Interacting with the Prometheus Alertmanager or importing Alertmanager configuration.
- Using, copying, pasting, or importing predefined alert rules.
- Converting alert rules to map to the Sysdig alert editor.
Create a PromQL Alert
Set a meaningful name and description that help recipients easily
identify the alert.
Set a Priority
Select a priority for the alert that you are creating. The supported
priorities are High, Medium, Low, and Info. You can also view events in the Dashboard and Explore UIs and sort them by severity.
Define a PromQL Alert
PromQL: Enter a valid PromQL expression. The query will be executed
every minute. However, the alert will be triggered only if the query
returns data for the specified duration.
In this example, you will be alerted when disk space is predicted to fall below 10 GB within the next 24 hours.
Duration: Specify the time window for evaluating the alert condition in minutes, hours, or days. The alert will be triggered only if the query returns data for the specified duration.
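The PromQL expression for this example matches the Low Disk Space Alert shown earlier: predict_linear extrapolates the one-hour trend of free filesystem bytes 24 hours (24*3600 seconds) ahead and compares the projected value against 10 GB.
predict_linear(sysdig_fs_free_bytes{fstype!~"tmpfs"}[1h], 24*3600) < 10000000000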
Define Notification
Notification Channels: Select from the configured notification
channels in the list.
Re-notification Options: Set the time interval at which multiple
alerts should be sent if the problem remains unresolved.
Notification Message & Events: Enter a subject and body. Optionally,
you can choose an existing template for the body. Modify the subject,
body, or both for the alert notification with a hyperlink, plain text,
or dynamic variables.
Import Prometheus Alert Rules
Sysdig Alert allows you to import Prometheus rules or create new rules
on the fly and add them to the existing list of alerts. Click the
Upload Prometheus Rules option and enter the rules as YAML in the
Upload Prometheus Rules YAML editor. Importing your Prometheus alert
rules will convert them to PromQL-based Sysdig alerts. Ensure that the
alert rules are valid YAML.

You can upload one or more alert rules in a single YAML and create
multiple alerts simultaneously.

Once the rules are imported to Sysdig Monitor, the alert list will be
automatically sorted by last modified date.

Besides the pre-populated template, each rule specified in the Upload Prometheus Rules YAML editor must follow the standard Prometheus rule format, with fields such as alert, expr, for, labels, and annotations.
See the following examples to understand the format of Prometheus Rules
YAML. Ensure that the alert rules are valid YAML to pass validation.
Example: Alert Prometheus Crash Looping
To alert on potential Prometheus crash looping, create a rule that fires when Prometheus restarts more than twice in the last 10 minutes.
groups:
  - name: crashlooping
    rules:
      - alert: PrometheusTooManyRestarts
        expr: changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[10m]) > 2
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Prometheus too many restarts (instance {{ $labels.instance }})
          description: Prometheus has restarted more than twice in the last 10 minutes. It might be crashlooping.\n VALUE = {{ $value }}\n
Example: Alert HTTP Error Rate
To alert on HTTP requests with status 5xx (> 5%) or high latency:
groups:
  - name: default
    rules:
      # Paste your rules here
      - alert: NginxHighHttp5xxErrorRate
        expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})
          description: Too many HTTP requests with status 5xx
      - alert: NginxLatencyHigh
        expr: histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[2m])) by (host, node)) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Nginx latency high (instance {{ $labels.instance }})
          description: Nginx p99 latency is higher than 3 seconds
3.3 - Metric Alerts
Sysdig Monitor offers an easy way to define metrics-based alerts.
You can create metric alerts for scenarios such as:
- Number of processes running on a host
- Root volume disk usage in a container
- CPU or memory usage of a host or workload
Defining a Metric Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Specify multiple segments: Selecting a single segment might not
always supply enough information to troubleshoot. Enrich the
selected entity with related information by adding additional
related segments. Enter hierarchical entities so you have a top-down picture of what went wrong and where. For example,
specifying a Kubernetes cluster alone does not provide the context
necessary to troubleshoot. In order to narrow down the issue, add
further contextual information, such as namespace, deployment, and so on.
Specify Metrics
Select a metric that this alert will monitor. You can also define how
data is aggregated, such
as average, maximum, minimum, or sum.
Team scope is automatically applied to alerts. You can further filter the environment by overriding the scope.
For example, the alert below will fire when any host’s CPU usage goes above the defined threshold within the us-east-1a cloud availability zone.

You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
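As a PromQL-style sketch of the same condition (the metric name, the availability zone label, and the 80% threshold are illustrative assumptions):
# fires per host in us-east-1a whose 5-minute average CPU usage exceeds the threshold
avg_over_time(sysdig_host_cpu_used_percent{cloud_provider_availability_zone="us-east-1a"}[5m]) > 80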
Alerting on No Data
When a metric stops reporting, Sysdig Monitor shows no data where you would normally expect data points. To detect such silent failures, you can configure alerts to notify you when a metric ceases to report data.
You can use the No Data option in the Settings section to determine how a metric alert should behave upon discovering the metric reports no data.
By default, alerts configured for metrics that stop reporting data will not be evaluated. You can change this behavior by enabling Notify on missing data, in which case, an alert will be sent when the metric stops reporting data.
This feature is currently available only for Metric Alerts.
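For PromQL alerts, which do not offer this toggle, the usual way to catch a silently missing metric is the standard absent() function; a sketch, assuming the host_hostname label and a hypothetical host name:
# absent() returns 1 while no matching series exists, so the alert fires when the metric stops reporting
absent(sysdig_host_up{host_hostname="my-production-host"})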
Define the threshold and time window for assessing the alert condition.
Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.
Metric alerts can be triggered to notify you of different aggregations:
Aggregation | Description |
---|---|
on average | The average of the retrieved metric values across the time period. The actual number of samples retrieved is used to calculate the value. For example, if new data is retrieved in the 7th minute of a 10-minute sample and the alert is defined as on average, the alert is calculated by summing the 3 recorded values and dividing by 3. |
as a rate | The average value of the metric across the time period evaluated. The expected number of values is used to calculate the rate that triggers the alert. For example, if new data is retrieved in the 7th minute of a 10-minute sample and the alert is defined as as a rate, the alert is calculated by summing the 3 recorded values and dividing by 10 (10 x 1-minute samples). |
in sum | The combined sum of the metric across the time period evaluated. |
at least once | The trigger value is met for at least one sample in the evaluated period. |
for the entire time | The trigger value is met for every sample in the evaluated period. |
as a rate of change | The trigger value is met by the change in value over the evaluated period. |
For example, the alert below will fire for each unique segment, defined by host_hostname and kube_cluster_name, that uses more than 75% of the filesystem on average over the last 5 minutes.

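A PromQL-style sketch of the same condition (assuming the native metric name sysdig_fs_used_percent) segments the series with a by clause so that each host and cluster combination is evaluated separately:
avg by (host_hostname, kube_cluster_name) (avg_over_time(sysdig_fs_used_percent[5m])) > 75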
Example: Alert When Data Transfer Over the Threshold
The example below shows an alert that triggers when the average rate of data transferred by a container is over 20 KiB/s for a period of 1 minute.

In the alert Settings, you can configure a link to a Runbook and to a Dashboard to speed up troubleshooting when the alert fires.
When viewing the triggered alert you will be able to quickly access your defined Runbook and Dashboard.

3.4 - Event Alerts
Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and deployments.
Alerts on events support one or more segmentation labels. An alert is
generated for each segment.
Defining an Event Alert
Guidelines
Count Events That Match: Specify a meaningful filter text to count the number of related events.
Severity: Set a severity level for your alert. The priorities High, Medium, Low, and Info are reflected in the Alert list, where you can sort by severity using the top navigation pane. You can use severity as a criterion when creating events and alerts, for example: if there are more than 10 high severity events, notify.
Event Source: Filter by one or more event sources
that should be considered by the alert. Predefined options are included for
infrastructure event sources
(kubernetes, docker, and containerd), but you can freely specify other values to match
custom event sources.
Alert if: Specify the trigger condition in terms of the number of
events for a given duration.
Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert.
Filter the environment on which this alert will apply. Use advanced operators to include, exclude, or pattern-match groups, tags, and entities. You can also create alerts directly from Explore and Dashboards for automatically populating this scope.
In this example, failing to schedule a pod in a default namespace triggers an alert.
Define the threshold and time window for assessing the alert condition.
Single alert fires an alert for your entire scope, while multiple alert fires if any or every segment breaches the threshold at once.
If the number of events triggered in the monitored entity is greater
than 5 for the last 10 minutes, recipients will be notified through the
selected channel.
3.5 - Advanced Metric Alerts
Advanced metric alerts (multi-condition alerts) are alert thresholds built on complex conditions. They are created by defining alert thresholds as custom boolean expressions that can involve multiple conditions.
The new Alert Editor does not support creating Advanced Alerts. However, it gives you an option to open the existing advanced alerts, and save them as a PromQL Alert.
To save an advanced metric alert as a PromQL alert:
Open the advanced alert and click Edit.
The Advanced Metric Alert page will display an option to copy the alert to a PromQL alert.
Adjust the time window as necessary, select one or more notification channels, and configure alert settings. Alternatively, you can do the configuration after copying to the PromQL alert page.
Click Copy to Prometheus alert.
The PromQL alert editor page will be displayed.
Click Save.
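For example, a legacy multi-condition expression such as timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75 translates roughly to the following PromQL (a sketch; the native metric names are assumptions):
avg_over_time(sysdig_host_cpu_used_percent[10m]) > 50 and avg_over_time(sysdig_host_memory_used_percent[10m]) > 75
PromQL’s and keeps only the series whose labels match on both sides, so the combined condition is evaluated per host, similar to segmenting the legacy alert by host.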
4 - Alerts Library
To help you get started quickly, Sysdig provides a set of curated alert templates called Alerts Library.
Powered by Monitoring Integrations, Sysdig automatically
detects the applications and services running in your environment and
recommends alerts that you can enable.
Two types of alert templates are included in Alerts Library:

- Recommended: Alert suggestions based on the services that are detected running in your infrastructure.
- All templates: You can browse templates for all the services. For some templates, you might need to configure Monitor Integrations.
Access Alerts Library
Log in to Sysdig Monitor.
Click Alerts from the left navigation pane.
On the Alerts tab, click Library.

Import an Alert
Locate the service that you want to configure an alert for.
To do so, either use the text search or identify from a list of
services.
For example, click Redis.

Eight template suggestions are displayed for 14 Redis services running in the environment.
From a list of template suggestions, choose the desired template.
The Redis page shows the alerts that are already in use and that you
can enable.
Enable one or more alert templates. To do so, do one of the following:
- Click Enable Alert.
- Bulk enable templates: Select the check boxes corresponding to the alert templates and click Enable Alert on the top-right corner.
- Click an alert template to display the slider, then click Enable Alert on the slider.
On the Configure Redis Alert page, specify the Scope and select
the Notification channels.

Click Enable Alert.
You will see a message stating that the Redis Alert has been
successfully created.
Use Alerts Library
In addition to importing an alert, you can also do the following with
the Alerts Library:
Identify Alert templates associated with the services running in
your infrastructure.

Bulk import Alert templates. See Import an
Alert.
View alerts that are already configured.
Filter Alert templates. Enter the search string to display the
matching results.

Discover the workloads where a service is running. To do so, click
on the Alert template to display the slider. On the slider, click
Workloads.
View the alerts in use. To do so, click on an Alert template to
display the slider. On the slider, click Alerts in use.

Configure an alert.
Additional alert configuration, such as changing the alert name,
description, and severity can be done after the import.
5 - Silence Alert Notifications
Sysdig Monitor allows you to silence alerts for a given scope for a predefined amount of time. When silenced, alerts will still be triggered but will not send any notifications. You can schedule silencing in advance. This helps administrators to temporarily mute notifications during planned downtime or maintenance and send downtime notifications to selected channels.
With an active silence, the only notifications you receive are those indicating the start time and the end time of the silence. All other notifications for events from that scope are silenced. When a silence is active, creating an alert still triggers the alert, but no notification will be sent. Additionally, a triggered event will be generated stating that the alert is silenced.
See Working with Alert
APIs for programmatically
silencing alert notifications.
When you create a new silence, it is by default enabled and scheduled.
When the start time arrives for a scheduled silence, it becomes active
and the list shows the time remaining. When the end time arrives, the
silence becomes completed and cannot be enabled again.
To configure a silence:
Click Alerts on the left navigation on the Monitor UI.
Click the Silence tab.
The page shows the list of all the existing silences.
Click Set a Silence.
The Silence for Scope window is displayed.

Specify the following:
Scope: Specify the entity the silence applies to, for example, a particular workload or namespace, from environments that may include thousands of entities.
Begins: Specify one of the following: Today,
Tomorrow, Pick Another Day. Select the time from the
drop-down.
Duration: Specify how long notifications should be
suppressed.
Name: Specify a name to identify the silence.
Notify: Select a channel you want to notify about the
silence.
Click Save.
Silence Alert Notifications from Event Feed
You can also create and edit silences and view silenced alert events on
the Events feeds across the Monitor UI. When you create a silence, the
alert will still be triggered and posted on the Events feed and in the
graph overlays but will indicate that the alert has been silenced.
If you have an alert with no notification channel configured, events generated from that alert won’t be marked as silenced. Nor will they be visually represented in the event feed with the crossed-bell icon and the option to silence events.
To silence alert notifications from the event feed:
On the event feed, select the alert event that you want to silence.
On the event details slider, click Take Action.

Click Create Silence from Event.
The Silence for Scope window is displayed.
Continue configuring the silence as described in the previous section.
Manage Silences
Silences can be managed individually, or as a group, by using the
checkboxes on the left side of the Silence UI and the customization
bar. Select a group of silences and perform batch delete operations.
Select individual silences to perform tasks such as enabling, disabling,
duplicating, and editing.
Change States
You can enable or disable a silence by sliding the state bar next to the silence. Two kinds of silences show as enabled: an active silence, which is currently running (its start date is in the past but its end date is yet to come), and a scheduled silence, which will start in the future. A clock icon visually represents an active silence.

Completed silences cannot be re-enabled once the silenced period is finished. However, you can duplicate one with all of its data, but you need to set a new silencing period.
A silence can be disabled only when:
Filter Silences
Use the search bar to filter silences. You can either perform a simple
auto-complete text search or use the categories. The feed can be
filtered by the following categories: Active, Scheduled,
Completed.
For example, the following shows the completed silences that start with
“cl”.

Duplicate a Silence
Do one of the following to duplicate a silence:
Click the Duplicate hover-the-row button on the menu.

Click the row for the Silence for Scope window to open. On the
window, make necessary changes if required and click Duplicate.
Edit Silence
You can edit scheduled silences. For the active ones, you can only
extend the time. You cannot edit completed silences.
To edit a silence, do one of the following:
Click the row for the Silence for Scope window to open. Make
necessary changes and click Update.
Click the Edit hover-the-row button on the menu. The Silence
for Scope window will be displayed.

Make necessary changes and click Update.
Extend the Time Duration
For the active silences, you can extend the duration to one of the
following:
- 1 Hour
- 2 Hours
- 6 Hours
- 12 Hours
- 24 Hours
To do so, click the extend the time duration button on the menu and
choose the duration. You can extend the time of an active silence even
from the Silence for Scope window.

Extending the time duration will notify the configured notification
channels that the downtime is extended. You can also extend the time
from a Slack notification of a silence by clicking the given link. It
opens the Silence for Scope window of the running silence where you
can make necessary adjustments.
You cannot extend the duration of completed silences.
6 - Legacy Alerts Editor
If you do not have the new
Sysdig metric store enabled, you will not be able to use the
latest Alert Editor features. You will continue to use the legacy Alerts Editor to create and edit alert notifications.
Alert Types
The types of alerts available in Sysdig Monitor:
- Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.
- Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.
- PromQL: Monitor metrics through a PromQL query.
- Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.
- Anomaly Detection: Monitor hosts based on their historical behaviors, and alert when they deviate from the expected pattern.
- Group Outlier: Monitor a group of hosts and be notified when one acts differently from the rest. Group Outlier Alerts are supported only on hosts.
The following tools help with alert creation:
- Alert Library: Sysdig Monitor provides a set of alerts by default. Use it as is or as a template to create your own.
- Sysdig API: Use Sysdig’s Python client to create, list, delete, update, and restore alerts. See examples.
Guidelines for Creating Alerts
Guideline | Description |
---|---|
Decide what to monitor | Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem. |
Define how it will be monitored | Specify exactly what behavior triggers a violation. For example, Marathon App is down on the Kubernetes Cluster named Production for ten minutes. |
Decide where to monitor | Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope. |
Define when to notify | Define the threshold and time window for assessing the alert condition. Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once. Multiple Alerts include all the segments you specified to uniquely identify the location, and thus provide a full qualification of where the problem occurred. The higher the number of segments, the easier it is to uniquely identify the affected entities. A good analogy for multiple alerts is alerting on cities: for example, a multiple alert on San Francisco would include information such as the country it is part of (the USA) and the continent (North America). Trigger gives you control over how notifications are created; for example, you may want to receive a notification for every violation, or only a single notification for a series of consecutive violations. |
Decide how notifications are sent | Alerts support customizable notification channels, including email, mobile push notifications, OpsGenie, Slack, and more. To see supported services, see Set Up Notification Channels. |
To create alerts, simply:
1. Choose an alert type.
2. Configure alert parameters.
3. Configure the notification channels you want to use for alert notification.
Sysdig sometimes deprecates outdated metrics. Alerts that use these
metrics will not be modified or disabled, but will no longer be updated.
See Deprecated Metrics and Labels.
Use the Alert wizard to create or edit alerts.
Open the Alert Wizard
There are multiple ways to access the Alert wizard:
From Explore
Do one of the following:
From Dashboards
Click the More Options (three dots) icon for a panel, and
select Create Alert.
From Alerts
Do one of the following:
From Overview
From the Events panel on the Overview screen, select a custom or an
Infrastructure type event. From the event description screen, click
Create Alert from Event.

Create an Alert
Configure notification
channels before you begin,
so the channels are available to assign to the alert. Optionally, you
can add a custom subject and body information into individual alert
notifications.
Configuration slightly differs for each Alert type. See the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.
To configure an alert, open the Alert wizard and set the following
parameters:
Create the alert:
Type: Select the desired Alert
Types.

Each type has different parameters, but they follow the same
pattern:
Name: Specify a meaningful name that can uniquely represent the Alert that you are creating, for example, the entity that the alert targets, such as Production Cluster Failed Scheduling pods.
Group (optional): Specify a meaningful group name for
the alert you are creating. Group name helps you narrow down
the problem area and focus on the infrastructure view that
needs your attention. For example, you can enter Redis
for alerts related to Redis services. When the alert
triggers you will know which service in your workload
requires inspection. Alerts that have no group name will be
added to the Default Group. The group name is editable; edit the alert to change it.
An alert can belong to only one group. An alert created from
an alert template will have the group already configured by
the Monitor Integrations.
You can see the existing alert groups on the Alerts
details page.

See Groupings
for more information on how Sysdig handles infrastructure views.
Description (optional): Briefly expand on the alert name
or alert condition to give additional context for the
recipient.
Priority: Select a priority: High, Medium, Low, or Info. You can later sort by severity using the top navigation pane.

Specify the parameters in the Define, Notify, and
Act sections.
To alert on multiple metrics using boolean logic, click Create
multi-condition alerts. See Multi-Condition
Alerts.

(1) Define
Scope: Everywhere, or a more limited scope to filter a specific component of the monitored infrastructure, such as a Kubernetes deployment, a Sysdig Agent, or a specific service.
Trigger: Boundaries for assessing the alert condition, and
whether to send a single alert or multiple alerts. Supported time
scales are minute, hour, or day.
Single alert: Single Alert fires an alert for your entire
scope.
Multiple alerts: Multiple Alert fires if any or every
segment breaches the threshold at once.
Multiple alerts are triggered for each segment you specify. The specified segments will be represented in the alerts. The higher the number of segments, the easier it is to uniquely identify the affected entities.
For detailed descriptions, see the respective sections on Alert Types.
(2) Notify
Notification Channel: Select from the configured
notification channels in the list. Supported channels are:
Email
Slack
Amazon SNS Topic
Opsgenie
Pagerduty
VictorOps
Webhook
You can view the list of notification channels configured for
each alert on the Alerts page.

Notification Options: Set the time interval at which
multiple alerts should be sent.
Format Message: If applicable, add message format details.
See Customize
Notifications.
(3) Act
Click Create.
Optional: Customize Notifications
You can optionally customize individual notifications to provide context
for the errors that triggered the alert. All the notification channels
support this added contextual information and customization flexibility.
Modify the subject, body, or both of the alert notification with the
following:
Plaintext: A custom message stating the problem. For example,
Stalled Deployment.
Hyperlink: For example, URL to a Dashboard.
Dynamic Variable: For example, a hostname. Note the conventions:
All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.
Variables are case sensitive.
The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented by host.hostName and container.name, the corresponding variables will be {{host.hostName}} and {{container.name}} respectively. In addition to these segment variables, __alert_name__ and __alert_status__ are supported. No other segment variables are allowed in the notification subject and body.
Notification subjects will not show up on the Event feed.
Using a variable that is not a part of the segment will trigger
an error.
The segment variables used in an alert are resolved to the current system values when the alert is sent.
The body of the notification message contains a Default Alert Template.
It is the default alert notification generated by Sysdig Monitor. You
may add free text, variables, or hyperlinks before and after the
template.
You can send a customized alert notification to the following channels:
Email
Slack
Amazon SNS Topic
Opsgenie
Pagerduty
VictorOps
Webhook
Multi-Condition Alerts
Multi-condition alerts are advanced alert thresholds built on complex conditions. You define alert thresholds as custom boolean expressions that can involve multiple conditions. Click Create multi-condition alerts to enable adding conditions as boolean expressions.

These advanced alerts require specific syntax, as described in the
examples below.
Each condition has five parts:
Metric Name: Use the exact metric names. To avoid typos, click the HELP link to access the drop-down list of available metrics. Selecting a metric from the list will automatically add the name to the threshold expression being edited.
Group Aggregation (optional): If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.
Time Aggregation: The historical data rolled up over a selected period of time.
Operator: Both logical and relational operators are supported.
Value: A static numerical value against which a condition is evaluated.
The table below displays supported time aggregation functions, group
aggregation functions, and relational operators:
Time Aggregation Function | Group Aggregation Function | Relational Operator |
---|---|---|
timeAvg() | avg() | = |
min() | min() | < |
max() | max() | > |
sum() | sum() | <= |
| | >= |
| | != |
The format is:
condition1 AND condition2
condition1 OR condition2
NOT condition1
The order of operations can also be altered via parenthesis:
NOT (condition1 AND (condition2 OR condition3))
Conditions take the following form:
groupAggregation(timeAggregation(metric.name)) operator value
Example Expressions
Several examples of advanced alerts are given below:
timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75
timeAvg(cpu.used.percent) > 50 OR timeAvg(memory.used.percent) > 75
timeAvg(container.count) != 10
min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
sum(file.bytes.total) > 0 OR sum(net.bytes.total) > 0
timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
6.1 - Legacy Downtime Alert
Sysdig Monitor continuously surveils any type of entity in your
infrastructure, such as a host, a container, a process, or a service,
and sends notifications when the monitored entity is not available or
responding. Downtime alert focuses mainly on unscheduled downtime of
your infrastructure.

In this example, a Kubernetes cluster is monitored and the alert is
segmented on both cluster and namespace. When a Kubernetes cluster in
the selected availability zone goes down, notifications will be sent
with necessary information on both cluster and affected namespace.
The lines shown in the preview chart represent the values for the
segments selected to monitor. The popup is a color-coded legend to show
which segment (or combination of segments if there is more than one) the
lines represent. You can also deselect some segment lines to prevent
them from showing in the chart. Note that there is a limit of 10 lines
that Sysdig Monitor ever shows in the preview chart. For downtime
alerts, segments are actually what you select for the “Select entity
to monitor” option.
Define a Downtime Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Severity: Set a severity level for your alert. The
Priority—High, Medium, Low, and Info—are reflected
in the Alert list, where you can sort by the severity of the Alert.
You can use severity as a criterion when creating alerts, for
example: if there are more than 10 high severity events, notify.
Specify multiple segments: Selecting a single segment might not
always supply enough information to troubleshoot. Enrich the
selected entity with related information by adding additional
related segments. Enter hierarchical entities so you have a top-down picture of what went wrong and where. For example,
specifying a Kubernetes Cluster alone does not provide the context
necessary to troubleshoot. In order to narrow down the issue, add
further contextual information, such as Kubernetes Namespace,
Kubernetes Deployment, and so on.
Specify Entity
Select an entity whose downtime you want to monitor.
In this example, you are monitoring the unscheduled downtime of a
host.
Specify additional segments:

The specified entities are used for segmentation and appear in the default notification template as well as in the Preview. In this example, data is segmented on Kubernetes cluster name and namespace name. When a cluster is affected, the notification will include not only the affected cluster details but also the associated namespaces.
Filter the environment on which this alert will apply. An alert will
fire when a host goes down in the availability zone, us-east-1b.

Use in or contain operators to match multiple different possible
values to apply scope.
The contain and not contain operators help you retrieve values if you know part of the value. For example, us retrieves values that contain the string “us”, such as “us-east-1b”, “us-west-2b”, and so on.
The in and not in operators help you filter multiple values.
You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
Define the threshold and time window for assessing the alert condition.
Supported time scales are minute, hour, or day.

If the monitored host or Kubernetes cluster is not available or not
responding for the last 10 minutes, recipients will be notified.
You can set any value for the percentage and any value greater than 1 for the time window. For example, if you choose 50% instead of 100%, a notification will be triggered when the entity is down for 5 minutes within the selected time window of 10 minutes.
Use Cases
- Your e-commerce website is down during the peak hours of the Black Friday, Christmas, or New Year season.
- Production servers in your data center experience a critical outage.
- A MySQL database is unreachable.
- File upload does not work on your marketing website.
6.2 - Legacy PromQL Alerts
Sysdig Monitor enables you to use PromQL to define metric expressions
that you can alert on. You define the alert conditions using the
PromQL-based metric expression. This way, you can combine different
metrics and warn on cases like service-level agreement breach, running
out of disk space in a day, and so on.
Examples
For PromQL alerts, you can use any metric that is available in PromQL,
including Sysdig native metrics. For more details
see the various integrations available on
promcat.io.
Low Disk Space Alert
Warn if disk space is predicted to fall below a specified quantity. For example, disk space dropping below 10 GB within the next 24 hours:
predict_linear(sysdig_fs_free_bytes{fstype!~"tmpfs"}[1h], 24*3600) < 10000000000
Slow Etcd Requests
Notify if etcd requests are slow. This example uses the promcat.io integration.
histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m])) > 0.15
High Heap Usage
Warn when the heap usage in ElasticSearch is more than 80%. This example
uses the promcat.io
integration.
(elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 80
Guidelines
Sysdig Monitor does not currently support the following:
- Interacting with the Prometheus Alertmanager or importing Alertmanager configuration.
- Using, copying, pasting, or importing predefined alert rules.
- Converting alert rules to map to the Sysdig alert editor.
Create a PromQL Alert
Set a meaningful name and description that help recipients easily
identify the alert.
Set a Priority
Select a priority for the alert that you are creating. The supported
priorities are High, Medium, Low, and Info. You can also view events
in the Dashboards and Explore UI and sort them by severity.
Define a PromQL Alert
PromQL: Enter a valid PromQL expression. The query will be executed
every minute. However, the alert will be triggered only if the query
returns data for the specified duration.

In this example, you will be alerted when the rate of HTTP requests has
doubled over the last 5 minutes.
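A query of that shape, using http_requests_total purely as a placeholder metric name rather than the exact query shown in the example, might look like:
sum(rate(http_requests_total[5m])) > 2 * sum(rate(http_requests_total[5m] offset 5m))
It compares the request rate over the last 5 minutes with the rate in the preceding 5 minutes and returns data only when the former is more than double the latter.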
Duration: Specify the time window for evaluating the alert condition
in minutes, hours, or days. The alert will be triggered only if the query
returns data for the specified duration.
Define Notification
Notification Channels: Select from the configured notification
channels in the list.
Re-notification Options: Set the time interval at which multiple
alerts should be sent if the problem remains unresolved.
Notification Message & Events: Enter a subject and body. Optionally,
you can choose an existing template for the body. Modify the subject,
body, or both for the alert notification with a hyperlink, plain text,
or dynamic variables.
Import Prometheus Alert Rules
Sysdig Alert allows you to import Prometheus rules or create new rules
on the fly and add them to the existing list of alerts. Click the
Upload Prometheus Rules option and enter the rules as YAML in the
Upload Prometheus Rules YAML editor. Importing your Prometheus alert
rules will convert them to PromQL-based Sysdig alerts. Ensure that the
alert rules are valid YAML.

You can upload one or more alert rules in a single YAML and create
multiple alerts simultaneously.

Once the rules are imported to Sysdig Monitor, the alert list will be
automatically sorted by last modified date.

Besides the pre-populated template, each rule specified in the Upload
Prometheus Rules YAML editor uses the standard Prometheus rule fields:
alert and expr, plus optional for, labels, and annotations.
See the following examples to understand the format of Prometheus Rules
YAML. Ensure that the alert rules are valid YAML to pass validation.
Example: Alert Prometheus Crash Looping
To alert on potential Prometheus crash looping, create a rule that fires
when Prometheus restarts more than twice in the last 10 minutes.
groups:
  - name: crashlooping
    rules:
      - alert: PrometheusTooManyRestarts
        expr: changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[10m]) > 2
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Prometheus too many restarts (instance {{ $labels.instance }})
          description: Prometheus has restarted more than twice in the last 10 minutes. It might be crashlooping.\n VALUE = {{ $value }}\n
Example: Alert HTTP Error Rate
To alert on HTTP requests with status 5xx (more than 5%) or on high latency:
groups:
  - name: default
    rules:
      # Paste your rules here
      - alert: NginxHighHttp5xxErrorRate
        expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})
          description: Too many HTTP requests with status 5xx
      - alert: NginxLatencyHigh
        expr: histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[2m])) by (host, node)) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Nginx latency high (instance {{ $labels.instance }})
          description: Nginx p99 latency is higher than 3 seconds
6.3 - Legacy Metric Alerts
Sysdig Monitor watches time-series metrics and alerts if they violate
user-defined thresholds.

The lines shown in the preview chart represent the values for the
segments selected to monitor. The popup is a color-coded legend that shows
which segment (or combination of segments, if there is more than one) each
line represents. You can also deselect segment lines to hide them from the
chart. Sysdig Monitor shows at most 10 lines in the preview chart.
Defining a Metric Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Specify multiple segments: Selecting a single segment might not
always supply enough information to troubleshoot. Enrich the
selected entity with related information by adding additional
related segments. Enter hierarchical entities so you have a
top-down picture of what went wrong and where. For example,
specifying a Kubernetes Cluster alone does not provide the context
necessary to troubleshoot. To narrow down the issue, add
further contextual information, such as Kubernetes Namespace,
Kubernetes Deployment, and so on.
Specify Metrics
Select a metric that this alert will monitor. You can also define how
data is aggregated, such
as avg, max, min, or sum. To alert on multiple metrics using boolean
logic, switch to a multi-condition alert.
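As a point of comparison only, the difference between aggregations can be sketched in PromQL using the sysdig_fs_free_bytes metric from the PromQL examples above (the 10 GB threshold is illustrative):
avg(sysdig_fs_free_bytes{fstype!~"tmpfs"}) < 10000000000
min(sysdig_fs_free_bytes{fstype!~"tmpfs"}) < 10000000000
The first expression fires when the average free space across the scope drops below 10 GB; the second fires as soon as any single file system does.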
Filter the environment on which this alert will apply. For example, scope
the alert to the availability zone us-east-1b.

Use advanced operators to include, exclude, or pattern-match groups,
tags, and entities. See Multi-Condition
Alerts.
You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
Define the threshold and time window for assessing the alert condition.
Single Alert fires an alert for your entire scope, while Multiple Alert
fires if any or every segment breaches the threshold.
Metric alerts can be triggered based on different aggregations:
on average | The average of the retrieved metric values across the time period. The actual number of samples retrieved is used to calculate the value. For example, if new data is retrieved only from the 7th minute of a 10-minute window and the alert is defined as on average, the alert is calculated by summing the 3 recorded values and dividing by 3. |
as a rate | The average value of the metric across the time period evaluated. The expected number of samples is used to calculate the value. For example, if new data is retrieved only from the 7th minute of a 10-minute window and the alert is defined as as a rate, the alert is calculated by summing the 3 recorded values and dividing by 10 (10 x 1-minute samples). |
in sum | The combined sum of the metric across the time period evaluated. |
at least once | The trigger value is met for at least one sample in the evaluated period. |
for the entire time | The trigger value is met for every sample in the evaluated period. |
as a rate of change | The trigger value is met by the change in value over the evaluated period. |
For example, if the file system used percentage goes above 75 on average
for the last 5 minutes, multiple alerts are triggered. The MAC address of
the host and the mount directory of the file system are included in the
alert notification.
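Expressed as a rough PromQL equivalent, assuming a used-percent gauge such as sysdig_fs_used_percent (the metric name is an assumption here), the condition above looks like:
avg_over_time(sysdig_fs_used_percent[5m]) > 75
Each time series that breaches the threshold, that is, each host and mount combination, corresponds to one of the multiple alerts.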

6.4 - Legacy Event Alerts
Monitor occurrences of specific events, and alert if the total number of
occurrences violates a threshold. Useful for alerting on container,
orchestration, and service events like restarts and deployments.
Alerts on events support only one segmentation label. An alert is
generated for each segment.

Defining an Event Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Severity: Set a severity level for your alert. The priorities High,
Medium, Low, and Info are reflected in the Alert list, where you can
sort by severity using the top navigation pane. You can use severity as
a criterion when creating events and alerts, for example: if there are
more than 10 high-severity events, notify.
Event Source: Filter by one or more event sources
that should be considered by the alert. Predefined options are included for
infrastructure event sources
(kubernetes, docker, and containerd), but you can freely specify other values to match
custom event sources.
Trigger: Specify the trigger condition in terms of the number of
events for a given duration.
Event alerts support only one segmentation label. If you choose
Multiple Alerts, Sysdig generates one alert for each selected segment.
Specify Event
Specify the name, tag, or description of an event.

Specify one or more Event Sources.
Filter the environment on which this alert will apply. Use advanced
operators to include, exclude, or pattern-match groups, tags, and
entities. You can also create alerts directly from Explore and
Dashboards for automatically populating this scope.

In this example, failing a liveness probe in the
agent-process-whitelist-cluster cluster triggers an alert.
Define the threshold and time window for assessing the alert condition.
Single Alert fires an alert for your entire scope, while Multiple Alert
fires if any or every segment breaches the threshold.

If the number of events triggered in the monitored entity is greater
than 5 for the last 10 minutes, recipients will be notified through the
selected channel.
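Event alerts count matching event records directly; there is no query to write. As a loose analogy only, a similar more-than-5-occurrences-in-10-minutes condition on container restarts could be sketched with a kube-state-metrics counter, where the kube_cluster_name label is an assumption:
sum by (kube_cluster_name) (increase(kube_pod_container_status_restarts_total[10m])) > 5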
6.5 - Legacy Anomaly Detection Alerts
An anomaly is an outlier in a given data set polled from an environment:
a deviation from an established pattern. Anomaly detection is about
identifying these anomalous observations. Anomalies can surface as a
collective set of data points, a single data point, or context-specific
abnormalities. Examples include unauthorized copying of a directory from
a container and unusually high CPU or memory consumption.

Define an Anomaly Detection Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Severity: Set a severity level for your alert. The priorities High,
Medium, Low, and Info are reflected in the Alert list, where you can
sort by severity using the top navigation pane. You can use severity as
a criterion when creating events and alerts, for example: if there are
more than 10 high-severity events, notify.
Specify multiple segments: Selecting a single segment might not
always supply enough information to troubleshoot. Enrich the
selected entity with related information by adding additional
related segments. Enter hierarchical entities so you have a
top-down picture of what went wrong and where. For example,
specifying a Kubernetes Cluster alone does not provide the context
necessary to troubleshoot. To narrow down the issue, add
further contextual information, such as Kubernetes Namespace,
Kubernetes Deployment, and so on.
Specify Entity
Select one or more metrics whose behavior you want to monitor.
Filter the environment on which this alert will apply. An alert fires
when the value returned by one of the selected metrics does not follow
the expected pattern within the availability zone us-east-1b.

You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
Trigger gives you control over how notifications are created and helps
prevent flooding your notification channel. For example, you may want to
receive a notification for every violation, or only a single notification
for a series of consecutive violations.
Define the threshold and time window for assessing the alert condition.
Supported time scales are minute, hour, or day.

If the monitored host or Kubernetes cluster is not available or not
responding for the last 5 minutes, recipients will be notified.
You can set any value for the percentage and a value greater than 1 for the
time window. For example, if you choose 50% instead of 100%, a notification
is triggered when the entity is down for 2.5 minutes within the selected
5-minute time window.
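Sysdig's anomaly detection models the baseline for you, so you do not write an expression. Purely as a conceptual sketch, a z-score-style PromQL check that flags values deviating more than three standard deviations from a one-hour rolling average, reusing a metric name from the examples above, would look like:
abs(sysdig_fs_free_bytes - avg_over_time(sysdig_fs_free_bytes[1h])) > 3 * stddev_over_time(sysdig_fs_free_bytes[1h])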
6.6 - Legacy Group Outlier Alerts
Sysdig Monitor observes a group of hosts and notifies you when one acts
differently from the rest.

Define a Group Outlier Alert
Guidelines
Set a unique name and description: Set a meaningful name and
description that help recipients easily identify the alert.
Severity: Set a severity level for your alert. The priorities High,
Medium, Low, and Info are reflected in the Alert list, where you can
sort by severity using the top navigation pane. You can use severity as
a criterion when creating events and alerts, for example: if there are
more than 10 high-severity events, notify.
Specify Entity
Select one or more metrics whose behavior you want to monitor.
Filter the environment on which this alert will apply. An alert fires
when the value returned by one of the selected metrics does not follow
the expected pattern within the availability zone us-east-1b.

You can also create alerts directly from Explore and Dashboards for
automatically populating this scope.
Trigger gives you control over how notifications are created and helps
prevent flooding your notification channel. For example, you may want to
receive a notification for every violation, or only a single notification
for a series of consecutive violations.
Define the threshold and time window for assessing the alert condition.
Supported time scales are minute, hour, or day.

If the monitored host or Kubernetes cluster is not available or not
responding for the last 5 minutes, recipients will be notified.
You can set any value for the percentage and a value greater than 1 for the
time window. For example, if you choose 50% instead of 100%, a notification
is triggered when the entity is down for 2.5 minutes within the selected
5-minute time window.
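Group outlier alerts likewise compute the group baseline internally. As a conceptual sketch only, flagging any host whose value is more than 50% above the group average could be written in PromQL as follows, where host_cpu_used_percent is a hypothetical per-host gauge:
host_cpu_used_percent > 1.5 * scalar(avg(host_cpu_used_percent))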
Use Cases
Load balancer servers have uneven workloads.
Changes in applications or instances deployed in different
availability zones.
Network-hogging hosts in a cluster.