Agent Configuration
Out of the box, the Sysdig agent will gather and report on a wide variety of pre-defined metrics. It can also accommodate any number of custom parameters for additional metrics collection.

Use this section when you need to change the default or pre-defined settings by editing the agent configuration files, or for other special circumstances. Integrations for Sysdig Monitor also require editing the agent config files.

By default, the Sysdig agent is configured to collect metric data from a range of platforms and applications. You can edit the agent config files to extend the default behavior, including additional metrics for JMX, StatsD, Prometheus, or a wide range of other applications. You can also monitor log files for targeted text strings.
1 - Understand the Agent Configuration
The agent relies on a pair of configuration files to define metrics collection parameters:

File | Description |
---|---|
dragent.default.yaml | The core configuration file. You can look at it to understand more about the default configurations provided. Location: /opt/draios/etc/dragent.default.yaml. CAUTION: This file should never be edited. |
dragent.yaml or configmap.yaml (Kubernetes) | The configuration file where parameters can be added, either directly in YAML as name/value pairs, or using environment variables such as ADDITIONAL_CONF. Location: /opt/draios/etc/dragent.yaml. |
The dragent.yaml file can be accessed and edited in several ways, depending on how the agent was installed. This document describes how to modify dragent.yaml.

One additional file, dragent.auto.yaml, is also created and used in special circumstances. See Optional: Agent Auto-Config for more detail.
Access and Edit the Configuration File
There are various ways to add or edit parameters in dragent.yaml.
Option 1: With dragent.yaml (for testing)
It is possible to edit the container's file directly on the host. Add parameters directly in YAML.

Access dragent.yaml directly at /opt/draios/etc/dragent.yaml.

Edit the file, using proper YAML syntax. See the examples at the bottom of the page.

Restart the agent for the changes to take effect.
Option 2: With configmap.yaml (Kubernetes)

configmap.yaml is the configuration file where parameters can be added, either directly in YAML as name/value pairs, or using environment variables such as ADDITIONAL_CONF.

If you install agents as DaemonSets on a system running Kubernetes, you use configmap.yaml to connect with and manipulate the underlying dragent.yaml file.
See also: Agent Install: Kubernetes | GKE | OpenShift |
IBM
Add parameters directly in YAML. Edit the file locally and apply the changes with kubectl apply -f.

Access the configmap.yaml.

Edit the file as needed.

Apply the changes:

kubectl apply -f sysdig-agent-configmap.yaml
Running agents will automatically pick the new configuration after
Kubernetes pushes the changes across all the nodes in the cluster.
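As a sketch of this workflow, the fragment below writes a minimal ConfigMap and sanity-checks it before it would be applied. The file name matches the sysdig-agent-configmap.yaml used above and the data layout follows the transformation shown later on this page, but the values themselves are illustrative only:

```shell
# Write a minimal sysdig-agent-configmap.yaml (illustrative values) and
# sanity-check it before applying with kubectl.
tmp=$(mktemp -d)
cat > "$tmp/sysdig-agent-configmap.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: sysdig-agent
data:
  dragent.yaml: |
    tags: linux:ubuntu,dept:dev,local:nyc
    statsd:
      enabled: false
EOF
grep -c 'enabled: false' "$tmp/sysdig-agent-configmap.yaml"   # prints: 1
# then: kubectl apply -f sysdig-agent-configmap.yaml
```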
Option 3: With Docker Run (Docker)
Add -e ADDITIONAL_CONF="<VARIABLES>" to a Docker run command, where <VARIABLES> contains all the customized parameters you want to include, in a single-line format.

To insert ADDITIONAL_CONF parameters in a Docker run command or a daemonset file, you must convert the YAML code into a single-line format. You can do the conversion manually for short snippets. To convert longer portions of YAML, use echo | sed commands.
In earlier versions, the Sysdig Agent connected to port 6666. This
behavior has been deprecated, as the Sysdig agent now connects to port
6443.
The basic procedure:

Write your configuration in YAML, as it would be entered directly in dragent.yaml.

In a bash shell, use echo and sed to convert it to a single line. The sed script:

echo "" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g'

Insert the resulting line into a Docker run command or add it to the daemonset file as an ADDITIONAL_CONF.
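As a quick check of this recipe, the small statsd snippet used in the Simple example can be run through the same pipeline (a bash-compatible shell is assumed):

```shell
# Convert a small YAML snippet (from this page) into the single-line
# ADDITIONAL_CONF format by replacing real newlines with literal \n.
yaml='statsd:
 enabled: false'
single=$(echo "$yaml" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g')
echo "$single"   # prints: statsd:\n enabled: false
```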
Example: Simple
Insert parameters to turn off StatsD collection and blacklist port 6443.
Sysdig agent uses port 6443 for both inbound and outbound communication with the Sysdig backend. The agent initiates a request and keeps a connection open with the Sysdig backend for the backend to push configurations, Falco rules, policies, and so on.
Ensure that you allow the agents’ inbound and outbound communication on TCP 6443 from the respective IPs associated with your SaaS Regions. Note that you are allowing the agent to send communication outbound on TCP 6443 to the inbound IP ranges listed in the SaaS Regions.
YAML format
statsd:
  enabled: false
  blacklisted_ports:
    - 6443
Single-line format (manual)
Use spaces, hyphens, and \n correctly when manually converting to a single line:
ADDITIONAL_CONF="statsd:\n enabled: false\n blacklisted_ports:\n - 6443"
Here the single line is incorporated into a full agent startup Docker
command.
docker run \
--name sysdig-agent \
--privileged \
--net host \
--pid host \
-e ACCESS_KEY=1234-your-key-here-1234 \
-e TAGS=dept:sales,local:NYC \
-e ADDITIONAL_CONF="statsd:\n enabled: false\n blacklisted_ports:\n - 6443" \
-v /var/run/docker.sock:/host/var/run/docker.sock \
-v /dev:/host/dev \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
quay.io/sysdig/agent
Example: Complex
Insert parameters to override the default configuration for a RabbitMQ
app check.
YAML format
app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: myuser
      rabbitmq_pass: mypassword
      queues:
        - MyQueue1
        - MyQueue2
Single-line format (echo | sed)
From a bash shell, issue the echo command and sed script.
echo "app_checks:
- name: rabbitmq
pattern:
port: 15672
conf:
rabbitmq_api_url: "http://localhost:15672/api/"
rabbitmq_user: myuser
rabbitmq_pass: mypassword
queues:
- MyQueue1
- MyQueue2
" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g'
This results in the single-line format to be used with ADDITIONAL_CONF
in a Docker command or daemonset file.
"app_checks:\n - name: rabbitmq\n pattern:\n port: 15672\n conf:\n rabbitmq_api_url: http://localhost:15672/api/\n rabbitmq_user: myuser\n rabbitmq_pass: mypassword\n queues:\n - MyQueue1\n - MyQueue2\n"
If you installed the Sysdig agent in Kubernetes using a Helm chart, then no configmap.yaml file was downloaded. You edit dragent.yaml using Helm syntax:

Example

$ helm install \
  --name sysdig-agent \
  --set sysdig.settings.tags='linux:ubuntu\,dept:dev\,local:nyc' \
  --set clusterName='my_cluster' \
  sysdig/sysdig

This will be transformed into:

data:
  dragent.yaml: |
    tags: linux:ubuntu,dept:dev,local:nyc
    k8s_cluster_name: my_cluster
Table 1: Environment Variables for Agent Config File

Variable | Value | Description |
---|---|---|
ACCESS_KEY | <your Sysdig access key> | Required. |
TAGS | <meaningful tags you want applied to your instances> | Optional. These are displayed in Sysdig Monitor for ease of use. For example: tags: linux:ubuntu,dept:dev,local:nyc. See sysdig-agent-configmap.yaml. |
COLLECTOR | <collector-hostname.com> or 111.222.333.400 | Enter the host name or IP address of the Sysdig collector service. Note that when used within dragent.yaml, it must be lowercase collector. For SaaS regions, see: SaaS Regions and IP Ranges. |
COLLECTOR_PORT | 6443 | On-prem only. The port used by the Sysdig collector service; default 6443. |
SECURE | "true" | On-prem only. If using SSL/TLS to connect to the collector service, set to "true"; otherwise "false". |
CHECK_CERTIFICATE | "false" | On-prem only. Set to "true" when using SSL/TLS to connect to the collector service and the agent should check for a valid SSL/TLS certificate. |
ADDITIONAL_CONF | | Optional. A place to provide custom configuration values to the agent as environment variables. |
SYSDIG_PROBE_URL | | Optional. An alternative URL from which to download the precompiled kernel module. |
Sample Docker Command Using Variables
docker run \
--name sysdig-agent \
--privileged \
--net host \
--pid host \
-e ACCESS_KEY=3e762f9a-3936-4c60-9cf4-c67e7ce5793b \
-e COLLECTOR=mycollector.elb.us-west-1.amazonaws.com \
-e COLLECTOR_PORT=6443 \
-e CHECK_CERTIFICATE=false \
-e TAGS=my_tag:some_value \
-e ADDITIONAL_CONF="log:\n file_priority: debug\n console_priority: error" \
-v /var/run/docker.sock:/host/var/run/docker.sock \
-v /dev:/host/dev \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
--shm-size=350m \
quay.io/sysdig/agent
2 - Agent Modes

Agent modes provide the ability to control metric collection to fit your scale and specific requirements. You can choose one of the following modes to do so:
Monitor
Monitor Light
Troubleshooting
Secure
Custom Metrics Only
Using a stripped-down mode limits collection of unneeded metrics, which
in turn prevents the consumption of excess resources and helps reduce
expenses.
Monitor
The Monitor mode offers an extensive collection of metrics. We recommend this mode for monitoring enterprise environments.

monitor is the default mode if you are running the Enterprise tier. To switch back to the Monitor mode from a different mode, do one of the following:

Add the following to the dragent.yaml file and restart the agent:

feature:
  mode: monitor

Remove the parameter related to the existing mode from the dragent.yaml file and restart the agent. For example, to switch from troubleshooting mode to monitor, delete the following lines:

feature:
  mode: troubleshooting
Monitor Light
Monitor Light caters to users who run agents in a resource-restrictive environment, or to those who are interested only in a limited set of metrics.

Monitor Light provides CPU, Memory, File, File system, and Network metrics. For more information, see Metrics Available in Monitor Light.

Enable Monitor Light Mode

To switch to the Monitor Light mode, edit the dragent.yaml file:

Open the dragent.yaml file.

Add the following configuration parameter:

feature:
  mode: monitor_light

Restart the agent.
Troubleshooting
Troubleshooting mode offers sophisticated metrics with detailed diagnostic capabilities. Some of these metrics are heuristic in nature.

In addition to the extensive metrics available in the Monitor mode, Troubleshooting mode provides additional metrics such as net.sql and additional segmentation for file and network metrics. For more information, see Additional Metrics Values Available in Troubleshooting.

Enable Troubleshooting Mode

To switch to the Troubleshooting mode, edit the dragent.yaml file:

Open the dragent.yaml file.

Add the following configuration parameter:

feature:
  mode: troubleshooting

Restart the agent.
Secure Mode
The Secure mode supports only Sysdig Secure features.

The Sysdig agent collects no metrics in the Secure mode, which, in turn, minimizes network consumption and storage requirements in the Sysdig backend. Lower resource usage can help reduce costs and improve performance.

In the Secure mode, the Monitor UI shows no data because no metrics are sent to the collector.

This feature requires agent v10.5.0 or above.

Enabling Secure Mode

Open the dragent.yaml file.

Add the following:

feature:
  mode: secure

Restart the agent.
Custom Metrics Only Mode
Custom Metrics Only mode collects the same metrics as the Monitor Light mode, but also adds the ability to collect the following:

- Custom Metrics: StatsD, JMX, App Checks, and Prometheus
- Kubernetes State Metrics

As such, Custom Metrics Only mode is suitable if you would like to use most of the features of Monitor mode but are limited in resources.

This mode is not compatible with Secure. If your account is configured for Secure, you must explicitly disable Secure in the agent configuration if you wish to use this mode.

This mode requires agent v12.4.0 or above.

Enabling Custom Metrics Only Mode

Open the dragent.yaml file.

Add the following configuration parameter:

feature:
  mode: custom-metrics-only

If your account is enabled for Secure, add the following:

security:
  enabled: false
secure_audit_streams:
  enabled: false
falcobaseline:
  enabled: false

This configuration explicitly disables the Secure features in the agent. If you do not disable Secure, the agent will not start due to incompatibility issues.

Restart the agent.
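The steps above can be sketched as follows. The snippet appends to a temporary copy rather than the real /opt/draios/etc/dragent.yaml, and the restart command is the one referenced elsewhere on this page:

```shell
# Append the Custom Metrics Only mode (plus the Secure disablement) to a
# copy of dragent.yaml; on a real host the file is /opt/draios/etc/dragent.yaml.
conf=$(mktemp)
cat >> "$conf" <<'EOF'
feature:
  mode: custom-metrics-only
security:
  enabled: false
secure_audit_streams:
  enabled: false
falcobaseline:
  enabled: false
EOF
grep -q 'mode: custom-metrics-only' "$conf" && echo 'mode configured'
# then restart the agent, e.g.: service dragent restart
```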
2.1 - Metrics Available in Monitor Light
Monitor Light provides cpu, memory, file, file system, and network
metrics.
Metrics | Description |
---|---|
cpu.cores.used | See System.System |
cpu.cores.used.percent | |
cpu.idle.percent | |
cpu.iowait.percent | |
cpu.nice.percent | |
cpu.stolen.percent | |
cpu.system.percent | |
cpu.used.percent | |
cpu.user.percent | |
load.average.percpu.1m | |
load.average.percpu.5m | |
load.average.percpu.15m | |
memory.bytes.available | |
memory.bytes.total | |
memory.bytes.used | |
memory.bytes.virtual | |
memory.pageFault.major | |
memory.pageFault.minor | |
memory.swap.bytes.available | |
memory.swap.bytes.total | |
memory.swap.bytes.used | |
memory.swap.used.percent | |
memory.used.percent | |
file.bytes.in | |
file.bytes.out | |
file.bytes.total | |
file.iops.in | |
file.iops.out | |
file.iops.total | |
file.open.count | |
file.time.in | |
file.time.out | |
file.time.total | |
fs.bytes.free | |
fs.bytes.total | |
fs.bytes.used | |
fs.free.percent | |
fs.inodes.total.count | |
fs.inodes.used.count | |
fs.inodes.used.percent | |
fs.largest.used.percent | |
fs.root.used.percent | |
fs.used.percent | |
net.bytes.in | |
net.bytes.out | |
net.bytes.total | |
proc.count | |
thread.count | |
container.count | |
system.uptime | |
uptime | |
2.2 - Additional Metrics Values Available in Troubleshooting
In addition to the extensive set of metrics available in the Monitor mode, additional metrics, such as net.sql and net.mongodb, as well as additional segmentations for file and network metrics, are available.
Metrics | Additional Metrics Values Available When Segmented by | Supported Agent Versions |
---|---|---|
file.error.total.count | file.name and file.mount labels | Version 10.1.0 or above |
file.bytes.total | | |
file.bytes.in | | |
file.bytes.out | | |
file.open.count | | |
file.time.total | | |
host.count | | |
host.error.count | | |
proc.count | | |
proc.start.count | | |
net.mongodb.collection | all | Version 10.2.0 or above |
net.mongodb.error.count | | |
net.mongodb.operation | | |
net.mongodb.request.count | | |
net.mongodb.request.time | | |
net.sql.query | all | |
net.sql.error.count | | |
net.sql.query.type | | |
net.sql.request.count | | |
net.sql.request.time | | |
net.sql.table | | |
net.http.error.count | net.http.url | Version 10.3.0 or above |
net.http.method | | |
net.http.request.count | | |
net.http.request.time | | |
net.bytes.in | | |
net.bytes.out | | |
net.request.time.worst.out | | |
net.request.count | | |
net.request.time | | |
net.bytes.total | | |
net.http.request.time.worst | all | |
2.3 - Metrics Not Available in Essentials Mode
The following metrics will not be reported in the essentials mode when compared with the monitor mode:
Metrics | Segmented By |
---|---|
net.bytes.in | net.connection.server , net.connection.direction , net.connection.l4proto , and net.connection.client labels |
net.bytes.out | |
net.connection.count.total | |
net.connection.count.in | |
net.connection.count.out | |
net.request.count | |
net.request.count.in | |
net.request.count.out | |
net.request.time | |
net.request.time.in | |
net.request.time.out | |
net.bytes.total | |
net.mongodb.collection | all |
net.mongodb.error.count | |
net.mongodb.operation | |
net.mongodb.request.count | |
net.mongodb.request.time | |
net.sql.query | all |
net.sql.error.count | |
net.sql.query.type | |
net.sql.request.count | |
net.sql.request.time | |
net.sql.table | |
net.http.method | |
net.http.request.count | |
net.http.request.time | |
net.http.statusCode | |
net.http.url | |
3 - Enable HTTP Proxy for Agents
You can configure the agent to communicate with the Sysdig collector through an HTTP proxy. An HTTP proxy is usually configured to offer greater visibility and better management of the network.

Agent Behavior

The agent can connect to the collector through an HTTP proxy by sending an HTTP CONNECT message and receiving a response. The proxy then initiates a TCP connection to the collector. These two connections form a tunnel that acts like one logical connection.
By default, the agent will encrypt all messages sent through this tunnel. This means that after the initial CONNECT message and response, all the communication on that tunnel is encrypted by SSL end-to-end. This encryption is controlled by the top-level ssl parameter in the agent configuration.

Optionally, the agent can add a second layer of encryption, securing the CONNECT message and response. This second layer of encryption may be desired in the case of HTTP authentication, if there is a concern that network packet sniffing could be used to determine the user's credentials. This second layer of encryption is enabled by setting the ssl parameter to true in the http_proxy section of the agent configuration. See Examples for details.
Configuration
You specify the following parameters at the same level as http_proxy in the dragent.yaml file. These existing configuration options affect the communication between the agent and collector (both with and without a proxy).

- ssl: If set to false, the metrics sent from the agent to the collector are unencrypted (default is true).
- ssl_verify_certificate: Determines whether the agent verifies the SSL certificate sent from the collector (default is true).

The following configuration options affect the behavior of the HTTP proxy setting. You specify them under the http_proxy heading in the dragent.yaml file.
- proxy_host: Indicates the hostname of the proxy server. The default is an empty string, which implies that communication through an HTTP proxy is disabled.
- proxy_port: Specifies the port on the proxy server the agent should connect to. The default is 0, which indicates that the HTTP proxy is disabled.
- proxy_user: Required if HTTP authentication is configured. This option specifies the username for the HTTP authentication. The default is an empty string, which indicates that authentication is not configured.
- proxy_password: Required if HTTP authentication is configured. This option specifies the password for the HTTP authentication. The default is an empty string. Specifying proxy_user with no proxy_password is allowed.
- ssl: If set to true, the connection between the agent and the proxy server is encrypted. Note that this parameter requires the top-level ssl parameter to be enabled, as the agent does not support SSL to the proxy combined with unencrypted traffic to the collector. This requirement prevents a misconfiguration in which you assume the metrics are encrypted end-to-end when they are not.
- ssl_verify_certificate: Determines whether the agent will verify the certificate presented by the proxy. This option is configured independently of the top-level ssl_verify_certificate parameter, and is enabled by default. If the provided certificate is not correct, this option can cause the connection to the proxy server to fail.
- ca_certificate: The path to the CA certificate for the proxy server. If ssl_verify_certificate is enabled, the CA certificate must be signed appropriately.
Examples
No SSL
The following example shows no SSL connection between the agent and the proxy server, as well as between the proxy server and the collector.

collector_port: 6667
ssl: false
http_proxy:
  proxy_host: squid.yourdomain.com
  proxy_port: 3128
  ssl: false
SSL Between Proxy and Collector
In this example, SSL is enabled only between the proxy server and the collector.

collector_port: 6443
ssl: true
ssl_verify_certificate: true
http_proxy:
  proxy_host: squid.yourdomain.com
  proxy_port: 3128
SSL
The following example shows SSL enabled between the agent and the proxy server, as well as between the proxy server and the collector.

collector_port: 6443
ssl: true
http_proxy:
  proxy_host: squid.yourdomain.com
  proxy_port: 3129
  ssl: true
  ssl_verify_certificate: true
  ca_certificate: /usr/proxy/proxy.crt
SSL with Username and Password
The following configuration instructs the agent to connect to a proxy server located at squid.yourdomain.com on port 3128. The agent will request the proxy server to establish an HTTP tunnel to the Sysdig collector at collector-your.sysdigcloud.com on port 6443. The agent will authenticate with the proxy server using the given user and password combination.

collector: collector-your.sysdigcloud.com
collector_port: 6443
http_proxy:
  proxy_host: squid.yourdomain.com
  proxy_port: 3128
  proxy_user: sysdig_customer
  proxy_password: 12345
  ssl: true
  ssl_verify_certificate: true
  ca_certificate: /usr/proxy/proxy_cert.crt
4 - Filter Data

The dragent.yaml file elements are wide-reaching. This section describes the parameters to edit in dragent.yaml to perform a range of activities:
4.1 - Blacklist Ports
Use the blacklisted_ports parameter in the agent configuration file to block network traffic and metrics from unnecessary network ports.

Note: Port 53 (DNS) is always blacklisted.

Access the agent configuration file, using one of the options listed.

Add blacklisted_ports with the desired port numbers.

Example (YAML):

blacklisted_ports:
  - 6443
  - 6379

Restart the agent (if editing the dragent.yaml file directly), using either the service dragent restart or docker restart sysdig-agent command, as appropriate.
4.2 - Enable/Disable Event Data
Sysdig Monitor supports event integrations with certain applications by default. The Sysdig agent will automatically discover these services and begin collecting event data from them.

The following applications are currently supported:

Other methods of ingesting custom events into Sysdig Monitor are touched upon in Custom Events.

By default, only a limited set of events is collected for a supported application; these are listed in the agent's default settings configuration file (/opt/draios/etc/dragent.default.yaml).

To enable collecting other supported events, add an events entry to dragent.yaml.

You can also change the log entry in dragent.yaml to filter events by severity.

Learn more about it in the following sections.
Supported Application Events
Events marked with * are enabled by default; see the dragent.default.yaml file.
Docker Events
The following Docker events are supported.
docker:
  container:
    - attach # Container Attached (information)
    - commit # Container Committed (information)
    - copy # Container Copied (information)
    - create # Container Created (information)
    - destroy # Container Destroyed (warning)
    - die # Container Died (warning)
    - exec_create # Container Exec Created (information)
    - exec_start # Container Exec Started (information)
    - export # Container Exported (information)
    - kill # Container Killed (warning)*
    - oom # Container Out of Memory (warning)*
    - pause # Container Paused (information)
    - rename # Container Renamed (information)
    - resize # Container Resized (information)
    - restart # Container Restarted (warning)
    - start # Container Started (information)
    - stop # Container Stopped (information)
    - top # Container Top (information)
    - unpause # Container Unpaused (information)
    - update # Container Updated (information)
  image:
    - delete # Image Deleted (information)
    - import # Image Imported (information)
    - pull # Image Pulled (information)
    - push # Image Pushed (information)
    - tag # Image Tagged (information)
    - untag # Image Untagged (information)
  volume:
    - create # Volume Created (information)
    - mount # Volume Mounted (information)
    - unmount # Volume Unmounted (information)
    - destroy # Volume Destroyed (information)
  network:
    - create # Network Created (information)
    - connect # Network Connected (information)
    - disconnect # Network Disconnected (information)
    - destroy # Network Destroyed (information)
Kubernetes Events
The following Kubernetes events are supported.
kubernetes:
  node:
    - TerminatedAllPods # Terminated All Pods (information)
    - RegisteredNode # Node Registered (information)*
    - RemovingNode # Removing Node (information)*
    - DeletingNode # Deleting Node (information)*
    - DeletingAllPods # Deleting All Pods (information)
    - TerminatingEvictedPod # Terminating Evicted Pod (information)*
    - NodeReady # Node Ready (information)*
    - NodeNotReady # Node not Ready (information)*
    - NodeSchedulable # Node is Schedulable (information)*
    - NodeNotSchedulable # Node is not Schedulable (information)*
    - CIDRNotAvailable # CIDR not Available (information)*
    - CIDRAssignmentFailed # CIDR Assignment Failed (information)*
    - Starting # Starting Kubelet (information)*
    - KubeletSetupFailed # Kubelet Setup Failed (warning)*
    - FailedMount # Volume Mount Failed (warning)*
    - NodeSelectorMismatching # Node Selector Mismatch (warning)*
    - InsufficientFreeCPU # Insufficient Free CPU (warning)*
    - InsufficientFreeMemory # Insufficient Free Mem (warning)*
    - OutOfDisk # Out of Disk (information)*
    - HostNetworkNotSupported # Host Ntw not Supported (warning)*
    - NilShaper # Undefined Shaper (warning)*
    - Rebooted # Node Rebooted (warning)*
    - NodeHasSufficientDisk # Node Has Sufficient Disk (information)*
    - NodeOutOfDisk # Node Out of Disk Space (information)*
    - InvalidDiskCapacity # Invalid Disk Capacity (warning)*
    - FreeDiskSpaceFailed # Free Disk Space Failed (warning)*
  pod:
    - Pulling # Pulling Container Image (information)
    - Pulled # Ctr Img Pulled (information)
    - Failed # Ctr Img Pull/Create/Start Fail (warning)*
    - InspectFailed # Ctr Img Inspect Failed (warning)*
    - ErrImageNeverPull # Ctr Img NeverPull Policy Violate (warning)*
    - BackOff # Back Off Ctr Start, Image Pull (warning)
    - Created # Container Created (information)
    - Started # Container Started (information)
    - Killing # Killing Container (information)*
    - Unhealthy # Container Unhealthy (warning)
    - FailedSync # Pod Sync Failed (warning)
    - FailedValidation # Failed Pod Config Validation (warning)
    - OutOfDisk # Out of Disk (information)*
    - HostPortConflict # Host/Port Conflict (warning)*
  replicationController:
    - SuccessfulCreate # Pod Created (information)*
    - FailedCreate # Pod Create Failed (warning)*
    - SuccessfulDelete # Pod Deleted (information)*
    - FailedDelete # Pod Delete Failed (warning)*
Enable/Disable Events Collection with events Parameter
To customize the default events collected for a specific application (by either enabling or disabling events), add an events entry to dragent.yaml as described in the examples below.
An entry in a section in dragent.yaml overrides the entire section in the default configuration.

For example, the Pulling entry below will permit only Kubernetes pod Pulling events to be collected, and all other Kubernetes pod events settings in dragent.default.yaml will be ignored. However, other Kubernetes sections - node and replicationController - remain intact and will be used as specified in dragent.default.yaml.
Example 1: Collect Only Certain Events
Collect only ‘Pulling’ events from Kubernetes for pods:
events:
  kubernetes:
    pod:
      - Pulling
Example 2: Disable All Events in a Section
To disable all events in a section, set the event section to none:
events:
  kubernetes: none
  docker: none
Example 3: Combine Methods
These methods can be combined. For example, disable all kubernetes node and docker image events, and limit docker container events to [attach, commit, copy] (component events in other sections will be collected as specified by default):
events:
  kubernetes:
    node: none
  docker:
    image: none
    container:
      - attach
      - commit
      - copy
In addition to bulleted lists, sequences can also be specified in a bracketed single line, e.g.:

events:
  kubernetes:
    pod: [Pulling, Pulled, Failed]
So, the following two settings are equivalent, permitting only Pulling, Pulled, Failed events for pods to be emitted:

events:
  kubernetes:
    pod: [Pulling, Pulled, Failed]

events:
  kubernetes:
    pod:
      - Pulling
      - Pulled
      - Failed
Change Event Collection by Severity with the log Parameter

Events are limited globally at the agent level based on severity, using the log settings in dragent.yaml.

The default setting for the events severity filter is information (events with severity information or higher are transmitted).

Valid severity levels are: none, emergency, alert, critical, error, warning, notice, information, debug.
Example 1: Block Low-Severity Messages
Block all low-severity messages (notice, information, debug):
log:
  event_priority: warning
Example 2: Block All Event Collection
Block all event collection:
log:
  event_priority: none
For other uses of the log settings, see Optional: Change the Agent Log Level.
4.3 - Include/Exclude Custom Metrics
For more information, see Integrate Applications (Default App
Checks).
It is possible to filter custom metrics in the following ways:

- Include or exclude custom metrics using configurable patterns
- Log which custom metrics are exceeding limits

After you identify the key custom metrics that must be received, use the include and exclude filtering parameters to make sure you receive them before the metrics limit is hit.
Filter Metrics Example
Here is an example configuration entry that would be put into the agent config file (/opt/draios/etc/dragent.yaml):

metrics_filter:
  - include: test.*
  - exclude: test.*
  - include: haproxy.backend.*
  - exclude: haproxy.*
  - exclude: redis.*
Given the config entry above, this is the action for these metrics:
test.* → send
haproxy.backend.request → send
haproxy.frontend.bytes → drop
redis.keys → drop
The semantics are as follows: whenever the agent reads metrics, they are filtered according to the configured filters, in order - the first rule that matches is applied. Thus, since the include rule for test.* is listed first, it is followed, and the later exclude rule for the exact same pattern is ignored.
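The first-match-wins behavior can be sketched with shell glob patterns. The match() function below is a hypothetical re-implementation for illustration (not part of the agent), mirroring the rule list in the example above:

```shell
# Hypothetical sketch of the metrics_filter rule order: the first matching
# rule decides; '+' means include (send), '-' means exclude (drop).
match() {
  for rule in '+test.*' '-test.*' '+haproxy.backend.*' '-haproxy.*' '-redis.*'; do
    pat=${rule#?}                       # strip the +/- prefix
    case $1 in
      $pat) case $rule in +*) echo send;; *) echo drop;; esac; return;;
    esac
  done
  echo send                             # no rule matched: sent by default
}
match test.anything            # prints: send
match haproxy.backend.request  # prints: send
match haproxy.frontend.bytes   # prints: drop
match redis.keys               # prints: drop
```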
Logging Accepted/Dropped Metrics
Logging is disabled by default. You can enable logging to see which metrics are accepted or dropped by adding the following configuration entry to the dragent.yaml config file:

metrics_excess_log: true

When logging of excess metrics is enabled, logging occurs at INFO level, every 30 seconds, and lasts for 10 seconds. The entries in /opt/draios/logs/draios.log are formatted like this:
+/-[type] [metric included/excluded]: metric.name (filter: +/-[metric.filter])
The first ‘+’ or ‘-’, followed by ’type’ provides an easy way to quickly
scan the list of metrics and spot which are included or excluded (’+’
means “included”, ‘-’ means “excluded”).
The second entry specifies metric type (“statsd”, “app_check”,
“service_check”, or “jmx”).
A third entry spells out whether “included” or “excluded”, followed by
the metric name. Finally, inside the last entry (in parentheses), there
is information about filter applied and its effect (’+’ or ‘-’, meaning
“include” or “exclude”).
With this example filter rule set:
metrics_filter:
  - include: mongo.statsd.net*
  - exclude: mongo.statsd.*
We might see the following INFO-level log entries (timestamps stripped):
-[statsd] metric excluded: mongo.statsd.vsize (filter: -[mongo.statsd.*])
+[statsd] metric included: mongo.statsd.netIn (filter: +[mongo.statsd.net*])
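Given entries in this format, included and excluded metrics can be tallied straight from the log. The sample lines below are the ones shown above, written to a temporary file rather than the real /opt/draios/logs/draios.log:

```shell
# Count included (+) vs excluded (-) metrics from metrics_excess_log entries.
log=$(mktemp)
cat > "$log" <<'EOF'
-[statsd] metric excluded: mongo.statsd.vsize (filter: -[mongo.statsd.*])
+[statsd] metric included: mongo.statsd.netIn (filter: +[mongo.statsd.net*])
EOF
echo "included: $(grep -c '^+' "$log")"   # prints: included: 1
echo "excluded: $(grep -c '^-' "$log")"   # prints: excluded: 1
```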
4.4 - Prioritize Designated Containers
To get the most out of Sysdig Monitor, you may want to customize the way in which container data is prioritized and reported. Use this page to understand the default behavior and sorting rules, and to implement custom behavior when and where you need it. This can help reduce agent and backend load by not monitoring unnecessary containers, or, if you are encountering backend limits for containers, you can filter to ensure that the important containers are always reported.
Overview
By default, a Sysdig agent will collect metrics from all containers it
detects in an environment. When reporting to the Monitor interface, it
uses default sorting behavior to prioritize what container information
to display first.
Understand Default Behavior
Out of the box, the agent chooses the containers with the highest
resource usage in each of the four default stat categories (CPU,
memory, file IO, and net IO), and allocates approximately 1/4 of the
total container limit to each stat type.
Understand Simple Container Filtering
As of agent version 0.86, it is possible to set a use_container_filter
parameter in the agent config file, tag/label specific containers, and
set include/exclude rules to push those containers to the top of the
reporting hierarchy.
This is an effective sorting tool when:
You can manually mark each container with an include
or exclude
tag, AND
The number of includes is small (say, less than 100)
In this case, the containers that explicitly match the include
rules
will take top priority.
Understand Smart Container Reporting
In some enterprises, the number of containers is too high to tag with
simple filtering rules, and/or the include_all
group is too large to
ensure that the most-desired containers are consistently reported. As of
Sysdig agent version 0.91,
you can append another parameter to the agent config file:
smart_container_reporting.
This is an effective sorting tool when:
The number of containers is large and you can’t or won’t mark each
one with include/exclude tags, AND
There are certain containers you would like to always prioritize
This helps ensure that even when there are thousands of containers in an
environment, the most-desired containers are consistently reported.
Container filtering and smart container reporting affect the monitoring
of all the processes/metrics within a container, including StatsD, JMX,
app-checks, and built-in metrics.
Prometheus metrics are attached to processes, rather than containers,
and are therefore handled differently.
The container limit is set in dragent.yaml under containers:limit:
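For instance, a minimal sketch in dragent.yaml (the value shown is illustrative, not a recommended setting):

```yaml
# Cap the number of containers reported per host.
# The limit value below is an example only.
containers:
  limit: 200
```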
Understand Sysdig Aggregated Container
The sysdig_aggregated parameter is automatically activated when smart
container reporting is enabled. It captures the most-desired metrics
from the containers that were excluded by smart filtering and reports
them under a single entity, which appears like any other container in
the Sysdig Monitor UI with the name “sysdig_aggregated”.
Sysdig_aggregated
can report on a wide array of metrics; see
Sysdig_aggregated Container
Metrics. However, because
this is not a regular container, certain limitations apply:
container_id and container_image do not exist.
The aggregated container cannot be segmented by certain metrics that
are excluded, such as process.
Some default dashboards associated with the aggregated container may
have some empty graphs.
Use Simple Container Filtering
By default, the filtering feature is turned off. It can be enabled by
adding the following line to the agent configuration:
use_container_filter: true
When enabled, the agent follows the include/exclude filtering rules
defined in the container_filter section.
The default behavior in dragent.default.yaml includes or excludes
containers based on a container label (com.sysdig.report) and/or a
Kubernetes pod annotation (sysdig.com/report).
Container Condition Parameters and Rules
Parameters
The condition parameters are described in the following table. Each
example shows the condition as it would appear under an include rule.

Parameter | Description | Example
---|---|---
container.image | Matches if the process is running inside a container running the specified image | container.image: luca3m/prometheus-java-app
container.name | Matches if the process is running inside a container with the specified name | container.name: my-java-app
container.label.* | Matches if the process is running in a container that has a label matching the given value | container.label.class: exporter
kubernetes.&lt;object&gt;.annotation.* / kubernetes.&lt;object&gt;.label.* | Matches if the process is attached to a Kubernetes object (pod, namespace, etc.) that is marked with an annotation/label matching the given value | kubernetes.pod.annotation.prometheus.io/scrape: true
all | Matches all. Use as the last rule to determine the default behavior. | all
Rules
Once enabled (when use_container_filter: true
is set), the agent will
follow filtering rules from the container_filter
section.
Each rule is an include
or exclude
rule which can contain one or
more conditions.
The first matching rule in the list will determine if the container
is included or excluded.
The conditions consist of a key name and a value. If the given key
for a container matches the value, the rule will be matched.
If a rule contains multiple conditions they all need to match for
the rule to be considered a match.
Default Configuration
The dragent.default.yaml
contains the following default configuration
for container filters:
use_container_filter: false
container_filter:
  - include:
      container.label.com.sysdig.report: true
  - exclude:
      container.label.com.sysdig.report: false
  - include:
      kubernetes.pod.annotation.sysdig.com/report: true
  - exclude:
      kubernetes.pod.annotation.sysdig.com/report: false
  - include:
      all
Note that it includes or excludes via a container label
(com.sysdig.report) and a Kubernetes pod annotation (sysdig.com/report).
The examples on this page show how to edit the dragent.yaml file
directly. Convert the examples to Docker or Helm commands, if applicable
for your situation.
Enable Container Filtering in the Agent Config File
Option 1: Use the Default Configuration
To enable container filtering using the default configuration in
dragent.default.yaml (above), follow the steps below.
1. Apply Labels and/or Annotations to Designated Containers
To set up, decide which containers should be excluded from automatic
monitoring.
Apply the container label com.sysdig.report and/or the Kubernetes
pod annotation sysdig.com/report to the designated containers.
2. Edit the Agent Configuration
Add the following line to dragent.yaml
to turn on the default
functionality:
use_container_filter: true
Option 2: Define Your Own Rules
You can also edit dragent.yaml
to apply your own container filtering
rules.
1. Designate Containers
To set up, decide which containers should be excluded from automatic
monitoring.
Note the image, name, label, or Kubernetes pod information as
appropriate, and build your rule set accordingly.
2. Edit the Agent Configuration
For example:
use_container_filter: true
container_filter:
  - include:
      container.name: my-app
  - include:
      container.label.com.sysdig.report: true
  - exclude:
      kubernetes.namespace.name: kube-system
      container.image: "gcr.io*"
  - include:
      all
The above example shows a container_filter
with 3 include rules and 1
exclude rule.
If the container name is “my-app
” it will be included.
Likewise, if the container has a label with the key
“com.sysdig.report” and the value “true”, it will be included.
If neither of those rules is true, and the container is part of a
Kubernetes hierarchy within the “kube-system
” namespace and the
container image starts with “gcr.io
”, it will be excluded.
The last rule includes all, so any containers not matching an
earlier rule will be monitored and metrics for them will be sent to
the backend.
Use Smart Container Reporting
As of Sysdig agent version 0.91, you can add another parameter to the
config file: smart_container_reporting: true
This enables several new prioritization checks:
The sort is modified with the following rules in priority order:
User-specified containers come before others
Containers reported previously should be reported before those which
have never been reported
Containers with higher usage by each of the 4 default stats should
come before those with lower usage
Enable Smart Container Reporting and sysdig_aggregated
Set up any simple container filtering rules you need, following
either Option 1 or Option 2, above.
Edit the agent configuration:
smart_container_reporting: true
This turns on both smart_container_reporting
and
sysdig_aggregated
. The changes will be visible in the Sysdig
Monitor UI.
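Putting the two steps together, a dragent.yaml sketch combining a simple container filter with smart container reporting might look like the following (the label and rule values are illustrative, not required settings):

```yaml
use_container_filter: true
container_filter:
  - include:
      container.label.com.sysdig.report: true   # always prioritize tagged containers
  - include:
      all                                       # everything else remains eligible
smart_container_reporting: true                 # prioritize and aggregate the rest
```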
See also Sysdig_aggregated Container
Metrics.
Logging
When the log level is set to DEBUG, the following messages may be found
in the logs:
Message | Meaning
---|---
container &lt;id&gt;, no filter configured | Container filtering is not enabled
container &lt;id&gt;, include in report | Container is included
container &lt;id&gt;, exclude in report | Container is excluded
Not reporting thread &lt;thread-id&gt; in container &lt;id&gt; | Process thread is excluded
See also: Optional: Change the Agent Log
Level.
4.4.1 -
Sysdig Aggregated Container Metrics
Sysdig_aggregated containers can report on the following metrics:
tcounters
other
time_ns
time_percentage
count
io_file
time_ns_in
time_ns_out
time_ns_other
time_percentage_in
time_percentage_out
time_percentage_other
count_in
count_out
count_other
bytes_in
bytes_out
bytes_other
io_net
time_ns_in
time_ns_out
time_ns_other
time_percentage_in
time_percentage_out
time_percentage_other
count_in
count_out
count_other
bytes_in
bytes_out
bytes_other
processing
time_ns
time_percentage
count
reqcounters
other
time_ns
time_percentage
count
io_file
time_ns_in
time_ns_out
time_ns_other
time_percentage_in
time_percentage_out
time_percentage_other
count_in
count_out
count_other
bytes_in
bytes_out
bytes_other
io_net
time_ns_in
time_ns_out
time_ns_other
time_percentage_in
time_percentage_out
time_percentage_other
count_in
count_out
count_other
bytes_in
bytes_out
bytes_other
processing
time_ns
time_percentage
count
max_transaction_counters
time_ns_in
time_ns_out
count_in
count_out
resource_counters
syscall_errors
count
count_file
count_file_opened
count_net
protos
http
server_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
client_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
mysql
server_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
client_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
postgres
server_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
client_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
mongodb
server_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
client_totals
ncalls
time_tot
time_max
bytes_in
bytes_out
nerrors
names
transaction_counters
time_ns_in
time_ns_out
count_in
count_out
4.5 -
Include/Exclude Processes
In addition to filtering data by container, it is also possible to
filter independently by process. Broadly speaking, this refinement helps
ensure that relevant data is reported while noise is reduced. More
specifically, use cases for process filtering may include:
Wanting to alert reliably whenever a given process goes down. The
total number of processes can exceed the reporting limit; when that
happens, some processes are not reported. In this case, an
unreported process could be misinterpreted as being “down.” Specify
a filter for 30-40 processes to guarantee that they will always be
reported.
Wanting to limit the number of noisy but inessential processes being
reported, for example: sed, awk, grep, and similar tools that may be
used infrequently.
Wanting to prioritize workload-specific processes, perhaps from
integrated applications such as NGINX, Supervisord or PHP-FPM.
Note that you can report on processes and containers independently;
the including/excluding of one does not affect the including/excluding
of the other.
Prerequisites
This feature requires the following Sysdig component versions:
Understand Process Filtering Behavior
By default, processes are reported according to internal criteria such
as resource usage (CPU/memory/file and net IO) and container count.
If you choose to enable process filtering, processes in the include list
will be given preference over other internal criteria.
Processes are filtered based on a standard priority filter description
already used in Sysdig YAML files. It consists of -include and
-exclude statements which are matched in order, with evaluation ceasing
at the first matched statement. A statement is considered matched if
EACH of its conditions is met.
Use Process Filtering
Edit dragent.yaml per the following patterns to implement the filtering
you need.
Process Condition Parameters and Rules
The process:
condition parameters and rules are described below.
Name | Value | Description |
---|---|---
app_checks_always_send: | true/false | Legacy config that causes the agent to emit any process with app check. With process filtering, this translates to an extra “include” clause at the head of the process filter which matches a process with any app check, thereby overriding any exclusions. Still subject to limit. |
flush_filter: | | Definition of process filter to be used if flush_filter_enabled == true. Defaults to -include all |
flush_filter_enabled: | true/false | Defaults to false (default process reporting behavior). Set to true to use the rest of the process filtering options. |
limit: | N (chosen number) | Defines the approximate limit of processes to emit to the backend, within 10 processes or so. Default is 250 processes. |
top_n_per_container: | N (chosen number) | Defines how many of the top processes per resource category per emitted container to report after included processes. Still subject to limit. Defaults to 1. |
top_n_per_host: | N (chosen number) | Defines how many of the top processes per resource category per host are reported before included processes. Still subject to limit. Defaults to 1. |
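As a sketch of how these options combine in dragent.yaml (the numbers and the process name are illustrative, not recommendations):

```yaml
process:
  flush_filter_enabled: true   # opt in to process filtering
  limit: 300                   # approximate cap on processes emitted
  top_n_per_host: 2            # top processes per resource category, per host
  top_n_per_container: 1       # top processes per resource category, per container
  flush_filter:
    - include:
        process.name: nginx    # always prioritize this process
    - include:
        all                    # remaining processes follow default criteria
```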
Rules
The flush_filter conditions use the following parameters:
container.image: my_container_image
Validates whether the container image associated with the process is a
wildcard match of the provided image name.
container.name: my_container_name
Validates whether the container name associated with the process is a
wildcard match of the provided container name.
container.label.XYZ: value
Validates whether the label XYZ of the container associated with the
process is a wildcard match of the provided value.
process.name: my_process_name
Validates whether the name of the process is a wildcard match of the
provided value.
process.cmdline: value
Checks whether the executable name of the process contains the
specified value, or whether any argument to the process is a wildcard
match of the provided value.
appcheck.match: value
Checks whether the process has any app check that is a wildcard match
of the given value.
all
Matches all processes, but neither whitelists nor blacklists them. If
no filter is provided, the default is -include all. However, if a
filter is provided and no other match is made, all unmatched processes
are blacklisted. In most cases, the definition of a process filter
should end with -include: all.
Examples
Block All Processes from a Container
Block all processes from a given container. No processes
from some_container_name will be reported.
process:
  flush_filter_enabled: true
  flush_filter:
    - exclude:
        container.name: some_container_name
    - include:
        all
Prioritize Processes from a Container
Send all processes from a given container at high priority.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: some_container_name
    - include:
        all
Prioritize “java” Processes
Send all processes that contain “java” in the name at high priority.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        process.name: java
    - include:
        all
Prioritize “java” Processes from a Particular Container
Send processes containing “java” from a given container at high
priority.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: some_container_name
        process.name: java
    - include:
        all
Prioritize “java” Processes not in a Particular Container
Send all processes that contain “java” in the name and are not in
container some_container_name.
process:
  flush_filter_enabled: true
  flush_filter:
    - exclude:
        container.name: some_container_name
    - include:
        process.name: java
    - include:
        all
Prioritize “java” Processes even from an Excluded Container
Send all processes containing “java” in the name. If a process does not
contain “java” in the name and if the container within which the process
runs is named some_container_name, then exclude it.
Note that each include/exclude rule is handled sequentially and
hierarchically so that even if the container is excluded, it can still
report “java” processes.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        process.name: java
    - exclude:
        container.name: some_container_name
    - include:
        all
Prioritize “java” Processes and “sql” Processes from Different Containers
Send Java processes from one container and SQL processes from another at
high priority.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: java_container_name
        process.name: java
    - include:
        container.name: sql_container_name
        process.name: sql
    - include:
        all
Report ONLY Processes in a Particular Container
Only send processes running in a container with a given label.
process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.label.report_processes_from_this_container_example_label: true
    - exclude:
        all
4.6 -
Collect Metrics from Remote File Systems
Sysdig agent does not automatically discover and collect metrics from
external file systems, such as NFS, by default. To enable collecting
these metrics, add the following entry to the dragent.yaml
file:
remotefs: true
In addition to the remote file systems, the following mount types are
excluded by default because they can cause high load:
mounts_filter:
- exclude: "*|autofs|*"
- exclude: "*|proc|*"
- exclude: "*|cgroup|*"
- exclude: "*|subfs|*"
- exclude: "*|debugfs|*"
- exclude: "*|devpts|*"
- exclude: "*|fusectl|*"
- exclude: "*|mqueue|*"
- exclude: "*|rpc_pipefs|*"
- exclude: "*|sysfs|*"
- exclude: "*|devfs|*"
- exclude: "*|devtmpfs|*"
- exclude: "*|kernfs|*"
- exclude: "*|ignore|*"
- exclude: "*|rootfs|*"
- exclude: "*|none|*"
- exclude: "*|tmpfs|*"
- exclude: "*|pstore|*"
- exclude: "*|hugetlbfs|*"
- exclude: "*|*|/etc/resolv.conf"
- exclude: "*|*|/etc/hostname"
- exclude: "*|*|/etc/hosts"
- exclude: "*|*|/var/lib/rkt/pods/*"
- exclude: "overlay|*|/opt/stage2/*"
- exclude: "/dev/mapper/cl-root*|*|/opt/stage2/*"
- exclude: "*|*|/dev/termination-log*"
- include: "*|*|/var/lib/docker"
- exclude: "*|*|/var/lib/docker/*"
- exclude: "*|*|/var/lib/kubelet/pods/*"
- exclude: "*|*|/run/secrets"
- exclude: "*|*|/run/containerd/*"
- include: "*|*|*"
To include a mount type:
Open the dragent.yaml file.
Remove the corresponding line from the exclude list in the
mounts_filter section.
Add the file mount to the include list under mounts_filter.
The format of a mounts_filter entry is:
mounts_filter:
- exclude: "device|filesystem|mount_directory"
- include: "pattern1|pattern2|pattern3"
For example, to include autofs mounts:
mounts_filter:
- include: "*|autofs|*"
Or, to include the overlay mounts that are excluded by default:
mounts_filter:
- include: "overlay|*|/opt/stage2/*"
- include: "/dev/mapper/cl-root*|*|/opt/stage2/*"
Save the configuration changes and restart the agent.
4.7 -
Disable Captures
Sometimes, security requirements dictate that capture functionality
should NOT be triggered at all (for example, PCI compliance for payment
information).
To disable Captures altogether:
Access dragent.yaml using one of the options listed.
This example accesses dragent.yaml directly.
Set the parameter:
sysdig_capture_enabled: false
Restart the agent, using the command:
service dragent restart
See Captures for more information on the feature.
5 -
Reduce Memory Consumption in Agent
Sysdig provides a configuration option called thin cointerface to reduce
the memory footprint in the agent. When the agent is installed as a
Kubernetes daemonset, you can optionally enable the thin cointerface in
the sysdig-agent configmap
.
Pros
- Reduces memory consumption
- Particularly useful on very large Kubernetes clusters (>10,000 pods)
Cons
- Less frequently used option which is therefore less battle-tested
- If a watch is dropped and a re-list is required (e.g., in case of a network issue, an apiserver update, etc.), there is no cache to maintain the resources. In this case, the agent must process many additional events.
How It Works
In a typical Kubernetes cluster, two instances of the agent daemonset
are installed to retrieve the data. They automatically connect to the
Kubernetes API server to retrieve the metadata associated with the
entities running on the cluster and send the global Kubernetes state to
the Sysdig backend. Sysdig uses this data to generate kube state
metrics.
A delegated agent will not have a higher CPU or memory footprint than a
non-delegated agent.
On very large Kubernetes clusters (in the range of 10,000 pods) or
clusters with several replication controllers, the agent’s data
ingestion can have a significant memory footprint on itself and on the
Kubernetes API server. Thin cointerface is provided to reduce this
impact.
Enabling this option changes the way the agent communicates with the API
server and reduces the need to cache data, which in turn reduces the
overall memory usage. Thin cointerface does this by moving some
processing from the agent’s cointerface process to the dragent process.
This change does not alter the data which is ultimately sent to the
backend nor will it impact any Sysdig feature.
The thin cointerface feature is disabled by default.
To Enable:
Add the following to either the sysdig-agent configmap or the
dragent.yaml file:
thin_cointerface_enabled: true
6 -
Enable Kube State Metrics
Agent Versions 12.5.0 and Onward
HPA kube state metrics are no longer collected by default. To enable the agent to collect HPA kube state metrics, you must edit the agent configuration file, dragent.yaml
, and include it along with the other resources you would like to collect.
For example, to collect all supported resources including HPAs, add the following to dragent.yaml
:
k8s_extra_resources:
  include:
    - services
    - resourcequotas
    - persistentvolumes
    - persistentvolumeclaims
    - horizontalpodautoscalers
Agent Versions 12.3.x and 12.4.x
The Sysdig agent collects HPA, PVC, PV, ResourceQuota, and Services kube state metrics by default.
To disable some of them, you must edit the agent config file, dragent.yaml
, as follows:
k8s_extra_resources:
  include:
    - services
    - resourcequotas
    - persistentvolumes
    - persistentvolumeclaims
    - horizontalpodautoscalers
The above list includes all the supported resources so you must remove the resources you are not interested in.
For example, if you wanted to disable Services, it should look like the following:
k8s_extra_resources:
  include:
    - resourcequotas
    - persistentvolumes
    - persistentvolumeclaims
    - horizontalpodautoscalers
For more information, see Understanding the Agent Configuration Files.
7 -
Process Kubernetes Events
Use Go to Process Kubernetes Events
Required: Sysdig agent version 9.2.1 or higher.
As of agent version 9.5.0, go_k8s_user_events:true
is the default
setting. Set to false
to use the older, C++-based version.
To streamline Sysdig agent processing times and reduce CPU load, you can
use an updated processing engine written in Go.
To do so, edit the following code in dragent.yaml
:
go_k8s_user_events: true
Kubernetes Audit Events
The agent listens on /k8s-audit
for Kubernetes audit events. Configure
the path using the following configuration option:
security:{k8s_audit_server_path_uris: [path1, path2]}
For more information, see Kubernetes Audit
Logging.
Working with containerd in K3S
If you have containerd using a custom socket, you can specify this parameter in the agent configuration to correctly capture the containers’ metadata:
cri:
  socket_path: /run/k3s/containerd/containerd.sock
8 -
Manage Agent Log Levels
Sysdig allows you to configure file log levels for agents globally and
granularly.
8.1 -
Change Agent Log Level Globally
The Sysdig agent generates log entries in /opt/draios/logs/draios.log
.
The agent will rotate the log file when it reaches 10MB in size, keeping
the 10 most recent log files archived with a date-stamp appended to the
filename.
In order of increasing detail, the log levels available are: [ none |
critical | error | warning | notice | info | debug | trace ].
The default level (info) creates an entry for each aggregated
metrics transmission to the backend servers, once per second, in
addition to entries for any warnings and errors.
Setting the value lower than info may hinder troubleshooting of
agent-related issues.
The type and amount of logging can be changed by adding parameters and
log level arguments shown below to the agent’s user settings
configuration file here:
/opt/draios/etc/dragent.yaml
After editing the dragent.yaml file, restart the agent at the shell
with service dragent restart to apply the changes.
Note that dragent.yaml code can be written in either YAML or JSON. The
examples below use YAML.
File Log Level
When troubleshooting agent behavior, increase the logging to debug for
full detail:
log:
  file_priority: debug
If you wish to reduce log messages going to the
/opt/draios/logs/draios.log file, add the log: parameter with one of
the following arguments under it, indented two spaces: [ none | error |
warning | info | debug | trace ]
log:
  file_priority: error
Container Console Logging
If you are running the containerized agent, you can also reduce
container console output by adding the additional parameter
console_priority: with the same arguments: [ none | error | warning |
info | debug | trace ]
log:
  console_priority: warning
Note that troubleshooting a host with less than the default ‘info’
level will be more difficult or not possible. Revert to ‘info’ when you
are done troubleshooting the agent.
A level of ‘error’ generates the fewest log entries, a level of ‘trace’
gives the most, and ‘info’ is the default if no entry exists.
Example in dragent.yaml
customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:eng,svc:websvr
log:
  file_priority: warning
  console_priority: info
OR
customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:eng,svc:websvr
log: { file_priority: debug, console_priority: debug }
Docker run command
If you are using the “ADDITIONAL_CONF” parameter to start a Docker
containerized agent, specify this entry in the Docker run command,
using either inline or newline-separated YAML:
-e ADDITIONAL_CONF="log: { file_priority: error, console_priority: none }"
-e ADDITIONAL_CONF="log:\n file_priority: error\n console_priority: none"
Kubernetes Infrastructure
When running in a Kubernetes infrastructure (installed using the v1
method), uncomment the “ADDITIONAL_CONF” line in the agent
sysdig-daemonset.yaml manifest file, and modify as needed:
- name: ADDITIONAL_CONF #OPTIONAL pass additional parameters to the agent
value: "log:\n file_priority: debug\n console_priority: error"
8.2 -
Manage File Logging for Agent Components
Sysdig Agent provides the ability to set component-wise log levels that
override the global file logging level controlled by the file_priority
configuration option. The components represent internal software modules
and can be found in /opt/draios/logs/draios.log
.
By controlling logging at the fine-grained component level, you can
avoid excessive logging from certain components in draios.log
or
enable extra logging from specific components for troubleshooting.
Components can also have an optional feature level logging that
can provide a way to control the logging for a particular feature
in Sysdig Agent.
To set feature-level or component-level logging:
Determine the agent feature or component for which you want to set the
log level. To do so:
Open the /opt/draios/logs/draios.log
file.
Copy the component name.
The format of the log entry is:
<timestamp>, <<pid>.<tid>>, <log level>, [feature]:<component>[pid]:[line]: <message>
For example, the given snippet from a sample log file shows log
messages from the promscrape feature and the sdjagent,
mountedfs_reader, watchdog_runnable, protobuf_file_emitter,
connection_manager, and dragent components.
2020-09-07 17:56:01.173, 27979.28018, Information, sdjagent[27980]: Java classpath: /opt/draios/share/sdjagent.jar
2020-09-07 17:56:01.173, 27979.28018, Information, mountedfs_reader: Starting mounted_fs_reader with pid 27984
2020-09-07 17:56:01.174, 27979.28019, Information, watchdog_runnable:105: connection_manager starting
2020-09-07 17:56:01.174, 27979.28019, Information, protobuf_file_emitter:64: Will save protobufs for all message types
2020-09-07 17:56:01.174, 27979.28019, Information, connection_manager:282: Initiating connection to collector
2020-09-07 17:56:01.175, 27979.27979, Information, dragent:1243: Created Sysdig inspector
2020-09-07 18:52:40.065, 27979.27980, Debug, promscrape:prom_emitter:72: Sent 927 Prometheus metrics of 7297 total
2020-09-07 18:52:41.129, 27979.27981, Information, promscrape:prom_stats:45: Prometheus timeseries statistics, 5 endpoints
To set feature-level logging:
Open /opt/draios/etc/dragent.yaml
.
Edit the dragent.yaml
file and add the desired feature:
In this example, you are setting the global level to notice and the
promscrape feature level to info.
log:
  file_priority: notice
  file_priority_by_component:
    - "promscrape: info"
The log levels specified for feature override global settings.
To set component-level logging:
Open /opt/draios/etc/dragent.yaml
.
Edit the dragent.yaml
file and add the desired components:
In this example, you are setting the global level to notice, the
promscrape feature level to info, the sdjagent and mountedfs_reader
component log levels to debug, the watchdog_runnable component log
level to warning, and the promscrape:prom_emitter component log level
to debug.
log:
  file_priority: notice
  file_priority_by_component:
    - "promscrape: info"
    - "promscrape:prom_emitter: debug"
    - "watchdog_runnable: warning"
    - "sdjagent: debug"
    - "mountedfs_reader: debug"
The log levels specified for a feature override global settings.
The log levels specified for a component override feature and global settings.
Restart the agent.
For example, if you have installed the agent as a service, then run:
$ service dragent restart
8.3 -
Manage Console Logging for Agent Components
Sysdig Agent provides the ability to set component-wise log levels that
override the global console logging level controlled by the
console_priority
configuration option. The components represent
internal software modules and can be found in
/opt/draios/logs/draios.log
.
By controlling logging at the fine-grained component level, you can
avoid excessive logging from certain components in draios.log
or
enable extra logging from specific components for troubleshooting.
Components can also have an optional feature level logging that
can provide a way to control the logging for a particular feature
in Sysdig Agent.
To set feature-level or component-level logging:
Determine the agent component for which you want to set the log level.
To do so:
Look at the console output.
If you’re using an orchestrator like Kubernetes, the log viewer
facility, such as the kubectl
log command, shows the console
log output.
Copy the component name.
The format of the log entry is:
<timestamp>, <<pid>.<tid>>, <log level>, [feature]:<component>[pid]:[line]: <message>
For example, the given snippet from a sample log file shows log
messages from the promscrape feature and the sdjagent,
mountedfs_reader, watchdog_runnable, protobuf_file_emitter,
connection_manager, and dragent components.
2020-09-07 17:56:01.173, 27979.28018, Information, sdjagent[27980]: Java classpath: /opt/draios/share/sdjagent.jar
2020-09-07 17:56:01.173, 27979.28018, Information, mountedfs_reader: Starting mounted_fs_reader with pid 27984
2020-09-07 17:56:01.174, 27979.28019, Information, watchdog_runnable:105: connection_manager starting
2020-09-07 17:56:01.174, 27979.28019, Information, protobuf_file_emitter:64: Will save protobufs for all message types
2020-09-07 17:56:01.174, 27979.28019, Information, connection_manager:282: Initiating connection to collector
2020-09-07 17:56:01.175, 27979.27979, Information, dragent:1243: Created Sysdig inspector
2020-09-07 18:52:40.065, 27979.27980, Debug, promscrape:prom_emitter:72: Sent 927 Prometheus metrics of 7297 total
2020-09-07 18:52:41.129, 27979.27981, Information, promscrape:prom_stats:45: Prometheus timeseries statistics, 5 endpoints
To set feature-level logging:
Open /opt/draios/etc/dragent.yaml.
Add the desired feature. In this example, you set the global level to
notice and the promscrape feature level to info:
log:
  console_priority: notice
  console_priority_by_component:
    - "promscrape: info"
The log levels specified for a feature override the global settings.
To set component-level logging:
Open /opt/draios/etc/dragent.yaml.
Add the desired components. In this example, you set the global level
to notice, the promscrape feature level to info, the sdjagent and
mountedfs_reader component log levels to debug, the watchdog_runnable
component log level to warning, and the promscrape:prom_emitter
component log level to debug:
log:
  console_priority: notice
  console_priority_by_component:
    - "promscrape: info"
    - "promscrape:prom_emitter: debug"
    - "watchdog_runnable: warning"
    - "sdjagent: debug"
    - "mountedfs_reader: debug"
The log levels specified for a feature override the global settings.
The log levels specified for a component override the feature and
global settings.
Restart the agent.
For example, if you have installed the agent as a service, then run:
$ service dragent restart
9 -
Agent Auto-Config
Introduction
If you want to maintain centralized control over the configuration of
your Sysdig agents, one of the following approaches is typically ideal:
Via an orchestration system, such as using
Kubernetes or
Mesos/Marathon.
Using a configuration management system, such as
Chef or
Ansible.
However, if these approaches are not viable for your environment, or to
further augment your Agent configurations via central control, Sysdig
Monitor provides an Auto-Config option for agents. The feature allows
you to upload fragments of YAML configuration to Sysdig Monitor that
will be automatically pushed and applied to some/all of your Agents
based on your requirements.
Enable Agent Auto-Config
Independent of the Auto-Config feature, typical Agent configuration
lives in /opt/draios/etc and is derived from a combination of base
config in the dragent.default.yaml file and any overrides that may
be present in dragent.yaml. See also Understanding the Agent Config
Files.
Agent Auto-Config adds a middle layer of possible overrides in an
additional file, dragent.auto.yaml. When present, the order of
config application from highest precedence to lowest becomes:
dragent.yaml
dragent.auto.yaml
dragent.default.yaml
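The precedence order can be illustrated with a small sketch. This is a hypothetical flat-dictionary merge for illustration only; the agent's actual merge logic is internal and handles nested YAML keys:

```python
# Sketch: how the three config layers combine. Later updates win, so
# dragent.yaml is applied last (highest precedence). Illustrative only.
def effective_config(dragent, auto, default):
    merged = dict(default)   # dragent.default.yaml: lowest precedence
    merged.update(auto)      # dragent.auto.yaml: overrides defaults
    merged.update(dragent)   # dragent.yaml: overrides everything
    return merged

cfg = effective_config(
    dragent={"log": "info"},
    auto={"log": "debug", "tags": "env:prod"},
    default={"log": "error", "collector_port": 6443},
)
print(cfg)
```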
While all Agents are by default prepared to receive and make use of
Auto-Config data, the file dragent.auto.yaml will not be present on
an Agent until you’ve pushed central Auto-Config data to be applied to
that Agent.
Auto-Config settings are performed via Sysdig Monitor's REST API.
Simplified examples are available that use the Python client library
to get or set current Auto-Config settings. Detailed examples using
the REST API are shown below.
The REST endpoint for Auto-Config is /api/agents/config. Use the
GET method to review the current configuration. The following
example shows the initial empty settings that result in no
dragent.auto.yaml files being present on your Agents.
curl -X GET \
--header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
https://app.sysdigcloud.com/api/agents/config
Output:
{
"files": []
}
Use the PUT method to centrally push YAML that will be distributed
and applied to your Agents as dragent.auto.yaml files. The
content parameter must contain syntactically-correct YAML. The
filter option is used to specify if the config should be sent to one
agent or all of them, such as in this example to globally enable Debug
logging on all Agents:
curl -X PUT \
--header "Content-Type: application/json" \
--header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
https://app.sysdigcloud.com/api/agents/config -d '
{
"files": [
{
"filter": "*",
"content": "log:\n console_priority: debug"
}
]
}'
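Because the content parameter embeds multi-line YAML as an escaped JSON string, it can be easier to build the request body programmatically than to hand-escape newlines. A minimal Python sketch; the commented-out requests call is an assumption, with the token as a placeholder:

```python
import json

# Sketch: build the PUT body for /api/agents/config. The YAML fragment is
# written naturally, and json.dumps produces the "\n"-escaped "content"
# string shown in the curl example above.
log_yaml = """\
log:
  console_priority: debug"""

body = {"files": [{"filter": "*", "content": log_yaml}]}
payload = json.dumps(body)
print(payload)

# To push it (the requests library is an assumption, not shown in the docs):
# import requests
# requests.put("https://app.sysdigcloud.com/api/agents/config",
#              headers={"Authorization": "Bearer <token>",
#                       "Content-Type": "application/json"},
#              data=payload)
```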
Alternatively, the filter can specify a hardware MAC address for a
single Agent that should receive a certain YAML config. All MAC-specific
configs should appear at the top of the JSON object; they are not
additive to any global Auto-Config specified with "filter": "*" at
the bottom. For example, when the following config is applied, the one
Agent that has the MySQL app check configured would not have Debug
logging enabled, but all others would.
curl -X PUT \
--header "Content-Type: application/json" \
--header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
https://app.sysdigcloud.com/api/agents/config -d '
{
"files": [
{
"filter": "host.mac = \"08:00:27:de:5b:b9\"",
"content": "app_checks:\n - name: mysql\n pattern:\n comm: mysqld\n conf:\n server: 127.0.0.1\n user: sysdig-cloud\n pass: sysdig-cloud-password"
},
{
"filter": "*",
"content": "log:\n console_priority: debug"
}
]
}'
To update the active central Auto-Config settings, simply PUT a
complete replacement JSON object.
All connected Agents will receive centrally-pushed Auto-Config updates
that apply to them based on the filter settings. Any Agent whose
Auto-Config is enabled/disabled/changed based on the centrally-pushed
settings will immediately restart, putting the new configuration into
effect. Any central Auto-Config settings that would result in a
particular Agent’s Auto-Config remaining the same will not trigger a
restart.
Disable Agent Auto-Config
To clear all Agent Auto-Configs, use the PUT method to upload the
original blank config setting of '{ "files": [] }'.
It is also possible to override active Auto-Config on an individual
Agent. To do so, follow these steps for your Agent:
Add the following config directly to the dragent.yaml file:
auto_config: false.
Delete the file /opt/draios/etc/dragent.auto.yaml.
Restart the Agent.
For such an Agent to opt in to Auto-Config again, remove auto_config:
false from dragent.yaml and restart the Agent.
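The opt-out steps above can be sketched as a small helper. The function name is hypothetical, and etc_dir defaults to the standard install location; restart the agent afterwards, for example with service dragent restart:

```python
import os

# Sketch: opt one agent out of centrally pushed Auto-Config.
# Hypothetical helper; etc_dir is the agent config directory.
def disable_auto_config(etc_dir="/opt/draios/etc"):
    dragent = os.path.join(etc_dir, "dragent.yaml")
    # Step 1: add auto_config: false unless it is already present.
    with open(dragent, "a+") as f:
        f.seek(0)
        if "auto_config: false" not in f.read():
            f.write("\nauto_config: false\n")
    # Step 2: delete any previously pushed dragent.auto.yaml.
    auto = os.path.join(etc_dir, "dragent.auto.yaml")
    if os.path.exists(auto):
        os.remove(auto)
    # Step 3 (manual): restart the agent, e.g. `service dragent restart`.
```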
Restrictions
To prevent the possibility of pushing Auto-Config that would damage an
Agent’s ability to connect, the following keys will not be accepted in
the centrally-pushed YAML.
auto_config
customerid
collector
collector_port
ssl
ssl_verify_certificate
ca_certificate
compression
10 -
Using the Agent Console
Sysdig provides an Agent Console to interact with the Sysdig agent. This
is a troubleshooting tool to help you view configuration files and
investigate agent configuration problems quickly.
Access Agent Console
From Explore, click the Groupings drop-down.
Select Hosts & Containers or Nodes.
Click the desired host to investigate the corresponding agent
configuration.
Click Options (three dots) on the right upper corner of the
Explore tab.
Click Agent Console.
Agent Console Commands
View Help
The ? command displays the commands to manage Prometheus configuration
and targets monitored by the Sysdig agent.
$ prometheus ?
$ prometheus config ?
$ prometheus config show ?
Command Syntax
The syntax of the Agent Console commands is as follows:
directory command
directory sub-directory command
directory sub-directory sub-sub-directory command
View Version
Run the following to find the version of the agent running in your
environment:
$ version
An example output:
12.0.0
Troubleshoot Prometheus Metrics Collection
These commands help troubleshoot Prometheus targets configured in your
environment.
For example, the following commands display and scrape the Prometheus
endpoints respectively.
$ prometheus target show
$ prometheus target scrape
Sub-Directory Commands
The Promscrape CLI consists of the following sections.
config: Manages Sysdig agent-specific Prometheus configuration.
metadata: Manages metadata associated with the Prometheus targets
monitored by the Sysdig agent.
stats: Helps view the global and job-specific Prometheus statistics.
target: Manages Prometheus endpoints monitored by the Sysdig agent.
Prometheus Commands
Show
The show command displays information about the subsection. For
example, the following displays the configuration of the
Prometheus server.
$ prometheus config show
Configuration       Value
Enabled             True
Target discovery    Prometheus service discovery
Scraper             Promscrape v2
Ingest raw          True
Ingest calculated   True
Metric limit        2000
Scrape
The scrape command scrapes a Prometheus target and displays the
information. The syntax is:
$ prometheus target scrape -url <URL>
For example:
$ prometheus target scrape -url http://99.99.99.3:10055/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.5018e-05
go_gc_duration_seconds{quantile="0.25"} 0.000118155
go_gc_duration_seconds{quantile="0.5"} 0.000141586
go_gc_duration_seconds{quantile="0.75"} 0.000171626
go_gc_duration_seconds{quantile="1"} 0.00945638
go_gc_duration_seconds_sum 0.114420898
go_gc_duration_seconds_count 607
View Agent Configuration
The Agent configuration commands have a different syntax.
Run the following to view the configuration of the agent running in your
environment:
$ configuration show-dragent-yaml
$ configuration show-configmap-yaml
$ configuration show-default-yaml
$ configuration show-backend-yaml
The output displays the configuration file. Sensitive data, such as
credentials, are obfuscated.
customerid: "********"
watchdog:
max_memory_usage_mb: 2048
Security Considerations
User-sensitive configuration is obfuscated and not visible through
the CLI.
All the information is read-only. You cannot currently change any
configuration by using the Agent Console.
The console runs completely inside the agent. It does not use Bash or
any other Linux shell, which prevents the risk of command injection.
It runs only via a TLS connection with the Sysdig backend.
Disable Agent Console
Agent Console is turned on by default. To turn it off for a
particular team:
Navigate to Settings > Teams.
Select the team that you want to disable Agent Console for.
From Additional Permissions, deselect Agent CLI.
Click Save.
To turn it off in your environment, edit the following in the
dragent.yaml file:
command_line:
enabled: false