Installation

This guide describes deployment options for various Sysdig components.

1 - Sysdig Agent

Sysdig agents are simple to deploy and upgrade, and out of the box, they will gather and report on a wide variety of pre-defined metrics. Agent installation can be as simple as copying and pasting a few lines of code from a Wizard prompt, or you can follow step-by-step instructions to check supported environments and distributions, review installation options, and customize the agent configuration file post-install.

About Sysdig Agent

The Sysdig agent is a lightweight host component that processes syscall events, creates capture files, and performs auditing and compliance. It is a platform that supports monitoring, securing, and troubleshooting cloud-native environments.

In addition, the agent performs the following:

  • Metrics processing, analysis, aggregation, and transmission

  • Policy monitoring and alert notification through bidirectional communication with the Sysdig collector

  • Integration with third-party software for consolidating customer ecosystem data

  • Full assimilation into containerized and orchestrated environments

  • Matching runtime policies against syscall events and generating policy events


1.1 - Agent Installation

Sysdig agents are delivered as either a container or a service and can be deployed with or without an orchestrator such as Kubernetes or Mesos.

A quick install involves just a few lines of code from the Getting Started wizard copied into a shell. The complete install instructions address checking for and installing kernel headers if needed, any prerequisite permissions settings for particular environments, and tips about updating the configuration files after initial installation.

Plan the Installation

| Topic | Description |
| --- | --- |
| Host Requirements | Review the supported platforms, runtimes, Linux distributions, orchestrators, and browsers. |
| Access Key | An agent access key is provided with a Sysdig trial. |
| Installation Options | The different ways in which you can install the Sysdig agent. |
| Troubleshooting Agents | Troubleshooting tips for agent installation, tuning agents, and compiling kernel modules. |

Installation Options

In the default mode of agent installation, you install the agent package as two containers, each responsible for different functions, as described below. The slim agent (agent-slim) reduces the attack surface for potential vulnerabilities and is therefore more secure.

  • agent-kmodule: Responsible for downloading and building the kernel module. The image is short-lived. The container exits after the kernel module is loaded. The transient nature of the container reduces the time and opportunities for exploiting any potential vulnerabilities present in the container image.

    Prerequisites: The package depends on Dynamic Kernel Module Support (DKMS) and requires a compiler and the kernel headers to be installed if you are using agent-kmodule to build the kernel probe. Alternatively, you can use it without the kernel headers; in that case, agent-kmodule will attempt to download a pre-built kernel probe if one is present in the Sysdig probe repository.

    The module contains:

    • The driver sources

    • A post-install script that builds the module upon installation

  • agent-slim: Responsible for running the agent module once the kernel module has been loaded. The slim agent functions the same way as the regular agent and retains feature parity with it. A sketch of the two-container flow follows this list.
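A minimal sketch of this two-container flow (assuming the images are pulled from quay.io, with placeholder values for the access key and collector; the full option list appears in the Quick Install section):

# Transient container: builds and loads the kernel module, then exits
docker run -it --privileged --rm --name sysdig-agent-kmodule \
  -v /usr:/host/usr:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  quay.io/sysdig/agent-kmodule

# Long-lived container: runs the agent once the kernel module is loaded
docker run -d --name sysdig-agent \
  --restart always --privileged --net host --pid host \
  -e ACCESS_KEY=<ACCESS_KEY> \
  -e COLLECTOR=<COLLECTOR_URL> \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev \
  -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  --shm-size=512m \
  quay.io/sysdig/agent-slim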

Use the instructions below to install the agent in your chosen environment:

| Environment | Flavor | Installation | Install Instructions |
| --- | --- | --- | --- |
| Kubernetes | Open Source | Helm and manual | Install Agent on Kubernetes |
| Kubernetes | OpenShift | Helm and manual | Install Agent on OpenShift |
| Kubernetes | GKE | Helm and manual | Install Agent on GKE |
| Kubernetes | MKE | Helm | Install Agent on MKE |
| Kubernetes | Rancher | Helm and manual | Install Agent on Rancher |
| Kubernetes | OKE | Helm and manual | Install Agent on OKE |
| Non-Orchestrated | | Manual | Install Agent on Non-Orchestrated Environment |

Legacy Agent: The legacy agent can be run as a single container or a service. It includes the components for downloading and building the kernel module, as well as for gathering and reporting on a wide variety of pre-defined metrics. For more information, see Installing Agent as a Single Container.

Helm

Helm is the preferred way of installing the Sysdig agent. It is used in most cloud environments, for example, Amazon EKS, EC2 on AWS Cloud or AWS Outposts, and Azure AKS.

Manual

With the Getting Started wizard, you can copy a simple line of code to deploy agents in a variety of environments.

Behind the scenes, the wizard auto-detects and completes configuration items such as the required access key and port information. The wizard can also be launched from the Start a Free Trial button at sysdig.com.

After the first install, Sysdig Secure and Monitor users can access the wizard at any time from the Rocket icon on the navigation bar.

| Environment | Flavor | Install Instructions |
| --- | --- | --- |
| Kubernetes | Open Source | Helm is the preferred installation method; it is used in most cloud environments, for example, Amazon EKS, EC2 on AWS Cloud or AWS Outposts, and Azure AKS. |
| Kubernetes | OpenShift | |
| Kubernetes | GKE | Used for the Google Kubernetes Engine environment. |
| Kubernetes | MKE | eBPF parameter required. |
| Kubernetes | Rancher | eBPF parameter required. |
| Kubernetes | OKE | |
| Kubernetes | IKS | IBM manages and documents Sysdig installs as part of IKS. |
| Kubernetes | IBM Cloud Monitoring | |
| Non-Orchestrated | | Used when there is no orchestrator such as Kubernetes. See Install Agent on Non-Orchestrated Environment. |
| Linux | | Rare; used with custom kernel headers and unique use cases. See Agent Install: Manual Linux Installation. |
| Mesos \| Marathon \| DCOS | | See Agent Install: Mesos \| Marathon \| DCOS. |

1.1.1 - Agent Installation Requirements

Sysdig agents can be installed on a wide array of Linux hosts. Check your environment to ensure it meets the minimum supported platform, operating system, runtime, and orchestration requirements and uses the appropriate installation instructions.

Versioning Scheme

We recommend that you use the latest version of the agent. Sysdig supports versions up to n-3 back, based on the minor version number. For example, if the latest release is v12.0.0, versions back to v11.2.0 are supported.

End of Support

Sysdig agents that are older than version 0.85.1, released October 1, 2018, will no longer connect to the Sysdig US-East SaaS platform with default agent values.

Going forward, all agent releases have a 3-year deprecation policy. This implies:

  • Sysdig Support might not be able to help you troubleshoot or address problems with agents past the deprecation date.

  • Sysdig will no longer provide prebuilt kernel probe binaries for these agent releases. You need to build the kernel probe binaries on the fly by using the host's kernel headers.

    These changes are effective starting with Sysdig agent v12.1.0.

Agent Installation Requirements

Orchestration Platforms

Support Matrix for Kubernetes

Sysdig agent version 12.8.1 has been tested on the latest versions of the Kubernetes platforms listed below. The matrix provides a single view into the supported operating systems, architectures, and runtime versions for different flavors of Kubernetes orchestrators.

| Cluster | Operating System | Kubernetes Version | Architecture | Runtime |
| --- | --- | --- | --- | --- |
| Red Hat OpenShift Kubernetes Service (ROKS) | Red Hat | v1.22 | x86_64 | cri-o |
| Rancher | SUSE Linux Enterprise Server 15 SP2 | v1.20 | x86_64 | docker |
| OpenShift (okd4) 4.8 | Red Hat Enterprise Linux CoreOS 48 | v1.21 | zlinux | cri-o |
| OpenShift (okd4) 4.10 (see note below) | Red Hat Enterprise Linux CoreOS 410 | v1.23 | x86_64 | cri-o |
| OpenShift (okd3) | CentOS Linux 7 (Core) | v1.11.0+d4cacc0 | x86_64 | docker |
| Kubernetes Operations (kops) | Ubuntu 20.04.4 LTS | v1.21 | x86_64, arm64 | containerd |
| Kubernetes Operations (kops) | Ubuntu 20.04.4 LTS | v1.24 | x86_64, arm64 | containerd |
| Kubernetes | Ubuntu 20.04.2 LTS | v1.23 | x86_64 | docker |
| IBM Cloud Kubernetes Service (IKS) | Ubuntu 18.04.6 LTS | v1.23 | x86_64 | containerd |
| Google Kubernetes Engine (GKE) | Container-Optimized OS from Google | v1.22 | x86_64 | containerd |
| Amazon Elastic Kubernetes Service (EKS) | Bottlerocket OS 1.9 | v1.22 | x86_64, arm64 | containerd |

NOTE: OpenShift versions 4.10+ cannot be used with the new Vulnerability Management component. If you are installing the agent on OCP 4.10+, the following Helm option should not be used, or should be set to false:

--set nodeAnalyzer.secure.vulnerabilityManagement.newEngineOnly=false

(Beta) Additional Orchestration Platforms

| Orchestration Platform | Documentation |
| --- | --- |
| Oracle Kubernetes Engine (OKE) | Steps for OKE |
| Microsoft Azure Cloud Services | Agent Install: Non-Orchestrated |
| Microsoft Azure Kubernetes Service (AKS) | Agent Install: Kubernetes |
| Amazon Elastic Container Service (Amazon ECS) | Agent Install: Non-Orchestrated + AWS Integration Instructions |
| RancherOS | Agent Install: Non-Orchestrated |
| Mesos/Marathon | Agent Install: Mesos/Marathon |
| Docker Datacenter (DDC) | Agent Install: Non-Orchestrated |

If you are not using an orchestrator in your environment, follow the instructions for Agent Install Non-Orchestrated.

Note: Installing the Sysdig agent into a namespace managed by Istio and configured for sidecar auto-injection is not supported. For example, setting kubectl label namespace sysdig-agent istio-injection=enabled. Because the agent behaves more like a host component, it is required to be part of the host PID and network namespace to function correctly. Due to this requirement, deploying the Sysdig agent in Istio with an Envoy sidecar is not supported. However, running the Sysdig agent in a non-injected namespace where Istio is installed and managing other namespaces is fully supported. See Istio integration for more details on using the Sysdig agent to monitor Istio control plane and sidecar metrics.

Linux Distributions and Kernels

Support Matrix for Linux Distributions

Sysdig agent version 12.8.1 (installed as a service) has been tested on the latest versions of the following Linux distributions:

| Operating System | Architecture |
| --- | --- |
| Amazon Linux 2 | x86_64 |
| Fedora Linux 36 (Cloud Edition) | x86_64 |
| Red Hat Enterprise Linux 8.6 (Ootpa) | x86_64 |
| Ubuntu 18.04.6 LTS (Bionic Beaver) | x86_64 |
| Ubuntu 20.04.4 LTS (Focal Fossa) | x86_64 |
| Ubuntu 22.04 LTS (Jammy Jellyfish) | x86_64 |

(Beta) Linux Distributions

Sysdig agent is supported on the following Linux distributions:

| Platform | Linux Distributions |
| --- | --- |
| Core Set | Debian, Ubuntu, Ubuntu (Amazon), CentOS, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server, RHEL CoreOS (RHCOS), Fedora, Fedora CoreOS, Linux Mint, Amazon Linux, Amazon Linux v2, Amazon Bottlerocket, Google Container-Optimized OS (COS), Oracle Linux (UEK), Oracle Linux (RHCK) |
| AWS EC2 | Amazon Linux 2, Amazon Bottlerocket, and the core set above |
| GCP | The core set above, and COS |
| Azure | The core set above |

Container Runtimes

Sysdig agent supports the detection of the following:

  • Docker
  • LXC
  • CRI-O
  • containerd
  • Podman
  • Mesos

Support Matrix for Docker

Sysdig agent version 12.8.1 has been tested with Docker on the latest versions of the following Linux distributions:

| Operating System | Architecture |
| --- | --- |
| Amazon Linux 2 | x86_64, arm64 |
| Amazon Linux 2022 | x86_64, arm64 |
| Debian GNU/Linux 10 (buster) | x86_64, arm64 |
| Debian GNU/Linux 11 (bullseye) | x86_64, arm64 |
| Fedora Linux 35 (Cloud Edition) | x86_64, arm64 |
| Fedora Linux 36 (Cloud Edition) | x86_64, arm64 |
| Red Hat Enterprise Linux 8.6 (Ootpa) | x86_64, arm64 |
| Red Hat Enterprise Linux 9.0 (Plow) | x86_64, arm64 |
| SLES 15 SP4 | x86_64 |
| Ubuntu 18.04.6 LTS (Bionic Beaver) | x86_64, arm64 |
| Ubuntu 20.04.4 LTS (Focal Fossa) | x86_64, arm64 |
| Ubuntu 22.04.4 LTS (Jammy Jellyfish) | x86_64, arm64 |

Prerequisites for Podman Environments

Sysdig agent supports running as a Podman container.

  • Enable the Podman API service for all users (see the sketch after this list).

    The agent will not be able to collect Podman-managed container metadata, such as the container name, if the API service is not enabled.

  • Secure rules and policies that depend on container metadata other than the container ID will not work.

  • Pausing and terminating containers will not work because Policy actions for Podman are not supported.

  • Containers started as a non-root user will have the podman_owner_uid label associated with them if the API service is enabled for that user. The value of podman_owner_uid is the numeric user ID of the user that started the container.
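A minimal sketch for enabling the Podman API service on a systemd-based host (the podman.socket unit ships with recent Podman packages; verify the unit name on your distribution):

# Enable the Podman API socket for a rootless user (run as that user)
systemctl --user enable --now podman.socket

# Enable the system-wide socket for root-managed containers
sudo systemctl enable --now podman.socket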

Container Registries

Quay.io

For example, to pull the latest agent container from Quay.io:

docker pull quay.io/sysdig/agent

CPU Architectures

x86

Supported Agent Containers
  • agent
  • agent-slim
  • agent-kmodule

ARM (aarch64)

Supported kernel versions are v4.17 and above.

Unsupported Features
  • Pre-built probes
  • Activity Audit
  • Sysdig agent installation using the agent container

s390x (zLinux)

Unsupported Features
Probes

No support for pre-built probes on zLinux. For kernel instrumentation, use the kernel module. eBPF probes are not supported on zLinux.

Captures

Capture is not supported on zLinux.

Legacy Agent Installation

Sysdig agent installation using the agent container is not supported.

Java Versions and Vendors

Sysdig agent supports the following:

  • Java versions: v7 and above
  • Vendors: Oracle, OpenJDK

For Java-based applications (such as Cassandra, Elasticsearch, Kafka, Tomcat, and ZooKeeper), the Sysdig agent requires the Java Runtime Environment (JRE) to be installed to poll for metrics (beans).

If the Docker-container-based Sysdig agent is installed, the JRE is installed alongside the agent binaries and no further dependencies exist. However, if you are installing the service-based (non-container) agent and you do not see JVM/JMX metrics reporting, your host may not have the JRE installed, or it may not be installed in the expected location: /usr/bin/java.
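One quick way to check for a usable JRE (a sketch; the binary location can vary by distribution and packaging):

# Is a JRE on the PATH, and which version?
command -v java && java -version

# The agent expects java at this default location
ls -l /usr/bin/java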

Minimum Resource Requirements

The resource requirements of the agent depend on the size and load of the host: more activity means more resources are required.

It is typical to see between 5 and 20 KiB/s of bandwidth consumed; variables such as the number of metrics, events, and Kubernetes objects, and which products and features are enabled, can increase the required throughput. When a Sysdig capture is being collected, expect a spike in bandwidth while the capture file is being ingested.

To ensure data can be sent to our collection service, we recommend against placing bandwidth shaping or caps on the agent. For more information, see Tuning Sysdig Agent.

Additional Requirements

Access key

The installation of the Sysdig agent requires an access key.

This key and the agent installation instructions are presented in a web-based wizard after you activate your account and log in for the first time.

The same information can also be found in the Settings > Agent Installation menu of the web interface after logging in. See Agent Installation: Overview and Key for details.

Network connection

A Sysdig agent (containerized or native) is installed into each host being monitored and needs to connect to the Sysdig Monitor backend servers to report host metrics. The agent must be able to reach the Sysdig collector addresses. For example, for US East, the address is collector.sysdigcloud.com (via multiple IPs) over port tcp/6443. See Sysdig Collector Ports for the supported ports in other regions.
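Before installing, you can verify that the collector endpoint is reachable (a sketch using netcat; substitute the collector address and port for your region):

# Succeeds if the host can reach the US East collector on tcp/6443
nc -zv collector.sysdigcloud.com 6443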

The agent supports the HTTP proxy for communicating with Sysdig backend components. For more information, see Enable HTTP Proxy for Agents.

1.1.2 - Quick Install Sysdig Agent

Sysdig provides quick-install commands pre-filled with some of your environment variables to get started with the Sysdig agent. You choose the deployment type, and Sysdig gives you an auto-generated command to ease your installation experience.

Access from Get Started Pages

  1. Log in as the administrator to Sysdig Monitor or Sysdig Secure.

  2. Select the Get Started page.

  3. Click Install the Agent, select the appropriate deployment type, and copy the auto-generated code, filling in remaining variable values as required.

Sample Usage

Kubernetes

Helm

Helm is the recommended option for installing agents on Kubernetes.

Example

The shell commands below will create a new Kubernetes namespace called sysdig-agent and deploy the agent with a Helm release name of sysdig. Be sure to replace the configuration options with the values specific to your setup.

SaaS
kubectl create ns sysdig-agent
helm repo add sysdig https://charts.sysdig.com
helm repo update
helm install sysdig-agent \
    --namespace=sysdig-agent \
    --set global.sysdig.accessKey='1234-your-key-here-1234' \
    --set global.sysdig.region='us1' \
    --set global.clusterConfig.name='my_cluster' \
    --set agent.sysdig.settings.tags='linux:ubuntu,dept:dev,local:nyc' \
    sysdig/sysdig-deploy
On-Prem
kubectl create ns sysdig-agent
helm repo add sysdig https://charts.sysdig.com
helm repo update
helm install sysdig-agent \
    --namespace=sysdig-agent \
    --set global.sysdig.accessKey='1234-your-key-here-1234' \
    --set agent.collectorSettings.collectorHost='mycollector.elb.us-west-1.amazonaws.com' \
    --set agent.collectorSettings.collectorPort=6443 \
    --set agent.sysdig.settings.tags='linux:ubuntu,dept:dev,local:nyc' \
    --set agent.sysdig.settings.k8s_cluster_name='my_cluster' \
    sysdig/sysdig-deploy
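After either command completes, you can verify the release and the agent pods (a sketch; assumes kubectl access to the cluster and the sysdig-agent namespace used above):

# Confirm the Helm release deployed
helm status sysdig-agent --namespace sysdig-agent

# One agent pod should be scheduled per node
kubectl get pods -n sysdig-agent -o wide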

Options

For the latest helm-based installation instructions and configuration options, see sysdig-deploy.

Script

If you cannot use Helm, Sysdig also provides a script that downloads and applies Kubernetes manifests to deploy the agent as a DaemonSet. The script requires curl and kubectl to be installed in the $PATH on the host where it is run.

install-agent-kubernetes \
[-a | --access_key <value>] [-t | --tags <value>] \
[-c | --collector <value>] [-cp | --collector_port <value>] [-s | --secure <value>] \
[-cc | --check_certificate <value>] [-ns | --namespace | --project <value>] \
[-ac | --additional_conf <value>] [-op | --openshift] [-as | --agent_slim] \
[-av | --agent_version <value>] [-ae | --api_endpoint <value> ] [-na | --nodeanalyzer ] \
[-ia | --imageanalyzer ] [-am | --analysismanager <value>] [-ds | --dockersocket <value>] \
[-cs | --crisocket <value>] [-cv | --customvolume <value>] \
[-cn | --cluster_name <value>] [-r | --remove ] [-h | --help]
Example
curl -s https://download.sysdig.com/stable/install-agent-kubernetes | sudo bash -s -- \
--access_key <ACCESS_KEY>  \
--collector <COLLECTOR_ADDRESS> --collector_port <COLLECTOR_PORT> \
--nodeanalyzer --api_endpoint <SECURE_ENDPOINT_ADDRESS>

Options

For the complete configuration options, see Agent Install: Kubernetes.

Docker

Install agent-kmodule
docker run -it --privileged --rm --name sysdig-agent-kmodule \
  -v /usr:/host/usr:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  quay.io/sysdig/agent-kmodule
Install agent-slim
docker run -d --name sysdig-agent \
  --restart always \
  --privileged \
  --net host \
  --pid host \
  -e ACCESS_KEY=<ACCESS_KEY> \
  -e COLLECTOR=<COLLECTOR_URL> \
  -e SECURE=true \
  [-e TAGS=<LIST_OF_TAGS>] \
  -e ADDITIONAL_CONF=<LIST_OF_CONFIG> \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev \
  -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  --shm-size=512m \
  quay.io/sysdig/agent-slim

Example

Install agent-kmodule
docker run -it --privileged --rm --name sysdig-agent-kmodule \
  -v /usr:/host/usr:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  quay.io/sysdig/agent-kmodule
Install agent-slim
docker run \
  --name sysdig-agent \
  --privileged \
  --net host \
  --pid host \
  -e ACCESS_KEY=1234-your-key-here-1234  \
  -e COLLECTOR=mycollector.elb.us-west-1.amazonaws.com \
  -e COLLECTOR_PORT=6443 \
  -e CHECK_CERTIFICATE=false \
  -e TAGS=my_tag:some_value \
  -e ADDITIONAL_CONF="log:\n file_priority: debug\n console_priority: error" \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev \
  -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  -v /usr:/host/usr:ro \
  --shm-size=350m \
quay.io/sysdig/agent-slim
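To confirm the agent started and connected (a sketch; the exact log wording varies by agent version):

# Watch the agent logs for a successful connection to the collector
docker logs -f sysdig-agent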

Options

| Option | Description |
| --- | --- |
| ACCESS_KEY | The agent access key. You can retrieve this from Settings > Agent Installation in either Sysdig Monitor or Sysdig Secure. |
| TAGS | Enter meaningful tags you want applied to your instances. |
| COLLECTOR | The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. See SaaS Regions and IP Ranges. |
| COLLECTOR_PORT | The default is 6443. |
| SECURE | Use a secure SSL/TLS connection to send metrics to the collector. This option is enabled by default. |
| CHECK_CERTIFICATE | (On-prem) Determines the strong SSL certificate check for Sysdig Monitor on-premises installations. Set to true when using SSL/TLS to connect to the collector service to ensure that a valid SSL/TLS certificate is installed. |
| ADDITIONAL_CONF | Optional. Use this option to provide custom configuration values to the agent as environment variables. If provided, the value is appended to the agent configuration file. For example, file log configuration. |
| bpf | Enables the eBPF probe. |

Linux

$ curl -s https://download.sysdig.com/stable/install-agent | sudo bash -s -- \
--access_key <value> [-t | --tags <value>] [-c | --collector <value>] \
[-cp | --collector_port <value>] [-s | --secure <value>] \
[-cc | --check_certificate] [-ac | --additional_conf <value>] \
[-b | --bpf] [-h | --help]

Example

curl -s https://download.sysdig.com/stable/install-agent | sudo bash -s -- \
--access_key <ACCESS_KEY> --collector collector-staging.sysdigcloud.com \
--secure true
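Once the script finishes, the agent runs as a host service. A sketch for checking it on a systemd host (the dragent service name and log path are the usual defaults, but verify them on your install):

# Check the agent service status
sudo systemctl status dragent

# Tail the agent log
sudo tail -f /opt/draios/logs/draios.log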

Options

| Option | Description |
| --- | --- |
| access_key | The agent access key. You can retrieve this from Settings > Agent Installation in either Sysdig Monitor or Sysdig Secure. |
| tags | Enter meaningful tags you want applied to your instances. |
| collector | The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. See SaaS Regions and IP Ranges. |
| collector_port | The default is 6443. |
| secure | Use a secure SSL/TLS connection to send metrics to the collector. This option is enabled by default. |
| check_certificate | Disables the strong SSL certificate check for Sysdig Monitor on-premises installations. |
| additional_conf | Optional. Use this option to provide custom configuration values to the agent as environment variables. If provided, the value will be appended to the agent configuration file. For example, file log configuration. |
| bpf | Enables the eBPF probe. |

1.1.3 - Agent Install: Kubernetes

The recommended method to monitor Kubernetes environments is to deploy the Sysdig agent using the Helm chart. Alternatively, you can install the agent container as a DaemonSet. This section helps you install the agent using either method.

Installing the agent using Helm or as a DaemonSet deploys agent containers on every node in your Kubernetes environment. Once the agent is installed, Sysdig Monitor automatically begins monitoring all of your hosts, apps, pods, and services, and automatically connects to the Kubernetes API server to pull relevant metadata about the environment. If licensed, Sysdig Secure launches with default policies that you can view and configure to suit your needs. You can access the front-end web interfaces for Sysdig Monitor and Sysdig Secure immediately.

Sysdig supports monitoring numerous Kubernetes platforms.

Prerequisites

  • A supported distribution: See Agent Installation Requirements for details.

  • Kubernetes v1.9+: The agent installation on Kubernetes requires v1.9 or higher because the APIs used to fetch Kubernetes metadata are only present in v1.9+.

  • Sysdig account and access key: Request a trial or full account at Sysdig.com and click the Activate Account button. The Getting Started Wizard provides an access key.

  • Port 6443 open for outbound traffic: The agent communicates with the collector on port 6443. If you are using a firewall, you must open port 6443 for outbound traffic for the agent.

  • Kernel headers installed: If a prebuilt kernel probe is not available for your kernel, the kernel headers must be installed in order to build the kernel probe.

  • kubectl installed: All of the installation methods utilize kubectl to install the agent in the cluster.

Kernel Headers

The Sysdig agent requires kernel header files to install successfully on a Kubernetes cluster. If the hosts in your environment match the pre-compiled kernel modules available from Sysdig, no special action is required.

In some cases, the nodes in your Kubernetes environment might use kernel versions that do not match the provided headers, and the agent might fail to install correctly. In those cases, you must install the kernel headers manually on each node.

To do so:

For Debian-style distributions, run the command:

apt-get -y install linux-headers-$(uname -r)

For RHEL-style distributions, run the command:

yum -y install kernel-devel-$(uname -r)

For more information on troubleshooting, see About Kernel Headers and the Kernel Module.

Kubernetes Environments

Some Kubernetes environments require special configuration options to deploy the agent. If you’re installing in one of the following environments, follow the guides specific to those environments to deploy the agent. Otherwise, continue with this topic.

Installation

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Script

Sysdig also provides a script that you can use to install the agent as a DaemonSet.

Installation

  1. Download the script and make it executable.

     wget https://download.sysdig.com/stable/install-agent-kubernetes
     chmod +x install-agent-kubernetes
    
  2. Run the script to install the agent as a DaemonSet.

    ./install-agent-kubernetes -a <ACCESS_KEY> -c <COLLECTOR_URL> -cn <CLUSTER_NAME>
    

Options

| Option | Description |
| --- | --- |
| -a | The agent access key. You can retrieve this from Settings > Agent Installation in either Sysdig Monitor or Sysdig Secure. |
| -t | The list of tags to identify the host where the agent is installed. For example: role:webserver, location:europe. |
| -c | The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. |
| -cp | The collector port. The default is 6443. |
| -cn | If a value is provided, the cluster will be identified with the name provided. |
| -s | Use a secure SSL/TLS connection to send metrics to the collector. This option is enabled by default. |
| -cc | Enable the strong SSL certificate check. The default is true. |
| -ns | If a value is provided, the agent will be deployed to the specified namespace/project. The default is sysdig-agent. |
| -op | If provided, perform the agent installation using the OpenShift command line. |
| -ac | If a value is provided, the additional configuration will be appended to the agent configuration file. |
| -av | If a version is provided, use the specified agent version. The default is the latest version. |
| -r | If a value is provided, the daemonset, configmap, cluster role binding, service account, and secret associated with the Sysdig agent will be removed from the specified namespace. |
| -ae | The api_endpoint is the region-dependent domain for the Sysdig product, without the protocol. For example: secure.sysdig.com, us2.app.sysdig.com, eu1.app.sysdig.com. |
| -h | Print this usage and exit. |

Sysdig Secure Only

| Option | Description |
| --- | --- |
| -na | If provided, installs the Node Analyzer tools. It is an error to set both -ia and -na. |
| -ds | The docker socket for Image Analyzer. |
| -cs | The CRI socket for Image Analyzer. |
| -cv | The custom volume for Image Analyzer. |
| -b | Required on AWS Bottlerocket nodes to determine whether the eBPF probe should be built. Alternatively, you can use --bpf. |

Sysdig Secure Only (Legacy)

These values apply to the Node Image Analyzer (v1) in Sysdig Secure.

| Option | Description |
| --- | --- |
| -am | The Analysis Manager endpoint for Sysdig Secure. |
| -ia | If provided, installs the Node Image Analyzer (v1). It is an error to set both -ia and -na. The v1 Node Image Analyzer will be deprecated and replaced by the NA tools. |

Manifests

To deploy agents using Kubernetes manifests, you can download manifest files, edit them as required, and deploy them using kubectl.

  1. Download the sample files:

    • sysdig-agent-clusterrole.yaml

    • sysdig-agent-daemonset-v2.yaml

    • sysdig-agent-configmap.yaml

    • sysdig-agent-service.yaml

  2. Create a namespace for the Sysdig agent.

    Note: You can use whatever name you prefer. This example uses sysdig-agent for both the namespace and the service account. The default service account name was automatically defined in sysdig-agent-daemonset-v2.yaml, at the line: serviceAccount: sysdig-agent

    kubectl create ns sysdig-agent
    
  3. Create a secret key:

    kubectl create secret generic sysdig-agent --from-literal=access-key=<your sysdig access key> -n sysdig-agent
    
  4. Create a cluster role and service account, and define the cluster role binding that grants the Sysdig agent rules in the cluster role:

    kubectl apply -f sysdig-agent-clusterrole.yaml -n sysdig-agent
    kubectl create serviceaccount sysdig-agent -n sysdig-agent
    kubectl create clusterrolebinding sysdig-agent --clusterrole=sysdig-agent --serviceaccount=sysdig-agent:sysdig-agent
    
  5. Edit sysdig-agent-configmap.yaml to add the collector address, port, and the SSL/TLS information:

    collector:
    collector_port:
    ssl: #true or false
    check_certificate: #true or false
    
    • For SaaS, find the collector address for your region.

    • For On-prem, enter the collector endpoint defined in your environment.

    • check_certificate should be set to false if a self-signed certificate or a private, CA-signed certificate is used. See Set Up SSL Connectivity to the Backend for more information.

  6. Apply the sysdig-agent-configmap.yaml file:

    kubectl apply -f sysdig-agent-configmap.yaml -n sysdig-agent
    
  7. Apply the sysdig-agent-service.yaml file:

    kubectl apply -f sysdig-agent-service.yaml -n sysdig-agent
    

    This allows the agent to receive Kubernetes audit events from the Kubernetes API server. See Kubernetes Audit Logging for information on enabling Kubernetes audit logging.

  8. Apply the sysdig-agent-daemonset-v2.yaml file:

    kubectl apply -f sysdig-agent-daemonset-v2.yaml -n sysdig-agent
    
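Before moving on to the UI checks below, you can confirm the DaemonSet rolled out (a sketch; assumes the sysdig-agent namespace used above):

# One agent pod should be running per node
kubectl get daemonset,pods -n sysdig-agent -o wide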

Additional Options

Verify Metrics in Sysdig Monitor

Log in to Sysdig Monitor to verify that the agent deployed and the metrics are detected and collected appropriately.

The steps below give one way to do the check.

  1. Access Sysdig Monitor:

    SaaS: See SaaS Regions and IP Ranges and identify the correct domain URL associated with your Sysdig application and region. For example, for US East, the URL is https://app.sysdigcloud.com.

    For other regions, the format is https://<region>.app.sysdig.com. Replace <region> with the region where your Sysdig application is hosted. For example, for Sysdig Monitor in the EU, you use https://eu1.app.sysdig.com.

    Log in with your Sysdig user name and password.

  2. Select the Explore tab to see if metrics are displayed.

  3. Determine the Kube State Metrics you want to collect.

  4. To verify that kube state metrics and cluster name are working correctly, select the Explore tab and see if your cluster is listed.

Kubernetes metadata (pods, deployments, etc.) appears a minute or two later than the nodes and containers themselves; if pod names do not appear immediately, wait and retry the Explore view.

If agents are disconnecting, there could be an issue with your MAC addresses. See Troubleshooting Agent Installation for tips.

Connect to the Sysdig Backend via Static IPs (SaaS only)

Sysdig provides a list of static IP addresses that can be whitelisted in a Sysdig environment, allowing users to establish a network connection to the Sysdig backend without opening complete network connectivity. This is done by setting the Collector IP to collector-static.sysdigcloud.com.

The sysdig-agent-configmap.yaml file can be edited either locally or using the edit command in Kubernetes.

To configure the collector IP in a Kubernetes SaaS instance:

  1. Open sysdig-agent-configmap.yaml in a text editor.

  2. Uncomment the following lines:

    • collector:

    • collector_port

  3. Set the collector: value to collector-static.sysdigcloud.com

  4. Set the collector_port: value to 6443

  5. Save the file.

The example file below shows how the sysdig-agent-configmap.yaml file should look after configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sysdig-agent
data:
  dragent.yaml: |
    ### Agent tags
    # tags: linux:ubuntu,dept:dev,local:nyc

    #### Sysdig Software related config ####

    # Sysdig collector address
    collector: collector-static.sysdigcloud.com

    # Collector TCP port
    collector_port: 6443

    # Whether collector accepts ssl/TLS
    ssl: true

    # collector certificate validation
    ssl_verify_certificate: true

    # Sysdig Secure
    security:
      enabled: true

    #######################################
    # new_k8s: true
    # k8s_cluster_name: production    

1.1.3.1.1 - GKE Autopilot

Autopilot is an operation mode for creating and managing clusters in GKE. In brief, with Autopilot, Google configures and manages the underlying node infrastructure for you. This topic helps you use Helm to install the Sysdig agent on a GKE cluster running in Autopilot mode.

NodeAnalyzer is not supported on Autopilot environments.

Prerequisites

  1. Install a GKE cluster in Autopilot mode.

  2. Connect the GKE cluster.

  3. Install your workload.

Deploy Sysdig Agent

Sysdig recommends using Helm to install the Sysdig agent in Kubernetes environments. After connecting to the GKE cluster, use the sysdig-deploy chart to install the agent, as sketched below.
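A minimal sketch of the install, mirroring the Helm example from the Quick Install section (the Autopilot-specific value shown is an assumption; verify it against the current sysdig-deploy chart values):

kubectl create ns sysdig-agent
helm repo add sysdig https://charts.sysdig.com
helm repo update
# Note: global.gke.autopilot is an assumed chart value; check the chart docs
helm install sysdig-agent \
    --namespace=sysdig-agent \
    --set global.sysdig.accessKey='<ACCESS_KEY>' \
    --set global.sysdig.region='us1' \
    --set global.clusterConfig.name='my_cluster' \
    --set global.gke.autopilot=true \
    sysdig/sysdig-deploy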

To customize the configuration of the agent, see the Sysdig Agent Helm Chart.

Verify Metrics on the Sysdig Monitor UI

Log in to Sysdig Monitor to verify that the agent deployed and the metrics are detected and collected appropriately.

Given below is one way to do so.

  1. Access Sysdig Monitor:

    SaaS: See SaaS Regions and IP Ranges and identify the correct domain URL associated with your Sysdig application and region. For example, for US East, the URL is https://app.sysdigcloud.com.

    For other regions, the format is https://<region>.app.sysdig.com. Replace <region> with the region where your Sysdig application is hosted. For example, for Sysdig Monitor in the EU, you use https://eu1.app.sysdig.com.

    Log in with your Sysdig user name and password.

  2. Select the Explore tab to see if metrics are displayed.

  3. Verify that kube state metrics and cluster name are working correctly: select the Explore tab and create a grouping by kube_cluster_name and kube_pod_name.

  4. Select an individual container or pod to see the details.

1.1.3.1.2 - GKE Standard

Google Kubernetes Engine (GKE) is a managed environment for running Kubernetes in Google Cloud, in order to deploy containerized applications. Sysdig supports all flavors of GKE, including Ubuntu and GKE's default Container-Optimized OS (COS).

GKE COS environments require the eBPF probe to support agent installation.

Preparation

Open Port 6443 for Agent Egress

Because GKE uses stateful firewalls, you must actively open port 6443 for the Sysdig agent outbound traffic.

In earlier versions, the Sysdig Agent connected to port 6666. This behavior has been deprecated, as the Sysdig agent now connects to port 6443.

GKE COS/eBPF-Specific Requirements

  • Linux kernel version >= 4.14.

  • When performing the installation steps, you will add one additional parameter to install the eBPF probe. See Step 7. Note that only the eBPF probe is supported in GKE COS environments.

Prerequisites

You can review Agent Install: Kubernetes and the Agent Installation Requirements for additional context, if desired.

Installation Steps

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Manifests

To deploy agents using Kubernetes manifests, you can download the manifest files, edit them as required, and deploy them using kubectl.

  1. Download the sample files:

    • sysdig-agent-clusterrole.yaml

    • sysdig-agent-daemonset-v2.yaml

    • sysdig-agent-configmap.yaml

    • sysdig-agent-service.yaml

  2. Create a namespace to use for the Sysdig agent.

    You can use whatever name you want. In this document, we used sysdig-agent for both the namespace and the service account.

    kubectl create ns sysdig-agent
    
  3. Create a secret key:

    kubectl create secret generic sysdig-agent --from-literal=access-key=<your sysdig access key> -n sysdig-agent
    
  4. If you are running Kubernetes 1.6 or higher, you must grant your user the ability to create roles in Kubernetes by running the following command:

    kubectl create clusterrolebinding your-user-cluster-admin-binding --clusterrole=cluster-admin --user=your.google.cloud.email@example.org
    

    See Google documentation for more information.

    Create a service account for the Sysdig agent using the clusterrole.yaml file.

    The Sysdig agent must be granted read-only access to certain Kubernetes APIs, which the agent uses to populate metadata and provide component metrics.

    You can use the Sysdig-provided sysdig-agent-clusterrole.yaml file. Deploying this file creates a cluster role and service account in Kubernetes, and defines a cluster role binding that grants the Sysdig agent the rules in the cluster role.

    Run the following commands (using whatever namespace you’ve defined in Step 2):

    kubectl apply -f sysdig-agent-clusterrole.yaml -n sysdig-agent
    kubectl create serviceaccount sysdig-agent -n sysdig-agent
    kubectl create clusterrolebinding sysdig-agent --clusterrole=sysdig-agent --serviceaccount=sysdig-agent:sysdig-agent
    
  5. Edit sysdig-agent-configmap.yaml to add the collector address, port, and the SSL/TLS information:

    collector:
    collector_port:
    ssl: #true or false
    check_certificate: #true or false
    
  6. Apply the sysdig-agent-configmap.yaml file:

    kubectl apply -f sysdig-agent-configmap.yaml -n sysdig-agent
    
  7. FOR GKE COS ONLY: To enable the eBPF probe required for COS, uncomment the following parameters in sysdig-agent-daemonset-v2.yaml under the env section:

    env:
      - name: SYSDIG_BPF_PROBE
        value: ""
    
  8. Apply the sysdig-agent-service.yaml file:

    kubectl apply -f sysdig-agent-service.yaml -n sysdig-agent
    

    This allows the agent to receive Kubernetes audit events from the Kubernetes API server. See Kubernetes Audit Logging for information on enabling Kubernetes audit logging.

  9. Apply the daemonset-v2.yaml file:

    kubectl apply -f sysdig-agent-daemonset-v2.yaml -n sysdig-agent
    

The agents will be deployed and you can see some metrics in the Sysdig Monitor UI.

Next Steps

You can continue with instructions in Verify Metrics in Sysdig Monitor and optionally, Connect to Sysdig Backend.

1.1.3.2 - Steps for OKE

Oracle Kubernetes Engine (OKE) is a managed environment for running Kubernetes in Oracle Cloud, in order to deploy containerized applications. As of Sysdig agent version 12.0.1, Sysdig supports all flavors of OKE.

OKE environments require the eBPF probe to support agent installation.

The instructions below describe a standard OKE agent install and call out the special steps needed to install the eBPF probe.

Preparation

Open Port 6443 for Agent Egress

Because OKE uses stateful firewalls, you must actively open port 6443 for the Sysdig agent outbound traffic.

OKE by default allows network access to the Sysdig agent on port 6443, but ensure that firewall rules are open and that the agent can connect to the Sysdig backend.

eBPF-Specific Requirements

  • Linux kernel version >= 4.14.

  • When performing the installation steps, you will add one additional parameter to install the eBPF probe. See Step 7, below.

Installation Steps

Identify the appropriate endpoint depending on your Sysdig account region. For more information, see SaaS Regions and IP Ranges.

After identifying your account's region, choose one of the following methods:

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Manifests

To deploy agents using Kubernetes manifests, you can download the manifest files, edit them as required, and deploy them using kubectl.

  1. Download the sample files:

    • sysdig-agent-clusterrole.yaml

    • sysdig-agent-daemonset-v2.yaml

    • sysdig-agent-configmap.yaml

    • sysdig-agent-service.yaml

  2. Create a namespace to use for the Sysdig agent.

    Note: You can use whatever name you want. In this document, we used sysdig-agent for both the namespace and the service account.

    kubectl create ns sysdig-agent
    
  3. Create a secret key:

    kubectl create secret generic sysdig-agent --from-literal=access-key=<your sysdig access key> -n sysdig-agent
    
  4. If you are running Kubernetes 1.6 or higher, you must create a service account for the Sysdig agent by using the clusterrole.yaml file.

    The Sysdig agent must be granted read-only access to certain Kubernetes APIs, which the agent uses to populate metadata and provide component metrics.

    You can use the Sysdig-provided sysdig-agent-clusterrole.yaml file. Deploying it creates a cluster role and service account in Kubernetes, and defines a cluster role binding that grants the Sysdig agent the rules in the cluster role.

    Run the following commands by using the namespace you’ve defined in Step 2:

    kubectl apply -f sysdig-agent-clusterrole.yaml -n sysdig-agent
    kubectl create serviceaccount sysdig-agent -n sysdig-agent
    kubectl create clusterrolebinding sysdig-agent --clusterrole=sysdig-agent --serviceaccount=sysdig-agent:sysdig-agent
    
  5. Edit sysdig-agent-configmap.yaml to add the collector address, port, and the SSL/TLS information:

    collector:
    collector_port:
    ssl: #true or false
    check_certificate: #true or false
    
  6. Apply the sysdig-agent-configmap.yaml file:

    kubectl apply -f sysdig-agent-configmap.yaml -n sysdig-agent
    
  7. To enable the eBPF probe, uncomment the following parameters in sysdig-agent-daemonset-v2.yaml under the env section:

    env:
      - name: SYSDIG_BPF_PROBE
        value: ""
    
  8. Apply the sysdig-agent-service.yaml file:

    kubectl apply -f sysdig-agent-service.yaml -n sysdig-agent
    

    This allows the agent to receive Kubernetes audit events from the Kubernetes API server. See Kubernetes Audit Logging for information on enabling Kubernetes audit logging.

  9. Apply the daemonset-v2.yaml file:

    kubectl apply -f sysdig-agent-daemonset-v2.yaml -n sysdig-agent
    

    The agents will be deployed and you can see some metrics in the Sysdig Monitor UI.

Next Steps

You can continue with instructions in Verify Metrics in Sysdig Monitor and optionally, Connect to Sysdig Backend.

1.1.3.3 - Steps for OpenShift

You can review Agent Install: Kubernetes and the Agent Installation Requirements for additional context, if desired.

RHCOS environments require the eBPF probe to support agent installation.

Preparation

RHCOS/eBPF-Specific Requirements

  • Linux kernel version 4.14 or above.
  • When performing the installation steps, you will add one additional parameter to install the eBPF probe. See Step 7, below.

Kernel Headers

The Sysdig agent requires kernel header files to install successfully on a Kubernetes cluster. If the hosts in your environment match the pre-compiled kernel modules available from Sysdig, no special action is required.

In some cases, the nodes in your Kubernetes environment might use kernel versions that do not match the provided headers, and the agent might fail to install correctly. In those cases, you must install the kernel headers manually on each node.

To do so:

For Debian-style distributions, run the command:

apt-get -y install linux-headers-$(uname -r)

For RHEL-style distributions, run the command:

yum -y install kernel-devel-$(uname -r)

For more information on troubleshooting, see About Kernel Headers and the Kernel Module.

Configure for OpenShift

If you are using Red Hat OpenShift, these steps are required. They describe how to create a project, assign and label the node selector, create a privileged service account, and add it to a cluster role.

Copy/Paste Sample Code Block

In the example code, this document uses sysdig-agent for the PROJECT NAME (-n) and the SERVICE ACCOUNT (-z).

You can copy and paste the code as is, or follow the steps below to customize your naming conventions.

oc adm new-project sysdig-agent --node-selector=''
oc project sysdig-agent
oc create serviceaccount sysdig-agent
oc adm policy add-scc-to-user privileged -n sysdig-agent -z sysdig-agent -z node-analyzer
oc adm policy add-cluster-role-to-user cluster-reader -n sysdig-agent -z sysdig-agent -z node-analyzer

Customize the Code

You can use your own Project name and Service Account name if desired.

Note that if you use a different Service Account name, you will need to edit the default service account in the Sysdig Installation Steps, below.

  1. Create a new OpenShift project for the Sysdig agent deployment and use an empty string for the node selector:

    oc adm new-project PROJECT-NAME --node-selector=""
    
  2. Change to the new OpenShift Project for the Sysdig agent deployment:

    oc project PROJECT-NAME
    
  3. Create a service account for the project:

    oc create serviceaccount SERVICE-ACCOUNT
    
  4. Add the service account to privileged Security Context Constraints:

    oc adm policy add-scc-to-user privileged -n PROJECT-NAME -z SERVICE-ACCOUNT -z node-analyzer
    
  5. Add the service account to the cluster-reader Cluster Role:

    oc adm policy add-cluster-role-to-user cluster-reader -n PROJECT-NAME -z SERVICE-ACCOUNT -z node-analyzer
    

Installation

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Manifests

  1. Download the sample files:

    • sysdig-agent-daemonset-v2.yaml

    • sysdig-agent-clusterrole.yaml

    • sysdig-agent-configmap.yaml

    • sysdig-agent-service.yaml

  2. Create the sysdig-agent cluster role and assign it to the service account:

     oc apply -f sysdig-agent-clusterrole.yaml
     oc adm policy add-cluster-role-to-user sysdig-agent -n PROJECT-NAME -z SERVICE-ACCOUNT
    
  3. Create a secret key:

    oc create secret generic sysdig-agent --from-literal=access-key=<your sysdig access key> -n PROJECT-NAME
    
  4. If you created a service account name other than sysdig-agent, edit sysdig-agent-daemonset-v2.yaml to provide your custom value:

    serviceAccount: SERVICE-ACCOUNT
    
  5. Edit sysdig-agent-configmap.yaml to add the collector address, port, and the SSL/TLS information:

    collector:
    collector_port:
    ssl: #true or false
    check_certificate: #true or false
    
    • For SaaS, find the collector address for your region.
    • For On-prem, enter the collector endpoint defined in your environment.
    • check_certificate should be set to false if a self-signed certificate or a private, CA-signed certificate is used. See also Set Up SSL Connectivity to the Backend.
  6. Apply the sysdig-agent-configmap.yaml file:

    oc apply -f sysdig-agent-configmap.yaml -n PROJECT-NAME
    
  7. FOR RHCOS ONLY: To enable the eBPF probe required for RHCOS, uncomment the following parameters in sysdig-agent-daemonset-v2.yaml under the env section:

    env:
      - name: SYSDIG_BPF_PROBE
        value: ""
    
  8. Apply the sysdig-agent-service.yaml file:

    oc apply -f sysdig-agent-service.yaml -n PROJECT-NAME
    

    This allows the agent to receive Kubernetes audit events from the Kubernetes API server. See Kubernetes Audit Logging for information on enabling Kubernetes audit logging.

  9. Apply the daemonset-v2.yaml file:

    oc apply -f sysdig-agent-daemonset-v2.yaml -n PROJECT-NAME
    

    The agents will be deployed and you can see some metrics in the Sysdig Monitor UI.

Next Steps

You can continue with instructions in Verify Metrics in Sysdig Monitor and optionally, Connect to Sysdig Backend.

1.1.3.4 - Steps for Rancher

Preparation

General Requirements

You can review Agent Install: Kubernetes | GKE | OpenShift | IBM and the Agent Installation Requirements for additional context, if desired.

Kernel Headers

The Sysdig agent requires a kernel module in order to be installed successfully on a host. On RancherOS distributions, the kernel version does not match the provided headers, and the agent might fail to install correctly. Therefore, you must install the kernel headers manually.

For RancherOS distributions, the kernel headers are available in the form of a system service and therefore are enabled using the ros service command:

$ sudo ros service enable kernel-headers-system-docker
$ sudo ros service up -d kernel-headers-system-docker

Some cloud hosting service providers supply pre-configured Linux instances with customized kernels. You may need to contact your provider’s support desk for instructions on obtaining appropriate header files, or for installing the distribution’s default kernel.

Installation

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Manifests

To deploy agents using Kubernetes manifests, you can download the manifest files, edit them as required, and deploy them using kubectl.

  1. Download the sample files:

    • sysdig-agent-clusterrole.yaml

    • sysdig-agent-daemonset-v2.yaml

    • sysdig-agent-configmap.yaml

    • sysdig-agent-service.yaml

  2. Create a namespace to use for the Sysdig agent.

    You can use whatever naming you prefer. This document uses sysdig-agent for both the namespace and the service account.

    The default service account name was automatically defined in sysdig-agent-daemonset-v2.yaml, at the line: serviceAccount: sysdig-agent.

    kubectl create ns sysdig-agent
    
  3. Create a secret key:

    kubectl create secret generic sysdig-agent --from-literal=access-key=<your sysdig access key> -n sysdig-agent
    
  4. Create a cluster role and service account, and define the cluster role binding that grants the Sysdig agent rules in the cluster role:

    kubectl apply -f sysdig-agent-clusterrole.yaml -n sysdig-agent
    kubectl create serviceaccount sysdig-agent -n sysdig-agent
    kubectl create clusterrolebinding sysdig-agent --clusterrole=sysdig-agent --serviceaccount=sysdig-agent:sysdig-agent
    
  5. Edit sysdig-agent-configmap.yaml to add the collector address, port, and the SSL/TLS information:

    collector:
    collector_port:
    ssl: #true or false
    check_certificate: #true or false
    
  6. Apply the sysdig-agent-configmap.yaml file:

     kubectl apply -f sysdig-agent-configmap.yaml -n sysdig-agent
    
  7. Apply the sysdig-agent-service.yaml file:

    kubectl apply -f sysdig-agent-service.yaml -n sysdig-agent
    

This allows the agent to receive Kubernetes audit events from the Kubernetes API server. See Kubernetes Audit Logging for information on enabling Kubernetes audit logging.

  8. Apply the sysdig-agent-daemonset-v2.yaml file:

    kubectl apply -f sysdig-agent-daemonset-v2.yaml -n sysdig-agent
    

The agents will be deployed and you can see some metrics in the Sysdig Monitor UI.

Next Steps

You can continue with instructions in Verify Metrics in Sysdig Monitor and optionally, Connect to Sysdig Backend.

1.1.3.5 - Steps for MKE

Mirantis Kubernetes Engine (MKE), formerly Docker Enterprise, is a managed environment for running Kubernetes to deploy containerized applications. As of Sysdig agent version 12.0.1, Sysdig supports all flavors of MKE.

MKE environments require the eBPF probe to support agent installation.

The instructions below describe a standard MKE agent install and call out the special steps needed to install the eBPF probe.

Preparation

eBPF-Specific Requirements

  • Linux kernel version 4.14 or above.

  • The eBPF probe parameter --set ebpf.enabled=true is required to install the eBPF probe. See the instructions below.

Installation Steps

Identify the appropriate endpoint depending on your Sysdig account region. For more information, see SaaS Regions and IP Ranges.

Helm

Sysdig recommends using helm charts to install Sysdig agent in Kubernetes environments. For the latest chart and installation instructions, see sysdig-deploy.

Make sure to add the eBPF parameter to the helm command:

--set ebpf.enabled=true
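Putting it together with the SaaS Helm example from the Quick Install section, a minimal sketch looks like this (placeholder access key, region, and cluster name):

kubectl create ns sysdig-agent
helm repo add sysdig https://charts.sysdig.com
helm repo update
helm install sysdig-agent \
    --namespace=sysdig-agent \
    --set global.sysdig.accessKey='<ACCESS_KEY>' \
    --set global.sysdig.region='<REGION>' \
    --set global.clusterConfig.name='my_cluster' \
    --set ebpf.enabled=true \
    sysdig/sysdig-deploy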

Next Steps

You can continue with instructions in Verify Metrics in Sysdig Monitor and optionally, Connect to Sysdig Backend.

1.1.3.6 - Using Node Leases

The Sysdig agent uses Kubernetes Lease to control how and when connections are made to the Kubernetes API Server. This mechanism prevents overloading the Kubernetes API server with connection requests during agent bootup.

Kubernetes node leases are automatically created for agent versions 12.0.0 and above. On versions prior to 12.0.0, you must configure node leases as described in the KB article.

Prerequisites

  • Sysdig Agent v11.3.0 or above

  • Kubernetes v1.14 or above

Types of Leases

The agent creates the following leases:

Cold Start

During boot up, the Sysdig agent connects to the Kubernetes API server to retrieve Kubernetes metadata and build a cache. The cold-start leases control the number of agents that build up this cache at any given time. An agent will grab a lease, build its cache, and then release the lease so that another agent can build its cache. This mechanism prevents agents from creating a “boot storm” which can overwhelm the API server in large clusters.

Delegation

In Kubernetes environments, two agents are marked as delegated in each cluster. The delegated agents are designated to request additional data from the API server and produce KubeState metrics. The delegation leases are not released until the agent is terminated.

View Leases

To view the leases, run the following:

$ kubectl get leases -n sysdig-agent

You will see an output similar to the following:

NAME           HOLDER             AGE
cold-start-0                      20m
cold-start-1                      20m
cold-start-2                      21m
cold-start-3   ip-10-20-51-167    21m
cold-start-4                      21m
cold-start-5                      21m
cold-start-6                      20m
cold-start-7                      21m
cold-start-8                      20m
cold-start-9   ip-10-20-51-166   21m
delegation-0   ip-10-20-52-53    21m
delegation-1   ip-10-20-51-98    21m

Troubleshoot Leases

Verify Configuration

When lease-based delegation is working as expected, the agent logs show one of the following:

  • Getting pods only for node <node>

  • Getting pods for all nodes.

  • Both (occasionally on the delegated nodes)

Run the following to confirm that it is working:

$ kubectl logs sysdig-agent-9l2gf -n sysdig-agent | grep -i "getting pods"

The configuration is working as expected if the output on a pod is similar to the following:

2021-05-05 02:48:32.877, 15732.15765, Information, cointerface[15738]: Only getting pods for node ip-10-20-51-166.ec2.internal

Unable to Create Leases

The latest Sysdig ClusterRole is required for the agent to create leases. If you do not have the latest ClusterRole or if you have not configured the ClusterRole correctly, the logs show the following error:

Error, lease_pool_manager[2989554]: Cannot access leases objects: leases.coordination.k8s.io is forbidden: User "system:serviceaccount:sysdig-agent:sysdig-agent" cannot list resource "leases" in API group "coordination.k8s.io" in the namespace "sysdig-agent"

Contact Sysdig Support for help.

Optional Agent Configuration

Several configuration options exist for leases. We recommend not changing the default settings unless advised by Sysdig Customer Support.

The available options, their defaults, and descriptions follow:

k8s_coldstart:
  enabled: <true/false>

Default: true for agent versions 12.0.0 and above.

When true, the agent will attempt to create cold-start leases to control the number of agents that are allowed to build their cache at one time.

k8s_coldstart:
  max_parallel_cold_start: <int>

Default: 10

The number of cold-start leases to be created. This is the number of agents that can connect to the API server simultaneously during agent initialization.

k8s_coldstart:
  namespace: <string>

Default: sysdig-agent

The namespace in which the leases are created. Setting this should not be needed in agent versions 12.0.0 and above, because the Downward API provides the appropriate namespace.

k8s_coldstart:
  enforce_leader_election: <true/false>

Default: false

When true, the agent will not fall back to the previous method if it cannot create leases. This can be useful if the previous method caused API server problems.

k8s_delegation_election: <true/false>

Default: true for agent versions 12.0.0 and above.

When true, the agent will create delegation leases to control which set of agents generate global cluster metrics.
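
For reference, a dragent.yaml snippet that sets all of these options to the defaults described above would look like this:

k8s_coldstart:
  enabled: true
  max_parallel_cold_start: 10
  namespace: sysdig-agent
  enforce_leader_election: false
k8s_delegation_election: true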

1.1.4 - Agent Install: Non-Orchestrated

This section describes how to install the Sysdig agent directly on a Linux host, without using an orchestrator, such as Kubernetes or Mesos.

The agent can be installed in two ways:

  • As a standard container

  • As a non-containerized service

The steps for each flavor differ slightly depending on whether you are using the SaaS or on-premises version of the Sysdig platform.

If you are installing the Sysdig agent in an environment that has Kubernetes, use the Agent Install: Kubernetes instructions instead.

Prerequisites

  • See Agent Installation Requirements for information on the following:

    • Supported Linux distributions

    • Network connection

    • Sysdig access key

    • Cloud service providers (AWS, Google, and Microsoft Azure) and any steps you may need to configure to integrate the Sysdig agent.

  • kernel headers: The Sysdig agent requires kernel header files in order to install successfully on a host, and the agent is delivered with precompiled headers. If the hosts in your environment match the kernel versions included with the agent, no special action is needed. In some cases, the hosts in your environment may run kernel versions that do not match the provided headers, and the agent may fail to install correctly. In those cases, you must install the kernel headers manually. See About Kernel Headers and the Kernel Module for details.

  • Run any commands as root or with the sudo command.

  • Retrieve the Sysdig access key.

  • Collect the configuration parameters.

Configuration Options

ACCESS_KEY: The agent access key. You can retrieve this from Settings > Agent Installation in either Sysdig Monitor or Sysdig Secure.

tags: The list of tags for the host where the agent is installed. For example: role:webserver, location:europe

COLLECTOR: The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. See SaaS Regions and IP Ranges.

collector_port: The port the collector listens on. The default is 6443.

SECURE: Use a secure SSL/TLS connection to send metrics to the collector. This option is enabled by default.

CHECK_CERTIFICATE: (On-prem) Determines strong SSL certificate checking for Sysdig Monitor on-premises installations. Set to true when using SSL/TLS to connect to the collector service to ensure that a valid SSL/TLS certificate is installed. For more information, see Set Up SSL Connectivity to the Backend.

ADDITIONAL_CONF: Optional. Use this option to provide custom configuration values to the agent as environment variables. If provided, the values are appended to the agent configuration file. For example, file log configuration.

bpf: Enables the eBPF probe; the path to the probe file that is either built or downloaded.

Installing Agent Using Containers

The Sysdig agent can be deployed as a Docker container.

The commands below can also be copied from the Get Started page. In that case, your access key will already be included in the command automatically.

SaaS

Installing As Two Containers

The agent is installed by running sysdig/agent-kmodule, followed by running sysdig/agent-slim. See Installation Options for a description of agent-slim and agent-kmodule.

After every host restart, you must run the agent-kmodule and agent-slim containers again.

  1. Collect the configuration parameters.

  2. Build and load the kernel module:

    If you are not using eBPF, use the following:

    docker run -it --privileged --rm --name sysdig-agent-kmodule \
    -v /usr:/host/usr:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules \
    quay.io/sysdig/agent-kmodule
    

    If you are using eBPF use the following:

    docker run -it --privileged --rm --name sysdig-agent-kmodule \
    -e SYSDIG_BPF_PROBE="" \
    -v /etc/os-release:/host/etc/os-release:ro \
    -v /root/.sysdig:/root/.sysdig \
    -v /usr:/host/usr:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    quay.io/sysdig/agent-kmodule
    
  3. Configure kernel module to load during system boot.

    If you are not using eBPF, use the following commands to configure the Linux system to automatically load the kernel module during system boot.

    $ sudo mkdir -p /etc/modules-load.d
    $ sudo bash -c "echo sysdigcloud-probe > /etc/modules-load.d/sysdigcloud-probe.conf"
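
    To verify that the module is loaded after boot, you can check with lsmod (note that the module name appears with an underscore in lsmod output):

    $ lsmod | grep sysdigcloud_probe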
    
  4. Run the agent container, providing the access key and, optionally, user-defined tags:

    If you are not using eBPF, use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    [-e TAGS=[TAGS]] \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    --shm-size=512m \
    quay.io/sysdig/agent-slim
    

    If you are using eBPF use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    [-e TAGS=[TAGS]] \
    -e SYSDIG_BPF_PROBE="" \
    -v /sys/kernel/debug:/sys/kernel/debug:ro \
    -v /root/.sysdig:/root/.sysdig \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    --shm-size=512m \
    quay.io/sysdig/agent-slim
    

Installing As Single Container (Legacy)

  1. Collect the configuration parameters.

  2. Run the agent container providing the access key and, optionally, user-defined tags:

    If you are not using eBPF, use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e TAGS=[TAGS] \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent
    

    If you are using eBPF use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e TAGS=[TAGS] \
    -e SYSDIG_BPF_PROBE="" \
    -v /sys/kernel/debug:/sys/kernel/debug:ro \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent
    

On-Premises

Installing As Two Containers

  1. Collect the configuration parameters.

  2. Build and load the kernel module:

    If you are not using eBPF, use the following:

    docker run -it --privileged --rm --name sysdig-agent-kmodule \
    -v /usr:/host/usr:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules \
    quay.io/sysdig/agent-kmodule
    

    If you are using eBPF use the following:

    docker run -it --privileged --rm --name sysdig-agent-kmodule \
    -e SYSDIG_BPF_PROBE="" \
    -v /etc/os-release:/host/etc/os-release:ro \
    -v /root/.sysdig:/root/.sysdig \
    -v /usr:/host/usr:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    quay.io/sysdig/agent-kmodule
    
  3. Configure kernel module to load during system boot.

    If you are not using eBPF, use the following commands to configure the Linux system to automatically load the kernel module during system boot.

    $ sudo mkdir -p /etc/modules-load.d
    $ sudo bash -c "echo sysdigcloud-probe > /etc/modules-load.d/sysdigcloud-probe.conf"
    
  4. Run the agent container, providing the access key and, optionally, user-defined tags:

    If you are not using eBPF, use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e SECURE=true \
    -e CHECK_CERTIFICATE=true \
    [-e TAGS=[TAGS]] \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent-slim
    

    If you are using eBPF use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e SECURE=true \
    -e CHECK_CERTIFICATE=true \
    [-e TAGS=[TAGS]] \
    -e SYSDIG_BPF_PROBE="" \
    -v /sys/kernel/debug:/sys/kernel/debug:ro \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent-slim
    

Installing As Single Container (Legacy)

  1. Collect the configuration parameters.

  2. Run the agent container, providing the access key and, optionally, user-defined tags:

    If you are not using eBPF, use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e SECURE=true \
    -e CHECK_CERTIFICATE=true \
    [-e TAGS=[TAGS]] \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent
    

    If you are using eBPF use the following:

    docker run -d --name sysdig-agent \
    --restart always \
    --privileged \
    --net host \
    --pid host \
    -e ACCESS_KEY=[ACCESS_KEY] \
    -e COLLECTOR=[COLLECTOR_ADDRESS] \
    -e SECURE=true \
    -e CHECK_CERTIFICATE=true \
    [-e TAGS=[TAGS]] \
    -e SYSDIG_BPF_PROBE="" \
    -v /sys/kernel/debug:/sys/kernel/debug:ro \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev \
    -v /proc:/host/proc:ro \
    -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro \
    -v /usr:/host/usr:ro \
    --shm-size=512m \
    quay.io/sysdig/agent
    

Installing Agent as a Service on Linux Host

Use these instructions to install the agent on the host itself, not in a container. Install on each host in the environment.

The command lines below can also be copied from the Welcome wizard or the Settings > Agent Installation page in the Sysdig Monitor interface.

In that case, your access key will already be included in the command automatically.

The Sysdig agent depends on several Python modules, some of which might not be installed on the hosts where the agent is running as a service. When the required dependencies are not available, the sdchecks component in the agent reports errors in the log files, such as:

 >> Error, sdchecks[0] ModuleNotFoundError: No module named 'posix_ipc'

To address these errors, install the missing modules using the pip install command.
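
For example, to resolve the error shown above (assuming pip is available and targets the Python interpreter used by the agent):

sudo pip install posix_ipc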

SaaS

  1. Run the following command:

    curl -s https://download.sysdig.com/stable/install-agent | sudo bash -s -- --access_key [ACCESS_KEY] --collector [COLLECTOR_ADDRESS] [--tags [TAGS]]
    

    Where [ACCESS_KEY] is your unique agent access key string. For example, 1234-your-key-here-1234. TAGS is an optional list of user-defined agent tags. For example, role:webserver,location:europe.

    See SaaS Regions and IP Ranges to find the collector endpoint for your region.

  2. Enable the agent to start at boot, and start the service:

    sudo systemctl enable dragent
    sudo systemctl start dragent
    

On-Premises

  1. Run the following command:

    curl -s https://download.sysdig.com/stable/install-agent | sudo bash -s -- --access_key [ACCESS_KEY] --collector [COLLECTOR_ADDRESS] --secure true --check_certificate true [--tags [TAGS]]
    

    For configuration parameters, see Configuration Options.

  2. Enable the agent to start at boot, and start the service:

    sudo systemctl enable dragent
    sudo systemctl start dragent
    

Connect to the Sysdig Backend via Static IPs (SaaS only)

Sysdig provides a list of static IP addresses that can be whitelisted in a Sysdig environment, allowing users to establish a network connection to the Sysdig backend without opening complete network connectivity. This is done by setting the collector address to collector-static.sysdigcloud.com:

user@host:~$ docker run --name sysdig-agent \
--privileged \
--net host \
--pid host \
-e ACCESS_KEY=[ACCESS_KEY] \
-e TAGS=[TAGS] \
-v /var/run/docker.sock:/host/var/run/docker.sock \
-v /dev:/host/dev \
-v /proc:/host/proc:ro \
-v /boot:/host/boot:ro \
-v /lib/modules:/host/lib/modules:ro \
-v /usr:/host/usr:ro \
-e COLLECTOR=collector-static.sysdigcloud.com \
-e COLLECTOR_PORT=6443 \
-e SECURE=true \
-e CHECK_CERTIFICATE=true \
--shm-size=512m \
quay.io/sysdig/agent-slim
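
To see the static IP addresses that collector-static.sysdigcloud.com currently resolves to, you can query DNS; the authoritative list is documented in SaaS Regions and IP Ranges:

$ dig +short collector-static.sysdigcloud.com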

Guidelines for Manual Agent Installation

In the following cases, we recommend that you manually install the agent.

  • Full control over the deployment process

  • Integration with configuration management tools

  • Custom kernel

  • Unsupported distribution

See Agent Install: Manual Linux Installation for more information.

1.1.5 - Agent Install: Manual Linux Installation

Manual installation of the native Linux agent is recommended in the following cases:

  • Full control over the deployment process

  • Integration with configuration management tools

  • Custom kernel

  • Unsupported distribution (within Debian/Fedora flavors)

Otherwise, you may want to follow Agent Install: Non-Orchestrated.

Note: If you are installing the Sysdig agent in an orchestrated infrastructure such as Kubernetes or Mesos/Marathon, use the respective installation guides.

Prerequisites

  • Review the Agent Installation Requirements.

  • Collect the configuration parameters:

    • ACCESS_KEY: Your unique access key string. Inability to retrieve the key indicates that the administrator of your instance might have it turned off for non-admin users. Contact your Sysdig administrator to receive the key. If you still have issues please contact Sysdig Support.

    • TAGS: The optional parameter you can use to list one or more tags for this host. Tagging your hosts is highly recommended. Agent Tags allow you to sort nodes of your infrastructure into custom groups in Sysdig Monitor. Replace the [TAGS] parameter in the commands below with a comma-separated list of TAG_NAME:TAG_VALUE.

    For example: role:webserver,location:europe

  • Run the commands as root or with sudo.

Installation

Follow the instructions for the appropriate Linux distribution:

Debian, Ubuntu

  1. Use the Sysdig Monitor GPG key, configure the apt repository, and update the package list:

    curl -s https://download.sysdig.com/DRAIOS-GPG-KEY.public | apt-key add -
    curl -s -o /etc/apt/sources.list.d/draios.list https://download.sysdig.com/stable/deb/draios.list
    apt-get update
    
  2. Install kernel development files.

    Note: The following command might not work with every kernel. Ensure that you customize the name of the package properly.

    apt-get -y install linux-headers-$(uname -r)
    
  3. Install, configure, and restart the Sysdig agent.

    apt-get -y install draios-agent
    echo "customerid: ACCESS_KEY" >> /opt/draios/etc/dragent.yaml
    echo "tags: [TAGS]" >> /opt/draios/etc/dragent.yaml
    echo "collector: COLLECTOR_URL" >> /opt/draios/etc/dragent.yaml
    service dragent restart
    

    See Prerequisites for the configuration parameters required.
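
    Optionally, confirm that the agent service is running and watch it connect to the collector (the log path assumes the default installation location):

    service dragent status
    tail -f /opt/draios/logs/draios.log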

CentOS, RHEL, Fedora, Amazon AMI, Amazon Linux 2

  1. Trust the Sysdig Monitor GPG key, configure the yum repository.

    $ rpm --import https://download.sysdig.com/DRAIOS-GPG-KEY.public
    $ curl -s -o /etc/yum.repos.d/draios.repo https://download.sysdig.com/stable/rpm/draios.repo
    
  2. Install the EPEL repository.

    Note: The following command is required only if DKMS is not available in the distribution. You can verify if DKMS is available with yum list dkms.

    The command below contains a sample release number; be sure to update with the correct release.

    $ rpm -i http://mirror.us.leaseweb.net/epel/6/i386/epel-release-6-8.noarch.rpm
    
  3. Install kernel development files.

    Note: The following command might not work with every kernel. Make sure to customize the name of the package properly.

    $ yum -y install kernel-devel-$(uname -r)
    
  4. Install, configure, and start the Sysdig agent.

    $ yum -y install draios-agent
    $ echo "customerid: ACCESS_KEY" >> /opt/draios/etc/dragent.yaml
    $ echo "tags: [TAGS]" >> /opt/draios/etc/dragent.yaml
    $ sudo systemctl enable dragent
    $ sudo systemctl start dragent
    

    See Prerequisites for the configuration parameters required.

  5. If you are using a non-systemd Linux distribution, use the service command to start dragent.

    $ service dragent restart
    

Other Linux Distributions

The Sysdig agent is not supported on Linux distributions outside the Debian and Fedora families and Amazon Linux.

1.1.6 - Agent Install: Amazon ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that helps to easily deploy, manage, and scale containerized applications.

This section describes how to install the Sysdig agent container on each underlying host in your ECS cluster. Once installed, the agent will automatically begin monitoring all of your hosts, services, and tasks.

These instructions are valid only for ECS clusters using EC2 instances. For information on ECS Fargate clusters, see AWS Fargate Serverless Agents.

Installation

To install Sysdig agent on ECS, do the following:

  • Create an ECS task definition for the Sysdig agent.

  • Register the task definition in your AWS account.

  • Create a service with the previous task definition to run the Sysdig agent in each of the nodes of your ECS cluster.

Create an ECS Task Definition

  1. Collect the following configuration parameters:

     • ACCESS_KEY: The agent access key. You can retrieve this from Settings > Agent Installation in either Sysdig Monitor or Sysdig Secure.

     • COLLECTOR: Use the collector address for your region. For more information, see SaaS Regions and IP Ranges.

     • TAGS: The list of tags for the host where the agent is installed. For example: role:webserver, location:europe

  2. Use the above values to customize the JSON snippet below and save it as a file named sysdig-agent-ecs.json.

Note that memory and cpu have both been set to 1024; depending on the size of your cluster, you might want to tune those values. See Tuning Sysdig Agent for more information.

```json
{
  "family": "sysdig-agent-ecs",
  "containerDefinitions": [
    {
      "name": "sysdig-agent",
      "image": "quay.io/sysdig/agent-slim",
      "cpu": 1024,
      "memory": 1024,
      "privileged": true,
      "environment": [
        {
          "name": "ACCESS_KEY",
          "value": "$ACCESS_KEY"
        },
        {
          "name": "COLLECTOR",
          "value": "$COLLECTOR"
        },
        {
          "name": "TAGS",
          "value": "$TAG1,TAG2"
        }
      ],
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/host/boot",
          "sourceVolume": "boot"
        },
        {
          "containerPath": "/host/dev",
          "sourceVolume": "dev"
        },
        {
          "readOnly": true,
          "containerPath": "/host/lib/modules",
          "sourceVolume": "modules"
        },
        {
          "readOnly": true,
          "containerPath": "/host/proc",
          "sourceVolume": "proc"
        },
        {
          "containerPath": "/host/var/run/docker.sock",
          "sourceVolume": "sock"
        },
        {
          "readOnly": true,
          "containerPath": "/host/usr",
          "sourceVolume": "usr"
        }
      ],
      "dependsOn": [
        {
          "containerName": "sysdig-agent-kmodule",
          "condition": "SUCCESS"
        }
      ]
    },
    {
      "name": "sysdig-agent-kmodule",
      "image": "quay.io/sysdig/agent-kmodule",
      "memory": 512,
      "privileged": true,
      "essential": false,
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/host/boot",
          "sourceVolume": "boot"
        },
        {
          "containerPath": "/host/dev",
          "sourceVolume": "dev"
        },
        {
          "readOnly": true,
          "containerPath": "/host/lib/modules",
          "sourceVolume": "modules"
        },
        {
          "readOnly": true,
          "containerPath": "/host/proc",
          "sourceVolume": "proc"
        },
        {
          "containerPath": "/host/var/run/docker.sock",
          "sourceVolume": "sock"
        },
        {
          "readOnly": true,
          "containerPath": "/host/usr",
          "sourceVolume": "usr"
        }
      ]
    }
  ],
  "pidMode": "host",
  "networkMode": "host",
  "volumes": [
    {
      "name": "sock",
      "host": {
        "sourcePath": "/var/run/docker.sock"
      }
    },
    {
      "name": "dev",
      "host": {
        "sourcePath": "/dev/"
      }
    },
    {
      "name": "proc",
      "host": {
        "sourcePath": "/proc/"
      }
    },
    {
      "name": "boot",
      "host": {
        "sourcePath": "/boot/"
      }
    },
    {
      "name": "modules",
      "host": {
        "sourcePath": "/lib/modules/"
      }
    },
    {
      "name": "usr",
      "host": {
        "sourcePath": "/usr/"
      }
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ]
}
```

Register a Task Definition

Once your task definition is ready, register it in your AWS account:

aws ecs register-task-definition \
    --cli-input-json file://sysdig-agent-ecs.json

Run the Agent as an ECS Service

Using the ECS task definition you have created, create a service in the cluster that you want to monitor with Sysdig.

aws ecs create-service \
    --cluster $CLUSTER_NAME \
    --service-name sysdig-agent-svc \
    --launch-type EC2 \
    --task-definition sysdig-agent-ecs \
    --scheduling-strategy DAEMON

With the agent installed, Sysdig will begin auto-discovering your containers and other resources of your ECS environment.

Using ECS Anywhere

If you’re using ECS Anywhere, change the launch type to EXTERNAL when the service is created.

aws ecs create-service \
    --cluster $CLUSTER_NAME \
    --service-name sysdig-agent-svc \
    --launch-type EXTERNAL \
    --task-definition sysdig-agent-ecs \
    --scheduling-strategy DAEMON

Enable Log Driver

You can send the logs from the containers running the ECS tasks to log groups in CloudWatch Logs by enabling the awslogs log driver for the agent containers. To do so:

  1. Add the following section to each of the container definitions you’ve created above:

                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "$YOUR_LOG_GROUP",
                        "awslogs-region": "$AWS_REGION",
                        "awslogs-stream-prefix": "sysdig"
                    }
    
  2. Update your task definition and the service to enable the logs.
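
For example, assuming the file and service names used earlier in this section, re-register the revised task definition and point the service at the new revision:

aws ecs register-task-definition \
    --cli-input-json file://sysdig-agent-ecs.json

aws ecs update-service \
    --cluster $CLUSTER_NAME \
    --service sysdig-agent-svc \
    --task-definition sysdig-agent-ecs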

1.1.7 - Agent Install: IKS (IBM Cloud Monitoring)

IBM Cloud maintains the documentation for Sysdig agent installation on IBM Cloud Kubernetes Service (IKS).

For more information, see the IBM Cloud Monitoring documentation.

1.1.8 - Agent Install: Mesos | Marathon | DCOS

Marathon is the container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.

This guide describes how to install the Sysdig agent container on each underlying host in your Mesos cluster. Once installed, the agent will automatically connect to the Mesos and Marathon APIs to pull relevant metadata about the environment and will begin monitoring all of your hosts, apps, containers, and frameworks.

Prerequisites

  • Review the Agent Installation Requirements.

  • Collect the configuration parameters. You will need them to create the JSON file.

    • ACCESS_KEY: Your unique access key string. Inability to retrieve the key indicates that the administrator of your instance might have it turned off for non-admin users. Contact your Sysdig administrator to receive the key. If you still have issues please contact Sysdig Support.

    • COLLECTOR: The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. See SaaS Regions and IP Ranges.

    • COLLECTOR_PORT: The default is 6443. It is used in environments with Sysdig’s on-premises backend installed.

    • SECURE: Use a secure SSL/TLS connection to send metrics to the collector. It is used in environments with Sysdig’s on-premises backend installed.

    • CHECK_CERT: Determines strong SSL certificate check for Sysdig Monitor on-premises installation. Set to true when using SSL/TLS to connect to the collector service to ensure that a valid SSL/TLS certificate is installed. It is used in environments with Sysdig’s on-premises backend installed.

Installation

In this three-part installation, you:

  • Deploy the Sysdig agent on all Mesos Agent (Slave) nodes, either automatically or by creating and posting a .json file to the leader Marathon API server.

  • Deploy the Sysdig agent on the Mesos Master nodes.

  • Special configuration steps: modify the Sysdig agent config file to monitor Marathon instances.

Deploy the Sysdig Agent on Mesos Agent Nodes

Preferred Option: Automatic install (DC/OS 1.8+)

If you’re using DC/OS 1.8 or higher, then you can find Sysdig agent in the Mesosphere Universe marketplace and install it from there.

It will automatically deploy the Sysdig agent container on each of your Mesos Agent nodes as a Marathon app.

Proceed to Deploy the Sysdig Agent on Master Nodes.

Alternate Option: Post a .json file

If you are using a version of DC/OS earlier than 1.8 then:

  1. Create a JSON file for Marathon, in the following format. See configuration parameters for details.

    {
      "backoffFactor": 1.15,
      "backoffSeconds": 1,
      "constraints": [
        [
          "hostname",
          "UNIQUE"
        ]
      ],
      "container": {
        "docker": {
          "forcePullImage": true,
          "image": "sysdig/agent",
          "parameters": [],
          "privileged": true
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/host/var/run/docker.sock",
            "hostPath": "/var/run/docker.sock",
            "mode": "RW"
          },
          {
            "containerPath": "/host/dev",
            "hostPath": "/dev",
            "mode": "RW"
          },
          {
            "containerPath": "/host/proc",
            "hostPath": "/proc",
            "mode": "RO"
          },
          {
            "containerPath": "/host/boot",
            "hostPath": "/boot",
            "mode": "RO"
          },
          {
            "containerPath": "/host/lib/modules",
            "hostPath": "/lib/modules",
            "mode": "RO"
          },
          {
            "containerPath": "/host/usr",
            "hostPath": "/usr",
            "mode": "RO"
          }
        ]
      },
      "cpus": 1,
      "deployments": [],
      "disk": 0,
      "env": {
        "ACCESS_KEY": "ACCESS_KEY=YOUR-ACCESS-KEY-HERE",
        "CHECK_CERT": "false",
        "SECURE": "true",
        "TAGS": "example_tag:example_value",
        "name": "sdc-agent",
        "pid": "host",
        "role": "monitoring",
        "shm-size": "350m"
      },
      "executor": "",
      "gpus": 0,
      "id": "/sysdig-agent",
      "instances": 1,
      "killSelection": "YOUNGEST_FIRST",
      "labels": {},
      "lastTaskFailure": {
        "appId": "/sysdig-agent",
        "host": "YOUR-HOST",
        "message": "Container exited with status 70",
        "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
        "state": "TASK_FAILED",
        "taskId": "sysdig-agent.3bb0759d-3fa3-11e9-b446-c60a7a2ee871",
        "timestamp": "2019-03-06T00:03:16.234Z",
        "version": "2019-03-06T00:01:57.182Z"
      },
      "maxLaunchDelaySeconds": 3600,
      "mem": 850,
      "networks": [
        {
          "mode": "host"
        }
      ],
      "portDefinitions": [
        {
          "name": "default",
          "port": 10101,
          "protocol": "tcp"
        }
      ],
      "requirePorts": false,
      "tasks": [
        {
          "appId": "/sysdig-agent",
          "healthCheckResults": [],
          "host": "YOUR-HOST-IP",
          "id": "sysdig-agent.0d5436f4-3fa4-11e9-b446-c60a7a2ee871",
          "ipAddresses": [
            {
              "ipAddress": "YOUR-HOST-IP",
              "protocol": "IPv4"
            }
          ],
          "localVolumes": [],
          "ports": [
            4764
          ],
          "servicePorts": [],
          "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
          "stagedAt": "2019-03-06T00:09:04.232Z",
          "startedAt": "2019-03-06T00:09:06.912Z",
          "state": "TASK_RUNNING",
          "version": "2019-03-06T00:09:04.182Z"
        }
      ],
      "tasksHealthy": 0,
      "tasksRunning": 1,
      "tasksStaged": 0,
      "tasksUnhealthy": 0,
      "unreachableStrategy": {
        "expungeAfterSeconds": 0,
        "inactiveAfterSeconds": 0
      },
      "upgradeStrategy": {
        "maximumOverCapacity": 1,
        "minimumHealthCapacity": 1
      },
      "version": "2019-03-06T00:09:04.182Z",
      "versionInfo": {
        "lastConfigChangeAt": "2019-03-06T00:09:04.182Z",
        "lastScalingAt": "2019-03-06T00:09:04.182Z"
      }
    }
    

    See Environment Variables for Agent Config File for the Sysdig name:value definitions.

    Complete the “cpus”, “mem” and “labels” (i.e. Marathon labels) entries to fit the capacity and requirements of the cluster environment.

  2. Post the created .json file to the leader Marathon API server:

    $ curl -X POST http://$(hostname -i):8080/v2/apps -d @sysdig.json -H "Content-type: application/json"
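
    You can then confirm that Marathon accepted the app by querying the same API:

    $ curl http://$(hostname -i):8080/v2/apps/sysdig-agent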
    

Deploy the Sysdig Agent on Master Nodes

After deploying the agent to the Mesos Agent nodes, you will install agents on each of the Mesos Master nodes as well.

If any cluster node has both Mesos Master and Mesos Agent roles, do not perform this installation step on that node. It will already have a Sysdig agent installed from the previous procedure. Running duplicate Sysdig agents on a node will cause errors.

Use the Agent Install: Non-Orchestrated instructions to install the agent directly on each of your Mesos Master nodes.

When the Sysdig agent is successfully installed on the master nodes, it will automatically connect to the local Mesos and Marathon (if available) API servers via http://localhost:5050 and http://localhost:8080 respectively, to collect cluster configuration and current state metadata in addition to host metrics.

Additional Configuration

In certain situations, you may need to add additional configurations to the dragent.yaml file:

  • If the Sysdig agent cannot be run directly on the Mesos API server.

  • If the API server is protected with a username/password.

Descriptions and examples are shown below.

Sysdig Agent Unable to Run on the Mesos API Server

Mesos allows multiple masters. If the API server cannot be instrumented with a Sysdig agent, delegate one other node with an agent installed to remotely receive infrastructure information from the API server.

NOTE: If you manually configure the agent to point to a master with a static configuration file entry, then automatic detection/following of leader changes will no longer be enabled.

Add the following Mesos parameter to the delegated agent’s dragent.yaml file to allow it to connect to the remote API server and authenticate, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Configuration for details.

Specify the API server’s connection method, address, and port. Also specify credentials if necessary.

YAML example:

mesos_state_uri: http://[acct:passwd@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

Although marathon_uris: is an array, only a single “root” Marathon framework per cluster is currently supported; do not configure multiple side-by-side Marathon frameworks, or the agent will not function properly. The only supported multiple-Marathon configuration is one “root” Marathon with other Marathon frameworks as its apps.

Mesos API Server Requires Authentication

If the agent is installed on the API server but the API server uses a different port or requires authentication, those parameters must be explicitly specified.

Add the following Mesos parameters to the API server’s dragent.yaml to make it connect to the API server and authenticate with any unique account and password, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Configuration for details.

Specify the API server’s protocol, user credentials, and port:

mesos_state_uri: http://[username:password@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

HTTPS is also supported.

Troubleshooting: Turning Off Metadata Reception

In troubleshooting cases where auto-detection and reporting of your Mesos infrastructure needs to be temporarily turned off in a designated agent:

  1. Comment out the Mesos parameter entries in the agent’s dragent.yaml file.

    Example parameters to disable: mesos_state_uri, marathon_uris

  2. If the agent is running on the API server (Master node) and auto-detecting a default configuration, you can add the line:

    mesos_autodetect: false

    either directly in the dragent.yaml file or as an ADDITIONAL_CONF parameter in a Docker command.

  3. Restart the agent.
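
Taken together, a designated agent's dragent.yaml might look like the following while metadata reception is turned off (the URIs shown are the local defaults mentioned above):

# mesos_state_uri: http://localhost:5050
# marathon_uris:
#   - http://localhost:8080
mesos_autodetect: false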

1.1.9 - Airgapped Agent Installation

Airgapped environments are those that do not have network access to the public internet.

At startup, the agent will try to compile its own version of the probes, provided kernel header packages are installed on the host. Failing that, the agent will try to download pre-compiled probes, sysdigcloud-probe-<suffix>.ko or sysdigcloud-probe-bpf-<suffix>.o, from the Sysdig download site over the internet.

In an airgapped environment, you cannot download these artifacts. Therefore, before installing the agent, you will have to compile sysdigcloud-probe-<suffix> for each kernel version in your environment, and make it available to the installed agents through an internally accessible URL.

Prerequisites

  • A machine with internet access where you can download the required artifacts
  • A machine in your airgapped environment where you can build your probes
  • Tool to transfer artifacts to the machine in your airgapped environment
  • Docker installed

Overview

Sysdig provides a tool, named the probe builder, to help you build the probes for different kernels and for a specific agent version. After downloading the required artifacts on a machine connected to the internet, you can copy them to an airgapped host, build your own probes, and make them available to your agent installations.

On a Machine with Internet Connectivity

Prepare the Sysdig Probe Builder Images

On a machine with internet connectivity, build the Sysdig probe builder container images and create a tar file of the images.

  1. Get the probe builder source code from the repository:

    $ git clone https://github.com/draios/probe-builder
    
  2. Build the container image for the probe builder:

    $ docker build -t airgap/sysdig-probe-builder probe-builder/
    
  3. Build the images for each supported distribution-compiler combination:

    $ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock airgap/sysdig-probe-builder:latest -P -b airgap/
    

    Running this command creates a separate image tag for each supported distribution-compiler combination, with the distribution-compiler pair appended to the image name airgap/sysdig-probe-builder as the tag. For example, airgap/sysdig-probe-builder:centos-gcc4.8.

  4. Save all the above images to a tar archive:

    $ docker save airgap/sysdig-probe-builder | gzip > builders.tar.gz
    
  5. (Optional) If you are building probes for Ubuntu kernels, you will also need an ubuntu:latest image on your airgapped host. You can save one as follows:

    $ docker pull ubuntu
    $ docker save ubuntu | gzip > ubuntu.tar.gz
    

Download the Kernel Packages

Download your kernel packages. For more information, see Download Kernel Packages.
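
As an illustration, on an internet-connected RPM-based machine you might fetch the kernel development package for a given kernel with yumdownloader (from yum-utils), while apt-get download serves the same purpose on Debian-based systems. The exact package names depend on the kernels in your environment:

$ yumdownloader --destdir ./kernels kernel-devel-$(uname -r)
$ apt-get download linux-headers-$(uname -r)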

Download Probe Source Code

Download the probe source code for the specific agent version you want to build probes for.

For example, for agent version 12.0.0 you would use:

$ git clone https://github.com/draios/agent-libs
$ cd agent-libs
$ git archive agent/12.0.0 --prefix sysdig/ | gzip > sysdig.tar.gz

Transfer the Downloaded Files

Copy the artifacts you have built to the airgapped host machine:

  • builders.tar.gz
  • ubuntu.tar.gz (if needed, see above)
  • sysdig.tar.gz
  • Kernel packages

On the Airgapped Host

Load the Builder Images

$ zcat builders.tar.gz | docker load

Unpack the Sysdig Source

$ tar xzf sysdig.tar.gz

Running this command will create the sysdig/ directory in the current directory.

Move the Kernel Packages to a Dedicated Location

Make sure you have all the downloaded kernel package artifacts in a single directory, /directory-containing-kernel-packages/, for each distribution you want to support.

Run the Probe Builder

Now that you have all your requirements in place, you can run the main probe builder:

$ docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /a-directory-with-some-free-space/:/workspace \
  -v /wherever-you-unpacked/sysdig/:/sysdig \
  -v /directory-containing-kernel-packages/:/kernels \
  airgap/sysdig-probe-builder:latest -B -b airgap/ -- \
  -p sysdigcloud-probe -v 12.0.0 -k CustomCentOS

The probes will appear in /a-directory-with-some-free-space/output. That directory can be served over HTTP, and the URL of the server can then be used as SYSDIG_PROBE_URL when loading the module (for example, from the agent-kmodule container). As an example, the following section describes how you can deploy your own nginx server within your cluster and upload your probes there.

Serve Your Pre-Compiled Probes

Set up a local repository to host the pre-compiled kernel module. Here is an example with nginx:

$ docker run --rm -v /a-directory-with-some-free-space/output:/usr/share/nginx/html/stable/sysdig-probe-binaries -p 80:80 nginx

Note down the URL and use it as the SYSDIG_PROBE_URL while installing the agent.
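
To sanity-check the web server, request one of the generated probe files and expect an HTTP 200 response (substitute an actual file name from your output directory):

$ curl -I http://localhost/stable/sysdig-probe-binaries/<probe-file-name>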

Install Agent in a Docker Environment

  1. Install Sysdig agent by pointing SYSDIG_PROBE_URL to the local repository:

    For docker-based installations:

    $ docker run -d --name sysdig-agent \
      --restart always \
      --privileged \
      --net host \
      --pid host \
      -e ACCESS_KEY=WWWWW-YYYY-XXXX-ZZZZ-123456789 \
      -e SECURE=true \
      -e SYSDIG_PROBE_URL=http://www.mywebserver.net:80/ \
      -v /var/run/docker.sock:/host/var/run/docker.sock \
      -v /dev:/host/dev \
      -v /proc:/host/proc:ro \
      -v /boot:/host/boot:ro \
      -v /lib/modules:/host/lib/modules:ro \
      -v /usr:/host/usr:ro \
      --shm-size=512m \
      sysdig/agent
    

    Where -e SYSDIG_PROBE_URL=http://www.mywebserver.net:80/ points to the local nginx web server hosting the loaded module.

    Note: To use HTTPS communication with a self-signed or untrusted certificate, use the -e SYSDIG_PROBE_INSECURE_DOWNLOAD=true environment variable in the above command.

  2. Check the agent log. If the installation is successful, you will see a message as follows:

    Evaluating override of environment variables
    
    Trying to download precompiled module from http://mywebserver:80/stable/sysdig-probe-binaries/sysdigcloud-probe-<version>
    
    Download succeeded
    
  3. Continue with the instructions in Agent Install: Non-Orchestrated.

Install Agent in a Kubernetes Environment

  1. Open your agent daemonset and update the SYSDIG_PROBE_URL to point to the local repository:

    - name: SYSDIG_PROBE_URL
      value: http://www.mywebserver:80/
    

    If you would like to use secure communication with a self-signed or untrusted certificate, apply the SYSDIG_PROBE_INSECURE_DOWNLOAD environment variable.

    - name: SYSDIG_PROBE_INSECURE_DOWNLOAD
      value: "true"
    
  2. Continue with the instructions in Agent Install: Kubernetes.

1.1.10 - Identify Agent Version

Use one of the following methods to determine the version of the agents installed in your environment:

Explore

Segmenting metrics by using agent.version shows the installed versions of agents in your environment. For example, segment the uptime metric across your environment by using agent.version. Hover over the graph to see the list of agent versions.


Dashboard

Use the Sysdig Agent Health Dashboard to determine the agent versions:

  1. Log in to the Sysdig Monitor.

  2. Select Dashboards and expand Host Infrastructure Dashboards.

  3. Open the Sysdig Agent Health & Status template or create your own from the template.

    The Sysdig Agent Health & Status dashboard shows the agent version corresponding to each host in your environment.

1.2 - Agent Configuration

Out of the box, the Sysdig agent will gather and report on a wide variety of pre-defined metrics. It can also accommodate any number of custom parameters for additional metrics collection.

Use this section when you need to change the default or pre-defined settings by editing the agent configuration files, or for other special circumstances.

For the latest helm-based installation instructions and configuration options, see sysdig-deploy.

Monitoring Integrations also require editing the agent config files.

By default, the Sysdig agent is configured to collect metric data from a range of platforms and applications. You can edit the agent config files to extend the default behavior, including additional metrics for JMX, StatsD, Prometheus, or a wide range of other applications. You can also monitor log files for targeted text strings.

1.2.1 - Understand the Agent Configuration

Out of the box, the Sysdig agent will gather and report on a wide variety of pre-defined metrics. It can also accommodate any number of custom parameters for additional metrics collection.

The agent relies on a pair of configuration files to define metrics collection parameters:

dragent.default.yaml

The core configuration file. You can look at it to understand more about the default configurations provided.

Location: /opt/draios/etc/dragent.default.yaml.

CAUTION: This file should never be edited.

dragent.yaml or configmap.yaml (Kubernetes)

The configuration file where parameters can be added, either directly in YAML as name/value pairs, or using environment variables such as ADDITIONAL_CONF.

Location: /opt/draios/etc/dragent.yaml

The dragent.yaml file can be accessed and edited in several ways, depending on how the agent was installed. This document describes how to modify dragent.yaml.

One additional file, dragent.auto.yaml, is also created and used in special circumstances. See Optional: Agent Auto-Config for more detail.

Access and Edit the Configuration File

There are various ways to add or edit parameters in dragent.yaml.

Option 1: With dragent.yaml (for testing)

It is possible to edit the container’s file directly on the host.

Add parameters directly in YAML.

  1. Access dragent.yaml directly at /opt/draios/etc/dragent.yaml.

  2. Edit the file. Use proper YAML syntax.

    See the examples at the bottom of the page.

  3. Restart the agent for the changes to take effect:

  • Native agent: service dragent restart

  • Container agent: docker restart sysdig-agent

Option 2: With configmap.yaml (Kubernetes)

Configmap.yaml is the configuration file where parameters can be added, either directly in YAML as name/value pairs, or using environment variables such as ADDITIONAL_CONF.

If you install agents as DaemonSets on a system running Kubernetes, you use configmap.yaml to connect with and manipulate the underlying dragent.yaml file.

See Agent Install: Kubernetes for more information.

Add parameters directly in YAML.

Edit the file locally and apply the changes with kubectl apply -f.

  1. Access the configmap.yaml file.

  2. Edit the file as needed.

  3. Apply the changes:

    kubectl apply -f sysdig-agent-configmap.yaml

Running agents will automatically pick the new configuration after Kubernetes pushes the changes across all the nodes in the cluster.

Option 3: With Docker Run (Docker)

Add -e ADDITIONAL_CONF="<VARIABLES>" to a Docker run command, where <VARIABLES> contains all the customized parameters you want to include, in a single-line format.

Convert YAML Parameters to Single-Line Format

To insert ADDITIONAL_CONF parameters in a Docker run command or a daemonset file, you must convert the YAML code into a single-line format.

You can do the conversion manually for short snippets. To convert longer portions of YAML, use echo|sed commands.

In earlier versions, the Sysdig Agent connected to port 6666. This behavior has been deprecated, as the Sysdig agent now connects to port 6443.

The basic procedure:

  1. Write your configuration in YAML, as it would be entered directly in dragent.yaml.

  2. In a bash shell, use echo and sed to convert to a single line.

    sed script: echo "<your YAML content>" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g'

  3. Insert the resulting line into a Docker run command or add it to the daemonset file as an ADDITIONAL_CONF.

Example: Simple

Insert parameters to turn off StatsD collection and blacklist port 6443.

Sysdig agent uses port 6443 for both inbound and outbound communication with the Sysdig backend. The agent initiates a request and keeps a connection open with the Sysdig backend for the backend to push configurations, Falco rules, policies, and so on. Ensure that you allow the agents’ inbound and outbound communication on TCP 6443 from the respective IPs associated with your SaaS Regions. Note that you are allowing the agent to send communication outbound on TCP 6443 to the inbound IP ranges listed in the SaaS Regions.

YAML format

statsd:
    enabled: false
    blacklisted_ports:
    - 6443

Single-line format (manual)

Use spaces, hyphens, and \n correctly when manually converting to a single line:

ADDITIONAL_CONF="statsd:\n enabled: false\n blacklisted_ports:\n - 6443"

Here the single line is incorporated into a full agent startup Docker command.

docker run \
  --name sysdig-agent \
  --privileged \
  --net host \
  --pid host \
  -e ACCESS_KEY=1234-your-key-here-1234 \
  -e TAGS=dept:sales,local:NYC \
  -e ADDITIONAL_CONF="statsd:\n    enabled: false\n    blacklisted_ports:\n    - 6443" \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev \
  -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  -v /usr:/host/usr:ro \
quay.io/sysdig/agent

Example: Complex

Insert parameters to override the default configuration for a RabbitMQ app check.

YAML format

app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: myuser
      rabbitmq_pass: mypassword
      queues:
        - MyQueue1
        - MyQueue2

Single-line format (echo |sed)

From a bash shell, issue the echo command and sed script.

echo "app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: myuser
      rabbitmq_pass: mypassword
      queues:
        - MyQueue1
        - MyQueue2
" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/\\n/g'

This results in the single-line format to be used with ADDITIONAL_CONF in a Docker command or daemonset file.

"app_checks:\n - name: rabbitmq\n  pattern:\n    port: 15672\n  conf:\n    rabbitmq_api_url: http://localhost:15672/api/\n    rabbitmq_user: myuser\n    rabbitmq_pass: mypassword\n    queues:\n      - MyQueue1\n      - MyQueue2\n"

Option 4: With Helm Format

If you installed the Sysdig agent in Kubernetes Using a Helm chart, then no configmap.yaml file was downloaded. You edit dragent.yaml using Helm syntax:

Example

$ helm install \
    --namespace sysdig-agent \
    --set agent.sysdig.settings.tags='linux:ubuntu\,dept:dev\,local:nyc' \
    --set global.clusterConfig.name='my_cluster' \
    sysdig/sysdig-deploy

This will be transformed into:

data:
 dragent.yaml: |
  tags: linux:ubuntu,dept:dev,local:nyc
  k8s_cluster_name: my_cluster  

Table 1: Environment Variables for Agent Config File

ACCESS_KEY
  Value: <your Sysdig access key>
  Description: Required.

TAGS
  Value: <meaningful tags you want applied to your instances>
  Description: Optional. These are displayed in Sysdig Monitor for ease of use. For example: tags: linux:ubuntu,dept:dev,local:nyc. See sysdig-agent-configmap.yaml.

Region
  Value: The region associated with your Sysdig application.
  Description: Enter the SaaS region.

COLLECTOR
  Value: <collector-hostname.com> or 111.222.333.400
  Description: Enter the host name or IP address of the Sysdig collector service. Note that when used within dragent.yaml, it must be lowercase collector. For SaaS regions, see SaaS Regions and IP Ranges.

COLLECTOR_PORT
  Value: 6443
  Description: On-prem only. The port used by the Sysdig collector service; default 6443.

SECURE
  Value: "true"
  Description: On-prem only. If using SSL/TLS to connect to the collector service, set to "true"; otherwise "false".

CHECK_CERTIFICATE
  Value: "false"
  Description: On-prem only. Set to "true" when using SSL/TLS to connect to the collector service and the agent should check for a valid SSL/TLS certificate.

ADDITIONAL_CONF
  Description: Optional. A place to provide custom configuration values to the agent as environment variables.

SYSDIG_PROBE_URL
  Description: Optional. An alternative URL from which to download a precompiled kernel module.

Sample Docker Command Using Variables

docker run \
  --name sysdig-agent \
  --privileged \
  --net host \
  --pid host \
  -e ACCESS_KEY=3e762f9a-3936-4c60-9cf4-c67e7ce5793b \
  -e COLLECTOR=mycollector.elb.us-west-1.amazonaws.com \
  -e COLLECTOR_PORT=6443 \
  -e CHECK_CERTIFICATE=false \
  -e TAGS=my_tag:some_value \
  -e ADDITIONAL_CONF="log:\n file_priority: debug\n console_priority: error" \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev \
  -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  -v /usr:/host/usr:ro \
  --shm-size=350m \
quay.io/sysdig/agent

1.2.2 - Configure Agent Modes

Agent modes provide the ability to control metric collection to fit your scale and specific requirement. You can choose one of the following modes to do so:

  • Monitor

  • Monitor Light

  • Troubleshooting

  • Secure

  • Custom Metrics Only

Using a stripped-down mode limits collection of unneeded metrics, which in turn prevents the consumption of excess resources and helps reduce expenses.

Monitor

The Monitor mode offers an extensive collection of metrics. We recommend this mode to monitor enterprise environments.

monitor is the default mode if you are running the Enterprise tier. To switch back to the Monitor mode from a different mode, do one of the following:

  • Add the following to the dragent.yaml file and restart the agent:

    feature:
      mode: monitor
    
  • Remove the parameter related to the existing mode from the dragent.yaml file and restart the agent. For example, to switch from troubleshooting mode to monitor, delete the following lines:

    feature:
      mode: troubleshooting
    

Monitor Light

Monitor Light caters to users who run agents in a resource-restrictive environment, or to those who are interested only in a limited set of metrics.

Monitor Light provides CPU, Memory, File, File system, and Network metrics. For more information, see Metrics Available in Monitor Light.

Enable Monitor Light Mode

To switch to the Monitor Light mode, edit the dragent.yaml file:

  1. Open the dragent.yaml file.

  2. Add the following configuration parameter:

    feature:
      mode: monitor_light
    
  3. Restart the agent.

Troubleshooting

Troubleshooting mode offers sophisticated metrics with detailed diagnostic capabilities. Some of these metrics are heuristic in nature.

In addition to the extensive metrics available in the Monitor mode, Troubleshooting mode provides additional metrics such as net.sql and additional segmentation for file and network metrics. For more information, see Additional Metrics Values Available in Troubleshooting.

Enable Troubleshooting Mode

To switch to the Troubleshooting mode, edit the dragent.yaml file:

  1. Open the dragent.yaml file.

  2. Add the following configuration parameter:

    feature:
      mode: troubleshooting
    
  3. Restart the agent.

Secure Mode

The secure mode supports only Sysdig Secure features.

Sysdig agent collects no metrics in the secure mode, which, in turn, minimizes network consumption and storage requirement in the Sysdig backend. Lower resource usage can help reduce costs and improve performance.

In the Secure mode, the Monitor UI shows no data because no metrics are sent to the collector.

This feature requires agent v10.5.0 or above.

Enabling Secure Mode

  1. Open the dragent.yaml file.

  2. Add the following:

    feature:
      mode: secure
    
  3. Restart the agent.

Custom Metrics Only Mode

Custom Metrics Only mode collects the same metrics as the Monitor Light mode, but also adds the ability to collect the following:

  • Custom Metrics: StatsD, JMX, App Checks, and Prometheus
  • Kubernetes State Metrics

As such, Custom Metrics Only mode is suitable if you would like to use most of the features of Monitor mode but are limited in resources.

This mode is not compatible with Secure. If your account is configured for Secure, you must explicitly disable Secure in the agent configuration if you wish to use this mode.

This mode requires agent v12.4.0 or above.

Enabling Custom Metrics Only Mode

  1. Open the dragent.yaml file.

  2. Add the following configuration parameter:

    feature:
      mode: custom-metrics-only
    
  3. If your account is enabled for Secure, add the following:

    security:
      enabled: false
    secure_audit_streams:
      enabled: false
    falcobaseline:
      enabled: false
    

    This configuration explicitly disables the Secure features in the agent. If you do not disable Secure, the agent will not start due to incompatibility issues.

  4. Restart the agent.

1.2.2.1 - Metrics Available in Monitor Light

Monitor Light provides CPU, memory, file, file system, and network metrics.

Each Sysdig legacy metric ID is listed below with its Prometheus ID equivalent(s):

  • cpu.cores.used: sysdig_host_cpu_cores_used, sysdig_container_cpu_cores_used, sysdig_program_cpu_cores_used
  • cpu.cores.used.percent: sysdig_host_cpu_cores_used_percent, sysdig_container_cpu_cores_used_percent, sysdig_program_cpu_cores_used_percent
  • cpu.idle.percent: sysdig_host_cpu_idle_percent
  • cpu.iowait.percent: sysdig_host_cpu_iowait_percent
  • cpu.nice.percent: sysdig_host_cpu_nice_percent
  • cpu.stolen.percent: sysdig_host_cpu_stolen_percent
  • cpu.system.percent: sysdig_host_cpu_system_percent
  • cpu.used.percent: sysdig_host_cpu_used_percent, sysdig_container_cpu_used_percent, sysdig_program_cpu_used_percent
  • cpu.user.percent: sysdig_host_cpu_user_percent
  • load.average.percpu.1m: sysdig_host_load_average_percpu_1m
  • load.average.percpu.5m: sysdig_host_load_average_percpu_5m
  • load.average.percpu.15m: sysdig_host_load_average_percpu_15m
  • memory.bytes.available: sysdig_host_memory_available_bytes
  • memory.bytes.total: sysdig_host_memory_total_bytes
  • memory.bytes.used: sysdig_host_memory_used_bytes, sysdig_container_memory_used_bytes, sysdig_program_memory_used_bytes
  • memory.bytes.virtual: sysdig_host_memory_virtual_bytes, sysdig_container_memory_virtual_bytes, sysdig_program_memory_virtual_bytes
  • memory.pageFault.major: None
  • memory.pageFault.minor: None
  • memory.swap.bytes.available: sysdig_host_memory_swap_available_bytes
  • memory.swap.bytes.total: sysdig_host_memory_swap_total_bytes
  • memory.swap.bytes.used: sysdig_host_memory_swap_used_bytes
  • memory.swap.used.percent: sysdig_host_memory_swap_used_percent
  • memory.used.percent: sysdig_host_memory_used_percent, sysdig_container_memory_used_percent, sysdig_program_memory_used_percent
  • file.bytes.in: sysdig_host_file_in_bytes, sysdig_container_file_in_bytes, sysdig_program_file_in_bytes
  • file.bytes.out: sysdig_host_file_out_bytes, sysdig_container_file_out_bytes, sysdig_program_file_out_bytes
  • file.bytes.total: sysdig_host_file_bytes_total, sysdig_container_file_bytes_total, sysdig_program_file_bytes_total
  • file.iops.in: sysdig_host_file_in_iops, sysdig_container_file_in_iops
  • file.iops.out: sysdig_host_file_out_iops, sysdig_container_file_out_iops
  • file.iops.total: sysdig_host_file_iops_total, sysdig_container_file_iops_total, sysdig_program_file_iops_total
  • file.open.count: sysdig_host_file_open_count, sysdig_container_file_open_count
  • file.time.in: sysdig_host_file_in_time, sysdig_container_file_in_time
  • file.time.out: sysdig_host_file_out_time, sysdig_container_file_out_time
  • file.time.total: sysdig_host_file_time_total, sysdig_container_file_time_total, sysdig_program_file_time_total
  • fs.bytes.free: sysdig_host_fs_free_bytes, sysdig_container_fs_free_bytes, sysdig_fs_free_bytes
  • fs.bytes.total: sysdig_fs_total_bytes, sysdig_host_fs_total_bytes, sysdig_container_fs_total_bytes
  • fs.bytes.used: sysdig_fs_used_bytes, sysdig_host_fs_used_bytes, sysdig_container_fs_used_bytes
  • fs.free.percent: sysdig_fs_free_percent, sysdig_host_fs_free_percent, sysdig_container_fs_free_percent
  • fs.inodes.total.count: sysdig_fs_inodes_total_count, sysdig_container_fs_inodes_total_count, sysdig_host_fs_inodes_total_count
  • fs.inodes.used.count: sysdig_fs_inodes_used_count, sysdig_container_fs_inodes_used_count, sysdig_host_fs_inodes_used_count
  • fs.inodes.used.percent: sysdig_fs_inodes_used_percent, sysdig_container_fs_inodes_used_percent, sysdig_host_fs_inodes_used_percent
  • fs.largest.used.percent: sysdig_container_fs_largest_used_percent, sysdig_host_fs_largest_used_percent
  • fs.root.used.percent: sysdig_container_fs_root_used_percent, sysdig_host_fs_root_used_percent
  • fs.used.percent: sysdig_fs_used_percent, sysdig_container_fs_used_percent, sysdig_host_fs_used_percent
  • net.bytes.in: sysdig_host_net_in_bytes, sysdig_container_net_in_bytes, sysdig_program_net_in_bytes
  • net.bytes.out: sysdig_host_net_out_bytes, sysdig_container_net_out_bytes, sysdig_program_net_out_bytes
  • net.bytes.total: sysdig_host_net_total_bytes, sysdig_container_net_total_bytes, sysdig_program_net_total_bytes, sysdig_connection_net_total_bytes
  • proc.count: sysdig_host_proc_count, sysdig_container_proc_count, sysdig_program_proc_count
  • thread.count: sysdig_host_thread_count, sysdig_container_thread_count, sysdig_program_thread_count
  • container.count: sysdig_container_count
  • system.uptime: sysdig_host_system_uptime
  • uptime: sysdig_host_up, sysdig_container_up, sysdig_program_up

1.2.2.2 - Additional Metrics Values Available in Troubleshooting

In addition to the extensive set of metrics available in Monitor mode, Troubleshooting mode provides additional metrics, such as net.sql and net.mongodb, as well as additional segmentations for file and network metrics.

Each Sysdig legacy ID is listed below with its Prometheus ID equivalent(s); where noted, "segmented by" names the labels that make additional metrics values available ("all" indicates all values):

  • file.error.total.count: sysdig_host_file_error_total_count, sysdig_container_file_error_total_count, sysdig_program_file_error_total_count (segmented by: file.name and file.mount labels)
  • file.bytes.total: sysdig_host_file_total_bytes, sysdig_container_file_total_bytes, sysdig_program_file_total_bytes
  • file.bytes.in: sysdig_host_file_in_bytes, sysdig_container_file_in_bytes, sysdig_program_file_in_bytes
  • file.bytes.out: sysdig_host_file_out_bytes, sysdig_container_file_out_bytes, sysdig_program_file_out_bytes
  • file.open.count: sysdig_host_file_open_count, sysdig_container_file_open_count, sysdig_program_file_open_count
  • file.time.total: sysdig_host_file_total_time, sysdig_container_file_total_time, sysdig_program_file_total_time
  • host.count: None
  • host.error.count: sysdig_host_syscall_error_count, sysdig_container_syscall_error_count
  • proc.count: sysdig_host_proc_count, sysdig_container_proc_count, sysdig_program_proc_count
  • proc.start.count: None
  • net.mongodb.collection (segmented by: all)
  • net.mongodb.error.count: sysdig_host_net_mongodb_error_count, sysdig_container_net_mongodb_error_count (segmented by: net.mongodb.operation)
  • net.mongodb.request.count: sysdig_host_net_mongodb_request_count, sysdig_container_net_mongodb_request_count
  • net.mongodb.request.time: sysdig_host_net_mongodb_request_time, sysdig_container_net_mongodb_request_time
  • net.sql.query (segmented by: all)
  • net.sql.error.count: sysdig_host_net_sql_error_count, sysdig_container_net_sql_error_count (segmented by: net.sql.query.type)
  • net.sql.request.count: sysdig_host_net_sql_request_count, sysdig_container_net_sql_request_count
  • net.sql.request.time: sysdig_host_net_sql_request_time, sysdig_container_net_sql_request_time (segmented by: net.sql.table)
  • net.http.error.count: sysdig_host_net_http_error_count, sysdig_container_net_http_error_count (segmented by: net.http.url)
  • net.http.method: None
  • net.http.request.count: sysdig_host_net_http_request_count, sysdig_container_net_http_request_count
  • net.http.request.time: sysdig_host_net_http_request_time, sysdig_container_net_http_request_time
  • net.bytes.in: sysdig_host_net_in_bytes, sysdig_container_net_in_bytes, sysdig_program_net_in_bytes
  • net.bytes.out: sysdig_host_net_out_bytes, sysdig_container_net_out_bytes, sysdig_program_net_out_bytes
  • net.request.time.worst.out: None
  • net.request.count: sysdig_host_net_request_count, sysdig_container_net_request_count, sysdig_program_net_request_count
  • net.request.time: sysdig_host_net_request_time, sysdig_container_net_request_time, sysdig_program_net_request_time
  • net.bytes.total: sysdig_host_net_total_bytes, sysdig_container_net_total_bytes, sysdig_program_net_total_bytes, sysdig_connection_net_total_bytes
  • net.http.request.time.worst (segmented by: all)

1.2.2.3 - Metrics Not Available in Essentials Mode

The following metrics are not reported in Essentials mode, as compared to Monitor mode:

Each Sysdig ID is listed below with its Prometheus ID equivalent(s); where noted, "segmented by" names the labels involved ("all" indicates all values):

  • net.bytes.in: sysdig_host_net_in_bytes, sysdig_container_net_in_bytes, sysdig_program_net_in_bytes (segmented by: net.connection.server, net.connection.direction, net.connection.l4proto, and net.connection.client labels)
  • net.bytes.out: sysdig_host_net_out_bytes, sysdig_container_net_out_bytes, sysdig_program_net_out_bytes
  • net.connection.count.total: sysdig_host_net_connection_total_count, sysdig_container_net_connection_total_count, sysdig_program_net_connection_total_count, sysdig_connection_net_connection_total_count
  • net.connection.count.in: sysdig_host_net_connection_in_count, sysdig_container_net_connection_in_count, sysdig_program_net_connection_in_count, sysdig_connection_net_connection_in_count
  • net.connection.count.out: sysdig_host_net_connection_out_count, sysdig_container_net_connection_out_count, sysdig_program_net_connection_out_count, sysdig_connection_net_connection_out_count
  • net.request.count: sysdig_host_net_request_count, sysdig_container_net_request_count, sysdig_program_net_request_count
  • net.request.count.in: sysdig_host_net_request_in_count, sysdig_container_net_request_in_count, sysdig_program_net_request_in_count, sysdig_connection_net_request_in_count
  • net.request.count.out: sysdig_host_net_request_out_count, sysdig_container_net_request_out_count, sysdig_program_net_request_out_count, sysdig_connection_net_request_out_count
  • net.request.time: sysdig_host_net_request_time, sysdig_container_net_request_time, sysdig_program_net_request_time
  • net.request.time.in: sysdig_host_net_time_in_count, sysdig_container_net_time_in_count, sysdig_program_net_time_out_count, sysdig_connection_net_time_in_count
  • net.request.time.out: sysdig_host_net_time_out_count, sysdig_container_net_time_out_count, sysdig_program_net_time_out_count, sysdig_connection_net_time_out_count
  • net.bytes.total: sysdig_host_net_total_bytes, sysdig_container_net_total_bytes, sysdig_program_net_total_bytes, sysdig_connection_net_total_bytes
  • net.mongodb.collection (segmented by: all)
  • net.mongodb.error.count: sysdig_host_net_mongodb_error_count, sysdig_container_net_mongodb_error_count (segmented by: net.mongodb.operation)
  • net.mongodb.request.count: sysdig_host_net_mongodb_request_count, sysdig_container_net_mongodb_request_count
  • net.mongodb.request.time: sysdig_host_net_mongodb_request_time, sysdig_container_net_mongodb_request_time
  • net.sql.query (segmented by: all)
  • net.sql.error.count: sysdig_host_net_sql_error_count, sysdig_container_net_sql_error_count (segmented by: net.sql.query.type)
  • net.sql.request.count: sysdig_host_net_sql_request_count, sysdig_container_net_sql_request_count
  • net.sql.request.time: sysdig_host_net_sql_request_time, sysdig_container_net_sql_request_time (segmented by: net.sql.table)
  • net.http.request.count: sysdig_host_net_http_request_count, sysdig_container_net_http_request_count (segmented by: net.http.method, net.http.statusCode, and net.http.url)
  • net.http.request.time: sysdig_host_net_http_request_time, sysdig_container_net_http_request_time

1.2.3 - Enable HTTP Proxy for Agents

You can configure the agent to allow it to communicate with the Sysdig collector through an HTTP proxy. HTTP proxy is usually configured to offer greater visibility and better management of the network.

Agent Behaviour

The agent can connect to the collector through an HTTP proxy by sending an HTTP CONNECT message and receiving a response. The proxy then initiates a TCP connection to the collector. These two connections form a tunnel that acts like one logical connection.

By default, the agent will encrypt all messages sent through this tunnel. This means that after the initial CONNECT message and response, all the communication on that tunnel is encrypted by SSL end-to-end. This encryption is controlled by the top-level ssl parameter in the agent configuration.

Optionally, the agent can add a second layer of encryption, securing the CONNECT message and response. This second layer of encryption may be desired in the case of HTTP authentication if there is a concern that network packet sniffing could be used to determine the user’s credentials. This second layer of encryption is enabled by setting the ssl parameter to true in the http_proxy section of the agent configuration. See Examples for details.

Configuration

You specify the following parameters at the same level as http_proxy in the dragent.yaml file. These existing configuration options affect the communication between the agent and collector (both with and without a proxy).

  • ssl: Default: true. It is not recommended to change this setting. If set to false, the metrics sent from the agent to the collector are unencrypted.

  • ssl_verify_certificate: Determines whether the agent verifies the SSL certificate sent from the collector (default is true).

The following configuration options affect the behavior of the HTTP Proxy setting. You specify them under the http_proxy heading in the dragent.yaml file.

  • proxy_host: Indicates the hostname of the proxy server. The default is an empty string, which implies communication through an HTTP proxy is disabled.

  • proxy_port: Specifies the port on the proxy server the agent should connect to. The default is 0, which indicates that the HTTP proxy is disabled.

  • proxy_user : Required if HTTP authentication is configured. This option specifies the username for the HTTP authentication. The default is an empty string, which indicates that authentication is not configured.

  • proxy_password : Required if HTTP authentication is configured. This option specifies the password for the HTTP authentication. The default is an empty string. Specifying proxy_user with no proxy_password is allowed.

  • ssl: Default: false. If set to true, the connection between the agent and the proxy server is encrypted.

    Note that this parameter requires the top-level ssl parameter to be enabled, as the agent does not support SSL to the proxy combined with unencrypted traffic to the collector. This restriction prevents a misconfiguration in which you assume the metrics are encrypted end-to-end when they are not.

  • ssl_verify_certificate: Determines whether the agent will verify the certificate presented by the proxy.

    This option is configured independently of the top-level ssl_verify_certificate parameter. This option is enabled by default. If the provided certificate is not correct, this option can cause the connection to the proxy server to fail.

  • ca_certificate: The path to the CA certificate for the proxy server. If ssl_verify_certificate is enabled, the CA certificate must be signed appropriately.

Examples

SSL Between Proxy and Collector

In this example, SSL is enabled only between the proxy server and the collector.

collector_port: 6443
ssl: true
ssl_verify_certificate: true
http_proxy:
        proxy_host: squid.yourdomain.com
        proxy_port: 3128

SSL

The following example shows SSL is enabled between the agent and the proxy server as well as between the proxy server and the collector.

collector_port: 6443
ssl: true
http_proxy:
        proxy_host: squid.yourdomain.com
        proxy_port: 3129
        ssl: true
        ssl_verify_certificate: true
        ca_certificate: /usr/proxy/proxy.crt

SSL with Username and Password

The following configuration instructs the agent to connect to a proxy server located at squid.yourdomain.com on port 3128. The agent will request the proxy server to establish an HTTP tunnel to the Sysdig collector at collector-your.sysdigcloud.com on port 6443. The agent will authenticate with the proxy server using the given user and password combination.

collector: collector-your.sysdigcloud.com
collector_port: 6443
http_proxy:
    proxy_host: squid.yourdomain.com
    proxy_port: 3128
    proxy_user: sysdig_customer
    proxy_password: 12345
    ssl: true
    ssl_verify_certificate: true
    ca_certificate: /usr/proxy/proxy_cert.crt

1.2.4 - Filter Data

The dragent.yaml file elements are wide-reaching. This section describes the parameters to edit in dragent.yaml to perform a range of activities:

1.2.4.1 - Blacklist Ports

Use the blacklisted_ports parameter in the agent configuration file to block network traffic and metrics from unnecessary network ports.

Note: Port 53 (DNS) is always blacklisted.

  1. Access the agent configuration file, using one of the options listed.

  2. Add blacklisted_ports with desired port numbers.

    Example (YAML):

    blacklisted_ports:
      - 6443
      - 6379

  3. Restart the agent (if editing dragent.yaml file directly), using either the service dragent restart or docker restart sysdig-agent command as appropriate.

1.2.4.2 - Enable/Disable Event Data

Sysdig Monitor supports event integrations with certain applications by default. The Sysdig agent will automatically discover these services and begin collecting event data from them.

The following applications are currently supported:

  • Docker

  • Kubernetes

Other methods of ingesting custom events into Sysdig Monitor are touched upon in Custom Events.

By default, only a limited set of events is collected for each supported application; these events are listed in the agent’s default settings configuration file (/opt/draios/etc/dragent.default.yaml).

To enable collecting other supported events, add an events entry to dragent.yaml.

You can also change the log entry in dragent.yaml to filter events by severity.

Learn more about it in the following sections.

Supported Application Events

Events marked with * are enabled by default; see the dragent.default.yaml file.

Docker Events

The following Docker events are supported.

  docker:
    container:
      - attach       # Container Attached      (information)
      - commit       # Container Committed     (information)
      - copy         # Container Copied        (information)
      - create       # Container Created       (information)
      - destroy      # Container Destroyed     (warning)
      - die          # Container Died          (warning)
      - exec_create  # Container Exec Created  (information)
      - exec_start   # Container Exec Started  (information)
      - export       # Container Exported      (information)
      - kill         # Container Killed        (warning)*
      - oom          # Container Out of Memory (warning)*
      - pause        # Container Paused        (information)
      - rename       # Container Renamed       (information)
      - resize       # Container Resized       (information)
      - restart      # Container Restarted     (warning)
      - start        # Container Started       (information)
      - stop         # Container Stopped       (information)
      - top          # Container Top           (information)
      - unpause      # Container Unpaused      (information)
      - update       # Container Updated       (information)
    image:
      - delete # Image Deleted  (information)
      - import # Image Imported (information)
      - pull   # Image Pulled   (information)
      - push   # Image Pushed   (information)
      - tag    # Image Tagged   (information)
      - untag  # Image Untagged (information)
    volume:
      - create  # Volume Created    (information)
      - mount   # Volume Mounted    (information)
      - unmount # Volume Unmounted  (information)
      - destroy # Volume Destroyed  (information)
    network:
      - create     # Network Created       (information)
      - connect    # Network Connected     (information)
      - disconnect # Network Disconnected  (information)
      - destroy    # Network Destroyed     (information)

Kubernetes Events

The following Kubernetes events are supported.

  kubernetes:
    node:
      - TerminatedAllPods       # Terminated All Pods      (information)
      - RegisteredNode          # Node Registered          (information)*
      - RemovingNode            # Removing Node            (information)*
      - DeletingNode            # Deleting Node            (information)*
      - DeletingAllPods         # Deleting All Pods        (information)
      - TerminatingEvictedPod   # Terminating Evicted Pod  (information)*
      - NodeReady               # Node Ready               (information)*
      - NodeNotReady            # Node not Ready           (information)*
      - NodeSchedulable         # Node is Schedulable      (information)*
      - NodeNotSchedulable      # Node is not Schedulable  (information)*
      - CIDRNotAvailable        # CIDR not Available       (information)*
      - CIDRAssignmentFailed    # CIDR Assignment Failed   (information)*
      - Starting                # Starting Kubelet         (information)*
      - KubeletSetupFailed      # Kubelet Setup Failed     (warning)*
      - FailedMount             # Volume Mount Failed      (warning)*
      - NodeSelectorMismatching # Node Selector Mismatch   (warning)*
      - InsufficientFreeCPU     # Insufficient Free CPU    (warning)*
      - InsufficientFreeMemory  # Insufficient Free Mem    (warning)*
      - OutOfDisk               # Out of Disk              (information)*
      - HostNetworkNotSupported # Host Ntw not Supported   (warning)*
      - NilShaper               # Undefined Shaper         (warning)*
      - Rebooted                # Node Rebooted            (warning)*
      - NodeHasSufficientDisk   # Node Has Sufficient Disk (information)*
      - NodeOutOfDisk           # Node Out of Disk Space   (information)*
      - InvalidDiskCapacity     # Invalid Disk Capacity    (warning)*
      - FreeDiskSpaceFailed     # Free Disk Space Failed   (warning)*
    pod:
      - Pulling           # Pulling Container Image          (information)
      - Pulled            # Ctr Img Pulled                   (information)
      - Failed            # Ctr Img Pull/Create/Start Fail   (warning)*
      - InspectFailed     # Ctr Img Inspect Failed           (warning)*
      - ErrImageNeverPull # Ctr Img NeverPull Policy Violate (warning)*
      - BackOff           # Back Off Ctr Start, Image Pull   (warning)
      - Created           # Container Created                (information)
      - Started           # Container Started                (information)
      - Killing           # Killing Container                (information)*
      - Unhealthy         # Container Unhealthy              (warning)
      - FailedSync        # Pod Sync Failed                  (warning)
      - FailedValidation  # Failed Pod Config Validation     (warning)
      - OutOfDisk         # Out of Disk                      (information)*
      - HostPortConflict  # Host/Port Conflict               (warning)*
    replicationController:
      - SuccessfulCreate    # Pod Created        (information)*
      - FailedCreate        # Pod Create Failed  (warning)*
      - SuccessfulDelete    # Pod Deleted        (information)*
      - FailedDelete        # Pod Delete Failed  (warning)*

Enable/Disable Events Collection with events Parameter

To customize the default events collected for a specific application (by either enabling or disabling events), add an events entry to dragent.yaml as described in the examples below.

An entry in a section in dragent.yaml overrides the entire section in the default configuration.

For example, the Pulling entry below will permit only kubernetes pod Pulling events to be collected and all other kubernetes pod events settings in dragent.default.yaml will be ignored.

However, other kubernetes sections - node and replicationController - remain intact and will be used as specified in dragent.default.yaml.

Example 1: Collect Only Certain Events

Collect only ‘Pulling’ events from Kubernetes for pods:

events:
  kubernetes:
    pod:
       - Pulling

Example 2: Disable All Events in a Section

To disable all events in a section, set the event section to none:

events:
  kubernetes: none
  docker: none

Example 3: Combine Methods

These methods can be combined. For example, disable all kubernetes node and docker image events and limit docker container events to [attach, commit, copy] (events in other sections will be collected as specified by default):

events:
  kubernetes:
    node: none
  docker:
    image: none
    container:
      - attach
      - commit
      - copy

Note: Format Sequences as List or Single Line

In addition to bulleted lists, sequences can also be specified in a bracketed single line, e.g.:

events:
  kubernetes:
    pod: [Pulling, Pulled, Failed]

So, the following two settings are equivalent, permitting only Pulling, Pulled, Failed events for pods to be emitted:

events:
  kubernetes:
    pod: [Pulling, Pulled, Failed]

events:
  kubernetes:
    pod:
      - Pulling
      - Pulled
      - Failed

Change Event Collection by Severity with log Parameter

Events are limited globally at the agent level based on severity, using the log settings in dragent.yaml.

The default setting for the events severity filter is information (only warning and higher severity events are transmitted).

Valid severity levels are: none, emergency, alert, critical, error, warning, notice, information, debug.

Example 1: Block Low-Severity Messages

Block all low-severity messages (notice, information, debug):

log:
  event_priority: warning

Example 2: Block All Event Collection

Block all event collection:

log:
  event_priority: none

For other uses of the log settings see Optional: Change the Agent Log Level.

1.2.4.3 - Include/Exclude Custom Metrics

It is possible to filter custom metrics in the following ways:

  • Ability to include/exclude custom metrics using configurable patterns

  • Ability to log which custom metrics are exceeding limits

After you identify the key custom metrics that must be received, use the include and exclude filtering parameters to make sure you receive them before the metrics limit is hit.

Filter Metrics Example

Here is an example configuration entry for the agent config file (/opt/draios/etc/dragent.yaml):

metrics_filter:
  - include: test.*
  - exclude: test.*
  - include: haproxy.backend.*
  - exclude: haproxy.*
  - exclude: redis.*

Given the config entry above, the agent takes the following action for these metrics:

  • test.* → send
  • haproxy.backend.request → send
  • haproxy.frontend.bytes → drop
  • redis.keys → drop

The semantics are as follows: whenever the agent reads metrics, it filters them according to the configured rules, in order, and the first rule that matches is applied. Because the include rule for test.* is listed first, it takes effect, and the subsequent exclude rule for the same pattern is ignored.

Logging Accepted/Dropped Metrics

Logging is disabled by default. You can enable logging to see which metrics are accepted or dropped by adding the following configuration entry into the dragent.yaml config file:

metrics_excess_log: true

When logging of excess metrics is enabled, logging occurs at INFO-level, every 30 seconds and lasts for 10 seconds. The entries that can be seen in /opt/draios/logs/draios.log will be formatted like this:

+/-[type] [metric included/excluded]: metric.name (filter: +/-[metric.filter])

The first ‘+’ or ‘-’, followed by ’type’ provides an easy way to quickly scan the list of metrics and spot which are included or excluded (’+’ means “included”, ‘-’ means “excluded”).

The second entry specifies metric type (“statsd”, “app_check”, “service_check”, or “jmx”).

A third entry spells out whether the metric was “included” or “excluded”, followed by the metric name. Finally, the last entry (in parentheses) identifies the filter that was applied and its effect (‘+’ or ‘-’, meaning “include” or “exclude”).

With this example filter rule set:

metrics_filter:
  - include: mongo.statsd.net*
  - exclude: mongo.statsd.*

We might see the following INFO-level log entries (timestamps stripped):

-[statsd] metric excluded: mongo.statsd.vsize (filter: -[mongo.statsd.*])
+[statsd] metric included: mongo.statsd.netIn (filter: +[mongo.statsd.net*])

1.2.4.4 - Prioritize Designated Containers

To get the most out of Sysdig Monitor, you may want to customize the way in which container data is prioritized and reported. Use this page to understand the default behavior and sorting rules, and to implement custom behavior when and where you need it. This can help reduce agent and backend load by not monitoring unnecessary containers, or, if you are encountering backend limits for containers, by filtering to ensure that the important containers are always reported.

Overview

By default, a Sysdig agent will collect metrics from all containers it detects in an environment. When reporting to the Monitor interface, it uses default sorting behavior to prioritize what container information to display first.

Understand Default Behavior

Out of the box, the agent chooses the containers with the highest

  • CPU

  • Memory

  • File IO

  • Net IO

and allocates approximately 1/4 of the total limit to each stat type.

Understand Simple Container Filtering

As of agent version 0.86, it is possible to set a use_container_filter parameter in the agent config file, tag/label specific containers, and set include/exclude rules to push those containers to the top of the reporting hierarchy.

This is an effective sorting tool when:

  • You can manually mark each container with an include or exclude tag, AND

  • The number of includes is small (say, less than 100)

In this case, the containers that explicitly match the include rules will take top priority.

Understand Smart Container Reporting

In some enterprises, the number of containers is too high to tag with simple filtering rules, and/or the include_all group is too large to ensure that the most-desired containers are consistently reported. As of Sysdig agent version 0.91, you can append another parameter to the agent config file, smart_container_reporting.

This is an effective sorting tool when:

  • The number of containers is large and you can’t or won’t mark each one with include/exclude tags, AND

  • There are certain containers you would like to always prioritize

This helps ensure that even when there are thousands of containers in an environment, the most-desired containers are consistently reported.

Container filtering and smart container reporting affect the monitoring of all the processes/metrics within a container, including StatsD, JMX, app-checks, and built-in metrics.

Prometheus metrics are attached to processes, rather than containers, and are therefore handled differently.

The container limit is set in dragent.yaml under containers:limit:
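
For example, a minimal sketch (the value 200 is purely illustrative, not a recommended setting):

containers:
  limit: 200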

Understand Sysdig Aggregated Container

The sysdig_aggregated parameter is automatically activated when smart container reporting is enabled, to capture the most-desired metrics from the containers that were excluded by smart filtering and report them under a single entity. It appears like any other container in the Sysdig Monitor UI, with the name “sysdig_aggregated”.

Sysdig_aggregated can report on a wide array of metrics; see Sysdig_aggregated Container Metrics. However, because this is not a regular container, certain limitations apply:

  • container_id and container_image do not exist.

  • The aggregated container cannot be segmented by certain metrics that are excluded, such as process.

  • Some default dashboards associated with the aggregated container may have some empty graphs.

Use Simple Container Filtering

By default, the filtering feature is turned off. It can be enabled by adding the following line to the agent configuration:

  • use_container_filter: true

When enabled, the agent will follow include/exclude filtering rules based on:

  • container image

  • container name

  • container label

  • Kubernetes annotation or label

The default behavior in dragent.default.yaml excludes based on a container label (com.sysdig.report) and/or a Kubernetes pod annotation (sysdig.com/report).

Container Condition Parameters and Rules

Parameters

The condition parameters are described below, each with an example:

  • container.image: Matches if the process is running inside a container running the specified image.

    - include:
        container.image: luca3m/prometheus-java-app

  • container.name: Matches if the process is running inside a container with the specified name.

    - include:
        container.name: my-java-app

  • container.label.*: Matches if the process is running in a container that has a label matching the given value.

    - include:
        container.label.class: exporter

  • kubernetes.<object>.annotation.* / kubernetes.<object>.label.*: Matches if the process is attached to a Kubernetes object (Pod, Namespace, etc.) that is marked with an annotation/label matching the given value.

    - include:
        kubernetes.pod.annotation.prometheus.io/scrape: true

  • all: Matches all. Use as the last rule to determine the default behavior.

    - include:
        all

Rules

Once enabled (when use_container_filter: true is set), the agent will follow filtering rules from the container_filter section.

  • Each rule is an include or exclude rule which can contain one or more conditions.

  • The first matching rule in the list will determine if the container is included or excluded.

  • The conditions consist of a key name and a value. If the given key for a container matches the value, the rule will be matched.

  • If a rule contains multiple conditions they all need to match for the rule to be considered a match.

Default Configuration

The dragent.default.yaml contains the following default configuration for container filters:

use_container_filter: false

container_filter:
  - include:
      container.label.com.sysdig.report: true
  - exclude:
      container.label.com.sysdig.report: false
  - include:
      kubernetes.pod.annotation.sysdig.com/report: true
  - exclude:
      kubernetes.pod.annotation.sysdig.com/report: false
  - include:
        all

Note that it excludes via a container.label and by a kubernetes.pod.annotation.

The examples on this page show how to edit the dragent.yaml file directly. Convert the examples to Docker or Helm commands, if applicable for your situation.

Enable Container Filtering in the Agent Config File

Option 1: Use the Default Configuration

To enable container filtering using the default configuration in dragent.default.yaml (above), follow the steps below.

1. Apply Labels and/or Annotations to Designated Containers

To set up, decide which containers should be excluded from automatic monitoring.

Apply the container label com.sysdig.report and/or the Kubernetes pod annotation sysdig.com/report to the designated containers.
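
For example, a minimal sketch of a pod manifest carrying the annotation (the pod name and container are illustrative; note that annotation values must be strings):

apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # illustrative name
  annotations:
    sysdig.com/report: "false"  # exclude this pod's containers from reporting
spec:
  containers:
    - name: app                 # illustrative container
      image: nginx              # illustrative image

For a standalone container, the equivalent Docker label can be applied at run time, e.g. docker run --label com.sysdig.report=false ….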

2. Edit the Agent Configuration

Add the following line to dragent.yaml to turn on the default functionality:

use_container_filter: true

Option 2: Define Your Own Rules

You can also edit dragent.yaml to apply your own container filtering rules.

1. Designate Containers

To set up, decide which containers should be excluded from automatic monitoring.

Note the image, name, label, or Kubernetes pod information as appropriate, and build your rule set accordingly.

2. Edit the Agent Configuration

For example:

use_container_filter: true

container_filter:
  - include:
      container.name: my-app
  - include:
      container.label.com.sysdig.report: true
  - exclude:
      kubernetes.namespace.name: kube-system
      container.image: "gcr.io*"
  - include:
      all

The above example shows a container_filter with 3 include rules and 1 exclude rule.

  • If the container name is “my-app” it will be included.

  • Likewise, if the container has a label with the key “com.sysdig.report” and the value “true”, it will be included.

  • If neither of those rules is true, and the container is part of a Kubernetes hierarchy within the “kube-system” namespace and the container image starts with “gcr.io”, it will be excluded.

  • The last rule includes all, so any containers not matching an earlier rule will be monitored and metrics for them will be sent to the backend.

Use Smart Container Reporting

As of Sysdig agent version 0.91, you can add another parameter to the config file: smart_container_reporting: true

This enables several new prioritization checks:

  • container_filter (you would enable and set include/exclude rules, as described above)

  • container age

  • high stats

  • legacy patterns

The sort is modified with the following rules in priority order:

  1. User-specified containers come before others

  2. Containers reported previously should be reported before those which have never been reported

  3. Containers with higher usage by each of the 4 default stats should come before those with lower usage

Enable Smart Container Reporting and sysdig_aggregated

  1. Set up any simple container filtering rules you need, following either Option 1 or Option 2, above.

  2. Edit the agent configuration:

    smart_container_reporting: true
    
  3. This turns on both smart_container_reporting and sysdig_aggregated. The changes will be visible in the Sysdig Monitor UI.

    See also Sysdig_aggregated Container Metrics.

Logging

When the log level is set to DEBUG, the following messages may be found in the logs:

  • container <id>, no filter configured: container filtering is not enabled
  • container <id>, include in report: the container is included
  • container <id>, exclude in report: the container is excluded
  • Not reporting thread <thread-id> in container <id>: the process thread is excluded

See also: Optional: Change the Agent Log Level.

1.2.4.4.1 - Sysdig Aggregated Container Metrics

Sysdig_aggregated containers can report on the following metrics:

  • tcounters

    • other

      • time_ns

      • time_percentage

      • count

    • io_file

      • time_ns_in

      • time_ns_out

      • time_ns_other

      • time_percentage_in

      • time_percentage_out

      • time_percentage_other

      • count_in

      • count_out

      • count_other

      • bytes_in

      • bytes_out

      • bytes_other

    • io_net

      • time_ns_in

      • time_ns_out

      • time_ns_other

      • time_percentage_in

      • time_percentage_out

      • time_percentage_other

      • count_in

      • count_out

      • count_other

      • bytes_in

      • bytes_out

      • bytes_other

    • processing

      • time_ns

      • time_percentage

      • count

  • reqcounters

    • other

      • time_ns

      • time_percentage

      • count

    • io_file

      • time_ns_in

      • time_ns_out

      • time_ns_other

      • time_percentage_in

      • time_percentage_out

      • time_percentage_other

      • count_in

      • count_out

      • count_other

      • bytes_in

      • bytes_out

      • bytes_other

    • io_net

      • time_ns_in

      • time_ns_out

      • time_ns_other

      • time_percentage_in

      • time_percentage_out

      • time_percentage_other

      • count_in

      • count_out

      • count_other

      • bytes_in

      • bytes_out

      • bytes_other

    • processing

      • time_ns

      • time_percentage

      • count

  • max_transaction_counters

    • time_ns_in

    • time_ns_out

    • count_in

    • count_out

  • resource_counters

    • connection_queue_usage_pct

    • fd_usage_pct

    • cpu_pct

    • resident_memory_usage_kb

    • swap_memory_usage_kb

    • major_pagefaults

    • minor_pagefaults

    • fd_count

    • cpu_shares

    • memory_limit_kb

    • swap_limit_kb

    • count_processes

    • proc_start_count

    • threads_count

  • syscall_errors

    • count

    • count_file

    • count_file_opened

    • count_net

  • protos

    • http

      • server_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

      • client_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

    • mysql

      • server_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

      • client_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

    • postgres

      • server_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

      • client_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

    • mongodb

      • server_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

      • client_totals

        • ncalls

        • time_tot

        • time_max

        • bytes_in

        • bytes_out

        • nerrors

  • names

  • transaction_counters

    • time_ns_in

    • time_ns_out

    • count_in

    • count_out

1.2.4.5 - Include/Exclude Processes

In addition to filtering data by container, it is also possible to filter independently by process. Broadly speaking, this refinement helps ensure that relevant data is reported while noise is reduced. More specifically, use cases for process filtering may include: 

  • Wanting to alert reliably whenever a given process goes down.  The total number of processes can exceed the reporting limit; when that happens, some processes are not reported. In this case, an unreported process could be misinterpreted as being “down.” Specify a filter for 30-40 processes to guarantee that they will always be reported.

  • Wanting to limit the number of noisy but inessential processes being reported, for example: sed, awk, grep, and similar tools that may be used infrequently.

  • Wanting to prioritize workload-specific processes, perhaps from integrated applications such as NGINX, Supervisord or PHP-FPM.

Note that you can report on processes and containers independently; the including/excluding of one does not affect the including/excluding of the other.

Prerequisites

This feature requires the following Sysdig component versions:

  • Sysdig agent version 0.91 or higher

  • For on-premises installations: version 3.2.0.2540 or higher

Understand Process Filtering Behavior

By default, processes are reported according to internal criteria such as resource usage (CPU/memory/file and net IO) and container count.

If you choose to enable process filtering, processes in the include list will be given preference over other internal criteria.

Processes are filtered based on a standard priority filter description already used in Sysdig yaml files. It is comprised of -include and -exclude statements which are matched in order, with evaluation ceasing with the first matched statement. Statements are considered matched if EACH of the conditions in the statement is met.

Use Process Filtering

Edit dragent.yaml per the following patterns to implement the filtering you need.

Process Condition Parameters and Rules

The process: condition parameters and rules are described below.

  • app_checks_always_send (true/false): Legacy config that causes the agent to emit any process with an app check. With process filtering, this translates to an extra “include” clause at the head of the process filter which matches a process with any app check, thereby overriding any exclusions. Still subject to limit.

  • flush_filter: Definition of the process filter to be used if flush_filter_enabled is true. Defaults to -include all.

  • flush_filter_enabled (true/false): Defaults to false (default process reporting behavior). Set to true to use the rest of the process filtering options.

  • limit (N, a chosen number): Defines the approximate limit of processes to emit to the backend, within 10 processes or so. Default is 250 processes.

  • top_n_per_container (N, a chosen number): Defines how many of the top processes per resource category per emitted container to report after included processes. Still subject to limit. Defaults to 1.

  • top_n_per_host (N, a chosen number): Defines how many of the top processes per resource category per host are reported before included processes. Still subject to limit. Defaults to 1.
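
Taken together, a minimal sketch of how these parameters sit in dragent.yaml (the numeric values shown are the documented defaults; the trivial filter here is illustrative, and the Examples section below develops real filters):

process:
  flush_filter_enabled: true   # opt in to process filtering
  limit: 250                   # approximate number of processes emitted
  top_n_per_host: 1            # top processes per resource category per host
  top_n_per_container: 1       # top processes per resource category per container
  flush_filter:
    - include:
        all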

Rules

  • container.image: my_container_image: Validates whether the container image associated with the process is a wildcard match of the provided image name.

  • container.name: my_container_name: Validates whether the container name associated with the process is a wildcard match of the provided container name.

  • container.label.XYZ: value: Validates whether the label XYZ of the container associated with the process is a wildcard match of the provided value.

  • process.name: my_process_name: Validates whether the name of the process is a wildcard match of the provided value.

  • process.cmdline: value: Checks whether the executable name of a process contains the specified value, or whether any argument to the process is a wildcard match of the provided value.

  • appcheck.match: value: Checks whether the process has any app check that is a wildcard match of the given value.

  • all: Matches all processes, but neither whitelists nor blacklists them. If no filter is provided, the default is -include all. However, if a filter is provided and no other match is made, all unmatched processes are blacklisted. In most cases, the definition of a process filter should end with - include: all.

Examples

Block All Processes from a Container

Block all processes from a given container. No processes from some_container_name will be reported.

process:
  flush_filter_enabled: true
  flush_filter:
  - exclude:
      container.name: some_container_name
  - include:
      all

Prioritize Processes from a Container

Send all processes from a given container at high priority.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: some_container_name
    - include:
        all

Prioritize “java” Processes

Send all processes that contain “java” in the name at high priority.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        process.name: java
    - include:
        all

Prioritize “java” Processes from a Particular Container

Send processes containing “java” from a given container at high priority.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: some_container_name
        process.name: java
    - include:
        all

Prioritize “java” Processes not in a Particular Container

Send all processes that contain “java” in the name and are not in the container some_container_name.

process:
  flush_filter_enabled: true
  flush_filter:
    - exclude:
        container.name: some_container_name
    - include:
        process.name: java
    - include:
        all

Prioritize “java” Processes even from an Excluded Container

Send all processes containing “java” in the name. If a process does not contain “java” in the name and the container within which the process runs is named some_container_name, then exclude it.

Note that each include/exclude rule is handled sequentially and hierarchically so that even if the container is excluded, it can still report “java” processes.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        process.name: java
    - exclude:
        container.name: some_container_name
    - include:
        all

Prioritize “java” Processes and “sql” Processes from Different Containers

Send Java processes from one container and SQL processes from another at high priority.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.name: java_container_name
        process.name: java
    - include:
        container.name: sql_container_name
        process.name: sql
    - include:
        all

Report ONLY Processes in a Particular Container

Only send processes running in a container with a given label.

process:
  flush_filter_enabled: true
  flush_filter:
    - include:
        container.label.report_processes_from_this_container_example_label: true
    - exclude:
        all

1.2.4.6 - Collect Metrics from Remote File Systems

Sysdig agent does not automatically discover and collect metrics from external file systems, such as NFS, by default. To enable collecting these metrics, add the following entry to the dragent.yaml file:

remotefs: true

In addition to the remote file systems, the following mount types are also excluded because they cause high load.

mounts_filter:
  - exclude: "*|autofs|*"
  - exclude: "*|proc|*"
  - exclude: "*|cgroup|*"
  - exclude: "*|subfs|*"
  - exclude: "*|debugfs|*"
  - exclude: "*|devpts|*"
  - exclude: "*|fusectl|*"
  - exclude: "*|mqueue|*"
  - exclude: "*|rpc_pipefs|*"
  - exclude: "*|sysfs|*"
  - exclude: "*|devfs|*"
  - exclude: "*|devtmpfs|*"
  - exclude: "*|kernfs|*"
  - exclude: "*|ignore|*"
  - exclude: "*|rootfs|*"
  - exclude: "*|none|*"
  - exclude: "*|tmpfs|*"
  - exclude: "*|pstore|*"
  - exclude: "*|hugetlbfs|*"
  - exclude: "*|*|/etc/resolv.conf"
  - exclude: "*|*|/etc/hostname"
  - exclude: "*|*|/etc/hosts"
  - exclude: "*|*|/var/lib/rkt/pods/*"
  - exclude: "overlay|*|/opt/stage2/*"
  - exclude: "/dev/mapper/cl-root*|*|/opt/stage2/*"
  - exclude: "*|*|/dev/termination-log*"
  - include: "*|*|/var/lib/docker"
  - exclude: "*|*|/var/lib/docker/*"
  - exclude: "*|*|/var/lib/kubelet/pods/*"
  - exclude: "*|*|/run/secrets"
  - exclude: "*|*|/run/containerd/*"
  - include: "*|*|*"

To include a mount type:

  1. Open the dragent.yaml file.

  2. Remove the corresponding line from the exclude list in the mounts_filter.

  3. Add the file mount to the include list under mounts_filter.

    The format is:

    mounts_filter:
      - exclude: "device|filesystem|mount_directory"
      - include: "pattern1|pattern2|pattern3"
    

    For example:

    mounts_filter:
      - include: "*|autofs|*"
      - include: "overlay|*|/opt/stage2/*"
      - include: "/dev/mapper/cl-root*|*|/opt/stage2/*"
    
  4. Save the configuration changes and restart the agent.

1.2.4.7 - Disable Captures

Sometimes, security requirements dictate that capture functionality should NOT be triggered at all (for example, PCI compliance for payment information).

To disable Captures altogether:

  1. Access the agent configuration file, using one of the options listed.

    This example accesses dragent.yaml directly.

  2. Set the parameter:

    sysdig_capture_enabled: false
    
  3. Restart the agent, using the command:

    service dragent restart
    

See Captures for more information on the feature.

1.2.5 - Reduce Memory Consumption in Agent

Sysdig provides a configuration option called thin cointerface to reduce the memory footprint of the agent. When the agent is installed as a Kubernetes daemonset, you can optionally enable the thin cointerface in the sysdig-agent configmap.

Pros

  • Reduces memory consumption
  • Particularly useful on very large Kubernetes clusters (>10,000 pods)

Cons

  • Less frequently used option which is therefore less battle-tested
  • If a watch is dropped and a re-list is required (for example, due to a network issue or an API server update), there is no cache to maintain the resources. In this case, the agent must process many additional events.

How It Works

In a typical Kubernetes cluster, two instances of the agent daemonset retrieve the data. They connect automatically to the Kubernetes API server, retrieve the metadata associated with the entities running on the cluster, and send the global Kubernetes state to the Sysdig backend. Sysdig uses this data to generate kube state metrics.

A delegated agent will not have a higher CPU or memory footprint than a non-delegated agent.

On very large Kubernetes clusters (in the range of 10,000 pods) or clusters with several replication controllers, the agent’s data ingestion can impose a significant memory footprint on the agent itself and on the Kubernetes API server. Thin cointerface is provided to reduce this impact.

Enabling this option changes the way the agent communicates with the API server and reduces the need to cache data, which in turn reduces the overall memory usage. Thin cointerface does this by moving some processing from the agent’s cointerface process to the dragent process. This change does not alter the data which is ultimately sent to the backend nor will it impact any Sysdig feature.

The thin cointerface feature is disabled by default.

To Enable:

Add the following in either the sysdig-agent’s configmap or via the dragent.yaml file:

thin_cointerface_enabled: true

1.2.6 - Enable Kube State Metrics

Agent Versions 12.5.0 and Onward

HPA kube state metrics are no longer collected by default. To enable the agent to collect HPA kube state metrics, you must edit the agent configuration file, dragent.yaml, and include it along with the other resources you would like to collect.

For example, to collect all supported resources including HPAs, add the following to dragent.yaml:

k8s_extra_resources:
    include:
      - services
      - resourcequotas
      - persistentvolumes
      - persistentvolumeclaims
      - horizontalpodautoscalers

Agent Versions 12.3.x and 12.4.x

The Sysdig agent collects HPA, PVS, PV, Resourcequota, and Services kube state metrics by default.

To disable some of them, you must edit the agent config file, dragent.yaml, as follows:

k8s_extra_resources:
    include:
      - services
      - resourcequotas
      - persistentvolumes
      - persistentvolumeclaims
      - horizontalpodautoscalers

The above list includes all the supported resources so you must remove the resources you are not interested in. For example, if you wanted to disable Services, it should look like the following:

k8s_extra_resources:
    include:
      - resourcequotas
      - persistentvolumes
      - persistentvolumeclaims
      - horizontalpodautoscalers

For more information, see Understanding the Agent Configuration Files.

Enable PVC Metrics

In addition to the agent configuration detailed in this section, PVC metrics collection requires additional settings. See Configure PVC for more information.

1.2.7 - Process Kubernetes Events

Use Go to Process Kubernetes Events

Required: Sysdig agent version 9.2.1 or higher.

As of agent version 9.5.0, go_k8s_user_events: true is the default setting. Set it to false to use the older, C++-based version.

To streamline Sysdig agent processing times and reduce CPU load, you can use an updated processing engine written in Go.

To do so, edit the following code in dragent.yaml:

go_k8s_user_events: true

Kubernetes Audit Events

The agent listens on /k8s-audit for Kubernetes audit events. Configure the path using the following configuration option:

security:
  k8s_audit_server_path_uris: [path1, path2]

For more information, see Kubernetes Audit Logging.

Working with containerd in K3S

If you have containerd using a custom socket, you can specify this parameter in the agent configuration to correctly capture the containers’ metadata:

cri:
  socket_path: /run/k3s/containerd/containerd.sock

1.2.8 - Manage Agent Log Levels

Sysdig allows you to configure file log levels for agents globally and granularly.

1.2.8.1 - Change Agent Log Level Globally

The Sysdig agent generates log entries in /opt/draios/logs/draios.log. The agent will rotate the log file when it reaches 10MB in size, keeping the 10 most recent log files archived with a date-stamp appended to the filename.

In order of increasing detail, the log levels available are: [ none | critical | error | warning | notice | info | debug | trace ].

The default level (info) creates an entry for each aggregated metrics transmission to the backend servers, once per second, in addition to entries for any warnings and errors.

Setting the value lower than info may make troubleshooting agent-related issues difficult.

The type and amount of logging can be changed by adding parameters and log level arguments shown below to the agent’s user settings configuration file here:

/opt/draios/etc/dragent.yaml

After editing the dragent.yaml file, restart the agent at the shell with service dragent restart to effect the changes.

Note that dragent.yaml code can be written in both YAML and JSON. The examples below use YAML.

File Log Level

When troubleshooting agent behavior, increase the logging to debug for full detail:

log:
  file_priority: debug

If you wish to reduce log messages going to the /opt/draios/logs/draios.log file, add the log: parameter with one of the following arguments under it and indented two spaces: [ none | error | warning | info | debug | trace ]

log:
  file_priority: error

Container Console Logging

If you are running the containerized agent, you can also reduce container console output by adding the additional parameter console_priority: with the same arguments [ none | error | warning | info | debug | trace ]

log:
  console_priority: warning

Note that troubleshooting a host with less than the default ‘info’ level will be more difficult or not possible. You should revert to ‘info’ when you are done troubleshooting the agent.

A level of ‘error’ will generate the fewest log entries; a level of ‘trace’ will give the most. ‘info’ is the default if no entry exists.

Examples

Using HELM


helm install ... \
  --set sysdig.settings.log.file_priority=debug \
  --set sysdig.settings.log.console_priority=debug

Using values.yaml

sysdig:
  settings:
    log:
      file_priority: debug
      console_priority: debug

Using dragent.yaml

customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:eng,svc:websvr
log:
  file_priority: warning
  console_priority: info

OR

customerid: 831f3-Your-Access-Key-9401
tags: local:sf,acct:eng,svc:websvr
log: { file_priority: debug, console_priority: debug }

Using Docker Run Command

If you are using the ADDITIONAL_CONF parameter to start a Docker containerized agent, specify the entry in the Docker run command in either single-line or multi-line form:

-e ADDITIONAL_CONF="log:  { file_priority: error, console_priority: none }"
-e ADDITIONAL_CONF="log:\n  file_priority: error\n  console_priority: none"
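For context, here is a sketch of a complete run command with ADDITIONAL_CONF in place; the access key is a placeholder, and the remaining flags mirror the standard run command shown in the Settings > Agent Installation tab:

docker run --name sysdig-agent --privileged --net host --pid host \
  -e ACCESS_KEY=your-access-key \
  -e ADDITIONAL_CONF="log:\n  file_priority: error\n  console_priority: none" \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro \
  sysdig/agent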

Using daemonset.yaml in Kubernetes Infrastructure

When running in a Kubernetes infrastructure (installed using the v1 method), uncomment the "ADDITIONAL_CONF" line in the agent sysdig-daemonset.yaml manifest file, and modify as needed:

- name: ADDITIONAL_CONF #OPTIONAL pass additional parameters to the agent
  value: "log:\n file_priority: debug\n console_priority: error"

1.2.8.2 - Manage File Logging for Agent Components

Sysdig Agent provides the ability to set component-wise log levels that override the global file logging level controlled by the file_priority configuration option. The components represent internal software modules and can be found in /opt/draios/logs/draios.log.

By controlling logging at the fine-grained component level, you can avoid excessive logging from certain components in draios.log or enable extra logging from specific components for troubleshooting.

Agent components can also have optional feature-level logging, which provides a way to control the logging for a particular feature in the Sysdig agent.

To set feature-level or component-level logging:

  1. Determine the agent feature or component for which you want to set the log level:

    To do so,

    1. Open the /opt/draios/logs/draios.log file.

    2. Copy the component name.

      The format of the log entry is:

      <timestamp>, <<pid>.<tid>>, <log level>, [feature]:<component>[pid]:[line]: <message>
      

      For example, the following snippet from a sample log file shows log messages from the promscrape feature and the sdjagent, mountedfs_reader, watchdog_runnable, protobuf_file_emitter, connection_manager, and dragent components.

      2020-09-07 17:56:01.173, 27979.28018, Information, sdjagent[27980]: Java classpath: /opt/draios/share/sdjagent.jar
      2020-09-07 17:56:01.173, 27979.28018, Information, mountedfs_reader: Starting mounted_fs_reader with pid 27984
      2020-09-07 17:56:01.174, 27979.28019, Information, watchdog_runnable:105: connection_manager starting
      2020-09-07 17:56:01.174, 27979.28019, Information, protobuf_file_emitter:64: Will save protobufs for all message types
      2020-09-07 17:56:01.174, 27979.28019, Information, connection_manager:282: Initiating connection to collector
      2020-09-07 17:56:01.175, 27979.27979, Information, dragent:1243: Created Sysdig inspector
      2020-09-07 18:52:40.065, 27979.27980, Debug,       promscrape:prom_emitter:72: Sent 927 Prometheus metrics of 7297 total
      2020-09-07 18:52:41.129, 27979.27981, Information, promscrape:prom_stats:45: Prometheus timeseries statistics, 5 endpoints
      
  2. To set feature-level logging:

    1. Open /opt/draios/etc/dragent.yaml.

    2. Edit the dragent.yaml file and add the desired feature:

      In this example, you are setting the global level to notice and promscrape feature level to info.

      log:
        file_priority: notice
        file_priority_by_component:
          - "promscrape: info"
      

      The log level specified for a feature overrides the global setting.

  3. To set component-level logging:

    1. Open /opt/draios/etc/dragent.yaml.

    2. Edit the dragent.yaml file and add the desired components:

      In this example, you are setting the global level to notice, the promscrape feature level to info, the sdjagent and mountedfs_reader component log levels to debug, the watchdog_runnable component log level to warning, and the promscrape:prom_emitter component log level to debug.

      log:
        file_priority: notice
        file_priority_by_component:
          - "promscrape: info"
          - "promscrape:prom_emitter: debug"
          - "watchdog_runnable: warning"
          - "sdjagent: debug"
          - "mountedfs_reader: debug" 
      

      The log level specified for a feature overrides the global setting. The log level specified for a component overrides the feature and global settings.

  4. Restart the agent.

    For example, if you have installed the agent as a service, then run:

    $ service dragent restart
    

1.2.8.3 - Manage Console Logging for Agent Components

Sysdig Agent provides the ability to set component-wise log levels that override the global console logging level controlled by the console_priority configuration option. The components represent internal software modules and can be found in /opt/draios/logs/draios.log.

By controlling logging at the fine-grained component level, you can avoid excessive logging from certain components in draios.log or enable extra logging from specific components for troubleshooting.

Components can also have optional feature-level logging, which provides a way to control the logging for a particular feature in the Sysdig agent.

Configure Logging

To set feature-level or component-level logging:

  1. Determine the agent component for which you want to set the log level:

    To do so,

    1. Look at the console output.

      If you're using an orchestrator like Kubernetes, the log viewer facility, such as the kubectl logs command, shows the console log output.

    2. Copy the component name.

      The format of the log entry is:

      <timestamp>, <<pid>.<tid>>, <log level>, [feature]:<component>[pid]:[line]: <message>
      

      For example, the following snippet from a sample log file shows log messages from the promscrape feature and the sdjagent, mountedfs_reader, watchdog_runnable, protobuf_file_emitter, connection_manager, and dragent components.

      2020-09-07 17:56:01.173, 27979.28018, Information, sdjagent[27980]: Java classpath: /opt/draios/share/sdjagent.jar
      2020-09-07 17:56:01.173, 27979.28018, Information, mountedfs_reader: Starting mounted_fs_reader with pid 27984
      2020-09-07 17:56:01.174, 27979.28019, Information, watchdog_runnable:105: connection_manager starting
      2020-09-07 17:56:01.174, 27979.28019, Information, protobuf_file_emitter:64: Will save protobufs for all message types
      2020-09-07 17:56:01.174, 27979.28019, Information, connection_manager:282: Initiating connection to collector
      2020-09-07 17:56:01.175, 27979.27979, Information, dragent:1243: Created Sysdig inspector
      2020-09-07 18:52:40.065, 27979.27980, Debug,       promscrape:prom_emitter:72: Sent 927 Prometheus metrics of 7297 total
      2020-09-07 18:52:41.129, 27979.27981, Information, promscrape:prom_stats:45: Prometheus timeseries statistics, 5 endpoints
      
  2. To set feature-level logging:

    1. Open /opt/draios/etc/dragent.yaml.

    2. Edit the dragent.yaml file and add the desired feature:

      In this example, you are setting the global level to notice and promscrape feature level to info.

      log:
        console_priority: notice
        console_priority_by_component:
          - "promscrape: info"
      

      The log level specified for a feature overrides the global setting.

  3. To set component-level logging:

    1. Open /opt/draios/etc/dragent.yaml.

    2. Edit the dragent.yaml file and add the desired components:

      In this example, you are setting the global level to notice, the promscrape feature level to info, the sdjagent and mountedfs_reader component log levels to debug, the watchdog_runnable component log level to warning, and the promscrape:prom_emitter component log level to debug.

      log:
        console_priority: notice
        console_priority_by_component:
          - "promscrape: info"
          - "promscrape:prom_emitter: debug"
          - "watchdog_runnable: warning"
          - "sdjagent: debug"
          - "mountedfs_reader: debug" 
      

      The log level specified for a feature overrides the global setting. The log level specified for a component overrides the feature and global settings.

  4. Restart the agent.

    For example, if you have installed the agent as a service, then run:

    $ service dragent restart
    

Agent Components

  • analyzer: The logs from this component provide information about events and metrics as they come into the system. These logs assist in basic troubleshooting of event flow.

  • connection_manager: This component logs details about the agent’s connection to the Sysdig backend. These logs help diagnose and troubleshoot connectivity issues.

  • security_mgr: These logs describe the security processing steps the agent is taking. Having these logs assists in understanding what the security side of the agent is doing.

  • infrastructure_state: This component interacts with the orchestration runtime to provide a view of the infrastructure. The logs from this component help troubleshoot orchestration issues and communication with the API server.

  • procfs_parser: The agent uses the procfs parser to gather information about the state of the system. These logs provide insight into the functioning of the agent.

  • dragent: These logs provide data about the core functionality of the agent.

  • process_emitter: This component is used to provide data regarding processes running on a host.

  • k8s_parser: The k8s_parser is used as part of the communication with the Kubernetes API server. These logs help debug communication issues.

  • netsec: These logs provide data about the functioning of the netsec component, which provides topology and endpoint security functionality.

  • protocol_handler: This component logs information about the protobufs the agent sends to the Sysdig backend.

  • k8s_deleg: Kubernetes uses the concept of delegated nodes to help reduce cluster load and manage distributed systems. These logs help with troubleshooting issues within the Kubernetes distributed environment.

  • promscrape: Promscrape allows the agent to send Prometheus data as custom metrics.

  • cm_socket: The cm_socket is the low-level networking code used by the connection_manager. These logs work together with the logs from the connection_manager to show the behavior of the network connection between the agent and the backend.

  • secure_audit: Audit is a feature of Sysdig Secure which provides information on system activity such as file and network behavior. These logs help understand the behavior of that feature.

  • memdumper: The memdumper is used to perform back-in-time captures, and logs from this component help troubleshoot any problems which might occur with back-in-time captures.

1.2.8.4 - Change the Agent Log Directory

The Sysdig agent generates log entries in /opt/draios/logs/draios.log. The agent will rotate the log file when it reaches 10MB in size, keeping the 10 most recent log files archived with a date-stamp appended to the filename.

You can change the default location as follows:

log:
  location: new_directory

By default, this location is rooted in the agent install path: /opt/draios/. Therefore, the new log location for the given example would be /opt/draios/new_directory.

You cannot write agent logs outside of the agent install path.

1.2.8.5 - Make Agent Logs Globally Readable

The Sysdig agent generates log entries in /opt/draios/logs/draios.log. By default, only accounts with superuser credentials can read the agent logs.

To allow all users to access and read the agent logs, use the following configuration:

log:
  globally_readable: true

This option can be combined with the facility to change the agent log location or used independently. For example:

log:
  location: new_directory
  globally_readable: true

Now, all users can read agent logs from /opt/draios/new_directory/draios.log.

1.2.8.6 - Control Disk Usage by Agent Logs

The Sysdig agent generates log entries in /opt/draios/logs/draios.log. It periodically performs rotation of its own logs.

You can use the following configuration to control the space taken up by agent logs:

  • max_size: Sets a limit on the size of a single agent log file, in megabytes. When the log file reaches this size, a new draios.log file is created and the old log is renamed with a timestamp appended. The default size is 10 megabytes.

  • rotate: Determines how many old log files are kept on disk. The default is 10 log files.

log:
  max_size: 10
  rotate: 10

For example, if the current log file reaches the 10-megabyte size limit and the number of archived log files has already reached the limit of 10, the oldest archive is removed; the current log file is then renamed with a timestamp and added to the list of old log files.

Increasing these values can provide more logs for troubleshooting at the expense of more space.
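As a rule of thumb, the maximum disk footprint is roughly max_size × (rotate + 1): one active file plus the archived files. For example, the following sketch caps agent logs at about 300 MB (one active file and five archives of up to 50 MB each):

log:
  max_size: 50
  rotate: 5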

1.2.9 - Agent Auto-Config

Introduction

If you want to maintain centralized control over the configuration of your Sysdig agents, one of the following approaches is typically ideal:

  1. Via an orchestration system, such as using Kubernetes or Mesos/Marathon.

  2. Using a configuration management system, such as Chef or Ansible.

However, if these approaches are not viable for your environment, or to further augment your Agent configurations via central control, Sysdig Monitor provides an Auto-Config option for agents. The feature allows you to upload fragments of YAML configuration to Sysdig Monitor that will be automatically pushed and applied to some/all of your Agents based on your requirements.

Enable Agent Auto-Config

Independent of the Auto-Config feature, typical Agent configuration lives in /opt/draios/etc and is derived from a combination of base config in the dragent.default.yaml file and any overrides that may be present in dragent.yaml. See also Understanding the Agent Config Files.

Agent Auto-Config adds a middle layer of possible overrides in an additional file, dragent.auto.yaml. When present, the order of config application from highest precedence to lowest becomes:

  1. dragent.yaml

  2. dragent.auto.yaml

  3. dragent.default.yaml

While all Agents are by default prepared to receive and make use of Auto-Config data, the file dragent.auto.yaml will not be present on an Agent until you’ve pushed central Auto-Config data to be applied to that Agent.

Auto-Config settings are performed via Sysdig Monitor’s REST API. Simplified examples are available that use the Python client library to get or set current Auto-Config settings. Detailed examples using the REST API are shown below.

The REST endpoint for Auto-Config is /api/agents/config. Use the GET method to review the current configuration. The following example shows the initial empty settings that result in no dragent.auto.yaml files being present on your Agents.

curl -X GET \
       --header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
       https://app.sysdigcloud.com/api/agents/config


Output:
{
    "files": []
}

Use the PUT method to centrally push YAML that will be distributed and applied to your Agents as dragent.auto.yaml files. The content parameter must contain syntactically correct YAML. The filter option specifies whether the config should be sent to one agent or to all of them, as in this example that globally enables debug logging on all Agents:

curl -X PUT \
       --header "Content-Type: application/json" \
       --header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
       https://app.sysdigcloud.com/api/agents/config -d '
{
  "files": [
    {
      "filter": "*",
      "content": "log:\n  console_priority: debug"
    }
  ]
}'

Alternatively, the filter can specify a hardware MAC address for a single Agent that should receive a certain YAML config. All MAC-specific configs should appear at the top of the JSON object and are not additive to any global Auto-Config specified with "filter": "*" at the bottom. For example, when the following config is applied, the one Agent that has the MySQL app check configured would not have debug logging enabled, but all others would.

curl -X PUT \
       --header "Content-Type: application/json" \
       --header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
       https://app.sysdigcloud.com/api/agents/config -d '
{
  "files": [
    {
      "filter": "host.mac = \"08:00:27:de:5b:b9\"",
      "content": "app_checks:\n  - name: mysql\n    pattern:\n      comm: mysqld\n    conf:\n      server: 127.0.0.1\n      user: sysdig-cloud\n      pass: sysdig-cloud-password"
    },
    {
      "filter": "*",
      "content": "log:\n  console_priority: debug"
    }
  ]
}'

To update the active central Auto-Config settings, simply PUT a complete replacement JSON object.

All connected Agents will receive centrally-pushed Auto-Config updates that apply to them based on the filter settings. Any Agent whose Auto-Config is enabled/disabled/changed based on the centrally-pushed settings will immediately restart, putting the new configuration into effect. Any central Auto-Config settings that would result in a particular Agent’s Auto-Config remaining the same will not trigger a restart.

Disable Agent Auto-Config

To clear all Agent Auto-Configs, use the PUT method to upload the original blank config setting of '{ "files": [] }'.
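Following the same pattern as the PUT examples above, the clearing request looks like this:

curl -X PUT \
       --header "Content-Type: application/json" \
       --header "Authorization: Bearer xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
       https://app.sysdigcloud.com/api/agents/config -d '
{
  "files": []
}'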

It is also possible to override active Auto-Config on an individual Agent. To do so, follow these steps for your Agent:

  1. Add the following config directly to the dragent.yaml file: auto_config: false.

  2. Delete the file /opt/draios/etc/dragent.auto.yaml.

  3. Restart the Agent.

For such an Agent to opt-in to Auto-Config again, remove auto_config: false from the dragent.yaml and restart the Agent.

Restrictions

To prevent the possibility of pushing Auto-Config that would damage an Agent’s ability to connect, the following keys will not be accepted in the centrally-pushed YAML.

  • auto_config

  • customerid

  • collector

  • collector_port

  • ssl

  • ssl_verify_certificate

  • ca_certificate

  • compression

1.2.10 - Using the Agent Console

Sysdig provides an Agent Console to interact with the Sysdig agent. This is a troubleshooting tool to help you view configuration files and investigate agent configuration problems quickly.

Access Agent Console

  1. From Explore, click the Groupings drop-down.

  2. Select Hosts & Containers or Nodes.

  3. Click the desired host to investigate the corresponding agent configuration.

  4. Click Options (three dots) in the upper-right corner of the Explore tab.

  5. Click Agent Console.

Agent Console Commands

View Help

The ? command displays the commands to manage Prometheus configuration and targets monitored by the Sysdig agent.

$ prometheus ?
$ prometheus config ?
$ prometheus config show ?

Command Syntax

The syntax of the Agent Console commands is as follows:

directory command
directory sub-directory command
directory sub-directory sub-sub-directory command

View Version

Run the following to find the version of the agent running in your environment:

$ version

An example output:

12.0.0

Troubleshoot Prometheus Metrics Collection

These commands help troubleshoot Prometheus targets configured in your environment.

For example, the following commands display and scrape the Prometheus endpoints respectively.

$ prometheus target show
$ prometheus target scrape

Sub-Directory Commands

The Promscrape CLI consists of the following sections.

  • config: Manages Sysdig agent-specific Prometheus configuration.

  • metadata: Manages metadata associated with the Prometheus targets monitored by the Sysdig agent.

  • stats: Helps view the global- and job-specific Prometheus statistics.

  • target: Manages Prometheus endpoints monitored by Sysdig agent.

Prometheus Commands

Show

The show command displays information about the given subsection. For example, the following displays the configuration of the Prometheus server.

$ prometheus config show

Configuration      Value
Enabled            True
Target discovery   Prometheus service discovery
Scraper            Promscrape v2
Ingest raw         True
Ingest calculated  True
Metric limit       2000

Scrape

The scrape command scrapes a Prometheus target and displays the information. The syntax is:

$ prometheus target scrape -url <URL>

For example:

$ prometheus target scrape -url http://99.99.99.3:10055/metrics

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.5018e-05
go_gc_duration_seconds{quantile="0.25"} 0.000118155
go_gc_duration_seconds{quantile="0.5"} 0.000141586
go_gc_duration_seconds{quantile="0.75"} 0.000171626
go_gc_duration_seconds{quantile="1"} 0.00945638
go_gc_duration_seconds_sum 0.114420898
go_gc_duration_seconds_count 607

View Agent Configuration

The Agent configuration commands have a different syntax.

Run the following to view the configuration of the agent running in your environment:

$ configuration show-default-yaml
$ configuration show-backend-yaml
# docker environments
$ configuration show-dragent-yaml
# Kubernetes environments
$ configuration show-configmap-yaml

The output displays the configuration file. Sensitive data, such as credentials, is obfuscated.

customerid: "********"
watchdog:
  max_memory_usage_mb: 2048

Security Considerations

  • User-sensitive configuration is obfuscated and not visible through the CLI.

  • All the information is read-only. You cannot currently change any configuration by using the Agent console.

  • The console runs completely inside the agent. It does not use bash or any other Linux shell, which prevents the risk of command injection.

  • The console runs only over a TLS connection with the Sysdig backend.

Disable Agent Console

Agent Console is turned on by default. To turn it off for a particular team:

  1. Navigate to Settings > Teams.

  2. Select the team that you want to disable Agent Console for.

  3. From Additional Permissions, deselect Agent CLI.

  4. Click Save.

To turn it off in your environment, edit the following in the dragent.yaml file:

command_line:
  enabled: false

1.3 - Agent Upgrade

The steps to upgrade an agent differ depending on whether the agent was originally installed as a container or as a service.

Follow the upgrade best practices for a smooth upgrade and to maximize the value of Sysdig applications:

  • Keep upgrades current

  • Upgrade progressively without skipping versions

  • Test upgrades in a non-mission-critical or staging environment before rolling out to production.

This section describes how to check the current version of the installed agents, and then how to upgrade them.

Agent Version Check

Kubernetes installation

If the agent is installed in a Kubernetes environment, run:

kubectl get pods -n sysdig-agent -o=jsonpath='{.items[0].spec.containers[:1].image}'

Container/Docker Installation

If the agent is installed as container, run:

docker exec sysdig-agent /opt/draios/bin/dragent --version

Service Installation

If the agent is installed as a service, run:

/opt/draios/bin/dragent --version

The agent version can also be found in the agent log file: /opt/draios/logs/draios.log.

Look for the “Agent starting” message, which is logged whenever the agent restarts.
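For example:

grep "Agent starting" /opt/draios/logs/draios.log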

Update Agent

Update the containerized agent version as you normally update any container; the basic steps are below.

Use the full run command as shown in the Settings > Agent Installation tab of your account.

Containerized Agent

To see which agent versions are available, see tags.

Kubernetes

Helm
  1. Update the chart:

    helm repo update
    
  2. Do one of the following:

  • If you have deployed the chart with a values.yaml file, modify or add (if it’s missing) the agent.image.tag field and run:
    helm upgrade --namespace sysdig-agent sysdig-agent -f values.yaml sysdig/sysdig-deploy
    
  • If you have deployed the chart by setting the values as CLI parameters, run:
    helm upgrade --namespace sysdig-agent --set agent.image.tag=<latest_version> --reuse-values sysdig-agent sysdig/sysdig-deploy
    
    Replace <latest_version> with the latest version number of Sysdig agent.

For more information on using Helm, see Helm Charts.

Manual

Check whether .yaml files must be updated:

Updating the agent image does not overwrite the daemonset.yaml and sysdig-agent-configmap.yaml on your local system. Check the Sysdig Agent Release Notes to see if you need to download the latest .yaml files from GitHub.

Perform update:

kubectl set image ds/sysdig-agent sysdig-agent=sysdig/agent:<tag>

Watch update status:

kubectl rollout status ds/sysdig-agent

Docker

Basic Steps: stop the agent, remove it, pull the new agent, and install it.

The exact Docker command can also be found in the Sysdig Settings > Agent Installation menu.

docker stop sysdig-agent
docker rm sysdig-agent
docker pull sysdig/agent
docker run . . .

Service Agent

For service (non-containerized) agent installations, updates are installed as part of the normal system upgrade available with apt-get or yum.

Debian, Ubuntu

apt-get update
apt-get -y install draios-agent

CentOS, RHEL, Fedora, Amazon AMI, Amazon Linux 2

yum clean expire-cache
yum -y install draios-agent

1.4 - Uninstall the Agent

This section describes uninstalling the Sysdig agent when it was installed as a service.

If the agent was installed as a container, remove it using standard container commands.

If the agent was installed by an orchestrator, such as Kubernetes, remove it by using the standard orchestrator commands.

Debian/Ubuntu Distributions

To uninstall the agent from Debian Linux distributions, including Ubuntu:

As the sudo user, run the following command in a terminal on each host:

sudo apt-get remove draios-agent

Fedora/CentOS/RHEL/Amazon AMI/ Amazon Linux 2 Distributions

To uninstall the agent from Fedora Linux distributions, including CentOS, Red Hat Enterprise Linux, as well as Amazon AMI and Amazon Linux 2:

As the sudo user, run the following command in a terminal on each host:

sudo yum erase draios-agent

1.5 - Troubleshooting Agent Installation

This section describes methods for troubleshooting two types of issue:

  • Disconnecting Agents

  • Can’t See Metrics After Agent Install

Disconnecting Agents

If agents are disconnecting, there could be problems with addresses that need to be resolved in the agent configuration files. See also Understanding the Agent Config Files.

Check for Duplicate MAC addresses

The Sysdig agent uses the eth0 MAC address to identify the different hosts within an infrastructure. In a virtualized environment, you should confirm that each VM's eth0 MAC address is unique.

If a unique address cannot be configured, you can supply an additional parameter in the Sysdig agent’s dragent.yaml configuration file: machine_id_prefix: prefix

The prefix text can be any string and will be prepended to the MAC address as reported in the Sysdig Monitor web interface’s Explore tables.

Example: (using ADDITIONAL_CONF rather than Kubernetes Configmap)

Here is an example Docker run command that sets the parameter via ADDITIONAL_CONF:

docker run --name sysdig-agent --privileged --net host --pid host \
  -e ACCESS_KEY=abc123-1234-abcd-4321-abc123def456 \
  -e TAGS=tag1:value1 \
  -e ADDITIONAL_CONF="machine_id_prefix: MyPrefix123-" \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro \
  sysdig/agent

The resulting /opt/draios/etc/dragent.yaml config file would look like this:

customerid: abc123-1234-abcd-4321-abc123def456
tags: tag1:value1
machine_id_prefix: MyPrefix123-

You will then see all of your hosts, provided that all the prefixes are unique. The prefix will be visible whenever the MAC address is displayed in any view.

See also: Agent Configuration.

Check for Conflicting MAC addresses in GKE environments

In Google Kubernetes Engine (GKE) environments, MAC addresses could be repeated across multiple hosts. This would cause some hosts running Sysdig agents not to appear in your web interface.

To address this, add a unique machine ID prefix to each config you use to deploy the agent to a given cluster (i.e. each sysdig-daemonset.yaml file).

Note: This example uses the (v1) ADDITIONAL_CONF, rather than (v2) Configmap method.

- name: ADDITIONAL_CONF
  value: "machine_id_prefix: mycluster1-prefix-"

Can’t See Metrics After Agent Install

If agents were successfully installed and you can log in to the Sysdig Monitor UI but no metrics are displayed in the Explore panel, first confirm that the agent license count has not been exceeded. Then check for any proxy, firewall, or host security policies preventing proper agent communication with the Sysdig Monitor backend infrastructure.

Check License Count

If network connectivity is good, the agent will connect to the backend but will be disconnected after a few seconds if the license count has been exceeded.

To check whether you are over-subscribed, go to Settings > Subscription.

See Subscription for details.

Check Network Policy

Agent Connection Port

Check your service provider VPC security groups to verify that network ACLs are set to allow the agent’s outbound traffic over TCP ports. See Sysdig Collector Ports for the supported TCP ports for each region.

Outbound IP Addresses

Due to the distributed nature of the Sysdig Monitor infrastructure, the agent must be open for outbound connections to collector.sysdigcloud.com on all outbound IP addresses.

Check Amazon’s public IP ranges file to see all the potential IP addresses the Sysdig agent can use to communicate with the Sysdig backend databases.

AWS Metadata Endpoint

AWS metadata is used for gathering information about the instance itself, such as instance id, public IP address, etc.

When running on an AWS instance, access to the following AWS metadata endpoint is also needed: 169.254.169.254
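For example, you can verify from the instance that the endpoint is reachable with a quick query (IMDSv1 shown; IMDSv2 additionally requires a session token):

curl http://169.254.169.254/latest/meta-data/instance-id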

Check Local Host Policy

The agent requires access to the following local system resources in order to gather metrics:

  • Read/Write access to /dev/sysdig devices.

  • Read access to all the files under /proc file system.

  • For container support, the Docker API endpoint /var/run/docker.sock

If any settings or firewall modifications are made, you may need to restart the agent service. In a shell on the affected instances, issue the following command:

sudo service dragent restart

Learn More

1.5.1 - Kernel Header Troubleshooting

In addition to the information on Agent Installation Requirements, this page describes how the agent uses kernel headers and tips on troubleshooting, if needed.

About Kernel Headers and the Kernel Module

The Sysdig agent requires a kernel module in order to install successfully on a host. This can be obtained in three ways:

  1. Agent compiles the module using kernel headers.

    If the hosts in your environment already have kernel header files pre-installed, no special action is needed. Or you can install the kernel headers manually; see below.

  2. Agent auto-downloads precompiled modules from Sysdig’s AWS storage location.

    If the headers are not already installed but the agent is able to auto-download, no special action is needed. If there is no internet connectivity, you can use method 3 (download from an internal URL).

  3. Agent downloads precompiled modules from an internal URL.

    Use the environment variable SYSDIG_PROBE_URL. See also Understanding the Agent Config Files. Contact Sysdig support for assistance.
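A minimal sketch of method 3, assuming the variable is passed to the module-loader container and that your precompiled probes are mirrored at an internal URL (the mirror address is illustrative; exact flags and mounts may vary with your environment):

docker run -it --rm --privileged \
  -v /usr:/host/usr:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro \
  -e SYSDIG_PROBE_URL=https://probes.mirror.internal.example.com \
  sysdig/agent-kmodule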

Agent Installation on SELinux-Enabled Linux Distributions

On Fedora 35 or similar SELinux-enabled distributions with default restrictive policies, the agent init container, agent-kmodule, will fail to install the downloaded kernel module, raising an error similar to the following:

insmod: ERROR: could not insert module /root/.sysdig/sysdigcloud-probe-12.3.1-x86_64-5.16.11-200.fc35.x86_64-67098c7fdcc97105d4b9fd0bb2341888.ko: Permission denied

In such cases, we recommend that you use the eBPF option when running agent-kmodule instead.
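For example, a hedged sketch of loading the eBPF probe instead of the kernel module (setting SYSDIG_BPF_PROBE to an empty value selects the eBPF driver; exact mounts may vary with your environment):

docker run -it --rm --privileged \
  -v /usr:/host/usr:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro \
  -v /sys/kernel/debug:/sys/kernel/debug \
  -e SYSDIG_BPF_PROBE="" \
  sysdig/agent-kmodule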

TracePoints

All kernels released by supported distributions have this support, but if you are building a custom kernel, it must enable the following options (you can verify this as shown below):

  • CONFIG_TRACEPOINTS
  • CONFIG_HAVE_SYSCALL_TRACEPOINTS
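For example, on most distributions you can check the running kernel's build configuration as follows:

grep -E "CONFIG_TRACEPOINTS|CONFIG_HAVE_SYSCALL_TRACEPOINTS" /boot/config-$(uname -r)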

To Install Kernel Headers

In some cases, the host(s) in your environment may run kernel versions that do not match the provided headers, and the agent may fail to install correctly. In those cases, you must install the kernel headers manually.

Debian-Style

For Debian-style distributions, run the command:

apt-get -y install linux-headers-$(uname -r)

RHEL-Style

For RHEL-style distributions, run the command:

yum -y install kernel-devel-$(uname -r)

RancherOS

For RancherOS distributions, the kernel headers are available in the form of a system service and therefore are enabled using the ros service command:

sudo ros service enable kernel-headers-system-docker
sudo ros service up -d kernel-headers-system-docker

NOTE: Some cloud hosting service providers supply pre-configured Linux instances with customized kernels. You may need to contact your provider’s support desk for instructions on obtaining appropriate header files, or for installing the distribution’s default kernel.

To Correct Kernel Header Errors in AWS AMI

During an agent installation in an Amazon machine image (AMI) you may encounter the following errors while the installer is trying to compile the Sysdig kernel module:

Errors

  • “Unable to find kernel development files for the current kernel version” or

  • “FATAL: Module sysdigcloud-probe not found”

This indicates your machine is running a kernel in an older AMI for which the kernel headers are no longer available in the configured repositories. The issue has to do with Amazon purging packages in the yum repository when new Amazon Linux machine images are released.

The solution is either to update your kernel to a version for which header files are readily available (recommended), or perform a one-time installation of the kernel headers for your older AMI.

Option 1: Upgrade Your Host’s Kernel

First install a new kernel and reboot your instance:

sudo yum -y install kernel
sudo reboot

After rebooting, check to see if the host is reporting metrics to your Sysdig account. If not, you may need to issue three more commands to install the required header files:

sudo yum -y install kernel-devel-$(uname -r)
sudo /usr/lib/dkms/dkms_autoinstaller start
sudo service dragent restart

Option 2: Install Older Kernel Headers

Although it is recommended to upgrade to the latest kernel for security and performance reasons, you can alternatively install the older headers for your AMI.

Find the AMI version string and install the appropriate headers with the commands:

releasever=$(cat /etc/os-release | grep 'VERSION_ID' | grep -Eo "[0-9]{4}\.[0-9]{2}")
sudo yum -y --releasever=${releasever} install kernel-devel-$(uname -r)

Issue the remaining commands to allow the Sysdig agent to start successfully:

sudo /usr/lib/dkms/dkms_autoinstaller start
sudo service dragent restart

Reference: Find Your AWS Instance Image Version

The file /etc/image-id shows information about the original machine image with which your instance was set up:

[ec2-user ~]$ cat /etc/image-id
image_name="amzn-ami-hvm"
image_version="2017.03"
image_arch="x86_64"
image_file="amzn-ami-hvm-2017.03.0.20170401-x86_64.ext4.gpt"
image_stamp="26a3-ed31"
image_date="20170402053945"
recipe_name="amzn ami"
recipe_id="47cfa924-413c-d460-f4f2-2af7-feb6-9e37-7c9f1d2b"

This file will not change as you install updates from the yum repository.

The file /etc/system-release tells you which version of the AWS image is currently installed:

[ec2-user ~]$ cat /etc/system-release
Amazon Linux AMI release 2017.03

1.5.2 - Tuning Sysdig Agent

The resource requirements for the Sysdig agent depend on the size and load of the host. Increased activity equates to higher resource requirements.

You might see 5 to 20 KiB/s of bandwidth consumed. Different variables can increase the throughput required. For example:

  • The number of metrics

  • The number of events

  • Kubernetes objects

  • Products and features enabled

When a Sysdig Capture is being collected, you can expect to see a spike in the bandwidth while the capture file is being ingested.

To ensure that data reaches the Sysdig collection service, Sysdig recommends against placing bandwidth shaping or caps on the agent.

In general, in larger clusters, the agent requires more memory, and on servers with a high number of cores, the agent requires more CPU to monitor all the system calls. Use the number of CPU cores on the host and the number of Kubernetes nodes visible to the agent as proxies for the rate of events processed by the agent.

Considering all of these factors, we recommend the following size categories:

Small: CPU core count <= 8; Kubernetes nodes <= 10

Medium: 8 < CPU core count <= 32; 10 < Kubernetes nodes <= 100

Large: CPU core count > 32; Kubernetes nodes > 100

While you can expect the behavior with these values to be better than simply using the defaults, Sysdig cannot guarantee that the resource allocation will be correct in all cases.

Cluster Size                 Small      Medium     Large
Kubernetes CPU Request       1          3          5
Kubernetes CPU Limit         1          3          5
Kubernetes Memory Request    1024 MB    3072 MB    6144 MB
Kubernetes Memory Limit      1024 MB    3072 MB    6144 MB
Dragent Memory Watchdog      512 MB     1024 MB    2048 MB
Cointerface Memory Watchdog  512 MB     2048 MB    4096 MB
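For example, a sketch of the Medium settings expressed as standard Kubernetes resource fields on the agent container (assuming you edit the agent DaemonSet directly; for Helm deployments there is typically an equivalent under the chart's agent values):

resources:
  requests:
    cpu: 3
    memory: 3072Mi
  limits:
    cpu: 3
    memory: 3072Mi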

Note that the agent has its own memory watchdog to prevent runaway memory consumption on the host in case of memory leaks. The default values of the watchdog are shown in the following agent configuration:

watchdog:
  max_memory_usage_mb: 1024
  max_memory_usage_subprocesses:
    sdchecks: 128
    sdjagent: 256
    mountedfs_reader: 32
    statsite_forwarder: 32
    cointerface: 512
    promscrape: 640

The recommended value for promscrape depends on the amount of timeseries and label data that needs to be scraped on a particular node. The cluster size does not affect promscrape memory usage.

max_memory_usage_mb corresponds to the dragent process in the agent. All the values are given in MiB.

For example, use the following agent configuration to match the watchdog settings to the Large cluster size:

watchdog:
  max_memory_usage_mb: 2048
  max_memory_usage_subprocesses:
    sdchecks: 128
    sdjagent: 256
    mountedfs_reader: 32
    statsite_forwarder: 32
    cointerface: 4096
    promscrape: 640
    

2 - Serverless Agents

Used for container-based cloud environments such as Fargate

Overview

The serverless environment: As cloud platforms have evolved, both convenience and abstraction levels have increased simultaneously, and new agent models are required.

For example, with Amazon’s ECS and EKS, users remain in charge of managing the underlying virtual host machines. In environments like Fargate, however, the hosts are implicitly allocated by the cloud provider and users simply run their containers without allocating, configuring, or having any knowledge of the underlying compute infrastructure.

While this “container as a service” model is convenient, it can introduce risk, as many users leave the containers unattended and don’t monitor for security events inside them that can exfiltrate secrets, compromise business data, and increase their AWS/cloud provider costs. In addition, it is not possible to install a standard agent in an environment where you do not have access to a host.

For these reasons, Sysdig has introduced a new “serverless agent” that can be deployed in such container-based cloud environments.

Available Platforms

2.1 - AWS Fargate Serverless Agents

Introduction

Check the Overview for an explanation of when and why to use serverless agents in “container-as-a-service” cloud environments.

Architecture

The Sysdig serverless agent provides runtime detection through policy enforcement with Falco. At this time, the serverless agent is available for AWS Fargate on ECS. It consists of an orchestrator agent and (potentially multiple) workload agents.

  • The Sysdig serverless orchestrator agent is a collection point installed on each VPC to collect data from the serverless workload agent(s) and to forward them to the Sysdig backend. It also syncs the Falco runtime policies and rules to the workload agent(s) from the Sysdig backend.

  • The Sysdig serverless workload agent is installed in each task and requires network access to communicate with the orchestrator agent.

    Note that the workload agent is designed to secure your workload. However, at deployment, the default settings prioritize availability over security, allowing your workload to start even if policies are not in place. If you prefer to prioritize security over availability, you can change these settings.

Prerequisites

Before starting the installation, ensure that you have the following:

On AWS Side
  • A custom Terraform/CloudFormation template containing the Fargate task definitions that you want to instrument through the Sysdig Serverless Agent
  • Two VPC subnets in different availability zones that can connect with the internet via a NAT gateway or an internet gateway
On Sysdig Side

Known Limitations

Sysdig instruments a target workload by patching its task definition to run the original workload under Sysdig instrumentation. To patch the original task definition, Sysdig instrumentation pulls and analyzes the workload image to get the original entry point and command, along with other information.

Pulling Workload Images from Public vs. Private Registries

If you retrieve your workload image from a private registry, you must explicitly define the entry point and the command in the container definition. If you don’t specify them, the Sysdig instrumentation might not be able to collect such information, and the instrumentation might fail.

If you pull the workload image from a public registry, no additional operations are required.

Referencing a Parameterized Image in a CloudFormation Workload Template

When instrumenting a workload via CloudFormation, you must define the Image inline in your TaskDefinition.

Using a parameterized image in the TaskDefinition instead might prevent the Sysdig instrumentation from retrieving the workload image configuration. That could lead to incorrect workload instrumentation.

The example below shows a valid TaskDefinition.

Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: !Ref MyContainerName
          Image: "MyContainerImage"  # Inline Image
        ...

Install Options

The Sysdig serverless agent can be deployed automatically via Terraform or CloudFormation. Alternatively, you can use the manual process to complete the instrumentation tasks.

Sysdig recommends using the Terraform deployment method to instrument your Fargate workloads.

Terraform

This option presumes you use Terraform to deploy your workload.

You can deploy the orchestrator agent and install the workload agent by using an automated process which will instrument all your task definitions.

For details, see the following Sysdig Terraform registries:

Deployment Steps

Install the Orchestrator Agent
  1. Set up the AWS Terraform provider:

      provider "aws" {
        region = var.region
      }
    
  2. Configure the Sysdig orchestrator module and deploy it:

    module "fargate-orchestrator-agent" {
      source  = "sysdiglabs/fargate-orchestrator-agent/aws"
      version = "0.1.1"
    
      vpc_id           = var.vpc_id
      subnets          = [var.subnet_a_id, var.subnet_b_id]
    
      access_key       = var.access_key
    
      collector_host   = var.collector_host
      collector_port   = var.collector_port
    
      name             = "sysdig-orchestrator"
      agent_image      = "quay.io/sysdig/orchestrator-agent:latest"
    
      # True if the VPC uses an InternetGateway, false otherwise
      assign_public_ip = true
    
      tags = {
        description    = "Sysdig Serverless Agent Orchestrator"
      }
    }
    

    Call this module for each VPC that needs instrumentation.

Install the Instrumented Workload
  1. Set up the Sysdig Terraform provider:

    terraform {
      required_providers {
        sysdig = {
          source = "sysdiglabs/sysdig"
          version = ">= 0.5.39"
        }
      }
    }
    
    provider "sysdig" {
      sysdig_secure_api_token = var.secure_api_token
    }
    
  2. Pass the orchestrator host, port, and container definitions of your workload to the sysdig_fargate_workload_agent data source:

    data "sysdig_fargate_workload_agent" "instrumented" {
      container_definitions = jsonencode([...])
    
      sysdig_access_key     = var.access_key
    
      workload_agent_image  = "quay.io/sysdig/workload-agent:latest"
    
      orchestrator_host     = module.sysdig_orchestrator_agent.orchestrator_host
      orchestrator_port     = module.sysdig_orchestrator_agent.orchestrator_port
    }
    

    Note: The input container definitions must be in JSON format.

  3. Include the instrumented JSON in your Fargate task definition and deploy your instrumented workload:

    resource "aws_ecs_task_definition" "fargate_task" {
      ...
    
      network_mode             = "awsvpc"
      requires_compatibilities = ["FARGATE"]
    
      container_definitions    = "${data.sysdig_fargate_workload_agent.instrumented.output_container_definitions}"
    }
    

    The Sysdig instrumentation will go over the original task definition to instrument it. The process includes replacing the original entry point and command of the containers.

    For images pulled from private registries, explicitly provide the Entrypoint and Command in the related container definition; otherwise, the instrumentation will not be completed.

(Latest) CloudFormation

This option presumes you use a CFT to deploy your workload.

As of Serverless Agent 3.0, a YAML provided by Sysdig helps you deploy the orchestrator agent and the instrumentation service in a single step. Then, you will install the workload agent using an automated process which will instrument all the Fargate task definitions in your CFT.

  • For the Orchestrator Agent and Instrumentation Service, Sysdig provides the serverless-instrumentation.yaml to use as a CloudFormation Template, which you can deploy through the AWS console. You need one Orchestrator deployment per VPC that your organization wants to secure.

  • For the Workload Agents, you need one Workload Agent per Fargate task definition. For example, if you have ten services and ten task definitions, each needs to be instrumented.

Deployment Steps

Deploy the Sysdig Instrumentation and Orchestration Stack

Deploy the serverless-instrumentation.yaml for each desired VPC using CloudFormation:

  1. Log in to the AWS Console, select CloudFormation, Create a stack with new resources, and specify the serverless-instrumentation.yaml as the Template source.

  2. Specify the stack details to deploy the Orchestrator Agent on the same VPC where your service is running. For standard deployments, provide the required parameters.

  3. Click Next, complete the stack creation, and wait for the deployment to complete (usually a few minutes).

Deploy the Workload Agent
Edit Your CFT

Once the Sysdig Instrumentation and Orchestration stack is deployed, the Outputs tab provides the transformation instruction. Note that the value depends on what you set during the deployment of the Sysdig Instrumentation and Orchestration stack.

Copy and paste the value of the transformation instruction to the root level of your application's CFT. For example:

Transform: ["SysdigMacro"]

The Sysdig instrumentation will go over the original task definition to instrument it. The instrumentation process includes replacing the original entry point and command of the container. If you are using an image from a public registry, it can determine these values from the image. If you are using an image from a private registry then you must explicitly provide the Entrypoint and Command in the related container definition; otherwise, the instrumentation will not be completed.

Deploy Your CFT

All the new deployments of your CFT will be instrumented.

When instrumentation is complete, Fargate events should be visible in the Sysdig Secure Events feed.

Upgrade from a Prior Version

Up through version 2.3.0, the installation process deployed two stacks, as described in (Legacy) CloudFormation:

  • Orchestration stack, deployed via YAML
  • Instrumentation stack, deployed via command-line installer.

From version 3.0.0 onward, you deploy only the "Instrumentation & Orchestration" stack, using the (Latest) CloudFormation installation option.

To upgrade to version 3.0.0:

  1. Deploy the new Instrumentation and Orchestration stack and the Workload Agents, as described in (Latest) CloudFormation. When deploying the Instrumentation and Orchestration stack, assign a unique name to your macro, for example, Transform: MyV2SysdigMacro. You now have two versions of the serverless agent components. When you are ready to switch from the earlier version, proceed with the next step.

  2. Stop all the running tasks and use CloudFormation to delete the earlier stacks.

  3. Clean up the earlier macro using the installer: ./installer-linux-amd64 cfn-macro delete MyEarlierSysdigMacro

  4. Redeploy the workload stack with the updated CFT (Transform: MyV2SysdigMacro).

(Legacy) CloudFormation

Note: This option has been deprecated.

This option presumes you use a CFT to deploy your workload.

Up to Serverless Agent 2.3.0, a YAML and an installer provided by Sysdig let you deploy the Orchestrator Agent and the instrumentation service, respectively. Then, you install the Workload Agent using an automated process which will instrument all the Fargate task definitions in your CFT.

The following components of the serverless agent are installed separately.

  • For the Orchestrator Agent, Sysdig provides the orchestrator-agent.yaml to use as a CloudFormation Template which you can deploy through the AWS Console. You need one orchestrator deployment per VPC in your environment that your organization wants to secure.

  • For the instrumentation service, Sysdig provides an installer that deploys the service, which will automatically instrument your task definitions.

  • For the Workload Agents, you need one Workload Agent per Fargate task definition. For example, if you have ten services and ten task definitions, each needs to be instrumented.

Additional Prerequisites

In addition to the prerequisites defined above, you will need the following on the AWS side:

  • AWS CLI configured and permissions to create and use an S3 bucket.

Deployment Steps

Install the Orchestrator Agent
  1. Get the CFT orchestrator-agent.yaml to deploy the orchestrator agent.

  2. Deploy the orchestrator agent for each desired VPC, using CloudFormation. The steps below are an outline of the important Sysdig-related parts.

    1. Log in to the AWS Console, select CloudFormation, Create Stack with new resources, and specify the orchestrator-agent.yaml as the Template source.

    2. Specify the stack details to deploy the Orchestrator Agent on the same VPC where your service is running.

      Stack name: self-defined

      Sysdig Settings

      • Sysdig Access Key: Use the agent key of your Sysdig platform.

      • Sysdig Collector Host: collector.sysdigcloud.com (default); region-dependent in Sysdig SaaS; custom in Sysdig on-prem.

      • Sysdig Collector Port: 6443 (default), or could be custom for on-prem installations.

      Network Settings

      • VPC Id: Choose your VPC.

      • VPC Gateway: Choose the type of Gateway in your VPC to balance the load of the orchestrator service.

      • Subnet A & B: These depend on the VPC you choose; select from the drop-down menu.

      Advanced Settings

      • Sysdig Agent Tags: Enter a comma-separated list of tags (e.g., role:webserver,location:europe). Note: tags will also be created automatically from your infrastructure's metadata, including AWS, Docker, etc.

      • Sysdig Orchestrator Agent Image:

        quay.io/sysdig/orchestrator-agent:latest (default)

      • Check Collector SSL Certificate: Default: true. Setting it to false disables validation of the SSL certificate received from the collector; use this for development purposes only.

      • Sysdig Orchestrator Agent Port: 6667 (default). Port where the orchestrator service will be listening for instrumented task connections.

    3. Click Next to start the deployment, and wait for the deployment to complete (usually a few minutes).

    4. From the Outputs tab, take note of the OrchestratorHost and OrchestratorPort values.

Install the Workload Agents

Install the Instrumentation Service
  1. Prerequisite: Have the orchestrator agent deployed in the appropriate VPC and have the Orchestrator Host and Port information handy.

  2. Download the appropriate installer for your OS.

    These set up Kilt, an open-source library mechanism for injection into Fargate containers.

  3. Create a macro for the serverless worker agents, using the installer. Any service tagged with this macro will have the serverless worker agent(s) added and Fargate data will be collected.

    1. Log in to the AWS CLI.

    2. Create a CFN macro that applies instrumentation. You will need the outputs from the previous task. Example:

      ./installer-linux-amd64 cfn-macro install -r us-east-1 MySysdigMacro $OrchestratorHost $OrchestratorPort
      
Edit Your CFT

Once the instrumentation service is deployed, you can use the transformation instruction to instrument your workloads. Copy and paste the transformation instruction at the root level of your CFT. All new deployments of that template will be instrumented.

The Sysdig instrumentation will go over the original task definition to instrument it. The instrumentation process includes replacing the original entry point and command of the containers.

For images pulled from private registries, explicitly provide the Entrypoint and Command in the related container definition; otherwise, the instrumentation will not be completed.

Deploy Your CFT

All new deployments of your CFT will be instrumented.

When instrumentation is complete, Fargate events should be visible in the Sysdig Secure Events feed.

Upgrade from a Prior Version

In most cases, it is advised to upgrade directly to 3.0.0+, as described in (Latest) CloudFormation. These instructions are kept for special cases.

To upgrade to version 2.3.0, follow the instructions in (Legacy) CloudFormation:

  1. Install the Orchestrator Agent; note that the OrchestratorHost and OrchestratorPort values will be unique.

  2. Install the instrumentation service; note that you have to assign a unique name to your macro, for example, Transform: MyV2SysdigMacro. At this point you have two versions of the serverless agent components. When you are ready to switch from the earlier version, proceed with the next step.

  3. Stop all running tasks and use CloudFormation to delete the earlier stack. Redeploy the new stack with the updated CFT.

  4. Clean up the earlier macro using the installer: ./installer-linux-amd64 cfn-macro delete MyEarlierSysdigMacro

  5. Redeploy the workload stack with the updated CFT (Transform: MyV2SysdigMacro).

Manual Instrumentation

In some cases, you may prefer not to use the Terraform or CloudFormation installers and instead use one of the following approaches:

  • Manually Instrument a Task Definition
  • Instrument a Container Image (rare).

Manually Instrument a Task Definition

Install the orchestrator agent via Terraform or CloudFormation, as described above.

Take note of the OrchestratorHost and OrchestratorPort values. These parameters will be passed to the workload via the environment variables SYSDIG_ORCHESTRATOR and SYSDIG_ORCHESTRATOR_PORT, respectively.

Then, instrument the task definition to deploy the workload agent manually as follows:

  1. Add a new container to your existing task definition:

    • Use SysdigInstrumentation as the name for the container.

    • Obtain the image from quay.io/sysdig/workload-agent:latest.

    • The entrypoint and command can be left empty.

  2. Edit the containers you want to instrument.

    • Mount volumes from SysdigInstrumentation.

    • Add the SYS_PTRACE capability to your container. See AWS Documentation for detail if needed.

    • Prepend /opt/draios/bin/instrument to the entrypoint of your container. So, if your original entrypoint is ["my", "original", "entrypoint"], it becomes ["/opt/draios/bin/instrument", "my", "original", "entrypoint"].

Terraform Example

Task definition before the manual instrumentation:

resource "aws_ecs_task_definition" "test_app" {
  execution_role_arn       = aws_iam_role.test_app_execution_role.arn
  task_role_arn            = aws_iam_role.test_app_task_role.arn

  cpu                      = "2048"
  memory                   = "8192"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]

  tags                     = {
    application = "TestApp"
  }

  container_definitions    = jsonencode([
    {
      "name" : "app",
      "image" : "my_image",
      "entrypoint" : ["/bin/ping", "google.com"],
    }
  ])
}

Task definition after the manual instrumentation:

resource "aws_ecs_task_definition" "test_app" {
  execution_role_arn       = aws_iam_role.test_app_execution_role.arn
  task_role_arn            = aws_iam_role.test_app_task_role.arn

  cpu                      = "2048"
  memory                   = "8192"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]

  tags                     = {
    application = "TestApp"
  }

  container_definitions    = jsonencode([
    {
      "name" : "app",
      "image" : "my_image",
      "entrypoint" : ["/opt/draios/bin/instrument", "/bin/ping", "google.com"],
      "linuxParameters": {
        "capabilities": {
          "add": [
            "SYS_PTRACE"
          ],
        },
      },
      "environment": [
        {
          "name": "SYSDIG_ORCHESTRATOR",
          "value": "<host orchestrator output, region dependent>"
        },
        {
          "name": "SYSDIG_ORCHESTRATOR_PORT",
          "value": "6667"
        },
        {
          "name": "SYSDIG_ACCESS_KEY",
          "value": ""
        },
        {
          "name": "SYSDIG_COLLECTOR",
          "value": ""
        },
        {
          "name": "SYSDIG_COLLECTOR_PORT",
          "value": ""
        },
        {
          "name": "SYSDIG_LOGGING",
          "value": ""
        },
      ],
      "volumesFrom": [
        {
          "sourceContainer": "SysdigInstrumentation",
          "readOnly": true
        }
      ],
    },
    {
      "name" : "SysdigInstrumentation",
      "image" : "quay.io/sysdig/workload-agent:latest",
    }
  ])
}

CloudFormation Example

Task definition before the manual instrumentation:

TestApp:
  Type: AWS::ECS::TaskDefinition
  Properties:
    NetworkMode: awsvpc
    RequiresCompatibilities:
      - FARGATE
    Cpu: 2048
    Memory: 8GB
    ExecutionRoleArn: !Ref TestAppExecutionRole
    TaskRoleArn: !Ref TestAppTaskRole
    ContainerDefinitions:
      - Name: App
        Image: !Ref TestAppImage
        EntryPoint:
        - /bin/ping
        - google.com
    Tags:
      - Key: application
        Value: TestApp

Task definition after the manual instrumentation:

TestApp:
  Type: AWS::ECS::TaskDefinition
  Properties:
    NetworkMode: awsvpc
    RequiresCompatibilities:
      - FARGATE
    Cpu: 2048
    Memory: 8GB
    ExecutionRoleArn: !Ref TestAppExecutionRole
    TaskRoleArn: !Ref TestAppTaskRole
    ContainerDefinitions:
      - Name: App
        Image: !Ref TestAppImage
        EntryPoint:
##### BEGIN patch entrypoint for manual instrumentation
        - /opt/draios/bin/instrument
##### END patch entrypoint for manual instrumentation
        - /bin/ping
        - google.com
##### BEGIN add properties for manual instrumentation
        LinuxParameters:
          Capabilities:
            Add:
            - SYS_PTRACE
        VolumesFrom:
        - SourceContainer: SysdigInstrumentation
          ReadOnly: true
        Environment:
        - Name: SYSDIG_ORCHESTRATOR
          Value: "<host orchestrator output, region dependent>"
        - Name: SYSDIG_ORCHESTRATOR_PORT
          Value: "6667"
        - Name: SYSDIG_ACCESS_KEY
          Value: ""
        - Name: SYSDIG_COLLECTOR
          Value: ""
        - Name: SYSDIG_COLLECTOR_PORT
          Value: ""
        - Name: SYSDIG_LOGGING
          Value: ""
      - Name: SysdigInstrumentation
        Image: !Ref WorkloadAgentImage
##### END add properties for manual instrumentation
    Tags:
      - Key: application
        Value: TestApp

Instrument a Container Image

Alternatively, you can include the Workload Agent in your container at build time. To do so, update your Dockerfile to copy the required files:

ARG sysdig_agent_version=latest
FROM quay.io/sysdig/workload-agent:$sysdig_agent_version AS workload-agent

FROM my_original_base

COPY --from=workload-agent /opt/draios /opt/draios

Prepend the /opt/draios/bin/instrument command to the entrypoint of your container:

ENTRYPOINT ["/opt/draios/bin/instrument", "my", "original", "entrypoint"]
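
Then build the image as usual; for example (the image tag and build-arg value are illustrative):

docker build --build-arg sysdig_agent_version=latest -t my-app:instrumented .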

Advanced Configurations

Configuring Workload Starting Policy

To customize the Sysdig instrumentation workload starting policy see Configure Workload Starting Policy.

Configuring Instrumentation Logging

To customize the Sysdig instrumentation logging see Manage Serverless Agent Logs.

Enable Proxy

To configure the Orchestrator or Workload agents to use a proxy, see Enable HTTP proxy for agents.

The configuration can be provided through the environment variables that follow:

  • ADDITIONAL_CONF for the Orchestrator Agent.
  • SYSDIG_EXTRA_CONF for the Workload Agent.

Both environment variables expect valid YAML or JSON.
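
For illustration, a hypothetical proxy configuration passed as JSON; the http_proxy key structure shown here is an assumption, so verify the exact schema in Enable HTTP proxy for agents:

# Hypothetical keys and values; check "Enable HTTP proxy for agents" for the exact schema.
export ADDITIONAL_CONF='{"http_proxy": {"proxy_host": "proxy.internal.example.com", "proxy_port": 3128}}'
export SYSDIG_EXTRA_CONF='{"http_proxy": {"proxy_host": "proxy.internal.example.com", "proxy_port": 3128}}'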

2.1.1 - Configure Workload Starting Policy

Configure Workload Starting Policy

As of serverless agent version 3.0.2, the instrumentation starts the workload even if policies are not in place. This avoids workload starvation in case of issues such as agent misconfiguration or network problems.

It is possible to customize the workload starting policy via the following environment variables:

  • agentino.run_without_policies, true by default, defines whether the Sysdig instrumentation should continue running with no policies in place. true allows the workload to run unsecured; false prevents the workload from running at all without policies.

  • agentino.delay_startup_until_policies_timeout_s, 0 (zero) by default, defines the amount of time in seconds the Sysdig instrumentation waits for policies before starting up the workload. Note that the time the workload agent needs to acquire policies depends on a number of factors, such as configuration, network latency, and load. A conservative value might be 60 seconds.

You can provide such configuration options to the Workload Agent via the SYSDIG_EXTRA_CONF environment variable. Note that SYSDIG_EXTRA_CONF expects either a valid YAML or JSON.

For example, the following configuration delays the workload startup for 60 seconds to let the Sysdig instrumentation acquire the policies. Moreover, it enables the workload to start after waiting even if no policies are in place.

SYSDIG_EXTRA_CONF='{"agentino": {"delay_startup_until_policies_timeout_s": 60}}'

As another example, the following configuration delays the workload startup for 60 seconds to let the Sysdig instrumentation acquire the policies, but prevents the workload from starting after waiting if policies are still not in place.

SYSDIG_EXTRA_CONF='{"agentino": {"run_without_policies": false, "delay_startup_until_policies_timeout_s": 60}}'

2.1.2 - Manage Serverless Agent Logs

As of serverless agent version 2.2.0, task logs and instrumentation logs are handled separately.

  • Task logs continue to use whatever log setup you have on your task container.

  • Instrumentation logs go to a separate log group created by the serverless instrumentation installer:

    <stack_name>-SysdigServerlessLogGroup-<uid>.

    At this time, the log group name cannot be edited. By default, the logging level is set to info.

Logging Environment Variables

Set Logging Level

  • SYSDIG_LOGGING is used for instrumentation log level. Default = info.

    The full list of options is: silent | fatal | critical | error | warning | info | debug | trace

Log Forwarding

The instrumented workload will forward instrumentation logs to the instrumentation agent’s CloudWatch log group.

If for any reason you want to observe instrumentation logs in the log group of the workload, you can set SYSDIG_ENABLE_LOG_FORWARD=false. To do so, go to the ECS task definition and create a new revision with a modified workload container definition that includes SYSDIG_ENABLE_LOG_FORWARD=false. After creating this revision, update your task in the cluster to deploy the new revision. This will launch a workload container that does NOT forward its logs.

When SYSDIG_ENABLE_LOG_FORWARD is set to true, the workload forwards its logs to the agent’s CloudWatch log group by default. If you want to capture those logs in a different system for later analysis, you can have your workload container forward its logs to an endpoint you set; to do so, provide SYSDIG_LOG_FORWARD_ADDR and SYSDIG_LOG_LISTEN_PORT.

  • SYSDIG_ENABLE_LOG_FORWARD Default = true
  • SYSDIG_LOG_LISTEN_PORT Default = 32000
  • SYSDIG_LOG_FORWARD_ADDR Default = localhost:32000. Where the logwriter of the workload agent listens. Can be set to any other TCP endpoint.
  • SYSDIG_LOG_FORWARD_FORMAT Default = json. It defines the format of the forwarded instrumentation logs. The supported options are: text | json.
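
For example, to forward instrumentation logs to a custom TCP endpoint, the workload container's environment could include entries like the following (the endpoint value is illustrative):

"environment": [
  { "name": "SYSDIG_ENABLE_LOG_FORWARD", "value": "true" },
  { "name": "SYSDIG_LOG_FORWARD_ADDR", "value": "log-collector.internal:32000" },
  { "name": "SYSDIG_LOG_FORWARD_FORMAT", "value": "json" }
]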

Advanced Configurations

  • SYSDIG_BUFFER_SIZE: Receiving buffer size in bytes for the TCP listener, which can be increased. Default 1024.
  • SYSDIG_DEADLINE_SECONDS: Connection deadline, which can be increased if clients are taking longer to connect. Default 3.

See other serverless agent environment variables here.

3 - Prometheus Remote Write

You can collect Prometheus metrics from environments where the Sysdig agent is not available. Sysdig uses the remote_write capabilities to help you do so.

In Sysdig terminology, the remote endpoints that can receive Prometheus metrics are known as Prometheus Remote Write. Prometheus Remote Write does not require the Sysdig agent to be installed in the Prometheus environment. This facility expands your monitoring capabilities beyond Kubernetes and regular Linux kernels to environments where the Sysdig agent cannot be installed.

Prometheus Remote Write can collect metrics from:

  • An existing Prometheus server

  • Additional environments:

    • Windows

    • Managed Cloud Environments, such as AWS and IBM

    • Fargate

    • IoT

Use Sysdig agent in environments where an agent can be installed. However, use the Prometheus Remote Write to collect metrics from ephemeral or batch jobs that may not exist long enough to be scraped by the agent.

With the Prometheus Remote Write, you can either monitor metrics through the Monitor UI or you can use PromQL to query the data by using the standard Prometheus query language.

Endpoints and Regions

Prometheus Remote Write resides in the ingest endpoints for each region under /prometheus/remote/write. The public Prometheus Remote Write endpoints for each region are listed below:

  • US East: https://api.sysdigcloud.com/prometheus/remote/write
  • US West: https://us2.app.sysdig.com/prometheus/remote/write
  • European Union: https://eu1.app.sysdig.com/prometheus/remote/write
  • Asia Pacific (Sydney): https://app.au1.sysdig.com/prometheus/remote/write

Configure Remote Write in Prometheus Server

You need to configure remote_write in your Prometheus server in order to send metrics to Sysdig Prometheus Remote Write.

The configuration of your Prometheus server depends on your installation. In general, you configure the remote_write section in the prometheus.yml configuration file:

global:
  external_labels:
    [ <labelname>: <labelvalue> ... ]
remote_write:
    - url: "https://<region-url>/prometheus/remote/write"
      bearer_token: "<your API Token>"

The communication between your Prometheus server and Prometheus Remote Write should use the authorization header with the Sysdig API key (not the agent access key) as the bearer token.

Alternatively, you can also use the bearer_token_file entry to refer to a file instead of directly including the API token.
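
For example, a sketch using bearer_token_file (the secret path is illustrative):

remote_write:
    - url: "https://<region-url>/prometheus/remote/write"
      bearer_token_file: /etc/secrets/sysdig-api-token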

Prometheus does not reveal the bearer_token value in the UI.

Customize Metrics

To enable customization, Sysdig provides additional options to control which metrics you want to send to Prometheus Remote Write.

Manage Metrics

By default, all metrics are sent to Sysdig Prometheus Remote Write. These metrics are sent with a remote_write: true label appended to them so you can easily identify them.

Label Metrics

You can specify custom label-value pairs and send them with each time series to the Prometheus Remote Write. Use the external_labels block in the global section of the Prometheus configuration file. This is similar to setting an agent tag, and it allows you to filter or scope the metrics in Sysdig Monitor.

For example, if you have two Prometheus servers configured to remote write to Prometheus Remote Write, you can include an external label to identify them easily:

Prometheus 1

global:
  external_labels:
    provider: prometheus1
remote_write:
- url: ...

Prometheus 2

global:
  external_labels:
    provider: prometheus2
remote_write:
- url: ...

Filter Metrics

With the general configuration, all the metrics are by default remotely written to Prometheus Remote Write. You can control the metrics that you collect and send to Sysdig. To select which series and labels to collect, drop, or replace, and reduce the number of active series that are sent to Sysdig, you can set up relabel configurations by using the write_relabel_configs block within your remote_write section.

For example, you can send only metrics from one specific namespace called myapp-ns, as given below. Note that write_relabel_configs acts on the labels of the series being sent (meta labels such as __meta_kubernetes_namespace are no longer available at this stage), so this assumes the series carry a namespace label:

remote_write:
- url: https://<region-url>/prometheus/remote/write
  bearer_token_file: /etc/secrets/sysdig-api-token
  write_relabel_configs:
  - source_labels: [namespace]
    regex: 'myapp-ns'
    action: keep
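
Similarly, you can drop series instead of keeping them; for example, to stop sending Go garbage-collection metrics (the metric name pattern is illustrative):

remote_write:
- url: https://<region-url>/prometheus/remote/write
  bearer_token_file: /etc/secrets/sysdig-api-token
  write_relabel_configs:
  - source_labels: [__name__]
    regex: 'go_gc_.*'
    action: drop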

Rate Limit

The default limits are configured for each user and can be raised as required. The defaults are suitable for most users, and the limits help protect against misconfigurations.

  • Parallel writes: 100 concurrent requests. This doesn't necessarily mean 100 Prometheus servers, because the time at which the data is written is distributed.

  • Data points per minute: One million. The number of data points sent depends on how often metrics are submitted to Sysdig; a scrape interval of 10s will submit more DPM than an interval of 60s.

  • Number of writes per minute: 10,000.

Team Scoping

It is possible to scope a Sysdig Team to only access metrics matching certain labels sent via Prometheus Remote Write. See Manage Teams and Roles.

Limitations

  • Metrics sent to Prometheus Remote Write can be accessed in Explore, but they are not compatible with the scope tree.

  • Label enrichment is unavailable at this point. Only labels collected at the source can be used. You should add additional labels to perform further scoping or pivoting in Sysdig.

  • Currently, Sysdig Dashboards do not support mixing metrics with different sampling intervals, for example, 10-second and 1-minute samples. For an optimal experience, configure the scrape interval to 10s so that remote write metrics can be combined with agent metrics.

  • Remote write functionality does not support sending metric metadata. Upstream Prometheus recently added support for propagation of metadata (metric type, unit, description, info) and this functionality will be supported in a future update to Prometheus Remote Write.

    • Suffix the metric name with _total, _sum, or _count to store it as a counter. Otherwise, the metric will be handled as a gauge.

    • Units can be set in Dashboards manually.

Learn More

4 - Sysdig Secure for cloud

Sysdig Secure for cloud is the software that connects Sysdig Secure features to your cloud environments to provide unified threat detection, compliance, forensics, and analysis.

Because modern cloud applications are no longer just virtualized compute resources, but a superset of cloud services on which businesses depend, controlling the security of your cloud accounts is essential. Errors can expose an organization to risks that could bring resources down, infiltrate workloads, exfiltrate secrets, create unseen assets, or otherwise compromise the business or reputation. As the number of cloud services and configurations available grows exponentially, using a cloud security platform protects against having an unseen misconfiguration turn into a serious security issue.

Check Sysdig Secure - Secure for cloud to review the features provided per cloud.

Multiple Installation Options

Sysdig Secure for cloud is available on a variety of cloud providers, with simplified installation instructions available from the Get Started screen.

Terraform based

Supported cloud providers at this time are AWS, GCP, and Azure.

Cloud-Native templates

Native template deployment is available for AWS, using a CloudFormation Template (CFT).

Helm Chart based (feature limited)

The core component of Sysdig Secure for cloud, called cloud-connector, is also available through the following methods:

Note: Installing this component alone enables only the threat detection and image scanning features, although together with the SysdigCompliance IAM role, it can handle Compliance and Identity and Access Posture too.

See Also

4.1 - AWS Deployment

This section covers installation methods.
Review the offering description on Sysdig Secure for cloud - AWS

Deployment Options

All of the following options provide the four cloud features: threat detection, CSPM benchmarks, image scanning, and container registry scanning.

  • Terraform-based for two types of AWS account
    • Organizational/management account: This is the account that you use to create the organization in AWS. Organizational accounts create and contain member accounts.
    • Single/member account: Each of these is a stand-alone account. A member account that is part of an organization is supported too.
  • CloudFormation Template (CFT)-based: This option requires explicit creation of an AWS role, which is prompted by the onboarding wizard.

Onboarding Using Terraform

Terraform-based install instructions differ depending on what type of AWS account you are using.

At this time, the options include:

For Single/Member Account

The default code provided in the Get Started page of Sysdig Secure, or the Data Sources | Cloud Accounts page, is pre-populated with your Secure API token and will automatically install threat detection with CloudTrail, AWS benchmarks, and container registry and image scanning.
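
The snippet has roughly the following shape (a sketch only; the module source and inputs vary by version, and the UI pre-fills your API token):

terraform {
  required_providers {
    sysdig = {
      source = "sysdiglabs/sysdig"
    }
  }
}

provider "sysdig" {
  sysdig_secure_api_token = "<your Sysdig Secure API token>"
}

provider "aws" {
  region = "us-east-1"
}

module "secure-for-cloud_example_single-account" {
  source = "sysdiglabs/secure-for-cloud/aws//examples/single-account"
}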

Prerequisites and Permissions

  • A Sysdig Secure SaaS account, with administrator permissions
  • An AWS account, for Secure for Cloud compute workload deployment
    • You must have Administrator permissions, or permissions to create each of the resources specified in the resources list.
    • Enable AWS STS in each region you would like to secure.
  • Have Terraform installed on the machine from which you will deploy the installation code.

Steps

  1. Log in to Sysdig Secure as Admin and select Get Started > Connect your Cloud account and choose the AWS tab;

    OR

    select Integrations > Data Sources | Cloud Account and choose Connect Account | AWS

  2. Copy the code snippet under Single Account and paste it into a Terraform Manifest (.tf file). It should be pre-configured with your Sysdig API token.

  3. Then run:

    $ terraform init
    

    When complete, run:

    $ terraform apply
    

    which will present the changes to be made, ask you to confirm them, then make the changes.

  4. Confirm the Services are Working

    Check Troubleshooting in case of permissions or account conflict errors.

For Organizational/Management Account

For organizational accounts, the default code provided in the Get Started page of Sysdig Secure is pre-populated with your Secure API token and will automatically install threat detection with CloudTrail (only).

Prerequisites and Permissions

  • A Sysdig Secure SaaS account, with administrator permissions
  • An AWS account in your organization, for Secure for Cloud compute workload deployment (we recommend creating an isolated member account)
    • You must have Administrator permissions, or permissions to create each of the resources specified in the resources list. Sysdig provides an IAM policy containing the required permissions.
    • Enable AWS STS in each region you would like to secure.
    • An existing AWS account as the organization master account with the Organizational CloudTrail service enabled.
    • AWS profile credentials configuration of the master account of the organization; you must also have sufficient permissions for the IAM user or role in the management account to successfully create an organization trail.
  • Have Terraform installed on the machine from which you will deploy the installation code.

Steps

  1. Log in to Sysdig Secure as Admin and select Get Started > Connect your Cloud account and choose the AWS tab;

    OR

    select Integrations > Data Sources | Cloud Account and choose Connect Account

  2. Copy the code snippet under Organizational Account and paste it into a Terraform Manifest (.tf file). It should be pre-configured with your Sysdig API token.

  3. Then run:

    $ terraform init
    

    When complete, run:

    $ terraform apply
    

    which will present the changes to be made, ask you to confirm them, then make the changes.

  4. Confirm the Services are Working

    Check Troubleshooting in case of permissions or account conflict errors.

Soon, this option will be expanded to include all the features currently in the single account option, as well as the ability to easily add multiple member accounts.

Customizing the Install

Both the Single Account and Organizational Account code examples are configured with sensible defaults for the underlying inputs. But if desired, you can edit the region, module enablement, and other Inputs. See details for:

Enabling Image Scanner

The Image Scanner feature is disabled by default. To enable it, use the image scanning input variables in your snippet, such as:

module "secure-for-cloud_example"{
 ...
 deploy_image_scanning_ecs = true
 deploy_image_scanning_ecr = true
}

Resources Created by Each Module

Check full list of created resources

  • Benchmark

    • aws_iam_role

    • aws_iam_role_policy_attachment

    • sysdig_secure_benchmark_task

    • sysdig_secure_cloud_account

  • General; Threat detection / CSPM / CIEM

    • aws_cloudwatch_log_stream

    • aws_ecs_service

    • aws_ecs_task_definition

    • aws_iam_role

    • aws_iam_role_policy

    • aws_s3_bucket

    • aws_s3_bucket_object

    • aws_s3_bucket_public_access_block

    • aws_security_group

    • aws_sns_topic_subscription

    • aws_sqs_queue

    • aws_sqs_queue_policy

  • Image Scanning

    • aws_cloudwatch_log_group

    • aws_cloudwatch_log_stream

    • aws_ecs_service

    • aws_ecs_task_definition

    • aws_iam_role

    • aws_iam_role_policy

    • aws_security_group

    • aws_sns_topic_subscription

    • aws_sqs_queue

    • aws_sqs_queue_policy

If cloud-connector or cloud-scanning is installed, these additional modules will be installed:

  • resource-group

    • aws_resourcegroups_group
  • cloudtrail

    • aws_cloudtrail

    • aws_kms_alias

    • aws_kms_key

    • aws_s3_bucket

    • aws_s3_bucket_policy

    • aws_s3_bucket_public_access_block

    • aws_sns_topic

    • aws_sns_topic_policy

  • ssm

    • aws_ssm_parameter
  • ecs-fargate-cluster

    • aws_ecs_cluster

If cloud-scanning is installed, these additional modules will be installed:

  • codebuild

    • aws_cloudwatch_log_group

    • aws_codebuild_project

    • aws_iam_role

    • aws_iam_role_policy

Troubleshooting

Find more troubleshooting options on the module source repository

1. Resolve 409 Conflict Error

This error may occur if the specified cloud account has already been onboarded to Sysdig.

Solution:

The cloud account can be imported into Terraform by running: 

terraform import module.cloud_bench.sysdig_secure_cloud_account.cloud_account CLOUD_ACCOUNT_ID

2. Resolve Permissions Error/Access Denied

This error may occur if your current AWS authentication session does not have the required permissions to create certain resources.

Solution:

Ensure you are authenticated to AWS using a user or role with the required permissions.

Onboarding a Single Account using a CFT

Each of the features can be enabled from a single CloudFormation Template (CFT) from the AWS Console. Two options are available:

  • Secure For Cloud stack, deployed on ECS compute workload. Available in all regions

  • Secure For Cloud stack, deployed on AppRunner compute workload. A less resource-demanding deployment, but not available in all regions; supported regions are us-east-1, us-east-2, us-west-2, ap-northeast-1, and eu-west-1.

Prerequisites

  • A Sysdig Secure SaaS account

  • An AWS account and AWS services you would like to connect to Sysdig, with appropriate permissions to deploy a CFT.

Steps

  1. Log in to your AWS Console and confirm that you are in the account and AWS region that you want to secure using Sysdig Secure for cloud.

  2. Log in to Sysdig Secure as Admin and select Get Started > Connect your Cloud account and choose the AWS tab;

    OR

    select Integrations > Data Sources | Cloud Account and choose Connect Account

  3. Select either the Install Secure For Cloud stack, deployed on ECS compute workload link or the Install Secure For Cloud stack, deployed on AppRunner compute workload link.

    The Connect Account dialog is displayed.

  4. Enter:

    • The AWS account number with which you want to connect

    • An IAM Role name to be created for Sysdig Secure for cloud in AWS. This role name must not yet exist in your account.

      The role provides read-only access to your resources to allow Sysdig to monitor and secure your cloud account. Access is scoped to the managed SecurityAudit policy.

  5. Click Launch Stack.

    The AWS Console opens, at the CloudFormation > Stacks > Quick Create page. The Sysdig CloudFormation template is pre-loaded.

    Confirm that you are logged in the AWS account and region where you want to deploy the Sysdig Template.

  6. Provide a Stack name or accept the default.

  7. Fill in the Parameters:

    Sysdig Settings

    • Sysdig Secure Endpoint: Default (US-East): https://secure.sysdig.com.
      If your Sysdig Secure platform is installed in another region, use that endpoint.

      • US West: https://us2.app.sysdig.com
      • European Union: https://eu1.app.sysdig.com
    • Sysdig Secure API Token: See Retrieve the Sysdig API Token to find yours.

    • Sysdig Role Name: As specified in Step 4; the IAM role name to be created for Sysdig to access your AWS account

    • Sysdig External ID: Do not modify. This is the ExternalID used to identify Sysdig on AWS for the Trusted Identity.

    • Sysdig Trusted Identity: Do not modify. This is the ARN of the Sysdig Trusted Identity on AWS, required to run CSPM benchmarks.

    Modules to Deploy: Choose any or all.
    CSPM/Compliance and Threat detection using CloudTrail capabilities will always be deployed.

    • ECR Image Registry Scanning: Integrates container registry scanning for AWS ECR.

    • Fargate Image Scanning: Integrates image scanning on any container image deployed on a serverless Fargate task (in ECS).

    Existing Infrastructure: Leave all these fields blank for resources to be created.
    If you want to use existing components of your infrastructure, you can provide:

    • Network: Only available if the stack is deployed on ECS. If provided, you MUST specify ALL field values:

      • ECS Cluster Name where the Sysdig workload is to be deployed
      • VPC ID for the ECS Cluster
      • Private subnet ID(s) for the VPC. At least two subnets are required
    • CloudTrail SNS Topic: Specify the URL of the SNS Topic


  8. Confirm the Capabilities required to deploy:

    • Check “I acknowledge that AWS CloudFormation might create IAM resources with custom names.”

    • Check “I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND”

  9. Click Create Stack.

    In the AWS Console, the main stack and associated substacks will show “CREATE_IN_PROGRESS”. Refresh the status to see “CREATE_COMPLETE” for all. There is a delay of 5-10 minutes for events to be sent from CloudTrail, but no event is lost.

    A success message also appears in the Sysdig Secure Get Started page.

Confirm the Services are Working

Log in to Sysdig Secure and check that each module you deployed is functioning. It may take 10 minutes or so for events to be collected and displayed.

Check Overall Connection Status

  • Data Sources: Select Integrations > Data Sources | Cloud Accounts to see all connected cloud accounts.

  • Subscription: Select Settings > Subscription to see an overview of your account activity, including cloud accounts.

  • Insights: Check that Insights have been added to your navigation bar. View activity on the Cloud Account, Cloud User, or Composite insight views.

Check Threat Detection

  • Policies and Rules: Check Policies > Runtime Policies and confirm that the Sysdig AWS Threat Detection and Sysdig AWS Threat Intelligence managed policies are enabled.

    • These consist of the most-frequently-recommended rules for AWS and CloudTrail. You can customize them by creating a new policy of the AWS CloudTrail type.
  • Events: In the Events feed, search cloud to show events from AWS CloudTrail.

  • Force an event: To manually create an event, choose one of the rules contained in an AWS policy and trigger it in your AWS account, as sketched below.
    For example, create an S3 bucket with public access blocked, then make it public to prompt the event.
    Remember that if you add new rules to the policy, you need to allow time for the changes to propagate.
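
A sketch of that test using the AWS CLI (the bucket name is illustrative): create the bucket with public access blocked, then remove the block to prompt the event:

aws s3api create-bucket --bucket sysdig-test-event-bucket --region us-east-1
aws s3api put-public-access-block --bucket sysdig-test-event-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
aws s3api delete-public-access-block --bucket sysdig-test-event-bucket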

Check Identity and Access (AWS)

Select Posture > Identity and Access | Users and check whether the following are showing up in the Unused Permissions lists:

  • Unused permissions
  • Inactive users
  • Policies with the greatest number of granted vs used permissions

Follow the instructions to remediate overly permissive entitlements and reduce security risks.

See Also

4.2 - GCP Deployment

This section covers installation methods.
Review the offering description on Sysdig Secure for cloud - GCP.

Deployments on GCP use a Terraform file.

Onboarding Using Terraform

Terraform-based install instructions differ depending on what type of account you are using.

At this time, the options include:

The default code provided in the Get Started page of Sysdig Secure is pre-populated with your Secure API token and will automatically install threat detection, benchmarks, and container registry and image scanning.

Prerequisites and Permissions

  • A Sysdig Secure SaaS account, with administrator permissions
  • A Google Cloud Platform (GCP) account, for Secure for Cloud compute workload deployment
    • Owner role, in order to create each of the resources specified in the resources list below
      • For organizational deployments, the Organization Admin role is required too.
    • Enable the required GCP APIs
  • Have Terraform installed on the machine from which you will deploy the installation code.

Steps

  1. Log in to Sysdig Secure as Admin and select Get Started > Connect your Cloud account and choose the GCP tab.

    OR

    select Integrations > Data Sources | Cloud Account and choose Connect Account | GCP

  2. Copy the code snippet under Single Account or Organizational Account and paste it into a Terraform Manifest (.tf file). It should be pre-configured with your Sysdig API token.

  3. Then run:

    $ terraform init
    

    When complete, run:

    $ terraform apply
    

    which will present the changes to be made, ask you to confirm them, then make the changes.

  4. Confirm the Services are Working

    Check Troubleshooting in case of permissions or account conflict errors.

Customizing the Install

Both the Single Account and Organizational Account code examples are configured with sensible defaults for the underlying inputs. But if desired, you can edit the region, module enablement, and other Inputs. See details for:

Enabling Image Scanner

The Image Scanner feature is disabled by default. To enable it, use the deploy_scanning input variable in your snippet, such as:

module "secure-for-cloud_example"{
 ...
 deploy_scanning = true
}

Resources Created by Each Module

Check full list of created resources

  • Cloud-bench
    • google_iam_workload_identity_pool
    • google_iam_workload_identity_pool_provider
    • google_project_iam_custom_role
    • google_project_iam_member
    • google_service_account
    • google_service_account_iam_binding
    • sysdig_secure_benchmark_task
    • sysdig_secure_cloud_account
  • Cloud-connector
    • google_cloud_run_service
    • google_eventarc_trigger
    • google_project_iam_member
    • google_storage_bucket
    • google_storage_bucket_iam_member
    • google_storage_bucket_object
  • Cloud-scanning
    • google_cloud_run_service
    • google_cloud_run_service_iam_member
    • google_eventarc_trigger
    • google_project_iam_member
    • google_pubsub_topic

If Cloud-connector is installed in organizational mode, this additional module will be installed:

  • Organization-sink
    • google_logging_organization_sink
    • google_pubsub_topic
    • google_pubsub_topic_iam_member

If Cloud-connector is installed in single-project mode, this additional module will be installed:

  • Project-sink
    • google_logging_project_sink
    • google_pubsub_topic
    • google_pubsub_topic_iam_member

If Cloud-scanning is installed, this additional module will be installed:

  • Secrets
    • google_secret_manager_secret
    • google_secret_manager_secret_iam_member
    • google_secret_manager_secret_version

Troubleshooting

Find more troubleshooting options on the module source repository

1. Insufficient Permissions on Project

This error may occur if your current GCP authentication session does not have the required permissions to access the specified project.

Solution: Ensure you are authenticated to GCP using a user or service account with the required permissions.

2. Insufficient Permissions to Create Resource

This error may occur if your current GCP authentication session does not have the required permissions to create certain resources.

Solution: Ensure you are authenticated to GCP using a user or service account with the required permissions.

If you have sufficient permissions but still get this kind of error, try authenticating gcloud using:

$ gcloud auth application-default login

$ gcloud auth application-default set-quota-project your_project_id

3. Conflicting Resources

This error may occur if the specified GCP project has already been onboarded to Sysdig.

Solution: The cloud account can be imported into Terraform by running:

terraform import module.single-account.module.cloud_bench.sysdig_secure_cloud_account.cloud_account PROJECT_ID

where PROJECT_ID is the numerical ID of the project (not the project name).

4. Workload Identity Federation pool already exists

This error may occur if a Workload Identity Federation Pool or Pool Provider has previously been created, and then deleted, either via the GCP console or with terraform destroy. When a delete request for these resources is sent to GCP, they are not completely deleted, but marked as “deleted”, and remain for 30 days. These “deleted” resources will block creation of a new resource with the same name.

Solution: The “deleted” pools must be restored using the GCP console and then imported into Terraform, using terraform import module.single-account.module.cloud_bench.google_iam_workload_identity_pool.pool POOL_ID and terraform import module.single-account.module.cloud_bench.google_iam_workload_identity_pool_provider.pool_provider POOL_ID/PROVIDER_ID

5. Received an email from Google Cloud Platform citing a Configuration Error

Email contains error codes such as:

Error Code: topic_not_found
or
Error Code: topic_permission_denied

Cause: The resources Sysdig deploys with Terraform are eventually consistent; some prerequisite resources may be created but not yet ready.

Solution: This is a known issue that only occurs within the first minutes of the deployment. Eventually, the resource health checks will pass and the modules will work as expected.

Confirm the Services are Working

Log in to Sysdig Secure and check that each module you deployed is functioning. It may take 10 minutes or so for events to be collected and displayed.

Check Overall Connection Status

  • Data Sources: Select Integrations > Data Sources | Cloud Accounts to see all connected cloud accounts.

  • Insights: Check that Insights have been added to your navigation bar. View activity on the Cloud Account, Cloud User, or Composite insight views.

Check Threat Detection

  • Policies and Rules: Check Policies > Runtime Policies and confirm that the Sysdig GCP Threat Detection and Sysdig GCP Threat Intelligence managed policies are enabled.

    • These consist of the most-frequently-recommended rules for GCP.
  • Events: In the Events feed, search cloud to show events from GCP.

See Also

4.3 - Azure Deployment

This section covers installation methods.
Review the offering description on Sysdig Secure for cloud - Azure.

Deployments on Azure use a Terraform file.

Onboarding Using Terraform

Terraform-based install instructions differ depending on what type of account you are using.

At this time, the options include:

  • Install for a single subscription
  • Install for tenant subscriptions

The default code provided in the Get Started page of Sysdig Secure is pre-populated with your Secure API token and will automatically install threat detection, benchmarks, and container registry and image scanning.

Prerequisites and Permissions

  • A Sysdig Secure SaaS account, with administrator permissions
  • An Azure subscription/tenant you would like to connect to Sysdig
    • The installing user must have the Security Administrator role (at the organizational level) and the Owner role (at the subscription level)
    • More permissions detail
  • Terraform installed on the machine from which you will deploy the installation code.

Steps

  1. Log in to Sysdig Secure as Admin and select Get Started > Connect your Cloud account and choose the Azure tab.

    OR

    select Integrations > Data Sources | Cloud Account and choose Connect Account | Azure

  2. Copy the code snippet under Single Subscription or Tenant Subscriptions and paste it into a Terraform Manifest (.tf file). It should be pre-configured with your Sysdig API token.

  3. Then run:

    $ terraform init
    

    When complete, run:

    $ terraform apply
    

    which will present the changes to be made, ask you to confirm them, then make the changes.

  4. Confirm the Services are Working

    Check Troubleshooting in case of permissions or account conflict errors.

Customizing the Install

Both the Single Subscription and Tenant Subscriptions code examples are configured with sensible defaults for the underlying inputs. But if desired, you can edit the region, module enablement, and other Inputs. See details for:

Enabling Image Scanner

The Image Scanner feature is disabled by default. To enable it, use the deploy_scanning input variable in your snippet, such as:

module "secure-for-cloud_example"{
 ...
 deploy_scanning = true
}

Resources Created by Each Module

Check full list of created resources

  • Cloud-bench
    • azurerm_lighthouse_assignment
    • azurerm_lighthouse_definition
    • azurerm_role_definition
    • azurerm_subscription
    • sysdig_secure_cloud_account
    • sysdig_secure_benchmark_task
  • Cloud-connector
    • azurerm_container_group
    • azurerm_network_profile
    • azurerm_storage_account
    • azurerm_storage_blob
    • azurerm_storage_container
    • azurerm_subnet
    • azurerm_virtual_network

If Cloud-connector is installed, these additional modules will also be installed:

  • Container-registry
    • azurerm_container_registry
    • azurerm_eventgrid_event_subscription
  • Enterprise-application
    • azuread_application
    • azuread_application_password
    • azuread_service_principal
    • azuread_service_principal_password
    • azurerm_role_assignment
    • azurerm_role_definition
  • Eventhub
    • azurerm_eventhub
    • azurerm_eventhub_authorization_rule
    • azurerm_eventhub_namespace
    • azurerm_eventhub_namespace_authorization_rule
    • azurerm_monitor_diagnostic_setting
    • azurerm_resource_group

Troubleshooting

Find more troubleshooting options on the module source repository

1. Insufficient Permissions on Subscription

This error may occur if your current Azure authentication session does not have the required permissions to create resources in the specified subscription.

Solution: Ensure you are authenticated to Azure using a Non-Guest user with the Contributor or Owner role on the target subscription.

Error: Error Creating/Updating Lighthouse Definition "dd9be15b-0ee9-7daf-b942-5e173dae13fb" (Scope "/subscriptions/***"): managedservices.RegistrationDefinitionsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="InsufficientPrivilegesForManagedServiceResource" Message="The requested user doesn't have sufficient privileges to perform the operation."
       
     with module.cloudvision_example_existing_resource_group.module.cloud_bench.module.trust_relationship["***"].azurerm_lighthouse_definition.lighthouse_definition,
         on ../../../modules/services/cloud-bench/trust_relationship/main.tf line 28, in resource "azurerm_lighthouse_definition" "lighthouse_definition":
         28: resource "azurerm_lighthouse_definition" "lighthouse_definition" {

2. Conflicting Resources

This error may occur if the specified Azure Subscription has already been onboarded to Sysdig.

Solution: The resource can be imported into Terraform by using the terraform import command. This will bring the resource under management in the current Terraform workspace.

Error: A resource with the ID "/subscriptions/***/resourceGroups/sfc-resourcegroup" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_resource_group" for more information.
       
         with module.cloudvision_example_existing_resource_group.module.infrastructure_eventhub.azurerm_resource_group.rg[0],
         on ../../../modules/infrastructure/eventhub/main.tf line 6, in resource "azurerm_resource_group" "rg":
          6: resource "azurerm_resource_group" "rg" {

Confirm the Services are Working

Log in to Sysdig Secure and check that each module you deployed is functioning. It may take 10 minutes or so for events to be collected and displayed.

Check Overall Connection Status

  • Data Sources: Select Integrations > Data Sources | Cloud Accounts to see all connected cloud accounts.
  • Insights: Check that Insights have been added to your navigation bar. View activity on the Cloud Account, Cloud User, or Composite insight views.

Check Threat Detection

  • Policies and Rules: Check Policies > Runtime Policies and confirm that the Sysdig Azure Threat Detection and Sysdig Azure Threat Intelligence policies are enabled.

    • These consist of the most-frequently-recommended rules for Azure.
  • Events: In the Events feed, search ‘cloud’ to show events from Azure.

See Also

5 - Rapid Response: Installation

Rapid Response component, allowing designated Advanced Users to remotely connect into a host

With Rapid Response, Sysdig has introduced a way to grant designated Advanced Users in Sysdig Secure the ability to remotely connect into a host directly from the Event stream and execute desired commands there.

Rapid Response team members have access to a full shell from within the Sysdig Secure UI. Responsibility for the security of this powerful feature rests with you: your enterprise and your designated employees.

See also: Rapid Response.

Install and Configure Rapid Response

Prerequisites

  • Sysdig Secure On-Premises 4.0+ or Sysdig Secure SaaS

  • Have on hand:

    • Your Sysdig agent access key

    • Your Sysdig API endpoint

      • On-prem: custom, depending on your on-prem installation

      • SaaS: Region-dependent (use the “endpoint” entries, e.g. https://us2.app.sysdig.com for US West)

    • A passphrase used to encrypt all traffic between the user and host.

      NOTE: Sysdig cannot recover this passphrase. If lost, a user will not be able to start a session, nor will any session logs be recoverable.

    Optionally, these can be added to the environment variables:

    export API_ENDPOINT=https://secure-staging.mycompany.com
    export ACCESS_KEY=$YOUR_SYSDIG_AGENT_INSTALLATION_KEY
    export PASSPHRASE=$ENCRYPTION_PASSPHRASE
    export API_TLS_SKIP_CHECK=false
    

Install Host Component

The Rapid Response agent can be installed as a Docker container, as a Kubernetes DaemonSet, or via Helm.

As Docker Container

  1. Run the host component container. Mount host directories and binaries as needed to gain access to the host.

    docker run --hostname $HOST_NAME -d quay.io/sysdig/rapid-response-host-component:latest --endpoint $API_ENDPOINT --access-key $ACCESS_KEY --password $PASSPHRASE
    
  2. Customize the Docker image.

    The container is simply a bash shell. To add custom scripts without needing to mount the underlying host filesystem, you can bake them into the Docker container, for example by installing kubectl, gcloud, netstat, or another command-line utility.

    FROM quay.io/sysdig/rapid-response-host-component:latest AS base-image
    
    FROM alpine:3.13
    COPY --from=base-image /usr/bin/host /usr/bin/host
    
    # add custom scripts and other directives
    
    ENTRYPOINT ["host"]
    

As Kubernetes DaemonSet

  1. Create a namespace and secrets for the Rapid Response agent:

    kubectl create ns rapid-response
    kubectl create secret generic sysdig-rapid-response-host-component-access-key --from-literal=access-key=$ACCESS_KEY -n rapid-response
    kubectl create secret generic sysdig-rapid-response-host-component-passphrase --from-literal=passphrase=$PASSPHRASE -n rapid-response
    
  2. Create the configmap and change the API_ENDPOINT parameter:

    echo "apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sysdig-rapid-response-host-component
    data:
      api-endpoint: ${API_ENDPOINT}
      api-tls-skip-check: 'false'" | kubectl apply -n rapid-response -f -
    
  3. Deploy the DaemonSet.

    Note: The agent does not automatically have access to the host filesystem; there are several mounts commented-out in the manifest that must be uncommented to investigate the host.

    echo "# apiVersion: extensions/v1beta1  # If you are in Kubernetes version 1.8 or less please use this line instead of the following one
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: sysdig-rapid-response-host-component
      labels:
        app: sysdig-rapid-response-host-component
    spec:
      selector:
        matchLabels:
          app: sysdig-rapid-response-host-component
      updateStrategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: sysdig-rapid-response-host-component
        spec:
          hostNetwork: true
          volumes:
            # Add custom volume here
            # Uncomment these lines if you'd like to map /root/ from the
            # host into the container.
            #- hostPath:
            #    path: /
            #  name: host-root-vol
            - name: sysdig-rapid-response-host-component-config
              configMap:
                name: sysdig-rapid-response-host-component
                optional: true
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
          containers:
            - name: sysdig-rapid-response-host-component
              image: quay.io/sysdig/rapid-response-host-component
              #securityContext:
                # The privileged flag is necessary for OCP 4.x and other Kubernetes setups that deny host filesystem access to
                # running containers by default regardless of volume mounts. In those cases, access to the CRI socket would fail.
              #  privileged: true
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 500m
                  memory: 500Mi
                requests:
                  cpu: 250m
                  memory: 250Mi
              # Add custom volume mount here
              # Uncomment these lines if you'd like to map /root/ from the
              # host into the container.
              #volumeMounts:
              #- mountPath: /host
              #  name: host-root-vol
              env:
                - name: API_ENDPOINT
                  valueFrom:
                    configMapKeyRef:
                      name: sysdig-rapid-response-host-component
                      key: api-endpoint
                - name: API_TLS_SKIP_CHECK
                  valueFrom:
                    configMapKeyRef:
                      name: sysdig-rapid-response-host-component
                      key: api-tls-skip-check
                - name: ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: sysdig-rapid-response-host-component-access-key
                      key: access-key
                - name: PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: sysdig-rapid-response-host-component-passphrase
                      key: passphrase" | kubectl apply -n rapid-response -f -
    

Using Helm

The Rapid Response Helm chart can be deployed as:

helm repo add sysdig https://charts.sysdig.com
helm install rapid-response sysdig/rapid-response \
  --namespace rapid-response \
  --create-namespace \
  --set sysdig.accessKey=${ACCESS_KEY} \
  --set rapidResponse.passphrase=${PASSPHRASE} \
  --set rapidResponse.apiEndpoint=${API_ENDPOINT}

Note: The agent does not automatically have access to the host filesystem; use the extraVolumes setting to do so.
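
For example, a hypothetical values file (the extraVolumes schema shown is an assumption; check the chart's values reference for the exact structure):

# values.yaml -- hypothetical schema, verify against the rapid-response chart
rapidResponse:
  extraVolumes:
    volumes:
      - name: host-root-vol
        hostPath:
          path: /
    mounts:
      - name: host-root-vol
        mountPath: /host
        readOnly: true

You would then pass it with helm install ... -f values.yaml (or helm upgrade with --reuse-values).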

Complete the Configuration

After installation/upgrade, complete the following steps:

  • Request enablement of the feature from Sysdig Support.

  • Configure an S3 bucket for Rapid Response logs: If you are using the default Cassandra storage for Capture files, you will need to configure an AWS or custom S3 bucket to store Rapid Response log files after a session. If you have already configured an S3 bucket for Captures, then Rapid Response logs will be routed there automatically, into their own folder.

  • Manage the following port/firewall considerations:

    • Ensure the host component is able to reach the endpoint defined in API_ENDPOINT

    • Ensure there are no intermediate proxies that could enforce a maximum time to live (since sessions could potentially have long durations)

    • Ensure that the host component can reach the object storage (S3 bucket) when configured.

  • Configure and use Rapid Response in the Sysdig Secure UI: See Rapid Response.

6 - Node Analyzer: Multi-Feature Installation

Multi-Feature Installation for benchmarks, host scanning, and the image analyzer.

What Is the Node Analyzer?

The Node Analyzer (NA) provides a method for deploying the components for three different Sysdig Secure features:

  • (Node) Image Analyzer: an existing tool that can now be installed and/or upgraded in a new way, alongside the other two components.

  • Benchmarks: Installs a new component (called a benchmark runner) which is required to use Benchmarks, including an updated interface and new improved features. The legacy Benchmark tool can still be accessed.

  • Host Scanning: a new tool for scanning not just the images/containers on a host, but the host itself.

Installation Options

All the Node Analyzer components, along with the Sysdig agent, are deployed per node or host. You can deploy them using various methods:

Fresh Install: Agent + Node Analyzer

If you are installing Sysdig Secure for the first time and have not yet deployed any agents, you can use a single-line install to deploy both the Sysdig agent and the Node Analyzer (NA) tools. The script will make changes to each node or host within a cluster.

curl -s https://download.sysdig.com/stable/install-agent-kubernetes | sudo bash -s -- \
  --access_key ACCESS_KEY --collector COLLECTOR_URL --collector_port 6443 \
  --nodeanalyzer --api_endpoint API_ENDPOINT

For SaaS, see also the Get Started page in Sysdig Secure. Under “Connect Your Data Sources,” the script is generated with your endpoints automatically inserted.

On-Premises with Self-Signed Cert:

If you want the Node Analyzer to report to an on-prem Sysdig backend that uses a self-signed certificate, add -cc false to the command line so the Node Analyzer will accept it.

To find the values yourself:

  • access_key: This is the agent access key. You can retrieve this from Settings > Agent Installation in the Sysdig Secure UI.

  • collector_url: This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. (It is a custom value in on-prem installations.)

  • api_endpoint: This is the base URL ( region-dependent) for Sysdig Secure and is auto-completed on the Get Started page. E.g. secure.sysdig.com, us2.app.sysdig.com, eu1.app.sysdig.com.

When finished, you can Access the Node Analyzer Features.

Upgrade/Install Node Analyzer Tools Only

Use this script in the following conditions:

  • Agent is already installed, you just want the NA tools

  • Node Image Analyzer already installed; you want to upgrade it to v2

  • You want to add Benchmarks v2 and Host Scanning features to your existing Sysdig Secure environment, as well as upgrade or install the Image Analyzer.

Note that if you already have the Node Image Analyzer (v1) installed, this script will upgrade that component automatically. An agent MUST already be installed. The script will make changes to every node in the cluster.

curl -s https://download.sysdig.com/stable/install-node-analyzer | sudo bash -s -- --api_endpoint API_ENDPOINT

When finished, you can Access the Node Analyzer Features.

Daemonset Install

To deploy the Node Analyzer using Kubernetes daemonsets, download the following configuration files, edit them as annotated within the files, and deploy them.

To deploy the Node Analyzer concurrently with the Sysdig agent, you would also download the sysdig-agent-clusterrole.yaml, sysdig-agent-daemonset-v2.yaml, and sysdig-agent-configmap.yaml and deploy them as described in Agent Install: Kubernetes.

You need to deploy these YAMLs after installing the Sysdig agent on the same nodes, and in the same namespace (sysdig-agent by default).

When finished, you can Access the Node Analyzer Features.

Install with Helm

Use the “Sysdig” Helm chart, which installs the Sysdig agent and the Node Analyzer, with the following commands:

helm repo add sysdig https://charts.sysdig.com
helm repo update
helm install sysdig-agent --set global.sysdig.accessKey=ACCESS_KEY --set global.sysdig.region=SYSDIG_REGION sysdig/sysdig-deploy

To find the values:

  • global.sysdig.accessKey: This is the agent access key. You can retrieve this from Settings > Agent Installation in the Sysdig Secure UI.

  • global.sysdig.region: This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations.

Access the Node Analyzer Features

Log in to Sysdig Secure and check that the features are working as expected.

Confirm Image Analyzer

  1. Select Scanning > Image Results.

  2. Check for scanned container image results that originate with the Sysdig Node Image Analyzer.

Use Host Scanning

Check vulnerabilities in hosts or nodes, both for operating system packages (e.g. rpm, dpkg) and non-operating-system packages (e.g. Java packages, Ruby gems).

  1. Select Scanning > Hosts.

  2. Review the Host vulnerabilities listed.

Your active team scope is applied when loading host scanning results. Log in with the broadest team and user credentials to see the full report.

Use Benchmarks (Legacy Feature)

  1. Select Benchmarks > Tasks.

  2. Either configure a new task or review your upgraded tasks. Click a line item to see the associated benchmark report.

    Your active team scope is applied when loading benchmarks results. Log in with the broadest team and user credentials to see the full report.

Alternate Install Cases

The installation options above should be sufficient for the majority of users; the options below allow for customizations and special cases.

Running Node Analyzer Behind a Proxy

Depending on your organization’s network design, you may require the HTTP requests from Node Analyzer features to pass through a proxy in order to reach the Sysdig Secure backend. To do so, you must edit all three configmaps: sysdig-image-analyzer, sysdig-host-analyzer, and sysdig-benchmark-runner.

These are in the sysdig-agent namespace by default.

Configure the following variables:

  • http_proxy/https_proxy: Use the relevant proxy URL, e.g. http://my_proxy_address:8080.

    In most cases, it is enough to specify http_proxy, as it applies to HTTPS connections as well.

  • no_proxy: Use this parameter to exclude certain subnets from using the proxy, adding a comma-separated exclusion list, e.g. 127.0.0.1,localhost,192.168.0.0/16,172.16.0.0/12,10.0.0.0/8

If the proxy server requires authentication, it is possible to specify credentials in the URL, e.g. http://username:password@my_proxy:8080.
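As an illustrative sketch, the variables can be set in all three configmaps with kubectl patch (the proxy URL and exclusion list are example values):

for cm in sysdig-image-analyzer sysdig-host-analyzer sysdig-benchmark-runner; do
  # Merge the proxy variables into each configmap's data section.
  kubectl -n sysdig-agent patch configmap "$cm" --type merge \
    -p '{"data":{"http_proxy":"http://my_proxy_address:8080","no_proxy":"127.0.0.1,localhost,192.168.0.0/16,172.16.0.0/12,10.0.0.0/8"}}'
done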

Running in a Non-Kubernetes Environment

This is handled per-component.

Benchmarks (Non-Kubernetes)

It is possible to deploy the benchmark runner as a single Docker container:

docker run -d  \
 -v /:/host:ro \
 -v /tmp:/host/tmp \
 --privileged      \
 --network host \
 --pid host \
 -e BACKEND_ENDPOINT=https://<sysdig_backend_endpoint> \
 -e ACCESS_KEY=<Sysdig agent access key> \
 -e BACKEND_VERIFY_TLS=false \
 -e TAGS=<custom_tags> \
 quay.io/sysdig/compliance-benchmark-runner:latest

  • Note: If you don’t want to pass the access key directly via the command line, consider using an alternative method of passing environment variables, such as docker-compose (see the sketch after this list).

  • The BACKEND_ENDPOINT is only required for Sysdig on-prem or when using a Sysdig SaaS region other than US-EAST.

    For example, the EU SaaS endpoint would be: https://eu1.app.sysdig.com.

    See also: SaaS Regions and IP Ranges.

  • BACKEND_VERIFY_TLS=false is only needed if you are using an on-prem backend with a self-signed certificate.

  • TAGS: The list of tags for the host where the agent is installed. For example: “role:webserver, location:europe”, “role:webserver” or “webserver”.
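A minimal docker-compose sketch of the alternative mentioned in the note above (the service name and the SYSDIG_ACCESS_KEY shell variable are illustrative; the image and environment variables mirror the docker run example):

# docker-compose.yaml -- illustrative only; values are placeholders.
services:
  benchmark-runner:
    image: quay.io/sysdig/compliance-benchmark-runner:latest
    privileged: true
    network_mode: host
    pid: host
    volumes:
      - /:/host:ro
      - /tmp:/host/tmp
    environment:
      # Read the access key from the shell environment instead of the command line.
      ACCESS_KEY: ${SYSDIG_ACCESS_KEY}
      BACKEND_ENDPOINT: https://eu1.app.sysdig.com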

Image Analyzer (Non-Kubernetes)

It is also possible to run the image analyzer as a single Docker container:

docker run -d \
  -v /var/run:/var/run \
  --privileged \
  --network host \
  -e AM_COLLECTOR_ENDPOINT=https://<sysdig_backend_endpoint>/internal/scanning/scanning-analysis-collector \
  -e ACCESS_KEY=<Sysdig agent access key> \
  -e VERIFY_CERTIFICATE=false \
  quay.io/sysdig/node-image-analyzer:latest

  • Note: If you don’t want to pass the access key directly via the command line, consider using an alternative method of passing environment variables, such as docker-compose.

  • The AM_COLLECTOR_ENDPOINT is only required for Sysdig on-prem or when using a Sysdig SaaS region other than US-EAST.

    For example, the EU SaaS endpoint would be: https://eu1.app.sysdig.com/internal/scanning/scanning-analysis-collector.

    See also: SaaS Regions and IP Ranges.

  • VERIFY_CERTIFICATE=false is only needed if you are using an on-prem backend with a self-signed certificate.

Host Scanning (Non-Kubernetes)

To install the Host Scanning component in a non-Kubernetes environment, you can use:

docker run -d \
 -v /:/host:ro \
 --privileged \
 -e HOST_BASE=/host \
 -e AM_COLLECTOR_ENDPOINT=https://<sysdig_backend_endpoint>/internal/scanning/scanning-analysis-collector \
 -e ACCESS_KEY=<Sysdig agent access key> \
 -e VERIFY_CERTIFICATE=false \
 -e SCHEDULE=@dailydefault \
 quay.io/sysdig/host-analyzer:latest

  • Note: If you don’t want to pass the access key directly via the command line, consider using an alternative method of passing environment variables, such as docker-compose.

  • The AM_COLLECTOR_ENDPOINT is only required for Sysdig on-prem or when using a Sysdig SaaS region other than US-EAST.

    For example, the EU SaaS endpoint would be: https://eu1.app.sysdig.com/internal/scanning/scanning-analysis-collector.

    See also: SaaS Regions and IP Ranges.

  • VERIFY_CERTIFICATE=false is only needed if you are using an on-prem backend with a self-signed certificate.

  • TAGS: The list of tags for the host where the agent is installed. For example: “role:webserver, location:europe”, “role:webserver” or “webserver”.

For Image Analyzer Component Only

These cases affect only the Image Analyzer component of the Node Analyzer installation.

Installing Image Analyzer Component Alone

It is still possible to install the Image Analyzer component without Benchmarks or Host Scanning. This option normally applies only to users of the former Node Image Analyzer who want to upgrade just that component.

To do so, download sysdig-image-analyzer-daemonset.yaml and sysdig-image-analyzer-configmap.yaml and deploy them.

Deploy these YAMLs after installing the Sysdig agent, on the same nodes and in the same namespace (sysdig-agent by default).
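A minimal sketch of the deployment step, using the file names above:

kubectl apply -n sysdig-agent -f sysdig-image-analyzer-configmap.yaml -f sysdig-image-analyzer-daemonset.yaml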

Kubernetes Requiring Custom Socket Path

By default, the image analyzer will automatically detect the socket to mount from:

  • Docker socket from /var/run/docker/docker.sock

  • CRI-O socket from /var/run/crio/crio.sock

  • CRI-containerd socket from /var/run/containerd/containerd.sock

Some setups require the analyzer to use custom socket paths.

If the socket is located outside /var/run, the corresponding volume must be mounted as well. You can configure it via the single-line installer script or by manually editing the daemonset and configmap variables.

When using the installer, use the -cv option to mount an additional volume, and add -ds, -cs, or -cd to specify a Docker, CRI, or CRI-containerd socket, respectively.

See the script’s -help output for additional information.

Examples:

For K3S, which uses containerd, add:

-cd unix:///run/k3s/containerd/containerd.sock -cv /run/k3s/containerd

For Pivotal, which uses a custom path for the Docker socket, use:

-ds unix:///var/vcap/data/sys/run/docker/docker.sock -cv /var/vcap/data/sys/run/docker
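For instance, combined with the single-line installer shown earlier, the K3S flags would be appended like this (a sketch; substitute your API endpoint):

curl -s https://download.sysdig.com/stable/install-node-analyzer | sudo bash -s -- \
 --api_endpoint API_ENDPOINT \
 -cd unix:///run/k3s/containerd/containerd.sock -cv /run/k3s/containerd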

Daemonset Resource Limit Considerations

During regular operation, the Image Analyzer uses much less memory than the limit specified in the daemonset configuration. In some cases, however, processing an image may require more memory, depending, for example, on image size, content, or package types.

You can detect this issue by looking for abnormal spikes in the memory usage of Image Analyzer pods that are also showing analysis errors. In such cases, we recommend increasing the analyzer memory limit to up to three times the size of the unprocessed images, if the memory available in the cluster allows it.
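For example, the limit could be raised in the Image Analyzer daemonset container spec like this (a sketch; the 1536Mi value is an illustrative placeholder sized to your images):

# Fragment of the image analyzer daemonset container spec (illustrative values).
resources:
  limits:
    memory: 1536Mi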

Component Configurations

Image Analyzer Configmap Options

For special cases, the image analyzer can be configured by editing the sysdig-image-analyzer configmap in the sysdig-agent namespace with the following options:

  • docker_socket_path: The Docker socket path, defaulting to unix:///var/run/docker/docker.sock. If a custom path is specified, ensure it is correctly mounted from the host inside the container.

  • cri_socket_path: The socket path to a CRI-compatible runtime, such as CRI-O, defaulting to unix:///var/run/crio/crio.sock. If a custom path is specified, ensure it is correctly mounted from the host inside the container.

  • containerd_socket_path: The socket path to a CRI-containerd daemon, defaulting to unix:///var/run/containerd/containerd.sock. If a custom path is specified, ensure it is correctly mounted from the host inside the container.

  • collector_endpoint: The endpoint to the Scanning Analysis collector, specified in the following format: https://<API_ENDPOINT>/internal/scanning/scanning-analysis-collector

  • ssl_verify_certificate: Can be set to "false" to allow insecure connections to the Sysdig backend, such as for on-premises installs that use self-signed certificates. By default, certificates are always verified.

  • debug: Can be set to "true" to show debug logging, useful for troubleshooting.

  • http_proxy, https_proxy, no_proxy: Proxy configuration variables.
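As an illustrative sketch, an option such as debug can be toggled with kubectl (the rollout restart target assumes the daemonset is named sysdig-image-analyzer; adjust to your deployment):

# Merge the debug flag into the configmap's data section.
kubectl -n sysdig-agent patch configmap sysdig-image-analyzer --type merge \
  -p '{"data":{"debug":"true"}}'
# If the running pods do not pick up the change, restart them:
kubectl -n sysdig-agent rollout restart daemonset sysdig-image-analyzer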

Host Scanning Configuration Options

The analyzer component of the Host Scanning feature can be configured by editing the sysdig-host-analyzer configmap in the sysdig-agent namespace with the following options:

  • schedule: The scanning schedule specification for the host analyzer, expressed as a crontab string such as “5 4 * * *”. The default value of @dailydefault instructs the analyzer to automatically pick a schedule that starts shortly after it is deployed and performs a scan every 24 hours.

  • dirs_to_scan: The list of directories to inspect during the scan, expressed as a comma-separated list such as /etc,/var/lib/dpkg,/usr/local,/usr/lib/sysimage/rpm,/var/lib/rpm,/lib/apk/db

  • collector_endpoint: The endpoint to the Scanning Analysis collector, specified in the following format: https://<API_ENDPOINT>/internal/scanning/scanning-analysis-collector

  • max_send_attempts: The number of times the analysis collector is allowed to retry sending results if backend communication fails.

  • ssl_verify_certificate: Can be set to "false" to allow insecure connections to the Sysdig backend, such as for on-premises installs that use self-signed certificates. By default, certificates are always verified.

  • debug: Can be set to "true" to show debug logging, useful for troubleshooting.

  • http_proxy, https_proxy, no_proxy: Proxy configuration variables.

Benchmark Runner Configuration Options

The benchmark runner component can be configured by editing the sysdig-benchmark-runner configmap in the sysdig-agent namespace with the following options:

  • collector_endpoint: The Secure API endpoint, specified in the following format: https://<API_ENDPOINT>

  • ssl_verify_certificate: Can be set to "false" to allow insecure connections to the Sysdig backend, such as for on-premises installs that use self-signed certificates. By default, certificates are always verified.

  • debug: Can be set to "true" to show debug logging, useful for troubleshooting.

  • http_proxy, https_proxy, no_proxy: Proxy configuration variables.

7 - Install Admission Controller

If you have installed the CLI-based version of the Admission Controller, the UI-based version is not backwards-compatible. You will need to uninstall the old version and install the UI-based version instead.

To understand and use the Admission Controller after installing it, see Admission Controller.

For more technical documentation, see the Chart Documentation.

Prerequisites

  • Helm 3
  • Kubernetes 1.21 or higher

Install the Admission Controller

The component must be installed on each cluster where you want to use it.

  1. Make sure kubectl is pointing to the target cluster where the Admission Controller will be installed.

  2. Add and synchronize the Helm repository:

    helm repo add sysdig https://charts.sysdig.com
    helm repo update
    
  3. Install the Admission Controller on the target cluster with full capabilities, e.g.:

    helm install sysdig-admission-controller sysdig/admission-controller \
    --create-namespace -n sysdig-admission-controller \
    --set sysdig.secureAPIToken=$SYSDIG_API_TOKEN \
    --set clusterName=$CLUSTER_NAME \
    --set sysdig.url=https://$SYSDIG_SECURE_ENDPOINT \
    --set features.k8sAuditDetections=true 
    
  4. Check that installation was successful in the Sysdig UI.

    NOTE: Menu options are only available if the Admission Controller is enabled.

    Log in to Sysdig Secure and select Image Scanning > Admission Controller | Policy Assignments.

    The Admission Controller is disabled by default in your cluster, to avoid accidentally blocking deployments.
    The cluster will be displayed in the Connected list as healthy, but Disabled (gray colored dot).
    Manually enable it by toggling the Enabled flag; the status should change accordingly (green colored dot).
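As a complementary check from the cluster side, you can confirm the pods are running (a minimal sanity check, assuming the namespace used in the install command above):

kubectl get pods -n sysdig-admission-controller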

Installation Parameters

The following parameters are the most common ones; for the full list of available parameters and specific use cases, see the Chart Documentation.

  • --create-namespace: If supplied, creates the namespace if it does not already exist
  • --namespace: Desired namespace where the Admission Controller will be installed
  • --set sysdig.secureAPIToken: Sysdig Secure API token as found in the Sysdig UI under Settings/User Profile. Note that this user must have administrator rights
  • --set clusterName: User-defined name for this cluster that will appear in the admission controller interface in Sysdig’s backend. The cluster name needs to match the agent cluster name.
  • --set sysdig.url: Sysdig endpoint. Default https://secure.sysdig.com is for the us-east region.
    • For us-west use https://us2.app.sysdig.com
    • For European Union, use https://eu1.app.sysdig.com
    • For APAC, use https://app.au1.sysdig.com
    • For US4 (our west Google cloud region) use https://app.us4.sysdig.com/
    • For on-prem, use your own endpoints.
    • See also SaaS Regions and IP Ranges.
  • --set features.k8sAuditDetections: (true/false) Set true to enable Kubernetes audit logging via the Admission Controller. See also: Kubernetes Audit Logging (legacy installation) and Select the Policy Type (Kubernetes Audit Policies)
  • --set verifySSL: (true/false) Sets the verification of the Sysdig Secure API; default: true (we recommend only changing this to false when doing initial testing / evaluation of an on-premises installation)
  • --set scanner.verifyRegistryTLS: (true/false) Verify TLS from registries on image pull; default: true (we recommend only changing this to false when doing initial testing / evaluation)
  • --set scanner.psp.create: (true/false) Whether to create a psp policy and role / role-binding; default: false

Enable in Sysdig Labs

  1. Log in to Sysdig Secure as administrator and select Settings|User Profile.

  2. Under Sysdig Labs, enable the Admission Controller feature and click Save.

    The links to the Admission Controller pages will appear under Image Scanning in the left-hand navigation. If you don’t see the options, you are either not an admin user or the legacy scanning engine is not enabled on your Sysdig instance. Discuss the situation with your account representative, or open a support case.

Upgrades

Upgrading from Scanning-Only Admission Controller

If you already have the Sysdig Admission Controller installed and want to upgrade:

helm upgrade sysdig-admission-controller sysdig/admission-controller \
-n sysdig-admission-controller \
--set features.k8sAuditDetections=true \
--reuse-values

For those customers who already have the Admission Controller AND have already enabled Kubernetes audit logging via the legacy method, you can still install or upgrade to the new Admission Controller. Just be sure to set features.k8sAuditDetections=false to avoid collecting and displaying duplicate events.
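A mirrored upgrade command for that case (identical to the one above, with the flag flipped to false):

helm upgrade sysdig-admission-controller sysdig/admission-controller \
-n sysdig-admission-controller \
--set features.k8sAuditDetections=false \
--reuse-values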

Uninstall the CLI-based Version

If you have installed the CLI-based version of the Admission Controller, the UI-based version is not backwards-compatible. You will need to uninstall the old version and install the UI-based version.

Run the following:

helm uninstall -n sysdig-admission-controller sysdig-admission-controller

Troubleshooting