Agent Install: Mesos | Marathon | DCOS

Marathon is the container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.

This guide describes how to install the Sysdig agent container on each underlying host in your Mesos cluster. Once installed, the agent will automatically connect to the Mesos and Marathon APIs to pull relevant metadata about the environment and will begin monitoring all of your hosts, apps, containers, and frameworks.

Standard Installation Instructions

Review the Host Requirements for Agent Installation.

In this three-part installation, you:

  • Deploy the Sysdig agent on all Mesos Agent (aka “Slave”) nodes, either automatically or by creating and posting a .json file to the leader Marathon API server.

  • Deploy the Sysdig agent on the Mesos Master nodes.

  • Special configuration steps: modify the Sysdig agent config file to monitor Marathon instances.

Deploy the Sysdig agent on your Mesos Agent nodes

Preferred Option: Automatic install (DC/OS 1.11+)

If you’re using DC/OS 1.8 or higher, then you can find Sysdig in the Mesosphere Universe marketplace and install it from there.

It will automatically deploy the Sysdig agent container on each of your Mesos Agent nodes as a Marathon app.

Proceed to Deploy the Sysdig Agent.

Alternate Option: Post a .json file

If you are using a version of DC/OS earlier than 1.8 then:

  1. Create a JSON file for Marathon, in the following format.

    The COLLECTOR address comes from your own environment in on-prem installations. For SaaS installations, find the collector endpoint for your region listed here.

    COLLECTOR_PORT, SECURE, and CHECK_CERT are used in environments with Sysdig’s on-premises backend installed.

    {
      "backoffFactor": 1.15,
      "backoffSeconds": 1,
      "constraints": [],
      "container": {
        "docker": {
          "forcePullImage": true,
          "image": "sysdig/agent",
          "parameters": [],
          "privileged": true
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/host/var/run/docker.sock",
            "hostPath": "/var/run/docker.sock",
            "mode": "RW"
          },
          {
            "containerPath": "/host/dev",
            "hostPath": "/dev",
            "mode": "RW"
          },
          {
            "containerPath": "/host/proc",
            "hostPath": "/proc",
            "mode": "RO"
          },
          {
            "containerPath": "/host/boot",
            "hostPath": "/boot",
            "mode": "RO"
          },
          {
            "containerPath": "/host/lib/modules",
            "hostPath": "/lib/modules",
            "mode": "RO"
          },
          {
            "containerPath": "/host/usr",
            "hostPath": "/usr",
            "mode": "RO"
          }
        ]
      },
      "cpus": 1,
      "deployments": [],
      "disk": 0,
      "env": {
        "CHECK_CERT": "false",
        "SECURE": "true",
        "TAGS": "example_tag:example_value",
        "name": "sdc-agent",
        "pid": "host",
        "role": "monitoring",
        "shm-size": "350m"
      },
      "executor": "",
      "gpus": 0,
      "id": "/sysdig-agent",
      "instances": 1,
      "killSelection": "YOUNGEST_FIRST",
      "labels": {},
      "lastTaskFailure": {
        "appId": "/sysdig-agent",
        "host": "YOUR-HOST",
        "message": "Container exited with status 70",
        "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
        "state": "TASK_FAILED",
        "taskId": "sysdig-agent.3bb0759d-3fa3-11e9-b446-c60a7a2ee871",
        "timestamp": "2019-03-06T00:03:16.234Z",
        "version": "2019-03-06T00:01:57.182Z"
      },
      "maxLaunchDelaySeconds": 3600,
      "mem": 850,
      "networks": [
        {
          "mode": "host"
        }
      ],
      "portDefinitions": [
        {
          "name": "default",
          "port": 10101,
          "protocol": "tcp"
        }
      ],
      "requirePorts": false,
      "tasks": [
        {
          "appId": "/sysdig-agent",
          "healthCheckResults": [],
          "host": "YOUR-HOST-IP",
          "id": "sysdig-agent.0d5436f4-3fa4-11e9-b446-c60a7a2ee871",
          "ipAddresses": [
            {
              "ipAddress": "YOUR-HOST-IP",
              "protocol": "IPv4"
            }
          ],
          "localVolumes": [],
          "ports": [],
          "servicePorts": [],
          "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
          "stagedAt": "2019-03-06T00:09:04.232Z",
          "startedAt": "2019-03-06T00:09:06.912Z",
          "state": "TASK_RUNNING",
          "version": "2019-03-06T00:09:04.182Z"
        }
      ],
      "tasksHealthy": 0,
      "tasksRunning": 1,
      "tasksStaged": 0,
      "tasksUnhealthy": 0,
      "unreachableStrategy": {
        "expungeAfterSeconds": 0,
        "inactiveAfterSeconds": 0
      },
      "upgradeStrategy": {
        "maximumOverCapacity": 1,
        "minimumHealthCapacity": 1
      },
      "version": "2019-03-06T00:09:04.182Z",
      "versionInfo": {
        "lastConfigChangeAt": "2019-03-06T00:09:04.182Z",
        "lastScalingAt": "2019-03-06T00:09:04.182Z"
      }
    }

    See Table 1: Environment Variables for Agent Config File for the Sysdig name:value definitions.

    Complete the “cpus”, “mem” and “labels” (i.e. Marathon labels) entries to fit the capacity and requirements of the cluster environment.

  2. Post the .json file you created (sysdig.json in this example) to the leader Marathon API server:

    $ curl -X POST http://$(hostname -i):8080/v2/apps -d @sysdig.json -H "Content-type: application/json"
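
Once the file is posted, one way to confirm that Marathon accepted the app is to read it back from the same API. The following is a minimal sketch, assuming the app ID /sysdig-agent from the JSON above and a Marathon leader listening on port 8080 of the current host. The helper name marathon_app_url is hypothetical and only builds the URL.

```shell
#!/bin/sh
# Hypothetical helper: build the Marathon REST URL for one app.
#   $1 = Marathon base URL, $2 = app ID (leading slash optional)
marathon_app_url() {
  base="${1%/}"   # strip one trailing slash from the base URL
  app="${2#/}"    # strip one leading slash from the app ID
  printf '%s/v2/apps/%s\n' "$base" "$app"
}

# Example call (requires a reachable Marathon leader, so it is left commented out):
#   curl -s "$(marathon_app_url "http://$(hostname -i):8080" /sysdig-agent)"
```

In the response, fields such as tasksRunning and tasksUnhealthy (which also appear in the JSON above) should indicate whether the agent tasks have started.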

Deploy the Sysdig Agent

After deploying the agent to the Mesos Agent nodes, you will install agents on each of the Mesos Master nodes as well.

If any cluster node has both Mesos Master and Mesos Agent roles, do not perform this installation step on that node. It will already have a Sysdig agent installed from the earlier procedure for the Mesos Agent nodes. Running duplicate Sysdig agents on a node will cause errors.

Use the Agent Install: Non-Orchestrated instructions to install the agent directly on each of your Mesos Master nodes.

When the Sysdig agent is successfully installed on the master nodes, it will automatically connect to the local Mesos and Marathon (if available) API servers via http://localhost:5050 and http://localhost:8080 respectively, to collect cluster configuration and current state metadata in addition to host metrics.
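
If the cluster metadata does not appear, it can help to check that both local APIs answer on the master node. The following is a rough sketch, not an official Sysdig procedure: the helper names are hypothetical, and the /state (Mesos) and /v2/info (Marathon) paths are assumptions about which endpoints are convenient to probe, not a statement of what the agent itself queries.

```shell
#!/bin/sh
# Print the HTTP status code returned by a URL (curl prints 000 if no response).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Report whether an endpoint answers with HTTP 200.
#   $1 = label to print, $2 = URL to probe
check_endpoint() {
  name="$1"
  url="$2"
  code="$(http_status "$url" || true)"   # tolerate connection failures
  if [ "$code" = "200" ]; then
    echo "$name: reachable"
  else
    echo "$name: not reachable (HTTP $code)"
  fi
}

# Example calls on a master node (left commented out):
#   check_endpoint "Mesos"    "http://localhost:5050/state"
#   check_endpoint "Marathon" "http://localhost:8080/v2/info"
```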

Special Configuration Steps

In certain situations, you may need to add configuration entries to the dragent.yaml file:

  • If the Sysdig agent cannot be run directly on the Mesos API server.

  • If the API server is protected with a username/password.

Descriptions and examples are shown below.

If the Sysdig Agent Cannot Run On the Mesos API Server

Mesos allows multiple masters. If the API server cannot be instrumented with a Sysdig agent, designate ONE other node with an agent installed to remotely receive infrastructure information from the API server.

NOTE: If you manually configure the agent to point to a master with a static configuration file entry, then automatic detection/following of leader changes will no longer be enabled.

Add the following Mesos parameter to the delegated agent’s dragent.yaml file to allow it to connect to the remote API server and authenticate, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Config Files for details.

Specify the API server’s connection method, address, and port. Also specify credentials if necessary.

YAML example:

mesos_state_uri: http://[acct:passwd@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

Although marathon_uris: is an array, only a single “root” Marathon framework per cluster is currently supported. Do not configure multiple side-by-side “root” Marathon frameworks on the same cluster, or the agent will not function properly. The only supported multiple-Marathon configuration is one “root” Marathon with other Marathon frameworks running as its apps.
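
For option b, the multi-line YAML has to be collapsed into one string before it can be passed as ADDITIONAL_CONF. The following is a minimal sketch, assuming the single-line form separates YAML lines with a literal \n sequence and keeps the leading spaces for indentation (see Understanding the Agent Config Files for the authoritative format). The function name yaml_to_single_line and the host name master.example.com are hypothetical.

```shell
#!/bin/sh
# Join the lines of a YAML snippet read from stdin into one line,
# separating them with a literal backslash-n sequence (not a real newline).
yaml_to_single_line() {
  awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { print "" }'
}

# Example: convert a Mesos/Marathon snippet (placeholder host name).
yaml_to_single_line <<'EOF'
mesos_state_uri: http://master.example.com:5050
marathon_uris:
  - http://master.example.com:8080
EOF
```

The printed string can then be supplied as the value of ADDITIONAL_CONF in the Docker command, quoted so that the shell does not alter it.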

If the Mesos API server requires authentication

If the agent is installed on the API server but the API server uses a different port or requires authentication, those parameters must be explicitly specified.

Add the following Mesos parameters to the API server’s dragent.yaml to make it connect to the API server and authenticate with any unique account and password, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Config Files for details.

Specify the API server’s protocol, user credentials, and port:

mesos_state_uri: http://[username:password@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

*HTTPS protocol is also supported.

Troubleshooting: Turning Off Metadata Reception

In troubleshooting cases where auto-detection and reporting of your Mesos infrastructure need to be temporarily turned off in a designated agent:

  1. Comment out the Mesos parameter entries in the agent’s dragent.yaml file.

    Example parameters to disable: mesos_state_uri, marathon_uris

  2. If the agent is running on the API server (Master node) and auto-detecting a default configuration, you can add the line:

    mesos_autodetect: false

    either directly in the dragent.yaml file or as an ADDITIONAL_CONF parameter in a Docker command.

  3. Restart the agent.
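
Steps 2 and 3 can be sketched as a small script. This is an illustration under stated assumptions, not an official procedure: the function name disable_mesos_autodetect is hypothetical, the config path /opt/draios/etc/dragent.yaml and the dragent service name apply to a typical host install, and the container name sysdig-agent is a placeholder.

```shell
#!/bin/sh
# Append "mesos_autodetect: false" to a config file unless the key is already set.
#   $1 = path to dragent.yaml
disable_mesos_autodetect() {
  conf="$1"
  if grep -q '^mesos_autodetect:' "$conf"; then
    echo "mesos_autodetect is already set in $conf; edit it by hand" >&2
    return 1
  fi
  # Make sure the file ends with a newline before appending.
  if [ -n "$(tail -c 1 "$conf")" ]; then
    echo >> "$conf"
  fi
  printf 'mesos_autodetect: false\n' >> "$conf"
}

# Example calls (assumed path and names; left commented out):
#   disable_mesos_autodetect /opt/draios/etc/dragent.yaml
#   sudo service dragent restart     # host install
#   docker restart sysdig-agent      # containerized agent, placeholder name
```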

Last modified June 23, 2022