Agent Install: Mesos | Marathon | DCOS

Marathon is the container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.

This guide describes how to install the Sysdig agent container on each underlying host in your Mesos cluster. Once installed, the agent will automatically connect to the Mesos and Marathon APIs to pull relevant metadata about the environment and will begin monitoring all of your hosts, apps, containers, and frameworks.

Prerequisites

  • Review the Agent Installation Requirements.

  • Collect the configuration parameters. You will need them to create the JSON file.

    • ACCESS_KEY: Your unique access key string. If you cannot retrieve the key, the administrator of your instance might have turned off key access for non-admin users. Contact your Sysdig administrator to receive the key. If you still have issues, contact Sysdig Support.

    • COLLECTOR: The collector URL for Sysdig Monitor or Sysdig Secure. This value is region-dependent in SaaS and is auto-completed on the Get Started page in the UI. It is a custom value in on-prem installations. See SaaS Regions and IP Ranges.

    • COLLECTOR_PORT: The default is 6443. It is used in environments with Sysdig’s on-premises backend installed.

    • SECURE: Use a secure SSL/TLS connection to send metrics to the collector. It is used in environments with Sysdig’s on-premises backend installed.

    • CHECK_CERT: Determines whether the agent performs a strict SSL/TLS certificate check when connecting to a Sysdig Monitor on-premises installation. Set to true when using SSL/TLS to connect to the collector service to ensure that a valid SSL/TLS certificate is installed. It is used in environments with Sysdig’s on-premises backend installed.
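
Before writing the JSON file, it can help to confirm that the parameters above are actually set. The following is a minimal shell sketch, not part of the Sysdig tooling. The function name check_agent_params is hypothetical, and the fallback values for SECURE and CHECK_CERT simply mirror the sample JSON in this guide rather than any documented agent default.

```shell
# Hypothetical preflight check (not a Sysdig tool): confirm the parameters
# collected above are present before writing the Marathon JSON file.
check_agent_params() {
  if [ -z "${ACCESS_KEY:-}" ]; then
    echo "ACCESS_KEY is not set" >&2
    return 1
  fi
  # COLLECTOR_PORT falls back to 6443, the default stated in this guide.
  # SECURE and CHECK_CERT fall back to the values used in this guide's sample JSON.
  echo "collector=${COLLECTOR:-unset} port=${COLLECTOR_PORT:-6443} secure=${SECURE:-true} check_cert=${CHECK_CERT:-false}"
}
```

Export the variables in your shell and call check_agent_params. It prints a one-line summary, or returns a non-zero status if the access key is missing.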

Installation

In this three-part installation, you:

  • Deploy the Sysdig agent on all Mesos Agent (Slave) nodes, either automatically or by creating and posting a .json file to the leader Marathon API server.

  • Deploy the Sysdig agent on the Mesos Master nodes.

  • Special configuration steps: modify the Sysdig agent config file to monitor Marathon instances.

Deploy the Sysdig Agent on Mesos Agent Nodes

Preferred Option: Automatic install (DC/OS 1.11+)

If you’re using DC/OS 1.8 or higher, you can find the Sysdig agent in the Mesosphere Universe marketplace and install it from there.

It will automatically deploy the Sysdig agent container on each of your Mesos Agent nodes as a Marathon app.

Proceed to Deploy the Sysdig Agent on Master Nodes.

Alternate Option: Post a .json file

If you are using a version of DC/OS earlier than 1.8:

  1. Create a JSON file for Marathon, in the following format. See configuration parameters for details.

    {
      "backoffFactor": 1.15,
      "backoffSeconds": 1,
      "constraints": [
        [
          "hostname",
          "UNIQUE"
        ]
      ],
      "container": {
        "docker": {
          "forcePullImage": true,
          "image": "sysdig/agent",
          "parameters": [],
          "privileged": true
        },
        "type": "DOCKER",
        "volumes": [
          {
            "containerPath": "/host/var/run/docker.sock",
            "hostPath": "/var/run/docker.sock",
            "mode": "RW"
          },
          {
            "containerPath": "/host/dev",
            "hostPath": "/dev",
            "mode": "RW"
          },
          {
            "containerPath": "/host/proc",
            "hostPath": "/proc",
            "mode": "RO"
          },
          {
            "containerPath": "/host/boot",
            "hostPath": "/boot",
            "mode": "RO"
          },
          {
            "containerPath": "/host/lib/modules",
            "hostPath": "/lib/modules",
            "mode": "RO"
          },
          {
            "containerPath": "/host/usr",
            "hostPath": "/usr",
            "mode": "RO"
          }
        ]
      },
      "cpus": 1,
      "deployments": [],
      "disk": 0,
      "env": {
        "ACCESS_KEY": "YOUR-ACCESS-KEY-HERE",
        "CHECK_CERT": "false",
        "SECURE": "true",
        "TAGS": "example_tag:example_value",
        "name": "sdc-agent",
        "pid": "host",
        "role": "monitoring",
        "shm-size": "350m"
      },
      "executor": "",
      "gpus": 0,
      "id": "/sysdig-agent",
      "instances": 1,
      "killSelection": "YOUNGEST_FIRST",
      "labels": {},
      "lastTaskFailure": {
        "appId": "/sysdig-agent",
        "host": "YOUR-HOST",
        "message": "Container exited with status 70",
        "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
        "state": "TASK_FAILED",
        "taskId": "sysdig-agent.3bb0759d-3fa3-11e9-b446-c60a7a2ee871",
        "timestamp": "2019-03-06T00:03:16.234Z",
        "version": "2019-03-06T00:01:57.182Z"
      },
      "maxLaunchDelaySeconds": 3600,
      "mem": 850,
      "networks": [
        {
          "mode": "host"
        }
      ],
      "portDefinitions": [
        {
          "name": "default",
          "port": 10101,
          "protocol": "tcp"
        }
      ],
      "requirePorts": false,
      "tasks": [
        {
          "appId": "/sysdig-agent",
          "healthCheckResults": [],
          "host": "YOUR-HOST-IP",
          "id": "sysdig-agent.0d5436f4-3fa4-11e9-b446-c60a7a2ee871",
          "ipAddresses": [
            {
              "ipAddress": "YOUR-HOST-IP",
              "protocol": "IPv4"
            }
          ],
          "localVolumes": [],
          "ports": [
            4764
          ],
          "servicePorts": [],
          "slaveId": "1fa6f2fc-95b0-445f-8b97-7f91c1321250-S2",
          "stagedAt": "2019-03-06T00:09:04.232Z",
          "startedAt": "2019-03-06T00:09:06.912Z",
          "state": "TASK_RUNNING",
          "version": "2019-03-06T00:09:04.182Z"
        }
      ],
      "tasksHealthy": 0,
      "tasksRunning": 1,
      "tasksStaged": 0,
      "tasksUnhealthy": 0,
      "unreachableStrategy": {
        "expungeAfterSeconds": 0,
        "inactiveAfterSeconds": 0
      },
      "upgradeStrategy": {
        "maximumOverCapacity": 1,
        "minimumHealthCapacity": 1
      },
      "version": "2019-03-06T00:09:04.182Z",
      "versionInfo": {
        "lastConfigChangeAt": "2019-03-06T00:09:04.182Z",
        "lastScalingAt": "2019-03-06T00:09:04.182Z"
      }
    }
    
    

    See Environment Variables for Agent Config File for the Sysdig name:value definitions.

    Complete the “cpus”, “mem” and “labels” (i.e. Marathon labels) entries to fit the capacity and requirements of the cluster environment.

  2. Post the created .json file to the leader Marathon API server:

    $ curl -X POST http://$(hostname -i):8080/v2/apps -d @sysdig.json -H "Content-type: application/json"
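
The POST step can also be wrapped so that the HTTP status code returned by Marathon is visible. This is a sketch under assumptions: marathon_apps_url and post_app are hypothetical helper names, port 8080 is taken from the command in this step, and the -s, -o, and -w options are standard curl options.

```shell
# Build the Marathon apps endpoint; port 8080 is the port used in this guide.
marathon_apps_url() {
  # $1 = Marathon host, $2 = optional port
  printf 'http://%s:%s/v2/apps' "$1" "${2:-8080}"
}

# Post an app definition and print only the HTTP status code of the response.
post_app() {
  # $1 = JSON file, $2 = Marathon host, $3 = optional port
  curl -s -o /dev/null -w '%{http_code}' -X POST \
    -H 'Content-type: application/json' \
    -d @"$1" "$(marathon_apps_url "$2" "$3")"
}
```

For example, post_app sysdig.json "$(hostname -i)" prints the status code. A 2xx value suggests that Marathon accepted the definition, while a 4xx value usually points to a problem in the JSON file.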
    

Deploy the Sysdig Agent on Master Nodes

After deploying the agent to the Mesos Agent nodes, you will install agents on each of the Mesos Master nodes as well.

If any cluster node has both Mesos Master and Mesos Agent roles, do not perform this installation step on that node. It already has a Sysdig agent installed from the Deploy the Sysdig Agent on Mesos Agent Nodes procedure. Running duplicate Sysdig agents on a node will cause errors.

Use the Agent Install: Non-Orchestrated instructions to install the agent directly on each of your Mesos Master nodes.

When the Sysdig agent is successfully installed on the master nodes, it will automatically connect to the local Mesos and Marathon (if available) API servers via http://localhost:5050 and http://localhost:8080 respectively, to collect cluster configuration and current state metadata in addition to host metrics.

Additional Configuration

In certain situations, you may need to add configuration to the dragent.yaml file:

  • If the Sysdig agent cannot be run directly on the Mesos API server.

  • If the API server is protected with a username/password.

Descriptions and examples are shown below.

Sysdig Agent Unable to Run on the Mesos API Server

Mesos allows multiple masters. If the API server cannot be instrumented with a Sysdig agent, designate ONE other node that has an agent installed to remotely receive infrastructure information from the API server.

NOTE: If you manually configure the agent to point to a master with a static configuration file entry, then automatic detection/following of leader changes will no longer be enabled.

Add the following Mesos parameter to the delegated agent’s dragent.yaml file to allow it to connect to the remote API server and authenticate, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Configuration for details.

Specify the API server’s connection method, address, and port. Also specify credentials if necessary.

YAML example:

mesos_state_uri: http://[acct:passwd@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

Although marathon_uris: is an array, currently only a single “root” Marathon framework per cluster is supported. Multiple side-by-side “root” Marathon frameworks on the same cluster are not supported, and the agent will not function properly if they are configured. The only supported multiple-Marathon configuration is one “root” Marathon with other Marathon frameworks running as its apps.
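
For option b, the multi-line YAML has to be collapsed into one string. The sketch below joins the lines with a literal \n sequence, which is an assumption about the format ADDITIONAL_CONF expects. Confirm the exact format in Understanding the Agent Configuration. The function name yaml_to_single_line is hypothetical.

```shell
# Hypothetical helper: read YAML on stdin and print it as a single line,
# joining the original lines with a literal backslash-n sequence.
yaml_to_single_line() {
  out=''
  first=1
  # The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$first" -eq 1 ]; then
      out=$line
      first=0
    else
      out="$out\\n$line"
    fi
  done
  printf '%s' "$out"
}
```

Save the YAML fragment to a file, run yaml_to_single_line with that file on standard input, and pass the result as the value of ADDITIONAL_CONF. Indentation inside the fragment is preserved, which matters for the list entries under marathon_uris.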

Mesos API Server Requires Authentication

If the agent is installed on the API server but the API server uses a different port or requires authentication, those parameters must be explicitly specified.

Add the following Mesos parameters to the dragent.yaml file on the API server so that the agent connects to the API server and authenticates with the required account and password, either by:

a. Directly editing dragent.yaml on the host, or

b. Converting the YAML code to a single-line format and adding it as an ADDITIONAL_CONF argument in a Docker command.

See Understanding the Agent Configuration for details.

Specify the API server’s protocol, user credentials, and port:

mesos_state_uri: http://[username:password@][hostname][:port]
marathon_uris:
  - http://[acct:passwd@][hostname][:port]

The HTTPS protocol is also supported.

Troubleshooting: Turning Off Metadata Reception

In troubleshooting cases where auto-detection and reporting of your Mesos infrastructure need to be temporarily turned off in a designated agent:

  1. Comment out the Mesos parameter entries in the agent’s dragent.yaml file.

    Example parameters to disable: mesos_state_uri, marathon_uris

  2. If the agent is running on the API server (Master node) and auto-detecting a default configuration, you can add the line:

    mesos_autodetect: false

    either directly in the dragent.yaml file or as an ADDITIONAL_CONF parameter in a Docker command.

  3. Restart the agent.
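
Step 1 can be sketched as a small filter that comments out the two Mesos entries. This is an illustration only: disable_mesos_params is a hypothetical name, and the sketch assumes the keys sit at the top level of the file with the marathon_uris list entries on the lines directly below the key. It prints the result to standard output and does not modify the file.

```shell
# Hypothetical helper: print a copy of a dragent.yaml file with the
# mesos_state_uri and marathon_uris entries commented out.
disable_mesos_params() {
  # $1 = path to the dragent.yaml file
  awk '
    /^mesos_state_uri:/      { print "#" $0; in_list = 0; next }
    /^marathon_uris:/        { print "#" $0; in_list = 1; next }
    in_list && /^[ \t]*-/    { print "#" $0; next }   # list items under marathon_uris
                             { in_list = 0; print }   # everything else is unchanged
  ' "$1"
}
```

Redirect the output to a new file, review it, and then replace the original dragent.yaml before restarting the agent.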



Last modified September 23, 2022