1 - Mesos Agent Metrics

See Application Integrations for more information.

mesos.slave.cpus_percent

The percentage of CPUs allocated to the slave.

mesos.slave.cpus_total

The total number of CPUs.

mesos.slave.cpus_used

The number of CPUs allocated to the slave.
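
The three CPU metrics above are related to one another. As a minimal sketch, assuming the percentage is the used amount divided by the total (the formula is an assumption, not stated in this reference), the relationship can be written as follows. The helper name `percent_used` is hypothetical and is not part of any Mesos API.

```python
def percent_used(used: float, total: float) -> float:
    """Return used/total as a percentage; 0.0 when total is not positive.

    Assumption: a *_percent metric equals 100 * *_used / *_total.
    """
    if total <= 0:
        return 0.0
    return 100.0 * used / total


# Illustrative values: 1.5 CPUs allocated out of 4 CPUs in total.
print(percent_used(1.5, 4.0))  # 37.5
```

Under the same assumption, the calculation would apply to the disk and memory metrics (disk_used and disk_total, mem_used and mem_total).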

mesos.slave.disk_percent

The percentage of disk space allocated to the slave.

mesos.slave.disk_total

The total disk space available.

mesos.slave.disk_used

The amount of disk space allocated to the slave.

mesos.slave.executors_registering

The number of executors registering.

mesos.slave.executors_running

The number of executors currently running.

mesos.slave.executors_terminated

The number of terminated executors.

mesos.slave.executors_terminating

The number of terminating executors.

mesos.slave.frameworks_active

The number of active frameworks.

mesos.slave.invalid_framework_messages

The number of invalid framework messages.

mesos.slave.invalid_status_updates

The number of invalid status updates.

mesos.slave.mem_percent

The percentage of memory allocated to the slave.

mesos.slave.mem_total

The total memory available.

mesos.slave.mem_used

The amount of memory allocated to the slave.

mesos.slave.recovery_errors

The number of errors encountered during slave recovery.

mesos.slave.tasks_failed

The number of failed tasks.

mesos.slave.tasks_finished

The number of finished tasks.

mesos.slave.tasks_killed

The number of killed tasks.

mesos.slave.tasks_lost

The number of lost tasks.

mesos.slave.tasks_running

The number of running tasks.

mesos.slave.tasks_staging

The number of staging tasks.

mesos.slave.tasks_starting

The number of starting tasks.

mesos.slave.valid_framework_messages

The number of valid framework messages.

mesos.slave.valid_status_updates

The number of valid status updates.

mesos.state.task.cpu

The task CPU.

mesos.state.task.disk

The disk space available for the task.

mesos.state.task.mem

The amount of memory used by the task.

mesos.stats.registered

Indicates whether this slave is registered with a master.

mesos.stats.system.cpus_total

The total number of CPUs available.

mesos.stats.system.load_1min

The average load for the last minute.

mesos.stats.system.load_5min

The average load for the last five minutes.

mesos.stats.system.load_15min

The average load for the last 15 minutes.

mesos.stats.system.mem_free_bytes

The amount of free memory, in bytes.

mesos.stats.system.mem_total_bytes

The total amount of memory, in bytes.

mesos.stats.uptime_secs

The current uptime of the slave, in seconds.

2 - Mesos Master Metrics

See Application Integrations for more information.

mesos.cluster.cpus_percent

The percentage of CPUs allocated to the cluster.

mesos.cluster.cpus_total

The total number of CPUs.

mesos.cluster.cpus_used

The number of CPUs used by the cluster.

mesos.cluster.disk_percent

The percentage of disk space allocated to the cluster.

mesos.cluster.disk_total

The total amount of disk space.

mesos.cluster.disk_used

The amount of disk space used by the cluster.

mesos.cluster.dropped_messages

The number of dropped messages.

mesos.cluster.event_queue_dispatches

The number of dispatches in the event queue.

mesos.cluster.event_queue_http_requests

The number of HTTP requests in the event queue.

mesos.cluster.event_queue_messages

The number of messages in the event queue.

mesos.cluster.frameworks_active

The number of active frameworks.

mesos.cluster.frameworks_connected

The number of connected frameworks.

mesos.cluster.frameworks_disconnected

The number of disconnected frameworks.

mesos.cluster.frameworks_inactive

The number of inactive frameworks.

mesos.cluster.gpus_total

The total number of GPUs.

mesos.cluster.invalid_framework_to_executor_messages

The number of invalid messages between the framework and the executor.

mesos.cluster.invalid_status_update_acknowledgements

The number of invalid status update acknowledgements.

mesos.cluster.invalid_status_updates

The number of invalid status updates.

mesos.cluster.mem_percent

The percentage of memory allocated to the cluster.

mesos.cluster.mem_total

The total amount of memory available.

mesos.cluster.mem_used

The amount of memory the cluster is using.

mesos.cluster.outstanding_offers

The number of outstanding resource offers.

mesos.cluster.slave_registrations

The number of slaves able to rejoin the cluster after a disconnect.

mesos.cluster.slave_removals

The number of slaves that have been removed for any reason, including maintenance.

mesos.cluster.slave_reregistrations

The number of slaves that have re-registered.

mesos.cluster.slave_shutdowns_canceled

The number of slave shutdown processes that have been canceled.

mesos.cluster.slave_shutdowns_scheduled

The number of slaves that have failed health checks and are scheduled for removal.

mesos.cluster.slaves_active

The number of active slaves.

mesos.cluster.slaves_connected

The number of connected slaves.

mesos.cluster.slaves_disconnected

The number of disconnected slaves.

mesos.cluster.slaves_inactive

The number of inactive slaves.

mesos.cluster.tasks_error

The number of cluster tasks that resulted in an error.

mesos.cluster.tasks_failed

The number of failed cluster tasks.

mesos.cluster.tasks_finished

The number of completed cluster tasks.

mesos.cluster.tasks_killed

The number of killed cluster tasks.

mesos.cluster.tasks_lost

The number of lost cluster tasks.

mesos.cluster.tasks_running

The number of cluster tasks currently running.

mesos.cluster.tasks_staging

The number of cluster tasks currently staging.

mesos.cluster.tasks_starting

The number of cluster tasks starting.

mesos.cluster.valid_framework_to_executor_messages

The number of valid messages between the framework and the executor.

mesos.cluster.valid_status_update_acknowledgements

The number of valid status update acknowledgements.

mesos.cluster.valid_status_updates

The number of valid status updates.

mesos.framework.cpu

The CPU of the Mesos framework.

mesos.framework.disk

The total disk space of the Mesos framework, measured in mebibytes.

mesos.framework.mem

The total memory of the Mesos framework, measured in mebibytes.

mesos.registrar.queued_operations

The number of queued operations.

mesos.registrar.registry_size_bytes

The size of the Mesos registry in bytes.

mesos.registrar.state_fetch_ms

The Mesos registry’s read latency, in milliseconds.

mesos.registrar.state_store_ms

The Mesos registry’s write latency, in milliseconds.

mesos.registrar.state_store_ms.count

The Mesos registry’s write count.

mesos.registrar.state_store_ms.max

The maximum write latency for the registry, in milliseconds.

mesos.registrar.state_store_ms.min

The minimum write latency for the registry, in milliseconds.

mesos.registrar.state_store_ms.p50

The median registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p90

The 90th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p95

The 95th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p99

The 99th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p999

The 99.9th percentile registry write latency, in milliseconds.

mesos.registrar.state_store_ms.p9999

The 99.99th percentile registry write latency, in milliseconds.
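
To illustrate what the percentile metrics above represent, here is a minimal sketch that uses the nearest-rank method on a list of latency samples. The sample values and the function name `nearest_rank_percentile` are hypothetical, and the way Mesos itself estimates these percentiles may differ.

```python
import math
from typing import List


def nearest_rank_percentile(samples: List[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank whose share of samples is at least p percent.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical registry write latencies, in milliseconds.
latencies_ms = [12, 15, 11, 40, 13, 14, 90, 16, 13, 12]

print(nearest_rank_percentile(latencies_ms, 50))  # 13
print(nearest_rank_percentile(latencies_ms, 90))  # 40
print(nearest_rank_percentile(latencies_ms, 99))  # 90
```

In this sketch the median stays close to the typical write, while the higher percentiles surface the occasional slow write, which is why the upper percentiles are usually the ones worth alerting on.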

mesos.role.cpu

The CPU capacity of the configured role.

mesos.role.disk

The total disk space available to the Mesos role, in mebibytes.

mesos.role.mem

The total memory available to the Mesos role, in mebibytes.

mesos.stats.elected

Indicates whether this master is the elected master.

mesos.stats.system.cpus_total

The total number of CPUs in the system.

mesos.stats.system.load_1min

The average load for the last minute.

mesos.stats.system.load_5min

The average load for the last five minutes.

mesos.stats.system.load_15min

The average load for the last fifteen minutes.

mesos.stats.system.mem_free_bytes

The total amount of free system memory, in bytes.

mesos.stats.system.mem_total_bytes

The total amount of system memory, in bytes.

mesos.stats.uptime_secs

The current uptime of the master, in seconds.

3 - Marathon Metrics

See Application Integrations for more information.

marathon.apps

The total number of applications.

marathon.backoffFactor

The multiplication factor for the delay between each consecutive failed task. This value is multiplied by the value of marathon.backoffSeconds each time the task fails until the maximum delay is reached, or the task succeeds.

marathon.backoffSeconds

The period of time between attempts to run a failed task. This value is multiplied by marathon.backoffFactor for each consecutive task failure, until either the task succeeds or the maximum delay is reached.
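
The interaction between marathon.backoffSeconds and marathon.backoffFactor described above can be sketched as follows. This is an illustration only: the function name `launch_delays`, the `max_delay_seconds` parameter, and the sample values are hypothetical, and it assumes the first retry waits exactly the base delay before the factor is applied.

```python
from typing import List


def launch_delays(backoff_seconds: float, backoff_factor: float,
                  max_delay_seconds: float, failures: int) -> List[float]:
    """Return the delay applied after each consecutive task failure.

    Assumption: the delay starts at backoff_seconds and is multiplied by
    backoff_factor after every failure, capped at max_delay_seconds.
    """
    delays = []
    delay = backoff_seconds
    for _ in range(failures):
        delays.append(min(delay, max_delay_seconds))
        delay *= backoff_factor
    return delays


# Illustrative values: base delay of 1 s, factor of 2, cap of 10 s.
print(launch_delays(1.0, 2.0, 10.0, 6))  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

With these values the delay doubles after each failure until it reaches the cap, after which every further retry waits the maximum delay.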

marathon.cpus

The number of CPUs configured for each application instance.

marathon.disk

The amount of disk space configured for each application instance.

marathon.instances

The number of instances of a specific application.

marathon.mem

The total amount of configured memory for each instance of a specific application.

marathon.tasksRunning

The number of tasks running for a specific application.

marathon.tasksStaged

The number of tasks staged for a specific application.