Mesos/Marathon Metrics
1 - Mesos Agent Metrics
See Application Integrations for more information.
mesos.slave.cpus_percent
The percentage of CPUs allocated to the slave.
mesos.slave.cpus_total
The total number of CPUs.
mesos.slave.cpus_used
The number of CPUs allocated to the slave.
mesos.slave.disk_percent
The percentage of disk space allocated to the slave.
mesos.slave.disk_total
The total disk space available.
mesos.slave.disk_used
The amount of disk space allocated to the slave.
mesos.slave.executors_registering
The number of executors registering.
mesos.slave.executors_running
The number of executors currently running.
mesos.slave.executors_terminated
The number of terminated executors.
mesos.slave.executors_terminating
The number of terminating executors.
mesos.slave.frameworks_active
The number of active frameworks.
mesos.slave.invalid_framework_messages
The number of invalid framework messages.
mesos.slave.invalid_status_updates
The number of invalid status updates.
mesos.slave.mem_percent
The percentage of memory allocated to the slave.
mesos.slave.mem_total
The total memory available.
mesos.slave.mem_used
The amount of memory allocated to the slave.
mesos.slave.recovery_errors
The number of errors encountered during slave recovery.
mesos.slave.tasks_failed
The number of failed tasks.
mesos.slave.tasks_finished
The number of finished tasks.
mesos.slave.tasks_killed
The number of killed tasks.
mesos.slave.tasks_lost
The number of lost tasks.
mesos.slave.tasks_running
The number of running tasks.
mesos.slave.tasks_staging
The number of staging tasks.
mesos.slave.tasks_starting
The number of starting tasks.
mesos.slave.valid_framework_messages
The number of valid framework messages.
mesos.slave.valid_status_updates
The number of valid status updates.
mesos.state.task.cpu
The number of CPUs allocated to the task.
mesos.state.task.disk
The disk space available for the task.
mesos.state.task.mem
The amount of memory used by the task.
mesos.stats.registered
Indicates whether this slave is registered with a master.
mesos.stats.system.cpus_total
The total number of CPUs available.
mesos.stats.system.load_1min
The average load for the last minute.
mesos.stats.system.load_5min
The average load for the last five minutes.
mesos.stats.system.load_15min
The average load for the last 15 minutes.
mesos.stats.system.mem_free_bytes
The amount of free memory.
mesos.stats.system.mem_total_bytes
The total amount of memory.
mesos.stats.uptime_secs
The current uptime for the slave.
2 - Mesos Master Metrics
See Application Integrations for more information.
mesos.cluster.cpus_percent
The percentage of CPUs allocated to the cluster.
mesos.cluster.cpus_total
The total number of CPUs.
mesos.cluster.cpus_used
The number of CPUs used by the cluster.
mesos.cluster.disk_percent
The percentage of disk space allocated to the cluster.
mesos.cluster.disk_total
The total amount of disk space.
mesos.cluster.disk_used
The amount of disk space used by the cluster.
mesos.cluster.dropped_messages
The number of dropped messages.
mesos.cluster.event_queue_dispatches
The number of dispatches in the event queue.
mesos.cluster.event_queue_http_requests
The number of HTTP requests in the event queue.
mesos.cluster.event_queue_messages
The number of messages in the event queue.
mesos.cluster.frameworks_active
The number of active frameworks.
mesos.cluster.frameworks_connected
The number of connected frameworks.
mesos.cluster.frameworks_disconnected
The number of disconnected frameworks.
mesos.cluster.frameworks_inactive
The number of inactive frameworks.
mesos.cluster.gpus_total
The total number of GPUs.
mesos.cluster.invalid_framework_to_executor_messages
The number of invalid messages between the framework and the executor.
mesos.cluster.invalid_status_update_acknowledgements
The number of invalid status update acknowledgements.
mesos.cluster.invalid_status_updates
The number of invalid status updates.
mesos.cluster.mem_percent
The percentage of memory allocated to the cluster.
mesos.cluster.mem_total
The total amount of memory available.
mesos.cluster.mem_used
The amount of memory the cluster is using.
mesos.cluster.outstanding_offers
The number of outstanding resource offers.
mesos.cluster.slave_registrations
The number of slaves able to rejoin the cluster after a disconnect.
mesos.cluster.slave_removals
The number of slaves that have been removed for any reason, including maintenance.
mesos.cluster.slave_reregistrations
The number of slaves that have re-registered.
mesos.cluster.slave_shutdowns_canceled
The number of slave shutdown processes that have been canceled.
mesos.cluster.slave_shutdowns_scheduled
The number of slaves that have failed health checks and are scheduled for removal.
mesos.cluster.slaves_active
The number of active slaves.
mesos.cluster.slaves_connected
The number of connected slaves.
mesos.cluster.slaves_disconnected
The number of disconnected slaves.
mesos.cluster.slaves_inactive
The number of inactive slaves.
mesos.cluster.tasks_error
The number of cluster tasks that resulted in an error.
mesos.cluster.tasks_failed
The number of failed cluster tasks.
mesos.cluster.tasks_finished
The number of completed cluster tasks.
mesos.cluster.tasks_killed
The number of killed cluster tasks.
mesos.cluster.tasks_lost
The number of lost cluster tasks.
mesos.cluster.tasks_running
The number of cluster tasks currently running.
mesos.cluster.tasks_staging
The number of cluster tasks currently staging.
mesos.cluster.tasks_starting
The number of cluster tasks starting.
mesos.cluster.valid_framework_to_executor_messages
The number of valid messages between the framework and the executor.
mesos.cluster.valid_status_update_acknowledgements
The number of valid status update acknowledgements.
mesos.cluster.valid_status_updates
The number of valid status updates.
mesos.framework.cpu
The CPU of the Mesos framework.
mesos.framework.disk
The total disk space of the Mesos framework, measured in mebibytes.
mesos.framework.mem
The total memory of the Mesos framework, measured in mebibytes.
mesos.registrar.queued_operations
The number of queued operations.
mesos.registrar.registry_size_bytes
The size of the Mesos registry in bytes.
mesos.registrar.state_fetch_ms
The Mesos registry’s read latency, in milliseconds.
mesos.registrar.state_store_ms
The Mesos registry’s write latency, in milliseconds.
mesos.registrar.state_store_ms.count
The Mesos registry’s write count.
mesos.registrar.state_store_ms.max
The maximum write latency for the registry, in milliseconds.
mesos.registrar.state_store_ms.min
The minimum write latency for the registry, in milliseconds.
mesos.registrar.state_store_ms.p50
The median registry write latency, in milliseconds.
mesos.registrar.state_store_ms.p90
The 90th percentile registry write latency, in milliseconds.
mesos.registrar.state_store_ms.p95
The 95th percentile registry write latency, in milliseconds.
mesos.registrar.state_store_ms.p99
The 99th percentile registry write latency, in milliseconds.
mesos.registrar.state_store_ms.p999
The 99.9th percentile registry write latency, in milliseconds.
mesos.registrar.state_store_ms.p9999
The 99.99th percentile registry write latency, in milliseconds.
mesos.role.cpu
The CPU capacity of the configured role.
mesos.role.disk
The total disk space available to the Mesos role, in mebibytes.
mesos.role.mem
The total memory available to the Mesos role, in mebibytes.
mesos.stats.elected
Indicates whether this master is the elected master.
mesos.stats.system.cpus_total
The total number of CPUs in the system.
mesos.stats.system.load_1min
The average load for the last minute.
mesos.stats.system.load_5min
The average load for the last five minutes.
mesos.stats.system.load_15min
The average load for the last fifteen minutes.
mesos.stats.system.mem_free_bytes
The total amount of free system memory, in bytes.
mesos.stats.system.mem_total_bytes
The total amount of system memory, in bytes.
mesos.stats.uptime_secs
The current uptime of the cluster.
3 - Marathon Metrics
See Application Integrations for more information.
marathon.apps
The total number of applications.
marathon.backoffFactor
The multiplication factor for the delay between each consecutive failed task. This value is multiplied by the value of marathon.backoffSeconds each time the task fails until the maximum delay is reached, or the task succeeds.
marathon.backoffSeconds
The period of time between attempts to run a failed task. This value is multiplied by marathon.backoffFactor for each consecutive task failure, until either the task succeeds or the maximum delay is reached.
marathon.cpus
The number of CPUs configured for each application instance.
marathon.disk
The amount of disk space configured for each application instance.
marathon.instances
The number of instances of a specific application.
marathon.mem
The total amount of configured memory for each instance of a specific application.
marathon.tasksRunning
The number of tasks running for a specific application.
marathon.tasksStaged
The number of tasks staged for a specific application.
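The relationship between marathon.backoffSeconds and marathon.backoffFactor described above can be illustrated with a short Python sketch. This is not Marathon's implementation: the function name, the max_delay_seconds parameter (standing in for the "maximum delay" mentioned in the descriptions), and the sample values are all hypothetical and chosen only for illustration.

```python
def backoff_delays(backoff_seconds, backoff_factor, max_delay_seconds, failures):
    """Return the launch delay applied after each consecutive task failure.

    Hypothetical helper: starts at backoff_seconds, multiplies by
    backoff_factor after every failure, and caps at max_delay_seconds.
    """
    delays = []
    delay = float(backoff_seconds)
    for _ in range(failures):
        # Never wait longer than the configured maximum delay.
        delays.append(min(delay, float(max_delay_seconds)))
        # Each further consecutive failure grows the delay by the factor.
        delay *= backoff_factor
    return delays


# Illustrative values only: 1 second base delay, factor of 2, 10 second cap.
print(backoff_delays(1, 2.0, 10, 5))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

With these sample values the delay doubles after every failure until it reaches the cap. Once the task succeeds, the descriptions above imply that the multiplication stops.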