Sysdig Agent Release Notes

9.8.0 March 31, 2020

Notice

All the public-facing URLs that were pointing to https://s3.amazonaws.com/download.draios.com/ have been updated to point to https://download.sysdig.com/.

Update the URL in your firewall/proxy whitelist settings to https://download.sysdig.com/. Otherwise, the agent installation on Linux will fail.
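
For install scripts or allowlists that still carry the old address, the change amounts to a prefix swap. The following is a minimal sketch and not part of the agent; the function name migrate_download_url is hypothetical, and only the two prefixes come from the notice above.

```python
OLD_PREFIX = "https://s3.amazonaws.com/download.draios.com/"
NEW_PREFIX = "https://download.sysdig.com/"


def migrate_download_url(url: str) -> str:
    """Return url with the old download prefix replaced by the new one.

    URLs that do not start with the old prefix are returned unchanged.
    """
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url
```

The function leaves the path after the prefix untouched, so it can be run over a list of stored URLs without knowing which artifacts they point to.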

Fixes

Metrics Reporting

Fixed an issue in the agent wherein the kubernetes.namespace.pod.desired.count and kubernetes.namespace.pod.available.count metrics were not reporting any values.

HDFS App Check Deprecated

The HDFS (Hadoop Distributed File System) App Check has been deprecated and removed. Users of the HDFS App Check can switch to the hdfs_namenode and hdfs_datanode App Checks.

Metric Calculation

Fixed an issue related to calculating the kubernetes.pod.restart.rate metric.

Network Congestion

Isolated the Kubernetes Audit HTTP server from the Audit Event processing path to reduce the chances of slowing down the connections from the Kubernetes API server. This should reduce the likelihood of multiple outstanding connections from the Kubernetes API server.

Certifi Python Module

Added a missing certifi Python module to the agent container.

9.7.0 March 09, 2020

New Features

Support for OpenShift Hardening Guide

Added the OpenShift Hardening Guide as a benchmark program. It is available as an option for the CIS Kubernetes Benchmark.

Support for Linux Benchmark

Added Linux benchmarking as an available benchmark program.

New Metrics for Redis and MongoDB App Checks

The following metrics are introduced:

  • RedisDB

    • redis.mem.startup

    • redis.mem.overhead

  • MongoDB

    • mongodb.tcmalloc.generic.current_allocated_bytes

    • mongodb.tcmalloc.generic.heap_size

    • mongodb.tcmalloc.tcmalloc.aggressive_memory_decommit

    • mongodb.tcmalloc.tcmalloc.central_cache_free_bytes

    • mongodb.tcmalloc.tcmalloc.current_total_thread_cache_bytes

    • mongodb.tcmalloc.tcmalloc.max_total_thread_cache_bytes

    • mongodb.tcmalloc.tcmalloc.pageheap_free_bytes

    • mongodb.tcmalloc.tcmalloc.pageheap_unmapped_bytes

    • mongodb.tcmalloc.tcmalloc.spinlock_total_delay_ns

    • mongodb.tcmalloc.tcmalloc.thread_cache_free_bytes

    • mongodb.tcmalloc.tcmalloc.transfer_cache_free_bytes

For more information, see Metrics Introduced with Agent v9.7.0 and RedisDB Metrics.

Fixes

Slim Agent Vulnerabilities

Fixed the vulnerabilities detected in the agent-slim v9.6.1 image. These issues are related to the python2 and jackson-databind packages. These packages were upgraded to the versions with fixes.

Run App Checks on Hosts with Python 2.6

Fixed a defect that prevented app checks from running on hosts that have Python 2.6 installed.

9.6.1 February 28, 2020

Fixes

Metrics calculation

Fixed an issue that caused an error in the calculation of some metrics such as net.* in agent version 9.6.0.

Red Hat-based host issue

Fixed an issue that caused the kernel module build associated with agent version 9.6.0 to fail on Red Hat-based hosts.

9.6.0 February 26, 2020

Upgrades

Integrations improved

Added new metrics and configuration options for HAProxy and Consul app check integrations. See HAProxy, HAProxy Metrics, and Consul for details.

Fixed a problem in the Go app check that caused it to fail with an exception.

Metrics added

Added the Kubernetes metric kubernetes.namespace.pod.running.count to track the number of pods in the running state. See Kubernetes State.

Reduced load on the Kubernetes API server

The version of client-go was updated and now defaults to encoded protobuf messaging instead of JSON to improve performance.

The configuration option new_k8s is now enabled by default.

Fixes

Default collector port changed

The default port for the collector was changed from 6666 to 6443.

Prometheus metrics fix

Fixed a problem that inhibited the agent from scraping multiple ports on a single process for Prometheus metrics.

Inaccurate cpu.used reporting fixed

Fixed a problem that caused the agent to erroneously report very high CPU usage in some environments.

Fix for the dynamic back-end configuration of Kubernetes Audit Logging that caused some agent deploys to fail

Kubernetes Audit Logging

The agent is enhanced to listen on /k8s-audit for Kubernetes audit events and the path can be configured via the config option security:{k8s_audit_server_path_uris: [path1, path2]}.

9.5.0 January 28, 2020

Note

Note that the versioning scheme for agent releases has been updated with this release. Previous versions used the format 0.<version number><hotfix>, such as 0.94.0.

Sysdig is aligning version numbers to the rest of the product. The new version number reflects the maturity of the Agent software over the last several years. Going forward, all Agent versions will be numbered as Major.minor.hotfix.

We encourage users to run the latest version of the Agent. Starting with the next release of the Agent, we will support n-3 versions back based on the minor number. For example, if the next release is v9.6.0, we will support versions back to 9.3.0 (old version scheme: 0.93.0).
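
The mapping between the two schemes and the n-3 window can be sketched as follows. This is an illustration only: the function names are hypothetical, the old form is assumed to be 0.<two digits>.<hotfix> as in the example, and clamping the minor number at 0 is an assumption, since these notes do not say how the window behaves across a major-version boundary.

```python
def new_scheme(old_version: str) -> str:
    """Convert an old-scheme version such as "0.94.0" to "9.4.0".

    Assumes the old form is 0.<two digits>.<hotfix>.
    """
    parts = old_version.split(".")
    if len(parts) != 3 or parts[0] != "0" or len(parts[1]) != 2:
        raise ValueError(f"unexpected old-scheme version: {old_version!r}")
    major, minor = parts[1][0], parts[1][1]
    return f"{major}.{minor}.{parts[2]}"


def oldest_supported(latest: str, window: int = 3) -> str:
    """Return the oldest supported release for a given latest release.

    The minor number is clamped at 0 (an assumption, see the lead-in).
    """
    major, minor, _hotfix = (int(p) for p in latest.split("."))
    return f"{major}.{max(minor - window, 0)}.0"
```

With these helpers, oldest_supported("9.6.0") yields the 9.3.0 floor named in the example, and new_scheme("0.93.0") yields the same release under the new numbering.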

Fixes and Upgrades

Added new configuration option and metrics for Elasticsearch integrations

In the Elasticsearch app check, the parameter index_stats can be used to collect metrics from individual indices. See Example 4 in Elasticsearch and Elasticsearch Metrics for details.

Added new metrics for NGINX Plus integrations

More than 60 new metrics have been added to the NGINX app check. See NGINX Plus Metrics for details.

Made Go-based event handling the default

See Use Go to Process Kubernetes Events. As of agent 9.5.0, the default setting for go_k8s_user_events is true and there is no need to enable it explicitly. To switch back to the older (C++ based) events monitoring, set the value to false in your agent config (dragent.yaml).

Enhanced log tracing for include/exclude processes filter.

No user action is required; see Include/Exclude Processes to use the filter.

Fixed agent termination issue

Fixed a problem that was causing an internal process within the agent to repeatedly restart.

Improved memory buffer handling

The agent will now auto-disable memdump functionality when the memory buffer is too small.

Agent start/stop improvements on CRI-O and Openshift 4.x

The agent can now correctly perform the pause and stop container actions on clusters running OpenShift 4.x and CRI-O.

0.94.0 December 20, 2019

Fixes and Upgrades

Fixed issue in the agent install scripts

The agent install scripts have been updated to mount /etc/modprobe.d from the host into the agent container. This prevents a problem where the agent loaded drivers that were excluded from the host.

Added user events for additional resource types

Added events monitoring for statefulsets, services, and horizontal pod auto-scalers (HPAs) when the Golang-based events monitoring feature is enabled. To enable, see Use Go to Process Kubernetes Events.

Added regex support for Kafka integrations

Added regex capability for consumer groups and topics in Apache Kafka configurations. See Example 6 in Apache Kafka.

Increased the Prometheus max_tags default value

The Prometheus max_tags configuration has been increased from 20 to 40.

Made change to guarantee support for older cpuset configurations.

Changed CRIO cpuset calculations to use the configured cpuset.cpus value instead of cpuset.effective_cpus. This guarantees support on older cpuset configurations.
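
For context, a cpuset.cpus file holds a list of CPU numbers and ranges such as 0-3,8. The sketch below counts the CPUs in such a string. It illustrates the file format only; how the agent itself reads and uses the value is not described in these notes, and the function name count_cpus is hypothetical.

```python
def count_cpus(cpuset: str) -> int:
    """Count the CPUs named in a cpuset list string such as "0-3,8"."""
    total = 0
    for chunk in cpuset.strip().split(","):
        if not chunk:
            continue  # tolerate an empty string or a trailing comma
        if "-" in chunk:
            low, high = chunk.split("-", 1)
            total += int(high) - int(low) + 1
        else:
            int(chunk)  # raises ValueError if the entry is not a number
            total += 1
    return total
```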

Corrected an issue that caused the suffix "_total" to be added to Prometheus counter metric names.

0.93.1 November 25, 2019

Fixes and Updates

Fixed installation issue on native RHEL 7.x installs

The agent installer script has been updated to refer to an updated epel repository.

Improved JMX metrics reporting

Fixed an issue when retrieving JMX metrics which could result in missing samples.

(Sysdig Secure): Improvement in Kubernetes Audit events

Fixed runtime policy scopes for Kubernetes audit events.

(Sysdig Secure) Fixed audit event exception

The system now catches JSON object-type exceptions when parsing Kubernetes audit events.

Improved error message

Improved the error message reported when the Sysdig agent cannot find a pre-installed kernel header or cannot download a sysdigcloud-probe.

Performance improvement in dragent logging

0.93.0.1 November 15, 2019

Fixes

Fixed issue with Prometheus metrics names

Corrected a problem that caused the suffix _total to be removed from Prometheus counter metric names.

0.93.0 November 6, 2019

New Features

Mask the customer ID in log files

The Customer ID is no longer written to the agent log, to avoid inadvertent exposure when sharing log files.

Kubernetes role node label included by default

The kubernetes.node.label.kubernetes.io/role label is available by default.

Update Kubernetes API used, in order to expand support of Kubernetes v1.16

Replaced usage of the extensions/v1beta1 Kubernetes API with apps/v1 in the agent. This is required for supporting Kubernetes v1.16 using the agent's legacy Kubernetes integration (when new_k8s is not enabled).

Introduced a new config option in ElasticSearch app check

Introduced a new config option to generate cluster-wide primary shard metrics from a master node: pshard_stats_master_node_only. See Elasticsearch (Example 3).

Enhanced Postgresql app check

The Postgres app check has been enhanced to provide new metrics and examples. See PostgreSQL.

Agent preparation for upcoming Policy Advisor feature in Sysdig Secure

The agent will support new Rules generated by Sysdig's Kubernetes Policy Advisor. This agent is the minimum version required to use the upcoming feature.

Updates and Fixes

Improved system events handling for Ubuntu 19.10

On kernels 5.1 and newer, some syscall events were incorrectly dropped. This has been fixed.

Stopped Kubernetes pause containers (pods) from being reported

Fixed an issue where Kubernetes pause containers were also showing up in Kubernetes events. This fix filters them out of the events being reported.

Fixed rare issue on OpenShift

Fixed an issue where, in a rare case, a dropped event could cause a kernel deadlock and crash the node.

Fixed issue preventing kernel module creation for Debian Buster

This change adds support for building the Sysdig Monitor agent kernel module for Debian Buster.

Improved event timestamp in Kubernetes

This fix ensures that user events get the correct timestamp with Kubernetes v1.16 when the go_k8s_user_events option is set to true.

Updated Kubernetes API used, in order to expand support of Kubernetes v1.16

In dragent.yaml, the Kubernetes API extensions/v1beta1 is updated to apps/v1. This enables agent support for Kubernetes v1.16 even when the new_k8s option is set to false.

Fixed a Kubernetes event reporting issue

Fixed an issue with Kubernetes Events where the host MAC scope was not populated correctly, so the events did not show up on the dashboard.

Improved Kubernetes events handling from delegated agents

When using go_k8s_user_events, Kubernetes events from non-delegated agents are no longer sent.

Eliminated legacy "BASELINES" message

Stopped processing legacy BASELINES messages from the backend collector.

Performance improvement at startup

The agent now defers initialization of Secure-related components slightly to reduce excess resource usage at startup.

0.92.3 October 7, 2019

Updates and Fixed Issues

Included Example of a Prometheus Matching Rule Using HTTPS

The Sysdig agent will use HTTPS for scraping when the target's annotation has "kubernetes.pod.annotation.prometheus.io/scheme: https".
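
The selection rule can be pictured as a lookup on the target's annotations. The sketch below is an illustration, not the agent's code: the function name scrape_scheme is hypothetical, the annotation key is written with the kubernetes spelling, and falling back to http when the annotation is absent is an assumption.

```python
# Annotation key as named in the note above (kubernetes spelling assumed).
SCHEME_ANNOTATION = "kubernetes.pod.annotation.prometheus.io/scheme"


def scrape_scheme(annotations: dict) -> str:
    """Return "https" when the scheme annotation asks for it, else "http".

    The "http" fallback is an assumption made for this sketch.
    """
    value = annotations.get(SCHEME_ANNOTATION, "")
    return "https" if value.strip().lower() == "https" else "http"
```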

Kubernetes versions older than 1.9 no longer supported.

The Sysdig agent has replaced the use of the extensions/v1beta1 Kubernetes API with apps/v1.

The RabbitMQ app check has a new config option: filter_by_node

Without this option, each node reports cluster-wide information (as presented by rabbitmq itself). This option makes it easier to view the metrics in the UI by removing redundant information reported by individual nodes. See RabbitMQ for details.

0.92.2 September 26, 2019

New Features

Asynchronous metadata collection for CRI-O and containerd

The collection of container metadata from CRI-based runtimes was previously synchronous with other agent tasks. It is now performed asynchronously.

Prioritize and filter how process metrics are reported in Sysdig Monitor. 

In addition to filtering data by container, it is also possible to filter independently by process. Broadly speaking, this refinement helps ensure that relevant data is reported while noise is reduced. See Include/Exclude Processes for details.

Note

As of this version, App Checks on hosts with Python 2.6 will no longer be supported.

Fixed Issues

Sysdig Monitor

  • Fix for Agent termination during resource discovery from the Kubernetes API Server 

    Fixed an issue where the Agent stopped and shut down if an error occurred during resource discovery from the Kubernetes API Server. With this fix, the Agent reports the error and continues with the discovered resources.

  • Fix for Kubernetes delegation error

    Fixed an issue that caused Kubernetes delegation to not work after the cointerface process restarts following a crash.

  • Fix for accounting Network errors

    Network-related errors are now correctly accounted for instead of being treated as file-open errors.

  • New Prometheus Client Version

    Updated prometheus_client to version 0.7.1. This should result in improved performance while ingesting Prometheus metrics.

  • Fix for dropping StatsD Metrics

    A defect in earlier versions of Sysdig Monitor with the statsd.use_forwarder option could drop some StatsD metrics from containers. This change resolves that problem; the agent will begin fetching metrics from containers 10 seconds after first identifying that the container exists. The 10-second delay allows containers to start StatsD servers within their network namespaces if they choose.

    The timeout can be overridden using the statsd.container_server_creation_delay_s option, which specifies the delay in seconds.

  • Fixed resource metrics for CRI-O containers

    The following metrics now report correctly in the Monitor UI: memory.limit.bytes, memory.limit.used.percent, and cpu.quota.used.percent. The CRI extra_queries option is now enabled by default. See Runtime Support: CRI-O and Containerd for details.
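
The StatsD container delay described in the list above reduces to a simple time check. The following sketch shows the idea under stated assumptions: the function name ready_to_fetch is hypothetical, timestamps are plain seconds, and only the 10-second default and the role of statsd.container_server_creation_delay_s come from the note.

```python
DEFAULT_DELAY_S = 10  # default delay stated in the release note


def ready_to_fetch(first_seen_s: float, now_s: float,
                   delay_s: float = DEFAULT_DELAY_S) -> bool:
    """Return True once delay_s seconds have passed since the container was first seen.

    delay_s stands in for the statsd.container_server_creation_delay_s option.
    """
    return now_s - first_seen_s >= delay_s
```

A shorter delay starts collection sooner but risks polling before the container has started its StatsD server; a longer one trades early samples for reliability.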

Sysdig Secure

  • Fix for enlarging Sysdig Capture 

    Fixed an issue where a Sysdig capture would grow endlessly if a security policy was set to Capture 0 seconds after an event.

  • Fix for processing system events

    Fixed a problem where the gettimeofday syscall was called in compliance code while processing system events. This could cause performance problems in Linux distros that call down to the kernel for gettimeofday responses, such as some versions of Amazon Linux.

Sysdig Platform

  • New RPM dependency

    Changed RPM dependency to Python 2 to support installation on RHEL 8.

0.92.1 August 16, 2019

Fixed Issues

Sysdig Monitor

  • Fixed issue with cluster name in Monitor UI

    Cluster name was being populated incorrectly for Kubernetes event scopes.

  • Fixed Kubernetes events issue

    Fixed Kubernetes event collection issue that occurred when using the go_k8s_user_events option. This option was introduced in agent version 0.91.

Sysdig Platform

  • RHEL 7.7 and 8.0+ support

    The kernel module now builds for RHEL 7.7 and 8.0+.

  • Fixed issue with StatsD metrics collection limits

    Some versions of the Sysdig agent allowed fewer than the configured number of StatsD metrics because Sysdig Secure-related StatsD metrics were counted towards the configured limit.

    This change corrects that behavior so that the configured limit applies only to StatsD metrics that do not originate from Sysdig components.
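
The corrected StatsD limit behavior can be sketched as a filter in which internal metrics never consume the budget. This is an illustration under assumptions: the function name apply_statsd_limit is hypothetical, and how the agent recognizes metrics from Sysdig components is not stated here, so the caller supplies that test as is_internal.

```python
from typing import Callable, Iterable, List


def apply_statsd_limit(names: Iterable[str], limit: int,
                       is_internal: Callable[[str], bool]) -> List[str]:
    """Keep every internal metric and at most `limit` other metrics, preserving order."""
    kept = []
    external = 0
    for name in names:
        if is_internal(name):
            kept.append(name)  # internal metrics do not count towards the limit
        elif external < limit:
            kept.append(name)
            external += 1
    return kept
```

For example, with a limit of 2 and a hypothetical internal prefix, two application metrics are kept regardless of how many internal metrics appear alongside them.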

Sysdig Secure

  • Fixed a profiling-related issue that impacts Sysdig Secure 2.4

    Sysdig Secure 2.4 will include a new Profiling feature, and 0.92.1 fixes a bug where profiling could remain disabled after periods of high load. To use Profiling, you must upgrade to agent 0.92.1 or higher.

0.92 August 7, 2019

New Features

Preparatory enhancements for upcoming Sysdig Secure Policy Editor

Although the feature UI will not be released until version 2.4.0, Sysdig encourages all users of Sysdig Secure to upgrade to agent 0.92 in preparation for the new Policy Editor feature. Agent 0.92 will accept policy messages from both the current backend and a backend that supports the new policy editor.

Ability to compress metrics data for internal transfer

With app checks integrations, when the volume of metrics data collected was too large to send over the agent's internal queue, app checks could fail. This problem is solved by introducing an option to compress app checks metrics data, which reduces the internal load. See Compress Metrics Data for details on how to enable this option.

Fixed Issues

Sysdig Monitor

Fix for occasionally dropped metrics

In earlier releases of Sysdig Monitor, the agent sometimes failed to parse metrics containing negative values for some fields.

This change updates the behavior to drop fields that have unsupported negative values, and to generate a log message when such fields are encountered.
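
The drop-and-log behavior can be sketched as follows. This is an illustration rather than the agent's implementation: the function name drop_negative_fields is hypothetical, and the set of fields that must be non-negative is passed in by the caller because these notes do not list them.

```python
import logging
from typing import Dict, Set

log = logging.getLogger("metrics_sketch")


def drop_negative_fields(sample: Dict[str, float],
                         non_negative: Set[str]) -> Dict[str, float]:
    """Return a copy of sample without fields that must be non-negative but are not."""
    cleaned = {}
    for field, value in sample.items():
        if field in non_negative and value < 0:
            # Log and skip the field instead of failing the whole sample.
            log.warning("dropping field %s with unsupported negative value %s",
                        field, value)
            continue
        cleaned[field] = value
    return cleaned
```

Dropping only the offending field keeps the rest of the sample usable, which matches the intent of replacing a parse failure with a logged omission.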

Sysdig Platform

  • Fix for MySQL versions 8.0.14+

    Fixed a bug that caused the MySQL app check to fail with an error.

  • Fixed agent crash issue exposed by recent Linux kernels

    Affected kernels include the 5.2.x line, 5.1.8+, and 4.19.49+.

  • Fixed a bug in HTTP parser

    In the (uncommon) situation where absoluteURI is used in the Request-URI, fixed a bug that was causing a faulty URL.
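
To illustrate the absoluteURI case from the HTTP parser fix above: a request line normally carries only a path, and the host comes from the Host header, but the target may also be a full URL. The sketch below handles both forms. It is not the agent's parser; the function name request_url is hypothetical, and the http default scheme is an assumption.

```python
from urllib.parse import urlsplit


def request_url(request_target: str, host_header: str, scheme: str = "http") -> str:
    """Build the full URL for a request-line target and Host header."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # absoluteURI form: the target already carries scheme and host.
        return request_target
    # abs_path form: combine the Host header with the path.
    return f"{scheme}://{host_header}{request_target}"
```

Without the first branch, a naive concatenation of host and target would produce a URL with the host repeated, which is one way a faulty URL of this kind can arise.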

0.91 July 17, 2019

New Features

Improved security

Removed obsolete and vulnerable Python 2.6-compatible libraries from Docker images.

More efficient Kubernetes event handling.

The agent has added functionality to allow more efficient processing of Kubernetes user events.

See Use Go to Process Kubernetes Events to enable.

Reduced CPU usage on Kubernetes clusters

Extended performance optimizations for processing Kubernetes Services, which will reduce agent CPU usage in large clusters.

Container filtering enhanced

Smart filters and aggregated filtering options are now available. See Prioritize/Include/Exclude Designated Containers.

Fixed Issues

Monitor

  • Fixed issue with Prometheus metrics gathering intervals

    The agent will now respect the configured interval for scraping Prometheus metrics from remote endpoints, as opposed to doing it every second.

  • Fixed limit/requests calculations for init containers

    Fixed memory calculations for Kubernetes init container limits and requests.

  • Improved Healthcheck monitoring

    The agent has improved ability to detect commands identified as a part of Kubernetes Liveness/Readiness Probes, in addition to Docker Health Checks.

  • Improved error messaging

    Warning messages for container group inconsistencies were demoted to debug level, as they are harmless and do not need to clutter the error reporting stream.

  • Fixed issue with container "incomplete" reporting status

    Starting with version 0.90.0, the agent would report containers for which it had not yet fetched metadata as "incomplete." This would then propagate to the Monitor UI. This restores the behavior where the agent leaves the unknown fields unset.

  • Resolved REST server issue

    Fixed problem where an enabled port would respond to HTTP requests when not desired.

  • Fixed issue with StatsD metrics collection

    Previous versions of the Sysdig agent, when configured to use the StatsD forwarder (statsd.use_forwarder: true), truncated messages received from containers to 2048 bytes, which could result in dropped and corrupted metrics. This change resolves that problem. See details under StatsD Integration.
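
The effect of the 2048-byte StatsD truncation described above can be reproduced with a toy parser: a datagram holding many newline-separated metrics loses everything past the cut, and the line that straddles the cut becomes malformed. This is a sketch for illustration only; the parser, the helper names, and the metric names are hypothetical, and only the 2048-byte figure comes from the note.

```python
MAX_LEN = 2048  # truncation length named in the release note


def parse_statsd(payload: bytes) -> list:
    """Split a StatsD datagram into (name, value, type) tuples, skipping malformed lines."""
    metrics = []
    for line in payload.decode("ascii", errors="replace").split("\n"):
        try:
            name, rest = line.split(":", 1)
            value, mtype = rest.split("|")[:2]
            metrics.append((name, float(value), mtype))
        except ValueError:
            continue  # line was cut off or otherwise malformed
    return metrics


def build_payload(count: int) -> bytes:
    """Build a newline-separated datagram of `count` counter metrics."""
    return "\n".join(f"app.requests.{i}:1|c" for i in range(count)).encode("ascii")
```

Parsing build_payload(200) in full returns all 200 metrics, while parsing only its first MAX_LEN bytes returns fewer, which is the loss the fix removes.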

Note

For earlier release notes, please see Sysdig Agent Release Notes here.

Note

It is recommended to follow upgrade best practices:

  • Keep upgrades current

  • Test upgrades in a non-mission-critical or staging environment before rolling into production.