Sysdig Agent Release Notes

12.8.1 August 29, 2022

Defect Fixes

Fix Vulnerabilities in Promscrape V1

Upgraded the Prometheus version and resolved vulnerabilities in promscrape v1.

The agent now reads user and group information from /host/etc/passwd and /host/etc/group when it is running as a container.

Show Falco Events as Expected

Fixed a problem where the Falco output string for a rule was cut off at the first absent or empty field.

12.8.0 August 02, 2022

Feature Enhancements

Add a New Metric to Indicate Retrieving Kubernetes State

Added an internal metric, statsd_dragent_subproc_cointerface_ready, to indicate when the agent has pulled Kubernetes state from the API server.

Read Certificate Chain

Previously, the agent would only accept the first certificate in a cert chain and would attempt to verify all other certificates from the configured certificate store. This behavior is compliant with the TLS specification, but idiomatic usage in the wild requires the agent to accept intermediate certificates provided in the handshake as well. The agent will now accept these certificates if provided.

Support for dup() Syscalls

Support for the dup() family of syscalls has been enhanced, and support for dup2 and dup3 is now available.

Falco Rules Optimizer

The Falco Rules Optimizer can now be optionally enabled. This feature speeds up the evaluation of syscalls against Falco rules by indexing rule conditions and caching partial rule condition evaluations. It is available only in the Sysdig agent, not in open-source Falco. To enable the feature, set falco_optimizer.enabled to true (the default value is false).
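
As a minimal sketch, assuming dotted option names map to nested keys in the agent configuration file dragent.yaml (the same layout the watchdog and jmx examples in these notes use), enabling the optimizer might look like this:

```yaml
# dragent.yaml fragment (assumed nesting for the dotted name falco_optimizer.enabled)
falco_optimizer:
  enabled: true   # default is false
```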

New Falco Rules Parser

Starting with version 12.8.0, the Sysdig agent uses a new Falco rules parser from OSS Falco. The new parser performs stricter grammar parsing and fails in the following cases:

  • when \n is used instead of , in a list
  • when "[" is present in a rule definition
  • when \034 surrounded by " is present in a rule definition
  • when an or operation between lists is used instead of , with the in operator. For example: condition: open_write and fd.filename is (list1 or list2)

If any of the above cases is present in a custom rules file, the agent fails to parse the respective rule and outputs the following error:

Error, security_mgr:791: Could not load policies_v2 message:.

In this case, edit the custom rules to correct or remove the unparsable rules.
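
As an illustration of the last case in the list above, the following sketch shows how such a condition might be rewritten. The list names, list items, and the rule itself are hypothetical, and the open_write macro is assumed to be defined elsewhere in the loaded rules, as it is in the default Falco rule set.

```yaml
# Hypothetical custom rules file; names and items are illustrative only.
- list: list1
  items: [passwd, shadow]

- list: list2
  items: [group, gshadow]

- rule: Write to watched file names
  desc: Illustrative rule that matches writes to files named in list1 or list2
  # Rejected by the stricter parser: fd.filename is (list1 or list2)
  # Comma-separated lists with the in operator:
  condition: open_write and fd.filename in (list1, list2)
  output: "Watched file opened for writing (file=%fd.name)"
  priority: WARNING
```

The same approach applies to the first case: list items separated by line breaks would be separated by commas instead.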

Defect Fixes

Process Kubernetes Audit Events as Expected

The agent no longer throws errors while processing Kubernetes audit events when Kubernetes audit rules contain the endswith condition.

Upgrade Go Language Packages

Go language packages have been upgraded to fix vulnerabilities.

Fix Vulnerabilities

Fixed the following vulnerabilities with Promscrape V2:

  • CVE-2015-3627
  • CVE-2021-3121
  • CVE-2020-14040
  • CVE-2014-6407
  • CVE-2014-9356
  • CVE-2014-9357
  • CVE-2022-23648
  • CVE-2022-27191
  • CVE-2021-41103
  • CVE-2020-15257
  • CVE-2014-9358
  • CVE-2021-21334
  • CVE-2020-13401
  • CVE-2014-5277
  • CVE-2020-8565
  • CVE-2021-32760
  • CVE-2021-20329
  • CVE-2019-11254
  • CVE-2021-4189
  • CVE-2021-3737
  • CVE-2021-3634
  • CVE-2022-1996

Detect Prometheus Targets Correctly

Fixed a problem that prevented new Prometheus targets from being detected until the agent was restarted.

Intermittent Scraping Failure No Longer Causes Missing Metrics

Fixed an issue with missing metrics when there are intermittent metrics scraping failures.

Show Falco Events as Expected

The Sysdig agent now throttles redundant secure events for compliance policies, reducing event noise.

Show Username Correctly in Policy Events

Fixed an agent build issue that made the password and group functions unavailable. The password and group files from /host/etc are now linked inside the agent container so that the username is shown correctly in policy events.

Fix a Logging Issue in Promscrape V2

Fixed a logging issue with Promscrape v2. Log levels now take effect as expected when passed in with --log.level.

Agents No Longer Incorrectly Behave as Delegated

Fixed an issue that might cause all the agents to behave as delegated.

12.7.1 July 06, 2022

Defect Fixes

Fixed memdump.size Issue

Fixed the memdump.size configuration, which was not being honored.

Fixed Promscrape Crash Issue

Fixed a crash issue in Promscrape v2 when a node has a large number of pods and multiple containers per pod.

Fixed Issue Affecting Two Agent Modes

Fixed a problem that can cause agent subprocesses to be killed in nodriver mode. This affects the custom-metrics-only and monitor_light modes. For more information, see Configure Agent Modes.

12.7.0 June 28, 2022

Feature Enhancements

New Helm Chart

Sysdig released a unified Helm chart, sysdig-deploy, with the following benefits:

  • Easier to deploy multiple components with one chart, rather than using multiple separate charts
  • Fewer errors by way of using common configuration for components
  • Auto-detection of certain configurations, including eBPF for GKE COS and endpoint region

We will maintain the old Helm chart, sysdig, for a period of six months. During this period, the sysdig chart will be updated with new component versions and defect fixes.

Live Logs

Sysdig Monitor displays Live Logs in Advisor for troubleshooting Kubernetes, which is the equivalent of running kubectl logs. Live Logs are displayed on demand and are not stored by Sysdig.

Support Prometheus v2.32

Updated Prometheus scraper to version 2.32.

Metrics Collected in Custom Metrics Only Mode

When custom-metrics-only mode is used, no process metrics are collected. Additionally, only the metrics related to resources (CPU, memory) are collected for containers and host.

Known Issues

While the agent is running, you might encounter an error similar to the following:

Error, security_rule:610: Could not parse rule xx from rules json array.

The rule number in the error message might change depending on how many rules are defined.

This is a known issue related to failing to parse an experimental rule. The parser will skip this rule and will log the error message as above. The agent performance and policy evaluation will not be affected.

Defect Fixes

Remove Ceph App Checks

Fixed a problem where errors for obsoleted app checks would be shown when Ceph was running on the host.

Disable Timeseries Caching

Removed a configuration option which caused Prometheus jobs to not report timeseries if the scrape failed temporarily.

Builds eBPF Probes in Bottlerocket

Fixed an issue that prevented eBPF probes from being built by the agent in Bottlerocket environments.

Reports Infrastructure State Correctly

Fixed an issue where the Sysdig agent would open a stream to Cointerface even when it is disabled. This resolves the issue of the infrastructure state being reset constantly.

Sends Only Supported Metrics in Nodriver Mode

Fixed an issue where unused container and process metrics were sent while in nodriver mode.

Change Log Level to DEBUG When Excessive Logging Occurs

Excessive logging occurs under specific conditions, for example, for a pod whose used memory is reported as zero. This appears to be normal for small pods that use very little memory. When these conditions are detected, the log level of the message that pollutes the logs is now lowered from INFO to DEBUG.

Reports Container Resource Limits and Requests Correctly

Fixed an issue where container resource limits and requests would appear as zero when no limit or request was configured.

12.6.0 May 16, 2022

Defect Fixes

Reloading Promscrape V2 No Longer Causes Dropping Scrape Targets

Reloading Promscrape v2 no longer causes some scrape targets to stop sending metrics.

Losing Node No Longer Generates Duplicate Node Events

Resolved an issue that generated duplicate events when a Kubernetes node was lost.

Agents Connect to SaaS Backend Through HTTP Proxy on Older Hosts

Fixed an issue related to SSL certificate verification when connecting through an HTTP proxy on older host operating systems, such as CentOS 7.

Agent Refreshes Service Account Token as Expected

Connection with the Kubernetes API Server works as expected. The Kubernetes client is configured to refresh the bearer token.

12.5.0 May 02, 2022

Feature Enhancements

Default Availability of Slim Agent

The agent installation now defaults to the slim agent. The slim agent reduces the surface area for potential vulnerabilities compared to the full agent, which increases the security of your monitoring environment. For more information, see Agent Installation.

To continue using the regular agent, set slim.enabled to false in your helm chart.
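
As a sketch, assuming the option is set through a Helm values file (the file name and surrounding layout of your values are not covered by these notes), the dotted name slim.enabled corresponds to the following nested keys:

```yaml
# Helm values fragment: keep using the regular (non-slim) agent
slim:
  enabled: false
```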

Monitoring Kubernetes Resources

Sysdig agent v12.5.0 and above no longer collect the HPA kube state metrics by default. To enable the agent to collect HPA kube state metrics, you must edit the agent configuration file, dragent.yaml, and include it along with the other resources you would like to collect. For more information, see Enable Kube State Metrics.

Container DriftControl: Detect and Prevent Drift in Container Runtime

The Sysdig agent can now detect when a new executable is added to a container after the container has started. The agent records when a file was downloaded and made executable. In prevention mode, the agent can also deny the process from ever running. A policy can also define binaries that should be denied, or excluded from denial, if they are added after the container has started.

See also: Drift Policy

Disable Syscalls for Secure Modes

Switch syscall events are disabled for secure and secure light modes.

Known Issues

  • An error message is displayed when the agent detects Ceph and attempts to run an obsolete app check.
  • The Sysdig agent for ARM can restart when multiple containers are started in rapid succession on the host.

Defect Fixes

Agent on zLinux No Longer Restarts Due to Incorrect Detection of tid Collisions

The agent on s390x architecture (zLinux) has been fixed so the agent does not restart needlessly due to incorrect detection of too many tid collisions.

Reports Correct CronJob Version When Adding CronJob Parents

Fixed an issue that caused CronJobs not to be reported as the parents of Job objects.

Agent No Longer Crashes During Abnormal Termination

Fixed an issue causing the agent to crash with a stack backtrace during certain abnormal termination situations.

Slow-Starting JVMs Are Terminated Correctly

An incorrect detection of too many tid collisions on s390x architecture (zLinux) will no longer cause the agent to restart periodically.

Kubernetes Events Are Collected as Expected

Fixed an issue that could prevent Kubernetes events from being correctly fetched.

Disable Watching HorizontalPodAutoscaler

Watching Horizontal Pod Autoscalers has been disabled by default to decrease load on Kubernetes API server. For more information, see Enable Kube State Metrics.

False Positive CVEs for Go Packages No Longer Reported

The Go compiler version has been upgraded to prevent getting flagged with (false-positive) CVEs associated with older Go versions.

Secure Events Reports Correct Cluster Information

Secure events no longer report the Kubernetes cluster name as default when no cluster exists in the environment.

12.4.0 April 04, 2022

Feature Enhancements

Support for New Architectures

Installing the agent on the following architectures is supported:

  • ARM (aarch64)

    aarch64 environments support AWS Graviton

  • s390x (zLinux)

For more information, see Host Requirements for Agent Installation.

ARM support includes the AWS EC2 Graviton platform.

Custom-Metrics-Only Mode

A new agent mode, custom-metrics-only, has been introduced. It enables all custom metrics and Kubernetes state metrics but disables all the driver-based metrics.

Prevent Processing Policy Updates

The agent no longer processes policy update messages when no changes are required, which reduces CPU usage.

Known Issues

Increased Resource Consumption due to Misconfiguration of Node Lease

Incorrect configuration of the Kubernetes lease can result in elevated memory usage in the Sysdig agent pods as well as increased load on the Kubernetes API server due to multiple agents querying for more information simultaneously. This also results in a significant amount of additional and unnecessary load on the Sysdig backend. To resolve this issue:

  • Upgrade to Sysdig agent 12.5.0 which adapts to the non-optimal Kubernetes configuration.
  • Configure the Kubernetes lease functionality. If you are using Helm, the latest versions of the Sysdig Agent Helm chart default to configuring the lease functionality automatically. If you do not use Helm, the DaemonSet and ClusterRole YAML files are available in our GitHub repository. For further assistance, contact Sysdig Support.

Agent Restarts Periodically on zLinux

An incorrect detection of too many tid collisions on s390x architecture (zLinux) can cause the agent to restart periodically. To work around this issue, set the following configuration option:

watchdog:
  analyzer_tid_collision_check_interval_s: 86400

This configuration change reduces the number of restarts to once a day instead of every 10 minutes, which is the default value for the above configuration option.

This issue has been fixed in Sysdig agent v12.5.0.

Defect Fixes

Validate Promscrape Scrape Jobs

Validate scrape jobs associated with promscrape integration before scraping the endpoints to avoid unnecessary errors with irrelevant scrape jobs.

Remove App Check Warning Messages When App Checks Are Disabled

Remove unnecessary warning messages about app checks limits when app checks are disabled.

Slow-Starting JVMs Are No Longer Terminated

Slow-starting JVMs, for example those started with -XX:+AlwaysPreTouch and large heaps, could be terminated by sdjagent. This fix introduces additional configuration options to tune the delay between sdjagent detecting a started JVM process and its attempt to connect:

jmx:
  monitor_connect_timeout_ms: 5000
  management_agent_connect_delay_ms: 0

EVE Connector Works as Expected in Kubernetes

Fixed metadata incompatibility in profiling with Kubernetes versions above 1.20.

Name Change to Configuration Parameter

The falcobasline.max_drops_buffer_rate_percentage parameter has been renamed to falcobaseline.max_drops_buffer_rate_percentage, which corrects the missing e in falcobasline. For backward compatibility, falcobasline.max_drops_buffer_rate_percentage can still be used.

12.3.1 March 03, 2022

Defect Fixes

Noisy Messages Silenced

Removed a kernel message from the driver that could generate spam when the syscall event buffer is full.

12.3.0 February 17, 2022

Feature Enhancements

Binaries Category for Falco Baseline

A new category, binaries, has been added to the Falco baselines feature.

Support for Workload Information in Falco Baseline

Added workload information to the Kubernetes context for Falco baselines.

Default Monitoring of Kubernetes Resources

The following Kubernetes resources are monitored by default:

- persistentvolumeclaims
- persistentvolumes
- storageclasses
- horizontalpodautoscalers

Known Issues

IPv6 Addresses Are Saved Incorrectly When Adding Rules

Adding a new rule causes a problem saving IPv6 addresses for both fd.net and fd.ip.

Defect Fixes

Fix Truncated Capture Files

Fixed a problem which caused the agent to generate truncated capture files.

Container Action Pause Works on Kops/GKE Clusters

Fixed the logic that determines the cgroup path for a container in containerd and made the freezer subsystem available to the agent so that it can pause and unpause the container.

Agent Profiling Works as Expected

High CPU load no longer prevents generating CPU and memory profiles in the agent.

Agents Are Not Reset with Signal 11

Large and negative file descriptors are handled correctly so agents are no longer reset with signal 11.

12.2.1 February 07, 2022

Feature Enhancements

Manage Collecting Metadata from Individual Container Engines

Access to individual container engines from within the agent for fetching metadata can now be disabled via agent configuration. For example, to disable docker, use the following configuration:

container_engines:
  docker: false

Known Issues

The Pause policy action is not working as expected in GKE, EKS, and OpenShift 4 environments.

Defect Fixes

Policy Action “Kill” Is Correctly Triggered in GKE Environments

Policy action on GKE with containerd works as expected:

  • The container is stopped if HTTP proxy is enabled.
  • The status of the container is checked upon stop requests. If the status is not CONTAINER_EXITING, termination of the container is attempted with exponential backoff.

Agents Assign Username Correctly for Container Events

Fixed an issue that prevented the proc.name field from extracting the right user from the container started events. This issue was found in agent versions 12.2.0 and above.

12.2.0 January 25, 2022

Feature Enhancements

Improve Install Script to Support eBPF

A new option, bpf or -b, has been added to the native install script of the Sysdig agent to support eBPF.

Enable 10s Flush by Default

By default, the agent collects metrics at 1-second granularity, then aggregates and sends them to the backend in 10-second intervals. If you want to use agent versions 12.2.0 or above with on-prem Sysdig Platform versions below 3.5.0, set the 10s_flush_enable configuration to false to prevent compatibility issues.

The backend in our SaaS deployments continues to enable 10-second flush automatically for all agent versions 10.0.0 or above.
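
For the on-prem case described above, a minimal sketch of the setting, assuming it is placed as a top-level key in the agent configuration file dragent.yaml:

```yaml
# dragent.yaml fragment: disable the 10-second flush for on-prem platform
# versions below 3.5.0
10s_flush_enable: false
```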

Improved Log Messages

Improved the log messages to report the errors encountered while configuring subprocess_resource_limits.

Known Issues

Processing Secure policy updates in the agent can take longer than it did in the previous releases, and in some rare scenarios, it causes agent restarts.

Defect Fixes

Fix CVE-2020-29652 in Cointerface

Updated the Go crypto module to fix CVE-2020-29652.

Promscrape V2 No Longer Crashes on Pods with Multiple Containers

Prevent promscrape_v2 from crashing when a pod has multiple containers.

skip_events_by_type Works as Expected

Fixed an issue in the kernel probe, which prevented the skip_events_by_type feature from correctly filtering events by system call type.

Kubernetes State Is Transmitted as Expected

Fixed an issue where Kubernetes information and metrics would not be sent from the agent. This scenario arose when the agent was deployed in a namespace other than sysdig-agent, and the agent daemonset did not include the podinfo volume.

Agent Successfully Connects to JMX

Fixed an issue where the agent wouldn’t connect to JMX on some applications/JVMs. This issue was originally observed on the WebSphere application and Liberty JVM.

Agent Updates Container Status as Expected

Fixed an issue where the agent would not update the container status it first received from the API server. The agent now updates the container statuses as it receives them from the API server.

Check for Invalid Log Level in sdjagent

Fixed an issue where using a log level of none caused sdjagent to crash.

App Checks Run as Expected on Non-Containerized Agent Installations

Fixed an issue that prevented app checks from running on non-containerized agent installations.

Native Install Doesn’t Support eBPF

Prevents insertion of the Sysdig probe kernel module when the agent is installed with eBPF by using an rpm or deb package.

Prevents Connection Attempts When Agent Encounters Errors

Connection attempts are prevented when the agent encounters errors while handling handshake messages.

12.1.1 November 22, 2021

Defect Fixes

Falco Action Works as Expected

The kill container Falco action works as expected for containerd in Azure.

12.1.0 November 08, 2021

Feature Enhancements

Ability to Build eBPF Probes for Debian 11 Kernels

The agent container has been enhanced to build probes for Debian 11 kernels.

Prebuilt Probes for Debian 11 Kernels

Prebuilt probes are added for Debian 11 kernels.

Prebuilt Probes for Fedora Kernels

Prebuilt probes are added for the latest Fedora kernels.

Ability to Build eBPF Probes for Linux Kernel v5.10

The agent container can now build eBPF probes for Linux kernel version 5.10.

Enhanced Agent Containers for Probes on New Kernels with glibc v2.33

The agent container has been enhanced to build probes for new kernel versions that use glibc v2.33.

File Metrics in Audit Tap

File-related metrics are now included in audit tap.

Promscrape Memory Usage Limit

You can now limit the promscrape memory usage. The default is set to 640 MB. For more information, see Sysdig Agent.

Remove Self-Signed Certificate for Agent to Collector Connection

Self-signed certificate support has been removed for the agent connection to the collector. See End of Support.

Defect Fixes

Image Profile Shows Results Correctly

The imageid is reported correctly when using a CRI engine.

Duplicate Environment Variable Hashes No Longer Appear in Audit Tap

The discrepancy between reported environment variables and hash in audit tap has been fixed.

Kubernetes Daemonset and Replicaset Association Works as Expected

Fixed an issue that could invalidate the association between Kubernetes Daemonset and Replicaset.

Agent Updates Prometheus Configurations Correctly

Fixed a problem that was causing Prometheus configurations to be merged incorrectly when certain integrations were updated from the backend.

12.0.4 October 29, 2021

Defect Fixes

Secure Policies Load as Expected

Fixed an issue present in 12.0.3 where Secure policies might not be loaded correctly by the agent.

12.0.3 October 22, 2021

Defect Fixes

Leases Fallback Works as Expected on OpenShift v3

Fixed an issue where Kubernetes clusters that don’t support leases failed to report Kubernetes data due to not falling back to the previous behavior.

Update the Cluster Install Scripts for Leases on OpenShift

Modified the OpenShift agent installer to add the sysdig-agent cluster role and to assign it to the sysdig-agent service account. The new cluster role allows the agent to utilize the coldstart leases.

12.0.2 September 30, 2021

Defect Fixes

Network Security Communication Works As Expected

In some environments, Sysdig agents could not send any Network Security (Kubernetes Network Policies) communications when CIDR auto-discovery did not complete. This issue has been fixed.

Agent No Longer Crashes in Orchestrated Environments

Fixed a race condition in orchestrated environments, such as OpenShift v3, that could cause the agent to crash repeatedly at startup.

12.0.1 September 27, 2021

Defect Fixes

OpenShift 4 Clusters Able To Retrieve Metadata Without Leases

Fixed an issue where OpenShift clusters would fail to report Kubernetes data when the agent service-account did not have the permission to create leases. With this fix, the Sysdig agent falls back to the previous behavior to retrieve the metadata.

12.0.0 September 15, 2021

Feature Enhancements

Allow Sysdig Backend to Manage Prometheus Configuration

Allow Sysdig backend to manage Prometheus configuration. For more information, see the following:

Agent Console Supports Troubleshooting Prometheus Configuration

The Agent Console now supports troubleshooting Prometheus configuration.

To support this feature, Agent Console is enabled by default. This helps both users and Sysdig support to troubleshoot Sysdig agent issues. Sensitive user configuration is obfuscated and not viewable.

For more information, see Using the Agent Console.

Support for Node Leases

Sysdig agent supports using Kubernetes Lease to control how and when connections are made to the Kubernetes API Server.

For more information, see the following:

Support for Podman Environments

Sysdig agent is supported in Podman environments. For more information, see Prerequisites for Podman Environments.

Add Startup Delay to Agent to Kubernetes API Server Connection

Added a delay prior to the agent connecting to the Kubernetes API server. The delay time is set based on the number of nodes in the cluster to prevent overloading the API server. This is to support environments where node leases cannot be used.

Known Issues

None

Defect Fixes

Stale Capture Files No Longer Exhaust Local File System

Incomplete and stale capture files are no longer left behind, which avoids unnecessary storage consumption.

Honor CPU Quotas

Moved the main dragent process to the default cgroup so that CPU quotas can cover all the agent processes.

Containers Are Detected as Expected

Fixed an issue where containers were not detected if SystemdCgroup = true was not set in the containerd configuration.

Report Correct Container Metadata

Fixed a problem that caused some container metadata such as the image repository and image tag to be reported incorrectly.

Upgrading from 10.8.0 to 11.3.0 No Longer Fails

Provide an http_proxy configuration option to address connection problems after the OpenSSL upgrade from v11.0 to v11.1.

11.4.1 August 03, 2021

This is a hotfix release.

Defect Fixes

Fixed a problem that broke app checks in agent-slim by adding the missing dependencies.

11.4.0 July 28, 2021

Feature Enhancements

Probe Builder

The probe builder can now be used to build kernel modules for the Sysdig agent. It can run on any host with Docker installed, including (with some preparation) air-gapped hosts.

Probe Builder is now enabled and available at https://github.com/draios/probe-builder. See the Readme for more information.

Promscrape v2

Promscrape v2 (used when prom_service_discovery is enabled for Prometheus) has been changed to discover only Kubernetes pods running on the same node as the agent. This should help reduce the load on the Kubernetes API servers in large clusters.

Added Missing Fields for Unified Workload Metrics

Added Kubernetes metric fields indicating the availability of daemon sets (status.numberAvailable, status.numberUnavailable, and status.updatedNumberScheduled) and replica sets (status.availableReplicas) to support workload-level metrics (SaaS only).

Known Issues

App checks in agent-slim don’t work due to missing dependencies. This problem will be addressed in an upcoming hotfix release.

Defect Fixes

Multiple Hosts No Longer Report the Same Pod

Fixed an issue causing multiple hosts to report the same pod if its UUID is the same on both hosts.

Duplicate StatsD Metrics Are Reported Correctly

Fixed an issue related to handling duplicate StatsD metrics corresponding to a container that is reported by a host.

Stale Markers Are Sent Properly for Dropped Targets

Properly generate stale markers for Prometheus metrics when a scrape target is no longer available and when using promscrape.v1.

Report a Positive Time Delta Value

Fixed a defect that could result in invalid file.time.in, file.time.out, file.time.other, and file.time.total values.

Agent No Longer Crashes When App Check or Prometheus Is Enabled

Fixed a defect that could cause the agent to crash when app checks or Prometheus are enabled.

Secure Captures No Longer Cause Host Shutdown

Prevent agent restarts caused by apparent stalls encountered in the sample handler thread.

11.3.0 June 10, 2021

Feature Enhancements

Console Logging

Introduced a per-component console logging feature. See Manage Console Logging for Agent Components.

Slim Agent for eBPF Probes

agent-kmodule and agent-kmodule-thin can now be used to build eBPF probes.

Replication Controller Fields

Added missing replication controller fields to the aggregator Actions.

Non-Delegated Agents Retrieve Less Data From the API Server

Use Kubernetes leases to better control the load on the Kubernetes API Server. This is disabled by default.

Defect Fixes

Agent No Longer Generates Core Dumps on Java

Prevents Java process core dumps caused by the Sysdig agent while trying to access the /tmp directory.

Support Container Action on Containerd

Container actions are now properly supported on containerd (CRI-O and other CRI engines that already had support). Actions for unsupported container engines are now properly reported to the Sysdig backend and a warning message is logged in the agent logs.

Recovery During Agent Shutdown

Introduced a detection and recovery mechanism for hangs during agent shutdown.

Promscrape V2 Termination No Longer Causes Agent Crash

Fixed a problem causing the agent to crash after promscrape_v2 is terminated.

Agent No Longer Restarts in Kubernetes Environment

The agent tries to fetch the metadata of the AWS instance in which it is running in order to tag the generated metrics with information unique to that instance. If the metadata structure was not as expected, the agent restarted continuously due to an error in fetching the metadata. This issue has been fixed.

Profiling Works as Expected

Fixed an issue that disabled support for performance profiles in the agent.

11.2.1 May 06, 2021

This is a hotfix release.

Defect Fixes

Report Container User Information

The agent now tracks container user information and makes it accessible in container events, which denote that a container has started. This feature works for Docker as well as CRI-O container engines.

Reporting container user information does not work in OpenShift 4.x because it does not provide necessary CRI-O information.

11.2.0 April 26, 2021

Feature Enhancements

Agent CLI

Sysdig supports Agent CLI, a command-line interactive tool, to troubleshoot agents. This tool helps Sysdig support to solve user issues quickly and efficiently. It is currently disabled by default and requires the customer to turn it on.

For more information, see Using the Agent Console.

Scraping Prometheus Metrics

Scraping Prometheus metrics is supported in the following cases:

  • Advertised ports on container IP addresses

  • Advertised ports on host IP addresses

  • Advertised ports on pod IP addresses

Slim Agent for IKS

Use the following:

Reduce Load on Kubernetes API Server

Terminated pods are no longer collected in order to reduce the load on the Kubernetes API server.

Audit Server Listens on All Interfaces

The audit server now by default listens on all the interfaces for Kubernetes audit events. This makes integration with Kubernetes audit events in the agent easier without the need for configuration changes.

Improved Noise-Reduction Filter for Activity Audits

The noise-reduction filter for Activity Audit has been improved. All the filtered data is duplicated.

Defect Fixes

CRI-O Versions Report Correct Image ID

The new CRI-O versions (1.19+, possibly 1.18) now properly report container.image.id.

Log Level Changes for Duplicate Host Container Groups

Demoted logs about duplicate host container_groups from warning to debug level.

Fix CVE-2021-28831

Fix CVE-2021-28831 in the Slim Agent container.

11.1.3 April 13, 2021

This is a hotfix release.

Defect Fixes

Prevent Agent CrashLoopBackoff Error Caused by Smaller initialDelaySeconds Values

The readiness probe improvement in version 11.1.2 delayed the transition of the agent pod to a ready state until communication with the Kubernetes API server was established. But this delay could cause a CrashLoopBackoff due to liveness or readiness probes configured with an initialDelaySeconds set to less than 90.

In agent version 11.1.3, the transition to the ready state does not wait for communication with the Kubernetes API server to be established unless the behavior is enabled via a new configuration option: k8s_wait_before_ready.
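
As a sketch only, assuming k8s_wait_before_ready is a boolean set as a top-level key in dragent.yaml (these notes name the option but do not state its type or placement), restoring the 11.1.2 behavior might look like this:

```yaml
# dragent.yaml fragment (assumed boolean, assumed top-level placement)
k8s_wait_before_ready: true
```

If this option is enabled, the probe delays described for version 11.1.2 presumably become relevant again.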

11.1.2 March 30, 2021

Known Issues

Prevent Agent CrashLoopBackoff Error Caused by Smaller initialDelaySeconds Values

The readiness probe improvement in version 11.1.2 delayed the transition of the agent pod to a ready state until communication with the Kubernetes API server was established. But this delay could cause a CrashLoopBackoff due to liveness or readiness probes configured with an initialDelaySeconds set to less than 90.

Workaround

If you are using agent version 11.1.2, set initialDelaySeconds for both liveness and readiness probes to a value that is greater than or equal to 90.
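
The workaround might look like the following fragment, assuming the agent is deployed from a Kubernetes manifest that already defines both probes. Only the field named in the workaround is shown; the probe handlers and any other probe settings are deliberately omitted and should stay as they are in your manifest.

```yaml
# Fragment of the agent container spec (illustrative only).
# Keep the existing probe handler and other probe fields unchanged.
livenessProbe:
  initialDelaySeconds: 90   # must be greater than or equal to 90
readinessProbe:
  initialDelaySeconds: 90   # must be greater than or equal to 90
```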

Feature Enhancements

Enhanced Connection with Kubernetes API Server

Kubernetes reconnect logic has been improved to back off automatically (1 min, 2 min, 4 min, and so on, up to 1 hr) if the connection is continuously dropped when using Thin Cointerface. This reduces the load that the agent imposes on the Kubernetes API Server in clusters with heavily burdened API servers.

Reduced Load on Kubernetes API Server

The agent’s readiness probe has been improved to not report ready until after the agent connects to the Kubernetes API server. This reduces the load that the agent imposes on the Kubernetes API server when starting up during RollingUpdate.

11.1.1 March 26, 2021

Defect Fixes

Agent Reports Memory Usage Accurately for Containers

Fixed an issue where the agent would incorrectly report memory.bytes.used for containers that use more than 4 GB.

Runtime Policies Work as Expected

The runtime policies that have a policy type and capture action are handled as expected.

11.1.0 March 23, 2021

Defect Fixes

Agent Tags in Policy Scopes

Agent tags are supported in runtime policy scopes.

Metric Limits Are Updated As Expected

Fixed a problem where metric limits were not updated from the defaults. This is unlikely to happen if agents are connected to the SaaS backend.

Configured Tags in Prometheus Scraper

Fixed a problem in the old Prometheus scraper (used when promscrape is disabled) to ensure that configured tags are properly added to the metrics.

JMX Metrics for Short-Lived Java Processes

Fixed an issue where short-lived Java processes could cause the Sysdig Agent to stop collecting JMX metrics.

Misconfiguration No Longer Leads to Agent Constantly Querying Kubernetes API Server

Fixed a problem where the agent would continuously send requests to the Kubernetes API server to query the endpoints API. This occurred when the agent’s clusterrole was incorrectly configured. With this fix, the agent no longer repeats the attempt if it is unable to connect to the Kubernetes API during boot.

Scope Runtime Policies

The runtime policies are now correctly scoped by kubernetes.cluster.name. The fix in 10.6.0 was incomplete.

Agent Correctly Reports Replicasets

Fixed an issue where the agent could lose track of a replicaset and report incomplete metadata.

Agent Issues Over HTTP Proxy

  • Fixed an agent connection issue over plaintext HTTP proxy with encryption.

  • Fixed an agent connection issue via HTTP proxy connections over SSL.

11.0.0 February 18, 2021

Feature Enhancements

Thin Cointerface to Reduce Memory Usage

Thin cointerface reduces the memory required to handle the Kubernetes metadata on both the agent and the Kubernetes API Server. The reduction in memory usage is significant for Kubernetes clusters with a large number of pods (in the range of 10,000 or more) or clusters that heavily use Replication Controllers.

Using this feature returns the same data to the Sysdig backend and does not affect any Sysdig features. The thin cointerface feature is disabled by default.

To enable:

  1. Add the following to either the sysdig-agent configmap or the dragent.yaml file:

    thin_cointerface_enabled: true
    
  2. Restart the agent.

See also: Reduce Memory Consumption in Agent.

Reduce the Volume of Agent Log Messages

Some high-frequency information-level log messages have been demoted to debug level to reduce the volume of messages generated at the default information level.

File Logging Capability

Per-component file logging capability for an additional set of agent components has been enabled.

For more information, see Manage File Logging for Agent Components.

Reduce Agent Memory Consumed by Prometheus

The number of Prometheus time series ingested has been limited to reduce agent memory consumption. This limit is applied after Prometheus relabeling rules are applied but before the agent’s metric filter and metric limit.

Defect Fixes

Missing Metrics Due to Aggregation in Agent Fixed

Fixed an issue where processes with certain names were improperly aggregated, which in turn caused missing metrics in certain situations.

Cointerface Fix

Fixed an issue that caused the agent’s cointerface process to restart continuously while processing Kubernetes label selectors.

10.9.1 January 21, 2021

Defect Fixes

Thin Cointerface Works as Expected

Fixed a defect in the Thin Cointerface feature which could cause Kubernetes metadata to stop updating. Because Thin Cointerface is turned off by default, the change affects only a small number of users who have this feature turned on.

10.9.0 January 13, 2021

Feature Improvements

Support for Kubernetes Cronjobs

Kubernetes CronJobs are now supported when reporting network communications.

Defect Fixes

Runtime Policies and Rules Are Loaded with No Errors

Fixed a race condition that could prevent runtime policies and rules from being loaded properly if multiple messages from the Sysdig backend are received consecutively.

Cluster Overview Displays Compliance Score

Fixed an issue where StatsD metrics related to compliance had no associated Kubernetes metadata and were not visible on Cluster Overview.

10.8.0 December 18, 2020

Defect Fixes

Filtering Long Container Labels

Filtering long container labels works as expected with no parsing failures or undesirable agent restarts.

Correct kubernetes.pod.restart.rate Metric

Fixed an issue that could cause the kubernetes.pod.restart.rate metric to be incorrect.

Prometheus Metrics With Multiple Processes Listening Concurrently

Fixed a problem that caused scraping Prometheus metrics to fail when another process was listening on TCP port 9090 on a host interface.

StatsD Metrics Report Correct Values

Fixed a problem that caused StatsD metrics to report incorrect values.

Correct Environment Variable Hash in Audit Tap

Fixed an issue that could cause the environment variable hash associated with the exported processes in audit tap to have an incorrect value.

Improve JMX Availability Check

The sdjagent process in the agent no longer consumes excessive CPU resources.

10.7.0 November 20, 2020

Feature Improvements

Policies and Baselines V1 Messages Are Deprecated

Sysdig agent no longer supports the old backend message types that were originally deprecated in on-prem release 2.4.0 (August 2019).

Load Falco Rules on a Separate Thread

Falco rules are now partially loaded in the background to avoid interrupting event processing.

Workflow for Unacknowledged Metrics

The agent is restarted if a metrics acknowledgment hasn’t been received from the Sysdig backend components in 8 minutes. This can happen if networking issues cause the agent to believe it has an active connection when the backend has closed the connection.

Run Single Agent RPM Per Host

Multiple agent services are now prevented from being launched on the same RHEL-based host.

Known Issues

The host.container.start.count metric acts as a counter metric and its value increases monotonically.

Defect Fixes

OpenShift Hardening Guide Correctly Detects Master and Worker Nodes

Running the OpenShift Hardening Guide functionality of the Kubernetes Benchmark will now correctly detect master vs worker nodes, and run the appropriate Benchmark tests.

Agent No Longer Terminates Non-Agent Processes

In some rare situations when process creation in the Agent’s JMX module failed due to issues caused by resource limits, it could inadvertently stop unrelated processes running on the host. This problem has been fixed.

10.6.0 October 30, 2020

Feature Improvements

Python 2.7 Is No Longer Supported in Agent Containers

Python 2.7 has been removed from the agent and agent-slim containers.

This is a breaking change for users who are using an agent container and have set the python_binary configuration to /usr/bin/python2.7.

To prevent breaking the setup, do one of the following:

  • Remove the python_binary configuration option.

  • Set python_binary to /usr/bin/python3.
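
For the second option, the dragent.yaml entry would look like the following. The option name and path are the ones given above; nothing else about the file is assumed.

```yaml
# Use the Python 3 interpreter in the agent container for app checks.
python_binary: /usr/bin/python3
```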

The Sysdig agent continues to support Python 2.7 if it is installed as a service and the host has Python 2.7.

Kubernetes Benchmarks

Updated kube-bench to support the following Kubernetes benchmarks and targets. For a complete list of benchmarks, see Benchmarks (Legacy).

  • Kubernetes benchmark 1.6

    • Master

    • Control plane

    • Node

    • etcd

    • Policies

  • Google Kubernetes Engine (GKE) Benchmark 1.0

    • Master

    • Control plane

    • Node

    • etcd

    • Policies

    • Managed services

  • Amazon Elastic Kubernetes Service (EKS) Benchmark 1.0

    • Control plane

    • Node

    • Policies

    • Managed services

Configuring Prometheus Metric Expiration Time

Configuring metrics expiration time is supported by promscrape.v2 for Prometheus metrics gathered by using Prometheus service discovery.

Support for Scoping Policies by Kubernetes Cluster Name

Added support for scoping policies by kubernetes.cluster.name. The cluster name must still be manually configured by using the configuration option k8s_cluster_name: <CLUSTER NAME>.
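
For illustration, a dragent.yaml entry that sets the cluster name could look like this. The cluster name is a hypothetical placeholder; policies scoped by kubernetes.cluster.name would then need to use the same value.

```yaml
# "production-cluster" is a hypothetical name used only for illustration.
k8s_cluster_name: production-cluster
```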

Improved Prometheus Service Discovery

Made Kubernetes node matching more reliable for Prometheus Service Discovery by comparing IP addresses instead of node names in the default configuration.

Defect Fixes

CVE Fixes

Addressed a known vulnerability in the jackson-databind package version 2.9.10.6 by upgrading to version 2.11.3 in agent containers.

Reduce Severity of NoClassDefFoundError Log from Error to Info

Changed the log level of the Java NoClassDefFoundError message from Error to Info to avoid spamming the logs at the Error level. This error commonly occurs when the agent attempts to read metrics from a Java 11 application that was not started with the com.sun.management.jmxremote option.

StatsD Metrics No Longer Show Larger Than Expected Values

Fixed a problem that caused StatsD metrics to be double the expected value.

Remove Warning Logs

Removed warning logs about ambiguous source labels when using the Prometheus service discovery with multi-container pods.

10.5.2 October 21, 2020

Defect Fixes

Memory Leak No Longer Occurs in the Agent

Fixed an issue that could potentially cause a slow increase in the agent’s memory usage over time when the thin_cointerface_enabled configuration option is enabled.

10.5.1 October 08, 2020

Feature Improvements

Added New Rules to the Prometheus Configuration to Honor Pod Annotations

Improved the default Prometheus configuration for promscrape.v2 to honor pod annotations.

Known Issues

The agent logs warning messages in the agent log file when promscrape.v2 is enabled.

Defect Fixes

Pods Are No Longer Associated with Incorrect Deployments

Fixed a problem that could cause a pod to be associated with incorrect deployments.

10.5.0 September 24, 2020

New Features

Enable Communication Between Agent and Collector Through a Proxy Server

Communication between the Sysdig agent and the collector can now be established through an HTTP or HTTPS proxy server.

For more information, see Enable HTTP Proxy for Agents.

Default Prometheus Configuration File

A new version of promscrape, promscrape.v2, has been introduced to offer native Prometheus service discovery capabilities. To support this, a default prometheus.yaml file has been added with Kubernetes pod discovery rules to use when native Prometheus service discovery is enabled. See Enable Prometheus Native Service Discovery for more information.

Secure Mode

The Sysdig agent now supports secure mode, which offers Secure-only features. See Secure Mode for more information.

Known Issues

None.

Defect Fixes

CVE Fixes

Addressed vulnerabilities reported in the agent and agent-slim containers, including the one for CVE-2017-18640 in a dependency library related to image scanning.

Agent No Longer Hangs While Handling Connection Errors

Fixed an issue that caused the agent to hang while handling some types of connection errors. When this issue is encountered, restarting the agent will allow it to reconnect.

Upgrading to Sysdig agent v10.5.0 or higher is strongly recommended to avoid this problem.

Scraping Prometheus Endpoints in Docker Containers

Prometheus metrics can now be scraped from endpoints in Docker containers with remapped port numbers.

Prevent Agent Crashes in Large Systems

The agent now starts faster on systems with thousands of processes and hundreds of containers.

Warning for Prometheus Metric Limit

The agent logs a warning once per minute when the Prometheus metric limit is reached.

Transmitting Prometheus Metrics Works As Expected When Service Discovery Is Enabled

Fixed a problem that could randomly result in Prometheus metrics not being sent when Prometheus service discovery is enabled.

Appcheck Metrics No Longer Go Missing

Fixed a problem that would cause certain app check metrics to be missing when 10-second aggregation in the agent is enabled.

Agent Now Times Out If Connection Attempt to Collector Does Not Work

Added a timeout to the handshake protocol between agent and collector.

Agent Now Collects JMX Metrics from New Process Following a Java Service Restart

Fixed a problem that randomly caused JMX metrics not to be collected due to transient errors encountered during the startup of new Java processes.

Pod to Service Connection

Fixed a problem that caused the UI to show a pod under an incorrect service if other services exist in different namespaces with the same selectors. This happened when the thin_cointerface_enabled property was set to true.

Syscall Fast Rule Triggers as Expected

Fixed the evaluation of secure fast engine syscall rules when the If Not Matching rule is selected.

10.4.1 August 26, 2020

Defect Fixes

Kubernetes Pods No Longer Lose Association with Resources

Fixed a problem that could cause Kubernetes pods to lose association with their deployment or other related resources.

10.4.0 August 19, 2020

New Features

Ability to Scrape Prometheus Metrics from Container IP Addresses

The agent can now scrape Prometheus metrics from Docker containers that expose ports only on specific IP addresses other than localhost.

Use Forwarder Is Enabled by Default

The use_forwarder option is now enabled by default. See Collect StatsD Metrics Under Load.

Set JMX Limits

The default per-process JMX bean limit (300) can now be changed as follows:

jmx:
  max_per_process_beans: 500

Known Issues

Handling Benchmark Task When StatsD Metrics Collection Is Disabled

When StatsD is disabled, the agent does not attempt to send metrics related to benchmark tasks. As a result, benchmark dashboards will not have data when StatsD is disabled.

Kubernetes Pods Can Lose Association with Resources

A problem that could cause Kubernetes pods to lose association with their deployment or other related resources has been identified in Agent version 10.4.0. A new version, 10.4.1, that will address this problem is currently in development.

Defect Fixes

Kubernetes Audit Server and Agent Process Restart Congruently

Embedded web server for Kubernetes audit events restarts as expected when the agent process is restarted.

Updated the version of the jackson-databind package to fix vulnerabilities discovered in the slim agent v10.3.0.

10.3.1 August 06, 2020

Defect Fixes

Kubernetes Benchmark Tasks No Longer Fail

The kube-bench binary that was identified as broken due to the change in the output format has been fixed.

kube-bench, which performs the Kubernetes Benchmark tasks, changed its output format, causing the existing Benchmark tasks to fail in v10.3.0. With this fix, the agent no longer throws errors related to this issue and the new Kubernetes Benchmark results appear in the UI as expected.

Probes Work As Expected for v5.8 Kernels

Fixed an issue with building probes for Linux v5.8.0 kernel.

10.3.0 July 28, 2020

New Features and Enhancements

Changes to the Monitor Mode

URL segmentation for metrics has been moved from the default monitor mode to the troubleshooting mode. Due to this change, dashboard panels with per-URL metrics will show no data. See Additional Metrics Values Available in Troubleshooting.

Sysdig Probe Location Changes

The Sysdig probe URL is changed to download.sysdig.com.

If the Sysdig probe URL is included in the allow list for outbound firewall access, you must change the endpoints to reflect the new probe location.

Agent Connects to Promscrape through UNIX Socket By Default

The agent now connects to promscrape through a UNIX socket by default as opposed to the TCP port 9876.

New Configuration File Paths for Kube Proxy

The version of kube-bench has been upgraded to 0.2.4. The changes include an additional configuration file path for Hyperkube kube-proxy to support OpenShift.

Known Issues

Kubernetes Benchmark Tasks Fail

The kube-bench binary is broken due to the change in the output format and the issue will be fixed in an upcoming release.

kube-bench, which performs the Kubernetes Benchmark tasks, changed its output format, causing the existing Benchmark tasks to fail. The new Kubernetes benchmark results will not appear in the UI, and the agent will report errors related to Kubernetes benchmark tasks.

Defect Fixes

EndPoints-Independent Metrics Limits for Prometheus

Prometheus metric limits have been modified to ensure that endpoints with fewer timeseries are not affected when another endpoint hits the limit. Reporting of Prometheus timeseries statistics has also been updated.

Prometheus Count Metrics for Summary and Histogram

The calculated Prometheus _count metrics are reported for summaries and histograms even when the _sum values are missing. This feature is not applicable to raw metrics.

A .count metric (the rate of change of _count values) and a .avg metric (the average of new samples when _count increases) are calculated for summaries and histograms. Previously, these .count and .avg metrics were reported only if the raw Prometheus metrics included both _sum and _count values. In this release, _sum values are no longer required to calculate Prometheus _count metrics for summaries and histograms.

Reporting Running Pod Counts

Fixed an issue pertaining to the reporting of running pod counts for replication controllers, deployments, and ReplicaSets.

Segmenting Kubernetes Jobs Metrics By Namespace

Fixed an issue that prevented having Kubernetes jobs segmented by namespace.

Agent No Longer Stalls Under High Load

Fixed an issue that caused the agent to stall under high load.

Restarting Agent No Longer Causes Exception

Fixed an issue that caused an exception at agent restart while collecting CPU metrics.

10.2.0 June 25, 2020

New Features and Enhancements

Prometheus Scraping

Periodic logging of statistics for Prometheus timeseries has been added. When a metric limit is hit, all the timeseries metrics associated with the endpoint are dropped.

App Checks and Prometheus Metrics

Processes with app checks or Prometheus metrics are now included by default in the top processes to be sent to the Sysdig collector.

Performance Improvement

A variety of performance improvements have been rolled out to accelerate the evaluation of Falco rules and fast engine rules for the common case of events not matching any rules/policies.

Detect JSVC Processes as Java Programs

The agent has been enhanced to detect JSVC processes as Java programs to enable the collection of JMX metrics.

Troubleshooting Metrics Removed from Default Mode

The net.mongodb.* and net.sql.* metrics have been moved from the default monitor mode to the troubleshooting mode. For more information, see Additional Metrics Values Available in Troubleshooting.

Deprecated Metrics

The following deprecated App Checks have been removed and will no longer be supported.

  • Network

  • RiakCS

  • TokuMX

  • Ceph

  • Gearmand

  • Gunicorn

  • Kyoto Tycoon

  • Teamcity

  • Riak

  • Solr

  • OpenStack

Defect Fixes

Fixed a Race Condition

Fixed a potential race condition that could occur when receiving multiple policies and related messages from the Sysdig collector at nearly the same time.

Benchmark Task Configuration

The agent no longer runs a built-in set of benchmark tasks. The agent will only run benchmark tasks when configured to do so by a Sysdig Secure backend.

Prometheus Metrics From Idle Processes Are No Longer Dropped

Prometheus metrics from idle processes are no longer dropped even if the target processes are not active enough to be in the top processes. Additionally, the app_checks_always_send parameter, which can force report the idle processes with metrics, now works as expected for metrics gathered by promscrape.
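
A minimal dragent.yaml sketch for forcing idle processes with metrics to be reported is shown here. The parameter name is taken from this note; the boolean value and the top-level placement are assumptions.

```yaml
# Assumption: app_checks_always_send is a top-level boolean option.
# When enabled, processes with metrics are reported even if they are
# not active enough to be among the top processes.
app_checks_always_send: true
```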

Removed Authentication Credentials

Removed sensitive authentication credentials related to app checks from debug log messages.

Kubernetes Events Are No Longer Dropped

Kubernetes events are no longer dropped under some high load conditions.

Memcached App Check Collects Slab and Item Stats

Fixed a problem that prevented the collection of slab and item stats in the Memcached app check in certain Python environments.

Metrics No Longer Report Incorrect Zero Values

The following metrics no longer return incorrect zero values:

  • kubernetes.resourcequota.cpu.requests.hard

  • kubernetes.resourcequota.cpu.requests.used

  • kubernetes.resourcequota.memory.requests.hard

  • kubernetes.resourcequota.memory.requests.used

Agent Automatically Restarts Upon Protocol Mismatch Errors

The agent used to require manual intervention to recover from protocol mismatch errors received from the Sysdig Backend. This error can occur when the agent and Sysdig Backend are not in sync. The agent has been enhanced to automatically restart when this error is encountered, so manual intervention is no longer required.

10.1.1 June 02, 2020

Defect Fixes

Enable Network Topology

Network stats metrics that were moved to the troubleshooting mode in Agent v10.1.0 have been re-enabled by default. The metrics will now be available in the monitor mode, which in turn will enable the network topology by default.

For information on agent modes, see Configure Agent Modes.

10.1.0 June 01, 2020

New Features

Support for Linux v5.6 Kernels

Added support for Linux 5.6 kernels.

JMX Support for Java v11, 12, 13 and 14 JRE

Added JMX support for Java 11, 12, 13, and 14 JRE. For containerized Java apps with JRE, run the app with the -Dcom.sun.management.jmxremote option.

Added Rate Limiting Configurations

Added rate limiting configurations to the agent to avoid connection timeouts for metrics and secure messages.

Added New Metrics

Added a new metric to display the kernel version of the host where the agent is running.

  • host.uname

    This metric can be segmented by host.uname.kernel.name, host.uname.kernel.release, and host.uname.kernel.version. For more information, see host.uname.

Added Container Name to the Containerd Event Description

Added container name to the containerd events description. In some rare cases, the container name associated with a containerd event might be unavailable due to metadata lookup delay.

Removed Authentication Credentials

Removed sensitive authentication credentials related to app checks from debug log messages.

Removal of Deprecated App Checks

The following deprecated app checks will be removed in an upcoming release:

  • Network

  • RiakCS

  • TokuMX

  • Ceph

  • Gearmand

  • Gunicorn

  • Kyoto Tycoon

  • Teamcity

  • Riak

  • Solr

  • OpenStack

Enable Removed Metrics

Some metrics related to network and file will not be available by default. You can enable them by editing the dragent.yaml file.

Edit the Configuration File

  1. Open the dragent.yaml file.

  2. Add the following configuration parameter:

    feature:
      mode: troubleshooting
    
  3. Restart the agent.

Removed Metrics in Agent v10.1.0

The following metrics will not be reported by default in agent v10.1. When segmented by a particular label, these metrics will not have some values. The table summarizes the metrics and missing values when they are segmented by a particular label.

Metrics                       Unreported Metric Values When Segmented By
file.error.total.count        file.name and file.mount labels
file.bytes.total
file.bytes.in
file.bytes.out
file.open.count
file.time.total
host.count
host.error.count
proc.count
proc.start.count
net.bytes.in                  net.connection.server, net.connection.direction, net.connection.l4proto, and net.connection.client labels
net.bytes.out
net.connection.count.total
net.connection.count.in
net.connection.count.out
net.request.count
net.request.count.in
net.request.count.out
net.request.time
net.request.time.in
net.request.time.out
net.bytes.total

Defect Fixes

Promscrape No Longer Breaks Metrics Collection Over HTTPS

Fixed promscrape to honor the ssl_verify configuration option.

Slim Agent Container No Longer Prevents Certain App Checks From Emitting Metrics

Fixed an issue with the agent-slim container that prevented postgres and pgbouncer app checks from emitting metrics.

Reduced the Frequency of Log Messages

Reduced the frequency of a log message to reduce spam and enhanced a statsd related log message to provide more information about incorrectly formatted strings.

Use Exact Rule Names When Adding Rules to Runtime Policies

The agent now considers only exact matches when linking Secure runtime policies to Falco rules.

Corrected Calculation of net.bytes.* Metrics

Fixed the calculation of net.bytes.* metrics at the host level when using Calico interfaces or VPN tunnels.

10.0.0 May 01, 2020

New Features

Kubernetes Benchmark Master Programs

Added the ability to run Kubernetes Benchmark Master Programs on additional Kubernetes distributions.

New Scraping Mechanism for Prometheus

A new process, called promscrape, has been introduced to scrape Prometheus metrics by default. The mechanism, based on the open-source Prometheus, improves compatibility and performance. It also allows per-endpoint metric filtering and relabeling through metric_relabel_configs.

For more information, see Working with Prometheus Metrics.
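
As a rough illustration of per-endpoint filtering, the following uses standard Prometheus relabeling syntax to drop a group of time series. The metric prefix is a hypothetical example, and where this list sits in the agent's configuration is not described in this note, so treat the placement as an open question.

```yaml
# Standard Prometheus metric relabeling syntax (illustrative only).
metric_relabel_configs:
  # Drop every time series whose metric name starts with go_gc_.
  # The go_gc_ prefix is a hypothetical example.
  - source_labels: [__name__]
    regex: 'go_gc_.*'
    action: drop
```

Rules in metric_relabel_configs are applied to scraped samples before ingestion, which is why they can be used to filter metrics per endpoint.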

Non-Root Access to Log Files

Added the ability to make draios.log files readable by users other than root. This can be enabled with the following configuration in dragent.yaml:

log:
  globally_readable: true

New Runtime Policy Action

Added the ability to kill containers as a runtime policy action. See Manage Policies for details.

Defect Fixes

Fixed the Path Parameter Issue in Prometheus Configuration

Fixed the use of the path parameter in Prometheus configuration when using promscrape. With this fix, the configured path is passed to promscrape by the agent when it is set up for a target rule in dragent.yaml.

Service Annotation Based Prometheus Scraping

Prometheus scraping can now be triggered based on service annotations by default.

Added a Missing Module to the agent-slim Container

Added the missing posix-ipc module to the slim agent. This fixed an issue that prevented App Checks from running in the agent-slim container on v9.9.0.

No Metric Limit on Scraped Prometheus Metadata

Prometheus scraping metadata is no longer counted toward, or limited by, metric limits when using promscrape.

Fix for Percentile Metrics

Fixed a defect that caused percentile metrics to not work properly.

9.9.1 April 16, 2020

Defect Fixes

Added the Missing Module to the Slim Agent

Added the missing Posix module to the slim agent. This fixed an issue that prevented App Checks from running in the agent-slim container on v9.9.0.

9.9.0 April 13, 2020

Core Features and Fixes

Python 3 Set as Default and Some App Checks Deprecated

Python 3 is the new default Python version for app checks, instead of Python 2. Python 2 can still be used by setting the following option in your dragent.yaml:

python_binary: <path to python 2.7 binary>

For containerized agents, this path will be: /usr/bin/python2.7

The following app checks are deprecated as of 9.9.0:

  • Network

  • RiakCS

  • TokuMX

  • Ceph

  • Gearmand

  • Gunicorn

  • Kyoto Tycoon

  • Teamcity

  • Riak

  • Solr

  • OpenStack

See Integrate Applications (Default App Checks).

Fixed Kernel Issue when Deploying Agent on GKE

Fixed a potential CPU stall on kernel versions 4.19 and later when using the eBPF probe.

Fixed Flooded Agent Logs

Fixed an issue that caused excessive logging in the agent log file.

9.8.0 March 31, 2020

All the public-facing URLs that were pointing to https://s3.amazonaws.com/download.draios.com/ have been updated to point to https://download.sysdig.com/.

Update the URL in your firewall or proxy allowlist to https://download.sysdig.com/. Otherwise, the agent installation on Linux will fail.

Fixes

Metrics Reporting

Fixed an issue in the agent wherein the kubernetes.namespace.pod.desired.count and kubernetes.namespace.pod.available.count metrics were not reporting any values.

HDFS App Check Deprecated

The HDFS (Hadoop Distributed File System) App Check has been deprecated and removed. Users of the HDFS App Check can switch to the hdfs_namenode and hdfs_datanode App Checks.

Metric Calculation

Fixed an issue related to calculating the kubernetes.pod.restart.rate metric.

Network Congestion

Isolated the Kubernetes Audit HTTP server from the Audit Event processing path to reduce the chances of slowing down the connections from the Kubernetes API server. This should reduce the likelihood of multiple outstanding connections from the Kubernetes API server.

Certifi Python Module

Added a missing certifi Python module to the agent container.

9.7.0 March 09, 2020

New Features

Support for OpenShift Hardening Guide

Added the OpenShift Hardening Guide as a benchmark program. It is available as an option for CIS Kubernetes Benchmark.

Support for Linux Benchmark

Added Linux benchmarking as an available benchmark program.

New Metrics for Redis and MongoDB App Checks

The following metrics are introduced:

  • RedisDB

    • redis.mem.startup

    • redis.mem.overhead

  • MongoDB

    • mongodb.tcmalloc.generic.current_allocated_bytes

    • mongodb.tcmalloc.generic.heap_size

    • mongodb.tcmalloc.tcmalloc.aggressive_memory_decommit

    • mongodb.tcmalloc.tcmalloc.central_cache_free_bytes

    • mongodb.tcmalloc.tcmalloc.current_total_thread_cache_bytes

    • mongodb.tcmalloc.tcmalloc.max_total_thread_cache_bytes

    • mongodb.tcmalloc.tcmalloc.pageheap_free_bytes

    • mongodb.tcmalloc.tcmalloc.pageheap_unmapped_bytes

    • mongodb.tcmalloc.tcmalloc.spinlock_total_delay_ns

    • mongodb.tcmalloc.tcmalloc.thread_cache_free_bytes

    • mongodb.tcmalloc.tcmalloc.transfer_cache_free_bytes

For more information, see Metrics Introduced with Agent v9.7.0 and RedisDB Metrics.

Fixes

Slim Agent Vulnerabilities

Fixed the vulnerabilities detected in the agent-slim v9.6.1 image. These issues are related to the python2 and jackson-databind packages. These packages were upgraded to the versions with fixes.

Run App Checks on Hosts with Python 2.6

Fixed a defect that prevented app checks from running on hosts that have Python 2.6 installed.

9.6.1 February 28, 2020

Fixes

Metrics calculation

Fixed an issue that caused an error in the calculation of some metrics such as net.* in agent version 9.6.0.

Red Hat-based host issue

Fixed an issue that caused the kernel module build associated with agent version 9.6.0 to fail on Red Hat-based hosts.

9.6.0 February 26, 2020

Upgrades

Integrations improved

Added new metrics and configuration options for HAProxy and Consul app check integrations. See HAProxy and Consul for details.

Fixed a problem in the Go app check that caused it to fail with an exception.

Metrics added

Added Kubernetes metric kubernetes.namespace.pod.running.count to track the number of pods in running state. See Kubernetes Dashboards.

Reduced load on the Kubernetes API server

The version of client-go was updated, and it now defaults to protobuf-encoded messaging instead of JSON to improve performance.

The configuration option new_k8s is now enabled by default.

Default collector port changed

The default port for the collector was changed from 6666 to 6443.

This could affect your firewall port settings; you may want to review them before upgrading the agent.

Fix for Kubernetes Audit Logging dynamic back-end configuration that caused some agent deploys to fail

The agent now listens on /k8s-audit for Kubernetes audit events, and the path can be configured via the config option security:{k8s_audit_server_path_uris: [path1, path2]}.
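As a minimal sketch, the option named above might appear in dragent.yaml as follows. The nesting is inferred from the inline notation used in this note, and the second path is a hypothetical placeholder rather than a documented default.

```yaml
# dragent.yaml (fragment); layout assumed from the inline form
# security:{k8s_audit_server_path_uris: [path1, path2]}
security:
  k8s_audit_server_path_uris:
    - /k8s-audit          # path named in this release note
    - /my-audit-endpoint  # hypothetical additional path
```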

Fixes

Prometheus metrics fix

Fixed a problem that inhibited the agent from scraping multiple ports on a single process for Prometheus metrics.

Inaccurate cpu.used reporting fixed

Fixed a problem that caused the agent to erroneously report very high CPU usage in some environments.

9.5.0 January 28, 2020

Note that the versioning scheme for agent releases has been updated with this release. Previous versions used the format 0.<version number><hotfix>, such as 0.94.0.

Sysdig is aligning version numbers to the rest of the product. The new version number reflects the maturity of the Agent software over the last several years. Going forward, all Agent versions will be numbered as Major.minor.hotfix.

We encourage users to be on the latest version of the Agent. Starting with the next release of the Agent, we will support n-3 versions back based on the minor number. For example, if the next release is v9.6.0, we will support versions back to 9.3.0 (0.93.0 in the old version scheme).

Fixes and Upgrades

Added new configuration option and metrics for Elasticsearch integrations

In the Elasticsearch app check, the parameter index_stats can be used to collect metrics from individual indices. See Example 4 in Elasticsearch and Elasticsearch Metrics for details.

Added new metrics for NGINX Plus integrations

More than 60 new metrics have been added to the NGINX app check. See NGINX Plus Metrics for details.

Made Go-based event handling the default

See Process Kubernetes Events. As of agent 9.5.0, the default setting for go_k8s_user_events is true, and there is no need to enable it explicitly. To switch back to the older events monitoring (C++ based), set the value to false in your agent config (dragent.yaml).
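A minimal sketch of switching back to the C++-based events monitoring, assuming go_k8s_user_events is set as a top-level key in dragent.yaml. The placement is an assumption; the option name and its values come from this note.

```yaml
# dragent.yaml (fragment)
# The default as of agent 9.5.0 is true; false reverts to the older C++-based handling.
go_k8s_user_events: false
```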

Enhanced log tracing for include/exclude processes filter.

No user action is required; see Include/Exclude Processes to use the filter.

Fixed agent termination issue

Fixed a problem that was causing an internal process within the agent to repeatedly restart.

Improved memory buffer handling

The agent will now auto-disable memdump functionality when the memory buffer is too small.

Agent start/stop improvements on CRI-O and OpenShift 4.x

The agent can now correctly perform the pause and stop container actions on clusters running OpenShift 4.x and CRI-O.

0.94.0 December 20, 2019

Fixes and Upgrades

Fixed issue in the agent install scripts

The agent install scripts have been updated to mount /etc/modprobe.d from the host into the agent container. This prevents a problem where the agent loaded drivers that were excluded from the host.

Added user events for additional resource types

Added events monitoring for statefulsets, services, and horizontal pod auto-scalers (HPAs) when the Golang-based events monitoring feature is enabled. To enable, see Process Kubernetes Events.

Added regex support for Kafka integrations

Added regex capability for consumer groups and topics in Apache Kafka configurations. See Example 6 in Apache Kafka.

Increased the Prometheus max_tags default value

The default value of the Prometheus max_tags configuration has been increased from 20 to 40.

Made a change to guarantee support for older cpuset configurations

Changed CRI-O cpuset calculations to use the configured cpuset.cpus value instead of cpuset.effective_cpus. This guarantees support on older cpuset configurations.

Corrected an issue that caused the suffix “_total” to be stripped from Prometheus counter metric names.

0.93.1 November 25, 2019

Fixes and Updates

Fixed installation issue on native RHEL 7.x installs

The agent installer script has been updated to refer to an updated epel repository.

Improved JMX metrics reporting

Fixed an issue when retrieving JMX metrics which could result in missing samples.

(Sysdig Secure): Improvement in Kubernetes Audit events

Fixed runtime policy scopes for Kubernetes audit events.

(Sysdig Secure) Fixed audit event exception

The system now catches JSON object-type exceptions when parsing Kubernetes audit events.

Improved error message

Improved the error message reported when the Sysdig agent cannot find a pre-installed kernel header or cannot download a sysdigcloud-probe.

Performance improvement in dragent logging

0.93.0.1 November 15, 2019

Fixes

Fixed issue with Prometheus metrics names

Corrected a problem that caused the suffix _total to be removed from Prometheus counter metric names.

0.93.0 November 6, 2019

New Features

Mask the customer ID in log files

The Customer ID is no longer output in the agent log, to avoid inadvertent exposure when sharing log files.

Kubernetes role node label included by default

The kubernetes.node.label.kubernetes.io/role label is now available by default.

Update Kubernetes API used, in order to expand support of Kubernetes v1.16

Replaced usage of the extensions/v1beta1 Kubernetes API with apps/v1 in the agent. This is required for supporting Kubernetes v1.16 using the agent’s legacy Kubernetes integration (when new_k8s is not enabled).

Introduced a new config option in ElasticSearch app check

Introduced a new config option to generate cluster-wide primary shard metrics from a master node: pshard_stats_master_node_only. See Elasticsearch (Example 3).

Enhanced Postgresql app check

The Postgres app check has been enhanced to provide new metrics and examples. See PostgreSQL.

Agent preparation for upcoming Policy Advisor feature in Sysdig Secure

The agent will support new Rules generated by Sysdig’s Kubernetes Policy Advisor. This agent is the minimum version required to use the upcoming feature.

Updates and Fixes

Improved system events handling for Ubuntu 19.10

On kernels 5.1 and newer, some syscall events were incorrectly dropped. This has been fixed.

Stopped Kubernetes pause containers (pods) from being reported

Fixed an issue where Kubernetes pause containers were also showing up in Kubernetes events. This fix filters them out of the events being reported.

Fixed rare issue on OpenShift

Fixed an issue where, in a rare case, a dropped event could cause a kernel deadlock and crash the node.

Fixed issue preventing kernel module creation for Debian Buster

This change adds support for building the Sysdig Monitor agent kernel module for Debian Buster.

Improved event timestamp in Kubernetes

This fix ensures that user events get the correct timestamp with Kubernetes v1.16 when the go_k8s_user_events option is set to true.

Updated Kubernetes API used, in order to expand support of Kubernetes v1.16

In dragent.yaml, the Kubernetes API extensions/v1beta1 is updated to apps/v1. This enables agent support for Kubernetes v1.16 even when the new_k8s option is set to false.

Fixed a Kubernetes event reporting issue

Fixed an issue with Kubernetes Events where the host MAC scope was not populated correctly, so the events did not show up on the dashboard.

Improved Kubernetes events handling from delegated agents

When using go_k8s_user_events, Kubernetes events from non-delegated agents are no longer sent.

Eliminated legacy “BASELINES” message

Stopped processing legacy BASELINES messages from the backend collector.

Performance improvement at startup

The agent now defers initialization of Secure-related components slightly to reduce excess resource usage at startup.

0.92.3 October 7, 2019

Updates and Fixed Issues

Included Example of a Prometheus Matching Rule Using HTTPS

The Sysdig agent uses HTTPS for scraping when the target has the annotation “kubernetes.pod.annotation.prometheus.io/scheme: https”.
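For illustration, a pod that should be scraped over HTTPS might carry the annotation like this. This is a sketch that assumes the agent's kubernetes.pod.annotation.<name> notation refers to the pod annotation <name>; the pod name is hypothetical and the rest of the manifest is omitted.

```yaml
# Kubernetes pod manifest (fragment)
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # hypothetical pod name
  annotations:
    # Assumed to be what the agent sees as
    # kubernetes.pod.annotation.prometheus.io/scheme
    prometheus.io/scheme: "https"
```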

Kubernetes versions older than 1.9 no longer supported.

The Sysdig agent has replaced the use of the extensions/v1beta1 Kubernetes API with apps/v1.

The RabbitMQ app check has a new config option: filter_by_node

Without this option, each node reports cluster-wide information (as presented by rabbitmq itself). This option makes it easier to view the metrics in the UI by removing redundant information reported by individual nodes. See RabbitMQ for details.

0.92.2 September 26, 2019

New Features

Asynchronous metadata collection for CRI-O and containerd

Previously, the collection of container metadata from CRI-based runtimes was synchronous with other agent tasks. It is now performed asynchronously.

Prioritize and filter how process metrics are reported in Sysdig Monitor

In addition to filtering data by container, it is also possible to filter independently by process. Broadly speaking, this refinement helps ensure that relevant data is reported while noise is reduced. See Include/Exclude Processes for details.

As of this version, App Checks on hosts with Python 2.6 will no longer be supported.

Fixed Issues

  • Fix for Agent termination during resource discovery from the Kubernetes API Server

    Fixed an issue where the Agent stopped and shut down if an error occurred during resource discovery from the Kubernetes API Server. With this fix, the Agent reports the error and continues with the discovered resources.

  • Fix for Kubernetes delegation error

    Fixed an issue that caused Kubernetes delegation to not work after the cointerface process restarts following a crash.

  • Fix for accounting Network errors

    Network-related errors are now correctly accounted for instead of being treated as file-open errors.

  • New Prometheus Client Version

    Updated prometheus_client to version 0.7.1. This should result in improved performance while ingesting Prometheus metrics.

  • Fix for dropping StatsD Metrics

    A defect in earlier versions of Sysdig Monitor with the statsd.use_forwarder option could drop some StatsD metrics from containers. This change resolves that problem; the agent will begin fetching metrics from containers 10 seconds after first identifying that the container exists. The 10-second delay allows containers to start StatsD servers within their network namespaces if they choose.

    The delay can be overridden using the statsd.container_server_creation_delay_s option, which specifies the delay in seconds.

  • Fixed resource metrics for CRI-O containers

    The following metrics are now reported correctly in the Monitor UI: memory.limit.bytes, memory.limit.used.percent, and cpu.quota.used.percent. The CRI extra_queries option is now enabled by default. See Runtime Support: CRI-O and Containerd for details.
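The StatsD options mentioned in the list above can be sketched in dragent.yaml as follows. The nested layout is an assumption derived from the dotted option names, and the value 20 is only an illustrative override of the 10-second default.

```yaml
# dragent.yaml (fragment); nesting assumed from the statsd.<option> names
statsd:
  use_forwarder: true
  # Seconds to wait after a container is first seen before fetching its StatsD metrics.
  # Illustrative value; the default delay described in this note is 10.
  container_server_creation_delay_s: 20
```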

Sysdig Secure

  • Fix for enlarging Sysdig Capture

    Fixed an issue where a Sysdig capture would grow endlessly if a security policy was set to Capture 0 seconds after an event.

  • Fix for processing system events

    Fixed a problem where the gettimeofday syscall was called in compliance code while processing system events. This could cause performance problems in Linux distros that call down to the kernel for gettimeofday responses, such as some versions of Amazon Linux.

Sysdig Platform

  • New RPM dependency

    Changed RPM dependency to Python 2 to support installation on RHEL 8.

0.92.1 August 16, 2019

Fixed Issues

Sysdig Monitor

  • Fixed issue with cluster name in Monitor UI

    Cluster name was being populated incorrectly for Kubernetes event scopes.

  • Fixed Kubernetes events issue

    Fixed Kubernetes event collection issue that occurred when using the go_k8s_user_events option. This option was introduced in agent version 0.91.

Sysdig Platform

  • RHEL 7.7 and 8.0+ support

    The kernel module now builds for RHEL 7.7 and 8.0+.

  • Fixed issue with StatsD metrics collection limits

    Some versions of the Sysdig agent allowed fewer than the configured number of StatsD metrics because Sysdig Secure-related StatsD metrics were counted towards the configured limit.

    This change corrects that behavior so that the configured limit applies only to StatsD metrics that do not originate from Sysdig components.

Sysdig Secure

  • Fixed a profiling-related issue that impacts Sysdig Secure 2.4

    Sysdig Secure 2.4 will include a new Profiling feature, and 0.92.1 fixes a bug where profiling could remain disabled after periods of high load. In order to use Profiling, it is required to upgrade to agent 0.92.1 or higher.

0.92 August 7, 2019

New Features

Preparatory enhancements for upcoming Sysdig Secure Policy Editor

Although the feature UI will not be released until version 2.4.0, Sysdig encourages all users of Sysdig Secure to upgrade to agent 0.92 in preparation for the new Policy Editor feature. Agent 0.92 accepts policy messages from both the current backend and a backend that supports the new Policy Editor.

Ability to compress metrics data for internal transfer

With app checks integrations, when the volume of metrics data collected was too large to send over the agent’s internal queue, app checks could fail. This problem is solved by introducing an option to compress app checks metrics data, which reduces the internal load. See Compress Metrics Data for details on how to enable this option.

Fixed Issues

Sysdig Monitor

Fix for occasionally dropped metrics

In earlier releases of Sysdig Monitor, the agent sometimes failed to parse metrics containing negative values for some fields.

This change updates the behavior to drop fields that have unsupported negative values, and to generate a log message when such fields are encountered.

Sysdig Platform

  • Fix for MySQL versions 8.0.14+

    Fixed a bug that caused the MySQL app check to fail with an error.

  • Fixed agent crash issue exposed by recent Linux kernels

    Affected kernels include the 5.2.x line, 5.1.8+, and 4.19.49+.

  • Fixed a bug in HTTP parser

    In the (uncommon) situation where absoluteURI is used in the Request-URI, fixed a bug that was causing a faulty URL.

0.91 July 17, 2019

New Features

Improved security

Removed obsolete and vulnerable Python 2.6-compatible libraries from Docker images.

More efficient Kubernetes event handling.

The agent has added functionality to allow more efficient processing of Kubernetes user events.

See Process Kubernetes Events to enable.

Reduced CPU usage on Kubernetes clusters

Extended performance optimizations for processing Kubernetes Services, which will reduce agent CPU usage in large clusters.

Container filtering enhanced

Smart filters and aggregated filtering options are now available. See Prioritize/Include/Exclude Designated Containers.

Fixed Issues

Monitor

  • Fixed issue with Prometheus metrics gathering intervals

    The agent will now respect the configured interval for scraping Prometheus metrics from remote endpoints, as opposed to doing it every second.

  • Fixed limit/requests calculations for init containers

    Fixed memory calculations for Kubernetes init container limits and requests.

  • Improved Healthcheck monitoring

    The agent has an improved ability to detect commands identified as a part of Kubernetes Liveness/Readiness Probes, in addition to Docker Health Checks.

  • Improved error messaging

    Warning messages for container group inconsistencies were demoted to debug level, as they are harmless and do not need to clutter the error reporting stream.

  • Fixed issue with container “incomplete” reporting status

    Starting with version 0.90.0, the agent would report containers for which it had not yet fetched metadata as “incomplete.” This would then propagate to the Monitor UI. This fix restores the earlier behavior, in which the agent leaves the unknown fields unset.

  • Resolved REST server issue

    Fixed problem where an enabled port would respond to HTTP requests when not desired.

  • Fixed issue with StatsD metrics collection

    Previous versions of the Sysdig agent, when configured to use the StatsD forwarder (statsd.use_forwarder: true), truncated messages received from containers to 2048 bytes, which could result in dropped and corrupted metrics. This change resolves that problem. See details under StatsD Integration.

It is recommended to follow upgrade best practices:

  • Keep upgrades current

  • Test upgrades in a non-mission-critical or staging environment before rolling into production.



Last modified September 23, 2022