    Working with the Data API

    The data API provides access to the labels and metrics data captured by Sysdig agents and stored in the Sysdig datastores. Sysdig agents capture process, network, system, and other infrastructure data at a 1-second resolution and send it to the Sysdig worker service at a 10-second resolution.

    The data API allows you to fetch data at the native 10-second resolution or lower. You specify the resolution of the returned data with the sampling parameter. Each resolution has a different data retention period; see Data Retention. Because native data capture is performed at a 1-second resolution, you need to specify a time aggregation for each metric.

    Similarly, data associated with individual pods, processes, and so on can be aggregated at a higher level, for example, at the host or namespace level. This type of aggregation depends on the list of labels you specify. For instance, you could aggregate data at the host level by specifying a host label. For this to work, you need to specify a group aggregation.

    To learn more about data aggregation, see Data Aggregation.
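
    The difference between the two aggregation types can be sketched in a few lines of Python. This is an illustration only, not how Sysdig computes values internally: the container names and sample values are invented, and the averaging and summing stand in for whichever time and group aggregations you request.

```python
from statistics import mean

# Hypothetical 1-second samples of a metric for two containers on one host.
# Names and values are invented for illustration.
samples = {
    "web": [0.25, 0.5, 0.75, 0.5],
    "db": [1.0, 1.0, 2.0, 2.0],
}

# Time aggregation: collapse each container's samples over the time window.
time_avg = {name: mean(values) for name, values in samples.items()}

# Group aggregation: combine the per-container results at the host level.
host_sum = sum(time_avg.values())
host_avg = mean(time_avg.values())

print(time_avg)
print(host_sum, host_avg)
```

    The time aggregation runs first, per entity, and the group aggregation then combines those per-entity results into one value for the higher-level entity.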

    General Guidelines for Using the Data API

    • The maximum number of samples you can fetch via the data API is 600. Consequently, the larger the time window for the data retrieval, the lower the resolution will be.

    • The API enforces a response timeout of 30 seconds. Larger time windows, a higher number of metrics, or multiple segments (a higher number of columns and rows) might cause a longer response time.

    • Some labels might not apply to certain entities. When those labels are retrieved, the label value will be null.

    • Due to a current limitation, the same metric name cannot be specified more than once in a request, regardless of the aggregations used.
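
    As a rough aid for the 600-sample limit, the following sketch computes the finest sampling that keeps a time window within that limit. The helper name is hypothetical, and the calculation assumes the API accepts any sampling value at or above the native 10 seconds, which this page does not confirm.

```python
import math

MAX_SAMPLES = 600       # maximum number of samples, from the guidelines
NATIVE_SAMPLING_S = 10  # native resolution of stored data, in seconds


def smallest_sampling(window_s: int) -> int:
    """Return the finest sampling (seconds) that fits window_s in MAX_SAMPLES.

    Hypothetical helper: the API may only accept specific sampling values.
    """
    return max(NATIVE_SAMPLING_S, math.ceil(window_s / MAX_SAMPLES))


for window in (600, 3600, 86400):
    print(window, smallest_sampling(window))
```

    For a one-day window (86400 seconds), the sketch yields 144 seconds, which shows how a larger window forces a lower resolution.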

    REST Resource: Data

    See Sysdig REST API Conventions for generic conventions and authentication.

    Request Variables

    Field

    Description

    last

    Specifies the time window. The value is expressed in seconds.

    start

    end

    An alternative to last to specify the time window.

    The timestamp is expressed in seconds.

    sampling

    Data resolution expressed in seconds. A sampling equal to the length of the time window gives a single aggregated value across the entire window. For that, its value can be one of the following:

    • end - start

    • last
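
    A minimal sketch of picking the sampling that covers the whole window, for either way of specifying the time window. The helper name is hypothetical; the field names are the request variables described in this table.

```python
def single_value_sampling(request: dict) -> int:
    """Return the sampling (seconds) that spans the request's whole window."""
    if "last" in request:
        return request["last"]
    return request["end"] - request["start"]


print(single_value_sampling({"last": 600}))
print(single_value_sampling({"start": 1582755600, "end": 1582756200}))
```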

    filter

    Specifies the scope for the data to be returned. The simple expression to filter out data is:

    (not) label operator value

    The filter can also be a set of expressions, for instance, expr1 AND expr2 OR expr3.

    • label: Any label that allows for segmentation

    • operator: Supported operators are:

      • =

      • !=

      • in

      • contains

      • starts with

    • value is one of the following:

      • single value

      • list of values

      • null (supported with = and != operators only)

    For example:

    POST /api/data
    

    { "filter": "kubernetes.node.name = 'n1'", …
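
    Filter strings can be assembled programmatically. The sketch below covers only the = operator and AND composition; the helper names are hypothetical, the single-quote style is copied from the example above, and values that themselves contain quotes are not handled.

```python
def equals(label: str, value: str) -> str:
    """Build a `label = 'value'` expression (quoting as in the example above)."""
    return f"{label} = '{value}'"


def all_of(*expressions: str) -> str:
    """Combine expressions so that all of them must match."""
    return " AND ".join(expressions)


# The container name "web" is an invented value for illustration.
scope = all_of(
    equals("kubernetes.node.name", "n1"),
    "not " + equals("container.name", "web"),
)
print(scope)
```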

    metrics

    Specifies labels or metrics, or both, to be returned. Labels require only the ID, whereas metrics require ID as well as the time and group aggregations.

    Time aggregations: timeAvg (rate), average, minimum, maximum, sum

    Group aggregations: average, minimum, maximum, sum

    "metrics": [
        {"id": "container.name"},
        {
          "id": "cpu.cores.used",
          "aggregations": { "time": "avg", "group": "avg" }
        }
    ]

    The first metric requested in the query is the container name. This is a segmentation metric, and therefore no aggregation criteria are specified. The second metric queries the CPU cores used by each container separately. The metric is aggregated as an average.
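
    The metrics list can also be built in code. In this sketch the helper names are hypothetical. The uniqueness check reflects the guideline that the same metric name cannot be specified more than once; applying it to label IDs as well is an extra precaution, not a documented rule.

```python
def label(label_id: str) -> dict:
    """A label entry needs only its ID."""
    return {"id": label_id}


def metric(metric_id: str, time: str, group: str) -> dict:
    """A metric entry needs its ID plus time and group aggregations."""
    return {"id": metric_id, "aggregations": {"time": time, "group": group}}


def check_unique_ids(entries: list) -> list:
    """Raise ValueError if any ID appears more than once; else return entries."""
    ids = [entry["id"] for entry in entries]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"repeated ids: {duplicates}")
    return entries


metrics = check_unique_ids(
    [label("container.name"), metric("cpu.cores.used", "avg", "avg")]
)
print(metrics)
```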

    dataSourceType

    Specifies the type of entity for which metrics are retrieved. This is particularly useful when the same metric name, for instance cpu.cores.used, is used for different sources.

    Accepted values: host and container.

    paging

    Specifies the number of rows of data to be returned. By default, rows from 0 to 9 (10 rows) are returned. Pagination is applied to the sorted rows according to the following criteria:

    • If a metric is available, sort by first metric value (descending)

    • If a metric is unavailable, sort by first label value (descending)
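
    The effect of paging on sorted rows can be mimicked locally. This sketch is only an approximation of the server-side behavior: the rows are invented, the function and parameter names are hypothetical, and the "to" index is treated as inclusive because rows 0 to 9 are described as 10 rows.

```python
# Hypothetical rows of [container.name, cpu.cores.used].
rows = [
    ["web", 0.5],
    ["db", 1.5],
    ["cache", 0.1],
]


def page(rows: list, sort_index: int, from_row: int, to_row: int) -> list:
    """Sort rows by one column (descending) and return rows from_row..to_row."""
    ordered = sorted(rows, key=lambda row: row[sort_index], reverse=True)
    return ordered[from_row:to_row + 1]  # "to" is inclusive


# Equivalent of "paging": {"from": 0, "to": 1}, sorted by the first metric.
print(page(rows, sort_index=1, from_row=0, to_row=1))
```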

    Response Variables

    Field

    Description

    data

    Returns a list of data points.

    Each data point is uniquely identified by timestamp and a list of label values.

    t: timestamp expressed in seconds.

    d: list of values representing labels and metrics, sorted as given in the request.

    start

    end

    Returns the actual time window of the drawn data.

    It may differ from the time window specified in the request (for example, to align to timelines).

    Sample Request and Response

    In this example, CPU cores for containers are retrieved.

    POST /api/data
    
    {
      "last": 600,
      "sampling": 600,
      "filter": null,
      "metrics": [
        {
          "id": "cpu.cores.used",
          "aggregations": { "time": "avg", "group": "sum" }
        }
      ],
      "dataSourceType": "container",
      "paging": {
        "from": 0,
        "to": 99
      }
    }
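
    One way to prepare this request from Python with only the standard library is sketched below. The host name and token are placeholders, and the bearer-token Authorization header is an assumption; confirm the authentication scheme in Sysdig REST API Conventions. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://sysdig.example.com"  # hypothetical host, replace with yours
API_TOKEN = "<API_TOKEN>"                # placeholder token

body = {
    "last": 600,
    "sampling": 600,
    "filter": None,  # serialized as JSON null
    "metrics": [
        {"id": "cpu.cores.used", "aggregations": {"time": "avg", "group": "sum"}}
    ],
    "dataSourceType": "container",
    "paging": {"from": 0, "to": 99},
}

request = urllib.request.Request(
    BASE_URL + "/api/data",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Assumed bearer-token scheme, not confirmed by this page.
        "Authorization": f"Bearer {API_TOKEN}",
    },
    method="POST",
)

print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=30)
```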
    

    A sample response is given below:

    {
        "data": [
            {
                "t": 1582756200,
                "d": [
                    6.481
                ]
            }
        ],
        "start": 1582755600,
        "end": 1582756200
    }
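
    Because the values in d follow the order of the metrics in the request, a response can be turned into named rows by pairing the two. The sketch below does this for the sample response; the variable names are illustrative.

```python
import json

# IDs in the same order as the "metrics" list of the request.
requested_ids = ["cpu.cores.used"]

response = json.loads("""
{
    "data": [
        {"t": 1582756200, "d": [6.481]}
    ],
    "start": 1582755600,
    "end": 1582756200
}
""")

rows = []
for point in response["data"]:
    row = dict(zip(requested_ids, point["d"]))  # pair each ID with its value
    row["t"] = point["t"]                       # timestamp in seconds
    rows.append(row)

window_s = response["end"] - response["start"]  # actual window returned
print(rows, window_s)
```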
    

    Python Script Library for Data API

    Sysdig provides a Python client for interacting with the Data API. See Python SDC Client for comprehensive examples and documentation.

    Function: sysdig_api.get_data

    sysdig_api.get_data(metrics, start_ts, end_ts, sampling_s, filter, paging, datasource_type)
    

    Returns the requested metrics data.

    Arguments

    Arguments

    Description

    start_ts

    Start of a query time window. The timestamp is expressed in seconds.

    end_ts

    End of a query time window. The timestamp is expressed in seconds.

    sampling_s

    Data resolution expressed in seconds.

    filter

    Specifies the scope for the data to be returned.

    metrics

    List of metrics to query.

    datasource_type

    The source for the metrics.

    Accepted values: host and container.

    paging

    Specifies the number of rows of data to be returned. By default, rows from 0 to 9 (10 rows) are returned.

    Successful Return Value

    Returns the requested metrics data in JSON format.

    Sample Script

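    from sdcclient import SdcClient

    # Assumes the sdcclient package is installed; replace the placeholder
    # with your Sysdig API token.
    sysdig_api = SdcClient("<API_TOKEN>")

    # Average CPU cores used per container over a 600-second window,
    # summed across containers, returned as a single sample.
    ok, res = sysdig_api.get_data(
      start_ts = -600,
      end_ts = 0,
      sampling_s = 600,
      filter = None,
      metrics = [
        {
          "id": "cpu.cores.used",
          "aggregations": {
              "time": "avg",
              "group": "sum"
          }
        }
      ],
      datasource_type = "container",
      paging = {"from": 0, "to": 99}
    )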