OpenShift API-Server

Metrics, Dashboards, Alerts and more for OpenShift API-Server Integration in Sysdig Monitor.

This integration is disabled by default. See Enable and Disable Integrations to enable it in your account.

Versions supported: > v4.8

This integration is out-of-the-box, so it doesn’t require any exporter.

This integration has 18 metrics.

Time series generated: the API server generates approximately 5,000 time series.

List of Alerts

Alert | Description | Format
---|---|---
[OpenShift API Server] Deprecated APIs | API-Server Deprecated APIs | Prometheus
[OpenShift API Server] Certificate Expiry | API-Server Certificate Expiry | Prometheus
[OpenShift API Server] Admission Controller High Latency | API-Server Admission Controller High Latency | Prometheus
[OpenShift API Server] Webhook Admission Controller High Latency | API-Server Webhook Admission Controller High Latency | Prometheus
[OpenShift API Server] High 4xx Request Error Rate | API-Server High 4xx Request Error Rate | Prometheus
[OpenShift API Server] High 5xx Request Error Rate | API-Server High 5xx Request Error Rate | Prometheus
[OpenShift API Server] High Request Latency | API-Server High Request Latency | Prometheus

List of Dashboards

OpenShift v4 API Server

The dashboard provides information on the Kubernetes API server and the OpenShift API server.

List of Metrics

Metric name
apiserver_admission_controller_admission_duration_seconds_count
apiserver_admission_controller_admission_duration_seconds_sum
apiserver_admission_webhook_admission_duration_seconds_count
apiserver_admission_webhook_admission_duration_seconds_sum
apiserver_client_certificate_expiration_seconds_bucket
apiserver_client_certificate_expiration_seconds_count
apiserver_request_duration_seconds_count
apiserver_request_duration_seconds_sum
apiserver_request_total
apiserver_requested_deprecated_apis
apiserver_response_sizes_count
apiserver_response_sizes_sum
apiserver_tls_handshake_errors_total
go_goroutines
process_cpu_seconds_total
process_resident_memory_bytes
workqueue_adds_total
workqueue_depth

Prerequisites

None.

Installation

Installing an exporter is not required for this integration.

Monitoring and Troubleshooting OpenShift API Server

Because OpenShift 4.x ships with both Prometheus and the API servers ready to use, no additional installation is required. The OpenShift API server metrics are exposed through the Prometheus /federate endpoint.
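The Prometheus federation endpoint selects series through one or more `match[]` query parameters, which is how the agent job below limits what it pulls. As a minimal sketch, the following Python builds that kind of URL. Only the `/federate` path and the `match[]` parameter come from the agent configuration on this page; the base address and the `federate_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a Prometheus /federate URL with one match[] parameter per selector.

    Hypothetical helper for illustration only.
    """
    # doseq=True emits a separate match[]=... pair for each selector in the list,
    # and urlencode percent-encodes characters such as {, ", | and !.
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical address; in practice the agent discovers the Prometheus pod
# through kubernetes_sd_configs and scrapes it directly.
url = federate_url(
    "https://prometheus.example.internal",
    ['{__name__=~"apiserver_request_total|go_goroutines",code!="0"}'],
)
print(url)
```

Each selector stays a separate `match[]` pair, so adding selectors widens the set of federated series rather than narrowing it.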

Monitoring the Kubernetes API server (kube-apiserver) is vital when running Kubernetes in production: it helps you detect and troubleshoot latency and errors, and validate that the service performs as expected.

The following queries and metrics are useful for troubleshooting the OpenShift API server.

Deprecated APIs

To check if deprecated API versions are used, use the following query:

sum by (kube_cluster_name, resource, removed_release, version) (apiserver_requested_deprecated_apis)
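If you run this query against a Prometheus-compatible HTTP API (an assumption: for example an instant query on an `/api/v1/query` endpoint), the response is a JSON vector of labeled samples. The minimal sketch below turns such a response into a list of deprecated resources per cluster. The function name `deprecated_api_usage` is hypothetical, and the sketch assumes the label names used in the query above.

```python
import json


def deprecated_api_usage(response_body: str) -> list[tuple[str, str, str, str]]:
    """Extract (cluster, resource, version, removed_release) rows from an
    instant-query JSON response. Hypothetical helper for illustration."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    rows = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        # Instant-vector samples are encoded as [timestamp, "value-as-string"].
        value = float(series["value"][1])
        # A non-zero value means at least one API server reported a request
        # to this deprecated API version.
        if value > 0:
            rows.append((
                labels.get("kube_cluster_name", ""),
                labels.get("resource", ""),
                labels.get("version", ""),
                labels.get("removed_release", ""),
            ))
    return sorted(rows)
```

The `removed_release` label is the one to act on: it tells you in which Kubernetes release the requested API version stops being served.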

Certificate Expiration

Client certificates are used to authenticate to the API server. You can use the following query to check whether client certificates expire within the next week (7*24*60*60 seconds):

apiserver_client_certificate_expiration_seconds_count > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket[5m]))) < 7*24*60*60
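`histogram_quantile(0.01, ...)` estimates the remaining certificate lifetime below which 1% of the observed client certificates fall, and the query compares it with 604,800 seconds (7 days). The following Python sketch reimplements the linear interpolation that Prometheus applies to cumulative `le` buckets, to show how that estimate is derived. It is a simplified illustration that does not handle non-monotonic buckets or NaN inputs. The bucket counts in the usage line are illustrative, not real data.

```python
import math

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800, the threshold used in the query


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: buckets must be cumulative, include a +Inf bucket,
    and 0 < q <= 1.
    """
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the rank.
    b = next((i for i in range(len(buckets) - 1) if buckets[i][1] >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # The quantile lies in the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    lower = 0.0 if b == 0 else buckets[b - 1][0]
    upper = buckets[b][0]
    below = 0.0 if b == 0 else buckets[b - 1][1]
    in_bucket = buckets[b][1] - below
    # Assume observations are spread linearly within the selected bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


# Illustrative counts: 2 of 10 certificates have between 1 and 7 days left.
p01 = histogram_quantile(0.01, [(86400, 0), (604800, 2), (2592000, 10), (math.inf, 10)])
print(p01, p01 < ONE_WEEK_SECONDS)
```

Because the estimate interpolates inside a bucket, its precision is limited by the bucket boundaries. Treat it as an early warning rather than an exact expiry date.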

API Server Latency

A latency spike is typically a sign that the API server is overloaded: the cluster may be under high load, and the API server may need to be scaled out. The following query returns the average request latency per cluster, verb, and API server over the last 10 minutes, excluding long-running WATCH requests:

sum by (kube_cluster_name,verb,apiserver)(rate(apiserver_request_duration_seconds_sum{verb!="WATCH"}[10m]))/sum by (kube_cluster_name,verb,apiserver)(rate(apiserver_request_duration_seconds_count{verb!="WATCH"}[10m]))
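The query divides the rate of `apiserver_request_duration_seconds_sum` by the rate of `apiserver_request_duration_seconds_count` over the same window, so the window length cancels and the result is the mean duration of the requests completed in that window. The minimal Python sketch below shows that arithmetic over two scrapes of the counters. It assumes no counter reset between the scrapes. PromQL's `rate()` also handles resets and extrapolation, which this sketch omits, and `mean_latency_seconds` is a hypothetical name.

```python
def mean_latency_seconds(sum_start: float, sum_end: float,
                         count_start: float, count_end: float) -> float:
    """Mean request duration between two scrapes of a _sum/_count counter pair.

    rate(sum)/rate(count) over one window reduces to delta(sum)/delta(count).
    Assumes the counters did not reset between the two scrapes.
    """
    completed = count_end - count_start
    if completed <= 0:
        return float("nan")  # no requests completed in the window
    return (sum_end - sum_start) / completed


# 30 requests completed between the scrapes, taking 3 seconds in total.
print(mean_latency_seconds(10.0, 13.0, 100, 130))
```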

Request Error Rate

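A high request error rate means that the API server is responding with 5xx errors. If this happens, check the CPU and memory usage of your API server pods. The following query returns the clusters in which more than 5% of the requests in the last 5 minutes returned a 5xx error: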

sum by(kube_cluster_name)(rate(apiserver_request_total{code=~"5..",kube_cluster_name=~$cluster}[5m])) / sum by(kube_cluster_name)(rate(apiserver_request_total{kube_cluster_name=~$cluster}[5m])) > 0.05
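The expression divides the rate of requests whose `code` label matches `5..` by the rate of all requests, and returns a result above 0.05 (5%). PromQL regex matchers are fully anchored, so `5..` only matches three-character codes starting with 5. The minimal Python sketch below performs the same calculation. It assumes a hypothetical input: a dictionary of per-code request rates for one cluster, standing in for `rate(apiserver_request_total[5m])` broken out by `code`.

```python
import re

# PromQL regex matchers are anchored at both ends; fullmatch mimics that.
FIVE_XX = re.compile(r"5..")


def error_ratio(rates_by_code: dict[str, float]) -> float:
    """Fraction of the request rate that returned a 5xx status code."""
    total = sum(rates_by_code.values())
    if total == 0:
        # PromQL would yield NaN here, and NaN > 0.05 is false, so report 0.
        return 0.0
    errors = sum(rate for code, rate in rates_by_code.items() if FIVE_XX.fullmatch(code))
    return errors / total


def error_rate_alert(rates_by_code: dict[str, float], threshold: float = 0.05) -> bool:
    """True when the 5xx share of requests exceeds the threshold (5% by default)."""
    return error_ratio(rates_by_code) > threshold
```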

Agent Configuration

The default agent job for this integration is as follows:

- job_name: openshift-apiserver-default
  honor_labels: true
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{__name__=~"apiserver_request_total|apiserver_request_duration_seconds_sum|apiserver_request_duration_seconds_count|workqueue_adds_total|workqueue_depth|apiserver_response_sizes_sum|apiserver_response_sizes_count|apiserver_requested_deprecated_apis|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_count|apiserver_admission_controller_admission_duration_seconds_sum|apiserver_admission_controller_admission_duration_seconds_count|apiserver_admission_webhook_admission_duration_seconds_sum|apiserver_admission_webhook_admission_duration_seconds_count|apiserver_tls_handshake_errors_total|go_goroutines|process_resident_memory_bytes|process_cpu_seconds_total",code!="0"}'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:     
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - source_labels: [__meta_kubernetes_pod_phase]
    action: keep
    regex: Running
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'openshift-monitoring/prometheus-k8s-0'
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  - action: replace
    source_labels: [ __address__ ] 
    target_label: _sysdig_integration_openshift_apiserver
    replacement: true
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (apiserver_request_total|apiserver_request_duration_seconds_sum|apiserver_request_duration_seconds_count|workqueue_adds_total|workqueue_depth|apiserver_response_sizes_sum|apiserver_response_sizes_count|apiserver_requested_deprecated_apis|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_count|apiserver_admission_controller_admission_duration_seconds_sum|apiserver_admission_controller_admission_duration_seconds_count|apiserver_admission_webhook_admission_duration_seconds_sum|apiserver_admission_webhook_admission_duration_seconds_count|apiserver_tls_handshake_errors_total|go_goroutines|process_resident_memory_bytes|process_cpu_seconds_total)
    action: keep
  - action: replace
    source_labels: [namespace]
    target_label: kube_namespace_name
  - action: replace
    source_labels: [pod]
    target_label: kube_pod_name
  - action: replace
    target_label: job
    replacement: openshift-apiserver-default
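In `metric_relabel_configs`, the `keep` action drops every series whose `__name__` does not match the regex. Prometheus anchors relabeling regexes at both ends, so a metric is kept only if its whole name equals one of the alternatives. The minimal Python sketch below mimics that filter with `re.fullmatch`. It uses a shortened, illustrative subset of the metric list from the job above, and `kept` is a hypothetical helper name.

```python
import re

# Illustrative subset of the alternatives in the job's keep rule.
KEEP = re.compile(
    r"apiserver_request_total|apiserver_request_duration_seconds_sum"
    r"|apiserver_request_duration_seconds_count|go_goroutines|process_cpu_seconds_total"
)


def kept(metric_names: list[str]) -> list[str]:
    """Return the metric names that a `keep` rule using KEEP would retain."""
    # fullmatch mirrors Prometheus' implicit ^(?:...)$ anchoring of relabel regexes,
    # so a name that merely starts with an allowed name is dropped.
    return [name for name in metric_names if KEEP.fullmatch(name)]


print(kept(["go_goroutines", "go_goroutines_total",
            "apiserver_request_total", "etcd_request_duration_seconds_count"]))
```

Because of the anchoring, a metric must be listed by its full name to be kept; this is why the job spells out each `_sum`, `_count`, and `_bucket` series individually.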