OpenShift Controller Manager
This integration is enabled by default.
Versions supported: > v4.8
This integration is out-of-the-box, so it doesn’t require any exporter.
This integration has 12 metrics.
Timeseries generated: Controller Manager generates ~650 timeseries
List of Alerts
Alert | Description | Format |
---|---|---|
[OpenShift Controller Manager] Process Down | Controller Manager has disappeared from target discovery. | Prometheus |
[OpenShift Controller Manager] High 4xx RequestError Rate | OpenShift Controller Manager High 4xx Request Error Rate | Prometheus |
[OpenShift Controller Manager] High 5xx RequestError Rate | OpenShift Controller Manager High 5xx Request Error Rate | Prometheus |
List of Dashboards
OpenShift v4 Controller Manager
The dashboard provides information on the K8s and OpenShift Controller Manager.
If you are using Prometheus Remote Write, add the following metric relabel config, which sets the _sysdig_integration_openshift_controller_manager label:
- action: replace
  source_labels: [ __address__ ]
  target_label: _sysdig_integration_openshift_controller_manager
  replacement: true
List of Metrics
Metric name |
---|
go_goroutines |
rest_client_requests_total |
sysdig_container_cpu_cores_used |
sysdig_container_memory_used_bytes |
workqueue_adds_total |
workqueue_depth |
workqueue_queue_duration_seconds_count |
workqueue_queue_duration_seconds_sum |
workqueue_retries_total |
workqueue_unfinished_work_seconds |
workqueue_work_duration_seconds_count |
workqueue_work_duration_seconds_sum |
Prerequisites
None.
Installation
Installing an exporter is not required for this integration.
Monitoring and Troubleshooting OpenShift Controller Manager
Because OpenShift 4.X comes with both Prometheus and Controller Manager ready to use, no additional installation is required. The OpenShift Controller Manager metrics are exposed using a federated endpoint.
Here are some interesting queries to run and metrics to monitor for troubleshooting the OpenShift Controller Manager.
Work Queue
Work Queue Retries
The rate of retries handled by the work queue. This value should be near 0.
topk(30,rate(workqueue_retries_total{job="openshift-controller-manager-default"}[10m]))
Work Queue Latency
Queue latency is the time that tasks spend in the queue before being processed.
topk(30,rate(workqueue_queue_duration_seconds_sum{job="openshift-controller-manager-default"}[10m]) / rate(workqueue_queue_duration_seconds_count{job="openshift-controller-manager-default"}[10m]))
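The division in the query above amounts to the total seconds that items waited divided by the number of items dequeued in the same window. The following is a minimal Python sketch of that arithmetic, not part of the integration. The function name and the sample values are hypothetical, and it ignores the counter-reset handling and extrapolation that rate() performs.

```python
def average_queue_latency(sum_start, sum_end, count_start, count_end):
    """Average seconds an item waited in the queue between two counter samples.

    Mirrors rate(..._sum[w]) / rate(..._count[w]): the window length cancels
    out, leaving delta(sum) / delta(count).
    """
    items = count_end - count_start
    if items <= 0:
        # No items were dequeued in the window, so there is no average to report.
        return None
    return (sum_end - sum_start) / items


# Hypothetical samples taken 10 minutes apart for a single work queue:
# 6 seconds of accumulated waiting spread over 300 dequeued items.
latency = average_queue_latency(sum_start=120.0, sum_end=126.0,
                                count_start=4000, count_end=4300)
print(f"{latency:.3f} s")  # prints "0.020 s"
```

A result that grows over time suggests that items are waiting longer before a worker picks them up.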
Work Queue Depth
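This query checks the depth of the queue, averaged over 10 minutes. Because workqueue_depth is a gauge, it is averaged with avg_over_time rather than passed to rate. High values can indicate saturation of the controller manager.
topk(30,avg_over_time(workqueue_depth{job="openshift-controller-manager-default"}[10m]))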
Controller Manager API Requests
Kube API Requests By Code
Check that the controller manager requests return no 4xx or 5xx error codes.
sum by (kube_cluster_name,kube_pod_name)(rate(rest_client_requests_total{job="openshift-controller-manager-default",code=~"4.."}[10m]))
sum by (kube_cluster_name,kube_pod_name)(rate(rest_client_requests_total{job="openshift-controller-manager-default",code=~"5.."}[10m]))
Agent Configuration
The default agent job for this integration is as follows:
- job_name: openshift-controller-manager-default
  honor_labels: true
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job=~"kube-controller-manager|controller-manager",__name__=~"workqueue_retries_total|workqueue_unfinished_work_seconds|workqueue_queue_duration_seconds_count|workqueue_work_duration_seconds_count|workqueue_queue_duration_seconds_sum|workqueue_work_duration_seconds_sum|workqueue_depth|workqueue_adds_total|rest_client_requests_total|go_goroutines"}'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - source_labels: [__meta_kubernetes_pod_phase]
    action: keep
    regex: Running
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'openshift-monitoring/prometheus-k8s-1'
  # Holding on to pod-id and container name so we can associate the metrics
  # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  # Remove extended labelset
  - action: replace
    replacement: true
    target_label: sysdig_omit_source
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (go_goroutines|rest_client_requests_total|sysdig_container_cpu_cores_used|sysdig_container_memory_used_bytes|workqueue_adds_total|workqueue_depth|workqueue_queue_duration_seconds_count|workqueue_queue_duration_seconds_sum|workqueue_retries_total|workqueue_unfinished_work_seconds|workqueue_work_duration_seconds_count|workqueue_work_duration_seconds_sum)
    action: keep
  - action: replace
    source_labels: [namespace]
    target_label: kube_namespace_name
  - action: replace
    source_labels: [pod]
    target_label: kube_pod_name
  - source_labels: [job]
    target_label: controller
  - action: replace
    target_label: job
    replacement: openshift-controller-manager-default
  - action: replace
    source_labels: [ __address__ ]
    target_label: _sysdig_integration_openshift_controller_manager
    replacement: true
  - action: replace
    source_labels: [controller]
    regex: '(controller-manager)'
    target_label: controller
    replacement: 'openshift-$1'
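Two of the relabel rules in the job above rely on Prometheus anchoring relabel regexes at both ends. The following Python sketch approximates that behavior to show which pods are kept and how the controller label is rewritten. The helper names are hypothetical, and Python's re module only approximates the RE2 syntax that Prometheus uses, so treat this as an illustration rather than a reference implementation.

```python
import re


def keep(source_values, separator, regex):
    """Approximate a 'keep' rule: join the source labels, then require a full match."""
    return re.fullmatch(regex, separator.join(source_values)) is not None


def replace(source_value, regex, replacement):
    """Approximate a 'replace' rule on one source label.

    Returns the new target value, or None when the anchored regex does not
    match, in which case Prometheus leaves the target label unchanged.
    """
    match = re.fullmatch(regex, source_value)
    if match is None:
        return None
    # Prometheus writes capture groups as $1; Python's expand() expects \1.
    return match.expand(re.sub(r"\$(\d+)", r"\\\1", replacement))


pod_regex = "openshift-monitoring/prometheus-k8s-1"
# Only the prometheus-k8s-1 pod in openshift-monitoring is scraped.
print(keep(["openshift-monitoring", "prometheus-k8s-1"], "/", pod_regex))  # True
print(keep(["openshift-monitoring", "prometheus-k8s-0"], "/", pod_regex))  # False

# The last metric relabel rule prefixes only the exact value "controller-manager".
print(replace("controller-manager", "(controller-manager)", "openshift-$1"))
# openshift-controller-manager
print(replace("kube-controller-manager", "(controller-manager)", "openshift-$1"))
# None (no full match, so the label keeps the value kube-controller-manager)
```

Because the match is anchored, series federated from the kube-controller-manager job keep their controller value, while series from the controller-manager job are labeled openshift-controller-manager.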