OpenShift CoreDNS

Metrics, Dashboards, Alerts and more for OpenShift CoreDNS Integration in Sysdig Monitor.
This integration is enabled by default.

Versions supported: > v4.8

This integration is out-of-the-box, so it doesn’t require any exporter.

This integration has 13 metrics.

Timeseries generated: CoreDNS generates ~230 timeseries per dns-default pod

List of Alerts

Alert | Description | Format
[OpenShift CoreDNS] Process Down | CoreDNS has disappeared from target discovery. | Prometheus
[OpenShift CoreDNS] High Failed Responses | CoreDNS is returning failed responses. | Prometheus
[OpenShift CoreDNS] High Latency | CoreDNS responses latency is higher than 60ms. | Prometheus
[OpenShift CoreDNS] Panics Observed | CoreDNS Panics Observed. | Prometheus

List of Dashboards

OpenShift v4 CoreDNS

The dashboard provides information on OpenShift CoreDNS.

List of Metrics

Metric name
coredns_cache_hits_total
coredns_cache_misses_total
coredns_dns_request_duration_seconds_bucket
coredns_dns_request_size_bytes_bucket
coredns_dns_requests_total
coredns_dns_response_size_bytes_bucket
coredns_dns_responses_total
coredns_forward_request_duration_seconds_bucket
coredns_panics_total
coredns_plugin_enabled
go_goroutines
process_cpu_seconds_total
process_resident_memory_bytes

Preparing the Integration

No preparations are required for this integration.

Installing

The installation of an exporter is not required for this integration.

Monitoring and Troubleshooting OpenShift CoreDNS

Because OpenShift 4.X comes with both Prometheus and CoreDNS ready to use, no additional installation is required. OpenShift CoreDNS metrics are exposed over HTTPS on port 9154.

The following queries and metrics are useful for troubleshooting CoreDNS on OpenShift 4.

CoreDNS Panics

Number of Panics

To check the number of CoreDNS panics, use the following query:

sum(coredns_panics_total)

If this number is growing, check the CoreDNS pod logs.
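
Because coredns_panics_total is a cumulative counter, the raw sum also includes panics that happened long ago. As a sketch, assuming a 10-minute window suits your environment and that the kube_cluster_name and kube_pod_name labels used elsewhere on this page are present, the following query shows only recent panics and the pods that produced them:

```promql
# Panics observed in the last 10 minutes, per pod (window is an illustrative choice)
sum(increase(coredns_panics_total[10m])) by (kube_cluster_name, kube_pod_name) > 0
```

An empty result means no pod has recorded a panic in the window.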

DNS Requests

By Type

To see the DNS request rate broken down by record type, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (type,kube_cluster_name,kube_pod_name)

By Protocol

To see the DNS request rate broken down by protocol, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (proto,kube_cluster_name,kube_pod_name)

By Zone

To see the DNS request rate broken down by zone, use the following query:

sum(rate(coredns_dns_requests_total[$__interval])) by (zone,kube_cluster_name,kube_pod_name)

By Latency

This metric helps detect degradation in the service. The following query returns the 99th percentile of request latency per server and zone, which you can compare against the average latency.

histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by(server, zone, le))
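
To judge whether a high 99th percentile reflects a few outliers or a general slowdown, it may help to plot the median next to it. The queries below are a sketch that reuses the same bucket metric; the 0.5 quantile and the 5m window are illustrative choices. The 60 ms threshold mirrors the High Latency alert description, but the alert's actual expression is not shown on this page, so treat the second query as an approximation.

```promql
# Median (50th percentile) request latency, for comparison with the p99 query
histogram_quantile(0.5, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (server, zone, le))

# p99 series above 60 ms (the metric is in seconds, so 60 ms = 0.06)
histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (server, zone, le)) > 0.06
```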

Error Rate

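Watch this metric closely. It counts DNS-over-HTTPS responses by HTTP status code (for example 200, 404, 400, or 500), so you can group or filter by status. Note that coredns_dns_https_responses_total is not in the list of metrics above, so the default agent job does not keep it.

sum by (server, status) (coredns_dns_https_responses_total)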
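
For regular DNS traffic, failures are often easier to see through DNS response codes. The sketch below assumes that coredns_dns_responses_total carries an rcode label, as in upstream CoreDNS, and treats SERVFAIL as the failure signal; the label filter and the 5m window are assumptions to adapt to your environment.

```promql
# Fraction of responses that are SERVFAIL, per pod
sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])) by (kube_cluster_name, kube_pod_name)
/
sum(rate(coredns_dns_responses_total[5m])) by (kube_cluster_name, kube_pod_name)
```

Pods that returned no SERVFAIL responses in the window produce no data point rather than a value of 0.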

Cache

Cache Hit

To check the cache hit rate, use the following query:

sum(rate(coredns_cache_hits_total[$__interval])) by (type,kube_cluster_name,kube_pod_name)

Cache Miss

To check the cache miss rate, use the following query:

sum(rate(coredns_cache_misses_total[$__interval])) by(server,kube_cluster_name,kube_pod_name)
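
The hit and miss rates are often easier to interpret as a single ratio. The following is a sketch that combines the two counters above into a cache hit ratio per pod; the 5m window is an illustrative choice.

```promql
# Cache hit ratio per pod: hits / (hits + misses)
sum(rate(coredns_cache_hits_total[5m])) by (kube_cluster_name, kube_pod_name)
/
(
  sum(rate(coredns_cache_hits_total[5m])) by (kube_cluster_name, kube_pod_name)
  +
  sum(rate(coredns_cache_misses_total[5m])) by (kube_cluster_name, kube_pod_name)
)
```

A value close to 1 means most queries are answered from the cache. A falling ratio means more queries are being resolved upstream.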

Agent Configuration

This is the default agent job for this integration:

- job_name: openshift-dns-default
  honor_labels: true
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  scheme: https
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'openshift-dns/dns-default.+'
  - source_labels:
    - __address__
    action: keep
    regex: (.*:9154)
  - source_labels:
    - __meta_kubernetes_pod_name
    action: replace
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (coredns_cache_hits_total|coredns_cache_misses_total|coredns_dns_request_duration_seconds_bucket|coredns_dns_request_size_bytes_bucket|coredns_dns_requests_total|coredns_dns_response_size_bytes_bucket|coredns_dns_responses_total|coredns_forward_request_duration_seconds_bucket|coredns_panics_total|coredns_plugin_enabled|go_goroutines|process_cpu_seconds_total|process_resident_memory_bytes)
    action: keep
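
Once the job is active, a quick sanity check is to confirm that series are arriving from each dns-default pod. This sketch assumes the kube_cluster_name and kube_pod_name labels are attached as in the queries on this page; coredns_plugin_enabled is used only because it appears in the keep list of the job.

```promql
# Number of coredns_plugin_enabled series received per pod
count(coredns_plugin_enabled) by (kube_cluster_name, kube_pod_name)
```

If a dns-default pod is missing from the result, its metrics may not be getting scraped.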