Cluster Shield Troubleshooting

This page provides troubleshooting guidance for Sysdig Cluster Shield.

For an overview of Cluster Shield and installation instructions, see Cluster Shield.

API Slowness in EKS

Overview

If you experience slowness with the Kubernetes API server in an Elastic Kubernetes Service (EKS) cluster, and you have Cluster Shield with the Audit feature enabled, the issue could be related to connectivity problems between the EKS control plane and the audit webhook endpoint.

API slowness occurs because every API call to the Kubernetes API server must be validated by admission controllers before processing. These controllers enforce security and compliance checks on incoming requests.

If the admission controllers are unreachable, often because networking issues prevent the EKS control plane from reaching the webhook endpoints, the API server waits while attempting to contact them. This waiting leads to retries and timeouts, which significantly slow down API response times. To prevent this, ensure that the webhook endpoints are reachable from the EKS control plane.
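The difference between a blocked path and a closed port can be illustrated with a small TCP probe. This is a minimal sketch using only the Python standard library; the function name `probe_tcp` and the result labels are illustrative assumptions, not part of Cluster Shield. It also has to run from a host whose traffic follows a comparable network path, so it cannot fully reproduce what the EKS control plane sees.

```python
import socket


def probe_tcp(host: str, port: int, timeout_seconds: float = 3.0) -> str:
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection resolves the host and gives up after timeout_seconds.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, which is typical when a firewall or security group drops packets.
        return "timeout"
    except OSError:
        # Any other network failure, such as an unreachable network or a DNS error.
        return "error"
```

For example, `probe_tcp("10.0.1.23", 6443)` (a hypothetical node IP) returning "timeout" would be consistent with traffic being dropped before it reaches the webhook, while "refused" would suggest the path is open but nothing is listening on that port.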

Solution

To ensure proper connectivity and prevent slowness, do the following:

  1. Allow API Server Connectivity to Pods:

    For Cluster Shield’s Audit feature to function efficiently, the Kubernetes API server must be able to connect to the webhook endpoint.

    In EKS, where a custom Container Network Interface (CNI) may block direct communication, ensure that the necessary ports are open and accessible:

    • Audit uses port 6443 by default.
    • Admission Control uses port 8443 by default.

    To customize these ports, see CNI on EKS.

  2. Update Security Group Rules:

    • Update the inbound rules for the Security Group associated with your EKS worker nodes to allow TCP traffic on the Audit port. 6443 is the default port used by Cluster Shield Audit.
    • Ensure that the source of this traffic is the EKS cluster’s control plane security group.

    This configuration allows the API server to reach the audit webhook endpoint without unnecessary delays.

    Security Group Inbound Rule Requirements:

    • Protocol: TCP
    • Port: The port you specified for Cluster Shield Audit. 6443 is the default port.
    • Source: The EKS cluster’s control plane security group

If you have also enabled the Admission Control feature of Cluster Shield, allow inbound TCP traffic on the Admission Control port as well. The default Admission Control port is 8443.
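The inbound rule requirements above can be sketched as a small Python helper that builds the rule in the shape expected by the boto3 EC2 call `authorize_security_group_ingress`. This is an illustrative sketch, not an official Sysdig procedure: the function names and the security group IDs are hypothetical, and the boto3 call assumes that AWS credentials and a region are already configured.

```python
from typing import Any, Dict, List


def build_ingress_permissions(ports: List[int], control_plane_sg_id: str) -> List[Dict[str, Any]]:
    """Return one TCP ingress permission per port, sourced from the control plane security group."""
    permissions = []
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        permissions.append(
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Restrict the source to the control plane security group.
                "UserIdGroupPairs": [{"GroupId": control_plane_sg_id}],
            }
        )
    return permissions


def allow_control_plane_ingress(node_sg_id: str, control_plane_sg_id: str, ports: List[int]) -> None:
    """Add the inbound rules to the worker node security group (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=node_sg_id,
        IpPermissions=build_ingress_permissions(ports, control_plane_sg_id),
    )
```

With the default ports, a call might look like `allow_control_plane_ingress("sg-0aaa...", "sg-0bbb...", [6443, 8443])`, where both group IDs are placeholders for your worker node and control plane security groups. Pass only the Audit port if Admission Control is not enabled.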

CNI on EKS

At times, you may need to change the default ports for Cluster Shield’s Audit and Admission Controller.

For instance, when using a custom Container Network Interface (CNI) on EKS, the API server may not be able to reach the webhook endpoint. This occurs because the control plane cannot be configured to run on a custom CNI in EKS.

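To resolve this issue, apply the following configuration when installing Cluster Shield via Helm. Enabling host networking makes the webhook endpoints reachable on the worker node's own network, on the ports you choose: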

clusterShield:
  hostNetwork: true
  features:
    audit:
      http_port: 5000 # Or any other open and unused port >1024
    admission_control:
      http_port: 6000 # Or any other open and unused port >1024

Then update the inbound rules of the EKS worker nodes security group to allow TCP traffic on the ports you specified, with the EKS cluster security group as the source. In this example, those ports are 5000 and 6000.
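To keep the Helm values and the security group rules in sync, the ports can be read from the same values structure. The sketch below is a hypothetical helper, not part of the Helm chart. It assumes the values have already been loaded into a Python dictionary, that a missing `http_port` means the default (6443 for Audit, 8443 for Admission Control) is in effect, and that ports should be above 1024 as the comments in the example suggest.

```python
from typing import Any, Dict

# Default ports as stated on this page.
DEFAULT_PORTS = {"audit": 6443, "admission_control": 8443}


def webhook_ports(values: Dict[str, Any]) -> Dict[str, int]:
    """Return the Audit and Admission Control ports from Helm values, falling back to defaults."""
    features = values.get("clusterShield", {}).get("features", {})
    ports: Dict[str, int] = {}
    for feature, default in DEFAULT_PORTS.items():
        port = features.get(feature, {}).get("http_port", default)
        # Reject privileged or out-of-range ports.
        if not 1024 < port <= 65535:
            raise ValueError(f"{feature}: port {port} must be between 1025 and 65535")
        ports[feature] = port
    # Both webhooks share the node network when hostNetwork is enabled, so ports must differ.
    if len(set(ports.values())) != len(ports):
        raise ValueError("audit and admission_control must use different ports")
    return ports
```

For the values shown above, `webhook_ports` returns 5000 for `audit` and 6000 for `admission_control`, which are the ports the inbound rules need to allow.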