                                                                                                                                                                                                                                                                                                                              • 10.11.2.10.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.11:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.11.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.12:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.12.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.13:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.13.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.14:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.14.1:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.14.1.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.14.2:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.14.2.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.14.3:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.14.3.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.15:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.15.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.2.16:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.17:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.17.1:
                                                                                                                                                                                                                                                                                                                                  • 10.11.2.17.1.1:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.17.2:
                                                                                                                                                                                                                                                                                                                                  • 10.11.2.17.2.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.18:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.18.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.19:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.19.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.20:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.20.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.21:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.21.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.22:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.22.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.23:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.23.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.24:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.24.1:
                                                                                                                                                                                                                                                                                                                              • 10.11.2.25:
                                                                                                                                                                                                                                                                                                                                • 10.11.2.25.1:
                                                                                                                                                                                                                                                                                                                            • 10.11.3:
                                                                                                                                                                                                                                                                                                                              • 10.11.3.1:
                                                                                                                                                                                                                                                                                                                                • 10.11.3.2:
                                                                                                                                                                                                                                                                                                                                • 10.11.4:
                                                                                                                                                                                                                                                                                                                                  • 10.11.5:
                                                                                                                                                                                                                                                                                                                                    • 10.11.5.1:
                                                                                                                                                                                                                                                                                                                                      • 10.11.5.1.1:
                                                                                                                                                                                                                                                                                                                                        • 10.11.5.1.2:
                                                                                                                                                                                                                                                                                                                                          • 10.11.5.1.3:
                                                                                                                                                                                                                                                                                                                                            • 10.11.5.1.4:
                                                                                                                                                                                                                                                                                                                                              • 10.11.5.1.5:
                                                                                                                                                                                                                                                                                                                                                • 10.11.5.1.6:
                                                                                                                                                                                                                                                                                                                                                  • 10.11.5.1.7:
                                                                                                                                                                                                                                                                                                                                                    • 10.11.5.1.8:
                                                                                                                                                                                                                                                                                                                                                  • 10.11.6:
                                                                                                                                                                                                                                                                                                                                                    • 10.11.7:
                                                                                                                                                                                                                                                                                                                                                      • 10.11.8:
                                                                                                                                                                                                                                                                                                                                                        • 10.11.9:
                                                                                                                                                                                                                                                                                                                                                          • 10.11.10:
                                                                                                                                                                                                                                                                                                                                                            • 10.11.10.1:
                                                                                                                                                                                                                                                                                                                                                              • 10.11.10.2:
                                                                                                                                                                                                                                                                                                                                                              • 10.11.11:
                                                                                                                                                                                                                                                                                                                                                                • 10.11.12:
                                                                                                                                                                                                                                                                                                                                                                  • 10.11.13:
                                                                                                                                                                                                                                                                                                                                                                    • 10.11.14:
                                                                                                                                                                                                                                                                                                                                                                      • 10.11.15:
                                                                                                                                                                                                                                                                                                                                                                        • 10.11.16:

Sysdig Monitor

Sysdig Monitor is part of Sysdig’s container intelligence platform, a unified platform that delivers security, monitoring, and forensics in a container- and microservices-friendly architecture. Sysdig Monitor is a monitoring, troubleshooting, and alerting suite that offers deep, process-level visibility into dynamic, distributed production environments. It captures, correlates, and visualizes full-stack data, and provides dashboards for monitoring.

In the background, the Sysdig agent runs on each monitored host and collects metrics and events. Out of the box, the agent reports a wide variety of predefined metrics. Additional metrics and custom parameters can be enabled through the agent configuration files. For more information, see the Sysdig Agent Documentation.

Major Benefits

• Explore and monitor application performance at any level of the infrastructure stack

• Correlate metrics and events, and compare them with past performance

• Observe platform state and health

• Auto-detect anomalies

• Visualize and share performance metrics with out-of-the-box and custom dashboards

• Configure powerful, tuned, and flexible alerts

• Proactively alert on incidents across services, hosts, containers, and more

• Trigger system captures for offline troubleshooting and forensics

• Analyze system call activity to accelerate problem resolution

Key Components

Monitor Interface

Log into the Sysdig Monitor interface, and get started with the basics.

Advisor

Operate and troubleshoot Kubernetes infrastructure easily with a curated and unified view of metrics, alerts, and events.

Explore the Infrastructure

Dive into Sysdig Monitor with a deeper understanding of the Explore module, data aggregation, and how to break down data.

This feature is available in the Enterprise tier of the Sysdig product. See https://sysdig.com/pricing for details, or contact sales@sysdig.com.

Metrics

                                                                                                                                                                                                                                                                                                                                                                      The backbone of monitoring: learn more about metrics, integrate external platforms, and explore the complete metrics dictionary.

                                                                                                                                                                                                                                                                                                                                                                      Alerts

                                                                                                                                                                                                                                                                                                                                                                      Learn how to build alerts to notify users of infrastructure events, changes in behavior, and unauthorized access.

                                                                                                                                                                                                                                                                                                                                                                      Dashboards

                                                                                                                                                                                                                                                                                                                                                                      Learn how to build a custom dashboard, configure the default ones, or reconfigure panels to best suit your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      Integrations

                                                                                                                                                                                                                                                                                                                                                                      Integrate with various inbound and outbound data sources ranging from a number of platforms and orchestrators to a wide range of applications.

                                                                                                                                                                                                                                                                                                                                                                      Events

                                                                                                                                                                                                                                                                                                                                                                      Integrate Docker and Kubernetes events, customize event notifications, and review infrastructure history.

                                                                                                                                                                                                                                                                                                                                                                      Captures

                                                                                                                                                                                                                                                                                                                                                                      Create capture files containing system calls and other OS events to assist monitoring and troubleshooting the infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      1 -

                                                                                                                                                                                                                                                                                                                                                                      Getting Started with Sysdig Monitor

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor allows you to maximize the visibility of your Kubernetes environments with native Prometheus support. You can troubleshoot issues faster with Sysdig’s eBPF derived metrics, out-of-the-box dashboards, and alerts.

                                                                                                                                                                                                                                                                                                                                                                      You can choose Sysdig Monitor for a Free Trial option to quickly connect to a single cloud account with Sysdig and start with Prometheus-compatible Kubernetes and cloud monitoring.

                                                                                                                                                                                                                                                                                                                                                                      Once connected, the Get Started page shows a subset of the options available in the 30-day trial or Enterprise.

                                                                                                                                                                                                                                                                                                                                                                      Get Started Page

                                                                                                                                                                                                                                                                                                                                                                      The Get Started page targets the key steps to ensure users are getting the most value out of Sysdig Monitor. The page is updated with new steps as users complete tasks and Sysdig adds new features to the product.

                                                                                                                                                                                                                                                                                                                                                                      The Get Started page also serves as a linking page for

                                                                                                                                                                                                                                                                                                                                                                      • Documentation

                                                                                                                                                                                                                                                                                                                                                                      • Release Notes

                                                                                                                                                                                                                                                                                                                                                                      • The Sysdig Blog

                                                                                                                                                                                                                                                                                                                                                                      • Self-Paced Training

                                                                                                                                                                                                                                                                                                                                                                      • Support

                                                                                                                                                                                                                                                                                                                                                                      Users can access the Get Started page at any time by clicking the rocketship in the side menu.

                                                                                                                                                                                                                                                                                                                                                                      Install the Agent

                                                                                                                                                                                                                                                                                                                                                                      Installing the agent on your infrastructure allows Sysdig to collect data for monitoring and security purposes. For more information, see Quick Install Sysdig Agent on Kubernetes.

                                                                                                                                                                                                                                                                                                                                                                      (Optional) Connect Your Prometheus Servers

                                                                                                                                                                                                                                                                                                                                                                      Connecting your Prometheus servers to Sysdig-managed Prometheus Service helps leverage Sysdig for scalable long-term storage of your Prometheus metrics, PromQL dashboards, centralized querying, and PromQL-based alerting. For more information, see Collect Prometheus Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Invite Your Team

                                                                                                                                                                                                                                                                                                                                                                      Invite someone in your team to use this Sysdig Monitor account. They will be notified with an email. A user will be created for them and will be added to the default team. They are automatically assigned to the Advanced User role.

                                                                                                                                                                                                                                                                                                                                                                      Monitor Your Kubernetes Clusters

                                                                                                                                                                                                                                                                                                                                                                      Get a unified view of the health, risk, and capacity of your Kubernetes infrastructure in a multi- and hybrid-cloud environment. For more information, see Dashboard Templates.

                                                                                                                                                                                                                                                                                                                                                                      Workload Status & Performance

                                                                                                                                                                                                                                                                                                                                                                      Get deep insight into your Kubernetes workloads faster with the Workload Status & Performance Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Pod Status & Performance

                                                                                                                                                                                                                                                                                                                                                                      Drill down to workload pods and monitor pod-level resource usage and troubleshoot performance issues with the Pod Status & Performance Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Cluster Capacity Planning

                                                                                                                                                                                                                                                                                                                                                                      Verify if your cluster is sized properly for existing deployed applications, identify over-commit on resources that can lead to pod evictions, discover unused requested resources or containers without limits defined with the Cluster Capacity Planning Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Cluster/Namespace Available Resources

                                                                                                                                                                                                                                                                                                                                                                      Determine if your cluster has the capacity to deploy a new workload and ascertain if increasing CPU or memory requests or placing limits on an existing application is necessary with the Cluster/Namespace Available Resources Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Pod Rightsizing & Workload Capacity Optimization

                                                                                                                                                                                                                                                                                                                                                                      Identify resource-hogging workloads while optimizing your capacity with the Pod Rightsizing & Workload Capacity Optimization Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Set Up Alert

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor emits alerts to get proactive notification of events, anomalies, or any incident that requires attention. The alerting system provides out-of-the-box push gateways for regular email, Slack, Cloud-provider notification queues, and custom webhooks, among others. See .

                                                                                                                                                                                                                                                                                                                                                                      Configure a Notification Channel

                                                                                                                                                                                                                                                                                                                                                                      Alerts are used in Sysdig Monitor when Event thresholds have been crossed and can be sent over a variety of supported notification channels. Integrate Sysdig with your notification dispatchers and incident management workflows. See Set Up Notification Channels

                                                                                                                                                                                                                                                                                                                                                                      Turn on Alerts

                                                                                                                                                                                                                                                                                                                                                                      Turn on recommended alerts from our Alerts Library. Customize our recommendations or create your own alerts from scratch. See Alerts Library.

                                                                                                                                                                                                                                                                                                                                                                      Monitor Your Services

                                                                                                                                                                                                                                                                                                                                                                      Create a Dashboard

                                                                                                                                                                                                                                                                                                                                                                      Create customized dashboards to display the most relevant views and metrics for the infrastructure in a single location. Each dashboard is comprised of a series of panels configured to display specific data in a number of different formats. See Dashboards.

                                                                                                                                                                                                                                                                                                                                                                      Get Started with PromQL

                                                                                                                                                                                                                                                                                                                                                                      Write PromQL queries easier with form-based querying available with Sysdig Monitor. All metrics are enriched with cloud and Kubernetes metadata avoiding complicated PromQL joins. See Using PromQL.

                                                                                                                                                                                                                                                                                                                                                                      Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      Sysdig discovers services running in infrastructure and recommends appropriate Monitoring Integrations that allow you to collect service-specific metrics. The integration bundle includes out-of-the-box dashboards and default alerts. See Configure Monitoring Integrations.

                                                                                                                                                                                                                                                                                                                                                                      Advanced Actions

                                                                                                                                                                                                                                                                                                                                                                      Integrate development tools:

                                                                                                                                                                                                                                                                                                                                                                      2 -

                                                                                                                                                                                                                                                                                                                                                                      Advisor

                                                                                                                                                                                                                                                                                                                                                                      Advisor brings your metrics, alerts, and events into a focused and curated view to help you operate and troubleshoot Kubernetes infrastructure. To help you solve problems faster, over time, Advisor will surface your infrastructure issues that you should pay attention to.

                                                                                                                                                                                                                                                                                                                                                                      Advisor is available to only our SaaS users. The feature is not currently available for on-prem environments.

                                                                                                                                                                                                                                                                                                                                                                      Advisor presents your infrastructure grouped by cluster, namespace, workload, and pod. You cannot currently configure a custom grouping. Depending on the selection, you will see different curated views and you can switch between the following:

                                                                                                                                                                                                                                                                                                                                                                      • Triggered alerts
                                                                                                                                                                                                                                                                                                                                                                      • Events from Kubernetes, container engines, and custom events sent via the Monitor APIs
                                                                                                                                                                                                                                                                                                                                                                      • Cluster usage and capacity
                                                                                                                                                                                                                                                                                                                                                                      • Key golden signals (requests, latency, errors) derived from system calls
                                                                                                                                                                                                                                                                                                                                                                      • Kubernetes metrics about the health and status of Kubernetes objects
                                                                                                                                                                                                                                                                                                                                                                      • Container live logs
                                                                                                                                                                                                                                                                                                                                                                      • Process and network telemetry (CPU, memory, network connections, etc.)
                                                                                                                                                                                                                                                                                                                                                                      • Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      The time window of metrics displayed on Advisor is the last 1 hour of collected data. To see historical values for a metric, drill down to a related dashboard or explore a metric using the Explore UI.

                                                                                                                                                                                                                                                                                                                                                                      Live logs

                                                                                                                                                                                                                                                                                                                                                                      Advisor can display live logs for a container, which is the equivalent of running kubectl logs. This is useful for troubleshooting application errors or problems such as pods in a CrashLoopBackOff state.

When you select a pod, a Logs tab appears. If the pod has multiple containers, you can select the container whose logs you want to view. Once requested, logs are streamed for 3 minutes before the session is automatically closed; you can restart streaming if necessary.

Live logs are tailed on demand and are not persisted. After a session is closed, they are no longer accessible.

                                                                                                                                                                                                                                                                                                                                                                      Manage Access to Live Logs

By default, live logs are available to users within the scope of their Sysdig Team. Use Custom Roles to manage live logs permissions.

                                                                                                                                                                                                                                                                                                                                                                      Configure Agent for Live Logs

Live logs are enabled by default in agent 12.7.0 or newer. Agent 12.6.0 supports live logs, but they must be enabled manually by setting enabled: true under live_logs. Older versions of the Sysdig Agent do not support live logs.
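The version rules above can be summarized in a short sketch. The function name and return strings are hypothetical, and the sketch assumes that any 12.6.x patch release behaves like 12.6.0, which the documentation does not state.

```python
def live_logs_status(agent_version: str) -> str:
    """Classify live logs support for an agent version string such as '12.7.0'."""
    # Compare versions as integer tuples so that 12.10.0 sorts after 12.7.0.
    version = tuple(int(part) for part in agent_version.split("."))
    if version >= (12, 7, 0):
        return "enabled by default"
    if version >= (12, 6, 0):
        # Assumption: every 12.6.x release needs the manual setting.
        return "supported, enable manually"
    return "unsupported"


print(live_logs_status("12.6.0"))
```

A check like this could be run against the agent versions in a cluster to find nodes that still need the manual setting or an upgrade.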

                                                                                                                                                                                                                                                                                                                                                                      Live logs can be enabled or disabled within the agent configuration.

                                                                                                                                                                                                                                                                                                                                                                      To turn live logs off globally for a cluster, add the following in the dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                      live_logs:
                                                                                                                                                                                                                                                                                                                                                                        enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      If using Helm, this is configured via sysdig.settings. For example:

                                                                                                                                                                                                                                                                                                                                                                      sysdig:
                                                                                                                                                                                                                                                                                                                                                                       # Advanced settings. Any option in here will be directly translated into dragent.yaml in the Configmap
                                                                                                                                                                                                                                                                                                                                                                       settings:
                                                                                                                                                                                                                                                                                                                                                                         live_logs:
                                                                                                                                                                                                                                                                                                                                                                           enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      2.1 -

                                                                                                                                                                                                                                                                                                                                                                      Overview

                                                                                                                                                                                                                                                                                                                                                                      Overview leverages Sysdig’s unified data platform to monitor, secure, and troubleshoot your hosts and Kubernetes clusters and workloads.

The module provides a unified view of the health, risk, and capacity of your Kubernetes infrastructure: a single pane of glass for host machines as well as Kubernetes Clusters, Nodes, Namespaces, and Workloads across multi- and hybrid-cloud environments. You can filter by any of these entities and view the associated events and health data.

                                                                                                                                                                                                                                                                                                                                                                      Overview shows metrics prioritized by event count and severity, allowing you to get to the root cause of the problem faster. Sysdig Monitor polls the infrastructure data every 10 minutes and refreshes the metrics and events on the Overview page with the system health.

                                                                                                                                                                                                                                                                                                                                                                      Key Benefits

                                                                                                                                                                                                                                                                                                                                                                      Overview provides the following benefits:

                                                                                                                                                                                                                                                                                                                                                                      • Show a unified view of the health, risk, resource use, and capacity of your infrastructure environment at scale

                                                                                                                                                                                                                                                                                                                                                                        • Render metrics, security events, compliance CIS benchmark results, and contextual events in a single location

                                                                                                                                                                                                                                                                                                                                                                        • Eliminate the need for stand-alone security, monitoring, and forensics tools

                                                                                                                                                                                                                                                                                                                                                                        • View data on-the-fly by workload or by infrastructure

                                                                                                                                                                                                                                                                                                                                                                      • Display contextual live event stream from alerts, Kubernetes, containers, policies, and image scanning results

                                                                                                                                                                                                                                                                                                                                                                      • Surface entities intelligently based on event count and severity

• Drill down from Clusters to Nodes and Namespaces

• Support infrastructure monitoring of multi- and hybrid-cloud environments

• Expose relevant information based on core operational users:

                                                                                                                                                                                                                                                                                                                                                                        • DevOps / Platform Ops

                                                                                                                                                                                                                                                                                                                                                                        • Security Analyst

                                                                                                                                                                                                                                                                                                                                                                        • Service Owner

                                                                                                                                                                                                                                                                                                                                                                      Accessing the Overview User Interface

                                                                                                                                                                                                                                                                                                                                                                      You can access and set the scope of Overview in the Sysdig Monitor UI or with the URL:

                                                                                                                                                                                                                                                                                                                                                                      Click Overview in the left navigation, then select one of the Kubernetes entities:

                                                                                                                                                                                                                                                                                                                                                                      About the Overview User Interface

                                                                                                                                                                                                                                                                                                                                                                      The Overview interface opens to the Clusters Overview page. This section describes the major components of the interface and the navigation options.

The default landing page is Clusters Overview. If you have no Kubernetes clusters configured, the Overview tab opens to the Hosts view instead. When you reopen the Overview menu, the default view is your last visited Overview page, because Overview retains the visit history.

                                                                                                                                                                                                                                                                                                                                                                      Overview Rows

                                                                                                                                                                                                                                                                                                                                                                      Each row represents a Kubernetes entity: a cluster, node, namespace, or workload. In the screenshot above, each row shows a Kubernetes cluster.

                                                                                                                                                                                                                                                                                                                                                                      • Navigating rows is easy

                                                                                                                                                                                                                                                                                                                                                                        Click on the Overview icon in the left navigation and choose an Overview page, or drill down into the next Overview page to explore the next granular level of data. Each Overview page shows 10 rows by default and a maximum of 100 rows. Click Load More to display additional rows if there are more than 10 rows per page.

                                                                                                                                                                                                                                                                                                                                                                      • Ability to select a specific row in an Overview page

Each row carries the scope of the entity it shows data for. Clicking a specific row deselects the rest of the rows (for instance, selecting staging deselects all other rows in the screenshot above) to focus on the scope of the selected entity, including the events scoped by that row. Focusing on a single row provides a snapshot of what is currently going on with the selected entity.

                                                                                                                                                                                                                                                                                                                                                                      • Entities are ranked according to the severity and the number of events detected in them

Rows are sorted by the count and severity level of the events associated with the entity and are displayed in descending order. The items with the highest number of high severity events are shown first, followed by medium, low, and info. This organization helps highlight events that demand immediate attention and streamlines troubleshooting in environments that may include thousands of entities.
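The ranking described above can be illustrated with a minimal sketch. It is not Sysdig's implementation: the Row class, the rank_rows function, and the tie-breaking rule (compare high counts first, then medium, low, and info) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Severity levels from most to least severe, as listed in the text.
SEVERITIES = ("high", "medium", "low", "info")


@dataclass
class Row:
    name: str
    events: dict = field(default_factory=dict)  # severity -> event count


def rank_rows(rows):
    """Sort rows so entities with the most high severity events come first."""
    def key(row):
        # Tuples compare element by element: high first, then medium, low, info.
        return tuple(row.events.get(sev, 0) for sev in SEVERITIES)
    return sorted(rows, key=key, reverse=True)


rows = [
    Row("dev", {"low": 9}),
    Row("prod", {"high": 2, "info": 1}),
    Row("staging", {"high": 2, "medium": 3}),
]
print([row.name for row in rank_rows(rows)])
```

In this example, staging and prod tie on high severity events, so the medium count decides the order and dev, with only low severity events, comes last.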

                                                                                                                                                                                                                                                                                                                                                                      Scope Editor

The Scope Editor lets you target a specific entity, such as a particular workload or namespace, in environments that may include thousands of entities. The levels of scope, determined by the Kubernetes hierarchy, progress from Workload up to Cluster, with Cluster at the top level. In smaller environments, using the Scope Editor is equivalent to clicking a single row in an Overview page where no scope has been applied.

                                                                                                                                                                                                                                                                                                                                                                      Cluster: The highest level in the hierarchy. The only scope applied to the page is Cluster. It allows you to select a specific cluster from a list of available ones.

                                                                                                                                                                                                                                                                                                                                                                      Node: The second level in the hierarchy. The scope is determined by Cluster and Node. Selection is narrowed down to a specific node in a selected cluster.

                                                                                                                                                                                                                                                                                                                                                                      Namespace: The third level in the hierarchy. The scope is determined by Cluster and Namespace. Selection is narrowed down to a specific namespace in a selected cluster.

Workloads: The last entity in the hierarchy. The scope is initially determined by Cluster and Namespace, then the selection is narrowed to a specific Deployment, Service, or StatefulSet. Choosing all three options is not allowed.
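The scope levels above can be modeled as a small validation sketch. The dictionary keys and the scope_level function are hypothetical and do not correspond to a Sysdig API. The sketch also assumes that exactly one workload kind is selected at a time and that Node and Namespace are not combined; the text only states that choosing all three workload kinds is not allowed.

```python
WORKLOAD_KINDS = ("deployment", "service", "statefulset")


def scope_level(scope: dict) -> str:
    """Return the Overview level ('cluster', 'node', 'namespace', 'workload') for a scope."""
    unknown = set(scope) - {"cluster", "node", "namespace", *WORKLOAD_KINDS}
    if unknown:
        raise ValueError(f"unknown scope keys: {sorted(unknown)}")
    if "cluster" not in scope:
        raise ValueError("every scope starts with a cluster")
    kinds = [kind for kind in WORKLOAD_KINDS if kind in scope]
    if kinds:
        if "namespace" not in scope:
            raise ValueError("a workload scope needs a namespace")
        if len(kinds) > 1:
            # Assumption: only one workload kind may be chosen at a time.
            raise ValueError("select a single workload kind")
        return "workload"
    if "node" in scope and "namespace" in scope:
        # Assumption: node and namespace scopes are alternatives, not combined.
        raise ValueError("choose either a node or a namespace")
    if "namespace" in scope:
        return "namespace"
    if "node" in scope:
        return "node"
    return "cluster"


print(scope_level({"cluster": "prod", "namespace": "web", "deployment": "api"}))
```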

                                                                                                                                                                                                                                                                                                                                                                      Time Navigation

                                                                                                                                                                                                                                                                                                                                                                      The Overview feature is based around time. Sysdig Monitor polls the infrastructure data every 1 minute and refreshes the metrics and events on the Overview page with the system health. The time range is fixed at 12 hours. However, the gauge and compliance score widgets display the latest data sample, not an aggregation over the entire 12-hour time range.
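The difference between the latest data sample and an aggregation over the window can be shown with a brief sketch. The sample format, (timestamp in seconds, value) pairs, and both function names are assumptions made for illustration.

```python
from statistics import mean


def gauge_value(samples):
    """Return the value of the most recent sample, as a gauge widget would."""
    return max(samples, key=lambda sample: sample[0])[1]


def window_average(samples):
    """Return the average over the whole window, for comparison."""
    return mean(value for _, value in samples)


# Three hypothetical CPU usage samples (percent) within the time range.
samples = [(0, 10.0), (60, 20.0), (120, 90.0)]
print(gauge_value(samples), window_average(samples))
```

With these samples, the gauge reads the recent spike while the window average stays much lower, which is why the two kinds of widgets can appear to disagree.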

                                                                                                                                                                                                                                                                                                                                                                      The Overview feed is always live and cannot be paused.

                                                                                                                                                                                                                                                                                                                                                                      Unified Stream of Events

                                                                                                                                                                                                                                                                                                                                                                      The right panel of Overview provides a context-sensitive events feed.

                                                                                                                                                                                                                                                                                                                                                                      Click an overview row to see relevant Events on the right. Each event is intelligently populated with end-to-end metadata to give context and enable troubleshooting.

                                                                                                                                                                                                                                                                                                                                                                      Event Types

                                                                                                                                                                                                                                                                                                                                                                      Overview renders the following event types:

                                                                                                                                                                                                                                                                                                                                                                      • Alert: See Alerts.

• Custom: Ensure that Custom labels are enabled to view this type of event.

                                                                                                                                                                                                                                                                                                                                                                      • Containers: Events associated with containers.

                                                                                                                                                                                                                                                                                                                                                                      • Kubernetes: Events associated with Kubernetes infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      • Scanning: See Image Scanning.

                                                                                                                                                                                                                                                                                                                                                                      • Policy: See Policies.

                                                                                                                                                                                                                                                                                                                                                                      Event Statuses

                                                                                                                                                                                                                                                                                                                                                                      Overview renders the following alert-generated event statuses:

                                                                                                                                                                                                                                                                                                                                                                      • Triggered: The alert condition has been met and still persists.

• Resolved: A previously existing alert condition no longer persists.

                                                                                                                                                                                                                                                                                                                                                                      • Acknowledged: The event has been acknowledged by the intended recipient.

                                                                                                                                                                                                                                                                                                                                                                      • Un-acknowledged: The event has not been acknowledged by an intended recipient. All events are by default marked as Un-acknowledged.

                                                                                                                                                                                                                                                                                                                                                                      • Silenced: The alert event has been silenced for a specified scope. No alert notification will be sent out to the channels during the silenced window.

                                                                                                                                                                                                                                                                                                                                                                      General Guidelines

                                                                                                                                                                                                                                                                                                                                                                      First-Time Usage

• When an environment is created for the first time, Sysdig Monitor fetches data and generates the associated pages. The Overview feature is enabled immediately. However, it can take up to 1 hour for the Overview pages to show the necessary data.

• Overview uses time windows in segments of 1H, 6H, and 1D. You must therefore wait 1H, 6H, and 1D, respectively, before data for each window appears on the Overview pages.

• If not enough data is available during the first hour, the “No Data Available” page is displayed until the first hour has passed.
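The waiting rule above can be illustrated with a minimal sketch. It assumes that a window can show data once the environment is at least as old as the window itself; the function name and the way the age is supplied are hypothetical and are not part of Sysdig Monitor.

```python
from datetime import timedelta

# Window lengths named in the guidelines above.
WINDOWS = {
    "1H": timedelta(hours=1),
    "6H": timedelta(hours=6),
    "1D": timedelta(days=1),
}


def windows_with_data(environment_age):
    """Return the windows that have had time to fill (hypothetical helper)."""
    return [name for name, length in WINDOWS.items() if environment_age >= length]


# An environment created 7 hours ago has filled the 1H and 6H windows only.
seven_hours_old = windows_with_data(timedelta(hours=7))
```

Under the same assumption, an environment younger than one hour returns an empty list, which corresponds to the “No Data Available” page.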

                                                                                                                                                                                                                                                                                                                                                                      Tuning Overview Data

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor leverages a caching mechanism to fetch pre-computed data for the Overview screens.

If pre-computed data is unavailable, Sysdig Monitor fetches raw data, which must be computed before it can be displayed. This additional computation adds delay. Caching is enabled for Overview, but for optimum performance you must wait for the 1H, 6H, and 1D windows to elapse the first time you use Overview. After the specified time has passed, the data is automatically cached with every passing minute.

                                                                                                                                                                                                                                                                                                                                                                      Enabling Overview for On-Prem Deployments

                                                                                                                                                                                                                                                                                                                                                                      The Overview feature is not available by default on On-Prem deployments. Use the following API to enable it:

                                                                                                                                                                                                                                                                                                                                                                      1. Get the Beta settings as follows:

                                                                                                                                                                                                                                                                                                                                                                        curl -X GET 'https://<Sysdig URL>/api/on-prem/settings/overviews' \
                                                                                                                                                                                                                                                                                                                                                                        -H 'Authorization: Bearer <GLOBAL_SUPER_ADMIN_SDC_TOKEN>' \
                                                                                                                                                                                                                                                                                                                                                                        -H 'X-Sysdig-Product: SDC' -k
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Replace <Sysdig URL> with the Sysdig URL associated with your deployment and <GLOBAL_SUPER_ADMIN_SDC_TOKEN> with the SDC token associated with your deployment.

                                                                                                                                                                                                                                                                                                                                                                      2. Copy the payload and change the desired values in the settings.

                                                                                                                                                                                                                                                                                                                                                                      3. Update the settings as follows:

  curl -X PUT 'https://<Sysdig URL>/api/on-prem/settings/overviews' \
  -H 'Authorization: Bearer <GLOBAL_SUPER_ADMIN_SDC_TOKEN>' \
  -H 'X-Sysdig-Product: SDC' \
  -d '{  "overviews": true,  "eventScopeExpansion": true}'
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Feature Flags

                                                                                                                                                                                                                                                                                                                                                                      • overviews: Set overviews to true to enable the backend components and the UI.

                                                                                                                                                                                                                                                                                                                                                                      • eventScopeExpansion: Set eventScopeExpansion to true to enable scope expansion for all the Event types.
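The get, edit, and update steps can also be scripted. The following is a minimal sketch in Python that only builds the requests and does not send them. It assumes that the update uses the same settings path as the GET example and that the GET response is a flat JSON object containing the two flags. The host name, token value, and helper names are hypothetical.

```python
import json
import urllib.request

# Path taken from the GET example; assumed to be the same for the update.
SETTINGS_PATH = "/api/on-prem/settings/overviews"


def build_request(base_url, token, method="GET", payload=None):
    """Build (but do not send) a request with the headers used by the curl examples."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        base_url.rstrip("/") + SETTINGS_PATH, data=data, method=method
    )
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("X-Sysdig-Product", "SDC")
    return request


def with_flags(current, **flags):
    """Return a copy of the fetched settings with the given flags overridden."""
    updated = dict(current)
    updated.update(flags)
    return updated


# Stand-in for the payload returned by the GET call (assumed shape).
current = {"overviews": False, "eventScopeExpansion": False}
desired = with_flags(current, overviews=True, eventScopeExpansion=True)
put_request = build_request(
    "https://sysdig.example.com", "TOKEN", method="PUT", payload=desired
)
```

Passing the resulting object to urllib.request.urlopen would send it. Copying the fetched settings before changing them mirrors step 2 and keeps any other values in the payload untouched.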

2.1.1 - Clusters Data

                                                                                                                                                                                                                                                                                                                                                                      This topic discusses the Clusters Overview page and helps you understand its gauge charts and the data displayed on them.

                                                                                                                                                                                                                                                                                                                                                                      About Clusters Overview

In Kubernetes, a pool of nodes combines its resources to form a more powerful machine, known as a cluster. The Cluster Overview page provides key metrics indicating the health, risk, capacity, and compliance of each cluster. Your cluster can reside in any cloud or multi-cloud environment of your choice.

Each row in the Clusters page represents a cluster. Clusters are sorted by the severity of the corresponding events to highlight the areas that need attention. For example, a cluster with high-severity events is bubbled up to the top of the page to highlight the issue. You can drill down further to the Nodes or Namespaces Overview page to investigate at each level.

In environments where Sysdig Secure is not enabled, Network I/O is shown instead of the Compliance score.

                                                                                                                                                                                                                                                                                                                                                                      Interpret the Cluster Data

                                                                                                                                                                                                                                                                                                                                                                      This topic gives insight into the metrics displayed on the Clusters Overview screen.

                                                                                                                                                                                                                                                                                                                                                                      Node Ready Status

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by avg(min(kubernetes.node.ready)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The number shows the readiness of nodes to accept pods across the entire cluster. The numeric availability indicates the percentage of time the nodes are reported as ready by Kubernetes. For example:

                                                                                                                                                                                                                                                                                                                                                                      • 100% is displayed when 10 out of 10 nodes are ready for the entire time window, say, for the last one hour.

                                                                                                                                                                                                                                                                                                                                                                      • 95% is displayed when 9 out of 10 nodes are ready for the entire time window and one node is ready only for 50% of the time.
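The arithmetic behind these two examples can be sketched as follows. This is only an illustration of the averaging, assuming each node contributes the fraction of the time window during which it was ready. It is not a description of how Sysdig Monitor computes the metric internally, and the function name is hypothetical.

```python
def cluster_ready_percent(ready_fractions):
    """Average the per-node fraction of time spent ready, as a percentage."""
    if not ready_fractions:
        raise ValueError("at least one node is required")
    return 100.0 * sum(ready_fractions) / len(ready_fractions)


# 10 out of 10 nodes ready for the entire window.
all_ready = cluster_ready_percent([1.0] * 10)

# 9 nodes ready for the entire window, 1 node ready for 50% of the time:
# (9 * 1.0 + 0.5) / 10 = 0.95
one_partial = cluster_ready_percent([1.0] * 9 + [0.5])
```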

                                                                                                                                                                                                                                                                                                                                                                      The bar chart displays the trend across the selected time window, and each bar represents a time slice. For example, selecting the last 1-hour window displays 6 bars, each indicating a 10-minute time slice. Each bar represents the availability across the time slice (green) or the unavailability (red).

For instance, the following image shows an average availability of 80% over the last hour, and each 10-minute time slice shows a constant availability for the same time window:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Expect a constant 100% at all times.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

                                                                                                                                                                                                                                                                                                                                                                      If the value is less than 100%, determine whether a node is not available at all, or one or more nodes are partially available.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down either to the Nodes screen in Overview or to the “Kubernetes Cluster Overview” in Explore to see the list of nodes and their availability.

                                                                                                                                                                                                                                                                                                                                                                      • Check the Kubernetes Node Overview dashboard in Explore to identify the problem that Kubernetes reports.
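The distinction between a node that is not available at all and a node that is partially available can be sketched as follows. The input is assumed to be the fraction of the time window during which each node was ready, for example as read from the Nodes screen. The node names and the helper are hypothetical.

```python
def classify_nodes(ready_fraction_by_node):
    """Group nodes by how much of the time window they were ready."""
    groups = {"unavailable": [], "partial": [], "ready": []}
    for node, fraction in sorted(ready_fraction_by_node.items()):
        if fraction <= 0.0:
            groups["unavailable"].append(node)  # never ready in the window
        elif fraction < 1.0:
            groups["partial"].append(node)      # ready only part of the time
        else:
            groups["ready"].append(node)        # ready for the whole window
    return groups


groups = classify_nodes({"node-a": 1.0, "node-b": 0.0, "node-c": 0.5})
```

In this example, node-b would be the first candidate to check in the Kubernetes Node Overview dashboard, followed by node-c.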

                                                                                                                                                                                                                                                                                                                                                                      Pods Available vs Desired

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(kubernetes.namespace.pod.available.count)) / sum(avg(kubernetes.namespace.pod.desired.count)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the ratio between available and desired pods, averaged across the selected time window, for all the pods in a given Cluster. The upper bound shows the number of desired pods in the Cluster.

For instance, the following image shows that 42 desired pods are available:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      You should typically expect 100%.

If certain pods take a long time to become available, you might temporarily see a value lower than 100%. Image pulls, pod initialization, readiness probes, and similar steps cause such delays.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

                                                                                                                                                                                                                                                                                                                                                                      Identify one or more Namespaces that have lower availability. To do so, drill down to the Namespaces screen, then drill down to the Workloads screen to identify the unavailable pods.

If the number of unavailable pods is considerably high (that is, the ratio is significantly low), check the status of the Nodes. A Node failure causes several pods to become unavailable across most of the Namespaces.

Several factors could cause pods to get stuck in the Pending state:

• Pods request more resources than what remains available across the Nodes (the remaining allocatable resources).

• Pods request more resources than any single Node can provide. For example, you have 8-core Nodes and you create a pod with a 16-core request. These pods might require reconfiguration and specific setup related to Node affinity and anti-affinity constraints.

• The Namespace quota is reached when a pod makes a high resource request.

                                                                                                                                                                                                                                                                                                                                                                        If a quota is enforced at the Namespace level, you may hit the limit independent of the resource availability across the Nodes.
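The three factors listed above can be expressed as simple checks. The sketch below is a rough illustration, not how the Kubernetes scheduler or quota admission works internally: `pending_reason`, its parameters, and the returned strings are hypothetical, and only CPU is considered.

```python
def pending_reason(pod_cpu_request, node_free_cpu, namespace_quota_left=None):
    """Return a coarse reason why a pod may not become available, or None if it fits.

    pod_cpu_request      -- CPU cores requested by the pod
    node_free_cpu        -- list of unrequested allocatable cores, one entry per Node
    namespace_quota_left -- cores still allowed by the Namespace quota (None = no quota)
    """
    # A quota can block the pod regardless of how much capacity the Nodes have.
    if namespace_quota_left is not None and pod_cpu_request > namespace_quota_left:
        return "namespace quota reached"
    # The request exceeds everything that is left in the whole Cluster.
    if pod_cpu_request > sum(node_free_cpu):
        return "request exceeds remaining cluster capacity"
    # The Cluster has room in total, but no single Node is large enough.
    if pod_cpu_request > max(node_free_cpu, default=0):
        return "no single node can fit the request"
    return None

# A 16-core request on 8-core Nodes, as in the example above.
print(pending_reason(16, [8, 8, 8]))  # prints "no single node can fit the request"
```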

                                                                                                                                                                                                                                                                                                                                                                      CPU Requests vs Allocatable

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.cpuCores)) / sum(avg(kubernetes.node.allocatable.cpuCores)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the ratio between CPU requests configured for all the pods in a selected Cluster and allocatable CPUs across all the nodes.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the number of allocatable CPU cores across all the nodes in the Cluster.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below shows that out of 620 available CPU cores across all the nodes (allocatable CPUs), 71% is requested by the pods:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Your resource utilization strategy determines what ratio you can expect. A healthy ratio falls between 50% and 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100. For example, the upper bound is about 89% if you have 9 nodes. Staying at or below this percentage protects you against a single node becoming unavailable.
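As a quick worked example of the (node_count - 1) / node_count x 100 formula, the following sketch computes the upper bound for a few Cluster sizes. The function name is hypothetical, and the formula assumes, as stated above, that every node has the same amount of allocatable resources.

```python
def request_ratio_upper_bound(node_count):
    """Percentage of allocatable resources that can be requested while
    still leaving one node's worth of capacity free."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) / node_count * 100

for nodes in (3, 9, 10, 20):
    print(nodes, round(request_ratio_upper_bound(nodes), 1))
# prints:
# 3 66.7
# 9 88.9
# 10 90.0
# 20 95.0
```

The bound rises with the node count because a single node represents a smaller share of the total capacity in a larger Cluster.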

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

A lower ratio indicates under-utilized resources (and corresponding cost) in your infrastructure. A higher ratio indicates insufficient resources. As a result:

                                                                                                                                                                                                                                                                                                                                                                      • Applications cannot be scheduled to be run.

                                                                                                                                                                                                                                                                                                                                                                      • Pods might not start and remain in a Pending/Unscheduled state.

                                                                                                                                                                                                                                                                                                                                                                      To triage, do the following:

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to the Nodes screen to get insights into how resources are utilized across all nodes.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to the Namespaces screen to understand how resources are requested across Namespaces.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to Explore and refer to the following dashboards:

                                                                                                                                                                                                                                                                                                                                                                        • Kubernetes CPU Allocation Optimization: Evaluate whether a significant amount of resources are under-utilized in the infrastructure.

                                                                                                                                                                                                                                                                                                                                                                        • Kubernetes Workloads CPU Usage and Allocation: Determine whether pods are properly configured and are using resources as expected.

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

Currently, the ratio accounts only for scheduled pods; pending pods are excluded from the calculation. Scheduled pods have already been placed on Nodes within the allocatable resources. Consequently, in principle, the ratio cannot be higher than 100%.

In the case of over-commitment (pods requesting more resources than what’s available), you can expect a higher Requests vs Allocatable ratio and a lower Pods Available vs Desired ratio. This indicates that most of the available resources are being used, and what’s left is not enough to schedule additional pods. Therefore, the Pods Available vs Desired ratio decreases.

When your environment has pods that are updated often or that are deleted and created often (for example, testing Clusters), the total requests might appear higher than they are at any given time. Consequently, the ratio becomes higher across the selected time window, and you might see a value that is higher than 100%. This discrepancy is an artifact of how the data engine calculates the aggregated ratio.
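One way such an inflated value can arise is sketched below. This is an assumption about the aggregation, not a description of the actual data engine: it supposes that each pod's requests are averaged only over the samples where the pod exists, and that the per-pod averages are then summed. The pod names, sample values, and helper functions are hypothetical.

```python
from statistics import mean

ALLOCATABLE_CORES = 4.0

# Requested cores per sample; None means the pod did not exist at that sample.
# "job-b" replaces "job-a" halfway through the time window.
pod_requests = {
    "job-a": [4.0, 4.0, None, None],
    "job-b": [None, None, 4.0, 4.0],
}

def sum_of_averages(series_by_pod):
    """Average each pod over the samples where it exists, then sum across pods."""
    return sum(
        mean(v for v in series if v is not None)
        for series in series_by_pod.values()
    )

def peak_instantaneous(series_by_pod):
    """Largest total request observed at any single sample."""
    sample_count = len(next(iter(series_by_pod.values())))
    return max(
        sum(series[i] or 0.0 for series in series_by_pod.values())
        for i in range(sample_count)
    )

aggregated_ratio = 100 * sum_of_averages(pod_requests) / ALLOCATABLE_CORES
peak_ratio = 100 * peak_instantaneous(pod_requests) / ALLOCATABLE_CORES

print(aggregated_ratio)  # prints 200.0
print(peak_ratio)        # prints 100.0
```

Under this assumption the aggregated ratio counts both short-lived pods in full, even though only one of them held requests at any given time.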

                                                                                                                                                                                                                                                                                                                                                                      Drill down to Kubernetes Cluster Overview to see the CPU Cores Usage vs Requests vs Allocatable time series to correctly evaluate the trend of the request commitments.

Listed below are some of the factors that could cause pods to get stuck in a Pending state:

• Pods request more resources than what remains available across the Nodes (the remaining allocatable resources). The Requests vs Allocatable ratio is an indicator of this issue.

• Pods request more resources than any single Node can provide. For example, you have 8-core Nodes and you create a pod with a 16-core request. These pods might require reconfiguration and specific setup related to Node affinity and anti-affinity constraints.

• The quota set at the Namespace level is reached when a pod makes a high resource request. The Requests vs Allocatable ratio may not reveal the problem, but the Pods Available vs Desired ratio decreases, especially for the affected Namespaces. See the Namespaces screen in Overview.

                                                                                                                                                                                                                                                                                                                                                                      Memory Requests vs Allocatable

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.memBytes)) / sum(avg(kubernetes.node.allocatable.memBytes)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the ratio between memory requests configured for all the pods in the Cluster and allocatable memory available across all the Nodes.

The upper bound shows the allocatable memory available across all Nodes. The value is measured in bytes and displayed in the specified unit.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below shows that out of 29.7 GiB available across all Nodes (allocatable memory), 35% is requested by the pods:
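To relate the byte-based metrics to the displayed figures, the sketch below converts bytes to GiB and derives the requested amount from the example above (29.7 GiB allocatable, 35% requested). The variable names are illustrative, and the binary definition of GiB (1024^3 bytes) is assumed.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

def to_gib(byte_count):
    """Convert a byte count to GiB."""
    return byte_count / GIB

# Figures taken from the example: 29.7 GiB allocatable, 35% requested.
allocatable_bytes = 29.7 * GIB
requested_bytes = 0.35 * allocatable_bytes

ratio = 100 * requested_bytes / allocatable_bytes
print(f"{to_gib(requested_bytes):.1f} GiB requested")  # prints "10.4 GiB requested"
print(f"{ratio:.0f}% of allocatable memory")           # prints "35% of allocatable memory"
```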

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Your resource utilization strategy determines what ratio you can expect. A healthy ratio falls between 50% and 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100. For example, the upper bound is about 89% if you have 9 nodes. Staying at or below this percentage protects your system against a single node becoming unavailable.

What to Do Otherwise?

A lower ratio indicates under-utilized resources (and corresponding cost) in your infrastructure. A higher ratio indicates insufficient resources. As a result:

                                                                                                                                                                                                                                                                                                                                                                      • Applications cannot be scheduled to be run.

                                                                                                                                                                                                                                                                                                                                                                      • Pods might not start and remain in a Pending/Unscheduled state.

                                                                                                                                                                                                                                                                                                                                                                      To troubleshoot, do the following:

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to the Nodes screen to get insights into how resources are utilized across all the Nodes.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to the Namespaces screen to understand how resources are requested across Namespaces.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to Explore and refer to the following dashboards:

                                                                                                                                                                                                                                                                                                                                                                        • Kubernetes Memory Allocation Optimization: Evaluate whether a significant amount of resources are under-utilized in the infrastructure.

                                                                                                                                                                                                                                                                                                                                                                        • Kubernetes Workloads Memory Usage and Allocation: Determine whether pods are properly configured and are using resources as expected.

Can the Value Be Higher than 100%?

The ratio currently accounts only for scheduled pods; pending pods are excluded from the calculation. Scheduled pods have already been placed on Nodes within the allocatable resources available. Consequently, in principle, the ratio cannot be higher than 100%.

In the case of over-commitment (pods requesting more resources than what’s available), expect a higher Requests vs Allocatable ratio and a lower Pods Available vs Desired ratio. This indicates that most of the available resources have been used and what’s left is not enough to schedule additional pods. Therefore, the Pods Available vs Desired ratio decreases.

When your environment has pods that are updated often or that are deleted and created often (for example, testing Clusters), the total requests might appear higher than they are at any given time. Consequently, the ratio becomes higher across the selected time window, and you might see a value that is higher than 100%. This discrepancy is an artifact of how the data engine calculates the aggregated ratio.

                                                                                                                                                                                                                                                                                                                                                                      Drill down to Kubernetes Cluster Overview to see the Memory Requests vs Allocatable time series to correctly evaluate the trend for the request commitments.

                                                                                                                                                                                                                                                                                                                                                                      Listed are some of the factors that could cause your pods to stuck in a Pending state:

                                                                                                                                                                                                                                                                                                                                                                      • Pods make requests that exceed what’s available across the nodes (the remaining allocatable pods). The Requests vs Allocatable ratio is an indicator of this issue.

                                                                                                                                                                                                                                                                                                                                                                      • Pods make requests that are higher than the availability of every single Node. For example, you have 8-core nodes and you create a pod with a 16-core request. These pods might require configuration changes and specific setup related to node affinity and anti-affinity factors.

                                                                                                                                                                                                                                                                                                                                                                      • The Quota set at the Namespace-level is reached before a high request is configured. The Requests vs Allocatable ratio might not suggest the problem, but the Pods Available vs Desired ratio would decrease, especially for the specific Namespaces. See the Namespaces screen in Overview.
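The first two causes can be told apart with simple arithmetic. The sketch below is illustrative only: `classify_pending` and its inputs are hypothetical names, not part of any Sysdig or Kubernetes API. It considers a single resource (CPU cores) and does not model Namespace quotas.

```python
from typing import List


def classify_pending(request_cores: float,
                     node_allocatable: List[float],
                     node_requested: List[float]) -> str:
    """Guess why a pod with the given CPU request cannot be scheduled.

    node_allocatable[i] and node_requested[i] describe node i
    (hypothetical inputs supplied by the caller).
    """
    if len(node_allocatable) != len(node_requested):
        raise ValueError("need one requested value per node")
    # The request is larger than any single node, even an empty one.
    if all(request_cores > alloc for alloc in node_allocatable):
        return "exceeds every node"
    # Some node is big enough, but none has enough allocatable left.
    free = [a - r for a, r in zip(node_allocatable, node_requested)]
    if all(request_cores > f for f in free):
        return "exceeds remaining allocatable"
    return "fits"
```

With two 8-core nodes, a 16-core request falls into the "exceeds every node" case no matter how idle the nodes are, while a 4-core request is rejected only when both nodes have less than 4 cores left.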

                                                                                                                                                                                                                                                                                                                                                                      Compliance Score

Docker: The latest value returned by avg(avg(compliance.docker-bench.pass_pct)).

Kubernetes: The latest value returned by avg(avg(compliance.k8s-bench.pass_pct)).

What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The numbers show the percentage of benchmarks that succeeded in the selected time window, respectively for Docker and Kubernetes entities.

What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      If you do not have Sysdig Secure enabled, or you do not have benchmarks scheduled, then you should expect no data available.

                                                                                                                                                                                                                                                                                                                                                                      Otherwise, the higher the score, the more compliant your infrastructure is.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

                                                                                                                                                                                                                                                                                                                                                                      If the score is lower than expected, drill down to Docker Compliance Report or Kubernetes Compliance Report to see further details about benchmark checks and their results.

                                                                                                                                                                                                                                                                                                                                                                      You may also want to use the Benchmarks / Results page in Sysdig Secure to see the history of checks.

                                                                                                                                                                                                                                                                                                                                                                      2.1.2 -

                                                                                                                                                                                                                                                                                                                                                                      Nodes Data

                                                                                                                                                                                                                                                                                                                                                                      This topic discusses the Nodes Overview page and helps you understand its gauge charts and the data displayed on them.

                                                                                                                                                                                                                                                                                                                                                                      About Nodes Overview

                                                                                                                                                                                                                                                                                                                                                                      A node refers to a worker machine in Kubernetes. A physical machine or VM can represent a node. The Nodes Overview page provides key metrics indicating the health, capacity, and compliance of each node in your cluster.

In environments where Sysdig Secure is not enabled, Network I/O is shown instead of the Compliance score.

                                                                                                                                                                                                                                                                                                                                                                      Interpret the Nodes Data

                                                                                                                                                                                                                                                                                                                                                                      This topic gives insight into the metrics displayed on the Nodes Overview page.

                                                                                                                                                                                                                                                                                                                                                                      Node Ready Status

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by avg(min(kubernetes.node.ready)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The number expresses the Node readiness to accept pods across the Cluster. The numeric availability indicates the percentage of time the Node is reported ready by Kubernetes. For example:

                                                                                                                                                                                                                                                                                                                                                                      • 100% is displayed when a Node is ready for the entire time window, say, for the last one hour.

                                                                                                                                                                                                                                                                                                                                                                      • 95% when the Node is ready for 95% of the time window, say, 57 out of 60 minutes.

                                                                                                                                                                                                                                                                                                                                                                      The bar chart displays the trend across the selected time window, and each bar represents a time slice. For example, selecting “last 1 hour” displays 6 bars, each indicating a 10-minute time slice. Each bar shows the availability across the time slice (green) and the unavailability (red).
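The percentages and bars described above can be reproduced with a small calculation. This is a minimal sketch, assuming one readiness sample per minute (1 for ready, 0 for not ready) and a plain average. The function names are hypothetical, and the actual chart is driven by avg(min(kubernetes.node.ready)), which may aggregate differently.

```python
from typing import List


def ready_percentage(samples: List[int]) -> float:
    """Percentage of samples in which the node reported ready (1)."""
    if not samples:
        raise ValueError("no samples")
    return 100.0 * sum(samples) / len(samples)


def slice_percentages(samples: List[int], slice_len: int) -> List[float]:
    """Readiness percentage for each consecutive time slice (one bar each)."""
    return [ready_percentage(samples[i:i + slice_len])
            for i in range(0, len(samples), slice_len)]


# One sample per minute for the last hour; not ready for the final 3 minutes.
samples = [1] * 57 + [0] * 3
overall = ready_percentage(samples)      # 95.0
bars = slice_percentages(samples, 10)    # six bars; only the last is below 100
```

In this example the first five bars are fully green and the last bar is 70% green and 30% red.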

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below indicates the Node has not been ready for the entire last 1-hour time window:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      The chart should show a constant 100% at all times.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

If the number is less than 100%, review the status reported by Kubernetes. Drill down to the Kubernetes Node Overview Dashboard in Explore to see details about the Node readiness:

                                                                                                                                                                                                                                                                                                                                                                      If the Node Ready Status has an alternating behavior, as shown in the image, the node is flapping. Flapping indicates that the kubelet is not healthy. See specific conditions reported by Kubernetes that would help determine the causes for the Node not being ready. Such conditions include network issues and memory pressure.

                                                                                                                                                                                                                                                                                                                                                                      Pods Ready vs Allocatable

                                                                                                                                                                                                                                                                                                                                                                      The chart reports the latest value of sum(avg(kubernetes.pod.status.ready)) / avg(avg(kubernetes.node.allocatable.pods)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

It is the ratio between ready and allocatable pods configured on the node, averaged across the selected time window.

                                                                                                                                                                                                                                                                                                                                                                      The Clusters page includes a similar chart named Pods Available vs Desired. However, the meaning is different:

                                                                                                                                                                                                                                                                                                                                                                      • The Pods Available vs Desired chart for Clusters highlights how many pods you expect and how many are actually available. See IsPodAvailable for a detailed definition.

                                                                                                                                                                                                                                                                                                                                                                      • The Pods Ready vs Allocatable chart for Nodes indicates how many pods can be scheduled on each Node and how many are actually ready.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the number of pods you can allocate in the node. See node configuration.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below indicates that you can allocate 110 pods in the Node (default configuration), but only 11 pods are ready:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

The ratio does not relate to resource utilization; it measures the pod density on each node. The more pods you have on a single node, the more effort the kubelet must expend to manage the pods, the routing mechanism, and Kubernetes overall.

                                                                                                                                                                                                                                                                                                                                                                      Given the allocatable is properly set, values lower than 80% indicate a healthy status.
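As a worked example of the ratio, the sketch below computes pod density for a node that allows 110 pods and has 11 ready, then compares it with the 80% guideline. The function names and the default threshold parameter are assumptions made for illustration.

```python
def pod_density(ready_pods: int, allocatable_pods: int) -> float:
    """Pods Ready vs Allocatable, as a percentage."""
    if allocatable_pods <= 0:
        raise ValueError("allocatable_pods must be positive")
    return 100.0 * ready_pods / allocatable_pods


def is_healthy(density_pct: float, threshold_pct: float = 80.0) -> bool:
    """True when the density is below the guideline threshold."""
    return density_pct < threshold_pct


density = pod_density(11, 110)   # 10% of the allocatable pods are ready
healthy = is_healthy(density)    # well below the 80% guideline
```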

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

If the ratio is higher than 80%, consider:

• Reviewing the default maximum pods configuration of the kubelet to allow more pods, especially if the CPU and memory utilization is healthy.

                                                                                                                                                                                                                                                                                                                                                                      • Adding more nodes to allow for more pods to be scheduled.

                                                                                                                                                                                                                                                                                                                                                                      • Reviewing kubelet process performance and Node resource utilization in general. A higher ratio indicates high pressure on the operating system and for Kubernetes itself.

                                                                                                                                                                                                                                                                                                                                                                      CPU Requests vs Allocatable

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.cpuCores)) / sum(avg(kubernetes.node.allocatable.cpuCores)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the ratio between the number of CPU cores requested by the pods scheduled on the Node and the number of cores available to pods. The upper bound shows the CPU cores available to pods, which corresponds to the user-defined configuration for allocatable CPU.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below shows that the Node has 16 CPU cores available, out of which, 84% are requested by the pods scheduled on the Node:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Expect a value up to 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is the value of (node_count - 1) / node_count x 100. For example, 90% if you have 10 nodes. Keeping the ratio at or below this bound protects your system against a Node becoming unavailable, because the remaining Nodes have enough allocatable resources to absorb its requests.
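The bound can be computed directly from the formula (node_count - 1) / node_count x 100. The helper below is a hypothetical sketch, and it inherits the assumption that every node has the same allocatable resources.

```python
def request_ratio_upper_bound(node_count: int) -> float:
    """Highest Requests vs Allocatable percentage that still leaves
    one node's worth of allocatable resources spare."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return 100.0 * (node_count - 1) / node_count


bound_10 = request_ratio_upper_bound(10)   # 90.0
bound_9 = request_ratio_upper_bound(9)     # about 88.9
bound_5 = request_ratio_upper_bound(5)     # 80.0
```

The bound tightens as the cluster shrinks: with fewer nodes, each node carries a larger share of the total, so more headroom is needed to survive the loss of one.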

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

• A low ratio indicates the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods currently running is lower than expected, or if the pods cannot run for other reasons.

                                                                                                                                                                                                                                                                                                                                                                      • A high ratio indicates a potential risk of being unable to schedule additional pods on the Node.

Drill down to the Kubernetes Node Overview Dashboard to evaluate what Namespaces, Workloads, and pods are running. Additionally, drill up in the Clusters page to evaluate whether you are over-committing the CPU resource. You might not have enough resources to fulfill requests, and consequently, pods might not be able to run on the Node. Consider adding Nodes or replacing Nodes with additional CPU cores.

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes schedules pods on Nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow having a total request higher than the allocatable. Consequently, the ratio cannot be higher than 100%.

Over-committing (pods requesting more resources than the capacity) results in a high Requests vs Allocatable ratio and a low Pods Available vs Desired ratio at the Cluster level. Because most of the allocatable resources are already requested, what remains is not sufficient to schedule additional pods. Those pods stay unscheduled, so the Pods Available vs Desired ratio decreases.
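A toy model makes the relationship between the two ratios concrete. The sketch below is not the Kubernetes scheduler: it is a hypothetical single-node admission loop that accepts a pod only while the total of requests stays within the allocatable amount, which is the rule described above.

```python
from typing import List, Tuple


def admit(requests: List[int], allocatable: int) -> Tuple[List[int], List[int]]:
    """Admit pods in order while the summed requests fit in allocatable.

    Returns (scheduled, pending) lists of per-pod requests.
    """
    scheduled: List[int] = []
    pending: List[int] = []
    used = 0
    for req in requests:
        if used + req <= allocatable:
            scheduled.append(req)
            used += req
        else:
            pending.append(req)   # stays Pending: not enough allocatable left
    return scheduled, pending


# Five pods requesting 4 cores each on a node with 16 allocatable cores.
scheduled, pending = admit([4, 4, 4, 4, 4], 16)
requests_vs_allocatable = 100.0 * sum(scheduled) / 16                    # 100.0
available_vs_desired = 100.0 * len(scheduled) / (len(scheduled) + len(pending))  # 80.0
```

The Requests vs Allocatable ratio tops out at 100% while the fifth pod remains pending, which shows up as a lower Pods Available vs Desired ratio.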

                                                                                                                                                                                                                                                                                                                                                                      Memory Requests vs Allocatable

                                                                                                                                                                                                                                                                                                                                                                      The chart highlights the latest value returned by sum(avg(kubernetes.pod.resourceRequests.memBytes)) / sum(avg(kubernetes.node.allocatable.memBytes)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The chart shows the ratio between the number of bytes of memory requested by the pods scheduled on the node and the number of bytes of memory available to pods. The upper bound shows the memory available to pods, which corresponds to the user-defined allocatable memory configuration.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below indicates the node has 62.8 GiB of memory available, out of which, 37% is requested by the pods scheduled on the Node:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      A healthy ratio falls under 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is the value of (node_count - 1) / node_count x 100. For example, the upper bound is about 89% if you have 9 nodes. Keeping the ratio at or below this bound protects your system against a node becoming unavailable, because the remaining nodes can still accommodate the requests of the pods from the lost node.
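The bound can be worked out with a few lines of Python. This is only a minimal sketch: the function name `max_safe_request_ratio` is hypothetical, and it assumes, as the text does, that every node has the same amount of allocatable resources.

```python
def max_safe_request_ratio(node_count: int) -> float:
    """Return the (node_count - 1) / node_count bound as a percentage.

    Assumes every node has the same amount of allocatable resources.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) / node_count * 100


# Print the bound for a few example cluster sizes.
for nodes in (3, 9, 10):
    print(f"{nodes} nodes -> {max_safe_request_ratio(nodes):.1f}%")
```

The bound rises as the cluster grows, because a single node represents a smaller share of the total capacity. With 10 nodes, for example, it reaches 90%.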

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

                                                                                                                                                                                                                                                                                                                                                                      • A low ratio indicates that the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods running is low, or if pods cannot run for other reasons.

                                                                                                                                                                                                                                                                                                                                                                      • A high ratio indicates a potential risk of being unable to schedule additional pods on the node.

  • Drill down to the Kubernetes Node Overview dashboard to evaluate what Namespaces, Workloads, and pods are running.

  • Additionally, drill up in the Clusters page to evaluate whether you are over-committing the memory resource. If you are, you don’t have enough resources to fulfill requests, and pods might not be able to run. Consider adding nodes or replacing nodes with ones that have more memory.

Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes schedules pods on nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow having a total request higher than the allocatable. Consequently, the ratio cannot be higher than 100%.
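The reasoning above can be illustrated with a deliberately simplified model. The `Node` class below is hypothetical and tracks memory requests only. It is not how the Kubernetes scheduler is implemented, and the real scheduler takes many more factors into account. It only shows why the ratio stays at or below 100% when a pod is admitted only if its request fits.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified node model: memory only, in bytes."""
    allocatable_bytes: int
    requested_bytes: list[int] = field(default_factory=list)

    def fits(self, pod_request_bytes: int) -> bool:
        # A pod fits only if the total requests stay within allocatable.
        return sum(self.requested_bytes) + pod_request_bytes <= self.allocatable_bytes

    def schedule(self, pod_request_bytes: int) -> bool:
        """Place the pod if it fits; report whether it was placed."""
        if not self.fits(pod_request_bytes):
            return False
        self.requested_bytes.append(pod_request_bytes)
        return True

    def request_ratio(self) -> float:
        """Requests vs Allocatable, as a percentage."""
        return sum(self.requested_bytes) / self.allocatable_bytes * 100


GIB = 1024 ** 3
node = Node(allocatable_bytes=8 * GIB)
for request in (3 * GIB, 4 * GIB, 2 * GIB):
    placed = node.schedule(request)
    print(f"request {request // GIB} GiB placed={placed} ratio={node.request_ratio():.1f}%")
```

In this sketch the third pod is rejected because it would push the total requests past the allocatable memory, so the ratio never exceeds 100%.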

Over-committing (pods requesting more resources than are available) results in a high Requests vs Allocatable ratio at the Nodes level and a low Pods Available vs Desired ratio at the Cluster level. A high ratio indicates that most of the allocatable resources are already requested, so what remains is not sufficient to schedule additional pods. As a result, the Pods Available vs Desired ratio also decreases.

                                                                                                                                                                                                                                                                                                                                                                      Network I/O

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by avg(avg(net.bytes.total)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for a Node. The number indicates the most recent rate, expressed in bytes per second.

                                                                                                                                                                                                                                                                                                                                                                      For reference, the sparklines show the following number of steps (sampling):

                                                                                                                                                                                                                                                                                                                                                                      • Last hour: 6 steps, each for a 10-minute time slice

• Last 6 hours: 12 steps, each for a 30-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last day: 12 steps, each for a 2-hour time slice
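As a quick sanity check on the sampling, the slice length can be derived from the window and the step count. The sketch below assumes that every step covers an equal share of the selected window. The `STEPS` table and the `slice_length` helper are hypothetical names used only for illustration.

```python
from datetime import timedelta

# Time window and number of steps for each sparkline range listed above.
STEPS = {
    "Last hour": (timedelta(hours=1), 6),
    "Last 6 hours": (timedelta(hours=6), 12),
    "Last day": (timedelta(days=1), 12),
}


def slice_length(window: timedelta, steps: int) -> timedelta:
    """Length of one sparkline step, assuming equal-sized slices."""
    return window / steps


for label, (window, steps) in STEPS.items():
    print(f"{label}: {steps} steps of {slice_length(window, steps)}")
```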

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

The metric depends heavily on the type of applications that run on the Node. You should expect some network activity for Kubernetes-related operations.

                                                                                                                                                                                                                                                                                                                                                                      Drilling down to the Kubernetes Node Overview Dashboard in Explore will provide additional details, such as network activity across pods.

2.1.3 - Namespaces Data

                                                                                                                                                                                                                                                                                                                                                                      This topic discusses the Namespaces Overview page and helps you understand its gauge charts and the data displayed on them.

                                                                                                                                                                                                                                                                                                                                                                      About Namespaces Overview

                                                                                                                                                                                                                                                                                                                                                                      Namespaces are virtual clusters on a physical cluster. They provide logical separation between the teams and their environments. The Namespaces Overview page provides key metrics indicating the health, capacity, and performance of each Namespace in your cluster.

                                                                                                                                                                                                                                                                                                                                                                      Interpret the Namespaces Data

                                                                                                                                                                                                                                                                                                                                                                      This topic gives insight into the metrics displayed on the Namespaces Overview screen.

                                                                                                                                                                                                                                                                                                                                                                      Pod Restarts

                                                                                                                                                                                                                                                                                                                                                                      The chart highlights the latest value returned by avg(timeAvg(kubernetes.pod.restart.rate)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The sparkline shows the trend of the pod restart rate across all the pods in a selected Namespace. The number shows the most recent rate of restarts per second.

For instance, the image shows a rate of 0.04 restarts per second for the last 2 hours, given the selected time window is one day. The trend also suggests a non-flat pattern (periodic crashes).

For reference, the sparklines show the following number of steps (sampling):

                                                                                                                                                                                                                                                                                                                                                                      • Last hour: 6 steps, each for a 10-minute time slice

• Last 6 hours: 12 steps, each for a 30-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last day: 12 steps, each for a 2-hour time slice

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Expect 0 restarts for any pod.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

A few restarts across the last hour or larger time windows might not indicate a serious problem. In the event of a restart loop, identify the root cause as follows:

• Drill down to the Workloads page in Overview to identify the Workloads that are stuck in a restart loop.

                                                                                                                                                                                                                                                                                                                                                                      • Drill down to the Kubernetes Namespace Overview to see a detailed trend broken down by pods:

                                                                                                                                                                                                                                                                                                                                                                      Pods Available vs Desired

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(kubernetes.namespace.pod.available.count)) / sum(avg(kubernetes.namespace.pod.desired.count)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the ratio between available and desired pods, averaged across the selected time window, in a given Namespace.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the number of desired pods in the namespace.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below shows 42 desired pods that are available:

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      Expect 100% on the chart.

                                                                                                                                                                                                                                                                                                                                                                      If certain pods take a significant amount of time to become available due to delays (image pull time, pod initialization, readiness probe) you might temporarily see a ratio lower than 100%.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

                                                                                                                                                                                                                                                                                                                                                                      • Identify one or more Workloads that have low availability by drilling down to the Workloads page.

• Once you identify the Workload, drill down to the related dashboard in Explore, for example, Kubernetes Deployment Overview, to determine the trend and the state of the pods.

                                                                                                                                                                                                                                                                                                                                                                        For instance, in the following image, the ratio is 98% (3.93 / 4 x 100). The decline is due to an update that caused pods to be terminated and consequently to be started with a newer version.
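The arithmetic behind such a ratio can be sketched as follows. This is a simplified illustration for a single series of samples, not the exact query the chart runs. The function name and the sample values are hypothetical.

```python
from statistics import mean


def availability_ratio(available_samples, desired_samples):
    """Time-averaged available pods over time-averaged desired pods, in percent."""
    return mean(available_samples) / mean(desired_samples) * 100


# Hypothetical samples: one pod is briefly unavailable during a rolling update.
available = [4, 4, 4, 3, 4, 4, 4, 4]
desired = [4] * len(available)
print(f"{availability_ratio(available, desired):.1f}%")

# The averaged values quoted in the text: 3.93 available out of 4 desired.
print(f"{availability_ratio([3.93], [4]):.0f}%")
```

Because the values are averaged across the selected time window, even a short period with one pod missing pulls the ratio slightly below 100%.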

                                                                                                                                                                                                                                                                                                                                                                      CPU Used vs Requests

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(cpu.cores.used)) / sum(avg(kubernetes.pod.resourceRequests.cpuCores)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the ratio between the total CPU usage across all the pods in the Namespace and the total CPU requested by all the pods.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the total CPU requested by all the pods. The value is expressed as the number of CPU cores.

For instance, the image below shows that the pods in a Namespace request 40 CPU cores, of which only 43% is being used (about 17 cores):

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      The value you see depends on the type of Workloads running in the Namespace.

Typically, values that fall between 80% and 120% are considered healthy. Values higher than 100% are considered healthy only for a relatively short amount of time.

                                                                                                                                                                                                                                                                                                                                                                      For applications whose resource usage is constant (such as background processes), expect the ratio to be close to 100%.

For “bursty” applications, such as an API server, expect the ratio to be less than 100%. Note that this value is averaged over the selected time window, so a usage spike is offset by an idle period.
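The effect of averaging can be seen in a small sketch. The usage samples and the requested value below are invented for illustration, and `used_vs_requests` is a hypothetical helper rather than the query the chart runs.

```python
from statistics import mean


def used_vs_requests(cpu_used_samples, cpu_requested_cores):
    """Average CPU usage over the window divided by the requested cores, in percent."""
    return mean(cpu_used_samples) / cpu_requested_cores * 100


requested = 2.0  # hypothetical request, in CPU cores

# Hypothetical usage samples (cores): mostly idle with one short spike above the request.
bursty = [0.2, 0.2, 3.0, 0.2, 0.2, 0.2]
# Hypothetical usage samples (cores): a constant background load close to the request.
steady = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1]

print(f"bursty: {used_vs_requests(bursty, requested):.0f}%")
print(f"steady: {used_vs_requests(steady, requested):.0f}%")
```

Although the bursty series briefly uses more CPU than requested, its averaged ratio stays well below 100%, while the steady series sits close to 100%.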

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

A low usage indicates that the application is not running properly (not executing the expected functions) or that the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

A high usage indicates that the application is operating under a heavy load or that the Workload configuration is not accurate (requests are too low compared to what the pods actually need).

                                                                                                                                                                                                                                                                                                                                                                      In either case, drill down to the Workloads page to determine the workload that requires a deeper analysis.

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Yes, it can.

                                                                                                                                                                                                                                                                                                                                                                      • You can configure requests without limits, or requests lower than the limits. In either case, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

• Consider a Namespace with two Workloads with one pod each. Say one Workload is configured to request 1 CPU core and uses 1 CPU core (the Used vs Requests ratio is 100%). The other Workload is configured without any request and uses 1 CPU core. In this example, 2 CPU cores are used against 1 CPU core requested, so the ratio at the Namespace level is 200%.
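
The two-Workload example above can be reproduced with a short sketch. The function name and the tuple layout are assumptions made for illustration; they are not part of the product.

```python
def namespace_ratio(workloads):
    """Return total used / total requested, in percent.

    `workloads` is a list of (cores_used, cores_requested) tuples.
    """
    total_used = sum(used for used, _ in workloads)
    total_requested = sum(requested for _, requested in workloads)
    if total_requested == 0:
        return None  # the ratio is undefined when nothing is requested
    return 100.0 * total_used / total_requested


# Workload A requests 1 core and uses 1 core; Workload B has no request and uses 1 core.
workloads = [(1.0, 1.0), (1.0, 0.0)]
print(namespace_ratio(workloads))  # 200.0
```

Because usage and requests are summed separately before dividing, a Workload without requests adds to the numerator only, which pushes the Namespace ratio above 100%.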

                                                                                                                                                                                                                                                                                                                                                                      Memory Used vs Requests

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(memory.bytes.used)) / sum(avg(kubernetes.pod.resourceRequests.memBytes)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the ratio between the total memory usage across all pods of the Namespace and the total memory requested by all pods.

The upper bound shows the total memory requested by all the pods, expressed in an appropriate byte unit (for example, GiB).

For instance, the image below shows that all the pods in the Namespace request 120 GiB, of which only 24% is being used (about 29 GiB):

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      It depends on the type of Workloads you run in the Namespace. Typically, values that fall between 80% and 120% are considered healthy.

Values that are higher than 100% are considered normal for a relatively short amount of time.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

A low usage indicates that the application is not running properly (not executing the expected functions) or that the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

A high usage indicates that the application is operating under a heavy load or that the Workload configuration is not accurate (requests are too low compared to what the pods actually need).

Depending on the configured limits for the Workloads and the memory pressure on the nodes, Workloads that use more memory than they requested are at risk of eviction. See Exceed a Container’s Limit for more information.

                                                                                                                                                                                                                                                                                                                                                                      In both cases, you may want to drill down to the Workloads page to determine which Workload requires a deeper analysis.
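
As a rough aid for deciding which Workloads deserve a closer look, the following sketch classifies a memory ratio against the 80% to 120% range described above. The function name, the labels, and the idea of applying the range per Workload are assumptions for illustration only.

```python
GIB = 1024 ** 3  # bytes in one GiB


def classify(used_bytes, requested_bytes, low=80.0, high=120.0):
    """Label a used-vs-requested memory ratio as low, healthy, or high."""
    if requested_bytes == 0:
        return "no request"  # the ratio is undefined without a request
    ratio = 100.0 * used_bytes / requested_bytes
    if ratio < low:
        return "low"
    if ratio > high:
        return "high"
    return "healthy"


# The 120 GiB requested / about 29 GiB used example from this section.
print(classify(29 * GIB, 120 * GIB))  # low
```

A "low" result suggests oversized requests or an idle application; a "high" result suggests undersized requests or a heavy load, and a possible eviction risk under node memory pressure.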

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Yes, it can.

                                                                                                                                                                                                                                                                                                                                                                      • You can configure requests without limits, or requests lower than the limits. In either case, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

• Consider a Namespace with two Workloads with one pod each. Say one Workload is configured to request 1 GiB of memory and uses 1 GiB (the Used vs Requests ratio is 100%). The other Workload is configured without any request and uses 1 GiB. In this example, 2 GiB of memory is used against 1 GiB requested, so the ratio at the Namespace level is 200%.

                                                                                                                                                                                                                                                                                                                                                                      Network I/O

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by avg(avg(net.bytes.total)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for all the pods in the Namespace. The number shows the most recent rate, expressed in bytes per second.

                                                                                                                                                                                                                                                                                                                                                                      For reference, the sparklines show the following number of steps (sampling):

                                                                                                                                                                                                                                                                                                                                                                      • Last hour: 6 steps, each for a 10-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last 6 hours: 12 steps, each for a 30-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last day: 12 steps, each for a 2-hour time slice
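
The step counts above multiply out to the selected time window. The sketch below checks that arithmetic and shows one plausible way to reduce raw samples to sparkline steps; averaging each slice is an assumption, and all names are hypothetical.

```python
# Window name -> (number of steps, minutes per step), as listed above.
SPARKLINE_STEPS = {
    "last hour": (6, 10),
    "last 6 hours": (12, 30),
    "last day": (12, 120),
}


def window_minutes(steps, slice_minutes):
    """Total minutes covered by a sparkline."""
    return steps * slice_minutes


def downsample(samples, steps):
    """Average equally spaced samples into `steps` consecutive buckets."""
    if len(samples) % steps != 0:
        raise ValueError("sample count must be a multiple of the step count")
    size = len(samples) // steps
    return [sum(samples[i * size:(i + 1) * size]) / size for i in range(steps)]


for name, (steps, minutes) in SPARKLINE_STEPS.items():
    print(name, window_minutes(steps, minutes), "minutes")
```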

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

The type of applications running in the Namespace determines the expected values. Drilling down to the Kubernetes Namespace Overview Dashboard in Explore provides additional details, such as network activity across pods.

                                                                                                                                                                                                                                                                                                                                                                      2.1.4 -

                                                                                                                                                                                                                                                                                                                                                                      Workloads Data

                                                                                                                                                                                                                                                                                                                                                                      This topic discusses the Workloads Overview page and helps you understand its gauge charts and the data displayed on them.

                                                                                                                                                                                                                                                                                                                                                                      About Workloads Overview

Workloads, in Kubernetes terminology, refer to your containerized applications. Workloads comprise Deployments, StatefulSets, and DaemonSets within a Namespace.

                                                                                                                                                                                                                                                                                                                                                                      In a Cluster, worker nodes run your application workloads, whereas the master node provides the core Kubernetes services and orchestration for application workloads. The Workloads Overview page provides the key metrics indicating health, capacity, and compliance.

                                                                                                                                                                                                                                                                                                                                                                      Interpret the Workloads Data

                                                                                                                                                                                                                                                                                                                                                                      This topic gives insight into the metrics displayed on the Workloads Overview page.

                                                                                                                                                                                                                                                                                                                                                                      Pod Restarts

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the latest value returned by sum(timeAvg(kubernetes.pod.restart.rate)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The sparkline shows the trend of the pod restart rate across all the pods in a selected Workload. The number shows the most recent rate, expressed in restarts per second.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image below shows the trend for the last hour. The number indicates that the rate of pod restarts is less than 0.01 for the last 10 minutes.

                                                                                                                                                                                                                                                                                                                                                                      For reference, the sparklines show the following number of steps (sampling):

                                                                                                                                                                                                                                                                                                                                                                      • Last hour: 6 steps, each for a 10-minute time slice.

• Last 6 hours: 12 steps, each for a 30-minute time slice.

                                                                                                                                                                                                                                                                                                                                                                      • Last day: 12 steps, each for a 2-hour time slice.

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      A healthy pod will have 0 restarts at any given time.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

In most cases, a few restarts in the last hour (or in larger time windows) do not indicate a serious problem. Drill down to the Kubernetes Overview Dashboard related to the Workload in Explore. For example, Kubernetes StatefulSet Overview provides a detailed trend broken down by pods.

In this example, restarts occur at a constant interval (roughly every 5 minutes) and no pods are ready. This might indicate a crash loop back-off.
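
Because the chart reports a rate in restarts per second, it can help to convert between rates and restart counts. The helpers below are a hypothetical sketch of that arithmetic, using the figures quoted in this section.

```python
def restarts_in_window(rate_per_second, window_minutes):
    """Number of restarts implied by a constant rate over a window."""
    return rate_per_second * window_minutes * 60


def restart_rate(restarts, window_minutes):
    """Restarts per second for a given count over a window."""
    return restarts / (window_minutes * 60)


# A rate below 0.01 restarts/second over a 10-minute slice means fewer than about 6 restarts.
print(round(restarts_in_window(0.01, 10), 2))

# One restart roughly every 5 minutes corresponds to about 0.0033 restarts/second.
print(round(restart_rate(1, 5), 4))
```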

                                                                                                                                                                                                                                                                                                                                                                      Pods Available vs Desired

The chart shows the latest value returned by sum(avg(kubernetes.deployment.replicas.available)) / sum(avg(kubernetes.deployment.replicas.desired)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart displays the ratio between available and desired pods, averaged across the selected time window, for all the pods in a given Workload.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the number of desired pods in the Workload.

For instance, the image below shows that all 42 desired pods are available.

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      You should typically expect 100%.

                                                                                                                                                                                                                                                                                                                                                                      If certain pods take a significant amount of time to become available (image pull time, pod initialization, readiness probe), then you may temporarily see a ratio lower than 100%.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

Determine the Workloads that have low availability by drilling down to the related Dashboard in Explore. For example, the Kubernetes Deployment Overview helps you understand the trend and the state of the pods.

                                                                                                                                                                                                                                                                                                                                                                      For instance, the image above shows that the ratio is 98% (3.93 / 4 x 100). The slight decline is due to an update that caused pods to be terminated and consequently to be started with a newer version.
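
A ratio such as 3.93 / 4 arises because the available pod count is averaged over the time window. The following sketch reproduces that effect for a single Workload; the sample series (15 samples, one of them taken while a pod was being replaced) is hypothetical.

```python
def availability_ratio(available_samples, desired_samples):
    """Average available pods / average desired pods, in percent."""
    avg_available = sum(available_samples) / len(available_samples)
    avg_desired = sum(desired_samples) / len(desired_samples)
    return 100.0 * avg_available / avg_desired


# Hypothetical rolling update: 4 desired pods, one sample with only 3 available.
available = [4] * 7 + [3] + [4] * 7
desired = [4] * 15

print(f"{availability_ratio(available, desired):.0f}%")
```

A single short dip therefore lowers the windowed ratio only slightly, which matches the small decline expected during an update.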

                                                                                                                                                                                                                                                                                                                                                                      CPU Used vs Requests

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(cpu.cores.used)) / sum(avg(kubernetes.pod.resourceRequests.cpuCores)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the ratio between the total CPU usage across all pods of a selected Workload and the total CPU requested by all the pods.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the total CPU requested by all the pods. The value denotes the number of CPU cores.

In this image, the pods in the Workload request 40 CPU cores, of which 43% is actually used (about 17 cores).
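The arithmetic behind the figure above can be sketched in a few lines of Python. This is only an illustration: the function name and the per-pod values are hypothetical, chosen so that the totals match the 40 requested cores and 43% usage described in the text. It is not how Sysdig computes the metric internally.

```python
def used_vs_requests_percent(used_cores, requested_cores):
    """Return total CPU usage as a percentage of total CPU requests."""
    total_used = sum(used_cores)
    total_requested = sum(requested_cores)
    return total_used / total_requested * 100


# Hypothetical Workload: four pods, each requesting 10 cores and using 4.3.
requested = [10, 10, 10, 10]
used = [4.3, 4.3, 4.3, 4.3]

ratio = used_vs_requests_percent(used, requested)
print(f"requested: {sum(requested)} cores")
print(f"used: {sum(used):.1f} cores ({ratio:.0f}% of requests)")
```

The ratio is computed on the totals across all pods, not as an average of per-pod ratios.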

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

                                                                                                                                                                                                                                                                                                                                                                      It depends on the type of workload.

                                                                                                                                                                                                                                                                                                                                                                      For applications (background processes) whose resource usage is constant, expect the ratio to be around 100%.

For “bursty” applications, such as an API server, expect the ratio to be lower than 100%. Note that the value is averaged over the selected time window, so a usage spike can be offset by an idle period.

Generally, values between 80% and 120% are considered normal. Values higher than 100% are considered normal only if they are observed for a relatively short time.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

• A low usage indicates that the application is not running properly (not executing the expected functions) or that the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

• A high usage indicates that the application is under high load or that the Workload configuration is not accurate (requests are too low compared to what the pods actually need).

                                                                                                                                                                                                                                                                                                                                                                      In either case, drill down to the Kubernetes Overview Dashboard corresponding to the Workload in Explore. For example, the Kubernetes Deployment Overview Dashboard provides insight into resource usage and configuration.
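As a rough illustration of the guidance above, the following Python sketch sorts a ratio into "low", "normal", or "high" using the 80% and 120% bounds mentioned in the text. The function name, the constant names, and the choice to treat the bounds as inclusive are assumptions made for this example; they are not part of the product.

```python
# Bounds taken from the guidance above; treating them as inclusive is an assumption.
LOW_PERCENT = 80
HIGH_PERCENT = 120


def classify_ratio(ratio_percent):
    """Classify a Used vs Requests ratio (in percent)."""
    if ratio_percent < LOW_PERCENT:
        return "low"    # app may be idle or requests may be too high
    if ratio_percent > HIGH_PERCENT:
        return "high"   # app may be under load or requests may be too low
    return "normal"     # above 100% is fine only for short periods


for value in (43, 100, 150):
    print(value, classify_ratio(value))
```

A classification like this only points to Workloads worth a closer look; the Dashboards in Explore remain the place to investigate the cause.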

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Yes, it can.

                                                                                                                                                                                                                                                                                                                                                                      • Configuring CPU requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

• Consider a Workload with two containers. One container is configured to request 1 CPU core and uses 1 CPU core (Used vs Requests ratio is 100%). The other is configured without any request and uses 1 CPU core. In this example, 2 CPU cores are used against 1 CPU core requested, so the ratio is 200% at the Workload level.

                                                                                                                                                                                                                                                                                                                                                                      What Does “No Data” Mean?

                                                                                                                                                                                                                                                                                                                                                                      If the Workload is configured with no requests and limits, then the Usage vs Requests ratio cannot be computed. In this case, the chart will show “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

                                                                                                                                                                                                                                                                                                                                                                      You must always configure requests. Setting requests helps to detect Workloads that require reconfiguration.

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.
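The two cases discussed above (a ratio higher than 100% and “no data”) can be reproduced with a small Python sketch. The data layout, with each container represented as a (used, requested) pair and None standing for a missing request, is an assumption made for illustration only.

```python
from typing import Optional, Sequence, Tuple

# (cores used, cores requested or None when no request is configured)
Container = Tuple[float, Optional[float]]


def workload_ratio_percent(containers: Sequence[Container]) -> Optional[float]:
    """Return Used vs Requests in percent, or None when nothing is requested."""
    total_used = sum(used for used, _ in containers)
    total_requested = sum(req for _, req in containers if req is not None)
    if total_requested == 0:
        return None  # corresponds to the "no data" case
    return total_used / total_requested * 100


# One container requests 1 core and uses 1; the other has no request and uses 1.
print(workload_ratio_percent([(1.0, 1.0), (1.0, None)]))

# No container has a request configured, so the ratio cannot be computed.
print(workload_ratio_percent([(1.0, None), (0.5, None)]))
```

Containers without a request still contribute to the usage total, which is why the Workload-level ratio can exceed 100%.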

                                                                                                                                                                                                                                                                                                                                                                      Memory Used vs Requests

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by sum(avg(memory.bytes.used)) / sum(avg(kubernetes.pod.resourceRequests.memBytes)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the ratio between the total memory usage across all the pods in a Workload and the total memory requested by the Workload.

                                                                                                                                                                                                                                                                                                                                                                      The upper bound shows the total memory requested by all the pods, expressed in the specified unit of bytes.

For instance, the image shows that the pods in the selected Workload requested 120 GiB, of which 24% is actually used (about 29 GiB).
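Because the underlying metrics are expressed in bytes, reading the chart involves a unit conversion. The Python sketch below works through the numbers from the example above; the helper name and the way the byte values are constructed are assumptions for illustration.

```python
GIB = 2 ** 30  # bytes in one GiB


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Values chosen to match the example: 120 GiB requested, 24% of it used.
requested_bytes = 120 * GIB
used_bytes = int(0.24 * requested_bytes)

ratio = used_bytes / requested_bytes * 100
print(f"requested: {bytes_to_gib(requested_bytes):.0f} GiB")
print(f"used: {bytes_to_gib(used_bytes):.1f} GiB ({ratio:.0f}% of requests)")
```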

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

The type of Workload determines the ratio. Values between 80% and 120% are considered normal. Values higher than 100% are considered normal only if they are observed for a relatively short time.

                                                                                                                                                                                                                                                                                                                                                                      What to Do Otherwise?

A low memory usage indicates that the application is not running properly (not executing the expected functions) or that the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

A high memory usage indicates that the application is under high load or that the Workload configuration is not accurate (requests are too low compared to what the pods actually need).

Depending on the configured limits for the Workloads and the memory pressure on the nodes, Workloads that use more memory than requested are at risk of eviction. For more information, see Container’s Memory Limit.

                                                                                                                                                                                                                                                                                                                                                                      In either case, drill down to the Workloads page to determine the Workload that requires a deeper analysis.

                                                                                                                                                                                                                                                                                                                                                                      Can the Value Be Higher than 100%?

                                                                                                                                                                                                                                                                                                                                                                      Yes, it can.

                                                                                                                                                                                                                                                                                                                                                                      • Configuring memory requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

• Consider a Workload with two containers. One container is configured to request 1 GiB of memory and uses 1 GiB (Used vs Requests ratio is 100%), while the other is configured without any request and uses 1 GiB of memory. In this example, 2 GiB of memory is used against 1 GiB requested, so the ratio is 200% at the Workload level.

                                                                                                                                                                                                                                                                                                                                                                      What Does “No Data” Mean?

                                                                                                                                                                                                                                                                                                                                                                      If the Workload is configured with no memory requests and limits, then the Usage vs Requests ratio cannot be computed. In this case, the chart will show “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

You must always configure requests. Setting requests helps to detect Workloads that require reconfiguration.

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.

                                                                                                                                                                                                                                                                                                                                                                      Network I/O

                                                                                                                                                                                                                                                                                                                                                                      The chart shows the latest value returned by avg(avg(net.bytes.total)).

                                                                                                                                                                                                                                                                                                                                                                      What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for the Workload. The number shows the most recent rate, expressed in bytes per second (in the unit shown on the chart).

                                                                                                                                                                                                                                                                                                                                                                      For reference, the sparklines show the following number of steps (sampling):

                                                                                                                                                                                                                                                                                                                                                                      • Last hour: 6 steps, each for a 10-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last 6 hours: 12 steps, each for a 30-minute time slice

                                                                                                                                                                                                                                                                                                                                                                      • Last day: 12 steps, each for a 2-hour time slice
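The sampling listed above can be checked with a short Python sketch: multiplying the number of steps by the time slice should give back the selected time window. The dictionary and function names are hypothetical and used only for this illustration.

```python
from datetime import timedelta

# (number of steps, duration of each step), as listed above
SPARKLINE_SAMPLING = {
    "Last hour": (6, timedelta(minutes=10)),
    "Last 6 hours": (12, timedelta(minutes=30)),
    "Last day": (12, timedelta(hours=2)),
}


def covered_window(steps, step_size):
    """Return the total time span covered by a sparkline."""
    return steps * step_size


for label, (steps, step_size) in SPARKLINE_SAMPLING.items():
    print(f"{label}: {steps} x {step_size} = {covered_window(steps, step_size)}")
```

Longer windows use coarser time slices, so short spikes are more likely to be smoothed out in the "Last day" view than in the "Last hour" view.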

                                                                                                                                                                                                                                                                                                                                                                      What to Expect?

The type of application that runs in the Workload determines the metrics. Drill down to the Kubernetes Overview Dashboard corresponding to the Workload in Explore. For example, the Kubernetes Deployment Overview Dashboard provides additional details, such as network activity across pods.

3 - Explore

                                                                                                                                                                                                                                                                                                                                                                      This feature is available in the Enterprise tier of the Sysdig product. See https://sysdig.com/pricing for details, or contact sales@sysdig.com.

                                                                                                                                                                                                                                                                                                                                                                      About Explore

The Sysdig Monitor user interface centers around the Explore module, where you perform the majority of infrastructure monitoring operations. Sysdig Monitor automatically discovers your stack and presents pre-built views in Metric Explorer. Explore lets you view and troubleshoot key metrics and entities of your infrastructure stack. You can drill down to any layer of your infrastructure hierarchy and view granular data. Metrics Explorer allows you to run form queries and build infrastructure views by using interactive metric and label filtering.

                                                                                                                                                                                                                                                                                                                                                                      Grouping controls how entities are organized in Explore. Grouping is fully customizable by logical layers, such as containers, Kubernetes clusters, or services.

In addition to the Explore interface, Sysdig provides a PromQL Query Explorer and PromQL Library. They help you understand metrics and their corresponding labels and values, create queries faster, and build Dashboards and Alerts easily.

                                                                                                                                                                                                                                                                                                                                                                      Benefits of Using Explore

• Explore gives insight into:

  • Metrics and labels associated with your infrastructure.
  • Scope of the metrics. View the list of metrics collected from different parts of the infrastructure. You can easily understand the association between a metric and the infrastructure layer it belongs to.
• Explore allows:

  • One-click access to the Metric Explorer view for All Workloads, Nodes, Containerized Apps, and Hosts&Containers in your environment.
  • One-click access to PromQL Query Explorer and PromQL Library.
  • One-click access to available data sources. These are the immutable Groupings; clicking one of these options has the same effect as selecting a Grouping from the drop-down menu.
  • Querying metrics with either form query or PromQL to build Dashboard panels or create alerts.
  • Viewing the last selected Grouping.

                                                                                                                                                                                                                                                                                                                                                                      Explore Interface

This section outlines the key areas of the interface and details basic navigation steps.

                                                                                                                                                                                                                                                                                                                                                                      There are several key areas highlighted in the image above:

                                                                                                                                                                                                                                                                                                                                                                      • Switch Products: This allows you to switch between Sysdig products.

• Grouping: Groupings are hierarchical organizations of tags that let you arrange your infrastructure views in a logical hierarchy by using the Grouping Wizard. For more information on groupings, refer to Grouping, Scoping, and Segmenting Metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Modules: Quick links for each of the main Sysdig Monitor modules: Explore, Dashboards, Alerts, Events, and Captures.

                                                                                                                                                                                                                                                                                                                                                                      • PromQL Query Explorer: Run PromQL queries to build your infrastructure views and get an in-depth insight into what’s going on. See PromQL Query Explorer.

                                                                                                                                                                                                                                                                                                                                                                      • PromQL Query Library: Provides a set of out-of-the-box PromQL queries. See PromQL Library.

                                                                                                                                                                                                                                                                                                                                                                      • Management: Quick links for Sysdig Spotlight, help material, and the user profile configuration settings.

• Scope Filtering: Lets you drill down the infrastructure stack and view all the components of a given category as a single organized element.

• Search Metrics: Helps you select the desired metrics and build a query with one click.

• Time Navigation: Helps you customize the time window used for displaying data.

                                                                                                                                                                                                                                                                                                                                                                      • Key Page Actions: Quick links to create alerts and dashboards.

                                                                                                                                                                                                                                                                                                                                                                      Learn More

                                                                                                                                                                                                                                                                                                                                                                      Learn more about using Explore in the following sections:

                                                                                                                                                                                                                                                                                                                                                                      3.1 -

                                                                                                                                                                                                                                                                                                                                                                      Metrics Explorer

Use the Metrics Explorer for advanced metric exploration and querying. In addition to the core functionalities of Explore (grouping, scope tree, metrics, and graphing), Metrics Explorer lets you:

                                                                                                                                                                                                                                                                                                                                                                      • Graph multiple metrics simultaneously for correlation. For example, CPU usage vs CPU limits.
                                                                                                                                                                                                                                                                                                                                                                      • View ungrouped queries by default, showing the individual time series for a metric.
• View context-specific metrics for a selected scope, so that a selected metric no longer shows no data.
• View metrics that are logically categorized by metric namespace prefix.
• Display metrics at high resolution. For example, a 1-hour view now shows data at 10-second resolution instead of 1-minute resolution.
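
The resolution figures in the last item translate directly into the number of points plotted per time series. The arithmetic can be sketched in Python as follows; the helper name `points_per_series` is purely illustrative and is not part of any Sysdig API.

```python
def points_per_series(window_seconds: int, resolution_seconds: int) -> int:
    """Number of samples one time series contributes to a graph."""
    return window_seconds // resolution_seconds


ONE_HOUR = 60 * 60  # length of the 1-hour view, in seconds

high_res = points_per_series(ONE_HOUR, 10)  # 10-second resolution
low_res = points_per_series(ONE_HOUR, 60)   # 1-minute resolution

print(high_res, low_res)  # prints "360 60"
```

In other words, a 1-hour view carries six times as many points per series at 10-second resolution as it does at 1-minute resolution.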

                                                                                                                                                                                                                                                                                                                                                                      About the Metrics Explorer UI

                                                                                                                                                                                                                                                                                                                                                                      The main components of the Metrics Explorer UI are widgets, time navigation, dashboard, and time series panel.

You’ll find Metrics Explorer in the Explore slider menu of the Sysdig Monitor UI. Click Explore to display the slider.

                                                                                                                                                                                                                                                                                                                                                                      Use Metrics Explorer

This section describes how to use Metrics Explorer to drill down into your infrastructure stack for troubleshooting, and how to create alerts and dashboards.

                                                                                                                                                                                                                                                                                                                                                                      Switch Groupings

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor detects and collects the metrics associated with your infrastructure once the agent is deployed in your environment. Use the Explore UI to search, group, and troubleshoot your infrastructure components.

                                                                                                                                                                                                                                                                                                                                                                      To switch between available data sources:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Metrics Explorer tab, click the My Groupings drop-down menu:

                                                                                                                                                                                                                                                                                                                                                                      2. Select the desired grouping from the drop-down list.

                                                                                                                                                                                                                                                                                                                                                                      Groupings Editor

                                                                                                                                                                                                                                                                                                                                                                      The Groupings Editor helps you create and manage your infrastructure groupings.

                                                                                                                                                                                                                                                                                                                                                                      Filter Infrastructure (Scope Filtering)

You can drill down the infrastructure stack and get insight into the metrics available at each level of your stack. To do so, select a top-level infrastructure object, then use scope filtering to narrow down to the relevant infrastructure objects and metrics filtering to pick the desired metrics.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor displays only the metrics and dashboards that are relevant to the selected infrastructure object.
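
The idea of showing only the metrics relevant to a selected object can be pictured with a small Python sketch. This is a conceptual model, not how Sysdig Monitor is implemented: the `Series` type, the `metrics_in_scope` helper, and all metric and label names below are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Series(NamedTuple):
    """One time series: a metric name plus the labels attached to it."""
    metric: str
    labels: Dict[str, str]


def metrics_in_scope(series: List[Series], scope: Dict[str, str]) -> List[str]:
    """Return the metric names that have at least one series matching the scope."""
    names = {
        s.metric
        for s in series
        if all(s.labels.get(key) == value for key, value in scope.items())
    }
    return sorted(names)


# Hypothetical inventory of collected series.
inventory = [
    Series("cpu_used_percent", {"cluster": "prod", "namespace": "shop"}),
    Series("memory_used_bytes", {"cluster": "prod", "namespace": "shop"}),
    Series("jvm_heap_used", {"cluster": "dev", "namespace": "lab"}),
]

# Selecting the "prod" cluster hides metrics that have no data there.
print(metrics_in_scope(inventory, {"cluster": "prod"}))
# prints ['cpu_used_percent', 'memory_used_bytes']
```

An empty scope matches every series, which corresponds to viewing the whole infrastructure without any filter applied.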

                                                                                                                                                                                                                                                                                                                                                                      Metrics

                                                                                                                                                                                                                                                                                                                                                                      You can view specific metrics for an infrastructure object by navigating the scope filtering and metrics filtering menus:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Metrics Explorer tab, open the scope filtering menu.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the infrastructure object you want to explore.

                                                                                                                                                                                                                                                                                                                                                                      3. Navigate to Filter metrics.

                                                                                                                                                                                                                                                                                                                                                                      4. Click the desired metrics.

  The metric immediately appears in the form query and on the dashboard. When you select a metric through the scope filtering menu, its scope is set to the infrastructure object that you have selected.

                                                                                                                                                                                                                                                                                                                                                                      5. Optionally, click Add Query, then click a metric to add additional queries.

  You can perform the same operations as in a form query, such as setting Time Aggregation, Show Top 50 and Bottom 50 time series, Group Rollup, Segmentation, and Unit of Value Returned by Query. See Building a Form-Based Query for more information.

                                                                                                                                                                                                                                                                                                                                                                      Create an Alert

                                                                                                                                                                                                                                                                                                                                                                      1. Build a form query as described in Metrics.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Create Alert.

                                                                                                                                                                                                                                                                                                                                                                        If you have built multiple queries, you will be prompted to choose a single metric to be alerted on.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the metric you want to create an alert for.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Create Alert. The New Metric Alert page will be displayed.

  If the query's group aggregation is set to none, the alert is created with the default group aggregation.

                                                                                                                                                                                                                                                                                                                                                                      5. Complete creating the alert as described in Metric Alerts.

                                                                                                                                                                                                                                                                                                                                                                      Create a Dashboard Panel

                                                                                                                                                                                                                                                                                                                                                                      1. Build a form query as described in Metrics.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Create dashboard panel.

                                                                                                                                                                                                                                                                                                                                                                      3. Select an existing dashboard or create a new dashboard by typing in a name.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Copy and Open. The newly created dashboard will be displayed.

  If the query's group aggregation is set to none, the dashboard panel is created with the default group aggregation.

                                                                                                                                                                                                                                                                                                                                                                      5. Optionally, continue with other operations as described in Managing Panels.

                                                                                                                                                                                                                                                                                                                                                                      3.1.1 -

                                                                                                                                                                                                                                                                                                                                                                      Groupings Editor

                                                                                                                                                                                                                                                                                                                                                                      Groupings are hierarchical organizations of labels, allowing you to organize your infrastructure views on the Explore UI in a logical hierarchy.

                                                                                                                                                                                                                                                                                                                                                                      An example grouping is shown below:

                                                                                                                                                                                                                                                                                                                                                                      The example above groups the infrastructure into four levels. This results in a tree view in the Groupings Editor with four levels, with rows for each infrastructure object applicable to each level.
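
To make the relationship between a grouping and its tree view concrete, here is a minimal Python sketch. It assumes a grouping is simply an ordered list of label names and that each infrastructure object is a set of label values; the `build_tree` helper and the four label names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def build_tree(entities: List[Dict[str, str]], grouping: List[str]) -> dict:
    """Nest entities by the grouping's labels, one tree level per label."""
    tree: dict = {}
    for labels in entities:
        # Entities missing any label of the grouping do not appear in the tree.
        if not all(name in labels for name in grouping):
            continue
        node = tree
        for name in grouping:
            # Each label value becomes a row under its parent row.
            node = node.setdefault(labels[name], {})
    return tree


# A four-level grouping (label names are illustrative assumptions).
grouping = ["cluster", "namespace", "deployment", "pod"]

entities = [
    {"cluster": "prod", "namespace": "shop", "deployment": "cart", "pod": "cart-1"},
    {"cluster": "prod", "namespace": "shop", "deployment": "cart", "pod": "cart-2"},
    {"cluster": "prod", "namespace": "pay", "deployment": "api", "pod": "api-1"},
]

tree = build_tree(entities, grouping)
print(tree)
```

Here the tree has one top-level row (`prod`), two namespace rows beneath it, and so on down to the individual pods, mirroring the four levels of the grouping.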

                                                                                                                                                                                                                                                                                                                                                                      As each label is selected, Sysdig Monitor automatically filters out labels for the next selection that no longer fit the hierarchy, to ensure that only logical groupings are created.
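
One way to picture this filtering is the following Python sketch. It rests on an assumption that the source does not confirm: that a label "fits the hierarchy" when at least one infrastructure object carrying all the labels selected so far also carries that label. The `candidate_labels` helper and the label names are hypothetical.

```python
from typing import Dict, List, Set


def candidate_labels(entities: List[Dict[str, str]], selected: List[str]) -> Set[str]:
    """Labels still offered for the next level (assumed co-occurrence rule)."""
    candidates: Set[str] = set()
    for labels in entities:
        # Only objects that carry every already-selected label contribute.
        if all(name in labels for name in selected):
            candidates.update(labels.keys())
    # Labels already in the grouping are not offered again.
    return candidates - set(selected)


entities = [
    {"cluster": "prod", "namespace": "shop", "pod": "cart-1"},
    {"host": "node-a", "container": "nginx"},
]

# After choosing "cluster", host-level labels are no longer offered.
print(sorted(candidate_labels(entities, ["cluster"])))
# prints ['namespace', 'pod']
```

Under this assumption, every label offered for the next level leads to a non-empty tree level, which matches the stated goal of allowing only logical groupings.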

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor automatically organizes all the configured groupings that are inapplicable to the current infrastructure under Inapplicable Groupings.

                                                                                                                                                                                                                                                                                                                                                                      Manage Groupings

                                                                                                                                                                                                                                                                                                                                                                      You can perform the following operations using the Groupings Editor:

                                                                                                                                                                                                                                                                                                                                                                      • Search existing groupings

                                                                                                                                                                                                                                                                                                                                                                      • Create a new grouping

                                                                                                                                                                                                                                                                                                                                                                      • Edit an existing grouping

• Rename a grouping

                                                                                                                                                                                                                                                                                                                                                                      • Share a grouping with the active team

                                                                                                                                                                                                                                                                                                                                                                      Search for a Grouping

                                                                                                                                                                                                                                                                                                                                                                      1. Do one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • From Explore, click the Groupings drop-down. Search for the desired grouping.

                                                                                                                                                                                                                                                                                                                                                                          Either select the desired grouping, or search for it by scrolling down the list or by using the search bar, and then select it.

                                                                                                                                                                                                                                                                                                                                                                        • Click Manage Groupings and open the Groupings Editor.

                                                                                                                                                                                                                                                                                                                                                                          Either select the desired grouping, or search for it by scrolling down the list or by using the search bar, and then select it.

                                                                                                                                                                                                                                                                                                                                                                      Create a New Grouping

                                                                                                                                                                                                                                                                                                                                                                      1. In the Explore tab, click the Groupings drop-down, then click Manage Groupings.

                                                                                                                                                                                                                                                                                                                                                                      2. Open the Groupings Editor.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Add.

                                                                                                                                                                                                                                                                                                                                                                        The New Groupings page is displayed.

                                                                                                                                                                                                                                                                                                                                                                      4. Enter the following information:

                                                                                                                                                                                                                                                                                                                                                                        • Groupings Name: Set an appropriate name to identify the grouping that you are creating.

                                                                                                                                                                                                                                                                                                                                                                        • Shared with Team: Select if you want to share the grouping with the active team that you are part of.

                                                                                                                                                                                                                                                                                                                                                                        • Hierarchy: Determine the hierarchical representation of the grouping by choosing a top-level label and subsequent ones. Repeat adding the labels until there are no further layers available in the infrastructure label hierarchy.

                                                                                                                                                                                                                                                                                                                                                                          You can search for the label by entering the first few characters in the Select label drop-down or scrolling down. As you add labels, the preview displays associated components in your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      5. Check the preview to ensure that the label selection is correct.

6. Click Save & Apply.

                                                                                                                                                                                                                                                                                                                                                                      Rename a Grouping

                                                                                                                                                                                                                                                                                                                                                                      Renaming is allowed only for groupings that are owned by you. To rename a shared grouping, create a copy of it and edit the name.

1. On Explore, click the Groupings drop-down. Search for the desired grouping.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Edit button next to the grouping.

                                                                                                                                                                                                                                                                                                                                                                      3. Open the Groupings Editor.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the desired grouping. You can either scroll down the list or use the search bar.

                                                                                                                                                                                                                                                                                                                                                                      5. Click Edit.

                                                                                                                                                                                                                                                                                                                                                                        The edit window is displayed on the screen.

6. Specify the new grouping name, then click Save & Apply to save the changes.

                                                                                                                                                                                                                                                                                                                                                                      Share a Grouping with Your Active Team

Custom groupings are owned by you, so you can share them with all members of your active team. To share a default grouping, create a custom grouping and use the Shared with Team option in the Groupings Editor.

1. Click the Groupings drop-down and click Manage Groupings.

  The Groupings Editor screen appears.

                                                                                                                                                                                                                                                                                                                                                                      2. Highlight the relevant grouping and click Edit.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Shared with Team.

4. Click Save & Apply to save the changes.


                                                                                                                                                                                                                                                                                                                                                                      3.1.2 -

                                                                                                                                                                                                                                                                                                                                                                      Time Windows

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig Monitor displays information in Live mode. This means that dashboards, panels, and the Explore views will be automatically updated with new data as time passes, and will display the most recent data available for the configured time window.

By default, time navigation enters Live mode with a one-hour time window.

                                                                                                                                                                                                                                                                                                                                                                      The time window navigation bar provides users with quick links to common time windows, as well as the ability to configure a custom time period in order to review historical data.

                                                                                                                                                                                                                                                                                                                                                                      As shown in the image above, the navigation bar provides a number of pieces of information:

                                                                                                                                                                                                                                                                                                                                                                      • The state of the data (Live or Past).

                                                                                                                                                                                                                                                                                                                                                                      • The current time window.

                                                                                                                                                                                                                                                                                                                                                                      • The configured timezone.

                                                                                                                                                                                                                                                                                                                                                                      In addition, the navigation bar provides:

                                                                                                                                                                                                                                                                                                                                                                      • Quick links for common time windows

  • Metrics Explorer: five minutes, ten minutes, one hour, six hours, twelve hours, one day, and two weeks.
  • Explore: ten seconds, five minutes, ten minutes, one hour, six hours, one day, and two weeks.
                                                                                                                                                                                                                                                                                                                                                                      • A custom time window configuration option.

                                                                                                                                                                                                                                                                                                                                                                      • A pause/play button to exit Live mode and freeze the data to a time window, and to return to Live mode.

                                                                                                                                                                                                                                                                                                                                                                      • Step back/forward buttons to jump through a frozen time window to review historical data.

                                                                                                                                                                                                                                                                                                                                                                      • Zoom in/out buttons to increase/decrease the time window (note applicable to Metrics Explorer)
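
As a rough illustration of how a Live-mode window relates to the quick links, the sketch below computes the start and end of a window that always ends at the current time. The durations mirror the quick links listed above. The `QUICK_LINKS` table and the `live_window` helper are hypothetical names and are not part of any Sysdig API.

```python
from datetime import datetime, timedelta, timezone

# Quick-link durations per view, taken from the list above.
QUICK_LINKS = {
    "Metrics Explorer": [
        timedelta(minutes=5), timedelta(minutes=10), timedelta(hours=1),
        timedelta(hours=6), timedelta(hours=12), timedelta(days=1),
        timedelta(weeks=2),
    ],
    "Explore": [
        timedelta(seconds=10), timedelta(minutes=5), timedelta(minutes=10),
        timedelta(hours=1), timedelta(hours=6), timedelta(days=1),
        timedelta(weeks=2),
    ],
}


def live_window(duration, now=None):
    """Return (start, end) for a window of `duration` ending at `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - duration, now


# The default Live mode window is one hour long.
start, end = live_window(timedelta(hours=1))
```

Pausing corresponds to keeping a fixed `(start, end)` pair instead of recomputing it as time passes.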

                                                                                                                                                                                                                                                                                                                                                                      Configure a Custom Time Period

                                                                                                                                                                                                                                                                                                                                                                      The Time Navigation drop-down panel can be used to configure a specific time range. To configure a manual range:

                                                                                                                                                                                                                                                                                                                                                                      Metrics Explorer

1. On the Metrics Explorer tab, click the custom panel on the time navigation bar.

                                                                                                                                                                                                                                                                                                                                                                      2. Configure the start and end points, and click Save to save the changes.

                                                                                                                                                                                                                                                                                                                                                                      Some limitations apply to custom time windows. Refer to Time Window Limitations for more information.

                                                                                                                                                                                                                                                                                                                                                                      Explore

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore tab, click CUSTOM on the time navigation bar.

                                                                                                                                                                                                                                                                                                                                                                      2. Configure the start and end points, and click Adjust time to save the changes.

                                                                                                                                                                                                                                                                                                                                                                      Some limitations apply to custom time windows. Refer to Time Window Limitations for more information.

                                                                                                                                                                                                                                                                                                                                                                      Time Window Limitations

                                                                                                                                                                                                                                                                                                                                                                      Some time window configurations may not be available in certain situations. In these instances, a modification to the time window is automatically applied, and a warning notification will be displayed:

                                                                                                                                                                                                                                                                                                                                                                      There are two main reasons for a time window being unavailable. Both relate to data granularity and specificity:

• The time window specifies a granularity of data that has expired and is no longer available. For example, a time window specifying a one-hour time range from six months ago would not be available, resulting in the time window being modified to a time range of at least one day.

                                                                                                                                                                                                                                                                                                                                                                      • The time window specifies a granularity of data that is too high given the size of the window, as a graph can only handle a certain number of data points. For example, a multi-hour time range would contain too many datapoints at one-minute granularity, and would automatically be modified to 10-minute granularity.
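
The second limitation can be pictured as choosing the finest granularity that keeps a graph under a data-point budget. The sketch below is an assumption-laden illustration, not Sysdig Monitor's actual logic: the granularity tiers, the `MAX_POINTS` budget of 300, and the `pick_granularity` helper are all hypothetical. It does not model data expiry (the first limitation).

```python
# Assumed granularity tiers, in seconds: 10 s, 1 min, 10 min, 1 h, 1 day.
GRANULARITIES = [10, 60, 600, 3600, 86400]

# Hypothetical limit on the number of data points a graph can display.
MAX_POINTS = 300


def pick_granularity(window_seconds, requested_seconds, max_points=MAX_POINTS):
    """Return the finest tier that is no finer than the requested one and
    keeps the number of data points within `max_points`."""
    for step in GRANULARITIES:
        if step >= requested_seconds and window_seconds / step <= max_points:
            return step
    # Fall back to the coarsest tier for very large windows.
    return GRANULARITIES[-1]


one_hour = pick_granularity(window_seconds=3600, requested_seconds=60)
six_hours = pick_granularity(window_seconds=6 * 3600, requested_seconds=60)
```

Under these assumed values, a one-hour window keeps one-minute granularity (60 points), while a six-hour window would need 360 points at one-minute granularity and is therefore moved to 10-minute granularity.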

                                                                                                                                                                                                                                                                                                                                                                      3.1.3 -

                                                                                                                                                                                                                                                                                                                                                                      Explore Workflows

                                                                                                                                                                                                                                                                                                                                                                      While every user has unique needs from Sysdig Monitor, there are three main workflows that you can follow when building out the interface and monitoring your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      Workflow One

                                                                                                                                                                                                                                                                                                                                                                      This workflow assumes that an alert has not been triggered yet.

Start with Explore, identify a problem area, then drill down into the data. This workflow is the most basic approach, as it begins with a user monitoring the overall infrastructure rather than with a specific alert notification. The workflow typically follows these steps:

                                                                                                                                                                                                                                                                                                                                                                      1. Organize the infrastructure with groupings.

                                                                                                                                                                                                                                                                                                                                                                      2. Define key signals with alerts and dashboards to detect a problem.

                                                                                                                                                                                                                                                                                                                                                                      3. Identify a problem area, and drill down into the data using dashboards, metrics, and by adjusting groupings and scope as necessary.

                                                                                                                                                                                                                                                                                                                                                                      Workflow Two

Start with an event notification, and begin troubleshooting. This workflow begins when an already configured alert is triggered and raises an event. Unlike Workflow One, this workflow assumes that pre-determined data boundaries have already been set:

                                                                                                                                                                                                                                                                                                                                                                      1. Explore the event by adjusting time windows, scope, and segmentation.

                                                                                                                                                                                                                                                                                                                                                                      2. Identify the exact area of concern within the infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      3. Drill down into the data to troubleshoot the issue.

                                                                                                                                                                                                                                                                                                                                                                      Workflow Three

                                                                                                                                                                                                                                                                                                                                                                      Customize default dashboard panels to troubleshoot a potential issue. This workflow assumes that an issue has been identified within one of the default dashboards, but alerts have not been set up for the problem area.

                                                                                                                                                                                                                                                                                                                                                                      1. Copy the displayed panel to a new dashboard.

                                                                                                                                                                                                                                                                                                                                                                      2. Create an alert based on the dashboard panel.

                                                                                                                                                                                                                                                                                                                                                                      3. Configure a Sysdig Capture on demand.

                                                                                                                                                                                                                                                                                                                                                                      3.2 -

                                                                                                                                                                                                                                                                                                                                                                      PromQL Query Explorer

Use the PromQL Query Explorer to run PromQL queries and build infrastructure views. It allows you to:

• Write PromQL queries faster by automatically identifying the common labels among different metrics.

                                                                                                                                                                                                                                                                                                                                                                        See Run PromQL Queries Faster with Extended Label Set.

                                                                                                                                                                                                                                                                                                                                                                      • Query metrics by leveraging advanced functions, operators, and boolean logic.

                                                                                                                                                                                                                                                                                                                                                                      • Interactively modify the PromQL results by using visual label filtering.

                                                                                                                                                                                                                                                                                                                                                                      • Use label filtering to visualize the common labels between metrics, which is key when combining multiple metrics.

                                                                                                                                                                                                                                                                                                                                                                      About the PromQL Explorer UI

The main components of the PromQL Query Explorer UI are widgets, time navigation, and the dashboard and time series panels.

You’ll find the PromQL Query Explorer under the Explore tab in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      PromQL Query

The PromQL field supports manually building PromQL queries. You can enter simple or complex PromQL queries, and then build dashboards and create alerts from the results. The PromQL Query Explorer can run up to 5 queries simultaneously. With the query field, you can do the following:

                                                                                                                                                                                                                                                                                                                                                                      • Explore metrics and labels available in your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                        For example, calculate the number of bytes received in a selected host:

                                                                                                                                                                                                                                                                                                                                                                        sysdig_host_net_total_bytes{host_mac="0a:e2:e8:b4:6c:1a"}
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Calculate the number of bytes received in all the hosts except one:

                                                                                                                                                                                                                                                                                                                                                                        sysdig_host_net_total_bytes{host_mac!="0a:a3:4b:3e:db:a2"}
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Compare current data with historical data:

                                                                                                                                                                                                                                                                                                                                                                        sysdig_host_net_total_bytes offset 7d
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      • Use arithmetic operators to perform calculations on one or more metrics or labels.

                                                                                                                                                                                                                                                                                                                                                                        For example, calculate the rate of incoming bytes and convert it to bits:

                                                                                                                                                                                                                                                                                                                                                                        rate(sysdig_host_net_total_bytes[5m]) * 8
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      • Build complex PromQL queries.

  For example, return the total ingress traffic rate across all network interfaces, grouped by container ID:

                                                                                                                                                                                                                                                                                                                                                                        sum(rate(sysdig_host_net_total_bytes[5m])) by (container_id)
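
The example queries above share a small set of building blocks: label matchers, an offset modifier, a rate over a range, and an aggregation. As a rough illustration of how they compose, the following Python sketch assembles the same query strings. The helper names (`selector`, `offset`, `rate`, `sum_by`) are hypothetical and exist only in this example; they are not part of Sysdig Monitor or of any PromQL client library.

```python
def selector(metric, equal=None, not_equal=None):
    """Build a selector such as metric{label="value"} from label matchers."""
    matchers = [f'{name}="{value}"' for name, value in (equal or {}).items()]
    matchers += [f'{name}!="{value}"' for name, value in (not_equal or {}).items()]
    if not matchers:
        return metric
    return metric + "{" + ",".join(matchers) + "}"


def offset(expr, duration):
    """Shift a selector back in time, for example by 7d."""
    return f"{expr} offset {duration}"


def rate(expr, window="5m"):
    """Wrap a selector in rate() over the given range."""
    return f"rate({expr}[{window}])"


def sum_by(expr, *labels):
    """Sum an expression, keeping only the given grouping labels."""
    return f"sum({expr}) by ({', '.join(labels)})"


METRIC = "sysdig_host_net_total_bytes"

print(selector(METRIC, equal={"host_mac": "0a:e2:e8:b4:6c:1a"}))
print(selector(METRIC, not_equal={"host_mac": "0a:a3:4b:3e:db:a2"}))
print(offset(METRIC, "7d"))
print(rate(METRIC) + " * 8")
print(sum_by(rate(METRIC), "container_id"))
```

Each printed string matches one of the queries shown above and can be pasted into the PromQL field.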
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Label Filtering

Use label filtering to automatically identify the common labels between queries for vector matching. For example, label filtering can show that the metrics in queries A and B have only the host_mac label in common.

You can also filter by using the relational operators available in the time series table. Click an operator to apply it to the queries automatically, then run the queries again to visualize the metrics.

                                                                                                                                                                                                                                                                                                                                                                      Filtering simultaneously applies to all the queries in the PromQL Query Explorer.
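
Conceptually, finding the labels that two queries have in common is a set intersection, and the result tells you which labels can be used for vector matching. The following Python sketch illustrates the idea under stated assumptions: the function names, the placeholder metric names `metric_a` and `metric_b`, and every label name other than host_mac are hypothetical. It does not describe how the PromQL Query Explorer is implemented.

```python
def common_labels(*label_sets):
    """Return the sorted label names present in every given label set."""
    if not label_sets:
        return []
    shared = set(label_sets[0])
    for labels in label_sets[1:]:
        shared &= set(labels)
    return sorted(shared)


def match_on(left, op, right, labels):
    """Join two expressions with a binary operator, matching on the given labels."""
    return f"{left} {op} on ({', '.join(labels)}) {right}"


# Label names other than host_mac are hypothetical.
labels_a = {"host_mac", "host_hostname"}
labels_b = {"host_mac", "container_id"}

shared = common_labels(labels_a, labels_b)
print(shared)  # ['host_mac']
print(match_on("metric_a", "/", "metric_b", shared))
# metric_a / on (host_mac) metric_b
```

When only one label is shared, as here, that label is the natural candidate for the `on (...)` clause when combining the two metrics.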

                                                                                                                                                                                                                                                                                                                                                                      Widgets

PromQL Query Explorer supports only the time series (Timechart) widget. You can run advanced (PromQL) queries and build dashboard panels, but you cannot build form-based queries.

                                                                                                                                                                                                                                                                                                                                                                      Time Navigation

                                                                                                                                                                                                                                                                                                                                                                      PromQL Query Explorer is designed around time. After a query has been executed, Sysdig Monitor polls the infrastructure data every 10 seconds and refreshes the metrics on the Dashboard panel. You select how to view this gathered data by choosing a Preset interval and a time Range. For more information, see Time Navigation.

                                                                                                                                                                                                                                                                                                                                                                      Legend

The legend is positioned in the upper-right corner of the panel. The legends for each query are listed in the order in which the queries were executed.

                                                                                                                                                                                                                                                                                                                                                                      Build a Query

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore tab, click PromQL Query.

                                                                                                                                                                                                                                                                                                                                                                      2. Enter a PromQL query manually.

                                                                                                                                                                                                                                                                                                                                                                        sysdig_host_cpu_used_percent
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Click Add Query to run multiple queries. You can run up to 5 queries at once.

                                                                                                                                                                                                                                                                                                                                                                        sysdig_container_cpu_used_percent
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      3. Click Run Query or press command+Enter.

                                                                                                                                                                                                                                                                                                                                                                        A dashboard will appear on the screen. You can either Copy to a Dashboard or Create an Alert.

                                                                                                                                                                                                                                                                                                                                                                      Copy to a Dashboard

                                                                                                                                                                                                                                                                                                                                                                      1. Run a PromQL query.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Create > Create a Dashboard Panel.

                                                                                                                                                                                                                                                                                                                                                                      3. Either select an existing Dashboard or enter the Dashboard name to copy to a new Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Copy and Open.

  The new Dashboard panel with the given title opens in the Dashboard tab.

                                                                                                                                                                                                                                                                                                                                                                        You might want to continue with the Dashboard operations as given in Dashboards.

                                                                                                                                                                                                                                                                                                                                                                      Create an Alert

                                                                                                                                                                                                                                                                                                                                                                      1. Run a PromQL query.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Create > Create Alert.

                                                                                                                                                                                                                                                                                                                                                                      3. If you have multiple queries, select the query you want to create the alert for.

                                                                                                                                                                                                                                                                                                                                                                        A new PromQL Alert page for the selected query appears on the screen.

                                                                                                                                                                                                                                                                                                                                                                        Continue with PromQL Alerts.

                                                                                                                                                                                                                                                                                                                                                                      Remove a Query

                                                                                                                                                                                                                                                                                                                                                                      Click the three dots next to the query field to remove the query.

                                                                                                                                                                                                                                                                                                                                                                      Toggle Query Results

                                                                                                                                                                                                                                                                                                                                                                      Click the respective query buttons, for example, A or B, to show or hide query results.

                                                                                                                                                                                                                                                                                                                                                                      3.3 -

                                                                                                                                                                                                                                                                                                                                                                      PromQL Library

PromQL is a powerful language for querying metrics, but it can be challenging for beginners. To ease the learning curve, Sysdig provides a set of curated examples called the PromQL Library. It lets you run complex queries against your metrics with one click and gain insight into infrastructure problems that was not previously possible with Sysdig querying. For example, you can identify containers that use more than 90% of their limit, or count pods per namespace.

The library currently offers the following categories for experimenting with PromQL:
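To make the two example questions above concrete, here is a minimal Python sketch of what such queries compute. It is not Sysdig code and does not run PromQL: the sample data, field names, and the `pods_per_namespace` and `containers_over_limit` helpers are all hypothetical, and CPU is assumed as the limited resource purely for illustration.

```python
from collections import Counter

# Hypothetical sample data; in Sysdig Monitor these values come from collected metrics.
pods = [
    {"pod": "web-1", "namespace": "shop"},
    {"pod": "web-2", "namespace": "shop"},
    {"pod": "db-1", "namespace": "data"},
]
containers = [
    {"container": "web", "cpu_used": 0.95, "cpu_limit": 1.0},
    {"container": "db", "cpu_used": 0.40, "cpu_limit": 2.0},
]


def pods_per_namespace(pods):
    """Count pods grouped by their namespace label."""
    return dict(Counter(p["namespace"] for p in pods))


def containers_over_limit(containers, threshold=0.9):
    """Return names of containers whose usage exceeds threshold * limit."""
    return [
        c["container"]
        for c in containers
        if c["cpu_limit"] > 0 and c["cpu_used"] / c["cpu_limit"] > threshold
    ]


print(pods_per_namespace(pods))
print(containers_over_limit(containers))
```

In PromQL, the first question is typically expressed as a count grouped by the namespace label, and the second as a ratio of usage to limit compared against 0.9. The exact metric names depend on what your environment collects.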

                                                                                                                                                                                                                                                                                                                                                                      • Kubernetes

                                                                                                                                                                                                                                                                                                                                                                      • Infrastructure

                                                                                                                                                                                                                                                                                                                                                                      • Troubleshooting

                                                                                                                                                                                                                                                                                                                                                                      • PromQL 101

                                                                                                                                                                                                                                                                                                                                                                      Access PromQL Library

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Explore from the left navigation pane.

                                                                                                                                                                                                                                                                                                                                                                      3. On the Explore tab, click PromQL Library.

                                                                                                                                                                                                                                                                                                                                                                        The tab opens to a list of PromQL examples.

                                                                                                                                                                                                                                                                                                                                                                      Use PromQL Library

Click Try me to open PromQL Query Explorer. A visualization corresponding to the query is displayed. You can do the following with the query:

                                                                                                                                                                                                                                                                                                                                                                      • Create a dashboard panel

                                                                                                                                                                                                                                                                                                                                                                      • Create an alert

                                                                                                                                                                                                                                                                                                                                                                      See PromQL Query Explorer for more information.

                                                                                                                                                                                                                                                                                                                                                                      To copy a query, click the copy icon next to the query.

                                                                                                                                                                                                                                                                                                                                                                      Filter PromQL Queries

                                                                                                                                                                                                                                                                                                                                                                      Automatic tag filtering identifies common tags in the given examples. You can use the following to filter queries:

• Visual label filtering: Click the desired color-coded label to filter queries based on tags.

• Text search: Use the Text Search bar in the top-left navigation pane.

• Label search: Use the Label drop-down list in the top-left navigation pane.

                                                                                                                                                                                                                                                                                                                                                                      • Filter using categories: Use the All Categories checkboxes.
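The filter options listed above can be pictured as predicates applied to the list of library examples. The Python sketch below illustrates that idea only; it is not how Sysdig implements filtering. The `LibraryEntry` type, the `filter_entries` function, and the entry titles and labels are hypothetical, and it assumes that filters combine with a logical AND, which the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryEntry:
    """Hypothetical model of one PromQL Library example."""
    title: str
    category: str
    labels: set = field(default_factory=set)


def filter_entries(entries, text="", label=None, categories=None):
    """Keep entries that match every filter that was supplied (assumed AND)."""
    text = text.lower()
    result = []
    for entry in entries:
        if text and text not in entry.title.lower():
            continue  # text search: case-insensitive substring of the title
        if label is not None and label not in entry.labels:
            continue  # label search: entry must carry the chosen label
        if categories and entry.category not in categories:
            continue  # category checkboxes: entry must be in a ticked category
        result.append(entry)
    return result


entries = [
    LibraryEntry("Pods per namespace", "Kubernetes", {"pod", "namespace"}),
    LibraryEntry("Containers near CPU limit", "Troubleshooting", {"container", "cpu"}),
    LibraryEntry("Host CPU usage", "Infrastructure", {"host", "cpu"}),
]

print([e.title for e in filter_entries(entries, label="cpu")])
print([e.title for e in filter_entries(entries, text="pods", categories={"Kubernetes"})])
```

Calling `filter_entries` with no filters returns every entry, which mirrors an unfiltered library view.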

                                                                                                                                                                                                                                                                                                                                                                      3.4 -

                                                                                                                                                                                                                                                                                                                                                                      (Deprecated) Using the Explore Interface

                                                                                                                                                                                                                                                                                                                                                                      This section helps you navigate the Explore menu in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Switch Groupings

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor detects and collects the metrics associated with your infrastructure once the agent is deployed in your environment. Use the Explore UI to search, group, and troubleshoot your infrastructure components.

                                                                                                                                                                                                                                                                                                                                                                      To switch between available data sources:

1. On the Explore tab, click the My Groupings drop-down menu.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the desired grouping from the drop-down list.

                                                                                                                                                                                                                                                                                                                                                                      Groupings Editor

                                                                                                                                                                                                                                                                                                                                                                      The Groupings Editor helps you create and manage your infrastructure groupings.

                                                                                                                                                                                                                                                                                                                                                                      Use Drill-Down Menu

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor users can drill down into the infrastructure by using the numerous dashboards and metrics available for display in the Explore UI. These displays can be found by selecting an infrastructure object, and opening the drill-down menu.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor only displays the metrics and dashboards that are relevant to the selected infrastructure object.

                                                                                                                                                                                                                                                                                                                                                                      Metrics

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor users can view specific metrics for an infrastructure object by navigating the drill-down menu:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore tab, open the drill-down menu.

                                                                                                                                                                                                                                                                                                                                                                      2. Navigate to Search Metrics and Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the desired metrics.

The metric is displayed in the Explore UI until you navigate away from it.

                                                                                                                                                                                                                                                                                                                                                                        The scope of the metric, when viewed via the drill-down menu, is set to the infrastructure object that you have selected.

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting Views

The drill-down menu displays all the default dashboard templates relevant to the selected infrastructure object. These Troubleshooting Views are organized into sections.

                                                                                                                                                                                                                                                                                                                                                                      The scope of the Troubleshooting View, when viewed via the drill-down menu, is set to the infrastructure object that you have selected from the drill-down.

                                                                                                                                                                                                                                                                                                                                                                      To navigate to the Troubleshooting Views:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore tab, select an infrastructure object.

2. Open the drill-down menu and select the desired infrastructure element.

                                                                                                                                                                                                                                                                                                                                                                      3. Navigate to Search Metrics and Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the desired troubleshooting view.

The selected dashboard is displayed until you navigate away from it.

                                                                                                                                                                                                                                                                                                                                                                      Pin and Unpin the Drill-Down Menu

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore tab, select an infrastructure object.

                                                                                                                                                                                                                                                                                                                                                                      2. Open the drill-down menu.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Pin Menu to pin the menu to the Explore tab.

                                                                                                                                                                                                                                                                                                                                                                        To unpin the menu, click Unpin Menu at the bottom of the menu.

                                                                                                                                                                                                                                                                                                                                                                      4 -

                                                                                                                                                                                                                                                                                                                                                                      Metrics

                                                                                                                                                                                                                                                                                                                                                                      Metrics are quantitative values or measures that can be grouped/divided by labels. Sysdig Monitor metrics are divided into two groups: default metrics (out-of-the-box metrics associated with the system, orchestrator, and network infrastructure), and custom metrics (JMX, StatsD, and multiple other integrated application metrics).

                                                                                                                                                                                                                                                                                                                                                                      Sysdig automatically collects all types of metrics, and auto-labels them. Custom metrics can also have custom (user-defined) labels.

                                                                                                                                                                                                                                                                                                                                                                      Out-of-the box, when an agent is deployed on a host, Sysdig Monitor automatically begins collecting and reporting on a wide array of metrics. The sections below describe how those metrics are conceptualized within the system.

The sections below also describe the metric types and the data aggregation techniques supported by Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      Grouping, Scoping, and Segmenting Metrics

                                                                                                                                                                                                                                                                                                                                                                      Data aggregation and filtering in Sysdig Monitor are done through the use of assigned labels. The sections below explain how labels work, the ways they can be used, and how to work with groupings, scopes, and segments.

                                                                                                                                                                                                                                                                                                                                                                      Labels

                                                                                                                                                                                                                                                                                                                                                                      Labels are used to identify and differentiate characteristics of a metric, allowing them to be aggregated or filtered for Explore module views, dashboards, alerts, and captures. Labels can be used in different ways:

• To group infrastructure objects into logical hierarchies displayed on the Explore tab (called groupings). For more information, refer to Groupings.

                                                                                                                                                                                                                                                                                                                                                                      • To split aggregated data into segments. For more information, refer to Segments.

                                                                                                                                                                                                                                                                                                                                                                      There are two types of labels:

                                                                                                                                                                                                                                                                                                                                                                      • Infrastructure labels

                                                                                                                                                                                                                                                                                                                                                                      • Metric descriptor labels

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure Labels

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure labels are used to identify objects or entities within the infrastructure that a metric is associated with, including hosts, containers, and processes. An example label is shown below:

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Notation

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.name
                                                                                                                                                                                                                                                                                                                                                                      

Prometheus Notation

                                                                                                                                                                                                                                                                                                                                                                      kubernetes_pod_name
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The table below outlines what each part of the label represents:

Example Label Component | Description
kubernetes | The infrastructure type.
pod | The object.
name | The label key.
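
The breakdown above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of any Sysdig tooling: the names `InfraLabel`, `parse_infra_label`, and `to_prometheus_notation` are hypothetical, and the sketch assumes a label has exactly three components and that the two notations differ only in their separator, which holds for the example shown but may not hold for every label.

```python
from typing import NamedTuple


class InfraLabel(NamedTuple):
    infrastructure_type: str  # e.g. "kubernetes"
    obj: str                  # e.g. "pod"
    key: str                  # e.g. "name"


def parse_infra_label(label: str) -> InfraLabel:
    """Split a three-part label, written in either notation, into its components."""
    # Assumption: dots mark the Sysdig notation, underscores the Prometheus notation.
    separator = "." if "." in label else "_"
    parts = label.split(separator)
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {len(parts)}: {label!r}")
    return InfraLabel(*parts)


def to_prometheus_notation(label: str) -> str:
    """Replace the dot separators of the Sysdig notation with underscores."""
    return label.replace(".", "_")


print(parse_infra_label("kubernetes.pod.name"))
print(to_prometheus_notation("kubernetes.pod.name"))
```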

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure labels are obtained from the infrastructure (including from orchestrators, platforms, and the runtime processes), and Sysdig automatically builds a relationship model using the labels. This allows users to create logical hierarchical groupings to better aggregate the infrastructure objects in the Explore module.

For more information on groupings, refer to Groupings.

                                                                                                                                                                                                                                                                                                                                                                      Metric Descriptor Labels

                                                                                                                                                                                                                                                                                                                                                                      Metric descriptor labels are custom descriptors or key-value pairs applied directly to metrics, obtained from integrations like StatsD, Prometheus, and JMX. Sysdig automatically collects custom metrics from these integrations, and parses the labels from them. Unlike infrastructure labels, these labels can be arbitrary, and do not necessarily map to any entity or object.

                                                                                                                                                                                                                                                                                                                                                                      Metric descriptor labels can only be used for segmenting, not grouping or scoping.

                                                                                                                                                                                                                                                                                                                                                                      An example metric descriptor label is shown below:

                                                                                                                                                                                                                                                                                                                                                                      website_failedRequests:20|region='Asia', customer_ID='abc'
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The table below outlines what each part of the label represents:

Example Label Component | Description
website_failedRequests | The metric name.
20 | The metric value.
region='Asia', customer_ID='abc' | The metric descriptor labels. Multiple key-value pairs can be assigned using a comma-separated list.
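
To make the three components concrete, the following Python sketch splits the example string into a name, a value, and a dictionary of labels. It is a hedged illustration: `parse_metric_line` is a hypothetical helper, and the parsing rules (a colon before the value, a pipe before the labels, single-quoted label values) are assumptions taken from the example above rather than a description of any wire format.

```python
import re


def parse_metric_line(line: str) -> tuple[str, float, dict[str, str]]:
    """Parse "name:value|key='v', key2='v2'" into (name, value, labels)."""
    head, _, label_part = line.partition("|")
    name, _, raw_value = head.partition(":")
    # Each label is a key followed by a single-quoted value.
    labels = dict(re.findall(r"(\w+)\s*=\s*'([^']*)'", label_part))
    return name.strip(), float(raw_value), labels


name, value, labels = parse_metric_line(
    "website_failedRequests:20|region='Asia', customer_ID='abc'"
)
print(name, value, labels)
```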

                                                                                                                                                                                                                                                                                                                                                                      Sysdig recommends not using labels to store dimensions with high cardinalities (numerous different label values), such as user IDs, email addresses, URLs, or other unbounded sets of values. Each unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored.
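
A rough way to see why unbounded labels are costly is to estimate how many label combinations a single metric can produce. The Python sketch below is an illustration under stated assumptions: `series_count` is a hypothetical helper, the label names and counts are invented, and the product is an upper bound that is reached only if every combination of values actually occurs.

```python
from math import prod


def series_count(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on the time series for one metric: the product of the
    number of distinct values of each label."""
    return prod(distinct_values_per_label.values())


# Hypothetical label cardinalities, for illustration only.
bounded = series_count({"region": 5, "environment": 3})
unbounded = series_count({"region": 5, "environment": 3, "user_id": 100_000})
print(bounded, unbounded)
```

Adding the single high-cardinality label multiplies the bound by the number of distinct user IDs, which is why such dimensions are better kept out of labels.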

                                                                                                                                                                                                                                                                                                                                                                      Groupings

                                                                                                                                                                                                                                                                                                                                                                      Groupings are hierarchical organizations of labels, allowing users to organize their infrastructure views on the Explore tab in a logical hierarchy. An example grouping is shown below:

                                                                                                                                                                                                                                                                                                                                                                      The example above groups the infrastructure into four levels. This results in a tree view in the Explore module with four levels, with rows for each infrastructure object applicable to each level.

                                                                                                                                                                                                                                                                                                                                                                      As each label is selected, Sysdig Monitor automatically filters out labels for the next selection that no longer fit the hierarchy, to ensure that only logical groupings are created.

                                                                                                                                                                                                                                                                                                                                                                      The example below shows the logical hierarchy structure for Kubernetes:

                                                                                                                                                                                                                                                                                                                                                                      • Clusters: Cluster > Namespace > Replicaset > Pod

                                                                                                                                                                                                                                                                                                                                                                      • Namespace: Cluster > Namespace > HorizontalPodAutoscaler > Deployment > Pod

• Daemonsets: Cluster > Namespace > Daemonsets > Pod

                                                                                                                                                                                                                                                                                                                                                                      • Services: Cluster > Namespace > Service > StatefulSet > Pod

                                                                                                                                                                                                                                                                                                                                                                      • Job: Cluster > Namespace > Job > Pod

                                                                                                                                                                                                                                                                                                                                                                      • ReplicationController: Cluster > Namespace > ReplicationController > Pod

                                                                                                                                                                                                                                                                                                                                                                      The default groupings are immutable: They cannot be modified or deleted. However, you can make a copy of them that you can modify.
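
Conceptually, a grouping turns a flat set of labeled objects into a tree, with one level per label. The Python sketch below illustrates that idea for a Cluster > Namespace > Replicaset > Pod hierarchy. It is not how Sysdig Monitor is implemented: `build_grouping_tree`, the short label keys, and the pod data are all hypothetical.

```python
def build_grouping_tree(objects: list[dict[str, str]], grouping: list[str]) -> dict:
    """Nest objects by the label values named in `grouping`, in order."""
    tree: dict = {}
    for labels in objects:
        node = tree
        for label in grouping:
            # Create the child for this label value on first sight, then descend.
            node = node.setdefault(labels[label], {})
    return tree


# Hypothetical pods and label values, used only for illustration.
pods = [
    {"cluster": "prod", "namespace": "shop", "replicaset": "cart-5d9", "pod": "cart-5d9-a"},
    {"cluster": "prod", "namespace": "shop", "replicaset": "cart-5d9", "pod": "cart-5d9-b"},
    {"cluster": "prod", "namespace": "billing", "replicaset": "api-7c2", "pod": "api-7c2-a"},
]
grouping = ["cluster", "namespace", "replicaset", "pod"]
tree = build_grouping_tree(pods, grouping)
print(tree)
```

Each level of the resulting dictionary corresponds to one level of the tree view, and each key to one row at that level.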

                                                                                                                                                                                                                                                                                                                                                                      Unified Workload Labels

Sysdig provides the following labels to help improve your infrastructure organization and make troubleshooting easier.

                                                                                                                                                                                                                                                                                                                                                                      • kubernetes_workload_name: Displays all the Kubernetes workloads and indicates what type and name of workload resource (deployment, daemonSet, replicaSet, and so on) it is.

                                                                                                                                                                                                                                                                                                                                                                      • kubernetes_workload_type: Indicates what type of workload resource (deployment, daemonSet, replicaSet, and so on) it is.

The availability of these labels also simplifies Groupings. You do not need a different grouping for each type of workload resource. Instead, you have a single grouping for workloads.

The labels allow you to segment metrics, such as sysdig_host_cpu_cores_used_percent, by kubernetes_workload_name to see CPU core usage for all the workloads, instead of having a separate query for segmenting by kubernetes_deployment_name, kubernetes_replicaSet_name, and so on.
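
Segmenting by a single workload label can be pictured as a group-by over labeled samples. The Python sketch below is a simplified illustration, not a Sysdig query: `segment_by` is a hypothetical helper, the sample values are invented, and averaging is an assumed aggregation chosen only for the example.

```python
from collections import defaultdict
from statistics import mean


def segment_by(samples: list[tuple[dict[str, str], float]], label: str) -> dict[str, float]:
    """Average the sample values for each distinct value of `label`."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for labels, value in samples:
        buckets[labels[label]].append(value)
    return {segment: mean(values) for segment, values in buckets.items()}


# Hypothetical samples: (labels, CPU cores used percent).
samples = [
    ({"kubernetes_workload_type": "deployment", "kubernetes_workload_name": "frontend"}, 40.0),
    ({"kubernetes_workload_type": "deployment", "kubernetes_workload_name": "frontend"}, 60.0),
    ({"kubernetes_workload_type": "daemonset", "kubernetes_workload_name": "logger"}, 10.0),
]
per_workload = segment_by(samples, "kubernetes_workload_name")
print(per_workload)
```

Because both workload types share the same label, one call covers deployments and daemonsets alike.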

                                                                                                                                                                                                                                                                                                                                                                      Scopes

                                                                                                                                                                                                                                                                                                                                                                      A scope is a collection of labels that are used to filter out or define the boundaries of a group of data points when creating dashboards, dashboard panels, alerts, and teams. An example scope is shown below:

                                                                                                                                                                                                                                                                                                                                                                      In the example above, the scope is defined by two labels with operators and values defined. The table below defines each of the available operators.

Operator | Description
is | The value matches the defined label value exactly.
is not | The value does not match the defined label value exactly.
in | The value is among the comma-separated values entered.
not in | The value is not among the comma-separated values entered.
contains | The label value contains the defined value.
does not contain | The label value does not contain the defined value.
starts with | The label value starts with the defined value.
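
The operator semantics in the table can be sketched as a small matching function. This Python sketch is illustrative only: `OPERATORS` and `in_scope` are hypothetical names, and the sketch assumes that all filters in a scope must match (a logical AND) and that an object missing a filtered label is out of scope, neither of which is stated in the table.

```python
from typing import Callable

# Each operator takes the object's label value and the list of values entered.
OPERATORS: dict[str, Callable[[str, list[str]], bool]] = {
    "is": lambda actual, expected: actual == expected[0],
    "is not": lambda actual, expected: actual != expected[0],
    "in": lambda actual, expected: actual in expected,
    "not in": lambda actual, expected: actual not in expected,
    "contains": lambda actual, expected: expected[0] in actual,
    "does not contain": lambda actual, expected: expected[0] not in actual,
    "starts with": lambda actual, expected: actual.startswith(expected[0]),
}


def in_scope(labels: dict[str, str], scope: list[tuple[str, str, list[str]]]) -> bool:
    """Return True when the labels satisfy every (label, operator, values) filter."""
    return all(
        label in labels and OPERATORS[operator](labels[label], values)
        for label, operator, values in scope
    )


# Hypothetical scope with two filters, and two hypothetical objects.
scope = [
    ("kubernetes_namespace_name", "in", ["shop", "billing"]),
    ("kubernetes_pod_name", "starts with", ["cart"]),
]
cart_pod = {"kubernetes_namespace_name": "shop", "kubernetes_pod_name": "cart-5d9-a"}
dns_pod = {"kubernetes_namespace_name": "kube-system", "kubernetes_pod_name": "coredns-1"}
print(in_scope(cart_pod, scope), in_scope(dns_pod, scope))
```

In this sketch an empty scope matches every object, because there are no filters to fail.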

                                                                                                                                                                                                                                                                                                                                                                      The scope editor provides dynamic filtering capabilities. It restricts the scope of the selection for subsequent filters by rendering valid values that are specific to the previously selected label. Expand the list to view unfiltered suggestions. At run time, users can also supply custom values to achieve more granular filtering. The custom values are preserved. Note that changing a label higher up in the hierarchy might render the subsequent labels incompatible. For example, changing the kubernetes_namespace_name > kubernetes_deployment_name hierarchy to swarm_service_name > kubernetes_deployment_name is invalid as these entities belong to different orchestrators and cannot be logically grouped.

                                                                                                                                                                                                                                                                                                                                                                      Dashboards and Panels

Dashboard scopes define which metric data is included in the dashboard’s panels. The current dashboard’s scope is shown at the top of the dashboard.

By default, all dashboard panels follow the scope of the overall dashboard. However, an individual panel can be configured with a different scope than the rest of the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      For more information on Dashboards and Panels, refer to the Dashboards documentation.

                                                                                                                                                                                                                                                                                                                                                                      Alerts

Alert scopes are defined when an alert is created, and specify which areas of the infrastructure the alert applies to. In the example alerts below, the first alert has a scope defined, whereas the second alert does not have a custom scope defined. If no scope is defined, the alert applies to the entire infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      For more information on Alerts, refer to the Alerts documentation.

                                                                                                                                                                                                                                                                                                                                                                      Teams

A team’s scope determines the highest level of data that team members can see:

                                                                                                                                                                                                                                                                                                                                                                      • If a team’s scope is set to Host, team members can see all host-level and container-level information.

                                                                                                                                                                                                                                                                                                                                                                      • If a team’s scope is set to Container, team members can only see container-level information.

A team’s scope applies only to that team. Users who are members of multiple teams may have different visibility depending on which team is active.

                                                                                                                                                                                                                                                                                                                                                                      For more information on teams and configuring team scope, refer to the Manage Teams and Roles documentation.

                                                                                                                                                                                                                                                                                                                                                                      Segments

                                                                                                                                                                                                                                                                                                                                                                      Aggregated data can be split into smaller sections by segmenting the data with labels. This allows for the creation of multi-series comparisons and multiple alerts. In the first image, the metric is not segmented:

                                                                                                                                                                                                                                                                                                                                                                      In the second image, the same metric has been segmented by container_id:

                                                                                                                                                                                                                                                                                                                                                                      Line and Area panels can display any number of segments for any given metric. The example image below displays the sysdig_connection_net_in_bytes metric segmented by both container_id and host_hostname:

                                                                                                                                                                                                                                                                                                                                                                      For more information regarding segmentation in dashboard panels, refer to the Configure Panels documentation. For more information regarding configuring alerts, refer to the Alerts documentation.
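As a rough illustration of what segmenting does to aggregated data, the following Python sketch sums a metric over a few samples, first without segmentation and then segmented by one or two labels. The sample values, the function name, and the use of sum as the aggregation are assumptions made for this example only.

```python
from collections import defaultdict

# Illustrative samples only; the values are made up for this sketch.
samples = [
    {"host_hostname": "host-a", "container_id": "c1", "value": 10},
    {"host_hostname": "host-a", "container_id": "c2", "value": 30},
    {"host_hostname": "host-b", "container_id": "c3", "value": 20},
]


def segment_sum(samples, segment_by=()):
    """Sum values per unique combination of the segmenting labels.

    With no segmenting labels, all samples collapse into a single
    series keyed by the empty tuple.
    """
    series = defaultdict(int)
    for sample in samples:
        key = tuple(sample[label] for label in segment_by)
        series[key] += sample["value"]
    return dict(series)


unsegmented = segment_sum(samples)
by_container = segment_sum(samples, ("container_id",))
by_both = segment_sum(samples, ("container_id", "host_hostname"))
```

The unsegmented call yields one series, while each segmented call yields one series per distinct label combination, which is what makes multi-series comparisons possible.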

                                                                                                                                                                                                                                                                                                                                                                      The Meaning of n/a

Sysdig Monitor imports data about entities such as hosts, containers, and processes, and reports it in tables and panels on the Explore and Dashboards UI, as well as in events. As a result, you see many kinds of data across the UI, and the term n/a can appear anywhere that data is displayed.

n/a indicates that data is not available or does not apply to a particular instance. In Sysdig parlance, the term signifies one or more entities for which a particular label, such as hostname or Kubernetes service, is invalid. In other words, n/a collectively represents entities whose metadata is not relevant to the aggregation and filtering techniques: Grouping, Scoping, and Segmenting. For instance, a list of Kubernetes services might display all the services as well as an n/a entry that includes all the containers without the metadata describing a Kubernetes service.

You might encounter n/a sporadically in the Explore UI, in drill-down panels or dashboards, in events, and elsewhere on the Sysdig Monitor UI when no relevant metadata is available for that particular display. How n/a should be treated depends on the nature of your deployment. The deployment is not affected by the entities marked n/a.

                                                                                                                                                                                                                                                                                                                                                                      The following are some of the cases that yield n/a on the UI:

• Labels are partially available or not available. For example, a host has entities that are not associated with a monitored Kubernetes deployment, or a monitored host has an unmonitored Kubernetes deployment running.

• Labels do not apply to the grouping criteria or to the hierarchy level. For example:

  • Containers that are not managed by Kubernetes. The containers managed by Kubernetes are identified with their container_name labels.

  • In certain groupings by DaemonSet, Deployments render n/a and vice versa. Not all containers belong to both DaemonSet and Deployment objects concurrently. Likewise, a Kubernetes ReplicaSet grouping with the kubernetes_replicaset_name label will not show StatefulSets.

  • In a kubernetes_cluster_name > kubernetes_namespace_name > kubernetes_deployment_name grouping, the entities without the kubernetes_cluster_name label yield n/a.

• Entities are incorrectly labeled in the infrastructure.

• Kubernetes features are not yet in sync with Sysdig Monitor.

• The format is not applicable to a particular record in the database.
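The cases above share one mechanism: an entity that lacks the label used for grouping ends up in the n/a bucket. The following Python sketch shows that idea under simplified assumptions. The entity names and the data layout are invented for illustration and do not reflect how Sysdig Monitor stores metadata.

```python
from collections import defaultdict


def group_by_label(entities, label):
    """Group entity names by a label value; entities without it go to 'n/a'."""
    groups = defaultdict(list)
    for entity in entities:
        key = entity["labels"].get(label, "n/a")
        groups[key].append(entity["name"])
    return dict(groups)


# Hypothetical entities: two belong to a Deployment, one has no such label.
entities = [
    {"name": "web-1", "labels": {"kubernetes_deployment_name": "web"}},
    {"name": "web-2", "labels": {"kubernetes_deployment_name": "web"}},
    {"name": "standalone", "labels": {}},
]

groups = group_by_label(entities, "kubernetes_deployment_name")
```

Here the container without a kubernetes_deployment_name label is still reported, but under n/a rather than under a named Deployment.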

4.2 - Understanding Default, Custom, and Missing Metrics

                                                                                                                                                                                                                                                                                                                                                                      Default Metrics

                                                                                                                                                                                                                                                                                                                                                                      Default metrics include various kinds of metadata which Sysdig Monitor automatically knows how to label, segment, and display.

                                                                                                                                                                                                                                                                                                                                                                      For example:

                                                                                                                                                                                                                                                                                                                                                                      • System metrics for hosts, containers, and processes (CPU used, etc.)

                                                                                                                                                                                                                                                                                                                                                                      • Orchestrator metrics (collected from Kubernetes, Mesos, etc.)

                                                                                                                                                                                                                                                                                                                                                                      • Network metrics (e.g. network traffic)

                                                                                                                                                                                                                                                                                                                                                                      • HTTP

                                                                                                                                                                                                                                                                                                                                                                      • Platform metrics (in some cases)

                                                                                                                                                                                                                                                                                                                                                                      Default metrics are collected mainly from two sources: syscalls and Kubernetes.

                                                                                                                                                                                                                                                                                                                                                                      Custom Metrics

                                                                                                                                                                                                                                                                                                                                                                      About Custom Metrics

                                                                                                                                                                                                                                                                                                                                                                      Custom metrics generally refer to any metrics that the Sysdig Agent collects from some third-party integration. The type of infrastructure and applications integrated determine the custom metrics that the Agent collects and reports to Sysdig Monitor. The supported custom metrics are:

                                                                                                                                                                                                                                                                                                                                                                      Each metric comes with a set of custom labels, and additional labels can be user-created. Sysdig Monitor simply collects and reports them with minimal or no internal processing. The limit currently enforced is 3000 metrics per host. Use the metrics_filter option in the dragent.yaml file to remove unwanted metrics or to choose the metrics to report when hosts exceed this limit. For more information on editing the dragent.yaml file, see Understanding the Agent Config Files.
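To reason about which metrics would survive a filter and whether a host stays under the limit, a small simulation can help. The Python sketch below is not the Agent's implementation. It assumes, for illustration only, that rules are glob patterns evaluated in order, that the first matching rule decides, and that metrics matching no rule are kept. The metric names and rule list are made up. Check the Agent documentation for the actual metrics_filter syntax and semantics.

```python
from fnmatch import fnmatchcase

METRIC_LIMIT_PER_HOST = 3000  # limit stated in the text above


def apply_filter(metric_names, rules):
    """Return the metrics kept by an ordered list of (action, pattern) rules.

    Assumptions of this sketch: first matching rule wins, and a metric
    that matches no rule is kept.
    """
    kept = []
    for name in metric_names:
        for action, pattern in rules:
            if fnmatchcase(name, pattern):
                if action == "include":
                    kept.append(name)
                break
        else:
            kept.append(name)  # no rule matched
    return kept


# Hypothetical rules and metric names.
rules = [("include", "haproxy.backend.*"), ("exclude", "haproxy.*")]
metrics = ["haproxy.backend.sessions", "haproxy.frontend.sessions", "redis.keys"]

kept = apply_filter(metrics, rules)
within_limit = len(kept) <= METRIC_LIMIT_PER_HOST
```

Under these assumptions, the more specific include rule is listed first so that it takes effect before the broader exclude rule.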

                                                                                                                                                                                                                                                                                                                                                                      Unit for Custom Metrics

Sysdig Monitor automatically detects the default unit of a custom metric from the delimiter suffix in the metric name. For example, custom_expvar_time_seconds results in a base unit set to seconds. The supported base units are byte, percent, and time. A custom metric name should carry one of the following delimiter suffixes so that Sysdig Monitor can identify and configure the correct unit type.

                                                                                                                                                                                                                                                                                                                                                                      • second

                                                                                                                                                                                                                                                                                                                                                                      • seconds

                                                                                                                                                                                                                                                                                                                                                                      • byte

                                                                                                                                                                                                                                                                                                                                                                      • bytes

                                                                                                                                                                                                                                                                                                                                                                      • total (represents accumulating count)

                                                                                                                                                                                                                                                                                                                                                                      • percent

Unless this naming convention is followed, the unit of a custom metric is not auto-detected and will be incorrect. For instance, custom_byte_expvar does not yield the correct unit (MiB) because byte is not the suffix of the name.
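The naming convention can be expressed as a short Python sketch. It assumes that the delimiter is an underscore, as in the examples above, and that the total suffix maps to a count. Both points, along with the function and table names, are assumptions of this example and not a description of Sysdig Monitor internals.

```python
# Suffixes listed in the text, mapped to the base unit they imply.
# Mapping "total" to "count" is an assumption of this sketch.
SUFFIX_TO_UNIT = {
    "second": "time",
    "seconds": "time",
    "byte": "byte",
    "bytes": "byte",
    "percent": "percent",
    "total": "count",
}


def detect_base_unit(metric_name, delimiter="_"):
    """Return the base unit implied by the name's suffix, or None."""
    if delimiter not in metric_name:
        return None  # no delimiter suffix to inspect
    suffix = metric_name.rsplit(delimiter, 1)[-1]
    return SUFFIX_TO_UNIT.get(suffix)


detected = detect_base_unit("custom_expvar_time_seconds")
not_detected = detect_base_unit("custom_byte_expvar")
```

Only the last segment of the name is inspected, which is why a unit keyword in the middle of the name is not recognized.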

                                                                                                                                                                                                                                                                                                                                                                      Editing the Unit Scale

You can change the unit scale either by editing the panel on a Dashboard or in Explore.

                                                                                                                                                                                                                                                                                                                                                                      Explore

From the Search Metrics and Dashboard drop-down, select the custom metric whose unit you want to change, then click More Options. Select the desired unit scale from the Metric Format drop-down and click Save.

                                                                                                                                                                                                                                                                                                                                                                      Dashboard

                                                                                                                                                                                                                                                                                                                                                                      Select the Dashboard Panel associated with the custom metrics you want to modify. Select the desired unit scale from the Metrics drop-down and click Save.

                                                                                                                                                                                                                                                                                                                                                                      Display Missing Data

                                                                                                                                                                                                                                                                                                                                                                      Data can be missing for a few different reasons:

• Problems, such as faulty network connectivity, in the communication channel between your infrastructure and the Sysdig metrics store.

                                                                                                                                                                                                                                                                                                                                                                      • Metrics or StatsD batch jobs are submitted sporadically.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor allows you to configure the behavior of missing data in Dashboards. Though metric type determines the default behavior, you can configure how to visualize missing data and define it at the per-query level. Use the No Data Display drop-down in the Options menu in the panel configuration, and the No Data Message text box under the Panel tab. See Create a New Panel for more information.

                                                                                                                                                                                                                                                                                                                                                                      Consider the following guidelines:

• Use the No Data Message text box under the Panel tab to enter a custom message to show when no data is available to render on the panels. This custom message, which can include line breaks and links in markdown format, is shown when queries return no data and report no errors.

                                                                                                                                                                                                                                                                                                                                                                      • The No Data Display drop-down has only two options for the Stacked Area timechart: gap and show as zero.

                                                                                                                                                                                                                                                                                                                                                                      • For form-based timechart panels, the default option for a metrics selection that does not contain a StatsD metric is gap.

• Adding a StatsD metric to a query in a form-based timechart panel sets the selected No Data Display type to show as zero, which is the default option for form-based StatsD metrics. You can change this selection to any other type.

                                                                                                                                                                                                                                                                                                                                                                      • The default display option is gap for PromQL Timechart panels.

                                                                                                                                                                                                                                                                                                                                                                      The options for No Data Display are:

• gap: The default option for form-based timechart panels where the query's metric selection does not contain a StatsD metric. gap is the best visualization type for most use cases because missing data is easy to spot and can indicate a problem.

• show as zero: The best option for StatsD metrics that are submitted only sporadically, such as batch job metrics and error counts. This is the default display option for StatsD metrics in form-based panels.

  We do not recommend this option for other metrics because showing zero can be misleading. For example, this setting reports free disk space as 0% when the disk or host disappears, when in reality the value is unknown.

                                                                                                                                                                                                                                                                                                                                                                        Prometheus best practices recommend avoiding missing metrics.

                                                                                                                                                                                                                                                                                                                                                                      • connect - solid: Use for measuring the value of a metric, typically a gauge, where you want to visualize the missing samples flattened.

                                                                                                                                                                                                                                                                                                                                                                        The leftmost and rightmost visible data points can be connected as Sysdig does not perform the interpolation.

• connect - dotted: Use for measuring the value of a metric, typically a gauge, where you want to visualize the missing samples flattened, with the connecting line drawn dotted.

                                                                                                                                                                                                                                                                                                                                                                        The leftmost and rightmost visible data points can be connected as Sysdig does not perform the interpolation.
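The differences between the display options can be illustrated with a minimal sketch. It is not Sysdig's rendering code: the function name `plot_segments`, the use of `None` for a missing sample, and the representation of a chart line as lists of (index, value) points are all assumptions made for illustration. The solid and dotted variants of connect differ only in line style, so they are modeled as a single `connect` mode.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]


def plot_segments(samples: List[Optional[float]], mode: str) -> List[List[Point]]:
    """Return the line segments a chart would draw for a series.

    A missing sample is represented by None (an assumption of this sketch).
    Each segment is a list of (sample index, value) points joined by a line.
    """
    points = list(enumerate(samples))

    if mode == "show as zero":
        # Missing samples are drawn as the value 0.
        return [[(t, 0.0 if v is None else v) for t, v in points]]

    if mode == "connect":
        # Missing samples are skipped; the points on either side are joined
        # directly, and no intermediate values are computed.
        return [[(t, v) for t, v in points if v is not None]]

    if mode == "gap":
        # The line is broken wherever samples are missing.
        segments: List[List[Point]] = []
        current: List[Point] = []
        for t, v in points:
            if v is None:
                if current:
                    segments.append(current)
                current = []
            else:
                current.append((t, v))
        if current:
            segments.append(current)
        return segments

    raise ValueError(f"unknown mode: {mode}")


series = [1.0, 2.0, None, None, 5.0]
for mode in ("gap", "show as zero", "connect"):
    print(mode, plot_segments(series, mode))
```

With gap, the missing interval splits the line in two, which makes the outage visible; with show as zero, the same interval is indistinguishable from a real reading of 0.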

4.3 - Metric Limits

Sysdig ensures that you see the metric information most relevant to your monitored environment. To achieve this, limits are enforced on the number of metrics that the datastore can store. Different limits apply to different metric types and agent versions.

The default metric limits per agent are different from the subscription limit imposed by your custom time series entitlement. Your entitlement limits per agent could be lower than the metric limits. For more information, see Time Series Billing.

                                                                                                                                                                                                                                                                                                                                                                      View Metric Limits

                                                                                                                                                                                                                                                                                                                                                                      The metric limits are automatically set by the Sysdig backend components based on your plan, agent version, and backend configuration.

Use the Sysdig Agent Health & Status dashboard under Host Infrastructure templates to view the metric limits for your account and the current usage per host for each metric type.

                                                                                                                                                                                                                                                                                                                                                                      The metric limits are exposed to the UI through the following agent metrics.

| Metric | Description |
| --- | --- |
| statsd_dragent_metricCount_limit_appCheck | The maximum number of unique appCheck time series that are allowed in an individual sample from the agent per node. |
| statsd_dragent_metricCount_limit_statsd | The maximum number of unique statsd time series that are allowed in an individual sample from the agent per node. |
| statsd_dragent_metricCount_limit_jmx | The maximum number of unique JMX time series that are allowed in an individual sample from the agent per node. |
| statsd_dragent_metricCount_limit_prometheus | The maximum number of unique Prometheus time series that are allowed in an individual sample from the agent per node. |

4.4 - Sysdig Info Metrics

Sysdig provides Prometheus-compatible info metrics to show infrastructure (sysdig_*_info) and Kubernetes (kube_*_info) labels. Info metrics are gauges with a value of 1 and have the _info suffix.

                                                                                                                                                                                                                                                                                                                                                                      For example, querying sysdig_host_info in PromQL Query will provide all labels associated with the host, such as:

                                                                                                                                                                                                                                                                                                                                                                      • agent_id
                                                                                                                                                                                                                                                                                                                                                                      • agent_tag_cluster
                                                                                                                                                                                                                                                                                                                                                                      • host_hostname
                                                                                                                                                                                                                                                                                                                                                                      • domain
                                                                                                                                                                                                                                                                                                                                                                      • host
                                                                                                                                                                                                                                                                                                                                                                      • host_domain
                                                                                                                                                                                                                                                                                                                                                                      • host_mac
                                                                                                                                                                                                                                                                                                                                                                      • instance_id

Although info metrics are available, all the metrics ingested by Sysdig agents are automatically enriched with this metadata, so you don’t need to write PromQL joins. For more information, see Run PromQL Queries Faster with Extended Label Set.
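To illustrate why info metrics carry a constant value of 1, the following minimal sketch emulates a label join in plain Python: multiplying a series by a matching info series leaves the value unchanged while copying labels across. The function `join_info`, the dictionary layout, the choice of host_mac as the matching label, and all label values are hypothetical and are used only for illustration.

```python
from typing import Dict, List


def join_info(series: List[Dict], info_series: List[Dict],
              on: str, copy_labels: List[str]) -> List[Dict]:
    """Multiply each series by the info series that shares the `on` label
    and copy the requested labels from the info series onto the result."""
    index = {info["labels"][on]: info for info in info_series}
    joined = []
    for s in series:
        info = index.get(s["labels"].get(on))
        if info is None:
            continue  # no matching info series: the sample is dropped
        labels = dict(s["labels"])
        for name in copy_labels:
            if name in info["labels"]:
                labels[name] = info["labels"][name]
        # The info value is always 1, so the product keeps the original value.
        joined.append({"labels": labels, "value": s["value"] * info["value"]})
    return joined


# Hypothetical data: one info series and two metric samples.
host_info = [{
    "labels": {"host_mac": "aa:bb", "host_hostname": "node-1",
               "agent_tag_cluster": "prod"},
    "value": 1,
}]
cpu = [
    {"labels": {"host_mac": "aa:bb"}, "value": 42.5},
    {"labels": {"host_mac": "cc:dd"}, "value": 7.0},
]
print(join_info(cpu, host_info, on="host_mac", copy_labels=["host_hostname"]))
```

Because Sysdig enriches ingested metrics with these labels automatically, this kind of join is usually unnecessary in practice.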

4.5 - Manage Metric Scale

Sysdig provides several controls for managing metric scale. If you reach your metric limits, there are three primary ways to include or exclude metrics:

                                                                                                                                                                                                                                                                                                                                                                      1. Include/exclude custom metrics by name filters.

                                                                                                                                                                                                                                                                                                                                                                        See Include/Exclude Custom Metrics.

                                                                                                                                                                                                                                                                                                                                                                      2. Include/exclude metrics emitted by certain containers, Kubernetes annotations, or any other container label at collection time.

                                                                                                                                                                                                                                                                                                                                                                        See Prioritize/Include/Exclude Designated Containers.

                                                                                                                                                                                                                                                                                                                                                                      3. Exclude metrics from unwanted ports.

                                                                                                                                                                                                                                                                                                                                                                        See Blacklist Ports.

4.6 - Data Aggregation

Sysdig Monitor allows users to adjust the aggregation settings when graphing or creating alerts for a metric. These settings determine how Sysdig rolls up the available data samples to create the chart or evaluate the alert. There are two forms of aggregation used for metrics in Sysdig: time aggregation and group aggregation.

                                                                                                                                                                                                                                                                                                                                                                      Time aggregation is always performed before group aggregation.

                                                                                                                                                                                                                                                                                                                                                                      Time Aggregation

                                                                                                                                                                                                                                                                                                                                                                      Time aggregation comes into effect in two overlapping situations:

                                                                                                                                                                                                                                                                                                                                                                      • Charts can only render a limited number of data points. To look at a wide range of data, Sysdig Monitor may need to aggregate granular data into larger samples for visualization.

                                                                                                                                                                                                                                                                                                                                                                      • Sysdig Monitor rolls up historical data over time.

                                                                                                                                                                                                                                                                                                                                                                        Sysdig retains rollups based on each aggregation type, to allow users to choose which data points to utilize when evaluating older data.

Sysdig agents collect 1-second samples and report data at 10-second resolution. The data is stored and reported every 10 seconds with the available aggregations (average, rate, min, max, sum), which makes them available through the Sysdig Monitor UI and the API. For time series charts covering five minutes or less, data points are drawn at this 10-second resolution, and any time aggregation selections have no effect. When a time range greater than five minutes is displayed, each data point is drawn as an aggregate over an appropriate time interval. For example, for a chart covering one hour, each data point reflects a one-minute interval.

                                                                                                                                                                                                                                                                                                                                                                      At time intervals of one minute and above, charts can be configured to display different aggregates for the 10-second metrics used to calculate each datapoint.

| Aggregation Type | Description |
| --- | --- |
| average | The average of the retrieved metric values across the time period. |
| rate | The average value of the metric across the time period evaluated. |
| maximum | The highest value during the time period evaluated. |
| minimum | The lowest value during the time period evaluated. |
| sum | The combined sum of the metric across the time period evaluated. |
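As a minimal sketch of how 10-second samples could be rolled up into larger intervals, the following function groups consecutive samples into fixed-size buckets and applies one aggregation per bucket. The function name `roll_up`, the bucket-by-count approach, and the sample values are assumptions for illustration and do not describe Sysdig's internal implementation. The sketch assumes every sample is present; rate is left out because it only differs from average when samples are missing.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Aggregation names follow the table above.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "maximum": max,
    "minimum": min,
    "sum": sum,
}


def roll_up(samples: List[float], bucket_size: int, aggregation: str) -> List[float]:
    """Aggregate consecutive samples in buckets of `bucket_size` samples.

    With 10-second samples, bucket_size=6 yields one data point per minute.
    """
    aggregate = AGGREGATIONS[aggregation]
    return [
        aggregate(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Two minutes of hypothetical 10-second samples.
samples = [1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6]
for name in AGGREGATIONS:
    print(name, roll_up(samples, 6, name))
```

The same input produces quite different charts depending on the aggregation. Here the first one-minute data point is 3.5 with average and 21 with sum.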

In the example images below, the kubernetes_deployment_replicas_available metric first uses the average for time aggregation:

                                                                                                                                                                                                                                                                                                                                                                      Then uses the sum for time aggregation:

• Rate and average are very similar and often produce the same result, but they are calculated differently.

  • If time aggregation is set to one minute, the agent is expected to retrieve six samples (one every 10 seconds).

  • Samples can be missing because of disconnections or other circumstances. Suppose only four samples are available: the average is calculated by dividing the sum by four, while the rate is calculated by dividing it by six.

• Most metrics are sampled once in every time interval, so average and rate return the same value. The two differ only for metrics that are not reported at every time interval, such as some custom statsd metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Rate is currently referred to as timeAvg in the Sysdig Monitor API and advanced alerting language.

                                                                                                                                                                                                                                                                                                                                                                      • By default, average is used when displaying data points for a time interval.
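The difference between average and rate described above can be illustrated with a minimal sketch. The sample values and function names are hypothetical and are not part of the Sysdig API. The sketch only assumes what the text states: six samples are expected per minute, average divides by the samples received, and rate divides by the samples expected.

```python
def average(samples):
    """Average: divide the sum by the number of samples actually received."""
    return sum(samples) / len(samples)


def rate(samples, expected_count):
    """Rate (timeAvg): divide the sum by the number of samples expected."""
    return sum(samples) / expected_count


# One-minute interval sampled every 10 seconds: six samples expected.
EXPECTED = 60 // 10

# Hypothetical values; only four of the six samples arrived.
received = [12.0, 18.0, 6.0, 24.0]

avg_value = average(received)          # 60 / 4
rate_value = rate(received, EXPECTED)  # 60 / 6

print(f"average = {avg_value}, rate = {rate_value}")
```

When all six samples arrive, both functions divide by six and return the same value.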

                                                                                                                                                                                                                                                                                                                                                                      Group Aggregation

Metrics applied to a group of items (for example, several containers, hosts, or nodes) are averaged across the members of the group by default. For example, if three hosts report different CPU usage for one sample interval, the three values are averaged and reported on the chart as a single data point for that metric.

                                                                                                                                                                                                                                                                                                                                                                      There are several different types of group aggregation:

Aggregation Type | Description
average | The average value of the interval’s samples.
maximum | The maximum value of the interval’s samples.
minimum | The minimum value of the interval’s samples.
sum | The combined value of all of the interval’s samples.
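As a rough illustration of the four group aggregation types, the sketch below reduces one sample interval from three hosts to a single data point. The host names, CPU values, and the aggregate_group helper are hypothetical and used only for illustration.

```python
from statistics import mean

# Hypothetical CPU usage (%) reported by three hosts for one sample interval.
cpu_by_host = {"host-a": 20.0, "host-b": 50.0, "host-c": 80.0}

# Each group aggregation type maps to a plain reduction over the group.
GROUP_AGGREGATIONS = {
    "average": mean,
    "maximum": max,
    "minimum": min,
    "sum": sum,
}


def aggregate_group(values, kind="average"):
    """Reduce the values reported by a group's members to one data point."""
    return GROUP_AGGREGATIONS[kind](list(values))


datapoints = {
    kind: aggregate_group(cpu_by_host.values(), kind)
    for kind in GROUP_AGGREGATIONS
}
print(datapoints)
```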

If a chart or alert is segmented, the group aggregation setting is used both for aggregation across the whole group and for aggregation within each individual segment.

                                                                                                                                                                                                                                                                                                                                                                      For example, the image below shows a chart for CPU% across the infrastructure:

                                                                                                                                                                                                                                                                                                                                                                      When segmented by proc_name, the chart shows one CPU% line for each process:

                                                                                                                                                                                                                                                                                                                                                                      Each line provides the average value for every process with the same name. To see the difference, change the group aggregation type to sum:

The aggregation value shown beside the metric name is the time aggregation. Although the screenshot shows AVG, the group aggregation is set to SUM.
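The effect of segmenting by proc_name and switching the group aggregation from average to sum can be sketched as follows. The readings and the segment_by_name helper are hypothetical. The sketch assumes each reading is the already time-aggregated CPU% of one process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (proc_name, cpu_percent) readings from processes across hosts.
readings = [
    ("nginx", 10.0),
    ("nginx", 30.0),
    ("java", 40.0),
    ("java", 60.0),
    ("java", 20.0),
]


def segment_by_name(rows, aggregate):
    """Group readings by process name, then aggregate within each segment."""
    segments = defaultdict(list)
    for name, value in rows:
        segments[name].append(value)
    return {name: aggregate(values) for name, values in segments.items()}


avg_per_process = segment_by_name(readings, mean)  # one average line per name
sum_per_process = segment_by_name(readings, sum)   # one total line per name

print(avg_per_process)
print(sum_per_process)
```

With average, each line shows the typical usage of a process with that name. With sum, each line shows the total usage of all processes sharing the name.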

                                                                                                                                                                                                                                                                                                                                                                      Aggregation Examples

                                                                                                                                                                                                                                                                                                                                                                      The tables below provide an example of how each type of aggregation works. The first table provides the metric data, while the second displays the resulting value for each type of aggregation.

                                                                                                                                                                                                                                                                                                                                                                      In the example below, the CPU% metric is applied to a group of servers called webserver. The first chart shows metrics using average aggregation for both time and group. The second chart shows the metrics using maximum aggregation for both time and group.

For each one-minute interval, the second chart renders the highest CPU usage value found across all servers in the webserver group and all samples reported during that interval. This view is useful when searching for transient spikes over long periods of time that average aggregation would otherwise hide.
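A minimal sketch of why maximum aggregation surfaces spikes that average aggregation hides. The server names and sample values are hypothetical. The sketch assumes time aggregation is applied per server first and group aggregation is applied to the results.

```python
from statistics import mean

# Hypothetical CPU% samples for a single one-minute interval
# (six 10-second samples per server in the webserver group).
webserver = {
    "web-1": [10.0, 12.0, 11.0, 95.0, 10.0, 12.0],  # contains a short spike
    "web-2": [20.0, 22.0, 21.0, 20.0, 19.0, 18.0],
}


def aggregate(samples_by_server, time_agg, group_agg):
    """Apply time aggregation per server, then group aggregation across servers."""
    per_server = [time_agg(samples) for samples in samples_by_server.values()]
    return group_agg(per_server)


avg_point = aggregate(webserver, mean, mean)  # average for time and group
max_point = aggregate(webserver, max, max)    # maximum for time and group

print(f"average/average = {avg_point}, maximum/maximum = {max_point}")
```

The 95% spike on web-1 is preserved in the maximum data point, while the average data point stays close to the servers' usual load.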

The group aggregation type depends on the segmentation. In a view showing metrics for a group of items, the group aggregation setting reverts to the default if the Segment By selection is changed.

                                                                                                                                                                                                                                                                                                                                                                      4.7 -

                                                                                                                                                                                                                                                                                                                                                                      Deprecated Metrics and Labels

Below is the list of metrics and labels that are discontinued with the introduction of the new metric store. We made an effort not to deprecate any metrics or labels that are used in existing alerts. If you encounter any issues, contact Sysdig Support.

All net.*.request.time.worst metrics are automatically mapped to net.*.request.time, because the maximum aggregation gives equivalent results and was almost exclusively used in combination with these metrics.
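When auditing saved queries or alert definitions, the naming pattern behind this mapping can be sketched as below. The map_worst_metric helper is hypothetical and is not a Sysdig function. It assumes the * wildcard stands for a single protocol segment such as http, mongodb, or sql. Names that do not fit that pattern are returned unchanged.

```python
import re

# Assumption: "*" in net.*.request.time.worst matches exactly one name segment.
_WORST_PATTERN = re.compile(r"^(net\.[^.]+\.request\.time)\.worst$")


def map_worst_metric(metric):
    """Return the replacement name for a net.*.request.time.worst metric.

    Metrics that do not match the pattern are returned as-is.
    """
    match = _WORST_PATTERN.match(metric)
    return match.group(1) if match else metric


for name in ("net.http.request.time.worst", "net.sql.request.time.worst"):
    print(name, "->", map_worst_metric(name))
```

As the text notes, the mapped metric gives equivalent results when it is used with the maximum aggregation.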

                                                                                                                                                                                                                                                                                                                                                                      Deprecated Metrics

The following metrics are no longer supported:

                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.file
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.file.percent
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.local
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.local.percent
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.net
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.net.percent
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.nextTiers
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.nextTiers.percent
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.processing
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.processing.percent
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.worst.in
                                                                                                                                                                                                                                                                                                                                                                      • net.request.time.worst.out
                                                                                                                                                                                                                                                                                                                                                                      • net.incomplete.connection.count.total
                                                                                                                                                                                                                                                                                                                                                                      • net.http.request.time.worst
                                                                                                                                                                                                                                                                                                                                                                      • net.mongodb.request.time.worst
                                                                                                                                                                                                                                                                                                                                                                      • net.sql.request.time.worst
                                                                                                                                                                                                                                                                                                                                                                      • net.link.clientServer.bytes
                                                                                                                                                                                                                                                                                                                                                                      • net.link.delay.perRequest
                                                                                                                                                                                                                                                                                                                                                                      • net.link.serverClient.bytes

                                                                                                                                                                                                                                                                                                                                                                      Deprecated Labels

                                                                                                                                                                                                                                                                                                                                                                      The following labels are no longer supported:

                                                                                                                                                                                                                                                                                                                                                                      • net.connection.client
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.client.pid
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.direction
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.endpoint.tcp
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.udp.inverted
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.errorCode
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.l4proto
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.server
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.server.pid
                                                                                                                                                                                                                                                                                                                                                                      • net.connection.state
                                                                                                                                                                                                                                                                                                                                                                      • net.role
                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.resource.endPoint
                                                                                                                                                                                                                                                                                                                                                                      • host.container.mappings
                                                                                                                                                                                                                                                                                                                                                                      • host.ip.all
                                                                                                                                                                                                                                                                                                                                                                      • host.ip.private
                                                                                                                                                                                                                                                                                                                                                                      • host.ip.public
                                                                                                                                                                                                                                                                                                                                                                      • host.server.port
                                                                                                                                                                                                                                                                                                                                                                      • host.isClientServer
                                                                                                                                                                                                                                                                                                                                                                      • host.isInstrumented
                                                                                                                                                                                                                                                                                                                                                                      • host.isInternal
                                                                                                                                                                                                                                                                                                                                                                      • host.procList.main
                                                                                                                                                                                                                                                                                                                                                                      • proc.id
                                                                                                                                                                                                                                                                                                                                                                      • proc.name.client
                                                                                                                                                                                                                                                                                                                                                                      • proc.name.server
                                                                                                                                                                                                                                                                                                                                                                      • program.environment
                                                                                                                                                                                                                                                                                                                                                                      • program.usernames
                                                                                                                                                                                                                                                                                                                                                                      • mesos_cluster
                                                                                                                                                                                                                                                                                                                                                                      • mesos_node
                                                                                                                                                                                                                                                                                                                                                                      • mesos_pid

                                                                                                                                                                                                                                                                                                                                                                      In addition to this list, the composite labels ending with ‘.label’ string will no longer be supported. For example kubernetes.service.label will be deprecated, but kubernetes.service.label.* labels are supported.
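The rule above can be sketched as a small check. This is a minimal illustration, not part of any Sysdig tooling: the function name `is_deprecated_composite_label` and the `.app` suffix in the second call are hypothetical, and the check covers only the ‘.label’ suffix rule, not the explicit list of labels above.

```python
def is_deprecated_composite_label(label: str) -> bool:
    """Return True when the label name itself ends with '.label'.

    Hypothetical helper: it encodes only the suffix rule described in the
    text, not the explicit list of deprecated labels.
    """
    return label.endswith(".label")


# The bare composite label is affected by the rule.
print(is_deprecated_composite_label("kubernetes.service.label"))      # True
# A label underneath it (".app" is a made-up key) is not.
print(is_deprecated_composite_label("kubernetes.service.label.app"))  # False
```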

                                                                                                                                                                                                                                                                                                                                                                      4.8 -

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting Metrics

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting metrics include program metrics, connection-level network metrics, Kubernetes troubleshooting metrics, HTTP URL metrics, and some SQL metrics. They are reported on a granular 10s level and are stored for 4 days. Below is the list of troubleshooting metrics and the labels that you can use to segment them.
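To get a feel for what this granularity and retention imply, the following back-of-the-envelope sketch computes the maximum number of samples a single time series could hold. It assumes one sample per 10-second interval with no gaps; the constant and function names are illustrative only.

```python
from datetime import timedelta

SAMPLE_INTERVAL = timedelta(seconds=10)  # reporting granularity stated in the text
RETENTION = timedelta(days=4)            # retention period stated in the text


def max_samples_per_series(interval: timedelta = SAMPLE_INTERVAL,
                           retention: timedelta = RETENTION) -> int:
    """Upper bound on samples kept for one series, assuming no gaps."""
    return retention // interval


print(max_samples_per_series())  # 34560
```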

                                                                                                                                                                                                                                                                                                                                                                      Program Level Metrics

                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_cpu_cores_used
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_cpu_cores_used_percent
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_cpu_used_percent
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_memory_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_connection_in_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_connection_out_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_connection_total_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_request_in_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_request_out_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_request_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_request_in_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_net_tcp_queue_len
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_proc_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_thread_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_program_up

                                                                                                                                                                                                                                                                                                                                                                      In addition to the user-defined labels and standard set of labels Sysdig provides, you can use following labels to segment program metrics: program_cmd_line, program_name.
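Segmenting a metric by one of these labels typically means grouping by it in a query. The sketch below builds a PromQL-style expression as a string. It assumes the metrics are queried through a PromQL-compatible interface, and the helper name `sum_by` is hypothetical. Whether `sum` is the right aggregation depends on the metric, so treat this as an illustration of segmentation rather than a recommended query.

```python
def sum_by(metric: str, *labels: str) -> str:
    """Build a PromQL expression that sums `metric` grouped by `labels`."""
    if not labels:
        return f"sum({metric})"
    return f"sum by ({', '.join(labels)}) ({metric})"


# CPU cores used, segmented by program name.
query = sum_by("sysdig_program_cpu_cores_used", "program_name")
print(query)  # sum by (program_name) (sysdig_program_cpu_cores_used)
```

The same helper works for the other label, for example `sum_by("sysdig_program_memory_used_bytes", "program_cmd_line")`.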

                                                                                                                                                                                                                                                                                                                                                                      Connection-Level Network Metrics

                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_connection_in _count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_connection_out _count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_connection_total _count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_in_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_out_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_in_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_out_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_connection_net_request_time

                                                                                                                                                                                                                                                                                                                                                                      In addition to the user-defined labels and standard set of labels Sysdig provides, you can use following labels to segment connection level metrics: net_local_service, net_remote_service, net_local_endpoint, net_remote_endpoint, net_client_ip, net_server_ip, net_protocol

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Troubleshooting Metrics

                                                                                                                                                                                                                                                                                                                                                                      • kube_workload_status_replicas_misscheduled
                                                                                                                                                                                                                                                                                                                                                                      • kube_workload_status_replicas_scheduled
                                                                                                                                                                                                                                                                                                                                                                      • kube_workload_status_replicas_updated
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_last_terminated_reason
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_ready
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_restarts_total
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_running
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_terminated
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_terminated_reason
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_waiting
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_container_status_waiting_reason
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_last_terminated_reason
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_ready
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_restarts_total
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_running
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_terminated
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_terminated_reason
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_waiting
                                                                                                                                                                                                                                                                                                                                                                      • kube_pod_init_container_status_waiting_reason

                                                                                                                                                                                                                                                                                                                                                                      HTTP URL Metrics

                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_http_url_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_http_url_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_http_url_request_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_http_url_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_http_url_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_http_url_request_time

                                                                                                                                                                                                                                                                                                                                                                      In addition to the user-defined labels and standard set of labels Sysdig provides, you can use net_http_url label to segment HTTP URL level metrics.
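One common use of the net_http_url label is to compare errors against requests per URL. The sketch below composes such an expression from the metric names listed above. It assumes a PromQL-compatible query interface and that dividing the two counts is meaningful for your data; the function name `http_error_ratio_by_url` is hypothetical.

```python
def http_error_ratio_by_url(scope: str = "container") -> str:
    """Build a PromQL expression for the per-URL error ratio.

    `scope` selects the host-level or container-level metric family.
    """
    if scope not in ("host", "container"):
        raise ValueError("scope must be 'host' or 'container'")
    prefix = f"sysdig_{scope}_net_http_url"
    errors = f"sum by (net_http_url) ({prefix}_error_count)"
    requests = f"sum by (net_http_url) ({prefix}_request_count)"
    return f"{errors} / {requests}"


print(http_error_ratio_by_url("host"))
```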

                                                                                                                                                                                                                                                                                                                                                                      SQL Query Metrics

                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_query_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_query_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_query_request_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_querytype_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_querytype_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_host_net_sql_querytype_request_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_query_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_query_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_query_request_time
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_querytype_error_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_querytype_request_count
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_net_sql_querytype_request_time

                                                                                                                                                                                                                                                                                                                                                                      In addition to the user-defined labels and standard set of labels Sysdig provides, you can use net_sql_querytype label to segment SQL querytype metrics by query type.

                                                                                                                                                                                                                                                                                                                                                                      4.9 -

                                                                                                                                                                                                                                                                                                                                                                      Prometheus Metrics Types

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor transforms Prometheus metrics into usable, actionable entries in two ways:

                                                                                                                                                                                                                                                                                                                                                                      Calculated Metrics

                                                                                                                                                                                                                                                                                                                                                                      The Prometheus metrics that are scraped by the Sysdig agent and transformed into the traditional StatsD model are called calculated metrics. In calculated metrics, the delta is stored with the previous value. This delta is what Sysdig uses on the classic backend for metrics analyzing and visualization. While generating the calculated metrics, the gauge metrics are kept as they are, but the counter metrics are transformed.

                                                                                                                                                                                                                                                                                                                                                                      Prometheus calculated metrics cannot be used in PromQL.

                                                                                                                                                                                                                                                                                                                                                                      The Histogram and Summary metrics are transformed into a different format called Prometheus histogram and summary metrics respectively. The transformations include:

                                                                                                                                                                                                                                                                                                                                                                      • All of the quantiles are transformed into a different metric, with the quantile added as a suffix.

                                                                                                                                                                                                                                                                                                                                                                      • The count and sum of these summary metrics are exposed as different metrics with names slightly changed. _ (underscore) in the name is replaced with a period .. For more information, see Mapping Classic Metrics and PromQL Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Prometheus calculated metrics (legacy metrics) are scheduled to be deprecated in the coming months.

                                                                                                                                                                                                                                                                                                                                                                      Raw Metrics

                                                                                                                                                                                                                                                                                                                                                                      In Sysdig parlance, the Prometheus metrics that are scraped (by the Sysdig agent), collected, sent, stored, visualized, and presented exactly as Prometheus exposes them are called raw metrics. Raw metrics are used with PromQL.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig counter is a StatsD type counter, where the difference in value is kept, but not the raw value of the counter, whereas Prometheus raw metrics are counters that are always monotonically increasing. A rate function needs to be applied on Prometheus raw metrics to make sense of it.
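
The difference between the two counter models can be illustrated with a minimal Python sketch. The sample data and the helper names (to_deltas, rate_per_second) are hypothetical and are not part of any Sysdig or Prometheus API; the sketch also assumes the counter never resets within the window.

```python
from typing import List, Tuple

# (unix timestamp in seconds, raw counter value); hypothetical sample data
Sample = Tuple[float, float]


def to_deltas(samples: List[Sample]) -> List[Tuple[float, float]]:
    """StatsD-style view: keep only the change between consecutive samples."""
    return [(t1, v1 - v0) for (_t0, v0), (t1, v1) in zip(samples, samples[1:])]


def rate_per_second(samples: List[Sample]) -> float:
    """Rate over the window: total change of the raw counter divided by elapsed time."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# A raw counter only ever increases; on its own the value says little.
raw = [(0.0, 100.0), (10.0, 130.0), (20.0, 190.0), (30.0, 200.0)]

deltas = to_deltas(raw)      # per-interval changes, as in the StatsD model
rate = rate_per_second(raw)  # average increase per second over 30 seconds
```

The raw value 200 is only meaningful relative to earlier samples, which is why a rate (or delta) has to be derived before the metric can be read.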

                                                                                                                                                                                                                                                                                                                                                                      Time Aggregations Over Prometheus Metrics

                                                                                                                                                                                                                                                                                                                                                                      The following time aggregations are supported for both the metric types:

                                                                                                                                                                                                                                                                                                                                                                      • Average: Returns an average of a set of data points, keeping all the labels.

                                                                                                                                                                                                                                                                                                                                                                      • Maximum and Minimum: Returns a maximal or minimal value, keeping all the labels.

                                                                                                                                                                                                                                                                                                                                                                      • Sum: Returns a sum of the values of data points, keeping all the labels.

                                                                                                                                                                                                                                                                                                                                                                      • Rate (timeAvg): Returns a sum of changes to the counter across data points in a given time period and divides by time, keeping all the labels as they are. For Prometheus raw metrics, timeAvg is calculated by taking the difference and dividing it by time.
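
As a rough illustration of what "keeping all the labels" means, the following Python sketch aggregates data points per unique label set, so that every series keeps its own labels in the result. The data layout and the aggregate function are assumptions made for this example and do not reflect Sysdig's internal implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]
Point = Tuple[Dict[str, str], float, float]  # (labels, timestamp, value)


def aggregate(points: List[Point]) -> Dict[Labels, Dict[str, float]]:
    """Aggregate over time separately for each label set (labels are preserved)."""
    series: Dict[Labels, List[float]] = defaultdict(list)
    for labels, _timestamp, value in points:
        series[frozenset(labels.items())].append(value)
    return {
        labels: {
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
        }
        for labels, values in series.items()
    }


# Hypothetical data points for two series that differ only in the "pod" label.
points = [
    ({"pod": "a"}, 0.0, 1.0),
    ({"pod": "a"}, 10.0, 3.0),
    ({"pod": "b"}, 0.0, 10.0),
]
result = aggregate(points)
pod_a = result[frozenset({("pod", "a")})]
```

Each label set produces its own aggregated values; series are never merged across labels by a time aggregation.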

                                                                                                                                                                                                                                                                                                                                                                      Prometheus Calculated Metrics

                                                                                                                                                                                                                                                                                                                                                                      Prometheus calculated metrics are treated as gauges by Sysdig, and there the following time aggregations are available:

                                                                                                                                                                                                                                                                                                                                                                      • Average

                                                                                                                                                                                                                                                                                                                                                                      • Sum

                                                                                                                                                                                                                                                                                                                                                                      • Minimum

                                                                                                                                                                                                                                                                                                                                                                      • Maximum

                                                                                                                                                                                                                                                                                                                                                                      Rate (timeAvg) is not available because they are not applicable to gauge metrics.

                                                                                                                                                                                                                                                                                                                                                                      Prometheus Raw Metrics

                                                                                                                                                                                                                                                                                                                                                                      For the gauge type, the following types are available:

                                                                                                                                                                                                                                                                                                                                                                      • Average

                                                                                                                                                                                                                                                                                                                                                                      • Minimum

                                                                                                                                                                                                                                                                                                                                                                      • Maximum

                                                                                                                                                                                                                                                                                                                                                                      For the counter type, the following types are available:

                                                                                                                                                                                                                                                                                                                                                                      • Rate: Calculates the first derivative of the counter (change over time).

                                                                                                                                                                                                                                                                                                                                                                      • Sum: Calculates a complete change of the counter over a period of time.

                                                                                                                                                                                                                                                                                                                                                                      5 -

                                                                                                                                                                                                                                                                                                                                                                      Dashboards

                                                                                                                                                                                                                                                                                                                                                                      Sysdig users can create customized dashboards to display the most useful or relevant views and metrics for the infrastructure in a single location. This feature-rich dashboards support both form-based and PromQL-based queries and offer several user experience enhancements:

                                                                                                                                                                                                                                                                                                                                                                      • Multiple data queries per panel

                                                                                                                                                                                                                                                                                                                                                                      • Basic (form-based) and advanced (PromQL) data queries

                                                                                                                                                                                                                                                                                                                                                                      • Compare basic query result against historical data

                                                                                                                                                                                                                                                                                                                                                                      • Improved granularity of data shown in dashboards. For example, a 1-hour selection shows metrics with 10 seconds intervals.

                                                                                                                                                                                                                                                                                                                                                                      • Display up-to-date metrics without time re-alignment.

                                                                                                                                                                                                                                                                                                                                                                      • Query support:

                                                                                                                                                                                                                                                                                                                                                                        • Allows to query multiple metrics

                                                                                                                                                                                                                                                                                                                                                                        • Render the results of a query (time series) as line, bars, stacked area, stairs, text, and so on.

                                                                                                                                                                                                                                                                                                                                                                        • Ability to scope and segment each query separately

                                                                                                                                                                                                                                                                                                                                                                        • Inherit, augment, or override the dashboard scope

                                                                                                                                                                                                                                                                                                                                                                        • Metric descriptor based units with the ability to override

                                                                                                                                                                                                                                                                                                                                                                        • Assign Y-axis automatically based on query unit type with the ability to override

                                                                                                                                                                                                                                                                                                                                                                      Each dashboard is composed of a series of panels configured to display specific data in a number of different formats. Learn more about how dashboards and panels are created, organized, and managed in the following sections:

                                                                                                                                                                                                                                                                                                                                                                      5.1 -

                                                                                                                                                                                                                                                                                                                                                                      About the Dashboard UI

                                                                                                                                                                                                                                                                                                                                                                      The main components of the Dashboard UI include widgets, time navigation, and panels.

                                                                                                                                                                                                                                                                                                                                                                      Widgets

                                                                                                                                                                                                                                                                                                                                                                      Dashboards support time series (Timechart), Histogram, Number graphs, Table, Text, and Toplist.

                                                                                                                                                                                                                                                                                                                                                                      Timechart, Number and Toplist graph support both form-based and advanced (PromQL) queries, whereas Histogram and Table panels support building only form-based queries. Form-based Number, Table, Histogram, and Toplist panels can show either the latest value for an entity or the entire range of values.

                                                                                                                                                                                                                                                                                                                                                                      Time Navigation

                                                                                                                                                                                                                                                                                                                                                                      Dashboard is designed around time. After a query has been executed, Sysdig Monitor polls the infrastructure data every 10 seconds and refreshes the metrics on the Dashboard panel. You select how to view this gathered data by choosing a Preset interval and a time Range.

                                                                                                                                                                                                                                                                                                                                                                      Presets

                                                                                                                                                                                                                                                                                                                                                                      Presets are a way of visualizing data that Sysdig Monitor gathers every 10 minutes. Select a preset to determine the data sample to be displayed. Overview supports the following presets:

                                                                                                                                                                                                                                                                                                                                                                      • 10 Minutes

                                                                                                                                                                                                                                                                                                                                                                      • 1 Hour

                                                                                                                                                                                                                                                                                                                                                                      • 6 Hour

                                                                                                                                                                                                                                                                                                                                                                      • 12 Hour

                                                                                                                                                                                                                                                                                                                                                                      • 1 Day

                                                                                                                                                                                                                                                                                                                                                                      • 4 Day

                                                                                                                                                                                                                                                                                                                                                                      • 1 Week

                                                                                                                                                                                                                                                                                                                                                                      • 2 Weeks

                                                                                                                                                                                                                                                                                                                                                                      A preset that is 10 minutes or less is refreshed every 30 seconds. A preset that is greater than 10 minutes is refreshed at every 10 second intervals.

                                                                                                                                                                                                                                                                                                                                                                      Presets work in conjunction with Range selections. Selecting a particular preset interval refreshes Range selection and reloads the data subsequently. For example:

                                                                                                                                                                                                                                                                                                                                                                      • 10 Minutes: Resets the Range to December 9, 2.20 pm - December 9, 2.30 pm.

                                                                                                                                                                                                                                                                                                                                                                      • 6 Hour: Resets the Range to December 9, 8.30 am - December 9, 2.30 pm.

                                                                                                                                                                                                                                                                                                                                                                      • 1 Day: Resets the Range to December 8, 2.30 pm - December 9, 2.30 pm.
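
The relationship between a preset and the resulting Range can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the examples above, not how the Sysdig UI is implemented: the `PRESETS` mapping and the `range_for_preset` function are hypothetical names, and the year in the sample timestamp is arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical mapping of preset labels to durations (only three presets shown).
PRESETS = {
    "10 Minutes": timedelta(minutes=10),
    "6 Hour": timedelta(hours=6),
    "1 Day": timedelta(days=1),
}


def range_for_preset(preset: str, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) Range that ends at `now` and spans the preset."""
    return now - PRESETS[preset], now


if __name__ == "__main__":
    now = datetime(2024, 12, 9, 14, 30)  # December 9, 2.30 pm; the year is arbitrary
    for name in PRESETS:
        start, end = range_for_preset(name, now)
        print(f"{name}: {start:%B %d, %I:%M %p} - {end:%B %d, %I:%M %p}")
```

Each preset simply moves the start of the Range back from the current time by the preset duration; the end of the Range stays at the current time.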

                                                                                                                                                                                                                                                                                                                                                                      Range

                                                                                                                                                                                                                                                                                                                                                                      Range shows both date and time interval as well as the selected Presets in parenthesis. The Range indicated on the UI is determined by Presets. The time given is the closest time interval and by default, it is the current date and time preset by 1 hour.

                                                                                                                                                                                                                                                                                                                                                                      Click on the Range tab to open a calendar to select a range.

                                                                                                                                                                                                                                                                                                                                                                      See Presets to understand how Range works with Presets.

                                                                                                                                                                                                                                                                                                                                                                      Live

                                                                                                                                                                                                                                                                                                                                                                      The Live badge shows if the data shown is Live or Paused.

                                                                                                                                                                                                                                                                                                                                                                      • Live: the data is continuously updating based on the 10-minute polling of the Sysdig back end. The Overview feed is normally always Live.

                                                                                                                                                                                                                                                                                                                                                                      • Paused: When a specific row is selected, the data refresh pauses and the rows will not be updated with new data coming in.

                                                                                                                                                                                                                                                                                                                                                                      Time Format

                                                                                                                                                                                                                                                                                                                                                                      Dashboards support UTC and PDT time formats. Use the toggle button next to Range to change the time format for the slot shown in Range. The default is PDT.

                                                                                                                                                                                                                                                                                                                                                                      Panel Properties

                                                                                                                                                                                                                                                                                                                                                                      Query

                                                                                                                                                                                                                                                                                                                                                                      With the Dashboard, you can construct queries in two ways: Form-Based and Advanced. As you construct your query and type in a keyword in the Metrics field, auto-complete offers suggestions for the metrics in the query.

                                                                                                                                                                                                                                                                                                                                                                      Form-Based Query

                                                                                                                                                                                                                                                                                                                                                                      Use the UI fields to construct queries. Form-based data queries consist of one metric with time and group aggregation, Segmentation, Display, Unit for both incoming data as well as displaying data on the Y-Axis, and Scope. You can choose to inherit the Dashboard scope.

                                                                                                                                                                                                                                                                                                                                                                      Form-based queries support both Sysdig dot notation and Prometheus-compatible underscore notation.

                                                                                                                                                                                                                                                                                                                                                                      PromQL Query

                                                                                                                                                                                                                                                                                                                                                                      The PromQL field supports only PromQL queries. Manually enter a PromQL query as follows:

                                                                                                                                                                                                                                                                                                                                                                      Each query starts with a group aggregator, followed by a time aggregator, then the metrics and segmentation. For example:

                                                                                                                                                                                                                                                                                                                                                                      topk(10,avg(avg_over_time(sysdig_program_cpu_cores_used{$__scope}[$__interval])) by (program_name, container_name))
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Alternatively, you can build a form-based query and translate it to PromQl by using the Translate to PromQL option.

                                                                                                                                                                                                                                                                                                                                                                      For more information, see Build PromQL Panels from Form Query.
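
The query structure described above (group aggregator, then time aggregator, then metric and segmentation) can be illustrated with a small Python helper that assembles the query string. This is a sketch under assumptions: `build_query` is a hypothetical helper, not part of any Sysdig tooling, and it assumes the time aggregator maps to a PromQL `<name>_over_time` function such as `avg_over_time`.

```python
def build_query(group_agg, time_agg, metric, segment_by, top=None):
    """Assemble a PromQL string: group aggregator around a time aggregator.

    Assumes the time aggregator has a matching `<name>_over_time` function
    (for example avg, min, max, sum).
    """
    # Time aggregation over the sampling interval, filtered by the dashboard scope.
    inner = f"{time_agg}_over_time({metric}{{$__scope}}[$__interval])"
    # Group aggregation, segmented by the given labels.
    query = f"{group_agg}({inner}) by ({', '.join(segment_by)})"
    # Optionally keep only the top N series.
    if top is not None:
        query = f"topk({top},{query})"
    return query


if __name__ == "__main__":
    print(build_query(
        "avg", "avg", "sysdig_program_cpu_cores_used",
        ["program_name", "container_name"], top=10,
    ))
```

Called with the values shown, the helper reproduces the example query above.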

                                                                                                                                                                                                                                                                                                                                                                      $__interval

                                                                                                                                                                                                                                                                                                                                                                      You can use $__interval within a PromQL query to use the most appropriate sampling depending on the time range you have selected. This configuration ensures that the most granular data is accessible while downsampling when you select a long time range to panels load as fast as possible.

                                                                                                                                                                                                                                                                                                                                                                      Scope variables

                                                                                                                                                                                                                                                                                                                                                                      You can configure scope variables at the dashboard level to quickly filter metrics based on Cluster, Namespace, Workload, and more.

                                                                                                                                                                                                                                                                                                                                                                      When using PromQL queries, you can select the scope by using dynamic variables. This configuration is significant when troubleshooting as it allows you to switch context quickly without reconfiguring queries.

                                                                                                                                                                                                                                                                                                                                                                      $__scope

                                                                                                                                                                                                                                                                                                                                                                      You can use $__scope within a PromQL query to apply a selected scope. It allows you to apply the whole scope instead of applying each scope variable individually to the query. See [en/docs/sysdig-monitor/dashboards/dashboard-scope/#using-__Scope](Using $_scope)
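
To make the idea of variable substitution concrete, the following Python sketch expands $__scope and $__interval in a query string. It is an illustration only: the actual expansion is performed by Sysdig Monitor and may differ (for example, in the matchers it generates). The `expand_variables` function, the label names, and the `1m` interval are all hypothetical.

```python
def expand_variables(query: str, scope: dict, interval: str) -> str:
    """Replace $__scope with label matchers and $__interval with a duration.

    Assumption: each scope entry becomes an equality matcher (label="value").
    Label values are not escaped in this sketch.
    """
    matchers = ",".join(f'{label}="{value}"' for label, value in scope.items())
    return query.replace("$__scope", matchers).replace("$__interval", interval)


if __name__ == "__main__":
    query = "sum(rate(sysdig_container_net_in_bytes{$__scope}[$__interval])) by (container_id,agent_id)"
    # Hypothetical scope selection: label names and values are made up.
    scope = {"kube_cluster_name": "prod", "kube_namespace_name": "default"}
    print(expand_variables(query, scope, "1m"))
```

Because the whole scope is substituted at once, changing the dashboard scope changes every query that uses $__scope without editing the queries themselves.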

                                                                                                                                                                                                                                                                                                                                                                      Smart Autocompletion and Syntax Highlighting

                                                                                                                                                                                                                                                                                                                                                                      Autocomplete suggests metrics, operators, and functions, while syntax highlighting helps highlight problems within a PromQL query. This is invaluable in dynamic environments and allows you to craft the right queries faster.

                                                                                                                                                                                                                                                                                                                                                                      Define Axes

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor provides the flexibility to add two Y-axes on the graph. You can also determine whether you want to use them at all. Having the option to add an extra Y-axis help when you decide to add an extra query.

                                                                                                                                                                                                                                                                                                                                                                      Specify the following for both Y-Axis and Y-Axis Right:

                                                                                                                                                                                                                                                                                                                                                                      • Show: Select to show the Y-Axis on the graph.

                                                                                                                                                                                                                                                                                                                                                                      • Scale: Specify the scale in which you want the data to be shown on the graph.

                                                                                                                                                                                                                                                                                                                                                                      • Unit: Specify the unit of scale for the incoming data.

                                                                                                                                                                                                                                                                                                                                                                      • Display Format: Specify the unit of scale for the data to be displayed on the Y-Axis.

                                                                                                                                                                                                                                                                                                                                                                      • Y-Max: Specify the highest value to be displayed on the Y-Axis. Consider this as the highest point on the range. You can specify the limits as numeric values. However, the type of values that you specify must match the type of values along the axis. Y-Max should be always greater than Y-Min.

                                                                                                                                                                                                                                                                                                                                                                      • Y-Min: Specify the lowest value to be displayed on the Y-Axis. Consider this as the lowest point on the range. You can specify both limits or you can specify one limit and let the axes automatically calculate the other.
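
The Y-Min and Y-Max rules above can be summarized in a short Python sketch. It assumes that an unspecified limit is derived from the smallest or largest plotted value, which is one plausible reading of "automatically calculate the other"; the `resolve_axis_limits` function is hypothetical and does not reflect the actual implementation.

```python
def resolve_axis_limits(values, y_min=None, y_max=None):
    """Return (low, high) axis limits.

    A limit left as None is taken from the data (an assumption of this sketch).
    Raises ValueError when the upper limit is not greater than the lower one.
    """
    low = min(values) if y_min is None else y_min
    high = max(values) if y_max is None else y_max
    if high <= low:
        raise ValueError("Y-Max must be greater than Y-Min")
    return low, high


if __name__ == "__main__":
    samples = [2, 5, 9]
    print(resolve_axis_limits(samples, y_min=0))             # only Y-Min specified
    print(resolve_axis_limits(samples, y_min=0, y_max=20))   # both limits specified
```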

                                                                                                                                                                                                                                                                                                                                                                      Define Legend

                                                                                                                                                                                                                                                                                                                                                                      Determine whether you want a legend with a descriptive label for each plotted time series. Specify the location and layout. Determine the value to be displayed should be the most recently calculated data.

                                                                                                                                                                                                                                                                                                                                                                      For the labels, the legend uses the text you have specified in the Query Display Name and Timeseries Name fields.

                                                                                                                                                                                                                                                                                                                                                                      Enable Show to show the legend or create a legend if one does not exist.

                                                                                                                                                                                                                                                                                                                                                                      Right positions the legend in the upper right corner of the panel. Bottom positions the legend in the lower-left corner of the panel.

                                                                                                                                                                                                                                                                                                                                                                      Define Panel

                                                                                                                                                                                                                                                                                                                                                                      Specify the Panel heading and description by using the Panel tab. The description you enter appears as the panel information as follows:

                                                                                                                                                                                                                                                                                                                                                                      5.2 -

                                                                                                                                                                                                                                                                                                                                                                      Using PromQL

                                                                                                                                                                                                                                                                                                                                                                      PromQL is available only in Sysdig SaaS editions. The feature is not yet supported by Sysdig on-premises installations.

                                                                                                                                                                                                                                                                                                                                                                      The Prometheus Query Language (PromQL) is the defacto standard for querying Prometheus metric data. PromQL is designed to allow the user to select and aggregate time-series data.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor’s PromQL support includes all of the features, functions, and aggregations in standard open-source PromQL. The PromQL language is documented at Prometheus Query Basics.

                                                                                                                                                                                                                                                                                                                                                                      For new functionalities released as part of agent v10.0.0, see Collect Prometheus Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Construct a PromQL Query

                                                                                                                                                                                                                                                                                                                                                                      In the Dashboard Panel, select the PromQL type to query data using PromQL.

                                                                                                                                                                                                                                                                                                                                                                      • Display: Specify the following:

                                                                                                                                                                                                                                                                                                                                                                        • Type: Select the type of chart. The supported types are Stacked Area and Line. This option is currently not supported for other visualization types.

                                                                                                                                                                                                                                                                                                                                                                        • Query Display Name: A meaningful display name for the legend. The text you enter replaces the metric name displayed in the legend. The default legend title is the metric name. The default legend title is the query itself.

                                                                                                                                                                                                                                                                                                                                                                        • Timeseries Name: A display name of the time series for the query using text and any label values returned with the metric.

                                                                                                                                                                                                                                                                                                                                                                      • Query: Enter one or more PromQL queries directly. For example:

                                                                                                                                                                                                                                                                                                                                                                        sum(rate(sysdig_container_net_in_bytes{$__scope}[$__interval])) by (container_id,agent_id)
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Specify the following:

                                                                                                                                                                                                                                                                                                                                                                        • Metrics: Search the desired metric. The field supports auto-complete. Enter the text and the rest of the text you type is predicted so you can filter the metric easily. In the example: sysdig_container_net_in_bytes.

                                                                                                                                                                                                                                                                                                                                                                        • Segmentation: This is the process of categorizing aggregated data with labels to provide precise control over the data. Choose an appropriate value for segmenting the aggregated PromQL data. In this example, container_id and agent_id.

                                                                                                                                                                                                                                                                                                                                                                        The PromQL query field supports the following reserved variables. The variables are replaced in the UI in real-time. The expressions are translated into PromQL format and applied to the query while fetching the data.

                                                                                                                                                                                                                                                                                                                                                                        • $__range: Represents the time range currently selected in the time navigation. In the Live mode, the value is constantly updated to reflect the new time range.

                                                                                                                                                                                                                                                                                                                                                                        • $__interval: Represents a time interval and is automatically configured based on the time range.

• $__scope: Represents the selected scope that is applied to a PromQL query. The scope is applied through PromQL label filters, in the same way that scope variables are applied. It lets you apply the whole scope to a query instead of applying each scope variable individually.

                                                                                                                                                                                                                                                                                                                                                                      • Options: Specify the following:

                                                                                                                                                                                                                                                                                                                                                                        • Unit and Y-Axes: Specify the unit of scale and display format.
                                                                                                                                                                                                                                                                                                                                                                        • No Data Display: Determine how to display null data on the dashboard.
                                                                                                                                                                                                                                                                                                                                                                      • Axes: Determine scale, unit, display format, and gauge for the Y-axes.

                                                                                                                                                                                                                                                                                                                                                                      • Legend: Determine the position of the legend in the Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      • Panel: Specify a name and add details about the panel.

                                                                                                                                                                                                                                                                                                                                                                        See Create a New Panel for details.

                                                                                                                                                                                                                                                                                                                                                                      Build PromQL Panels from Form Query

                                                                                                                                                                                                                                                                                                                                                                      You can use the Translate to PromQL option to quickly build a PromQL-based panel from form queries. To do so,

                                                                                                                                                                                                                                                                                                                                                                      1. Build a form query, as given in Building a Form-Based Query. For example, let us build a Toplist for the metric, sysdig_program_cpu_cores_used, segmented by program_name and container_name.

                                                                                                                                                                                                                                                                                                                                                                      2. For Sorting, choose Top.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Translate to PromQL.

                                                                                                                                                                                                                                                                                                                                                                        If a PromQL query is already defined, you will see a message similar to the following:


In this scenario, proceeding overrides any manually created or manually modified queries in the PromQL tab.

4. Click Continue to proceed.

                                                                                                                                                                                                                                                                                                                                                                        The PromQL Toplist panel will be displayed on screen.

                                                                                                                                                                                                                                                                                                                                                                      Apply a Dashboard Scope to a PromQL Query

                                                                                                                                                                                                                                                                                                                                                                      The dashboard scope is automatically applied only to form-based panels. To scope a panel built from a PromQL query, you must use a scope variable within the query. The variable will take the value of the referenced scope parameter, and the PromQL panel will change accordingly.

                                                                                                                                                                                                                                                                                                                                                                      There are two predefined variables available:

                                                                                                                                                                                                                                                                                                                                                                      • $__interval represents the time interval defined based on the time range. This will help to adapt the time range for different operations, such as rate and avg_over_time, and prevent displaying empty graphs due to the change in the granularity of the data.

• $__range represents the time range selected for the dashboard. Use it to adapt operations to that range, such as calculating the average over the selected time frame.
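The reason an adaptive window helps avoid empty graphs can be sketched in Python. This is a simplified model of avg_over_time, not the Sysdig or Prometheus implementation: the sample data, evaluation times, and window sizes are hypothetical, and a window is treated as the half-open interval (t - window, t].

```python
def avg_over_time(samples, eval_times, window):
    """Average the samples whose timestamp falls in (t - window, t] for each t."""
    result = {}
    for t in eval_times:
        in_window = [value for ts, value in samples if t - window < ts <= t]
        if in_window:  # an empty window produces no point, i.e. a gap in the graph
            result[t] = sum(in_window) / len(in_window)
    return result


# Hypothetical data stored at 10-minute (600 s) granularity: (timestamp, value)
samples = [(0, 10.0), (600, 20.0), (1200, 30.0)]
eval_times = [300, 900, 1500]

# A hard-coded 60 s window is narrower than the data granularity: no points at all
fixed = avg_over_time(samples, eval_times, window=60)

# A window that follows the granularity (the role $__interval plays) finds data
adaptive = avg_over_time(samples, eval_times, window=600)

print(fixed)
print(adaptive)
```

With the fixed window every evaluation misses the stored samples, while the adaptive window always covers at least one of them.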

                                                                                                                                                                                                                                                                                                                                                                      The following examples show how to use scope variables within PromQL queries.

                                                                                                                                                                                                                                                                                                                                                                      Example: CPU Used Percent

The following query returns the CPU used percent for all hosts, regardless of the scope configured at the dashboard level, as a moving average whose window depends on the selected time span.

                                                                                                                                                                                                                                                                                                                                                                      avg_over_time(sysdig_host_cpu_used_percent[$__interval])
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      To scope this query, you must set up an appropriate scope variable. A key step is to provide a variable name that is referenced as part of the query.

                                                                                                                                                                                                                                                                                                                                                                      In this example, hostname is used as the variable name. The host can then be referenced using $hostname as follows:

                                                                                                                                                                                                                                                                                                                                                                      avg_over_time(sysdig_host_cpu_used_percent{host_name=$hostname}[$__interval])
                                                                                                                                                                                                                                                                                                                                                                      

Depending on the operator specified while configuring scope values, you might need to use a different operator within the query. If the operator does not match the scope type, the system still runs the query but shows a warning, because the results might not be what you expect.

Scope Operator: is foo, is not foo

PromQL Filter Operator:

• = : Select labels that are exactly equal to the provided string.

• != : Select labels that are not equal to the provided string.

Example: sysdig_host_cpu_used_percent{host_name=$hostname}

Scope Operator: in foo,bar, not in foo,bar

PromQL Filter Operator:

• =~ : Select labels that regex-match the provided string.

• !~ : Select labels that do not regex-match the provided string.

Example: sysdig_host_cpu_used_percent{host_name=~$hostname}
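The operator pairs listed above can be captured in a small lookup. The following Python sketch is illustrative only: the function name scoped_selector and its parameters are hypothetical and not part of any Sysdig API. It simply builds the selector text you would type into the PromQL field.

```python
# Scope operator -> PromQL filter operator, as listed in this section
SCOPE_TO_PROMQL = {
    "is": "=",
    "is not": "!=",
    "in": "=~",
    "not in": "!~",
}


def scoped_selector(metric, label, variable, scope_operator):
    """Build a selector that filters `label` by the scope variable `$variable`."""
    promql_operator = SCOPE_TO_PROMQL[scope_operator]
    return f"{metric}{{{label}{promql_operator}${variable}}}"


print(scoped_selector("sysdig_host_cpu_used_percent", "host_name", "hostname", "is"))
print(scoped_selector("sysdig_host_cpu_used_percent", "host_name", "hostname", "in"))
```

The two calls reproduce the example selectors from this section, one with = and one with =~.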

                                                                                                                                                                                                                                                                                                                                                                      Enrich Metrics with Labels

                                                                                                                                                                                                                                                                                                                                                                      Running PromQL queries in Sysdig Monitor by default returns only a minimum set of labels. To enrich the return results of PromQL queries with additional labels, such as Kubernetes cluster name, you need to use a vector matching operation. The vector matching operation in Prometheus is similar to the SQL-like join operation.

                                                                                                                                                                                                                                                                                                                                                                      Info Metrics

Prometheus exposes information (info) metrics, which always have a value of 1 and carry several labels. On their own, info metrics are rarely useful. However, joining the labels of an info metric with a non-info metric can provide useful information, such as the value of metric X across an infrastructure, application, or deployment.

                                                                                                                                                                                                                                                                                                                                                                      Vector Matching Operation

                                                                                                                                                                                                                                                                                                                                                                      The vector matching operation is similar to an SQL join. You use a vector matching operation to build a PromQL query that can return metrics with information from your infrastructure. Vector matching helps filter and enrich labels, usually adding information labels to the metrics you are trying to visualize.

                                                                                                                                                                                                                                                                                                                                                                      See Mapping Between Classic Metrics and PromQL Metrics for a list of info metrics.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Return a Metric Filtered by Cluster

This example uses a metric returned by an application running on Kubernetes, say myapp_gauge. The query returns an aggregated value for a single cluster, the one selected in the scope. It assumes that you have already set a $cluster variable in your scope.

To do so, run the following query to return the myapp_gauge metrics:

                                                                                                                                                                                                                                                                                                                                                                      sum (myapp_gauge * on (container_id) kube_pod_container_info{cluster=$cluster})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The query performs the following operations, not necessarily in this order:

• The kube_pod_container_info info metric is filtered, selecting only those timeseries that belong to the cluster you want to see. The selection is based on the cluster label.

• The myapp_gauge metric is matched with the kube_pod_container_info metric where the container_id label has the same value, and the two values are multiplied. Because the info metric has the value 1, multiplying the values doesn’t change the result. As the info metric has already been filtered by cluster, only those values associated with the cluster are kept.

                                                                                                                                                                                                                                                                                                                                                                      • The resultant timeseries with the value of myapp_gauge are then aggregated with the sum function and the result is returned.
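The three operations above can be mimicked in plain Python to make the join easier to follow. This is only a sketch of the semantics, not how the query engine works internally: the container IDs, cluster names, and values are hypothetical, and the function sum_for_cluster is invented for illustration.

```python
# Hypothetical samples, each as (labels, value)
myapp_gauge = [
    ({"container_id": "c1"}, 5.0),
    ({"container_id": "c2"}, 7.0),
    ({"container_id": "c3"}, 11.0),
]
kube_pod_container_info = [
    ({"container_id": "c1", "cluster": "prod"}, 1.0),
    ({"container_id": "c2", "cluster": "prod"}, 1.0),
    ({"container_id": "c3", "cluster": "dev"}, 1.0),
]


def sum_for_cluster(metric, info, cluster):
    # Step 1: keep only the info series whose cluster label matches the scope
    info_by_container = {
        labels["container_id"]: value
        for labels, value in info
        if labels["cluster"] == cluster
    }
    # Step 2: match on container_id and multiply; unmatched series are dropped
    products = [
        value * info_by_container[labels["container_id"]]
        for labels, value in metric
        if labels["container_id"] in info_by_container
    ]
    # Step 3: aggregate the surviving series with sum
    return sum(products)


total = sum_for_cluster(myapp_gauge, kube_pod_container_info, "prod")
print(total)
```

Because every info value is 1, the total for the hypothetical prod cluster is just the sum of the gauge values of its two containers.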

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Calculate the GC Latency

This example shows how to calculate the GC latency of a Go application deployed in a specific Kubernetes namespace.

                                                                                                                                                                                                                                                                                                                                                                      To calculate the GC latency, run the following query:

                                                                                                                                                                                                                                                                                                                                                                      go_gc_duration_seconds * on (container_id,host_mac) group_left(pod,namespace) kube_pod_container_info{namespace=~$namespace}
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The query is performing the following operations:

                                                                                                                                                                                                                                                                                                                                                                      • The kube_pod_container_info info metrics are filtered based on the namespace variable.

• The metrics associated with go_gc_duration_seconds are matched in a many-to-one way with the filtered kube_pod_container_info.

                                                                                                                                                                                                                                                                                                                                                                        The pod and namespace labels are added from the kube_pod_container_info metric to the result. The query keeps only those metrics that have the matching container_id and host_mac labels on both sides.

                                                                                                                                                                                                                                                                                                                                                                      • The values are multiplied and the resulting metrics are returned. The new metrics will only have the values associated with go_gc_duration_seconds because the info metric value is always 1.
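The steps above can be modeled outside Prometheus to see what each one does. The following is a minimal Python sketch, not Sysdig or Prometheus code: the helper join_group_left, the sample series, and the namespace pattern are all hypothetical and exist only for illustration.

```python
import re

def join_group_left(left, info, on, extra):
    """Toy model of `left * on(<on>) group_left(<extra>) info`."""
    index = {}
    for labels, value in info:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate series on the 'one' side for {key}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # no partner on the info side: the series is dropped
        info_labels, info_value = index[key]
        merged = dict(labels)
        merged.update({name: info_labels[name] for name in extra})
        result.append((merged, value * info_value))  # info value is 1
    return result

# Made-up sample series: (labels, value)
gc_series = [
    ({"container_id": "c1", "host_mac": "m1", "quantile": "0.5"}, 0.002),
    ({"container_id": "c1", "host_mac": "m1", "quantile": "1"}, 0.010),
    ({"container_id": "c2", "host_mac": "m2", "quantile": "0.5"}, 0.004),
]
info_all = [
    ({"container_id": "c1", "host_mac": "m1", "pod": "web-0", "namespace": "prod"}, 1),
    ({"container_id": "c2", "host_mac": "m2", "pod": "job-0", "namespace": "dev"}, 1),
]

# Step 1: filter the info metric; PromQL regex matchers are fully anchored.
namespace_regex = "prod|staging"  # stands in for the $namespace variable
selected = [(l, v) for l, v in info_all if re.fullmatch(namespace_regex, l["namespace"])]

# Steps 2 and 3: many-to-one match, copy pod and namespace, multiply by 1.
joined = join_group_left(gc_series, selected,
                         on=("container_id", "host_mac"),
                         extra=("pod", "namespace"))
for labels, value in joined:
    print(labels, value)
```

Both quantile series of container c1 survive and gain the pod and namespace labels, while the series of container c2 is dropped because its info series was filtered out.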

                                                                                                                                                                                                                                                                                                                                                                      You can use any Prometheus metric in the query. For example, the query above can be rewritten for a sample Apache metric as follows:

                                                                                                                                                                                                                                                                                                                                                                      appinfo_apache_net_bytes * on (container_id) group_left(pod, namespace) kube_pod_container_info{namespace=~$namespace}
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Calculate Average CPU Used Percent in AWS Hosts

This example calculates the average CPU used percent per AWS account and region, with the hosts filtered by account and region.

avg by(region,account_id) (sysdig_host_cpu_used_percent * on (host_mac) group_left(region,account_id) sysdig_cloud_provider_info{account_id=~$AWS_account, region=~$AWS_region})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The query performs the following operations:

                                                                                                                                                                                                                                                                                                                                                                      • Filters the sysdig_cloud_provider_info metric based on the account_id and region labels that come from the dashboard scope as variables.

                                                                                                                                                                                                                                                                                                                                                                      • Matches the sysdig_host_cpu_used_percent metrics with sysdig_cloud_provider_info. Only those metrics with the same host_mac label on both sides are extracted, adding region and account_id labels to the resulting metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Calculates the average of the new metrics by account_id and region.
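As a rough illustration of the same three operations, here is a small Python sketch. The host MAC addresses, regions, account IDs, and CPU values are invented sample data, and the dictionary stands in for an already filtered sysdig_cloud_provider_info.

```python
from collections import defaultdict

# Hypothetical samples: (host_mac, cpu_used_percent)
cpu = [("m1", 40.0), ("m2", 60.0), ("m3", 90.0), ("m4", 10.0)]

# Hypothetical info series keyed by host_mac, already filtered by account and region
cloud_info = {
    "m1": {"region": "us-east-1", "account_id": "111"},
    "m2": {"region": "us-east-1", "account_id": "111"},
    "m3": {"region": "eu-west-1", "account_id": "111"},
}

groups = defaultdict(list)
for host_mac, percent in cpu:
    info = cloud_info.get(host_mac)
    if info is None:
        continue  # m4 has no matching info series, so it is dropped
    # Multiplying by the info value (1) leaves the CPU value unchanged.
    groups[(info["region"], info["account_id"])].append(percent * 1)

# avg by(region, account_id)
averages = {key: sum(values) / len(values) for key, values in groups.items()}
print(averages)
```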

                                                                                                                                                                                                                                                                                                                                                                      Example 4: Calculate Total CPU Usage in Deployments

This example calculates the total CPU usage per deployment. You can also filter the value by cluster, namespace, and deployment by using the dashboard scope.

                                                                                                                                                                                                                                                                                                                                                                      sum by(cluster,namespace,owner_name) ((sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace})
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_cpu_cores_used can be replaced by any metric that has the container_id label.

• To connect the sysdig_container_cpu_cores_used metric with the pod, use kube_pod_container_info, and then use kube_pod_owner to connect the pod to other Kubernetes objects.

                                                                                                                                                                                                                                                                                                                                                                      The query performs the following:

                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info:

  • The sysdig_container_cpu_cores_used metric value is multiplied by the value of kube_pod_container_info (which is always 1), matching on container_id and adding the pod, namespace, and cluster labels. Before the join, the metric has the following labels:

    __name__='sysdig_container_cpu_cores_used',container='<label>',container_id='<label>',container_type='DOCKER',host_mac='<label>'

  • After the join, the resulting metrics have the following labels:

    cluster='<label>',container='<label>',container_id='<label>',container_type='DOCKER',host_mac='<label>',namespace='<label>',pod='<label>'
                                                                                                                                                                                                                                                                                                                                                                          
• The result of the previous step is multiplied by the value of kube_pod_owner (which is always 1), matching on the pod, namespace, and cluster labels and adding the owner_name label from kube_pod_owner. The owner can be a deployment, replicaset, service, daemonset, or statefulset object.

                                                                                                                                                                                                                                                                                                                                                                        • The name of the deployment to filter upon is extracted from the kube_pod_owner metrics.

                                                                                                                                                                                                                                                                                                                                                                        • The pod, namespace, and cluster names are extracted from the kube_pod_container_info metrics.

                                                                                                                                                                                                                                                                                                                                                                      • The new metrics will be:

  cluster='<matched_label>',container='<matched_container_label>',container_id='<label>',container_type='DOCKER',host_mac='<label>',namespace='<label>',owner_name='<label>',pod='<label>'
                                                                                                                                                                                                                                                                                                                                                                        
• The kube_pod_owner metric has an owner_name label, which is the name of the object that owns the pod. The value is selected by filtering:

                                                                                                                                                                                                                                                                                                                                                                        kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace}
                                                                                                                                                                                                                                                                                                                                                                        

  The owner_kind matcher restricts the owners to deployments, and the owner_name value comes from the dashboard scope.

                                                                                                                                                                                                                                                                                                                                                                      • The sum aggregation is applied and the time series are aggregated by cluster, namespace, and deployment.
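The two chained joins and the final aggregation can be traced on toy data. This is only a sketch in Python under stated assumptions: the container IDs, pod names, owner names, and CPU values are hypothetical, and plain dictionaries stand in for the kube_pod_container_info and kube_pod_owner info metrics.

```python
from collections import defaultdict

# Hypothetical sample data
cpu_cores = {"c1": 0.5, "c2": 0.25, "c3": 1.0, "c4": 2.0}  # container_id -> cores used

container_info = {  # container_id -> (pod, namespace, cluster)
    "c1": ("web-a", "prod", "k1"),
    "c2": ("web-a", "prod", "k1"),
    "c3": ("web-b", "prod", "k1"),
    "c4": ("db-0", "prod", "k1"),
}

pod_owner = {  # (pod, namespace, cluster) -> (owner_kind, owner_name)
    ("web-a", "prod", "k1"): ("Deployment", "web"),
    ("web-b", "prod", "k1"): ("Deployment", "web"),
    ("db-0", "prod", "k1"): ("StatefulSet", "db"),
}

totals = defaultdict(float)
for container_id, cores in cpu_cores.items():
    # First join: on(container_id), adds pod, namespace, and cluster.
    pod_key = container_info.get(container_id)
    if pod_key is None:
        continue
    # Second join: on(pod, namespace, cluster), adds owner_name.
    owner = pod_owner.get(pod_key)
    if owner is None or owner[0] != "Deployment":
        continue  # models the owner_kind="Deployment" matcher
    _pod, namespace, cluster = pod_key
    # sum by(cluster, namespace, owner_name)
    totals[(cluster, namespace, owner[1])] += cores

print(dict(totals))
```

The container that belongs to the StatefulSet is excluded by the owner_kind filter, and the three remaining containers are summed into a single series for the web deployment.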

The following table shows which labels are present after each step of the query:

| Query step | __name__ | container_id | container | container_type | host_mac | pod | namespace | cluster | owner_name |
|---|---|---|---|---|---|---|---|---|---|
| sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
| (sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace} | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| sum by(cluster,namespace,owner_name) ((sysdig_container_cpu_cores_used * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~$deployment,cluster=~$cluster,namespace=~$namespace}) | No | No | No | No | No | No | Yes | Yes | Yes |

                                                                                                                                                                                                                                                                                                                                                                      Formatting

Sysdig Monitor supports percentages only as 0-100 values. For calculated ratios, you can avoid multiplying the whole query by 100 by selecting percentage as a 0-1 value.
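To illustrate the two conventions, here is a minimal Python sketch. It is not Sysdig code: the function name to_display_percent and its scale argument are hypothetical and only model the idea that a ratio treated as a 0-1 value displays the same as a query multiplied by 100.

```python
def to_display_percent(value: float, scale: str = "0-100") -> str:
    """Render a value as a percentage string.

    scale="0-100": the value is already a percentage (for example 25.0).
    scale="0-1":   the value is a ratio (for example 0.25) and is scaled for display.
    """
    if scale == "0-100":
        percent = value
    elif scale == "0-1":
        percent = value * 100
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    return f"{percent:.1f}%"


used, total = 10.0, 40.0
ratio = used / total  # a calculated ratio in the 0-1 range

# Both calls describe the same quantity and print the same string.
print(to_display_percent(ratio * 100, "0-100"))
print(to_display_percent(ratio, "0-1"))
```

In both cases the displayed value is identical; the 0-1 option only moves the multiplication out of the query.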

                                                                                                                                                                                                                                                                                                                                                                      Learn More

                                                                                                                                                                                                                                                                                                                                                                      5.3 -

                                                                                                                                                                                                                                                                                                                                                                      Dashboard Scope

                                                                                                                                                                                                                                                                                                                                                                      Dashboard and panel scope defines what data is valid for aggregation and display within the dashboard. The scope can be set at a dashboard-wide level, or overridden for individual panels, by any user type except for View Only users.

                                                                                                                                                                                                                                                                                                                                                                      The current scope is displayed in the top left-hand corner of the module screen:

                                                                                                                                                                                                                                                                                                                                                                      For more information on how scopes work, refer to the Grouping, Scoping, and Segmenting Metrics documentation.

                                                                                                                                                                                                                                                                                                                                                                      Configure Dashboard Scope

                                                                                                                                                                                                                                                                                                                                                                      To configure the scope of an existing dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Dashboard module, select the relevant dashboard from the dashboard list.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Edit Scope link in the top right of the module screen:

                                                                                                                                                                                                                                                                                                                                                                      3. Open the first level drop-down menu.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the labels either by clicking the desired label, or searching for the label, then clicking it.

5. Select one or more label values from the drop-down.

  The scope editor restricts the selection for subsequent filters by rendering only the values that are specific to the labels already selected. For example, if the value of the kube_namespace_name label is kube-system, the values of the subsequent label, container_name, will be filtered by kube-system. This means the containers rendered for filtering are only those that are part of the kube-system namespace.


                                                                                                                                                                                                                                                                                                                                                                      6. Optional: Dashboard Templating.

                                                                                                                                                                                                                                                                                                                                                                        Dashboard scope values can be defined as variables, allowing users to create a template, and use one dashboard for multiple outputs.

                                                                                                                                                                                                                                                                                                                                                                      7. Optional: Add additional label/value combinations to further refine the scope.

8. Click Save to save the new scope, or click Cancel to revert the changes.

                                                                                                                                                                                                                                                                                                                                                                        To reset the dashboard scope to the entire infrastructure, or to update an existing dashboard’s scope to the entire infrastructure, click Clear All.
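The cascading behaviour described in step 5 can be sketched in Python. This is only a conceptual model, not the scope editor's implementation: the INVENTORY data and the available_values helper are hypothetical, and the label names are taken from the example above.

```python
# Hypothetical inventory: each record is the label set of one container.
INVENTORY = [
    {"kube_namespace_name": "kube-system", "container_name": "coredns"},
    {"kube_namespace_name": "kube-system", "container_name": "kube-proxy"},
    {"kube_namespace_name": "default", "container_name": "web"},
]


def available_values(label, selected, inventory=INVENTORY):
    """Return the sorted values of `label` among records matching every earlier selection.

    `selected` maps a label name to the set of values already chosen for it.
    """
    matches = [
        record
        for record in inventory
        if all(record.get(name) in values for name, values in selected.items())
    ]
    return sorted({record[label] for record in matches if label in record})


# With no earlier selection, every container is offered.
print(available_values("container_name", {}))
# After selecting kube-system, only containers in that namespace are offered.
print(available_values("container_name", {"kube_namespace_name": {"kube-system"}}))
```

Each additional label/value combination narrows the candidate set in the same way, which is why the order of selection affects what the later drop-downs show.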

                                                                                                                                                                                                                                                                                                                                                                      Configure Panel Scope

                                                                                                                                                                                                                                                                                                                                                                      To configure the scope of an existing dashboard panel:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Dashboard module, select the relevant dashboard from the dashboard list.

2. Click the Edit (pencil) icon:

                                                                                                                                                                                                                                                                                                                                                                      3. From the query field associated with the metric, click Scope.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the labels either by clicking the desired label, or searching for the label, then clicking it.


                                                                                                                                                                                                                                                                                                                                                                      5. Select one or more label values.

                                                                                                                                                                                                                                                                                                                                                                      6. Optionally, apply panel scope to all the queries.

                                                                                                                                                                                                                                                                                                                                                                      7. Click Save to confirm the changes.

                                                                                                                                                                                                                                                                                                                                                                      Using $__scope

The scope variable is written as $__scope and can be used in PromQL queries. The variable represents a scope that you have already defined. When you insert the $__scope variable into a PromQL expression, the selected scope is applied to the query you have built. The scope variable allows you to apply the whole scope to the query, instead of applying each scope variable individually.

                                                                                                                                                                                                                                                                                                                                                                      If you select Entire Infrastructure as the scope, no scope will be applied.
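As a rough mental model, the scope variable behaves like a placeholder that is replaced by label matchers before the query runs. The Python sketch below illustrates that idea only. It makes several assumptions: the placement of $__scope inside the braces of a metric selector, the helper names scope_to_matchers and expand_scope, and the use of a regex matcher for multiple values are all hypothetical, and the actual expansion performed by Sysdig Monitor may differ.

```python
def scope_to_matchers(scope):
    """Turn {label: [values]} into a comma-separated list of PromQL label matchers.

    Assumes the values contain no quotes or regex metacharacters.
    """
    parts = []
    for label, values in scope.items():
        if len(values) == 1:
            parts.append(f'{label}="{values[0]}"')
        else:
            # Several values for one label become a regex alternation.
            parts.append(f'{label}=~"{"|".join(values)}"')
    return ",".join(parts)


def expand_scope(query, scope):
    """Replace the $__scope placeholder with the matchers for the given scope."""
    return query.replace("$__scope", scope_to_matchers(scope))


query = "avg(sysdig_container_cpu_used_percent{$__scope})"

# A scope with one label and one value.
print(expand_scope(query, {"kube_namespace_name": ["kube-system"]}))
# An empty scope models Entire Infrastructure: no matchers are added.
print(expand_scope(query, {}))
```

The empty-scope case leaves an empty selector, which matches the statement above that no scope is applied for Entire Infrastructure.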

                                                                                                                                                                                                                                                                                                                                                                      5.4 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Dashboards

Creating a dashboard has two parts: creating the dashboard itself, and creating the panels that display the information.

                                                                                                                                                                                                                                                                                                                                                                      5.4.1 -

                                                                                                                                                                                                                                                                                                                                                                      Create a New Dashboard

You can create a dashboard in one of the following ways:

                                                                                                                                                                                                                                                                                                                                                                      • Using the Get Started Wizard.

                                                                                                                                                                                                                                                                                                                                                                      • Using a dashboard template.

                                                                                                                                                                                                                                                                                                                                                                        Dashboard templates are essentially immutable dashboards that can’t be edited, and the scope is fixed. You can copy them and customize as desired. See Dashboard Templates.

• Using the Dashboard tab directly. This section helps you navigate to the default Panel editor screen.

                                                                                                                                                                                                                                                                                                                                                                      Get Started Wizard

Clicking Create Dashboard takes you to the default panel editor screen.

                                                                                                                                                                                                                                                                                                                                                                      Dashboard Tab

                                                                                                                                                                                                                                                                                                                                                                      1. On the Dashboards tab, click Add Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      2. Select one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • From Dashboard Template: Copy from a dashboard template.

  • Blank Dashboard: When you create a new dashboard, you are dropped into the panel editor. The editor opens with avg(avg(sysdig_container_cpu_used_percent)) as the default metric query.

                                                                                                                                                                                                                                                                                                                                                                      3. Specify a name for the dashboard, build a query, and save.

                                                                                                                                                                                                                                                                                                                                                                        For information on running queries, see the following:

                                                                                                                                                                                                                                                                                                                                                                        The new dashboard will now be added to the side panel under My Dashboards and is ready for configuration.

                                                                                                                                                                                                                                                                                                                                                                      5.4.2 -

                                                                                                                                                                                                                                                                                                                                                                      Dashboard Templates

                                                                                                                                                                                                                                                                                                                                                                      Sysdig provides a number of pre-built dashboards, designed around various supported applications, network topologies, infrastructure layouts, and services. These can be used to jump-start the dashboard building process, as templates for further configuration.

Templates come with a series of panels already configured, based on the information most relevant to users. The example below uses the Container dashboard template:

The default dashboard includes number panels for CPU and memory usage and for total, inbound, and outbound network bytes. It also includes line graphs comparing inbound and outbound network bytes, as well as byte usage by application/port, by process, and by host.

                                                                                                                                                                                                                                                                                                                                                                      To learn more, see Dashboard Templates.

                                                                                                                                                                                                                                                                                                                                                                      5.4.3 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Dashboard Layout

                                                                                                                                                                                                                                                                                                                                                                      Configure Full Screen

                                                                                                                                                                                                                                                                                                                                                                      To view the current dashboard in full-screen mode:

                                                                                                                                                                                                                                                                                                                                                                      Click the Settings (three dots) icon for the dashboard, and select the Full Screen option:

                                                                                                                                                                                                                                                                                                                                                                      Dashboards cannot be configured in full-screen mode. They are read-only until the full-screen mode is exited.

To exit full-screen mode, either press the Esc key or click the Exit (cross) icon.

                                                                                                                                                                                                                                                                                                                                                                      Configure Panel Size

                                                                                                                                                                                                                                                                                                                                                                      Configure Individual Panels

To resize an individual panel, move the mouse cursor over the bottom right corner of the panel until the diagonal resize cursor appears. Press and hold the left mouse button, then drag to increase or decrease the size of the panel. To keep the changes, click the Save Layout link; to discard them, click the Revert Changes link.

                                                                                                                                                                                                                                                                                                                                                                      Configure All Panels

                                                                                                                                                                                                                                                                                                                                                                      To configure the size of every panel in the dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Dashboards tab, select the relevant dashboard from the left-hand panel.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Settings (three dots) icon for the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      3. Select Layout to open the drop-down menu.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the desired panel size.

                                                                                                                                                                                                                                                                                                                                                                      5. If the new size is correct, click the Save Layout link. Otherwise, select Revert Changes.

                                                                                                                                                                                                                                                                                                                                                                        Configuring this setting overrides all custom panel sizes.

                                                                                                                                                                                                                                                                                                                                                                      Move Panels

To move a panel to a new position in the dashboard, move the mouse cursor over the top of the panel until the hand cursor appears. Press and hold the left mouse button, then drag the panel to its new position. Click the Save Layout link to save the changes, or click the Revert Changes link to discard them.

5.4.4 - Delete a Dashboard

The owner or the administrator of a shared dashboard can delete it. A user who duplicates that dashboard becomes the owner of the copy and can delete the copy freely.

                                                                                                                                                                                                                                                                                                                                                                      For information on access rights, see Access Levels in Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      To delete an existing dashboard:

1. On the Dashboards tab, select the relevant dashboard from the left-hand panel.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Settings (three dots) icon for the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      3. Select Delete Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      4. Click the Yes, Delete the Dashboard button to confirm the change.

5.5 - Configure Panels

Learn more about panel types and about creating and managing panels in the following sections:

5.5.1 - Create a New Panel

Sysdig Monitor supports both form-based and PromQL-based queries. You run a query, and Sysdig Monitor builds a dashboard that you can customize according to your preferences.

                                                                                                                                                                                                                                                                                                                                                                      To create a new panel, you can do one of the following:

                                                                                                                                                                                                                                                                                                                                                                      • Create a new dashboard.

                                                                                                                                                                                                                                                                                                                                                                        When you create a new dashboard, it opens to a pre-built panel. You can run a new query and build the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      • Use a dashboard template.

  Dashboard templates are immutable: they can’t be edited, and their scope is fixed. You can copy a template and customize the copy as desired. See Dashboard Templates.

                                                                                                                                                                                                                                                                                                                                                                      • Add a new panel to an existing dashboard.

                                                                                                                                                                                                                                                                                                                                                                      • For a PromQL panel, use the Translate to PromQL option.

                                                                                                                                                                                                                                                                                                                                                                      To create a new panel:

1. On the Dashboards tab, select the relevant dashboard from the drop-down.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Add Panel icon.

  The default panel editor opens.

                                                                                                                                                                                                                                                                                                                                                                      3. Set up the panel:

                                                                                                                                                                                                                                                                                                                                                                        1. Build either a form-based query or a PromQL-based query.

                                                                                                                                                                                                                                                                                                                                                                        2. Define right and left Y-axes.

                                                                                                                                                                                                                                                                                                                                                                        3. Define the legend.

                                                                                                                                                                                                                                                                                                                                                                        4. Specify a unique title and a brief description for the panel, and enter a custom message to report no data.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Save to save the changes.

                                                                                                                                                                                                                                                                                                                                                                      Building a Form-Based Query

                                                                                                                                                                                                                                                                                                                                                                      Each type of visualization has different settings and the query fields are determined by the type. For demonstration purposes, this topic explains the steps to create a Line chart.

1. On the Dashboards tab, click the Add Dashboard (+) icon.

  The default panel editor opens.

                                                                                                                                                                                                                                                                                                                                                                      2. Select a visualization type. To do so, click the Timechart tab.

                                                                                                                                                                                                                                                                                                                                                                        For more information on types of visualization, see Types of Panels.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the appropriate time presets from the time navigation.

4. Select a metric from the drop-down.

  You can either scroll through the list or type the first few letters of the metric name. As you type, the drop-down lists the matching entries.

                                                                                                                                                                                                                                                                                                                                                                      5. Specify Time Aggregation and Group Rollup.

6. Specify an appropriate segmentation.

  You can enter the number of entities to display and the order in which they appear in the legend.

  This step is not applicable to Number panels.

                                                                                                                                                                                                                                                                                                                                                                      7. Specify the display text in the Display field.

  The text appears as a title for the legend.


8. (Optional) Specify the scope for the panel you are creating.

  You can either inherit the dashboard scope as is or apply a scope to one or all of the queries.

9. Specify the unit of scale and the display format for the Y-axis.

                                                                                                                                                                                                                                                                                                                                                                        This option is currently available only for Timeseries panels.

10. Choose how to display missing data on the dashboard.

  You can display missing data as a gap, a zero value, a dotted line, or a solid line in the graph. See Display Missing Data.

11. (Optional) Compare the data against historical data.

  Comparing metrics against historical data is not supported when segmentation is applied.
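Steps 5 and 6 above can be illustrated with a minimal Python sketch. It assumes that time aggregation collapses each entity's samples in the selected time window into one value, and that group rollup then combines those values for the entities that share a segmentation label. The sample data, the function names, and the choice of average and sum are hypothetical and used only for illustration; they do not describe how Sysdig Monitor computes its results internally.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (entity, segment label, value). All names are illustrative.
SAMPLES = [
    ("host-a", "cluster-1", 10.0),
    ("host-a", "cluster-1", 30.0),
    ("host-b", "cluster-1", 20.0),
    ("host-b", "cluster-1", 40.0),
    ("host-c", "cluster-2", 5.0),
    ("host-c", "cluster-2", 15.0),
]


def time_aggregate(samples, agg=mean):
    """Collapse each entity's samples in the time window into a single value."""
    per_entity = defaultdict(list)
    for entity, segment, value in samples:
        per_entity[(entity, segment)].append(value)
    return {key: agg(values) for key, values in per_entity.items()}


def group_rollup(per_entity, agg=sum):
    """Combine the per-entity values that share the same segment label."""
    per_segment = defaultdict(list)
    for (_entity, segment), value in per_entity.items():
        per_segment[segment].append(value)
    return {segment: agg(values) for segment, values in per_segment.items()}


def top_n(per_segment, n, descending=True):
    """Keep n segments, ordered by value, as a legend might list them."""
    ranked = sorted(per_segment.items(), key=lambda item: item[1], reverse=descending)
    return ranked[:n]


per_entity = time_aggregate(SAMPLES)    # average over time, per host
per_segment = group_rollup(per_entity)  # sum across hosts, per cluster
print(top_n(per_segment, n=1))
```

Swapping `agg` for `max` or `min` in either function shows how a different Time Aggregation or Group Rollup choice changes the value plotted for each segment.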

                                                                                                                                                                                                                                                                                                                                                                      Building a PromQL Query

                                                                                                                                                                                                                                                                                                                                                                      To run a PromQL query:

                                                                                                                                                                                                                                                                                                                                                                      1. Do one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • Click Add Dashboard if you are creating a new dashboard.

                                                                                                                                                                                                                                                                                                                                                                        • Click Add Panel if you are adding a new panel to an existing Dashboard.

2. Click the PromQL button.

  The PromQL panel appears.

3. Enter the query in the PromQL field.

  The example query for this step calculates the rate of heap memory released, in bytes, over a 5-minute interval, and then totals that rate for each Kubernetes cluster.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the desired time window.

5. Specify a descriptive title for the legend and a name for the time series.

  You can specify a variable. In the legend, the variable name is replaced with the Kubernetes cluster names.

                                                                                                                                                                                                                                                                                                                                                                      6. Specify the unit for incoming data and how it should be displayed.


                                                                                                                                                                                                                                                                                                                                                                        For example, you can specify the incoming data to be gathered in kilobytes and displayed as megabytes.

Also, choose where the Y-axis is placed on the graph. When the panel has additional queries, you can place an additional Y-axis on the graph.

                                                                                                                                                                                                                                                                                                                                                                      7. Determine how to display null data on the dashboard.

You can display missing data as a gap, a zero value, a dotted line, or a solid line in the graph. See Display Missing Data.

                                                                                                                                                                                                                                                                                                                                                                      8. Click Save to save the changes.
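The calculation described in the example query (a rate over a 5-minute interval, then a total per Kubernetes cluster) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the sample data, and the cluster and instance names are hypothetical, and the rate is a simple first-to-last difference, which is not how a real query engine computes it.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute interval from the example


def simple_rate(samples, window_seconds=WINDOW_SECONDS):
    """Per-second rate of change over the trailing window.

    `samples` is a list of (timestamp_seconds, value) pairs sorted by time.
    Simplification (assumption): no reset handling and no extrapolation.
    """
    if not samples:
        return 0.0
    newest_ts = samples[-1][0]
    in_window = [(ts, v) for ts, v in samples if newest_ts - ts <= window_seconds]
    if len(in_window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    if t1 == t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


def total_rate_per_cluster(series):
    """Sum the per-series rates for each cluster.

    `series` maps (cluster_name, instance_name) to a list of samples.
    """
    totals = defaultdict(float)
    for (cluster, _instance), samples in series.items():
        totals[cluster] += simple_rate(samples)
    return dict(totals)


# Hypothetical data: two instances in "prod", one in "staging".
series = {
    ("prod", "app-1"): [(0, 0.0), (150, 1500.0), (300, 3000.0)],
    ("prod", "app-2"): [(0, 0.0), (300, 6000.0)],
    ("staging", "app-3"): [(0, 1000.0), (300, 1600.0)],
}
totals = total_rate_per_cluster(series)
```

Each key in `totals` corresponds to one line on the chart, which is why a variable in the legend can be replaced with the cluster name.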

                                                                                                                                                                                                                                                                                                                                                                      5.5.2 -

                                                                                                                                                                                                                                                                                                                                                                      Types of Panels

                                                                                                                                                                                                                                                                                                                                                                      This topic introduces you to the types of panels in the New Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.1 -

                                                                                                                                                                                                                                                                                                                                                                      Timechart Panel

                                                                                                                                                                                                                                                                                                                                                                      A Timechart is a graph produced by applying statistical aggregation to a label over an interval. The X-axis of a timechart will always be time.

Timecharts allow you to see the change in a metric value over time. The amount of data visualized on a graph depends on the time range selected in the Dashboard. You can aggregate metrics from multiple sources into a single line, or graph one line per combination of segment labels.

Time aggregation: For example, the average value of the cpu.used.percent metric is computed for each entity over 1 hour at 1-minute intervals.

                                                                                                                                                                                                                                                                                                                                                                      Group Rollup: For each host.hostName the values from time aggregation are averaged over the scope and the top 10 segments are shown on the chart.
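The two stages above can be sketched in Python. This is a minimal illustration, not the product's implementation: the function names, the entity names (`c1`, `c2`, `c3`), the host names, and the rule used to rank the top segments (overall average) are all assumptions.

```python
from collections import defaultdict
from statistics import mean

INTERVAL_SECONDS = 60  # 1-minute intervals, as in the example


def time_aggregate(samples, interval=INTERVAL_SECONDS):
    """Time aggregation: average one entity's samples per interval.

    `samples` is a list of (timestamp_seconds, value) pairs.
    Returns {interval_start: average_value}.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % interval].append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def group_rollup(per_entity, entity_host, top_n=10):
    """Group rollup: average the entities on each host per interval,
    then keep the top_n hosts ranked by their overall average (assumption)."""
    by_host = defaultdict(lambda: defaultdict(list))
    for entity, series in per_entity.items():
        for start, value in series.items():
            by_host[entity_host[entity]][start].append(value)
    rolled = {
        host: {start: mean(vals) for start, vals in sorted(points.items())}
        for host, points in by_host.items()
    }
    ranked = sorted(
        rolled, key=lambda h: mean(list(rolled[h].values())), reverse=True
    )
    return {host: rolled[host] for host in ranked[:top_n]}


# Hypothetical cpu.used.percent samples for three entities on two hosts.
raw = {
    "c1": [(0, 10.0), (30, 20.0), (60, 30.0)],
    "c2": [(0, 30.0), (60, 50.0)],
    "c3": [(0, 5.0), (60, 7.0)],
}
entity_host = {"c1": "host-a", "c2": "host-a", "c3": "host-b"}

per_entity = {entity: time_aggregate(s) for entity, s in raw.items()}
top = group_rollup(per_entity, entity_host, top_n=1)
```

With `top_n=1`, only the busiest host is kept. The example in the text keeps the top 10 segments.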

Currently, the only supported panel type for time series is the Line chart.

                                                                                                                                                                                                                                                                                                                                                                      Line Chart

The Line panel shows change over time in a selected window. Time is plotted on the horizontal axis and the measured value is plotted on the vertical axis.

The image below shows the trend of resource consumption of the top resource-hogging hosts over the last hour.

                                                                                                                                                                                                                                                                                                                                                                      Configure Line Chart

                                                                                                                                                                                                                                                                                                                                                                      For information on configuring a chart, see Create a New Panel.

                                                                                                                                                                                                                                                                                                                                                                      Stacked Area

                                                                                                                                                                                                                                                                                                                                                                      An area chart is distinguished from a line chart by the addition of shading between lines.

                                                                                                                                                                                                                                                                                                                                                                      For information on configuring a chart, see Create a New Panel.

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.2 -

                                                                                                                                                                                                                                                                                                                                                                      Number Panel

Number panels allow you to view a single value for a given entity and, optionally, compare the current value to historical values. Use the Number panel when the number is the most important aspect of the metric you’re trying to display, such as unique visitors to a website.

Do not use this panel to see a trend. Instead, use it when you need to see the average of a value over the given time range. It is also useful for counting entities, such as the number of nodes in a cluster.

                                                                                                                                                                                                                                                                                                                                                                      For information on configuring a panel, see Create a New Panel.

                                                                                                                                                                                                                                                                                                                                                                      Major Features

                                                                                                                                                                                                                                                                                                                                                                      • The default preset for the Number visualization is 1 hour.

                                                                                                                                                                                                                                                                                                                                                                      • The global default values for the threshold are overridable. The new value can be reset back to the global default.

                                                                                                                                                                                                                                                                                                                                                                      • A comparison between two threshold values determines color-coding directions.

                                                                                                                                                                                                                                                                                                                                                                      • The Compare To functionality can be toggled between enabled and disabled.

• When the Compare To value is set, the preview is updated accordingly, showing the comparison value and an arrow denoting whether the metric has increased or decreased.

                                                                                                                                                                                                                                                                                                                                                                      • The unit displayed for Thresholds is determined by the query.
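The Compare To and threshold behavior listed above can be sketched in Python. This is a hedged illustration only: the function names, the `warning` and `critical` threshold names, the color names, and the exact boundary rules are assumptions, not the product's actual logic.

```python
def compare_to(current, previous):
    """Return (delta, arrow) comparing the current value to a historical one."""
    delta = current - previous
    if delta > 0:
        arrow = "up"
    elif delta < 0:
        arrow = "down"
    else:
        arrow = "flat"
    return delta, arrow


def threshold_color(value, warning, critical):
    """Pick a color for a value.

    The direction is inferred by comparing the two thresholds (assumption):
    if critical >= warning, higher values are worse; otherwise lower
    values are worse.
    """
    if critical >= warning:
        if value >= critical:
            return "red"
        if value >= warning:
            return "yellow"
        return "green"
    if value <= critical:
        return "red"
    if value <= warning:
        return "yellow"
    return "green"


# Higher is worse, for example CPU usage in percent.
cpu_color = threshold_color(85, warning=70, critical=90)

# Lower is worse, for example free disk space in percent.
disk_color = threshold_color(5, warning=20, critical=10)

# Current value compared with a hypothetical historical value.
delta, arrow = compare_to(120, 100)
```

Swapping the order of the two thresholds flips the direction of the color-coding without any separate setting.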

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.3 -

                                                                                                                                                                                                                                                                                                                                                                      Table Panel

The Table panel displays metric data in tabular form, so you can review metric values and their associated labels in a single view. Use Table panels for quantitative analysis where you need actual values rather than visual representations. As in a spreadsheet, you can look at a combination of metric values and their segments. This is useful when you don’t need to track the change in a metric over time, or when you want to run reports and download them as CSV or JSON for offline analysis.

The panel displays the value returned by the metric query specified in the Query tab. The value is determined by the data source and the query. Each data point has an associated row, and you can add columns for additional metric values.

                                                                                                                                                                                                                                                                                                                                                                      Configuring Table Panel

Major features include, but are not limited to:

                                                                                                                                                                                                                                                                                                                                                                      • Queries

                                                                                                                                                                                                                                                                                                                                                                        • The first query you build cannot be removed.

  • When you build subsequent queries, you can remove any of them except the first one.

                                                                                                                                                                                                                                                                                                                                                                        • Changing the unit of the query changes the unit in the table as well.

                                                                                                                                                                                                                                                                                                                                                                        • Changing the display format on the query reflects on the row values.

                                                                                                                                                                                                                                                                                                                                                                      • Segmentation

                                                                                                                                                                                                                                                                                                                                                                        • The segmentation label determines the column name.

                                                                                                                                                                                                                                                                                                                                                                        • The segmentation in conjunction with metric values determines the values displayed on the rows.

                                                                                                                                                                                                                                                                                                                                                                      • Scope

  • The selected scope determines the values displayed on the table.

• Metric / Labels Columns

  • Adding a new query inserts a new column with the name of the metric as the column heading.

                                                                                                                                                                                                                                                                                                                                                                        • Metric values in conjunction with segmentation determine the values displayed on the rows.

                                                                                                                                                                                                                                                                                                                                                                      • Sorting

  • Column sorting is based on the selected column header and the sort order (ascending or descending).

                                                                                                                                                                                                                                                                                                                                                                        • When another column is sorted, the table is resorted by that column, resetting the previous sorting.

                                                                                                                                                                                                                                                                                                                                                                      • Resizing

                                                                                                                                                                                                                                                                                                                                                                        • Grab the header column by the borderline to resize the columns.

  • Resizing the browser window does not reset the width of columns that you have resized.

                                                                                                                                                                                                                                                                                                                                                                        • When resizing the browser window, table columns are resized to cover the full width. An exception is when you have already resized columns. In such cases, other columns that you have not resized are resized on browser window resize.

                                                                                                                                                                                                                                                                                                                                                                        • The last column in the table is not resizable.

                                                                                                                                                                                                                                                                                                                                                                      • Export

                                                                                                                                                                                                                                                                                                                                                                        • The table by default shows a maximum of 50 rows.

                                                                                                                                                                                                                                                                                                                                                                        • Clicking on Export all results… below the table opens the Export Data window.

  • Export data in either JSON or CSV format to a file. The default name of the file is the panel name. You can rename the file.
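The sorting, row limit, and export behavior listed above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names, the sample rows, the panel name, and the file extension appended to the default filename are hypothetical and do not describe the product's code.

```python
import csv
import io
import json

MAX_VISIBLE_ROWS = 50  # default number of rows shown in the table


def sort_rows(rows, column, descending=False):
    """Sort by a single column; a new sort replaces the previous one."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def visible_rows(rows, limit=MAX_VISIBLE_ROWS):
    """Rows shown in the panel; an export still includes all rows."""
    return rows[:limit]


def export_rows(rows, panel_name, fmt="csv", filename=None):
    """Serialize all rows as CSV or JSON and return (filename, text).

    The default filename is the panel name; adding the format as an
    extension is an assumption made for this sketch.
    """
    fmt = fmt.lower()
    if fmt not in ("csv", "json"):
        raise ValueError("format must be 'csv' or 'json'")
    name = filename or f"{panel_name}.{fmt}"
    if fmt == "json":
        return name, json.dumps(rows, indent=2)
    buffer = io.StringIO()
    fieldnames = list(rows[0].keys()) if rows else []
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return name, buffer.getvalue()


# Hypothetical table: one segment column and one metric column, 60 rows.
rows = [
    {"host.hostName": f"host-{i}", "cpu.used.percent": float(i)}
    for i in range(60)
]
sorted_rows = sort_rows(rows, "cpu.used.percent", descending=True)
shown = visible_rows(sorted_rows)
csv_name, csv_text = export_rows(sorted_rows, "CPU by host", fmt="csv")
json_name, json_text = export_rows(sorted_rows, "CPU by host", fmt="json")
```

Here `shown` holds only the first 50 rows, while both exports contain all 60.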

For information on configuring a panel, see Create a New Panel.

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.4 -

                                                                                                                                                                                                                                                                                                                                                                      Text

                                                                                                                                                                                                                                                                                                                                                                      The example below uses a text panel as a reminder list of the testing steps for a procedure.

                                                                                                                                                                                                                                                                                                                                                                      Text Panel Markdown

                                                                                                                                                                                                                                                                                                                                                                      Headers

                                                                                                                                                                                                                                                                                                                                                                      # H1
                                                                                                                                                                                                                                                                                                                                                                      ## H2
                                                                                                                                                                                                                                                                                                                                                                      ### H3
                                                                                                                                                                                                                                                                                                                                                                      #### H4
                                                                                                                                                                                                                                                                                                                                                                      ##### H5
                                                                                                                                                                                                                                                                                                                                                                      ###### H6
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      H1
                                                                                                                                                                                                                                                                                                                                                                      ======
                                                                                                                                                                                                                                                                                                                                                                      H2
                                                                                                                                                                                                                                                                                                                                                                      ------
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Emphasis

                                                                                                                                                                                                                                                                                                                                                                      *italics* or _italics_
                                                                                                                                                                                                                                                                                                                                                                      **bold** or __bold__
                                                                                                                                                                                                                                                                                                                                                                      **combined _emphasis_**
                                                                                                                                                                                                                                                                                                                                                                      ~~strikethrough~~
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Lists

                                                                                                                                                                                                                                                                                                                                                                      1. First ordered list item
                                                                                                                                                                                                                                                                                                                                                                      2. Second item
                                                                                                                                                                                                                                                                                                                                                                        * Unordered sub-list.
                                                                                                                                                                                                                                                                                                                                                                          Sub-paragraph within the list item.
                                                                                                                                                                                                                                                                                                                                                                      1. Third item
                                                                                                                                                                                                                                                                                                                                                                        8. First ordered sub-list item.
                                                                                                                                                                                                                                                                                                                                                                      103. Fourth item
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      General guidelines:

• The list item number does not matter. As shown in the example above, the formatting, not the numbering, defines the lists.

                                                                                                                                                                                                                                                                                                                                                                      • List items can contain properly indented paragraphs, using white space.

• Unordered lists can use *, -, or + as the item marker.

                                                                                                                                                                                                                                                                                                                                                                      Linebreaks

                                                                                                                                                                                                                                                                                                                                                                      This is the first sentence.
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      This line is separated from the one above by two newlines, so it will be a *separate paragraph*.
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      This line is also a separate paragraph.
                                                                                                                                                                                                                                                                                                                                                                      This line is only separated by a single newline, so it's a separate line in the *same paragraph*.
                                                                                                                                                                                                                                                                                                                                                                      

Trailing spaces can be used to insert a line break without creating a new paragraph. This differs from the typical GFM line break behavior, which does not require trailing spaces.

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.5 -

                                                                                                                                                                                                                                                                                                                                                                      Toplist

A Toplist chart displays a specified number of entities, such as containers, with the highest or lowest values of a metric. This is useful for ranking metric values in order, for example, to find the hosts that run the highest number of pods or the highest consumers of CPU or memory in your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      Major Features

                                                                                                                                                                                                                                                                                                                                                                      • Toplist supports executing multiple queries.

                                                                                                                                                                                                                                                                                                                                                                      • Segmentation is supported for all queries.

                                                                                                                                                                                                                                                                                                                                                                      • Text displayed on the bars in the chart is based on queries and segmentation.

                                                                                                                                                                                                                                                                                                                                                                        • If there is a single query without segmentation, the query name is displayed.

  • If there is a single query and multiple segmentations are selected, the segmentation texts are displayed, separated by a > sign.

                                                                                                                                                                                                                                                                                                                                                                        • If there are multiple queries, the query name is displayed on the bar.

                                                                                                                                                                                                                                                                                                                                                                      Segmentation

You can use multiple objects to segment a single metric simultaneously. For example, you can segment cpu.used.percent by kubernetes.cluster.name, kubernetes.namespace.name, and kubernetes.deployment.name.

In this example, deployments are listed in order of resource consumption. Use Display to toggle between descending (Top) and ascending (Bottom) order.
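The ranking and labeling rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Sysdig Monitor implements Toplist: the `toplist` function, its `limit` and `display` parameters, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: each key holds the segmentation values
# (cluster, namespace, deployment); each value is cpu.used.percent.
Samples = Dict[Tuple[str, ...], float]


def toplist(samples: Samples, limit: int, display: str = "top") -> List[Tuple[str, float]]:
    """Rank segments by metric value and return at most `limit` rows."""
    if display not in ("top", "bottom"):
        raise ValueError("display must be 'top' or 'bottom'")
    # "top" sorts in descending order, "bottom" in ascending order.
    ranked = sorted(samples.items(), key=lambda item: item[1], reverse=(display == "top"))
    # The bar label joins the segmentation values with a ">" sign.
    return [(" > ".join(labels), value) for labels, value in ranked[:limit]]


cpu_used_percent: Samples = {
    ("prod", "shop", "checkout"): 71.5,
    ("prod", "shop", "cart"): 12.0,
    ("prod", "infra", "ingress"): 38.2,
}

top_two = toplist(cpu_used_percent, limit=2)
bottom_one = toplist(cpu_used_percent, limit=1, display="bottom")
```

Here `top_two` holds the two busiest deployments and `bottom_one` the least busy one, each labeled with its cluster, namespace, and deployment.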

                                                                                                                                                                                                                                                                                                                                                                      5.5.2.6 -

                                                                                                                                                                                                                                                                                                                                                                      Histogram

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor handles three types of Histograms:

• Histogram panel type on the Dashboard: Histogram panels allow you to visualize the distribution of metric values across a large collection of data. Select a segmentation and, optionally, the number of buckets.

  Use Histogram for any metric, Sysdig native or custom, counter or gauge, segmented by a dimension/label. The histogram panel helps you understand how values are distributed across different segments. For example, CPU usage percent by pod across your cluster gives you the aggregated value over the selected time range.

• Legacy Prometheus histogram collection: This implementation of legacy Prometheus histograms is deprecated as of the SaaS 3.2.6 release.

                                                                                                                                                                                                                                                                                                                                                                        To create a Histogram, use the Prometheus integration to collect histogram metrics and use the PromQL panel with the histogram_quantile function.

• Prometheus histograms (collected as raw metrics): The new Prometheus histogram replaces the legacy Prometheus histogram collection. You can natively collect histogram metrics and use a timechart to visualize them.

                                                                                                                                                                                                                                                                                                                                                                        For example, run the following query to build a timechart:

                                                                                                                                                                                                                                                                                                                                                                        sum(histogram_metrics_bucket{kubernetes_cluster_name="prod"}) by (le)
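
The query above returns one cumulative count per le (upper bound) bucket. As a rough illustration of how a quantile can be estimated from such buckets, the Python sketch below applies the linear interpolation that the Prometheus documentation describes for histogram_quantile. It is a simplified sketch, not the Prometheus or Sysdig source code: it assumes non-negative bucket bounds and a final +Inf bucket, and the function name and sample counts are hypothetical.

```python
import math
from typing import List, Tuple


def quantile_from_buckets(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the highest bucket must have le = +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # Cannot interpolate inside the +Inf bucket: fall back to the
        # upper bound of the highest finite bucket, if there is one.
        return buckets[index - 1][0] if index > 0 else math.nan
    # Assumption: the lowest bucket starts at 0.
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == below:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative counts per le bucket.
sample_buckets = [(0.1, 10.0), (0.5, 60.0), (1.0, 90.0), (math.inf, 100.0)]
median = quantile_from_buckets(0.5, sample_buckets)
p95 = quantile_from_buckets(0.95, sample_buckets)
```

Because the estimate interpolates inside a bucket, its accuracy depends on how finely the bucket boundaries are chosen around the values of interest.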
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      5.5.3.1 -

                                                                                                                                                                                                                                                                                                                                                                      Create Panel Alerts

                                                                                                                                                                                                                                                                                                                                                                      Alerts can be created directly from a form-based panel in a New Dashboard. If the panel has more than one query, you must select the query to use as the base for the alert.

                                                                                                                                                                                                                                                                                                                                                                      To create an alert:

                                                                                                                                                                                                                                                                                                                                                                      1. Click the More Options (three dots) icon.

                                                                                                                                                                                                                                                                                                                                                                      2. Select Create Alert.

                                                                                                                                                                                                                                                                                                                                                                      3. Configure the alert, and click the Create button.

                                                                                                                                                                                                                                                                                                                                                                      5.5.3.2 -

                                                                                                                                                                                                                                                                                                                                                                      Export Panel Data

Table and Timechart panels in New Dashboard allow exporting data to a CSV or JSON file. You can use this file as a backup of your data or process it programmatically.
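
As a minimal sketch of programmatic use, the Python function below loads an exported file with the standard library only. The layout of the exported file is not described here, so the code makes no assumptions about column or field names. The `load_export` helper is hypothetical, and it assumes the file name ends in .csv or .json.

```python
import csv
import json
from pathlib import Path
from typing import Any


def load_export(path: str) -> Any:
    """Load an exported panel file, choosing the parser by file extension."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header row of the file.
        with file.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        # The parsed structure is returned unchanged.
        with file.open(encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError(f"unsupported export format: {suffix or 'no extension'}")
```

Note that the CSV reader returns every value as a string, so convert numeric columns before doing calculations.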

                                                                                                                                                                                                                                                                                                                                                                      You can export data using the following:

                                                                                                                                                                                                                                                                                                                                                                      • Panel menu in the New Dashboard

                                                                                                                                                                                                                                                                                                                                                                      • Table panel

                                                                                                                                                                                                                                                                                                                                                                      To export while creating or editing a Table panel:

                                                                                                                                                                                                                                                                                                                                                                      1. Select Table from the Visualization type.

                                                                                                                                                                                                                                                                                                                                                                        The panel opens to the Columns tab.

                                                                                                                                                                                                                                                                                                                                                                      2. Below the table, click Export all results….

                                                                                                                                                                                                                                                                                                                                                                        The Export Data window is displayed.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the format.

                                                                                                                                                                                                                                                                                                                                                                      4. Specify a filename.

                                                                                                                                                                                                                                                                                                                                                                        The default name of the file is the panel name. You can rename the file that you are about to download.

                                                                                                                                                                                                                                                                                                                                                                      5. Click Export to save the data into the file.

                                                                                                                                                                                                                                                                                                                                                                        Exporting might take several minutes to complete.
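
For programmatic use, the exported file can be read with standard tooling. The following is a minimal Python sketch, not part of Sysdig itself. It assumes the file extension matches the format selected in the Export Data window, that a CSV export starts with a header row, and that the file is UTF-8 encoded. The function name load_export and the file name in the usage note are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_export(path):
    """Load an exported panel file and return its parsed contents.

    Assumptions (not confirmed by the documentation): the extension
    reflects the chosen format, CSV exports have a header row, and the
    file is UTF-8 encoded.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Each row becomes a dict keyed by the header row; values stay strings.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        # The JSON structure is returned as-is; its schema is not assumed here.
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError(f"Unsupported export format: {suffix!r}")
```

For example, load_export("my-panel.csv") would return one dictionary per exported row. Inspect the first few records before relying on specific column names, because the columns depend on how the panel is configured.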

                                                                                                                                                                                                                                                                                                                                                                      5.5.3.3 -

Copy Panels to a Different Dashboard

                                                                                                                                                                                                                                                                                                                                                                      Copy a Single Panel

                                                                                                                                                                                                                                                                                                                                                                      To copy a single panel to a different dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Explore tab, select the desired drill-down view.

                                                                                                                                                                                                                                                                                                                                                                      2. Hover over the desired panel, select the Settings (ellipsis) icon, and select Copy Panel.

                                                                                                                                                                                                                                                                                                                                                                      3. Open the drop-down menu and select the desired dashboard, or use the text-field to search through existing dashboards.

                                                                                                                                                                                                                                                                                                                                                                        To copy the panel to a new dashboard, enter a name for the new dashboard in the text-field instead.

                                                                                                                                                                                                                                                                                                                                                                      4. Click the Copy and Open button to save the changes and navigate to the configured dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Copy All Panels

                                                                                                                                                                                                                                                                                                                                                                      To copy all panels in a drill-down view to a dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Explore tab, select the desired drill-down view.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the More Options (three dots) icon.

3. Select Copy to Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      4. Open the drop-down menu and select the desired dashboard, or use the text-field to search through existing dashboards.

  To copy the panels to a new dashboard, enter a name for the new dashboard in the text-field instead.

                                                                                                                                                                                                                                                                                                                                                                      5. Click the Copy and Open button to save the changes and navigate to the configured dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Create a Panel Alert

                                                                                                                                                                                                                                                                                                                                                                      Alerts can be created directly from a dashboard panel:

                                                                                                                                                                                                                                                                                                                                                                      1. Click the More Options (three dots) icon.

2. Select Create Alert.

                                                                                                                                                                                                                                                                                                                                                                      3. Configure the alert, and click the Create button.

                                                                                                                                                                                                                                                                                                                                                                      5.5.3.4 -

                                                                                                                                                                                                                                                                                                                                                                      Duplicate a Panel

                                                                                                                                                                                                                                                                                                                                                                      Hover over the desired panel, click the Settings (ellipsis) icon, and select Duplicate Panel.

                                                                                                                                                                                                                                                                                                                                                                      5.5.3.5 -

                                                                                                                                                                                                                                                                                                                                                                      Delete an Existing Panel

                                                                                                                                                                                                                                                                                                                                                                      To delete a panel from a dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. Hover over the desired panel, click the Settings (ellipsis) icon, and select Delete Panel.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Yes, delete panel button to confirm, or the Cancel button to keep the panel.

                                                                                                                                                                                                                                                                                                                                                                      5.6 -

                                                                                                                                                                                                                                                                                                                                                                      Managing Dashboards

                                                                                                                                                                                                                                                                                                                                                                      This section helps you effectively use dashboards and share them with your team.

                                                                                                                                                                                                                                                                                                                                                                      5.6.1 -

Dashboard Types

Dashboards are organized into the following main categories:

                                                                                                                                                                                                                                                                                                                                                                      • My Favorites: The dashboards marked as favorites by the current user.

                                                                                                                                                                                                                                                                                                                                                                      • Shared By My Team: The dashboards created by other users in the team and shared with the current user.

                                                                                                                                                                                                                                                                                                                                                                      • Featured: A curated list of Kubernetes dashboards.

                                                                                                                                                                                                                                                                                                                                                                      • My Dashboards: The dashboards created by the current user.

                                                                                                                                                                                                                                                                                                                                                                      • Dashboard Templates: Out-of-the-box templates that you can copy and use. A dashboard created from a template inherits the template name.

                                                                                                                                                                                                                                                                                                                                                                      5.6.2 -

                                                                                                                                                                                                                                                                                                                                                                      Set a Default Dashboard

To set a default dashboard, configure the default entry point for a team. A shared entry point unifies the team’s Sysdig Monitor experience and directs users to the information most relevant to them. For more information on configuring a default entry point, refer to the Configure an Entry Page or Dashboard for a Team section of the Sysdig Platform documentation.

                                                                                                                                                                                                                                                                                                                                                                      5.6.3 -

Display Dashboard-Specific Events

Sysdig Monitor lets you configure a dashboard so that infrastructure events relevant to its panels are displayed within the panels themselves. This gives users a more in-depth view of the status of their environment. To configure how events are displayed:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Dashboard tab, select the relevant dashboard from the dashboard list.

2. Click the Dashboard Settings (three dots) icon and select Events Display.

3. Enable the Show Events slider to show events in the dashboard panels.

                                                                                                                                                                                                                                                                                                                                                                      4. Configure the available parameters, and click the Close button.

Option | Description
Filter | Defines specific events, or a scope of events, to display.
Scope | Determines whether the range of events displayed includes those for the dashboard scope or the team scope.
Severity | Determines whether only high severity events or all events are displayed.
Event Type | Determines which types of events are displayed. The supported event types are alert, custom events, containers, and Kubernetes.
Status | Determines the state of the events displayed. The supported statuses are Triggered, Resolved, Acknowledged, and Un-acknowledged.

                                                                                                                                                                                                                                                                                                                                                                      5.6.4 -

                                                                                                                                                                                                                                                                                                                                                                      Sharing New Dashboards

                                                                                                                                                                                                                                                                                                                                                                      Dashboards can be shared internally among team members, with other teams, within the wider organization, or publicly, by configuring a public URL for the dashboard.

As the owner of a dashboard, you can share it with any team and grant either Viewer or Collaborator access.

                                                                                                                                                                                                                                                                                                                                                                      Access Levels in Dashboard

RBAC-based permissions determine how users can interact with dashboards. They establish which capabilities are allowed or denied for a user or a team. For more information on RBAC rules, see RBAC Rules for Dashboards.

The table below summarizes the ways a dashboard can be shared and the effective permissions for users.

Sharing Method | Who can share/copy | Dashboard Instance | Team/User who has access | Can Read | Can Edit
Share with current Team | Dashboard Creator | Same dashboard instance | Current team members only | All members of the team | Edit users of the team
Share publicly as URL | Any Edit User of the team | Same dashboard instance | Anyone with URL (does not have to be a Sysdig user) | Anyone | Anyone with URL (does not have to be a Sysdig user) with Scope variables
Copy to My Teams | Any Edit User of the team | Duplicate copy of the dashboard | Current team members only | All members of the team | Edit users of the team

                                                                                                                                                                                                                                                                                                                                                                      Share a Dashboard with Teams

                                                                                                                                                                                                                                                                                                                                                                      Dashboards can be shared across a user’s current team or a selected set of teams, allowing other team members to view the dashboard, as well as edit the panels if they have edit permissions within the team.

                                                                                                                                                                                                                                                                                                                                                                      If a dashboard has been shared with another team, a user within that team can then copy it to make it their own if they wish.

                                                                                                                                                                                                                                                                                                                                                                      To share a dashboard:

                                                                                                                                                                                                                                                                                                                                                                      1. Select the dashboard you want to share.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Dashboard Settings (three dots) icon and select Dashboard Settings.

                                                                                                                                                                                                                                                                                                                                                                      3. In the Dashboard Settings page, use the Shared With drop-down.

                                                                                                                                                                                                                                                                                                                                                                      4. Select one of the three options:

  • Not Shared: The Dashboard is not shared with any team that the owner is a member of.

  • All Teams: The Dashboard is shared with all the teams that the owner is part of.

  • Selected Teams: The Dashboard is shared with a selected list of teams. Select a team from the available teams in the drop-down, and select the member permission:

                                                                                                                                                                                                                                                                                                                                                                          • View Only: This permission allows members to view the Dashboard.

                                                                                                                                                                                                                                                                                                                                                                          • Collaborator: A collaborator can edit the Dashboard.
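The three Shared With options can be pictured as a small data model. The following is a minimal sketch, not Sysdig's implementation: the class, field, and function names are hypothetical, and it assumes that an owner can only share with teams they belong to and that the All Teams option carries a single permission for every team.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    VIEW_ONLY = "View Only"
    COLLABORATOR = "Collaborator"


@dataclass
class SharingSetting:
    """Hypothetical model of the Shared With drop-down."""
    mode: str = "not_shared"  # "not_shared", "all_teams", or "selected_teams"
    # Used when mode == "all_teams": one permission applied to every team.
    all_teams_permission: Permission = Permission.VIEW_ONLY
    # Used when mode == "selected_teams": team name -> permission.
    selected: dict = field(default_factory=dict)


def team_permissions(setting, owner_teams):
    """Return {team: Permission} for the teams the dashboard is shared with."""
    if setting.mode == "not_shared":
        return {}
    if setting.mode == "all_teams":
        return {team: setting.all_teams_permission for team in owner_teams}
    if setting.mode == "selected_teams":
        # Assumption: teams the owner is not a member of are ignored.
        return {t: p for t, p in setting.selected.items() if t in owner_teams}
    raise ValueError(f"unknown sharing mode: {setting.mode!r}")
```

For example, an owner in teams "ops" and "dev" who selects "ops" as Collaborator ends up sharing the dashboard with "ops" only, with edit rights.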

                                                                                                                                                                                                                                                                                                                                                                      Enable Public Sharing

                                                                                                                                                                                                                                                                                                                                                                      Dashboards can be shared outside of the internal team by using public URLs. This allows external users to review the dashboard metrics while restricting access to changing panels and configurations.

The scope parameters, including scope variables, are included in the Dashboard URL. External users with a valid link can change the scope parameters without having to sign in, either in the UI or directly in the URL. The scope parameters are passed in the URL query string, which consists of a question mark, followed by the parameter name, an equal sign, and the parameter value. To edit a parameter in the URL, replace its value with the desired one.
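As a rough illustration of editing a scope parameter in the URL, the sketch below rewrites one query parameter with Python's standard urllib.parse module. The link and the parameter name "cluster" are hypothetical placeholders; actual public sharing URLs and scope parameter names will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_scope_parameter(url, name, value):
    """Return `url` with the query parameter `name` set to `value`."""
    parts = urlsplit(url)
    # Parse "?a=1&b=2" into a dict; a repeated name keeps its last value.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params[name] = value
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical public link, used only for illustration.
link = "https://example.com/public-dashboard/abc123?cluster=prod"
print(set_scope_parameter(link, "cluster", "staging"))
```

The rest of the URL is left untouched; only the value after the equal sign changes.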

                                                                                                                                                                                                                                                                                                                                                                      1. Select the dashboard you want to share.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Dashboard Settings (three dots) icon and select Dashboard Settings.

                                                                                                                                                                                                                                                                                                                                                                      3. In the Dashboard Settings page, enable the Public Sharing slider.

                                                                                                                                                                                                                                                                                                                                                                        When enabled, the dashboard is visible with scope parameters to anyone with the link. If this setting is disabled, the link will no longer work, and the setting will need to be re-enabled and shared again in order for the dashboard to be accessed.

                                                                                                                                                                                                                                                                                                                                                                      4. Copy the public sharing URL for sharing.

                                                                                                                                                                                                                                                                                                                                                                      5.6.4.1 -

                                                                                                                                                                                                                                                                                                                                                                      RBAC Rules for Dashboards

                                                                                                                                                                                                                                                                                                                                                                      The table below summarizes the role-based permissions.

                                                                                                                                                                                                                                                                                                                                                                      Owner Permissions

                                                                                                                                                                                                                                                                                                                                                                      Roles

                                                                                                                                                                                                                                                                                                                                                                      Owner Permissions

                                                                                                                                                                                                                                                                                                                                                                      User Roles

                                                                                                                                                                                                                                                                                                                                                                      Administrator

                                                                                                                                                                                                                                                                                                                                                                      A user owning a dashboard will now have three different team sharing options:

                                                                                                                                                                                                                                                                                                                                                                      • Not Shared

                                                                                                                                                                                                                                                                                                                                                                      • Share with all the teams that the owner is part of

                                                                                                                                                                                                                                                                                                                                                                      • Share with a selected list of teams

                                                                                                                                                                                                                                                                                                                                                                      For the last two options, the owner can pick the type of access: Collaborator (with edit rights) or View only.

                                                                                                                                                                                                                                                                                                                                                                      Regular User (non-administrator user)

                                                                                                                                                                                                                                                                                                                                                                      Team Roles

                                                                                                                                                                                                                                                                                                                                                                      Advanced user

                                                                                                                                                                                                                                                                                                                                                                      Standard user

                                                                                                                                                                                                                                                                                                                                                                      Team manager

                                                                                                                                                                                                                                                                                                                                                                      View-only user

                                                                                                                                                                                                                                                                                                                                                                      Not applicable.

                                                                                                                                                                                                                                                                                                                                                                      Owner Permissions

                                                                                                                                                                                                                                                                                                                                                                      When a user decides to share a dashboard with a set of teams, they’ll only be able to pick teams that they are members of.

                                                                                                                                                                                                                                                                                                                                                                      The table below summarizes what you can do with a shared dashboard.

                                                                                                                                                                                                                                                                                                                                                                      User Permissions

                                                                                                                                                                                                                                                                                                                                                                      User Permissions

                                                                                                                                                                                                                                                                                                                                                                      View Only

                                                                                                                                                                                                                                                                                                                                                                      Collaborator

                                                                                                                                                                                                                                                                                                                                                                      User Role

                                                                                                                                                                                                                                                                                                                                                                      Administrator

                                                                                                                                                                                                                                                                                                                                                                      Edit

                                                                                                                                                                                                                                                                                                                                                                      An admin can still edit a shared dashboard even if it's shared in view-only mode.

                                                                                                                                                                                                                                                                                                                                                                      Edit

                                                                                                                                                                                                                                                                                                                                                                      Regular User (non-administrator user)

                                                                                                                                                                                                                                                                                                                                                                      View Only

                                                                                                                                                                                                                                                                                                                                                                      Team Role

                                                                                                                                                                                                                                                                                                                                                                      Advanced user

Standard user

                                                                                                                                                                                                                                                                                                                                                                      Team manager

                                                                                                                                                                                                                                                                                                                                                                      View-only user

                                                                                                                                                                                                                                                                                                                                                                      View Only

                                                                                                                                                                                                                                                                                                                                                                      User Permissions
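The rules for shared dashboards can be read as a small decision function. This is a sketch under stated assumptions, not the product's actual logic: the function name and string values are hypothetical, it assumes that a View-only team role always results in View Only access, and it assumes that the other team roles follow the permission chosen by the owner.

```python
def effective_access(is_admin, team_role, shared_as):
    """Return "Edit" or "View Only" for a user opening a shared dashboard.

    is_admin:  True for the Administrator user role.
    team_role: the user's role in the team, for example "Advanced user".
    shared_as: "View Only" or "Collaborator", as chosen by the owner.
    """
    if is_admin:
        # An admin can edit even when the dashboard is shared view-only.
        return "Edit"
    if team_role == "View-only user":
        # Assumption: a view-only team role never gains edit rights.
        return "View Only"
    return "Edit" if shared_as == "Collaborator" else "View Only"
```

With these assumptions, a regular Advanced user sees a dashboard shared as View Only in read-only mode, while an administrator can still edit it.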

                                                                                                                                                                                                                                                                                                                                                                      5.6.4.2 -

                                                                                                                                                                                                                                                                                                                                                                      Transfer Dashboard Ownership

                                                                                                                                                                                                                                                                                                                                                                      Dashboards have a single owner. Sysdig Monitor allows administrators and dashboard owners with administrator permissions to transfer the ownership of a dashboard within the UI.

                                                                                                                                                                                                                                                                                                                                                                      There are several reasons for assigning a new owner to dashboards.

                                                                                                                                                                                                                                                                                                                                                                      • The dashboard owners are no longer in control of the dashboard data.

• Administrators need to update the dashboard settings or fix how data is displayed.

                                                                                                                                                                                                                                                                                                                                                                      General Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • When a user is deleted, any shared dashboards they own or have created will be preserved by default.

                                                                                                                                                                                                                                                                                                                                                                      • The administrator can transfer only the dashboards that are shared by other users. Private dashboards cannot be seen and therefore cannot be transferred.

                                                                                                                                                                                                                                                                                                                                                                      • Transferring ownership can only happen one dashboard at a time.

                                                                                                                                                                                                                                                                                                                                                                      • When editing a user, the administrator can specify to transfer dashboards to a new owner.

                                                                                                                                                                                                                                                                                                                                                                      • Before changing the dashboard ownership,

                                                                                                                                                                                                                                                                                                                                                                        • It is a good practice to ensure that the new owner is part of the team the previous owner is part of. The administrator can preview the teams that will no longer be part of before confirming the transfer.

                                                                                                                                                                                                                                                                                                                                                                          The new owner need not be part of any teams the previous owner was part of. In this case, the dashboard will be transferred to the new owner but will no longer be shared with any team. The dashboard will become a private dashboard.

                                                                                                                                                                                                                                                                                                                                                                        • A shared dashboard will be visible only to the teams that the new owner is not part of.
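
The sharing rule in the guidelines above can be pictured as a set intersection. The following is a minimal sketch of that rule only, not Sysdig code or a Sysdig API: the function name `teams_after_transfer` and the team names are hypothetical, and it assumes a dashboard stays shared only with the teams that both had access before and include the new owner.

```python
def teams_after_transfer(shared_with, new_owner_teams):
    """Model which teams keep or lose a dashboard after an ownership transfer.

    Hypothetical helper for illustration; it only mirrors the rule stated
    in the guidelines and does not call any Sysdig service.
    """
    shared = set(shared_with)
    owner_teams = set(new_owner_teams)
    kept = shared & owner_teams      # teams that still see the dashboard
    dropped = shared - owner_teams   # teams that lose access
    return kept, dropped


# Example with made-up team names.
kept, dropped = teams_after_transfer({"platform", "payments"}, {"payments", "sre"})
print(sorted(kept))     # ['payments']
print(sorted(dropped))  # ['platform']

# If no team is kept, the dashboard ends up private to the new owner.
print("private" if not kept else "shared")  # shared
```

When `kept` is empty, the model matches the case described above in which the dashboard becomes private after the transfer.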

                                                                                                                                                                                                                                                                                                                                                                      Transfer Ownership as an Admin

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to the Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      2. Select Settings > Users.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the user you want to change the ownership.

                                                                                                                                                                                                                                                                                                                                                                      4. Select one or multiple Dashboards that you want to assign a new owner.

                                                                                                                                                                                                                                                                                                                                                                      5. Click Transfer Ownership.

                                                                                                                                                                                                                                                                                                                                                                        The Transfer Dashboard Ownership page is displayed.

                                                                                                                                                                                                                                                                                                                                                                      6. Select a new user from the drop-down.

                                                                                                                                                                                                                                                                                                                                                                        If the user that you selected is not part of the teams that the Dashboard is shared with, you will see a prompt stating the Dashboard will be unshared with the teams that the new owner is not part of.

                                                                                                                                                                                                                                                                                                                                                                      7. If you are satisfied with the changes, click Transfer.

                                                                                                                                                                                                                                                                                                                                                                      Transfer Ownership as a User

                                                                                                                                                                                                                                                                                                                                                                      1. On the Dashboards tab, select the relevant dashboard from the left-hand panel.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Settings (three dots) icon for the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      3. Select Transfer Ownership.

                                                                                                                                                                                                                                                                                                                                                                        The Transfer Dashboard Ownership page is displayed.

                                                                                                                                                                                                                                                                                                                                                                      4. Select a new user from the drop-down.

                                                                                                                                                                                                                                                                                                                                                                      5. If everything looks ok, click Transfer.

                                                                                                                                                                                                                                                                                                                                                                        The teams indicated with cross-out text are the ones that had access to the dashboard earlier and will lose access to it after the transfer.

                                                                                                                                                                                                                                                                                                                                                                        The dashboard will also be visible to all the teams that the new owner is part of. If you are not part of the teams that the new owner is a member of, you will no longer have the visibility to the dashboard.

                                                                                                                                                                                                                                                                                                                                                                      5.7 -

                                                                                                                                                                                                                                                                                                                                                                      Dashboard Templates

                                                                                                                                                                                                                                                                                                                                                                      Sysdig provides a number of pre-defined dashboards to assist users in monitoring their environments and applications. Dashboard templates are essentially immutable dashboards that can’t be edited, and the scope is fixed. They are useful as is to get a quick overview of infrastructure, but you can use them as a template and can copy them to customize.

                                                                                                                                                                                                                                                                                                                                                                      This section outlines the main dashboards that are available out-of-the-box.

                                                                                                                                                                                                                                                                                                                                                                      AWS CloudWatch

                                                                                                                                                                                                                                                                                                                                                                      DashboardsPromQLNotes
                                                                                                                                                                                                                                                                                                                                                                      ALB OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      AWS ALBYes
                                                                                                                                                                                                                                                                                                                                                                      AWS EBSYes
                                                                                                                                                                                                                                                                                                                                                                      AWS ECS Fargate OverviewYes
                                                                                                                                                                                                                                                                                                                                                                      AWS ECS Fargate Service DetailYes
                                                                                                                                                                                                                                                                                                                                                                      AWS ELBYes
                                                                                                                                                                                                                                                                                                                                                                      AWS Lambda Function DetailYes
                                                                                                                                                                                                                                                                                                                                                                      AWS Lambda OverviewYes
                                                                                                                                                                                                                                                                                                                                                                      AWS RDSYes
                                                                                                                                                                                                                                                                                                                                                                      AWS S3Yes
                                                                                                                                                                                                                                                                                                                                                                      AWS SQSYes
                                                                                                                                                                                                                                                                                                                                                                      DynamoDB OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      DynamoDB Overview By OperationNo
                                                                                                                                                                                                                                                                                                                                                                      EC2 OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      ECS OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      ECS ProjectsNo
                                                                                                                                                                                                                                                                                                                                                                      ECS ServicesNo
                                                                                                                                                                                                                                                                                                                                                                      ECS Task FamiliesNo
                                                                                                                                                                                                                                                                                                                                                                      ECS TasksNo
                                                                                                                                                                                                                                                                                                                                                                      ELB OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      ElastiCache OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      RDS OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      SQS OverviewNo

                                                                                                                                                                                                                                                                                                                                                                      AWS MetricsStream

                                                                                                                                                                                                                                                                                                                                                                      DashboardsPromQLNotes
                                                                                                                                                                                                                                                                                                                                                                      AWS ALBYes
                                                                                                                                                                                                                                                                                                                                                                      AWS EBSYes
                                                                                                                                                                                                                                                                                                                                                                      AWS ELBYes
                                                                                                                                                                                                                                                                                                                                                                      AWS FargateYes
                                                                                                                                                                                                                                                                                                                                                                      AWS LambdaYes
                                                                                                                                                                                                                                                                                                                                                                      AWS RDSYes
                                                                                                                                                                                                                                                                                                                                                                      AWS S3Yes
                                                                                                                                                                                                                                                                                                                                                                      AWS SQSYes

                                                                                                                                                                                                                                                                                                                                                                      Applications

                                                                                                                                                                                                                                                                                                                                                                      DashboardsPromQLNotes
                                                                                                                                                                                                                                                                                                                                                                      ActiveMQNo
                                                                                                                                                                                                                                                                                                                                                                      Apache (legacy)No
                                                                                                                                                                                                                                                                                                                                                                      Apache App OverviewYes
                                                                                                                                                                                                                                                                                                                                                                      Apache CouchDBNo
                                                                                                                                                                                                                                                                                                                                                                      Apache HBaseNo
                                                                                                                                                                                                                                                                                                                                                                      Apache KafkaNo
                                                                                                                                                                                                                                                                                                                                                                      Apache ZooKeeperNo
                                                                                                                                                                                                                                                                                                                                                                      CassandraYes
                                                                                                                                                                                                                                                                                                                                                                      Cassandra By NodeNo
                                                                                                                                                                                                                                                                                                                                                                      Cassandra OverviewNo
                                                                                                                                                                                                                                                                                                                                                                      CephYes
                                                                                                                                                                                                                                                                                                                                                                      ConsulYes
                                                                                                                                                                                                                                                                                                                                                                      ConsulNo
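| Consul Envoy | Yes | |
| Couchbase | No | |
| Docker Engine | Yes | |
| ElasticSearch Cluster | Yes | |
| ElasticSearch Infra | Yes | |
| Elasticsearch | No | |
| Fluentd | Yes | |
| Fluentd | No | |
| Gearman | No | |
| Go | No | |
| Go Internals | Yes | |
| HAProxy | No | |
| HAProxy Ingress Overview | Yes | |
| HAProxy Ingress Service Details | Yes | |
| HDFS | No | |
| HTTP | No | |
| Harbor | Yes | |
| Istio 1.0 Overview | No | |
| Istio 1.0 Service | No | |
| Istio 1.5 Overview | No | |
| Istio 1.5 Service | No | |
| Istio v1.5 Service | Yes | |
| Istio v1.5 Workload | Yes | |
| JVM | No | |
| Keda | Yes | |
| Kyoto Tycoon | No | |
| Memcached | No | |
| Memcached | Yes | |
| Microsoft SqlServer Overview | Yes | |
| MongoDB (Server) | No | |
| MongoDB Database Details | Yes | |
| MongoDB Instance Health | Yes | |
| MySQL | Yes | |
| MySQL Server | No | |
| NTP | Yes | |
| Nginx | Yes | |
| Nginx (legacy) | No | |
| Nginx Ingress | Yes | |
| OPA Gatekeeper | Yes | |
| Oracle DB | Yes | |
| PHP-FPM | No | |
| Percona TokuMX | No | |
| PgBouncer | No | |
| Php-fpm | Yes | |
| Portworx Cluster | Yes | |
| Portworx Volumes | Yes | |
| Postfix | No | |
| PostgreSQL Database Details | Yes | |
| PostgreSQL Instance Health | Yes | |
| PostgreSQL Server | No | |
| RabbitMQ | No | |
| Rabbitmq Overview | Yes | |
| Rabbitmq Usage | Yes | |
| Redis | Yes | |
| Redis (legacy) | No | |
| Riak | No | |
| Riak CS | No | |
| Solr Cluster | No | |
| Solr Host | No | |
| Tomcat | No | |
| Varnish | No | |
| VoltDB | No | |
| Windows Node Overview | Yes | |
| Windows Node Overview (Legacy) | Yes | |
| etcd | No | |
| lighttpd | No | |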

Compliance & Security

| Dashboards | PromQL | Notes |
| --- | --- | --- |
| Docker Compliance Report | No | |
| Kubernetes Compliance Report (v1.4) | No | |
| Security Summary | No | |

Containers

| Dashboards | PromQL | Notes |
| --- | --- | --- |
| Container CPU & Memory Limits | No | |
| Container Disk Usage & Performance | No | |
| Container Network | No | |
| Container Resource Usage | No | |

Docker

| Dashboards | PromQL | Notes |
| --- | --- | --- |
| Overview | No | |
| Projects | No | |
| Services | No | |
| Swarm Overview | No | |
| Swarm Services | No | |
| Swarm Tasks | No | |

Host Infrastructure

| Dashboards | PromQL | Notes |
| --- | --- | --- |
| File System Usage & Performance | No | |
| Host Resource Usage | No | |
| Memory Usage | No | |
| Network | No | |
| Sysdig Agent Health & Status | Yes | |

K8s Control Plane

| Dashboards | PromQL | Notes |
|---|---|---|
| Kubernetes API Server | Yes | |
| Kubernetes Controller Manager | Yes | |
| Kubernetes CoreDNS | Yes | |
| Kubernetes Etcd | Yes | |
| Kubernetes Kubelet | Yes | |
| Kubernetes Proxy | Yes | |
| Kubernetes Scheduler | Yes | |

Kubernetes

| Dashboards | PromQL | Notes |
|---|---|---|
| CPU Allocation Optimization | No | Deprecated |
| Cluster / Namespace Available Resources | Yes | |
| Cluster Capacity Planning | Yes | |
| Cluster Overview | No | Deprecated |
| Cluster and Node Capacity | No | Deprecated |
| Container Resource Usage & Troubleshooting | Yes | |
| DaemonSet Overview | No | Deprecated |
| Deployment Overview | No | Deprecated |
| Health Overview | No | Deprecated |
| Horizontal Pod Autoscaler | Yes | |
| Horizontal Pod Autoscaler (legacy) | No | Deprecated |
| Job Overview | No | |
| KSM Cluster / Namespace Available Resources | Yes | |
| KSM Container Resource Usage & Troubleshooting | Yes | |
| KSM Pod Status & Performance | Yes | |
| KSM Workload Status & Performance | Yes | |
| Kubernetes Jobs | Yes | |
| Memory Allocation Optimization | No | Deprecated |
| Namespace Overview | No | Deprecated |
| Node Overview | No | Deprecated |
| Node Status & Performance | Yes | |
| PVC and Storage | Yes | |
| Pod Overview | No | Deprecated |
| Pod Rightsizing & Workload Capacity Optimization | Yes | |
| Pod Scheduling Troubleshooting | Yes | |
| Pod Status & Performance | Yes | |
| ReplicaSet Overview | No | Deprecated |
| Resource Quota | No | Deprecated |
| Service Golden Signals | No | Deprecated |
| Service Health | No | Deprecated |
| StatefulSet Overview | No | Deprecated |
| Workload Status & Performance | Yes | |
| Workloads CPU Usage and Allocation | No | Deprecated |
| Workloads Memory Usage and Allocation | No | Deprecated |

Marathon

| Dashboards | PromQL | Notes |
|---|---|---|
| Applications | No | |
| Groups | No | |
| Master Node | No | |
| Overview | No | |

Mesos

| Dashboards | PromQL | Notes |
|---|---|---|
| Frameworks | No | |
| Master Node | No | |
| Overview | No | |
| Slave Node | No | |
| Tasks | No | |

OpenShift

| Dashboards | PromQL | Notes |
|---|---|---|
| OpenShift HAProxy Ingress Overview | Yes | |
| OpenShift HAProxy Ingress Service Details | Yes | |
| OpenShift v3 API Server | Yes | |
| OpenShift v3 Controller Manager | Yes | |
| OpenShift v3 Kubelet | Yes | |
| OpenShift v4 API Server | Yes | |
| OpenShift v4 Controller Manager | Yes | |
| OpenShift v4 CoreDNS | Yes | |
| OpenShift v4 Etcd | Yes | |
| OpenShift v4 Kubelet | Yes | |
| OpenShift v4 Scheduler | Yes | |

Rancher

| Dashboards | PromQL | Notes |
|---|---|---|
| Rancher API Server | Yes | |
| Rancher Controller Manager | Yes | |
| Rancher CoreDNS | Yes | |
| Rancher Etcd | Yes | |
| Rancher Kubelet | Yes | |
| Rancher Proxy | Yes | |
| Rancher Scheduler | Yes | |

Sysdig Secure

| Dashboards | PromQL | Notes |
|---|---|---|
| Serverless Agents Fargate Usage | No | |
| Sysdig Admission Controller | Yes | |

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting

Dashboards | PromQL | Notes
MongoDB Troubleshooting | No |
Network Connections Table | No |
Process Resource Usage | No |
SQL Troubleshooting | No |
Top Processes | No |

6 - Alerts

Alerts are the responsive component of Sysdig Monitor. They notify you when an event or issue occurs that requires attention. Events and issues are identified based on changes in the metric values collected by Sysdig Monitor. The Alerts module displays out-of-the-box alerts and provides a wizard for creating and editing alerts as needed.

                                                                                                                                                                                                                                                                                                                                                                      About Sysdig Alert

Sysdig Monitor can generate notifications based on conditions or events that you configure. With alerts, you can keep tabs on your infrastructure and learn about problems as they happen, or even before they happen, based on the alert conditions you define. In Sysdig Monitor, metrics serve as the central configuration artifact for alerts. A metric ties one or more conditions or events to the measures to take when a condition is met or an event happens. Alerts work across Sysdig modules, including Explore, Dashboard, Events, and Overview.

                                                                                                                                                                                                                                                                                                                                                                      Alert Types

                                                                                                                                                                                                                                                                                                                                                                      The types of alerts available in Sysdig Monitor:

                                                                                                                                                                                                                                                                                                                                                                      • Downtime: Monitor any type of entity, such as a host, a container, or a process, and alert when the entity goes down.

                                                                                                                                                                                                                                                                                                                                                                      • Metric: Monitor time-series metrics, and alert if they violate user-defined thresholds.

                                                                                                                                                                                                                                                                                                                                                                      • PromQL: Monitor metrics through a PromQL query.

                                                                                                                                                                                                                                                                                                                                                                      • Event: Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and unauthorized access.

                                                                                                                                                                                                                                                                                                                                                                      • Anomaly Detection: Monitor hosts based on their historical behaviors, and alert when they deviate from the expected pattern.

                                                                                                                                                                                                                                                                                                                                                                      • Group Outlier: Monitor a group of hosts and be notified when one acts differently from the rest. Group Outlier Alert is supported only on hosts.
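To make the Metric alert type above concrete, here is a minimal Python sketch of a threshold check over a time window. It does not use Sysdig's API or reproduce how Sysdig evaluates alerts. The function name `violates_threshold`, the sample format, and the choice of averaging over the window are all assumptions made for illustration.

```python
from statistics import mean


def violates_threshold(samples, threshold, window_s, now):
    """Return True if the average value inside the window exceeds the threshold.

    samples  -- list of (timestamp_seconds, value) tuples (hypothetical format)
    window_s -- length of the evaluation window in seconds
    now      -- timestamp at which the alert condition is evaluated
    """
    # Keep only the samples that fall inside the window (now - window_s, now].
    recent = [value for ts, value in samples if now - window_s < ts <= now]
    # Assumption: with no data in the window, this sketch reports no violation.
    if not recent:
        return False
    return mean(recent) > threshold


# Hypothetical CPU usage samples, one per minute.
cpu_samples = [(0, 40.0), (60, 85.0), (120, 95.0), (180, 90.0)]

print(violates_threshold(cpu_samples, threshold=80.0, window_s=180, now=180))
print(violates_threshold(cpu_samples, threshold=80.0, window_s=240, now=180))
```

With the shorter window, only the three most recent samples are averaged and the check reports a violation. The longer window also includes the early low sample, which pulls the average below the threshold.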

                                                                                                                                                                                                                                                                                                                                                                      Alert Tools

                                                                                                                                                                                                                                                                                                                                                                      The following tools help with alert creation:

• Alert Library: Sysdig Monitor provides a set of alerts by default. Use them as they are or as templates to create your own.

                                                                                                                                                                                                                                                                                                                                                                      • Sysdig API: Use Sysdig’s Python client to create, list, delete, update and restore alerts. See examples.

                                                                                                                                                                                                                                                                                                                                                                      Guidelines for Creating Alerts

                                                                                                                                                                                                                                                                                                                                                                      Steps

                                                                                                                                                                                                                                                                                                                                                                      Description

Decide what to monitor

                                                                                                                                                                                                                                                                                                                                                                      Determine what type of problem you want to be alerted on. See Alert Types to choose a type of problem.

                                                                                                                                                                                                                                                                                                                                                                      Define how it will be monitored

                                                                                                                                                                                                                                                                                                                                                                      Specify exactly what behavior triggers a violation. For example, Marathon App is down on the Kubernetes Cluster named Production for ten minutes.

Decide where to monitor

                                                                                                                                                                                                                                                                                                                                                                      Narrow down your environment to receive fine-tuned results. Use Scope to choose an entity that you want to keep a close watch on. Specify additional segments (entities) to give context to the problem. For example, in addition to specifying a Kubernetes cluster, add a namespace and deployment to refine your scope.

                                                                                                                                                                                                                                                                                                                                                                      Define when to notify

                                                                                                                                                                                                                                                                                                                                                                      Define the threshold and time window for assessing the alert condition.

Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.

Multiple Alerts include all the segments you specified to uniquely identify the location, and thus provide a full qualification of where the problem occurred. The higher the number of segments, the easier it is to uniquely identify the affected entities.

A good analogy for multiple alerts is alerting on cities. For example, a multiple alert that fires for San Francisco would also report that the city is in the USA and that the continent is North America.

                                                                                                                                                                                                                                                                                                                                                                      Trigger gives you control over how notifications are created. For example, you may want to receive a notification for every violation, or want only a single notification for a series of consecutive violations.

                                                                                                                                                                                                                                                                                                                                                                      Decide how notifications are sent

Alerts support customizable notification channels, including email, mobile push notifications, OpsGenie, Slack, and more. For the list of supported services, see Set Up Notification Channels.
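The difference between Single Alert and Multiple Alert described in the guidelines above can be sketched in a few lines of Python. This is an illustration only, not Sysdig's evaluation logic. The function `evaluate`, the label names `cluster` and `namespace`, and the use of an average as the aggregation are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def evaluate(samples, segment_by, threshold):
    """Group samples by the chosen segment labels and check each group.

    samples    -- list of dicts holding label values plus a "value" key
    segment_by -- tuple of label names; an empty tuple models a Single Alert
                  that covers the entire scope
    Returns a dict mapping each segment key to True (threshold breached)
    or False.
    """
    groups = defaultdict(list)
    for sample in samples:
        # The segment key fully qualifies where the measurement came from.
        key = tuple(sample[label] for label in segment_by)
        groups[key].append(sample["value"])
    return {key: mean(values) > threshold for key, values in groups.items()}


# Hypothetical CPU readings from one cluster with two namespaces.
samples = [
    {"cluster": "Production", "namespace": "web", "value": 95.0},
    {"cluster": "Production", "namespace": "web", "value": 85.0},
    {"cluster": "Production", "namespace": "db", "value": 30.0},
]

single = evaluate(samples, (), threshold=80.0)
multiple = evaluate(samples, ("cluster", "namespace"), threshold=80.0)
print(single)
print(multiple)
```

In this example, the scope-wide average stays below the threshold, so the single evaluation does not fire. The segmented evaluation flags the `web` namespace and names the cluster it belongs to, which is the extra qualification that additional segments provide.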

                                                                                                                                                                                                                                                                                                                                                                      To create alerts, simply:

                                                                                                                                                                                                                                                                                                                                                                      1. Choose an alert type.

                                                                                                                                                                                                                                                                                                                                                                      2. Configure alert parameters.

                                                                                                                                                                                                                                                                                                                                                                      3. Configure the notification channels you want to use for alert notification.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig sometimes deprecates outdated metrics. Alerts that use these metrics will not be modified or disabled, but will no longer be updated. See Deprecated Metrics and Labels.

                                                                                                                                                                                                                                                                                                                                                                      Configure Alerts

                                                                                                                                                                                                                                                                                                                                                                      Use the Alert wizard to create or edit alerts.

                                                                                                                                                                                                                                                                                                                                                                      Open the Alert Wizard

                                                                                                                                                                                                                                                                                                                                                                      There are multiple ways to access the Alert wizard:

                                                                                                                                                                                                                                                                                                                                                                      From Explore

                                                                                                                                                                                                                                                                                                                                                                      Do one of the following:

• Select New Alert beside an entity.

                                                                                                                                                                                                                                                                                                                                                                      • Click More Options (three dots), and select Create a new alert.

                                                                                                                                                                                                                                                                                                                                                                      From Dashboards

                                                                                                                                                                                                                                                                                                                                                                      Click the More Options (three dots) icon for a panel, and select Create Alert.

                                                                                                                                                                                                                                                                                                                                                                      From Alerts

                                                                                                                                                                                                                                                                                                                                                                      Do one of the following:

                                                                                                                                                                                                                                                                                                                                                                      • Click Add Alerts.

                                                                                                                                                                                                                                                                                                                                                                      • Select an existing alert and click Edit.

                                                                                                                                                                                                                                                                                                                                                                      From Overview

                                                                                                                                                                                                                                                                                                                                                                      From the Events panel on the Overview screen, select a custom or an Infrastructure type event. From the event description screen, click Create Alert from Event.

                                                                                                                                                                                                                                                                                                                                                                      Create an Alert

Configure notification channels before you begin, so that the channels are available to assign to the alert. Optionally, you can add a custom subject and body to individual alert notifications.

                                                                                                                                                                                                                                                                                                                                                                      Enter Basic Alert Information

Configuration differs slightly for each alert type. See the respective pages to learn more. This section covers general instructions to help you get acquainted with and navigate the Alerts user interface.

                                                                                                                                                                                                                                                                                                                                                                      To configure an alert, open the Alert wizard and set the following parameters:

• Create the alert:

  • Type: Select the desired Alert Types.

    Each type has different parameters, but they follow the same pattern:

    • Name: Specify a meaningful name that uniquely represents the alert you are creating. For example, name the entity that the alert targets, such as Production Cluster Failed Scheduling pods.

    • Group (optional): Specify a meaningful group name for the alert you are creating. A group name helps you narrow down the problem area and focus on the infrastructure view that needs your attention. For example, you can enter Redis for alerts related to Redis services. When the alert triggers, you will know which service in your workload requires inspection. Alerts that have no group name are added to the Default Group. To change the group name later, edit the alert.

      An alert can belong to only one group. An alert created from an alert template will have the group already configured by the Monitor Integrations. You can see the existing alert groups on the Alerts details page.

      See Groupings for more information on how Sysdig handles infrastructure views.

    • Description (optional): Briefly expand on the alert name or alert condition to give additional context for the recipient.

    • Priority: Select a priority: High, Medium, Low, or Info. You can later sort alerts by severity using the top navigation pane.

                                                                                                                                                                                                                                                                                                                                                                          • Specify the parameters in the Define, Notify, and Act sections.

                                                                                                                                                                                                                                                                                                                                                                      • Define:

                                                                                                                                                                                                                                                                                                                                                                        Based on the alert type, define the parameters.

                                                                                                                                                                                                                                                                                                                                                                        • Downtime: Select the entity to monitor. For more information, see Downtime Alert.

                                                                                                                                                                                                                                                                                                                                                                        • Metric: Select a metric that this alert will monitor. You also define how the data is aggregated, such as average, maximum, minimum, or sum. Metrics are applied to a group of items (group aggregation). For more information, see Metric Alerts.

                                                                                                                                                                                                                                                                                                                                                                        • PromQL: Enter the PromQL query and duration. For more information, see PromQL Alerts.

                                                                                                                                                                                                                                                                                                                                                                        • Event: Filter the custom event to be alerted on by using the name, tag, description, and a source tag. For more information, see Event Alerts

                                                                                                                                                                                                                                                                                                                                                                        • Anomaly Detection: Specify the metrics to be monitored for anomalies. For more information, see Anomaly Detection Alerts.

                                                                                                                                                                                                                                                                                                                                                                        • Group Outlier: Specify the metrics to be monitored for outliers. For more information, see Group Outlier Alerts.

                                                                                                                                                                                                                                                                                                                                                                      To alert on multiple metrics using boolean logic, click Create multi-condition alerts. See Multi-Condition Alerts.

                                                                                                                                                                                                                                                                                                                                                                      • Scope: Everywhere, or a more limited scope to filter a specific component of the infrastructure monitored, such as a Kubernetes deployment, a Sysdig Agent, or a specific service.

                                                                                                                                                                                                                                                                                                                                                                      • Trigger: Boundaries for assessing the alert condition, and whether to send a single alert or multiple alerts. Supported time scales are minute, hour, or day.

                                                                                                                                                                                                                                                                                                                                                                        • Single alert: Single Alert fires an alert for your entire scope.

                                                                                                                                                                                                                                                                                                                                                                        • Multiple alerts: Multiple Alert fires if any or every segment breaches the threshold at once.

                                                                                                                                                                                                                                                                                                                                                                          Multiple alerts are triggered for each segment you specify. The specified segments will be represented in alerts. The higher the number of segments the easier to uniquely identify the affected entities.
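The difference between the two trigger modes can be sketched as follows. This is only an illustration, not Sysdig's implementation: the function `evaluate_trigger`, the host names, and the choice of averaging the values across the scope are all assumptions made for the example.

```python
def evaluate_trigger(values_by_segment, threshold, multiple):
    """Return the alerts to fire; each entry names what breached the threshold.

    Hypothetical helper for illustration only.
    """
    if not multiple:
        # Single alert: one evaluation over the entire scope.
        # Averaging across segments is an assumption of this sketch.
        scope_value = sum(values_by_segment.values()) / len(values_by_segment)
        return ["entire scope"] if scope_value > threshold else []
    # Multiple alerts: one alert per segment that breaches the threshold.
    return [seg for seg, value in values_by_segment.items() if value > threshold]


# Made-up CPU usage per host, segmented by host name.
cpu = {"host-a": 95.0, "host-b": 40.0, "host-c": 30.0}

single = evaluate_trigger(cpu, 80.0, multiple=False)
multi = evaluate_trigger(cpu, 80.0, multiple=True)
print(single)
print(multi)
```

In this made-up data set, the scope-wide average stays under the threshold while one host exceeds it, which is the kind of case where segmenting the alert helps identify the affected entity.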

                                                                                                                                                                                                                                                                                                                                                                      For detailed description, see respective sections on Alert Types.

                                                                                                                                                                                                                                                                                                                                                                      • (2) Notify

                                                                                                                                                                                                                                                                                                                                                                        • Notification Channel: Select from the configured notification channels in the list. Supported channels are:

                                                                                                                                                                                                                                                                                                                                                                          • Email

                                                                                                                                                                                                                                                                                                                                                                          • Slack

                                                                                                                                                                                                                                                                                                                                                                          • Amazon SNS Topic

                                                                                                                                                                                                                                                                                                                                                                          • Opsgenie

                                                                                                                                                                                                                                                                                                                                                                          • Pagerduty

                                                                                                                                                                                                                                                                                                                                                                          • VictorOps

                                                                                                                                                                                                                                                                                                                                                                          • Webhook

                                                                                                                                                                                                                                                                                                                                                                          You can view the list of notification channels configured for each alert on the Alerts page.

                                                                                                                                                                                                                                                                                                                                                                        • Notification Options: Set the time interval at which multiple alerts should be sent.

                                                                                                                                                                                                                                                                                                                                                                        • Format Message: If applicable, add message format details. See Customize Notifications.

                                                                                                                                                                                                                                                                                                                                                                      • (3) Act

                                                                                                                                                                                                                                                                                                                                                                        • (Optional): Configure a Sysdig capture. See also Captures.

                                                                                                                                                                                                                                                                                                                                                                          Sysdig capture files are not available for Event Alerts.

                                                                                                                                                                                                                                                                                                                                                                      • Click Create.

                                                                                                                                                                                                                                                                                                                                                                      Optional: Customize Notifications

                                                                                                                                                                                                                                                                                                                                                                      You can optionally customize individual notifications to provide context for the errors that triggered the alert. All the notification channels support this added contextual information and customization flexibility.

                                                                                                                                                                                                                                                                                                                                                                      Modify the subject, body, or both of the alert notification with the following:

                                                                                                                                                                                                                                                                                                                                                                      • Plaintext: A custom message stating the problem. For example, Stalled Deployment.

                                                                                                                                                                                                                                                                                                                                                                      • Hyperlink: For example, URL to a Dashboard.

                                                                                                                                                                                                                                                                                                                                                                      • Dynamic Variable: For example, a hostname. Note the conventions:

                                                                                                                                                                                                                                                                                                                                                                        • All variables that you insert must be enclosed in double curly braces, such as {{file_mount}}.

                                                                                                                                                                                                                                                                                                                                                                        • Variables are case sensitive.

                                                                                                                                                                                                                                                                                                                                                                        • The variables should correspond to the segment values you created the alert for. For example, if an alert is segmented byhost.hostName andcontainer.name, the corresponding variables will be{{host.hostName}}and {{container.name}} respectively. In addition to these segment variables, __alert_name__  and __alert_status__ are supported. No other segment variables are allowed in the notification subject and body.

                                                                                                                                                                                                                                                                                                                                                                        • Notification subjects will not show up on the Event feed.

                                                                                                                                                                                                                                                                                                                                                                        • Using a variable that is not a part of the segment will trigger an error.

                                                                                                                                                                                                                                                                                                                                                                        • The segment variables used in an alert are turned to the current system values upon sending the alert.
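The variable conventions above can be sketched in a few lines of Python. This is a minimal illustration of the substitution rules, not Sysdig's code: the function `render_notification` and the sample values are hypothetical, and the sketch covers only segment variables.

```python
import re

# Matches a variable enclosed in double curly braces, such as {{host.hostName}}.
_VARIABLE = re.compile(r"\{\{([^{}]+)\}\}")


def render_notification(template, segment_values):
    """Replace each {{variable}} with its current segment value.

    Hypothetical helper for illustration only.
    """
    def substitute(match):
        name = match.group(1)
        # Lookup is case sensitive; a variable outside the segments is an error.
        if name not in segment_values:
            raise KeyError(f"{name} is not a segment of this alert")
        return str(segment_values[name])

    return _VARIABLE.sub(substitute, template)


subject = render_notification(
    "Stalled Deployment on {{host.hostName}} / {{container.name}}",
    {"host.hostName": "node-1", "container.name": "redis"},
)
print(subject)
```

Passing `{{Host.hostName}}` to this sketch raises an error, because the lookup is case sensitive and the name does not match any segment.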

                                                                                                                                                                                                                                                                                                                                                                      The body of the notification message contains a Default Alert Template. It is the default alert notification generated by Sysdig Monitor. You may add free text, variables, or hyperlinks before and after the template.

                                                                                                                                                                                                                                                                                                                                                                      You can send a customized alert notification to the following channels:

                                                                                                                                                                                                                                                                                                                                                                      • Email

                                                                                                                                                                                                                                                                                                                                                                      • Slack

                                                                                                                                                                                                                                                                                                                                                                      • Amazon SNS Topic

                                                                                                                                                                                                                                                                                                                                                                      • Opsgenie

                                                                                                                                                                                                                                                                                                                                                                      • Pagerduty

                                                                                                                                                                                                                                                                                                                                                                      • VictorOps

                                                                                                                                                                                                                                                                                                                                                                      • Webhook

                                                                                                                                                                                                                                                                                                                                                                      Multi-Condition Alerts

                                                                                                                                                                                                                                                                                                                                                                      Multi-condition alerts are advanced alert threshold created on complex conditions. To do so, you define alert thresholds as custom boolean expressions that can involve multiple conditions. Click Create multi-condition alerts to enable adding conditions as boolean expressions.

                                                                                                                                                                                                                                                                                                                                                                      These advanced alerts require specific syntax, as described in the examples below.

                                                                                                                                                                                                                                                                                                                                                                      Format and Operations

                                                                                                                                                                                                                                                                                                                                                                      Each condition has five parts:

                                                                                                                                                                                                                                                                                                                                                                      • Metric Name : Use the exact metric names. To avoid typos, click the HELP link to access the drop-down list of available metrics. Selecting a metric from the list will automatically add the name to the threshold expression being edited.

                                                                                                                                                                                                                                                                                                                                                                      • Group Aggregation (optional): If no group aggregation type is selected, the appropriate default for the metric will be applied (either sum or average). Group aggregation functions must be applied outside of time aggregation functions.

                                                                                                                                                                                                                                                                                                                                                                      • Time aggregation : It’s the historical data rolled up over a selected period of time.

                                                                                                                                                                                                                                                                                                                                                                      • Operator: Both logical and relational operators are supported.

                                                                                                                                                                                                                                                                                                                                                                      • Value: A static numerical value against which a condition is evaluated.

                                                                                                                                                                                                                                                                                                                                                                      The table below displays supported time aggregation functions, group aggregation functions, and relational operators:

Time Aggregation Function    Group Aggregation Function    Relational Operator
timeAvg()                    avg()                         =
min()                        min()                         <
max()                        max()                         >
sum()                        sum()                         <=
                                                           >=
                                                           !=

                                                                                                                                                                                                                                                                                                                                                                      The format is:

                                                                                                                                                                                                                                                                                                                                                                      condition1 AND condition2
                                                                                                                                                                                                                                                                                                                                                                      condition1 OR condition2
                                                                                                                                                                                                                                                                                                                                                                      NOT condition1
                                                                                                                                                                                                                                                                                                                                                                      

The order of operations can also be altered with parentheses:

                                                                                                                                                                                                                                                                                                                                                                      NOT (condition1 AND (condition2 OR condition3))
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Conditions take the following form:

                                                                                                                                                                                                                                                                                                                                                                      groupAggregation(timeAggregation(metric.name)) operator value
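The condition form above can be read as a three-step evaluation. The following Python sketch illustrates that reading only; it is not the product's implementation. It assumes the time aggregation is applied first to each entity's samples over the alert window, and the group aggregation is then applied across the per-entity results. The lookup tables, the function name evaluate_condition, and the sample data are all hypothetical.

```python
import operator
from statistics import mean

# Hypothetical lookup tables mirroring the functions and operators in the table.
TIME_AGG = {"timeAvg": mean, "min": min, "max": max, "sum": sum}
GROUP_AGG = {"avg": mean, "min": min, "max": max, "sum": sum}
OPERATORS = {
    "=": operator.eq, "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge, "!=": operator.ne,
}

def evaluate_condition(samples_by_entity, time_agg, group_agg, op, value):
    """Evaluate groupAggregation(timeAggregation(metric)) operator value."""
    # Step 1 (assumed order): collapse each entity's samples over the time window.
    per_entity = [TIME_AGG[time_agg](s) for s in samples_by_entity.values()]
    # Step 2: collapse the per-entity values into a single number.
    aggregated = GROUP_AGG[group_agg](per_entity)
    # Step 3: compare the aggregated value against the threshold.
    return OPERATORS[op](aggregated, value)

# Made-up samples of cpu.used.percent for two hosts.
cpu = {"host-a": [40, 60, 80], "host-b": [10, 20, 30]}

# avg(timeAvg(cpu.used.percent)) > 35
result = evaluate_condition(cpu, "timeAvg", "avg", ">", 35)
```

With the sample data, the per-host averages are 60 and 20, their group average is 40, and the condition evaluates to true.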
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example Expressions

                                                                                                                                                                                                                                                                                                                                                                      Several examples of advanced alerts are given below:

                                                                                                                                                                                                                                                                                                                                                                      timeAvg(cpu.used.percent) > 50 AND timeAvg(memory.used.percent) > 75
                                                                                                                                                                                                                                                                                                                                                                      timeAvg(cpu.used.percent) > 50 OR timeAvg(memory.used.percent) > 75
                                                                                                                                                                                                                                                                                                                                                                      timeAvg(container.count) != 10
                                                                                                                                                                                                                                                                                                                                                                      min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
                                                                                                                                                                                                                                                                                                                                                                      sum(file.bytes.total) > 0 OR sum(net.bytes.total) > 0
                                                                                                                                                                                                                                                                                                                                                                      timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
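To illustrate why the parentheses in the last example matter, the following Python sketch compares two groupings of the same three conditions. It assumes the inputs are already time-aggregated values. The function names and sample values are hypothetical, and Python's and/or operators stand in for AND/OR here.

```python
def alert_fires(cpu_avg, mysql_conn_avg, mem_avg):
    """Mirror of the last example expression, with its explicit parentheses."""
    return cpu_avg > 50 and (mysql_conn_avg > 20 or mem_avg > 75)

def alert_fires_regrouped(cpu_avg, mysql_conn_avg, mem_avg):
    """Same three conditions, grouped differently for comparison."""
    return (cpu_avg > 50 and mysql_conn_avg > 20) or mem_avg > 75

# CPU is low but memory is high: the grouping decides the outcome.
sample = dict(cpu_avg=30, mysql_conn_avg=5, mem_avg=90)
first = alert_fires(**sample)              # False: the CPU condition must hold
second = alert_fires_regrouped(**sample)   # True: memory alone is sufficient
```

Writing the parentheses explicitly, as the example expression does, avoids relying on any default operator precedence.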
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      

6.1 - Manage Alerts

Alerts can be managed individually or as a group by using the checkboxes on the left side of the Alert UI and the customization bar. The columns of the table can also be configured to show the data you need for your use cases.

                                                                                                                                                                                                                                                                                                                                                                      Select a group of alerts and perform several batch operations, such as filtering, deleting, enabling, disabling, or exporting to a JSON object. Select individual alerts to perform tasks such as creating a copy for a different team.

                                                                                                                                                                                                                                                                                                                                                                      View Alert Details

The bell button next to an alert indicates that you have not resolved the corresponding events. The Activity Over Last Two Weeks column shows an event chart of the number of events triggered over the last two weeks. The color of the event chart indicates the severity level of the events.

                                                                                                                                                                                                                                                                                                                                                                      To view alert details, click the corresponding alert row. The slider with the alert details will appear. Click an individual event to Take Action. You can do one of the following:

                                                                                                                                                                                                                                                                                                                                                                      • Acknowledge: Mark that the event has been acknowledged by the intended recipient.

                                                                                                                                                                                                                                                                                                                                                                      • Create Silence from Event: If you no longer want to be notified, use this option. You can choose the scope for alert silence. When silenced, alerts will still be triggered but will not send you any notifications.

                                                                                                                                                                                                                                                                                                                                                                      • Explore: Use this option to troubleshoot by using the PromQL Query.

If no events were reported in the past two weeks, the event feed is empty and the Activity Over Last Two Weeks column shows no event chart.

                                                                                                                                                                                                                                                                                                                                                                      Enable/Disable Alerts

                                                                                                                                                                                                                                                                                                                                                                      Alerts can be enabled or disabled using the slider or the customization bar. You can perform these operations on a single alert or on multiple alerts as a batch operation.

                                                                                                                                                                                                                                                                                                                                                                      1. From the Alerts module, check the boxes beside the relevant alerts.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Enable Selected or Disable Selected as necessary.

                                                                                                                                                                                                                                                                                                                                                                      Use the slider beside the alert to disable or enable individual alerts.

                                                                                                                                                                                                                                                                                                                                                                      Edit an Existing Alert

                                                                                                                                                                                                                                                                                                                                                                      To edit an existing alert:

1. Do one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • Click the Edit button beside the alert.

  • Click an alert to open the detail view, then click Edit in the top right corner.

                                                                                                                                                                                                                                                                                                                                                                      2. Edit the alert, and click Save to confirm the changes.

                                                                                                                                                                                                                                                                                                                                                                      Copy an Alert

                                                                                                                                                                                                                                                                                                                                                                      Alerts can be copied within the current team to allow for similar alerts to be created quickly, or copied to a different team to share alerts.

                                                                                                                                                                                                                                                                                                                                                                      Copy an Alert to the Current Team

                                                                                                                                                                                                                                                                                                                                                                      To copy an alert within the current team:

                                                                                                                                                                                                                                                                                                                                                                      1. Highlight the alert to be copied.

                                                                                                                                                                                                                                                                                                                                                                        The detail view is displayed.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Copy.

                                                                                                                                                                                                                                                                                                                                                                        The Copy Alert screen is displayed.

                                                                                                                                                                                                                                                                                                                                                                      3. Select Current from the drop-down.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Copy and Open.

  The copied alert opens in edit mode.

                                                                                                                                                                                                                                                                                                                                                                      5. Make necessary changes and save the alert.

                                                                                                                                                                                                                                                                                                                                                                      Copy an Alert to a Different Team

                                                                                                                                                                                                                                                                                                                                                                      1. Highlight the alert to be copied.

                                                                                                                                                                                                                                                                                                                                                                        The detail view is displayed.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Copy.

                                                                                                                                                                                                                                                                                                                                                                        The Copy Alert screen is displayed.

                                                                                                                                                                                                                                                                                                                                                                      3. Select the teams that the alert should be copied to.

                                                                                                                                                                                                                                                                                                                                                                      4. Click Send Copy.

                                                                                                                                                                                                                                                                                                                                                                      Search for an Alert

                                                                                                                                                                                                                                                                                                                                                                      Search Using Strings

The Alerts table can be searched using partial or full strings. For example, searching for kubernetes displays only the entries that contain kubernetes.

                                                                                                                                                                                                                                                                                                                                                                      Filter Alerts

The alert feed can be filtered in multiple ways to drill down into the environment’s history and refine the alerts displayed. The feed can be filtered by severity or status. Examples of each are shown below.

                                                                                                                                                                                                                                                                                                                                                                      The example below shows only high and medium severity:

                                                                                                                                                                                                                                                                                                                                                                      The example below shows the alerts that are invalid:

                                                                                                                                                                                                                                                                                                                                                                      Export Alerts as JSON

                                                                                                                                                                                                                                                                                                                                                                      A JSON file can be exported to a local machine, containing JSON snippets for each selected alert:

                                                                                                                                                                                                                                                                                                                                                                      1. Click the checkboxes beside the relevant alerts to be exported.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Export JSON.
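After exporting, the file can be sanity-checked locally before it is shared or stored. The following Python sketch does this under one stated assumption: the export is a single valid JSON document. The schema of the exported alerts is not described here, so the sketch inspects only the top-level structure and assumes no field names. The function name summarize_export is hypothetical.

```python
import json
from pathlib import Path

def summarize_export(path):
    """Return (top-level type name, number of top-level entries) for an exported file."""
    # Raises json.JSONDecodeError if the file is not valid JSON.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, (list, dict)):
        return type(data).__name__, len(data)
    # A bare scalar counts as a single entry.
    return type(data).__name__, 1
```

For example, summarize_export("alerts.json"), where alerts.json is a placeholder file name, reports whether the export is a list or an object and how many top-level entries it holds.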

                                                                                                                                                                                                                                                                                                                                                                      Delete Alerts

Open the Alert page and use one of the following methods to delete alerts:

                                                                                                                                                                                                                                                                                                                                                                      • Hover on a specific alert and click Delete.

                                                                                                                                                                                                                                                                                                                                                                      • Hover on one or more alerts, click the checkbox, then click Delete on the bulk-action toolbar.

                                                                                                                                                                                                                                                                                                                                                                      • Click an alert to see the detailed view, then click Delete on the top right corner.

                                                                                                                                                                                                                                                                                                                                                                      6.2 -

                                                                                                                                                                                                                                                                                                                                                                      Silence Alert Notifications

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor allows you to silence alerts for a given scope for a predefined amount of time. When silenced, alerts will still be triggered but will not send any notifications. You can schedule silencing in advance. This helps administrators to temporarily mute notifications during planned downtime or maintenance and send downtime notifications to selected channels.

With an active silence, the only notifications you receive are those indicating the start time and the end time of the silence. All other notifications for events from that scope are silenced. If you create an alert while a silence is active, the alert still triggers but no notification is sent. Additionally, an event is generated stating that the alert is silenced.

                                                                                                                                                                                                                                                                                                                                                                      See Working with Alert APIs for programmatically silencing alert notifications.

                                                                                                                                                                                                                                                                                                                                                                      Configure a Silence

When you create a new silence, it is enabled and scheduled by default. When the start time of a scheduled silence arrives, the silence becomes active and the list shows the time remaining. When the end time arrives, the silence becomes completed and cannot be enabled again.
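
The lifecycle described above can be sketched as a small function. This is an illustration only, not Sysdig's implementation: the function name `silence_state` and the choice to treat the end time as exclusive are assumptions.

```python
from datetime import datetime, timedelta


def silence_state(start, duration, now):
    """Classify a silence as 'scheduled', 'active', or 'completed'."""
    end = start + duration
    if now < start:
        return "scheduled"   # start time has not arrived yet
    if now < end:
        return "active"      # started, end time not yet reached
    return "completed"       # end time has arrived (assumed exclusive)


start = datetime(2024, 1, 10, 22, 0)
duration = timedelta(hours=2)

print(silence_state(start, duration, datetime(2024, 1, 10, 21, 0)))  # scheduled
print(silence_state(start, duration, datetime(2024, 1, 10, 23, 0)))  # active
print(silence_state(start, duration, datetime(2024, 1, 11, 0, 0)))   # completed
```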

                                                                                                                                                                                                                                                                                                                                                                      To configure a silence:

1. Click Alerts in the left navigation of the Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Silence tab.

                                                                                                                                                                                                                                                                                                                                                                        The page shows the list of all the existing silences.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Set a Silence.

                                                                                                                                                                                                                                                                                                                                                                        The Silence for Scope window is displayed.

4. Specify the following:

  • Scope: Specify the entity that the silence applies to, for example, a particular workload or namespace. This lets you target specific entities in environments that may include thousands of them.

                                                                                                                                                                                                                                                                                                                                                                        • Begins: Specify one of the following: Today, Tomorrow, Pick Another Day. Select the time from the drop-down.

                                                                                                                                                                                                                                                                                                                                                                        • Duration: Specify how long notifications should be suppressed.

                                                                                                                                                                                                                                                                                                                                                                        • Name: Specify a name to identify the silence.

                                                                                                                                                                                                                                                                                                                                                                        • Notify: Select a channel you want to notify about the silence.

5. Click Save.
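
To make the relationship between the fields concrete, the sketch below models them as a small data structure and derives the end time from Begins and Duration. The class name `SilenceRequest`, the scope string, and the channel name are hypothetical and do not reflect the Sysdig API payload. See Working with Alert APIs for the supported programmatic interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SilenceRequest:
    """Hypothetical container mirroring the Silence for Scope fields."""

    scope: str            # entity the silence applies to
    begins: datetime      # day and time the silence starts
    duration: timedelta   # how long notifications are suppressed
    name: str             # label that identifies the silence
    notify: list = field(default_factory=list)  # channels told about the silence

    def __post_init__(self):
        # Assumption: a silence must cover a positive amount of time.
        if self.duration <= timedelta(0):
            raise ValueError("duration must be positive")

    @property
    def ends(self):
        """End time implied by the Begins and Duration fields."""
        return self.begins + self.duration


request = SilenceRequest(
    scope='kube_namespace_name = "staging"',  # illustrative scope string only
    begins=datetime(2024, 1, 10, 22, 0),
    duration=timedelta(hours=2),
    name="staging-maintenance",
    notify=["ops-slack"],  # hypothetical channel name
)
print(request.ends)  # 2024-01-11 00:00:00
```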

                                                                                                                                                                                                                                                                                                                                                                      Silence Alert Notifications from Event Feed

You can also create and edit silences and view silenced alert events from the Events feeds across the Monitor UI. When you create a silence, the alert is still triggered and posted on the Events feed and in the graph overlays, but the event indicates that the alert has been silenced.

If an alert has no notification channel configured, events generated from that alert are not marked as silenced. In the Events feed, they show neither the crossed bell icon nor the option to silence events.

To silence an alert from the Events feed:

                                                                                                                                                                                                                                                                                                                                                                      1. On the event feed, select the alert event that you want to silence.

                                                                                                                                                                                                                                                                                                                                                                      2. On the event details slider, click Take Action.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Create Silence from Event.

                                                                                                                                                                                                                                                                                                                                                                        The Silence for Scope window is displayed.

4. Continue configuring the silence as described in step 4 of Configure a Silence.

                                                                                                                                                                                                                                                                                                                                                                      Manage Silences

                                                                                                                                                                                                                                                                                                                                                                      Silences can be managed individually, or as a group, by using the checkboxes on the left side of the Silence UI and the customization bar. Select a group of silences and perform batch delete operations. Select individual silences to perform tasks such as enabling, disabling, duplicating, and editing.

                                                                                                                                                                                                                                                                                                                                                                      Change States

You can enable or disable a silence by sliding the state bar next to it. Two kinds of silences show as enabled: active silences, whose start time has passed but whose end time is yet to come, and scheduled silences, which will start in the future. A clock icon visually represents an active silence.

Completed silences cannot be re-enabled once the silenced period is finished. However, you can duplicate a completed silence with all its data and then set a new silencing period.

A silence can be disabled only when:

• The silence has not yet started.

• The silence is in progress.

                                                                                                                                                                                                                                                                                                                                                                      Filter Silences

                                                                                                                                                                                                                                                                                                                                                                      Use the search bar to filter silences. You can either perform a simple auto-complete text search or use the categories. The feed can be filtered by the following categories: Active, Scheduled, Completed.

                                                                                                                                                                                                                                                                                                                                                                      For example, the following shows the completed silences that start with “cl”.
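
The same filtering logic can be sketched in code. The example below is illustrative only: the silence names are made up, the dictionary layout is hypothetical, and it assumes the text search matches the beginning of the name, ignoring case.

```python
def filter_silences(silences, category=None, text=""):
    """Keep silences matching a category and a name prefix (case-insensitive)."""
    text = text.lower()
    return [
        s for s in silences
        if (category is None or s["state"] == category)
        and s["name"].lower().startswith(text)
    ]


# Made-up sample data; the real list comes from the Silence tab.
silences = [
    {"name": "cluster-upgrade", "state": "Completed"},
    {"name": "cleanup-window", "state": "Active"},
    {"name": "db-maintenance", "state": "Completed"},
]
print(filter_silences(silences, category="Completed", text="cl"))
# [{'name': 'cluster-upgrade', 'state': 'Completed'}]
```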

                                                                                                                                                                                                                                                                                                                                                                      Duplicate a Silence

                                                                                                                                                                                                                                                                                                                                                                      Do one of the following to duplicate a silence:

• Hover over the row and click the Duplicate button on the menu.

• Click the row to open the Silence for Scope window. Make any necessary changes and click Duplicate.

                                                                                                                                                                                                                                                                                                                                                                      Edit Silence

You can edit scheduled silences. For active silences, you can only extend the time. You cannot edit completed silences.

To edit a silence, do one of the following:

• Click the row to open the Silence for Scope window. Make the necessary changes and click Update.

• Hover over the row and click the Edit button on the menu. The Silence for Scope window is displayed.

  Make the necessary changes and click Update.

                                                                                                                                                                                                                                                                                                                                                                      Extend the Time Duration

For active silences, you can extend the duration using one of the following options:

• 1 Hour

• 2 Hours

• 6 Hours

• 12 Hours

• 24 Hours

To do so, hover over the row, click the button for extending the time duration on the menu, and choose the duration. You can also extend the time of an active silence from the Silence for Scope window.

                                                                                                                                                                                                                                                                                                                                                                      Extending the time duration will notify the configured notification channels that the downtime is extended. You can also extend the time from a Slack notification of a silence by clicking the given link. It opens the Silence for Scope window of the running silence where you can make necessary adjustments.

                                                                                                                                                                                                                                                                                                                                                                      You cannot extend the duration of completed silences.
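
The extension rules can be summarized in a short sketch. It is not Sysdig's implementation: the function name `extend_silence` is hypothetical, and the code assumes that the chosen option is added to the current end time of the silence.

```python
from datetime import datetime, timedelta

# Options offered for active silences, in hours.
ALLOWED_EXTENSIONS_HOURS = (1, 2, 6, 12, 24)


def extend_silence(start, end, hours, now):
    """Return the new end time for an active silence.

    Assumption: the extension is added to the current end time.
    """
    if hours not in ALLOWED_EXTENSIONS_HOURS:
        raise ValueError(f"extension must be one of {ALLOWED_EXTENSIONS_HOURS}")
    if now >= end:
        raise ValueError("completed silences cannot be extended")
    if now < start:
        raise ValueError("only active silences can be extended")
    return end + timedelta(hours=hours)


start = datetime(2024, 1, 10, 22, 0)
end = datetime(2024, 1, 11, 0, 0)
now = datetime(2024, 1, 10, 23, 30)
print(extend_silence(start, end, 2, now))  # 2024-01-11 02:00:00
```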

                                                                                                                                                                                                                                                                                                                                                                      6.3 -

                                                                                                                                                                                                                                                                                                                                                                      Alerts Library

To help you get started quickly, Sysdig provides a set of curated alert templates called Alerts Library. Powered by Monitor Integrations, Sysdig automatically detects the applications and services running in your environment and recommends alerts that you can enable.

                                                                                                                                                                                                                                                                                                                                                                      Two types of alert templates are included in Alerts Library:

                                                                                                                                                                                                                                                                                                                                                                      • Recommended: Alert suggestions based on the services that are detected running in your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      • All templates: You can browse templates for all the services. For some templates, you might need to configure Monitor Integrations.

                                                                                                                                                                                                                                                                                                                                                                      Access Alerts Library

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Alerts from the left navigation pane.

                                                                                                                                                                                                                                                                                                                                                                      3. On the Alerts tab, click  Library.

                                                                                                                                                                                                                                                                                                                                                                      Import an Alert

                                                                                                                                                                                                                                                                                                                                                                      1. Locate the service that you want to configure an alert for.

                                                                                                                                                                                                                                                                                                                                                                        To do so, either use the text search or identify from a list of services.

                                                                                                                                                                                                                                                                                                                                                                      2. For example, click Redis.

                                                                                                                                                                                                                                                                                                                                                                        Eight template suggestions are displayed for 14 Redis services running on the environment.

                                                                                                                                                                                                                                                                                                                                                                      3. From a list of template suggestions, choose the desired template.

                                                                                                                                                                                                                                                                                                                                                                        The Redis page shows the alerts that are already in use and that you can enable.

                                                                                                                                                                                                                                                                                                                                                                      4. Enable one or more alert templates. To do so, you can do one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • Click Enable Alert.

                                                                                                                                                                                                                                                                                                                                                                        • Bulk enable templates. Select the check box corresponding to the alert templates and click Enable Alert on the top-right corner.

                                                                                                                                                                                                                                                                                                                                                                        • Click on the alert template to display the slider. Click the Enable Alert on the slider.

                                                                                                                                                                                                                                                                                                                                                                      5. On the Configure Redis Alert page, specify the Scope and select the Notification channels.

                                                                                                                                                                                                                                                                                                                                                                      6. Click Enable Alert.

                                                                                                                                                                                                                                                                                                                                                                        You will see a message stating that the Redis Alert has been successfully created.

                                                                                                                                                                                                                                                                                                                                                                      Use Alerts Library

                                                                                                                                                                                                                                                                                                                                                                      In addition to importing an alert, you can also do the following with the Alerts Library:

                                                                                                                                                                                                                                                                                                                                                                      • Identify Alert templates associated with the services running in your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      • Bulk import Alert templates. See Import an Alert.

                                                                                                                                                                                                                                                                                                                                                                      • View alerts that are already configured.

                                                                                                                                                                                                                                                                                                                                                                      • Filter Alert templates. Enter the search string to display the matching results.

                                                                                                                                                                                                                                                                                                                                                                      • Discover the workloads where a service is running. To do so, click on the Alert template to display the slider. On the slider, click Workloads.

                                                                                                                                                                                                                                                                                                                                                                      • View the alerts in use. To do so, click on an Alert template to display the slider. On the slider, click Alerts in use.

                                                                                                                                                                                                                                                                                                                                                                      • Configure an alert.

                                                                                                                                                                                                                                                                                                                                                                        Additional alert configuration, such as changing the alert name, description, and severity can be done after the import.

                                                                                                                                                                                                                                                                                                                                                                      6.4 -

                                                                                                                                                                                                                                                                                                                                                                      Downtime Alert

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor continuously surveils any type of entity in your infrastructure, such as a host, a container, a process, or a service, and sends notifications when the monitored entity is not available or responding. Downtime alert focuses mainly on unscheduled downtime of your infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      In this example, a Kubernetes cluster is monitored and the alert is segmented on both cluster and namespace. When a Kubernetes cluster in the selected availability zone goes down, notifications will be sent with necessary information on both cluster and affected namespace.

                                                                                                                                                                                                                                                                                                                                                                      The lines shown in the preview chart represent the values for the segments selected to monitor. The popup is a color-coded legend to show which segment (or combination of segments if there is more than one) the lines represent. You can also deselect some segment lines to prevent them from showing in the chart. Note that there is a limit of 10 lines that Sysdig Monitor ever shows in the preview chart. For downtime alerts, segments are actually what you select for the “Select entity to monitor” option.

                                                                                                                                                                                                                                                                                                                                                                      Define a Downtime Alert

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert.

                                                                                                                                                                                                                                                                                                                                                                      • Severity: Set a severity level for your alert. The Priority—High, Medium, Low, and Info—are reflected in the Alert list, where you can sort by the severity of the Alert. You can use severity as a criterion when creating alerts, for example: if there are more than 10 high severity events, notify.

                                                                                                                                                                                                                                                                                                                                                                      • Specify multiple segments: Selecting a single segment might not always supply enough information to troubleshoot. Enrich the selected entity with related information by adding additional related segments. Enter hierarchical entities so you have the bottom-down picture of what went wrong and where. For example, specifying a Kubernetes Cluster alone does not provide the context necessary to troubleshoot. In order to narrow down the issue, add further contextual information, such as Kubernetes Namespace, Kubernetes Deployment, and so on.

                                                                                                                                                                                                                                                                                                                                                                      Specify Entity

                                                                                                                                                                                                                                                                                                                                                                      1. Select an entity whose downtime you want to monitor for.

                                                                                                                                                                                                                                                                                                                                                                        In this example, you are monitoring the unscheduled downtime of a host.

                                                                                                                                                                                                                                                                                                                                                                      2. Specify additional segments:

                                                                                                                                                                                                                                                                                                                                                                        The specified entities are segmented on and notified with the default notification template as well as on the Preview. In this example, data is segmented on Kubernetes cluster name and namespace name. When a cluster is affected, the notification will not only include the affected cluster details but also the associated namespaces.

                                                                                                                                                                                                                                                                                                                                                                      Configure Scope

                                                                                                                                                                                                                                                                                                                                                                      Filter the environment on which this alert will apply. An alert will fire when a host goes down in the availability zone, us-east-1b.

                                                                                                                                                                                                                                                                                                                                                                      Use in or contain operators to match multiple different possible values to apply scope.

                                                                                                                                                                                                                                                                                                                                                                      The contain and not contain operators help you retrieve values if you know part of the values. For example, us retrieves values that contain strings that start with “us”, such as “us-east-1b”, “us-west-2b”, and so on.

                                                                                                                                                                                                                                                                                                                                                                      The in and not in operators help you filter multiple values.

                                                                                                                                                                                                                                                                                                                                                                      You can also create alerts directly from Explore and Dashboards for automatically populating this scope.
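
The operator semantics described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not Sysdig code: the function name `matches` and the sample zone list are hypothetical, and the assumption is that contain is a plain substring test.

```python
def matches(value, operator, operand):
    """Return True if a label value satisfies one scope condition (illustrative only)."""
    if operator == "in":
        return value in operand          # operand is a list of allowed values
    if operator == "not in":
        return value not in operand
    if operator == "contain":
        return operand in value          # operand is a substring to look for
    if operator == "not contain":
        return operand not in value
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical availability-zone values used only for this example.
zones = ["us-east-1b", "us-west-2b", "eu-west-1a", "australia-southeast1"]

contain_us = [z for z in zones if matches(z, "contain", "us")]
in_list = [z for z in zones if matches(z, "in", ["us-east-1b", "eu-west-1a"])]

print(contain_us)
print(in_list)
```

Under the substring assumption, a partial value such as us also matches names where the string appears in the middle, so an in list is the safer choice when you need exact values.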

                                                                                                                                                                                                                                                                                                                                                                      Configure Trigger

                                                                                                                                                                                                                                                                                                                                                                      Define the threshold and time window for assessing the alert condition. Supported time scales are minute, hour, or day.

                                                                                                                                                                                                                                                                                                                                                                      If the monitored host or Kubernetes cluster is not available or not responding for the last 10 minutes, recipients will be notified.

                                                                                                                                                                                                                                                                                                                                                                      You can set any value for % and a value greater than 1 for the time window. For example, If you choose 50% instead of 100%, a notification will be triggered when the entity is down for 5 minutes in the selected time window of 10 minutes.
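
The percentage arithmetic above can be checked with a small Python sketch. It assumes one availability sample per minute and a greater-than-or-equal comparison against the threshold; both are assumptions made for illustration, and `should_trigger` is a hypothetical helper, not part of Sysdig Monitor.

```python
def should_trigger(samples_up, threshold_pct):
    """Decide whether a downtime alert would fire for one time window.

    samples_up: one boolean per minute in the window, True when the entity was up.
    threshold_pct: percentage of the window the entity must be down to trigger.
    """
    down_minutes = sum(1 for up in samples_up if not up)
    down_pct = 100.0 * down_minutes / len(samples_up)
    return down_pct >= threshold_pct


# A 10-minute window in which the entity was down for 5 minutes.
window = [True] * 5 + [False] * 5

print(should_trigger(window, 50))   # 5 of 10 minutes down meets a 50% threshold
print(should_trigger(window, 100))  # but not a 100% threshold
```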

                                                                                                                                                                                                                                                                                                                                                                      Use Cases

                                                                                                                                                                                                                                                                                                                                                                      • Your e-commerce website is down during the peak hours of Black Friday, Christmas, or New Year season.

                                                                                                                                                                                                                                                                                                                                                                      • Production servers of your data center experience a critical outage

                                                                                                                                                                                                                                                                                                                                                                      • MySQL database is unreachable

                                                                                                                                                                                                                                                                                                                                                                      • File upload does not work on your marketing website.

                                                                                                                                                                                                                                                                                                                                                                      6.5 -

                                                                                                                                                                                                                                                                                                                                                                      PromQL Alerts

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor enables you to use PromQL to define metric expressions that you can alert on. You define the alert conditions using the PromQL-based metric expression. This way, you can combine different metrics and warn on cases like service-level agreement breach, running out of disk space in a day, and so on.

                                                                                                                                                                                                                                                                                                                                                                      Examples

                                                                                                                                                                                                                                                                                                                                                                      For PromQL alerts, you can use any metric that is available in PromQL, including Sysdig native metrics. For more details see the various integrations available on promcat.io.

                                                                                                                                                                                                                                                                                                                                                                      Low Disk Space Alert

                                                                                                                                                                                                                                                                                                                                                                      Warn if disk space falls below a specified quantity. For example disk space is below 10GB in the 24h hour:

                                                                                                                                                                                                                                                                                                                                                                      predict_linear(sysdig_fs_free_bytes{fstype!~"tmpfs"}[1h], 24*3600) < 10000000000
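
To see what the expression above computes, here is a minimal Python sketch of the idea behind predict_linear: fit a least-squares line through the samples in the range and extrapolate it forward. The function name `predict_linear_sketch`, the sample data, and the choice to treat the last sample's timestamp as the evaluation time are assumptions for illustration; this is not the Prometheus implementation.

```python
def predict_linear_sketch(samples, horizon_s):
    """Extrapolate a least-squares line horizon_s seconds past the last sample.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    """
    n = len(samples)
    t_mean = sum(t for t, _ in samples) / n
    v_mean = sum(v for _, v in samples) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in samples)
    var = sum((t - t_mean) ** 2 for t, _ in samples)
    slope = cov / var                      # bytes per second
    t_eval = samples[-1][0]                # assumed evaluation time
    return v_mean + slope * (t_eval + horizon_s - t_mean)


# Hypothetical data: one sample every 10 minutes over 1 hour,
# free space shrinking by 500,000 bytes per second from 50 GB.
samples = [(t, 50e9 - 0.5e6 * t) for t in range(0, 3601, 600)]

predicted = predict_linear_sketch(samples, 24 * 3600)
alert = predicted < 10_000_000_000

print(alert)
```

With this data the line reaches roughly 5 GB after 24 hours, so the condition holds even though the disk currently has about 48 GB free.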
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Slow Etcd Requests

                                                                                                                                                                                                                                                                                                                                                                      Notify if etcd requests are slow. This example uses the promcat.io integration.

                                                                                                                                                                                                                                                                                                                                                                      histogram_quantile(0.99, rate(etcd_http_successful_duration_seconds_bucket[5m]) > 0.15
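
As a rough guide to how a quantile is estimated from histogram buckets, the Python sketch below finds the bucket that contains the requested rank and interpolates linearly inside it. The function name `histogram_quantile_sketch` and the bucket values are hypothetical, the input uses plain cumulative counts instead of per-second rates, and it assumes 0 < q <= 1 with a non-empty histogram; it is a simplified illustration, not the Prometheus implementation.

```python
import math


def histogram_quantile_sketch(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper bound,
    with a final +Inf bucket holding the total count.
    """
    total = buckets[-1][1]
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, prev_count = (0.0, 0.0) if i == 0 else buckets[i - 1]
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Hypothetical request-duration buckets in seconds.
buckets = [(0.05, 900), (0.1, 980), (0.2, 995), (0.5, 1000), (math.inf, 1000)]

p99 = histogram_quantile_sketch(0.99, buckets)
slow = p99 > 0.15

print(slow)
```

Here the 99th percentile falls in the 0.1 to 0.2 second bucket and interpolates to about 0.167 seconds, which exceeds the 0.15 threshold.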
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      High Heap Usage

                                                                                                                                                                                                                                                                                                                                                                      Warn when the heap usage in ElasticSearch is more than 80%. This example uses the promcat.io integration.

                                                                                                                                                                                                                                                                                                                                                                      (elasticsearch_jvm_memory_used_bytes{area="heap"} / elasticsearch_jvm_memory_max_bytes{area="heap"}) * 100 > 80
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor does not currently support the following:

                                                                                                                                                                                                                                                                                                                                                                      • Interact with the Prometheus alert manager or import alert manager configuration.

                                                                                                                                                                                                                                                                                                                                                                      • Provide the ability to use, copy, paste, and import predefined alert rules.

                                                                                                                                                                                                                                                                                                                                                                      • Convert the alert rules to map to the Sysdig alert editor.

                                                                                                                                                                                                                                                                                                                                                                      Create a PromQL Alert

                                                                                                                                                                                                                                                                                                                                                                      Set a meaningful name and description that help recipients easily identify the alert.

                                                                                                                                                                                                                                                                                                                                                                      Set a Priority

                                                                                                                                                                                                                                                                                                                                                                      Select a priority for the alert that you are creating. The supported priorities are High, Medium, Low, and Info. You can also view and sort events in the dashboard and explore UI, as well as sort them by severity.

                                                                                                                                                                                                                                                                                                                                                                      Define a PromQL Alert

                                                                                                                                                                                                                                                                                                                                                                      PromQL: Enter a valid PromQL expression. The query will be executed every minute. However, the alert will be triggered only if the query returns data for the specified duration.

                                                                                                                                                                                                                                                                                                                                                                      In this example, you will be alerted when the rate of HTTP requests has doubled over the last 5 minutes.

                                                                                                                                                                                                                                                                                                                                                                      Duration: Specify the time window for evaluating the alert condition in minutes, hour, or day. The alert will be triggered if the query returns data for the specified duration.

                                                                                                                                                                                                                                                                                                                                                                      Define Notification

                                                                                                                                                                                                                                                                                                                                                                      Notification Channels: Select from the configured notification channels in the list.

                                                                                                                                                                                                                                                                                                                                                                      Re-notification Options: Set the time interval at which multiple alerts should be sent if the problem remains unresolved.

                                                                                                                                                                                                                                                                                                                                                                      Notification Message & Events: Enter a subject and body. Optionally, you can choose an existing template for the body. Modify the subject, body, or both for the alert notification with a hyperlink, plain text, or dynamic variables.

                                                                                                                                                                                                                                                                                                                                                                      Import Prometheus Alert Rules

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Alert allows you to import Prometheus rules or create new rules on the fly and add them to the existing list of alerts. Click the Upload Prometheus Rules option and enter the rules as YAML in the Upload Prometheus Rules YAML editor. Importing your Prometheus alert rules will convert them to PromQL-based Sysdig alerts. Ensure that the alert rules are valid YAML.

                                                                                                                                                                                                                                                                                                                                                                      You can upload one or more alert rules in a single YAML and create multiple alerts simultaneously.

                                                                                                                                                                                                                                                                                                                                                                      Once the rules are imported to Sysdig Monitor, the alert list will be automatically sorted by last modified date.

                                                                                                                                                                                                                                                                                                                                                                      Besides the pre-populated template, each rule specified in the Upload Prometheus Rules YAML editor requires the following fields:

                                                                                                                                                                                                                                                                                                                                                                      • alert

                                                                                                                                                                                                                                                                                                                                                                      • expr

                                                                                                                                                                                                                                                                                                                                                                      • for

                                                                                                                                                                                                                                                                                                                                                                      See the following examples to understand the format of Prometheus Rules YAML. Ensure that the alert rules are valid YAML to pass validation.

                                                                                                                                                                                                                                                                                                                                                                      Example: Alert Prometheus Crash Looping

                                                                                                                                                                                                                                                                                                                                                                      To alert potential Prometheus crash looping. Create a rule to alert when Prometheus restart more than twice in the last 10 minutes.

                                                                                                                                                                                                                                                                                                                                                                      groups:
                                                                                                                                                                                                                                                                                                                                                                      - name: crashlooping
                                                                                                                                                                                                                                                                                                                                                                        rules:
                                                                                                                                                                                                                                                                                                                                                                        - alert: PrometheusTooManyRestarts
                                                                                                                                                                                                                                                                                                                                                                          expr: changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[10m]) > 2
                                                                                                                                                                                                                                                                                                                                                                          for: 0m
                                                                                                                                                                                                                                                                                                                                                                          labels:
                                                                                                                                                                                                                                                                                                                                                                            severity: warning
                                                                                                                                                                                                                                                                                                                                                                          annotations:
                                                                                                                                                                                                                                                                                                                                                                            summary: Prometheus too many restarts (instance {{ $labels.instance }})
                                                                                                                                                                                                                                                                                                                                                                            description: Prometheus has restarted more than twice in the last 15 minutes. It might be crashlooping.\n  VALUE = {{ $value }}\n
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example: Alert HTTP Error Rate

                                                                                                                                                                                                                                                                                                                                                                      To alert HTTP requests with status 5xx (> 5%) or high latency:

                                                                                                                                                                                                                                                                                                                                                                      groups:
                                                                                                                                                                                                                                                                                                                                                                      - name: default
                                                                                                                                                                                                                                                                                                                                                                        rules:
                                                                                                                                                                                                                                                                                                                                                                        # Paste your rules here
                                                                                                                                                                                                                                                                                                                                                                        - alert: NginxHighHttp5xxErrorRate
                                                                                                                                                                                                                                                                                                                                                                          expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
                                                                                                                                                                                                                                                                                                                                                                          for: 1m
                                                                                                                                                                                                                                                                                                                                                                          labels:
                                                                                                                                                                                                                                                                                                                                                                            severity: critical
                                                                                                                                                                                                                                                                                                                                                                          annotations:
                                                                                                                                                                                                                                                                                                                                                                            summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})
                                                                                                                                                                                                                                                                                                                                                                            description: Too many HTTP requests with status 5xx
                                                                                                                                                                                                                                                                                                                                                                        - alert: NginxLatencyHigh
                                                                                                                                                                                                                                                                                                                                                                          expr: histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[2m])) by (host, node)) > 3
                                                                                                                                                                                                                                                                                                                                                                          for: 2m
                                                                                                                                                                                                                                                                                                                                                                          labels:
                                                                                                                                                                                                                                                                                                                                                                            severity: warning
                                                                                                                                                                                                                                                                                                                                                                          annotations:
                                                                                                                                                                                                                                                                                                                                                                            summary: Nginx latency high (instance {{ $labels.instance }})
                                                                                                                                                                                                                                                                                                                                                                            description: Nginx p99 latency is higher than 3 seconds
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      6.6 -

                                                                                                                                                                                                                                                                                                                                                                      Metric Alerts

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor keeps a watch on time-series metrics, and alert if they violate user-defined thresholds.

                                                                                                                                                                                                                                                                                                                                                                      The lines shown in the preview chart represent the values for the segments selected to monitor. The popup is a color-coded legend to show which segment (or combination of segments if there is more than one) the lines represent. You can also deselect some segment lines to prevent them from showing in the chart. Note that there is a limit of 10 lines that Sysdig Monitor ever shows in the preview chart.

                                                                                                                                                                                                                                                                                                                                                                      Defining a Metric Alert

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert

                                                                                                                                                                                                                                                                                                                                                                      • Specify multiple segments: Selecting a single segment might not always supply enough information to troubleshoot. Enrich the selected entity with related information by adding additional related segments. Enter hierarchical entities so you have the bottom-down picture of what went wrong and where. For example, specifying a Kubernetes Cluster alone does not provide the context necessary to troubleshoot. In order to narrow down the issue, add further contextual information, such as Kubernetes Namespace, Kubernetes Deployment, and so on.

                                                                                                                                                                                                                                                                                                                                                                      Specify Metrics

                                                                                                                                                                                                                                                                                                                                                                      Select a metric that this alert will monitor. You can also define how data is aggregated, such as avg, max, min or sum. To alert on multiple metrics using boolean logic, switch to multi-condition alert.

                                                                                                                                                                                                                                                                                                                                                                      Configure Scope

Filter the environment on which this alert will apply. In this example, an alert fires when a host goes down in the us-east-1b availability zone.

                                                                                                                                                                                                                                                                                                                                                                      Use advanced operators to include, exclude, or pattern-match groups, tags, and entities. See Multi-Condition Alerts.
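As a rough illustration of how include, exclude, and pattern-match rules narrow a scope, the following Python sketch filters a list of entities by their labels. The label keys (`zone`, `host`), the `in_scope` helper, and the sample data are hypothetical; they are not Sysdig identifiers or APIs, and the real operators are configured in the alert editor.

```python
from fnmatch import fnmatchcase


def in_scope(labels, include=None, exclude=None, patterns=None):
    """Return True if an entity's labels pass every scope rule (hypothetical helper)."""
    # Include: the label value must be one of the allowed values.
    for key, allowed in (include or {}).items():
        if labels.get(key) not in allowed:
            return False
    # Exclude: the label value must not be one of the rejected values.
    for key, rejected in (exclude or {}).items():
        if labels.get(key) in rejected:
            return False
    # Pattern-match: the label value must match a shell-style wildcard.
    for key, pattern in (patterns or {}).items():
        if not fnmatchcase(labels.get(key, ""), pattern):
            return False
    return True


# Hypothetical entities with made-up label keys.
entities = [
    {"host": "web-1", "zone": "us-east-1b"},
    {"host": "web-2", "zone": "us-east-1a"},
    {"host": "db-1", "zone": "us-east-1b"},
    {"host": "web-test", "zone": "us-east-1b"},
]

selected = [
    e["host"]
    for e in entities
    if in_scope(
        e,
        include={"zone": ["us-east-1b"]},
        exclude={"host": ["web-test"]},
        patterns={"host": "web-*"},
    )
]
print(selected)
```

Only entities that pass all three rules remain in scope, which is why a narrow scope reduces the number of entities an alert evaluates.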

You can also create alerts directly from Explore and Dashboards to populate this scope automatically.

                                                                                                                                                                                                                                                                                                                                                                      Configure Trigger

Define the threshold and time window for assessing the alert condition. Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.

                                                                                                                                                                                                                                                                                                                                                                      Metric alerts can be triggered to notify you of different aggregations:

• on average: The average of the retrieved metric values across the time period. The actual number of samples retrieved is used to calculate the value. For example, if new data is retrieved in the 7th minute of a 10-minute sample and the alert is defined as on average, the alert will be calculated by summing the 3 recorded values and dividing by 3.

• as a rate: The average value of the metric across the time period evaluated. The expected number of values is used to calculate the rate to trigger the alert. For example, if new data is retrieved in the 7th minute of a 10-minute sample and the alert is defined as as a rate, the alert will be calculated by summing the 3 recorded values and dividing by 10 (10 x 1-minute samples).

• in sum: The combined sum of the metric across the time period evaluated.

• at least once: The trigger value is met for at least one sample in the evaluated period.

• for the entire time: The trigger value is met for every sample in the evaluated period.

• as a rate of change: The trigger value is met by the change in value over the evaluated period.
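The difference between these aggregations is easiest to see with numbers. The Python sketch below reproduces the arithmetic described above for a 10-minute window in which only 3 one-minute samples were retrieved. The sample values, the function names, and the use of a strict "greater than" comparison are assumptions made for illustration; this is not Sysdig's implementation.

```python
def on_average(samples):
    # Divide by the number of samples actually retrieved.
    return sum(samples) / len(samples)


def as_a_rate(samples, expected_count):
    # Divide by the number of samples expected in the window.
    return sum(samples) / expected_count


def in_sum(samples):
    # Combined sum across the evaluated period.
    return sum(samples)


def at_least_once(samples, threshold):
    # True if any single sample exceeds the threshold.
    return any(value > threshold for value in samples)


def for_the_entire_time(samples, threshold):
    # True only if every retrieved sample exceeds the threshold.
    return all(value > threshold for value in samples)


# Hypothetical data: 3 samples retrieved in a window that expects 10.
samples = [60.0, 90.0, 120.0]
EXPECTED_SAMPLES = 10

print(on_average(samples))                   # 270 / 3
print(as_a_rate(samples, EXPECTED_SAMPLES))  # 270 / 10
print(in_sum(samples))
print(at_least_once(samples, 100))
print(for_the_entire_time(samples, 100))
```

With sparse data, on average stays close to the observed values, while as a rate is pulled down by the missing samples, so the same threshold can trigger under one aggregation and not the other.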

For example, if the file system used percentage goes above 75 on average for the last 5 minutes, multiple alerts will be triggered. The MAC address of the host and the mount directory of the file system will be included in the alert notification.
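To make the Multiple Alert behavior concrete, here is a minimal Python sketch that groups samples by segment (host MAC address and mount directory) and reports each segment whose average exceeds the threshold. The sample data, MAC addresses, and the `breaching_segments` helper are hypothetical, and the sketch assumes all samples fall inside the 5-minute window.

```python
from collections import defaultdict

THRESHOLD = 75.0  # file system used percentage

# Hypothetical samples from the last 5 minutes: (host MAC, mount dir, used %).
samples = [
    ("0a:00:00:00:00:01", "/", 80.0),
    ("0a:00:00:00:00:01", "/", 82.0),
    ("0a:00:00:00:00:01", "/data", 40.0),
    ("0a:00:00:00:00:01", "/data", 42.0),
    ("0a:00:00:00:00:02", "/", 78.0),
    ("0a:00:00:00:00:02", "/", 76.0),
]


def breaching_segments(samples, threshold):
    """Return the segments whose average value exceeds the threshold."""
    by_segment = defaultdict(list)
    for mac, mount, value in samples:
        by_segment[(mac, mount)].append(value)
    return sorted(
        segment
        for segment, values in by_segment.items()
        if sum(values) / len(values) > threshold
    )


# One alert would be generated per breaching segment.
for mac, mount in breaching_segments(samples, THRESHOLD):
    print(f"ALERT host={mac} mount={mount}")
```

Each printed line corresponds to one notification, and the segment labels are what let the recipient see which host and mount point breached.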

Use Cases

                                                                                                                                                                                                                                                                                                                                                                      • Number of processes running on a host is not normal

                                                                                                                                                                                                                                                                                                                                                                      • Root volume disk usage in a container is high

                                                                                                                                                                                                                                                                                                                                                                      6.7 -

                                                                                                                                                                                                                                                                                                                                                                      Event Alerts

                                                                                                                                                                                                                                                                                                                                                                      Monitor occurrences of specific events, and alert if the total number of occurrences violates a threshold. Useful for alerting on container, orchestration, and service events like restarts and deployments.

                                                                                                                                                                                                                                                                                                                                                                      Alerts on events support only one segmentation label. An alert is generated for each segment.

Define an Event Alert

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert.

• Severity: Set a severity level for your alert. The priority levels (High, Medium, Low, and Info) are reflected in the Alert list, where you can sort by severity using the top navigation pane. You can use severity as a criterion when creating events and alerts, for example: if there are more than 10 high severity events, notify.

                                                                                                                                                                                                                                                                                                                                                                      • Source Tag: Supported source tags are Kubernetes, Docker, and Containerd.

                                                                                                                                                                                                                                                                                                                                                                      • Trigger: Specify the trigger condition in terms of the number of events for a given duration.

  Event alerts support only one segmentation label. If you choose Multiple Alerts, Sysdig generates only one alert for a selected segment.

                                                                                                                                                                                                                                                                                                                                                                      Specify Event

                                                                                                                                                                                                                                                                                                                                                                      1. Specify the name, tag, or description of an event.

                                                                                                                                                                                                                                                                                                                                                                      2. Specify a Source Tag.

                                                                                                                                                                                                                                                                                                                                                                      Configure Scope

Filter the environment on which this alert will apply. Use advanced operators to include, exclude, or pattern-match groups, tags, and entities. You can also create alerts directly from Explore and Dashboards to populate this scope automatically.

                                                                                                                                                                                                                                                                                                                                                                      In this example, failing a liveness probe in the agent-process-whitelist-cluster cluster triggers an alert.

                                                                                                                                                                                                                                                                                                                                                                      Configure Trigger

Define the threshold and time window for assessing the alert condition. Single Alert fires an alert for your entire scope, while Multiple Alert fires if any or every segment breaches the threshold at once.

                                                                                                                                                                                                                                                                                                                                                                      If the number of events triggered in the monitored entity is greater than 5 for the last 10 minutes, recipients will be notified through the selected channel.
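The trigger condition above amounts to counting events inside a time window and comparing the count to a threshold. A minimal Python sketch of that check follows; the `should_notify` helper and the timestamps are hypothetical, and the sketch does not model scope, source tags, or notification channels.

```python
from datetime import datetime, timedelta


def should_notify(event_times, now, window=timedelta(minutes=10), threshold=5):
    """Return True if more than `threshold` events occurred within `window` of `now`."""
    window_start = now - window
    recent = [t for t in event_times if window_start < t <= now]
    return len(recent) > threshold


now = datetime(2024, 1, 1, 12, 0)

# Six events in the last 10 minutes plus one older event that is ignored.
events = [now - timedelta(minutes=m) for m in (1, 2, 3, 4, 5, 6, 15)]
print(should_notify(events, now))

# Exactly five recent events does not exceed the threshold of 5.
fewer = [now - timedelta(minutes=m) for m in (1, 2, 3, 4, 5, 15)]
print(should_notify(fewer, now))
```

Note that the condition is "greater than 5", so exactly 5 events in the window does not trigger a notification.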

                                                                                                                                                                                                                                                                                                                                                                      6.8 -

                                                                                                                                                                                                                                                                                                                                                                      Anomaly Detection Alerts

An anomaly is an outlier in a data set polled from an environment: a deviation from an established pattern. Anomaly detection is about identifying these anomalous observations. An anomaly can appear as a set of data points taken collectively, as a single instance of data, or as a context-specific abnormality. Examples include unauthorized copying of a directory from a container, high CPU or memory consumption, and so on.
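As a generic illustration of flagging a deviation from an established pattern, the Python sketch below uses a simple z-score test against a baseline. This is a textbook technique chosen for illustration, not the algorithm Sysdig Monitor uses; the baseline values, the `is_anomalous` helper, and the cutoff of 3 standard deviations are all assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(baseline, observation, cutoff=3.0):
    """Flag an observation that lies more than `cutoff` standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = pstdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return observation != center
    z_score = abs(observation - center) / spread
    return z_score > cutoff


# Hypothetical CPU usage percentages observed during normal operation.
baseline = [50.0, 52.0, 48.0, 51.0, 49.0]

print(is_anomalous(baseline, 51.0))  # close to the usual pattern
print(is_anomalous(baseline, 90.0))  # far outside the usual pattern
```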

                                                                                                                                                                                                                                                                                                                                                                      Define an Anomaly Detection Alert

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

• Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert.

• Severity: Set a severity level for your alert. The priority levels (High, Medium, Low, and Info) are reflected in the Alert list, where you can sort by severity using the top navigation pane. You can use severity as a criterion when creating events and alerts, for example: if there are more than 10 high severity events, notify.

• Specify multiple segments: Selecting a single segment might not always supply enough information to troubleshoot. Enrich the selected entity by adding related segments. Enter hierarchical entities so you have a top-down picture of what went wrong and where. For example, specifying a Kubernetes Cluster alone does not provide the context necessary to troubleshoot. To narrow down the issue, add further contextual information, such as Kubernetes Namespace, Kubernetes Deployment, and so on.

                                                                                                                                                                                                                                                                                                                                                                      Specify Entity

                                                                                                                                                                                                                                                                                                                                                                      Select one or more metrics whose behavior you want to monitor.

                                                                                                                                                                                                                                                                                                                                                                      Configure Scope

Filter the environment on which this alert will apply. In this example, an alert fires when the value returned by one of the selected metrics does not follow its usual pattern in the us-east-1b availability zone.

You can also create alerts directly from Explore and Dashboards to populate this scope automatically.

                                                                                                                                                                                                                                                                                                                                                                      Configure Trigger

Trigger gives you control over how notifications are created and helps prevent flooding your notification channel with notifications. For example, you may want to receive a notification for every violation, or only a single notification for a series of consecutive violations.

                                                                                                                                                                                                                                                                                                                                                                      Define the threshold and time window for assessing the alert condition. Supported time scales are minute, hour, or day.

                                                                                                                                                                                                                                                                                                                                                                      If the monitored host or Kubernetes cluster is not available or not responding for the last 5 minutes, recipients will be notified.

You can set any value for % and a value greater than 1 for the time window. For example, if you choose 50% instead of 100%, a notification will be triggered when the entity is down for 2.5 minutes in the selected time window of 5 minutes.
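The relationship between the percentage and the time window is simple arithmetic, sketched below in Python. The helper names, the 10-second sample interval, and the "at or above the percentage" comparison are assumptions made for illustration.

```python
def required_downtime_minutes(window_minutes, percent):
    # Downtime needed within the window before a notification is triggered.
    return window_minutes * percent / 100


def should_notify(down_flags, percent):
    """down_flags holds one boolean per sample in the window (True means down)."""
    down_percent = 100 * sum(down_flags) / len(down_flags)
    return down_percent >= percent


print(required_downtime_minutes(5, 100))
print(required_downtime_minutes(5, 50))

# Hypothetical 5-minute window sampled every 10 seconds: 30 samples, 15 of them down.
window = [True] * 15 + [False] * 15
print(should_notify(window, 50))
print(should_notify(window, 100))
```

Lowering the percentage makes the alert more sensitive, because less cumulative downtime inside the same window is enough to trigger it.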

                                                                                                                                                                                                                                                                                                                                                                      6.9 -

                                                                                                                                                                                                                                                                                                                                                                      Group Outlier Alerts

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor observes a group of hosts and notifies you when one acts differently from the rest.

                                                                                                                                                                                                                                                                                                                                                                      Define a Group Outlier Alert

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

• Set a unique name and description: Set a meaningful name and description that help recipients easily identify the alert.

• Severity: Set a severity level for your alert. The severity levels (High, Medium, Low, and Info) are reflected in the Alert list, where you can sort by severity using the top navigation pane. You can also use severity as a criterion when creating events and alerts. For example: notify if there are more than 10 high-severity events.

                                                                                                                                                                                                                                                                                                                                                                      Specify Entity

                                                                                                                                                                                                                                                                                                                                                                      Select one or more metrics whose behavior you want to monitor.

                                                                                                                                                                                                                                                                                                                                                                      Configure Scope

Filter the environment to which this alert applies. For example, with the scope set to the availability zone us-east-1b, an alert fires when the value returned by one of the selected metrics does not follow the pattern in that availability zone.

You can also create alerts directly from Explore and Dashboards to populate this scope automatically.

                                                                                                                                                                                                                                                                                                                                                                      Configure Trigger

The trigger gives you control over how notifications are created and helps prevent flooding your notification channel. For example, you may want to receive a notification for every violation, or only a single notification for a series of consecutive violations.

                                                                                                                                                                                                                                                                                                                                                                      Define the threshold and time window for assessing the alert condition. Supported time scales are minute, hour, or day.

                                                                                                                                                                                                                                                                                                                                                                      If the monitored host or Kubernetes cluster is not available or not responding for the last 5 minutes, recipients will be notified.

You can set any value for % and a value greater than 1 for the time window. For example, if you choose 50% instead of 100%, a notification is triggered when the entity is down for 2.5 minutes in the selected time window of 5 minutes.

Use Cases

• Load balancer servers with uneven workloads

• Changes in applications or instances deployed in different availability zones

• Network-hogging hosts in a cluster

                                                                                                                                                                                                                                                                                                                                                                      7 -

                                                                                                                                                                                                                                                                                                                                                                      Events

The Sysdig Monitor Events module displays a comprehensive, unified list of the monitoring and security events that have occurred within the environment as a live events feed. The feed shows events created by triggered alerts, pulled from infrastructure services, initiated by Sysdig Secure (such as policy and image scanning events), or defined by users, and lets users review, track, and resolve issues. Each event is enriched with metadata, and its relationships to the rest of the monitored system are built when you search for events. With a unified event stream, Sysdig Monitor eliminates the need for standalone tools for security and monitoring alerts.

                                                                                                                                                                                                                                                                                                                                                                      Learn more about Sysdig Monitor Events in the following sections:

                                                                                                                                                                                                                                                                                                                                                                      7.1 -

                                                                                                                                                                                                                                                                                                                                                                      Event Types

There are three primary types of events displayed in the Sysdig Monitor Events feed: alert events, infrastructure events, and custom events. Note that image scanning and security events are displayed in the Sysdig Secure interface.

                                                                                                                                                                                                                                                                                                                                                                      Alert Events

                                                                                                                                                                                                                                                                                                                                                                      Alert events are triggered by user-configured alerts. For more information on configuring alerts, refer to the Sysdig Monitor Alerts documentation.

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure Events

                                                                                                                                                                                                                                                                                                                                                                      Events can be collected from supported services within the production environment. The Sysdig agent automatically discovers these services and is configured to collect event data for a select group of events by default. Additional events can be added to the list by configuring the dragent.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig currently supports event monitoring for the following infrastructure services:

Events marked with * are enabled by default. For more information on configuring additional infrastructure events, refer to Enable/Disable Event Data.

                                                                                                                                                                                                                                                                                                                                                                      Docker Events

                                                                                                                                                                                                                                                                                                                                                                      The following Docker events are supported.

docker:
    container:
      - attach       # Container Attached      (information)
      - commit       # Container Committed     (information)
      - copy         # Container Copied        (information)
      - create       # Container Created       (information)
      - destroy      # Container Destroyed     (warning)
      - die          # Container Died          (warning)
      - exec_create  # Container Exec Created  (information)
      - exec_start   # Container Exec Started  (information)
      - export       # Container Exported      (information)
      - kill         # Container Killed        (warning)*
      - oom          # Container Out of Memory (warning)*
      - pause        # Container Paused        (information)
      - rename       # Container Renamed       (information)
      - resize       # Container Resized       (information)
      - restart      # Container Restarted     (warning)
      - start        # Container Started       (information)
      - stop         # Container Stopped       (information)
      - top          # Container Top           (information)
      - unpause      # Container Unpaused      (information)
      - update       # Container Updated       (information)
    image:
      - delete # Image Deleted  (information)
      - import # Image Imported (information)
      - pull   # Image Pulled   (information)
      - push   # Image Pushed   (information)
      - tag    # Image Tagged   (information)
      - untag  # Image Untagged (information)
    volume:
      - create  # Volume Created    (information)
      - mount   # Volume Mounted    (information)
      - unmount # Volume Unmounted  (information)
      - destroy # Volume Destroyed  (information)
    network:
      - create     # Network Created       (information)
      - connect    # Network Connected     (information)
      - disconnect # Network Disconnected  (information)
      - destroy    # Network Destroyed     (information)
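
As an illustration of adding events to the default list, the following dragent.yaml fragment is a minimal sketch and not a verified configuration. It assumes that event settings sit under a top-level `events:` key, which this section does not state. It also lists the default events explicitly, because this section does not say whether a custom list extends or replaces the defaults. Confirm both points in Enable/Disable Event Data before applying it.

```yaml
# dragent.yaml (sketch; the top-level `events:` key is an assumption)
events:
  docker:
    container:
      - kill     # enabled by default (marked * in the list above)
      - oom      # enabled by default (marked * in the list above)
      - die      # additional event: Container Died (warning)
      - restart  # additional event: Container Restarted (warning)
```

The event names are taken from the supported Docker events listed above. Only the surrounding structure is assumed.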
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Events

                                                                                                                                                                                                                                                                                                                                                                      The following Kubernetes events are supported.

                                                                                                                                                                                                                                                                                                                                                                      kubernetes:
                                                                                                                                                                                                                                                                                                                                                                          node:
                                                                                                                                                                                                                                                                                                                                                                            - TerminatedAllPods       # Terminated All Pods      (information)
                                                                                                                                                                                                                                                                                                                                                                            - RegisteredNode          # Node Registered          (information)*
                                                                                                                                                                                                                                                                                                                                                                            - RemovingNode            # Removing Node            (information)*
                                                                                                                                                                                                                                                                                                                                                                            - DeletingNode            # Deleting Node            (information)*
                                                                                                                                                                                                                                                                                                                                                                            - DeletingAllPods         # Deleting All Pods        (information)
                                                                                                                                                                                                                                                                                                                                                                            - TerminatingEvictedPod   # Terminating Evicted Pod  (information)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeReady               # Node Ready               (information)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeNotReady            # Node not Ready           (information)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeSchedulable         # Node is Schedulable      (information)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeNotSchedulable      # Node is not Schedulable  (information)*
                                                                                                                                                                                                                                                                                                                                                                            - CIDRNotAvailable        # CIDR not Available       (information)*
                                                                                                                                                                                                                                                                                                                                                                            - CIDRAssignmentFailed    # CIDR Assignment Failed   (information)*
                                                                                                                                                                                                                                                                                                                                                                            - Starting                # Starting Kubelet         (information)*
                                                                                                                                                                                                                                                                                                                                                                            - KubeletSetupFailed      # Kubelet Setup Failed     (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - FailedMount             # Volume Mount Failed      (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeSelectorMismatching # Node Selector Mismatch   (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - InsufficientFreeCPU     # Insufficient Free CPU    (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - InsufficientFreeMemory  # Insufficient Free Mem    (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - OutOfDisk               # Out of Disk              (information)*
                                                                                                                                                                                                                                                                                                                                                                            - HostNetworkNotSupported # Host Ntw not Supported   (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - NilShaper               # Undefined Shaper         (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - Rebooted                # Node Rebooted            (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeHasSufficientDisk   # Node Has Sufficient Disk (information)*
                                                                                                                                                                                                                                                                                                                                                                            - NodeOutOfDisk           # Node Out of Disk Space   (information)*
                                                                                                                                                                                                                                                                                                                                                                            - InvalidDiskCapacity     # Invalid Disk Capacity    (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - FreeDiskSpaceFailed     # Free Disk Space Failed   (warning)*
                                                                                                                                                                                                                                                                                                                                                                          pod:
                                                                                                                                                                                                                                                                                                                                                                            - Pulling           # Pulling Container Image          (information)
                                                                                                                                                                                                                                                                                                                                                                            - Pulled            # Ctr Img Pulled                   (information)
                                                                                                                                                                                                                                                                                                                                                                            - Failed            # Ctr Img Pull/Create/Start Fail   (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - InspectFailed     # Ctr Img Inspect Failed           (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - ErrImageNeverPull # Ctr Img NeverPull Policy Violate (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - BackOff           # Back Off Ctr Start, Image Pull   (warning)
                                                                                                                                                                                                                                                                                                                                                                            - Created           # Container Created                (information)
                                                                                                                                                                                                                                                                                                                                                                            - Started           # Container Started                (information)
                                                                                                                                                                                                                                                                                                                                                                            - Killing           # Killing Container                (information)*
                                                                                                                                                                                                                                                                                                                                                                            - Unhealthy         # Container Unhealthy              (warning)
                                                                                                                                                                                                                                                                                                                                                                            - FailedSync        # Pod Sync Failed                  (warning)
                                                                                                                                                                                                                                                                                                                                                                            - FailedValidation  # Failed Pod Config Validation     (warning)
                                                                                                                                                                                                                                                                                                                                                                            - OutOfDisk         # Out of Disk                      (information)*
                                                                                                                                                                                                                                                                                                                                                                            - HostPortConflict  # Host/Port Conflict               (warning)*
                                                                                                                                                                                                                                                                                                                                                                          replicationController:
                                                                                                                                                                                                                                                                                                                                                                            - SuccessfulCreate    # Pod Created        (information)*
                                                                                                                                                                                                                                                                                                                                                                            - FailedCreate        # Pod Create Failed  (warning)*
                                                                                                                                                                                                                                                                                                                                                                            - SuccessfulDelete    # Pod Deleted        (information)*
                                                                                                                                                                                                                                                                                                                                                                            - FailedDelete        # Pod Delete Failed  (warning)*
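
As a rough illustration of how a selection from this list might be checked before it is written into a configuration, the sketch below validates event names and renders them in the same `kubernetes:` / resource / event shape used above. The `SUPPORTED` table holds only a subset of the names listed above, and `render_selection` is a hypothetical helper, not part of the Sysdig agent.

```python
# Subset of the event names listed above, keyed by resource type.
SUPPORTED = {
    "node": {"NodeReady", "NodeNotReady", "Rebooted", "OutOfDisk"},
    "pod": {"Pulling", "Pulled", "BackOff", "Unhealthy", "FailedSync"},
    "replicationController": {"SuccessfulCreate", "FailedCreate"},
}


def render_selection(selection):
    """Validate a selection and render it in the same shape as the list above."""
    lines = ["kubernetes:"]
    for resource, names in selection.items():
        # Reject names that are not in the (partial) supported table.
        unknown = set(names) - SUPPORTED.get(resource, set())
        if unknown:
            raise ValueError(f"unsupported {resource} events: {sorted(unknown)}")
        lines.append(f"  {resource}:")
        lines.extend(f"    - {name}" for name in names)
    return "\n".join(lines)


fragment = render_selection({"node": ["Rebooted"], "pod": ["BackOff", "Unhealthy"]})
print(fragment)
```

Failing early on an unknown name catches typos such as `Backoff` before they reach a configuration file.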
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Custom Events

The Sysdig agent can collect additional events and display them in the Events module, but these require more extensive configuration. Custom events can be integrated via:

                                                                                                                                                                                                                                                                                                                                                                      • The Sysdig Monitor Slackbot

                                                                                                                                                                                                                                                                                                                                                                      • Python scripts (either pre-built by Sysdig or user-created)

• A curl request

For brief sample scripts that send custom events, refer to Custom Events. For more information, contact Sysdig Support.
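
Of the options above, the plain HTTP request is the easiest to reproduce from any language. The sketch below shows roughly what such a request might look like when built in Python instead of curl. The endpoint URL, the bearer-token header, and the `{"event": {...}}` payload shape are assumptions that this page does not state, so verify them against the Sysdig API documentation before relying on them. `build_event_request` is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumptions, not confirmed by this page: the endpoint URL, the bearer-token
# header, and the {"event": {...}} payload shape.
API_URL = "https://app.sysdigcloud.com/api/events"


def build_event_request(token, name, description=None, severity=None, tags=None):
    """Build (but do not send) a POST request describing one custom event."""
    event = {"name": name}
    if description is not None:
        event["description"] = description
    if severity is not None:
        event["severity"] = severity
    if tags:
        event["tags"] = tags
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_event_request(
    "YOUR_API_TOKEN", "deploy started", description="build 89", tags={"build": "89"}
)
print(request.get_method(), request.full_url)
# Sending is left to the caller: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes the payload easy to inspect or log before anything is posted.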

                                                                                                                                                                                                                                                                                                                                                                      LogDNA Events

                                                                                                                                                                                                                                                                                                                                                                      Sysdig provides the ability to view LogDNA alerts as Sysdig events.

If you use both LogDNA and Sysdig Monitor, you can send alerts from the LogDNA platform to Sysdig Monitor as Sysdig events. Each event includes a link back to LogDNA for further investigation. As with other types of Sysdig events, you can create alerts based on LogDNA events.

                                                                                                                                                                                                                                                                                                                                                                      The log data provided by LogDNA carries additional details about system health. The ability to view relevant LogDNA events in Sysdig helps you debug and monitor the health of a system efficiently.

For example, if the number of logs generated during a deployment is higher than expected, you are notified in your Sysdig Events feed.

                                                                                                                                                                                                                                                                                                                                                                      There is no configuration required on the Sysdig Monitor side. For information on configuring LogDNA to send alerts to Sysdig Monitor, see Sysdig Alert Integration.

                                                                                                                                                                                                                                                                                                                                                                      7.2 -

                                                                                                                                                                                                                                                                                                                                                                      Custom Events

Sysdig Monitor can ingest any custom event, including code deploys, auto-scaling activities, and business-level actions. These events are automatically overlaid on charts and graphs so that they can be correlated with performance data. The sections below outline the different ways custom events can be sent to Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      Application Integrations

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor supports event integrations with certain applications by default. The Sysdig agent will automatically discover these services and begin collecting event data from them. For more information, refer to the Events documentation.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Monitor Slackbot

Sysdigbot, the Sysdig Monitor Slackbot, lets users post custom events directly to the Sysdig Cloud by chatting with the bot in Slack.

                                                                                                                                                                                                                                                                                                                                                                      Prebuilt Python Script

The Sysdig Python script sends events to Sysdig Monitor directly from the command line, using the following command structure:

                                                                                                                                                                                                                                                                                                                                                                      python post_event.py SYSDIG_TOKEN NAME [-d DESCRIPTION] [-s SEVERITY] [-c SCOPE] [-t TAGS] [-h]
                                                                                                                                                                                                                                                                                                                                                                      

For more information, refer to the Sysdig GitHub repository.
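
When the script is called from another program, for example a deploy job, it can help to assemble the argument list in code rather than by string concatenation. The following is a minimal sketch that follows the command structure shown above. `build_post_event_command` is a hypothetical helper, the token, event name, and severity value are placeholders, and the expected formats of `SCOPE` and `TAGS` are not specified here, so they are passed through unchanged.

```python
import shlex


def build_post_event_command(token, name, description=None, severity=None,
                             scope=None, tags=None):
    """Assemble the argument list for post_event.py from the flags shown above."""
    argv = ["python", "post_event.py", token, name]
    for flag, value in (("-d", description), ("-s", severity),
                        ("-c", scope), ("-t", tags)):
        # Optional flags are added only when a value was supplied.
        if value is not None:
            argv += [flag, str(value)]
    return argv


argv = build_post_event_command(
    "YOUR_SYSDIG_TOKEN", "deploy finished",
    description="build 89 rolled out", severity=6,
)
# shlex.join quotes arguments containing spaces for display in a shell.
print(shlex.join(argv))
# Pass argv to subprocess.run(argv, check=True) to execute it.
```

Passing a list to `subprocess.run` avoids shell quoting problems when the event name or description contains spaces.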

                                                                                                                                                                                                                                                                                                                                                                      Python Sample Client

The Sysdig Monitor Python client is a wrapper around the Sysdig Monitor REST API. It exposes most of the REST API functionality through a Python interface that is easy to install and use. The post_event() function can be used to send events to Sysdig Monitor from any custom script. An example script is shown below:

import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])), '..'))

from sdcclient import SdcClient

# Parse arguments
if len(sys.argv) < 3:
    print('usage: %s <sysdig-token> <name>' % sys.argv[0])
    sys.exit(1)

sdc_token = sys.argv[1]
name = sys.argv[2]

# Instantiate the SDC client
sdclient = SdcClient(sdc_token)

# Post the event using post_event(self, name, description=None, severity=None, event_filter=None, tags=None)
res = sdclient.post_event(name)
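
The sample script passes only the event name. As a hedged sketch, the optional parameters listed in the post_event() signature above could be forwarded as shown below. The helper post_custom_event and the RecordingClient stand-in are hypothetical and exist only so the example runs without a token. The stand-in's return value is also an assumption and does not describe what the real client returns.

```python
def post_custom_event(client, name, description=None, severity=None,
                      event_filter=None, tags=None):
    """Call client.post_event(), forwarding only the optional fields that were set."""
    optional = {
        "description": description,
        "severity": severity,
        "event_filter": event_filter,
        "tags": tags,
    }
    kwargs = {key: value for key, value in optional.items() if value is not None}
    return client.post_event(name, **kwargs)


class RecordingClient:
    """Hypothetical stand-in for SdcClient: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def post_event(self, name, **kwargs):
        self.calls.append((name, kwargs))
        return {"name": name, **kwargs}  # assumed shape, for illustration only


client = RecordingClient()
res = post_custom_event(
    client,
    "Jenkins - start wordpress deploy",
    description="deploy",
    event_filter='host.hostName = "ip-10-1-1-1"',
)
print(res)
```

With a real token, an SdcClient instance would take the place of RecordingClient.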
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Curl Sample Client

The Sysdig Monitor REST API offers the full functionality of the Sysdig Monitor app, so custom events can be sent directly to the Sysdig Cloud with any HTTP client. The example below is a curl request:

                                                                                                                                                                                                                                                                                                                                                                      #!/bin/bash
                                                                                                                                                                                                                                                                                                                                                                      SDC_ACCESS_TOKEN='626abc7-YOUR-TOKEN-HERE-3a3ghj432'
                                                                                                                                                                                                                                                                                                                                                                      ENDPOINT='app.sysdigcloud.com'
                                                                                                                                                                                                                                                                                                                                                                      
curl -X POST -s "https://${ENDPOINT}/api/v2/events" \
  -H 'Content-Type: application/json; charset=UTF-8' \
  -H 'Accept: application/json, text/javascript, */*; q=0.01' \
  -H "Authorization: Bearer ${SDC_ACCESS_TOKEN}" \
  --data-binary '{"event": {"name": "Jenkins - start wordpress deploy", "description": "deploy", "severity": "MEDIUM", "scope": "host.hostName = \"ip-10-1-1-1\" and build = \"89\""}}'
                                                                                                                                                                                                                                                                                                                                                                      sleep 5s
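
For scripts that cannot shell out to curl, the same request can be sketched with the Python standard library. The endpoint, headers, token placeholder, and payload fields are copied from the curl example. The function name build_event_request is hypothetical. The request is only constructed here, not sent, because sending requires a valid token.

```python
import json
import urllib.request

SDC_ACCESS_TOKEN = "626abc7-YOUR-TOKEN-HERE-3a3ghj432"  # placeholder from the curl example
ENDPOINT = "app.sysdigcloud.com"


def build_event_request(name, description, severity, scope):
    """Build (but do not send) the POST request used in the curl example."""
    payload = {
        "event": {
            "name": name,
            "description": description,
            "severity": severity,
            "scope": scope,
        }
    }
    return urllib.request.Request(
        url=f"https://{ENDPOINT}/api/v2/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "Accept": "application/json, text/javascript, */*; q=0.01",
            "Authorization": f"Bearer {SDC_ACCESS_TOKEN}",
        },
        method="POST",
    )


request = build_event_request(
    "Jenkins - start wordpress deploy",
    "deploy",
    "MEDIUM",
    'host.hostName = "ip-10-1-1-1" and build = "89"',
)
print(request.get_method(), request.full_url)

# To send it with a real token:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Letting json.dumps build the body avoids the hand-escaped quotes that the scope string needs on the command line.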
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      See also Enable/Disable Event Data.

                                                                                                                                                                                                                                                                                                                                                                      7.3 -

                                                                                                                                                                                                                                                                                                                                                                      Severity and Status

                                                                                                                                                                                                                                                                                                                                                                      Event Severity

In the Sysdig Monitor UI, event severity is grouped into four categories, which makes issue priority easier to visualize and events easier to filter.

                                                                                                                                                                                                                                                                                                                                                                      Scripts that used the former severity values (0-7) will continue to work as expected, as the new categories are simplified groupings of those values.

                                                                                                                                                                                                                                                                                                                                                                      The image below outlines the severity value breakdown:

                                                                                                                                                                                                                                                                                                                                                                      Event Status

There are two primary event states: triggered and resolved. Additional statuses are available to improve filtering.

Event Status | Description

Triggered | The circumstances that triggered the event remain in place (for example, the node remains down).

Resolved | The circumstances that triggered the event are no longer in place (for example, the metric value has returned to within a normal range).

Acknowledged | Manual label to assist in further filtering the events feed. The acknowledged label is a purely visual marker. It does not reflect the current state (triggered/resolved) of the event. Custom events cannot be marked as acknowledged.

Unacknowledged | Manual label to assist in further filtering the events feed. All events are marked as unacknowledged by default.

Silenced | List of silenced event alerts. For more information, see Silence Alert Notifications.

                                                                                                                                                                                                                                                                                                                                                                      For more information on filtering the Events feed, refer to Filtering and Searching Events.
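
The relationship between state and the acknowledged label described above can be sketched as a small data model. This is an illustration only: the class and field names (State, Event, is_custom, acknowledged) are hypothetical and are not part of any Sysdig API. The sketch encodes three rules from the text: events start unacknowledged, acknowledging does not change the state, and custom events cannot be acknowledged.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    TRIGGERED = "triggered"
    RESOLVED = "resolved"


@dataclass
class Event:
    name: str
    state: State
    is_custom: bool = False
    acknowledged: bool = False  # all events are unacknowledged by default

    def acknowledge(self):
        """Set the acknowledged label; the triggered/resolved state is untouched."""
        if self.is_custom:
            raise ValueError("custom events cannot be marked as acknowledged")
        self.acknowledged = True


def unacknowledged(events):
    """Return the events that still carry the default unacknowledged label."""
    return [event for event in events if not event.acknowledged]


alert = Event("node down", State.TRIGGERED)
deploy = Event("Jenkins - start wordpress deploy", State.TRIGGERED, is_custom=True)
alert.acknowledge()
print([event.name for event in unacknowledged([alert, deploy])])
```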

                                                                                                                                                                                                                                                                                                                                                                      7.4 -

                                                                                                                                                                                                                                                                                                                                                                      Event Scope

By default, the Events feed displays events from the entire environment. However, the feed can be configured to show only events from a particular scope within that environment. The scope of the Events feed is configured using labels.

Labels are a set of meaningful key-value pairs (a whitelist) defined by Sysdig Monitor. You can configure this whitelist. For example, if you use ECS and have defined custom container labels, you can add the labels you need to the whitelist. Once they are added, all infrastructure events related to containers are enriched with these labels, and the event scope displays the associated metadata.

                                                                                                                                                                                                                                                                                                                                                                      For more information on scoping, refer to the Grouping, Scoping, and Segmenting Metrics documentation.

                                                                                                                                                                                                                                                                                                                                                                      Configure Event Scope

                                                                                                                                                                                                                                                                                                                                                                      To configure the events feed scope:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Events module, click the Edit Scope link.

                                                                                                                                                                                                                                                                                                                                                                      2. Open the top-level drop-down menu.

3. Select the desired label, either by scrolling through the list or by typing the full or partial name into the search bar and selecting it.

                                                                                                                                                                                                                                                                                                                                                                      4. Open the Operator drop-down menu, and select the relevant option.

                                                                                                                                                                                                                                                                                                                                                                      5. Open the Value drop-down menu, and select the relevant options.

                                                                                                                                                                                                                                                                                                                                                                      6. Optional: Open the next level drop-down menu, and repeat steps 3-5.

                                                                                                                                                                                                                                                                                                                                                                      7. Optional: Repeat step 6 for each additional layer of scope required.

                                                                                                                                                                                                                                                                                                                                                                        Individual layers of the scope can be removed if necessary, by clicking the Delete (x) icon beside the relevant layer.

                                                                                                                                                                                                                                                                                                                                                                      8. Click the Apply button to save the new scope.
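
Each layer chosen in the steps above is a label, an operator, and a value. As a rough sketch, the layers can be thought of as combining into a single scope expression like the one used in the "scope" field of the curl example. The function build_scope is hypothetical, and whether the UI stores the scope in exactly this textual form is an assumption.

```python
def build_scope(layers):
    """Join (label, operator, value) layers into one scope expression.

    The expression syntax follows the "scope" field of the curl example;
    only the "=" operator appears there, so other operators are untested.
    """
    parts = [f'{label} {operator} "{value}"' for label, operator, value in layers]
    return " and ".join(parts)


scope = build_scope([
    ("host.hostName", "=", "ip-10-1-1-1"),
    ("build", "=", "89"),
])
print(scope)  # host.hostName = "ip-10-1-1-1" and build = "89"
```

Removing a layer, as described in step 7, corresponds to dropping one tuple from the list before joining.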

                                                                                                                                                                                                                                                                                                                                                                      Filter Events by Scope

By default, events in Dashboards and Explore are filtered by scope, so that only the events most relevant to the selected scope are shown. This lets you quickly narrow down potential problems in the area you are examining. However, you can turn the filtering off and see events from the entire environment. To do so in Explore:

                                                                                                                                                                                                                                                                                                                                                                      1. On the Explore module, click the Options (three dots) icon and select Events.

                                                                                                                                                                                                                                                                                                                                                                        Event Scope Editor

The Events panel appears. You can do the following:

                                                                                                                                                                                                                                                                                                                                                                        • Determine whether to show events or not.

                                                                                                                                                                                                                                                                                                                                                                        • Determine the maximum number of events to be displayed in the Explore table.

                                                                                                                                                                                                                                                                                                                                                                        • Filter events by

                                                                                                                                                                                                                                                                                                                                                                          • Type: The types of events supported are custom events and alerts. See Event Types for more information.

                                                                                                                                                                                                                                                                                                                                                                          • State: The types of events supported are triggered and resolved. See Severity and Status for more information.

                                                                                                                                                                                                                                                                                                                                                                          • Severity: The supported severity levels are all severity types, high severity, and both high and medium levels. See Severity and Status for more information.

                                                                                                                                                                                                                                                                                                                                                                          • Resolution: The supported resolutions are both acknowledged and unacknowledged, acknowledged only, and unacknowledged only. See Severity and Status for more information.

                                                                                                                                                                                                                                                                                                                                                                        • Determine whether to show events by scope. Use the toggle button to turn off filtering by scope.

                                                                                                                                                                                                                                                                                                                                                                          If you disable this option, the Explore table will show feed for all the events in the infrastructure, including those that are irrelevant to the selected scope. Leave the Filter events by selected scope option enabled to see only the relevant events.

                                                                                                                                                                                                                                                                                                                                                                      2. Click Save.

                                                                                                                                                                                                                                                                                                                                                                        Similarly, you can turn off filtering events by scope in Dashboards.

                                                                                                                                                                                                                                                                                                                                                                      Reset the Environment Scope

                                                                                                                                                                                                                                                                                                                                                                      To reset the scope to the entire environment:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Events module, click the Edit Scope link.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Clear All link.

                                                                                                                                                                                                                                                                                                                                                                      3. Click the Apply button to save the changes.

                                                                                                                                                                                                                                                                                                                                                                      7.5 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Event Alerts

                                                                                                                                                                                                                                                                                                                                                                      Event alerts can be created (for custom events) and configured (for alert events, and custom events with a previously created alert) from the Event Details panel:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Events module, select the event from the feed to open the Event Details panel.

                                                                                                                                                                                                                                                                                                                                                                      2. Open the Configure Alert panel:

                                                                                                                                                                                                                                                                                                                                                                        1. For existing alerts, click the Edit Alert link.

                                                                                                                                                                                                                                                                                                                                                                        2. For new alerts, click the Create Alert from Event button.

                                                                                                                                                                                                                                                                                                                                                                      3. Configure the alert as necessary. For more information on configuring alerts, refer to the Alerts documentation.

                                                                                                                                                                                                                                                                                                                                                                        New alerts will be auto-filled with information from the custom event.

                                                                                                                                                                                                                                                                                                                                                                      4. Click the Create button for new alerts, or the Save button for existing alerts.

                                                                                                                                                                                                                                                                                                                                                                      7.6 -

                                                                                                                                                                                                                                                                                                                                                                      Filtering and Searching Events

                                                                                                                                                                                                                                                                                                                                                                      Filter Events

                                                                                                                                                                                                                                                                                                                                                                      The events feed can be filtered in multiple ways, to drill-down into the environment’s history and refine the events displayed. The feed can be filtered by severity, type, and/or status. Examples of each are shown below.

                                                                                                                                                                                                                                                                                                                                                                      The example below shows only high and medium severity events:

                                                                                                                                                                                                                                                                                                                                                                      The example below shows only Kubernetes events:

                                                                                                                                                                                                                                                                                                                                                                      The example below shows only events that are Unacknowledged:

                                                                                                                                                                                                                                                                                                                                                                      The Acknowledged label is a purely visual marker, and does not reflect the current state (triggered/resolved) of the event. By default, all events are Unacknowledged.
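The filter combinations described above can be thought of as predicates over event records. The following is a minimal sketch, not the product's API: the `Event` record, its field names, and the `filter_events` helper are all hypothetical. It only illustrates how severity, type, and status filters combine, and that the acknowledged flag is tracked independently of the triggered/resolved state.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    # Hypothetical record; field names are illustrative, not a real schema.
    name: str
    severity: str               # "high", "medium", "low", or "info"
    type: str                   # e.g. "alert" or "custom"
    state: str                  # "triggered" or "resolved"
    acknowledged: bool = False  # events start out unacknowledged


def filter_events(
    events: Iterable[Event],
    severities: Optional[Set[str]] = None,
    types: Optional[Set[str]] = None,
    state: Optional[str] = None,
    acknowledged: Optional[bool] = None,
) -> List[Event]:
    """Keep events matching every filter that is set (None means 'any')."""
    result = []
    for e in events:
        if severities is not None and e.severity not in severities:
            continue
        if types is not None and e.type not in types:
            continue
        if state is not None and e.state != state:
            continue
        if acknowledged is not None and e.acknowledged != acknowledged:
            continue
        result.append(e)
    return result


feed = [
    Event("cpu-high", "medium", "alert", "triggered", acknowledged=True),
    Event("disk-full", "high", "alert", "resolved"),
    Event("deploy", "info", "custom", "triggered"),
]

# Medium severity alert events that remain triggered but were acknowledged.
matches = filter_events(feed, severities={"medium"}, types={"alert"},
                        state="triggered", acknowledged=True)
print([e.name for e in matches])
```

Note that acknowledging `cpu-high` in this sketch leaves its state as `triggered`, mirroring the point that the Acknowledged label is only a visual marker.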

                                                                                                                                                                                                                                                                                                                                                                      The example below shows medium severity Alert events that remain Triggered, but have been acknowledged:

                                                                                                                                                                                                                                                                                                                                                                      Search for an Event

                                                                                                                                                                                                                                                                                                                                                                      The event feeds can be searched by using the search icon in the top bar:

                                                                                                                                                                                                                                                                                                                                                                      7.7 -

                                                                                                                                                                                                                                                                                                                                                                      Review Events

                                                                                                                                                                                                                                                                                                                                                                      Events can be reviewed in detail by clicking on the event listing in the feed:

                                                                                                                                                                                                                                                                                                                                                                      To review the environment at the time of the event in detail, click the Explore button to navigate to the Explore module. The Explore module will automatically drill-down to the impacted environment objects.

                                                                                                                                                                                                                                                                                                                                                                      The Event Details Panel

                                                                                                                                                                                                                                                                                                                                                                      The Event Details panel contains detailed information about the event. This information is different, depending on whether the event is an Alert event or a Custom event.

                                                                                                                                                                                                                                                                                                                                                                      Alert Events

                                                                                                                                                                                                                                                                                                                                                                      The example below is of an Alert event:

                                                                                                                                                                                                                                                                                                                                                                      MetadataDescription
                                                                                                                                                                                                                                                                                                                                                                      Event IDThe unique ID of the event.
                                                                                                                                                                                                                                                                                                                                                                      SeverityThe severity of the event (High, Medium, Low, Info).
                                                                                                                                                                                                                                                                                                                                                                      StateThe current state of the event (Triggered, Resolved)
                                                                                                                                                                                                                                                                                                                                                                      DurationThe length of time the event lasted.
                                                                                                                                                                                                                                                                                                                                                                      AcknowledgedWhether the event has been acknowledged or not.
                                                                                                                                                                                                                                                                                                                                                                      TriggerThe cause of the event (for example, the metric that exceeded the defined range, and the value it reached).
                                                                                                                                                                                                                                                                                                                                                                      EntityThe entity on which the event occurred.
                                                                                                                                                                                                                                                                                                                                                                      Start TimeThe date and time the event started.
                                                                                                                                                                                                                                                                                                                                                                      End TimeThe date and time the event ended.
                                                                                                                                                                                                                                                                                                                                                                      Alert NameThe name of the alert that was triggered.
                                                                                                                                                                                                                                                                                                                                                                      TypeThe type of alert.
                                                                                                                                                                                                                                                                                                                                                                      MetricsThe metric/s that were affected.
                                                                                                                                                                                                                                                                                                                                                                      Trigger ConditionThe condition that was met to trigger the alert.
                                                                                                                                                                                                                                                                                                                                                                      ScopeThe scope of the alert.
                                                                                                                                                                                                                                                                                                                                                                      SegmentThe segmentation applied to the alert.
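As a rough illustration of how some of the metadata fields listed above relate to each other, the sketch below models an alert event as a small record. The `AlertEvent` class and its field names are hypothetical, and the relation "Duration = End Time minus Start Time" is an assumption inferred from the field descriptions, not something the documentation states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertEvent:
    # Hypothetical container mirroring a few of the fields listed above.
    event_id: str
    severity: str                        # "High", "Medium", "Low", or "Info"
    state: str                           # "Triggered" or "Resolved"
    acknowledged: bool
    alert_name: str
    start_time: datetime
    end_time: Optional[datetime] = None  # unset while the event is ongoing

    def duration(self, now: Optional[datetime] = None) -> timedelta:
        """Assumed relation: duration = end (or 'now' if ongoing) - start."""
        end = self.end_time or now or datetime.now()
        return end - self.start_time


event = AlertEvent(
    event_id="example-id",
    severity="High",
    state="Resolved",
    acknowledged=False,
    alert_name="Example alert",
    start_time=datetime(2024, 1, 1, 12, 0),
    end_time=datetime(2024, 1, 1, 12, 45),
)
print(event.duration())  # 0:45:00
```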

                                                                                                                                                                                                                                                                                                                                                                      To configure the alert that created the event, click the Edit Alert link in the Event Details panel. For more information about alerts, refer to the Alerts documentation.

                                                                                                                                                                                                                                                                                                                                                                      Security Events

                                                                                                                                                                                                                                                                                                                                                                      Policy

                                                                                                                                                                                                                                                                                                                                                                      The example shows an event notifying a potentially unauthorized terminal shell in a container. For more information on Policy alerts, see Secure Events.

                                                                                                                                                                                                                                                                                                                                                                      MetadataDescription
                                                                                                                                                                                                                                                                                                                                                                      Event IDThe unique ID of the event.
                                                                                                                                                                                                                                                                                                                                                                      SeverityThe severity of the event (High, Medium, Low, Info).
                                                                                                                                                                                                                                                                                                                                                                      Date / TimeThe date and time the event occurred.
                                                                                                                                                                                                                                                                                                                                                                      HostThe hostname and physical address (MAC)
                                                                                                                                                                                                                                                                                                                                                                      ContainerThe container name, unique identifier, and image.
                                                                                                                                                                                                                                                                                                                                                                      SummaryA detailed description of what occurred.

                                                                                                                                                                                                                                                                                                                                                                      Scanning

                                                                                                                                                                                                                                                                                                                                                                      The example is a high severity event alerting a change in the scan result of an elasticsearch image on Quay. For more information on Scanning, see Scanning Alerts.

                                                                                                                                                                                                                                                                                                                                                                      MetadataDescription
                                                                                                                                                                                                                                                                                                                                                                      Event IDThe unique ID of the event.
                                                                                                                                                                                                                                                                                                                                                                      SeverityThe severity of the event (High, Medium, Low, Info).
                                                                                                                                                                                                                                                                                                                                                                      Date / TimeThe date and time the event occurred.
                                                                                                                                                                                                                                                                                                                                                                      Image RegistryThe repository where the image resides (for example, Quay).
                                                                                                                                                                                                                                                                                                                                                                      TagThe image name associated with the image.
                                                                                                                                                                                                                                                                                                                                                                      Image IDThe unique identifier of the image.
                                                                                                                                                                                                                                                                                                                                                                      DigestA content-addressable identifier which contains the SHA256 hash of the image’s JSON configuration object.

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure and Custom Events

                                                                                                                                                                                                                                                                                                                                                                      Infrastructure and custom events display the same set of information in the Event Details panel. The example below is a Docker event:

                                                                                                                                                                                                                                                                                                                                                                      MetadataDescription
                                                                                                                                                                                                                                                                                                                                                                      Event IDThe unique ID of the event.
                                                                                                                                                                                                                                                                                                                                                                      SeverityThe severity of the event (High, Medium, Low, Info).
                                                                                                                                                                                                                                                                                                                                                                      Date / TimeThe date and time the event occurred.
                                                                                                                                                                                                                                                                                                                                                                      SourceThe source of the event (for example, Docker).
                                                                                                                                                                                                                                                                                                                                                                      ScopeThe scope of the event.
                                                                                                                                                                                                                                                                                                                                                                      DescriptionA detailed description of what occurred.
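
If you handle these events in your own tooling, the fields of the Event Details panel can be modeled as a small record. The following is a minimal sketch only: the class, the field names, the numeric severity ordering, and the sample values are all hypothetical and do not reflect a Sysdig API or export format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    """The four severities named in the table; the numeric order is an assumption."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class EventDetails:
    # Hypothetical field names that mirror the rows of the Event Details panel.
    event_id: str
    severity: Severity
    timestamp: datetime
    source: str
    scope: str
    description: str


def at_least(events, minimum):
    """Return the events whose severity is `minimum` or higher."""
    return [e for e in events if e.severity >= minimum]


# Made-up sample data for illustration only.
events = [
    EventDetails("evt-1", Severity.HIGH,
                 datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
                 "Docker", "example-scope-a", "Container was killed"),
    EventDetails("evt-2", Severity.INFO,
                 datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
                 "Docker", "example-scope-b", "Container was started"),
]

important = at_least(events, Severity.MEDIUM)
print([e.event_id for e in important])
```

Using an ordered enumeration for severity keeps threshold filters such as "Medium and above" to a single comparison.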

                                                                                                                                                                                                                                                                                                                                                                      8 -

                                                                                                                                                                                                                                                                                                                                                                      Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      Integrations for Sysdig Monitor include a number of platforms, orchestrators, and a wide range of applications designed to extend Monitor capabilities and collect metrics from these systems. Sysdig collects metrics from Prometheus, JMX, StatsD, Kubernetes, and a number of applications to provide a 360-degree view of your infrastructure. Many metrics are collected out of the box; you can also extend the integration or create custom metrics to receive curated insights into your infrastructure stack.

                                                                                                                                                                                                                                                                                                                                                                      Key Benefits

                                                                                                                                                                                                                                                                                                                                                                      • Collects the richest data set for cloud-native visibility and security.

                                                                                                                                                                                                                                                                                                                                                                      • Polls data, auto-discover context in order to provide operational and security insights.

                                                                                                                                                                                                                                                                                                                                                                      • Simplifies deploying your monitoring integrations by providing guided configuration, curated list of enterprise-grade images, integration with CI/CD workflow.

                                                                                                                                                                                                                                                                                                                                                                      • Extends the power of Prometheus metrics with additional insight from other metrics types and infrastructure stack.

                                                                                                                                                                                                                                                                                                                                                                      • Employs Prometheus alert and events and provides ready-to-use dashboards for Kubernetes monitoring needs.

                                                                                                                                                                                                                                                                                                                                                                      • Exposes application metrics using Java JMX and MBeans monitoring.

                                                                                                                                                                                                                                                                                                                                                                      Key Integrations

                                                                                                                                                                                                                                                                                                                                                                      Inbound

                                                                                                                                                                                                                                                                                                                                                                      • Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                        Describes how to configure Monitoring Integration in your infrastructure and receive deeper insight into the health and performance of your services across platforms and the cloud.

                                                                                                                                                                                                                                                                                                                                                                      • Prometheus Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how Sysdig agent enables automatically collecting metrics from services that expose native Prometheus metrics as well as from applications with Prometheus exporters, how to set up your environment, and scrape Prometheus metrics seamlessly.

                                                                                                                                                                                                                                                                                                                                                                      • Agent Installation

                                                                                                                                                                                                                                                                                                                                                                        Learn how to install Sysdig agents on supported platforms.

                                                                                                                                                                                                                                                                                                                                                                      • AWS CloudWatch

                                                                                                                                                                                                                                                                                                                                                                        Illustrates how to configure Sysdig to collect various types of CloudWatch metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Java Management Extention (JMX) Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how to configure your Java virtual machines so Sysdig Agent can collect JMX metrics using the JMX protocol.

                                                                                                                                                                                                                                                                                                                                                                      • StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how the Sysdig agent collects custom StatsD metrics with an embedded StatsD server.

                                                                                                                                                                                                                                                                                                                                                                      • Node.JS Metrics

                                                                                                                                                                                                                                                                                                                                                                        Illustrates how Sysdig is able to monitor node.js applications by linking a library to the node.js codebase.

                                                                                                                                                                                                                                                                                                                                                                      • Monitor Log Files

                                                                                                                                                                                                                                                                                                                                                                        Learn how to search a string by using the chisel script called logwatcher.

                                                                                                                                                                                                                                                                                                                                                                      • (legacy) Integrate Applications

                                                                                                                                                                                                                                                                                                                                                                        Describes the monitoring capabilities of Sysdig agent with application check scripts or ‘app checks’.
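
As an illustration of the StatsD Metrics entry above, the sketch below formats metrics in the standard StatsD line protocol (`<name>:<value>|<type>`) and sends them over UDP. It assumes a StatsD server is listening on 127.0.0.1 port 8125, the conventional StatsD default; verify the address and port for your agent configuration. The metric names and the helper functions `statsd_line` and `send_metric` are hypothetical.

```python
import socket


def statsd_line(name: str, value, metric_type: str) -> str:
    """Format one metric in the StatsD line protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD line over UDP. No reply is expected from the server."""
    # Host and port are assumptions; adjust them to match your environment.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names: a counter (c), a gauge (g), and a timer (ms).
counter = statsd_line("checkout.requests", 1, "c")
gauge = statsd_line("checkout.queue_depth", 42, "g")
timer = statsd_line("checkout.latency", 320, "ms")

if __name__ == "__main__":
    for line in (counter, gauge, timer):
        print(line)
        send_metric(line)
```

Because StatsD uses UDP, the sender does not block or fail when no server is listening, which keeps instrumentation overhead low but also means delivery is not confirmed.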

                                                                                                                                                                                                                                                                                                                                                                      Oubound

                                                                                                                                                                                                                                                                                                                                                                      • Notification Channels

                                                                                                                                                                                                                                                                                                                                                                        Learn how to add, edit, or delete a variety of notification channel types, and how to disable or delete notifications when they are not needed, for example, during scheduled downtime.

                                                                                                                                                                                                                                                                                                                                                                      • S3 Capture Storage

                                                                                                                                                                                                                                                                                                                                                                        Learn how to configure Sysdig to use an AWS S3 bucket or custom S3 storage for storing Capture files.

                                                                                                                                                                                                                                                                                                                                                                      Platform Metrics (IBM)

                                                                                                                                                                                                                                                                                                                                                                      For Sysdig instances deployed on IBM Cloud Monitoring with Sysdig, an additional form of metrics collection is offered: Platform metrics. Rather than being collected by the Sysdig agent, when enabled, Platform metrics are reported to Sysdig directly by the IBM Cloud infrastructure.

                                                                                                                                                                                                                                                                                                                                                                      Enable this feature by logging into the IBM Cloud console and selecting “Enable” for IBM Platform metrics under the Configure your resource section when creating a new IBM Cloud Monitoring with a Sysdig instance, as described here.

                                                                                                                                                                                                                                                                                                                                                                      8.1 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      Monitoring Integration provides an at-a-glance summary of workloads running in your infrastructure and a deeper insight into the health and performance of your services across platforms and the cloud. You can easily identify the workloads in your team scope, the service discovered (such as etcd) within each workload, and configure the Prometheus exporter integration to collect and visualize time series metrics. Monitoring Integration also powers Alerts Library.

                                                                                                                                                                                                                                                                                                                                                                      The following indicates integration status for each service integrations:

                                                                                                                                                                                                                                                                                                                                                                      • Reporting Metrics: The integration is configured correctly and is reporting metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Needs Attention: An integration has stopped working and is no longer reporting metrics or requires some other type of attention.

                                                                                                                                                                                                                                                                                                                                                                      • Pending Metrics: An integration has recently been configured and has been waiting to receive metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Configure Integration: The integration needs to be configured, and therefore no metrics are reported.
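
The four statuses above can be treated as a simple enumeration when you track integrations in your own scripts. This is a hedged sketch: the enum, the `needs_action` helper, and the sample inventory are hypothetical and are not part of any Sysdig API. It assumes that Needs Attention and Configure Integration are the two statuses that call for user action, based on the descriptions above.

```python
from enum import Enum


class IntegrationStatus(Enum):
    """Status labels as they are named in the list above."""
    REPORTING_METRICS = "Reporting Metrics"
    NEEDS_ATTENTION = "Needs Attention"
    PENDING_METRICS = "Pending Metrics"
    CONFIGURE_INTEGRATION = "Configure Integration"


# Assumption: these two statuses require the user to do something.
ACTION_REQUIRED = {
    IntegrationStatus.NEEDS_ATTENTION,
    IntegrationStatus.CONFIGURE_INTEGRATION,
}


def needs_action(statuses):
    """Return the sorted names of services whose status calls for user action."""
    return sorted(name for name, status in statuses.items()
                  if status in ACTION_REQUIRED)


# Made-up inventory for illustration only.
inventory = {
    "etcd": IntegrationStatus.REPORTING_METRICS,
    "redis": IntegrationStatus.NEEDS_ATTENTION,
    "nginx": IntegrationStatus.PENDING_METRICS,
    "kafka": IntegrationStatus.CONFIGURE_INTEGRATION,
}

print(needs_action(inventory))
```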

                                                                                                                                                                                                                                                                                                                                                                      Ensure that you meet the prerequisites given in Guidelines for Monitoring Integrations to make the best use of this feature.

                                                                                                                                                                                                                                                                                                                                                                      Access Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      2. Select Integration > Monitoring Integration in the management section of the left-hand sidebar.

                                                                                                                                                                                                                                                                                                                                                                        The Integrations page is displayed. Continue with Configure an Integration.

                                                                                                                                                                                                                                                                                                                                                                      Configure an Integration

                                                                                                                                                                                                                                                                                                                                                                      1. Locate the service that you want to configure an integration for. To do so, identify the workload and drill down to the grouping where the service is running.

                                                                                                                                                                                                                                                                                                                                                                        To locate the service, you can use one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • Text search
                                                                                                                                                                                                                                                                                                                                                                        • Type filtering
                                                                                                                                                                                                                                                                                                                                                                        • Left navigation to filter the workload and then use text search or type filtering
                                                                                                                                                                                                                                                                                                                                                                        • Use the Configure Integration option on the top, and locate the service using text search or type filtering
                                                                                                                                                                                                                                                                                                                                                                      2. Click Configure Integration.

                                                                                                                                                                                                                                                                                                                                                                        1. Click Start Installation.
                                                                                                                                                                                                                                                                                                                                                                        2. Review the prerequisites.
                                                                                                                                                                                                                                                                                                                                                                        3. Do one of the following:
                                                                                                                                                                                                                                                                                                                                                                          • Dry Run: Use kubectl command to install the service. Follow the on-screen instructions to complete the tasks successfully.
                                                                                                                                                                                                                                                                                                                                                                          • Patch: Install directly on your workload. Follow the on-screen instructions to complete the tasks successfully.
                                                                                                                                                                                                                                                                                                                                                                          • Manual: Use an exporter and install the service manually. Click Documentation to learn more about the service exporter and integrate with Sysdig Monitor
                                                                                                                                                                                                                                                                                                                                                                      3. Click Validate to validate the installation.

                                                                                                                                                                                                                                                                                                                                                                      4. Make sure that the wizard shows the Installation Complete screen.

                                                                                                                                                                                                                                                                                                                                                                      5. Click Close to close the window.

                                                                                                                                                                                                                                                                                                                                                                      Show Unidentified Workloads

                                                                                                                                                                                                                                                                                                                                                                      The services that Sysdig Monitor cannot discover can technically still be monitored through the Unidentified Workloads option. You can view the workloads with these unidentified services or applications and see their status. To do so, use the Unidentified Workloads slider at the top right corner of the Integration page.

                                                                                                                                                                                                                                                                                                                                                                      8.1.1 -

                                                                                                                                                                                                                                                                                                                                                                      Guidelines for Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      If you are directed to this page from the Sysdig Monitor app, your agent deployment might include a configuration that causes either of the following:

                                                                                                                                                                                                                                                                                                                                                                      • Prohibits the use of Monitoring Integrations
                                                                                                                                                                                                                                                                                                                                                                      • Affect the current metrics you are already collecting

                                                                                                                                                                                                                                                                                                                                                                      Ensure that you meet the prerequisites to successfully use Monitoring Integrations. For technical assistance, contact Sysdig Support.

                                                                                                                                                                                                                                                                                                                                                                      Prerequisites

                                                                                                                                                                                                                                                                                                                                                                      • Upgrade Sysdig agent to v12.0.0

                                                                                                                                                                                                                                                                                                                                                                      • If you have clusters with more than 50 nodes and you don’t have the prom_service_discovery option enabled:

                                                                                                                                                                                                                                                                                                                                                                        • Enabling the latest Prometheus features might create an additional connection to the Kubernetes API server from each Sysdig agent in your environment. The surge in agent connections can increase the CPU and memory load in your API servers. Therefore, ensure that your API servers are suitably sized to handle the increased load in large clusters.
                                                                                                                                                                                                                                                                                                                                                                        • If you encounter any problems contact Sysdig Support.
                                                                                                                                                                                                                                                                                                                                                                      • Remove the following manual configurations in the dragent.yaml file because they might interfere with those provided by Sysdig:

                                                                                                                                                                                                                                                                                                                                                                        • use_promscrape
                                                                                                                                                                                                                                                                                                                                                                        • promscrape_fastproto
                                                                                                                                                                                                                                                                                                                                                                        • prom_service_discovery
                                                                                                                                                                                                                                                                                                                                                                        • prometheus.max_metrics
                                                                                                                                                                                                                                                                                                                                                                        • prometheus.ingest_raw
                                                                                                                                                                                                                                                                                                                                                                        • prometheus.ingest_calculated
                                                                                                                                                                                                                                                                                                                                                                      • The sysdig_sd_configs configuration is no longer supported. Remove the existing prometheus.yaml if it includes the sysdig_sd_configs configuration.
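As a rough illustration of what to look for, a dragent.yaml that still carries these manual settings might contain lines like the ones below. This is a hypothetical excerpt, not a recommended configuration: every value is an assumption chosen for illustration, and the dotted names from the list above are shown as keys nested under prometheus.

```yaml
# Hypothetical dragent.yaml excerpt. All values are illustrative assumptions.
# Every key shown here is one of the manual settings listed above for removal.
use_promscrape: true
promscrape_fastproto: true
prom_service_discovery: true
prometheus:
  max_metrics: 3000         # prometheus.max_metrics
  ingest_raw: true          # prometheus.ingest_raw
  ingest_calculated: false  # prometheus.ingest_calculated
```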

                                                                                                                                                                                                                                                                                                                                                                      If you are not currently using Prometheus metrics in Sysdig Monitor, you can skip the following steps:

                                                                                                                                                                                                                                                                                                                                                                      • If you are using a custom Prometheus process_filter in dragent.yaml to trigger scraping, see Migrating from Promscrape V1 to V2.

                                                                                                                                                                                                                                                                                                                                                                      • If you are using service annotations or container labels to find scrape targets, you may need to create new scrape_configs in prometheus.yaml , preferably based on Kubernetes pods service discovery. This configuration can be complicated in certain environments and therefore we recommend that you contact Sysdig support for help.
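To give a sense of what a scrape_configs entry based on Kubernetes pods service discovery can look like, here is a minimal sketch that uses standard Prometheus syntax. The job name is hypothetical, and the relabeling rules assume that targets are marked with the conventional prometheus.io/scrape, prometheus.io/path, and prometheus.io/port pod annotations. Your environment may need different rules.

```yaml
# Sketch of a prometheus.yaml entry. The job name is a hypothetical placeholder.
scrape_configs:
  - job_name: my-annotated-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the path from prometheus.io/path when the annotation is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the port from prometheus.io/port when the annotation is set.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: $1:$2
        target_label: __address__
```

The sketch covers only target selection by annotation. It leaves open how scraping should be divided among the agents in a cluster, which is one of the environment-specific details to review with Sysdig Support.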

                                                                                                                                                                                                                                                                                                                                                                      8.1.2 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Default Integrations

                                                                                                                                                                                                                                                                                                                                                                      Each Monitoring Integration holds a specific job that scrapes its metrics and sends them to Sysdig Monitor. To optimize metrics scraping for building dashboards and alerts in Sysdig Monitor, Sysdig offers default jobs for these integrations. Periodically, the Sysdig agent connects with Sysdig Monitor and retrieves the default jobs and make the Monitoring Integrations available for use. See the list of the available integrations and corresponding jobs.

                                                                                                                                                                                                                                                                                                                                                                      You can find all the jobs in the /opt/draios/etc/promscrape.yaml file in the sysdig-agent container in your cluster.

                                                                                                                                                                                                                                                                                                                                                                      Supported Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      IntegrationOut of the BoxEnabled by defaultJob name in config file
                                                                                                                                                                                                                                                                                                                                                                      Apacheapache-exporter-default, apache-grok-default
                                                                                                                                                                                                                                                                                                                                                                      Cephceph-default
                                                                                                                                                                                                                                                                                                                                                                      Consulconsul-server-default, consul-envoy-default
                                                                                                                                                                                                                                                                                                                                                                      ElasticSearchelasticsearch-default
                                                                                                                                                                                                                                                                                                                                                                      Fluentdfluentd-default
                                                                                                                                                                                                                                                                                                                                                                      HaProxyhaproxy-default
                                                                                                                                                                                                                                                                                                                                                                      Harborharbor-exporter-default, harbor-core-default, harbor-registry-default, harbor-jobservice-default
                                                                                                                                                                                                                                                                                                                                                                      Kubernetes API Serverkubernetes-apiservers-default
                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Control Planekube-dns-default, kube-scheduler-default, kube-controller-manager-default
                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Etcdetcd-default
                                                                                                                                                                                                                                                                                                                                                                      Kubeletk8s-kubelet-default
                                                                                                                                                                                                                                                                                                                                                                      Kube-Proxykubernetes-kube-proxy-default
                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Persistent Volume Claimk8s-pvc-default
                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Storagek8s-storage-default
                                                                                                                                                                                                                                                                                                                                                                      Kedakeda-default
                                                                                                                                                                                                                                                                                                                                                                      Memcachedmemcached-default
                                                                                                                                                                                                                                                                                                                                                                      MongoDBmongodb-default
                                                                                                                                                                                                                                                                                                                                                                      MySQLmysql-default
                                                                                                                                                                                                                                                                                                                                                                      Nginxnginx-default
                                                                                                                                                                                                                                                                                                                                                                      Nginx Ingressnginx-ingress-default
                                                                                                                                                                                                                                                                                                                                                                      NTPntp-default
                                                                                                                                                                                                                                                                                                                                                                      Open Policy Agent - Gatekeeperopa-default
                                                                                                                                                                                                                                                                                                                                                                      Php-fpmphp-fpm-default
                                                                                                                                                                                                                                                                                                                                                                      Portworxportworx-default, portworx-openshift-default
                                                                                                                                                                                                                                                                                                                                                                      PostgreSQLpostgres-default
                                                                                                                                                                                                                                                                                                                                                                      Prometheus Default Jobk8s-pods
                                                                                                                                                                                                                                                                                                                                                                      RabbitMQrabbitmq-default
                                                                                                                                                                                                                                                                                                                                                                      Redisredis-default
                                                                                                                                                                                                                                                                                                                                                                      Sysdig Admission Controllersysdig-admission-controller-default

                                                                                                                                                                                                                                                                                                                                                                      Enable and Disable Integrations

                                                                                                                                                                                                                                                                                                                                                                      Some integrations are disabled by default due to the potential high cardinality of their metrics. To enable them, contact Sysdig Support. The same applies to disabling integrations by default in all your clusters.

                                                                                                                                                                                                                                                                                                                                                                      Customize a Default Job

                                                                                                                                                                                                                                                                                                                                                                      The default jobs offered by Sysdig for integrations are optimized to scrape the metrics for building dashboards and alerts in Sysdig Monitor. Instead of processing all the metrics available, you can determine which metrics to include or exclude for your requirements. To do so, you can overwrite the default configuration in the prometheus.yaml file. The prometheus.yaml file is located in the sysdig-agent ConfigMap in the sysdig-agent namespace.

                                                                                                                                                                                                                                                                                                                                                                      You can overwrite the default job for a specific integration by adding a new job to the prometheus.yaml file with the same name as the default job that you want to replace. For example, if you want to create a new job for the Apache integration, create a new job with the name apache-default. The jobs defined by the user has precedence over the default ones.
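As a sketch of such an override, the following prometheus.yaml entry reuses one of the Apache job names listed under Supported Monitoring Integrations and keeps only a few metrics. The pod label used to select targets and the metric names in the filter are assumptions for illustration. A real override would more likely start from the default job definition found in /opt/draios/etc/promscrape.yaml.

```yaml
# Sketch of a prometheus.yaml override. The job name matches a default job,
# so this definition would take the place of the default one.
scrape_configs:
  - job_name: apache-exporter-default
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Assumption: the exporter pods carry the label app=apache-exporter.
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: apache-exporter
    metric_relabel_configs:
      # Keep only the listed metrics. The metric names are illustrative.
      - source_labels: [__name__]
        action: keep
        regex: (apache_up|apache_accesses_total|apache_workers)
```

To exclude metrics instead of including them, the same metric_relabel_configs rule can use action: drop with a regex that matches the unwanted metric names.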

                                                                                                                                                                                                                                                                                                                                                                      See Supported Monitoring Integrations for the complete list of integrations and corresponding job names.
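
As an illustration of the override described above, the following is a minimal sketch of a user-defined job that reuses the name apache-default and drops some metrics before they are sent. It is not the actual default job shipped by Sysdig: the integration type value apache and the metric name pattern are assumptions made for the example, and the sketch assumes that the agent honors the standard Prometheus metric_relabel_configs section. Because a job with the same name presumably replaces the default job as a whole, copy any discovery and relabeling rules that you still need into your own definition.

- job_name: apache-default          # same name as the default job, so this job takes precedence
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: keep                  # only scrape pods on the node where the agent runs
      source_labels: [__meta_kubernetes_pod_host_ip]
      regex: __HOSTIPS__
    - action: keep
      source_labels:
        - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
      regex: 'apache'               # assumed integration type; use the value from your annotations
  metric_relabel_configs:
    - action: drop                  # exclude metrics whose name matches the pattern
      source_labels: [__name__]
      regex: 'apache_example_.*'    # hypothetical metric name pattern

To include only selected metrics instead, the same rule can use action: keep with a regex that matches the metric names you want to retain.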

                                                                                                                                                                                                                                                                                                                                                                      Use Sysdig Annotations in Exporters

Sysdig provides a set of Helm charts that help you configure the exporters for the integrations. For more information on installing Monitoring Integrations, see the Monitoring Integrations option in Sysdig Monitor. The Helm charts are also publicly available in the Sysdig Helm repository.

                                                                                                                                                                                                                                                                                                                                                                      If exporters are already installed in your cluster, you can use the standard Prometheus annotations and the Sysdig agent will automatically scrape them.

For example, if you use the annotations shown below, the incoming metrics carry information about the pod that generates them.

                                                                                                                                                                                                                                                                                                                                                                      spec:
                                                                                                                                                                                                                                                                                                                                                                        template:
                                                                                                                                                                                                                                                                                                                                                                          metadata:
                                                                                                                                                                                                                                                                                                                                                                            annotations:
                                                                                                                                                                                                                                                                                                                                                                              prometheus.io/path: /metrics
                                                                                                                                                                                                                                                                                                                                                                              prometheus.io/port: '9100'
                                                                                                                                                                                                                                                                                                                                                                              prometheus.io/scrape: 'true'
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      If you use an exporter, the incoming metrics will be associated with the exporter pod, not the application pod. To change this behavior, you can use the Sysdig-provided annotations and configure the exporter on the agent.

                                                                                                                                                                                                                                                                                                                                                                      Annotate the Exporter

                                                                                                                                                                                                                                                                                                                                                                      Use the following annotations to configure the exporter:

                                                                                                                                                                                                                                                                                                                                                                      spec:
                                                                                                                                                                                                                                                                                                                                                                        template:
                                                                                                                                                                                                                                                                                                                                                                          metadata:
                                                                                                                                                                                                                                                                                                                                                                            annotations:
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/port: '9187'
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/target_ns: my-namespace
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/target_workload_type: deployment
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/target_workload_name: my-workload
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/integration_type: my-integration
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      • port: The port to scrape for metrics on the exporter.
                                                                                                                                                                                                                                                                                                                                                                      • target_ns: The namespace of the workload corresponding to the application (not the exporter).
                                                                                                                                                                                                                                                                                                                                                                      • target_workload_type: The type of the workload of the application (not the exporter). The possible values are deployment, statefulset, and daemonset.
                                                                                                                                                                                                                                                                                                                                                                      • target_workload_name: The name of the workload corresponding to the application (not the exporter).
• integration_type: The type of the integration. The job created in the Sysdig agent uses this value to find the exporter.
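
To show where these annotations sit in a complete manifest, here is a minimal sketch of an exporter Deployment. The Deployment name, its namespace, the labels, and the container image are hypothetical placeholders; only the promcat.sysdig.com annotations come from the description above. Note that the exporter can run in a different namespace from the application that it monitors, which is why target_ns is set separately.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-exporter                 # hypothetical exporter name
  namespace: my-exporter-namespace  # hypothetical; where the exporter itself runs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-exporter
  template:
    metadata:
      labels:
        app: my-exporter
      annotations:
        promcat.sysdig.com/port: '9187'                       # port exposed by the exporter container
        promcat.sysdig.com/target_ns: my-namespace            # namespace of the application workload
        promcat.sysdig.com/target_workload_type: deployment   # deployment, statefulset, or daemonset
        promcat.sysdig.com/target_workload_name: my-workload  # name of the application workload
        promcat.sysdig.com/integration_type: my-integration   # matched by the agent job
    spec:
      containers:
        - name: exporter
          image: registry.example.com/my-exporter:1.0         # hypothetical image
          ports:
            - containerPort: 9187

Annotation values are quoted where YAML would otherwise read them as numbers or booleans, because Kubernetes requires annotation values to be strings.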

                                                                                                                                                                                                                                                                                                                                                                      Configure a New Job

Edit the prometheus.yaml file to configure a new job in the Sysdig agent. The file is located in the sysdig-agent ConfigMap in the sysdig-agent namespace.

                                                                                                                                                                                                                                                                                                                                                                      You can use the following example template:

                                                                                                                                                                                                                                                                                                                                                                      - job_name: my-integration
                                                                                                                                                                                                                                                                                                                                                                        tls_config:
                                                                                                                                                                                                                                                                                                                                                                          insecure_skip_verify: true
                                                                                                                                                                                                                                                                                                                                                                        kubernetes_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                          - role: pod
                                                                                                                                                                                                                                                                                                                                                                        relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                          - action: keep
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                            regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                          - action: drop
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
                                                                                                                                                                                                                                                                                                                                                                            regex: true
                                                                                                                                                                                                                                                                                                                                                                          - action: keep
                                                                                                                                                                                                                                                                                                                                                                            source_labels:
                                                                                                                                                                                                                                                                                                                                                                              - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
                                                                                                                                                                                                                                                                                                                                                                            regex: 'my-integration' # Use here the integration type that you defined in your annotations
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_target_ns]
                                                                                                                                                                                                                                                                                                                                                                            target_label: kube_namespace_name
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_target_workload_type]
                                                                                                                                                                                                                                                                                                                                                                            target_label: kube_workload_type
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_target_workload_name]
                                                                                                                                                                                                                                                                                                                                                                            target_label: kube_workload_name
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            replacement: true
                                                                                                                                                                                                                                                                                                                                                                            target_label: sysdig_omit_source
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__address__, __meta_kubernetes_pod_annotation_promcat_sysdig_com_port]
                                                                                                                                                                                                                                                                                                                                                                            regex: ([^:]+)(?::\d+)?;(\d+)
                                                                                                                                                                                                                                                                                                                                                                            replacement: $1:$2
                                                                                                                                                                                                                                                                                                                                                                            target_label: __address__
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                            target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                          - action: replace
                                                                                                                                                                                                                                                                                                                                                                            source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                            target_label: sysdig_k8s_pod_container_name
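
The template above is a YAML list item, so it needs a parent list inside prometheus.yaml. The following sketch shows one plausible placement, assuming that the file follows the standard Prometheus layout with a top-level scrape_configs list. Other keys that the real sysdig-agent ConfigMap contains are omitted here, so treat this as a guide for editing the existing ConfigMap in place and not as a manifest to apply as is.

apiVersion: v1
kind: ConfigMap
metadata:
  name: sysdig-agent
  namespace: sysdig-agent
data:
  # Other keys of the real ConfigMap are omitted in this sketch.
  prometheus.yaml: |
    scrape_configs:                 # assumed standard Prometheus top-level key
      - job_name: my-integration
        tls_config:
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: pod
        # Add the relabel_configs section from the template here,
        # indented to the same level as kubernetes_sd_configs.

Because the file is embedded as a block scalar, every line of the job must keep its indentation relative to the prometheus.yaml key.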
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Exclude a Deployment from Being Scraped

                                                                                                                                                                                                                                                                                                                                                                      If you want the agent to exclude a deployment from being scraped, use the following annotation:

                                                                                                                                                                                                                                                                                                                                                                      spec:
                                                                                                                                                                                                                                                                                                                                                                        template:
                                                                                                                                                                                                                                                                                                                                                                          metadata:
                                                                                                                                                                                                                                                                                                                                                                            annotations:
                                                                                                                                                                                                                                                                                                                                                                              promcat.sysdig.com/omit: 'true'
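
For context, here is a minimal sketch of where that fragment sits in a complete Deployment manifest. The name `example-app`, its labels, and the `nginx:1.25` image are hypothetical placeholders, not values from this documentation. Only the `promcat.sysdig.com/omit` annotation comes from the text above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical name, for illustration only
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # Set on the pod template (spec.template.metadata),
        # not on the Deployment's own top-level metadata.
        promcat.sysdig.com/omit: 'true'
    spec:
      containers:
        - name: example-app
          image: nginx:1.25    # placeholder image
```

The value is quoted because Kubernetes annotation values must be strings. An unquoted `true` would be parsed as a boolean and rejected by the API server.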
                                                                                                                                                                                                                                                                                                                                                                      


8.2.1 - Apache

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Apache] No Instance Up | No instances up | Prometheus
[Apache] Up Time Less Than One Hour | Instance with UpTime less than one hour | Prometheus
[Apache] Time Since Last OK Request More Than One Hour | Time since last OK request higher than one hour | Prometheus
[Apache] High Error Rate | High error rate | Prometheus
[Apache] High Rate Of Busy Workers In Instance | Low workers in open_slot state | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Apache

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • apache_accesses_total
                                                                                                                                                                                                                                                                                                                                                                      • apache_connections
                                                                                                                                                                                                                                                                                                                                                                      • apache_cpuload
                                                                                                                                                                                                                                                                                                                                                                      • apache_duration_ms_total
                                                                                                                                                                                                                                                                                                                                                                      • apache_http_last_request_seconds
                                                                                                                                                                                                                                                                                                                                                                      • apache_http_response_codes_total
                                                                                                                                                                                                                                                                                                                                                                      • apache_scoreboard
                                                                                                                                                                                                                                                                                                                                                                      • apache_sent_kilobytes_total
                                                                                                                                                                                                                                                                                                                                                                      • apache_up
                                                                                                                                                                                                                                                                                                                                                                      • apache_uptime_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • apache_workers

8.2.2 - Ceph

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Ceph] Ceph Manager is absent | Ceph Manager has disappeared from Prometheus target discovery. | Prometheus
[Ceph] Ceph Manager is missing replicas | Ceph Manager is missing replicas. | Prometheus
[Ceph] Ceph quorum at risk | Storage cluster quorum is low. Contact Support. | Prometheus
[Ceph] High number of leader changes | Ceph Monitor has seen a lot of leader changes per minute recently. | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • ceph

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • ceph_cluster_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      • ceph_cluster_total_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • ceph_health_status
                                                                                                                                                                                                                                                                                                                                                                      • ceph_mgr_status
                                                                                                                                                                                                                                                                                                                                                                      • ceph_mon_metadata
                                                                                                                                                                                                                                                                                                                                                                      • ceph_mon_num_elections
                                                                                                                                                                                                                                                                                                                                                                      • ceph_mon_quorum_status
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_apply_latency_ms
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_commit_latency_ms
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_in
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_metadata
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_numpg
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_r
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_r_latency_count
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_r_latency_sum
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_r_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_w
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_w_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_w_latency_count
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_op_w_latency_sum
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_recovery_bytes
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_recovery_ops
                                                                                                                                                                                                                                                                                                                                                                      • ceph_osd_up
                                                                                                                                                                                                                                                                                                                                                                      • ceph_pool_max_avail

8.2.3 - Consul

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Consul] KV Store update time anomaly | KV Store update time anomaly | Prometheus
[Consul] Transaction time anomaly | Transaction time anomaly | Prometheus
[Consul] Raft transactions count anomaly | Raft transactions count anomaly | Prometheus
[Consul] Raft commit time anomaly | Raft commit time anomaly | Prometheus
[Consul] Leader time to contact followers too high | Leader time to contact followers too high | Prometheus
[Consul] Flapping leadership | Flapping leadership | Prometheus
[Consul] Too many elections | Too many elections | Prometheus
[Consul] Server cluster unhealthy | Server cluster unhealthy | Prometheus
[Consul] Zero failure tolerance | Zero failure tolerance | Prometheus
[Consul] Client RPC requests anomaly | Consul client RPC requests anomaly | Prometheus
[Consul] Client RPC requests rate limit exceeded | Consul client RPC requests rate limit exceeded | Prometheus
[Consul] Client RPC requests failed | Consul client RPC requests failed | Prometheus
[Consul] License Expiry | Consul License Expiry | Prometheus
[Consul] Garbage Collection pause high | Consul Garbage Collection pause high | Prometheus
[Consul] Garbage Collection pause too high | Consul Garbage Collection pause too high | Prometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] Raft restore duration too highConsul Raft restore duration too highPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] RPC requests error rate is highConsul RPC requests error rate is highPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] Cache hit rate is lowConsul Cache hit rate is lowPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] High 4xx RequestError RateHigh 4xx RequestError RatePrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] High Request LatencyEnvoy High Request LatencyPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] High Response LatencyEnvoy High Response LatencyPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [Consul] Certificate close to expireCertificate close to expirePrometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • consul
                                                                                                                                                                                                                                                                                                                                                                      • consul envoy

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • consul_autopilot_failure_tolerance
                                                                                                                                                                                                                                                                                                                                                                      • consul_autopilot_healthy
                                                                                                                                                                                                                                                                                                                                                                      • consul_client_rpc
                                                                                                                                                                                                                                                                                                                                                                      • consul_client_rpc_exceeded
                                                                                                                                                                                                                                                                                                                                                                      • consul_client_rpc_failed
                                                                                                                                                                                                                                                                                                                                                                      • consul_consul_cache_bypass
                                                                                                                                                                                                                                                                                                                                                                      • consul_consul_cache_entries_count
                                                                                                                                                                                                                                                                                                                                                                      • consul_consul_cache_evict_expired
                                                                                                                                                                                                                                                                                                                                                                      • consul_consul_cache_fetch_error
                                                                                                                                                                                                                                                                                                                                                                      • consul_consul_cache_fetch_success
                                                                                                                                                                                                                                                                                                                                                                      • consul_kvs_apply_sum
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_apply
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_commitTime_sum
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_fsm_lastRestoreDuration
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_leader_lastContact
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_leader_oldestLogAge
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_rpc_installSnapshot
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_state_candidate
                                                                                                                                                                                                                                                                                                                                                                      • consul_raft_state_leader
                                                                                                                                                                                                                                                                                                                                                                      • consul_rpc_cross_dc
                                                                                                                                                                                                                                                                                                                                                                      • consul_rpc_queries_blocking
                                                                                                                                                                                                                                                                                                                                                                      • consul_rpc_query
                                                                                                                                                                                                                                                                                                                                                                      • consul_rpc_request
                                                                                                                                                                                                                                                                                                                                                                      • consul_rpc_request_error
                                                                                                                                                                                                                                                                                                                                                                      • consul_runtime_gc_pause_ns
                                                                                                                                                                                                                                                                                                                                                                      • consul_runtime_gc_pause_ns_sum
                                                                                                                                                                                                                                                                                                                                                                      • consul_system_licenseExpiration
                                                                                                                                                                                                                                                                                                                                                                      • consul_txn_apply_sum
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_membership_change
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_membership_healthy
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_membership_total
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_cx_active
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_cx_connect_ms_bucket
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_rq_active
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_rq_pending_active
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_rq_time_bucket
                                                                                                                                                                                                                                                                                                                                                                      • envoy_cluster_upstream_rq_xx
                                                                                                                                                                                                                                                                                                                                                                      • envoy_server_days_until_first_cert_expiring
                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
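
As a rough illustration of how two of the metrics above relate to the "Server cluster unhealthy" and "Zero failure tolerance" alerts, the sketch below reads a few samples in Prometheus text exposition format and reports which conditions hold. The exact alert expressions used by the integration are not given here, so the thresholds (`consul_autopilot_healthy` equal to 0, `consul_autopilot_failure_tolerance` equal to 0) are assumptions based on the usual meaning of these Consul gauges. The helper names `parse_samples` and `consul_health_findings` and the sample payload are hypothetical.

```python
def parse_samples(text):
    """Parse 'name value' lines from Prometheus text exposition format.

    Assumption: samples carry no trailing timestamp; labels, if present,
    are dropped and the last sample seen for a metric name wins.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        samples[name] = float(value)
    return samples


def consul_health_findings(samples):
    """Return the alert-like conditions that hold for the given samples.

    Assumed thresholds, not the integration's actual alert rules.
    """
    findings = []
    if samples.get("consul_autopilot_healthy") == 0:
        findings.append("Server cluster unhealthy")
    if samples.get("consul_autopilot_failure_tolerance") == 0:
        findings.append("Zero failure tolerance")
    return findings


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# TYPE consul_autopilot_healthy gauge
consul_autopilot_healthy 1
consul_autopilot_failure_tolerance 0
"""

print(consul_health_findings(parse_samples(EXAMPLE)))
# prints ['Zero failure tolerance']
```

A missing metric is treated as "no finding" rather than as a failure, which keeps the sketch simple. A real alert would more likely be written as a PromQL expression evaluated over a time window.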

                                                                                                                                                                                                                                                                                                                                                                      8.2.4 -

                                                                                                                                                                                                                                                                                                                                                                      Elasticsearch

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

Alert | Description | Format
[Elasticsearch] Heap Usage Too High | The heap usage is over 90% | Prometheus
[Elasticsearch] Heap Usage Warning | The heap usage is over 80% | Prometheus
[Elasticsearch] Disk Space Low | Disk available less than 20% | Prometheus
[Elasticsearch] Disk Out Of Space | Disk available less than 10% | Prometheus
[Elasticsearch] Cluster Red | Cluster in Red status | Prometheus
[Elasticsearch] Cluster Yellow | Cluster in Yellow status | Prometheus
[Elasticsearch] Relocation Shards | Relocating shards for too long | Prometheus
[Elasticsearch] Initializing Shards | Initializing shards takes too long | Prometheus
[Elasticsearch] Unassigned Shards | Unassigned shards for long time | Prometheus
[Elasticsearch] Pending Tasks | Elasticsearch has a high number of pending tasks | Prometheus
[Elasticsearch] No New Documents | Elasticsearch has no new documents for a period of time | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • ElasticSearch_Cluster
                                                                                                                                                                                                                                                                                                                                                                      • ElasticSearch_Infra

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_active_primary_shards
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_active_shards
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_initializing_shards
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_number_of_data_nodes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_number_of_nodes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_number_of_pending_tasks
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_relocating_shards
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_status
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_cluster_health_unassigned_shards
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_filesystem_data_available_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_filesystem_data_size_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_docs
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_indexing_index_time_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_indexing_index_total
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_merges_total_time_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_search_query_time_seconds
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_indices_store_throttle_time_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_gc_collection_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_gc_collection_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_memory_committed_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_memory_max_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_memory_pool_peak_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_jvm_memory_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_os_load1
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_os_load15
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_os_load5
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_process_cpu_percent
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_transport_rx_size_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • elasticsearch_transport_tx_size_bytes_total
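
The heap and disk alerts listed for this integration state percentage thresholds, and the metrics list includes byte gauges that such ratios could be computed from. The following is a minimal sketch of that arithmetic, not the integration's actual alert expressions, which are not shown here. It assumes that heap usage is the ratio of elasticsearch_jvm_memory_used_bytes to elasticsearch_jvm_memory_max_bytes (for the heap area) and that disk availability is the ratio of elasticsearch_filesystem_data_available_bytes to elasticsearch_filesystem_data_size_bytes. The function names are hypothetical.

```python
from typing import Optional


def heap_alert(used_bytes: float, max_bytes: float) -> Optional[str]:
    """Return the most severe heap alert name for the given sample, or None.

    Assumption: usage = used / max, compared against the documented
    thresholds (over 80% is a warning, over 90% is too high).
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    usage = used_bytes / max_bytes
    if usage > 0.90:
        return "Heap Usage Too High"
    if usage > 0.80:
        return "Heap Usage Warning"
    return None


def disk_alert(available_bytes: float, size_bytes: float) -> Optional[str]:
    """Return the most severe disk alert name for the given sample, or None.

    Assumption: available = available / size, compared against the
    documented thresholds (below 20% is low, below 10% is out of space).
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    available = available_bytes / size_bytes
    if available < 0.10:
        return "Disk Out Of Space"
    if available < 0.20:
        return "Disk Space Low"
    return None
```

For example, a node using 950 MB of a 1000 MB heap would fall under the "Heap Usage Too High" condition in this sketch. Real alert rules would typically fire the warning and the critical alert independently; returning only the most severe one is a simplification.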

                                                                                                                                                                                                                                                                                                                                                                      8.2.5 -

                                                                                                                                                                                                                                                                                                                                                                      Fluentd

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

Alert | Description | Format
[Fluentd] No Input From Container | No Input From Container. | Prometheus
[Fluentd] High Error Ratio | High Error Ratio. | Prometheus
[Fluentd] High Retry Ratio | High Retry Ratio. | Prometheus
[Fluentd] High Retry Wait | High Retry Wait. | Prometheus
[Fluentd] Low Buffer Available Space | Low Buffer Available Space. | Prometheus
[Fluentd] Buffer Queue Length Increasing | Buffer Queue Length Increasing. | Prometheus
[Fluentd] Buffer Total Bytes Increasing | Buffer Total Bytes Increasing. | Prometheus
[Fluentd] High Slow Flush Ratio | High Slow Flush Ratio. | Prometheus
[Fluentd] No Output Records From Plugin | No Output Records From Plugin. | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Fluentd

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • fluentd_input_status_num_records_total
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_buffer_available_space_ratio
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_buffer_queue_length
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_buffer_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_emit_count
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_emit_records
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_flush_time_count
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_num_errors
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_retry_count
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_retry_wait
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_rollback_count
                                                                                                                                                                                                                                                                                                                                                                      • fluentd_output_status_slow_flush_count
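
Several of the Fluentd alerts are named after ratios, but the list above does not say how those ratios are computed or which thresholds apply. The sketch below shows one plausible reading of an "error ratio": the growth of fluentd_output_status_num_errors divided by the growth of fluentd_output_status_emit_count over the same window. This pairing of metrics, the treatment of both as cumulative counts, the function names, and the example threshold are all assumptions made for illustration, not the integration's actual rule.

```python
def counter_increase(start: float, end: float) -> float:
    """Increase of a cumulative count between two samples.

    If the value dropped (for example after a process restart), the count
    is assumed to have restarted from zero, so the end value is the increase.
    """
    return end if end < start else end - start


def error_ratio(errors_start: float, errors_end: float,
                emits_start: float, emits_end: float) -> float:
    """Errors per emit over one window; 0.0 when nothing was emitted."""
    emits = counter_increase(emits_start, emits_end)
    if emits == 0:
        return 0.0
    return counter_increase(errors_start, errors_end) / emits


def is_high_error_ratio(ratio: float, threshold: float = 0.1) -> bool:
    """Compare against a hypothetical threshold (0.1 is only a placeholder)."""
    return ratio > threshold
```

With 20 new errors against 200 new emits in a window, this sketch yields a ratio of about 0.1. The same pattern could be applied to fluentd_output_status_retry_count or fluentd_output_status_slow_flush_count for the retry and slow flush alerts, under the same assumptions.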

                                                                                                                                                                                                                                                                                                                                                                      8.2.6 -

                                                                                                                                                                                                                                                                                                                                                                      Haproxy-ingress

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
|---|---|---|
| [Haproxy-Ingress] Uptime less than 1 hour | This alert detects when all instances of the ingress controller have an uptime of less than 1 hour. | Prometheus |
| [Haproxy-Ingress] Frontend Down | This alert detects when a frontend has all of its instances down for more than 10 minutes. | Prometheus |
| [Haproxy-Ingress] Backend Down | This alert detects when a backend has all of its instances down for more than 10 minutes. | Prometheus |
| [Haproxy-Ingress] High Sessions Usage | This alert triggers when backend sessions exceed 85% of the session capacity for 10 minutes. | Prometheus |
| [Haproxy-Ingress] High Error Rate | This alert triggers when the error rate in a proxy exceeds 15% for more than 10 minutes. | Prometheus |
| [Haproxy-Ingress] High Request Denied Rate | This alert detects when the rate of denied requests in a proxy exceeds 10% for more than 10 minutes. | Prometheus |
| [Haproxy-Ingress] High Response Denied Rate | This alert detects when the rate of denied responses in a proxy exceeds 10% for more than 10 minutes. | Prometheus |
| [Haproxy-Ingress] High Response Rate | This alert triggers when a proxy has a mean response time higher than 250 ms for more than 10 minutes. | Prometheus |
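
Most of the alerts above share one pattern: a ratio must stay above a threshold for a sustained period (for example, 85% session usage for 10 minutes). The following is a minimal Python sketch of that pattern only. It is not the actual alert rule or how Prometheus evaluates it; `Sample` and `breached_for` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds, on any monotonically increasing clock
    value: float      # a ratio, e.g. 0.90 for 90% session usage


def breached_for(samples, threshold, duration_s):
    """Return True if the latest run of samples stayed above `threshold`
    without interruption for at least `duration_s` seconds."""
    breach_start = None
    for s in sorted(samples, key=lambda s: s.timestamp):
        if s.value > threshold:
            # Remember when the current uninterrupted breach began.
            if breach_start is None:
                breach_start = s.timestamp
        else:
            # Any sample at or below the threshold resets the timer.
            breach_start = None
    if breach_start is None:
        return False
    latest = max(s.timestamp for s in samples)
    return latest - breach_start >= duration_s


# Hypothetical data: one sample per minute for 10 minutes, always at 90%.
steady = [Sample(t * 60.0, 0.90) for t in range(11)]
print(breached_for(steady, 0.85, 600))
```

A single sample that dips to or below the threshold restarts the timer, which is why a brief spike alone would not satisfy the "for 10 minutes" condition in this sketch.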

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • HAProxy_Ingress_Overview
                                                                                                                                                                                                                                                                                                                                                                      • HAProxy_Ingress_Service_Details

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_bytes_in_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_bytes_out_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_client_aborts_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_connect_time_average_seconds
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_current_queue
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_http_requests_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_http_responses_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_limit_sessions
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_queue_time_average_seconds
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_requests_denied_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_response_time_average_seconds
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_responses_denied_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_sessions_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_backend_status
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_bytes_in_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_bytes_out_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_connections_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_denied_connections_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_denied_sessions_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_request_errors_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_requests_denied_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_responses_denied_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_frontend_status
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_active_peers
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_current_connection_rate
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_current_run_queue
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_current_session_rate
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_current_tasks
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_jobs
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_ssl_connections_total
                                                                                                                                                                                                                                                                                                                                                                      • haproxy_process_start_time_seconds
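
As a rough illustration of how two of the counters listed above could be combined, the sketch below parses a small scrape in the Prometheus text format and computes the share of denied requests for one proxy. The sample values, the `proxy` label name, and the helper names `parse_samples` and `denied_ratio` are assumptions made for this example. The parser is deliberately simplified: it assumes no trailing timestamps and no commas inside label values.

```python
def parse_samples(text):
    """Parse lines shaped like `name{label="v"} value` into
    (name, labels, value) tuples. Comment and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The numeric value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        name, _, rest = series.partition("{")
        labels = {}
        if rest:
            for pair in rest.rstrip("}").split(","):
                if pair:
                    key, _, val = pair.partition("=")
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name.strip(), labels, float(value)))
    return samples


def denied_ratio(samples, proxy):
    """Denied requests divided by total HTTP requests for one proxy."""
    def total(metric):
        return sum(v for n, l, v in samples
                   if n == metric and l.get("proxy") == proxy)

    requests = total("haproxy_backend_http_requests_total")
    denied = total("haproxy_backend_requests_denied_total")
    return denied / requests if requests else 0.0


# Hypothetical scrape output, for illustration only.
SCRAPE = """
# TYPE haproxy_backend_http_requests_total counter
haproxy_backend_http_requests_total{proxy="web"} 2000
haproxy_backend_requests_denied_total{proxy="web"} 50
haproxy_process_jobs 7
"""

print(denied_ratio(parse_samples(SCRAPE), "web"))  # 0.025
```

Both metrics are cumulative counters, so this ratio covers the whole lifetime of the process. An alert on a recent window would instead compare the increase of each counter over that window.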

8.2.7 - Harbor

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
|---|---|---|
| [Harbor] Harbor Core Is Down | Harbor Core is down. | Prometheus |
| [Harbor] Harbor Database Is Down | Harbor Database is down. | Prometheus |
| [Harbor] Harbor Registry Is Down | Harbor Registry is down. | Prometheus |
| [Harbor] Harbor Redis Is Down | Harbor Redis is down. | Prometheus |
| [Harbor] Harbor Trivy Is Down | Harbor Trivy is down. | Prometheus |
| [Harbor] Harbor JobService Is Down | Harbor JobService is down. | Prometheus |
| [Harbor] Project Quota Is Raising The Limit | The project quota is reaching the limit. | Prometheus |
| [Harbor] Harbor p99 latency is higher than 10 seconds | Harbor p99 latency is higher than 10 seconds. | Prometheus |
| [Harbor] Harbor Error Rate is High | Harbor error rate is high. | Prometheus |
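
To make the p99 latency alert more concrete, here is a minimal Python sketch of one common way to estimate a quantile from cumulative histogram buckets, using linear interpolation inside the bucket that contains the target rank. This is an assumption about how such a latency figure might be derived, not the rule this integration actually ships. The bucket counts and the name `estimate_quantile` are hypothetical.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by upper bound, ending with an open-ended float("inf") bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate inside an open-ended bucket.
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request-duration buckets (seconds, cumulative counts).
buckets = [(1.0, 900), (5.0, 980), (10.0, 995), (float("inf"), 1000)]
p99 = estimate_quantile(0.99, buckets)
print(p99 > 10.0)  # False: with these counts the estimate is below 10 seconds
```

The accuracy of such an estimate depends on the bucket boundaries, so a threshold like 10 seconds is easiest to evaluate when it coincides with a bucket's upper bound.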

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Harbor

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • harbor_artifact_pulled
                                                                                                                                                                                                                                                                                                                                                                      • harbor_core_http_request_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • harbor_jobservice_task_process_time_seconds
                                                                                                                                                                                                                                                                                                                                                                      • harbor_project_member_total
                                                                                                                                                                                                                                                                                                                                                                      • harbor_project_quota_byte
                                                                                                                                                                                                                                                                                                                                                                      • harbor_project_quota_usage_byte
                                                                                                                                                                                                                                                                                                                                                                      • harbor_project_repo_total
                                                                                                                                                                                                                                                                                                                                                                      • harbor_project_total
                                                                                                                                                                                                                                                                                                                                                                      • harbor_quotas_size_bytes
                                                                                                                                                                                                                                                                                                                                                                      • harbor_task_concurrency
                                                                                                                                                                                                                                                                                                                                                                      • harbor_task_queue_latency
                                                                                                                                                                                                                                                                                                                                                                      • harbor_task_queue_size
                                                                                                                                                                                                                                                                                                                                                                      • harbor_up
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
                                                                                                                                                                                                                                                                                                                                                                      • registry_http_request_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • registry_http_request_size_bytes_bucket
                                                                                                                                                                                                                                                                                                                                                                      • registry_http_requests_total
                                                                                                                                                                                                                                                                                                                                                                      • registry_http_response_size_bytes_bucket
                                                                                                                                                                                                                                                                                                                                                                      • registry_storage_action_seconds_bucket

                                                                                                                                                                                                                                                                                                                                                                      8.2.8 -

                                                                                                                                                                                                                                                                                                                                                                      K8s-etcd

                                                                                                                                                                                                                                                                                                                                                                      K8s-etcd

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Kubernetes_Etcd

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • etcd_debugging_mvcc_db_total_size_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      • etcd_disk_backend_commit_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • etcd_disk_wal_fsync_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • etcd_grpc_proxy_cache_hits_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_grpc_proxy_cache_misses_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_client_grpc_received_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_client_grpc_sent_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_peer_received_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_peer_received_failures_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_peer_round_trip_time_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_peer_sent_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_network_peer_sent_failures_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_has_leader
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_id
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_leader_changes_seen_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_proposals_applied_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_proposals_committed_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_proposals_failed_total
                                                                                                                                                                                                                                                                                                                                                                      • etcd_server_proposals_pending
                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • grpc_server_handled_total
                                                                                                                                                                                                                                                                                                                                                                      • grpc_server_started_total
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_cpu_cores_used
                                                                                                                                                                                                                                                                                                                                                                      • sysdig_container_memory_used_bytes

8.2.9 - Keda

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Keda] Errors in Scaled Object | Errors detected in scaled object | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Keda

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • keda_metrics_adapter_scaled_object_errors
                                                                                                                                                                                                                                                                                                                                                                      • keda_metrics_adapter_scaler_metrics_value
                                                                                                                                                                                                                                                                                                                                                                      • kubernetes.hpa.replicas.current
                                                                                                                                                                                                                                                                                                                                                                      • kubernetes.hpa.replicas.desired
                                                                                                                                                                                                                                                                                                                                                                      • kubernetes.hpa.replicas.max
                                                                                                                                                                                                                                                                                                                                                                      • kubernetes.hpa.replicas.min

8.2.10 - Memcached

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Memcached] Instance Down | Instance is not reachable | Prometheus
[Memcached] Low UpTime | Uptime of less than 1 hour in a Memcached instance | Prometheus
[Memcached] Connection Throttled | Connection throttled because the maximum number of requests per event process was reached | Prometheus
[Memcached] Connections Close To The Limit 85% | The number of connections is close to the limit | Prometheus
[Memcached] Connections Limit Reached | The maximum number of connections was reached, causing a connection error | Prometheus
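
The "Connections Close To The Limit 85%" alert can be pictured as a ratio check between two of the metrics listed for this integration, memcached_current_connections and memcached_max_connections. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and the strict "greater than" comparison against 0.85 is an assumption, since this page does not show the alert's actual expression.

```python
def connection_utilization(current_connections: float, max_connections: float) -> float:
    """Return current connections as a fraction of the configured maximum.

    The arguments stand in for sampled values of memcached_current_connections
    and memcached_max_connections.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return current_connections / max_connections


def connections_close_to_limit(
    current_connections: float,
    max_connections: float,
    threshold: float = 0.85,  # assumed from the "85%" in the alert name
) -> bool:
    """Return True when utilization exceeds the threshold (assumed strict comparison)."""
    return connection_utilization(current_connections, max_connections) > threshold
```

For example, 900 open connections against a limit of 1024 gives a utilization of roughly 0.88, which this sketch would flag, while 512 of 1024 would not.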

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Memcached

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • memcached_commands_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_connections_listener_disabled_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_connections_yielded_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_current_bytes
                                                                                                                                                                                                                                                                                                                                                                      • memcached_current_connections
                                                                                                                                                                                                                                                                                                                                                                      • memcached_current_items
                                                                                                                                                                                                                                                                                                                                                                      • memcached_items_evicted_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_items_reclaimed_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_items_total
                                                                                                                                                                                                                                                                                                                                                                      • memcached_limit_bytes
                                                                                                                                                                                                                                                                                                                                                                      • memcached_max_connections
                                                                                                                                                                                                                                                                                                                                                                      • memcached_up
                                                                                                                                                                                                                                                                                                                                                                      • memcached_uptime_seconds

8.2.11 - MongoDB

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[MongoDB] Instance Down | Mongo server detected down by instance | Prometheus
[MongoDB] Uptime less than one hour | Mongo server uptime is less than one hour in instance | Prometheus
[MongoDB] Asserts detected | Asserts detected in instance | Prometheus
[MongoDB] High Latency | High latency in instance | Prometheus
[MongoDB] High Ticket Utilization | Ticket usage over 75% in instance | Prometheus
[MongoDB] Recurrent Cursor Timeout | Recurrent cursor timeouts in instance | Prometheus
[MongoDB] Recurrent Memory Page Faults | Recurrent memory page faults in instance | Prometheus
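
The "High Ticket Utilization" alert (ticket usage over 75%) can be read as the ratio of tickets in use to tickets available, which corresponds to the metrics mongodb_mongod_wiredtiger_concurrent_transactions_out_tickets and mongodb_mongod_wiredtiger_concurrent_transactions_total_tickets listed for this integration. The following is a minimal sketch of that check, not the alert's actual rule: the function name is hypothetical, and splitting the values by ticket type ("read" and "write") is an assumption about how the exporter labels these series.

```python
from typing import Dict, List


def saturated_ticket_types(
    out_tickets: Dict[str, float],
    total_tickets: Dict[str, float],
    threshold: float = 0.75,  # taken from "over 75%" in the alert description
) -> List[str]:
    """Return the ticket types whose usage ratio exceeds the threshold.

    out_tickets and total_tickets map an assumed ticket type label
    (for example "read" or "write") to the sampled metric value.
    """
    saturated = []
    for ticket_type, total in total_tickets.items():
        if total <= 0:
            # Skip types with no capacity reported to avoid dividing by zero.
            continue
        used = out_tickets.get(ticket_type, 0.0)
        if used / total > threshold:
            saturated.append(ticket_type)
    return sorted(saturated)
```

With 100 of 128 read tickets and 20 of 128 write tickets in use, only the read tickets exceed 75%, so the sketch would report "read" alone.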

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • MongoDB_Database_Details
                                                                                                                                                                                                                                                                                                                                                                      • MongoDB_Instance_Health

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • mongodb_asserts_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_connections
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_extra_info_page_faults_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_instance_uptime_seconds
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_memory
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_db_collections_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_db_data_size_bytes
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_db_index_size_bytes
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_db_indexes_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_db_objects_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_global_lock_client
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_global_lock_current_queue
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_global_lock_ratio
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_metrics_cursor_open
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_metrics_cursor_timed_out_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_op_latencies_latency_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_op_latencies_ops_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_cache_bytes
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_cache_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_cache_evicted_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_cache_pages
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_concurrent_transactions_out_tickets
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_mongod_wiredtiger_concurrent_transactions_total_tickets
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_network_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_network_metrics_num_requests_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_op_counters_total
                                                                                                                                                                                                                                                                                                                                                                      • mongodb_up
                                                                                                                                                                                                                                                                                                                                                                      • net.error.count

                                                                                                                                                                                                                                                                                                                                                                      8.2.12 -

                                                                                                                                                                                                                                                                                                                                                                      Mysql

                                                                                                                                                                                                                                                                                                                                                                      Mysql

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

                                                                                                                                                                                                                                                                                                                                                                      AlertDescriptionFormat
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql DownMySQL instance is downPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql RestartedMySQL has just been restarted, less than one minute agoPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Too any Connections (>80%)More than 80% of MySQL connections are in usePrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql High Threads RunningMore than 60% of MySQL connections are in running statePrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql HighOpen FilesMore than 80% of MySQL files openPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Slow QueriesMySQL server mysql has some new slow queryPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Innodb Log WaitsMySQL innodb log writes stallingPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Slave Io Thread Not RunningMySQL Slave IO thread not runningPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Slave Sql Thread Not RunningMySQL Slave SQL thread not runningPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [MySQL] Mysql Slave Replication LagMySQL Slave replication lagPrometheus
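
The percentage thresholds in the alert descriptions above can be read as ratios between metrics exposed by this integration. The following Python sketch evaluates two of them by hand. The pairing of metrics with each alert (connected or running threads measured against `max_connections`) and the function names are assumptions made for illustration; the actual alert expressions are not given here.

```python
from typing import Dict

# Hypothetical helpers: the metric pairings below are assumptions, not the
# alert expressions shipped with the integration.


def usage_ratio(current: float, limit: float) -> float:
    """Return current / limit, or 0.0 when the limit is not positive."""
    if limit <= 0:
        return 0.0
    return current / limit


def evaluate_mysql_alerts(sample: Dict[str, float]) -> Dict[str, bool]:
    """Check the 80% connection and 60% running-thread thresholds."""
    max_conn = sample["mysql_global_variables_max_connections"]
    connected = sample["mysql_global_status_threads_connected"]
    running = sample["mysql_global_status_threads_running"]
    return {
        "too_many_connections": usage_ratio(connected, max_conn) > 0.80,
        "high_threads_running": usage_ratio(running, max_conn) > 0.60,
    }


# Made-up sample values for a single scrape.
sample = {
    "mysql_global_variables_max_connections": 150.0,
    "mysql_global_status_threads_connected": 130.0,
    "mysql_global_status_threads_running": 40.0,
}
result = evaluate_mysql_alerts(sample)
print(result)
```

With these sample values, 130 of 150 connections are in use, so only the connection threshold is crossed.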

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • MySQL

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_aborted_clients
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_aborted_connects
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_buffer_pool_pages
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_bytes_received
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_bytes_sent
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_commands_total
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_connection_errors_total
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_buffer_pool_read_requests
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_buffer_pool_reads
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_log_waits
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_mem_adaptive_hash
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_mem_dictionary
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_innodb_page_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_questions
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_select_full_join
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_select_full_range_join
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_select_range_check
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_select_scan
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_slow_queries
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_sort_merge_passes
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_sort_range
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_sort_rows
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_sort_scan
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_table_locks_immediate
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_table_locks_waited
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_table_open_cache_hits
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_table_open_cache_misses
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_threads_cached
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_threads_connected
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_threads_created
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_threads_running
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_status_uptime
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_innodb_additional_mem_pool_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_innodb_log_buffer_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_innodb_open_files
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_key_buffer_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_max_connections
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_open_files_limit
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_query_cache_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_thread_cache_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_global_variables_tokudb_cache_size
                                                                                                                                                                                                                                                                                                                                                                      • mysql_slave_status_master_server_id
                                                                                                                                                                                                                                                                                                                                                                      • mysql_slave_status_seconds_behind_master
                                                                                                                                                                                                                                                                                                                                                                      • mysql_slave_status_slave_io_running
                                                                                                                                                                                                                                                                                                                                                                      • mysql_slave_status_slave_sql_running
                                                                                                                                                                                                                                                                                                                                                                      • mysql_slave_status_sql_delay
                                                                                                                                                                                                                                                                                                                                                                      • mysql_up
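
Several metrics in this list are cumulative counters and are usually more informative in combination. As one example, a buffer pool hit ratio can be derived from `mysql_global_status_innodb_buffer_pool_read_requests` (logical read requests) and `mysql_global_status_innodb_buffer_pool_reads` (reads that could not be served from the buffer pool and went to disk). The following is a minimal Python sketch, assuming two scrapes of those counters are available as plain dictionaries; the function name and the sample values are illustrative only.

```python
from typing import Dict, Optional

REQUESTS = "mysql_global_status_innodb_buffer_pool_read_requests"
DISK_READS = "mysql_global_status_innodb_buffer_pool_reads"


def buffer_pool_hit_ratio(
    prev: Dict[str, float], curr: Dict[str, float]
) -> Optional[float]:
    """Hit ratio over the interval between two scrapes of the counters.

    Returns None when there were no read requests in the interval or when a
    counter went backwards (for example after a server restart).
    """
    requests = curr[REQUESTS] - prev[REQUESTS]
    disk_reads = curr[DISK_READS] - prev[DISK_READS]
    if requests <= 0 or disk_reads < 0:
        return None
    return 1.0 - disk_reads / requests


# Made-up counter values from two consecutive scrapes.
prev = {REQUESTS: 1_000_000.0, DISK_READS: 20_000.0}
curr = {REQUESTS: 1_500_000.0, DISK_READS: 25_000.0}
ratio = buffer_pool_hit_ratio(prev, curr)
print(ratio)
```

Working on the difference between two scrapes, rather than on the raw totals, keeps the ratio representative of recent activity instead of the whole server uptime.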

                                                                                                                                                                                                                                                                                                                                                                      8.2.13 -

                                                                                                                                                                                                                                                                                                                                                                      Nginx

                                                                                                                                                                                                                                                                                                                                                                      Nginx

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

                                                                                                                                                                                                                                                                                                                                                                      AlertDescriptionFormat
                                                                                                                                                                                                                                                                                                                                                                      [Nginx] No Intances UpNo Nginx instances UpPrometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • NGINX_App_Overview

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • net.bytes.in
                                                                                                                                                                                                                                                                                                                                                                      • net.bytes.out
                                                                                                                                                                                                                                                                                                                                                                      • net.http.error.count
                                                                                                                                                                                                                                                                                                                                                                      • net.http.request.count
                                                                                                                                                                                                                                                                                                                                                                      • net.http.request.time
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_accepted
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_active
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_handled
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_reading
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_waiting
                                                                                                                                                                                                                                                                                                                                                                      • nginx_connections_writing
                                                                                                                                                                                                                                                                                                                                                                      • nginx_up

8.2.14 - Nginx-ingress

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Nginx-Ingress] High Http 4xx Error Rate | Too many HTTP requests with status 4xx (> 5%) | Prometheus
[Nginx-Ingress] High Http 5xx Error Rate | Too many HTTP requests with status 5xx (> 5%) | Prometheus
[Nginx-Ingress] High Latency | Nginx p99 latency is higher than 10 seconds | Prometheus
[Nginx-Ingress] Ingress Certificate Expiry | Nginx Ingress Certificate will expire in less than 14 days | Prometheus
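
The thresholds in the alert descriptions above can be read as simple comparisons. The following Python sketch is illustrative only: the function names and constants are hypothetical, and the actual alerts are Prometheus rules whose expressions are not shown in this document. It assumes that the error rate is the share of requests in a given status class, and that the certificate check compares an expiry timestamp (for example, a value such as nginx_ingress_controller_ssl_expire_time_seconds) with the current time.

```python
# Illustrative only: hypothetical helpers mirroring the alert thresholds above.
# The real alerts are Prometheus rules whose expressions are not shown here.

ERROR_RATE_THRESHOLD = 0.05              # "> 5%" from the alert descriptions
CERT_EXPIRY_WINDOW_S = 14 * 24 * 3600    # "less than 14 days", in seconds


def error_rate(error_requests: float, total_requests: float) -> float:
    """Share of requests that returned the status class of interest."""
    if total_requests <= 0:
        return 0.0  # no traffic: treat as no errors
    return error_requests / total_requests


def high_error_rate(error_requests: float, total_requests: float) -> bool:
    """True when the error share is strictly above the 5% threshold."""
    return error_rate(error_requests, total_requests) > ERROR_RATE_THRESHOLD


def certificate_expiring(expire_time_s: float, now_s: float) -> bool:
    """True when the certificate expires within the 14-day window."""
    return expire_time_s - now_s < CERT_EXPIRY_WINDOW_S
```

With these definitions, 6 errors out of 100 requests would trip the error-rate check, while exactly 5 out of 100 would not, because the comparison is strict.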

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Nginx_Kubernetes_Ingress_Controller

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_config_last_reload_successful
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_config_last_reload_successful_timestamp_seconds
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_ingress_upstream_latency_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_ingress_upstream_latency_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_nginx_process_connections
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_nginx_process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_nginx_process_resident_memory_bytes
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_request_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_request_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_request_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_request_size_sum
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_requests
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_response_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_response_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_response_size_sum
                                                                                                                                                                                                                                                                                                                                                                      • nginx_ingress_controller_ssl_expire_time_seconds
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
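
The High Latency alert refers to a p99 latency, which is typically derived from cumulative histogram buckets such as nginx_ingress_controller_request_duration_seconds_bucket. The sketch below shows one way to estimate a quantile from such buckets using linear interpolation inside the matching bucket, similar in spirit to the Prometheus histogram_quantile function. It is a simplified illustration under stated assumptions, not the rule the alert uses: the function name and the sample bucket values are hypothetical, and the lower bound of the first bucket is assumed to be 0.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper
    bound and ending with the +Inf bucket. Hypothetical helper for
    illustration; values inside a bucket are assumed evenly spread.
    """
    if not buckets or not 0.0 <= q <= 1.0:
        raise ValueError("need at least one bucket and 0 <= q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Cannot interpolate into +Inf: report the last finite bound.
                return prev_bound
            if count == prev_count:
                return upper  # empty bucket reached only when rank is 0
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Hypothetical sample: 100 requests, 10 of them slower than 10 seconds.
sample = [(1.0, 50.0), (10.0, 90.0), (30.0, 100.0), (math.inf, 100.0)]
p99 = estimate_quantile(0.99, sample)
latency_alert = p99 > 10.0  # threshold from the High Latency description
```

The estimate is only as precise as the bucket boundaries allow, so a p99 that falls in a wide bucket should be read as an approximation.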

8.2.15 - Ntp

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Ntp] Drift is too high | Drift is too high | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • ntp

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • ntp_drift_seconds

8.2.16 - Opa

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
| --- | --- | --- |
| [Opa gatekeeper] Too much time since the last audit | More than 120 seconds have passed since the last audit | Prometheus |
| [Opa gatekeeper] Spike of violations | There were more than 30 violations | Prometheus |

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • OPA_Gatekeeper

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_audit_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_audit_last_run_time
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_constraint_template_ingestion_count
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_constraint_template_ingestion_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_constraint_templates
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_constraints
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_request_count
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_request_duration_seconds_bucket
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_request_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • gatekeeper_violations
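
The two Opa alert conditions above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the alert descriptions, not the rule the integration actually ships. It assumes that `gatekeeper_audit_last_run_time` is a Unix timestamp in seconds and that the violation count is a single total taken from `gatekeeper_violations`. The function and constant names are hypothetical.

```python
import time
from typing import Optional

# Thresholds taken from the alert descriptions.
AUDIT_MAX_AGE_SECONDS = 120
VIOLATION_LIMIT = 30


def audit_is_stale(last_run_time: float, now: Optional[float] = None) -> bool:
    """Return True when the last audit is more than 120 seconds old.

    last_run_time stands in for a sample of gatekeeper_audit_last_run_time,
    assumed here to be a Unix timestamp in seconds.
    """
    if now is None:
        now = time.time()
    return now - last_run_time > AUDIT_MAX_AGE_SECONDS


def violation_spike(violations: int) -> bool:
    """Return True when the violation count exceeds the limit of 30.

    violations stands in for a total derived from gatekeeper_violations;
    how the real alert aggregates that metric is an assumption.
    """
    return violations > VIOLATION_LIMIT
```

For example, `audit_is_stale(1000.0, now=1121.0)` evaluates the 121-second gap against the 120-second threshold, and `violation_spike(31)` checks a count just over the limit.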

8.2.17 - Php-fpm

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
| --- | --- | --- |
| [Php-Fpm] Percentage of instances low | Less than 75% of instances are up | Prometheus |
| [Php-Fpm] Recently reboot | Instances have been rebooted recently | Prometheus |
| [Php-Fpm] Limit of child proccess exceeded | The limit on the number of child processes has been exceeded | Prometheus |
| [Php-Fpm] Reaching limit of queue process | The request queue buffer is reaching its limit | Prometheus |
| [Php-Fpm] Too slow requests processing | Requests are taking too long to be processed | Prometheus |

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Php-fpm

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • kube_workload_status_desired
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_accepted_connections
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_active_processes
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_idle_processes
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_listen_queue
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_listen_queue_length
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_max_children_reached
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_process_requests
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_slow_requests
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_start_since
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_total_processes
                                                                                                                                                                                                                                                                                                                                                                      • phpfpm_up
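
As a rough illustration of the "Percentage of instances low" condition, the sketch below compares the number of instances reporting `phpfpm_up` with the desired count from `kube_workload_status_desired`. Pairing these two metrics is an assumption based on the metric list above, and the function and constant names are hypothetical. The shipped alert rule may be computed differently.

```python
from typing import Sequence

# Threshold taken from the alert description: "Less than 75% of instances are up".
MIN_UP_FRACTION = 0.75


def instances_low(up_samples: Sequence[int], desired: int) -> bool:
    """Return True when fewer than 75% of the desired instances are up.

    up_samples stands in for one phpfpm_up sample per instance (1 = up, 0 = down).
    desired stands in for kube_workload_status_desired of the workload.
    """
    if desired <= 0:
        # Nothing is expected to run, so there is nothing to alert on.
        return False
    return sum(up_samples) / desired < MIN_UP_FRACTION
```

With four desired instances, two instances up (`[1, 1, 0, 0]`) is below the threshold, while three up (`[1, 1, 1, 0]`) sits exactly at 75% and does not trigger because the comparison is strict.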

8.2.18 - Portworx

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
| --- | --- | --- |
| [Portworx] No Quorum | Portworx No Quorum. | Prometheus |
| [Portworx] Node Status Not OK | Portworx Node Status Not OK. | Prometheus |
| [Portworx] Offline Nodes | Portworx Offline Nodes. | Prometheus |
| [Portworx] Nodes Storage Full or Down | Portworx Nodes Storage Full or Down. | Prometheus |
| [Portworx] Offline Storage Nodes | Portworx Offline Storage Nodes. | Prometheus |
| [Portworx] Unhealthy Node KVDB | Portworx Unhealthy Node KVDB. | Prometheus |
| [Portworx] Cache read hit rate is low | Portworx Cache read hit rate is low. | Prometheus |
| [Portworx] Cache write hit rate is low | Portworx Cache write hit rate is low. | Prometheus |
| [Portworx] High Read Latency In Disk | Portworx High Read Latency In Disk. | Prometheus |
| [Portworx] High Write Latency In Disk | Portworx High Write Latency In Disk. | Prometheus |
| [Portworx] Low Cluster Capacity | Portworx Low Cluster Capacity. | Prometheus |
| [Portworx] Disk Full In 48H | Portworx Disk Full In 48H. | Prometheus |
| [Portworx] Disk Full In 12H | Portworx Disk Full In 12H. | Prometheus |
| [Portworx] Pool Status Not Online | Portworx Pool Status Not Online. | Prometheus |
| [Portworx] High Write Latency In Pool | Portworx High Write Latency In Pool. | Prometheus |
| [Portworx] Pool Full In 48H | Portworx Pool Full In 48H. | Prometheus |
| [Portworx] Pool Full In 12H | Portworx Pool Full In 12H. | Prometheus |
| [Portworx] High Write Latency In Volume | Portworx High Write Latency In Volume. | Prometheus |
| [Portworx] High Read Latency In Volume | Portworx High Read Latency In Volume. | Prometheus |
| [Portworx] License Expiry | Portworx License Expiry. | Prometheus |

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Portworx Cluster
                                                                                                                                                                                                                                                                                                                                                                      • Portworx Volumes

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_disk_available_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_disk_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_nodes_offline
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_nodes_online
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_nodes_storage_down
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_quorum
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_size
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_storage_nodes_decommissioned
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_storage_nodes_offline
                                                                                                                                                                                                                                                                                                                                                                      • px_cluster_status_storage_nodes_online
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_num_reads_total
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_num_writes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_read_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_read_latency_seconds
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_write_latency_seconds
                                                                                                                                                                                                                                                                                                                                                                      • px_disk_stats_written_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_kvdb_health_state_node_view
                                                                                                                                                                                                                                                                                                                                                                      • px_network_io_received_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_network_io_sent_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_node_status_license_expiry
                                                                                                                                                                                                                                                                                                                                                                      • px_node_status_node_status
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_available_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_flushed_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_num_flushes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_num_writes
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_status
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_write_latency_seconds
                                                                                                                                                                                                                                                                                                                                                                      • px_pool_stats_written_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_px_cache_read_hits
                                                                                                                                                                                                                                                                                                                                                                      • px_px_cache_read_miss
                                                                                                                                                                                                                                                                                                                                                                      • px_px_cache_write_hits
                                                                                                                                                                                                                                                                                                                                                                      • px_px_cache_write_miss
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_attached
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_attached_state
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_capacity_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_currhalevel
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_halevel
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_read_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_read_latency_seconds
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_reads_total
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_replication_status
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_state
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_status
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_usage_bytes
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_write_latency_seconds
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_writes_total
                                                                                                                                                                                                                                                                                                                                                                      • px_volume_written_bytes_total
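Several of the metrics above come in pairs that are usually combined rather than read on their own. As a rough illustration, the sketch below derives a used-capacity fraction from px_cluster_disk_available_bytes and px_cluster_disk_total_bytes. It assumes both are gauges reported in bytes (the names suggest this, but the list above does not state it), and the sample values are invented for the example.

```python
def disk_used_fraction(available_bytes: float, total_bytes: float) -> float:
    """Return the fraction of capacity in use, given available and total bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - available_bytes / total_bytes


# Hypothetical sample values, not taken from a real cluster.
sample = {
    "px_cluster_disk_available_bytes": 250 * 1024**3,
    "px_cluster_disk_total_bytes": 1000 * 1024**3,
}

used = disk_used_fraction(
    sample["px_cluster_disk_available_bytes"],
    sample["px_cluster_disk_total_bytes"],
)
print(f"cluster capacity in use: {used:.0%}")
```

The same pattern would apply to px_pool_stats_available_bytes and px_pool_stats_total_bytes, under the same assumption about units.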

                                                                                                                                                                                                                                                                                                                                                                      8.2.19 -

                                                                                                                                                                                                                                                                                                                                                                      Postgresql


                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

Alert | Description | Format
[PostgreSQL] Instance Down | PostgreSQL instance is unavailable | Prometheus
[PostgreSQL] Low UpTime | The PostgreSQL instance has an uptime of less than 1 hour | Prometheus
[PostgreSQL] Max Write Buffer Reached | Background writer stops because it reached the maximum write buffers | Prometheus
[PostgreSQL] High WAL Files Archive Error Rate | High error rate in the WAL files archiver | Prometheus
[PostgreSQL] Low Available Connections | Low available network connections | Prometheus
[PostgreSQL] High Response Time | High response time in at least one of the databases | Prometheus
[PostgreSQL] Low Cache Hit Rate | Low cache hit rate | Prometheus
[PostgreSQL] DeadLocks In Database | Deadlocks detected in database | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Postgresql_DB_Golden_Signals
                                                                                                                                                                                                                                                                                                                                                                      • Postgresql_Instance_Health

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • pg_database_size_bytes
                                                                                                                                                                                                                                                                                                                                                                      • pg_locks_count
                                                                                                                                                                                                                                                                                                                                                                      • pg_postmaster_start_time_seconds
                                                                                                                                                                                                                                                                                                                                                                      • pg_replication_lag
                                                                                                                                                                                                                                                                                                                                                                      • pg_settings_max_connections
                                                                                                                                                                                                                                                                                                                                                                      • pg_settings_superuser_reserved_connections
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_activity_count
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_activity_max_tx_duration
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_archiver_archived_count
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_archiver_failed_count
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_buffers_alloc
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_buffers_backend
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_buffers_checkpoint
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_buffers_clean
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_checkpoint_sync_time
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_checkpoint_write_time
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_checkpoints_req
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_checkpoints_timed
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_bgwriter_maxwritten_clean
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_blk_read_time
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_blks_hit
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_blks_read
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_conflicts_confl_deadlock
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_conflicts_confl_lock
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_deadlocks
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_numbackends
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_temp_bytes
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_tup_deleted
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_tup_fetched
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_tup_inserted
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_tup_returned
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_tup_updated
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_xact_commit
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_database_xact_rollback
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_user_tables_idx_scan
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_user_tables_n_tup_hot_upd
                                                                                                                                                                                                                                                                                                                                                                      • pg_stat_user_tables_seq_scan
                                                                                                                                                                                                                                                                                                                                                                      • pg_up
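Several of the PostgreSQL alerts above are derived values rather than raw metrics. The following Python sketch shows one plausible way the listed metrics could be combined for the cache hit rate, available connections, and uptime alerts. The exact alert expressions and thresholds are not given in this document, so the function names, the formulas, and the sample values are assumptions for illustration only.

```python
import time
from typing import Optional


def cache_hit_rate(blks_hit: float, blks_read: float) -> Optional[float]:
    """Fraction of block requests served from the buffer cache.

    Inputs stand for pg_stat_database_blks_hit and pg_stat_database_blks_read.
    Returns None when there has been no block activity (rate is undefined).
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


def available_connections(max_connections: int, reserved: int, active: int) -> int:
    """Connections still open to regular clients.

    Inputs stand for pg_settings_max_connections,
    pg_settings_superuser_reserved_connections, and the summed
    pg_stat_activity_count (assumed combination, not from the source).
    """
    return max_connections - reserved - active


def uptime_seconds(start_time_seconds: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since pg_postmaster_start_time_seconds."""
    if now is None:
        now = time.time()
    return now - start_time_seconds


# Hypothetical sample values, not real measurements.
rate = cache_hit_rate(blks_hit=900, blks_read=100)
free = available_connections(max_connections=100, reserved=3, active=90)
up = uptime_seconds(start_time_seconds=1000.0, now=2800.0)
print(rate, free, up)
```

With these sample values the instance would count as having a low uptime, because 1800 seconds is below the 1 hour mentioned in the alert description.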

                                                                                                                                                                                                                                                                                                                                                                      8.2.20 -

                                                                                                                                                                                                                                                                                                                                                                      Rabbitmq

                                                                                                                                                                                                                                                                                                                                                                      Rabbitmq

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      List of alerts

                                                                                                                                                                                                                                                                                                                                                                      AlertDescriptionFormat
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Cluster Operator Unavailable ReplicasThere are kube_pod_names that are either running but not yet available or kube_pod_names that still have not been created.Prometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Insufficient Established Erlang Distribution LinksInsuffient establised erland distribution linksPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Low Disk Watermark PredictedThe predicted free disk space in 24 hours from now is lowPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] High Connection ChurnThere are a high connection churnPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] No MajorityOfNodesReadyThere are so many nodes not readyPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Persistent Volume MissingThere is at least one pvc not boundPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Unroutable MessagesThere were unroutable message within the last 5 minutes in RabbitMQ clusterPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] File Descriptors Near LimitThe file descriptors are near to the limitPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] Container RestartsOver the last 10 minutes a rabbitmq container was restartedPrometheus
                                                                                                                                                                                                                                                                                                                                                                      [RabbitMQ] TCP Sockets Near LimitThe TCP sockets are near to the limitPrometheus
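Two of the RabbitMQ alerts above rest on a small calculation: whether a majority of nodes is ready, and what the free disk space is predicted to be 24 hours from now. The Python sketch below illustrates both ideas. It is not the integration's actual alert logic, which this document does not show. The function names and sample values are hypothetical, and the prediction uses a plain least-squares line, similar in spirit to PromQL's `predict_linear`.

```python
from typing import List, Tuple


def has_ready_majority(ready_replicas: int, desired_replicas: int) -> bool:
    """True when strictly more than half of the desired nodes are ready.

    Inputs stand for kube_statefulset_status_replicas_ready and
    kube_statefulset_replicas (assumed mapping).
    """
    return ready_replicas * 2 > desired_replicas


def predict_value(samples: List[Tuple[float, float]], horizon_seconds: float) -> float:
    """Fit a least-squares line through (timestamp, value) samples and
    extrapolate it horizon_seconds past the last sample's timestamp."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    # With a single timestamp there is no trend; treat the slope as zero.
    slope = cov_tv / var_t if var_t else 0.0
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + horizon_seconds)


# Hypothetical free-disk samples in GB taken one hour apart.
disk_samples = [(0.0, 100.0), (3600.0, 90.0), (7200.0, 80.0)]
predicted = predict_value(disk_samples, horizon_seconds=24 * 3600)
print(has_ready_majority(2, 3), predicted)
```

A predicted value at or below zero, as with these sample values, means the disk would fill before the 24 hour horizon if the current trend continued.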

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Rabbitmq_Usage
                                                                                                                                                                                                                                                                                                                                                                      • Rabbitmq_Overview

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • erlang_vm_dist_node_state
                                                                                                                                                                                                                                                                                                                                                                      • kube_deployment_status_replicas_unavailable
                                                                                                                                                                                                                                                                                                                                                                      • kube_kube_pod_name_container_status_restarts_total
                                                                                                                                                                                                                                                                                                                                                                      • kube_persistentvolumeclaim_status_phase
                                                                                                                                                                                                                                                                                                                                                                      • kube_statefulset_replicas
                                                                                                                                                                                                                                                                                                                                                                      • kube_statefulset_status_replicas_ready
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_build_info
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_consumers
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_get_ack_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_get_empty_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_get_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_acked_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_confirmed_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_delivered_ack_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_delivered_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_published_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_redelivered_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_unconfirmed
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_unroutable_dropped_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channel_messages_unroutable_returned_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channels
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channels_closed_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_channels_opened_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_connections
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_connections_closed_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_connections_opened_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_disk_space_available_bytes
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_disk_space_available_limit_bytes
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_process_max_tcp_sockets
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_process_open_fds
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_process_open_tcp_sockets
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_process_resident_memory_bytes
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queue_messages_published_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queue_messages_ready
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queue_messages_unacked
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queues
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queues_created_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queues_declared_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_queues_deleted_total
                                                                                                                                                                                                                                                                                                                                                                      • rabbitmq_resident_memory_limit_bytes

8.2.21 - Redis

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

| Alert | Description | Format |
|---|---|---|
| [Redis] Low UpTime | Uptime of less than 1 hour in a redis instance | Prometheus |
| [Redis] High Memory Usage | High memory usage | Prometheus |
| [Redis] High Clients Usage | High client connections usage | Prometheus |
| [Redis] High Response Time | Response time over 250ms | Prometheus |
| [Redis] High Fragmentation Ratio | High fragmentation ratio | Prometheus |
| [Redis] High Keys Eviction Ratio | High keys eviction ratio | Prometheus |
| [Redis] Recurrent Rejected Connections | Recurrent rejected connections | Prometheus |
| [Redis] Low Hit Ratio | Low keyspace hit ratio | Prometheus |

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Redis_Golden_Signals

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • redis_blocked_clients
                                                                                                                                                                                                                                                                                                                                                                      • redis_commands_duration_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_commands_processed_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_commands_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_config_maxclients
                                                                                                                                                                                                                                                                                                                                                                      • redis_connected_clients
                                                                                                                                                                                                                                                                                                                                                                      • redis_connected_slaves
                                                                                                                                                                                                                                                                                                                                                                      • redis_connections_received_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_cpu_sys_children_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_cpu_sys_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_cpu_user_children_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_cpu_user_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_db_avg_ttl_seconds
                                                                                                                                                                                                                                                                                                                                                                      • redis_db_keys
                                                                                                                                                                                                                                                                                                                                                                      • redis_evicted_keys_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_expired_keys_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_keyspace_hits_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_keyspace_misses_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_mem_fragmentation_ratio
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_max_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_used_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_used_dataset_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_used_lua_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_used_overhead_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_memory_used_scripts_bytes
                                                                                                                                                                                                                                                                                                                                                                      • redis_net_input_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_net_output_bytes_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_pubsub_channels
                                                                                                                                                                                                                                                                                                                                                                      • redis_pubsub_patterns
                                                                                                                                                                                                                                                                                                                                                                      • redis_rdb_changes_since_last_save
                                                                                                                                                                                                                                                                                                                                                                      • redis_rdb_last_save_timestamp_seconds
                                                                                                                                                                                                                                                                                                                                                                      • redis_rejected_connections_total
                                                                                                                                                                                                                                                                                                                                                                      • redis_slowlog_length
                                                                                                                                                                                                                                                                                                                                                                      • redis_uptime_in_seconds
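
Several of the Redis alerts listed above describe ratios that can be derived from the metrics in this list. The sketch below is illustrative only: it is not the alert definition shipped with the integration, and the helper names (`ratio`, `counter_delta`, `redis_signals`) and the two-scrape input format are assumptions made for this example. It takes two scrapes as dictionaries keyed by metric name and computes the keyspace hit ratio, average command response time, memory usage, client usage, and whether uptime is under one hour.

```python
from typing import Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is not positive."""
    if denominator <= 0:
        return None
    return numerator / denominator


def counter_delta(previous: float, current: float) -> float:
    """Increase of a monotonically increasing counter between two scrapes.

    A lower current value is treated as a counter reset (for example after a
    restart), so the current value is taken as the increase since the reset.
    """
    return current if current < previous else current - previous


def redis_signals(prev: dict, curr: dict) -> dict:
    """Derive alert-style signals from two scrapes keyed by metric name (hypothetical helper)."""

    def delta(name: str) -> float:
        return counter_delta(prev[name], curr[name])

    hits = delta("redis_keyspace_hits_total")
    misses = delta("redis_keyspace_misses_total")
    return {
        # Share of key lookups that found a key during the interval.
        "hit_ratio": ratio(hits, hits + misses),
        # Average seconds spent per command during the interval.
        "avg_response_seconds": ratio(
            delta("redis_commands_duration_seconds_total"),
            delta("redis_commands_total"),
        ),
        # None when the limit is reported as 0 (assumed to mean no limit is set).
        "memory_usage": ratio(
            curr["redis_memory_used_bytes"], curr["redis_memory_max_bytes"]
        ),
        "clients_usage": ratio(
            curr["redis_connected_clients"], curr["redis_config_maxclients"]
        ),
        "uptime_below_1h": curr["redis_uptime_in_seconds"] < 3600,
    }


if __name__ == "__main__":
    # Sample values invented for illustration only.
    previous = {
        "redis_keyspace_hits_total": 1000,
        "redis_keyspace_misses_total": 200,
        "redis_commands_duration_seconds_total": 10.0,
        "redis_commands_total": 5000,
    }
    current = {
        "redis_keyspace_hits_total": 1900,
        "redis_keyspace_misses_total": 300,
        "redis_commands_duration_seconds_total": 12.0,
        "redis_commands_total": 9000,
        "redis_memory_used_bytes": 512 * 1024 * 1024,
        "redis_memory_max_bytes": 1024 * 1024 * 1024,
        "redis_connected_clients": 50,
        "redis_config_maxclients": 10000,
        "redis_uptime_in_seconds": 7200,
    }
    for name, value in redis_signals(previous, current).items():
        print(f"{name}: {value}")
```

An average response time computed this way could be compared against the 250ms figure from the alert table (0.25 seconds). Thresholds for the other signals are not stated in this section and would have to be chosen per deployment.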

8.2.22 - Sysdig-admission-controller

                                                                                                                                                                                                                                                                                                                                                                      This integration is enabled by default.

List of alerts:

Alert | Description | Format
[Sysdig Admission Controller] No K8s Audit Events Received | The Admission Controller is not receiving Kubernetes Audit events | Prometheus
[Sysdig Admission Controller] K8s Audit Events Throttling | Kubernetes Audit events are being throttled | Prometheus
[Sysdig Admission Controller] Scanning Events Throttling | Scanning events are being throttled | Prometheus
[Sysdig Admission Controller] Inline Scanning Throttling | The inline scanning queue has not been empty for a long time | Prometheus
[Sysdig Admission Controller] High Error Rate In Scan Status From Backend | High Error Rate In Scan Status From Backend | Prometheus
[Sysdig Admission Controller] High Error Rate In Scan Report From Backend | High Error Rate In Scan Report From Backend | Prometheus
[Sysdig Admission Controller] High Error Rate In Image Scan | High Error Rate In Image Scan | Prometheus

                                                                                                                                                                                                                                                                                                                                                                      List of dashboards:

                                                                                                                                                                                                                                                                                                                                                                      • Sysdig_Admission_Controller

                                                                                                                                                                                                                                                                                                                                                                      List of metrics:

                                                                                                                                                                                                                                                                                                                                                                      • go_build_info
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_count
                                                                                                                                                                                                                                                                                                                                                                      • go_gc_duration_seconds_sum
                                                                                                                                                                                                                                                                                                                                                                      • go_goroutines
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_buck_hash_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_gc_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_alloc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_idle_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_released_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_heap_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_lookups_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mallocs_total
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mcache_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_mspan_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_next_gc_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_inuse_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_stack_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_memstats_sys_bytes
                                                                                                                                                                                                                                                                                                                                                                      • go_threads
                                                                                                                                                                                                                                                                                                                                                                      • k8s_audit_ac_alerts_total
                                                                                                                                                                                                                                                                                                                                                                      • k8s_audit_ac_events_processed_total
                                                                                                                                                                                                                                                                                                                                                                      • k8s_audit_ac_events_received_total
                                                                                                                                                                                                                                                                                                                                                                      • process_cpu_seconds_total
                                                                                                                                                                                                                                                                                                                                                                      • process_max_fds
                                                                                                                                                                                                                                                                                                                                                                      • process_open_fds
                                                                                                                                                                                                                                                                                                                                                                      • queue_length
                                                                                                                                                                                                                                                                                                                                                                      • scan_report_cache_hits
                                                                                                                                                                                                                                                                                                                                                                      • scan_report_cache_misses
                                                                                                                                                                                                                                                                                                                                                                      • scan_status_cache_hits
                                                                                                                                                                                                                                                                                                                                                                      • scan_status_cache_misses
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_errors
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_report_error_from_backend_count
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_report_retrieved_from_backend_count
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_requests_already_queued
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_requests_error
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_requests_queued
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_status_error_from_backend_count
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_status_retrieved_from_backend_count
                                                                                                                                                                                                                                                                                                                                                                      • scanner_scan_success
                                                                                                                                                                                                                                                                                                                                                                      • scanning_ac_admission_responses_total
                                                                                                                                                                                                                                                                                                                                                                      • scanning_ac_containers_processed_total
                                                                                                                                                                                                                                                                                                                                                                      • scanning_ac_http_scanning_handler_requests_total

                                                                                                                                                                                                                                                                                                                                                                      8.3 -

                                                                                                                                                                                                                                                                                                                                                                      Custom Integrations for Sysdig Monitor

                                                                                                                                                                                                                                                                                                                                                                      • Prometheus Metrics

  Describes how the Sysdig agent automatically collects metrics from services that expose native Prometheus metrics and from applications with Prometheus exporters, how to set up your environment, and how to scrape Prometheus metrics.

• Java Management Extensions (JMX) Metrics

  Describes how to configure your Java virtual machines so that the Sysdig agent can collect JMX metrics using the JMX protocol.

                                                                                                                                                                                                                                                                                                                                                                      • StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how the Sysdig agent collects custom StatsD metrics with an embedded StatsD server.

• Node.js Metrics

  Illustrates how Sysdig can monitor Node.js applications by linking a library to the Node.js codebase.

                                                                                                                                                                                                                                                                                                                                                                      8.3.1 -

                                                                                                                                                                                                                                                                                                                                                                      Collect Prometheus Metrics

Sysdig supports collecting, storing, and querying Prometheus native metrics and labels. You can use Sysdig in the same way that you use Prometheus and use the Prometheus Query Language (PromQL) to create dashboards and alerts. Sysdig is compatible with the Prometheus HTTP API, so you can query your monitoring data programmatically using PromQL and connect Sysdig to other platforms such as Grafana.
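As a rough illustration of connecting Grafana through the Prometheus-compatible HTTP API, a Grafana data source provisioning file could look like the sketch below. The data source name, the URL placeholder, and the use of a bearer token in an Authorization header are assumptions made for this example; substitute the endpoint and credentials that apply to your Sysdig account.

```yaml
# Sketch of a Grafana data source provisioning file (values are placeholders).
apiVersion: 1
datasources:
  - name: Sysdig                          # hypothetical data source name
    type: prometheus                      # use Grafana's built-in Prometheus data source
    access: proxy                         # Grafana's backend issues the queries
    url: <SYSDIG_PROMETHEUS_API_URL>      # placeholder: your Prometheus-compatible endpoint
    jsonData:
      httpHeaderName1: Authorization      # assumption: token sent as a custom HTTP header
    secureJsonData:
      httpHeaderValue1: "Bearer <SYSDIG_API_TOKEN>"   # placeholder token
```

Once a data source like this is in place, Grafana panels can use ordinary PromQL expressions against it.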

For metric collection, a lightweight Prometheus server is embedded directly in the Sysdig agent. It supports targets, instances, and jobs, with filtering and relabeling in Prometheus syntax. You can configure the agent to identify the processes on its own host that expose Prometheus metric endpoints and to send their metrics to the Sysdig collector for storage and further processing.
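To make the filtering and relabeling mentioned above more concrete, the following is a minimal sketch of a scrape job written in standard Prometheus syntax. The job name, target address, and label name are hypothetical, and the file in which the agent reads such a configuration is not stated here, so treat the placement as an assumption.

```yaml
# Minimal sketch of a Prometheus-syntax scrape job (all names are illustrative).
scrape_configs:
  - job_name: example-app              # hypothetical job name
    scrape_interval: 30s
    static_configs:
      - targets: ["127.0.0.1:8080"]    # hypothetical target (instance) on the agent's host
    relabel_configs:
      # Relabeling: copy the scrape address into a custom target label.
      - source_labels: [__address__]
        target_label: scraped_from     # hypothetical label name
    metric_relabel_configs:
      # Filtering: drop Go memory statistics series after the scrape.
      - source_labels: [__name__]
        regex: go_memstats_.*
        action: drop
```

Here relabel_configs acts on the target before it is scraped, while metric_relabel_configs acts on the scraped series, which is the usual place to discard metrics you do not want to store.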

You do not need to install Prometheus itself to collect Prometheus metrics.

                                                                                                                                                                                                                                                                                                                                                                      Agent Compatibility

                                                                                                                                                                                                                                                                                                                                                                      See the Sysdig agent versions and compatibility with Prometheus features:

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent v12.2.0 and Above

                                                                                                                                                                                                                                                                                                                                                                      The following features are enabled by default:

• Automatically scrape any Kubernetes pod that has the following annotation set: prometheus.io/scrape=true
                                                                                                                                                                                                                                                                                                                                                                      • Automatically scrape applications supported by Monitoring Integrations.

                                                                                                                                                                                                                                                                                                                                                                      For more information, see Set up the Environment.
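For example, a pod that opts in to automatic scraping could carry the annotation as in the sketch below. Only the prometheus.io/scrape annotation comes from the list above; the pod name, image, and port are hypothetical.

```yaml
# Hypothetical pod manifest; only the prometheus.io/scrape annotation is taken from the text.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical name
  annotations:
    prometheus.io/scrape: "true"     # annotation values are strings, so quote the value
spec:
  containers:
    - name: example-app
      image: example/app:1.0         # hypothetical image that exposes a metrics endpoint
      ports:
        - containerPort: 8080        # hypothetical metrics port
```

For workloads managed by a Deployment or similar controller, the annotation belongs in the pod template metadata so that every replica inherits it.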

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Prior to v12.0.0

                                                                                                                                                                                                                                                                                                                                                                      Manually enable Prometheus in dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                        prometheus:
                                                                                                                                                                                                                                                                                                                                                                             enabled: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      For more information, see Enable Promscrape V2 on Older Versions of Sysdig Agent .

                                                                                                                                                                                                                                                                                                                                                                      Learn More

                                                                                                                                                                                                                                                                                                                                                                      The following topics describe in detail about setting up the environment for service discovery, metrics collection, and further processing.

                                                                                                                                                                                                                                                                                                                                                                      See the following blog posts for additional context on the Prometheus metric and how such metrics are typically used.

                                                                                                                                                                                                                                                                                                                                                                      8.3.1.1 -

                                                                                                                                                                                                                                                                                                                                                                      Set Up the Environment

                                                                                                                                                                                                                                                                                                                                                                      If you are already leveraging Kubernetes Service Discovery, specifically the approach given in prometheus-kubernetes.yml, you might already have annotations attached to the pods that mark them as eligible for scraping. Such environments can quickly begin scraping the same metrics by using the Sysdig agent in a single step.
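
For context, the annotation-based approach in upstream Prometheus usually relies on a scrape job similar to the sketch below. This is an illustrative fragment modeled on the widely used Kubernetes pod job, not the configuration shipped with the Sysdig agent; the job name is an assumption. It shows how the prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations are typically turned into scrape targets.

```yaml
scrape_configs:
  - job_name: kubernetes-pods          # illustrative job name (assumption)
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use prometheus.io/path as the metrics path when the annotation is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use prometheus.io/port as the scrape port when the annotation is set
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
```

Pods that already carry these annotations for such a job need no further changes to be eligible for scraping by the agent.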

                                                                                                                                                                                                                                                                                                                                                                      If you are not using Kubernetes Service Discovery, follow the instructions given below:

                                                                                                                                                                                                                                                                                                                                                                      Annotation

                                                                                                                                                                                                                                                                                                                                                                      Ensure that the Kubernetes pods that contain your Prometheus exporters have been deployed with the following annotations to enable scraping, substituting the listening exporter-TCP-port:

                                                                                                                                                                                                                                                                                                                                                                      spec:
                                                                                                                                                                                                                                                                                                                                                                        template:
                                                                                                                                                                                                                                                                                                                                                                          metadata:
                                                                                                                                                                                                                                                                                                                                                                            annotations:
                                                                                                                                                                                                                                                                                                                                                                              prometheus.io/scrape: "true"
                                                                                                                                                                                                                                                                                                                                                                              prometheus.io/port: "exporter-TCP-port"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The configuration above assumes your exporters use the typical endpoint called /metrics. If your exporter is using a different endpoint, specify by adding the following additional annotation, substituting the exporter-endpoint-name:

                                                                                                                                                                                                                                                                                                                                                                      prometheus.io/path: "/exporter-endpoint-name"
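
Put together, an annotated workload might look like the following minimal sketch. The Deployment name, labels, image, port 9100, and the /custom-metrics path are all hypothetical placeholders chosen for illustration; only the three prometheus.io annotations come from the instructions above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-exporter               # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-exporter
  template:
    metadata:
      labels:
        app: example-exporter
      annotations:
        prometheus.io/scrape: "true"            # mark the pod as eligible for scraping
        prometheus.io/port: "9100"              # placeholder: the exporter's listening TCP port
        prometheus.io/path: "/custom-metrics"   # placeholder: only needed if the endpoint is not /metrics
    spec:
      containers:
        - name: exporter
          image: example.com/example-exporter:latest   # placeholder image
          ports:
            - containerPort: 9100
```

Annotation values must be quoted strings, which is why "true" and the port number appear in quotes.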
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sample Exporter

                                                                                                                                                                                                                                                                                                                                                                      Use the Sample Exporter to test your environment. You will quickly see auto-discovered Prometheus metrics being displayed on Sysdig Monitor. You can use this working example as a basis to similarly annotate your own exporters.

                                                                                                                                                                                                                                                                                                                                                                      8.3.1.2 -

                                                                                                                                                                                                                                                                                                                                                                      Enable Prometheus Native Service Discovery

                                                                                                                                                                                                                                                                                                                                                                      Prometheus service discovery is a standard method of finding endpoints to scrape for metrics. You configure prometheus.yaml and custom jobs to prepare for scraping endpoints in the same way you do for native Prometheus.

                                                                                                                                                                                                                                                                                                                                                                      For metric collection, a lightweight Prometheus server, named promscrape, is directly embedded into the Sysdig agent to facilitate metric collection. Promscrape supports filtering and relabeling targets, instances, and jobs and identify them using the custom jobs configured in the prometheus.yaml file. The latest versions of Sysdig agent (above v12.0.0) by default identify the processes that expose Prometheus metric endpoints on its own host and send it to the Sysdig collector for storing and further processing. On older versions of Sysdig agent, you enable these features by configuring dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Working with Promscrape

                                                                                                                                                                                                                                                                                                                                                                      Promscrape is a lightweight Prometheus server that is embedded with the Sysdig agent. Promscrape scrapes metrics from Prometheus endpoints and sends them for storing and processing.

                                                                                                                                                                                                                                                                                                                                                                      Promscrape has two versions: Promscrape V1 and Promscrape V2.

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2

                                                                                                                                                                                                                                                                                                                                                                        Promscrape itself discovers targets by using the standard Prometheus configuration (native Prometheus service discovery), allowing the use of relabel_configs to find or modify targets. An instance of promscrape runs on every node that is running a Sysdig agent and is intended to collect metrics from local as well as remote targets specified in the prometheus.yaml file. The prometheus.yaml file you create is shared across all such nodes.

                                                                                                                                                                                                                                                                                                                                                                        Promscrape V2 is enabled by default on Sysdig agent v12.5.0 and above. On older versions of Sysdig agent, you need to manually enable Promscrape V2, which allows for native Prometheus service discovery, by setting the prom_service_discovery parameter to true in dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V1

                                                                                                                                                                                                                                                                                                                                                                        Sysdig agent discovers scrape targets through the Sysdig process_filter rules. For more information, see Process Filter.

                                                                                                                                                                                                                                                                                                                                                                      About Promscrape V2

                                                                                                                                                                                                                                                                                                                                                                      Supported Features

                                                                                                                                                                                                                                                                                                                                                                      Promscrape V2 supports the following native Prometheus capabilities:

                                                                                                                                                                                                                                                                                                                                                                      • Relabeling: Promscrape V2 supports Prometheus native relabel_config and metric_relabel_configs. Relabel configuration enables the following:

                                                                                                                                                                                                                                                                                                                                                                        • Drop unnecessary metrics or unwanted labels from metrics

                                                                                                                                                                                                                                                                                                                                                                        • Edit the label format of the target before scraping the labels

                                                                                                                                                                                                                                                                                                                                                                      • Sample format: In addition to the regular sample format (metrics name, labels, and metrics reading), Promscrape V2 includes metrics type (counter, gauge, histogram, summary) to every sample sent to the agent.

                                                                                                                                                                                                                                                                                                                                                                      • Scraping configuration: Promscrape V2 supports all types of scraping configuration, such as federation, blackbox-exporter, and so on.

                                                                                                                                                                                                                                                                                                                                                                      • Label mapping: The metrics can be mapped to their source (pod, process) by using the source labels which in turn map certain Prometheus label names to the known agent tags.
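
As an illustration of the relabeling capabilities listed above, the following sketch uses standard Prometheus syntax in prometheus.yaml. The job name, target address, label names, and metric name pattern are assumptions made up for this example and are not taken from a Sysdig-provided configuration.

```yaml
scrape_configs:
  - job_name: example-app                  # hypothetical job
    static_configs:
      - targets: ["localhost:8080"]        # hypothetical local target
    relabel_configs:
      # Applied before the scrape: derive a target label from the address
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        target_label: host_name            # hypothetical label name
        replacement: '$1'
    metric_relabel_configs:
      # Applied after the scrape: drop metrics that are not needed
      - source_labels: [__name__]
        regex: 'go_gc_.*'
        action: drop
      # Remove an unwanted label from the remaining metrics
      - regex: 'pod_template_hash'
        action: labeldrop
```

Rules under relabel_configs act on targets before they are scraped, whereas rules under metric_relabel_configs act on the scraped samples, so dropping metrics or labels belongs in the latter.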

                                                                                                                                                                                                                                                                                                                                                                      Unsupported Features

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2 does not support calculated metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2 does not support cluster-wide features such as recording rules and alert management.

                                                                                                                                                                                                                                                                                                                                                                      • Service discovery configurations in Promscrape V1 (process_filter) and Promscrape V2 (prometheus.yaml) are incompatible and non-translatable.

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2 collects metrics from both local and remote targets specified in the prometheus.yaml file and therefore it does not make sense to configure promscrape to scrape remote targets, because you will see metrics duplication in this case.

                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2 does not have the cluster view and therefore it ignores the configuration of recording rules and alerts, which is used in the cluster-wide metrics collection. Therefore, the following Prometheus Configurations are not supported

                                                                                                                                                                                                                                                                                                                                                                      • Sysdig uses __HOSTNAME__, which is not a standard Prometheus keyword.

                                                                                                                                                                                                                                                                                                                                                                      Enable Promscrape V2 on Older Versions of Sysdig Agent

                                                                                                                                                                                                                                                                                                                                                                      To enable Prometheus native service discovery on agent versions prior to 11.2:

                                                                                                                                                                                                                                                                                                                                                                      1. Open dragent.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      2. Set the following Prometheus Service Discovery parameter to true:

                                                                                                                                                                                                                                                                                                                                                                        prometheus:
                                                                                                                                                                                                                                                                                                                                                                          prom_service_discovery: true
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        If true, promscrape.v2 is used. Otherwise, promscrape.v1 is used to scrape the targets.

                                                                                                                                                                                                                                                                                                                                                                      3. Restart the agent.

                                                                                                                                                                                                                                                                                                                                                                      Create Custom Jobs

                                                                                                                                                                                                                                                                                                                                                                      Prerequisites

                                                                                                                                                                                                                                                                                                                                                                      Ensure the following features are enabled:

                                                                                                                                                                                                                                                                                                                                                                      • Monitoring Integration
                                                                                                                                                                                                                                                                                                                                                                      • Promscrape V2

                                                                                                                                                                                                                                                                                                                                                                      If you are using Sysdig agent v12.0.0 or above, these features are enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      Prepare Custom Job

                                                                                                                                                                                                                                                                                                                                                                      You set up custom jobs in the Prometheus configuration file to identify endpoints that expose Prometheus metrics. Sysdig agent uses these custom jobs to scrape endpoints by using promscrape, the lightweight Prometheus server embedded in it.

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • Ensure that targets are scraped only by the agent running on the same node as the target. You do this by adding the host selection relabeling rules.

                                                                                                                                                                                                                                                                                                                                                                      • Use the the sysdig specific relabeling rules to automatically get the right workload labels applied.

                                                                                                                                                                                                                                                                                                                                                                      Example Prometheus Configuration file

                                                                                                                                                                                                                                                                                                                                                                      The prometheus.yaml file comes with a default configuration for scraping the pods running on the local node. This configuration also includes the rules to preserve pod UID and container name labels for further correlation with Kubernetes State Metrics or Sysdig native metrics.

                                                                                                                                                                                                                                                                                                                                                                      Here is an example prometheus.yaml file that you can use to set up custom jobs.

                                                                                                                                                                                                                                                                                                                                                                      global:
                                                                                                                                                                                                                                                                                                                                                                        scrape_interval: 10s
                                                                                                                                                                                                                                                                                                                                                                      scrape_configs:
                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'my_pod_job'
                                                                                                                                                                                                                                                                                                                                                                        sample_limit: 40000
                                                                                                                                                                                                                                                                                                                                                                        tls_config:
                                                                                                                                                                                                                                                                                                                                                                          insecure_skip_verify: true
                                                                                                                                                                                                                                                                                                                                                                        kubernetes_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                        - role: pod
                                                                                                                                                                                                                                                                                                                                                                        relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                          # Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
                                                                                                                                                                                                                                                                                                                                                                        - action:
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
                                                                                                                                                                                                                                                                                                                                                                          separator: /
                                                                                                                                                                                                                                                                                                                                                                          regex: my_namespace/my_pod_prefix.+
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # In those pods try to scrape from port 9876
                                                                                                                                                                                                                                                                                                                                                                        - source_labels: [__address__]
                                                                                                                                                                                                                                                                                                                                                                          action: replace
                                                                                                                                                                                                                                                                                                                                                                          target_label: __address__
                                                                                                                                                                                                                                                                                                                                                                          regex: (.+?)(\\:\\d)?
                                                                                                                                                                                                                                                                                                                                                                          replacement: $1:9876
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # Trying to ensure we only scrape local targets
                                                                                                                                                                                                                                                                                                                                                                          # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
                                                                                                                                                                                                                                                                                                                                                                          # of all the active network interfaces on the host
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                          regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # Holding on to pod-id and container name so we can associate the metrics
                                                                                                                                                                                                                                                                                                                                                                          # with the container (and cluster hierarchy)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_container_name
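
To make the selection rules concrete, here is a sketch of a pod that a job of this shape would pick up. Kubernetes object names cannot contain underscores, so the sketch assumes the placeholders have been replaced with my-namespace and my-pod-prefix (that is, regex: my-namespace/my-pod-prefix.+). The pod name, container name, and image are hypothetical and serve only as illustration.

```yaml
# Hypothetical pod, assuming the job's keep rule uses
# regex: my-namespace/my-pod-prefix.+
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-prefix-0            # joined with the namespace as "my-namespace/my-pod-prefix-0"
  namespace: my-namespace
spec:
  containers:
  - name: app                      # copied to the sysdig_k8s_pod_container_name label
    image: example.com/my-app:1.0  # placeholder image
    ports:
    - containerPort: 9876          # the port the job rewrites __address__ to
```

Only the agent on the node that hosts this pod scrapes it, because the keep rule on __meta_kubernetes_pod_host_ip drops the target everywhere else.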
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Default Scrape Job

                                                                                                                                                                                                                                                                                                                                                                      If Monitoring Integration is not enabled for you and you still want to automatically collect metrics from pods with the Prometheus annotations set (prometheus.io/scrape=true), add the following default scrape job to your prometheus.yaml file:

                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'k8s-pods'
                                                                                                                                                                                                                                                                                                                                                                        sample_limit: 40000
                                                                                                                                                                                                                                                                                                                                                                        tls_config:
                                                                                                                                                                                                                                                                                                                                                                          insecure_skip_verify: true
                                                                                                                                                                                                                                                                                                                                                                        kubernetes_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                        - role: pod
                                                                                                                                                                                                                                                                                                                                                                        relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                          # Trying to ensure we only scrape local targets
                                                                                                                                                                                                                                                                                                                                                                          # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
                                                                                                                                                                                                                                                                                                                                                                          # of all the active network interfaces on the host
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                          regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                                                                                                                                                                                                                                                                                                                                                                          regex: true
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                                                                                                                                                                                                                                                                                                                                                                          target_label: __scheme__
                                                                                                                                                                                                                                                                                                                                                                          regex: (https?)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                                                                                                                                                                                                                                                                                                                                                                          target_label: __metrics_path__
                                                                                                                                                                                                                                                                                                                                                                          regex: (.+)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                                                                                                                                                                                                                                                                                                                                                                          regex: ([^:]+)(?::\d+)?;(\d+)
                                                                                                                                                                                                                                                                                                                                                                          replacement: $1:$2
                                                                                                                                                                                                                                                                                                                                                                          target_label: __address__
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # Holding on to pod-id and container name so we can associate the metrics
                                                                                                                                                                                                                                                                                                                                                                          # with the container (and cluster hierarchy)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_container_name
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Default Prometheus Configuration File

                                                                                                                                                                                                                                                                                                                                                                      Here is the default prometheus.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      global:
                                                                                                                                                                                                                                                                                                                                                                        scrape_interval: 10s
                                                                                                                                                                                                                                                                                                                                                                      scrape_configs:
                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'k8s-pods'
                                                                                                                                                                                                                                                                                                                                                                        tls_config:
                                                                                                                                                                                                                                                                                                                                                                          insecure_skip_verify: true
                                                                                                                                                                                                                                                                                                                                                                        kubernetes_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                        - role: pod
                                                                                                                                                                                                                                                                                                                                                                        relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                          # Trying to ensure we only scrape local targets
                                                                                                                                                                                                                                                                                                                                                                          # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
                                                                                                                                                                                                                                                                                                                                                                          # of all the active network interfaces on the host
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                          regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                                                                                                                                                                                                                                                                                                                                                                          regex: true
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                                                                                                                                                                                                                                                                                                                                                                          target_label: __scheme__
                                                                                                                                                                                                                                                                                                                                                                          regex: (https?)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                                                                                                                                                                                                                                                                                                                                                                          target_label: __metrics_path__
                                                                                                                                                                                                                                                                                                                                                                          regex: (.+)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                                                                                                                                                                                                                                                                                                                                                                          regex: ([^:]+)(?::\d+)?;(\d+)
                                                                                                                                                                                                                                                                                                                                                                          replacement: $1:$2
                                                                                                                                                                                                                                                                                                                                                                          target_label: __address__
                                                                                                                                                                                                                                                                                                                                                                          # Holding on to pod-id and container name so we can associate the metrics
                                                                                                                                                                                                                                                                                                                                                                          # with the container (and cluster hierarchy)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_container_name
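
To see what the __address__ rewrite rule in this file does, the following Python sketch mimics it under stated assumptions: Prometheus joins the source_labels values with ";" and requires the regex to match the whole joined string, which re.fullmatch approximates here. The helper name rewrite_address and the sample addresses are hypothetical and not part of promscrape.

```python
import re

# Regex copied from the relabel rule; the "$1:$2" replacement is written
# as r"\1:\2" in Python's template syntax.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address, port_annotation):
    """Return the scrape address after applying the rule (hypothetical helper)."""
    # Source label values are joined with ";" before matching.
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match (for example, no port annotation): the address is unchanged.
        return address
    return match.expand(r"\1:\2")


print(rewrite_address("10.0.0.5:8080", "9102"))  # 10.0.0.5:9102
print(rewrite_address("10.0.0.5", "9102"))       # 10.0.0.5:9102
print(rewrite_address("10.0.0.5:8080", ""))      # 10.0.0.5:8080
```

In this sketch the port from the prometheus.io/port annotation replaces any port already present in the address, and a pod without that annotation keeps its discovered address.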
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Understand the Prometheus Settings

                                                                                                                                                                                                                                                                                                                                                                      Scrape Interval

                                                                                                                                                                                                                                                                                                                                                                      The default scrape interval is 10 seconds. However, the value can be overridden per scraping job. The scrape interval configured in the prometheus.yaml is independent of the agent configuration.

                                                                                                                                                                                                                                                                                                                                                                      Promscrape V2 reads prometheus.yaml and initiates scraping jobs.

                                                                                                                                                                                                                                                                                                                                                                      The metrics from targets are collected per scrape interval for each target and immediately forwarded to the agent. The agent sends the metrics every 10 seconds to the Sysdig collector. Only those metrics that have been received since the last transmission are sent to the collector. If a scraping job for a job has a scrape interval longer than 10 seconds, the agent transmissions might not include all the metrics from that job.
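
The interaction between a job's scrape interval and the agent's 10-second transmission can be sketched as a small timeline simulation. This is an illustrative model only, not agent code: it assumes scrapes and transmissions both start counting at time zero and that a transmission carries whatever was scraped since the previous one. The function name flushes_with_samples is hypothetical.

```python
def flushes_with_samples(scrape_interval_s, flush_interval_s=10, duration_s=60):
    """For each agent transmission, report whether it carries new samples."""
    # Times (in seconds) at which the job delivers freshly scraped metrics.
    scrape_times = range(scrape_interval_s, duration_s + 1, scrape_interval_s)
    result = []
    for flush_time in range(flush_interval_s, duration_s + 1, flush_interval_s):
        window_start = flush_time - flush_interval_s
        # A transmission includes only samples received since the last one.
        has_samples = any(window_start < t <= flush_time for t in scrape_times)
        result.append(has_samples)
    return result


print(flushes_with_samples(10))  # [True, True, True, True, True, True]
print(flushes_with_samples(30))  # [False, False, True, False, False, True]
```

Under these assumptions, a job scraped every 30 seconds contributes metrics to only every third transmission, while a job on the default 10-second interval contributes to all of them.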

                                                                                                                                                                                                                                                                                                                                                                      Hostname Selection

                                                                                                                                                                                                                                                                                                                                                                      __HOSTIPS__ is replaced by the host IP addresses. Selection by the host IP address is preferred because of its reliability.

                                                                                                                                                                                                                                                                                                                                                                      __HOSTNAME__ is replaced with the actual hostname before promscrape starts scraping the targets. This allows promscrape to ignore targets running on other hosts.
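
As a rough illustration of the keep rule that uses __HOSTIPS__, the Python sketch below assumes the placeholder expands to an alternation of escaped IP addresses and that the regex must match the whole label value, as in Prometheus relabeling. The exact regex that promscrape generates is not shown in this document, so build_hostips_regex and the sample addresses are assumptions.

```python
import re


def build_hostips_regex(host_ips):
    """Assumed expansion of __HOSTIPS__: escaped IPs joined with '|'."""
    return "|".join(re.escape(ip) for ip in host_ips)


def keep_target(pod_host_ip, hostips_regex):
    """Mimic 'action: keep': keep the target only on a full regex match."""
    return re.fullmatch(hostips_regex, pod_host_ip) is not None


regex = build_hostips_regex(["10.0.0.5", "192.168.1.2"])
local_pod = keep_target("10.0.0.5", regex)    # pod scheduled on this host
remote_pod = keep_target("10.0.0.50", regex)  # pod on a different host
print(local_pod, remote_pod)  # True False
```

Because the match is anchored to the whole value, a pod whose host IP merely starts with one of the local addresses is still treated as remote and dropped.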

                                                                                                                                                                                                                                                                                                                                                                      Relabeling Configuration

                                                                                                                                                                                                                                                                                                                                                                      The default Prometheus configuration file contains the following two relabeling configurations:

                                                                                                                                                                                                                                                                                                                                                                      - action: replace
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                        target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                      - action: replace
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                        target_label: sysdig_k8s_pod_container_name
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      These rules add two labels, sysdig_k8s_pod_uid and sysdig_k8s_pod_container_name to every metric gathered from the local targets, containing pod ID and container name respectively. These labels will be dropped from the metrics before sending them to the Sysdig collector for further processing.
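
The effect of these two rules can be pictured with a short Python sketch. It relies on the Prometheus defaults for a replace action (regex (.*) and replacement $1), which copy the source label value unchanged into the target label. The helper names, the sample label values, and the final stripping step are illustrative assumptions about the behavior described above, not the agent's actual code.

```python
HELPER_LABELS = {"sysdig_k8s_pod_uid", "sysdig_k8s_pod_container_name"}


def apply_replace(labels, source_label, target_label):
    """Copy source_label into target_label, as a default 'replace' rule does."""
    updated = dict(labels)
    value = labels.get(source_label, "")
    if value:
        # An empty result leaves the target label unset.
        updated[target_label] = value
    return updated


def strip_sysdig_labels(labels):
    """Drop the helper labels, as described for metrics sent to the collector."""
    return {k: v for k, v in labels.items() if k not in HELPER_LABELS}


# Hypothetical labels discovered for one container of one pod.
discovered = {
    "__meta_kubernetes_pod_uid": "1234-abcd",
    "__meta_kubernetes_pod_container_name": "web",
    "app": "shop",
}
labels = apply_replace(discovered, "__meta_kubernetes_pod_uid", "sysdig_k8s_pod_uid")
labels = apply_replace(
    labels, "__meta_kubernetes_pod_container_name", "sysdig_k8s_pod_container_name"
)
print(labels["sysdig_k8s_pod_uid"], labels["sysdig_k8s_pod_container_name"])
print(sorted(strip_sysdig_labels(labels)))
```

The two helper labels exist only long enough to tie each metric to its pod and container; in this model they are removed again before the metrics leave the host.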

                                                                                                                                                                                                                                                                                                                                                                      Configure Prometheus Configuration File Using the Agent Configmap

                                                                                                                                                                                                                                                                                                                                                                      Here is an example for setting up the prometheus.yaml file using the agent configmap:

                                                                                                                                                                                                                                                                                                                                                                      apiVersion: v1
                                                                                                                                                                                                                                                                                                                                                                      data:
                                                                                                                                                                                                                                                                                                                                                                        dragent.yaml: |
                                                                                                                                                                                                                                                                                                                                                                          new_k8s: true
                                                                                                                                                                                                                                                                                                                                                                          k8s_cluster_name: your-cluster-name
                                                                                                                                                                                                                                                                                                                                                                          metrics_excess_log: true
                                                                                                                                                                                                                                                                                                                                                                          10s_flush_enable: true
                                                                                                                                                                                                                                                                                                                                                                          app_checks_enabled: false
                                                                                                                                                                                                                                                                                                                                                                          use_promscrape: true
                                                                                                                                                                                                                                                                                                                                                                          promscrape_fastproto: true
                                                                                                                                                                                                                                                                                                                                                                          prometheus:
                                                                                                                                                                                                                                                                                                                                                                            enabled: true
                                                                                                                                                                                                                                                                                                                                                                            prom_service_discovery: true
                                                                                                                                                                                                                                                                                                                                                                            log_errors: true
                                                                                                                                                                                                                                                                                                                                                                            max_metrics: 200000
                                                                                                                                                                                                                                                                                                                                                                            max_metrics_per_process: 200000
                                                                                                                                                                                                                                                                                                                                                                            max_tags_per_metric: 100
                                                                                                                                                                                                                                                                                                                                                                            ingest_raw: true
                                                                                                                                                                                                                                                                                                                                                                            ingest_calculated: false
                                                                                                                                                                                                                                                                                                                                                                          snaplen: 512
                                                                                                                                                                                                                                                                                                                                                                          tags: role:cluster
                                                                                                                                                                                                                                                                                                                                                                        prometheus.yaml: |
                                                                                                                                                                                                                                                                                                                                                                          global:
                                                                                                                                                                                                                                                                                                                                                                            scrape_interval: 10s
                                                                                                                                                                                                                                                                                                                                                                          scrape_configs:
                                                                                                                                                                                                                                                                                                                                                                          - job_name: 'haproxy-router'
                                                                                                                                                                                                                                                                                                                                                                            basic_auth:
                                                                                                                                                                                                                                                                                                                                                                              username: USER
                                                                                                                                                                                                                                                                                                                                                                              password: PASSWORD
                                                                                                                                                                                                                                                                                                                                                                            tls_config:
                                                                                                                                                                                                                                                                                                                                                                              insecure_skip_verify: true
                                                                                                                                                                                                                                                                                                                                                                            kubernetes_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                            - role: pod
                                                                                                                                                                                                                                                                                                                                                                            relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                              # Trying to ensure we only scrape local targets
                                                                                                                                                                                                                                                                                                                                                                              # We need the wildcard at the end because in AWS the node name is the FQDN,
                                                                                                                                                                                                                                                                                                                                                                              # whereas in Azure the node name is the base host name
                                                                                                                                                                                                                                                                                                                                                                            - action: keep
                                                                                                                                                                                                                                                                                                                                                                              source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                              regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                            - action: keep
                                                                                                                                                                                                                                                                                                                                                                              source_labels:
                                                                                                                                                                                                                                                                                                                                                                              - __meta_kubernetes_namespace
                                                                                                                                                                                                                                                                                                                                                                              - __meta_kubernetes_pod_name
                                                                                                                                                                                                                                                                                                                                                                              separator: '/'
                                                                                                                                                                                                                                                                                                                                                                              regex: 'default/router-1-.+'
                                                                                                                                                                                                                                                                                                                                                                              # Holding on to pod-id and container name so we can associate the metrics
                                                                                                                                                                                                                                                                                                                                                                              # with the container (and cluster hierarchy)
                                                                                                                                                                                                                                                                                                                                                                            - action: replace
                                                                                                                                                                                                                                                                                                                                                                              source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                              target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                            - action: replace
                                                                                                                                                                                                                                                                                                                                                                              source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                              target_label: sysdig_k8s_pod_container_name
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      kind: ConfigMap
                                                                                                                                                                                                                                                                                                                                                                      metadata:
                                                                                                                                                                                                                                                                                                                                                                          labels:
                                                                                                                                                                                                                                                                                                                                                                            app: sysdig-agent
                                                                                                                                                                                                                                                                                                                                                                          name: sysdig-agent
                                                                                                                                                                                                                                                                                                                                                                          namespace: sysdig-agent
                                                                                                                                                                                                                                                                                                                                                                      

8.3.1.3 - Migrating from Promscrape V1 to V2

Promscrape is the lightweight Prometheus server in the Sysdig agent. An updated version, Promscrape V2, is available and is controlled by the prom_service_discovery parameter in the dragent.yaml file. To use the latest features, such as Service Discovery and Monitoring Integrations, you need to enable this option in your environment.
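As a minimal sketch, the dragent.yaml fragment below turns on Promscrape V2. The key names, including the spelling prom_service_discovery, are taken from the agent ConfigMap example earlier in this section. Whether further settings are required depends on your agent version, so treat this as an assumption rather than a complete configuration.

```yaml
# dragent.yaml (fragment): enable promscrape and switch it to V2 target discovery
use_promscrape: true
prometheus:
  enabled: true
  # true selects Promscrape V2 (targets come from prometheus.yaml scrape_configs)
  prom_service_discovery: true
```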

                                                                                                                                                                                                                                                                                                                                                                      Compare Promscrape V1 and V2

                                                                                                                                                                                                                                                                                                                                                                      The main difference between V1 and V2 is how scrape targets are determined.

In V1, targets are found through process-filtering rules configured in dragent.yaml, or in dragent.default.yaml if dragent.yaml defines no rules. The process-filtering rules are applied to all running processes on the host. Matches are based on process attributes, such as the process name or the TCP ports being listened on, as well as associated context from Docker or Kubernetes, such as container labels or Kubernetes annotations.
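To make the V1 model concrete, here is a hedged sketch of what process-filtering rules can look like in dragent.yaml. The rule keys follow the agent's process_filter syntax as I understand it. The process name and annotation names are illustrative, so check them against the dragent.default.yaml shipped with your agent before relying on them.

```yaml
# dragent.yaml (fragment): Promscrape V1 style target selection (illustrative)
prometheus:
  enabled: true
  process_filter:
    # Skip a process by name (example value)
    - exclude:
        process.name: docker-proxy
    # Scrape processes in pods that carry the prometheus.io/scrape annotation,
    # taking port and path from the pod's other annotations
    - include:
        kubernetes.pod.annotation.prometheus.io/scrape: true
        conf:
          port: "{kubernetes.pod.annotation.prometheus.io/port}"
          path: "{kubernetes.pod.annotation.prometheus.io/path}"
```

Rules of this kind are evaluated against running processes, which is the key contrast with V2, where targets come from Prometheus service discovery instead.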

With Promscrape V2, scrape targets are determined by the scrape_configs fields in a prometheus.yaml file (or the prometheus-v2.default.yaml file if no prometheus.yaml exists). Because promscrape is adapted from the open-source Prometheus server, the scrape_configs settings are compatible with the normal Prometheus configuration. Here is an example:

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'my_pod_job'
  sample_limit: 40000
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Keep pods whose name starts with "my_pod_prefix" in namespace "my_namespace"
  - action: keep
    source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
    separator: /
    regex: my_namespace/my_pod_prefix.+
    # Of those, keep only pods labeled app=my_app_metrics
  - action: keep
    source_labels: [__meta_kubernetes_pod_label_app]
    regex: my_app_metrics

    # In those pods, scrape port 9876 (any discovered port is replaced)
  - source_labels: [__address__]
    action: replace
    target_label: __address__
    regex: '(.+?)(:\d+)?'
    replacement: $1:9876
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # Trying to ensure we only scrape local targets
                                                                                                                                                                                                                                                                                                                                                                          # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
                                                                                                                                                                                                                                                                                                                                                                          # of all the active network interfaces on the host
                                                                                                                                                                                                                                                                                                                                                                        - action: keep
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_host_ip]
                                                                                                                                                                                                                                                                                                                                                                          regex: __HOSTIPS__
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                          # Holding on to pod-id and container name so we can associate the metrics
                                                                                                                                                                                                                                                                                                                                                                          # with the container (and cluster hierarchy)
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_uid]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_uid
                                                                                                                                                                                                                                                                                                                                                                        - action: replace
                                                                                                                                                                                                                                                                                                                                                                          source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                          target_label: sysdig_k8s_pod_container_name
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Migrate Using Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      The default configuration for Promscrape v1 triggers the scraping based on standard Kubernetes pod annotations and container labels. The default configuration for v2 currently triggers scraping only based on the standard Kubernetes pod annotations leveraging the Prometheus native service discovery.

                                                                                                                                                                                                                                                                                                                                                                      Example Pod Annotations

                                                                                                                                                                                                                                                                                                                                                                      Annotation

                                                                                                                                                                                                                                                                                                                                                                      Value

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      spec: template: metadata: annotations: prometheus.io/scrape: "true" prometheus.io/port: ""

                                                                                                                                                                                                                                                                                                                                                                      true

                                                                                                                                                                                                                                                                                                                                                                      Required field.

                                                                                                                                                                                                                                                                                                                                                                      prometheus.io/port: ""

                                                                                                                                                                                                                                                                                                                                                                      The port number to scrape

                                                                                                                                                                                                                                                                                                                                                                      Optional. It will scrape all pod-registered ports if omitted.

                                                                                                                                                                                                                                                                                                                                                                      prometheus.io/scheme

                                                                                                                                                                                                                                                                                                                                                                      <http|https>

                                                                                                                                                                                                                                                                                                                                                                      The default is http.

                                                                                                                                                                                                                                                                                                                                                                      (required field)prometheus.io/path

                                                                                                                                                                                                                                                                                                                                                                      The URL

                                                                                                                                                                                                                                                                                                                                                                      The default is /metrics.
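
As an illustration, the annotations above could be combined in a Deployment as sketched below. The Deployment name, labels, image, port 8080, and path are hypothetical placeholders, not values from this guide; adjust them to match your workload.

```yaml
# Hypothetical Deployment; the name, labels, image, and port are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"    # required to enable scraping
        prometheus.io/port: "8080"      # optional; all pod-registered ports if omitted
        prometheus.io/scheme: "http"    # optional; defaults to http
        prometheus.io/path: "/metrics"  # optional; defaults to /metrics
    spec:
      containers:
        - name: my-app
          image: example.com/my-app:1.0
          ports:
            - containerPort: 8080
```

Kubernetes annotation values must be strings, which is why the boolean and the port number are quoted.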

                                                                                                                                                                                                                                                                                                                                                                      Example Static Job

                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'static10'
                                                                                                                                                                                                                                                                                                                                                                        static_configs:
                                                                                                                                                                                                                                                                                                                                                                          - targets: ['localhost:5010']
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Guidelines

                                                                                                                                                                                                                                                                                                                                                                      • Users running Kubernetes with Promscrape v1 default rules and triggering scraping based on pod annotations need not take any action to migrate to v2. The migration happens automatically.

                                                                                                                                                                                                                                                                                                                                                                      • Users operating non-Kubernetes environments might need to continue using v1 for now, depending on how scraping is triggered. As of today promscrape.v2 doesn’t support leveraging container and Docker labels to discover Prometheus metrics endpoints. If your environment is one of these, define static jobs with the IP:port to be scrapped.
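
For a non-Kubernetes host, a static job along these lines is one way to list the endpoints explicitly. The job name, addresses, and ports are hypothetical; metrics_path and scheme are standard Prometheus scrape_config fields, shown here with their default values.

```yaml
# Hypothetical static job; replace the targets with your own IP:port pairs.
- job_name: 'static-local-endpoints'
  metrics_path: /metrics   # Prometheus default
  scheme: http             # Prometheus default
  static_configs:
    - targets:
        - 'localhost:5010'
        - 'localhost:9100'
```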

                                                                                                                                                                                                                                                                                                                                                                      Migrate Using Custom Rules

                                                                                                                                                                                                                                                                                                                                                                      If you relying on custom process_filter rules to collect metrics, use any method using standard Prometheus configuration syntax to scrape the endpoints. We recommend one of the following:

                                                                                                                                                                                                                                                                                                                                                                      • Adopt the standard approach of adding the standard Prometheus annotations to their pods. For more information, see Migrate Using Default Configuration.
                                                                                                                                                                                                                                                                                                                                                                      • Write a Prometheus scrape_config by using Kubernetes pods service discovery and use the appropriate pod metadata to trigger their scrapes.
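
The second option could look like the following sketch, which combines Kubernetes pod service discovery with keep rules of the kind used elsewhere in this section. The job name is hypothetical, and the annotation name and port value are illustrative only.

```yaml
# Hypothetical job; adjust the keep rules to the pod metadata you rely on.
- job_name: 'my-custom-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only pods annotated with sysdig.com/test: "true"
    - action: keep
      source_labels: [__meta_kubernetes_pod_annotation_sysdig_com_test]
      regex: 'true'
    # Keep only container ports declared as 8080
    - action: keep
      source_labels: [__meta_kubernetes_pod_container_port_number]
      regex: '8080'
    # Keep only targets running on the local host
    - action: keep
      source_labels: [__meta_kubernetes_pod_host_ip]
      regex: __HOSTIPS__
```

Keep rules are applied in sequence, so a target is scraped only if it passes every rule.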

                                                                                                                                                                                                                                                                                                                                                                      See the below example for converting your process_filter rules to Prometheus terminology.

                                                                                                                                                                                                                                                                                                                                                                      process_filter

                                                                                                                                                                                                                                                                                                                                                                      Prometheus

                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          kubernetes.pod.annotation.sysdig.com/test: true
                                                                                                                                                                                                                                                                                                                                                                      - action: keep
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_annotation_sysdig_com_test]
                                                                                                                                                                                                                                                                                                                                                                        regex: true
                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          kubernetes.pod.label.app: sysdig
                                                                                                                                                                                                                                                                                                                                                                      - action: keep
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_label_app]
                                                                                                                                                                                                                                                                                                                                                                        regex: 'sysdig'
                                                                                                                                                                                                                                                                                                                                                                      -include:
                                                                                                                                                                                                                                                                                                                                                                         container.label.com.sysdig.test: true

                                                                                                                                                                                                                                                                                                                                                                      Not supported.

                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          process.name: test

                                                                                                                                                                                                                                                                                                                                                                      Not supported.

                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          process.cmdline: sysdig-agent

                                                                                                                                                                                                                                                                                                                                                                      Not supported.

                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          port: 8080
                                                                                                                                                                                                                                                                                                                                                                      - action: keep
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_container_port_number]
                                                                                                                                                                                                                                                                                                                                                                        regex: '8080'
                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          container.image: sysdig-agent

                                                                                                                                                                                                                                                                                                                                                                      Not supported.

                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          container.name: sysdig-agent
                                                                                                                                                                                                                                                                                                                                                                      - action: keep
                                                                                                                                                                                                                                                                                                                                                                        source_labels: [__meta_kubernetes_pod_container_name]
                                                                                                                                                                                                                                                                                                                                                                        regex: 'sysdig-agent'
                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          appcheck.match: sysdig

Appchecks are not compatible with Promscrape v2. See Configure Monitoring Integrations for supported integrations.

                                                                                                                                                                                                                                                                                                                                                                      Contact Support

If you have any questions about Promscrape migration, contact Sysdig Support.

                                                                                                                                                                                                                                                                                                                                                                      8.3.2 -

                                                                                                                                                                                                                                                                                                                                                                      Integrate JMX Metrics from Java Virtual Machines

The Sysdig agent retrieves data from your Java virtual machines using the JMX protocol. The agent is configured to automatically discover active Java virtual machines and poll them for basic JVM metrics, such as heap memory and garbage collector usage, as well as application-specific metrics. The following applications are currently supported by default:

                                                                                                                                                                                                                                                                                                                                                                      • ActiveMQ
                                                                                                                                                                                                                                                                                                                                                                      • Cassandra
                                                                                                                                                                                                                                                                                                                                                                      • Elasticsearch
                                                                                                                                                                                                                                                                                                                                                                      • HBase
                                                                                                                                                                                                                                                                                                                                                                      • Kafka
                                                                                                                                                                                                                                                                                                                                                                      • Tomcat
                                                                                                                                                                                                                                                                                                                                                                      • Zookeeper

You can also configure the agent to extract custom JMX metrics from your own Java processes. Extracted metrics appear in the pre-defined Application views or under the Metrics > JVM and JMX menus.

                                                                                                                                                                                                                                                                                                                                                                      The module java.management must be loaded for the Sysdig agent to collect both JVM and JMX metrics.

The default JMX metrics configuration is found in the /opt/draios/etc/dragent.default.yaml file. When customizing existing entries, copy the application’s complete bean listing from that defaults file into the user settings file /opt/draios/etc/dragent.yaml. The Sysdig agent merges the configurations of both files.

                                                                                                                                                                                                                                                                                                                                                                      Java versions 7 - 10 are currently supported by the Sysdig agents.

For Java 11-14, you must run agent version 10.1.0 or later and start the application with the JMX Remote option.
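As a rough sketch of one way to pass such an option in a containerized deployment, the fragment below sets it through the JAVA_TOOL_OPTIONS environment variable in a Kubernetes container spec. It assumes that the JMX Remote option refers to the standard com.sun.management.jmxremote JVM system property; the container name and image are hypothetical, so adapt them and verify the option against your own setup:

```yaml
# Fragment of a Kubernetes container spec (names are hypothetical).
containers:
  - name: my-java-app
    image: example.com/my-java-app:1.0
    env:
      # The JVM reads JAVA_TOOL_OPTIONS at startup and applies the listed options.
      # Assumption: this system property is the "JMX Remote option" meant above.
      - name: JAVA_TOOL_OPTIONS
        value: "-Dcom.sun.management.jmxremote"
```

The same property can instead be added directly to the java command line if you control how the application is launched.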

                                                                                                                                                                                                                                                                                                                                                                      Here is what your dragent.yaml file might look like for a customized entry for the Spark application:

                                                                                                                                                                                                                                                                                                                                                                      customerid: 07c948-your-key-here-006f3b
                                                                                                                                                                                                                                                                                                                                                                      tags: local:nyc,service:db3
                                                                                                                                                                                                                                                                                                                                                                      jmx:
                                                                                                                                                                                                                                                                                                                                                                        per_process_beans:
                                                                                                                                                                                                                                                                                                                                                                          spark:
                                                                                                                                                                                                                                                                                                                                                                            pattern: "spark"
                                                                                                                                                                                                                                                                                                                                                                            beans:
                                                                                                                                                                                                                                                                                                                                                                              - query: "metrics:name=Spark shell.BlockManager.disk.diskSpaceUsed_MB"
                                                                                                                                                                                                                                                                                                                                                                                attributes:
                                                                                                                                                                                                                                                                                                                                                                                  - name: VALUE
                                                                                                                                                                                                                                                                                                                                                                                    alias: spark.metric
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Include the jmx: and per_process_beans: section headers at the beginning of your application/bean list. For more information on adding parameters to a container agent’s configuration file, see Understanding the Agent Config Files.

                                                                                                                                                                                                                                                                                                                                                                      Bean Configuration

                                                                                                                                                                                                                                                                                                                                                                      Basic JVM metrics are pre-defined inside the default_beans: section. This section is defined in the agent’s default settings file and contains beans and attributes that are going to be polled for every Java process, like memory and garbage collector usage:

                                                                                                                                                                                                                                                                                                                                                                      jmx:
                                                                                                                                                                                                                                                                                                                                                                        default_beans:
                                                                                                                                                                                                                                                                                                                                                                          - query: "java.lang:type=Memory"
                                                                                                                                                                                                                                                                                                                                                                            attributes:
                                                                                                                                                                                                                                                                                                                                                                              - HeapMemoryUsage
                                                                                                                                                                                                                                                                                                                                                                              - NonHeapMemoryUsage
                                                                                                                                                                                                                                                                                                                                                                          - query: "java.lang:type=GarbageCollector,*"
                                                                                                                                                                                                                                                                                                                                                                            attributes:
                                                                                                                                                                                                                                                                                                                                                                              - name: "CollectionCount"
                                                                                                                                                                                                                                                                                                                                                                                type: "counter"
                                                                                                                                                                                                                                                                                                                                                                              - name: "CollectionTime"
                                                                                                                                                                                                                                                                                                                                                                                type: "counter"
                                                                                                                                                                                                                                                                                                                                                                      

Metrics specific to each application are specified in sections named after the applications. For example, this is the Tomcat section:

                                                                                                                                                                                                                                                                                                                                                                      per_process_beans:
                                                                                                                                                                                                                                                                                                                                                                          tomcat:
                                                                                                                                                                                                                                                                                                                                                                            pattern: "catalina"
                                                                                                                                                                                                                                                                                                                                                                            beans:
                                                                                                                                                                                                                                                                                                                                                                              - query: "Catalina:type=Cache,*"
                                                                                                                                                                                                                                                                                                                                                                                attributes:
                                                                                                                                                                                                                                                                                                                                                                                  - accessCount
                                                                                                                                                                                                                                                                                                                                                                                  - cacheSize
                                                                                                                                                                                                                                                                                                                                                                                  - hitsCount
                                                                                                                                                                                                                                                                                                                                                                                  - . . .
                                                                                                                                                                                                                                                                                                                                                                      

The key name, tomcat in this case, is displayed as the process name in the Sysdig Monitor user interface instead of just java. The pattern: parameter specifies a string that is used to match a Java process name and arguments with this set of JMX metrics. If the fully qualified name of the process's main class contains the given text, the process is tagged and the metrics specified in the section are fetched.

The class names are matched against the process argument list. If you implement JMX metrics in a custom manner that does not expose the class names on the command line, you will need to find a pattern that matches your Java invocation command line.
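As an illustration of that last case, suppose a service is started through a generic launcher, so its main class is not visible in the arguments, but a system property such as -Dapp.name=orders-service is. The key name, pattern string, bean query, and attribute below are all hypothetical and are not defaults shipped with the agent:

```yaml
jmx:
  per_process_beans:
    # Hypothetical key; shown as the process name in Sysdig Monitor.
    orders-service:
      # Matched against the Java process name and arguments.
      # Assumes the JVM was started with -Dapp.name=orders-service.
      pattern: "app.name=orders-service"
      beans:
        # Hypothetical bean exported by the application itself.
        - query: "com.example.orders:type=OrderStats"
          attributes:
            - PendingOrders
```

Pick a pattern string that is specific enough not to match unrelated Java processes on the same host.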

The beans: section contains the list of beans to be queried, based on JMX patterns. JMX patterns are explained in detail in the Oracle documentation, but in practice the format of the query line is simple: you can specify the full name of a bean, such as java.lang:type=Memory, or you can fetch multiple beans in a single line using the wildcard *, as in java.lang:type=GarbageCollector,* (note that this is just a wildcard, not a regex).
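A minimal sketch of the two query forms side by side, placed under a hypothetical application key. The bean and attribute names are standard platform MXBeans, but whether they are useful for your JVM is an assumption to check with a tool such as JConsole:

```yaml
jmx:
  per_process_beans:
    myapp:                # hypothetical application key
      pattern: "myapp"    # hypothetical command-line match
      beans:
        # Full bean name: matches exactly one bean.
        - query: "java.lang:type=OperatingSystem"
          attributes:
            - SystemLoadAverage
        # Wildcard: matches every bean of type MemoryPool,
        # whatever its other key properties (for example, name=...) are.
        - query: "java.lang:type=MemoryPool,*"
          attributes:
            - Usage
```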

                                                                                                                                                                                                                                                                                                                                                                      To get the list of all the beans and attributes that your application exports, you can use JVisualVM, Jmxterm, JConsole or other similar tools. Here is a screenshot from JConsole showing where to find the namespace, bean and attribute (metric) information (JConsole is available when you install the Java Development Kit):

For each query, you have to specify the attributes that you want to retrieve, and a new metric is created for each of them. We support the following JMX attribute types (for these attributes, all the subattributes are retrieved):

Attributes may be absolute values or rates. Absolute values must be converted to a per-second rate before they are sent; for these, specify type: counter. The default type is rate, which can be omitted, so in most cases you can simply write the attribute name.
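The two attribute forms can be sketched as follows. The application key and pattern are hypothetical; the beans and attributes are standard platform MXBeans chosen only for illustration:

```yaml
jmx:
  per_process_beans:
    myapp:                # hypothetical application key
      pattern: "myapp"    # hypothetical command-line match
      beans:
        - query: "java.lang:type=Threading"
          attributes:
            # Default type, so the attribute name alone is enough.
            - ThreadCount
        - query: "java.lang:type=ClassLoading"
          attributes:
            # Ever-increasing total: type counter asks the agent
            # to turn it into a per-second rate before sending.
            - name: TotalLoadedClassCount
              type: counter
```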

                                                                                                                                                                                                                                                                                                                                                                      Limits

The total number of JMX metrics polled per host is limited to 500. The maximum number of beans queried per process is limited to 300. If you need more metrics, contact your sales representative with your use case.

                                                                                                                                                                                                                                                                                                                                                                      In agents 0.46 and earlier, the limit was 100 beans for each process.

                                                                                                                                                                                                                                                                                                                                                                      Aliases

JMX beans and attributes can have very long names. To avoid cluttering the interface, the agent supports aliasing: you can specify an alias in the attribute configuration. For example:

  cassandra:
    pattern: "cassandra"
    beans:
      - query: "org.apache.cassandra.db:type=StorageProxy"
        attributes:
          - name: RecentWriteLatencyMicros
            alias: cassandra.write.latency
          - name: RecentReadLatencyMicros
            alias: cassandra.read.latency
                                                                                                                                                                                                                                                                                                                                                                      

The alias is then used in Sysdig Monitor instead of the raw bean name. Aliases can also be dynamic, taking data from the bean name, which is useful when you use pattern bean queries. For example, here the name property of each matching bean is used to create different metrics:

                                                                                                                                                                                                                                                                                                                                                                            - query: "java.lang:type=GarbageCollector,*"
                                                                                                                                                                                                                                                                                                                                                                              attributes:
                                                                                                                                                                                                                                                                                                                                                                                - name: CollectionCount
                                                                                                                                                                                                                                                                                                                                                                                  type: counter
                                                                                                                                                                                                                                                                                                                                                                                  alias: jvm.gc.{name}.count
                                                                                                                                                                                                                                                                                                                                                                                - name: CollectionTime
                                                                                                                                                                                                                                                                                                                                                                                  type: counter
                                                                                                                                                                                                                                                                                                                                                                                  alias: jvm.gc.{name}.time
                                                                                                                                                                                                                                                                                                                                                                      

This query matches multiple beans (all garbage collectors), and the metric name reflects the name of the garbage collector, for example: jvm.gc.ConcurrentMarkSweep.count . The general syntax is {<bean_property_key>} . To see all the properties of a bean, use a JMX explorer such as JVisualVM or Jmxterm.
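
The placeholder substitution can be sketched in Python as follows. This only illustrates the {<bean_property_key>} syntax and is not the agent's code: the helper names bean_properties and expand_alias are hypothetical, and the simple parsing assumes property values that contain no quoted commas.

```python
import re


def bean_properties(object_name):
    """Split a JMX object name 'domain:key=value,...' into a dict of key properties."""
    _, _, props = object_name.partition(":")
    return dict(item.split("=", 1) for item in props.split(",") if "=" in item)


def expand_alias(alias, object_name):
    """Replace each {key} placeholder in the alias with the bean's value for that key."""
    props = bean_properties(object_name)
    # Unknown keys are left untouched so that a typo stays visible in the metric name.
    return re.sub(r"\{(\w+)\}", lambda m: props.get(m.group(1), m.group(0)), alias)


bean = "java.lang:type=GarbageCollector,name=ConcurrentMarkSweep"
print(expand_alias("jvm.gc.{name}.count", bean))  # jvm.gc.ConcurrentMarkSweep.count
```

Each bean matched by the pattern query has its own name property, so one alias template yields one metric per garbage collector.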

To use these metrics in PromQL queries, add the prefix jmx_ and replace the dots (.) in the metric name with underscores (_). For example, the metric name jvm.gc.ConcurrentMarkSweep.count becomes jmx_jvm_gc_ConcurrentMarkSweep_count in PromQL.
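
The renaming rule is mechanical, so it can be expressed as a short Python helper. The function name to_promql_name is hypothetical and exists only for this example.

```python
def to_promql_name(metric_name):
    """Apply the documented rule: prefix with jmx_ and replace dots with underscores."""
    return "jmx_" + metric_name.replace(".", "_")


print(to_promql_name("jvm.gc.ConcurrentMarkSweep.count"))
# jmx_jvm_gc_ConcurrentMarkSweep_count
```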

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting: Why Can’t I See Java (JMX) Metrics?

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent normally auto-discovers Java processes running on your host and enables the JMX extensions for polling them.

                                                                                                                                                                                                                                                                                                                                                                      JMX Remote

                                                                                                                                                                                                                                                                                                                                                                      If your Java application is not discovered automatically by the agent, try adding the following parameter on your application’s command line:

                                                                                                                                                                                                                                                                                                                                                                       -Dcom.sun.management.jmxremote
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      For more information, see Oracle’s web page on monitoring using JMX technology.

                                                                                                                                                                                                                                                                                                                                                                      Java Versions

                                                                                                                                                                                                                                                                                                                                                                      Java versions 7 - 10 are currently supported by the Sysdig agents.

For Java 11-14, you must run agent version 10.1.0 or later and start the application with the JMX Remote option.

                                                                                                                                                                                                                                                                                                                                                                      Java-Based Applications and JMX Authentication

For Java-based applications (Cassandra, Elasticsearch, Kafka, Tomcat, ZooKeeper, etc.), the Sysdig agent requires the Java Runtime Environment (JRE) to be installed in order to poll for metrics (beans).

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent does not support JMX authentication.

If the Docker-container-based Sysdig agent is installed, the JRE is installed alongside the agent binaries and no further dependencies exist. However, if you are installing the service-based (non-container) agent and you do not see JVM/JMX metrics reported, your host may not have the JRE installed, or it may not be installed in the expected location: /usr/bin/java

To confirm whether the Sysdig agent can find the JRE, restart the agent with service dragent restart and check the agent's /opt/draios/logs/draios.log file for the two Java detection and location log entries recorded during agent startup.

                                                                                                                                                                                                                                                                                                                                                                      Example if Java is missing or not found:

                                                                                                                                                                                                                                                                                                                                                                      2017-09-08 23:19:27.944, Information, java detected: false
                                                                                                                                                                                                                                                                                                                                                                      2017-09-08 23:19:27.944, Information, java_binary:
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example if Java is found:

                                                                                                                                                                                                                                                                                                                                                                      2017-09-08 23:19:27.944, Information, java detected: true
                                                                                                                                                                                                                                                                                                                                                                      2017-09-08 23:19:27.944, Information, java_binary: /usr/bin/java
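
If you prefer to check these two entries programmatically, the following Python sketch extracts them from log lines. It assumes the entries look exactly like the examples above; the function name java_status is hypothetical, and the sample lines are copied from the example rather than read from a real log.

```python
def java_status(log_lines):
    """Return (detected, binary) from the most recent Java detection entries."""
    detected, binary = None, None
    for line in log_lines:
        if "java detected:" in line:
            detected = line.split("java detected:", 1)[1].strip() == "true"
        elif "java_binary:" in line:
            # An empty path means the agent did not locate a Java binary.
            binary = line.split("java_binary:", 1)[1].strip() or None
    return detected, binary


sample = [
    "2017-09-08 23:19:27.944, Information, java detected: true",
    "2017-09-08 23:19:27.944, Information, java_binary: /usr/bin/java",
]
print(java_status(sample))  # (True, '/usr/bin/java')
```

To run it against a real agent, pass the lines of /opt/draios/logs/draios.log, for example by iterating over the open file object.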
                                                                                                                                                                                                                                                                                                                                                                      

If Java is not installed, install the Java Runtime Environment. If your host has Java installed but not in the expected location ( /usr/bin/java ), you can either create a symlink from /usr/bin/java to the actual binary or set the java_home: variable in the Sysdig agent's configuration file, /opt/draios/etc/dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      java_home: /usr/my_java_location/
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Disabling JMX Polling

If you do not need JMX metrics reporting or otherwise want to disable it, add the following two lines to the agent's user settings configuration file, /opt/draios/etc/dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      jmx:
                                                                                                                                                                                                                                                                                                                                                                        enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      After editing the file, restart the native Linux agent via service dragent restart or restart the container agent to make the change take effect.

                                                                                                                                                                                                                                                                                                                                                                      If using our containerized agent, instead of editing the dragent.yaml file, you can add this extra parameter in the docker run command when starting the agent:

                                                                                                                                                                                                                                                                                                                                                                      -e ADDITIONAL_CONF="jmx:\n  enabled: false\n"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      8.3.3 -

                                                                                                                                                                                                                                                                                                                                                                      Integrate StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                      StatsD is an open-source project built by Etsy. Using a StatsD library specific to your application’s language, it allows for the easy generation and transmission of custom application metrics to a collection server.

The Sysdig agent contains an embedded StatsD server, so your custom metrics can be sent to the agent's collector and relayed to the Sysdig Monitor backend for aggregation. Your application metrics and the rich set of metrics already collected by the agent can all be visualized in the same simple and intuitive graphical interface. Alert notifications are configured in exactly the same way.

                                                                                                                                                                                                                                                                                                                                                                      Installation and Configuration

The StatsD server, embedded in the Sysdig agent beginning with version 0.1.136, is pre-configured and starts by default, so no additional user configuration is necessary. Install the agent directly on a supported distribution, or install the Docker containerized version on your container server, and you're done.

                                                                                                                                                                                                                                                                                                                                                                      Sending StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                      Active Collection

By default, the Sysdig agent's embedded StatsD collector listens on the standard StatsD port, 8125, on both TCP and UDP. StatsD is a text-based protocol in which samples are separated by a newline ( \n ).

                                                                                                                                                                                                                                                                                                                                                                      Sending metrics from your application to the collector is as simple as:

                                                                                                                                                                                                                                                                                                                                                                      echo "hello_statsd:1|c" > /dev/udp/127.0.0.1/8125
                                                                                                                                                                                                                                                                                                                                                                      

The example transmits the counter metric "hello_statsd" with a value of 1 to the StatsD collector listening on UDP port 8125. Here is a second example, which sends the output of a more complex shell command that counts the established network connections:

                                                                                                                                                                                                                                                                                                                                                                      echo "EstablishedConnections:`netstat -a | grep ESTAB | wc -l`|c" > /dev/udp/127.0.0.1/8125
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The protocol format is as follows:

                                                                                                                                                                                                                                                                                                                                                                      METRIC_NAME:METRIC_VALUE|TYPE[|@SAMPLING_RATIO]
                                                                                                                                                                                                                                                                                                                                                                      

Metric names can be any string that does not contain the reserved characters |, #, :, or @. The value is a number, and its meaning depends on the metric type. The type can be any of c, ms, g, or s. The sampling ratio is a value between 0 (exclusive) and 1, and it is used to handle subsampling. Once sent, the metrics are available in the same display menu for the subviews as the built-in metrics.
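The protocol format above can also be produced from application code. The following is a minimal Python sketch, not an official client: the helper names format_sample and send_sample are hypothetical, and the sketch assumes that a sampling ratio of exactly 1 is allowed and that the collector is reachable at 127.0.0.1 on UDP port 8125.

```python
import socket

RESERVED_CHARS = "|#:@"
METRIC_TYPES = ("c", "ms", "g", "s")


def format_sample(name, value, metric_type, sample_rate=None):
    """Build one line: METRIC_NAME:METRIC_VALUE|TYPE[|@SAMPLING_RATIO]."""
    if any(ch in name for ch in RESERVED_CHARS):
        raise ValueError("metric name contains a reserved character")
    if metric_type not in METRIC_TYPES:
        raise ValueError("unsupported metric type")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        # Assumption: the upper bound of 1 is inclusive.
        if not 0 < sample_rate <= 1:
            raise ValueError("sampling ratio must be greater than 0 and at most 1")
        line += f"|@{sample_rate}"
    return line


def send_sample(line, host="127.0.0.1", port=8125):
    """Send one sample as a single UDP datagram over IPv4 (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Calling send_sample(format_sample("hello_statsd", 1, "c")) sends the same payload as the first shell example, without the trailing newline that echo appends.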

                                                                                                                                                                                                                                                                                                                                                                      Passive Collection

In infrastructures that already contain a third-party StatsD collection server, StatsD metrics can be collected “out of band”. The agent performs passive collection automatically by intercepting system calls, as it does for all the Sysdig Monitor metrics it normally collects. This method does not require changing your current StatsD configuration, and it is an excellent way to test drive the Sysdig Monitor application without any modification other than installing the agent.

                                                                                                                                                                                                                                                                                                                                                                      The passive mode of collection is especially suitable for containerized environments where simplicity and efficiency are essential. With the containerized version of the Sysdig Monitor agent running on the host, all other container applications can continue to transmit to any currently implemented collector. In the case where no collector exists, container applications can simply be configured to send StatsD metrics to the localhost interface (127.0.0.1) as demonstrated above - no actual StatsD server needs to be listening at that address.

Effectively, each network transmission made from inside the application container, including StatsD messages sent to a nonexistent destination, generates a system call. The Sysdig agent captures these system calls from its own container, where the StatsD collector is listening. In practice, the Sysdig agent acts as a transparent proxy between the application and the StatsD collector, even if they are in different containers. The agent determines which container each system call comes from and uses that information to transparently label the StatsD messages.

                                                                                                                                                                                                                                                                                                                                                                      The above graphic demonstrates the components of the Sysdig agent and where metrics are actively or passively collected. Regardless of the method of collection, the number of StatsD metrics the agent can transmit is limited by your payment plan.

                                                                                                                                                                                                                                                                                                                                                                      Note 1: When using the passive technique, ICMP port unreachable events may be generated on the host network.

Note 2: Some clients may resolve the “localhost” address string to the IPv6 address (::1). Metrics collection over IPv6 is not supported at this time. If your StatsD metrics are not visible in the Sysdig Monitor interface, use “127.0.0.1” instead of the “localhost” string to force IPv4. For Java clients, it may also be necessary to add the JVM option -Djava.net.preferIPv4Stack=true.

Note 3: When your application does not transmit StatsD metrics continuously (once per second, as is the case for all agent-created metrics), the charts will render a ‘zero’ or null value. Alert conditions look only at the StatsD values actually transmitted and ignore the nulls.

                                                                                                                                                                                                                                                                                                                                                                      Supported Metric Types

                                                                                                                                                                                                                                                                                                                                                                      Counter

                                                                                                                                                                                                                                                                                                                                                                      A counter metric is updated with the value sent by the application, sent to the Sysdig Monitor backend, and then reset to zero. You can use it to count, for example, how many calls have been made to an API:

                                                                                                                                                                                                                                                                                                                                                                      api.login:1|c
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      You can specify negative values to decrement a counter.
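The accumulate, report, and reset behavior described above can be illustrated with a small Python sketch. The Counter class is hypothetical and only models the semantics; it is not the agent's implementation.

```python
class Counter:
    """Accumulates increments between flushes, then resets to zero."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        # Negative amounts decrement the counter.
        self.value += amount

    def flush(self):
        # Report the accumulated value and reset it for the next interval.
        reported, self.value = self.value, 0
        return reported


api_login = Counter()
# Three "api.login:1|c" samples followed by one "api.login:-1|c" sample.
for amount in (1, 1, 1, -1):
    api_login.add(amount)

first_report = api_login.flush()   # value accumulated since the last flush
second_report = api_login.flush()  # nothing was received in between
```

Here first_report is 2 and second_report is 0, because the counter restarts from zero after each report.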

                                                                                                                                                                                                                                                                                                                                                                      Gauge

                                                                                                                                                                                                                                                                                                                                                                      A gauge is a single value that will be sent as is:

                                                                                                                                                                                                                                                                                                                                                                      table_size:10000|g
                                                                                                                                                                                                                                                                                                                                                                      

Gauges are plotted as received; that is, they are point-in-time metrics. You can apply a relative increment or decrement to a gauge by prefixing the value with a + or a - respectively. As an example, these three samples will cause table_size to be 950:

                                                                                                                                                                                                                                                                                                                                                                      table_size:1000|g
                                                                                                                                                                                                                                                                                                                                                                      table_size:-100|g
                                                                                                                                                                                                                                                                                                                                                                      table_size:+50|g
                                                                                                                                                                                                                                                                                                                                                                      

In Sysdig Monitor, the gauge value is rendered on the charts only when your application actually transmits it. When it is not transmitted, a null is plotted on the charts, and that null is not used in any calculations or alerts.
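The difference between absolute and signed gauge values can be sketched in Python as follows. The apply_gauge function is a hypothetical helper that models the rule described above; it assumes the gauge starts at 0 before the first sample arrives.

```python
def apply_gauge(current, raw_value):
    """Return the new gauge value after receiving raw_value from a '|g' sample."""
    if raw_value[0] in "+-":
        # A signed value is a relative change to the current value.
        return current + float(raw_value)
    # An unsigned value replaces the current value.
    return float(raw_value)


table_size = 0.0
for raw in ("1000", "-100", "+50"):
    table_size = apply_gauge(table_size, raw)
```

After the three samples, table_size is 950.0, matching the table_size example. One consequence of this rule is that a gauge cannot be set directly to a negative number; it has to be set to 0 first and then decremented.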

                                                                                                                                                                                                                                                                                                                                                                      Set

                                                                                                                                                                                                                                                                                                                                                                      A set is like a counter, but it counts unique elements. For example:

active_users:user1|s
active_users:user2|s
active_users:user1|s
                                                                                                                                                                                                                                                                                                                                                                      

These samples cause the value of active_users to be 2.
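The unique-element counting of a set can be modeled with a short Python sketch. The count_unique function is hypothetical and only illustrates the semantics; it ignores labels and sampling ratios.

```python
def count_unique(lines, metric_name):
    """Count the distinct members reported for one set metric."""
    seen = set()
    for line in lines:
        fields = line.split("|")
        body, metric_type = fields[0], fields[1]
        name, _, member = body.partition(":")
        if name == metric_name and metric_type == "s":
            # Adding a member that is already present has no effect.
            seen.add(member)
    return len(seen)


samples = [
    "active_users:user1|s",
    "active_users:user2|s",
    "active_users:user1|s",
]
unique_users = count_unique(samples, "active_users")
```

With these three samples, unique_users is 2 because user1 is counted only once.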

                                                                                                                                                                                                                                                                                                                                                                      Metric Labels

Labels are a Sysdig Monitor extension of the StatsD specification that provides more flexibility in the way metrics are grouped, filtered, and visualized. To label a metric, use the following syntax:

                                                                                                                                                                                                                                                                                                                                                                      enqueued_messages#az=eu-west-3,country=italy:10|c
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      In general, this is the syntax you can use for labeling:

                                                                                                                                                                                                                                                                                                                                                                      METRIC_NAME#LABEL_NAME=LABEL_VALUE,LABEL_NAME ...
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Labels can be simple strings or key/value pairs, separated by an = sign. Simple labels can be used for filtering in the Sysdig Monitor web interface. Key/value labels can be used for both filtering and segmentation.

                                                                                                                                                                                                                                                                                                                                                                      Label names prefixed with ‘agent.label’ are reserved for Sysdig agent use only and any custom labels starting with that prefix will be ignored.
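Building the labeled metric name by hand is error-prone, so a small helper can assemble it. The following Python sketch is illustrative only: the labeled_name function and its parameters are hypothetical, and rejecting the reserved ‘agent.label’ prefix up front is a design choice of this sketch (the agent itself simply ignores such labels).

```python
RESERVED_PREFIX = "agent.label"


def labeled_name(name, labels=None, simple_labels=()):
    """Return METRIC_NAME#LABEL_NAME=LABEL_VALUE,LABEL_NAME ... for a metric."""
    parts = [f"{key}={value}" for key, value in (labels or {}).items()]
    parts += list(simple_labels)
    for part in parts:
        if part.startswith(RESERVED_PREFIX):
            raise ValueError("labels starting with 'agent.label' are reserved")
    if not parts:
        return name
    return f"{name}#{','.join(parts)}"


name = labeled_name("enqueued_messages", {"az": "eu-west-3", "country": "italy"})
sample = f"{name}:10|c"
```

The resulting sample is the string enqueued_messages#az=eu-west-3,country=italy:10|c, the same as the labeled example above.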

                                                                                                                                                                                                                                                                                                                                                                      Limits

The number of StatsD metrics the agent can transmit is limited to 1000 for the host and 1000 for all running containers combined. If you need more metrics, contact your sales representative with your use case.

                                                                                                                                                                                                                                                                                                                                                                      Collect StatsD Metrics Under Load

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent can reliably receive StatsD metrics from containers, even while the agent is under load. This setting is controlled by the use_forwarder configuration parameter.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent automatically parses and records StatsD metrics. Historically, the agent parsed the system call stream from the kernel in order to read and record StatsD metrics from containers. For performance reasons, the agent may not be able to collect all StatsD metrics using this mechanism if the load is high. For example, if the StatsD client writes more than 2kB worth of StatsD metrics in a single system call, the agent will truncate the StatsD message, resulting in loss of StatsD metrics.
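One client-side way to stay under the truncation threshold is to split large batches into several smaller writes. The Python sketch below is an assumption-laden illustration rather than a documented recommendation: the batch_samples function is hypothetical, and the 2000-byte default is a conservative guess that stays below the 2 kB figure mentioned above.

```python
def batch_samples(samples, max_bytes=2000):
    """Group samples into newline-separated payloads of at most max_bytes each."""
    payloads, current, size = [], [], 0
    for sample in samples:
        encoded_len = len(sample.encode("utf-8"))
        # One extra byte for the "\n" separator when the payload is not empty.
        extra = encoded_len + (1 if current else 0)
        if current and size + extra > max_bytes:
            payloads.append("\n".join(current))
            current, size = [], 0
            extra = encoded_len
        current.append(sample)
        size += extra
    if current:
        payloads.append("\n".join(current))
    return payloads


samples = [f"metric_{i}:1|c" for i in range(500)]
payloads = batch_samples(samples)
```

Each payload can then be written with a single send call. A single sample longer than max_bytes still ends up in its own oversized payload, so this sketch does not protect against that case.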

With the introduction of the togglable use_forwarder option, the agent can collect StatsD metrics even under high load.

This feature was introduced in Sysdig agent v0.90.1. As of agent v10.4.0, it is enabled by default.

                                                                                                                                                                                                                                                                                                                                                                      statsd:
                                                                                                                                                                                                                                                                                                                                                                        use_forwarder: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      To disable, set it to false:

                                                                                                                                                                                                                                                                                                                                                                      statsd:
                                                                                                                                                                                                                                                                                                                                                                        use_forwarder: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      When enabled, rather than use the system call stream for container StatsD messages, the agent listens for UDP datagrams on the configured StatsD port on the localhost within the container’s network namespace. This enables the agent to reliably receive StatsD metrics from containers, even while the agent is under load.

                                                                                                                                                                                                                                                                                                                                                                      This option introduces a behavior change in the agent, both in the destination address and in port settings.

                                                                                                                                                                                                                                                                                                                                                                      • When the option is disabled, the agent reads StatsD metrics that are destined to any remote address.

  When the option is enabled, the agent receives only those metrics that are addressed to the localhost.

                                                                                                                                                                                                                                                                                                                                                                      • When the option is disabled, the agent reads the container StatsD messages destined to only port 8125.

                                                                                                                                                                                                                                                                                                                                                                        When the option is enabled, the agent uses the configured StatsD port.
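The two behavior changes listed above can be summarized as a single decision rule. The Python sketch below only models that rule for container StatsD traffic; the is_collected function and its parameters are hypothetical, and treating "localhost" as the IPv4 loopback range is an assumption based on the note that IPv6 collection is not supported.

```python
import ipaddress

DEFAULT_STATSD_PORT = 8125


def is_collected(use_forwarder, dest_ip, dest_port,
                 configured_port=DEFAULT_STATSD_PORT):
    """Model whether a container's StatsD datagram reaches the agent."""
    if use_forwarder:
        # Forwarder: only localhost traffic on the configured StatsD port.
        addr = ipaddress.ip_address(dest_ip)
        is_local = addr.version == 4 and addr.is_loopback
        return is_local and dest_port == configured_port
    # System call stream: any destination address, but only port 8125.
    return dest_port == DEFAULT_STATSD_PORT


remote_without_forwarder = is_collected(False, "10.0.0.5", 8125)
remote_with_forwarder = is_collected(True, "10.0.0.5", 8125)
custom_port_with_forwarder = is_collected(True, "127.0.0.1", 9125, configured_port=9125)
```

In this model, a client that sends to a remote collector is seen only when the forwarder is disabled, and a client that uses a non-default port is seen only when the forwarder is enabled and the same port is configured.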

                                                                                                                                                                                                                                                                                                                                                                      StatsD Server Running in a Monitored Container

The forwarder cannot be used when a StatsD server is running in the container that you are monitoring.

A StatsD server running in a container already has a process bound to port 8125 (or to the configured StatsD port), so the forwarder cannot use that port to collect the metrics. The detection logic includes a 10-second startup delay so that any custom StatsD process can bind to the port before the forwarder does. This ensures that the forwarder does not interfere with the StatsD server’s operation.

For this use case, use the traditional method instead: disable the forwarder and capture the metrics via the system call stream.
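
The port conflict described above can be illustrated with a short Node.js sketch. This is not the agent's detection logic: canBindUdp is a hypothetical helper that tries to bind a UDP socket and reports whether the port was free. The loopback address used as the default is also an assumption made for the illustration.

```javascript
const dgram = require('dgram');

// Hypothetical helper (not part of the Sysdig agent): resolves to true when
// the UDP port can be bound, and to false when the bind fails, for example
// because another process already holds the port.
function canBindUdp(port, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const socket = dgram.createSocket('udp4');

    // A failed bind is reported through the 'error' event.
    socket.once('error', () => {
      socket.close();
      resolve(false);
    });

    // A successful bind is reported through the 'listening' event.
    socket.once('listening', () => {
      socket.close(() => resolve(true));
    });

    socket.bind(port, host);
  });
}
```

While a StatsD server in the same network namespace is bound to UDP port 8125, canBindUdp(8125) should resolve to false, which is the situation in which the forwarder cannot take over the port.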

                                                                                                                                                                                                                                                                                                                                                                      Compatible Clients

Every StatsD-compliant client works with our implementation. Client lists are provided for reference only: we do not support the clients themselves, only compliance with the protocol specification.

                                                                                                                                                                                                                                                                                                                                                                      A full list can be found at the StatsD GitHub page.
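
As a rough illustration of what a compliant client sends, the sketch below formats a metric in the StatsD line format (name:value|type) and sends it over UDP. The function names and the metric name are hypothetical, and the defaults assume the agent is listening on UDP port 8125 on the local host; a real client library would normally handle this for you.

```javascript
const dgram = require('dgram');

// Metric types defined by the StatsD protocol:
// c = counter, g = gauge, ms = timer, s = set.
const STATSD_TYPES = ['c', 'g', 'ms', 's'];

// Format one metric in the StatsD line format: <name>:<value>|<type>
function formatMetric(name, value, type) {
  if (!STATSD_TYPES.includes(type)) {
    throw new Error('Unsupported StatsD metric type: ' + type);
  }
  return name + ':' + value + '|' + type;
}

// Send one formatted line over UDP. Host and port are assumptions;
// adjust them to match your agent configuration.
function sendMetric(line, port = 8125, host = '127.0.0.1') {
  const socket = dgram.createSocket('udp4');
  socket.send(Buffer.from(line), port, host, (err) => {
    socket.close();
    if (err) {
      console.error('Failed to send StatsD metric:', err.message);
    }
  });
}
```

For example, sendMetric(formatMetric('myapp.requests', 1, 'c')) would increment a counter named myapp.requests by one.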

                                                                                                                                                                                                                                                                                                                                                                      Turning Off StatsD Reporting

To disable the Sysdig agent’s embedded StatsD server, append the following lines to the /opt/draios/etc/dragent.yaml configuration file on each installed host:

                                                                                                                                                                                                                                                                                                                                                                      statsd:
                                                                                                                                                                                                                                                                                                                                                                        enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

If Sysdig Secure is used, a compliance check is enabled by default and sends metrics via StatsD. When you disable StatsD, disable the compliance check as well:

                                                                                                                                                                                                                                                                                                                                                                      security:
                                                                                                                                                                                                                                                                                                                                                                        default_compliance_schedule: ""
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      After modifying the configuration file, you will need to restart the agent with:

                                                                                                                                                                                                                                                                                                                                                                      service dragent restart
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Changing the StatsD Listener Port and Transport Protocol

To modify the port that the agent’s embedded StatsD server listens on, append the following lines to the /opt/draios/etc/dragent.yaml configuration file on each installed host (replace #### with your port):

                                                                                                                                                                                                                                                                                                                                                                      statsd:
                                                                                                                                                                                                                                                                                                                                                                        tcp_port: ####
                                                                                                                                                                                                                                                                                                                                                                        udp_port: ####
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Characters Allowed For StatsD Metric Names

Use standard ASCII characters. We also suggest using . (dot) separated namespaces, as we do for all our metrics.

                                                                                                                                                                                                                                                                                                                                                                      Allowed characters: a-z A-Z 0-9 _ .
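
The allowed character set can be checked in code before a metric is sent. The sketch below is one possible approach; isValidMetricName and sanitizeMetricName are hypothetical helpers, and replacing disallowed characters with an underscore is an assumption made for illustration, not a rule from the agent.

```javascript
// Allowed characters per the list above: a-z A-Z 0-9 _ .
const ALLOWED_NAME = /^[a-zA-Z0-9_.]+$/;

// Returns true when the name uses only the allowed characters.
function isValidMetricName(name) {
  return typeof name === 'string' && ALLOWED_NAME.test(name);
}

// Replaces every disallowed character with an underscore.
// (The choice of replacement character is an assumption.)
function sanitizeMetricName(name) {
  return String(name).replace(/[^a-zA-Z0-9_.]/g, '_');
}
```

With these helpers, a name such as myapp.http.requests_total passes validation, while my-app/requests is rejected and would be sanitized to my_app_requests.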

                                                                                                                                                                                                                                                                                                                                                                      For more information on adding parameters to a container agent’s configuration file, see /en/docs/installation/sysdig-agent/agent-configuration/understand-the-agent-configuration/.

8.3.4 - Integrate Node.js Application Metrics

Sysdig can monitor Node.js applications. You link a library into the Node.js code, which then creates a server in the application to export the metrics.

The package.json below describes a Node.js application that exports metrics using the Prometheus protocol:

{
  "name": "node-example",
  "version": "1.0.0",
  "description": "Node example exporting metrics via Prometheus",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "license": "BSD-2-Clause",
  "dependencies": {
    "express": "^4.14.0",
    "gc-stats": "^1.0.0",
    "prom-client": "^6.3.0",
    "prometheus-gc-stats": "^0.3.1"
  }
}
                                                                                                                                                                                                                                                                                                                                                                      

The index.js file is shown below:

// Use express as HTTP middleware
// Feel free to use your own
var express = require('express')
var app = express()

// Initialize Prometheus exporter
const prom = require('prom-client')
const prom_gc = require('prometheus-gc-stats')
prom_gc()

// Sample HTTP route
app.get('/', function (req, res) {
  res.send('Hello World!')
})

// Export Prometheus metrics from /metrics endpoint
app.get('/metrics', function (req, res) {
  res.end(prom.register.metrics())
})

// Start the HTTP server on port 3000
app.listen(3000, function () {
  console.log('Example app listening on port 3000!')
})
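
To give a sense of what the /metrics endpoint returns, the sketch below renders a single counter in the Prometheus text exposition format. In the example application prom-client produces this output for you; renderCounter and the metric name are hypothetical and exist only to show the shape of the response.

```javascript
// Hypothetical helper: renders one counter in the Prometheus text
// exposition format (a HELP line, a TYPE line, then the sample itself).
function renderCounter(name, help, value) {
  return [
    '# HELP ' + name + ' ' + help,
    '# TYPE ' + name + ' counter',
    name + ' ' + value,
    ''
  ].join('\n');
}

// Example: a counter named http_requests_total with a value of 42.
const body = renderCounter('http_requests_total', 'Total HTTP requests.', 42);
console.log(body);
```

Each metric exposed by the application appears as a block of this form in the response body.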
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      To integrate an application:

1. Add an app check in the Dockerfile:

                                                                                                                                                                                                                                                                                                                                                                        FROM node:latest
                                                                                                                                                                                                                                                                                                                                                                        WORKDIR /app
                                                                                                                                                                                                                                                                                                                                                                        ADD package.json ./
                                                                                                                                                                                                                                                                                                                                                                        RUN npm install
                                                                                                                                                                                                                                                                                                                                                                        ENV SYSDIG_AGENT_CONF 'app_checks: [{name: node, check_module: prometheus, pattern: {comm: node}, conf: { url: "http://localhost:{port}/metrics" }}]'
                                                                                                                                                                                                                                                                                                                                                                        ADD index.js ./
                                                                                                                                                                                                                                                                                                                                                                        ENTRYPOINT [ "node", "index.js" ]
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      2. Run the application:

    user@host:~$ docker build -t node-example .
    user@host:~$ docker run -d node-example
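The app check configured in the Dockerfile points the agent at a `/metrics` URL served in Prometheus text format. As a rough illustration of what a scraper reads from such an endpoint, here is a minimal Python sketch. The function name `parse_metrics` and the sample metric names are hypothetical, made up for this example; they are not taken from the Sysdig agent or from the example application. The sketch assumes no trailing timestamps on sample lines.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption for this sketch: no trailing timestamp,
        # so the value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical payload; the metric names are illustrative only.
SAMPLE = """\
# HELP demo_requests_total Hypothetical request counter.
# TYPE demo_requests_total counter
demo_requests_total{method="GET"} 42
demo_heap_used_bytes 1.5e+06
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

To check a running container by hand, you could fetch its `/metrics` URL and pass the response body to a parser like this one.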
                                                                                                                                                                                                                                                                                                                                                                        

Once the Sysdig agent is deployed, Node.js metrics are retrieved automatically and appear in the Sysdig Monitor UI.

For code and configuration examples, refer to the GitHub repository.

                                                                                                                                                                                                                                                                                                                                                                      8.4 -

                                                                                                                                                                                                                                                                                                                                                                      Advanced Configuration for Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      8.4.1 -

                                                                                                                                                                                                                                                                                                                                                                      Configure PVC Metrics

                                                                                                                                                                                                                                                                                                                                                                      You can use dashboards and alerts for PersistentVolumeClaim (PVC) metrics in the regions where PVC metrics are supported.

                                                                                                                                                                                                                                                                                                                                                                      To see data on PVC dashboards and alerts, ensure that the prerequisites are met.

                                                                                                                                                                                                                                                                                                                                                                      Prerequisites

                                                                                                                                                                                                                                                                                                                                                                      Apply Rules

                                                                                                                                                                                                                                                                                                                                                                      If you are upgrading the Sysdig agent, either download sysdig-agent-clusterrole.yaml or apply the following rule to the ClusterRole associated with your Sysdig agent.

rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - nodes/proxy
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
                                                                                                                                                                                                                                                                                                                                                                      

These rules are required to scrape the kubelet containers. With the rules applied, kubelet metrics are also collected, and you can access the kubelet templates for both dashboards and alerts.

                                                                                                                                                                                                                                                                                                                                                                      This configuration change is only required for agent upgrades because the sysdig-agent-clusterrole.yaml associated with fresh installations will already have this configuration. See Steps for Kubernetes (Vanilla) for information on Sysdig agent installation.
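Before upgrading, it can help to confirm that an existing ClusterRole already grants the access described above. The following is a minimal sketch, not a Sysdig tool: the helper `missing_kubelet_access` is hypothetical, it expects the `rules` list already loaded into Python dictionaries (for example from `kubectl get clusterrole <name> -o json`), and it assumes that `get` is the verb needed on both the node subresources and `/metrics`. It recognizes `*` in verbs but does not expand wildcards in resources or API groups.

```python
REQUIRED_RESOURCES = {"nodes/metrics", "nodes/proxy"}
REQUIRED_URLS = {"/metrics"}


def missing_kubelet_access(rules):
    """Return the resources and URLs that no rule grants 'get' on."""
    resources, urls = set(), set()
    for rule in rules:
        verbs = rule.get("verbs", [])
        # Only rules that allow 'get' (or every verb) count.
        if "get" not in verbs and "*" not in verbs:
            continue
        # Node subresources live in the core ("") API group.
        if "" in rule.get("apiGroups", []):
            resources.update(rule.get("resources", []))
        urls.update(rule.get("nonResourceURLs", []))
    return (REQUIRED_RESOURCES - resources) | (REQUIRED_URLS - urls)


# Illustrative rules in the shape of a ClusterRole's 'rules' field.
EXAMPLE_RULES = [
    {"apiGroups": [""],
     "resources": ["nodes/metrics", "nodes/proxy"],
     "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]

if __name__ == "__main__":
    print(missing_kubelet_access(EXAMPLE_RULES) or "all required access present")
```

An empty result means nothing is missing; any returned names indicate rules that still need to be added to the ClusterRole.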

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent v12.3.0 or Above

                                                                                                                                                                                                                                                                                                                                                                      PVC metrics are enabled by default for Sysdig agent v12.3.0 or above. To disable collecting PVC metrics, add the following to the dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                      k8s_extra_resources:
                                                                                                                                                                                                                                                                                                                                                                        include:
                                                                                                                                                                                                                                                                                                                                                                          - services
                                                                                                                                                                                                                                                                                                                                                                          - resourcequotas
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Prior to v12.3.0

                                                                                                                                                                                                                                                                                                                                                                      Contact your Sysdig representative or Sysdig Support for technical assistance with enabling PVC metrics in your environment.
                                                                                                                                                                                                                                                                                                                                                                      • Upgrade Sysdig agent to v12.2.0 or above

                                                                                                                                                                                                                                                                                                                                                                      • If you are an existing Sysdig user, include the following configuration in the dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                        k8s_extra_resources:
                                                                                                                                                                                                                                                                                                                                                                          include:
                                                                                                                                                                                                                                                                                                                                                                            - persistentvolumes
                                                                                                                                                                                                                                                                                                                                                                            - persistentvolumeclaims
                                                                                                                                                                                                                                                                                                                                                                            - storageclasses
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Access PVC Dashboard Template

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor and click Dashboards.

                                                                                                                                                                                                                                                                                                                                                                      2. On the Dashboards slider, scroll down to locate Dashboard Templates.

                                                                                                                                                                                                                                                                                                                                                                      3. Click Kubernetes to expand the Kubernetes dashboard templates.

                                                                                                                                                                                                                                                                                                                                                                      4. Select the PVC and Storage dashboard.

                                                                                                                                                                                                                                                                                                                                                                      Access PVC Alert Template

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor and click Alerts.

                                                                                                                                                                                                                                                                                                                                                                      2. On the Alerts page, click Library.

                                                                                                                                                                                                                                                                                                                                                                      3. On the Library page, click All Templates.

4. Select the Kubernetes PVC alert templates.

                                                                                                                                                                                                                                                                                                                                                                      PVC Metrics

Metric | Metric Type | Labels | Metric Source
kube_persistentvolume_status_phase | Gauge | persistentvolume, phase | Kubernetes API
kube_persistentvolume_claim_ref | Gauge | persistentvolume, name | Kubernetes API
kube_storageclass_created | Gauge | storageclass | Kubernetes API
kube_storageclass_info | Gauge | storageclass, provisioner, reclaim_policy, volume_binding_mode | Kubernetes API
kube_storageclass_labels | Gauge | storageclass | Kubernetes API
kube_pod_spec_volumes_persistentvolumeclaims_info | Gauge | namespace, pod, uid, volume, persistentvolumeclaim | Kubernetes API
kube_pod_spec_volumes_persistentvolumeclaims_readonly | Gauge | namespace, pod, uid, volume, persistentvolumeclaim | Kubernetes API
kube_persistentvolumeclaim_status_condition | Gauge | namespace, persistentvolumeclaim, type, status | Kubernetes API
kube_persistentvolumeclaim_status_phase | Gauge | namespace, persistentvolumeclaim, phase | Kubernetes API
kube_persistentvolumeclaim_access_mode | Gauge | namespace, persistentvolumeclaim, access_mode | Kubernetes API
kubelet_volume_stats_inodes | Gauge | namespace, persistentvolumeclaim | Kubelet
kubelet_volume_stats_inodes_free | Gauge | namespace, persistentvolumeclaim | Kubelet
kubelet_volume_stats_inodes_used | Gauge | namespace, persistentvolumeclaim | Kubelet
kubelet_volume_stats_used_bytes | Gauge | namespace, persistentvolumeclaim | Kubelet
kubelet_volume_stats_available_bytes | Gauge | namespace, persistentvolumeclaim | Kubelet
kubelet_volume_stats_capacity_bytes | Gauge | namespace, persistentvolumeclaim | Kubelet
storage_operation_duration_seconds_bucket | Gauge | operation_name, volume_plugin, le | Kubelet
storage_operation_duration_seconds_sum | Gauge | operation_name, volume_plugin | Kubelet
storage_operation_duration_seconds_count | Gauge | operation_name, volume_plugin | Kubelet
storage_operation_errors_total | Gauge | operation_name, volume_plugin | Kubelet
storage_operation_status_count | Gauge | operation_name, status, volume_plugin | Kubelet

8.4.2 - Integrate Keda for HPA

Sysdig supports Keda to deploy Kubernetes Horizontal Pod Autoscaler (HPA) using custom metrics exposed by Sysdig Monitor. You can do this by configuring Prometheus queries and endpoints in Keda. Keda uses that information to query your Prometheus server and create the HPA. The HPA then scales pods based on your usage of resources, such as CPU and memory.

                                                                                                                                                                                                                                                                                                                                                                      This option replaces Sysdig’s existing custom metric server for HPA.

                                                                                                                                                                                                                                                                                                                                                                      Install Keda

                                                                                                                                                                                                                                                                                                                                                                      Requirements:

                                                                                                                                                                                                                                                                                                                                                                      • Helm
• Keda v2.3 or above (required for endpoint authentication)

Install Keda with Helm by running the following commands:

                                                                                                                                                                                                                                                                                                                                                                      helm repo add kedacore https://kedacore.github.io/charts
                                                                                                                                                                                                                                                                                                                                                                      helm repo update
                                                                                                                                                                                                                                                                                                                                                                      helm install keda kedacore/keda --namespace keda --create-namespace \
                                                                                                                                                                                                                                                                                                                                                                        --set image.metricsApiServer.tag=2.4.0 --set image.keda.tag=2.4.0 \
                                                                                                                                                                                                                                                                                                                                                                        --set prometheus.metricServer.enabled=true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Create Authentication for Sysdig Prometheus Endpoint

Do the following in each namespace where you want to use Keda. This example uses the keda namespace.

                                                                                                                                                                                                                                                                                                                                                                      1. Create the secret with the API key as the bearer token:

                                                                                                                                                                                                                                                                                                                                                                        kubectl create secret generic keda-prom-secret --from-literal=bearerToken=<API_KEY> -n keda
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      2. Create the triggerAuthentication.yaml file:

                                                                                                                                                                                                                                                                                                                                                                        apiVersion: keda.sh/v1alpha1
                                                                                                                                                                                                                                                                                                                                                                        kind: TriggerAuthentication
                                                                                                                                                                                                                                                                                                                                                                        metadata:
                                                                                                                                                                                                                                                                                                                                                                          name: keda-prom-creds
                                                                                                                                                                                                                                                                                                                                                                        spec:
                                                                                                                                                                                                                                                                                                                                                                          secretTargetRef:
                                                                                                                                                                                                                                                                                                                                                                          - parameter: bearerToken
                                                                                                                                                                                                                                                                                                                                                                            name: keda-prom-secret
                                                                                                                                                                                                                                                                                                                                                                            key: bearerToken
                                                                                                                                                                                                                                                                                                                                                                        
3. Apply the configuration in the triggerAuthentication.yaml file:

  kubectl apply -n keda -f triggerAuthentication.yaml
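
As a quick sanity check after the steps above, the following sketch confirms that both objects exist in the namespace. The function name check_keda_auth is hypothetical; the object names keda-prom-secret and keda-prom-creds come from the steps above. It uses kubectl describe so that the secret shows its key name and size without printing the API key.

```shell
# Hypothetical helper: confirm the secret and the TriggerAuthentication exist.
# Usage: check_keda_auth <namespace>   (defaults to the keda namespace)
check_keda_auth() {
  ns="${1:-keda}"
  # Lists the keys in the secret (bearerToken) with their sizes, not their values
  kubectl describe secret keda-prom-secret -n "$ns"
  # Shows the TriggerAuthentication created from triggerAuthentication.yaml
  kubectl get triggerauthentication keda-prom-creds -n "$ns"
}
```

Run check_keda_auth keda once per namespace in which you created the credentials. If either command reports that the object is not found, repeat the corresponding step.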
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Configure HPA

You can configure HPA for a Deployment, a StatefulSet, or a custom resource. Keda uses a CRD, ScaledObject, to configure the HPA. When you create a ScaledObject, Keda automatically sets up the metrics server and the HPA object.

                                                                                                                                                                                                                                                                                                                                                                      1. To create a ScaledObject, specify the following:

                                                                                                                                                                                                                                                                                                                                                                        • spec.scaleTargetRef.name: The unique name of the Deployment.
  • spec.scaleTargetRef.kind: The kind of object to be scaled: Deployment, StatefulSet, or CustomResource.
                                                                                                                                                                                                                                                                                                                                                                        • spec.minReplicaCount: The minimum number of replicas that the Deployment should have.
                                                                                                                                                                                                                                                                                                                                                                        • spec.maxReplicaCount: The maximum number of replicas that the Deployment should have.
                                                                                                                                                                                                                                                                                                                                                                      2. In the ScaledObject, use a trigger of type prometheus to get the metrics from your Sysdig Monitor account. To do so, specify the following:

                                                                                                                                                                                                                                                                                                                                                                        • triggers.metadata.serverAddress: The address of the Prometheus endpoint. It is the Sysdig Monitor URL with prefix /prometheus. For example: https://app.sysdigcloud.com/prometheus.
  • triggers.metadata.query: The PromQL query that returns the value to scale on. Ensure that the query returns a single-element vector or scalar response.
  • triggers.metadata.metricName: The name of the metric that will be created in the Kubernetes API endpoint, /apis/external.metrics.k8s.io/v1beta1.
                                                                                                                                                                                                                                                                                                                                                                        • triggers.metadata.threshold: The threshold that will be used to scale the Deployment.
                                                                                                                                                                                                                                                                                                                                                                      3. Ensure that you add the authModes and authenticationRef to the trigger.

                                                                                                                                                                                                                                                                                                                                                                      4. Check the ScaledObject. Here is an example of a ScaledObject:

  apiVersion: keda.sh/v1alpha1
  kind: ScaledObject
  metadata:
    name: keda-web
  spec:
    scaleTargetRef:
      kind: Deployment
      name: web
    minReplicaCount: 1
    maxReplicaCount: 4
    triggers:
    - type: prometheus
      metadata:
        serverAddress: https://app.sysdigcloud.com/prometheus
        metricName: sysdig_container_cpu_cores_used
        query: sum(sysdig_container_cpu_cores_used{kube_cluster_name="my-cluster-name", kube_namespace_name="keda", kube_workload_name="web"}) * 10
        threshold: "5"
        authModes: "bearer"
      authenticationRef:
        name: keda-prom-creds
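
Before applying a ScaledObject, it can help to run the trigger query by hand and confirm that it returns a single element. The sketch below assumes that the Sysdig endpoint follows the standard Prometheus HTTP API layout (serverAddress followed by /api/v1/query) and that the API key is exported as SYSDIG_API_TOKEN. The function names and the variable name are hypothetical.

```shell
# Hypothetical helper: build the instant-query URL from the serverAddress value.
# Assumes the standard Prometheus HTTP API path, /api/v1/query.
prom_query_url() {
  # ${1%/} drops one trailing slash, if present
  printf '%s/api/v1/query' "${1%/}"
}

# Hypothetical helper: run a PromQL query with bearer authentication.
# Assumes SYSDIG_API_TOKEN holds the same API key stored in keda-prom-secret.
check_query() {
  server="$1"
  query="$2"
  curl -sS -G "$(prom_query_url "$server")" \
    -H "Authorization: Bearer ${SYSDIG_API_TOKEN}" \
    --data-urlencode "query=${query}"
}
```

For example, call check_query "https://app.sysdigcloud.com/prometheus" with the query from the ScaledObject, then check that data.result in the JSON response contains exactly one element.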
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                        

The HPA divides the value of the metric by the number of current replicas. Therefore, avoid the AVERAGE aggregation and use SUM instead to aggregate the metrics by workload. For example, if the sum of the values across all the pods is 100 and there are 5 replicas, the HPA calculates the value of the metric as 20.
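
The arithmetic can be sketched as follows. This is a simplification that assumes the HPA compares the per-replica value against the threshold as an average-value target; it ignores the HPA tolerance and stabilization behavior, and it does not apply minReplicaCount. The function names are hypothetical.

```shell
# Per-replica value seen by the HPA: query result divided by current replicas.
# Uses integer division, so fractional values are truncated.
per_replica_value() {
  echo $(( $1 / $2 ))
}

# Approximate desired replica count:
#   $1 = value returned by the query (the SUM across pods)
#   $2 = triggers.metadata.threshold
#   $3 = spec.maxReplicaCount
desired_replicas() {
  total="$1"
  threshold="$2"
  max="$3"
  # Integer ceiling of total / threshold
  desired=$(( (total + threshold - 1) / threshold ))
  # The result never exceeds maxReplicaCount
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}
```

With the numbers from the example, per_replica_value 100 5 prints 20. With a threshold of 5 and maxReplicaCount of 4, desired_replicas 100 5 4 prints 4, because the uncapped result (20) exceeds the maximum.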

                                                                                                                                                                                                                                                                                                                                                                      Advanced Configurations

                                                                                                                                                                                                                                                                                                                                                                      The ScaledObject permits additional options:

                                                                                                                                                                                                                                                                                                                                                                      spec.pollingInterval:

The interval, in seconds, at which Keda checks each trigger. By default, KEDA checks each trigger source on every ScaledObject every 30 seconds.

Warning: Setting this to a low value causes Keda to make frequent API calls to the Prometheus endpoint. The minimum value for pollingInterval is 10 seconds, which matches the scraping frequency of the Sysdig Agent.

                                                                                                                                                                                                                                                                                                                                                                      spec.cooldownPeriod:

The period to wait after the last trigger reported active before scaling the resource back to 0. The default value is 5 minutes (300 seconds).

                                                                                                                                                                                                                                                                                                                                                                      spec.idleReplicaCount:

                                                                                                                                                                                                                                                                                                                                                                      Enabling this property allows KEDA to scale the resource down to the specified number of replicas. If some activity exists on the target triggers, KEDA will scale the target resource immediately to the value of minReplicaCount and scaling is handed over to HPA. When there is no activity, the target resource is again scaled down to the value specified by idleReplicaCount. This setting must be less than minReplicaCount.

                                                                                                                                                                                                                                                                                                                                                                      spec.fallback:

This property allows you to define the number of replicas to fall back to if consecutive connection errors occur with the Prometheus endpoint of your Sysdig account.

• spec.fallback.failureThreshold: The number of consecutive errors after which the fallback is applied.
• spec.fallback.replicas: The number of replicas to apply when the fallback is triggered.

                                                                                                                                                                                                                                                                                                                                                                      spec.advanced.horizontalPodAutoscalerConfig.behavior:

                                                                                                                                                                                                                                                                                                                                                                      This property allows you to define the behavior of the Kubernetes HPA Object. See the Kubernetes documentation for more information.
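The options above could be combined as in the following minimal sketch, which builds a ScaledObject manifest as a Python dictionary and prints it as JSON. The resource names, replica counts, fallback values, and scale-down policy are illustrative assumptions, and the trigger definitions are intentionally left out, so the manifest is not complete as written. The helper validate_scaled_object is hypothetical and only checks the two constraints stated in this section.

```python
import json


def validate_scaled_object(manifest: dict) -> None:
    """Check the constraints stated in this section (hypothetical helper, not part of KEDA)."""
    spec = manifest["spec"]
    if spec.get("pollingInterval", 30) < 10:
        raise ValueError("pollingInterval must be at least 10 seconds")
    idle = spec.get("idleReplicaCount")
    if idle is not None and idle >= spec.get("minReplicaCount", 0):
        raise ValueError("idleReplicaCount must be less than minReplicaCount")


scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "example-scaledobject"},  # hypothetical name
    "spec": {
        "scaleTargetRef": {"name": "example-deployment"},  # hypothetical workload
        "pollingInterval": 30,   # seconds (the default described above)
        "cooldownPeriod": 300,   # seconds (the default described above)
        "idleReplicaCount": 0,   # must be less than minReplicaCount
        "minReplicaCount": 1,
        "maxReplicaCount": 10,
        "fallback": {"failureThreshold": 3, "replicas": 2},
        "advanced": {
            "horizontalPodAutoscalerConfig": {
                "behavior": {
                    "scaleDown": {
                        "stabilizationWindowSeconds": 300,
                        "policies": [
                            {"type": "Percent", "value": 50, "periodSeconds": 60}
                        ],
                    }
                }
            }
        },
        "triggers": [],  # trigger definitions omitted; add your own before applying
    },
}

validate_scaled_object(scaled_object)
print(json.dumps(scaled_object, indent=2))
```

Because Kubernetes accepts manifests in JSON as well as YAML, the printed output can serve as a starting point once the triggers are filled in.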


                                                                                                                                                                                                                                                                                                                                                                      8.4.3 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Recording Rules

                                                                                                                                                                                                                                                                                                                                                                      Sysdig now supports Prometheus recording rules for metric aggregation and querying.

You can configure recording rules by using the Sysdig API. Ensure that you define them in a Prometheus-compatible way. The mandatory parameters are:

                                                                                                                                                                                                                                                                                                                                                                      • record: The unique name of the time series. It must be a valid metric name.

                                                                                                                                                                                                                                                                                                                                                                      • expr: The PromQL expression to evaluate. In each evaluation cycle, the given expression is evaluated and the result is recorded as a new set of time series with the metric name specified in record.

                                                                                                                                                                                                                                                                                                                                                                      • labels: The unique identifiers to add or overwrite before storing the result.

                                                                                                                                                                                                                                                                                                                                                                      To enable this feature in your environment, contact Sysdig Support.
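As an illustration of the three parameters, the sketch below assembles a single rule and checks that the record value is a valid Prometheus metric name. The metric name, PromQL expression, and label are hypothetical, and make_recording_rule is an illustrative helper; how the rule is submitted to the Sysdig API is not shown here.

```python
import re

# Prometheus metric names may contain letters, digits, underscores, and colons,
# and must not start with a digit.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def make_recording_rule(record: str, expr: str, labels: dict) -> dict:
    """Build a rule with the three parameters listed above (hypothetical helper)."""
    if not METRIC_NAME.fullmatch(record):
        raise ValueError(f"invalid metric name: {record!r}")
    if not expr.strip():
        raise ValueError("expr must not be empty")
    return {"record": record, "expr": expr, "labels": dict(labels)}


rule = make_recording_rule(
    record="namespace:http_requests:rate5m",                    # hypothetical series name
    expr="sum by (namespace) (rate(http_requests_total[5m]))",  # hypothetical PromQL
    labels={"team": "example"},                                 # hypothetical label
)
print(rule)
```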

                                                                                                                                                                                                                                                                                                                                                                      8.4.4 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Sysdig with Grafana

Sysdig enables Grafana users to query metrics from Sysdig and visualize them in Grafana dashboards. To integrate Sysdig with Grafana, configure a data source. Two types of data sources are supported:

                                                                                                                                                                                                                                                                                                                                                                      • Prometheus

  The Prometheus data source ships with Grafana and is natively compatible with PromQL. Sysdig provides a Prometheus-compatible API to achieve API-only integration with Grafana.

                                                                                                                                                                                                                                                                                                                                                                      • Sysdig

  The Sysdig data source requires additional settings and is better suited to the simple “form-based” data configuration. It uses the Sysdig native API instead of the Prometheus API. See Sysdig Grafana datasource for more information.

                                                                                                                                                                                                                                                                                                                                                                      Using the Prometheus API on Grafana v6.7 and Above

Use the Sysdig Prometheus API to set up the datasource for Grafana. Before Grafana can consume Sysdig metrics, Grafana must authenticate itself to Sysdig. To do so, set up HTTP authentication by using the Sysdig API Token, because no UI support is currently available on Grafana.

1. If you are not already running Grafana, start a Grafana container as follows:

                                                                                                                                                                                                                                                                                                                                                                        $ docker run --rm -p 3000:3000 --name grafana grafana/grafana
                                                                                                                                                                                                                                                                                                                                                                        
2. Log in to Grafana as administrator and create a new datasource by using the following information:

                                                                                                                                                                                                                                                                                                                                                                        • URL: https://<Monitor URL for Your Region>/prometheus

                                                                                                                                                                                                                                                                                                                                                                          See SaaS Regions and IP Ranges and identify the correct URLs associated with your Sysdig application and region.

                                                                                                                                                                                                                                                                                                                                                                        • Authentication: Do not select any authentication mechanisms.

                                                                                                                                                                                                                                                                                                                                                                        • Access: Server (default)

                                                                                                                                                                                                                                                                                                                                                                        • Custom HTTP Headers:

    • Header: Enter the word Authorization

    • Value: Enter the word Bearer, followed by a space and <Your Sysdig API Token>

      The API Token is available through Settings > User Profile > Sysdig Monitor API.
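To see what the custom header amounts to, the following sketch builds the kind of authenticated request that Grafana would send on your behalf. The host monitor.example.com is a placeholder for your region's Monitor URL, and build_query_request is an illustrative helper. The /api/v1/query path is the standard Prometheus HTTP API query path; that the Sysdig endpoint serves it under the /prometheus URL is an assumption based on the Prometheus-compatible API described above. The request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_query_request(base_url: str, api_token: str, promql: str) -> urllib.request.Request:
    """Build an instant-query request that carries the Bearer token header (hypothetical helper)."""
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})


request = build_query_request(
    "https://monitor.example.com/prometheus",  # placeholder for <Monitor URL for Your Region>
    "your-Sysdig-API-token",                   # placeholder token
    "up",
)
print(request.full_url)
print(request.get_header("Authorization"))
# To actually send the request: urllib.request.urlopen(request)
```

Sending a request like this from a terminal can help confirm that the URL and token are valid before you configure the datasource in Grafana.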

                                                                                                                                                                                                                                                                                                                                                                      Using the Grafana API on Grafana v6.6 and Below

                                                                                                                                                                                                                                                                                                                                                                      The feature requires Grafana v5.3.0 or above.

                                                                                                                                                                                                                                                                                                                                                                      You use the Grafana API to set up the Sysdig datasource.

                                                                                                                                                                                                                                                                                                                                                                      1. Download and run Grafana in a container.

                                                                                                                                                                                                                                                                                                                                                                        docker run --rm -p 3000:3000 --name grafana grafana/grafana
                                                                                                                                                                                                                                                                                                                                                                        
2. Create a JSON file named grafana-stg-ds.json with the following content.

                                                                                                                                                                                                                                                                                                                                                                        cat grafana-stg-ds.json
                                                                                                                                                                                                                                                                                                                                                                        {
                                                                                                                                                                                                                                                                                                                                                                            "name": "Sysdig staging PromQL",
                                                                                                                                                                                                                                                                                                                                                                            "orgId": 1,
                                                                                                                                                                                                                                                                                                                                                                            "type": "prometheus",
                                                                                                                                                                                                                                                                                                                                                                            "access": "proxy",
                                                                                                                                                                                                                                                                                                                                                                            "url": "https://app-staging.sysdigcloud.com/prometheus",
                                                                                                                                                                                                                                                                                                                                                                            "basicAuth": false,
                                                                                                                                                                                                                                                                                                                                                                            "withCredentials": false,
                                                                                                                                                                                                                                                                                                                                                                            "isDefault": false,
                                                                                                                                                                                                                                                                                                                                                                            "editable": true,
                                                                                                                                                                                                                                                                                                                                                                            "jsonData": {
                                                                                                                                                                                                                                                                                                                                                                                "httpHeaderName1": "Authorization",
                                                                                                                                                                                                                                                                                                                                                                                "tlsSkipVerify": true
                                                                                                                                                                                                                                                                                                                                                                            },
                                                                                                                                                                                                                                                                                                                                                                            "secureJsonData": {
                                                                                                                                                                                                                                                                                                                                                                                "httpHeaderValue1": "Bearer your-Sysdig-API-token"
                                                                                                                                                                                                                                                                                                                                                                            }
                                                                                                                                                                                                                                                                                                                                                                        }
                                                                                                                                                                                                                                                                                                                                                                        
3. Get your Sysdig API Token and insert it into the JSON file above.

  "httpHeaderValue1": "Bearer your-Sysdig-API-token"
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      4. Add the datasource to Grafana.

                                                                                                                                                                                                                                                                                                                                                                        curl -u admin:admin -H "Content-Type: application/json" http://localhost:3000/api/datasources -XPOST -d @grafana-stg-ds.json
                                                                                                                                                                                                                                                                                                                                                                        
5. Open Grafana in a browser.

                                                                                                                                                                                                                                                                                                                                                                        http://localhost:3000
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      6. Use the default credentials, admin: admin, to sign in to Grafana.

                                                                                                                                                                                                                                                                                                                                                                      7. Open the Data Source tab under Configuration on Grafana and confirm that the one you have added is listed on the page.
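The same check can be made from the command line. The following is a minimal sketch, assuming Grafana is reachable at http://localhost:3000 with the default admin credentials used in the steps above; the helper name datasource_listed is hypothetical, and the match is a plain text search of the JSON returned by the /api/datasources endpoint rather than a full JSON parse.

```shell
# Return success if a data source with the given name appears in Grafana's list.
# Assumes the default admin:admin credentials and port 3000 from the steps above.
# Usage: datasource_listed <name>
datasource_listed() {
  curl -s -u admin:admin http://localhost:3000/api/datasources |
    grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example (the data source name is a placeholder):
#   datasource_listed "<datasource-name>" && echo "listed" || echo "not found"
```

Names that contain regular-expression characters would need escaping before being passed to this helper.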

                                                                                                                                                                                                                                                                                                                                                                      8.5 -

                                                                                                                                                                                                                                                                                                                                                                      Troubleshoot Monitoring Integrations

                                                                                                                                                                                                                                                                                                                                                                      Review the common troubleshooting scenarios you might encounter while getting a Monitor integration working and see what you can do if an integration does not report metics after installation.

                                                                                                                                                                                                                                                                                                                                                                      Check Prerequisites

                                                                                                                                                                                                                                                                                                                                                                      Some integrations require secrets and other resources available in the correct namespace in order for it to work. Integrations such as database exporters might require you to create a user and provide with special permissions in the database to be able to connect with the endpoint and generate metrics.

                                                                                                                                                                                                                                                                                                                                                                      Ensure that the prerequisites of the integration are met before proceeding with installation.

                                                                                                                                                                                                                                                                                                                                                                      Verify Exporter Is Running

                                                                                                                                                                                                                                                                                                                                                                      If the integration is an exporter, ensure that the pods corresponding to the exporter are running correctly. You can check this after installing the integration. If the exporter is installed as a sidecar of the application (such as Nginx), verify that the exporter container is added to the pod.

                                                                                                                                                                                                                                                                                                                                                                      You can check the status of the pods with the Kubernetes dashboard Pods Status and Performance or with the following command:

                                                                                                                                                                                                                                                                                                                                                                      kubectl get pods --namespace=<namespace>
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Additionally, if the container has problems and cannot start, check the description of the pod for error messages:

                                                                                                                                                                                                                                                                                                                                                                      kubectl describe pod <pod-name> --namespace=<namespace>
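In a namespace with many pods, it can help to filter the listing down to the pods that need attention. The sketch below is illustrative only: the helper name unhealthy_pods is hypothetical, and it assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...). It prints pods whose status is neither Running nor Completed, and pods that are Running with fewer ready containers than expected, which is how a failing sidecar exporter typically shows up.

```shell
# Print pods that are not fully ready, reading `kubectl get pods` output on stdin.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")                      # READY looks like "1/2"
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]))
      print $1, $2, $3
  }'
}

# Example (namespace is a placeholder):
#   kubectl get pods --namespace=<namespace> | unhealthy_pods
```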
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Verify Metrics Are Generated

                                                                                                                                                                                                                                                                                                                                                                      Check whether a running exporter is generating metrics by accessing the metrics endpoint:

                                                                                                                                                                                                                                                                                                                                                                      kubectl port-forward <pod-name> <pod-port> <local-port> --namespace=<namespace>
                                                                                                                                                                                                                                                                                                                                                                      curl http://localhost:<local-port>/metrics
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      This is also valid for applications that don’t need an exporter to generate their own metrics.
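An endpoint can respond successfully and still expose no samples, so it can be useful to count them. The following is a minimal sketch, assuming the endpoint returns the Prometheus text exposition format, where lines starting with # are HELP or TYPE comments; the helper name count_samples is hypothetical.

```shell
# Count sample lines (non-comment, non-blank) in Prometheus text output read from stdin.
count_samples() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Example (port is a placeholder, with the port-forward above still running):
#   curl -s http://localhost:<local-port>/metrics | count_samples
```

A result of 0 suggests the exporter is up but cannot collect anything from the application.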

                                                                                                                                                                                                                                                                                                                                                                      If the exporter is not generating metics, there could be problems accessing or authenticating with the application. Check the logs associated with the pods:

                                                                                                                                                                                                                                                                                                                                                                      kubectl logs <pod-name> --namespace=<namespace>
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      If the application is instrumented and is not generating metrics, check if the Prometheus metrics option or the module is activated.

                                                                                                                                                                                                                                                                                                                                                                      Verify Sysdig Agent Is Scraping Metrics

                                                                                                                                                                                                                                                                                                                                                                      If an application doesn’t need an exporter to generate metrics, check if it has the default Prometheus annotations.

                                                                                                                                                                                                                                                                                                                                                                      Additionally, you can check if the Sysdig agent can access the metrics endpoint. To do so, use the following command:

                                                                                                                                                                                                                                                                                                                                                                      kubectl exec <sysdig-agent-pod-name> --namespace=sysdig-agent -- /bin/sh -c "curl http://<exporer-pod-ip>:<pod-port>/metrics"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Select the Sysdig Agent pod in the same node than the pod used to scrape.
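Finding the agent pod on the right node can be scripted. The sketch below is one possible approach, not an official procedure: the helper name agent_pod_for is hypothetical, and it assumes the agent runs in the sysdig-agent namespace used in the command above. If other pods in that namespace run on the same node, more than one name may be printed.

```shell
# Print the name of the pod(s) in the sysdig-agent namespace that are scheduled
# on the same node as the given pod.
# Usage: agent_pod_for <pod-name> <namespace>
agent_pod_for() (
  # Look up the node that hosts the pod being scraped.
  node=$(kubectl get pod "$1" --namespace="$2" -o jsonpath='{.spec.nodeName}')
  # List pods in the agent namespace that run on that node.
  kubectl get pods --namespace=sysdig-agent \
    --field-selector "spec.nodeName=$node" \
    -o jsonpath='{.items[*].metadata.name}'
)

# Example (names are placeholders):
#   agent_pod_for <exporter-pod-name> <namespace>
```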

                                                                                                                                                                                                                                                                                                                                                                      8.5.1 -

                                                                                                                                                                                                                                                                                                                                                                      Monitor Log Files

                                                                                                                                                                                                                                                                                                                                                                      You can search for particular strings within a given log file, and create a metric that is displayed in Sysdig Monitor’s Explore page. The metrics appear under the StatsD section:

                                                                                                                                                                                                                                                                                                                                                                      Sysdig provides this functionality via a “chisel” script called “logwatcher”, written in Lua. You call the script by adding a logwatcher parameter in the chisels section of the agent configuration file (dragent.yaml). You define the log file name and the precise string to be searched. The results are displayed as metrics in the Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Caveats

                                                                                                                                                                                                                                                                                                                                                                      The logwatcher chisel adds to Sysdig’s monitoring capability but is not a fully featured log monitor. Note the following limitations:

                                                                                                                                                                                                                                                                                                                                                                      • No regex support: Sysdig does not offer regex support; you must define the precise log file and string to be searched.

                                                                                                                                                                                                                                                                                                                                                                        (If you were to supply a string with spaces, forward-slashes, or back-slashes in it, the metric generated would also have these characters and so could not be used to create an alert.)

                                                                                                                                                                                                                                                                                                                                                                      • Limit of 12 string searches/host: Logwatcher is implemented as a LUA script and, due to resources consumed by this chisel, it is not recommended to have more than a dozen string searches configured per agent/host.

                                                                                                                                                                                                                                                                                                                                                                      Implementation

                                                                                                                                                                                                                                                                                                                                                                      Edit the agent configuration file to enable the logwatcher chisel. See Understanding the Agent Config Files for editing options.

                                                                                                                                                                                                                                                                                                                                                                      Preparation

                                                                                                                                                                                                                                                                                                                                                                      Determine the log file name(s) and string(s) you want to monitor.

                                                                                                                                                                                                                                                                                                                                                                      To monitor the output of docker logs <container-name>, find the container’s docker log file with:

                                                                                                                                                                                                                                                                                                                                                                      docker inspect <container-name> | grep LogPath

                                                                                                                                                                                                                                                                                                                                                                      Edit dragent.yaml

                                                                                                                                                                                                                                                                                                                                                                      1. Access dragent.yaml directly at /opt/draios/etc/dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      2. Add a chisels entry:

                                                                                                                                                                                                                                                                                                                                                                        Format:

                                                                                                                                                                                                                                                                                                                                                                        chisels:
                                                                                                                                                                                                                                                                                                                                                                          - name: logwatcher
                                                                                                                                                                                                                                                                                                                                                                            args:
                                                                                                                                                                                                                                                                                                                                                                              filespattern: YOURFILENAME.log
                                                                                                                                                                                                                                                                                                                                                                              term: YOURSTRING
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Sample Entry:

                                                                                                                                                                                                                                                                                                                                                                        customerid: 831f2-your-key-here-d69401
                                                                                                                                                                                                                                                                                                                                                                        tags: tagname.tagvalue
                                                                                                                                                                                                                                                                                                                                                                        chisels:
                                                                                                                                                                                                                                                                                                                                                                          - name: logwatcher
                                                                                                                                                                                                                                                                                                                                                                            args:
                                                                                                                                                                                                                                                                                                                                                                              filespattern: draios.log
                                                                                                                                                                                                                                                                                                                                                                              term: Sent
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        In this example, Sysdig’s own draios.log is searched for the Sent string.

                                                                                                                                                                                                                                                                                                                                                                        The output, in the Sysdig Monitor UI, would show the StatsD metric logwatcher.draios_log.Sent and the number of ‘Sent’ items detected.

                                                                                                                                                                                                                                                                                                                                                                      3. Optional: Add multiple -name: sections in the config file to search for additional logs/strings.

                                                                                                                                                                                                                                                                                                                                                                        Note the recommended 12-string/agent limit.

                                                                                                                                                                                                                                                                                                                                                                      4. Restart the agent for changes to take effect.

                                                                                                                                                                                                                                                                                                                                                                        For container agent:

                                                                                                                                                                                                                                                                                                                                                                        docker restart sysdig-agent
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        For non-containerized (service) agent:

                                                                                                                                                                                                                                                                                                                                                                        service dragent restart
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Parameters

Name          Value             Description
name          logwatcher        The chisel used in the enterprise Sysdig platform to search log files. (Other chisels are available in Sysdig's open-source product.)
filespattern  YOURFILENAME.log  The log file to be searched. Do not specify a path with the file name.
term          YOURSTRING        The string to be searched.
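
As an illustration of the parameters above, the following is a minimal sketch of an agent config fragment with two logwatcher entries. Only name, filespattern, and term come from the parameter list; the enclosing chisels: and args: keys are assumptions here, so check them against the entry you created earlier in this procedure. The file name and search strings are hypothetical.

```yaml
# Sketch of a dragent.yaml fragment. The chisels: and args: keys are
# assumed, not taken from the parameter list above.
chisels:
  - name: logwatcher             # the chisel that searches log files
    args:
      filespattern: app.log      # hypothetical file name, given without a path
      term: ERROR                # hypothetical string to search for
  - name: logwatcher             # a second entry adds another search string
    args:
      filespattern: app.log
      term: timeout
```

Each file and string pair should then appear as its own logwatcher metric once the agent has been restarted.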

                                                                                                                                                                                                                                                                                                                                                                      View Log File Metrics in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      To view logwatcher results:

                                                                                                                                                                                                                                                                                                                                                                      1. Log in to Sysdig Monitor and select Explore.

                                                                                                                                                                                                                                                                                                                                                                      2. Select Entire Infrastructure > Overview by Host.

                                                                                                                                                                                                                                                                                                                                                                      3. In the resulting drop-down, either scroll to Metrics > StatsD > logwatcher or enter “logwatcher” in the search field.

                                                                                                                                                                                                                                                                                                                                                                        Each string you configured in the agent config file will be listed in the format logwatcher.YOURFILENAME_log.STRING.

                                                                                                                                                                                                                                                                                                                                                                      4. The relevant metrics are displayed.

                                                                                                                                                                                                                                                                                                                                                                      You can also Add an Alert on logwatcher metrics, to be notified when an important log entry appears.

                                                                                                                                                                                                                                                                                                                                                                      8.6 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Integrations for Sysdig Monitor

                                                                                                                                                                                                                                                                                                                                                                      Integrate metrics with Sysdig Monitor from a number of platforms, orchestrators, and a wide range of applications. Sysdig collects metrics from Prometheus, JMX, StatsD, Kubernetes, and many application stacks to provide a 360-degree view of your infrastructure. Many metrics are collected by default out of the box; you can also extend the integration or create custom metrics.

                                                                                                                                                                                                                                                                                                                                                                      Key Benefits

                                                                                                                                                                                                                                                                                                                                                                      • Collects the richest data set for cloud-native visibility and security

• Polls data and auto-discovers context to provide operational and security insights

• Extends the power of Prometheus metrics with additional insights from other metric types and the infrastructure stack

• Integrates Prometheus alerts and events for Kubernetes monitoring needs

• Exposes application metrics using Java JMX and MBeans monitoring

                                                                                                                                                                                                                                                                                                                                                                      Key Integrations

                                                                                                                                                                                                                                                                                                                                                                      Inbound

                                                                                                                                                                                                                                                                                                                                                                      • Prometheus Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how Sysdig Agent enables automatically collecting metrics from Prometheus exporters, how to set up your environment, and scrape Prometheus metrics from local as well as remote hosts.

• Java Management Extensions (JMX) Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how to configure your Java virtual machines so Sysdig Agent can collect JMX metrics using the JMX protocol.

                                                                                                                                                                                                                                                                                                                                                                      • StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                        Describes how the Sysdig agent collects custom StatsD metrics with an embedded StatsD server.

• Node.js Metrics

  Illustrates how Sysdig can monitor Node.js applications by linking a library to the Node.js codebase.

                                                                                                                                                                                                                                                                                                                                                                      • Integrate Applications

                                                                                                                                                                                                                                                                                                                                                                        Describes the monitoring capabilities of Sysdig agent with application check scripts or ‘app checks’.

                                                                                                                                                                                                                                                                                                                                                                      • Monitor Log Files

                                                                                                                                                                                                                                                                                                                                                                        Learn how to search a string by using the chisel script called logwatcher.

                                                                                                                                                                                                                                                                                                                                                                      • AWS CloudWatch

                                                                                                                                                                                                                                                                                                                                                                        Illustrates how to configure Sysdig to collect various types of CloudWatch metrics.

                                                                                                                                                                                                                                                                                                                                                                      • Agent Installation

                                                                                                                                                                                                                                                                                                                                                                        Learn how to install Sysdig agents on supported platforms.

Outbound

                                                                                                                                                                                                                                                                                                                                                                      • Notification Channels

                                                                                                                                                                                                                                                                                                                                                                        Learn how to add, edit, or delete a variety of notification channel types, and how to disable or delete notifications when they are not needed, for example, during scheduled downtime.

                                                                                                                                                                                                                                                                                                                                                                      • S3 Capture Storage

                                                                                                                                                                                                                                                                                                                                                                        Learn how to configure Sysdig to use an AWS S3 bucket or custom S3 storage for storing Capture files.

                                                                                                                                                                                                                                                                                                                                                                      Platform Metrics (IBM)

                                                                                                                                                                                                                                                                                                                                                                      For Sysdig instances deployed on IBM Cloud Monitoring with Sysdig, an additional form of metrics collection is offered: Platform metrics. Rather than being collected by the Sysdig agent, when enabled, Platform metrics are reported to Sysdig directly by the IBM Cloud infrastructure.

Enable this feature by logging in to the IBM Cloud console and selecting “Enable” for IBM Platform metrics under the Configure your resource section when creating a new IBM Cloud Monitoring with Sysdig instance, as described here.

                                                                                                                                                                                                                                                                                                                                                                      8.6.1 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Collect Prometheus Metrics

Sysdig supports collecting, storing, and querying Prometheus native metrics and labels. You can use Sysdig in the same way that you use Prometheus and leverage Prometheus Query Language (PromQL) to create dashboards and alerts. Sysdig is compatible with the Prometheus HTTP API, so you can query your monitoring data programmatically using PromQL and extend Sysdig to other platforms such as Grafana.

From a metric collection standpoint, a lightweight Prometheus server is embedded directly into the Sysdig agent. It supports targets, instances, and jobs, with filtering and relabeling using Prometheus syntax. You can configure the agent to identify the processes that expose Prometheus metrics endpoints on its own host and to send the collected metrics to the Sysdig collector for storage and further processing.
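
To make "filtering and relabeling using Prometheus syntax" concrete, the following is a minimal sketch in standard Prometheus configuration syntax. The job name, target address, and regular expressions are hypothetical, and where such a block is placed in a Sysdig agent deployment is not covered by this sketch.

```yaml
# Standard Prometheus scrape configuration syntax (illustrative values only).
scrape_configs:
  - job_name: example-app                 # hypothetical job name
    static_configs:
      - targets: ["localhost:8080"]       # hypothetical target on the agent's own host
    relabel_configs:
      # Filtering: keep only targets whose address matches the pattern.
      - source_labels: [__address__]
        regex: "localhost:.*"
        action: keep
    metric_relabel_configs:
      # Drop scraped series whose metric name matches the pattern,
      # which reduces the number of time series sent on.
      - source_labels: [__name__]
        regex: "go_gc_.*"
        action: drop
```

Here relabel_configs acts on targets before the scrape, while metric_relabel_configs acts on the scraped series afterwards.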

                                                                                                                                                                                                                                                                                                                                                                      This document uses metric and time series interchangeably. The description of configuration parameters refers to “metric”, but in strict Prometheus terms, those imply time series. That is, applying a limit of 100 metrics implies applying a limit on time series, where all the time series data might not have the same metric name.

                                                                                                                                                                                                                                                                                                                                                                      The Prometheus product itself does not necessarily have to be installed for Prometheus metrics collection.

                                                                                                                                                                                                                                                                                                                                                                      See the Sysdig agent versions and compatibility with Prometheus features:

                                                                                                                                                                                                                                                                                                                                                                      • Latest versions of agent (v12.0.0 and above): The following features are enabled by default:

                                                                                                                                                                                                                                                                                                                                                                        • Automatically scraping any Kubernetes pods with the following annotation set: prometheus.io/scrape=true
                                                                                                                                                                                                                                                                                                                                                                        • Automatically scrape applications supported by Monitoring Integrations.
                                                                                                                                                                                                                                                                                                                                                                      • Sysdig agent prior to v12.0.0: Manually enable Prometheus in dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                          prometheus:
                                                                                                                                                                                                                                                                                                                                                                               enabled: true
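
For agents v12.0.0 and above, the annotation mentioned earlier is set on the pod itself rather than in the agent configuration. The following is a minimal sketch of a pod manifest that carries it; the pod name, image, and port are hypothetical.

```yaml
# Kubernetes pod manifest with the scrape annotation (illustrative values only).
apiVersion: v1
kind: Pod
metadata:
  name: example-app                   # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"      # annotation values must be strings, hence the quotes
spec:
  containers:
    - name: example-app
      image: example/app:1.0          # hypothetical image that exposes Prometheus metrics
      ports:
        - containerPort: 8080         # hypothetical metrics port
```

For workloads managed by a Deployment or similar controller, the annotation belongs in the pod template metadata, not on the controller object.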
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Learn More

                                                                                                                                                                                                                                                                                                                                                                      The following topics describe in detail how to configure the Sysdig agent for service discovery, metrics collection, and further processing.

See the following blog posts for additional context on Prometheus metrics and how they are typically used.

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.1 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Working with Prometheus Metrics

The Sysdig agent uses its visibility into all running processes (at both the host and container levels) to find eligible targets for scraping Prometheus metrics. By default, no scraping is attempted. Once the feature is enabled, the agent assembles a list of eligible targets, applies filtering rules, and sends the scraped metrics back to the Sysdig collector.

                                                                                                                                                                                                                                                                                                                                                                      Latest Prometheus Features

Sysdig agent v12.0 or above is required for the following capabilities:

Sysdig agent v10.0 or above is required for the following capabilities:

                                                                                                                                                                                                                                                                                                                                                                      • New capabilities of using Prometheus data:

                                                                                                                                                                                                                                                                                                                                                                        • Ability to visualize data using PromQL queries. See Using PromQL.

                                                                                                                                                                                                                                                                                                                                                                        • Create alerts from PromQL-based Dashboards. See Create Panel Alerts.

                                                                                                                                                                                                                                                                                                                                                                        • Backward compatibility for dashboards v2 and alerts.

                                                                                                                                                                                                                                                                                                                                                                          The new PromQL data cannot be visualized by using the Dashboard v2 Histogram. Use time-series based visualization for the histogram metrics.

                                                                                                                                                                                                                                                                                                                                                                      • New metrics limit per agent

                                                                                                                                                                                                                                                                                                                                                                      • 10-second data granularity

                                                                                                                                                                                                                                                                                                                                                                      • Higher retention rate on the new metric store.

                                                                                                                                                                                                                                                                                                                                                                      Prerequisites and Guidelines

• Sysdig agent v10.0.0 or above is required for the latest Prometheus features.

• The Prometheus feature is enabled in the dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                        prometheus:
                                                                                                                                                                                                                                                                                                                                                                          enabled: true
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        See Setting up the Environment for more information.

• The endpoints of the target must be reachable by the agent over a TCP connection. The agent scrapes a target, remote or local, specified by IP:port or by URL in dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Service Discovery

                                                                                                                                                                                                                                                                                                                                                                      To use native Prometheus service discovery, enable Promscrape V2 as described in Enable Prometheus Native Service Discovery. This section covers the Sysdig way of service discovery that involves configuring process filters in the Sysdig agent.

Service discovery in the Sysdig agent works differently from service discovery in the Prometheus server. The Prometheus server has built-in integration with several service discovery mechanisms and reads its configuration settings from the prometheus.yml file. The Sysdig agent instead auto-discovers any process (exporter or instrumented) that matches the specifications in the dragent.yaml file and instructs the embedded lightweight Prometheus server to retrieve the metrics from it.

                                                                                                                                                                                                                                                                                                                                                                      The lightweight Prometheus server in the agent is named promscrape and is controlled by the flag of the same name in the dragent.yaml file. See Configuring Sysdig Agent for more information.

Unlike the Prometheus server, which can scrape processes running on all the machines in a cluster, the agent can scrape only those processes that are running on the host where it is installed.

Within the set of eligible processes/ports/endpoints, the agent scrapes only the ports that are exporting Prometheus metrics, and it stops attempting to scrape or retry on ports based on how they respond to attempts to connect and scrape them. It is therefore strongly recommended that you create a configuration that restricts the processes and ports for attempted scraping to the minimum expected range for your exporters. This minimizes the potential for unintended side effects in both the agent and your applications due to repeated failed connection attempts.

The end-to-end metric collection can be summarized as follows:

                                                                                                                                                                                                                                                                                                                                                                      1. A process is determined to be eligible for possible scraping if it positively matches against a series of Process Filter include/exclude rules. See Process Filter for more information.

2. The agent then attempts to scrape an eligible process at a /metrics endpoint on all of its listening TCP ports, unless additional configuration is present to restrict scraping to a subset of ports and/or another endpoint name.

3. Upon receiving the metrics, the agent applies the configured filtering rules before sending them to the Sysdig collector.

                                                                                                                                                                                                                                                                                                                                                                      The metrics ultimately appear in the Sysdig Monitor Explore interface in the Prometheus section.
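The first two steps above can be sketched as a dragent.yaml fragment. This is a minimal illustration, not the agent's shipped default: it assumes that rules under process_filter are evaluated top to bottom, that a conf block restricts the port and endpoint path, and that the label-based keys shown are accepted by your agent version. Treat the Process Filter section as the authoritative reference for the rule syntax.

```yaml
prometheus:
  enabled: true
  process_filter:
    # Step 1: decide which processes are eligible.
    # Assumption: rules are evaluated in order and a process that
    # matches no rule is not scraped.
    - exclude:
        process.name: docker-proxy
    - include:
        container.label.io.prometheus.scrape: "true"
        # Step 2: restrict scraping to one port and one endpoint
        # instead of trying /metrics on every listening TCP port.
        conf:
          port: "{container.label.io.prometheus.port}"
          path: "{container.label.io.prometheus.path}"
```

Keeping the include rules this narrow follows the recommendation above to limit attempted scraping to the ports your exporters actually use.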

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.2 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Set Up the Environment

                                                                                                                                                                                                                                                                                                                                                                      Quick Start For Kubernetes Environments

                                                                                                                                                                                                                                                                                                                                                                      Prometheus users who are already leveraging Kubernetes Service Discovery (specifically the approach in this sample prometheus-kubernetes.yml) may already have Annotations attached to the Pods that mark them as eligible for scraping. Such environments can quickly begin scraping the same metrics using the Sysdig Agent in a couple of easy steps.

                                                                                                                                                                                                                                                                                                                                                                      1. Enable the Prometheus metrics feature in the Sysdig Agent. Assuming you are deploying using DaemonSets, the needed config can be added to the Agent’s dragent.yaml by including the following in your DaemonSet YAML (placing it in the env section for the sysdig-agent container):

                                                                                                                                                                                                                                                                                                                                                                        - name: ADDITIONAL_CONF
                                                                                                                                                                                                                                                                                                                                                                          value: "prometheus:\n  enabled: true"
                                                                                                                                                                                                                                                                                                                                                                        
2. Ensure the Kubernetes Pods that contain your Prometheus exporters have been deployed with the following Annotations to enable scraping (substituting the exporter's listening TCP port for exporter-TCP-port):

                                                                                                                                                                                                                                                                                                                                                                        spec:
                                                                                                                                                                                                                                                                                                                                                                          template:
                                                                                                                                                                                                                                                                                                                                                                            metadata:
                                                                                                                                                                                                                                                                                                                                                                              annotations:
                                                                                                                                                                                                                                                                                                                                                                                prometheus.io/scrape: "true"
                                                                                                                                                                                                                                                                                                                                                                                prometheus.io/port: "exporter-TCP-port"
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        The configuration above assumes your exporters use the typical endpoint called /metrics. If an exporter is using a different endpoint, this can also be specified by adding the following additional optional Annotation, substituting the exporter-endpoint-name:

                                                                                                                                                                                                                                                                                                                                                                        prometheus.io/path: "/exporter-endpoint-name"
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      If you try this Kubernetes Deployment of a simple exporter, you will quickly see auto-discovered Prometheus metrics being displayed in Sysdig Monitor. You can use this working example as a basis to similarly Annotate your own exporters.
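For orientation, a complete Deployment manifest carrying these Annotations might look like the following sketch. It is not the linked sample: the names, the image, and port 9100 are hypothetical placeholders that you would replace with the values for your own exporter.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-exporter            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-exporter
  template:
    metadata:
      labels:
        app: example-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"       # hypothetical exporter port
        prometheus.io/path: "/metrics"   # optional; /metrics is assumed if omitted
    spec:
      containers:
        - name: exporter
          image: registry.example.com/example-exporter:1.0   # placeholder image
          ports:
            - containerPort: 9100
```

The annotation values are quoted because Kubernetes requires annotation values to be strings.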

                                                                                                                                                                                                                                                                                                                                                                      If you have Prometheus exporters not deployed in annotated Kubernetes Pods that you would like to scrape, the following sections describe the full set of options to configure the Agent to find and scrape your metrics.

                                                                                                                                                                                                                                                                                                                                                                      Quick Start for Container Environments

For Prometheus scraping to work in a Docker-based container environment, set the following labels on the application containers, substituting <exporter-port> and <exporter-path> with the port and path where your application exports metrics:

                                                                                                                                                                                                                                                                                                                                                                      • io.prometheus.scrape=true

                                                                                                                                                                                                                                                                                                                                                                      • io.prometheus.port=<exporter-port>

                                                                                                                                                                                                                                                                                                                                                                      • io.prometheus.path=<exporter-path>

                                                                                                                                                                                                                                                                                                                                                                      For example, if mysqld-exporter is to be scraped, spin up the container as follows:

docker run -d -l io.prometheus.scrape=true -l io.prometheus.port=9104 -l io.prometheus.path=/metrics mysqld-exporter
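If you start containers with Docker Compose instead of the docker CLI, the same three labels can be attached in the service definition. The following is a sketch under that assumption; the service name is arbitrary and the image name is taken from the example above.

```yaml
services:
  mysqld-exporter:
    image: mysqld-exporter
    labels:
      # Same labels as the CLI example; values are quoted so they stay strings.
      io.prometheus.scrape: "true"
      io.prometheus.port: "9104"
      io.prometheus.path: "/metrics"
```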
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.3 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Configuring Sysdig Agent

                                                                                                                                                                                                                                                                                                                                                                      This feature is not supported with Promscrape V2. For information on different versions of Promscrape and migrating to the latest version, see Migrating from Promscrape V1 to V2.

As is typical for the agent, the default configuration for the feature is specified in dragent.default.yaml, and you can override the defaults by configuring parameters in dragent.yaml. For each parameter that you do not set in dragent.yaml, the default in dragent.default.yaml remains in effect.
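As an illustration of this override behavior, a dragent.yaml that changes only two settings might look like the sketch below. The interval parameter and its value are assumptions made for the example; check the parameter tables for the names and defaults that apply to your agent version.

```yaml
# dragent.yaml: list only the parameters you want to override.
prometheus:
  enabled: true    # scraping is off by default
  interval: 30     # assumed parameter: seconds between scrapes
# Every parameter not listed here keeps its value from dragent.default.yaml.
```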

                                                                                                                                                                                                                                                                                                                                                                      Main Configuration Parameters

                                                                                                                                                                                                                                                                                                                                                                      Parameter

                                                                                                                                                                                                                                                                                                                                                                      Default

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      prometheus

                                                                                                                                                                                                                                                                                                                                                                      See below

                                                                                                                                                                                                                                                                                                                                                                      Turns Prometheus scraping on and off.

                                                                                                                                                                                                                                                                                                                                                                      process_filter

                                                                                                                                                                                                                                                                                                                                                                      See below

                                                                                                                                                                                                                                                                                                                                                                      Specifies which processes may be eligible for scraping. See [Process Filter](/en/docs/sysdig-monitor/monitoring-integrations/legacy-integrations/legacycollect-prometheus-metrics/configuring-sysdig-agent/#process-filter).

                                                                                                                                                                                                                                                                                                                                                                      use_promscrape

                                                                                                                                                                                                                                                                                                                                                                      See below.

                                                                                                                                                                                                                                                                                                                                                                      Determines whether to use promscrape for scraping Prometheus metrics.

                                                                                                                                                                                                                                                                                                                                                                      promscrape

Promscrape is a lightweight Prometheus server embedded in the Sysdig agent. The use_promscrape parameter controls whether the agent uses it to scrape Prometheus endpoints.

Promscrape comes in two versions, V1 and V2. With V1, the Sysdig agent discovers scrape targets through the process_filter rules. With V2, promscrape discovers targets itself from a standard Prometheus configuration, which allows relabel_configs to be used to find or modify targets.

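| Parameter | Default | Description |
| --- | --- | --- |
| use_promscrape | true | Determines whether to use promscrape for scraping Prometheus metrics. |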

                                                                                                                                                                                                                                                                                                                                                                      prometheus

The prometheus section defines the behavior related to Prometheus metrics collection and analysis. It lets you turn the feature on, set an agent-side limit on the number of metrics to be scraped, and choose whether to report histogram metrics and log failed scrape attempts.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | false | Turns Prometheus scraping on and off. |
| interval | 10 | How often, in seconds, the agent scrapes a port for Prometheus metrics. |
| prom_service_discovery | true | Enables native Prometheus service discovery. If disabled, promscrape.v1 is used to scrape the targets. See Enable Prometheus Native Service Discovery. On agent versions prior to 11.2, the default is false. |
| max_metrics | 1000 | The maximum total number of Prometheus metrics scraped across all targets. The value of 1000 is the per-agent maximum, and it is a separate limit from other Custom Metrics such as StatsD, JMX, and App Checks. |
| timeout | 1 | How long, in seconds, the agent waits while scraping a Prometheus endpoint before timing out. As of agent v10.0, this parameter is used only when promscrape is disabled. Because promscrape is now the default, timeout can be considered deprecated, but it still applies when you explicitly disable promscrape. |
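
The settings described above could be combined in dragent.yaml roughly as follows. This is a minimal sketch, not a definitive layout: it assumes use_promscrape sits at the top level of the file and the remaining keys nest under prometheus, so check dragent.default.yaml for the layout your agent version expects. All values are the documented defaults except enabled.

```yaml
# Sketch only: key placement is an assumption; values are the documented defaults
use_promscrape: true            # scrape with the embedded promscrape

prometheus:
  enabled: true                 # default is false; turns scraping on
  interval: 10                  # seconds between scrapes
  prom_service_discovery: true  # false falls back to promscrape.v1
  max_metrics: 1000             # per-agent total across all targets
  timeout: 1                    # seconds; only used when promscrape is disabled
```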

                                                                                                                                                                                                                                                                                                                                                                      Process Filter

                                                                                                                                                                                                                                                                                                                                                                      The process_filter section specifies which of the processes known by an agent may be eligible for scraping.

Note that a process_filter specified in your dragent.yaml replaces the entire Prometheus process_filter section (that is, all of the rules) shown in dragent.default.yaml.

The Process Filter is specified as a series of include and exclude rules that are evaluated top to bottom for each process known to the agent. If a process matches an include rule, the agent attempts to scrape a /metrics endpoint on each listening TCP port of that process, unless the rule also contains a conf section that further restricts how the process is scraped. See conf for more information.

                                                                                                                                                                                                                                                                                                                                                                      Multiple patterns can be specified in a single rule, in which case all patterns must match for the rule to be a match (AND logic).

Within a pattern value, simple “glob” wildcarding may be used, where * matches any number of characters (including none) and ? matches any single character. Because of YAML syntax, enclose any value that uses wildcards in quotes ("*").
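
The rule semantics described above might look like this in dragent.yaml. This is a sketch under stated assumptions: process_filter is assumed to nest under prometheus, the pattern names process.name and container.image are taken from the table of supported patterns further down, and the values docker-proxy and "*/prometheus-*" are hypothetical placeholders chosen for illustration.

```yaml
prometheus:
  enabled: true
  process_filter:
    # Rules are evaluated top to bottom for each process
    - exclude:
        process.name: docker-proxy          # hypothetical process to skip
    - include:
        # Both patterns must match for this rule to match (AND logic)
        process.name: java
        container.image: "*/prometheus-*"   # wildcard value quoted for YAML
```

Keeping the quotes around the wildcard value matters because an unquoted value that begins with * is read by YAML as an alias rather than a string.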

                                                                                                                                                                                                                                                                                                                                                                      The table below describes the supported patterns in Process Filter rules. To provide realistic examples, we’ll use a simple sample Prometheus exporter (source code here) which can be deployed as a container using the Docker command line below. To help illustrate some of the configuration options, this sample exporter presents Prometheus metrics on /prometheus instead of the more common /metrics endpoint, which will be shown in the example configurations further below.

                                                                                                                                                                                                                                                                                                                                                                      # docker run -d -p 8080:8080 \
                                                                                                                                                                                                                                                                                                                                                                          --label class="exporter" \
                                                                                                                                                                                                                                                                                                                                                                          --name my-java-app \
                                                                                                                                                                                                                                                                                                                                                                          luca3m/prometheus-java-app
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      # ps auxww | grep app.jar
                                                                                                                                                                                                                                                                                                                                                                      root     11502 95.9  9.2 3745724 753632 ?      Ssl  15:52   1:42 java -jar /app.jar --management.security.enabled=false
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      # curl http://localhost:8080/prometheus
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      random_bucket{le="0.005",} 6.0
                                                                                                                                                                                                                                                                                                                                                                      random_bucket{le="0.01",} 17.0
                                                                                                                                                                                                                                                                                                                                                                      random_bucket{le="0.025",} 51.0
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Pattern name

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      container.image

                                                                                                                                                                                                                                                                                                                                                                      Matches if the process is running inside a container running the specified image

- include:
    container.image: luca3m/prometheus-java-app

                                                                                                                                                                                                                                                                                                                                                                      container.name

                                                                                                                                                                                                                                                                                                                                                                      Matches if the process is running inside a container with the specified name

- include:
    container.name: my-java-app

                                                                                                                                                                                                                                                                                                                                                                      container.label.*

                                                                                                                                                                                                                                                                                                                                                                      Matches if the process is running in a container that has a Label matching the given value

- include:
    container.label.class: exporter

kubernetes.<object>.annotation.*, kubernetes.<object>.label.*

                                                                                                                                                                                                                                                                                                                                                                      Matches if the process is attached to a Kubernetes object (Pod, Namespace, etc.) that is marked with the Annotation/Label matching the given value.

                                                                                                                                                                                                                                                                                                                                                                      Note: This pattern does not apply to the Docker-only command-line shown above, but would instead apply if the exporter were installed as a Kubernetes Deployment using this example YAML.

                                                                                                                                                                                                                                                                                                                                                                      Note: See Kubernetes Objects, below, for information on the full set of supported Annotations and Labels.

- include:
    kubernetes.pod.annotation.prometheus.io/scrape: true

                                                                                                                                                                                                                                                                                                                                                                      process.name

Matches the name of the running process.

- include:
    process.name: java

                                                                                                                                                                                                                                                                                                                                                                      process.cmdline

Matches a command-line argument.

- include:
    process.cmdline: "*app.jar*"

                                                                                                                                                                                                                                                                                                                                                                      port

                                                                                                                                                                                                                                                                                                                                                                      Matches if the process is listening on one or more TCP ports.

The pattern for a single rule can specify a single port, as shown in this example, or a single range (e.g., 8079-8081), but does not support comma-separated lists of ports or ranges.

Note: This parameter is used only to confirm whether a process is eligible for scraping, based on the ports on which it is listening. For example, suppose a process listens on one port for application traffic and has a second port open for exporting Prometheus metrics. You can specify the application port here (but not the exporting port) and the exporting port in the conf section (but not the application port). The process is then matched as eligible, and the exporting port is scraped.

- include:
    port: 8080
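The split between eligibility and scraping described in the note above can be sketched as follows. This is a minimal sketch rather than a verified configuration: the port numbers 8080 (application traffic) and 9090 (metrics) are hypothetical, and the enclosing prometheus key is an assumption about where process_filter sits in the Agent configuration file.

```yaml
prometheus:                # assumed top-level key
  process_filter:
    - include:
        # Eligibility check only: the process must be listening
        # on the application port (hypothetical value).
        port: 8080
        conf:
          # Scrape target: the port on which metrics are exported
          # (hypothetical value).
          port: 9090
```

With a rule like this, a process that listens on 8080 is matched as eligible, and scraping is attempted on 9090 rather than on the application port.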

                                                                                                                                                                                                                                                                                                                                                                      appcheck.match

                                                                                                                                                                                                                                                                                                                                                                      Matches if an Application Check with the specific name or pattern is scheduled to run for the process.

- exclude:
    appcheck.match: "*"

Each of the **`include`** examples shown above would have matched our process on its own. Because multiple patterns can be combined in a single rule, as described previously, the following very strict configuration would also have matched:
                                                                                                                                                                                                                                                                                                                                                                      - include:
                                                                                                                                                                                                                                                                                                                                                                          container.image: luca3m/prometheus-java-app
                                                                                                                                                                                                                                                                                                                                                                          container.name: my-java-app
                                                                                                                                                                                                                                                                                                                                                                          container.label.class: exporter
                                                                                                                                                                                                                                                                                                                                                                          process.name: java
                                                                                                                                                                                                                                                                                                                                                                          process.cmdline: "*app.jar*"
                                                                                                                                                                                                                                                                                                                                                                          port: 8080
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      conf

Each include rule in the process_filter may include a conf portion that further describes how scraping will be attempted on the eligible process. If a conf portion is not included, scraping will be attempted at a /metrics endpoint on all listening ports of the matching process. The possible settings:

                                                                                                                                                                                                                                                                                                                                                                      Parameter name

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      port

                                                                                                                                                                                                                                                                                                                                                                      Either a static number for a single TCP port to be scraped, or a container/Kubernetes Label name or Kubernetes Annotation specified in curly braces. If the process is running in a container that is marked with this Label or is attached to a Kubernetes object (Pod, Namespace, etc.) that is marked with this Annotation/Label, scraping will be attempted only on the port specified as the value of the Label/Annotation.

                                                                                                                                                                                                                                                                                                                                                                      Note: The Label/Annotation to match against will not include the text shown in red.

Note: See Kubernetes Objects for information on the full set of supported Annotations and Labels.

                                                                                                                                                                                                                                                                                                                                                                      Note: If running the exporter inside a container, this should specify the port number that the exporter process in the container is listening on, not the port that the container exposes to the host.

                                                                                                                                                                                                                                                                                                                                                                      port: 8080

                                                                                                                                                                                                                                                                                                                                                                      - or -

                                                                                                                                                                                                                                                                                                                                                                      port: "{container.label.io.prometheus.port}"

                                                                                                                                                                                                                                                                                                                                                                      - or -

                                                                                                                                                                                                                                                                                                                                                                      port: "{kubernetes.pod.annotation.prometheus.io/port}"

                                                                                                                                                                                                                                                                                                                                                                      port_filter

                                                                                                                                                                                                                                                                                                                                                                      A set of include and exclude rules that define the ultimate set of listening TCP ports for an eligible process on which scraping may be attempted. Note that the syntax is different from the port pattern option from within the higher-level include rule in the process_filter. Here a given rule can include single ports, comma-separated lists of ports (enclosed in square brackets), or contiguous port ranges (without brackets).

port_filter:
  - include: 8080
  - exclude: [9092,9200,9300]
  - include: 9090-9100
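To show where port_filter sits relative to the process-level patterns, here is a minimal sketch that places the rules from the example above inside the conf portion of an include rule. The enclosing prometheus key and the process.name pattern are assumptions chosen for illustration, and the sketch does not try to show how overlapping include and exclude rules are resolved.

```yaml
prometheus:                # assumed top-level key
  process_filter:
    - include:
        # Process-level pattern: decides whether the process is eligible.
        process.name: java
        conf:
          # Port-level rules: decide which listening ports may be scraped.
          port_filter:
            - include: 8080                # single port
            - exclude: [9092,9200,9300]    # bracketed, comma-separated list
            - include: 9090-9100           # contiguous range, no brackets
```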

                                                                                                                                                                                                                                                                                                                                                                      path

                                                                                                                                                                                                                                                                                                                                                                      Either the static specification of an endpoint to be scraped, or a container/Kubernetes Label name or Kubernetes Annotation specified in curly braces. If the process is running in a container that is marked with this Label or is attached to a Kubernetes object (Pod, Namespace, etc.) that is marked with this Annotation/Label, scraping will be attempted via the endpoint specified as the value of the Label/Annotation.

                                                                                                                                                                                                                                                                                                                                                                      If path is not specified, or specified but the Agent does not find the Label/Annotation attached to the process, the common Prometheus exporter default of /metrics will be used.

                                                                                                                                                                                                                                                                                                                                                                      Note: A Label/Annotation to match against will not include the text shown in red.

                                                                                                                                                                                                                                                                                                                                                                      Note: See Kubernetes Objects for information on the full set of supported Annotations and Labels.

                                                                                                                                                                                                                                                                                                                                                                      path: "/prometheus"

                                                                                                                                                                                                                                                                                                                                                                      - or -

                                                                                                                                                                                                                                                                                                                                                                      path: "{container.label.io.prometheus.path}"

                                                                                                                                                                                                                                                                                                                                                                      - or -

                                                                                                                                                                                                                                                                                                                                                                      path: "{kubernetes.pod.annotation.prometheus.io/path}"
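The curly-brace form lets the workload itself declare where it should be scraped. A minimal sketch, assuming the exporter runs in a Kubernetes Pod and that the enclosing prometheus key is where process_filter lives in the Agent configuration:

```yaml
prometheus:                # assumed top-level key
  process_filter:
    - include:
        kubernetes.pod.annotation.prometheus.io/scrape: true
        conf:
          # Both values are read from the Pod's annotations at scrape time.
          port: "{kubernetes.pod.annotation.prometheus.io/port}"
          path: "{kubernetes.pod.annotation.prometheus.io/path}"
```

On the Pod side, the annotations would presumably carry only the key after the kubernetes.pod.annotation. prefix. The values below are hypothetical:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"          # hypothetical metrics port
    prometheus.io/path: "/prometheus"   # hypothetical endpoint
```

If the path annotation is missing from the Pod, the text above indicates that the Agent falls back to /metrics.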

                                                                                                                                                                                                                                                                                                                                                                      host

                                                                                                                                                                                                                                                                                                                                                                      A hostname or IP address. The default is localhost.

                                                                                                                                                                                                                                                                                                                                                                      host: 192.168.1.101
                                                                                                                                                                                                                                                                                                                                                                      - or -
                                                                                                                                                                                                                                                                                                                                                                      host: subdomain.example.com
                                                                                                                                                                                                                                                                                                                                                                      - or -
                                                                                                                                                                                                                                                                                                                                                                      host: localhost

                                                                                                                                                                                                                                                                                                                                                                      use_https

                                                                                                                                                                                                                                                                                                                                                                      When set to true, connectivity to the exporter will only be attempted through HTTPS instead of HTTP. It is false by default.

                                                                                                                                                                                                                                                                                                                                                                      (Available in Agent version 0.79.0 and newer)

                                                                                                                                                                                                                                                                                                                                                                      use_https: true

                                                                                                                                                                                                                                                                                                                                                                      ssl_verify

                                                                                                                                                                                                                                                                                                                                                                      When set to true, verification will be performed for the server certificates for an HTTPS connection. It is false by default. Verification was enabled by default before 0.79.0.

                                                                                                                                                                                                                                                                                                                                                                      (Available in Agent version 0.79.0 and newer)

                                                                                                                                                                                                                                                                                                                                                                      ssl_verify: true
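The two HTTPS settings are typically used together. A minimal sketch for an exporter that serves metrics over TLS, assuming Agent version 0.79.0 or newer; the port number, the process.name pattern, and the enclosing prometheus key are hypothetical choices for illustration:

```yaml
prometheus:                # assumed top-level key
  process_filter:
    - include:
        process.name: java
        conf:
          port: 8443         # hypothetical TLS port
          use_https: true    # connect via HTTPS only (default: false)
          ssl_verify: true   # verify the server certificate (default: false)
```

Leaving ssl_verify at its default of false may be appropriate for exporters that present self-signed certificates, at the cost of not validating the server's identity.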

                                                                                                                                                                                                                                                                                                                                                                      Authentication Integration

As of Agent version 0.89, Sysdig can collect Prometheus metrics from endpoints that require authentication. Use the parameters below to enable this function.

                                                                                                                                                                                                                                                                                                                                                                      • For username/password authentication:

                                                                                                                                                                                                                                                                                                                                                                        • username

                                                                                                                                                                                                                                                                                                                                                                        • password

                                                                                                                                                                                                                                                                                                                                                                      • For authentication using a token:

                                                                                                                                                                                                                                                                                                                                                                        • auth_token_path
                                                                                                                                                                                                                                                                                                                                                                      • For certificate authentication with a certificate key:

                                                                                                                                                                                                                                                                                                                                                                        • auth_cert_path

                                                                                                                                                                                                                                                                                                                                                                        • auth_key_path
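As a minimal sketch of the username/password case, the fragment below passes literal credentials in a conf block. The process name, port, and credential values are hypothetical placeholders chosen for illustration; only the username, password, port, and process.name keys come from this page.

```yaml
prometheus:
  enabled: true
  process_filter:
    - include:
        # Hypothetical process name; replace with the process you want to scrape.
        process.name: my-exporter
        conf:
          # Hypothetical port and credentials, shown only to illustrate the layout.
          port: 9100
          username: "metrics-reader"
          password: "example-password"
```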

                                                                                                                                                                                                                                                                                                                                                                      Token substitution is also supported for all the authorization parameters. For instance a username can be taken from a Kubernetes annotation by specifying

                                                                                                                                                                                                                                                                                                                                                                      username: "{kubernetes.service.annotation.prometheus.openshift.io/username}"

                                                                                                                                                                                                                                                                                                                                                                      conf Authentication Example

                                                                                                                                                                                                                                                                                                                                                                      Below is an example of the dragent.yaml section showing all the Prometheus authentication configuration options, on OpenShift, Kubernetes, and etcd.

                                                                                                                                                                                                                                                                                                                                                                      In this example:

                                                                                                                                                                                                                                                                                                                                                                      • The username/password are taken from a default annotation used by OpenShift.

                                                                                                                                                                                                                                                                                                                                                                      • The auth token path is commonly available in Kubernetes deployments.

                                                                                                                                                                                                                                                                                                                                                                      • The certificate and key used here for etcd may normally not be as easily accessible to the agent. In this case they were extracted from the host namespace, constructed into Kubernetes secrets, and then mounted into the agent container.

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        enabled: true
                                                                                                                                                                                                                                                                                                                                                                        process_filter:
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              port: 1936
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  username: "{kubernetes.service.annotation.prometheus.openshift.io/username}"
                                                                                                                                                                                                                                                                                                                                                                                  password: "{kubernetes.service.annotation.prometheus.openshift.io/password}"
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              process.name: kubelet
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  port: 10250
                                                                                                                                                                                                                                                                                                                                                                                  use_https: true
                                                                                                                                                                                                                                                                                                                                                                                  auth_token_path: "/run/secrets/kubernetes.io/serviceaccount/token"
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              process.name: etcd
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  port: 2379
                                                                                                                                                                                                                                                                                                                                                                                  use_https: true
                                                                                                                                                                                                                                                                                                                                                                                  auth_cert_path: "/run/secrets/etcd/client-cert"
                                                                                                                                                                                                                                                                                                                                                                                  auth_key_path: "/run/secrets/etcd/client-key"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Objects

                                                                                                                                                                                                                                                                                                                                                                      As described above, there are multiple configuration options that can be set based on auto-discovered values for Kubernetes Labels and/or Annotations. The format in each case begins with "kubernetes.OBJECT.annotation." or "kubernetes.OBJECT.label." where OBJECT can be any of the following supported Kubernetes object types:

                                                                                                                                                                                                                                                                                                                                                                      • daemonSet

                                                                                                                                                                                                                                                                                                                                                                      • deployment

                                                                                                                                                                                                                                                                                                                                                                      • namespace

                                                                                                                                                                                                                                                                                                                                                                      • node

                                                                                                                                                                                                                                                                                                                                                                      • pod

                                                                                                                                                                                                                                                                                                                                                                      • replicaSet

                                                                                                                                                                                                                                                                                                                                                                      • replicationController

                                                                                                                                                                                                                                                                                                                                                                      • service

                                                                                                                                                                                                                                                                                                                                                                      • statefulset

                                                                                                                                                                                                                                                                                                                                                                      The configuration text you add after the final dot becomes the name of the Kubernetes Label/Annotation that the Agent will look for. If the Label/Annotation is discovered attached to the process, the value of that Label/Annotation will be used for the configuration option.

                                                                                                                                                                                                                                                                                                                                                                      Note that there are multiple ways for a Kubernetes Label/Annotation to be attached to a particular process. One of the simplest examples of this is the Pod-based approach shown in Quick Start For Kubernetes Environments. However, as an example alternative to marking at the Pod level, you could attach Labels/Annotations at the Namespace level, in which case auto-discovered configuration options would apply to all processes running in that Namespace regardless of whether they’re in a Deployment, DaemonSet, ReplicaSet, etc.
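The Namespace-level alternative described above could be sketched as follows. This assumes the Namespace carries annotations named prometheus.io/scrape and prometheus.io/port; those annotation names are an assumption borrowed from common Prometheus convention, and any names can be used as long as the text after "kubernetes.namespace.annotation." matches them.

```yaml
prometheus:
  enabled: true
  process_filter:
    - include:
        # Matches processes in any Namespace annotated with prometheus.io/scrape: "true"
        # (assumed annotation name).
        kubernetes.namespace.annotation.prometheus.io/scrape: true
        conf:
          # The scrape port is read from a second, assumed annotation on the same Namespace.
          port: "{kubernetes.namespace.annotation.prometheus.io/port}"
```

With a rule like this, the same settings would apply to every process in the annotated Namespace, whichever Deployment, DaemonSet, or ReplicaSet it belongs to.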

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.4 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Filtering Prometheus Metrics

                                                                                                                                                                                                                                                                                                                                                                      As of Sysdig agent 9.8.0, a lightweight Prometheus server is embedded in agents named promscrape and a prometheus.yaml file is included as part of configuration files. Using the open source Prometheus capabilities, Sysdig leverages a Prometheus feature to allow you to filter Prometheus metrics at the source before ingestion. To do so, you will:

                                                                                                                                                                                                                                                                                                                                                                      • Ensure that the Prometheus scraping is enabled in the  dragent.yaml file.

                                                                                                                                                                                                                                                                                                                                                                        prometheus:
                                                                                                                                                                                                                                                                                                                                                                          enabled: true
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      • On agent v9.8.0 and above, enable the feature by setting the

                                                                                                                                                                                                                                                                                                                                                                        use_promscrape parameter to true in the dragent.yaml. See Enable Filtering at Ingestion.

                                                                                                                                                                                                                                                                                                                                                                      • Edit the configuration in the prometheus.yaml file. See Edit Prometheus Configuration File.

                                                                                                                                                                                                                                                                                                                                                                        Sysdig-specific configuration is found in the prometheus.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      Enable Filtering at Ingestion

                                                                                                                                                                                                                                                                                                                                                                      On agent v9.8.0, in order for target filtering to work, the use_promscrape parameter in the dragent.yaml must be set to true. For more information on configuration, see Configuring Sysdig Agent.

                                                                                                                                                                                                                                                                                                                                                                      use_promscrape: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      On agent v10.0, use_promscrape is enabled by default. Implies, promscrape is used for scraping Prometheus metrics.

                                                                                                                                                                                                                                                                                                                                                                      Filtering configuration is optional. The absence of prometheus.yaml  will not change the existing behavior of the agent.
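Taken together, the two dragent.yaml settings described in this section might look like the fragment below. This is a minimal sketch that assumes both keys sit at the top level of the same file, as the individual snippets on this page suggest. On agent v10.0 the second key should be redundant, because use_promscrape is enabled by default.

```yaml
# Enable Prometheus scraping in the agent.
prometheus:
  enabled: true

# Use the embedded promscrape server so that target filtering works
# (must be set explicitly on agent v9.8.0).
use_promscrape: true
```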

                                                                                                                                                                                                                                                                                                                                                                      Edit Prometheus Configuration File

                                                                                                                                                                                                                                                                                                                                                                      About the Prometheus Configuration File

                                                                                                                                                                                                                                                                                                                                                                      The prometheus.yaml file contains mostly the filtering/relabeling configuration in a list of key-value pairs, representing target process attributes.

                                                                                                                                                                                                                                                                                                                                                                      You replace keys and values with the desired tags corresponding to your environment.

                                                                                                                                                                                                                                                                                                                                                                      In this file, you will configure the following:

                                                                                                                                                                                                                                                                                                                                                                      • Default scrape interval (optional).

                                                                                                                                                                                                                                                                                                                                                                        For example:

                                                                                                                                                                                                                                                                                                                                                                        scrape_interval: 10s

                                                                                                                                                                                                                                                                                                                                                                      • Of the labeling parameters that Prometheus offers, Sysdig supports only metric_relabel_configs. The relabel_config parameter is not supported.

                                                                                                                                                                                                                                                                                                                                                                      • Zero or more process-specific filtering configurations (optional).

                                                                                                                                                                                                                                                                                                                                                                        See Kubernetes Environments and Docker Environments

                                                                                                                                                                                                                                                                                                                                                                        The filtering configuration includes:

                                                                                                                                                                                                                                                                                                                                                                        • Filtering rules

                                                                                                                                                                                                                                                                                                                                                                          For example:

                                                                                                                                                                                                                                                                                                                                                                          - source_labels: [container_label_io_kubernetes_pod_name]

                                                                                                                                                                                                                                                                                                                                                                        • Limit on number of scraped samples (optional)

                                                                                                                                                                                                                                                                                                                                                                          For example:

                                                                                                                                                                                                                                                                                                                                                                          sample_limit: 2000

                                                                                                                                                                                                                                                                                                                                                                      • Default filtering configuration (optional). The filtering configuration includes:

                                                                                                                                                                                                                                                                                                                                                                        • Filtering rules

                                                                                                                                                                                                                                                                                                                                                                          For example:

                                                                                                                                                                                                                                                                                                                                                                          - source_labels: [car]

                                                                                                                                                                                                                                                                                                                                                                        • Limit on number of scraped samples (optional)

                                                                                                                                                                                                                                                                                                                                                                          For example:

                                                                                                                                                                                                                                                                                                                                                                          sample_limit: 2000

                                                                                                                                                                                                                                                                                                                                                                      The prometheus.yaml file is installed alongside dragent.yaml. For the most part, the syntax of prometheus.yaml complies with the standard Prometheus configuration

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      A configuration with empty key-value pairs is considered a default configuration. The default configuration will be applied to all the processes to be scraped that don’t have a matching filtering configuration. In Sample Prometheus Configuration File, the job_name: 'default' section represents the default configuration.
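
A minimal sketch of a default job that also carries its own filtering rule and sample limit, as the list of configurable items allows. The `car` label comes from the example above; the regex value `sedan` is purely illustrative and not taken from the product documentation.

```yaml
scrape_configs:
- job_name: 'default'
  sysdig_sd_configs:        # left empty, so this job acts as the default
  sample_limit: 2000        # cap on the number of scraped samples
  metric_relabel_configs:
  # Drop all metrics for which the car label equals sedan (hypothetical value)
  - source_labels: [car]
    regex: "sedan"
    action: drop
```

Under these assumptions, any scraped process that does not match a process-specific job would be handled by this job.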

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Environments

                                                                                                                                                                                                                                                                                                                                                                      If the agent runs in Kubernetes environments (Open Source/OpenShift/GKE), include the following Kubernetes objects as key-value pairs. See Agent Install: Kubernetes for details on agent installation.

                                                                                                                                                                                                                                                                                                                                                                      For example:

                                                                                                                                                                                                                                                                                                                                                                      sysdig_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                      - tags:
                                                                                                                                                                                                                                                                                                                                                                          namespace: backend
                                                                                                                                                                                                                                                                                                                                                                          deployment: my-api
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      In addition to the aforementioned tags, any of these object types can be matched against:

                                                                                                                                                                                                                                                                                                                                                                      daemonset: my_daemon
                                                                                                                                                                                                                                                                                                                                                                      deployment: my_deployment
                                                                                                                                                                                                                                                                                                                                                                      hpa: my_hpa
                                                                                                                                                                                                                                                                                                                                                                      namespace: my_namespace
                                                                                                                                                                                                                                                                                                                                                                      node: my_node
                                                                                                                                                                                                                                                                                                                                                                      pod: my_pode
                                                                                                                                                                                                                                                                                                                                                                      replicaset: my_replica
                                                                                                                                                                                                                                                                                                                                                                      replicationcontroller: my_controller
                                                                                                                                                                                                                                                                                                                                                                      resourcequota: my_quota
                                                                                                                                                                                                                                                                                                                                                                      service: my_service
                                                                                                                                                                                                                                                                                                                                                                      stateful: my_statefulset
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      For Kubernetes/OpenShift/GKE deployments, prometheus.yaml shares the same ConfigMap with dragent.yaml.
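
As a rough sketch of what sharing a ConfigMap might look like, the fragment below places both files under one `data` section. The ConfigMap name, the namespace, and the exact key names are assumptions for illustration; check them against the manifests used in your agent installation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sysdig-agent          # assumed ConfigMap name
  namespace: sysdig-agent     # assumed namespace
data:
  dragent.yaml: |
    prometheus:
      enabled: true
  prometheus.yaml: |
    global:
      scrape_interval: 20s
    scrape_configs:
    - job_name: 'my-app-job'
      sysdig_sd_configs:      # apply this job only to my-app in backend
      - tags:
          namespace: backend
          deployment: my-app
```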

                                                                                                                                                                                                                                                                                                                                                                      Docker Environments

                                                                                                                                                                                                                                                                                                                                                                      In Docker environments, include attributes such as container, host, port, and more. For example:

                                                                                                                                                                                                                                                                                                                                                                      sysdig_sd_configs:
                                                                                                                                                                                                                                                                                                                                                                      - tags:
                                                                                                                                                                                                                                                                                                                                                                          host: my-host
                                                                                                                                                                                                                                                                                                                                                                          port: 8080
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      For Docker-based deployments, prometheus.yaml can be mounted from the host.
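
The Docker tags can be combined with the same filtering options used elsewhere in prometheus.yaml. The following is a sketch only: the job name `my-docker-job` is hypothetical, and the host and port values repeat the example above.

```yaml
scrape_configs:
- job_name: 'my-docker-job'   # hypothetical job name
  sample_limit: 2000
  sysdig_sd_configs:          # match processes on my-host listening on 8080
  - tags:
      host: my-host
      port: 8080
  metric_relabel_configs:
  # Drop all metrics starting with http_
  - source_labels: [__name__]
    regex: "http_(.+)"
    action: drop
```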

                                                                                                                                                                                                                                                                                                                                                                      Sample Prometheus Configuration File

                                                                                                                                                                                                                                                                                                                                                                      global:
                                                                                                                                                                                                                                                                                                                                                                        scrape_interval: 20s
                                                                                                                                                                                                                                                                                                                                                                      scrape_configs:
                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'default'
                                                                                                                                                                                                                                                                                                                                                                        sysdig_sd_configs: # default config
                                                                                                                                                                                                                                                                                                                                                                        relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                      - job_name: 'my-app-job'
                                                                                                                                                                                                                                                                                                                                                                        sample_limit: 2000
                                                                                                                                                                                                                                                                                                                                                                        sysdig_sd_configs:  # apply this filtering config only to my-app
                                                                                                                                                                                                                                                                                                                                                                        - tags:
                                                                                                                                                                                                                                                                                                                                                                            namespace: backend
                                                                                                                                                                                                                                                                                                                                                                            deployment: my-app
                                                                                                                                                                                                                                                                                                                                                                        metric_relabel_configs:
                                                                                                                                                                                                                                                                                                                                                                        # Drop all metrics starting with http_
                                                                                                                                                                                                                                                                                                                                                                        - source_labels: [__name__]
                                                                                                                                                                                                                                                                                                                                                                          regex: "http_(.+)"
                                                                                                                                                                                                                                                                                                                                                                          action: drop
                                                                                                                                                                                                                                                                                                                                                                        # Drop all metrics for which the city label equals atlantis
                                                                                                                                                                                                                                                                                                                                                                        - source_labels: [city]
                                                                                                                                                                                                                                                                                                                                                                          regex: "atlantis"
                                                                                                                                                                                                                                                                                                                                                                          action: drop
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.5 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Example Configuration

                                                                                                                                                                                                                                                                                                                                                                      This topic introduces you to default and specific Prometheus configurations.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      As an example that pulls together many of the configuration elements shown above, consider the default Agent configuration that’s inherited from the dragent.default.yaml.

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        enabled: true
                                                                                                                                                                                                                                                                                                                                                                        interval: 10
                                                                                                                                                                                                                                                                                                                                                                        log_errors: true
                                                                                                                                                                                                                                                                                                                                                                        max_metrics: 1000
                                                                                                                                                                                                                                                                                                                                                                        max_metrics_per_process: 100
                                                                                                                                                                                                                                                                                                                                                                        max_tags_per_metric: 20
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                        # Filtering processes to scan. Processes not matching a rule will not
                                                                                                                                                                                                                                                                                                                                                                        # be scanned
                                                                                                                                                                                                                                                                                                                                                                        # If an include rule doesn't contain a port or port_filter in the conf
                                                                                                                                                                                                                                                                                                                                                                        # section, we will scan all the ports that a matching process is listening to.
                                                                                                                                                                                                                                                                                                                                                                        process_filter:
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              process.name: docker-proxy
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              container.image: sysdig/agent
                                                                                                                                                                                                                                                                                                                                                                          # special rule to exclude processes matching configured prometheus appcheck
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              appcheck.match: prometheus
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              container.label.io.prometheus.scrape: "true"
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  # Custom path definition
                                                                                                                                                                                                                                                                                                                                                                                  # If the Label doesn't exist we'll still use "/metrics"
                                                                                                                                                                                                                                                                                                                                                                                  path: "{container.label.io.prometheus.path}"
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                  # Port definition
                                                                                                                                                                                                                                                                                                                                                                                  # - If the Label exists, only scan the given port.
                                                                                                                                                                                                                                                                                                                                                                                  # - If it doesn't, use port_filter instead.
                                                                                                                                                                                                                                                                                                                                                                                  # - If there is no port_filter defined, skip this process
                                                                                                                                                                                                                                                                                                                                                                                  port: "{container.label.io.prometheus.port}"
                                                                                                                                                                                                                                                                                                                                                                                  port_filter:
                                                                                                                                                                                                                                                                                                                                                                                      - exclude: [9092,9200,9300]
                                                                                                                                                                                                                                                                                                                                                                                      - include: 9090-9500
                                                                                                                                                                                                                                                                                                                                                                                      - include: [9913,9984,24231,42004]
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              container.label.io.prometheus.scrape: "false"
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              kubernetes.pod.annotation.prometheus.io/scrape: true
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  path: "{kubernetes.pod.annotation.prometheus.io/path}"
                                                                                                                                                                                                                                                                                                                                                                                  port: "{kubernetes.pod.annotation.prometheus.io/port}"
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              kubernetes.pod.annotation.prometheus.io/scrape: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Consider the following about this default configuration:

                                                                                                                                                                                                                                                                                                                                                                      • All Prometheus scraping is disabled by default. To enable the entire configuration shown here, you would only need to add the following to your dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                        prometheus:
                                                                                                                                                                                                                                                                                                                                                                          enabled: true
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Enabling this option and any pods (in case of Kubernetes) that have the right annotation set or containers (if not) that have the labels set will automatically be scrapped.

                                                                                                                                                                                                                                                                                                                                                                      • Once enabled, this default configuration is ideal for the use case described in the Quick Start For Kubernetes Environments.

                                                                                                                                                                                                                                                                                                                                                                      • A Process Filter rule excludes processes that are likely to exist in most environments but are known to never export Prometheus metrics, such as the Docker Proxy and the Agent itself.

                                                                                                                                                                                                                                                                                                                                                                      • Another Process Filter rule ensures that any processes configured to be scraped by the legacy Prometheus application check will not be scraped.

                                                                                                                                                                                                                                                                                                                                                                      • Another Process Filter rule is tailored to use container Labels. Processes marked with the container Label io.prometheus.scrape will become eligible for scraping, and if further marked with container Labels io.prometheus.port and/or io.prometheus.path, scraping will be attempted only on this port and/or endpoint. If the container is not marked with the specified path Label, scraping the /metrics endpoint will be attempted. If the container is not marked with the specified port Label, any listening ports in the port_filter will be attempted for scraping (this port_filter in the default is set for the range of ports for common Prometheus exporters, with exclusions for ports in the range that are known to be used by other applications that are not exporters).

                                                                                                                                                                                                                                                                                                                                                                      • The final Process Filter Include rule is tailored to the use case described in the Quick Start For Kubernetes Environments.
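
To illustrate the Kubernetes rule, here is a minimal sketch of a pod manifest carrying the annotations that the final include rule reads. The pod name, container name, image, port 9102, and path are hypothetical placeholders chosen for illustration; only the three annotation keys come from the default configuration above.

```yaml
# Hypothetical pod; names, image, and port are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-exporter
  annotations:
    # Matched by the include rule on kubernetes.pod.annotation.prometheus.io/scrape
    prometheus.io/scrape: "true"
    # Substituted into conf.port and conf.path of that rule
    prometheus.io/port: "9102"
    prometheus.io/path: "/metrics"
spec:
  containers:
    - name: exporter
      image: example.org/example-exporter:latest
      ports:
        - containerPort: 9102
```

Kubernetes annotation values are strings, which is why "true" and the port number are quoted in the manifest. Setting prometheus.io/scrape to "false" instead would match the final exclude rule.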

                                                                                                                                                                                                                                                                                                                                                                      Scrape a Single Custom Process

                                                                                                                                                                                                                                                                                                                                                                      If you need to scrape a single custom process, for instance, a java process listening on port 9000 with path /prometheus, add the following to the dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        enabled: true
                                                                                                                                                                                                                                                                                                                                                                        process_filter:
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              process.name: java
                                                                                                                                                                                                                                                                                                                                                                              port: 9000
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                # ensure we only scrape port 9000 as opposed to all ports this process may be listening to
                                                                                                                                                                                                                                                                                                                                                                                port: 9000
                                                                                                                                                                                                                                                                                                                                                                                path: "/prometheus"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      This configuration overrides the default process_filter section shown in Default Configuration. You can add relevant rules from the default configuration to this to further filter down the metrics.

                                                                                                                                                                                                                                                                                                                                                                      port has different purposes depending on where it’s placed in the configuration. When placed under the include section, it is a condition for matching the include rule.

                                                                                                                                                                                                                                                                                                                                                                      Placing a port under conf indicates that only that particular port is scraped when the rule is matched as opposed to all the ports that the process could be listening on.

                                                                                                                                                                                                                                                                                                                                                                      In this example, the first rule will be matched for the Java process listening on port 9000. The java process listening only on port 9000 will be scrapped.
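
To make the two roles of port concrete, the following sketch is a variant of the example above that keeps port as a match condition but omits it from conf. Its expected behavior is an inference from the comment in the default configuration (an include rule with no port or port_filter in conf scans all the ports a matching process is listening to), not a tested result.

```yaml
prometheus:
  enabled: true
  process_filter:
    - include:
        process.name: java
        # Match condition: only java processes listening on port 9000 match this rule
        port: 9000
        conf:
          # No port or port_filter here, so every port the matched process
          # listens on would be tried (assumption based on the default config comment)
          path: "/prometheus"
```

If the process also listens on ports that do not serve metrics, keeping port: 9000 under conf, as in the original example, avoids those extra scrape attempts.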

                                                                                                                                                                                                                                                                                                                                                                      Scrape a Single Custom Process Based on Container Labels

                                                                                                                                                                                                                                                                                                                                                                      If you still want to scrape based on container labels, you could just append the relevant rules from the defaults to the process_filter. For example:

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        enabled: true
                                                                                                                                                                                                                                                                                                                                                                        process_filter:
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              process.name: java
                                                                                                                                                                                                                                                                                                                                                                              port: 9000
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                # ensure we only scrape port 9000 as opposed to all ports this process may be listening to
                                                                                                                                                                                                                                                                                                                                                                                port: 9000
                                                                                                                                                                                                                                                                                                                                                                                path: "/prometheus"
                                                                                                                                                                                                                                                                                                                                                                          - exclude:
                                                                                                                                                                                                                                                                                                                                                                              process.name: docker-proxy
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              container.label.io.prometheus.scrape: "true"
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                  path: "{container.label.io.prometheus.path}"
                                                                                                                                                                                                                                                                                                                                                                                  port: "{container.label.io.prometheus.port}"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      port has a different meaning depending on where it’s placed in the configuration. When placed under the include section, it’s a condition for matching the include rule.

                                                                                                                                                                                                                                                                                                                                                                      Placing port under conf indicates that only that port is scraped when the rule is matched as opposed to all the ports that the process could be listening on.

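In this example, the first rule matches a java process that listens on port 9000, and only port 9000 of that process is scraped, at the path /prometheus. The second rule excludes docker-proxy processes. The third rule scrapes any container labeled io.prometheus.scrape: "true", taking the path and port from the container's io.prometheus.path and io.prometheus.port labels.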
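The two roles of port can be illustrated with a small model. The following Python sketch is not the Agent's implementation: the Process and Rule types, the ports_to_scrape helper, and the first-match-wins evaluation order are hypothetical names and assumptions introduced here only to mirror the behavior described above. It models the first two rules of the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    name: str
    listening_ports: list


@dataclass
class Rule:
    action: str                                # "include" or "exclude"
    process_name: Optional[str] = None         # match condition on process.name
    port: Optional[int] = None                 # match condition: process must listen on this port
    conf: dict = field(default_factory=dict)   # scrape settings applied after a match

    def matches(self, proc):
        if self.process_name is not None and proc.name != self.process_name:
            return False
        if self.port is not None and self.port not in proc.listening_ports:
            return False
        return True


def ports_to_scrape(rules, proc):
    """Return the ports scraped for proc (assumption: the first matching rule wins)."""
    for rule in rules:
        if rule.matches(proc):
            if rule.action == "exclude":
                return []
            # A port under conf narrows the scrape to that single port;
            # without it, every port the process listens on is a candidate.
            if "port" in rule.conf:
                return [rule.conf["port"]]
            return list(proc.listening_ports)
    return []


rules = [
    Rule("include", process_name="java", port=9000,
         conf={"port": 9000, "path": "/prometheus"}),
    Rule("exclude", process_name="docker-proxy"),
]

java = Process("java", [8080, 9000])
print(ports_to_scrape(rules, java))  # [9000]
```

In this model, removing port from conf would make the same java process a candidate for scraping on both 8080 and 9000, while removing port from the include condition would also match java processes that do not listen on 9000 at all.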

                                                                                                                                                                                                                                                                                                                                                                      Container Environment

                                                                                                                                                                                                                                                                                                                                                                      With this default configuration enabled, a containerized install of our example exporter shown below would be automatically scraped via the Agent.

                                                                                                                                                                                                                                                                                                                                                                      # docker run -d -p 8080:8080 \
                                                                                                                                                                                                                                                                                                                                                                          --label io.prometheus.scrape="true" \
                                                                                                                                                                                                                                                                                                                                                                          --label io.prometheus.port="8080" \
                                                                                                                                                                                                                                                                                                                                                                          --label io.prometheus.path="/prometheus" \
                                                                                                                                                                                                                                                                                                                                                                          luca3m/prometheus-java-app
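To make the link between these labels and the label-based rule concrete, the following Python sketch shows one way the {container.label.<name>} placeholders could be resolved against the labels set by the docker run command above. The resolve_conf helper and the regular expression are hypothetical and are not taken from the Agent; only the placeholder syntax and the label names come from this section.

```python
import re

# Matches placeholders such as {container.label.io.prometheus.path}
PLACEHOLDER = re.compile(r"\{container\.label\.([^}]+)\}")


def resolve_conf(conf, labels):
    """Replace {container.label.<name>} placeholders with the matching label values."""
    def substitute(match):
        # Raises KeyError if the container does not carry the referenced label
        return labels[match.group(1)]

    return {key: PLACEHOLDER.sub(substitute, str(value)) for key, value in conf.items()}


# Labels as set by the docker run example
labels = {
    "io.prometheus.scrape": "true",
    "io.prometheus.port": "8080",
    "io.prometheus.path": "/prometheus",
}

# conf section of the label-based include rule
conf = {
    "path": "{container.label.io.prometheus.path}",
    "port": "{container.label.io.prometheus.port}",
}

if labels.get("io.prometheus.scrape") == "true":
    print(resolve_conf(conf, labels))  # {'path': '/prometheus', 'port': '8080'}
```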
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes Environment

In a Kubernetes-based environment, a Deployment that carries the annotations shown in this example YAML is scraped when the default configuration is enabled.

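# Deployment is served from apps/v1; extensions/v1beta1 was removed in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-java-app
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: prometheus-java-app
  template:
    metadata:
      labels:
        app: prometheus-java-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/prometheus"
        prometheus.io/port: "8080"
    spec:
      containers:
        - name: prometheus-java-app
          image: luca3m/prometheus-java-app
          imagePullPolicy: Always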
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Non-Containerized Environment

                                                                                                                                                                                                                                                                                                                                                                      This is an example of a non-containerized environment or a containerized environment that doesn’t use Labels or Annotations. The following dragent.yaml would override the default and do per-second scrapes of our sample exporter and also a second exporter on port 5005, each at their respective non-standard endpoints. This can be thought of as a conservative “whitelist” type of configuration since it restricts scraping to only exporters that are known to exist in the environment and the ports on which they’re known to export Prometheus metrics.

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        enabled: true
                                                                                                                                                                                                                                                                                                                                                                        interval: 1
                                                                                                                                                                                                                                                                                                                                                                        process_filter:
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              process.cmdline: "*app.jar*"
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                port: 8080
                                                                                                                                                                                                                                                                                                                                                                                path: "/prometheus"
                                                                                                                                                                                                                                                                                                                                                                          - include:
                                                                                                                                                                                                                                                                                                                                                                              port: 5005
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                port: 5005
                                                                                                                                                                                                                                                                                                                                                                                path: "/wacko"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      port has a different meaning depending on where it’s placed in the configuration. When placed under the include section, it’s a condition for matching the include rule. Placing port under conf indicates that only that port is scraped when the rule is matched as opposed to all the ports that the process could be listening on.

In this example, the first rule matches any process whose command line matches *app.jar*. Only port 8080 of that process is scraped, rather than every port it may be listening on. The second rule matches any process listening on port 5005, and only port 5005 is scraped.
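As a rough illustration of the process.cmdline condition, the Python sketch below treats *app.jar* as a shell-style glob. This is an assumption: this section does not specify the Agent's exact matching rules, and the sample command lines are invented for the example.

```python
from fnmatch import fnmatchcase

pattern = "*app.jar*"

# Hypothetical command lines, for illustration only
cmdlines = [
    "java -jar /opt/demo/app.jar --server.port=8080",
    "java -jar /opt/other/service.jar",
]

for cmdline in cmdlines:
    # True for the first command line, False for the second
    print(fnmatchcase(cmdline, pattern), cmdline)
```

Because the pattern has a wildcard on both sides, it matches app.jar anywhere in the command line, so a more specific pattern may be preferable when several jars share a similar name.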

8.6.1.6 - (Legacy) Logging and Troubleshooting

                                                                                                                                                                                                                                                                                                                                                                      Logging

After the Agent begins scraping Prometheus metrics, it can take up to a few minutes for the metrics to become visible in Sysdig Monitor. To help you confirm quickly that your configuration is correct, Agent version 0.80.0 and later write the following line to the Agent log the first time after startup that the Agent finds and successfully scrapes at least one Prometheus exporter:

                                                                                                                                                                                                                                                                                                                                                                      2018-05-04 21:42:10.048, 8820, Information, 05-04 21:42:10.048324 Starting export of Prometheus metrics
                                                                                                                                                                                                                                                                                                                                                                      

Because this is an INFO-level message, it appears in Agents that use the default logging settings. For more detail, increase the Agent log level to DEBUG, which produces a message like the following that names the first metric detected. You can then check that this metric becomes visible in Sysdig Monitor shortly afterward.

                                                                                                                                                                                                                                                                                                                                                                      2018-05-04 21:50:46.068, 11212, Debug, 05-04 21:50:46.068141 First prometheus metrics since agent start: pid 9583: 5 metrics including: randomSummary.95percentile
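If you want to check for these two messages programmatically, a minimal Python sketch follows. The summarize helper and the regular expression are hypothetical and were written against the two sample lines above. The log file location is deliberately left out, so pass in lines read from wherever your Agent writes its log.

```python
import re

START_MARKER = "Starting export of Prometheus metrics"
FIRST_METRICS = re.compile(
    r"First prometheus metrics since agent start: "
    r"pid (\d+): (\d+) metrics including: (\S+)"
)


def summarize(lines):
    """Return (export_started, first_metrics) for an iterable of Agent log lines."""
    started = False
    first = None
    for line in lines:
        if START_MARKER in line:
            started = True
        match = FIRST_METRICS.search(line)
        if match and first is None:
            first = {
                "pid": int(match.group(1)),
                "count": int(match.group(2)),
                "metric": match.group(3),
            }
    return started, first


sample = [
    "2018-05-04 21:42:10.048, 8820, Information, 05-04 21:42:10.048324 "
    "Starting export of Prometheus metrics",
    "2018-05-04 21:50:46.068, 11212, Debug, 05-04 21:50:46.068141 "
    "First prometheus metrics since agent start: pid 9583: 5 metrics "
    "including: randomSummary.95percentile",
]

print(summarize(sample))
# (True, {'pid': 9583, 'count': 5, 'metric': 'randomSummary.95percentile'})
```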
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Troubleshooting

                                                                                                                                                                                                                                                                                                                                                                      See the previous section for information on expected log messages during successful scraping. If you have enabled Prometheus and are not seeing the Starting export message shown there, revisit your configuration.

Leave the log_errors configuration option at its default setting of log_errors: true, so that any issues with scraping eligible processes are reported in the Agent log.

                                                                                                                                                                                                                                                                                                                                                                      For example, here is an error message for a failed scrape of a TCP port that was listening but not accepting HTTP requests:

                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks[4987] Exception on running check prometheus.5000: Exception('Timeout when hitting http://localhost:5000/metrics',)
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, Traceback (most recent call last):
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, File "/opt/draios/lib/python/sdchecks.py", line 246, in run
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, self.check_instance.check(self.instance_conf)
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, File "/opt/draios/lib/python/checks.d/prometheus.py", line 44, in check
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, metrics = self.get_prometheus_metrics(query_url, timeout, "prometheus")
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.076, 4984, Error, sdchecks, File "/opt/draios/lib/python/checks.d/prometheus.py", line 105, in get_prometheus_metrics
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.077, 4984, Error, sdchecks, raise Exception("Timeout when hitting %s" % url)
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:00:12.077, 4984, Error, sdchecks, Exception: Timeout when hitting http://localhost:5000/metrics
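
When an Agent log contains many entries, it can help to pull out only the failed Prometheus checks. The following is a minimal sketch, not a Sysdig tool: the function name failed_prometheus_checks and the regular expression are illustrative, and the pattern assumes that the line layout and the prometheus.<port> check name match the sample error shown above.

```python
import re

# Assumed layout, taken from the sample log line above:
# "<timestamp>, <pid>, Error, sdchecks[<pid>] Exception on running check prometheus.<port>: <reason>"
CHECK_ERROR = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+), \d+, Error, "
    r"sdchecks\[\d+\] Exception on running check prometheus\.(?P<port>\d+): "
    r"(?P<reason>.*)$"
)


def failed_prometheus_checks(log_lines):
    """Return (timestamp, port, reason) for each failed Prometheus check."""
    failures = []
    for line in log_lines:
        match = CHECK_ERROR.match(line.strip())
        if match:
            failures.append(
                (match.group("time"), int(match.group("port")), match.group("reason"))
            )
    return failures


if __name__ == "__main__":
    sample = [
        "2017-10-13 22:00:12.076, 4984, Error, sdchecks[4987] Exception on running "
        "check prometheus.5000: Exception('Timeout when hitting "
        "http://localhost:5000/metrics',)",
        "2017-10-13 22:00:12.076, 4984, Error, sdchecks, Traceback (most recent call last):",
    ]
    for when, port, reason in failed_prometheus_checks(sample):
        print(when, port, reason)
```

Traceback lines are skipped because they do not carry the "Exception on running check" text, so each failed scrape is reported once.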
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Here is an example error message for a failed scrape of a port that was responding to HTTP requests on the /metrics endpoint but not responding with valid Prometheus-format data. The invalid endpoint is responding as follows:

                                                                                                                                                                                                                                                                                                                                                                      # curl http://localhost:5002/metrics
                                                                                                                                                                                                                                                                                                                                                                      This ain't no Prometheus metrics!
                                                                                                                                                                                                                                                                                                                                                                      

The corresponding error message in the Agent log indicates that no further scraping will be attempted after the initial failure:

                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:03:05.081, 5216, Information, sdchecks[5219] Skip retries for Prometheus error: could not convert string to float: ain't
                                                                                                                                                                                                                                                                                                                                                                      2017-10-13 22:03:05.082, 5216, Error, sdchecks[5219] Exception on running check prometheus.5002: could not convert string to float: ain't
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      8.6.1.7 -

                                                                                                                                                                                                                                                                                                                                                                      This feature is not supported with Promscrape V2. For information on different versions of Promscrape and migrating to the latest version, see Migrating from Promscrape V1 to V2.

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Collecting Prometheus Metrics from Remote Hosts

Sysdig Monitor can collect Prometheus metrics from remote endpoints with minimal configuration. Remote endpoints (remote hosts) are hosts where the Sysdig Agent cannot be deployed. An example is a Kubernetes master node on a managed Kubernetes service such as GKE or EKS, where user workloads cannot be deployed and therefore no Agent can run. To enable remote scraping for such hosts, identify an Agent to perform the scraping and declare the endpoint configurations in a remote services section of the Agent configuration file.

                                                                                                                                                                                                                                                                                                                                                                      The collected Prometheus metrics are reported under and associated with the Agent that performed the scraping as opposed to associating them with a process.

                                                                                                                                                                                                                                                                                                                                                                      Preparing the Configuration File

Multiple Agents can share the same configuration. Therefore, use the dragent.yaml file to determine which one of those Agents scrapes the remote endpoints. This is applicable to both

                                                                                                                                                                                                                                                                                                                                                                      • Create a separate configuration section for remote services in the Agent configuration file under the prometheus configuration.

                                                                                                                                                                                                                                                                                                                                                                      • Include a configuration section for each remote endpoint, and add either a URL or host/port (and an optional path) parameter to each section to identify the endpoint to scrape. The optional path identifies the resource at the endpoint. An empty path parameter defaults to the "/metrics" endpoint for scraping.

                                                                                                                                                                                                                                                                                                                                                                      • Optionally, add custom tags for each endpoint configuration for remote services. In the absence of tags, metric reporting might not work as expected when multiple endpoints are involved. Agents cannot distinguish similar metrics scraped from multiple endpoints unless those metrics are uniquely identified by tags.
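
The endpoint rules in the list above (either a URL, or a host and port with an optional path that defaults to "/metrics") can be sketched as follows. This is an illustration of the rules only, not the Agent's implementation: the function name endpoint_url is hypothetical, the use_https key is taken from the example configuration in this section, and falling back to plain HTTP when use_https is absent is an assumption.

```python
def endpoint_url(conf):
    """Resolve the scrape URL for one remote-service conf mapping."""
    if conf.get("url"):
        return conf["url"]  # an explicit URL is used as-is
    if "host" not in conf or "port" not in conf:
        raise ValueError("conf needs either 'url' or both 'host' and 'port'")
    # Assumption: plain HTTP unless use_https is set
    scheme = "https" if conf.get("use_https") else "http"
    # An empty or missing path falls back to the /metrics endpoint
    path = conf.get("path") or "/metrics"
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{conf['host']}:{conf['port']}{path}"


if __name__ == "__main__":
    print(endpoint_url({"host": "mymasternode", "port": 1936, "path": ""}))
    print(endpoint_url({"host": "example.test", "port": 5005, "use_https": True}))
```

Resolving each entry this way before deploying a configuration makes it easier to spot two entries that point at the same endpoint without distinguishing tags.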

                                                                                                                                                                                                                                                                                                                                                                      To help you get started, an example configuration for Kubernetes is given below:

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        remote_services:
                                                                                                                                                                                                                                                                                                                                                                              - prom_1:
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.sysdig.com/region: europe
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.sysdig.com/scraper: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      url: "https://xx.xxx.xxx.xy:5005/metrics"
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          host: xx.xxx.xxx.xy
                                                                                                                                                                                                                                                                                                                                                                                          service: prom_1
                                                                                                                                                                                                                                                                                                                                                                                          scraping_node: "{kubernetes.node.name}"
                                                                                                                                                                                                                                                                                                                                                                              - prom_2:
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.sysdig.com/region: india
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.sysdig.com/scraper: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      host: xx.xxx.xxx.yx
                                                                                                                                                                                                                                                                                                                                                                                      port: 5005
                                                                                                                                                                                                                                                                                                                                                                                      use_https: true
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          host: xx.xxx.xxx.yx
                                                                                                                                                                                                                                                                                                                                                                                          service: prom_2
                                                                                                                                                                                                                                                                                                                                                                                          scraping_node: "{kubernetes.node.name}"
                                                                                                                                                                                                                                                                                                                                                                              - prom_3:
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.pod.annotation.sysdig.com/prom_3_scraper: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      url: "{kubernetes.pod.annotation.sysdig.com/prom_3_url}"
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          service: prom_3
                                                                                                                                                                                                                                                                                                                                                                                          scraping_node: "{kubernetes.node.name}"
                                                                                                                                                                                                                                                                                                                                                                              - haproxy:
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.yourhost.com/haproxy_scraper: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      host: "mymasternode"
                                                                                                                                                                                                                                                                                                                                                                                      port: 1936
                                                                                                                                                                                                                                                                                                                                                                                      path: "/metrics"
                                                                                                                                                                                                                                                                                                                                                                                      username: "{kubernetes.node.annotation.yourhost.com/haproxy_username}"
                                                                                                                                                                                                                                                                                                                                                                                      password: "{kubernetes.node.annotation.yourhost.com/haproxy_password}"
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          service: router
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      In the above example, scraping is triggered by node and pod annotations. You can add annotations to nodes and pods by using the kubectl annotate command as follows:

kubectl annotate node mynode --overwrite sysdig.com/region=india sysdig.com/scraper=true yourhost.com/haproxy_scraper=true yourhost.com/haproxy_username=admin yourhost.com/haproxy_password=admin
                                                                                                                                                                                                                                                                                                                                                                      

In this example, you set annotations on a node to trigger scraping of the prom2 and haproxy services as defined in the above configuration.

                                                                                                                                                                                                                                                                                                                                                                      Preparing Container Environments

An example configuration for a Docker environment is given below:

                                                                                                                                                                                                                                                                                                                                                                      prometheus:
                                                                                                                                                                                                                                                                                                                                                                        remote_services:
                                                                                                                                                                                                                                                                                                                                                                              - prom_container:
                                                                                                                                                                                                                                                                                                                                                                                  container.label.com.sysdig.scrape_xyz: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      url: "https://xyz:5005/metrics"
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          host: xyz
                                                                                                                                                                                                                                                                                                                                                                                          service: xyz
                                                                                                                                                                                                                                                                                                                                                                      

For remote scraping to work in a Docker-based container environment, set the com.sysdig.scrape_xyz=true label on the Agent container. For example:

docker run -d --name sysdig-agent --restart always --privileged --net host --pid host --label com.sysdig.scrape_xyz=true -e ACCESS_KEY=<KEY> -e COLLECTOR=<COLLECTOR> -e SECURE=true -e TAGS=example_tag:example_value -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro --shm-size=512m sysdig/agent
                                                                                                                                                                                                                                                                                                                                                                      

Substitute <KEY>, <COLLECTOR>, and the TAGS value with your access key, collector, and tags, respectively.

                                                                                                                                                                                                                                                                                                                                                                      Syntax of the Rules

The syntax of the rules for remote_services is almost identical to that of process_filter, except for include/exclude rules, which the remote_services section does not use. In process_filter, only the first include or exclude rule that matches a process is applied. In remote_services, each rule has a corresponding service name, and all matching rules are applied.
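The difference between the two evaluation models can be sketched in a few lines of Python. This is only an illustrative model, not the agent's implementation: the function names, the exact-equality matching, and the conditions attached to prom2 are assumptions made for the example.

```python
# Hypothetical model of the two rule-evaluation semantics (not agent code).

def first_match(rules, context):
    """process_filter style: ordered include/exclude rules, first match wins."""
    for action, conditions in rules:
        if all(context.get(k) == v for k, v in conditions.items()):
            return action
    return None


def all_matches(services, context):
    """remote_services style: every named rule whose conditions match applies."""
    return [
        name
        for name, conditions in services
        if all(context.get(k) == v for k, v in conditions.items())
    ]


# Annotations visible in the Agent context (Kubernetes annotation values are strings).
agent_context = {
    "kubernetes.node.annotation.sysdig.com/scraper": "true",
    "kubernetes.node.annotation.yourhost.com/haproxy_scraper": "true",
}

# Illustrative rules; the conditions are assumptions for this sketch.
process_rules = [
    ("include", {"kubernetes.node.annotation.sysdig.com/scraper": "true"}),
    ("exclude", {"kubernetes.node.annotation.yourhost.com/haproxy_scraper": "true"}),
]
remote_services = [
    ("prom2", {"kubernetes.node.annotation.sysdig.com/scraper": "true"}),
    ("haproxy", {"kubernetes.node.annotation.yourhost.com/haproxy_scraper": "true"}),
]

print(first_match(process_rules, agent_context))    # include
print(all_matches(remote_services, agent_context))  # ['prom2', 'haproxy']
```

Both remote_services rules match the same context, so both services are scraped, whereas the process_filter model stops at the first matching rule.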

                                                                                                                                                                                                                                                                                                                                                                      Rule Conditions

                                                                                                                                                                                                                                                                                                                                                                      The rule conditions work the same way as those for the process_filter. The only caveat is that the rules will be matched against the Agent process and container because the remote process/context is unknown. Therefore, matches for container labels and annotations work as before but they must be applicable to the Agent container as well. For instance, node annotations will apply because the Agent container runs on a node.

For annotations, multiple patterns can be specified in a single rule, in which case all patterns must match for the rule to match (AND operator). In the following example, the endpoint is not considered unless both annotations match:

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.annotation.sysdig.com/region_scraper: europe
                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.annotation.sysdig.com/scraper: true
                                                                                                                                                                                                                                                                                                                                                                      

That is, only Kubernetes nodes that belong to the Europe region and have scraping enabled are considered for scraping.
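As a rough illustration of the AND behavior, the following Python sketch checks a rule against the annotations of two hypothetical nodes. It assumes simple exact-value comparison, which may be stricter than the pattern matching the agent actually performs; rule_matches and the node dictionaries are invented for the example.

```python
def rule_matches(patterns, annotations):
    """Return True only if every pattern in the rule matches (logical AND)."""
    return all(annotations.get(key) == expected for key, expected in patterns.items())


rule = {
    "kubernetes.node.annotation.sysdig.com/region_scraper": "europe",
    "kubernetes.node.annotation.sysdig.com/scraper": "true",
}

# Hypothetical nodes: one satisfies both patterns, the other only one.
europe_node = {
    "kubernetes.node.annotation.sysdig.com/region_scraper": "europe",
    "kubernetes.node.annotation.sysdig.com/scraper": "true",
}
india_node = {
    "kubernetes.node.annotation.sysdig.com/region_scraper": "india",
    "kubernetes.node.annotation.sysdig.com/scraper": "true",
}

print(rule_matches(rule, europe_node))  # True
print(rule_matches(rule, india_node))   # False
```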

                                                                                                                                                                                                                                                                                                                                                                      Authenticating Sysdig Agent

The Sysdig Agent requires the necessary permissions on the remote host to scrape metrics. The authentication methods for local scraping also work for authenticating agents on remote hosts, but the authorization parameters work only in the agent context.

• Authentication based on a certificate-key pair requires the pair to be stored in a Kubernetes secret and mounted to the agent.

                                                                                                                                                                                                                                                                                                                                                                      • In token-based authentication, make sure the agent token has access rights on the remote endpoint to do the scraping.

• Use annotations to retrieve the username and password instead of passing them in plaintext. Any annotation enclosed in curly braces is replaced by the value of the annotation. If the annotation does not exist, the value is an empty string. Token substitution is supported for all the authorization parameters. Because authorization works only in the Agent context, credentials cannot be retrieved automatically from the target pod. Therefore, pass them through an annotation in the Agent context: set the password in an annotation on the selected Kubernetes object.

                                                                                                                                                                                                                                                                                                                                                                      In the following example, an HAProxy account is authenticated with the password supplied in the yourhost.com/haproxy_password annotation on the agent node.

                                                                                                                                                                                                                                                                                                                                                                      - haproxy:
                                                                                                                                                                                                                                                                                                                                                                                  kubernetes.node.annotation.yourhost.com/haproxy_scraper: true
                                                                                                                                                                                                                                                                                                                                                                                  conf:
                                                                                                                                                                                                                                                                                                                                                                                      host: "mymasternode"
                                                                                                                                                                                                                                                                                                                                                                                      port: 1936
                                                                                                                                                                                                                                                                                                                                                                                      path: "/metrics"
                                                                                                                                                                                                                                                                                                                                                                                      username: "{kubernetes.node.annotation.yourhost.com/haproxy_username}"
                                                                                                                                                                                                                                                                                                                                                                                      password: "{kubernetes.node.annotation.yourhost.com/haproxy_password}"
                                                                                                                                                                                                                                                                                                                                                                                      tags:
                                                                                                                                                                                                                                                                                                                                                                                          service: router
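The curly-brace substitution described above can be modeled with a short Python sketch. It is an assumption-laden illustration rather than the agent's actual code: the substitute helper, the regular expression, and the token entry (which points at an annotation that does not exist) are hypothetical and only demonstrate the empty-string fallback.

```python
import re

# Matches one {annotation-name} token; nested braces are not handled in this sketch.
TOKEN = re.compile(r"\{([^{}]+)\}")


def substitute(value, annotations):
    """Replace each {annotation} token with its value, or '' if it is missing."""
    return TOKEN.sub(lambda m: annotations.get(m.group(1), ""), value)


# Annotations as seen in the Agent context (set with kubectl annotate).
agent_annotations = {
    "kubernetes.node.annotation.yourhost.com/haproxy_username": "admin",
    "kubernetes.node.annotation.yourhost.com/haproxy_password": "admin",
}

conf = {
    "username": "{kubernetes.node.annotation.yourhost.com/haproxy_username}",
    "password": "{kubernetes.node.annotation.yourhost.com/haproxy_password}",
    # Hypothetical entry referencing an annotation that is not set.
    "token": "{kubernetes.node.annotation.yourhost.com/missing}",
}

resolved = {key: substitute(value, agent_annotations) for key, value in conf.items()}
print(resolved)  # {'username': 'admin', 'password': 'admin', 'token': ''}
```

A missing annotation silently resolves to an empty string, so a typo in an annotation name shows up as a failed authentication rather than a configuration error.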
                                                                                                                                                                                                                                                                                                                                                                      

8.6.2 - (Legacy) Integrate Applications (Default App Checks)

                                                                                                                                                                                                                                                                                                                                                                      We are sunsetting application checks in favor of Monitoring Integrations.

The Sysdig agent supports additional application monitoring capabilities with application check scripts, or ‘app checks’. These are a set of plugins that poll for custom metrics from specific applications that export them via status or management pages, such as NGINX, Redis, MongoDB, Memcached, and more.

Many app checks are enabled by default in the agent. When a supported application is found, the correct app check script is called and metrics are polled automatically.

                                                                                                                                                                                                                                                                                                                                                                      However, if default connection parameters are changed in your application, you will need to modify the app check connection parameters in the Sysdig Agent configuration file (dragent.yaml) to match your application.

                                                                                                                                                                                                                                                                                                                                                                      In some cases, you may also need to enable the metrics reporting functionality in the application before the agent can poll them.

                                                                                                                                                                                                                                                                                                                                                                      This page details how to make configuration changes in the agent’s configuration file, and provides an application integration example. Click the Supported Applications links for application-specific details.

                                                                                                                                                                                                                                                                                                                                                                      Python Version for App Checks:

                                                                                                                                                                                                                                                                                                                                                                      As of agent version 9.9.0, the default version of Python used for app checks is Python 3.

                                                                                                                                                                                                                                                                                                                                                                      Python 2 can still be used by setting the following option in your dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      python_binary: <path to python 2.7 binary>

                                                                                                                                                                                                                                                                                                                                                                      For containerized agents, this path will be: /usr/bin/python2.7

                                                                                                                                                                                                                                                                                                                                                                      Edit dragent.yaml to Integrate or Modify Application Checks

                                                                                                                                                                                                                                                                                                                                                                      Out of the box, the Sysdig agent will gather and report on a wide variety of pre-defined metrics. It can also accommodate any number of custom parameters for additional metrics collection.

                                                                                                                                                                                                                                                                                                                                                                      The agent relies on a pair of configuration files to define metrics collection parameters:

                                                                                                                                                                                                                                                                                                                                                                      dragent.default.yaml

                                                                                                                                                                                                                                                                                                                                                                      The core configuration file. You can look at it to understand more about the default configurations provided.

Location: /opt/draios/etc/dragent.default.yaml

                                                                                                                                                                                                                                                                                                                                                                      CAUTION. This file should never be edited.

                                                                                                                                                                                                                                                                                                                                                                      dragent.yaml

The configuration file where parameters can be added, either directly in YAML as name/value pairs, or using environment variables such as ADDITIONAL_CONF.

Location: /opt/draios/etc/dragent.yaml

                                                                                                                                                                                                                                                                                                                                                                      The “dragent.yaml” file can be accessed and edited in several ways, depending on how the agent was installed.

                                                                                                                                                                                                                                                                                                                                                                      Review Understanding the Agent Config Files for details.

The examples in this section presume you are entering YAML code directly into dragent.yaml, under the app_checks section.

                                                                                                                                                                                                                                                                                                                                                                      Find the default settings

                                                                                                                                                                                                                                                                                                                                                                      To find the default app-checks for already supported applications, check the dragent.default.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      (Location: /opt/draios/etc/dragent.default.yaml.)

                                                                                                                                                                                                                                                                                                                                                                      Sample format

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: APP_NAME
                                                                                                                                                                                                                                                                                                                                                                          check_module: APP_CHECK_SCRIPT
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: PROCESS_NAME
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: IP_ADDR
                                                                                                                                                                                                                                                                                                                                                                            port: PORT
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Parameter

                                                                                                                                                                                                                                                                                                                                                                      Parameter 2

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      Sample Value

                                                                                                                                                                                                                                                                                                                                                                      app_checks

                                                                                                                                                                                                                                                                                                                                                                      The main section of dragent.default.yaml that contains a list of pre-configured checks.

                                                                                                                                                                                                                                                                                                                                                                      n/a

                                                                                                                                                                                                                                                                                                                                                                      name

                                                                                                                                                                                                                                                                                                                                                                      Every check should have a uniquename: which will be displayed on Sysdig Monitor as the process name of the integrated application.

                                                                                                                                                                                                                                                                                                                                                                      e.g. MongoDB

                                                                                                                                                                                                                                                                                                                                                                      check_module

                                                                                                                                                                                                                                                                                                                                                                      The name of the Python plugin that polls the data from the designated application.

                                                                                                                                                                                                                                                                                                                                                                      All the app check scripts can be found inside the /opt/draios/lib/python/checks.d directory.

                                                                                                                                                                                                                                                                                                                                                                      e.g. elastic

                                                                                                                                                                                                                                                                                                                                                                      pattern

                                                                                                                                                                                                                                                                                                                                                                      This section is used by the Sysdig agent to match a process with a check. Four kinds of keys can be specified along with any arguments to help distinguish them.

                                                                                                                                                                                                                                                                                                                                                                      n/a

                                                                                                                                                                                                                                                                                                                                                                      comm

                                                                                                                                                                                                                                                                                                                                                                      Matches process name as seen in /proc/PID/status

                                                                                                                                                                                                                                                                                                                                                                      port

                                                                                                                                                                                                                                                                                                                                                                      Matches based on the port used (i.e MySQL identified by 'port: 3306')

                                                                                                                                                                                                                                                                                                                                                                      arg

                                                                                                                                                                                                                                                                                                                                                                      Matches any process arguments

                                                                                                                                                                                                                                                                                                                                                                      exe

                                                                                                                                                                                                                                                                                                                                                                      Matches the process exe as seen in /proc/PID/exe link

                                                                                                                                                                                                                                                                                                                                                                      conf

                                                                                                                                                                                                                                                                                                                                                                      This section is specific for each plugin. You can specify any key/values that the plugins support.

                                                                                                                                                                                                                                                                                                                                                                      host

                                                                                                                                                                                                                                                                                                                                                                      Application-specific. A URL or IP address

                                                                                                                                                                                                                                                                                                                                                                      port

                                                                                                                                                                                                                                                                                                                                                                      {...} tokens can be used as values, which will be substituted with values from process info.
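
To make the four pattern keys more concrete, here is a minimal Python sketch of how a process description could be compared against a pattern. It is illustrative only: the dictionary layout, the function name matches, and the rule that every listed key must match exactly are assumptions, not the Sysdig agent's actual implementation.

```python
# Illustrative sketch only: the dict shapes and matching rules are assumptions,
# not the Sysdig agent's real matching logic.

def matches(pattern, process):
    """Return True if every key listed in `pattern` matches `process`."""
    for key, wanted in pattern.items():
        if key == "comm":
            # Process name, as it would appear in /proc/PID/status
            if process.get("comm") != wanted:
                return False
        elif key == "exe":
            # Target of the /proc/PID/exe link
            if process.get("exe") != wanted:
                return False
        elif key == "port":
            # Any port the process is using
            if wanted not in process.get("ports", []):
                return False
        elif key == "arg":
            # Any single command-line argument
            if wanted not in process.get("args", []):
                return False
        else:
            raise ValueError(f"unsupported pattern key: {key}")
    return True


# Hypothetical process description used only for this example.
mysql = {
    "comm": "mysqld",
    "exe": "/usr/sbin/mysqld",
    "ports": [3306],
    "args": ["--daemonize"],
}

print(matches({"port": 3306}, mysql))            # True
print(matches({"comm": "redis-server"}, mysql))  # False
```

In this sketch a pattern with several keys only matches when all of them agree, which is one plausible reading of "to help distinguish them".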

                                                                                                                                                                                                                                                                                                                                                                      Change the default settings

                                                                                                                                                                                                                                                                                                                                                                      To override the defaults:

                                                                                                                                                                                                                                                                                                                                                                      1. Copy relevant code blocks from dragent.default.yaml into dragent.yaml . (Or copy the code from the appropriate app check integration page in this documentation section.)

                                                                                                                                                                                                                                                                                                                                                                        Any entries copied into dragent.yaml file will override similar entries in dragent.default.yaml.

                                                                                                                                                                                                                                                                                                                                                                        Never modify dragent.default.yaml, as it will be overwritten whenever the agent is updated.

                                                                                                                                                                                                                                                                                                                                                                      2. Modify the parameters as needed.

                                                                                                                                                                                                                                                                                                                                                                        Be sure to use proper YAML. Pay attention to consistent spacing for indents (as shown) and list all check entries under an app_checks: section title.

                                                                                                                                                                                                                                                                                                                                                                      3. Save the changes and restart the agent.

                                                                                                                                                                                                                                                                                                                                                                        Use service restart agent or docker restart sysdig-agent.

                                                                                                                                                                                                                                                                                                                                                                      Metrics for the relevant application should appear in the Sysdig Monitor interface under the appropriate name.
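
Because inconsistent indentation is an easy mistake to make when pasting blocks into dragent.yaml, a rough pre-check before restarting the agent can help. The sketch below is an assumption-laden helper, not a YAML parser and not part of the agent: it only flags tabs in indentation and indents that are not a multiple of two spaces, the step used by the examples on this page. The function name indent_problems is hypothetical.

```python
# Rough indentation check; NOT a YAML validator and not a Sysdig tool.
# Assumes the two-space indent step used by the examples in this section.

def indent_problems(text, step=2):
    """Return (line_number, message) pairs for suspicious indentation."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append((number, "tab in indentation"))
        elif len(leading) % step != 0:
            problems.append(
                (number, f"indent of {len(leading)} is not a multiple of {step}")
            )
    return problems


sample = "app_checks:\n  - name: ntp\n     interval: 60\n"
print(indent_problems(sample))
# [(3, 'indent of 5 is not a multiple of 2')]
```

A clean result here does not guarantee valid YAML; it only rules out the most common spacing slips.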

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Change Name and Add Password

                                                                                                                                                                                                                                                                                                                                                                      Here is a sample app-check entry for Redis. The app_checks section was copied from the dragent.default.yaml file and modified for a specific instance.

                                                                                                                                                                                                                                                                                                                                                                      customerid: 831f3-Your-Access-Key-9401
                                                                                                                                                                                                                                                                                                                                                                      tags: local:sf,acct:dev,svc:db
                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis-6380
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: PORT
                                                                                                                                                                                                                                                                                                                                                                            password: PASSWORD
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Edits made:

                                                                                                                                                                                                                                                                                                                                                                      • The name to be displayed in the interface

                                                                                                                                                                                                                                                                                                                                                                      • A required password.

                                                                                                                                                                                                                                                                                                                                                                      As the token PORT is used, it will be translated to the actual port where Redis is listening.
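
The substitution described here can be pictured with a short Python sketch. It assumes the {...} token notation mentioned in the parameter table, a hypothetical token named {port}, and a simple dictionary of process info. The agent's real token names and substitution rules may differ.

```python
# Illustrative sketch only: token spelling ("{port}") and the process-info
# dictionary are assumptions, not the agent's documented behavior.

def expand_tokens(conf, process_info):
    """Replace {token} placeholders in string values with process info."""
    expanded = {}
    for key, value in conf.items():
        if isinstance(value, str):
            for token, actual in process_info.items():
                value = value.replace("{" + token + "}", str(actual))
        expanded[key] = value
    return expanded


conf = {"host": "127.0.0.1", "port": "{port}", "password": "PASSWORD"}
print(expand_tokens(conf, {"port": 6380}))
# {'host': '127.0.0.1', 'port': '6380', 'password': 'PASSWORD'}
```

Values without a placeholder, such as the host and password here, pass through unchanged.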

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Increase Polling Interval

                                                                                                                                                                                                                                                                                                                                                                      The default interval for an application check to be run by the agent is set to every second. You can increase the interval per application check by adding the interval: parameter (under the -name section) and the number of seconds to wait before each run of the script.

                                                                                                                                                                                                                                                                                                                                                                      interval: must be put into each app check entry that should run less often; there is no global setting.

                                                                                                                                                                                                                                                                                                                                                                      Example: Run the NTP check once per minute:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: ntp
                                                                                                                                                                                                                                                                                                                                                                          interval: 60
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: systemd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: us.pool.ntp.org
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Disabling

                                                                                                                                                                                                                                                                                                                                                                      Disable a Single Application Check

                                                                                                                                                                                                                                                                                                                                                                      Sometimes the default configuration shipped with the Sysdig agent does not work for you or you may not be interested in checks for a single application. To turn a single check off, add an entry like this to disable it:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                       - name: nginx
                                                                                                                                                                                                                                                                                                                                                                         enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      This entry overrides the default configuration of the nginx check, disabling it.

                                                                                                                                                                                                                                                                                                                                                                      If you are using the ADDITIONAL_CONF parameter to modify your container agent’s configuration, you would add an entry like this to your Docker run command (or Kubernetes manifest):

                                                                                                                                                                                                                                                                                                                                                                      -e ADDITIONAL_CONF="app_checks:\n  - name: nginx\n    enabled: false\n"
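
The single-line value above is the multi-line YAML entry with each line break written as a literal \n. As a minimal sketch, the conversion could be scripted as shown below. The helper name to_additional_conf is hypothetical, and the sketch assumes the container agent expands the literal \n sequences as the example above implies; YAML that itself contains double quotes would need extra shell escaping.

```python
def to_additional_conf(yaml_text: str) -> str:
    """Collapse multi-line YAML into one line, ending each line with a literal \\n."""
    lines = yaml_text.strip("\n").splitlines()
    return "".join(line + "\\n" for line in lines)


# The same nginx entry used in the dragent.yaml example.
snippet = """
app_checks:
  - name: nginx
    enabled: false
"""

value = to_additional_conf(snippet)

# Print the argument in the form expected on the docker run command line.
print(f'-e ADDITIONAL_CONF="{value}"')
```

Indentation inside the YAML is preserved, so nesting survives the conversion.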
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Disable ALL Application Checks

                                                                                                                                                                                                                                                                                                                                                                      If you do not need it or otherwise want to disable the application check functionality, you can add the following entry to the agent’s user settings configuration file /opt/draios/etc/dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      app_checks_enabled: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Restart the agent as shown immediately above for either the native Linux agent installation or the container agent installation.

                                                                                                                                                                                                                                                                                                                                                                      Optional: Configure a Custom App-Check

                                                                                                                                                                                                                                                                                                                                                                      Sysdig allows custom application check-script configurations to be created for each individual container in the infrastructure, via the environment variable SYSDIG_AGENT_CONF. This avoids the need for multiple edits and entries to achieve the container-specific customization, by enabling application teams to configure their own checks.

                                                                                                                                                                                                                                                                                                                                                                      The SYSDIG_AGENT_CONF variable stores a YAML-formatted configuration for the app check, and is used to match app-check configurations. It can be stored directly within the Docker file.

                                                                                                                                                                                                                                                                                                                                                                      The syntax is the same as dragent.yaml syntax.

                                                                                                                                                                                                                                                                                                                                                                      The example below defines a per container app-check for Redis in the Dockerfile, using the SYSDIG_AGENT_CONF environment variable:

                                                                                                                                                                                                                                                                                                                                                                      FROM redis
                                                                                                                                                                                                                                                                                                                                                                      # This config file adds a password for accessing redis instance
                                                                                                                                                                                                                                                                                                                                                                      ADD redis.conf /
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      ENV SYSDIG_AGENT_CONF { "app_checks": [{ "name": "redis", "check_module": "redisdb", "pattern": {"comm": "redis-server"}, "conf": { "host": "127.0.0.1", "port": "6379", "password": "protected"} }] }
                                                                                                                                                                                                                                                                                                                                                                      ENTRYPOINT ["redis-server"]
                                                                                                                                                                                                                                                                                                                                                                      CMD [ "/redis.conf" ]
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The example below shows how parameters can be added to a container started with docker run, by either using the -e/–envflag variable, or injecting the parameters using an orchestration system (for example, Kubernetes):

                                                                                                                                                                                                                                                                                                                                                                      PER_CONTAINER_CONF='{ "app_checks": [{ "name": "redis", "check_module": "redisdb", "pattern": {"comm": "redis-server"}, "conf": { "host": "127.0.0.1", "port": "6379", "password": "protected"} }] }'
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      docker run --name redis -v /tmp/redis.conf:/etc/redis.conf -e SYSDIG_AGENT_CONF="${PER_CONTAINER_CONF}" -d redis /etc/redis.conf
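
Hand-writing the JSON value is error-prone because of the nested quoting. As a rough sketch, the same value could be generated and shell-quoted with Python's standard json and shlex modules. The helper name build_agent_conf is hypothetical; the field names and values are taken from the Redis example above.

```python
import json
import shlex


def build_agent_conf(name, check_module, comm, conf):
    """Return the per-container app-check configuration as a JSON string."""
    payload = {
        "app_checks": [
            {
                "name": name,
                "check_module": check_module,
                "pattern": {"comm": comm},
                "conf": conf,
            }
        ]
    }
    return json.dumps(payload)


conf_json = build_agent_conf(
    "redis",
    "redisdb",
    "redis-server",
    {"host": "127.0.0.1", "port": "6379", "password": "protected"},
)

# shlex.quote wraps the JSON so the shell passes it through unchanged.
command = (
    "docker run --name redis -v /tmp/redis.conf:/etc/redis.conf "
    "-e SYSDIG_AGENT_CONF=" + shlex.quote(conf_json) + " -d redis /etc/redis.conf"
)
print(command)
```

Generating the value this way guarantees well-formed JSON; whether the agent accepts a given set of fields still depends on the app check being configured.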
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Limit

                                                                                                                                                                                                                                                                                                                                                                      Metric limits are defined by your payment plan. If more metrics are needed please contact your sales representative with your use case.

                                                                                                                                                                                                                                                                                                                                                                      Note that a metric with the same name but different tag name will count as a unique metric by the agent. Example: a metric 'user.clicks' with the tag 'country=us' and another 'user.clicks' with the 'tag country=it'are considered two metrics which count towards the limit.
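
The counting rule can be illustrated with a short sketch that treats each distinct combination of metric name and tags as one metric. This only illustrates the rule as described here and is not the agent's actual implementation; the function name count_unique_metrics and the sample data are made up for the example.

```python
def count_unique_metrics(samples):
    """Count distinct (metric name, tag set) combinations.

    samples is an iterable of (name, tags) pairs, where tags is a dict.
    """
    return len({(name, frozenset(tags.items())) for name, tags in samples})


samples = [
    ("user.clicks", {"country": "us"}),
    ("user.clicks", {"country": "it"}),
    ("user.clicks", {"country": "us"}),  # repeat of the first combination
]

# Two combinations count towards the limit: country=us and country=it.
print(count_unique_metrics(samples))
```

A tag with many possible values (for example, a user ID) multiplies the number of metrics accordingly, so it can exhaust the limit quickly.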

                                                                                                                                                                                                                                                                                                                                                                      Supported Applications

                                                                                                                                                                                                                                                                                                                                                                      Below is the supported list of applications the agent will automatically poll.

                                                                                                                                                                                                                                                                                                                                                                      Some app-check scripts will need to be configured since no defaults exist, while some applications may need to be configured to output their metrics. Click a highlighted link to see application-specific notes.

                                                                                                                                                                                                                                                                                                                                                                      • Active MQ
                                                                                                                                                                                                                                                                                                                                                                      • Apache
                                                                                                                                                                                                                                                                                                                                                                      • Apache CouchDB
                                                                                                                                                                                                                                                                                                                                                                      • Apache HBase
                                                                                                                                                                                                                                                                                                                                                                      • Apache Kafka
                                                                                                                                                                                                                                                                                                                                                                      • Apache Zookeeper
                                                                                                                                                                                                                                                                                                                                                                      • Consul
                                                                                                                                                                                                                                                                                                                                                                      • CEPH
                                                                                                                                                                                                                                                                                                                                                                      • Couchbase
                                                                                                                                                                                                                                                                                                                                                                      • Elasticsearch
                                                                                                                                                                                                                                                                                                                                                                      • etcd
                                                                                                                                                                                                                                                                                                                                                                      • fluentd
                                                                                                                                                                                                                                                                                                                                                                      • Gearman
                                                                                                                                                                                                                                                                                                                                                                      • Go
                                                                                                                                                                                                                                                                                                                                                                      • Gunicorn
                                                                                                                                                                                                                                                                                                                                                                      • HAProxy
                                                                                                                                                                                                                                                                                                                                                                      • HDFS
                                                                                                                                                                                                                                                                                                                                                                      • HTTP
                                                                                                                                                                                                                                                                                                                                                                      • Jenkins
                                                                                                                                                                                                                                                                                                                                                                      • JVM
                                                                                                                                                                                                                                                                                                                                                                      • Lighttpd
                                                                                                                                                                                                                                                                                                                                                                      • Memcached
                                                                                                                                                                                                                                                                                                                                                                      • Mesos/Marathon
                                                                                                                                                                                                                                                                                                                                                                      • MongoDB
                                                                                                                                                                                                                                                                                                                                                                      • MySQL
                                                                                                                                                                                                                                                                                                                                                                      • NGINX and NGINX Plus
                                                                                                                                                                                                                                                                                                                                                                      • NTP
                                                                                                                                                                                                                                                                                                                                                                      • PGBouncer
                                                                                                                                                                                                                                                                                                                                                                      • PHP-FPM
                                                                                                                                                                                                                                                                                                                                                                      • Postfix
                                                                                                                                                                                                                                                                                                                                                                      • PostgreSQL
                                                                                                                                                                                                                                                                                                                                                                      • Prometheus
                                                                                                                                                                                                                                                                                                                                                                      • RabbitMQ
                                                                                                                                                                                                                                                                                                                                                                      • RedisDB
                                                                                                                                                                                                                                                                                                                                                                      • Supervisord
                                                                                                                                                                                                                                                                                                                                                                      • SNMP
                                                                                                                                                                                                                                                                                                                                                                      • TCP

                                                                                                                                                                                                                                                                                                                                                                      You can also

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.1 -

                                                                                                                                                                                                                                                                                                                                                                      Apache

Apache HTTP Server is open-source software for creating, deploying, and managing web servers. If Apache is installed in your environment, the Sysdig agent connects to it through Apache's mod_status module. You may need to edit the default entries in the agent configuration file to connect. See the Default Configuration, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Apache Setup

                                                                                                                                                                                                                                                                                                                                                                      Install mod_status on your Apache servers and enable ExtendedStatus.

The following configuration is required. If it is already present, uncomment the lines; otherwise, add the configuration.

                                                                                                                                                                                                                                                                                                                                                                      LoadModule status_module modules/mod_status.so
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      <Location /server-status>
                                                                                                                                                                                                                                                                                                                                                                          SetHandler server-status
                                                                                                                                                                                                                                                                                                                                                                          Order Deny,Allow
                                                                                                                                                                                                                                                                                                                                                                          Deny from all
                                                                                                                                                                                                                                                                                                                                                                          Allow from localhost
                                                                                                                                                                                                                                                                                                                                                                      </Location>
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      ExtendedStatus On
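
Once mod_status is enabled, it can help to confirm that the status page returns machine-readable output before pointing the agent at it. The following is a minimal Python sketch, not part of the Sysdig agent: it parses the "Key: Value" lines that the ?auto variant of the status page produces. The function names, the default URL, and the sample text are assumptions made for illustration; adjust the URL to the host and port your server actually listens on.

```python
from urllib.request import urlopen


def parse_server_status(text):
    """Parse the 'Key: Value' lines of mod_status ?auto output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        key, value = key.strip(), value.strip()
        # Convert numeric values; keep everything else (e.g. Scoreboard) as text.
        try:
            fields[key] = int(value)
        except ValueError:
            try:
                fields[key] = float(value)
            except ValueError:
                fields[key] = value
    return fields


def fetch_server_status(url="http://localhost/server-status?auto", timeout=5):
    """Fetch and parse the status page. The default URL is an assumption."""
    with urlopen(url, timeout=timeout) as response:
        return parse_server_status(response.read().decode("utf-8", "replace"))


# Illustrative sample only; not captured from a real server.
SAMPLE = """Total Accesses: 120
Total kBytes: 345
Uptime: 600
ReqPerSec: .2
BusyWorkers: 1
IdleWorkers: 9
Scoreboard: _W________
"""

status = parse_server_status(SAMPLE)
print(status["BusyWorkers"], status["IdleWorkers"])
```

Running fetch_server_status() on the Apache host itself should return a similar dictionary if the Location block above allows access from localhost.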
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to edit dragent.yaml to Integrate or Modify Application Checks.

Apache has a common default for exposing metrics. The process command name can be either apache2 or httpd. By default, the Sysdig agent looks for the process apache2. If the process is named differently in your environment (e.g. httpd), edit the configuration file to match the process name, as shown in the Example below.
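
To find out which of the two process names applies on a given Linux host, one option is to look at the command names of running processes. The sketch below is an illustration only and is not how the Sysdig agent performs its matching; the helper names running_comms and detect_apache_comm are hypothetical, and the approach assumes a Linux /proc filesystem.

```python
import os


def running_comms(proc_root="/proc"):
    """Return the set of command names of running processes (Linux only)."""
    comms = set()
    if not os.path.isdir(proc_root):
        return comms  # no /proc filesystem on this platform
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                comms.add(handle.read().strip())
        except OSError:
            continue  # the process exited or is not readable
    return comms


def detect_apache_comm(comms, candidates=("apache2", "httpd")):
    """Return the first candidate name present in comms, or None."""
    for name in candidates:
        if name in comms:
            return name
    return None


if __name__ == "__main__":
    # Prints apache2, httpd, or None depending on what is running.
    print(detect_apache_comm(running_comms()))
```

The printed name is the value to use for comm in the pattern section of the app check.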

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect to Apache and collect all metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: apache
                                                                                                                                                                                                                                                                                                                                                                          check_module: apache
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: apache2
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            apache_status_url: "http://localhost:{port}/server-status?auto"
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example

If you need to change the process name, edit dragent.yaml as in the following example, setting comm to httpd.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: apache
                                                                                                                                                                                                                                                                                                                                                                          check_module: apache
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: httpd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            apache_status_url: "http://localhost/server-status?auto"
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      The Apache metrics are listed in the metrics dictionary here: Apache Metrics.

                                                                                                                                                                                                                                                                                                                                                                      UI Examples

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.2 -

                                                                                                                                                                                                                                                                                                                                                                      Apache Kafka

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault-tolerant, fast, and runs in production at thousands of companies. If Kafka is installed in your environment, the Sysdig agent will automatically connect. See the Default Configuration, below.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent automatically collects metrics from Kafka via JMX polling. You need to provide consumer names and topics in the agent config file to collect consumer-based Kafka metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Kafka Setup

                                                                                                                                                                                                                                                                                                                                                                      Kafka will automatically expose all metrics. You do not need to add anything on the Kafka instance.

Zstandard, one of the compression types available in the Kafka integration, is only included in Kafka versions 2.1.0 or newer. See also Apache documentation.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to edit dragent.yaml to Integrate or Modify Application Checks.

Metrics from Kafka via JMX polling are already configured in the agent’s default-settings configuration file. Metrics for consumers, however, require app checks to poll the Kafka and ZooKeeper APIs. You need to provide consumer names and topics in the dragent.yaml file.
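
Because the consumer names, topics, and partitions are entered by hand, a quick sanity check of that mapping before it goes into dragent.yaml can catch typos. The sketch below is a hypothetical helper, not an agent feature: it assumes the consumer_groups shape used in Example 1 on this page (consumer name, then topic name, then a list of partition numbers), and the rules it enforces are inferred from that example rather than taken from documented agent validation.

```python
def validate_consumer_groups(consumer_groups):
    """Return a list of problems found in a consumer_groups mapping."""
    if not isinstance(consumer_groups, dict) or not consumer_groups:
        return ["consumer_groups must be a non-empty mapping"]
    problems = []
    for consumer, topics in consumer_groups.items():
        if not isinstance(topics, dict) or not topics:
            problems.append(f"{consumer}: expected a mapping of topic to partitions")
            continue
        for topic, partitions in topics.items():
            if not isinstance(partitions, list):
                problems.append(f"{consumer}/{topic}: partitions must be a list")
                continue
            for partition in partitions:
                # bool is a subclass of int, so exclude it explicitly.
                is_int = isinstance(partition, int) and not isinstance(partition, bool)
                if not is_int or partition < 0:
                    problems.append(f"{consumer}/{topic}: invalid partition {partition!r}")
    return problems


# Same sample names as Example 1, written as a Python dictionary.
sample_groups = {
    "sample-consumer-1": {"sample-topic-1": [0]},
    "sample-consumer-2": {"sample-topic-2": [0, 1, 2, 3]},
}

print(validate_consumer_groups(sample_groups))
```

An empty list means no problems were found under these assumed rules; it does not guarantee that the named consumers or topics exist in the cluster.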

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      Since consumer names and topics are environment-specific, a default configuration is not present in dragent.default.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Refer to the following examples for adding Kafka checks to dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Basic Configuration

                                                                                                                                                                                                                                                                                                                                                                      A basic example with sample consumer and topic names:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "127.0.0.1:9092" # kafka address, usually localhost as we run the check on the same instance
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181" # zookeeper address, may be different than localhost
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups:
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-1: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-1: [0, ] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-2: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-2: [0, 1, 2, 3] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Store Consumer Group Info (Kafka 9+)

                                                                                                                                                                                                                                                                                                                                                                      From Kafka 9 onwards, you can store consumer group config info inside Kafka itself for better performance.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "localhost:9092"
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181"
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            kafka_consumer_offsets: true
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups:
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-1: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-1: [0, ] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      If kafka_consumer_offsets entry is set to true the app check will look for consumer offsets in Kafka. The appcheck will also look in Kafka if zk_connect_str is not set.

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Aggregate Partitions at the Topic Level

                                                                                                                                                                                                                                                                                                                                                                      To enable aggregation of partitions at the topic level, use kafka_consumer_topics with aggregate_partitions = true.

                                                                                                                                                                                                                                                                                                                                                                      In this case the app check will aggregate the lag & offset values at the partition level, reducing the number of metrics collected.

                                                                                                                                                                                                                                                                                                                                                                      Set aggregate_partitions = false to disable metrics aggregation at the partition level. In this case, the appcheck will show lag and offset values for each partition.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "localhost:9092"
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181"
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            kafka_consumer_offsets: true
                                                                                                                                                                                                                                                                                                                                                                            kafka_consumer_topics:
                                                                                                                                                                                                                                                                                                                                                                              aggregate_partitions: true
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups:
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-1: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-1: [0, ] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-2: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-2: [0, 1, 2, 3] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 4: Custom Tags

                                                                                                                                                                                                                                                                                                                                                                      Optional tags can be applied to every emitted metric, service check, and/or event.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "localhost:9092"
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181"
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups:
                                                                                                                                                                                                                                                                                                                                                                              sample-consumer-1: # sample consumer name
                                                                                                                                                                                                                                                                                                                                                                                sample-topic-1: [0, ] # sample topic name and partitions
                                                                                                                                                                                                                                                                                                                                                                          tags:  ["key_first_tag:value_1", "key_second_tag:value_2", "key_third_tag:value_3"]
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 5: SSL and Authentication

                                                                                                                                                                                                                                                                                                                                                                      If SSL and authentication are enabled on Kafka, use the following configuration.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_consumer_offsets: true
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "127.0.0.1:9093"
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181"
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups:
                                                                                                                                                                                                                                                                                                                                                                              test-group:
                                                                                                                                                                                                                                                                                                                                                                                test: [0, ]
                                                                                                                                                                                                                                                                                                                                                                                test-4: [0, 1, 2, 3]
                                                                                                                                                                                                                                                                                                                                                                            security_protocol: SASL_SSL
                                                                                                                                                                                                                                                                                                                                                                            sasl_mechanism: PLAIN
                                                                                                                                                                                                                                                                                                                                                                            sasl_plain_username: <USERNAME>
                                                                                                                                                                                                                                                                                                                                                                            sasl_plain_password: <PASSWORD>
                                                                                                                                                                                                                                                                                                                                                                            ssl_check_hostname: true
                                                                                                                                                                                                                                                                                                                                                                            ssl_cafile:  <SSL_CA_FILE_PATH>
                                                                                                                                                                                                                                                                                                                                                                            #ssl_context: <SSL_CONTEXT>
                                                                                                                                                                                                                                                                                                                                                                            #ssl_certfile: <CERT_FILE_PATH>
                                                                                                                                                                                                                                                                                                                                                                            #ssl_keyfile: <KEY_FILE_PATH>
                                                                                                                                                                                                                                                                                                                                                                            #ssl_password: <PASSWORD>
                                                                                                                                                                                                                                                                                                                                                                            #ssl_crlfile: <SSL_FILE_PATH>
                                                                                                                                                                                                                                                                                                                                                                      

Configuration Keywords and Descriptions

| Keyword | Description | Default Value |
|---|---|---|
| security_protocol (str) | Protocol used to communicate with brokers. | PLAINTEXT |
| sasl_mechanism (str) | SASL mechanism to use when security_protocol is SASL_PLAINTEXT or SASL_SSL. Currently only PLAIN is supported. | |
| sasl_plain_username (str) | Username for SASL PLAIN authentication. | |
| sasl_plain_password (str) | Password for SASL PLAIN authentication. | |
| ssl_context (ssl.SSLContext) | Pre-configured SSLContext for wrapping socket connections. If provided, all other ssl_* configurations are ignored. | none |
| ssl_check_hostname (bool) | Whether the SSL handshake should verify that the certificate matches the broker's hostname. | true |
| ssl_cafile (str) | Optional filename of the CA file to use in certificate verification. | none |
| ssl_certfile (str) | Optional filename of a file in PEM format containing the client certificate, as well as any CA certificates needed to establish the certificate's authenticity. | none |
| ssl_keyfile (str) | Optional filename containing the client private key. | none |
| ssl_password (str) | Optional password to be used when loading the certificate chain. | none |
| ssl_crlfile (str) | Optional filename containing the certificate revocation list (CRL) used to check whether a certificate has been revoked. By default, no CRL check is done. When a file is provided, only the leaf certificate is checked against this CRL. The CRL can only be checked with Python 2.7.9+. | none |
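
The ssl_* keywords above resemble options in Python's standard ssl module. The following is a minimal sketch, not the agent's actual implementation, of how such settings could be turned into an SSLContext. The function build_ssl_context and its parameter names are hypothetical and exist only for illustration.

```python
import ssl
from typing import Optional


def build_ssl_context(
    check_hostname: bool = True,
    cafile: Optional[str] = None,
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    password: Optional[str] = None,
    crlfile: Optional[str] = None,
) -> ssl.SSLContext:
    """Hypothetical helper: map ssl_* style settings onto an SSLContext."""
    # Client-side context that requires and verifies the broker certificate.
    # When cafile is None, the system default CA certificates are used.
    ctx = ssl.create_default_context(cafile=cafile)

    # Counterpart of ssl_check_hostname.
    ctx.check_hostname = check_hostname

    # Counterpart of ssl_certfile, ssl_keyfile, and ssl_password.
    if certfile:
        ctx.load_cert_chain(certfile, keyfile=keyfile, password=password)

    # Counterpart of ssl_crlfile: load the CRL and check only the leaf certificate.
    if crlfile:
        ctx.load_verify_locations(crlfile)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF

    return ctx


if __name__ == "__main__":
    context = build_ssl_context(check_hostname=True)
    print(context.check_hostname, context.verify_mode)
```

A context built this way corresponds to what the ssl_context keyword describes: once a pre-configured context is supplied, the individual ssl_* file settings no longer apply.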

                                                                                                                                                                                                                                                                                                                                                                      Example 6: Regex for Consumer Groups and Topics

As of Sysdig agent version 0.94, the Kafka app check optionally supports regular expressions (regex) for Kafka consumer groups and topics.

                                                                                                                                                                                                                                                                                                                                                                      Regex Configuration:

• This feature adds no new metrics.

• The new parameter consumer_groups_regex takes regexes that are matched against consumer groups and topics retrieved from Kafka. Consumer offsets stored in ZooKeeper are not collected.

• Regex for topics is optional. When none is provided, all topics under the matched consumer group are reported.

• Regexes use Python syntax, documented here: https://docs.python.org/3.7/library/re.html#regular-expression-syntax

• If both consumer_groups and consumer_groups_regex are provided, the matched consumer groups from both parameters are merged.

                                                                                                                                                                                                                                                                                                                                                                      Sample configuration:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: kafka
                                                                                                                                                                                                                                                                                                                                                                          check_module: kafka_consumer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            arg: kafka.Kafka
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            kafka_connect_str: "localhost:9092"
                                                                                                                                                                                                                                                                                                                                                                            zk_connect_str: "localhost:2181"
                                                                                                                                                                                                                                                                                                                                                                            zk_prefix: /
                                                                                                                                                                                                                                                                                                                                                                            kafka_consumer_offsets: true
                                                                                                                                                                                                                                                                                                                                                                            # Regex can be provided in following format
                                                                                                                                                                                                                                                                                                                                                                            # consumer_groups_regex:
                                                                                                                                                                                                                                                                                                                                                                            #   'REGEX_1_FOR_CONSUMER_GROUPS':
                                                                                                                                                                                                                                                                                                                                                                            #      - 'REGEX_1_FOR_TOPIC'
                                                                                                                                                                                                                                                                                                                                                                            #      - 'REGEX_2_FOR_TOPIC'
                                                                                                                                                                                                                                                                                                                                                                            consumer_groups_regex:
                                                                                                                                                                                                                                                                                                                                                                              'consumer*':
                                                                                                                                                                                                                                                                                                                                                                                - 'topic'
                                                                                                                                                                                                                                                                                                                                                                                - '^topic.*'
                                                                                                                                                                                                                                                                                                                                                                                - '.*topic$'
                                                                                                                                                                                                                                                                                                                                                                                - '^topic.*'
                                                                                                                                                                                                                                                                                                                                                                                - 'topic\d+'
                                                                                                                                                                                                                                                                                                                                                                                - '^topic_\w+'
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example

| Regex | Description | Examples Matched | Examples NOT Matched |
| --- | --- | --- | --- |
| topic_\d+ | All strings containing the keyword topic followed by _ and one or more digit characters (equivalent to [0-9]) | my-topic_1, topic_23, topic_5-dev | topic_x, my-topic-1, topic-123 |
| topic | All strings containing the keyword topic | topic_x, x_topic123 | xyz |
| consumer* | All strings containing the keyword consumer (strictly, consume followed by zero or more r characters) | consumer-1, sample-consumer, sample-consumer-2 | xyz |
| ^topic_\w+ | All strings starting with topic followed by _ and one or more word characters (equivalent to [a-zA-Z0-9_]) | topic_12, topic_x, topic_xyz_123, topic__xyz | topic-12, x_topic |
| ^topic.* | All strings starting with topic | topic-x, topic123 | x-topic, x_topic123 |
| .*topic$ | All strings ending with topic | x_topic, sampletopic | topic-1, x_topic123 |
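The matching rules in this table can be checked locally before you add a pattern to the configuration. The following is a minimal sketch, assuming the patterns are applied with "search" semantics (an unanchored pattern may match anywhere in the name), which is what the examples imply. The helper names matches and filter_names are hypothetical and are not part of the Sysdig agent.

```python
import re

# Patterns taken from the example table in this section.
PATTERNS = [
    r"topic_\d+",
    r"topic",
    r"consumer*",
    r"^topic_\w+",
    r"^topic.*",
    r".*topic$",
]


def matches(pattern: str, name: str) -> bool:
    """Return True if pattern is found anywhere in name.

    Assumption: unanchored patterns use search semantics, as the
    documented examples suggest (e.g. 'my-topic_1' matches 'topic_\\d+').
    """
    return re.search(pattern, name) is not None


def filter_names(patterns, names):
    """Keep only the names matched by at least one pattern."""
    return [n for n in names if any(matches(p, n) for p in patterns)]


if __name__ == "__main__":
    # Print, for each sample name, the patterns that match it.
    # Note that \w also matches '_', so 'topic__xyz' matches '^topic_\w+'.
    for name in ["my-topic_1", "topic-123", "topic_x", "x_topic", "topic__xyz"]:
        hits = [p for p in PATTERNS if matches(p, name)]
        print(f"{name!r}: {hits}")
```

Anchoring with ^ and $ is what separates "contains" from "starts with" and "ends with". If a pattern selects more topics or consumer groups than expected, a missing anchor is a likely cause.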

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      Kafka Consumer Metrics (App Checks)

                                                                                                                                                                                                                                                                                                                                                                      See Apache Kafka Consumer Metrics.

                                                                                                                                                                                                                                                                                                                                                                      JMX Metrics

                                                                                                                                                                                                                                                                                                                                                                      See Apache Kafka JMX Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.3 -

                                                                                                                                                                                                                                                                                                                                                                      Consul

Consul is a distributed service mesh that connects, secures, and configures services across any runtime platform and public or private cloud. If Consul is installed in your environment, the Sysdig agent automatically connects to it and collects basic metrics. If the Consul Access Control List (ACL) is configured, you may need to edit the default entries to connect. You can also collect additional latency metrics by modifying the default entries. See Default Configuration, below.

                                                                                                                                                                                                                                                                                                                                                                      It’s easy! Sysdig automatically detects metrics from this app based on standard default configurations.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Consul Configuration

                                                                                                                                                                                                                                                                                                                                                                      Consul is ready to expose metrics without any special configuration.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following code to connect with Consul and collect basic metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: consul
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: consul
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:8500"
                                                                                                                                                                                                                                                                                                                                                                            catalog_checks: yes
                                                                                                                                                                                                                                                                                                                                                                      

With the dragent.default.yaml file, the following metrics are available in the Sysdig Monitor UI:

                                                                                                                                                                                                                                                                                                                                                                      Metrics name
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.nodes_critical
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.nodes_passing
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.nodes_up
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.nodes_warning
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.total_nodes
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.services_critical
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.services_passing
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.services_up
                                                                                                                                                                                                                                                                                                                                                                      consul.catalog.services_warning
                                                                                                                                                                                                                                                                                                                                                                      consul.peers
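If these metrics do not appear, it can help to confirm that the Consul HTTP API answers at the url used in the default configuration. The following is a minimal sketch, not the agent's implementation. It queries Consul's /v1/status/leader and /v1/status/peers endpoints and sends an optional ACL token in the X-Consul-Token header. The function names are hypothetical, and the idea that consul.peers corresponds to the length of the peer list is an assumption.

```python
import json
import urllib.request


def build_request(base_url, path, acl_token=None):
    """Build a GET request for a Consul HTTP API path.

    If an ACL token is given, it is sent in the X-Consul-Token header.
    """
    req = urllib.request.Request(base_url.rstrip("/") + path)
    if acl_token:
        req.add_header("X-Consul-Token", acl_token)
    return req


def parse_peers(body):
    """Count the Raft peers in a /v1/status/peers response (a JSON array)."""
    return len(json.loads(body))


def parse_leader(body):
    """Return the leader address from /v1/status/leader, or None if empty."""
    return json.loads(body) or None


def consul_status(base_url="http://localhost:8500", acl_token=None, timeout=5):
    """Query a running Consul agent and return its leader and peer count.

    Raises urllib.error.URLError if the agent is not reachable.
    """
    result = {}
    endpoints = (
        ("leader", "/v1/status/leader", parse_leader),
        ("peers", "/v1/status/peers", parse_peers),
    )
    for key, path, parser in endpoints:
        req = build_request(base_url, path, acl_token)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            result[key] = parser(resp.read().decode("utf-8"))
    return result
```

On a host where Consul is running, calling consul_status() returns a dictionary with the current leader address and the number of peers. A connection error suggests that the url in the configuration needs to be adjusted, and an HTTP 403 response usually means that an ACL token is required.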

Additional metrics and events can be collected by adding configuration to the dragent.yaml file. If ACLs are enabled in Consul, an ACL token must also be provided. See the following examples.

Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Enable Leader Change Event

When self_leader_check is enabled, a node watches for itself becoming the leader and emits an event when that happens. The option can be enabled on all nodes.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: consul
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: consul
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:8500"
                                                                                                                                                                                                                                                                                                                                                                            catalog_checks: yes
                                                                                                                                                                                                                                                                                                                                                                            self_leader_check: yes
                                                                                                                                                                                                                                                                                                                                                                          logs_enabled: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Enable Latency Metrics

                                                                                                                                                                                                                                                                                                                                                                      If the network_latency_checks flag is enabled, then the Consul network coordinates will be retrieved and the latency calculated for each node and between data centers.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: consul
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: consul
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:8500"
                                                                                                                                                                                                                                                                                                                                                                            catalog_checks: yes
                                                                                                                                                                                                                                                                                                                                                                            network_latency_checks: yes
                                                                                                                                                                                                                                                                                                                                                                          logs_enabled: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      With the above changes, you can see the following additional metrics:

                                                                                                                                                                                                                                                                                                                                                                      Metrics name
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.min
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.p25
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.median
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.p75
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.p90
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.p95
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.p99
                                                                                                                                                                                                                                                                                                                                                                      consul.net.node.latency.max
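
As a rough illustration of the latency calculation described in Example 2, the sketch below estimates the round-trip time between two nodes from their network coordinates, following the distance formula given in Consul's network coordinates documentation. The Coordinate class and the estimated_rtt_seconds function are hypothetical names used only for illustration; the Sysdig app check may compute and aggregate the values differently before reporting the min, percentile, and max metrics.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Coordinate:
    """One node's network coordinate (hypothetical container for illustration)."""
    vec: List[float]    # position in the Euclidean part of the space, in seconds
    height: float       # non-Euclidean component added to every distance, in seconds
    adjustment: float   # per-node correction term, in seconds


def estimated_rtt_seconds(a: Coordinate, b: Coordinate) -> float:
    """Estimate the round-trip time between two nodes, in seconds."""
    if len(a.vec) != len(b.vec):
        raise ValueError("coordinates have different dimensionality")

    # Euclidean distance between the two vectors, plus both heights.
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a.vec, b.vec)))
    rtt = euclidean + a.height + b.height

    # Apply the adjustment terms, but never let them push the estimate
    # to zero or below; in that case fall back to the unadjusted value.
    adjusted = rtt + a.adjustment + b.adjustment
    return adjusted if adjusted > 0.0 else rtt
```

The result is in seconds, so multiply by 1000 to compare it with latencies expressed in milliseconds.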

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Enable ACL Token

When the ACL system is enabled in Consul, the ACL agent token must be added to dragent.yaml in order to collect metrics.

                                                                                                                                                                                                                                                                                                                                                                      Follow Consul’s official documentation to Configure ACL, Bootstrap ACL and Create Agent Token.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: consul
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: consul
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:8500"
                                                                                                                                                                                                                                                                                                                                                                            acl_token: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" #Add agent token
                                                                                                                                                                                                                                                                                                                                                                            catalog_checks: yes
                                                                                                                                                                                                                                                                                                                                                                            logs_enabled: true
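
Before adding a token to dragent.yaml, it can be useful to confirm that Consul accepts it. The sketch below is one possible way to do that, assuming a Consul version that provides the /v1/acl/token/self endpoint and the X-Consul-Token request header. The function names are hypothetical and this is not part of the Sysdig agent; the check only confirms that the token exists, not that it carries the permissions the app check needs.

```python
import urllib.error
import urllib.request


def build_token_check_request(base_url: str, acl_token: str) -> urllib.request.Request:
    """Build (without sending) a GET request that asks Consul to describe the token."""
    url = base_url.rstrip("/") + "/v1/acl/token/self"
    return urllib.request.Request(
        url, headers={"X-Consul-Token": acl_token}, method="GET"
    )


def token_is_accepted(base_url: str, acl_token: str, timeout: float = 5.0) -> bool:
    """Send the request; True if Consul answers 200, False on an HTTP error."""
    request = build_token_check_request(base_url, acl_token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Consul rejected the token (for example, token not found).
        return False
```

For example, token_is_accepted("http://localhost:8500", "<agent token>") would be called with the same URL and token that are placed in the conf section above.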
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 4: Collect Metrics from Non-Leader Node

                                                                                                                                                                                                                                                                                                                                                                      Required: Agent 9.6.0+

With agent 9.6.0 and later, you can use the configuration option single_node_install (optional; default: false). Set this option to true to run the app check on non-leader Consul nodes.

app_checks:
  - name: consul
    pattern:
      comm: consul
    conf:
      url: "http://localhost:8500"
      catalog_checks: yes
      single_node_install: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      StatsD Metrics

                                                                                                                                                                                                                                                                                                                                                                      In addition to the metrics from the Sysdig app-check, there are many other metrics that Consul can send using StatsD. Those metrics will be automatically collected by the Sysdig agent’s StatsD integration if Consul is configured to send them.

Add statsd_address under telemetry in the Consul config file. The default config file location is /consul/config/local.json.

                                                                                                                                                                                                                                                                                                                                                                      {
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                        "telemetry": {
                                                                                                                                                                                                                                                                                                                                                                           "statsd_address": "127.0.0.1:8125"
                                                                                                                                                                                                                                                                                                                                                                        }
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      }
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      See Telemetry Metrics for more details.
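
To check that something is listening on the address configured in statsd_address, a test counter can be sent by hand. The following is a minimal sketch that assumes the plain StatsD line format (name:value|c) sent over UDP; the metric name and the function names are made up for illustration and are not emitted by Consul.

```python
import socket
from typing import Tuple


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter in the plain StatsD line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def parse_address(address: str) -> Tuple[str, int]:
    """Split a "host:port" string, as used for statsd_address, into its parts."""
    host, port = address.rsplit(":", 1)
    return host, int(port)


def send_test_counter(address: str = "127.0.0.1:8125",
                      name: str = "statsd_smoke_test") -> int:
    """Send one counter datagram over UDP and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(format_counter(name), parse_address(address))
```

Because UDP is connectionless, a successful send does not prove that the metric was received; confirm on the receiving side that the test counter shows up.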

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Consul Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.4 - Couchbase

Couchbase Server is a distributed, open-source, NoSQL database engine. Its core architecture is designed to simplify building modern applications with a flexible data model, high availability, high scalability, high performance, and advanced security. If Couchbase is installed in your environment, the Sysdig agent will automatically connect. If authentication is configured, you may need to edit the default entries to connect. See the Default Configuration, below.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent automatically collects all bucket and node metrics. You can also edit the configuration to collect query metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Couchbase Setup

Couchbase exposes all metrics automatically. You do not need to configure anything on the Couchbase instance.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect with Couchbase and collect all bucket and node metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: couchbase
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: beam.smp
                                                                                                                                                                                                                                                                                                                                                                            arg: couchbase
                                                                                                                                                                                                                                                                                                                                                                            port: 8091
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: http://localhost:8091
                                                                                                                                                                                                                                                                                                                                                                      

If authentication is enabled, you need to edit the dragent.yaml file to connect with Couchbase. See Example 1.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Authentication

                                                                                                                                                                                                                                                                                                                                                                      Replace <username> and <password> with appropriate values and update the dragent.yaml file.

app_checks:
  - name: couchbase
    pattern:
      comm: beam.smp
      arg: couchbase
      port: 8091
    conf:
      server: http://localhost:8091
      user: <username>
      password: <password>
      # Optional: include this block only if the cbstats 'port' and 'path'
      # need to be set to non-default values, as illustrated here.
      cbstats:
        port: 11210
        path: /opt/couchbase/bin/cbstats
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Query Stats

You can also configure query_monitoring_url to collect query monitoring stats. This option is available in Couchbase version 4.5 and later. See Query Monitoring for more detail.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: couchbase
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: beam.smp
                                                                                                                                                                                                                                                                                                                                                                            arg: couchbase
                                                                                                                                                                                                                                                                                                                                                                            port: 8091
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: http://localhost:8091
                                                                                                                                                                                                                                                                                                                                                                            query_monitoring_url: http://localhost:8093
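
When a cluster has authentication enabled and you also want query stats, the settings from Example 1 and Example 2 can presumably be placed in the same conf block. The following is a minimal sketch under that assumption. It uses only keys already shown on this page, and <username> and <password> remain placeholders. Whether the credentials are also sent to the query monitoring endpoint is not stated here, so verify the result in your environment.

```yaml
app_checks:
  - name: couchbase
    pattern:
      comm: beam.smp
      arg: couchbase
      port: 8091
    conf:
      server: http://localhost:8091
      # Credentials as in Example 1; replace the placeholders with real values.
      user: <username>
      password: <password>
      # Query monitoring endpoint as in Example 2 (Couchbase 4.5 and later).
      # Assumption: this key can be combined with user/password in one entry.
      query_monitoring_url: http://localhost:8093
```

As with the other examples, this entry belongs in dragent.yaml, not in dragent.default.yaml.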
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Couchbase Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.5 - Elasticsearch

Elasticsearch is an open-source, distributed document store and search engine that stores and retrieves data structures in near real time. It represents data as structured JSON documents and makes full-text search accessible through a RESTful API and web clients for languages such as PHP, Python, and Ruby. It is also elastic in the sense that it is easy to scale horizontally: simply add more nodes to distribute the load. If Elasticsearch is installed in your environment, the Sysdig agent connects to it automatically in most cases. See the Default Configuration, below.

The Sysdig agent automatically collects default metrics. You can also edit the configuration to collect primary shard stats.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Elasticsearch Setup

                                                                                                                                                                                                                                                                                                                                                                      Elasticsearch is ready to expose metrics without any special configuration.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect with Elasticsearch and collect basic metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                          check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 9200
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:9200
                                                                                                                                                                                                                                                                                                                                                                      

To collect more metrics, you may need to override the default Elasticsearch settings in dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

Example 1: Agent Authentication to an Elasticsearch Cluster with Authentication Enabled

                                                                                                                                                                                                                                                                                                                                                                      Password Authentication

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                          check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 9200
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: https://sysdigcloud-elasticsearch:9200
                                                                                                                                                                                                                                                                                                                                                                            username: readonly
                                                                                                                                                                                                                                                                                                                                                                            password: some_password
                                                                                                                                                                                                                                                                                                                                                                            ssl_verify: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Certificate Authentication

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                         - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                           check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                           pattern:
                                                                                                                                                                                                                                                                                                                                                                             port: 9200
                                                                                                                                                                                                                                                                                                                                                                             comm: java
                                                                                                                                                                                                                                                                                                                                                                           conf:
                                                                                                                                                                                                                                                                                                                                                                             url: https://localhost:9200
                                                                                                                                                                                                                                                                                                                                                                             ssl_cert: /tmp/certs/ssl.crt
                                                                                                                                                                                                                                                                                                                                                                             ssl_key: /tmp/certs/ssl.key
                                                                                                                                                                                                                                                                                                                                                                             ssl_verify: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      ssl_cert: Path to the certificate chain used for validating the authenticity of the Elasticsearch server.

                                                                                                                                                                                                                                                                                                                                                                      ssl_key: Path to the certificate key used for authenticating to the Elasticsearch server.
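
As a rough sketch, the certificate configuration above can be annotated with the role of each key. The comments on ssl_cert and ssl_key restate the descriptions above; the comment on ssl_verify is an assumption drawn from how Example 1 uses it, not something this page states outright. The paths are the placeholder values from the example.

```yaml
app_checks:
  - name: elasticsearch
    check_module: elastic
    pattern:
      port: 9200
      comm: java
    conf:
      url: https://localhost:9200
      # Certificate chain used for validating the authenticity of the server.
      ssl_cert: /tmp/certs/ssl.crt
      # Certificate key the agent uses to authenticate to the server.
      ssl_key: /tmp/certs/ssl.key
      # Assumption: true turns on certificate verification for the HTTPS
      # connection; the password example sets it to false.
      ssl_verify: true
```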

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Enable Primary shard Statistics

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                          check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 9200
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:9200
                                                                                                                                                                                                                                                                                                                                                                            pshard_stats : true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      pshard-specific Metrics

                                                                                                                                                                                                                                                                                                                                                                      Enable pshard_stats to monitor the following additional metrics:

                                                                                                                                                                                                                                                                                                                                                                      Metric Name
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.flush.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.flush.total.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.docs.count
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.docs.deleted
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.exists.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.exists.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.missing.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.missing.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.get.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.delete.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.delete.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.delete.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.index.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.index.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.indexing.index.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.current.docs
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.current.size
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.total.docs
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.total.size
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.merges.total.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.refresh.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.refresh.total.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.fetch.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.fetch.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.fetch.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.query.current
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.query.time
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.search.query.total
                                                                                                                                                                                                                                                                                                                                                                      elasticsearch.primaries.store.size
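
For a cluster that also requires authentication, the settings from Example 1 and Example 2 could presumably be combined in one conf block. The following is a minimal sketch under that assumption, which this page does not confirm; the URL and credentials are the placeholder values from Example 1.

```yaml
app_checks:
  - name: elasticsearch
    check_module: elastic
    pattern:
      port: 9200
      comm: java
    conf:
      url: https://sysdigcloud-elasticsearch:9200
      # Placeholder credentials taken from Example 1.
      username: readonly
      password: some_password
      ssl_verify: false
      # Enables the elasticsearch.primaries.* metrics listed above.
      pshard_stats: true
```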

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Enable Primary shard Statistics for Master Node only

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                          check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 9200
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:9200
                                                                                                                                                                                                                                                                                                                                                                            pshard_stats_master_node_only: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Note that this option takes precedence over the pshard_stats option (above). This means that if the following configuration were put into place, only the pshard_stats_master_node_only option would be respected:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: elasticsearch
                                                                                                                                                                                                                                                                                                                                                                          check_module: elastic
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 9200
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:9200
                                                                                                                                                                                                                                                                                                                                                                            pshard_stats: true
                                                                                                                                                                                                                                                                                                                                                                            pshard_stats_master_node_only: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      All Available Metrics

The full set of metrics available with the default settings plus the pshard settings is listed in Elasticsearch Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.6 - etcd

etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. If etcd is installed in your environment, the Sysdig agent will automatically connect. If you are using a version of etcd older than version 2, you may need to edit the default entries to connect. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig Agent automatically collects all metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      etcd Versions

                                                                                                                                                                                                                                                                                                                                                                      etcd v2

                                                                                                                                                                                                                                                                                                                                                                      The app check functionality described on this page supports etcd metrics from APIs that are specific to v2 of etcd.

                                                                                                                                                                                                                                                                                                                                                                      These APIs are present in etcd v3 as well, but export metrics only for the v2 datastores. For example, after upgrading from etcd v2 to v3, if the v2 datastores are not migrated to v3, the v2 APIs will continue exporting metrics for these datastores. If the v2 datastores are migrated to v3, the v2 APIs will no longer export metrics for these datastores.

                                                                                                                                                                                                                                                                                                                                                                      etcd v3

                                                                                                                                                                                                                                                                                                                                                                      etcd v3 uses a native Prometheus exporter. The exporter only exports metrics for v3 datastores. For example, after upgrading from etcd v2 to v3, if v2 datastores are not migrated to v3, the Prometheus endpoint will not export metrics for these datastores. The Prometheus endpoint will only export metrics for datastores migrated to v3 or datastores created after the upgrade to v3.

If you run etcd v3 or later, follow Integrate Prometheus Metrics to enable the integration instead.
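For orientation, enabling Prometheus scraping for an etcd v3 process in dragent.yaml might look roughly like the sketch below. The key names (prometheus, enabled, process_filter, include, process.name, conf) and the port and path values are assumptions recalled from Sysdig's Prometheus settings and etcd's default /metrics endpoint, not taken from this page. Confirm them against Integrate Prometheus Metrics before use.

```yaml
# Hypothetical sketch: verify every key against Integrate Prometheus Metrics.
prometheus:
  enabled: true            # assumption: turns on Prometheus scraping in the agent
  process_filter:
    - include:
        process.name: etcd # assumption: match the etcd process by name
        conf:
          port: 2379       # assumption: etcd v3 serves metrics on its client port
          path: "/metrics" # assumption: default Prometheus endpoint path
```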

                                                                                                                                                                                                                                                                                                                                                                      etcd Setup

                                                                                                                                                                                                                                                                                                                                                                      etcd will automatically expose all metrics. You do not need to add anything to the etcd instance.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      The default agent configuration for etcd will look for the application on localhost, port 2379. No customization is required.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect with etcd and collect all metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: etcd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: etcd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:2379"
                                                                                                                                                                                                                                                                                                                                                                      

etcd (before version 2) does not listen on localhost, so the Sysdig agent will not connect to it automatically. In such cases, you may need to edit the dragent.yaml file with the hostname and port. See Example 1.

                                                                                                                                                                                                                                                                                                                                                                      Alternatively, you can add the option -bind-addr 0.0.0.0:4001 to the etcd command line to allow the agent to connect.
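If etcd is started with -bind-addr 0.0.0.0:4001 as described above, the url in the agent configuration presumably needs to point at port 4001 rather than the default 2379. A minimal sketch of a matching dragent.yaml entry, assuming the agent runs on the same host as etcd:

```yaml
app_checks:
  - name: etcd
    pattern:
      comm: etcd
    conf:
      # Assumption: etcd was started with -bind-addr 0.0.0.0:4001,
      # so its client API is reachable on localhost, port 4001.
      url: "http://localhost:4001"
```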

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1

You can use {hostname} and {port} as tokens in the conf: section. This is the recommended setting for Kubernetes customers.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: etcd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: etcd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://{hostname}:{port}"
                                                                                                                                                                                                                                                                                                                                                                      

Alternatively, you can specify the actual hostname and port.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: etcd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: etcd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://my_hostname:4000"  #etcd service listening on port 4000
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: SSL/TLS Certificate

If encryption is used, add the appropriate SSL/TLS entries. In the ssl_keyfile, ssl_certfile, and ssl_ca_certs fields, provide the paths to the SSL/TLS key and certificates used in the etcd configuration.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: etcd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: etcd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "https://localhost:PORT"
                                                                                                                                                                                                                                                                                                                                                                            ssl_keyfile:  /etc/etcd/peer.key  # Path to key file
                                                                                                                                                                                                                                                                                                                                                                            ssl_certfile: /etc/etcd/peer.crt  # Path to SSL certificate
                                                                                                                                                                                                                                                                                                                                                                            ssl_ca_certs: /etc/etcd/ca.crt    # Path to CA certificate
                                                                                                                                                                                                                                                                                                                                                                            ssl_cert_validation: True
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See etcd Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.7 - fluentd

Fluentd is an open source data collector that unifies data collection and consumption so that data is easier to use and understand. Fluentd structures data as JSON wherever possible to unify all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations. If Fluentd is installed in your environment, the Sysdig agent automatically connects to it and collects default metrics. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Fluentd Setup

                                                                                                                                                                                                                                                                                                                                                                      Fluentd can be installed as a package (.deb, .rpm, etc) depending on the OS flavor, or it can be deployed in a Docker container. Fluentd installation is documented here. For the examples on this page, a .deb package installation is used.

After installing Fluentd, add the following lines to fluentd.conf:

                                                                                                                                                                                                                                                                                                                                                                      <source>
                                                                                                                                                                                                                                                                                                                                                                        @type monitor_agent
                                                                                                                                                                                                                                                                                                                                                                        bind 0.0.0.0
                                                                                                                                                                                                                                                                                                                                                                        port 24220
                                                                                                                                                                                                                                                                                                                                                                      </source>
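
Once monitor_agent is enabled, it can help to see what the endpoint reports before relying on the agent to collect it. The following Go program is a minimal sketch, not part of the Sysdig agent: it parses a monitor_agent style JSON document and totals the retry counts. The struct, the function names, and the sample payload are hypothetical and written for illustration; the field names are assumed to follow the monitor_agent plugin output and should be checked against your Fluentd version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// plugin holds a subset of the per-plugin fields assumed to be reported by
// monitor_agent. A JSON null leaves the corresponding field at its zero value.
type plugin struct {
	ID                    string `json:"plugin_id"`
	Category              string `json:"plugin_category"`
	Type                  string `json:"type"`
	RetryCount            int64  `json:"retry_count"`
	BufferQueueLength     int64  `json:"buffer_queue_length"`
	BufferTotalQueuedSize int64  `json:"buffer_total_queued_size"`
}

// samplePayload is hand-written for illustration; it is not captured output.
const samplePayload = `{"plugins": [
  {"plugin_id": "object:1", "plugin_category": "input", "type": "monitor_agent", "retry_count": null},
  {"plugin_id": "object:2", "plugin_category": "output", "type": "stdout",
   "retry_count": 3, "buffer_queue_length": 1, "buffer_total_queued_size": 512}
]}`

// parsePlugins decodes the top-level "plugins" array.
func parsePlugins(data []byte) ([]plugin, error) {
	var doc struct {
		Plugins []plugin `json:"plugins"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	return doc.Plugins, nil
}

// totalRetries sums retry_count across all plugins.
func totalRetries(plugins []plugin) int64 {
	var total int64
	for _, p := range plugins {
		total += p.RetryCount
	}
	return total
}

// fetch downloads the JSON document from a live monitor_agent endpoint.
func fetch(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Without arguments the built-in sample is used, so the sketch runs offline.
	data := []byte(samplePayload)
	if len(os.Args) > 1 {
		var err error
		if data, err = fetch(os.Args[1]); err != nil {
			fmt.Fprintln(os.Stderr, "fetch failed:", err)
			os.Exit(1)
		}
	}
	plugins, err := parsePlugins(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse failed:", err)
		os.Exit(1)
	}
	for _, p := range plugins {
		fmt.Printf("%s (%s): retries=%d queued=%d\n", p.ID, p.Type, p.RetryCount, p.BufferQueueLength)
	}
	fmt.Println("total retries:", totalRetries(plugins))
}
```

To query a running Fluentd instead of the sample, pass the endpoint as the first argument, for example http://localhost:24220/api/plugins.json.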
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following code to connect with Fluentd and collect default metrics.

(If you use a non-standard port for monitor_agent, you can configure it as usual in the agent config file dragent.yaml.)

                                                                                                                                                                                                                                                                                                                                                                        - name: fluentd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: fluentd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            monitor_agent_url: http://localhost:24220/api/plugins.json
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      To generate the metric data, it is necessary to generate some logs through an application. In the following example, HTTP is used. (For more information, see Life of a Fluentd event.)

Execute the following command in the Fluentd environment:

                                                                                                                                                                                                                                                                                                                                                                      $ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Expected output: (Note: Here the status code is 200 OK, as HTTP traffic is successfully generated; it will vary per application.)

                                                                                                                                                                                                                                                                                                                                                                      HTTP/1.1 200 OK
                                                                                                                                                                                                                                                                                                                                                                      Content-type: text/plain
                                                                                                                                                                                                                                                                                                                                                                      Connection: Keep-Alive
                                                                                                                                                                                                                                                                                                                                                                      Content-length: 0
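
If the log traffic needs to come from application code rather than from curl, the same request can be sketched in Go. This is an illustrative sketch only: the helper names formBody and postEvent are hypothetical, and it assumes the Fluentd HTTP input accepts a URL-encoded form field named json, as in the curl command above. When run without arguments it posts to a local stand-in server, so it does not require Fluentd.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"os"
	"strings"
)

// formBody encodes a JSON record as the URL-encoded form field "json".
func formBody(record string) string {
	return url.Values{"json": {record}}.Encode()
}

// postEvent sends one record to the given endpoint and returns the HTTP status code.
func postEvent(endpoint, record string) (int, error) {
	resp, err := http.Post(endpoint, "application/x-www-form-urlencoded",
		strings.NewReader(formBody(record)))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	var endpoint string
	if len(os.Args) > 1 {
		// For example: http://localhost:8888/test.cycle
		endpoint = os.Args[1]
	} else {
		// Stand-in server (an assumption for offline runs, not Fluentd itself):
		// it answers 200 when the "json" form field is present.
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.PostFormValue("json") == "" {
				http.Error(w, "missing json field", http.StatusBadRequest)
				return
			}
			w.WriteHeader(http.StatusOK)
		}))
		defer srv.Close()
		endpoint = srv.URL + "/test.cycle"
	}

	status, err := postEvent(endpoint, `{"action":"login","user":2}`)
	if err != nil {
		fmt.Fprintln(os.Stderr, "post failed:", err)
		return
	}
	fmt.Println("status:", status)
}
```

Pass the real endpoint as the first argument to send the record to Fluentd instead of the stand-in server.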
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See fluentd Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.8 - Go

Golang expvar is the standard interface designed to instrument and expose custom metrics from a Go program via HTTP. In addition to custom metrics, it also exports some metrics out-of-the-box, such as command line arguments, allocation stats, heap stats, and garbage collection metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Go_expvar Setup

You must create a custom entry in the user settings config file for your Go application, because it is difficult to determine whether an application is written in Go by looking at process names or arguments. Be sure your app has expvar enabled, which means importing the expvar package and starting an HTTP server from inside your app, as follows:

import (
    ...
    "net/http"
    "expvar"
    ...
)

// If your application has no HTTP server running for the DefaultServeMux,
// you must start one for expvar to use, for example
// by adding the following to your init function.
func init() {
    // ListenAndServe blocks, so it runs in its own goroutine.
    go http.ListenAndServe(":8080", nil)
}

// You can also expose variables that are specific to your application.
// See http://golang.org/pkg/expvar/ for more information.

var (
    exp_points_processed = expvar.NewInt("points_processed")
)

func processPoints(p RawPoints) {
    points_processed, err := parsePoints(p)
    if err != nil {
        // Do not count points that failed to parse.
        return
    }
    // Int.Add takes an int64, so convert the count explicitly.
    exp_points_processed.Add(int64(points_processed))
    ...
}
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      See also the following blog entry: How to instrument Go code with custom expvar metrics.
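
For a version that compiles and runs as is, the following minimal sketch registers one counter and then reads it back from the /debug/vars endpoint, which the expvar package registers on the default mux when it is imported. The counter name points_processed comes from the snippet above; the processPoints and readCounter helpers are hypothetical, and the httptest server is an assumption used only so the example runs without binding a fixed port. A real application would call http.ListenAndServe instead.

```go
package main

import (
	"encoding/json"
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pointsProcessed is an application-specific counter published through expvar.
var pointsProcessed = expvar.NewInt("points_processed")

// processPoints stands in for real work and records how many points it handled.
func processPoints(points []float64) {
	pointsProcessed.Add(int64(len(points)))
}

// readCounter fetches the expvar JSON document from baseURL and returns the
// value of points_processed. Other published variables are ignored.
func readCounter(baseURL string) (int64, error) {
	resp, err := http.Get(baseURL + "/debug/vars")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var vars struct {
		PointsProcessed int64 `json:"points_processed"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
		return 0, err
	}
	return vars.PointsProcessed, nil
}

func main() {
	// Serve the default mux, where expvar has registered /debug/vars.
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()

	processPoints([]float64{1.5, 2.5, 4})

	n, err := readCounter(srv.URL)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("points_processed:", n)
}
```

The same JSON document is what an HTTP-based check would read, so fetching /debug/vars by hand is a quick way to confirm the metrics are exposed before configuring the agent.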

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

No default configuration for Go is provided in the Sysdig agent dragent.default.yaml file. You must edit the agent config file as described in the Example below.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      Add the following code sample to dragent.yaml to collect Go metrics.

app_checks:
  - name: go-expvar
    check_module: go_expvar
    pattern:
      comm: go-expvar
    conf:
      expvar_url: "http://localhost:8080/debug/vars" # automatically match url using the listening port
      # Add custom metrics if you want
      metrics:
        - path: system.numberOfSeconds
          type: gauge # gauge or rate
          alias: go_expvar.system.numberOfSeconds
        - path: system.lastLoad
          type: gauge
          alias: go_expvar.system.lastLoad
        - path: system.numberOfLoginsPerUser/.* # You can use / to get inside the map and use .* to match any record inside
          type: gauge
        - path: system.allLoad/.*
          type: gauge
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Go Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.9 -

                                                                                                                                                                                                                                                                                                                                                                      HAProxy

HAProxy provides a high-availability load balancer and proxy server for TCP- and HTTP-based applications, spreading requests across multiple servers.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent automatically collects haproxy metrics. You can also edit the agent configuration file to collect additional metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      HAProxy Setup

The stats feature must be enabled on your HAProxy instance. To enable it, add the following entry to the HAProxy configuration file, /etc/haproxy/haproxy.cfg:

                                                                                                                                                                                                                                                                                                                                                                      listen stats
                                                                                                                                                                                                                                                                                                                                                                        bind :1936
                                                                                                                                                                                                                                                                                                                                                                        mode http
                                                                                                                                                                                                                                                                                                                                                                        stats enable
                                                                                                                                                                                                                                                                                                                                                                        stats hide-version
                                                                                                                                                                                                                                                                                                                                                                        stats realm Haproxy\ Statistics
                                                                                                                                                                                                                                                                                                                                                                        stats uri /haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                        stats auth stats:stats
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with HAProxy and collect haproxy metrics:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: True
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
                                                                                                                                                                                                                                                                                                                                                                      

You can collect a few additional status metrics by editing the configuration in dragent.yaml, as in the following examples.

Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example: Collect Status Metrics Per Service

Enable the collect_status_metrics flag to collect the metrics haproxy.count_per_status and haproxy.backend_hosts.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: True
                                                                                                                                                                                                                                                                                                                                                                            collect_status_metrics: True
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
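
To make the idea of a per-service status count concrete, the sketch below tallies servers by service and status. It only illustrates the kind of aggregation a metric such as haproxy.count_per_status suggests; it is not the agent's implementation. The types serverRow and statusKey, the function countPerStatus, and the sample data are all hypothetical.

```go
package main

import "fmt"

// serverRow is a hypothetical, simplified view of one server line from the
// HAProxy stats output.
type serverRow struct {
	Service string // backend (proxy) name
	Host    string // server name within the backend
	Status  string // for example "UP" or "DOWN"
}

// statusKey groups counts by service and status, ignoring the host.
type statusKey struct {
	Service string
	Status  string
}

// countPerStatus returns how many servers each service has in each status.
func countPerStatus(rows []serverRow) map[statusKey]int {
	counts := make(map[statusKey]int)
	for _, r := range rows {
		counts[statusKey{Service: r.Service, Status: r.Status}]++
	}
	return counts
}

func main() {
	// Sample data, invented for illustration.
	rows := []serverRow{
		{Service: "web", Host: "web1", Status: "UP"},
		{Service: "web", Host: "web2", Status: "UP"},
		{Service: "web", Host: "web3", Status: "DOWN"},
		{Service: "api", Host: "api1", Status: "UP"},
	}
	for k, n := range countPerStatus(rows) {
		fmt.Printf("service=%s status=%s count=%d\n", k.Service, k.Status, n)
	}
}
```

Grouping by host as well would mean adding the Host field to the key, so that each server gets its own series.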
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example: Collect Status Metrics Per Host

                                                                                                                                                                                                                                                                                                                                                                      Enable:

                                                                                                                                                                                                                                                                                                                                                                      • collect_status_metrics_by_host: Instructs the check to collect status metrics per host, instead of per service. This only applies if `collect_status_metrics` is true.

• tag_service_check_by_host: When this flag is set, the hostname is also passed with the service check `haproxy.backend_up`.

                                                                                                                                                                                                                                                                                                                                                                        By default, only the backend name and service name are associated with it.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: True
                                                                                                                                                                                                                                                                                                                                                                            collect_status_metrics: True
                                                                                                                                                                                                                                                                                                                                                                            collect_status_metrics_by_host: True
                                                                                                                                                                                                                                                                                                                                                                            tag_service_check_by_host: True
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example: Collect HAProxy Stats by UNIX Socket

                                                                                                                                                                                                                                                                                                                                                                      If you’ve configured HAProxy to report statistics to a UNIX socket, you can set the url in dragent.yaml to the socket’s path (e.g., unix:///var/run/haproxy.sock).

                                                                                                                                                                                                                                                                                                                                                                      Set up HAProxy Config File

                                                                                                                                                                                                                                                                                                                                                                      Edit your HAProxy configuration file ( /etc/haproxy/haproxy.cfg ) to add the following lines to the global section:

                                                                                                                                                                                                                                                                                                                                                                      global
                                                                                                                                                                                                                                                                                                                                                                          [snip]
                                                                                                                                                                                                                                                                                                                                                                             stats socket /run/haproxy/admin.sock mode 660 level admin
                                                                                                                                                                                                                                                                                                                                                                             stats timeout 30s
                                                                                                                                                                                                                                                                                                                                                                          [snip]
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Edit dragent.yaml url

                                                                                                                                                                                                                                                                                                                                                                      Add the socket URL from the HAProxy config to the dragent.yaml file:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                            - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                              pattern:
                                                                                                                                                                                                                                                                                                                                                                                comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                              conf:
                                                                                                                                                                                                                                                                                                                                                                                url: unix:///run/haproxy/admin.sock
                                                                                                                                                                                                                                                                                                                                                                              log_errors: True
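
Before pointing the agent at the socket, it can help to confirm that HAProxy actually answers on it. The following is a minimal sketch, not part of the agent: it sends HAProxy's `show stat` command over the stats socket and parses the CSV reply. The function names `parse_show_stat` and `query_stats_socket` are hypothetical, and the default path simply mirrors the `stats socket` line in the example configuration above.

```python
import csv
import io
import socket


def parse_show_stat(raw: str) -> list[dict[str, str]]:
    """Parse the CSV text that HAProxy returns for `show stat`."""
    text = raw.lstrip()
    # The header row is prefixed with "# "; strip it so the column names are clean.
    if text.startswith("# "):
        text = text[2:]
    reader = csv.DictReader(io.StringIO(text))
    # Every row ends with a trailing comma, which produces an empty-named
    # column; drop it, and skip rows that carry no proxy name.
    return [
        {key: value for key, value in row.items() if key}
        for row in reader
        if row.get("pxname")
    ]


def query_stats_socket(path: str = "/run/haproxy/admin.sock",
                       timeout: float = 5.0) -> list[dict[str, str]]:
    """Send `show stat` to the HAProxy UNIX socket and return the parsed rows.

    The default path is an assumption taken from the example haproxy.cfg;
    adjust it to match your own `stats socket` setting.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the connection after replying
                break
            chunks.append(data)
    return parse_show_stat(b"".join(chunks).decode("utf-8", errors="replace"))
```

Calling `query_stats_socket()` as a user with read/write access to the socket (mode 660 in the example) should return one dictionary per frontend, backend, and server. A permission or connection error here usually means the agent would not be able to read the socket either.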
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See HAProxy Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Example: Enable Service Check

                                                                                                                                                                                                                                                                                                                                                                      Required: Agent 9.6.0+

                                                                                                                                                                                                                                                                                                                                                                      enable_service_check: Enable/Disable service check haproxy.backend.up.

                                                                                                                                                                                                                                                                                                                                                                      When set to false , all service checks will be disabled.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: true
                                                                                                                                                                                                                                                                                                                                                                            enable_service_check: false
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example: Filter Metrics Per Service

                                                                                                                                                                                                                                                                                                                                                                      Required: Agent 9.6.0+

                                                                                                                                                                                                                                                                                                                                                                      services_exclude (Optional): Name or regex of services to be excluded.

                                                                                                                                                                                                                                                                                                                                                                      services_include (Optional): Name or regex of services to be included

                                                                                                                                                                                                                                                                                                                                                                      If a service is excluded with services_exclude, it can still be be included explicitly by services_include. The following example excludes all services except service_1 and service_2.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: true
                                                                                                                                                                                                                                                                                                                                                                            services_exclude:
                                                                                                                                                                                                                                                                                                                                                                              - ".*"
                                                                                                                                                                                                                                                                                                                                                                            services_include:
                                                                                                                                                                                                                                                                                                                                                                              - "service_1"
                                                                                                                                                                                                                                                                                                                                                                              - "service_2"
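
The precedence between the two lists can be sketched in a few lines of Python. This is an illustration of the rule described above, not the agent's actual code: the function names are hypothetical, and it assumes each entry is applied as an unanchored regular expression search, which may differ from how the agent matches names.

```python
import re


def matches_any(name: str, patterns: list[str]) -> bool:
    """Return True if any pattern is found in the service name."""
    # Assumption: unanchored search, so "service_1" would also match "service_10".
    return any(re.search(pattern, name) for pattern in patterns)


def is_service_collected(name: str,
                         services_include: list[str],
                         services_exclude: list[str]) -> bool:
    """Apply the exclude list first, then let the include list override it."""
    if matches_any(name, services_exclude):
        # Excluded services are kept only when explicitly included.
        return matches_any(name, services_include)
    # Services that match no exclude pattern are always collected.
    return True
```

With the lists from the example (`services_exclude: [".*"]` and `services_include: ["service_1", "service_2"]`), `is_service_collected("service_1", ...)` returns `True` and `is_service_collected("service_3", ...)` returns `False`. If exact matches are needed under this assumption, anchoring the pattern (for example `^service_1$`) avoids catching names that merely contain it.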
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Additional Options: active_tag, headers

                                                                                                                                                                                                                                                                                                                                                                      Required: Agent 9.6.0+

                                                                                                                                                                                                                                                                                                                                                                      There are two additional configuration options introduced with agent 9.6.0:

                                                                                                                                                                                                                                                                                                                                                                      • active_tag (Optional. Default: false):

                                                                                                                                                                                                                                                                                                                                                                        Adds tag active to backend metrics that belong to the active pool of connections.

                                                                                                                                                                                                                                                                                                                                                                      • headers (Optional):

                                                                                                                                                                                                                                                                                                                                                                        Extra headers such as auth-token can be passed along with requests.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: haproxy
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: haproxy
                                                                                                                                                                                                                                                                                                                                                                            port: 1936
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: stats
                                                                                                                                                                                                                                                                                                                                                                            password: stats
                                                                                                                                                                                                                                                                                                                                                                            url: http://localhost:1936/haproxy_stats
                                                                                                                                                                                                                                                                                                                                                                            collect_aggregates_only: true
                                                                                                                                                                                                                                                                                                                                                                            active_tag: true
                                                                                                                                                                                                                                                                                                                                                                            headers:
                                                                                                                                                                                                                                                                                                                                                                              <HEADER_NAME>: <HEADER_VALUE>
                                                                                                                                                                                                                                                                                                                                                                              <HEADER_NAME>: <HEADER_VALUE>
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.10 - HTTP

                                                                                                                                                                                                                                                                                                                                                                      The HTTP check monitors HTTP-based applications for URL availability.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      HTTP Setup

                                                                                                                                                                                                                                                                                                                                                                      You do not need to configure anything on HTTP-based applications for the Sysdig agent to connect.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

No default entry is present in dragent.default.yaml for the HTTP check. You need to add an entry to dragent.yaml, as shown in the following examples.

                                                                                                                                                                                                                                                                                                                                                                      Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1

                                                                                                                                                                                                                                                                                                                                                                      First you must identify the process pattern (comm:). It must match an actively running process for the HTTP check to work. Sysdig recommends the process be the one that is serving the URL being checked.

If the URL is remote from the agent, use a process that is always running, such as “systemd”.

                                                                                                                                                                                                                                                                                                                                                                      Confirm the “comm” value using the following command:

                                                                                                                                                                                                                                                                                                                                                                      cat /proc/1/comm
                                                                                                                                                                                                                                                                                                                                                                      

Add the following entry to the dragent.yaml file and modify the 'name:', 'comm:', and 'url:' parameters as needed:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: EXAMPLE_WEBSITE
                                                                                                                                                                                                                                                                                                                                                                          check_module: http_check
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm:  systemd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: https://www.MYEXAMPLE.com
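
When the URL is served by a process on the same host as the agent, the pattern can point at that process instead of systemd, in line with the recommendation above. The following is a minimal sketch, assuming a hypothetical local web server whose process name is nginx and which listens on port 80. The check name, comm value, and URL are illustrative placeholders, not values taken from this guide:

```yaml
app_checks:
  # Hypothetical check name; choose any descriptive label.
  - name: LOCAL_WEBSITE
    check_module: http_check
    pattern:
      # Assumed process name of the server that serves the URL.
      # Confirm it with: cat /proc/<PID>/comm
      comm: nginx
    conf:
      # Assumed local endpoint; replace with the URL you want to test.
      url: http://localhost:80
```

If the named process is not running, the pattern does not match and the HTTP check does not run, so pick a process whose lifetime matches the service being checked.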
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2

                                                                                                                                                                                                                                                                                                                                                                      There are multiple configuration options available with the HTTP check. A full list is provided in the table following Example 2. These keys should be listed under the conf: section of the configuration in Example 1.

app_checks:
  - name: EXAMPLE_WEBSITE
    check_module: http_check
    pattern:
      comm: systemd
    conf:
      url: https://www.MYEXAMPLE.com
      # timeout: 1
      # method: get
      # data:
      #   <KEY>: <VALUE>
      # content_match: '<REGEX>'
      # reverse_content_match: false
      # username: <USERNAME>
      # ntlm_domain: <DOMAIN>
      # password: <PASSWORD>
      # client_cert: /opt/client.crt
      # client_key: /opt/client.key
      # http_response_status_code: (1|2|3)\d\d
      # include_content: false
      # collect_response_time: true
      # disable_ssl_validation: true
      # ignore_ssl_warning: false
      # ca_certs: /etc/ssl/certs/ca-certificates.crt
      # check_certificate_expiration: true
      # days_warning: <THRESHOLD_DAYS>
      # check_hostname: true
      # ssl_server_name: <HOSTNAME>
      # headers:
      #   Host: alternative.host.example.com
      #   X-Auth-Token: <AUTH_TOKEN>
      # skip_proxy: false
      # allow_redirects: true
      # include_default_headers: true
      # tags:
      #   - <KEY_1>:<VALUE_1>
      #   - <KEY_2>:<VALUE_2>
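
To illustrate how the optional keys are enabled, the sketch below sets a handful of them under conf:. The values are assumptions chosen for illustration only (a 5-second timeout, a POST request with one hypothetical body field, and a 14-day certificate warning threshold); adjust or remove them to suit the endpoint being checked:

```yaml
app_checks:
  - name: EXAMPLE_WEBSITE
    check_module: http_check
    pattern:
      comm: systemd
    conf:
      url: https://www.MYEXAMPLE.com
      # Allow up to 5 seconds for a response (illustrative value).
      timeout: 5
      # The data key is only sent when the method is POST.
      method: post
      data:
        # Hypothetical key-value pair sent in the request body.
        example_key: example_value
      # Check the TLS certificate expiry; days_warning is assumed here to be
      # the number of days before expiry at which a warning is raised.
      check_certificate_expiration: true
      days_warning: 14
```

Keys that are left out keep their default behavior, so only the options that differ from the defaults need to be listed.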
                                                                                                                                                                                                                                                                                                                                                                      

| Key | Description |
| --- | --- |
| url | The URL to test. |
| timeout | The time in seconds to allow for a response. |
| method | The HTTP method. Defaults to GET; many other HTTP methods are supported, including POST and PUT. |
| data | Only available when using the POST method. Data should be included as key-value pairs and is sent in the body of the request. |
| content_match | A string or Python regular expression. The HTTP check searches for this value in the response and reports DOWN if the string or expression is not found. |
| reverse_content_match | When true, reverses the behavior of the content_match option: the HTTP check reports DOWN if the string or expression in content_match is found. Defaults to false. |
| username & password | If your service uses basic authentication, provide the username and password here. |
| http_response_status_code | A string or Python regular expression for an HTTP status code, for example 401 or 4\d\d. The check reports DOWN for any status code that does not match. Defaults to 1xx, 2xx and 3xx HTTP status codes. |
| include_content | When set to true, the check includes the first 200 characters of the HTTP response body in notifications. Defaults to false. |
| collect_response_time | By default, the check collects the response time (in seconds) as the metric network.http.response_time. To disable, set this value to false. |
| disable_ssl_validation | Skips SSL certificate validation and is enabled by default. If you require SSL certificate validation, set this to false. This option is only used when gathering the response time/aliveness from the specified endpoint. It does not apply to the check_certificate_expiration option. |
| ignore_ssl_warning | When SSL certificate validation is enabled (see disable_ssl_validation), this setting allows you to disable security warnings. |
| ca_certs | Overrides the default certificate path specified in init_config. |
| check_certificate_expiration | When enabled, the service check checks the expiration date of the SSL certificate. Note that this causes the SSL certificate to be validated, regardless of the value of the disable_ssl_validation setting. |
| days_warning | When check_certificate_expiration is enabled, this setting raises a warning alert when the SSL certificate is within the specified number of days from expiration. |
| check_hostname | When check_certificate_expiration is enabled, this setting raises a warning if the hostname on the SSL certificate does not match the host of the given URL. |
| headers | Additional headers to send with the request, for example X-Auth-Token: <AUTH_TOKEN>. |
| skip_proxy | If set, the check bypasses proxy settings and attempts to reach the check URL directly. Defaults to false. |
| allow_redirects | Allows the service check to follow HTTP redirects. Defaults to true. |
| tags | A list of arbitrary tags that will be associated with the check. |
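Because http_response_status_code and content_match both accept Python regular expressions, it can help to try a pattern before putting it in the configuration. The following is a minimal sketch, not the agent's actual code. It assumes the status pattern is matched from the start of the status code and the content pattern is searched anywhere in the body. The helper names and the DEFAULT_STATUS_PATTERN string are illustrative assumptions that stand in for the documented default of 1xx, 2xx and 3xx.

```python
import re

# Assumed stand-in for the documented default (1xx, 2xx and 3xx are accepted).
DEFAULT_STATUS_PATTERN = r"(1|2|3)\d\d"


def status_matches(status_code, pattern=DEFAULT_STATUS_PATTERN):
    """Return True when the status code matches the expected pattern."""
    return re.match(pattern, str(status_code)) is not None


def content_ok(body, content_match, reverse_content_match=False):
    """Return True when the body satisfies content_match (or its reverse)."""
    found = re.search(content_match, body) is not None
    # With reverse_content_match, finding the pattern is the failure case.
    return not found if reverse_content_match else found


if __name__ == "__main__":
    print(status_matches(200))            # default pattern
    print(status_matches(404, r"4\d\d"))  # custom pattern from the table
    print(content_ok("status: ok", "ok"))
```

A pattern such as 4\d\d is useful when an error response is the expected behavior, for example an endpoint that must reject unauthenticated requests.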

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      HTTP metrics concern response time and SSL certificate expiry information.

                                                                                                                                                                                                                                                                                                                                                                      See HTTP Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Service Checks

                                                                                                                                                                                                                                                                                                                                                                      http.can_connect:

Returns DOWN when any of the following occur:

• the request to the URL times out

• the response code is 4xx/5xx, or it does not match the pattern provided in http_response_status_code

• the response body does not contain the pattern in content_match

• reverse_content_match is true and the response body does contain the pattern in content_match

• the URL uses https, disable_ssl_validation is false, and the SSL connection cannot be validated

Otherwise, returns UP.

The http.can_connect check can be segmented by URL.
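The conditions for http.can_connect can be read as a single decision function. The sketch below is an illustration under stated assumptions, not the agent's implementation: the Probe class and the can_connect function are hypothetical names, the default status pattern stands in for the documented 1xx/2xx/3xx default, and the status pattern is treated as the only status criterion.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """Hypothetical outcome of one HTTP request."""
    timed_out: bool = False
    status_code: int = 200
    body: str = ""
    ssl_valid: bool = True


def can_connect(
    probe: Probe,
    url: str,
    status_pattern: str = r"(1|2|3)\d\d",  # assumed form of the default
    content_match: Optional[str] = None,
    reverse_content_match: bool = False,
    disable_ssl_validation: bool = True,
) -> str:
    """Return "UP" or "DOWN" following the documented conditions."""
    if probe.timed_out:
        return "DOWN"
    # SSL validation only matters for https URLs when it is not disabled.
    if url.startswith("https") and not disable_ssl_validation and not probe.ssl_valid:
        return "DOWN"
    if re.match(status_pattern, str(probe.status_code)) is None:
        return "DOWN"
    if content_match is not None:
        found = re.search(content_match, probe.body) is not None
        # DOWN when the pattern is missing, or present in reverse mode.
        if found == reverse_content_match:
            return "DOWN"
    return "UP"
```

For example, a 200 response whose body lacks the content_match pattern evaluates to DOWN, while the same response with reverse_content_match set to true evaluates to UP.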

                                                                                                                                                                                                                                                                                                                                                                      http.ssl_cert:

The check returns:

• DOWN if the URL's certificate has already expired

• WARNING if the URL's certificate expires in fewer than days_warning days

• UP otherwise

To disable this check, set check_certificate_expiration to false.
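The three http.ssl_cert outcomes depend only on how much time remains before the certificate expires. The following is a minimal sketch of that rule, assuming the expiry timestamp has already been read from the certificate. The function name ssl_cert_status and its parameters are hypothetical, and days_warning is passed explicitly because its default is not stated here.

```python
from datetime import datetime, timedelta


def ssl_cert_status(not_after: datetime, days_warning: int, now: datetime) -> str:
    """Map a certificate expiry time to DOWN, WARNING or UP."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "DOWN"  # certificate has already expired
    if remaining < timedelta(days=days_warning):
        return "WARNING"  # expires within the warning window
    return "UP"


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    print(ssl_cert_status(datetime(2024, 1, 5), days_warning=14, now=now))
```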

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.11 - Jenkins

Jenkins is an open-source automation server that automates parts of the software development process, enabling continuous integration and facilitating the technical aspects of continuous delivery. It supports version control tools (such as Subversion, Git, and Mercurial), can execute Apache Ant, Apache Maven and SBT-based projects, and can run shell scripts and Windows batch commands. If Jenkins is installed in your environment, the Sysdig agent will automatically connect and collect Jenkins metrics. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Jenkins Setup

Requires a standard Jenkins server setup with one or more Jenkins jobs running on it.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with Jenkins and collect basic metrics.

                                                                                                                                                                                                                                                                                                                                                                        - name: jenkins
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            port: 50000
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            name: default
                                                                                                                                                                                                                                                                                                                                                                            jenkins_home: /var/lib/jenkins #this depends on your environment
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Jenkins Folders Plugin

By default, the Sysdig agent does not monitor jobs under job folders created using the Folders plugin.

Set jobs_folder_depth to monitor these jobs. Job folders are scanned recursively for jobs until the designated folder depth is reached. The default value is 1.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: jenkins
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: java
                                                                                                                                                                                                                                                                                                                                                                            port: 50000
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            name: default
                                                                                                                                                                                                                                                                                                                                                                            jenkins_home: /var/lib/jenkins
                                                                                                                                                                                                                                                                                                                                                                            jobs_folder_depth: 3
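
The effect of the depth limit can be pictured as a depth-limited directory walk. The following Python sketch is illustrative only and is not the agent's implementation. It assumes the on-disk layout used by the Folders plugin, where a folder keeps its children in a nested jobs directory, and it assumes that a depth of 1 means top-level jobs only. The find_jobs helper is a hypothetical name.

```python
import os


def find_jobs(jobs_dir, depth):
    """Return job directories under jobs_dir, descending into folders.

    Assumption: a directory containing a nested "jobs" directory is a
    folder; any other directory is treated as a job. depth=1 scans only
    the top level.
    """
    found = []
    if depth < 1 or not os.path.isdir(jobs_dir):
        return found
    for name in sorted(os.listdir(jobs_dir)):
        path = os.path.join(jobs_dir, name)
        if not os.path.isdir(path):
            continue
        nested = os.path.join(path, "jobs")
        if os.path.isdir(nested):
            # A folder: descend one level and spend one unit of depth.
            found.extend(find_jobs(nested, depth - 1))
        else:
            found.append(path)
    return found
```

Under these assumptions, find_jobs("/var/lib/jenkins/jobs", 1) would return only top-level jobs, while a depth of 3 would also return jobs nested one and two folders deep.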
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

The following metrics are available only after one or more Jenkins jobs have run. They cover queue size, job duration, and job waiting time.

                                                                                                                                                                                                                                                                                                                                                                      See Jenkins Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.12 - Lighttpd

Lighttpd is a secure, fast, compliant, and flexible web server optimized for high-performance environments. It has a low memory footprint compared to other web servers and keeps CPU load low. Its advanced feature set (FastCGI, CGI, Auth, Output Compression, URL Rewriting, and many more) makes Lighttpd well suited to servers that suffer from load problems. If Lighttpd is installed in your environment, the Sysdig agent will automatically connect and collect the default metrics. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      At this time, the Sysdig app check for Lighttpd supports Lighttpd version 1.x.x only.

                                                                                                                                                                                                                                                                                                                                                                      Lighttpd Setup

                                                                                                                                                                                                                                                                                                                                                                      For Lighttpd, the status page must be enabled. Add mod_status in the /etc/lighttpd/lighttpd.conf config file:

                                                                                                                                                                                                                                                                                                                                                                      server.modules = ( ..., "mod_status", ... )
                                                                                                                                                                                                                                                                                                                                                                      

Then configure an endpoint for it. If, for security purposes, you want to open the status page only to requests from the local host (the 127.0.0.1/8 loopback range), add the following lines to the /etc/lighttpd/lighttpd.conf file:

                                                                                                                                                                                                                                                                                                                                                                      $HTTP["remoteip"] == "127.0.0.1/8" {
                                                                                                                                                                                                                                                                                                                                                                          status.status-url = "/server-status"
                                                                                                                                                                                                                                                                                                                                                                        }
                                                                                                                                                                                                                                                                                                                                                                      

To open the endpoint to remote users with authentication, enable the mod_auth module in the /etc/lighttpd/lighttpd.conf config file:

                                                                                                                                                                                                                                                                                                                                                                      server.modules = ( ..., "mod_auth", ... )
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Then you can add the auth.require parameter in the /etc/lighttpd/lighttpd.conf config file:

                                                                                                                                                                                                                                                                                                                                                                      auth.require = ( "/server-status" => ( "method"  => ... , "realm"   => ... , "require" => ... ) )
                                                                                                                                                                                                                                                                                                                                                                      

For more information on the auth.require parameter, see the Lighttpd documentation.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with Lighttpd and collect basic metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: lighttpd
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: lighttpd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            lighttpd_status_url: "http://localhost:{port}/server-status?auto"
                                                                                                                                                                                                                                                                                                                                                                          log_errors: false
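
Before relying on the app check, it can help to request the status URL yourself and confirm that it returns data. The following is a minimal Python sketch, assuming the endpoint returns plain-text "Name: value" lines when ?auto is appended. The sample text and its values are made up for illustration, and parse_status and fetch_status are hypothetical helper names.

```python
from urllib.request import urlopen


def parse_status(text):
    """Parse 'Name: value' lines into a dict, keeping numeric values only."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        try:
            metrics[key.strip()] = float(value.strip())
        except ValueError:
            continue  # skip non-numeric fields such as a scoreboard string
    return metrics


def fetch_status(url, timeout=5):
    """Fetch the status page and parse it (requires a reachable server)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_status(response.read().decode("utf-8", errors="replace"))


# Invented sample reply, used here so the parser can run without a server.
SAMPLE = """Total Accesses: 12
Total kBytes: 34
Uptime: 100
BusyServers: 1
IdleServers: 127
Scoreboard: _h__"""

print(parse_status(SAMPLE))
```

Against a live server, the call would look like fetch_status("http://localhost:80/server-status?auto"), where port 80 is an assumption standing in for the {port} placeholder.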
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

These metrics are supported for Lighttpd version 1.x.x only. Lighttpd version 2.x.x is still in development and is not ready for use as of this publication.

                                                                                                                                                                                                                                                                                                                                                                      See Lighttpd Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.13 - Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from the results of database calls, API calls, or page rendering. If Memcached is installed in your environment, the Sysdig agent will automatically connect and collect basic metrics. See the Default Configuration section, below. You can also edit the configuration to collect additional metrics related to items and slabs.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Memcached Setup

Memcached automatically exposes all metrics. You do not need to configure anything on the Memcached instance.
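
Memcached reports its counters through the stats command of its text protocol, which replies with "STAT name value" lines terminated by END. The following is a minimal Python sketch for viewing the raw values, assuming an instance without SASL listening on localhost:11211. The helper names are hypothetical and the sample reply values are invented.

```python
import socket


def parse_stats(reply):
    """Turn a raw 'stats' reply (bytes) into a dict of name -> value strings."""
    stats = {}
    for line in reply.decode("ascii", errors="replace").splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def read_stats(host="localhost", port=11211, timeout=2.0):
    """Send 'stats' to a Memcached instance and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        reply = b""
        # Keep reading until the END terminator arrives or the peer closes.
        while not reply.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return parse_stats(reply)


# Invented sample reply, used here so the parser can run without a server.
SAMPLE_REPLY = b"STAT pid 1234\r\nSTAT curr_items 10\r\nSTAT get_hits 7\r\nEND\r\n"
print(parse_stats(SAMPLE_REPLY))
```

Calling read_stats() against a running instance returns the same kind of dictionary with live values.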

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with Memcached and collect basic metrics:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: memcached
                                                                                                                                                                                                                                                                                                                                                                          check_module: mcache
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: memcached
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: localhost
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Additional metrics can be collected by editing Sysdig’s configuration file dragent.yaml. If SASL is enabled, authentication parameters must be added to dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Additional Metrics

                                                                                                                                                                                                                                                                                                                                                                      memcache.items.* and memcache.slabs.* can be collected by setting flags in the options section, as follows . Either value can be set to false if you do not want to collect metrics from them.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: memcached
                                                                                                                                                                                                                                                                                                                                                                          check_module: mcache
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: memcached
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: localhost
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                          options:
                                                                                                                                                                                                                                                                                                                                                                            items: true       # Default is false
                                                                                                                                                                                                                                                                                                                                                                            slabs: true       # Default is false
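
As a minimal sketch of the point above, the following entry collects the item metrics but skips the slab metrics. It assumes the same memcached entry as in Example 1; only the slabs flag differs.

```yaml
app_checks:
  - name: memcached
    check_module: mcache
    pattern:
      comm: memcached
    conf:
      url: localhost
      port: "{port}"
    options:
      items: true       # collect memcache.items.* metrics
      slabs: false      # skip memcache.slabs.* metrics
```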
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: SASL

                                                                                                                                                                                                                                                                                                                                                                      SASL authentication can be enabled with Memcached (see instructions here). If enabled, credentials must be provided against username and password fields as shown in Example 2.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: memcached
                                                                                                                                                                                                                                                                                                                                                                          check_module: mcache
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: memcached
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: localhost
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                            username: <username>
                                                                                                                                                                                                                                                                                                                                                                            # Some memcached version will support <username>@<hostname>.
                                                                                                                                                                                                                                                                                                                                                                            # If memcached is installed as a container, hostname of memcached container will be used as username
                                                                                                                                                                                                                                                                                                                                                                            password: <password>
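
If both the additional metrics and SASL are needed, the two examples can presumably be combined in a single entry. The following is a sketch under that assumption rather than a configuration taken from the Sysdig defaults; the username and password values remain placeholders to be replaced with your own credentials.

```yaml
app_checks:
  - name: memcached
    check_module: mcache
    pattern:
      comm: memcached
    conf:
      url: localhost
      port: "{port}"
      username: "<username>"   # placeholder: SASL user
      password: "<password>"   # placeholder: SASL password
    options:
      items: true              # collect memcache.items.* metrics
      slabs: true              # collect memcache.slabs.* metrics
```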
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Memcached Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.14 -

                                                                                                                                                                                                                                                                                                                                                                      Mesos/Marathon

                                                                                                                                                                                                                                                                                                                                                                      Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments. The Mesos metrics are divided into master and agent. Marathon is a production-grade container orchestration platform for Apache Mesos.

                                                                                                                                                                                                                                                                                                                                                                      If Mesos and Marathon are installed in your environment, the Sysdig agent will automatically connect and start collecting metrics. You may need to edit the default entries to add a custom configuration if the default does not work. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Mesos/Marathon Setup

                                                                                                                                                                                                                                                                                                                                                                      Both Mesos and Marathon will automatically expose all metrics. You do not need to add anything to the Mesos/Marathon instance.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent has different entries for mesos-master, mesos-slave and marathon in its configuration file. Default entries are present in Sysdig’s dragent.default.yaml file and collect all metrics for Mesos. For Marathon, it collects basic metrics. You may need add configuration to collect additional metrics.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      In the URLs for mesos-master and mesos-slave, {mesos_url} will be replaced with either the hostname of the auto-detected mesos master/slave (if auto-detection is enabled), or with an explicit value from mesos_state_uri otherwise.

                                                                                                                                                                                                                                                                                                                                                                      In the URLs for marathon, {marathon_url} will be replaced with the hostname of the first configured/discovered Marathon framework.

                                                                                                                                                                                                                                                                                                                                                                      For all Mesos and Marathon apps, {auth_token} will either be blank or an auto-generated token obtained via the /acs/api/v1/auth/login endpoint.

                                                                                                                                                                                                                                                                                                                                                                      Mesos Master

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mesos-master
                                                                                                                                                                                                                                                                                                                                                                          check_module: mesos_master
                                                                                                                                                                                                                                                                                                                                                                          interval: 30
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mesos-master
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:5050"
                                                                                                                                                                                                                                                                                                                                                                          auth_token: "{auth_token}"
                                                                                                                                                                                                                                                                                                                                                                          mesos_creds: "{mesos_creds}"
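
If the default entry does not match your environment, one option is to copy it into dragent.yaml and adjust it there. The sketch below assumes that an entry in dragent.yaml with the same name takes precedence over the default, and that the Mesos master listens on a non-default port; the port 5055 is a hypothetical value chosen only for illustration.

```yaml
app_checks:
  - name: mesos-master
    check_module: mesos_master
    interval: 30
    pattern:
      comm: mesos-master
    conf:
      # Hypothetical port: replace with the address your Mesos master listens on.
      url: "http://localhost:5055"
    auth_token: "{auth_token}"
    mesos_creds: "{mesos_creds}"
```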
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Mesos Agent

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                        - name: mesos-slave
                                                                                                                                                                                                                                                                                                                                                                          check_module: mesos_slave
                                                                                                                                                                                                                                                                                                                                                                          interval: 30
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mesos-slave
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "http://localhost:5051"
                                                                                                                                                                                                                                                                                                                                                                          auth_token: "{auth_token}"
                                                                                                                                                                                                                                                                                                                                                                          mesos_creds: "{mesos_creds}"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Marathon

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                        - name: marathon
                                                                                                                                                                                                                                                                                                                                                                          check_module: marathon
                                                                                                                                                                                                                                                                                                                                                                          interval: 30
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            arg: mesosphere.marathon.Main
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "{marathon_url}"
                                                                                                                                                                                                                                                                                                                                                                          auth_token: "{auth_token}"
                                                                                                                                                                                                                                                                                                                                                                          marathon_creds: "{marathon_creds}"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Marathon

Enable the full_metrics flag to collect all Marathon metrics.

                                                                                                                                                                                                                                                                                                                                                                      The following additional metrics are collected with this configuration:

                                                                                                                                                                                                                                                                                                                                                                      • marathon.cpus

                                                                                                                                                                                                                                                                                                                                                                      • marathon.disk

                                                                                                                                                                                                                                                                                                                                                                      • marathon.instances

                                                                                                                                                                                                                                                                                                                                                                      • marathon.mem

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                        - name: marathon
                                                                                                                                                                                                                                                                                                                                                                          check_module: marathon
                                                                                                                                                                                                                                                                                                                                                                          interval: 30
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            arg: mesosphere.marathon.Main
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            url: "{marathon_url}"
                                                                                                                                                                                                                                                                                                                                                                          auth_token: "{auth_token}"
                                                                                                                                                                                                                                                                                                                                                                          marathon_creds: "{marathon_creds}"
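
The entry above does not itself set full_metrics. As a minimal sketch of where the flag might go, the following assumes it is a boolean option placed under conf alongside url. That placement is an assumption and is not confirmed by this page, so verify it against your agent version before relying on it.

```yaml
app_checks:
  - name: marathon
    check_module: marathon
    interval: 30
    pattern:
      arg: mesosphere.marathon.Main
    conf:
      url: "{marathon_url}"
      # Assumption: full_metrics is read from conf; move it if your agent expects it elsewhere.
      full_metrics: true
    auth_token: "{auth_token}"
    marathon_creds: "{marathon_creds}"
```

As with any change to app_checks, this entry belongs in dragent.yaml, not dragent.default.yaml.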
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Mesos Master Metrics.

                                                                                                                                                                                                                                                                                                                                                                      See Mesos Agent Metrics.

                                                                                                                                                                                                                                                                                                                                                                      See Marathon Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      Mesos Master

                                                                                                                                                                                                                                                                                                                                                                      Mesos Agent

                                                                                                                                                                                                                                                                                                                                                                      Marathon

8.6.2.15 - MongoDB

MongoDB is an open-source database management system (DBMS) that uses a document-oriented data model and supports various forms of data. If MongoDB is installed in your environment and authentication is not enabled, the Sysdig agent automatically connects to it and collects basic metrics. To connect to an instance that requires authentication, or to collect additional metrics, you may need to edit the default entries. See the Default Configuration section below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      MongoDB Setup

                                                                                                                                                                                                                                                                                                                                                                      Create a read-only user for the Sysdig agent.

// Authenticate as the admin user.
use admin
db.auth("admin", "<YOUR_MONGODB_ADMIN_PASSWORD>")

// On MongoDB 2.x, use the addUser command (the third argument makes the user read-only).
db.addUser("sysdig-cloud", "sysdig-cloud-password", true)

// On MongoDB 3.x or higher, use the createUser command.
db.createUser({
  "user": "sysdig-cloud",
  "pwd": "sysdig-cloud-password",
  "roles": [
    { role: "read", db: "admin" },
    { role: "clusterMonitor", db: "admin" },
    { role: "read", db: "local" }
  ]
})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect to MongoDB.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                          check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mongod
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: "mongodb://localhost:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                      

The default MongoDB entry should work without modification if authentication is not configured. If you have enabled password authentication, you must change the entry to include the credentials.

Some metrics are not available by default. To collect them, provide additional configuration as shown in the following examples.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: With Authentication

Replace <username> and <password> with the actual username and password.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                          check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mongod
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: mongodb://<username>:<password>@localhost:{port}/admin
                                                                                                                                                                                                                                                                                                                                                                            replica_check: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Additional Metrics

                                                                                                                                                                                                                                                                                                                                                                      Some metrics are not collected by default. These can be collected by adding additional_metrics section in the dragent.yaml file under the app_checks mongodb configuration.

                                                                                                                                                                                                                                                                                                                                                                      Available options are:

                                                                                                                                                                                                                                                                                                                                                                      collection - Metrics of the specified collections

                                                                                                                                                                                                                                                                                                                                                                      metrics.commands - Use of database commands

                                                                                                                                                                                                                                                                                                                                                                      tcmalloc - TCMalloc memory allocator

                                                                                                                                                                                                                                                                                                                                                                      top - Usage statistics for each collection

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                          check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mongod
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: mongodb://<username>:<password>@localhost:{port}/admin
                                                                                                                                                                                                                                                                                                                                                                            replica_check: true
                                                                                                                                                                                                                                                                                                                                                                            additional_metrics:
                                                                                                                                                                                                                                                                                                                                                                              - collection
                                                                                                                                                                                                                                                                                                                                                                              - metrics.commands
                                                                                                                                                                                                                                                                                                                                                                              - tcmalloc
                                                                                                                                                                                                                                                                                                                                                                              - top
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      List of metrics with respective entries in dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      metric prefixEntry under additional_metrics
                                                                                                                                                                                                                                                                                                                                                                      mongodb.collectioncollection
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.commandstop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.getmoretop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.inserttop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.queriestop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.readLocktop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.writeLocktop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.removetop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.totaltop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.updatetop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.usage.writeLocktop
                                                                                                                                                                                                                                                                                                                                                                      mongodb.tcmalloctcmalloc
                                                                                                                                                                                                                                                                                                                                                                      mongodb.metrics.commandsmetrics.commands
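
As an illustration of reading the table, every mongodb.usage.* prefix maps to the single top entry. The fragment below is a minimal sketch of a configuration that enables only those metrics. It reuses the keys from Example 2 and assumes that entries under additional_metrics can be listed independently of one another, which this page does not state explicitly.

```yaml
app_checks:
  - name: mongodb
    check_module: mongo
    pattern:
      comm: mongod
    conf:
      server: mongodb://<username>:<password>@localhost:{port}/admin
      replica_check: true
      additional_metrics:
        # Assumption: "top" can be enabled on its own. Per the table,
        # it covers all of the mongodb.usage.* metric prefixes.
        - top
```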

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Collections Metrics

                                                                                                                                                                                                                                                                                                                                                                      MongoDB stores documents in collections. Collections are analogous to tables in relational databases. The Sysdig agent by default does not collect the following collections metrics:

                                                                                                                                                                                                                                                                                                                                                                      • collections: List of MongoDB collections to be polled by the agent. Metrics will be collected for the specified set of collections. This configuration requires the additional_metrics.collection section to be present with an entry for collection in the dragent.yaml file. The collection entry under additional_metrics is a flag that enables the collection metrics.

                                                                                                                                                                                                                                                                                                                                                                      • collections_indexes_stats: Collect indexes access metrics for every index in every collection in the collections list. The default value is false.

                                                                                                                                                                                                                                                                                                                                                                        The metric is available starting MongoDB v3.2.

                                                                                                                                                                                                                                                                                                                                                                      For the agent to poll them, you must configure the dragent.yaml file and add an entry corresponding to the metrics to the conf section as follows.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                          check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mongod
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: mongodb://<username>:<password>@localhost:{port}/admin
                                                                                                                                                                                                                                                                                                                                                                            replica_check: true
                                                                                                                                                                                                                                                                                                                                                                            additional_metrics:
                                                                                                                                                                                                                                                                                                                                                                              - collection
                                                                                                                                                                                                                                                                                                                                                                              - metrics.commands
                                                                                                                                                                                                                                                                                                                                                                              - tcmalloc
                                                                                                                                                                                                                                                                                                                                                                              - top
                                                                                                                                                                                                                                                                                                                                                                            collections:
                                                                                                                                                                                                                                                                                                                                                                              - <LIST_COLLECTIONS>
                                                                                                                                                                                                                                                                                                                                                                            collections_indexes_stats: true
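
The <LIST_COLLECTIONS> placeholder stands for the collections to poll. The fragment below is a minimal sketch, assuming the list takes one collection name per item. The names orders and customers are hypothetical and used only for illustration.

```yaml
app_checks:
  - name: mongodb
    check_module: mongo
    pattern:
      comm: mongod
    conf:
      server: mongodb://<username>:<password>@localhost:{port}/admin
      replica_check: true
      additional_metrics:
        - collection              # flag that enables collection metrics
      collections:
        - orders                  # hypothetical collection name
        - customers               # hypothetical collection name
      collections_indexes_stats: true
```

With collections_indexes_stats set to true, index access metrics would be gathered for each index in the two listed collections, provided the MongoDB version is v3.2 or later.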
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Configure SSL for MongoDB App Check

                                                                                                                                                                                                                                                                                                                                                                      You can tighten the security measure of the app check connection with MongoDB by establishing an SSL connection. To enable secure communication, you need to set the SSL configuration in dragent.yaml to true. In an advanced deployment with multi-instances of MongoDB, you need to include a custom CA certificate or client certificate and other additional configurations.

                                                                                                                                                                                                                                                                                                                                                                      Basic SSL Connection

                                                                                                                                                                                                                                                                                                                                                                      In a basic SSL connection:

                                                                                                                                                                                                                                                                                                                                                                      • A single MongoDB instance is running on the host.

                                                                                                                                                                                                                                                                                                                                                                      • An SSL connection with no advanced features, such as the use of a custom CA certificate or client certificate.

                                                                                                                                                                                                                                                                                                                                                                      To establish a basic SSL connection between the agent and the MongoDB instance:

                                                                                                                                                                                                                                                                                                                                                                      1. Open the dragent.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      2. Configure the SSL entries as follows:

                                                                                                                                                                                                                                                                                                                                                                        app_checks:
                                                                                                                                                                                                                                                                                                                                                                          - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                            check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                            pattern:
                                                                                                                                                                                                                                                                                                                                                                              comm: mongod
                                                                                                                                                                                                                                                                                                                                                                            conf:
                                                                                                                                                                                                                                                                                                                                                                              server: "mongodb://<HOSTNAME>:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                              ssl: true
                                                                                                                                                                                                                                                                                                                                                                              # ssl_cert_reqs: 0    # Disable SSL validation
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        To disable SSL validation, set ssl_cert_reqs to 0. This setting is equivalent to ssl_cert_reqs=CERT_NONE.

                                                                                                                                                                                                                                                                                                                                                                      Advanced SSL Connection

                                                                                                                                                                                                                                                                                                                                                                      In an advanced SSL connection:

                                                                                                                                                                                                                                                                                                                                                                      • Advanced features, such as custom CA certificate or client certificate, are configured.

• One or more MongoDB instances are running on the host. The agent is installed as one of the following:

                                                                                                                                                                                                                                                                                                                                                                        • Container

                                                                                                                                                                                                                                                                                                                                                                        • Service

                                                                                                                                                                                                                                                                                                                                                                      Prerequisites

                                                                                                                                                                                                                                                                                                                                                                      Set up the following:

                                                                                                                                                                                                                                                                                                                                                                      • Custom CA certificate

                                                                                                                                                                                                                                                                                                                                                                      • Client SSL verification

                                                                                                                                                                                                                                                                                                                                                                      • SSL validation

(Optional) SSL Configuration Parameters

                                                                                                                                                                                                                                                                                                                                                                      Parameters

                                                                                                                                                                                                                                                                                                                                                                      Description

                                                                                                                                                                                                                                                                                                                                                                      ssl_certfile

                                                                                                                                                                                                                                                                                                                                                                      The certificate file that is used to identify the local connection with MongoDB.

                                                                                                                                                                                                                                                                                                                                                                      ssl_keyfile

The private key file that is used to identify the local connection with MongoDB. Ignore this option if the key is included in ssl_certfile.

                                                                                                                                                                                                                                                                                                                                                                      ssl_cert_reqs

                                                                                                                                                                                                                                                                                                                                                                      Specifies whether a certificate is required from the MongoDB server, and whether it will be validated if provided. Possible values are:

                                                                                                                                                                                                                                                                                                                                                                      • 0 for ssl.CERT_NONE. Implies certificates are ignored.

                                                                                                                                                                                                                                                                                                                                                                      • 1 for ssl.CERT_OPTIONAL. Implies certificates are not required, but validated if provided.

                                                                                                                                                                                                                                                                                                                                                                      • 2 for ssl.CERT_REQUIRED. Implies certificates are required and validated.

                                                                                                                                                                                                                                                                                                                                                                      ssl_ca_certs

The file that contains a set of concatenated certificate authority (CA) certificates, which are used to validate the certificates presented by the MongoDB server. This option is typically used when the server certificates are self-signed.
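The integer values of ssl_cert_reqs match the constants in Python's standard ssl module. The following is a minimal sketch of how the four parameters could map onto an ssl.SSLContext. It is not the agent's actual implementation; the function name build_context and its argument names are assumptions made for illustration.

```python
import ssl


def build_context(cert_reqs=2, ca_certs=None, certfile=None, keyfile=None):
    """Build an SSLContext from values shaped like the parameters above.

    Hypothetical helper for illustration only.
    """
    # 0 -> CERT_NONE, 1 -> CERT_OPTIONAL, 2 -> CERT_REQUIRED.
    # Any other value raises ValueError.
    mode = ssl.VerifyMode(cert_reqs)

    # With ca_certs=None the system default CA store is loaded instead.
    ctx = ssl.create_default_context(cafile=ca_certs)

    # Python refuses CERT_NONE while hostname checking is enabled,
    # so hostname checking is turned off first in that case.
    if mode == ssl.CERT_NONE:
        ctx.check_hostname = False
    ctx.verify_mode = mode

    # Client certificate; keyfile may be None if the key is in certfile.
    if certfile:
        ctx.load_cert_chain(certfile, keyfile=keyfile)
    return ctx
```

Calling build_context(0) yields a context that performs no certificate validation, which corresponds to the commented-out ssl_cert_reqs: 0 entry in the configuration examples.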

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent as a Container

1. If the Sysdig agent is installed as a container, start it with an extra volume that contains the SSL files referenced in the agent configuration. For example:

                                                                                                                                                                                                                                                                                                                                                                        # extra parameter added: -v /etc/ssl:/etc/ssl
                                                                                                                                                                                                                                                                                                                                                                        docker run -d --name sysdig-agent --restart always --privileged --net host --pid host -e ACCESS_KEY=xxxxxxxxxxxxx -e SECURE=true -e TAGS=example_tag:example_value -v /var/run/docker.sock:/host/var/run/docker.sock -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro -v /etc/ssl:/etc/ssl --shm-size=512m sysdig/agent
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      2. Open the dragent.yaml file and configure the SSL entries:

                                                                                                                                                                                                                                                                                                                                                                        app_checks:
                                                                                                                                                                                                                                                                                                                                                                          - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                            check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                            pattern:
                                                                                                                                                                                                                                                                                                                                                                              comm: mongod
                                                                                                                                                                                                                                                                                                                                                                            conf:
                                                                                                                                                                                                                                                                                                                                                                              server: "mongodb://<HOSTNAME>:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                              ssl: true
                                                                                                                                                                                                                                                                                                                                                                              # ssl_ca_certs: </path/to/ca/certificate>
                                                                                                                                                                                                                                                                                                                                                                              # ssl_cert_reqs: 0    # Disable SSL validation
                                                                                                                                                                                                                                                                                                                                                                              # ssl_certfile: </path/to/client/certfile>
                                                                                                                                                                                                                                                                                                                                                                              # ssl_keyfile: </path/to/client/keyfile>
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent as a Process

1. If the Sysdig agent is installed as a process, store the SSL files on the host and provide their paths in the agent configuration:

                                                                                                                                                                                                                                                                                                                                                                        app_checks:
                                                                                                                                                                                                                                                                                                                                                                          - name: mongodb
                                                                                                                                                                                                                                                                                                                                                                            check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                            pattern:
                                                                                                                                                                                                                                                                                                                                                                              comm: mongod
                                                                                                                                                                                                                                                                                                                                                                            conf:
                                                                                                                                                                                                                                                                                                                                                                              server: "mongodb://<HOSTNAME>:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                              ssl: true
                                                                                                                                                                                                                                                                                                                                                                              # ssl_ca_certs: </path/to/ca/certificate>
                                                                                                                                                                                                                                                                                                                                                                              # ssl_cert_reqs: 0    # Disable SSL validation
                                                                                                                                                                                                                                                                                                                                                                              # ssl_certfile: </path/to/client/certfile>
                                                                                                                                                                                                                                                                                                                                                                              # ssl_keyfile: </path/to/client/keyfile>
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        See optional SSL configuration parameters for information on SSL certificate files.

                                                                                                                                                                                                                                                                                                                                                                      Multi-MongoDB Setup

                                                                                                                                                                                                                                                                                                                                                                      In a multi-MongoDB setup, multiple MongoDB instances are running on a single host. You can configure either a basic or an advanced SSL connection individually for each MongoDB instance.

                                                                                                                                                                                                                                                                                                                                                                      Store SSL Files

In an advanced connection, each MongoDB instance on the host uses its own SSL certificates, which are stored in separate directories. For example, the SSL files for two MongoDB instances can be stored under a common mount point as follows:

                                                                                                                                                                                                                                                                                                                                                                      • Mount point is /etc/ssl/

• Files for instance 1 are stored in /etc/ssl/mongo1/

• Files for instance 2 are stored in /etc/ssl/mongo2/
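Before pointing the agent at these directories, it can help to confirm that the files are actually present. The following is a minimal sketch of such a check. The helper name missing_ssl_files is hypothetical, and the file name ca-cert-2 is an assumption that mirrors the naming used for instance 1.

```python
from pathlib import Path


def missing_ssl_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Assumed file names under the per-instance directories described above.
expected = [
    "/etc/ssl/mongo1/ca-cert-1",
    "/etc/ssl/mongo2/ca-cert-2",
]

if __name__ == "__main__":
    for path in missing_ssl_files(expected):
        print(f"missing: {path}")
```

If the agent runs as a container, the same paths must also be visible inside the container through the mounted volume.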

Configure the Agent

1. Open the dragent.yaml file.

                                                                                                                                                                                                                                                                                                                                                                      2. Configure the SSL entries as follows:

                                                                                                                                                                                                                                                                                                                                                                        app_checks:
                                                                                                                                                                                                                                                                                                                                                                          - name: mongodb-ssl-1
                                                                                                                                                                                                                                                                                                                                                                            check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                            pattern:
                                                                                                                                                                                                                                                                                                                                                                              comm: mongod
                                                                                                                                                                                                                                                                                                                                                                              args: ssl_certificate-1.pem
                                                                                                                                                                                                                                                                                                                                                                            conf:
                                                                                                                                                                                                                                                                                                                                                                              server: "mongodb://<HOSTNAME|Certificate_CN>:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                              ssl: true
                                                                                                                                                                                                                                                                                                                                                                              ssl_ca_certs: /etc/ssl/mongo1/ca-cert-1
                                                                                                                                                                                                                                                                                                                                                                              tags:
                                                                                                                                                                                                                                                                                                                                                                                - "instance:ssl-1"
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                          - name: mongodb-ssl-2
                                                                                                                                                                                                                                                                                                                                                                            check_module: mongo
                                                                                                                                                                                                                                                                                                                                                                            pattern:
                                                                                                                                                                                                                                                                                                                                                                              comm: mongod
                                                                                                                                                                                                                                                                                                                                                                              args: ssl_certificate-2.pem
                                                                                                                                                                                                                                                                                                                                                                            conf:
                                                                                                                                                                                                                                                                                                                                                                              server: "mongodb://<HOSTNAME|Certificate_CN>:{port}/admin"
                                                                                                                                                                                                                                                                                                                                                                              ssl: true
                                                                                                                                                                                                                                                                                                                                                                              ssl_ca_certs: /etc/ssl/mongo2/ca-cert-2
                                                                                                                                                                                                                                                                                                                                                                              tags:
                                                                                                                                                                                                                                                                                                                                                                                - "instance:ssl-2"
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        Replace the names of the instances and certificate files with the names that you prefer.

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See MongoDB Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.16 -

                                                                                                                                                                                                                                                                                                                                                                      MySQL

                                                                                                                                                                                                                                                                                                                                                                      MySQL is the world’s most popular open-source database. With its proven performance, reliability, and ease-of-use, MySQL has become the leading database choice for web-based applications, used by high profile web properties including Facebook, Twitter, YouTube. Additionally, it is an extremely popular choice as an embedded database, distributed by thousands of ISVs and OEMs.

                                                                                                                                                                                                                                                                                                                                                                      Supported Distribution

                                                                                                                                                                                                                                                                                                                                                                      The MySQL AppCheck is supported for following MySQL versions.

                                                                                                                                                                                                                                                                                                                                                                      If the Sysdig agent is installed as a Process:

                                                                                                                                                                                                                                                                                                                                                                      • Host with Python 2.7: MySQL versions supported - 5.5 to 8

                                                                                                                                                                                                                                                                                                                                                                      • Host with Python 2.6: MySQL versions supported - 4.1 to 5.7 (tested with v5.x only)

                                                                                                                                                                                                                                                                                                                                                                        NOTE: This implies that MySQL 5.5, 5.6 and 5.7 are supported on both the Python 2.6 and 2.7 environments.
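The version ranges above can be expressed as a small lookup, which is handy when scripting a pre-install check. This is a minimal sketch, not part of the Sysdig agent: the names `SUPPORTED_RANGES`, `parse_major_minor`, and `is_supported` are hypothetical, and reading "8" as the 8.0 series is an assumption.

```python
# Supported MySQL ranges per host Python version, as (lowest, highest) major.minor.
# Assumption: "8" in the list above means the 8.0 series.
SUPPORTED_RANGES = {
    "2.7": ((5, 5), (8, 0)),
    "2.6": ((4, 1), (5, 7)),
}


def parse_major_minor(version):
    """Turn a version string such as '5.7.40' or '8' into a (major, minor) tuple."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def is_supported(python_version, mysql_version):
    """Return True if the MySQL version falls in the range listed for this Python version."""
    low, high = SUPPORTED_RANGES[python_version]
    return low <= parse_major_minor(mysql_version) <= high


# MySQL 5.6 is covered by both Python environments, as the NOTE above states.
print(is_supported("2.6", "5.6") and is_supported("2.7", "5.6"))  # prints True
```

Patch-level digits are ignored on purpose, since the ranges above are given only to major.minor precision.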

                                                                                                                                                                                                                                                                                                                                                                      If the Sysdig agent is installed as a Docker container:

                                                                                                                                                                                                                                                                                                                                                                      The Docker container of the Sysdig agent has Python 2.7 installed. If it is installed, respective versions against Python 2.7 will be supported.

                                                                                                                                                                                                                                                                                                                                                                      The following environments have been tested and are supported. Tests environments include both the Host/Process and Docker environment.

                                                                                                                                                                                                                                                                                                                                                                      PythonMySQL
                                                                                                                                                                                                                                                                                                                                                                      2.7 (Ubuntu 16/ CentOS 7)NoYesYesYesYes
                                                                                                                                                                                                                                                                                                                                                                      2.6 (CentOS 6)YesYesYesYesNo

                                                                                                                                                                                                                                                                                                                                                                      MySQL Setup

                                                                                                                                                                                                                                                                                                                                                                      A user must be created on MySQL so the Sysdig agent can collect metrics. To configure credentials, run the following commands on your server, replacing the sysdig-clouc-password parameter.

                                                                                                                                                                                                                                                                                                                                                                      MySQL version-specific commands to create a user are provided below.

                                                                                                                                                                                                                                                                                                                                                                      # MySQL 5.6 and earlier
                                                                                                                                                                                                                                                                                                                                                                      CREATE USER 'sysdig-cloud'@'127.0.0.1' IDENTIFIED BY 'sysdig-cloud-password';
                                                                                                                                                                                                                                                                                                                                                                      GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'sysdig-cloud'@'127.0.0.1' WITH MAX_USER_CONNECTIONS 5;
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      ## OR ##
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      # MySQL 5.7 and 8
                                                                                                                                                                                                                                                                                                                                                                      CREATE USER 'sysdig-cloud'@'127.0.0.1' IDENTIFIED BY 'sysdig-cloud-password' WITH MAX_USER_CONNECTIONS 5;
                                                                                                                                                                                                                                                                                                                                                                      GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'sysdig-cloud'@'127.0.0.1';
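The two variants above differ only in where the MAX_USER_CONNECTIONS clause goes: on GRANT for 5.6 and earlier, on CREATE USER for 5.7 and 8. The sketch below selects the matching pair of statements for a given server version. The function name `create_user_statements` and its parameters are hypothetical, and the password is interpolated without escaping, so this is for illustration only.

```python
def create_user_statements(mysql_version, password,
                           user="sysdig-cloud", host="127.0.0.1", max_conn=5):
    """Return the CREATE USER and GRANT statements for the given MySQL version."""
    parts = mysql_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    account = "'{}'@'{}'".format(user, host)
    limit = "WITH MAX_USER_CONNECTIONS {}".format(max_conn)

    if (major, minor) >= (5, 7):
        # MySQL 5.7 and 8: the connection limit is set on CREATE USER.
        return [
            "CREATE USER {} IDENTIFIED BY '{}' {};".format(account, password, limit),
            "GRANT PROCESS, REPLICATION CLIENT ON *.* TO {};".format(account),
        ]
    # MySQL 5.6 and earlier: the connection limit is set on GRANT.
    return [
        "CREATE USER {} IDENTIFIED BY '{}';".format(account, password),
        "GRANT PROCESS, REPLICATION CLIENT ON *.* TO {} {};".format(account, limit),
    ]


for statement in create_user_statements("5.6", "sysdig-cloud-password"):
    print(statement)
```

Passing "5.7" or "8" instead of "5.6" yields the second pair of statements shown above.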
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      There is no default configuration for MySQL, as a unique user and password are required for metrics polling.

                                                                                                                                                                                                                                                                                                                                                                      Add the entry for MySQL into dragent.yaml , updating the user and pass field credentials.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: mysql
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: mysqld
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            server: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            user: sysdig-cloud
                                                                                                                                                                                                                                                                                                                                                                            pass: sysdig-cloud-password
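Because there is no default configuration, a missing user or pass field is an easy mistake to make. The following sketch checks an entry shaped like the one above, represented as a Python dictionary, for empty or absent fields. The helper name `missing_mysql_fields` is hypothetical, and the set of required fields is taken only from the example above; the agent's own validation may differ.

```python
# Fields used by the MySQL entry in the example above.
REQUIRED_CONF_KEYS = ("server", "user", "pass")


def missing_mysql_fields(app_check):
    """Return the names of missing or empty fields in a MySQL app_checks entry."""
    missing = []
    if not app_check.get("name"):
        missing.append("name")
    if not (app_check.get("pattern") or {}).get("comm"):
        missing.append("pattern.comm")
    conf = app_check.get("conf") or {}
    for key in REQUIRED_CONF_KEYS:
        if not conf.get(key):
            missing.append("conf." + key)
    return missing


# The same entry as in the YAML above, expressed as a dictionary.
entry = {
    "name": "mysql",
    "pattern": {"comm": "mysqld"},
    "conf": {
        "server": "127.0.0.1",
        "user": "sysdig-cloud",
        "pass": "sysdig-cloud-password",
    },
}
print(missing_mysql_fields(entry))  # prints []
```

An entry that omits the credentials would instead return a list such as conf.user and conf.pass, which points at the lines to fix in dragent.yaml.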
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See MySQL Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      Default Dashboard

                                                                                                                                                                                                                                                                                                                                                                      Additional Views

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.17 -

                                                                                                                                                                                                                                                                                                                                                                      NGINX and NGINX Plus

                                                                                                                                                                                                                                                                                                                                                                      NGINX is open-source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and a reverse proxy and load balancer for HTTP, TCP, and UDP servers.

                                                                                                                                                                                                                                                                                                                                                                      NGINX Plus is a software load balancer, web server, and content cache built on top of open source NGINX. NGINX Plus has exclusive enterprise‑grade features beyond what’s available in the open-source offering, including session persistence, configuration via API, and active health checks.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent has a default configuration to collect metrics for open-source NGINX, provided that you have the HTTP stub status module enabled. NGINX exposes basic metrics about server activity on a simple status page with this status module. If NGINX Plus is installed, a wide range of metrics is available with the NGINX Plus API.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the setup steps for NGINX/NGINX Plus, the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and sample results in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      NGINX/ NGINX Plus Setup

                                                                                                                                                                                                                                                                                                                                                                      This section describes the configuration required on the NGINX server.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent will not collect metrics until the required endpoint is added to the NGINX configuration, per one of the following methods:

                                                                                                                                                                                                                                                                                                                                                                      • For NGINX (Open Source): use the stub status module

                                                                                                                                                                                                                                                                                                                                                                      • For NGINX Plus: use the Plus API

Configuration examples for each are provided below.

                                                                                                                                                                                                                                                                                                                                                                      NGINX Stub Status Module Configuration

The ngx_http_stub_status_module provides access to basic status information. Most distribution packages include it. When NGINX is built from source, the module is not built by default and must be enabled with the --with-http_stub_status_module configuration parameter.

1. To check whether the module is already compiled in, run the following command:

                                                                                                                                                                                                                                                                                                                                                                        nginx -V 2>&1 | grep -o with-http_stub_status_module
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                        If with-http_stub_status_module is listed, the status module is enabled. (For more information, see http://nginx.org/en/docs/http/ngx_http_stub_status_module.html.)

2. Add the /nginx_status endpoint to the NGINX configuration file as follows. The default NGINX configuration file is located at /etc/nginx/nginx.conf or /etc/nginx/conf.d/default.conf.

                                                                                                                                                                                                                                                                                                                                                                        # HTTP context
                                                                                                                                                                                                                                                                                                                                                                        server {
                                                                                                                                                                                                                                                                                                                                                                        ...
                                                                                                                                                                                                                                                                                                                                                                          # Enable NGINX status module
                                                                                                                                                                                                                                                                                                                                                                          location /nginx_status {
                                                                                                                                                                                                                                                                                                                                                                            # freely available with open source NGINX
                                                                                                                                                                                                                                                                                                                                                                            stub_status;
                                                                                                                                                                                                                                                                                                                                                                            access_log   off;
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                            # for open source NGINX < version 1.7.5
                                                                                                                                                                                                                                                                                                                                                                            # stub_status on;
                                                                                                                                                                                                                                                                                                                                                                          }
                                                                                                                                                                                                                                                                                                                                                                        ...
                                                                                                                                                                                                                                                                                                                                                                        }
                                                                                                                                                                                                                                                                                                                                                                        
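Once the endpoint is exposed, it returns a small plain-text page. The following Python sketch shows one way to parse that page into numbers, which can help confirm the endpoint works before pointing the agent at it. The sample text follows the response format documented for ngx_http_stub_status_module; the function name parse_stub_status and the dictionary keys are illustrative choices, not part of the Sysdig agent.

```python
import re

# Sample response in the format documented for ngx_http_stub_status_module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Turn the plain-text stub_status page into a dict of integers.

    Hypothetical helper for manual verification only.
    """
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The totals line holds three bare integers: accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("text does not look like a stub_status response")

    accepts, handled, requests = (int(g) for g in totals.groups())
    reading, writing, waiting = (int(g) for g in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


stats = parse_stub_status(SAMPLE)
print(stats["active"], stats["requests"])
```

In practice the text would come from an HTTP GET against the /nginx_status location configured above; the parsing step is kept separate here so it can be checked without a running server.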

                                                                                                                                                                                                                                                                                                                                                                      NGINX Plus API Configuration

When NGINX Plus is configured, enable the Plus API by adding the /api endpoint to the NGINX configuration file as follows.

                                                                                                                                                                                                                                                                                                                                                                      The default NGINX configuration file is present at /etc/nginx/nginx.conf or /etc/nginx/conf.d/default.conf.

                                                                                                                                                                                                                                                                                                                                                                      # HTTP context
                                                                                                                                                                                                                                                                                                                                                                      server {
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                        # Enable NGINX Plus API
                                                                                                                                                                                                                                                                                                                                                                        location /api {
                                                                                                                                                                                                                                                                                                                                                                          api write=on;
                                                                                                                                                                                                                                                                                                                                                                          allow all;
                                                                                                                                                                                                                                                                                                                                                                        }
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      }
                                                                                                                                                                                                                                                                                                                                                                      
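Unlike the stub status page, the Plus API answers with JSON. As a rough illustration, the Python sketch below reads a body shaped like the API's connections object (fields accepted, dropped, active, and idle) and derives two simple totals. The sample numbers are made up, and the helper name summarize_connections and its output keys are hypothetical; this is not how the Sysdig agent is implemented.

```python
import json

# Body shaped like the NGINX Plus API "connections" object.
# The values are invented for illustration.
SAMPLE_BODY = '{"accepted": 4968119, "dropped": 0, "active": 5, "idle": 117}'


def summarize_connections(body: str) -> dict:
    """Parse a connections payload and derive simple totals.

    Hypothetical helper; key names in the result are illustrative.
    """
    data = json.loads(body)
    return {
        "accepted": data["accepted"],
        "dropped": data["dropped"],
        # Connections that were accepted and not dropped.
        "handled": data["accepted"] - data["dropped"],
        # Connections currently open, whether busy or idle.
        "open": data["active"] + data["idle"],
    }


summary = summarize_connections(SAMPLE_BODY)
print(summary["handled"], summary["open"])
```

A quick check like this can confirm that the /api location is reachable and returns JSON before the agent is configured to use it.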

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

• Configuration Examples:

  • Example 1 (Default): Only open-source NGINX is configured.

  • Example 2: Only an NGINX Plus node is configured.

  • Example 3: NGINX and NGINX Plus are installed in different containers on the same host.

• The use_plus_api flag is used to differentiate NGINX and NGINX Plus metrics.

• NGINX Plus metrics are distinguished by the prefix nginx.plus.*

• When use_plus_api = true:

  • nginx_plus_api_url is used to fetch NGINX Plus metrics from the NGINX Plus node.

  • nginx_status_url is used to fetch NGINX metrics from the NGINX node (if a single host is running two separate containers for NGINX and NGINX Plus).
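The flag semantics described above can be sketched in Python as a small function that decides which URLs would be queried for a given conf mapping. This is only a reading of the rules in this section, not the agent's actual code. It assumes that use_plus_api defaults to false when omitted and that the {port} placeholder is replaced with the port of the detected process; the function name endpoints_to_query and the "nginx" / "nginx.plus" labels are hypothetical.

```python
def endpoints_to_query(conf: dict, port: int) -> dict:
    """Return {metric family: URL} implied by an app check conf mapping.

    Hypothetical helper that mirrors the rules described in the text.
    """

    def resolve(url: str) -> str:
        # Assumption: {port} stands for the port of the matched process.
        return url.replace("{port}", str(port))

    endpoints = {}
    if conf.get("use_plus_api", False):
        # Plus metrics always come from the Plus API URL.
        endpoints["nginx.plus"] = resolve(conf["nginx_plus_api_url"])
        # Open-source metrics are fetched too, if a status URL is given.
        if "nginx_status_url" in conf:
            endpoints["nginx"] = resolve(conf["nginx_status_url"])
    else:
        # Default case: only the stub status endpoint is used.
        endpoints["nginx"] = resolve(conf["nginx_status_url"])
    return endpoints


plus_only = {
    "nginx_plus_api_url": "http://localhost:{port}/api",
    "use_plus_api": True,
}
print(endpoints_to_query(plus_only, 8080))
```

With a conf that sets both URLs and use_plus_api: true, the function returns both families, which corresponds to the mixed-container case.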

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      With the default configuration, only NGINX metrics will be available once the ngx_http_stub_status_module is configured.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: nginx
                                                                                                                                                                                                                                                                                                                                                                          check_module: nginx
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            exe: "nginx: worker process"
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            nginx_status_url: "http://localhost:{port}/nginx_status"
                                                                                                                                                                                                                                                                                                                                                                          log_errors: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: NGINX Plus only

With this example, only NGINX Plus metrics will be available.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: nginx
                                                                                                                                                                                                                                                                                                                                                                          check_module: nginx
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            exe: "nginx: worker process"
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            nginx_plus_api_url: "http://localhost:{port}/api"
                                                                                                                                                                                                                                                                                                                                                                            use_plus_api: true
                                                                                                                                                                                                                                                                                                                                                                            user: admin
                                                                                                                                                                                                                                                                                                                                                                            password: admin
                                                                                                                                                                                                                                                                                                                                                                          log_errors: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 3: NGINX and NGINX Plus

This is a special case where open-source NGINX and NGINX Plus are installed on the same host but in different containers. With this configuration, the respective metrics will be available for the NGINX and NGINX Plus containers.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: nginx
                                                                                                                                                                                                                                                                                                                                                                          check_module: nginx
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            exe: "nginx: worker process"
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            nginx_plus_api_url: "http://localhost:{port}/api"
                                                                                                                                                                                                                                                                                                                                                                            nginx_status_url: "http://localhost:{port}/nginx_status"
                                                                                                                                                                                                                                                                                                                                                                            use_plus_api: true
                                                                                                                                                                                                                                                                                                                                                                            user: admin
                                                                                                                                                                                                                                                                                                                                                                            password: admin
                                                                                                                                                                                                                                                                                                                                                                          log_errors: true
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      List of Metrics

                                                                                                                                                                                                                                                                                                                                                                      NGINX (Open Source)

                                                                                                                                                                                                                                                                                                                                                                      See NGINX Metrics.

                                                                                                                                                                                                                                                                                                                                                                      NGINX Plus

                                                                                                                                                                                                                                                                                                                                                                      See NGINX Plus Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.18 - NTP

NTP stands for Network Time Protocol. It synchronizes the clock on your Linux system with a centralized NTP server. A local NTP server on the network can in turn be synchronized with an external timing source, which keeps all the servers in your organization in sync with an accurate time reference.

                                                                                                                                                                                                                                                                                                                                                                      If the NTP check is enabled in the Sysdig agent, it reports the time offset of the local agent from an NTP server.

                                                                                                                                                                                                                                                                                                                                                                      This page describes how to edit the configuration to collect information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig's dragent.default.yaml does not provide any configuration for NTP.

Add the configuration shown in the example below to the dragent.yaml file to enable NTP checks.

                                                                                                                                                                                                                                                                                                                                                                      Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

app_checks:
  - name: ntp
    interval: 60
    pattern:
      comm: systemd
    conf:
      host: us.pool.ntp.org
      offset_threshold: 60
                                                                                                                                                                                                                                                                                                                                                                      
• host: (mandatory) the host name of the NTP server.

• offset_threshold: (optional) the difference, in seconds, between the local clock and the NTP server at which the ntp.in_sync service check becomes CRITICAL. The default is 60 seconds.
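The handling of these two parameters can be pictured with the minimal Python sketch below. It is not the agent's implementation: the helper name resolve_ntp_conf is hypothetical, and only the mandatory host key and the 60-second default are taken from the parameter list above.

```python
DEFAULT_OFFSET_THRESHOLD = 60  # seconds; the default stated in the parameter list


def resolve_ntp_conf(conf):
    """Return (host, offset_threshold) from a conf mapping.

    Hypothetical helper for illustration only.
    """
    host = conf.get("host")
    if not host:
        # host is documented as mandatory
        raise ValueError("ntp check: 'host' is mandatory")
    # offset_threshold is optional and falls back to the documented default
    threshold = conf.get("offset_threshold", DEFAULT_OFFSET_THRESHOLD)
    return host, float(threshold)
```

For example, a conf block that sets only host resolves to a threshold of 60 seconds, while a conf block without host is rejected.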

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      ntp.offset, the time difference between the local clock and the NTP reference clock, is the primary NTP metric.

                                                                                                                                                                                                                                                                                                                                                                      See also NTP Metrics.
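For intuition about what an offset value means, the sketch below applies the conventional on-wire NTP calculation (as described in RFC 5905) to the four timestamps of a single request/response exchange. Whether the agent derives ntp.offset exactly this way is an assumption, and the function and parameter names are illustrative only.

```python
def clock_offset(t0, t1, t2, t3):
    """Estimate the server-minus-client clock offset in seconds.

    t0: client send time      (client clock)
    t1: server receive time   (server clock)
    t2: server send time      (server clock)
    t3: client receive time   (client clock)
    """
    # Averaging the two one-way differences cancels a symmetric network delay.
    return ((t1 - t0) + (t2 - t3)) / 2.0
```

A positive result means the server clock is ahead of the local clock; a negative result means it is behind.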

                                                                                                                                                                                                                                                                                                                                                                      Service Checks

                                                                                                                                                                                                                                                                                                                                                                      ntp.in_sync:

                                                                                                                                                                                                                                                                                                                                                                      Returns CRITICAL if the NTP offset is greater than the threshold specified in dragent.yaml, otherwise OK.
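The rule above can be sketched in a few lines of Python. This is an illustration rather than the agent's code: the function name is hypothetical, and comparing the absolute value of the offset (so that a clock running behind is treated the same as one running ahead) is an assumption.

```python
OK = "OK"
CRITICAL = "CRITICAL"


def ntp_in_sync_status(offset_seconds, offset_threshold=60):
    """Map an NTP offset to a service check status (illustrative only)."""
    # Assumption: the magnitude of the offset is compared, not its sign.
    if abs(offset_seconds) > offset_threshold:
        return CRITICAL
    return OK
```

With the default threshold, an offset of exactly 60 seconds is still OK, because the check only becomes CRITICAL when the offset is greater than the threshold.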

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.19 - PgBouncer

PgBouncer is a lightweight connection pooler for PostgreSQL. If PgBouncer is installed in your environment, you need to edit the Sysdig agent configuration file so that the agent can connect. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the configuration settings, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      PgBouncer Setup

PgBouncer does not ship with a default stats user. To configure one, add a user that is allowed to access PgBouncer stats by adding the following line to pgbouncer.ini (default location: /etc/pgbouncer/pgbouncer.ini):

                                                                                                                                                                                                                                                                                                                                                                      stats_users = sysdig_cloud
                                                                                                                                                                                                                                                                                                                                                                      

The same user also needs the following entry in userlist.txt (default location: /etc/pgbouncer/userlist.txt):

                                                                                                                                                                                                                                                                                                                                                                      "sysdig_cloud" "sysdig_cloud_password"
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      No default configuration is present in Sysdig’s dragent.default.yaml file for PgBouncer, as it requires a unique username and password. You must add a custom entry in dragent.yaml as follows:

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: pgbouncer
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: pgbouncer
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: localhost # set if the bind ip is different
                                                                                                                                                                                                                                                                                                                                                                            port: 6432      # set if the port is not the default
                                                                                                                                                                                                                                                                                                                                                                            username: sysdig_cloud
                                                                                                                                                                                                                                                                                                                                                                            password: sysdig_cloud_password #replace with appropriate password
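
To check the stats user by hand, it helps to know that PgBouncer exposes its admin console as a virtual database named pgbouncer, where a stats user can run commands such as SHOW STATS. The Python sketch below builds a connection URI for that console from the conf values in the example. The helper name is hypothetical, and how the agent itself connects is not described here, so treat this as an assumption-laden illustration.

```python
from urllib.parse import quote


def pgbouncer_console_dsn(conf):
    """Build a URI for the PgBouncer admin console (hypothetical helper)."""
    host = conf.get("host", "localhost")
    port = conf.get("port", 6432)
    # Percent-encode credentials so special characters do not break the URI.
    user = quote(conf["username"], safe="")
    password = quote(conf["password"], safe="")
    # "pgbouncer" is the virtual admin database, not a real PostgreSQL database.
    return f"postgresql://{user}:{password}@{host}:{port}/pgbouncer"
```

The resulting URI can be passed to any PostgreSQL client; if SHOW STATS succeeds for the user, the stats_users and userlist.txt entries are in place.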
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See PGBouncer Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.20 - PHP-FPM

PHP-FPM (FastCGI Process Manager) is an alternative PHP FastCGI implementation with additional features that are useful for sites of any size, especially busy ones. If PHP-FPM is installed in your environment, the Sysdig agent connects to it automatically. If PHP-FPM uses custom settings in its config file, you may need to edit the default entries for the agent to connect. See the Default Configuration section, below.

With the default configuration, the Sysdig agent automatically collects all metrics.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      PHP-FPM Setup

                                                                                                                                                                                                                                                                                                                                                                      This check has a default configuration that should suit most use cases. If it does not work for you, verify that you have added these lines to your php-fpm.conf file. The default location is /etc/

                                                                                                                                                                                                                                                                                                                                                                      pm.status_path = /status
                                                                                                                                                                                                                                                                                                                                                                      ping.path = /ping
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with PHP-FPM and collect all metrics:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: php-fpm
                                                                                                                                                                                                                                                                                                                                                                          check_module: php_fpm
                                                                                                                                                                                                                                                                                                                                                                          retry: false
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            exe: "php-fpm: master process"
                                                                                                                                                                                                                                                                                                                                                                      

If php-fpm.conf sets pm.status_path or ping.path to values other than those shown in PHP-FPM Setup, edit the Sysdig agent configuration in dragent.yaml, as shown in the Example, below.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

Replace the values of status_url and ping_url below with the values of pm.status_path and ping.path, respectively, from your php-fpm.conf:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: php-fpm
                                                                                                                                                                                                                                                                                                                                                                          check_module: php_fpm
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            exe: "php-fpm: master process"
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            status_url: /mystatus
                                                                                                                                                                                                                                                                                                                                                                            ping_url: /myping
                                                                                                                                                                                                                                                                                                                                                                            ping_reply: mypingreply
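
The substitution described above can also be scripted. The following is a minimal Python sketch, not part of the Sysdig agent, that reads the text of php-fpm.conf and derives the conf values for the app check. The function name php_fpm_check_conf is hypothetical, and pairing ping_reply with the PHP-FPM ping.response directive is an assumption, because this page names only the source directives for status_url and ping_url.

```python
def php_fpm_check_conf(conf_text):
    """Map php-fpm.conf directives to the conf keys of the php-fpm app check."""
    # php-fpm.conf directive -> app check key.
    # The ping.response -> ping_reply pairing is an assumption (see lead-in).
    mapping = {
        "pm.status_path": "status_url",
        "ping.path": "ping_url",
        "ping.response": "ping_reply",
    }
    conf = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, ';' comments, and [section] headers.
        if not line or line.startswith(";") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key = key.strip()
        if key in mapping:
            conf[mapping[key]] = value.strip().strip('"')
    return conf


sample = """
[www]
; custom monitoring endpoints
pm.status_path = /mystatus
ping.path = /myping
ping.response = mypingreply
"""

print(php_fpm_check_conf(sample))
# {'status_url': '/mystatus', 'ping_url': '/myping', 'ping_reply': 'mypingreply'}
```

The sketch does not distinguish between pools: if several pool sections set the same directive, the last one read wins, so treat the result as a starting point for the conf block rather than a finished configuration.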
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See PHP-FPM Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.21 - PostgreSQL

                                                                                                                                                                                                                                                                                                                                                                      PostgreSQL is a powerful, open-source, object-relational database system that has earned a strong reputation for reliability, feature robustness, and performance.

If PostgreSQL is installed in your environment, the Sysdig agent connects to it automatically in most cases. In some cases, you may need to create a dedicated user for Sysdig and edit the default entries to connect.

                                                                                                                                                                                                                                                                                                                                                                      See the Default Configuration section, below. The Sysdig agent automatically collects all metrics with the default configuration when correct credentials are provided.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      PostgreSQL Setup

PostgreSQL will be auto-discovered and the agent will connect through the Unix socket using the Default Configuration with the default postgres user. If this does not work, you can create a user for Sysdig Monitor and give it enough permissions to read Postgres stats. To do this, execute the following example statements on your server:

create user "sysdig-cloud" with password 'password';
grant SELECT ON pg_stat_database to "sysdig-cloud";
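
A role name that contains a hyphen, such as sysdig-cloud, must be written as a double-quoted identifier in PostgreSQL, and the same name must be used in both statements. The following is a minimal Python sketch that builds the two statements for a given user name and password. The helper names are hypothetical, and the literal quoting assumes the server's standard_conforming_strings setting is on.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal, doubling any embedded single quotes.

    Assumes standard_conforming_strings is on, so backslashes are not special.
    """
    return "'" + value.replace("'", "''") + "'"


def monitoring_user_sql(user, password):
    """Return the statements that create a read-only monitoring user."""
    role = quote_ident(user)
    return [
        f"create user {role} with password {quote_literal(password)};",
        f"grant SELECT ON pg_stat_database to {role};",
    ]


for statement in monitoring_user_sql("sysdig-cloud", "password"):
    print(statement)
# create user "sysdig-cloud" with password 'password';
# grant SELECT ON pg_stat_database to "sysdig-cloud";
```

Use the unquoted name (sysdig-cloud here) as the username value in dragent.yaml; the double quotes belong only to the SQL statements.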
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following code to connect with Postgres:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: postgres
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: postgres
                                                                                                                                                                                                                                                                                                                                                                            port: 5432
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            unix_sock: "/var/run/postgresql/"
                                                                                                                                                                                                                                                                                                                                                                            username: postgres
                                                                                                                                                                                                                                                                                                                                                                      

If you create a dedicated user for Sysdig, update the dragent.yaml file as shown in Example 1, below.

Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Special User

Enter the username and password that you created for the Sysdig agent in the respective fields, as follows:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: postgres
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: postgres
                                                                                                                                                                                                                                                                                                                                                                            port: 5432
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: sysdig-cloud
                                                                                                                                                                                                                                                                                                                                                                            password: password
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Connecting on Unix Socket

If Postgres is listening on the Unix socket /tmp/.s.PGSQL.5432, set the value of unix_sock to /tmp/:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: postgres
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: postgres
                                                                                                                                                                                                                                                                                                                                                                            port: 5432
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            unix_sock: "/tmp/"
                                                                                                                                                                                                                                                                                                                                                                            username: postgres
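
As the example shows, unix_sock takes the directory that contains the socket file, not the socket file itself. The following is a minimal Python sketch of that derivation; the function name unix_sock_dir is hypothetical, and the trailing slash simply mirrors the values used on this page.

```python
import posixpath


def unix_sock_dir(socket_path):
    """Return the directory that holds a PostgreSQL socket file, with a trailing slash."""
    directory = posixpath.dirname(socket_path)
    # Add the trailing slash unless the directory is already "/".
    return directory if directory.endswith("/") else directory + "/"


print(unix_sock_dir("/tmp/.s.PGSQL.5432"))
# /tmp/
print(unix_sock_dir("/var/run/postgresql/.s.PGSQL.5432"))
# /var/run/postgresql/
```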
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 3: Relations

                                                                                                                                                                                                                                                                                                                                                                      Lists of relations/tables can be specified to track per-relation metrics.

                                                                                                                                                                                                                                                                                                                                                                      A single relation can be specified in two ways:

• An exact name, given in relation_name.

• A regular expression that includes all matching relations, given in relation_regex.

If schemas are not provided, all schemas are included. dbname must be provided if relations is specified.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: postgres
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: postgres
                                                                                                                                                                                                                                                                                                                                                                            port: 5432
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: <username>
                                                                                                                                                                                                                                                                                                                                                                            password: <password>
                                                                                                                                                                                                                                                                                                                                                                            dbname: <user_db_name>
                                                                                                                                                                                                                                                                                                                                                                            relations:
                                                                                                                                                                                                                                                                                                                                                                              - relation_name: <table_name_1>
                                                                                                                                                                                                                                                                                                                                                                                schemas:
                                                                                                                                                                                                                                                                                                                                                                                  - <schema_name_1>
                                                                                                                                                                                                                                                                                                                                                                              - relation_regex: <table_pattern>
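
As an illustration, the template above might be filled in as follows. This is only a sketch: the user name monitor, the database shop, the table orders, and the audit_ prefix are hypothetical values, and the regex dialect accepted by relation_regex is an assumption, so test the pattern against your own tables.

```yaml
app_checks:
  - name: postgres
    pattern:
      comm: postgres
      port: 5432
    conf:
      username: monitor            # hypothetical monitoring user
      password: <password>
      dbname: shop                 # hypothetical; required because relations is set
      relations:
        - relation_name: orders    # exact table name, limited to the schema listed below
          schemas:
            - public
        - relation_regex: "^audit_.*"   # hypothetical pattern; no schemas, so all schemas are included
```

The first entry tracks a single table in one schema, while the second omits schemas and therefore applies to matching relations in every schema.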
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 4: Other Optional Parameters

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: postgres
                                                                                                                                                                                                                                                                                                                                                                          check_module: postgres
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: postgres
                                                                                                                                                                                                                                                                                                                                                                            port: 5432
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            username: postgres
                                                                                                                                                                                                                                                                                                                                                                            unix_sock: "/var/run/postgresql"
                                                                                                                                                                                                                                                                                                                                                                            dbname: <user_db_name>
                                                                                                                                                                                                                                                                                                                                                                            #collect_activity_metrics: true
                                                                                                                                                                                                                                                                                                                                                                            #collect_default_database: true
                                                                                                                                                                                                                                                                                                                                                                            #tag_replication_role: true
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      Optional Parameters

| Config Parameter | Description | Default Value |
| --- | --- | --- |
| collect_activity_metrics | When set to true, it will enable metrics from pg_stat_activity. New metrics added will be: postgresql.active_queries, postgresql.transactions.idle_in_transaction, postgresql.transactions.open, postgresql.waiting_queries | false |
| collect_default_database | When set to true, it will collect statistics from the default database, which is postgres. All metrics from the postgres database will have the tag db:postgres | false |
| tag_replication_role | When set to true, metrics and checks will be tagged with replication_role:master or replication_role:standby | false |

                                                                                                                                                                                                                                                                                                                                                                      Example 5: Custom Metrics Using Custom Queries

Custom metrics can be collected from Postgres using custom queries.

app_checks:
  - name: postgres
    pattern:
      comm: postgres
      port: 5432
    conf:
      unix_sock: "/var/run/postgresql/"
      username: postgres
      custom_queries:
        - metric_prefix: postgresql.custom
          query: <QUERY>
          columns:
            - name: <COLUMN_1_NAME>
              type: <COLUMN_1_TYPE>
            - name: <COLUMN_2_NAME>
              type: <COLUMN_2_TYPE>
          tags:
            - <TAG_KEY>:<TAG_VALUE>
                                                                                                                                                                                                                                                                                                                                                                      
| Option | Required | Description |
| --- | --- | --- |
| metric_prefix | Yes | Each metric starts with the chosen prefix. |
| query | Yes | The SQL to execute. It can be a simple statement or a multi-line script. All rows of the result are evaluated. Use the YAML pipe block indicator if you need a multi-line script. |
| columns | Yes | A list representing each column, ordered sequentially from left to right. The number of entries must equal the number of columns returned by the query. Each entry requires two fields. name: the suffix appended to metric_prefix to form the full metric name; if the type is tag, the column is instead applied as a tag to every metric collected by this query. type: the submission method (gauge, count, rate, etc.); this can also be set to tag to tag each metric in the row with the name and value of the item in this column. |
| tags | No | A list of tags to apply to each metric (as specified above). |
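
To make the options above concrete, here is a hedged sketch of a filled-in custom query. The query, the column names database and backends, and the env:staging tag are illustrative choices rather than values from this documentation. It reads the standard PostgreSQL statistics view pg_stat_database and uses the YAML pipe so the SQL can span several lines.

```yaml
app_checks:
  - name: postgres
    pattern:
      comm: postgres
      port: 5432
    conf:
      unix_sock: "/var/run/postgresql/"
      username: postgres
      custom_queries:
        - metric_prefix: postgresql.custom
          # The pipe (|) starts a YAML block scalar, so the query can be multi-line.
          query: |
            SELECT datname, numbackends
            FROM pg_stat_database
            WHERE datname IS NOT NULL
          columns:
            # One entry per returned column, in left-to-right order.
            - name: database       # first column (datname), applied as a tag
              type: tag
            - name: backends       # second column (numbackends), reported as a gauge
              type: gauge
          tags:
            - env:staging          # hypothetical static tag
```

Under the naming rule described in the table, the second column would presumably be reported as postgresql.custom.backends, tagged with the database name from the first column of each row.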

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See PostgreSQL Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      Default Dashboard

                                                                                                                                                                                                                                                                                                                                                                      The default PostgreSQL dashboard includes combined metrics and individual metrics in an overview page.

                                                                                                                                                                                                                                                                                                                                                                      Other Views

                                                                                                                                                                                                                                                                                                                                                                      You can also view individual metric charts from a drop-down menu in an Explore view.

8.6.2.22 - RabbitMQ

RabbitMQ is open-source message-broker software (sometimes called message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP). The RabbitMQ server is written in Erlang and is built on the Open Telecom Platform framework for clustering and failover. Client libraries for interfacing with the broker are available for all major programming languages. If RabbitMQ is installed in your environment, the Sysdig agent connects to it automatically. See the Default Configuration section, below.

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent automatically collects all metrics with the default configuration. You may need to edit the dragent.yaml file if a metrics limit is reached.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      RabbitMQ Setup

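Enable the RabbitMQ management plugin; see RabbitMQ's documentation for instructions. The app check reads its metrics from the plugin's HTTP API, which listens on port 15672 by default.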

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig's dragent.default.yaml uses the following configuration to connect to RabbitMQ and collect all metrics.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: rabbitmq
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 15672
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_api_url: "http://localhost:15672/api/"
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_user: guest
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_pass: guest
                                                                                                                                                                                                                                                                                                                                                                      

The RabbitMQ app check tracks several entity types, such as exchanges, queues, and nodes, and each type has its own maximum limit. If a limit is reached, you can control which metrics are collected by editing the dragent.yaml file, as in the following examples.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Manage logging_interval

                                                                                                                                                                                                                                                                                                                                                                      When a maximum limit is exceeded, the app check will log an info message:

                                                                                                                                                                                                                                                                                                                                                                      rabbitmq: Too many <entity type> (<number of entities>) to fetch and maximum limit is (<configured limit>). You must choose the <entity type> you are interested in by editing the dragent.yaml configuration file

The logging_interval configuration parameter throttles this message so that it is not repeated on every check run. Its default value is 300 seconds; to change it, specify a different value in dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: rabbitmq
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 15672
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_api_url: "http://localhost:15672/api/"
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_user: guest
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_pass: guest
                                                                                                                                                                                                                                                                                                                                                                            logging_interval: 10 # Value in seconds. Default is 300
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Specify Nodes, Queues, or Exchanges

Each tracked RabbitMQ entity type has its own maximum limit. As of Agent v10.5.1, the default limits are as follows:

                                                                                                                                                                                                                                                                                                                                                                      • Exchanges: 16 per-exchange metrics

                                                                                                                                                                                                                                                                                                                                                                      • Queues: 20 per-queue metrics

                                                                                                                                                                                                                                                                                                                                                                      • Nodes: 9 per-node metrics

The max_detailed_* settings for the RabbitMQ app check do not limit the number of queues, exchanges, and nodes reported; they limit the number of metrics generated for those objects. For example, a single queue might report up to 20 metrics, so set max_detailed_queues to 20 times the actual number of queues.
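The sizing rule above can be sketched for a hypothetical broker with 10 queues, 5 exchanges, and 3 nodes, using the per-entity metric counts listed earlier. Only max_detailed_queues is named on this page; the key names max_detailed_exchanges and max_detailed_nodes are inferred from the max_detailed_* pattern, and placing all three under conf is an assumption based on the other examples.

```yaml
app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: guest
      rabbitmq_pass: guest
      # Hypothetical sizing for 10 queues, 5 exchanges, and 3 nodes.
      max_detailed_queues: 200     # 10 queues x 20 per-queue metrics
      max_detailed_exchanges: 80   # 5 exchanges x 16 per-exchange metrics (key name assumed)
      max_detailed_nodes: 27       # 3 nodes x 9 per-node metrics (key name assumed)
```

If the broker grows beyond these counts, the values would need to be recalculated with the same multiplication.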

                                                                                                                                                                                                                                                                                                                                                                      The metrics for these entities are tagged. If any of these entities are present but no transactions have occurred for them, the metrics are still reported with 0 values, though without tags. Therefore, when segmenting these metrics, the tags will show as unset in the Sysdig Monitor Explore view. However, all such entities are still counted against the maximum limits. In such a scenario, you can specify the entity names for which you want to collect metrics in the dragent.yaml file.

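app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: guest
      rabbitmq_pass: guest
      tags: ["queues:<queuename>"]
      # Exact node names to collect.
      nodes:
        - rabbit@localhost
        - rabbit2@domain
      # Regular expressions matched against node names.
      nodes_regexes:
        - bla.*
      # Exact queue names to collect.
      queues:
        - queue1
        - queue2
      # Regular expressions matched against queue names.
      queues_regexes:
        - thisqueue-.*
        - another_\d+queue
      # Exact exchange names to collect.
      exchanges:
        - exchange1
        - exchange2
      # Regular expressions matched against exchange names.
      exchanges_regexes:
        - exchange.*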
                                                                                                                                                                                                                                                                                                                                                                      

Example 3: Custom Tags

Optional tags can be applied to every metric, service check, and event that the check emits.

                                                                                                                                                                                                                                                                                                                                                                      Names can be specified by exact name or regular expression.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: rabbitmq
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 15672
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_api_url: "http://localhost:15672/api/"
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_user: guest
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_pass: guest
                                                                                                                                                                                                                                                                                                                                                                            tags: ["some_tag:some_value"]
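
Several tags can be supplied at once. The sketch below uses the equivalent YAML block-list form of the same tags option; the tag names env:staging and team:messaging are hypothetical placeholders, not values defined by the agent.

```yaml
app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: guest
      rabbitmq_pass: guest
      # Hypothetical tags, written in the same key:value form as the example above.
      tags:
        - "env:staging"
        - "team:messaging"
```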
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 4: filter_by_node

Set filter_by_node: true if you want each node to report only the information local to that node. Without this option, each node reports cluster-wide information (as presented by RabbitMQ itself). Enabling it makes the metrics easier to view in the UI, because the redundant information otherwise reported by every node is removed.

                                                                                                                                                                                                                                                                                                                                                                      Default: false.

                                                                                                                                                                                                                                                                                                                                                                      Prerequisite: Sysdig agent v. 92.3 or higher.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: rabbitmq
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            port: 15672
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_api_url: "http://localhost:15672/api/"
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_user: guest
                                                                                                                                                                                                                                                                                                                                                                            rabbitmq_pass: guest
                                                                                                                                                                                                                                                                                                                                                                            filter_by_node: true
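
As a sketch, assuming filter_by_node can be combined with the tags option from Example 3 in a single conf block, the following entry would report node-local data and tag it. The tag value is a hypothetical placeholder.

```yaml
app_checks:
  - name: rabbitmq
    pattern:
      port: 15672
    conf:
      rabbitmq_api_url: "http://localhost:15672/api/"
      rabbitmq_user: guest
      rabbitmq_pass: guest
      # Report only node-local information (default is false).
      filter_by_node: true
      # Hypothetical tag; combining it with filter_by_node is an assumption.
      tags: ["cluster:rabbit-main"]
```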
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See RabbitMQ Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.23 - RedisDB

Redis is an open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. If Redis is installed in your environment, the Sysdig agent connects to it automatically in most cases. To collect additional metrics, you may need to edit the default entries; see the Default Configuration section below.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Application Setup

                                                                                                                                                                                                                                                                                                                                                                      Redis will automatically expose all metrics. You do not need to configure anything in the Redis instance.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

By default, Sysdig’s dragent.default.yaml uses the following configuration to connect to Redis and collect basic metrics:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                      

Some additional metrics can be collected by editing the configuration file as shown in the following examples. The options shown in Example 2 are relevant if Redis requires authentication or if a Unix socket is used.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Key Lengths

The following example entry produces the metric redis.key.length in the Sysdig Monitor UI, which displays the length of specific keys (segmented by: key). To enable it, provide the key names in dragent.yaml as follows.

                                                                                                                                                                                                                                                                                                                                                                      Note that length is 0 (zero) for keys that have a type other than list, set, hash, or sorted set. Keys can be expressed as patterns; see https://redis.io/commands/keys.

                                                                                                                                                                                                                                                                                                                                                                      Sample entry in dragent.yaml:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                            keys:
                                                                                                                                                                                                                                                                                                                                                                              - "list_1"
                                                                                                                                                                                                                                                                                                                                                                              - "list_9*"
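
Because key names can be written as Redis glob-style patterns, ? (any single character) and [...] (a character set) are available in addition to *. The sketch below uses hypothetical key names and assumes the check passes each pattern to Redis unchanged.

```yaml
app_checks:
  - name: redis
    check_module: redisdb
    pattern:
      comm: redis-server
    conf:
      host: 127.0.0.1
      port: "{port}"
      keys:
        - "jobs:pending"      # exact key name (hypothetical)
        - "jobs:worker_?"     # ? matches exactly one character
        - "queue:[ab]*"       # [ab] matches a or b; * matches any suffix
```

Broad patterns can match many keys, so it is probably safer to keep them as specific as the key naming scheme allows.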
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Additional Configuration Options

• unix_socket_path (Optional) - Use this if your Redis instance listens on a Unix socket instead of a host and port.

• password (Optional) - Use this if your Redis instance requires a password.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                            # unix_socket_path: /var/run/redis/redis.sock # can be used in lieu of host/port
                                                                                                                                                                                                                                                                                                                                                                            # password: mypassword                                            # if your Redis requires auth
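
With both options enabled, the entry might look like the sketch below. It assumes that host and port can be omitted once unix_socket_path is set (as "in lieu of host/port" suggests) and that the agent can read the socket file; the password is a placeholder.

```yaml
app_checks:
  - name: redis
    check_module: redisdb
    pattern:
      comm: redis-server
    conf:
      # Socket path taken from the commented example above; adjust to your setup.
      unix_socket_path: /var/run/redis/redis.sock
      # Placeholder; replace with the password set by requirepass in redis.conf.
      password: mypassword
```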
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 3: COMMANDSTATS Metrics

                                                                                                                                                                                                                                                                                                                                                                      You can also collect the INFO COMMANDSTATS result as metrics (redis.command.*). This works with Redis >= 2.6

                                                                                                                                                                                                                                                                                                                                                                      Sample implementation:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                            command_stats: true
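To make the mapping from INFO COMMANDSTATS output to redis.command.* metrics more concrete, here is a minimal Python sketch. It is not the agent's implementation: the function names, the sample text, and the exact metric and tag names (beyond the redis.command. prefix) are assumptions chosen for illustration.

```python
def parse_commandstats(info_text):
    """Parse the text of `INFO COMMANDSTATS` into {command: {field: value}}."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip the section header ("# Commandstats") and blank lines.
        if not line.startswith("cmdstat_"):
            continue
        name, _, fields = line.partition(":")
        command = name[len("cmdstat_"):]
        stats[command] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in fields.split(","))
        }
    return stats


def to_metrics(stats):
    """Flatten parsed stats into (metric_name, value, tags) tuples.

    The metric and tag naming here is hypothetical, not the agent's schema.
    """
    metrics = []
    for command, fields in sorted(stats.items()):
        for field, value in sorted(fields.items()):
            metrics.append(("redis.command." + field, value, ["command:" + command]))
    return metrics


# Hypothetical sample in the format Redis prints for this INFO section.
sample = """# Commandstats
cmdstat_get:calls=21,usec=175,usec_per_call=8.33
cmdstat_set:calls=3,usec=30,usec_per_call=10.00
"""
parsed = parse_commandstats(sample)
metrics = to_metrics(parsed)
```

Each Redis command yields one value per statistic, so the number of series grows with the number of distinct commands the server has executed.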
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See RedisDB Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.24 -

                                                                                                                                                                                                                                                                                                                                                                      SNMP

                                                                                                                                                                                                                                                                                                                                                                      Simple Network Management Protocol (SNMP) is an application-layer protocol used to manage and monitor network devices and their functions. The Sysdig agent can connect to network devices and collect metrics using SNMP.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      SNMP Overview

                                                                                                                                                                                                                                                                                                                                                                      Simple Network Management Protocol (SNMP) is an Internet Standard protocol for collecting and configuring information about devices in the networks. The network devices include physical devices like switches, routers, servers etc.

                                                                                                                                                                                                                                                                                                                                                                      SNMP has three primary versions ( SNMPv1, SNMPv2c and SNMPv3) and SNMPv2c is most widely used.

                                                                                                                                                                                                                                                                                                                                                                      SNMP allows device vendors to expose management data in the form of variables on managed systems organized in a management information base (MIB), which describe the system status and configuration. The devices can be queried as well as configured remotely using these variables. Certain MIBs are generic and supported by the majority of the device vendors. Additionally, each vendor can have their own private/enterprise MIBs for vendor-specific information.

                                                                                                                                                                                                                                                                                                                                                                      SNMP MIB is a collection of objects uniquely identified by an Object Identifier (OID). OIDs are represented in the form of x.0, where x is the name of object in the MIB definition.

                                                                                                                                                                                                                                                                                                                                                                      For example, suppose one wanted to identify an instance of the variable sysDescr
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      The object class for sysDescr is:
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      iso org dod internet mgmt mib system sysDescr
                                                                                                                                                                                                                                                                                                                                                                       1   3   6     1      2    1    1       1
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      Hence, the object type, x, would be 1.3.6.1.2.1.1.1
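The derivation above can be sketched in a few lines of Python. The constant and function names are hypothetical; the name/number pairs are the ones listed in the sysDescr example.

```python
# Sub-identifiers along the path from the root of the OID tree to sysDescr.
SYSDESCR_PATH = [
    ("iso", 1),
    ("org", 3),
    ("dod", 6),
    ("internet", 1),
    ("mgmt", 2),
    ("mib", 1),
    ("system", 1),
    ("sysDescr", 1),
]


def object_type_oid(path):
    """Join the numeric sub-identifiers into a dotted OID (the 'x' above)."""
    return ".".join(str(number) for _, number in path)


def scalar_instance_oid(path):
    """Append the instance sub-identifier 0 used for scalar objects (x.0)."""
    return object_type_oid(path) + ".0"


sysdescr_type = object_type_oid(SYSDESCR_PATH)
sysdescr_instance = scalar_instance_oid(SYSDESCR_PATH)
```

The x.0 form is the one to use when requesting the value of a scalar such as sysDescr from a device.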
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      SNMP Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      To monitor the servers with the Sysdig agent, the SNMP agent must be installed on the servers to query the system information.

                                                                                                                                                                                                                                                                                                                                                                      For Ubuntu-based servers, use the following commands to install the SNMP Daemon:

                                                                                                                                                                                                                                                                                                                                                                      $sudo apt-get update
                                                                                                                                                                                                                                                                                                                                                                      $sudo apt-get install snmpd
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Next, configure this SNMP agent to respond to queries from the SNMP manager by updating the configuration file located at /etc/snmp/snmpd.conf

                                                                                                                                                                                                                                                                                                                                                                      Below are the important fields that must be configured:

                                                                                                                                                                                                                                                                                                                                                                      snmpd.conf

                                                                                                                                                                                                                                                                                                                                                                      # Listen for connections on all interfaces (both IPv4 *and* IPv6)
                                                                                                                                                                                                                                                                                                                                                                      agentAddress udp:161,udp6:[::1]:161
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      ## ACCESS CONTROL
                                                                                                                                                                                                                                                                                                                                                                      ## system + hrSystem groups only
                                                                                                                                                                                                                                                                                                                                                                      view systemonly included .1.3.6.1.2.1.1
                                                                                                                                                                                                                                                                                                                                                                      view systemonly included .1.3.6.1.2.1.25.1
                                                                                                                                                                                                                                                                                                                                                                      view systemonly included .1.3.6.1.2.1.31.1
                                                                                                                                                                                                                                                                                                                                                                      view systemonly included .1.3.6.1.2.1.2.2.1.1
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      # Default access to basic system info
                                                                                                                                                                                                                                                                                                                                                                      rocommunity public default -V systemonly
                                                                                                                                                                                                                                                                                                                                                                      # rocommunity6 is for IPv6
                                                                                                                                                                                                                                                                                                                                                                      rocommunity6 public default -V systemonly
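Because the rocommunity lines restrict access to the systemonly view, an OID outside the listed subtrees will not be returned. The following Python sketch shows one way to check whether an OID falls under a view's included subtrees. It is a simplified assumption about view matching (plain prefix comparison, ignoring excluded entries and masks), and all function names are hypothetical.

```python
def parse_oid(text):
    """Convert a dotted OID such as '.1.3.6.1.2.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def included_subtrees(snmpd_conf_text, view_name):
    """Collect the subtrees of 'view <name> included <oid>' lines."""
    subtrees = []
    for line in snmpd_conf_text.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if (len(tokens) >= 4 and tokens[0] == "view"
                and tokens[1] == view_name and tokens[2] == "included"):
            subtrees.append(parse_oid(tokens[3]))
    return subtrees


def is_visible(oid_text, subtrees):
    """Return True if the OID lies at or below any included subtree."""
    oid = parse_oid(oid_text)
    return any(oid[:len(subtree)] == subtree for subtree in subtrees)


conf_text = """
view systemonly included .1.3.6.1.2.1.1
view systemonly included .1.3.6.1.2.1.25.1
view systemonly included .1.3.6.1.2.1.31.1
view systemonly included .1.3.6.1.2.1.2.2.1.1
"""
subtrees = included_subtrees(conf_text, "systemonly")
sysdescr_visible = is_visible("1.3.6.1.2.1.1.1.0", subtrees)
hrstorage_visible = is_visible("1.3.6.1.2.1.25.2.3", subtrees)
```

With the view lines shown above, the sysDescr instance is visible, while an OID under 1.3.6.1.2.1.25.2 is not, because only the 25.1 subtree is included.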
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      After making changes to the config file, restart the snmpd service using:

                                                                                                                                                                                                                                                                                                                                                                      $sudo service snmpd restart
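After the restart, it can be useful to confirm that the daemon answers queries before configuring the Sysdig agent. The sketch below builds a Net-SNMP snmpget command line for the sysDescr instance. It assumes the Net-SNMP client tools are installed and that the community string is public, as in the configuration above. The helper name, host, and defaults are hypothetical, and the command is only constructed here, not executed.

```python
import shlex


def snmpget_argv(host, oid, community="public", port=161):
    """Build the argument list for an SNMPv2c GET using Net-SNMP's snmpget."""
    return ["snmpget", "-v2c", "-c", community, "{}:{}".format(host, port), oid]


# Query the one and only instance of sysDescr on the local SNMP agent.
argv = snmpget_argv("127.0.0.1", "1.3.6.1.2.1.1.1.0")

# Shell-safe rendering of the same command, suitable for copy and paste.
command_line = " ".join(shlex.quote(arg) for arg in argv)
```

Passing argv to subprocess.run would execute the query. A timeout or an empty reply usually points to the agentAddress, view, or rocommunity settings.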
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      No default configuration is present for SNMP check.

                                                                                                                                                                                                                                                                                                                                                                      • You must specify the OID/MIB for every parameter you want to collect, as in the following example.

                                                                                                                                                                                                                                                                                                                                                                      • The OIDs configured in dragent.yaml are included in the snmpd.conf configuration under the ‘ACCESS CONTROL’ section

                                                                                                                                                                                                                                                                                                                                                                      • Ensure that the community_string is same as configured in the system configuration (rocommunity).

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: snmp
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: python
                                                                                                                                                                                                                                                                                                                                                                            arg: /opt/draios/bin/sdchecks
                                                                                                                                                                                                                                                                                                                                                                          interval: 30
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            mibs_folder: /usr/share/mibs/ietf/
                                                                                                                                                                                                                                                                                                                                                                            ip_address: 52.53.158.103
                                                                                                                                                                                                                                                                                                                                                                            port: 161
                                                                                                                                                                                                                                                                                                                                                                            community_string: public
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                            # Only required for snmp v1, will default to 2
                                                                                                                                                                                                                                                                                                                                                                            # snmp_version: 2
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                            # Optional tags can be set with each metric
                                                                                                                                                                                                                                                                                                                                                                            tags:
                                                                                                                                                                                                                                                                                                                                                                               - vendor:EMC
                                                                                                                                                                                                                                                                                                                                                                               - array:VNX5300
                                                                                                                                                                                                                                                                                                                                                                               - location:front
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                            metrics:
                                                                                                                                                                                                                                                                                                                                                                              - OID: 1.3.6.1.2.1.25.2.3.1.5
                                                                                                                                                                                                                                                                                                                                                                                name: hrStorageSize
                                                                                                                                                                                                                                                                                                                                                                              - OID: 1.3.6.1.2.1.1.7
                                                                                                                                                                                                                                                                                                                                                                                name: sysServices
                                                                                                                                                                                                                                                                                                                                                                              - MIB: TCP-MIB
                                                                                                                                                                                                                                                                                                                                                                                symbol: tcpActiveOpens
                                                                                                                                                                                                                                                                                                                                                                              - MIB: UDP-MIB
                                                                                                                                                                                                                                                                                                                                                                                symbol: udpInDatagrams
                                                                                                                                                                                                                                                                                                                                                                              - MIB: IP-MIB
                                                                                                                                                                                                                                                                                                                                                                                table: ipSystemStatsTable
                                                                                                                                                                                                                                                                                                                                                                                symbols:
                                                                                                                                                                                                                                                                                                                                                                                  - ipSystemStatsInReceives
                                                                                                                                                                                                                                                                                                                                                                                metric_tags:
                                                                                                                                                                                                                                                                                                                                                                                  - tag: ipversion
                                                                                                                                                                                                                                                                                                                                                                                    index: 1        # specify which index you want to read the tag value from
                                                                                                                                                                                                                                                                                                                                                                              - MIB: IF-MIB
                                                                                                                                                                                                                                                                                                                                                                                table: ifTable
                                                                                                                                                                                                                                                                                                                                                                                symbols:
                                                                                                                                                                                                                                                                                                                                                                                  - ifInOctets
                                                                                                                                                                                                                                                                                                                                                                                  - ifOutOctets
                                                                                                                                                                                                                                                                                                                                                                                metric_tags:
                                                                                                                                                                                                                                                                                                                                                                                  - tag: interface
                                                                                                                                                                                                                                                                                                                                                                                    column: ifDescr  # specify which column to read the tag value from
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent allows you to monitor the SNMP counters and gauge of your choice. For each device, specify the metrics that you want to monitor in the metrics subsection using one of the following methods:

                                                                                                                                                                                                                                                                                                                                                                      1. Specify a MIB and the symbol that you want to export

                                                                                                                                                                                                                                                                                                                                                                        metrics:
                                                                                                                                                                                                                                                                                                                                                                          - MIB: UDP-MIB
                                                                                                                                                                                                                                                                                                                                                                            symbol: udpInDatagrams
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      2. Specify an OID and the name you want the metric to appear under in Sysdig Monitor:

                                                                                                                                                                                                                                                                                                                                                                        metrics:
                                                                                                                                                                                                                                                                                                                                                                          - OID: 1.3.6.1.2.1.6.5
                                                                                                                                                                                                                                                                                                                                                                            name: tcpActiveOpens
                                                                                                                                                                                                                                                                                                                                                                        #The name here is the one specified in the MIB but you could use any name.
                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                                                                                                      3. Specify an MIB and a table from which to extract information:

                                                                                                                                                                                                                                                                                                                                                                        metrics:
                                                                                                                                                                                                                                                                                                                                                                          - MIB: IF-MIB
                                                                                                                                                                                                                                                                                                                                                                            table: ifTable
                                                                                                                                                                                                                                                                                                                                                                            symbols:
                                                                                                                                                                                                                                                                                                                                                                              - ifInOctets
                                                                                                                                                                                                                                                                                                                                                                            metric_tags:
                                                                                                                                                                                                                                                                                                                                                                              - tag: interface
                                                                                                                                                                                                                                                                                                                                                                            column: ifDescr
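
The three forms above differ only in which keys an entry carries. As a rough illustration, the following Python sketch classifies an entry from the metrics subsection and rejects one that matches none of the forms. The function name `classify_metric` and the returned labels are hypothetical and are not part of the Sysdig agent; the sketch only mirrors the key combinations shown on this page.

```python
def classify_metric(entry):
    """Return which documented form a metrics entry uses.

    Hypothetical helper for illustration only; the agent performs its
    own validation, which may differ from this sketch.
    """
    if "OID" in entry:
        # Form 2: an OID plus the name to report it under.
        if "name" not in entry:
            raise ValueError("an OID entry needs a 'name'")
        return "oid"
    if "MIB" in entry:
        if "table" in entry:
            # Form 3: a MIB table plus the list of symbols to read from it.
            if not entry.get("symbols"):
                raise ValueError("a table entry needs a non-empty 'symbols' list")
            return "table"
        if "symbol" in entry:
            # Form 1: a MIB plus a single symbol.
            return "symbol"
        raise ValueError("a MIB entry needs either 'symbol' or 'table'")
    raise ValueError("an entry needs either 'MIB' or 'OID'")


if __name__ == "__main__":
    print(classify_metric({"MIB": "UDP-MIB", "symbol": "udpInDatagrams"}))
    print(classify_metric({"OID": "1.3.6.1.2.1.6.5", "name": "tcpActiveOpens"}))
    print(classify_metric({"MIB": "IF-MIB", "table": "ifTable", "symbols": ["ifInOctets"]}))
```

Running a check like this over a parsed dragent.yaml before restarting the agent can catch an entry that, for example, names a table but lists no symbols.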
                                                                                                                                                                                                                                                                                                                                                                        

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      The SNMP check does not have default metrics. All metrics mentioned in dragent.yaml file will be seen with snmp.* prefix/
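
To make the naming concrete, the Python sketch below derives the metric names one would expect to find for a given metrics subsection. It assumes that the part after the `snmp.` prefix is the configured `name` for OID entries and the symbol name for MIB entries; this page states only the prefix, so treat the exact suffix as an assumption. The helper `expected_metric_names` is hypothetical.

```python
def expected_metric_names(metrics, prefix="snmp"):
    """List the metric names a metrics subsection is expected to produce.

    Assumption: the suffix is the entry's 'name' (OID form) or its
    symbol name (MIB forms). Only the prefix is documented.
    """
    names = []
    for entry in metrics:
        if "OID" in entry:
            names.append(f"{prefix}.{entry['name']}")
        elif "table" in entry:
            # A table entry yields one metric per listed symbol.
            names.extend(f"{prefix}.{symbol}" for symbol in entry["symbols"])
        else:
            names.append(f"{prefix}.{entry['symbol']}")
    return names


if __name__ == "__main__":
    sample = [
        {"OID": "1.3.6.1.2.1.25.2.3.1.5", "name": "hrStorageSize"},
        {"MIB": "TCP-MIB", "symbol": "tcpActiveOpens"},
        {"MIB": "IF-MIB", "table": "ifTable", "symbols": ["ifInOctets", "ifOutOctets"]},
    ]
    for name in expected_metric_names(sample):
        print(name)
```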

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

                                                                                                                                                                                                                                                                                                                                                                      8.6.2.25 -

                                                                                                                                                                                                                                                                                                                                                                      Supervisord

                                                                                                                                                                                                                                                                                                                                                                      Supervisor daemon is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems., The Supervisor check monitors the uptime, status, and number of processes running under Supervisord.

                                                                                                                                                                                                                                                                                                                                                                      No default configuration is provided for the Supervisor check; you must provide the configuration in the dragent.yaml file for the Sysdig agent to collect the data provided by Supervisor.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the setup steps required on Supervisor, how to edit the Sysdig agent configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Supervisor Setup

                                                                                                                                                                                                                                                                                                                                                                      Configuration

                                                                                                                                                                                                                                                                                                                                                                      The Sysdig agent can collect data from Supervisor via HTTP server or UNIX socket. The agent collects the same data regardless of the configured collection method.
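
For orientation, Supervisor publishes process state through its XML-RPC interface, and the same kind of data can be inspected by hand. The Python sketch below is not the agent's implementation. It assumes the HTTP server is reachable at `http://localhost:9001/RPC2` without authentication, and the helper names `fetch_process_info` and `summarize` are hypothetical. The `supervisor.getAllProcessInfo` call and the `name`, `statename`, `start`, and `now` fields are part of Supervisor's XML-RPC API.

```python
import xmlrpc.client


def fetch_process_info(url="http://localhost:9001/RPC2"):
    """Ask a running supervisord for the state of every managed process.

    The URL is an assumption; adjust it to match your inet_http_server port.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.supervisor.getAllProcessInfo()


def summarize(processes):
    """Reduce process records to a count per state and an uptime per running process."""
    counts = {}
    uptimes = {}
    for proc in processes:
        state = proc["statename"]
        counts[state] = counts.get(state, 0) + 1
        if state == "RUNNING":
            # 'now' and 'start' are UNIX timestamps reported by supervisord.
            uptimes[proc["name"]] = proc["now"] - proc["start"]
    return counts, uptimes


if __name__ == "__main__":
    # Sample records shaped like getAllProcessInfo() output; replace with
    # fetch_process_info() when a supervisord instance is reachable.
    sample = [
        {"name": "web", "statename": "RUNNING", "start": 100, "now": 160},
        {"name": "worker", "statename": "STOPPED", "start": 0, "now": 160},
    ]
    print(summarize(sample))
```

If a username and password are set on the HTTP server, they can be supplied in the URL in the usual `http://user:pass@host:port/RPC2` form.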

                                                                                                                                                                                                                                                                                                                                                                      Un-comment the following or add them if they are not present in /etc/supervisor/supervisord.conf

                                                                                                                                                                                                                                                                                                                                                                      [inet_http_server]
                                                                                                                                                                                                                                                                                                                                                                      port=localhost:9001
                                                                                                                                                                                                                                                                                                                                                                      username=user  # optional
                                                                                                                                                                                                                                                                                                                                                                      password=pass  # optional
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      [supervisorctl]
                                                                                                                                                                                                                                                                                                                                                                      serverurl=unix:///tmp/supervisor.sock
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      [unix_http_server]
                                                                                                                                                                                                                                                                                                                                                                      file=/tmp/supervisor.sock
; make sure chmod is set so that non-root users can access the socket.
chmod=777
                                                                                                                                                                                                                                                                                                                                                                      ...
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      [program:foo]
                                                                                                                                                                                                                                                                                                                                                                      command=/bin/cat
                                                                                                                                                                                                                                                                                                                                                                      

Each program that Supervisor manages is defined in its own [program:x] section of the Supervisor configuration file, together with the options that section supports. See Supervisor’s sample.conf file for details.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml does not have any configuration to connect the agent with Supervisor. Edit dragent.yaml following the Examples given to connect with Supervisor and collect supervisor.* metrics.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1: Connect by UNIX Socket

                                                                                                                                                                                                                                                                                                                                                                        - name: supervisord
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: supervisord
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            socket: "unix:///tmp/supervisor.sock"
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example 2: Connect by Host Name and Port, Optional Authentication

- name: supervisord
  pattern:
    comm: supervisord
  conf:
    host: localhost
    port: 9001
    # user: user # Optional. Required only if a username is configured.
    # pass: pass # Optional. Required only if a password is configured.
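For illustration, the host and port settings in Example 2 correspond to Supervisor's XML-RPC interface served over HTTP, which Supervisor exposes when an [inet_http_server] section is configured. The following is a minimal Python sketch of that connection, not the agent's actual implementation. The helper names build_rpc_url and list_processes and the shape of the conf dictionary are hypothetical; supervisor.getAllProcessInfo is part of Supervisor's XML-RPC API.

```python
from urllib.parse import quote
import xmlrpc.client


def build_rpc_url(conf):
    """Build the XML-RPC endpoint URL from a conf mapping (hypothetical helper)."""
    host = conf.get("host", "localhost")
    port = conf.get("port", 9001)
    user = conf.get("user")
    password = conf.get("pass")
    auth = ""
    # Credentials are only embedded when both are configured.
    if user and password:
        auth = "{}:{}@".format(quote(user, safe=""), quote(password, safe=""))
    return "http://{}{}:{}/RPC2".format(auth, host, port)


def list_processes(conf):
    """Return Supervisor's process info list; needs a reachable supervisord."""
    proxy = xmlrpc.client.ServerProxy(build_rpc_url(conf))
    return proxy.supervisor.getAllProcessInfo()
```

Calling list_processes({"host": "localhost", "port": 9001}) against a running supervisord returns one dictionary per managed program. If the call fails to connect, the same failure would prevent the agent from collecting metrics over HTTP.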
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

Metric Name                   Metric Description
supervisord.process.count     (gauge) The number of supervisord monitored processes. Shown as process.
supervisord.process.uptime    (gauge) The process uptime. Shown as second.

                                                                                                                                                                                                                                                                                                                                                                      See also Supervisord Metrics.
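As a rough illustration of how the two metrics above could be derived, the Python sketch below summarizes a list of process records in the shape returned by Supervisor's getAllProcessInfo XML-RPC call (the name, statename, start, and now fields). The function name summarize_processes is hypothetical, and counting per state and reporting an uptime of 0 for processes that are not running are assumptions, not confirmed agent behavior.

```python
from collections import Counter


def summarize_processes(process_info):
    """Return (count per state, uptime in seconds per process name)."""
    counts = Counter(p["statename"] for p in process_info)
    uptimes = {
        # Uptime is the difference between Supervisor's clock and the start time.
        p["name"]: (p["now"] - p["start"]) if p["statename"] == "RUNNING" else 0
        for p in process_info
    }
    return counts, uptimes


sample = [
    {"name": "foo", "statename": "RUNNING", "start": 100, "now": 160},
    {"name": "bar", "statename": "STOPPED", "start": 0, "now": 160},
]
counts, uptimes = summarize_processes(sample)
```

With the sample records, foo has been running for 60 seconds, and bar contributes to the STOPPED count with an uptime of 0.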

                                                                                                                                                                                                                                                                                                                                                                      Service Check

supervisord.can.connect:

                                                                                                                                                                                                                                                                                                                                                                      Returns CRITICAL if the Sysdig agent cannot connect to the HTTP server or UNIX socket configured, otherwise OK.

                                                                                                                                                                                                                                                                                                                                                                      supervisord.process.status:

SUPERVISORD STATUS    SUPERVISORD.PROCESS.STATUS
STOPPED               CRITICAL
STARTING              UNKNOWN
RUNNING               OK
BACKOFF               CRITICAL
STOPPING              CRITICAL
EXITED                CRITICAL
FATAL                 CRITICAL
UNKNOWN               UNKNOWN
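The status mapping listed above can be expressed as a simple lookup. The Python sketch below is illustrative only: the names STATUS_MAP and service_check_status are hypothetical, and falling back to UNKNOWN for a state that is not in the list is an assumption.

```python
# Supervisor process state -> supervisord.process.status result, as listed above.
STATUS_MAP = {
    "STOPPED": "CRITICAL",
    "STARTING": "UNKNOWN",
    "RUNNING": "OK",
    "BACKOFF": "CRITICAL",
    "STOPPING": "CRITICAL",
    "EXITED": "CRITICAL",
    "FATAL": "CRITICAL",
    "UNKNOWN": "UNKNOWN",
}


def service_check_status(supervisord_state):
    """Translate a Supervisor state name into a service check status."""
    # Assumption: states missing from the table are reported as UNKNOWN.
    return STATUS_MAP.get(supervisord_state.upper(), "UNKNOWN")
```

Only RUNNING maps to OK, so an alert on supervisord.process.status fires for any process that is stopped, exiting, or failing to start.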

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.26 - TCP

                                                                                                                                                                                                                                                                                                                                                                      You can monitor the status of your custom application’s port using the TCP check. This check will routinely connect to the designated port and send Sysdig Monitor a simple on/off metric and response time.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      TCP Application Setup

                                                                                                                                                                                                                                                                                                                                                                      Any application listening on a TCP port can be monitored with tcp_check.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

No default configuration is provided in the default settings file; you must add the entries shown in the Example below to the user settings config file dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example

- name: tcp_check
  check_module: tcp_check
  pattern:
    comm: httpd
    arg: DFOREGROUND
  conf:
    port: 80
    collect_response_time: true
                                                                                                                                                                                                                                                                                                                                                                      

This example configures a TCP check against an Apache process that runs on the host and listens on port 80.

comm: is the command that runs the Apache server on port 80.

To collect the response time for the port (the time the process takes to accept the connection), add the collect_response_time: true parameter under the conf: section. The additional metric network.tcp.response_time then appears in the Metrics list.

                                                                                                                                                                                                                                                                                                                                                                      Do not use port: under the pattern: section in this case, because if the process is not listening it will not be matched and the metric will not be sent to Sysdig Monitor.

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

Metric Name: network.tcp.response_time (gauge)

Metric Description: The response time of a given host and TCP port, tagged with url, e.g. 'url:192.168.1.100:22'. Shown in seconds.

                                                                                                                                                                                                                                                                                                                                                                      See TCP Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Service Checks

tcp.can_connect: DOWN if the agent cannot connect to the configured host and port, otherwise UP.
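
As a rough illustration of what the response-time metric and the tcp.can_connect service check describe, the following Python sketch times a single TCP connection attempt and maps the outcome to UP or DOWN. It is not the agent's implementation; the function name tcp_check and the timeout value are assumptions made for this example.

```python
import socket
import time


def tcp_check(host, port, timeout=5.0):
    """Return (status, response_time_seconds) for one TCP connection attempt.

    Hypothetical helper for illustration only; it is not part of the Sysdig agent.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError:
        # Refused, unreachable, or timed out: report DOWN with no timing.
        return "DOWN", None
    return "UP", elapsed
```

For example, status, seconds = tcp_check("127.0.0.1", 80) returns "UP" and the elapsed time in seconds when something accepts connections on port 80, and "DOWN" with None otherwise.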

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.2.27 - Varnish

Varnish HTTP Cache is a web application accelerator, also known as a “caching HTTP reverse proxy.” You install it in front of any server that speaks HTTP and configure it to cache the contents. If Varnish is installed in your environment, the Sysdig agent will automatically connect. See the Default Configuration section, below.

By default, the Sysdig agent automatically collects all metrics except the VBE metrics. You can also edit the configuration to emit service checks for each back end.

                                                                                                                                                                                                                                                                                                                                                                      This page describes the default configuration settings, how to edit the configuration to collect additional information, the metrics available for integration, and a sample result in the Sysdig Monitor UI.

                                                                                                                                                                                                                                                                                                                                                                      Varnish Setup

                                                                                                                                                                                                                                                                                                                                                                      Varnish will automatically expose all metrics. You do not need to add anything to the Varnish instance.

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Agent Configuration

                                                                                                                                                                                                                                                                                                                                                                      Review how to Edit dragent.yaml to Integrate or Modify Application Checks.

                                                                                                                                                                                                                                                                                                                                                                      Default Configuration

                                                                                                                                                                                                                                                                                                                                                                      By default, Sysdig’s dragent.default.yaml uses the following code to connect with Varnish and collect all but the VBE metrics. See Example 2 Enable Varnish VBE Metrics.

metrics_filter:
  - exclude: varnish.VBE.*
app_checks:
  - name: varnish
    interval: 15
    pattern:
      comm: varnishd
    conf:
      varnishstat: /usr/bin/varnishstat
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Optionally, if you want to submit service checks for the health of each back end, you can configure varnishadm and edit dragent.yaml as in Example 1.

                                                                                                                                                                                                                                                                                                                                                                      Remember! Never edit dragent.default.yaml directly; always edit only dragent.yaml.

                                                                                                                                                                                                                                                                                                                                                                      Example 1 Service Health Checks with varnishadm

When varnishadm is configured, the Sysdig agent needs permission to execute the binary with root privileges. Add the following to your /etc/sudoers file:

                                                                                                                                                                                                                                                                                                                                                                      sysdig-agent ALL=(ALL) NOPASSWD:/usr/bin/varnishadm
                                                                                                                                                                                                                                                                                                                                                                      

Then edit dragent.yaml as follows. Note: If you have configured varnishadm and your secret file is NOT /etc/varnish/secret, set secretfile to the actual path of your secret file.

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: varnish
                                                                                                                                                                                                                                                                                                                                                                          interval: 15
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: varnishd
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            varnishstat: /usr/bin/varnishstat
                                                                                                                                                                                                                                                                                                                                                                            varnishadm: /usr/bin/varnishadm
                                                                                                                                                                                                                                                                                                                                                                            secretfile: /etc/varnish/secret
                                                                                                                                                                                                                                                                                                                                                                      

This example enables the following service check.

                                                                                                                                                                                                                                                                                                                                                                      varnish.backend_healthy: The agent submits a service check for each Varnish backend, tagging each with backend:<backend_name>.

                                                                                                                                                                                                                                                                                                                                                                      Example 2 Enable Varnish VBE Metrics

                                                                                                                                                                                                                                                                                                                                                                      Varnish VBE metrics are dynamically generated (and therefore are not listed in the Metrics Dictionary). Because they generate unique metric names with timestamps, they can clutter metric handling and are filtered out by default. If you want to collect these metrics, use include in the metrics_filter in dragent.yaml:

metrics_filter:
  - include: varnish.VBE.*
app_checks:
  - name: varnish
    interval: 15
    pattern:
      comm: varnishd
    conf:
      varnishstat: /usr/bin/varnishstat
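
To make the effect of the include and exclude rules concrete, the following Python sketch models a metrics filter as an ordered list of glob rules. It is a simplified model, not the agent's code: the function name is_metric_kept, the first-match-wins ordering, the keep-by-default behavior, and the sample metric names are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def is_metric_kept(name, rules):
    """Apply ordered (action, pattern) rules; the first matching rule decides.

    Hypothetical model of a metrics filter, for illustration only.
    """
    for action, pattern in rules:
        if fnmatchcase(name, pattern):
            return action == "include"
    # Assumption: a metric that matches no rule is kept.
    return True


# Rules mirroring the default configuration and Example 2.
default_rules = [("exclude", "varnish.VBE.*")]
vbe_enabled_rules = [("include", "varnish.VBE.*")]
```

Under these assumptions, a dynamically named metric such as varnish.VBE.boot.default.happy (a made-up name) is dropped with default_rules and kept with vbe_enabled_rules, while a metric outside the varnish.VBE. prefix is kept in both cases.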
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Metrics Available

                                                                                                                                                                                                                                                                                                                                                                      See Varnish Metrics.

                                                                                                                                                                                                                                                                                                                                                                      Result in the Monitor UI

8.6.3 - (Legacy) Create a Custom App Check

                                                                                                                                                                                                                                                                                                                                                                      We are sunsetting application checks in favor of Monitoring Integrations.

                                                                                                                                                                                                                                                                                                                                                                      Application checks are integrations that allow the Sysdig agent to poll specific metrics exposed by any application, and the built-in app checks currently supported are listed on the App Checks main page. Many other Java-based applications are also supported out-of-the-box.

If your application is not already supported, you have a few options:

1. Use Prometheus, StatsD, or JMX to collect custom metrics.

2. Send a request to support@sysdig.com, and we’ll do our best to add support for your application.

                                                                                                                                                                                                                                                                                                                                                                      3. Create your own check by following the instructions below.

                                                                                                                                                                                                                                                                                                                                                                      If you do write a custom check, let us know. We love hearing about how our users extend Sysdig Monitor, and we can also consider embedding your app check automatically in the Sysdig agent.

                                                                                                                                                                                                                                                                                                                                                                      See also Understanding the Agent Config Files for details on accessing and editing the agent configuration files in general.

                                                                                                                                                                                                                                                                                                                                                                      Check Anatomy

Essentially, an app check is a Python class that extends AgentCheck:

from checks import AgentCheck

class MyCustomCheck(AgentCheck):
    # Namespaces of the monitored process that the check must join.
    # Currently supported: 'net', 'mnt' and 'uts'.
    # List only the namespaces you need; 'net' is usually enough,
    # and in that case you can omit the variable altogether.
    # NEEDED_NS = ( 'net', )

    # def __init__(self, name, init_config, agentConfig):
    #     '''
    #     Optional: define it if you need custom initialization.
    #     Remember to accept these parameters and pass them to the superclass.
    #     '''
    #     AgentCheck.__init__(self, name, init_config, agentConfig)
    #     self.myvar = None

    def check(self, instance):
        '''
        This function gets called to perform the check.
        Connect to the application, parse the metrics, and add them to the aggregation
        using superclass methods like `self.gauge(metricname, value, tags)`.
        '''
        server_port = instance['port']  # port to connect to (unused in this stub)
        self.gauge("testmetric", 1)
                                                                                                                                                                                                                                                                                                                                                                      

Put this file into /opt/draios/lib/python/checks.custom.d (create the directory if it does not exist) and it will be available to the Sysdig agent. To run your check, supply its configuration in the agent’s config file, dragent.yaml, as is done for bundled checks:

app_checks:
  - name: voltdb # check name, must be unique
    # name of your .py file; if it is the same as the check name, you can omit it
    # check_module: voltdb
    pattern: # pattern to match the application
      comm: java
      arg: org.voltdb.VoltDB
    conf:
      port: 21212 # any key/value config you need in the `check(self, instance)` function
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Check Interface Detail

                                                                                                                                                                                                                                                                                                                                                                      As you can see, the most important piece of the check interface is the check function. The function declaration is:

                                                                                                                                                                                                                                                                                                                                                                          def check(self, instance)
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      instance is a dict containing the configuration of the check. It will contain all the attributes found in the conf: section in dragent.yaml plus the following:

• name: The unique name of the check.

                                                                                                                                                                                                                                                                                                                                                                      • ports: An array of all listening ports of the process.

                                                                                                                                                                                                                                                                                                                                                                      • port: The first listening port of the process.

These attributes are available as defaults and allow you to automatically configure your check. Values set in the conf: section take priority over these defaults.
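The precedence between the detected defaults and the conf: section can be pictured as a dictionary merge. The following is a minimal sketch, not the agent's actual code: `build_instance` is a hypothetical helper, and it assumes the defaults are derived from the process's listening ports and then overridden by any keys given under conf:.

```python
def build_instance(check_name, listening_ports, conf):
    """Hypothetical helper: merge detected defaults with the conf: section."""
    # Defaults described above: name, ports, and port (first listening port).
    instance = {"name": check_name, "ports": list(listening_ports)}
    if listening_ports:
        instance["port"] = listening_ports[0]
    # Keys from conf: take priority over the detected defaults.
    instance.update(conf)
    return instance


# No conf: the first listening port is used.
detected = build_instance("voltdb", [8080, 21212], {})
# conf sets port explicitly, so it replaces the detected default.
overridden = build_instance("voltdb", [8080, 21212], {"port": 21212})
print(detected["port"], overridden["port"])
```

In this sketch the ports list is left untouched by the override, so a check can still inspect every listening port even when conf: pins a specific one.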

                                                                                                                                                                                                                                                                                                                                                                      Inside the check function you can call these methods to send metrics:

                                                                                                                                                                                                                                                                                                                                                                      self.gauge(metric_name, value, tags) # Sample a gauge metric
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.rate(metric_name, value, tags) # Sample a point, with the rate calculated at the end of the check
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.increment(metric_name, value, tags) # Increment a counter metric
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.decrement(metric_name, value, tags) # Decrement a counter metric
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.histogram(metric_name, value, tags) # Sample a histogram metric
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.count(metric_name, value, tags) # Sample a raw count metric
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      self.monotonic_count(metric_name, value, tags) # Sample an increasing counter metric
                                                                                                                                                                                                                                                                                                                                                                      

The most commonly used methods are gauge and rate. In addition to the metric_name and value parameters, you can attach tags to a metric using this format:

                                                                                                                                                                                                                                                                                                                                                                      tags = [ "key:value", "key2:value2", "key_without_value"]
                                                                                                                                                                                                                                                                                                                                                                      

tags is an array of strings, each either a single value or a key:value pair. Tags are useful in Sysdig Monitor for graph segmentation.
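As an illustration of how these calls fit together, here is a minimal sketch of a check that reports one gauge and one rate with tags. The metric names, the tag values, and the hard-coded `stats` dict are hypothetical; a real check would read them from the application. The fallback `AgentCheck` class is only a stand-in so the sketch can run outside the agent, and it is not the agent's implementation.

```python
try:
    from checks import AgentCheck  # available inside the Sysdig agent
except ImportError:
    class AgentCheck(object):
        """Stand-in so the sketch runs outside the agent; it only records calls."""

        def __init__(self, name, init_config, agentConfig):
            self.name = name
            self.samples = []

        def gauge(self, metric_name, value, tags=None):
            self.samples.append(("gauge", metric_name, value, tags or []))

        def rate(self, metric_name, value, tags=None):
            self.samples.append(("rate", metric_name, value, tags or []))


class QueueCheck(AgentCheck):
    def check(self, instance):
        # Hypothetical application stats; a real check would query the app here.
        stats = {"depth": 12, "processed_total": 3400}
        # Mix of key:value tags and a single-value tag.
        tags = ["queue:orders", "port:%s" % instance["port"], "primary"]
        # Gauge: the current value as sampled.
        self.gauge("myapp.queue.depth", stats["depth"], tags)
        # Rate: pass the running total; the rate is calculated at the end of the check.
        self.rate("myapp.queue.processed", stats["processed_total"], tags)
```

A gauge suits values that go up and down, such as a queue depth, while rate suits ever-growing totals where the change over time is what matters.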

You can also send service checks, which are on/off metrics, using this interface:

                                                                                                                                                                                                                                                                                                                                                                      self.service_check(name, status, tags)
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Where status can be:

                                                                                                                                                                                                                                                                                                                                                                      • AgentCheck.OK

                                                                                                                                                                                                                                                                                                                                                                      • AgentCheck.WARNING

                                                                                                                                                                                                                                                                                                                                                                      • AgentCheck.CRITICAL

                                                                                                                                                                                                                                                                                                                                                                      • AgentCheck.UNKNOWN
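A service check is typically derived from a simple reachability test. The sketch below reports AgentCheck.OK when a TCP connection to the instance port succeeds and AgentCheck.CRITICAL otherwise. The check name `myapp.can_connect`, the loopback address, and the two-second timeout are assumptions made for illustration. The fallback `AgentCheck` class, including its numeric status values, is a placeholder so the sketch runs outside the agent.

```python
import socket

try:
    from checks import AgentCheck  # available inside the Sysdig agent
except ImportError:
    class AgentCheck(object):
        """Stand-in so the sketch runs outside the agent; it only records calls."""
        OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # placeholder values

        def __init__(self, name, init_config, agentConfig):
            self.name = name
            self.service_checks = []

        def service_check(self, name, status, tags=None):
            self.service_checks.append((name, status, tags or []))


class PortAliveCheck(AgentCheck):
    def check(self, instance):
        port = instance["port"]
        tags = ["port:%d" % port]
        try:
            # Assumed target: the monitored process listening on the loopback address.
            conn = socket.create_connection(("127.0.0.1", port), timeout=2)
            conn.close()
            status = AgentCheck.OK
        except socket.error:
            # Refused connections and timeouts both end up here.
            status = AgentCheck.CRITICAL
        self.service_check("myapp.can_connect", status, tags)
```

AgentCheck.WARNING and AgentCheck.UNKNOWN could be reported the same way, for example when the connection succeeds but the response is slow or cannot be parsed.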

                                                                                                                                                                                                                                                                                                                                                                      Testing

                                                                                                                                                                                                                                                                                                                                                                      To test your check you can launch Sysdig App Checks from the command line to avoid running the full agent and iterate faster:

                                                                                                                                                                                                                                                                                                                                                                      # from /opt/draios directory
                                                                                                                                                                                                                                                                                                                                                                      ./bin/sdchecks runCheck <check_unique_name> <process_pid> [<process_vpid>] [<process_port>]
                                                                                                                                                                                                                                                                                                                                                                      
• check_unique_name: The check name as it appears in the config file.

• process_pid: The process PID as seen from the host.

• process_vpid: Optional. The process PID as seen inside the container; defaults to 1.

• process_port: Optional. The port on which the process is listening; defaults to None.

                                                                                                                                                                                                                                                                                                                                                                      Example:

                                                                                                                                                                                                                                                                                                                                                                      ./bin/sdchecks runCheck redis 1254 1 6379
                                                                                                                                                                                                                                                                                                                                                                      5658:INFO:Starting
                                                                                                                                                                                                                                                                                                                                                                      5658:INFO:Container support: True
                                                                                                                                                                                                                                                                                                                                                                      5658:INFO:Run AppCheck for {'ports': [6379], 'pid': 5625, 'check': 'redis', 'vpid': 1}
                                                                                                                                                                                                                                                                                                                                                                      Conf: {'port': 6379, 'socket_timeout': 5, 'host': '127.0.0.1', 'name': 'redis', 'ports': [6379]}
                                                                                                                                                                                                                                                                                                                                                                      Metrics: # metrics array
                                                                                                                                                                                                                                                                                                                                                                      Checks: # metrics check
                                                                                                                                                                                                                                                                                                                                                                      Exception: None # exceptions
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The output is intentionally raw to allow you to better debug what the check is doing.
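Because the runCheck arguments are positional, a port can only be supplied together with a vpid. The following is a minimal sketch of a wrapper that assembles the argument list accordingly; build_run_check_argv is a hypothetical helper, not part of the agent.

```python
from typing import List, Optional


def build_run_check_argv(check_name: str,
                         pid: int,
                         vpid: Optional[int] = None,
                         port: Optional[int] = None) -> List[str]:
    """Build the argument list for ./bin/sdchecks runCheck.

    The arguments are positional, so when only a port is given this
    helper fills in the documented default vpid of 1.
    """
    argv = ["./bin/sdchecks", "runCheck", check_name, str(pid)]
    if port is not None and vpid is None:
        vpid = 1  # documented default for the vpid argument
    if vpid is not None:
        argv.append(str(vpid))
    if port is not None:
        argv.append(str(port))
    return argv
```

On a host where the agent is installed, the resulting list could be passed to subprocess.run(argv, cwd="/opt/draios").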

                                                                                                                                                                                                                                                                                                                                                                      8.6.4 -

                                                                                                                                                                                                                                                                                                                                                                      (Legacy) Create Per-Container Custom App Checks

                                                                                                                                                                                                                                                                                                                                                                      We are sunsetting application checks in favor of Monitoring Integrations.

Sysdig supports adding custom application check-script configurations for each individual container in the infrastructure. This avoids multiple edits and entries to achieve container-specific customization. In particular, it lets PaaS platforms work smarter by delegating check configuration to the application teams.

                                                                                                                                                                                                                                                                                                                                                                      See also Understanding the Agent Config Files for details on accessing and editing the agent configuration files in general.

                                                                                                                                                                                                                                                                                                                                                                      How It Works

                                                                                                                                                                                                                                                                                                                                                                      The SYSDIG_AGENT_CONF variable stores a YAML-formatted configuration for your app check and will be used to match app check configurations.

All original app_checks are available, and the syntax is the same as for dragent.yaml. You can add the environment variable directly to the Dockerfile.

                                                                                                                                                                                                                                                                                                                                                                      Example with Dockerfile

This example defines a per-container app check for Redis. Normally you would have a YAML-formatted entry installed into the agent’s /opt/draios/etc/dragent.yaml file that would look like this:

                                                                                                                                                                                                                                                                                                                                                                      app_checks:
                                                                                                                                                                                                                                                                                                                                                                        - name: redis
                                                                                                                                                                                                                                                                                                                                                                          check_module: redisdb
                                                                                                                                                                                                                                                                                                                                                                          pattern:
                                                                                                                                                                                                                                                                                                                                                                            comm: redis-server
                                                                                                                                                                                                                                                                                                                                                                          conf:
                                                                                                                                                                                                                                                                                                                                                                            host: 127.0.0.1
                                                                                                                                                                                                                                                                                                                                                                            port: "{port}"
                                                                                                                                                                                                                                                                                                                                                                            password: protected
                                                                                                                                                                                                                                                                                                                                                                      

For the per-container method, convert the above entry to a single line and add it to the Dockerfile via the SYSDIG_AGENT_CONF environment variable:

                                                                                                                                                                                                                                                                                                                                                                      FROM redis
                                                                                                                                                                                                                                                                                                                                                                      # This config file adds a password for accessing redis instance
                                                                                                                                                                                                                                                                                                                                                                      ADD redis.conf /
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      ENV SYSDIG_AGENT_CONF { "app_checks": [{ "name": "redis", "check_module": "redisdb", "pattern": {"comm": "redis-server"}, "conf": { "host": "127.0.0.1", "port": "6379", "password": "protected"} }] }
                                                                                                                                                                                                                                                                                                                                                                      ENTRYPOINT ["redis-server"]
                                                                                                                                                                                                                                                                                                                                                                      CMD [ "/redis.conf" ]
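The conversion from the multi-line dragent.yaml entry to a one-line value can be scripted. The sketch below is one possible approach, assuming the entry is written as a Python dictionary and serialized with the standard json module. to_agent_conf is a hypothetical helper, and the port is fixed to 6379 as in the Dockerfile example.

```python
import json

# Mirrors the dragent.yaml entry above, with the port fixed to 6379
# as in the Dockerfile example.
redis_check = {
    "name": "redis",
    "check_module": "redisdb",
    "pattern": {"comm": "redis-server"},
    "conf": {"host": "127.0.0.1", "port": "6379", "password": "protected"},
}


def to_agent_conf(checks):
    """Serialize app check entries into a single-line string (hypothetical helper)."""
    # json.dumps without an indent argument emits everything on one line.
    return json.dumps({"app_checks": checks})


agent_conf = to_agent_conf([redis_check])
print(agent_conf)
```

Generating the value from a data structure avoids hand-editing nested braces and quotes, which is where most mistakes in a one-line configuration tend to occur.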
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Example with Docker CLI

You can add parameters when starting a container with docker run using the -e/--env flag, or inject them using an orchestration system such as Kubernetes:

                                                                                                                                                                                                                                                                                                                                                                      PER_CONTAINER_CONF='{ "app_checks": [{ "name": "redis", "check_module": "redisdb", "pattern": {"comm": "redis-server"}, "conf": { "host": "127.0.0.1", "port": "6379", "password": "protected"} }] }'
                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                      docker run --name redis -v /tmp/redis.conf:/etc/redis.conf -e SYSDIG_AGENT_CONF="${PER_CONTAINER_CONF}" -d redis /etc/redis.conf
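When the container is launched from a script rather than an interactive shell, the same command can be assembled as an argument list, which sidesteps shell quoting of the embedded configuration. This is a minimal sketch; docker_run_argv and its parameters are hypothetical names chosen for illustration.

```python
import json


def docker_run_argv(agent_conf, name, image, image_args=(), docker_opts=()):
    """Build a `docker run` argument list that injects SYSDIG_AGENT_CONF."""
    # The whole configuration travels as one environment variable value.
    env_value = "SYSDIG_AGENT_CONF=" + json.dumps(agent_conf)
    return [
        "docker", "run", "--name", name,
        *docker_opts,
        "-e", env_value,
        "-d", image,
        *image_args,
    ]


argv = docker_run_argv(
    {"app_checks": [{"name": "redis", "check_module": "redisdb",
                     "pattern": {"comm": "redis-server"},
                     "conf": {"host": "127.0.0.1", "port": "6379",
                              "password": "protected"}}]},
    name="redis",
    image="redis",
    image_args=["/etc/redis.conf"],
    docker_opts=["-v", "/tmp/redis.conf:/etc/redis.conf"],
)
```

The list can be handed to subprocess.run(argv) on a host with Docker installed, or rendered as a copy-pasteable command with shlex.join(argv) on Python 3.8 or later.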
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      9 -

                                                                                                                                                                                                                                                                                                                                                                      Captures

                                                                                                                                                                                                                                                                                                                                                                      Sysdig capture files contain system calls and other OS events that can be analyzed with either the open-source sysdig or csysdig (curses-based) utilities, and are displayed in the Captures module.

The Captures module contains a table listing the capture file name, the host it was retrieved from, the time frame, and the size of the capture. When the capture file status is "uploaded", the file has been successfully transmitted from the Sysdig agent to the storage bucket and is available for download and analysis.

                                                                                                                                                                                                                                                                                                                                                                      Store Capture Files

                                                                                                                                                                                                                                                                                                                                                                      Sysdig capture files are stored in Sysdig’s AWS S3 storage (for SaaS environments), or in the Cassandra DB (for on-premises environments) by default.

                                                                                                                                                                                                                                                                                                                                                                      Learn more about creating, configuring, and analyzing capture files:

                                                                                                                                                                                                                                                                                                                                                                      This feature is available in the Enterprise tier of the Sysdig product. See https://sysdig.com/pricing for details, or contact sales@sysdig.com.

                                                                                                                                                                                                                                                                                                                                                                      9.1 -

                                                                                                                                                                                                                                                                                                                                                                      Configure Sysdig Captures

                                                                                                                                                                                                                                                                                                                                                                      Create a Capture File From an Alert

While configuring your alert, toggle on Activate Sysdig Capture in the Act section and define the following parameters:

Parameter | Description
Storage | The storage location for the capture files. The default storage location is the Sysdig Cloud Amazon S3 bucket. To configure a custom S3 storage bucket, refer to Configure AWS Capture File Storage.
File Name | The name of the capture file. The default name includes the date and time stamp the capture was created.
Time frame | The period of time captured. The default time is 15 seconds; the maximum capture time available is 24 hours. The capture file size limit is 100 MB. The capture time starts from the time the alert threshold was breached (it does not capture syscalls from before the alert was triggered). Note: Sysdig recommends using the default time to ensure captures are small and manageable.
Filter | Restricts the amount of trace information collected. For more information, including examples of available filters, refer to the Sysdig GitHub page.
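The time frame limits above (a 15-second default and a 24-hour maximum) can be checked before a capture is requested. The sketch below is purely illustrative: capture_duration is a hypothetical helper, not a Sysdig API, and it does not model the 100 MB size limit.

```python
DEFAULT_CAPTURE_SECONDS = 15
MAX_CAPTURE_SECONDS = 24 * 60 * 60  # 24 hours, the documented maximum


def capture_duration(seconds=None):
    """Return a capture duration in seconds, falling back to the default."""
    if seconds is None:
        return DEFAULT_CAPTURE_SECONDS
    if not 0 < seconds <= MAX_CAPTURE_SECONDS:
        raise ValueError(
            f"capture duration must be between 1 and {MAX_CAPTURE_SECONDS} seconds"
        )
    return seconds
```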

                                                                                                                                                                                                                                                                                                                                                                      Create a Capture File Manually

                                                                                                                                                                                                                                                                                                                                                                      To create a capture file:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Explore module, select a host or container.

                                                                                                                                                                                                                                                                                                                                                                      2. Click the Key Page Action drop-down menu, and select Sysdig Capture.

                                                                                                                                                                                                                                                                                                                                                                        The Sysdig Capture pop-up window will open.

                                                                                                                                                                                                                                                                                                                                                                      3. Define the following parameters, and click the Start Capture button:

| Parameter | Description |
| --- | --- |
| Storage | The storage location for the capture files. The default storage location is the Sysdig Cloud Amazon S3 bucket. To configure a custom S3 storage bucket, refer to Configure AWS Capture File Storage. |
| Capture path and name | The name of the capture file. The default name includes the date and timestamp of when the capture was created. |
| Time frame | The period of time captured. The default time is 15 seconds; the maximum capture time available is 24 hours. The capture file size limit is 100 MB. Note: Sysdig recommends using the default time to ensure captures are small and manageable. |
| Filter | Restricts the amount of trace information collected. For more information, including examples of available filters, refer to the Sysdig GitHub page. |

The Sysdig agent is signaled to start a capture and sends back the resulting trace file. The file is then displayed in the Captures module.
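The parameters described above can be sketched as a small validation helper. This is a minimal illustration, not a Sysdig API: `CaptureRequest` and `and_filter` are hypothetical names introduced here, and the file name is made up. The only facts taken from this page are the 15-second default and the 24-hour maximum. The filter conditions use the open-source sysdig filter fields `proc.name` and `evt.type`.

```python
from dataclasses import dataclass

DEFAULT_DURATION_S = 15          # default time frame stated on this page
MAX_DURATION_S = 24 * 60 * 60    # 24-hour maximum stated on this page


@dataclass(frozen=True)
class CaptureRequest:
    """Hypothetical holder for capture parameters; not a Sysdig API type."""

    name: str
    duration_s: int = DEFAULT_DURATION_S
    capture_filter: str = ""     # an empty string means no filter

    def __post_init__(self) -> None:
        # Reject values outside the limits described in the parameter table.
        if not self.name:
            raise ValueError("capture name must not be empty")
        if not 1 <= self.duration_s <= MAX_DURATION_S:
            raise ValueError(
                f"duration must be between 1 and {MAX_DURATION_S} seconds"
            )


def and_filter(*conditions: str) -> str:
    """Join non-empty filter conditions with the `and` operator."""
    return " and ".join(c.strip() for c in conditions if c.strip())


# Example: capture only open events generated by nginx processes.
request = CaptureRequest(
    name="nginx-debug.scap",  # illustrative file name
    capture_filter=and_filter("proc.name=nginx", "evt.type=open"),
)
print(request)
```

Keeping the duration at the default and narrowing the capture with a filter are the two simplest ways to stay well under the capture file size limit.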

                                                                                                                                                                                                                                                                                                                                                                      Download a Capture File

                                                                                                                                                                                                                                                                                                                                                                      To download a capture file:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Captures module, navigate to the target capture file.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the target capture file.

3. Click the Download button. The capture file is automatically downloaded to your local machine.

                                                                                                                                                                                                                                                                                                                                                                      Delete Capture Files

                                                                                                                                                                                                                                                                                                                                                                      To delete a single capture file:

                                                                                                                                                                                                                                                                                                                                                                      1. From the Captures module, select the capture file to be deleted.

2. Click the Delete button at the bottom of the Captures module.

                                                                                                                                                                                                                                                                                                                                                                      3. On the Keep File prompt, click the Delete button to confirm, or the Keep File button to cancel.

                                                                                                                                                                                                                                                                                                                                                                      To delete all capture files:

1. From the Captures module, click the Delete All button.

2. Click the Yes, Delete Captures button to confirm, or the Cancel button to cancel.

9.2 - Review a Capture File

                                                                                                                                                                                                                                                                                                                                                                      Explore a Capture File

                                                                                                                                                                                                                                                                                                                                                                      1. From the Captures module, navigate to the target capture file.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the target capture file. You will see some action buttons at the bottom of the interface.

                                                                                                                                                                                                                                                                                                                                                                      3. Click the Explore button. You will be directed to the Explore tab view of the capture.

                                                                                                                                                                                                                                                                                                                                                                      Inspect a Capture File

                                                                                                                                                                                                                                                                                                                                                                      1. From the Captures module, navigate to the target capture file.

                                                                                                                                                                                                                                                                                                                                                                      2. Select the target capture file. You will see some action buttons at the bottom of the interface.

                                                                                                                                                                                                                                                                                                                                                                      3. Click the Inspect button. You will be directed to the Sysdig Inspect page of the capture.

10 - Metrics Dictionary

The Sysdig Metrics Dictionary lists all the metrics supported by the Sysdig product suite, in both Sysdig legacy and Prometheus-compatible notation, as well as kube state and cloud provider metrics. It is a living document and is updated as new metrics are added to the product.

10.1 - Metrics and Label Mapping

                                                                                                                                                                                                                                                                                                                                                                      This topic outlines the mapping between the metrics and label naming conventions in the Sysdig legacy datastore and the new Sysdig datastore.

10.1.1 - Mapping Classic Metrics with Context-Specific PromQL Metrics

Sysdig classic metrics such as cpu.used.percent previously returned values from a process, container, or host, depending on the query segmentation or scope. You can now use context-explicit metrics, which align with the flat model and resource-specific semantics of the Prometheus naming schema. Your existing dashboards and alerts will be automatically migrated to the new naming convention.
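As an illustration of the naming pattern, the following sketch resolves a classic metric name and a context to its Prometheus-notation name. It assumes only a subset of the entries listed in the mapping table in this topic. `CLASSIC_TO_PROM_BASE`, `prom_metric`, and `prom_candidates` are hypothetical names and not part of any Sysdig library. A lookup table is used rather than a plain dot-to-underscore substitution because some names are reordered (for example, file.bytes.in becomes file_in_bytes).

```python
# Subset of the mapping table in this topic: classic name -> Prometheus base name.
CLASSIC_TO_PROM_BASE = {
    "cpu.cores.used": "cpu_cores_used",
    "cpu.used.percent": "cpu_used_percent",
    "fd.used.percent": "fd_used_percent",
    "file.bytes.in": "file_in_bytes",      # note the reordered suffix
    "file.bytes.out": "file_out_bytes",
    "file.bytes.total": "file_total_bytes",
    "file.iops.in": "file_in_iops",
    "file.open.count": "file_open_count",
}

# The three contexts that appear in the mapping table.
CONTEXTS = ("container", "host", "program")


def prom_metric(classic_name: str, context: str) -> str:
    """Return the context-specific name for a classic metric in the subset."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context!r}")
    base = CLASSIC_TO_PROM_BASE[classic_name]  # KeyError if not in the subset
    return f"sysdig_{context}_{base}"


def prom_candidates(classic_name: str) -> list[str]:
    """Return the context-specific names for one classic metric, one per context."""
    return [prom_metric(classic_name, context) for context in CONTEXTS]


print(prom_metric("cpu.used.percent", "host"))
print(prom_candidates("file.bytes.in"))
```

Because a classic metric fans out to one name per context, a query that used to rely on segmentation or scope has to pick the explicit container, host, or program variant.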

| Sysdig Classic Metrics | Context-Specific Metrics in Prometheus Notation |
| --- | --- |
| cpu.cores.used | sysdig_container_cpu_cores_used, sysdig_host_cpu_cores_used, sysdig_program_cpu_cores_used |
| cpu.cores.used.percent | sysdig_container_cpu_cores_used_percent, sysdig_host_cpu_cores_used_percent, sysdig_program_cpu_cores_used_percent |
| cpu.used.percent | sysdig_container_cpu_used_percent, sysdig_host_cpu_used_percent, sysdig_program_cpu_used_percent |
| fd.used.percent | sysdig_container_fd_used_percent, sysdig_host_fd_used_percent, sysdig_program_fd_used_percent |
| file.bytes.in | sysdig_container_file_in_bytes, sysdig_host_file_in_bytes, sysdig_program_file_in_bytes |
| file.bytes.out | sysdig_container_file_out_bytes, sysdig_host_file_out_bytes, sysdig_program_file_out_bytes |
| file.bytes.total | sysdig_container_file_total_bytes, sysdig_host_file_total_bytes, sysdig_program_file_total_bytes |
| file.error.open.count | sysdig_container_file_error_open_count, sysdig_host_file_error_open_count, sysdig_program_file_error_open_count |
| file.error.total.count | sysdig_container_file_error_total_count, sysdig_host_file_error_total_count, sysdig_program_file_error_total_count |
| file.iops.in | sysdig_container_file_in_iops, sysdig_host_file_in_iops, sysdig_program_file_in_iops |
| file.iops.out | sysdig_container_file_out_iops, sysdig_host_file_out_iops, sysdig_program_file_out_iops |
| file.iops.total | sysdig_container_file_total_iops, sysdig_host_file_total_iops, sysdig_program_file_total_iops |
| file.open.count | sysdig_container_file_open_count, sysdig_host_file_open_count, sysdig_program_file_open_count |
                                                                                                                                                                                                                                                                                                                                                                      file.time.insysdig_container_file_in_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_in_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_in_time
                                                                                                                                                                                                                                                                                                                                                                      file.time.outsysdig_container_file_out_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_out_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_out_time
                                                                                                                                                                                                                                                                                                                                                                      file.time.totalsysdig_container_file_total_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_total_time
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_total_time
fs.bytes.free               sysdig_container_fs_free_bytes
                            sysdig_fs_free_bytes
                            sysdig_host_fs_free_bytes
fs.bytes.total              sysdig_container_fs_total_bytes
                            sysdig_fs_total_bytes
                            sysdig_host_fs_total_bytes
fs.bytes.used               sysdig_container_fs_used_bytes
                            sysdig_fs_used_bytes
                            sysdig_host_fs_used_bytes
fs.free.percent             sysdig_container_fs_free_percent
                            sysdig_fs_free_percent
                            sysdig_host_fs_free_percent
fs.inodes.total.count       sysdig_container_fs_inodes_total_count
                            sysdig_fs_inodes_total_count
                            sysdig_host_fs_inodes_total_count
fs.inodes.used.count        sysdig_container_fs_inodes_used_count
                            sysdig_fs_inodes_used_count
                            sysdig_host_fs_inodes_used_count
fs.inodes.used.percent      sysdig_container_fs_inodes_used_percent
                            sysdig_fs_inodes_used_percent
                            sysdig_host_fs_inodes_used_percent
fs.largest.used.percent     sysdig_container_fs_largest_used_percent
                            sysdig_host_fs_largest_used_percent
fs.root.used.percent        sysdig_container_fs_root_used_percent
                            sysdig_host_fs_root_used_percent
fs.used.percent             sysdig_container_fs_used_percent
                            sysdig_fs_used_percent
                            sysdig_host_fs_used_percent
host.error.count            sysdig_container_syscall_error_count
                            sysdig_host_syscall_error_count
info                        sysdig_agent_info
                            sysdig_container_info
                            sysdig_host_info
memory.bytes.total          sysdig_host_memory_total_bytes
memory.bytes.used           sysdig_container_memory_used_bytes
                            sysdig_host_memory_used_bytes
                            sysdig_program_memory_used_bytes
memory.bytes.virtual        sysdig_container_memory_virtual_bytes
                            sysdig_host_memory_virtual_bytes
memory.swap.bytes.used      sysdig_container_memory_swap_used_bytes
                            sysdig_host_memory_swap_used_bytes
memory.used.percent         sysdig_container_memory_used_percent
                            sysdig_host_memory_used_percent
                                                                                                                                                                                                                                                                                                                                                                      net.bytes.insysdig_connection_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_in_bytes
                                                                                                                                                                                                                                                                                                                                                                      net.bytes.outsysdig_connection_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_out_bytes
                                                                                                                                                                                                                                                                                                                                                                      net.bytes.totalsysdig_connection_net_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_total_bytes
                                                                                                                                                                                                                                                                                                                                                                      net.connection.count.insysdig_connection_net_connection_in_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_connection_in_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_connection_in_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_connection_in_count
                                                                                                                                                                                                                                                                                                                                                                      net.connection.count.outsysdig_connection_net_connection_out_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_connection_out_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_connection_out_count
                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_connection_out_count
net.connection.count.total  sysdig_connection_net_connection_total_count
                            sysdig_container_net_connection_total_count
                            sysdig_host_net_connection_total_count
                            sysdig_program_net_connection_total_count
net.request.count           sysdig_connection_net_request_count
                            sysdig_container_net_request_count
                            sysdig_host_net_request_count
                            sysdig_program_net_request_count
net.error.count             sysdig_container_net_error_count
                            sysdig_host_net_error_count
                            sysdig_program_net_error_count
net.request.count.in        sysdig_connection_net_request_in_count
                            sysdig_container_net_request_in_count
                            sysdig_host_net_request_in_count
                            sysdig_program_net_request_in_count
net.request.count.out       sysdig_connection_net_request_out_count
                            sysdig_container_net_request_out_count
                            sysdig_host_net_request_out_count
                            sysdig_program_net_request_out_count
net.request.time            sysdig_connection_net_request_time
                            sysdig_container_net_request_time
                            sysdig_host_net_request_time
                            sysdig_program_net_request_time
net.request.time.in         sysdig_connection_net_request_in_time
                            sysdig_container_net_request_in_time
                            sysdig_host_net_request_in_time
                            sysdig_program_net_request_in_time
net.request.time.out        sysdig_connection_net_request_out_time
                            sysdig_container_net_request_out_time
                            sysdig_host_net_request_out_time
                            sysdig_program_net_request_out_time
net.server.bytes.in         sysdig_container_net_server_in_bytes
                            sysdig_host_net_server_in_bytes
net.server.bytes.out        sysdig_container_net_server_out_bytes
                            sysdig_host_net_server_out_bytes
net.server.bytes.total      sysdig_container_net_server_total_bytes
                            sysdig_host_net_server_total_bytes
net.sql.error.count         sysdig_container_net_sql_error_count
                            sysdig_host_net_sql_error_count
net.sql.request.count       sysdig_container_net_sql_request_count
                            sysdig_host_net_sql_request_count
net.tcp.queue.len           sysdig_container_net_tcp_queue_len
                            sysdig_host_net_tcp_queue_len
                            sysdig_program_net_tcp_queue_len
proc.count                  sysdig_container_proc_count
                            sysdig_host_proc_count
                            sysdig_program_proc_count
thread.count                sysdig_container_thread_count
                            sysdig_host_thread_count
                            sysdig_program_thread_count
uptime                      sysdig_container_up
                            sysdig_host_up
                            sysdig_program_up
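
The rows above show that a PromQL name cannot always be derived mechanically from the classic name: net.server.bytes.in becomes net_server_in_bytes (the direction moves ahead of the unit) and uptime becomes up. A lookup table is therefore safer than string rewriting. The following is a minimal sketch under stated assumptions, not part of any Sysdig tooling: the CLASSIC_TO_PROMQL dictionary, the promql_name helper, and the scope keys ("connection", "container", "host", "program", taken from the PromQL name prefixes) are hypothetical names introduced for illustration. Only a few rows are copied from the table.

```python
# Illustrative lookup built from a handful of rows in the table above.
# CLASSIC_TO_PROMQL and promql_name are hypothetical names, not a Sysdig API.
CLASSIC_TO_PROMQL = {
    "net.request.count": {
        "connection": "sysdig_connection_net_request_count",
        "container": "sysdig_container_net_request_count",
        "host": "sysdig_host_net_request_count",
        "program": "sysdig_program_net_request_count",
    },
    "net.error.count": {
        # No connection-scoped metric is listed for this classic name.
        "container": "sysdig_container_net_error_count",
        "host": "sysdig_host_net_error_count",
        "program": "sysdig_program_net_error_count",
    },
    "net.server.bytes.in": {
        "container": "sysdig_container_net_server_in_bytes",
        "host": "sysdig_host_net_server_in_bytes",
    },
    "uptime": {
        "container": "sysdig_container_up",
        "host": "sysdig_host_up",
        "program": "sysdig_program_up",
    },
}


def promql_name(classic_name: str, scope: str) -> str:
    """Return the PromQL metric name recorded for a classic metric and scope.

    Raises KeyError if the classic metric is not in the lookup, and
    ValueError if the table lists no PromQL metric for that scope.
    """
    if classic_name not in CLASSIC_TO_PROMQL:
        raise KeyError(f"no mapping recorded for classic metric {classic_name!r}")
    by_scope = CLASSIC_TO_PROMQL[classic_name]
    if scope not in by_scope:
        available = ", ".join(sorted(by_scope))
        raise ValueError(
            f"{classic_name!r} has no {scope!r} metric; available scopes: {available}"
        )
    return by_scope[scope]


if __name__ == "__main__":
    print(promql_name("uptime", "host"))
    print(promql_name("net.server.bytes.in", "container"))
```

Keeping the scope as an explicit key makes gaps visible: asking for a connection-scoped net.error.count raises an error rather than producing a metric name that the table does not list.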

10.1.2 - Mapping Between Classic Metrics and PromQL Metrics

Starting with SaaS v3.2.6, Sysdig classic metrics and labels have been renamed to align with the Prometheus naming convention. For example, Sysdig classic metrics use a dot-oriented hierarchy, whereas Prometheus organizes metrics by labels. The table below lists each Prometheus metric and label alongside its counterpart in the Sysdig classic system.

Entity  Type  PromQL Metric Name  Classic Metric Name  Label          Classic Label

host    info  sysdig_host_info    Not exposed          host_mac       host.mac
                                                       host           host.hostName
                                                       instance_id    host.instanceId
                                                       agent_tag_{*}  agent.tag.{*}

                                                                                                                                                                                                                                                                                                                                                                      sysdig_cloud_provider_info

                                                                                                                                                                                                                                                                                                                                                                      • host_mac

                                                                                                                                                                                                                                                                                                                                                                      • provider_id

                                                                                                                                                                                                                                                                                                                                                                      • account_id

                                                                                                                                                                                                                                                                                                                                                                      • region

                                                                                                                                                                                                                                                                                                                                                                      • availability_zone

                                                                                                                                                                                                                                                                                                                                                                      • instance_type

                                                                                                                                                                                                                                                                                                                                                                      • tag_{*}

                                                                                                                                                                                                                                                                                                                                                                      • security_groups

                                                                                                                                                                                                                                                                                                                                                                      • host_ip_public

                                                                                                                                                                                                                                                                                                                                                                      • host_ip_private

                                                                                                                                                                                                                                                                                                                                                                      • host_name

                                                                                                                                                                                                                                                                                                                                                                      • name

                                                                                                                                                                                                                                                                                                                                                                      • host.mac

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.id

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.account.id

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.region

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.availabilityZone

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.instance.type

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.tag.{*}

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.securityGroups

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.host.ip.public

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.host.ip.private

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.host.name

                                                                                                                                                                                                                                                                                                                                                                      • cloudProvider.name

| Metric type | Prometheus-style name | Dot-notation name | Labels (Prometheus-style) | Labels (dot notation) |
|---|---|---|---|---|
| data | sysdig_host_cpu_used_percent | cpu.used.percent | host_mac, host | host.mac, host.hostname |
| | sysdig_host_cpu_cores_used | cpu.cores.used | | |
| | sysdig_host_cpu_user_percent | cpu.user.percent | | |
| | sysdig_host_cpu_idle_percent | cpu.idle.percent | | |
| | sysdig_host_cpu_iowait_percent | cpu.iowait.percent | | |
| | sysdig_host_cpu_nice_percent | cpu.nice.percent | | |
| | sysdig_host_cpu_stolen_percent | cpu.stolen.percent | | |
| | sysdig_host_cpu_system_percent | cpu.system.percent | | |
| | sysdig_host_fd_used_percent | fd.used.percent | | |
| | sysdig_host_file_error_open_count | file.error.open.count | | |
| | sysdig_host_file_error_total_count | file.error.total.count | | |
| | sysdig_host_file_in_bytes | file.bytes.in | | |
| | sysdig_host_file_in_iops | file.iops.in | | |
| | sysdig_host_file_in_time | file.time.in | | |
| | sysdig_host_file_open_count | file.open.count | | |
| | sysdig_host_file_out_bytes | file.bytes.out | | |
| | sysdig_host_file_out_iops | file.iops.out | | |
| | sysdig_host_file_out_time | file.time.out | | |
| | sysdig_host_load_average_15m | load.average.15m | | |
| | sysdig_host_load_average_1m | load.average.1m | | |
| | sysdig_host_load_average_5m | load.average.5m | | |
| | sysdig_host_memory_available_bytes | memory.bytes.available | | |
| | sysdig_host_memory_total_bytes | memory.bytes.total | | |
| | sysdig_host_memory_used_bytes | memory.bytes.used | | |
| | sysdig_host_memory_swap_available_bytes | memory.swap.bytes.available | | |
| | sysdig_host_memory_swap_total_bytes | memory.swap.bytes.total | | |
| | sysdig_host_memory_swap_used_bytes | memory.swap.bytes.used | | |
| | sysdig_host_memory_virtual_bytes | memory.bytes.virtual | | |
| | sysdig_host_net_connection_in_count | net.connection.count.in | | |
| | sysdig_host_net_connection_out_count | net.connection.count.out | | |
| | sysdig_host_net_error_count | net.error.count | | |
| | sysdig_host_net_in_bytes | net.bytes.in | | |
| | sysdig_host_net_out_bytes | net.bytes.out | | |
| | sysdig_host_net_tcp_queue_len | net.tcp.queue.len | | |
| | sysdig_host_proc_count | proc.count | | |
| | sysdig_host_system_uptime | system.uptime | | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_thread_count

                                                                                                                                                                                                                                                                                                                                                                      thread.count

                                                                                                                                                                                                                                                                                                                                                                      container

                                                                                                                                                                                                                                                                                                                                                                      info

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_info

                                                                                                                                                                                                                                                                                                                                                                      Not exposed

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container_full_id

                                                                                                                                                                                                                                                                                                                                                                      none

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container

                                                                                                                                                                                                                                                                                                                                                                      container.name

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      image

                                                                                                                                                                                                                                                                                                                                                                      container.image

                                                                                                                                                                                                                                                                                                                                                                      image_id

                                                                                                                                                                                                                                                                                                                                                                      container.image.id

                                                                                                                                                                                                                                                                                                                                                                      mesos_task_id

                                                                                                                                                                                                                                                                                                                                                                      container.mesosTaskId

                                                                                                                                                                                                                                                                                                                                                                      Only available in Mesos orchestrator.

                                                                                                                                                                                                                                                                                                                                                                      cluster

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.cluster.name

                                                                                                                                                                                                                                                                                                                                                                      Present only if the container is part of Kubernetes.

                                                                                                                                                                                                                                                                                                                                                                      pod

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.name

                                                                                                                                                                                                                                                                                                                                                                      Present only if the container is part of Kubernetes

                                                                                                                                                                                                                                                                                                                                                                      namespace

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.namespace.name

                                                                                                                                                                                                                                                                                                                                                                      Present only if the container is part of Kubernetes.

                                                                                                                                                                                                                                                                                                                                                                      data

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_used_percent

                                                                                                                                                                                                                                                                                                                                                                      cpu.used.percent

                                                                                                                                                                                                                                                                                                                                                                      • host_mac

                                                                                                                                                                                                                                                                                                                                                                      • container_id

                                                                                                                                                                                                                                                                                                                                                                      • container_type

                                                                                                                                                                                                                                                                                                                                                                      • container

                                                                                                                                                                                                                                                                                                                                                                      • host.mac

                                                                                                                                                                                                                                                                                                                                                                      • container.id

                                                                                                                                                                                                                                                                                                                                                                      • container.type

                                                                                                                                                                                                                                                                                                                                                                      • container.name

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_used

                                                                                                                                                                                                                                                                                                                                                                      cpu.cores.used

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_used_percent

                                                                                                                                                                                                                                                                                                                                                                      cpu.cores.used.percent

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_quota_used_percent

                                                                                                                                                                                                                                                                                                                                                                      cpu.quota.used.percent

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_shares

                                                                                                                                                                                                                                                                                                                                                                      cpu.shares.count

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_shares_used_percent

                                                                                                                                                                                                                                                                                                                                                                      cpu.shares.used.percent

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fd_used_percent

                                                                                                                                                                                                                                                                                                                                                                      fd.used.percent

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_error_open_count

                                                                                                                                                                                                                                                                                                                                                                      file.error.open.count

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_error_total_count

                                                                                                                                                                                                                                                                                                                                                                      file.error.total.count

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_bytes

                                                                                                                                                                                                                                                                                                                                                                      file.bytes.in

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_iops

                                                                                                                                                                                                                                                                                                                                                                      file.iops.in

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_time

                                                                                                                                                                                                                                                                                                                                                                      file.time.in

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_open_count

                                                                                                                                                                                                                                                                                                                                                                      file.open.count

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_bytes

                                                                                                                                                                                                                                                                                                                                                                      file.bytes.out

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_iops

                                                                                                                                                                                                                                                                                                                                                                      file.iops.out

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_time

                                                                                                                                                                                                                                                                                                                                                                      file.time.out

sysdig_container_memory_limit_bytes | memory.limit.bytes
sysdig_container_memory_limit_used_percent | memory.limit.used.percent
sysdig_container_memory_swap_available_bytes | memory.swap.bytes.available
sysdig_container_memory_swap_total_bytes | memory.swap.bytes.total
sysdig_container_memory_swap_used_bytes | memory.swap.bytes.used
sysdig_container_memory_used_bytes | memory.bytes.used
sysdig_container_memory_virtual_bytes | memory.bytes.virtual
sysdig_container_net_connection_in_count | net.connection.count.in
sysdig_container_net_connection_out_count | net.connection.count.out
sysdig_container_net_error_count | net.error.count
sysdig_container_net_in_bytes | net.bytes.in
sysdig_container_net_out_bytes | net.bytes.out
sysdig_container_net_tcp_queue_len | net.tcp.queue.len
sysdig_container_proc_count | proc.count
sysdig_container_swap_limit_bytes | swap.limit.bytes
sysdig_container_thread_count | thread.count

Process/Program

Info

sysdig_program_info | not exposed
    program | proc.name
    cmd_line | proc.commandLine
    host_mac | host.mac
    container_id | container.id
    container_type | container.type

Data

sysdig_program_cpu_used_percent | cpu.used.percent
    host_mac | host.mac
    container_id | container.id
    container_type | container.type
    program | proc.name
    cmd_line | proc.commandLine

sysdig_program_memory_used_bytes | memory.bytes.used
    host_mac | host.mac
    container_id | container.id
    container_type | container.type
    program | proc.name
    cmd_line | proc.commandLine

sysdig_program_net_in_bytes | net.bytes.in
    container_id | container.id
    host_mac | host.mac
    container_type | container.type
    program | proc.name
    cmd_line | proc.commandLine

sysdig_program_net_out_bytes | net.bytes.out
    host_mac | host.mac
    container_id | container.id
    container_type | container.type
    program | proc.name
    cmd_line | proc.commandLine

sysdig_program_proc_count | proc.count

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      program

                                                                                                                                                                                                                                                                                                                                                                      proc.name

                                                                                                                                                                                                                                                                                                                                                                      cmd_line

                                                                                                                                                                                                                                                                                                                                                                      proc.commandLine

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_thread_count

                                                                                                                                                                                                                                                                                                                                                                      thread.count

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      program

                                                                                                                                                                                                                                                                                                                                                                      proc.name

                                                                                                                                                                                                                                                                                                                                                                      cmd_line

                                                                                                                                                                                                                                                                                                                                                                      proc.commandLine

                                                                                                                                                                                                                                                                                                                                                                      fs

                                                                                                                                                                                                                                                                                                                                                                      info

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_info

                                                                                                                                                                                                                                                                                                                                                                      not exposed

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      device

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      mount_dir

                                                                                                                                                                                                                                                                                                                                                                      fs.mountDir

                                                                                                                                                                                                                                                                                                                                                                      type

                                                                                                                                                                                                                                                                                                                                                                      fs.type

                                                                                                                                                                                                                                                                                                                                                                      data

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_free_bytes

                                                                                                                                                                                                                                                                                                                                                                      fs.bytes.free

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      device

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_inodes_total_count

                                                                                                                                                                                                                                                                                                                                                                      fs.inodes.total.count

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      device

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_inodes_used_count

                                                                                                                                                                                                                                                                                                                                                                      fs.inodes.used.count

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      device

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_total_bytes

                                                                                                                                                                                                                                                                                                                                                                      fs.bytes.total

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      device

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      fs.bytes.used

                                                                                                                                                                                                                                                                                                                                                                      host_mac

                                                                                                                                                                                                                                                                                                                                                                      host.mac

                                                                                                                                                                                                                                                                                                                                                                      container_id

                                                                                                                                                                                                                                                                                                                                                                      container.id

                                                                                                                                                                                                                                                                                                                                                                      container_type

                                                                                                                                                                                                                                                                                                                                                                      container.type

                                                                                                                                                                                                                                                                                                                                                                      devide

                                                                                                                                                                                                                                                                                                                                                                      fs.device

                                                                                                                                                                                                                                                                                                                                                                      10.1.3 -

                                                                                                                                                                                                                                                                                                                                                                      Mapping Legacy Sysdig Kubernetes Metrics with Prometheus Metrics

                                                                                                                                                                                                                                                                                                                                                                      Prometheus metrics, in Kubernetes parlance, are nothing but Kube State Metrics. These metrics are available in Sysdig PromQL and can be mapped to existing Sysdig Kubernetes metrics.

                                                                                                                                                                                                                                                                                                                                                                      For descriptions on Kubernetes State Metrics, see Kubernetes State Metrics.
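As a rough illustration of the mapping, the following Python sketch translates a legacy Sysdig metric name into a PromQL selector on the corresponding Kube State Metric. The three name pairs are taken from the table in this section. The `LEGACY_TO_KSM` dictionary and the `to_promql` helper are hypothetical names introduced only for this example and are not part of any Sysdig API. The sketch assumes that label values contain no characters that need escaping.

```python
# Legacy Sysdig metric name -> Kube State Metric name.
# Only a few pairs from the mapping table are included here.
LEGACY_TO_KSM = {
    "kubernetes.pod.containers.waiting": "kube_pod_container_status_waiting",
    "kubernetes.pod.status.ready": "kube_pod_status_ready",
    "kubernetes.node.allocatable.cpuCores": "kube_node_status_allocatable_cpu_cores",
}


def to_promql(legacy_metric, **labels):
    """Build a PromQL selector for a legacy Sysdig metric (hypothetical helper).

    Raises KeyError if the legacy metric is not in LEGACY_TO_KSM.
    Label values are inserted verbatim, so they must not contain
    quotes or backslashes (an assumption of this sketch).
    """
    ksm_name = LEGACY_TO_KSM[legacy_metric]
    if not labels:
        return ksm_name
    # Sort the labels so the generated selector is deterministic.
    matchers = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{ksm_name}{{{matchers}}}"


print(to_promql("kubernetes.pod.status.ready",
                namespace="default", pod="pod0", condition="true"))
# prints: kube_pod_status_ready{condition="true",namespace="default",pod="pod0"}
```

The label names passed to the helper (`namespace`, `pod`, `condition`) are the ones listed in the Label column for the chosen metric.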

                                                                                                                                                                                                                                                                                                                                                                      Resource

                                                                                                                                                                                                                                                                                                                                                                      Sysdig Metrics

                                                                                                                                                                                                                                                                                                                                                                      Kubernetes State Metrics

                                                                                                                                                                                                                                                                                                                                                                      Label

                                                                                                                                                                                                                                                                                                                                                                      Example / More Information

                                                                                                                                                                                                                                                                                                                                                                      Pod

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.containers.waiting

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_container_status_waiting

                                                                                                                                                                                                                                                                                                                                                                      • container=<container-name>

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.resourceLimits.cpuCores

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.resourceLimits.memBytes

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_container_resource_limits

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_sysdig_resource_limits_memory_bytes

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_sysdig_resource_limits_cpu_cores

                                                                                                                                                                                                                                                                                                                                                                      • resource=<resource-name>

                                                                                                                                                                                                                                                                                                                                                                      • unit=<resource-unit>

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • node=< node-name>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",container="pod1_con1",resource="cpu",unit="core"}

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",container="pod1_con1",resource="memory",unit="byte"}

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.resourceRequests.cpuCores

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.resourceRequests.memBytes

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_container_resource_requests

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_sysdig_resource_requests_cpu_cores

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_sysdig_resource_requests_memory_bytes

                                                                                                                                                                                                                                                                                                                                                                      • resource=<resource-name>

                                                                                                                                                                                                                                                                                                                                                                      • unit=<resource-unit>

                                                                                                                                                                                                                                                                                                                                                                      • container=<container-name>

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • node=< node-name>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",container="pod1_con1",resource="cpu",unit="core"}

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",container="pod1_con1",resource="memory",unit="byte"}

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.pod.status.ready

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_status_ready

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • condition=<true|false|unknown>

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_info

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • host_ip=<host-ip>

                                                                                                                                                                                                                                                                                                                                                                      • pod_ip=<pod-ip>

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-name>

                                                                                                                                                                                                                                                                                                                                                                      • uid=<pod-uid>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",host_ip="1.1.1.1",pod_ip="1.2.3.4",uid="abc-0",node="node1",created_by_kind="<none>",created_by_name="<none>",priority_class=""}

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_owner

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • owner_kind=<owner kind>

                                                                                                                                                                                                                                                                                                                                                                      • owner_name=<owner name>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",owner_kind="<none>",owner_name="<none>;",owner_is_controller="<none>"}

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_labels

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • label_POD_LABEL=<POD_LABEL>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0", label_app="myApp"}

                                                                                                                                                                                                                                                                                                                                                                      kube_pod_container_info

                                                                                                                                                                                                                                                                                                                                                                      • pod=<pod-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<pod-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • container_id=<containerid>

                                                                                                                                                                                                                                                                                                                                                                      {namespace="default",pod="pod0",container="container2",image="k8s.gcr.io/hyperkube2",image_id="docker://sha256:bbb",container_id="docker://cd456"}

                                                                                                                                                                                                                                                                                                                                                                      node

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.allocatable.cpuCores

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_allocatable_cpu_cores

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      • resource=<resource-name>

                                                                                                                                                                                                                                                                                                                                                                      • unit=<resource-unit>

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      resource/unit have one of the values: (cpu, core); (memory, byte); (pods, integer). Sysdig currently supports only CPU, pods, and memory resources for kube_node_status_capacity metrics.

                                                                                                                                                                                                                                                                                                                                                                      "# HELP kube_node_status_capacity The capacity for different resources of a node.
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-master"",resource=""hugepages_1Gi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-master"",resource=""hugepages_2Mi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-master"",resource=""memory"",unit=""byte""} 4.16342016e+09
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-master"",resource=""pods"",unit=""integer""} 110
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node1"",resource=""pods"",unit=""integer""} 110
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node1"",resource=""cpu"",unit=""core""} 2
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node1"",resource=""hugepages_1Gi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node1"",resource=""hugepages_2Mi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node1"",resource=""memory"",unit=""byte""} 6.274154496e+09
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node2"",resource=""hugepages_1Gi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node2"",resource=""hugepages_2Mi"",unit=""byte""} 0
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node2"",resource=""memory"",unit=""byte""} 6.274154496e+09
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node2"",resource=""pods"",unit=""integer""} 110
                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity{node=""k8s-node2"",resource=""cpu"",unit=""core""} 2
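
The sample output above can be reduced to the supported resource/unit pairs with a small parser. The following is a minimal sketch, not Sysdig's implementation: the function name `supported_capacity`, the regular expressions, and the shortened `SAMPLE` text (written with standard Prometheus label quoting) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# A shortened copy of the sample output, using standard Prometheus quoting.
SAMPLE = '''\
# HELP kube_node_status_capacity The capacity for different resources of a node.
kube_node_status_capacity{node="k8s-master",resource="hugepages_1Gi",unit="byte"} 0
kube_node_status_capacity{node="k8s-master",resource="memory",unit="byte"} 4.16342016e+09
kube_node_status_capacity{node="k8s-master",resource="pods",unit="integer"} 110
kube_node_status_capacity{node="k8s-node1",resource="cpu",unit="core"} 2
kube_node_status_capacity{node="k8s-node1",resource="memory",unit="byte"} 6.274154496e+09
'''

# The (resource, unit) pairs the text lists as supported.
SUPPORTED = {("cpu", "core"), ("memory", "byte"), ("pods", "integer")}

# Hypothetical helper patterns: one for a whole sample line, one for a label pair.
SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def supported_capacity(text):
    """Return {node: {resource: value}} for supported resource/unit pairs only."""
    result = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != "kube_node_status_capacity":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if (labels.get("resource"), labels.get("unit")) in SUPPORTED:
            result[labels["node"]][labels["resource"]] = float(match.group("value"))
    return dict(result)


if __name__ == "__main__":
    for node, resources in supported_capacity(SAMPLE).items():
        print(node, resources)
```

Samples for the hugepages resources are dropped because their resource names are not among the supported pairs.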

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.allocatable.memBytes

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_allocatable_memory_bytes

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.allocatable.pods

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_allocatable_pods

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.capacity.cpuCores

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity_cpu_cores

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      • resource=<resource-name>

                                                                                                                                                                                                                                                                                                                                                                      • unit=<resource-unit>

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.capacity.memBytes

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity_memory_bytes

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.capacity.pod

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_capacity_pods

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.diskPressure

                                                                                                                                                                                                                                                                                                                                                                      kube_node_status_condition

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address

                                                                                                                                                                                                                                                                                                                                                                      • condition=<node-condition>

                                                                                                                                                                                                                                                                                                                                                                      • status=<true|false|unknown>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.memoryPressure

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.networkUnavailable

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.outOfDisk

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.ready

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.node.unschedulable

                                                                                                                                                                                                                                                                                                                                                                      kube_node_spec_unschedulable

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      kube_node_info

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      kube_node_labels

                                                                                                                                                                                                                                                                                                                                                                      • node=<node-address>

                                                                                                                                                                                                                                                                                                                                                                      • label_NODE_LABEL=<NODE_LABEL>
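
Several of the node condition metrics above appear to share the single kube_node_status_condition metric, distinguished by its condition and status labels. The sketch below shows one way to build a label selector for a given legacy metric. It rests on two assumptions that this table does not state: that the five legacy condition metrics all map to kube_node_status_condition, and that the condition label carries the standard Kubernetes node condition type names. The names `CONDITION_BY_LEGACY_METRIC` and `condition_selector` are hypothetical.

```python
# Assumed mapping from legacy Sysdig metric names to the value of the
# `condition` label (standard Kubernetes node condition type names).
CONDITION_BY_LEGACY_METRIC = {
    "kubernetes.node.diskPressure": "DiskPressure",
    "kubernetes.node.memoryPressure": "MemoryPressure",
    "kubernetes.node.networkUnavailable": "NetworkUnavailable",
    "kubernetes.node.outOfDisk": "OutOfDisk",
    "kubernetes.node.ready": "Ready",
}


def condition_selector(legacy_metric, node=None, status="true"):
    """Build a kube_node_status_condition selector for a legacy metric name."""
    labels = {
        "condition": CONDITION_BY_LEGACY_METRIC[legacy_metric],
        "status": status,  # one of: true, false, unknown
    }
    if node is not None:
        labels["node"] = node
    # Sort the labels so the generated selector is deterministic.
    body = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"kube_node_status_condition{{{body}}}"


if __name__ == "__main__":
    print(condition_selector("kubernetes.node.ready", node="k8s-node1"))
```

A legacy name that is not in the mapping raises `KeyError`, which makes an unsupported metric visible instead of silently producing an empty selector.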

                                                                                                                                                                                                                                                                                                                                                                      Deployment

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.available

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_available

                                                                                                                                                                                                                                                                                                                                                                      • deployment=<deployment-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<deployment-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.desired

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_spec_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.paused

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_spec_paused

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.running

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.unavailable

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_unavailable

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.deployment.replicas.updated

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_updated

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_labels
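
When migrating dashboards or alerts, the deployment rows above can be captured as a lookup from legacy names to kube-state-metrics names. This is a minimal sketch: the pairs are read from the order of the rows above, and the names `KSM_BY_LEGACY_DEPLOYMENT_METRIC` and `translate` are hypothetical.

```python
# Legacy Sysdig deployment metric -> kube-state-metrics metric,
# taken from the adjacent rows of the table above.
KSM_BY_LEGACY_DEPLOYMENT_METRIC = {
    "kubernetes.deployment.replicas.available": "kube_deployment_status_replicas_available",
    "kubernetes.deployment.replicas.desired": "kube_deployment_spec_replicas",
    "kubernetes.deployment.replicas.paused": "kube_deployment_spec_paused",
    "kubernetes.deployment.replicas.running": "kube_deployment_status_replicas",
    "kubernetes.deployment.replicas.unavailable": "kube_deployment_status_replicas_unavailable",
    "kubernetes.deployment.replicas.updated": "kube_deployment_status_replicas_updated",
}


def translate(legacy_metric):
    """Return the kube-state-metrics name for a legacy name, or None if unlisted."""
    return KSM_BY_LEGACY_DEPLOYMENT_METRIC.get(legacy_metric)


if __name__ == "__main__":
    print(translate("kubernetes.deployment.replicas.desired"))
```

kube_deployment_labels has no legacy counterpart in the table, so it is left out of the lookup.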

                                                                                                                                                                                                                                                                                                                                                                      job

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.job.completions

                                                                                                                                                                                                                                                                                                                                                                      kube_job_spec_completions

                                                                                                                                                                                                                                                                                                                                                                      • job_name=<job-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<job-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.job.numFailed

                                                                                                                                                                                                                                                                                                                                                                      kube_job_failed

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.job.numSucceeded

                                                                                                                                                                                                                                                                                                                                                                      kube_job_complete

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.job.parallelism

                                                                                                                                                                                                                                                                                                                                                                      kube_job_spec_parallelism

                                                                                                                                                                                                                                                                                                                                                                      kube_job_status_active

                                                                                                                                                                                                                                                                                                                                                                      kube_job_info

                                                                                                                                                                                                                                                                                                                                                                      kube_job_owner

                                                                                                                                                                                                                                                                                                                                                                      • job_name=<job-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<job-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • owner_kind=<owner kind>

                                                                                                                                                                                                                                                                                                                                                                      • owner_name=<owner name>

                                                                                                                                                                                                                                                                                                                                                                      kube_job_labels

                                                                                                                                                                                                                                                                                                                                                                      • job_name=<job-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<job-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • label_job_label=<job_label>

                                                                                                                                                                                                                                                                                                                                                                      daemonSet

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.daemonSet.pods.desired

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_desired_number_scheduled

                                                                                                                                                                                                                                                                                                                                                                      • daemonset=<daemonset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<daemonset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.daemonSet.pods.misscheduled

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_number_misscheduled

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.daemonSet.pods.ready

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_number_ready

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.daemonSet.pods.scheduled

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_current_number_scheduled

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_labels

                                                                                                                                                                                                                                                                                                                                                                      • daemonset=<daemonset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<daemonset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • label_daemonset_label=<daemonset_label>

                                                                                                                                                                                                                                                                                                                                                                      replicaSet

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicaSet.replicas.fullyLabeled

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_status_fully_labeled_replicas

                                                                                                                                                                                                                                                                                                                                                                      • replicaset=<replicaset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<replicaset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicaSet.replicas.ready

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_status_ready_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicaSet.replicas.running

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_status_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicaSet.replicas.desired

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_spec_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_owner

                                                                                                                                                                                                                                                                                                                                                                      • replicaset=<replicaset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<replicaset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • owner_kind=<owner kind>

                                                                                                                                                                                                                                                                                                                                                                      • owner_name=<owner name>

                                                                                                                                                                                                                                                                                                                                                                      kube_replicaset_labels

                                                                                                                                                                                                                                                                                                                                                                      • label_replicaset_label=<replicaset_label>

                                                                                                                                                                                                                                                                                                                                                                      • replicaset=<replicaset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<replicaset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      statefulset

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.statefulset.replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_replicas

                                                                                                                                                                                                                                                                                                                                                                      • statefulset=<statefulset-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<statefulset-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.statefulset.status.replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_status_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.statefulset.status.replicas.current

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_status_replicas_current

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.statefulset.status.replicas.ready

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_status_replicas_ready

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.statefulset.status.replicas.updated

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_status_replicas_updated

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_labels

                                                                                                                                                                                                                                                                                                                                                                      hpa

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.hpa.replicas.min

                                                                                                                                                                                                                                                                                                                                                                      kube_horizontalpodautoscaler_spec_min_replicas

                                                                                                                                                                                                                                                                                                                                                                      • hpa=<hpa-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<hpa-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.hpa.replicas.max

                                                                                                                                                                                                                                                                                                                                                                      kube_horizontalpodautoscaler_spec_max_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.hpa.replicas.current

                                                                                                                                                                                                                                                                                                                                                                      kube_horizontalpodautoscaler_status_current_replicas

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.hpa.replicas.desired

                                                                                                                                                                                                                                                                                                                                                                      kube_horizontalpodautoscaler_status_desired_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_horizontalpodautoscaler_labels

                                                                                                                                                                                                                                                                                                                                                                      resourcequota

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.configmaps.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.configmaps.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.limits.cpu.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.limits.cpu.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.limits.memory.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.limits.memory.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.persistentvolumeclaims.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.persistentvolumeclaims.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.cpu.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.memory.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.pods.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.pods.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.replicationcontrollers.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.replicationcontrollers.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.cpu.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.cpu.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.memory.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.memory.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.storage.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.requests.storage.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.resourcequotas.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.resourcequotas.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.secrets.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.secrets.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.loadbalancers.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.loadbalancers.used

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.nodeports.hard

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.resourcequota.services.nodeports.used

                                                                                                                                                                                                                                                                                                                                                                      kube_resourcequota

                                                                                                                                                                                                                                                                                                                                                                      • resourcequota=<quota-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<namespace>

                                                                                                                                                                                                                                                                                                                                                                      • resource=<ResourceName>

                                                                                                                                                                                                                                                                                                                                                                      • type=<quota-type>

                                                                                                                                                                                                                                                                                                                                                                      namespace

                                                                                                                                                                                                                                                                                                                                                                      kube_namespace_labels

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<namespace-name>

                                                                                                                                                                                                                                                                                                                                                                      • label_ns_label=<ns_label>

                                                                                                                                                                                                                                                                                                                                                                      replicationcontroller

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicationcontroller.replicas.desired

kube_replicationcontroller_spec_replicas

                                                                                                                                                                                                                                                                                                                                                                      • replicationcontroller=<replicationcontroller-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<replicationcontroller-namespace>

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.replicationcontroller.replicas.running

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_status_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_status_fully_labeled_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_status_ready_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_status_available_replicas

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_status_observed_generation

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_metadata_generation

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_created

                                                                                                                                                                                                                                                                                                                                                                      kube_replicationcontroller_owner

                                                                                                                                                                                                                                                                                                                                                                      • replicationcontroller=<replicationcontroller-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<replicationcontroller-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • owner_kind=<owner kind>

                                                                                                                                                                                                                                                                                                                                                                      • owner_name=<owner name>

                                                                                                                                                                                                                                                                                                                                                                      service

                                                                                                                                                                                                                                                                                                                                                                      kube_service_info

                                                                                                                                                                                                                                                                                                                                                                      • service=<service-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<service-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • cluster_ip=<service cluster ip>

                                                                                                                                                                                                                                                                                                                                                                      • external_name=<service external name>

                                                                                                                                                                                                                                                                                                                                                                      • load_balancer_ip=<service load balancer ip>

                                                                                                                                                                                                                                                                                                                                                                      kube_service_labels

                                                                                                                                                                                                                                                                                                                                                                      • service=<service-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<service-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • label_service_label=<service_label>

                                                                                                                                                                                                                                                                                                                                                                      persistentvolume

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.persistentvolume.storage

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolume_capacity_bytes

                                                                                                                                                                                                                                                                                                                                                                      • persistentvolume=<pv-name>

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolume_info

                                                                                                                                                                                                                                                                                                                                                                      • persistentvolume=<pv-name>

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolume_labels

                                                                                                                                                                                                                                                                                                                                                                      • persistentvolume=<pv-name>

• label_persistentvolume_label=<persistentvolume_label>

                                                                                                                                                                                                                                                                                                                                                                      persistentvolumeclaim

                                                                                                                                                                                                                                                                                                                                                                      kubernetes.persistentvolumeclaim.requests.storage

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolumeclaim_resource_requests_storage_bytes

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<persistentvolumeclaim-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • persistentvolumeclaim=<persistentvolumeclaim-name>

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolumeclaim_info

                                                                                                                                                                                                                                                                                                                                                                      kube_persistentvolumeclaim_labels

                                                                                                                                                                                                                                                                                                                                                                      • persistentvolumeclaim=<persistentvolumeclaim-name>

                                                                                                                                                                                                                                                                                                                                                                      • namespace=<persistentvolumeclaim-namespace>

                                                                                                                                                                                                                                                                                                                                                                      • label_persistentvolumeclaim_label=<persistentvolumeclaim_label>

                                                                                                                                                                                                                                                                                                                                                                      10.1.4 -

                                                                                                                                                                                                                                                                                                                                                                      Run PromQL Queries Faster with Extended Label Set

Sysdig lets you write simpler PromQL queries that run faster by using the extended label set. The extended label set is created by augmenting the incoming data with the rich metadata associated with your infrastructure and making it available in PromQL.

With this, you can troubleshoot a problem or build Dashboards and Alerts without writing complex queries. Sysdig automatically enriches your metrics with Kubernetes and application context, so you do not need to instrument additional labels in your environment. This reduces operational complexity and cost: the enrichment takes place in the Sysdig metric ingestion pipeline after the time series have been sent to the backend.

                                                                                                                                                                                                                                                                                                                                                                      Calculate Memory Usage by Deployment in a Cluster

                                                                                                                                                                                                                                                                                                                                                                      Using the vector matching operation, you could run the following query and calculate the memory usage by deployment in a cluster:

                                                                                                                                                                                                                                                                                                                                                                      sum by(cluster,namespace,owner_name) ((sysdig_container_memory_used_bytes * on(container_id) group_left(pod,namespace,cluster) kube_pod_container_info) * on(pod,namespace,cluster) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~".+",cluster=~".+",namespace=~".+"})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      To get the result, you need to write a query to perform a join (vector match) of various metrics, usually in the following order:

• Grab a container-level metric that you need, for example a Prometheus metric or one of the Sysdig-provided metrics, such as sysdig_container_memory_used_bytes.

                                                                                                                                                                                                                                                                                                                                                                      • Perform a vector match on container ID with the metric kube_pod_container_info to get the pod metadata.

                                                                                                                                                                                                                                                                                                                                                                      • Perform a vector match on the pod, namespace, and cluster with the kube_pod_owner metric.
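The three steps above can be sketched outside PromQL to show what each join contributes. The following is a minimal Python illustration on made-up in-memory samples. The helper names join_group_left and sum_by, the sample values, and the pod names are assumptions for illustration only. It approximates a many-to-one match with group_left and does not reproduce every detail of the PromQL evaluation rules.

```python
# Toy series as (labels, value) pairs. All names and values are made up.
memory_used = [
    ({"container_id": "c1"}, 100.0),
    ({"container_id": "c2"}, 50.0),
    ({"container_id": "c3"}, 25.0),
]
pod_container_info = [
    ({"container_id": "c1", "pod": "web-1", "namespace": "prod", "cluster": "east"}, 1.0),
    ({"container_id": "c2", "pod": "web-2", "namespace": "prod", "cluster": "east"}, 1.0),
    ({"container_id": "c3", "pod": "agent-1", "namespace": "prod", "cluster": "east"}, 1.0),
]
pod_owner = [
    ({"pod": "web-1", "namespace": "prod", "cluster": "east",
      "owner_kind": "Deployment", "owner_name": "web"}, 1.0),
    ({"pod": "web-2", "namespace": "prod", "cluster": "east",
      "owner_kind": "Deployment", "owner_name": "web"}, 1.0),
    ({"pod": "agent-1", "namespace": "prod", "cluster": "east",
      "owner_kind": "DaemonSet", "owner_name": "agent"}, 1.0),
]


def join_group_left(left, right, on, copy):
    """Multiply left by right where the `on` labels agree (many-to-one),
    copying the `copy` labels from the right-hand side."""
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate series on the 'one' side for {key}")
        index[key] = (labels, value)
    joined = []
    for labels, value in left:
        match = index.get(tuple(labels.get(name, "") for name in on))
        if match is None:
            continue  # samples without a partner are dropped
        right_labels, right_value = match
        merged = dict(labels)
        merged.update({name: right_labels[name] for name in copy if name in right_labels})
        joined.append((merged, value * right_value))
    return joined


def sum_by(samples, by):
    """Sum values per combination of the `by` labels."""
    totals = {}
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        totals[key] = totals.get(key, 0.0) + value
    return totals


# Step 1 is the container-level metric itself (memory_used).
# Step 2: attach pod metadata by matching on container_id.
with_pod = join_group_left(memory_used, pod_container_info,
                           on=["container_id"], copy=["pod", "namespace", "cluster"])
# Step 3: attach the owner name, keeping only pods owned by a Deployment.
deployments = [s for s in pod_owner if s[0]["owner_kind"] == "Deployment"]
with_owner = join_group_left(with_pod, deployments,
                             on=["pod", "namespace", "cluster"], copy=["owner_name"])
usage = sum_by(with_owner, ["cluster", "namespace", "owner_name"])
```

In this toy data, the container that belongs to the DaemonSet pod drops out at step 3 because it has no matching Deployment owner, which is the same effect the owner_kind="Deployment" selector has in the query.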

                                                                                                                                                                                                                                                                                                                                                                      In the case of Sysdig’s extended label set for PromQL, all the metrics inherit the metadata, so that necessary container, host, and Kubernetes metadata are set on all the metrics. This simplifies the query so you can build and run it quickly.

For example, the query above can be simplified as follows:

                                                                                                                                                                                                                                                                                                                                                                      sum by (kube_cluster_name,kube_namespace_name,kube_deployment_name) (sysdig_container_memory_used_bytes{kube_cluster_name!="",kube_namespace_name!="",kube_deployment_name!=""})
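To make the effect of the enriched labels concrete, here is a small Python sketch of what the simplified query computes once every sample already carries its Kubernetes metadata. The sample data and the helper name sum_by_nonempty are hypothetical. The sketch only mirrors the label!="" selectors and the sum by grouping, and is not a description of how Sysdig evaluates the query.

```python
# Hypothetical enriched samples as (labels, bytes) pairs. Values are made up.
samples = [
    ({"kube_cluster_name": "east", "kube_namespace_name": "prod",
      "kube_deployment_name": "web"}, 100.0),
    ({"kube_cluster_name": "east", "kube_namespace_name": "prod",
      "kube_deployment_name": "web"}, 50.0),
    # A container that is not owned by a Deployment has no kube_deployment_name.
    ({"kube_cluster_name": "east", "kube_namespace_name": "kube-system",
      "kube_daemonset_name": "agent"}, 25.0),
]

GROUP_BY = ("kube_cluster_name", "kube_namespace_name", "kube_deployment_name")


def sum_by_nonempty(samples, group_by):
    """Sum values per label combination, skipping samples where any grouping
    label is empty. In PromQL a missing label behaves like an empty one, so
    this mirrors the label!="" matchers in the query."""
    totals = {}
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in group_by)
        if "" in key:
            continue
        totals[key] = totals.get(key, 0.0) + value
    return totals


memory_by_deployment = sum_by_nonempty(samples, GROUP_BY)
```

No join is needed here: the filter and the grouping both operate on labels that are already present on the metric.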
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      The advantages of using a simplified query are:

• Complex vector matching operations (the group_left and group_right modifiers) are no longer required. All the labels are already available on each metric, so any filtering can be performed directly on the metric itself.

• The metrics now carry a large number of labels. You can use PromQL Explorer to navigate this rich metadata.

• The metadata is distinguishable from user-defined labels. For example, Kubernetes metadata labels start with kube_, so cluster is replaced with kube_cluster_name.

• You can create a dashboard panel or an alert from the PromQL query you run in the PromQL Query Explorer.

• You can filter data by applying the comparison operators to the label values given in the table.
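Filtering on label values uses the four PromQL label matchers (=, !=, =~, !~). The sketch below imitates them in Python to show two behaviors that matter for the queries in this section: regular expression matchers are fully anchored, and a label that is absent behaves like an empty string. The function name matches and the label values are hypothetical. Python's re module is used as a stand-in for the RE2 syntax that Prometheus uses, so some patterns may behave differently.

```python
import re


def matches(labels, name, op, pattern):
    """Evaluate one label matcher against a label set."""
    value = labels.get(name, "")  # an absent label is treated as empty
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None  # anchored at both ends
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op}")


labels = {"kube_cluster_name": "prod-east", "kube_namespace_name": "default"}

exact = matches(labels, "kube_cluster_name", "=", "prod-east")
partial = matches(labels, "kube_cluster_name", "=~", "prod")      # not a full match
prefixed = matches(labels, "kube_cluster_name", "=~", "prod-.+")
has_deployment = matches(labels, "kube_deployment_name", "=~", ".+")
any_deployment = matches(labels, "kube_deployment_name", "=~", ".*")
```

This is why a selector such as kube_deployment_name=~".+" keeps only series that actually carry the label, while =~".*" also accepts series without it.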

                                                                                                                                                                                                                                                                                                                                                                      Examples for Simplifying Queries

The following examples show how the extended label set simplifies complex queries.

                                                                                                                                                                                                                                                                                                                                                                      Memory Usage in a Kubernetes Cluster

Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      avg by (agent_tag_cluster) ((sysdig_host_memory_used_bytes/sysdig_host_memory_total_bytes) * on(host,agent_tag_cluster) sysdig_host_info{agent_tag_cluster=~".+"}) * 100
                                                                                                                                                                                                                                                                                                                                                                      

Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      avg by (agent_tag_cluster) (sysdig_host_memory_used_bytes/sysdig_host_memory_total_bytes) * 100
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      CPU Usage in Containers

                                                                                                                                                                                                                                                                                                                                                                      Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by (cluster,namespace)(sysdig_container_cpu_cores_used * on (container_id) group_left(cluster,pod,namespace) kube_pod_container_info{cluster=~".+"})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by (kube_cluster_name,kube_namespace_name)(sysdig_container_cpu_cores_used{kube_cluster_name=~".+"})
                                                                                                                                                                                                                                                                                                                                                                      

Memory Usage in a DaemonSet

                                                                                                                                                                                                                                                                                                                                                                      Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by(cluster,namespace,owner_name) (sum by(pod) (label_replace(sysdig_container_memory_used_bytes * on(container_id,host_mac) group_left(label_io_kubernetes_pod_namespace,label_io_kubernetes_pod_name,label_io_kubernetes_container_name) sysdig_container_info{label_io_kubernetes_pod_namespace=~".*",cluster=~".*"},"pod","$1","label_io_kubernetes_pod_name","(.*)"))  * on(pod) group_right sum by(cluster,namespace,owner_name,pod) (kube_pod_owner{owner_kind=~"DaemonSet",owner_name=~".*",cluster=~".*",namespace=~".*"}))
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by(kube_cluster_name,kube_namespace_name,kube_daemonset_name) (sysdig_container_memory_used_bytes{kube_daemonset_name=~".*",kube_cluster_name=~".*",kube_namespace_name=~".*"})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Pod Restarts in a Kubernetes Cluster

                                                                                                                                                                                                                                                                                                                                                                      Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by(cluster,namespace,owner_name)(changes(kube_pod_status_ready{condition="true",cluster=~$cluster,namespace=~$namespace}[$__interval]) * on(cluster,namespace,pod) group_left(owner_name) kube_pod_owner{owner_kind="Deployment",owner_name=~".+",cluster=~$cluster,namespace=~$namespace})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      sum by (kube_cluster_name,kube_namespace_name,kube_deployment_name)(changes(kube_pod_status_ready{condition="true",kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_deployment_name=~".+"}[$__interval]))
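Both queries rely on the PromQL changes() function, which returns how many times a series changed value within the selected range. As a rough illustration, the Python sketch below counts value changes in a list of samples. The function is a simplified stand-in and the readiness samples are made up.

```python
def changes(values):
    """Count how many times consecutive samples differ."""
    return sum(1 for previous, current in zip(values, values[1:]) if previous != current)


# Made-up readiness samples for one pod over an interval: 1 = ready, 0 = not ready.
ready_samples = [1, 1, 0, 0, 1, 1]
flips = changes(ready_samples)
```

Here the pod leaves the ready state once and returns once, so two changes are counted. The queries then sum these counts per cluster, namespace, and deployment.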
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Containers per Image

                                                                                                                                                                                                                                                                                                                                                                      Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      count by (owner_name,image,cluster,namespace)((sysdig_container_info{cluster=~$cluster,namespace=~$namespace})  * on(pod,namespace,cluster) group_left(owner_name) max by (pod,namespace,cluster,owner_name)(kube_pod_owner{owner_kind="Deployment",owner_name=~".+"}))
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      count by (kube_deployment_name,image,kube_cluster_name,kube_namespace_name)(sysdig_container_info{kube_deployment_name=~".+",kube_cluster_name=~$cluster,kube_namespace_name=~$namespace})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Average TCP Queue per Node

                                                                                                                                                                                                                                                                                                                                                                      Query with the core label set:

                                                                                                                                                                                                                                                                                                                                                                      avg by (agent_tag_cluster,host)( sysdig_host_net_tcp_queue_len * on (host_mac) group_left(agent_tag_cluster,host) sysdig_host_info{agent_tag_cluster=~$cluster,host=~".+"})
                                                                                                                                                                                                                                                                                                                                                                      

                                                                                                                                                                                                                                                                                                                                                                      Simplified query with the extended label set:

                                                                                                                                                                                                                                                                                                                                                                      avg by (agent_tag_cluster,host_hostname) (sysdig_host_net_tcp_queue_len{agent_tag_cluster =~ $cluster})
                                                                                                                                                                                                                                                                                                                                                                      

10.2 - Agent

                                                                                                                                                                                                                                                                                                                                                                      sysdig_agent_info

| Prometheus ID | sysdig_agent_info |
| Legacy ID | info |
| Metric Type | gauge |
| Unit | number |
| Description | The metric always has the value 1. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_agent_timeseries_count_appcheck

| Prometheus ID | sysdig_agent_timeseries_count_appcheck |
| Legacy ID | metricCount.appCheck |
| Metric Type | gauge |
| Unit | number |
| Description | The total number of time series received from appcheck integrations. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_agent_timeseries_count_jmx

| Prometheus ID | sysdig_agent_timeseries_count_jmx |
| Legacy ID | metricCount.jmx |
| Metric Type | gauge |
| Unit | number |
| Description | The total number of time series received from JMX integrations. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_agent_timeseries_count_prometheus

| Prometheus ID | sysdig_agent_timeseries_count_prometheus |
| Legacy ID | metricCount.prometheus |
| Metric Type | gauge |
| Unit | number |
| Description | The total number of time series received from Prometheus integrations. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_agent_timeseries_count_statsd

| Prometheus ID | sysdig_agent_timeseries_count_statsd |
| Legacy ID | metricCount.statsd |
| Metric Type | gauge |
| Unit | number |
| Description | The total number of time series received from StatsD integrations. |
| Additional Notes | |

10.3 - Containers

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_count

| Prometheus ID | sysdig_container_count |
| Legacy ID | container.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of containers. |
| Additional Notes | This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) containers of a certain type in a certain group or node. Try segmenting by container.image, container.id, or container.name. See also: host.count. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cgroup_used_percent

| Prometheus ID | sysdig_container_cpu_cgroup_used_percent |
| Legacy ID | cpu.cgroup.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of a container’s cgroup limit that is actually used. This is the minimum usage for the underlying cgroup limits: cpuset.limit and quota.limit. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_cgroup_limit

| Prometheus ID | sysdig_container_cpu_cores_cgroup_limit |
| Legacy ID | cpu.cores.cgroup.limit |
| Metric Type | gauge |
| Unit | number |
| Description | The number of CPU cores assigned to a container. This is the minimum of the cgroup limits: cpuset.limit and quota.limit. |
| Additional Notes | |
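
The "minimum of the cgroup limits" rule can be illustrated with a minimal sketch. The function name `effective_core_limit` and its parameters are hypothetical and only mirror the two limits named in the description; this is not the agent's implementation.

```python
def effective_core_limit(cpuset_limit: float, quota_limit: float) -> float:
    """Return the tighter of the two CPU limits, expressed in cores.

    Both arguments are hypothetical inputs standing in for the
    cpuset.limit and quota.limit values named in the description.
    """
    return min(cpuset_limit, quota_limit)


# A container pinned to 4 CPUs by cpuset but capped at 1.5 cores by quota
# is effectively limited to 1.5 cores.
limit = effective_core_limit(cpuset_limit=4.0, quota_limit=1.5)
```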

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_quota_limit

| Prometheus ID | sysdig_container_cpu_cores_quota_limit |
| Legacy ID | cpu.cores.quota.limit |
| Metric Type | gauge |
| Unit | number |
| Description | The number of CPU cores assigned to a container. Technically, this is derived from the container’s cgroup quota and period, which is a way of creating a CPU limit for a container. |
| Additional Notes | |
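
As a rough sketch of how a quota and period translate into a core count: under the Linux CFS bandwidth controller, a container may run for `quota` microseconds in every `period` microseconds, so the ratio of the two gives the number of cores. The function name `cores_from_quota` is hypothetical, and treating a non-positive quota as "no limit" is an assumption of this sketch rather than documented agent behavior.

```python
from typing import Optional


def cores_from_quota(quota_us: int, period_us: int = 100_000) -> Optional[float]:
    """Convert a CFS quota/period pair (microseconds) into a core count.

    Returns None when no quota is set (assumed here to be any value <= 0).
    The default period of 100 ms matches the default mentioned for CPU quotas.
    """
    if quota_us <= 0:
        return None
    return quota_us / period_us


# 150 ms of CPU time allowed per 100 ms period is a limit of 1.5 cores.
limit = cores_from_quota(150_000)
```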

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_used

| Prometheus ID | sysdig_container_cpu_cores_used |
| Legacy ID | cpu.cores.used |
| Metric Type | gauge |
| Unit | number |
| Description | The CPU core usage of each container is obtained from cgroups, and is equal to the number of cores used by the container. For example, if a container uses two of an available four cores, the value of sysdig_container_cpu_cores_used will be two. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_cores_used_percent

| Prometheus ID | sysdig_container_cpu_cores_used_percent |
| Legacy ID | cpu.cores.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The CPU core usage percent for each container is obtained from cgroups, and is equal to the number of cores used multiplied by 100. For example, if a container uses three cores, the value of sysdig_container_cpu_cores_used_percent would be 300%. |
| Additional Notes | |
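
The relationship between cores used and this percentage is a simple scaling, sketched below. The helper name `cores_used_percent` is hypothetical and exists only to make the arithmetic explicit.

```python
def cores_used_percent(cores_used: float) -> float:
    """Scale a cores-used value to a percentage where one full core is 100%."""
    return cores_used * 100.0


# Three cores in use corresponds to 300%, as in the description's example.
percent = cores_used_percent(3.0)
```

Because the value is not normalized by the number of cores on the host, it can exceed 100% whenever a container uses more than one core.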

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_quota_used_percent

| Prometheus ID | sysdig_container_cpu_quota_used_percent |
| Legacy ID | cpu.quota.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of a container’s CPU Quota that is actually used. CPU Quotas are a common way of creating a CPU limit for a container. CPU Quotas are based on a percentage of time: a container can only spend its quota of time on CPU cycles across a given time period (the default period is 100ms). Note that, unlike CPU Shares, CPU Quota is a hard limit on the amount of CPU the container can use, so this metric, CPU Quota %, should not exceed 100%. |
| Additional Notes | |
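
A minimal sketch of the quota arithmetic, assuming the CPU time consumed in a period and the quota for that period are both known in microseconds. The name `quota_used_percent` and its parameters are hypothetical.

```python
def quota_used_percent(cpu_time_used_us: int, quota_us: int) -> float:
    """Return the share of the CPU quota consumed in one period, as a percentage."""
    if quota_us <= 0:
        raise ValueError("quota_us must be positive")
    return cpu_time_used_us / quota_us * 100.0


# A container that ran for 50 ms against a 100 ms quota used half of its quota.
percent = quota_used_percent(cpu_time_used_us=50_000, quota_us=100_000)
```

Since the kernel throttles a container once its quota for the period is exhausted, the used time should not exceed the quota, which is why this percentage is expected to stay at or below 100%.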

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_shares_count

| Prometheus ID | sysdig_container_cpu_shares_count |
| Legacy ID | cpu.shares.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of CPU shares assigned to a container (technically, the container’s cgroup). This is a common way of creating a CPU limit for a container. CPU Shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, then each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren’t being consumed by the container they were originally allocated to. |
| Additional Notes | |
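
The share ratio described above can be sketched as follows. The function `share_fractions` and the container names are hypothetical; the sketch only reproduces the "share count divided by total shares" arithmetic.

```python
from typing import Dict


def share_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Map each container to its fraction of CPU cycles under contention."""
    total = sum(shares.values())
    if shares and total <= 0:
        raise ValueError("total share count must be positive")
    return {name: count / total for name, count in shares.items()}


# Three containers with the default 1024 shares each get one third of the cycles.
fractions = share_fractions({"web": 1024, "db": 1024, "cache": 1024})
```

These fractions apply only when containers compete for CPU. An idle neighbor's cycles remain available to the others, which is why shares act as a weight rather than a hard limit.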

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_shares_used_percent

| Prometheus ID | sysdig_container_cpu_shares_used_percent |
| Legacy ID | cpu.shares.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of a container’s allocated CPU shares that are actually used. CPU Shares are a common way of creating a CPU limit for a container. CPU Shares represent a relative weight used by the kernel to distribute CPU cycles across different containers. The default value for a container is 1024. Each container receives its own allocation of CPU cycles, according to the ratio of its share count to the total number of shares claimed by all containers. For example, if you have three containers, each with 1024 shares, then each will receive 1/3 of the CPU cycles. Note that this is not a hard limit: a container can consume more than its allocation if the CPU has cycles that aren’t being consumed by the container they were originally allocated to, so this metric, CPU Shares %, can actually exceed 100%. |
| Additional Notes | |
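
To see how this percentage can exceed 100%, consider the sketch below. It assumes, purely for illustration, that a container's allocation is its share fraction multiplied by the number of host cores; the function name, its parameters, and that allocation formula are assumptions and may not match how the agent computes the metric.

```python
def shares_used_percent(
    cores_used: float, share_count: int, total_shares: int, host_cores: int
) -> float:
    """Return CPU used as a percentage of the share-based allocation.

    Assumption: allocation (in cores) = host_cores * share_count / total_shares.
    """
    allocated_cores = host_cores * share_count / total_shares
    return cores_used / allocated_cores * 100.0


# On a 6-core host with three containers at 1024 shares each, every container
# is allocated 2 cores. Using 3 cores while the neighbors are idle gives 150%.
percent = shares_used_percent(
    cores_used=3.0, share_count=1024, total_shares=3072, host_cores=6
)
```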

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_cpu_used_percent

| Prometheus ID | sysdig_container_cpu_used_percent |
| Legacy ID | cpu.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The CPU usage for each container is obtained from cgroups, and normalized by dividing by the number of cores to determine an overall percentage. For example, if a host has six cores and the container or its processes are assigned two cores, Sysdig will report CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and processes. |
| Additional Notes | |
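
The normalization in the example above amounts to dividing by the host's core count. A minimal sketch, with a hypothetical helper name:

```python
def normalized_cpu_percent(container_cores: float, host_cores: int) -> float:
    """Return container CPU as a percentage of all cores on the host."""
    if host_cores <= 0:
        raise ValueError("host_cores must be positive")
    return container_cores / host_cores * 100.0


# Two cores out of six on the host, as in the description's example.
percent = normalized_cpu_percent(container_cores=2.0, host_cores=6)
```

Unlike sysdig_container_cpu_cores_used_percent, this value is bounded by 100% because it is scaled by the total number of cores.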

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fd_used_percent

| Prometheus ID | sysdig_container_fd_used_percent |
| Legacy ID | fd.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of used file descriptors out of the maximum available. |
| Additional Notes | Usually, when a process reaches its FD limit, it stops operating properly and may crash. As a consequence, this is a metric you should monitor carefully or, better still, use for alerts. |
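
As an illustration of the percentage and of a simple alert condition built on it, here is a minimal sketch. The function names and the 90% threshold are hypothetical choices, not product defaults.

```python
def fd_used_percent(open_fds: int, max_fds: int) -> float:
    """Return open file descriptors as a percentage of the allowed maximum."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds * 100.0


def should_alert(open_fds: int, max_fds: int, threshold_percent: float = 90.0) -> bool:
    """Return True when FD usage is above the (hypothetical) alert threshold."""
    return fd_used_percent(open_fds, max_fds) > threshold_percent


# 950 of 1024 descriptors in use is roughly 92.8%, above a 90% threshold.
alerting = should_alert(open_fds=950, max_fds=1024)
```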

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_error_open_count

| Prometheus ID | sysdig_container_file_error_open_count |
| Legacy ID | file.error.open.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of errors in opening files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_error_total_count

| Prometheus ID | sysdig_container_file_error_total_count |
| Legacy ID | file.error.total.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of errors caused by file access. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_bytes

| Prometheus ID | sysdig_container_file_in_bytes |
| Legacy ID | file.bytes.in |
| Metric Type | counter |
| Unit | data |
| Description | The number of bytes read from files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_iops

| Prometheus ID | sysdig_container_file_in_iops |
| Legacy ID | file.iops.in |
| Metric Type | counter |
| Unit | number |
| Description | The number of file read operations per second. |
| Additional Notes | This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which usually interpolate this value from the number of bytes read from and written to the file system. |
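
The difference between counting requests and interpolating from byte counts can be sketched as below. Both function names are hypothetical, and the 4096-byte block size used for the byte-based estimate is an assumption chosen only to show how the two approaches can disagree.

```python
def measured_iops(operation_count: int, interval_seconds: float) -> float:
    """IOPS from the actual number of I/O requests observed in an interval."""
    return operation_count / interval_seconds


def estimated_iops_from_bytes(
    byte_count: int, interval_seconds: float, assumed_block_size: int = 4096
) -> float:
    """IOPS inferred from bytes transferred, assuming a fixed block size."""
    return byte_count / assumed_block_size / interval_seconds


# Ten small reads of 100 bytes each in one second: counting requests gives
# 10 IOPS, while a byte-based estimate gives well under 1.
measured = measured_iops(operation_count=10, interval_seconds=1.0)
estimated = estimated_iops_from_bytes(byte_count=1000, interval_seconds=1.0)
```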

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_in_time

| Prometheus ID | sysdig_container_file_in_time |
| Legacy ID | file.time.in |
| Metric Type | counter |
| Unit | time |
| Description | The time spent reading files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_open_count

| Prometheus ID | sysdig_container_file_open_count |
| Legacy ID | file.open.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of times a file has been opened. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_bytes

| Prometheus ID | sysdig_container_file_out_bytes |
| Legacy ID | file.bytes.out |
| Metric Type | counter |
| Unit | data |
| Description | The number of bytes written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_iops

| Prometheus ID | sysdig_container_file_out_iops |
| Legacy ID | file.iops.out |
| Metric Type | counter |
| Unit | number |
| Description | The number of file write operations per second. |
| Additional Notes | This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which usually interpolate this value from the number of bytes read from and written to the file system. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_out_time

| Prometheus ID | sysdig_container_file_out_time |
| Legacy ID | file.time.out |
| Metric Type | counter |
| Unit | time |
| Description | The time spent writing files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_total_bytes

| Prometheus ID | sysdig_container_file_total_bytes |
| Legacy ID | file.bytes.total |
| Metric Type | counter |
| Unit | data |
| Description | The number of bytes read from and written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_total_iops

| Prometheus ID | sysdig_container_file_total_iops |
| Legacy ID | file.iops.total |
| Metric Type | counter |
| Unit | number |
| Description | The number of read and write file operations per second. |
| Additional Notes | This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which usually interpolate this value from the number of bytes read from and written to the file system. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_file_total_time

| Prometheus ID | sysdig_container_file_total_time |
| Legacy ID | file.time.total |
| Metric Type | counter |
| Unit | time |
| Description | The time spent in file I/O. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_free_bytes

| Prometheus ID | sysdig_container_fs_free_bytes |
| Legacy ID | fs.bytes.free |
| Metric Type | gauge |
| Unit | data |
| Description | The available space in the filesystem. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_free_percent

| Prometheus ID | sysdig_container_fs_free_percent |
| Legacy ID | fs.free.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of free space in the filesystem. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_inodes_total_count

| Prometheus ID | sysdig_container_fs_inodes_total_count |
| Legacy ID | fs.inodes.total.count |
| Metric Type | gauge |
| Unit | number |
| Description | The total number of inodes in the filesystem. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_inodes_used_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_inodes_used_count | |Legacy ID |fs.inodes.used.count | |Metric Type |gauge | |Unit |number | |Description |The number of inodes used in the filesystem.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_inodes_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_inodes_used_percent | |Legacy ID |fs.inodes.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of inodes usage in the filesystem.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_largest_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_largest_used_percent | |Legacy ID |fs.largest.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of the largest filesystem in use.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_root_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_root_used_percent | |Legacy ID |fs.root.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of the root filesystem in use in the container.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_total_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_total_bytes | |Legacy ID |fs.bytes.total | |Metric Type |gauge | |Unit |data | |Description |The size of container filesystem.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_used_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_used_bytes | |Legacy ID |fs.bytes.used | |Metric Type |gauge | |Unit |data | |Description |The used space in the container filesystem.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_fs_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_fs_used_percent | |Legacy ID |fs.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of the sum of all filesystems in use in the container.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_info

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_info | |Legacy ID |info | |Metric Type |gauge | |Unit |number | |Description |The info metrics will always have the value of 1.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_memory_limit_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_memory_limit_bytes | |Legacy ID |memory.limit.bytes | |Metric Type |gauge | |Unit |data | |Description |The memory limit in bytes assigned to a container.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_memory_limit_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_memory_limit_used_percent | |Legacy ID |memory.limit.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of memory limit used by a container.| |Addional Notes| |
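
As a rough illustration of how this percentage relates to the two byte-valued memory metrics, the sketch below divides a used-bytes sample by a limit-bytes sample. It assumes the percentage is simply used bytes over limit bytes; the function name and the handling of a missing limit are hypothetical and not taken from the Sysdig agent.

```python
from typing import Optional


def memory_limit_used_percent(used_bytes: float, limit_bytes: float) -> Optional[float]:
    """Return used memory as a percentage of the limit.

    Assumption: the percentage is used / limit * 100. Returns None when no
    positive limit is available, since the ratio is undefined in that case.
    """
    if limit_bytes <= 0:
        return None
    return 100.0 * used_bytes / limit_bytes


# Example: 256 MiB used against a 1 GiB limit.
print(memory_limit_used_percent(256 * 1024**2, 1024**3))
```

Returning `None` rather than 0 for a missing limit keeps "no limit configured" distinguishable from "no memory in use".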

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_memory_used_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_memory_used_bytes | |Legacy ID |memory.bytes.used | |Metric Type |gauge | |Unit |data | |Description |The amount of physical memory currently in use. | |Addional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_memory_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_memory_used_percent | |Legacy ID |memory.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of physical memory in use. | |Addional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_memory_virtual_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_memory_virtual_bytes | |Legacy ID |memory.bytes.virtual | |Metric Type |gauge | |Unit |data | |Description |The virtual memory size of the process, in bytes. This value is obtained from Sysdig events.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_connection_in_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_connection_in_count | |Legacy ID |net.connection.count.in | |Metric Type |counter | |Unit |number | |Description |The number of currently established client (inbound) connections. | |Addional Notes|This metric is especially useful when segmented by protocol, port or process.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_connection_out_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_connection_out_count | |Legacy ID |net.connection.count.out | |Metric Type |counter | |Unit |number | |Description |The number of currently established server (outbound) connections. | |Addional Notes|This metric is especially useful when segmented by protocol, port or process.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_connection_total_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_connection_total_count | |Legacy ID |net.connection.count.total | |Metric Type |counter | |Unit |number | |Description |The number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.| |Addional Notes|This metric is especially useful when segmented by protocol, port or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_error_count | |Legacy ID |net.error.count | |Metric Type |counter | |Unit |number | |Description |The number of network errors. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_error_count | |Legacy ID |net.http.error.count | |Metric Type |counter | |Unit |number | |Description |The number of failed HTTP requests as counted from 4xx/5xx status codes.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_request_count| |Legacy ID |net.http.request.count | |Metric Type |counter | |Unit |number | |Description |The count of HTTP requests. | |Addional Notes| |
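
The HTTP error and request counts are often read together as an error ratio. The following is a minimal sketch of that arithmetic, assuming both values were sampled over the same scope and time window; the function name is hypothetical and the calculation is not something the metrics above define.

```python
def http_error_ratio(error_count: int, request_count: int) -> float:
    """Return the fraction of HTTP requests that failed (4xx/5xx).

    Assumption: both counts cover the same scope and time window.
    Returns 0.0 when there were no requests, to avoid dividing by zero.
    """
    if request_count <= 0:
        return 0.0
    return error_count / request_count


# Example: 5 failed requests out of 200.
print(f"{http_error_ratio(5, 200):.1%}")
```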

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_request_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_request_time | |Legacy ID |net.http.request.time | |Metric Type |counter | |Unit |time | |Description |The average time taken for HTTP requests.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_statuscode_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_statuscode_error_count| |Legacy ID |net.http.statuscode.error.count | |Metric Type |counter | |Unit |number | |Description |The number of HTTP error codes returned. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_statuscode_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_statuscode_request_count| |Legacy ID |net.http.statuscode.request.count | |Metric Type |counter | |Unit |number | |Description |The number of HTTP status codes requests. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_url_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_url_error_count| |Legacy ID |net.http.url.error.count | |Metric Type |counter | |Unit |number | |Description | | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_url_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_url_request_count| |Legacy ID |net.http.url.request.count | |Metric Type |counter | |Unit |number | |Description |The number of HTTP URLs requests. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_http_url_request_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_http_url_request_time| |Legacy ID |net.http.url.request.time | |Metric Type |counter | |Unit |time | |Description |The time taken for requesting HTTP URLs. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_in_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_in_bytes | |Legacy ID |net.bytes.in | |Metric Type |counter | |Unit |data | |Description |The number of inbound network bytes. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_mongodb_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_mongodb_error_count| |Legacy ID |net.mongodb.error.count | |Metric Type |counter | |Unit |number | |Description |The number of Failed MongoDB requests. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_mongodb_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_mongodb_request_count| |Legacy ID |net.mongodb.request.count | |Metric Type |counter | |Unit |number | |Description |The total number of MongoDB requests. | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_out_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_out_bytes | |Legacy ID |net.bytes.out | |Metric Type |counter | |Unit |data | |Description |The number of outbound network bytes. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_request_count | |Legacy ID |net.request.count | |Metric Type |counter | |Unit |number | |Description |The total number of network requests. Note, this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_in_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_container_net_request_in_count | |Legacy ID |net.request.count.in | |Metric Type |counter | |Unit |number | |Description |The number of inbound network requests.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_in_time

|Prometheus ID |sysdig_container_net_request_in_time |
|Legacy ID |net.request.time.in |
|Metric Type |counter |
|Unit |time |
|Description |The average time to serve an inbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_out_count

|Prometheus ID |sysdig_container_net_request_out_count |
|Legacy ID |net.request.count.out |
|Metric Type |counter |
|Unit |number |
|Description |The number of outbound network requests. |
|Additional Notes | |
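The description of `sysdig_container_net_request_count` notes that the total can exceed the sum of inbound and outbound requests because it also counts requests over internal connections. As a rough sketch, the remainder can be estimated by subtraction. The function name and the sample values below are hypothetical, and the sketch assumes the entire difference is attributable to internal connections.

```python
def internal_request_estimate(total: int, inbound: int, outbound: int) -> int:
    """Estimate requests over internal connections.

    Assumption: whatever part of the total is not inbound or outbound
    was made over internal connections. The result is clamped at zero
    in case the three values were sampled at slightly different times.
    """
    return max(total - (inbound + outbound), 0)


# Hypothetical values for net_request_count, _in_count, and _out_count.
print(internal_request_estimate(total=120, inbound=70, outbound=40))
```

A result of zero means the total is fully explained by inbound and outbound requests.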

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_out_time

|Prometheus ID |sysdig_container_net_request_out_time |
|Legacy ID |net.request.time.out |
|Metric Type |counter |
|Unit |time |
|Description |The average time spent waiting for an outbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_request_time

|Prometheus ID |sysdig_container_net_request_time |
|Legacy ID |net.request.time |
|Metric Type |counter |
|Unit |time |
|Description |The average time to serve a network request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_server_connection_in_count

|Prometheus ID |sysdig_container_net_server_connection_in_count |
|Legacy ID |net.server.connection.count.in |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_server_in_bytes

|Prometheus ID |sysdig_container_net_server_in_bytes |
|Legacy ID |net.server.bytes.in |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_server_out_bytes

|Prometheus ID |sysdig_container_net_server_out_bytes |
|Legacy ID |net.server.bytes.out |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_server_total_bytes

|Prometheus ID |sysdig_container_net_server_total_bytes |
|Legacy ID |net.server.bytes.total |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_error_count

|Prometheus ID |sysdig_container_net_sql_error_count |
|Legacy ID |net.sql.error.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of failed SQL requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_query_error_count

|Prometheus ID |sysdig_container_net_sql_query_error_count |
|Legacy ID |net.sql.query.error.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_query_request_count

|Prometheus ID |sysdig_container_net_sql_query_request_count |
|Legacy ID |net.sql.query.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_query_request_time

|Prometheus ID |sysdig_container_net_sql_query_request_time |
|Legacy ID |net.sql.query.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_querytype_error_count

|Prometheus ID |sysdig_container_net_sql_querytype_error_count |
|Legacy ID |net.sql.querytype.error.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_querytype_request_count

|Prometheus ID |sysdig_container_net_sql_querytype_request_count |
|Legacy ID |net.sql.querytype.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_querytype_request_time

|Prometheus ID |sysdig_container_net_sql_querytype_request_time |
|Legacy ID |net.sql.querytype.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_request_count

|Prometheus ID |sysdig_container_net_sql_request_count |
|Legacy ID |net.sql.request.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of SQL requests. |
|Additional Notes | |
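The SQL error and request counters are often read together as a failure ratio. The following is a minimal sketch of that arithmetic, assuming both values were sampled over the same time window. The function name and the sample values are hypothetical and are not part of any Sysdig API.

```python
def sql_error_ratio(error_count: float, request_count: float) -> float:
    """Return the fraction of SQL requests that failed.

    error_count   -- a sample of sysdig_container_net_sql_error_count
    request_count -- a sample of sysdig_container_net_sql_request_count
    Returns 0.0 when no requests were seen, to avoid dividing by zero.
    """
    if request_count <= 0:
        return 0.0
    return error_count / request_count


# Hypothetical sample: 5 failed requests out of 200.
ratio = sql_error_ratio(error_count=5, request_count=200)
print(f"{ratio:.1%} of SQL requests failed")
```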

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_request_time

|Prometheus ID |sysdig_container_net_sql_request_time |
|Legacy ID |net.sql.request.time |
|Metric Type |counter |
|Unit |time |
|Description |The average time to complete an SQL request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_table_error_count

|Prometheus ID |sysdig_container_net_sql_table_error_count |
|Legacy ID |net.sql.table.error.count |
|Metric Type |counter |
|Unit |number |
|Description |The total number of SQL errors returned. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_table_request_count

|Prometheus ID |sysdig_container_net_sql_table_request_count |
|Legacy ID |net.sql.table.request.count |
|Metric Type |counter |
|Unit |number |
|Description |The total number of SQL table requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_sql_table_request_time

|Prometheus ID |sysdig_container_net_sql_table_request_time |
|Legacy ID |net.sql.table.request.time |
|Metric Type |counter |
|Unit |time |
|Description |The average time to serve an SQL table request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_tcp_queue_len

|Prometheus ID |sysdig_container_net_tcp_queue_len |
|Legacy ID |net.tcp.queue.len |
|Metric Type |counter |
|Unit |number |
|Description |The length of the TCP request queue. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_net_total_bytes

|Prometheus ID |sysdig_container_net_total_bytes |
|Legacy ID |net.bytes.total |
|Metric Type |counter |
|Unit |data |
|Description |The total number of network bytes, including inbound and outbound connections. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |
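The difference between the default total for a scope and a segmented view can be illustrated with a small aggregation. This is only a sketch of the idea, not how the Sysdig UI computes it. The sample structure, the label names `host` and `container`, and the values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each sample pairs a set of labels with a metric value (hypothetical layout).
Sample = Tuple[Dict[str, str], float]


def total_for_scope(samples: List[Sample]) -> float:
    """Default view: one total across everything in the selected scope."""
    return sum(value for _, value in samples)


def segment_by(samples: List[Sample], label: str) -> Dict[str, float]:
    """Segmented view: one total per distinct value of the given label."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "unknown")] += value
    return dict(totals)


# Hypothetical byte counts for three containers on two hosts.
samples: List[Sample] = [
    ({"host": "host-a", "container": "web"}, 100.0),
    ({"host": "host-a", "container": "db"}, 300.0),
    ({"host": "host-b", "container": "web"}, 200.0),
]

print(total_for_scope(samples))
print(segment_by(samples, "host"))
```

Segmenting by a different label, such as `container`, regroups the same samples without changing the overall total.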

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_proc_count

|Prometheus ID |sysdig_container_proc_count |
|Legacy ID |proc.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of processes on the host or container. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_swap_limit_bytes

|Prometheus ID |sysdig_container_swap_limit_bytes |
|Legacy ID |swap.limit.bytes |
|Metric Type |gauge |
|Unit |data |
|Description |The swap limit in bytes assigned to a container. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_swap_limit_used_percent

|Prometheus ID |sysdig_container_swap_limit_used_percent |
|Legacy ID |swap.limit.used.percent |
|Metric Type |gauge |
|Unit |percent |
|Description |The percentage of swap limit used by the container. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_syscall_count

|Prometheus ID |sysdig_container_syscall_count |
|Legacy ID |syscall.count |
|Metric Type |gauge |
|Unit |number |
|Description |The total number of syscalls seen. |
|Additional Notes |Syscalls are resource intensive. This metric tracks how many have been made by a given process or container. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_syscall_error_count

|Prometheus ID |sysdig_container_syscall_error_count |
|Legacy ID |host.error.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of system call errors. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on. To do so, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_thread_count

|Prometheus ID |sysdig_container_thread_count |
|Legacy ID |thread.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of threads running in a container. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_timeseries_count_appcheck

|Prometheus ID |sysdig_container_timeseries_count_appcheck |
|Legacy ID |metricCount.appCheck |
|Metric Type |gauge |
|Unit |number |
|Description |The number of appcheck custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_timeseries_count_jmx

|Prometheus ID |sysdig_container_timeseries_count_jmx |
|Legacy ID |metricCount.jmx |
|Metric Type |gauge |
|Unit |number |
|Description |The number of JMX custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_timeseries_count_prometheus

|Prometheus ID |sysdig_container_timeseries_count_prometheus |
|Legacy ID |metricCount.prometheus |
|Metric Type |gauge |
|Unit |number |
|Description |The number of Prometheus custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_timeseries_count_statsd

|Prometheus ID |sysdig_container_timeseries_count_statsd |
|Legacy ID |metricCount.statsd |
|Metric Type |gauge |
|Unit |number |
|Description |The number of StatsD custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_container_up

|Prometheus ID |sysdig_container_up |
|Legacy ID |uptime |
|Metric Type |gauge |
|Unit |number |
|Description |The percentage of time the selected entity was down during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down. |
|Additional Notes | |

10.4 - File

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_error_total_count

| Prometheus ID | sysdig_filestats_host_file_error_total_count |
|---|---|
| Legacy ID | file.error.total.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of errors caused by file access. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |
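The difference between the default scope-wide total and a segmented view can be sketched as follows. This is only an illustration of the aggregation described in the notes, not how Sysdig computes it internally; the sample data, the label names (`host`, `container`), and the helper functions `scope_total` and `segment_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples of sysdig_filestats_host_file_error_total_count,
# one per (host, container) pair inside the selected scope.
samples = [
    {"host": "web-1", "container": "nginx", "value": 3},
    {"host": "web-1", "container": "app", "value": 5},
    {"host": "web-2", "container": "nginx", "value": 2},
]


def scope_total(samples):
    """Default view: one total for the whole selected scope."""
    return sum(s["value"] for s in samples)


def segment_by(samples, label):
    """Segmented view: one total per distinct value of the given label."""
    groups = defaultdict(int)
    for s in samples:
        groups[s[label]] += s["value"]
    return dict(groups)


print(scope_total(samples))             # total for the whole group
print(segment_by(samples, "host"))      # totals per host
print(segment_by(samples, "container")) # totals per container
```

The same pattern applies to the other counter metrics in this section whose notes mention ‘Segment by’.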

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_in_bytes

| Prometheus ID | sysdig_filestats_host_file_in_bytes |
|---|---|
| Legacy ID | file.bytes.in |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes read from files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_open_count

| Prometheus ID | sysdig_filestats_host_file_open_count |
|---|---|
| Legacy ID | file.open.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of times the file has been opened. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_out_bytes

| Prometheus ID | sysdig_filestats_host_file_out_bytes |
|---|---|
| Legacy ID | file.bytes.out |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_total_bytes

| Prometheus ID | sysdig_filestats_host_file_total_bytes |
|---|---|
| Legacy ID | file.bytes.total |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes read from and written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_filestats_host_file_total_time

| Prometheus ID | sysdig_filestats_host_file_total_time |
|---|---|
| Legacy ID | file.time.total |
| Metric Type | counter |
| Unit | time |
| Description | Time spent in file I/O. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_free_bytes

| Prometheus ID | sysdig_fs_free_bytes |
|---|---|
| Legacy ID | fs.bytes.free |
| Metric Type | gauge |
| Unit | data |
| Description | Filesystem available space. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_free_percent

| Prometheus ID | sysdig_fs_free_percent |
|---|---|
| Legacy ID | fs.free.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of filesystem free space. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_inodes_total_count

| Prometheus ID | sysdig_fs_inodes_total_count |
|---|---|
| Legacy ID | fs.inodes.total.count |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_inodes_used_count

| Prometheus ID | sysdig_fs_inodes_used_count |
|---|---|
| Legacy ID | fs.inodes.used.count |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_inodes_used_percent

| Prometheus ID | sysdig_fs_inodes_used_percent |
|---|---|
| Legacy ID | fs.inodes.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_total_bytes

| Prometheus ID | sysdig_fs_total_bytes |
|---|---|
| Legacy ID | fs.bytes.total |
| Metric Type | gauge |
| Unit | data |
| Description | Filesystem size. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_used_bytes

| Prometheus ID | sysdig_fs_used_bytes |
|---|---|
| Legacy ID | fs.bytes.used |
| Metric Type | gauge |
| Unit | data |
| Description | Filesystem used space. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_fs_used_percent

| Prometheus ID | sysdig_fs_used_percent |
|---|---|
| Legacy ID | fs.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of the sum of all filesystems in use. |
| Additional Notes | |
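One plausible reading of "percentage of the sum of all filesystems in use" is total used bytes divided by total capacity across every filesystem in scope. The sketch below illustrates that reading only; it is an assumption, not a confirmed description of how the metric is computed, and the function name `fs_used_percent` and the dictionary keys are hypothetical.

```python
def fs_used_percent(filesystems):
    """Aggregate used percentage over several filesystems.

    Assumes the percentage is sum(used) / sum(total), so large
    filesystems weigh more than small ones.
    """
    total = sum(fs["total_bytes"] for fs in filesystems)
    used = sum(fs["used_bytes"] for fs in filesystems)
    if total == 0:
        return 0.0  # no capacity reported; avoid division by zero
    return 100.0 * used / total


# Hypothetical values, analogous to sysdig_fs_total_bytes and
# sysdig_fs_used_bytes for two mount points.
filesystems = [
    {"mount": "/", "total_bytes": 100, "used_bytes": 50},
    {"mount": "/data", "total_bytes": 300, "used_bytes": 30},
]

print(fs_used_percent(filesystems))  # 80 of 400 bytes used
```

Under this reading the result differs from a plain average of per-filesystem percentages (50% and 10% here), because the larger filesystem dominates.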

10.5 - Host

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_container_count

| Prometheus ID | sysdig_host_container_count |
|---|---|
| Legacy ID | container.count |
| Metric Type | gauge |
| Unit | number |
| Description | Number of containers. |
| Additional Notes | This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) containers of a certain type in a certain group or node. Try segmenting by container.image, container.id, or container.name. See also: host.count. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_container_start_count

| Prometheus ID | sysdig_host_container_start_count |
|---|---|
| Legacy ID | host.container.start.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_count

| Prometheus ID | sysdig_host_count |
|---|---|
| Legacy ID | host.count |
| Metric Type | gauge |
| Unit | number |
| Description | Number of hosts. |
| Additional Notes | This metric is well suited to dashboards and alerts. In particular, you can create alerts that notify you when you have too many (or too few) machines of a certain type in a certain group. Try segmenting by tag or hostname. See also: container.count. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_cores_used

| Prometheus ID | sysdig_host_cpu_cores_used |
|---|---|
| Legacy ID | cpu.cores.used |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_cores_used_percent

| Prometheus ID | sysdig_host_cpu_cores_used_percent |
|---|---|
| Legacy ID | cpu.cores.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_idle_percent

| Prometheus ID | sysdig_host_cpu_idle_percent |
|---|---|
| Legacy ID | cpu.idle.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_iowait_percent

| Prometheus ID | sysdig_host_cpu_iowait_percent |
|---|---|
| Legacy ID | cpu.iowait.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_nice_percent

| Prometheus ID | sysdig_host_cpu_nice_percent |
|---|---|
| Legacy ID | cpu.nice.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of CPU utilization that occurred while executing at the user level with nice priority. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_stolen_percent

| Prometheus ID | sysdig_host_cpu_stolen_percent |
|---|---|
| Legacy ID | cpu.stolen.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | CPU steal time measures the percentage of time that a virtual machine’s CPU is in a state of involuntary wait because the physical CPU is shared among virtual machines. In calculating steal time, the operating system kernel detects when it has work available but does not have access to the physical CPU to perform that work. |
| Additional Notes | If the percentage of steal time is consistently high, you may want to stop and restart the instance (since it will most likely start on different physical hardware) or upgrade to a virtual machine with more CPU power. Also see the metric ‘capacity total percent’ to see how steal time directly impacts the number of server requests that could not be handled. On AWS EC2, steal time does not depend on the activity of other virtual machine neighbours. EC2 is simply making sure your instance is not using more CPU cycles than paid for. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_system_percent

| Prometheus ID | sysdig_host_cpu_system_percent |
|---|---|
| Legacy ID | cpu.system.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of CPU utilization that occurred while executing at the system level (kernel). |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_used_percent

| Prometheus ID | sysdig_host_cpu_used_percent |
|---|---|
| Legacy ID | cpu.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The CPU usage for each container is obtained from cgroups and normalized by dividing by the number of cores to determine an overall percentage. For example, if the host has six cores and the container or processes are assigned two cores, Sysdig reports a CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and processes. |
| Additional Notes | |
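The normalization in the description can be written out as a short calculation. This is a minimal sketch of the arithmetic in the example only; the function name `normalized_cpu_percent` and its parameters are hypothetical, and the way Sysdig reads the underlying cgroup values is not shown.

```python
def normalized_cpu_percent(cores_used: float, host_cores: int) -> float:
    """Normalize CPU usage (in cores) by the number of cores on the host."""
    if host_cores <= 0:
        raise ValueError("host_cores must be a positive integer")
    return cores_used / host_cores * 100.0


# Example from the description: two cores in use on a six-core host.
print(round(normalized_cpu_percent(2, 6), 2))  # 33.33
```

With this normalization, a container that saturates every core on the host reports 100%, regardless of how many cores the host has.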

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpu_user_percent

| Prometheus ID | sysdig_host_cpu_user_percent |
|---|---|
| Legacy ID | cpu.user.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of CPU utilization that occurred while executing at the user level (application). |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_idle_percent

| Prometheus ID | sysdig_host_cpucore_idle_percent |
|---|---|
| Legacy ID | cpucore.idle.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_iowait_percent

| Prometheus ID | sysdig_host_cpucore_iowait_percent |
|---|---|
| Legacy ID | cpucore.iowait.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_nice_percent

| Prometheus ID | sysdig_host_cpucore_nice_percent |
| Legacy ID | cpucore.nice.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_stolen_percent

| Prometheus ID | sysdig_host_cpucore_stolen_percent |
| Legacy ID | cpucore.stolen.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_system_percent

| Prometheus ID | sysdig_host_cpucore_system_percent |
| Legacy ID | cpucore.system.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_used_percent

| Prometheus ID | sysdig_host_cpucore_used_percent |
| Legacy ID | cpucore.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_cpucore_user_percent

| Prometheus ID | sysdig_host_cpucore_user_percent |
| Legacy ID | cpucore.user.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fd_used_percent

| Prometheus ID | sysdig_host_fd_used_percent |
| Legacy ID | fd.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of used file descriptors out of the maximum available. |
| Additional Notes | A process that reaches its file descriptor (FD) limit usually stops operating properly and may crash. Monitor this metric carefully or, better, use it for alerts. |
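
As a rough illustration of what this percentage expresses, the sketch below compares a count of open descriptors against a limit for a single process. The function names and the 80% threshold are hypothetical, the `/proc/self/fd` lookup assumes a Linux host, and none of this is meant to describe how the agent computes the host-level metric.

```python
import os
import resource


def fd_used_percent(used: int, limit: int) -> float:
    """Return used/limit as a percentage; 0.0 when the limit is not positive."""
    if limit <= 0:
        return 0.0
    return 100.0 * used / limit


def current_process_fd_percent() -> float:
    """Best-effort reading for the current process (assumes Linux /proc)."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return 0.0  # no finite limit, so a percentage is not meaningful
    used = len(os.listdir("/proc/self/fd"))
    return fd_used_percent(used, soft_limit)


# Hypothetical alert threshold chosen for illustration, not a product default.
ALERT_THRESHOLD_PERCENT = 80.0


def should_alert(percent: float) -> bool:
    """True when usage is at or above the illustrative threshold."""
    return percent >= ALERT_THRESHOLD_PERCENT
```

With these definitions, 900 descriptors against a limit of 1024 is roughly 87.9%, which crosses the illustrative threshold.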

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_error_open_count

| Prometheus ID | sysdig_host_file_error_open_count |
| Legacy ID | file.error.open.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of errors encountered when opening files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_error_total_count

| Prometheus ID | sysdig_host_file_error_total_count |
| Legacy ID | file.error.total.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of errors caused by file access. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_in_bytes

| Prometheus ID | sysdig_host_file_in_bytes |
| Legacy ID | file.bytes.in |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes read from files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_in_iops

| Prometheus ID | sysdig_host_file_in_iops |
| Legacy ID | file.iops.in |
| Metric Type | counter |
| Unit | number |
| Description | Number of file read operations per second. |
| Additional Notes | This value is calculated by counting the actual read and write requests made by a process. It can therefore differ from what other tools show, which is usually interpolated from the number of bytes read from and written to the file system. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_in_time

| Prometheus ID | sysdig_host_file_in_time |
| Legacy ID | file.time.in |
| Metric Type | counter |
| Unit | time |
| Description | Time spent reading files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_open_count

| Prometheus ID | sysdig_host_file_open_count |
| Legacy ID | file.open.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of times the file has been opened. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_out_bytes

| Prometheus ID | sysdig_host_file_out_bytes |
| Legacy ID | file.bytes.out |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_out_iops

| Prometheus ID | sysdig_host_file_out_iops |
| Legacy ID | file.iops.out |
| Metric Type | counter |
| Unit | number |
| Description | Number of file write operations per second. |
| Additional Notes | This value is calculated by counting the actual read and write requests made by a process. It can therefore differ from what other tools show, which is usually interpolated from the number of bytes read from and written to the file system. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_out_time

| Prometheus ID | sysdig_host_file_out_time |
| Legacy ID | file.time.out |
| Metric Type | counter |
| Unit | time |
| Description | Time spent writing files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_total_bytes

| Prometheus ID | sysdig_host_file_total_bytes |
| Legacy ID | file.bytes.total |
| Metric Type | counter |
| Unit | data |
| Description | Number of bytes read from and written to files. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_total_iops

| Prometheus ID | sysdig_host_file_total_iops |
| Legacy ID | file.iops.total |
| Metric Type | counter |
| Unit | number |
| Description | Number of file read and write operations per second. |
| Additional Notes | This value is calculated by counting the actual read and write requests made by a process. It can therefore differ from what other tools show, which is usually interpolated from the number of bytes read from and written to the file system. |
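
The difference between counting requests and deriving a figure from byte totals can be sketched as follows. This is only an illustration: the function names are hypothetical, and dividing by a fixed 4096-byte block size is an assumed stand-in for whatever estimate another tool might use.

```python
def measured_iops(request_count: int, interval_seconds: float) -> float:
    """IOPS obtained by counting the read/write requests actually issued."""
    return request_count / interval_seconds


def estimated_iops(total_bytes: int, interval_seconds: float,
                   block_size: int = 4096) -> float:
    """IOPS inferred from byte totals, assuming one operation per block.

    The 4096-byte block size is an assumption made for this example.
    """
    return (total_bytes / block_size) / interval_seconds


# A process issuing 200 small requests of 64 bytes each within one second:
counted = measured_iops(200, 1.0)          # 200.0
inferred = estimated_iops(200 * 64, 1.0)   # 3.125
```

Many small requests yield a high counted rate but a low byte-derived estimate, which is one way the two kinds of figures can diverge.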

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_file_total_time

| Prometheus ID | sysdig_host_file_total_time |
| Legacy ID | file.time.total |
| Metric Type | counter |
| Unit | time |
| Description | Time spent in file I/O. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For example, if you apply it to a group of machines, you see the total for the whole group. To view the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_free_bytes

| Prometheus ID | sysdig_host_fs_free_bytes |
| Legacy ID | fs.bytes.free |
| Metric Type | gauge |
| Unit | data |
| Description | Available filesystem space. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_free_percent

| Prometheus ID | sysdig_host_fs_free_percent |
| Legacy ID | fs.free.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of filesystem space that is free. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_inodes_total_count

| Prometheus ID | sysdig_host_fs_inodes_total_count |
| Legacy ID | fs.inodes.total.count |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_inodes_used_count

| Prometheus ID | sysdig_host_fs_inodes_used_count |
| Legacy ID | fs.inodes.used.count |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_inodes_used_percent

| Prometheus ID | sysdig_host_fs_inodes_used_percent |
| Legacy ID | fs.inodes.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_largest_used_percent

| Prometheus ID | sysdig_host_fs_largest_used_percent |
| Legacy ID | fs.largest.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of the largest filesystem in use. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_root_used_percent

| Prometheus ID | sysdig_host_fs_root_used_percent |
| Legacy ID | fs.root.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of the root filesystem in use. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_total_bytes

| Prometheus ID | sysdig_host_fs_total_bytes |
| Legacy ID | fs.bytes.total |
| Metric Type | gauge |
| Unit | data |
| Description | Filesystem size. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_used_bytes

| Prometheus ID | sysdig_host_fs_used_bytes |
| Legacy ID | fs.bytes.used |
| Metric Type | gauge |
| Unit | data |
| Description | Used filesystem space. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_fs_used_percent

| Prometheus ID | sysdig_host_fs_used_percent |
| Legacy ID | fs.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Percentage of the combined space of all filesystems that is in use. |
| Additional Notes | |
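
The three usage percentages (all filesystems, largest filesystem, root filesystem) can be contrasted with a small sketch. It reflects one plausible reading of the descriptions, not the agent's implementation; the `Filesystem` type, the function names, and the sample sizes are all hypothetical.

```python
from typing import List, NamedTuple


class Filesystem(NamedTuple):
    mount: str
    total_bytes: int
    used_bytes: int


def percent(used: int, total: int) -> float:
    """Used space as a percentage of total; 0.0 for an empty total."""
    return 100.0 * used / total if total > 0 else 0.0


def combined_used_percent(filesystems: List[Filesystem]) -> float:
    """Used space summed over all filesystems, relative to their summed size."""
    total = sum(fs.total_bytes for fs in filesystems)
    used = sum(fs.used_bytes for fs in filesystems)
    return percent(used, total)


def largest_used_percent(filesystems: List[Filesystem]) -> float:
    """Usage of the filesystem with the greatest total size."""
    largest = max(filesystems, key=lambda fs: fs.total_bytes)
    return percent(largest.used_bytes, largest.total_bytes)


def root_used_percent(filesystems: List[Filesystem]) -> float:
    """Usage of the filesystem mounted at "/", or 0.0 if none is listed."""
    for fs in filesystems:
        if fs.mount == "/":
            return percent(fs.used_bytes, fs.total_bytes)
    return 0.0


# Made-up sample: a half-full root and a larger, mostly empty data volume.
SAMPLE = [Filesystem("/", 100, 50), Filesystem("/data", 300, 30)]
```

For the sample above, the combined figure is 20%, the largest filesystem is 10% used, and the root filesystem is 50% used, so the three values can differ widely on the same host.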

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_info

| Prometheus ID | sysdig_host_info |
| Legacy ID | info |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_15m

| Prometheus ID | sysdig_host_load_average_15m |
| Legacy ID | load.average.15m |
| Metric Type | gauge |
| Unit | number |
| Description | The 15-minute system load average: the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 15 minutes for all cores. The value should correspond to the third (and last) load average value displayed by the ‘uptime’ command. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_1m

| Prometheus ID | sysdig_host_load_average_1m |
| Legacy ID | load.average.1m |
| Metric Type | gauge |
| Unit | number |
| Description | The 1-minute system load average: the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 1 minute for all cores. The value should correspond to the first of the three load average values displayed by the ‘uptime’ command. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_5m

| Prometheus ID | sysdig_host_load_average_5m |
| Legacy ID | load.average.5m |
| Metric Type | gauge |
| Unit | number |
| Description | The 5-minute system load average is the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 5 minutes across all cores. The value should match the second of the three load average values displayed by the ‘uptime’ command. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_percpu_15m

| Prometheus ID | sysdig_host_load_average_percpu_15m |
| Legacy ID | load.average.percpu.15m |
| Metric Type | gauge |
| Unit | number |
| Description | The 15-minute per-CPU load average is the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 15 minutes and divided by the number of system CPUs. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_percpu_1m

| Prometheus ID | sysdig_host_load_average_percpu_1m |
| Legacy ID | load.average.percpu.1m |
| Metric Type | gauge |
| Unit | number |
| Description | The 1-minute per-CPU load average is the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 1 minute and divided by the number of system CPUs. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_load_average_percpu_5m

| Prometheus ID | sysdig_host_load_average_percpu_5m |
| Legacy ID | load.average.percpu.5m |
| Metric Type | gauge |
| Unit | number |
| Description | The 5-minute per-CPU load average is the average number of jobs that are (1) in the CPU run queue or (2) waiting for disk I/O, averaged over 5 minutes and divided by the number of system CPUs. |
| Additional Notes | |
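
The per-CPU load average metrics are described as the system load average divided by the number of CPUs. The following is a minimal sketch of that arithmetic only, not the agent's implementation. The helper names `per_cpu_load` and `current_per_cpu_load` are hypothetical, and reading the values through Python's `os.getloadavg()` and `os.cpu_count()` is an assumption made for illustration.

```python
import os


def per_cpu_load(load_averages, cpu_count):
    """Divide each load average (1m, 5m, 15m) by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return tuple(value / cpu_count for value in load_averages)


def current_per_cpu_load():
    """Per-CPU load for this host (Unix only).

    os.getloadavg() returns the same three values that `uptime` prints.
    """
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return per_cpu_load(os.getloadavg(), cpus)
```

For example, a host with 4 CPUs and load averages of 1.0, 2.0, and 4.0 would report per-CPU values of 0.25, 0.5, and 1.0.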

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_available_bytes

| Prometheus ID | sysdig_host_memory_available_bytes |
| Legacy ID | memory.bytes.available |
| Metric Type | gauge |
| Unit | data |
| Description | The available memory for a host, obtained from /proc/meminfo. For environments using Linux kernel version 3.12 and later, the available memory is read from the MemAvailable field in /proc/meminfo. For environments using earlier kernel versions, the formula is MemFree + Cached + Buffers. |
| Additional Notes | |
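
The description above gives two ways to derive available memory from /proc/meminfo: use the kernel-provided available-memory field (named `MemAvailable` in /proc/meminfo) when it exists, otherwise fall back to MemFree + Cached + Buffers. A rough sketch of that selection logic follows. The function names `parse_meminfo` and `available_bytes` are hypothetical, and the sketch keys off the presence of the field rather than the kernel version, which is an assumption.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> bytes."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        # Most fields are reported in kB; convert those to bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def available_bytes(fields):
    """Prefer MemAvailable; otherwise use MemFree + Cached + Buffers."""
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    return fields["MemFree"] + fields["Cached"] + fields["Buffers"]
```

On a Linux host, the input could be obtained with `open("/proc/meminfo").read()`.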

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_swap_available_bytes

| Prometheus ID | sysdig_host_memory_swap_available_bytes |
| Legacy ID | memory.swap.bytes.available |
| Metric Type | gauge |
| Unit | data |
| Description | Available amount of swap memory. |
| Additional Notes | Sum of free and cached swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_swap_total_bytes

| Prometheus ID | sysdig_host_memory_swap_total_bytes |
| Legacy ID | memory.swap.bytes.total |
| Metric Type | gauge |
| Unit | data |
| Description | Total amount of swap memory. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_swap_used_bytes

| Prometheus ID | sysdig_host_memory_swap_used_bytes |
| Legacy ID | memory.swap.bytes.used |
| Metric Type | gauge |
| Unit | data |
| Description | Used amount of swap memory. |
| Additional Notes | The amount of used swap memory is calculated by subtracting available swap memory from total swap memory. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_swap_used_percent

| Prometheus ID | sysdig_host_memory_swap_used_percent |
| Legacy ID | memory.swap.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | Used percent of swap memory. |
| Additional Notes | The percentage of used swap memory is calculated as the ratio of used to total swap memory, expressed as a percentage. By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |
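
The swap metrics above are related by simple arithmetic: used swap is total minus available, and the used percentage is used divided by total. A small sketch of those two relationships follows. The function names are hypothetical, and returning 0% for a host with no swap configured is an assumption, since the source does not say how that case is reported.

```python
def swap_used_bytes(total_bytes, available_bytes):
    """Used swap = total swap - available swap."""
    return total_bytes - available_bytes


def swap_used_percent(total_bytes, available_bytes):
    """Used swap as a percentage of total swap."""
    if total_bytes == 0:
        # Assumption: report 0% when no swap is configured.
        return 0.0
    used = swap_used_bytes(total_bytes, available_bytes)
    return 100.0 * used / total_bytes
```

With 2048 bytes of total swap and 512 bytes available, used swap is 1536 bytes, or 75%.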

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_total_bytes

| Prometheus ID | sysdig_host_memory_total_bytes |
| Legacy ID | memory.bytes.total |
| Metric Type | gauge |
| Unit | data |
| Description | The total memory of a host, in bytes. This value is obtained from /proc. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_used_bytes

| Prometheus ID | sysdig_host_memory_used_bytes |
| Legacy ID | memory.bytes.used |
| Metric Type | gauge |
| Unit | data |
| Description | The amount of physical memory currently in use. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_used_percent

| Prometheus ID | sysdig_host_memory_used_percent |
| Legacy ID | memory.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The percentage of physical memory in use. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_memory_virtual_bytes

| Prometheus ID | sysdig_host_memory_virtual_bytes |
| Legacy ID | memory.bytes.virtual |
| Metric Type | gauge |
| Unit | data |
| Description | The virtual memory size of the process, in bytes. This value is obtained from Sysdig events. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_connection_in_count

| Prometheus ID | sysdig_host_net_connection_in_count |
| Legacy ID | net.connection.count.in |
| Metric Type | counter |
| Unit | number |
| Description | Number of currently established client (inbound) connections. |
| Additional Notes | This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_connection_out_count

| Prometheus ID | sysdig_host_net_connection_out_count |
| Legacy ID | net.connection.count.out |
| Metric Type | counter |
| Unit | number |
| Description | Number of currently established server (outbound) connections. |
| Additional Notes | This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_connection_total_count

| Prometheus ID | sysdig_host_net_connection_total_count |
| Legacy ID | net.connection.count.total |
| Metric Type | counter |
| Unit | number |
| Description | Number of currently established connections. This value may exceed the sum of the inbound and outbound metrics because it counts client and server inter-host connections as well as internal-only connections. |
| Additional Notes | This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_error_count

| Prometheus ID | sysdig_host_net_error_count |
| Legacy ID | net.error.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of network errors. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can segment the metric to see it by host, process, container, and so on, by using ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_error_count

| Prometheus ID | sysdig_host_net_http_error_count |
| Legacy ID | net.http.error.count |
| Metric Type | counter |
| Unit | number |
| Description | Number of failed HTTP requests, counted from responses with 4xx or 5xx status codes. |
| Additional Notes | |
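
The description above counts a request as failed when its response carries a 4xx or 5xx status code. A minimal sketch of that classification follows. The function name `count_http_errors` and the list-of-integers input are hypothetical choices for illustration and do not reflect how the agent observes HTTP traffic.

```python
def count_http_errors(status_codes):
    """Count responses whose status code is in the 4xx or 5xx range."""
    return sum(1 for code in status_codes if 400 <= code <= 599)
```

For the status codes 200, 301, 404, 500, 503, and 204, the count is 3.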

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_request_count

| Prometheus ID | sysdig_host_net_http_request_count |
| Legacy ID | net.http.request.count |
| Metric Type | counter |
| Unit | number |
| Description | Count of HTTP requests. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_request_time

| Prometheus ID | sysdig_host_net_http_request_time |
| Legacy ID | net.http.request.time |
| Metric Type | counter |
| Unit | time |
| Description | Average time for HTTP requests. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_statuscode_error_count

| Prometheus ID | sysdig_host_net_http_statuscode_error_count |
| Legacy ID | net.http.statuscode.error.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_statuscode_request_count

| Prometheus ID | sysdig_host_net_http_statuscode_request_count |
| Legacy ID | net.http.statuscode.request.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_url_error_count

| Prometheus ID | sysdig_host_net_http_url_error_count |
| Legacy ID | net.http.url.error.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_url_request_count

| Prometheus ID | sysdig_host_net_http_url_request_count |
| Legacy ID | net.http.url.request.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_http_url_request_time

| Prometheus ID | sysdig_host_net_http_url_request_time |
| Legacy ID | net.http.url.request.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_collection_error_count

| Prometheus ID | sysdig_host_net_mongodb_collection_error_count |
| Legacy ID | net.mongodb.collection.error.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_collection_request_count

| Prometheus ID | sysdig_host_net_mongodb_collection_request_count |
| Legacy ID | net.mongodb.collection.request.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_collection_request_time

| Prometheus ID | sysdig_host_net_mongodb_collection_request_time |
| Legacy ID | net.mongodb.collection.request.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_error_count

| Prometheus ID | sysdig_host_net_mongodb_error_count |
| Legacy ID | net.mongodb.error.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_operation_error_count

| Prometheus ID | sysdig_host_net_mongodb_operation_error_count |
| Legacy ID | net.mongodb.operation.error.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_operation_request_count

|Prometheus ID |sysdig_host_net_mongodb_operation_request_count |
|Legacy ID |net.mongodb.operation.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_operation_request_time

|Prometheus ID |sysdig_host_net_mongodb_operation_request_time |
|Legacy ID |net.mongodb.operation.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_request_count

|Prometheus ID |sysdig_host_net_mongodb_request_count |
|Legacy ID |net.mongodb.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_mongodb_request_time

|Prometheus ID |sysdig_host_net_mongodb_request_time |
|Legacy ID |net.mongodb.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_in_bytes

|Prometheus ID |sysdig_host_net_in_bytes |
|Legacy ID |net.bytes.in |
|Metric Type |counter |
|Unit |data |
|Description |Inbound network bytes. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. To see the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_out_bytes

|Prometheus ID |sysdig_host_net_out_bytes |
|Legacy ID |net.bytes.out |
|Metric Type |counter |
|Unit |data |
|Description |Outbound network bytes. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. To see the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |
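The difference between the scope-wide total and a segmented view described in the notes above can also be expressed as a PromQL aggregation, if you query these metrics by their Prometheus IDs rather than through the UI. The following is a minimal sketch, not the product's documented query syntax: the helper names `total_query` and `segmented_query` are hypothetical, and the label `host_hostname` is an assumed label name that may differ in your environment.

```python
def total_query(metric: str) -> str:
    """Build a PromQL expression that sums the metric over the whole scope."""
    return f"sum({metric})"


def segmented_query(metric: str, *labels: str) -> str:
    """Build a PromQL expression that sums the metric per label combination.

    The label names are assumptions supplied by the caller; with no labels,
    this falls back to the scope-wide total.
    """
    if not labels:
        return total_query(metric)
    return f"sum by ({', '.join(labels)}) ({metric})"


# Scope-wide total, analogous to the default view.
print(total_query("sysdig_host_net_in_bytes"))
# One series per host, analogous to 'Segment by' host (label name assumed).
print(segmented_query("sysdig_host_net_in_bytes", "host_hostname"))
```

The first call yields `sum(sysdig_host_net_in_bytes)` and the second yields `sum by (host_hostname) (sysdig_host_net_in_bytes)`.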

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_count

|Prometheus ID |sysdig_host_net_request_count |
|Legacy ID |net.request.count |
|Metric Type |counter |
|Unit |number |
|Description |Total number of network requests. Note that this value may exceed the sum of inbound and outbound requests, because the count includes requests over internal connections. |
|Additional Notes | |
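As a worked example of the relationship described above, the gap between the total and the sum of `sysdig_host_net_request_in_count` and `sysdig_host_net_request_out_count` can be read as the number of requests over internal connections. This is a sketch under two assumptions: all three values cover the same scope and time interval, and internal connections are the only source of the difference. The function name and the sample numbers are hypothetical.

```python
def internal_request_count(total: int, inbound: int, outbound: int) -> int:
    """Estimate requests over internal connections as total - inbound - outbound."""
    internal = total - inbound - outbound
    if internal < 0:
        # The description only says the total may exceed the sum, never the reverse.
        raise ValueError("total is smaller than inbound + outbound")
    return internal


# Hypothetical sample values for one interval.
print(internal_request_count(total=1200, inbound=700, outbound=400))  # prints 100
```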

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_in_count

|Prometheus ID |sysdig_host_net_request_in_count |
|Legacy ID |net.request.count.in |
|Metric Type |counter |
|Unit |number |
|Description |Number of inbound network requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_in_time

|Prometheus ID |sysdig_host_net_request_in_time |
|Legacy ID |net.request.time.in |
|Metric Type |counter |
|Unit |time |
|Description |Average time to serve an inbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_out_count

|Prometheus ID |sysdig_host_net_request_out_count |
|Legacy ID |net.request.count.out |
|Metric Type |counter |
|Unit |number |
|Description |Number of outbound network requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_out_time

|Prometheus ID |sysdig_host_net_request_out_time |
|Legacy ID |net.request.time.out |
|Metric Type |counter |
|Unit |time |
|Description |Average time spent waiting for an outbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_request_time

|Prometheus ID |sysdig_host_net_request_time |
|Legacy ID |net.request.time |
|Metric Type |counter |
|Unit |time |
|Description |Average time to serve a network request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_server_connection_in_count

|Prometheus ID |sysdig_host_net_server_connection_in_count |
|Legacy ID |net.server.connection.count.in |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_server_in_bytes

|Prometheus ID |sysdig_host_net_server_in_bytes |
|Legacy ID |net.server.bytes.in |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_server_out_bytes

|Prometheus ID |sysdig_host_net_server_out_bytes |
|Legacy ID |net.server.bytes.out |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_server_total_bytes

|Prometheus ID |sysdig_host_net_server_total_bytes |
|Legacy ID |net.server.bytes.total |
|Metric Type |counter |
|Unit |data |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_error_count

|Prometheus ID |sysdig_host_net_sql_error_count |
|Legacy ID |net.sql.error.count |
|Metric Type |counter |
|Unit |number |
|Description |Number of failed SQL requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_query_error_count

|Prometheus ID |sysdig_host_net_sql_query_error_count |
|Legacy ID |net.sql.query.error.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_query_request_count

|Prometheus ID |sysdig_host_net_sql_query_request_count |
|Legacy ID |net.sql.query.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_query_request_time

|Prometheus ID |sysdig_host_net_sql_query_request_time |
|Legacy ID |net.sql.query.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_querytype_error_count

|Prometheus ID |sysdig_host_net_sql_querytype_error_count |
|Legacy ID |net.sql.querytype.error.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_querytype_request_count

|Prometheus ID |sysdig_host_net_sql_querytype_request_count |
|Legacy ID |net.sql.querytype.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_querytype_request_time

|Prometheus ID |sysdig_host_net_sql_querytype_request_time |
|Legacy ID |net.sql.querytype.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_request_count

|Prometheus ID |sysdig_host_net_sql_request_count |
|Legacy ID |net.sql.request.count |
|Metric Type |counter |
|Unit |number |
|Description |Number of SQL requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_request_time

|Prometheus ID |sysdig_host_net_sql_request_time |
|Legacy ID |net.sql.request.time |
|Metric Type |counter |
|Unit |time |
|Description |Average time to complete a SQL request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_table_error_count

|Prometheus ID |sysdig_host_net_sql_table_error_count |
|Legacy ID |net.sql.table.error.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_table_request_count

|Prometheus ID |sysdig_host_net_sql_table_request_count |
|Legacy ID |net.sql.table.request.count |
|Metric Type |counter |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_sql_table_request_time

|Prometheus ID |sysdig_host_net_sql_table_request_time |
|Legacy ID |net.sql.table.request.time |
|Metric Type |counter |
|Unit |time |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_tcp_queue_len

|Prometheus ID |sysdig_host_net_tcp_queue_len |
|Legacy ID |net.tcp.queue.len |
|Metric Type |counter |
|Unit |number |
|Description |Length of the TCP request queue. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_net_total_bytes

|Prometheus ID |sysdig_host_net_total_bytes |
|Legacy ID |net.bytes.total |
|Metric Type |counter |
|Unit |data |
|Description |Total network bytes, inbound and outbound. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. To see the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |
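The legacy IDs and Prometheus IDs in these tables do not follow a single mechanical renaming rule (for example, `net.bytes.total` maps to `sysdig_host_net_total_bytes`, with the last two segments swapped), so an explicit lookup is a safer way to translate between them than string manipulation. The sketch below is illustrative only: `LEGACY_TO_PROMETHEUS` and `to_prometheus_id` are hypothetical names, and the dictionary holds just a handful of the pairs listed in this section.

```python
# Hypothetical lookup table; each pair is copied from the tables in this section.
LEGACY_TO_PROMETHEUS = {
    "net.bytes.in": "sysdig_host_net_in_bytes",
    "net.bytes.out": "sysdig_host_net_out_bytes",
    "net.bytes.total": "sysdig_host_net_total_bytes",
    "net.request.count.in": "sysdig_host_net_request_in_count",
    "net.sql.request.time": "sysdig_host_net_sql_request_time",
}


def to_prometheus_id(legacy_id: str) -> str:
    """Return the Prometheus ID for a legacy ID, or raise KeyError if unlisted."""
    try:
        return LEGACY_TO_PROMETHEUS[legacy_id]
    except KeyError:
        raise KeyError(f"no Prometheus ID recorded for legacy ID {legacy_id!r}") from None


print(to_prometheus_id("net.bytes.total"))  # prints sysdig_host_net_total_bytes
```

Raising on an unknown ID, rather than guessing a name, keeps a migration script from silently querying a metric that does not exist.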

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_proc_count

|Prometheus ID |sysdig_host_proc_count |
|Legacy ID |proc.count |
|Metric Type |counter |
|Unit |number |
|Description |Number of processes on host or container. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_syscall_count

|Prometheus ID |sysdig_host_syscall_count |
|Legacy ID |syscall.count |
|Metric Type |gauge |
|Unit |number |
|Description |Total number of syscalls seen. |
|Additional Notes |Syscalls are resource intensive. This metric tracks how many have been made by a given process or container. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_syscall_error_count

|Prometheus ID |sysdig_host_syscall_error_count |
|Legacy ID |host.error.count |
|Metric Type |counter |
|Unit |number |
|Description |Number of system call errors. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. To see the metric by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_system_uptime

| Prometheus ID | sysdig_host_system_uptime |
| Legacy ID | system.uptime |
| Metric Type | gauge |
| Unit | time |
| Description | This metric is sent by the agent and represents the number of seconds since the host booted. It is not available with container granularity. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_thread_count

| Prometheus ID | sysdig_host_thread_count |
| Legacy ID | thread.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_timeseries_count_appcheck

| Prometheus ID | sysdig_host_timeseries_count_appcheck |
| Legacy ID | metricCount.appCheck |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_timeseries_count_jmx

| Prometheus ID | sysdig_host_timeseries_count_jmx |
| Legacy ID | metricCount.jmx |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_timeseries_count_prometheus

| Prometheus ID | sysdig_host_timeseries_count_prometheus |
| Legacy ID | metricCount.prometheus |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_timeseries_count_statsd

| Prometheus ID | sysdig_host_timeseries_count_statsd |
| Legacy ID | metricCount.statsd |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_host_up

| Prometheus ID | sysdig_host_up |
| Legacy ID | uptime |
| Metric Type | gauge |
| Unit | number |
| Description | The percentage of time the selected entity was down during the visualized time sample. This can be used to determine whether a machine (or a group of machines) went down. |
| Additional Notes | |

10.6 - JMX/JVM

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_class_loaded

| Prometheus ID | jmx_jvm_class_loaded |
| Legacy ID | jvm.class.loaded |
| Metric Type | gauge |
| Unit | number |
| Description | The number of classes that are currently loaded in the JVM. |
| Additional Notes | By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_class_unloaded

| Prometheus ID | jmx_jvm_class_unloaded |
| Legacy ID | jvm.class.unloaded |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_ConcurrentMarkSweep_count

| Prometheus ID | jmx_jvm_gc_ConcurrentMarkSweep_count |
| Legacy ID | jvm.gc.ConcurrentMarkSweep.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of times the Concurrent Mark-Sweep garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_ConcurrentMarkSweep_time

| Prometheus ID | jmx_jvm_gc_ConcurrentMarkSweep_time |
| Legacy ID | jvm.gc.ConcurrentMarkSweep.time |
| Metric Type | counter |
| Unit | time |
| Description | The amount of time the Concurrent Mark-Sweep garbage collector has run. |
| Additional Notes | |
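
Each garbage collector in this section is reported as a pair of counters, a `_count` and a `_time`. A common way to read such a pair together is the average time spent per collection over a window. The following is a minimal sketch of that arithmetic, assuming both values are cumulative totals sampled at the start and end of the window; the function name and parameters are hypothetical and not part of any Sysdig API, and the result is in whatever time unit the metric reports.

```python
from typing import Optional


def average_gc_time(count_start: float, count_end: float,
                    time_start: float, time_end: float) -> Optional[float]:
    """Average time per collection between two samples of a count/time pair.

    Assumes both metrics are cumulative totals (an assumption, not confirmed
    by the metric tables). Returns None when the window contains no
    collections or when a counter went backwards (for example, a JVM restart).
    """
    runs = count_end - count_start
    spent = time_end - time_start
    if runs <= 0 or spent < 0:
        return None
    return spent / runs


# Hypothetical samples: 4 collections took 60 time units in total.
print(average_gc_time(10, 14, 200.0, 260.0))
```

The same calculation applies to any of the other collector pairs listed here, such as the `PS_Scavenge` or `ParNew` metrics.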

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_Copy_count

| Prometheus ID | jmx_jvm_gc_Copy_count |
| Legacy ID | jvm.gc.Copy.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_Copy_time

| Prometheus ID | jmx_jvm_gc_Copy_time |
| Legacy ID | jvm.gc.Copy.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_G1_Old_Generation_count

| Prometheus ID | jmx_jvm_gc_G1_Old_Generation_count |
| Legacy ID | jvm.gc.G1_Old_Generation.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_G1_Old_Generation_time

| Prometheus ID | jmx_jvm_gc_G1_Old_Generation_time |
| Legacy ID | jvm.gc.G1_Old_Generation.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_G1_Young_Generation_count

| Prometheus ID | jmx_jvm_gc_G1_Young_Generation_count |
| Legacy ID | jvm.gc.G1_Young_Generation.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_G1_Young_Generation_time

| Prometheus ID | jmx_jvm_gc_G1_Young_Generation_time |
| Legacy ID | jvm.gc.G1_Young_Generation.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_MarkSweepCompact_count

| Prometheus ID | jmx_jvm_gc_MarkSweepCompact_count |
| Legacy ID | jvm.gc.MarkSweepCompact.count |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_MarkSweepCompact_time

| Prometheus ID | jmx_jvm_gc_MarkSweepCompact_time |
| Legacy ID | jvm.gc.MarkSweepCompact.time |
| Metric Type | counter |
| Unit | time |
| Description | |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_PS_MarkSweep_count

| Prometheus ID | jmx_jvm_gc_PS_MarkSweep_count |
| Legacy ID | jvm.gc.PS_MarkSweep.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of times the parallel scavenge Mark-Sweep old generation garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_PS_MarkSweep_time

| Prometheus ID | jmx_jvm_gc_PS_MarkSweep_time |
| Legacy ID | jvm.gc.PS_MarkSweep.time |
| Metric Type | counter |
| Unit | time |
| Description | The amount of time the parallel scavenge Mark-Sweep old generation garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_PS_Scavenge_count

| Prometheus ID | jmx_jvm_gc_PS_Scavenge_count |
| Legacy ID | jvm.gc.PS_Scavenge.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of times the parallel eden/survivor space garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_PS_Scavenge_time

| Prometheus ID | jmx_jvm_gc_PS_Scavenge_time |
| Legacy ID | jvm.gc.PS_Scavenge.time |
| Metric Type | counter |
| Unit | time |
| Description | The amount of time the parallel eden/survivor space garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_ParNew_count

| Prometheus ID | jmx_jvm_gc_ParNew_count |
| Legacy ID | jvm.gc.ParNew.count |
| Metric Type | counter |
| Unit | number |
| Description | The number of times the parallel garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_gc_ParNew_time

| Prometheus ID | jmx_jvm_gc_ParNew_time |
| Legacy ID | jvm.gc.ParNew.time |
| Metric Type | counter |
| Unit | time |
| Description | The amount of time the parallel garbage collector has run. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_heap_committed

| Prometheus ID | jmx_jvm_heap_committed |
| Legacy ID | jvm.heap.committed |
| Metric Type | counter |
| Unit | number |
| Description | The amount of memory currently allocated to the JVM for heap memory. Heap memory is the storage area for Java objects. The JVM may release memory to the system, so Heap Committed can decrease below Heap Init, but it can never increase above Heap Max. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_heap_init

| Prometheus ID | jmx_jvm_heap_init |
| Legacy ID | jvm.heap.init |
| Metric Type | counter |
| Unit | number |
| Description | The initial amount of memory that the JVM requests from the operating system for heap memory during startup (defined by the -Xms option). The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Heap Init may be undefined. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_heap_max

| Prometheus ID | jmx_jvm_heap_max |
| Legacy ID | jvm.heap.max |
| Metric Type | counter |
| Unit | number |
| Description | The maximum amount of heap memory that can be allocated to the JVM (defined by the -Xmx option). Any memory allocation attempt that would exceed this limit causes an OutOfMemoryError to be thrown. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_heap_used

| Prometheus ID | jmx_jvm_heap_used |
| Legacy ID | jvm.heap.used |
| Metric Type | counter |
| Unit | number |
| Description | The amount of allocated heap memory (i.e., Heap Committed) currently in use. Heap memory is the storage area for Java objects. An object in the heap that is referenced by another object is ‘live’ and remains in the heap as long as it continues to be referenced. Objects that are no longer referenced are garbage and are cleared out of the heap to reclaim space. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_heap_used_percent

| Prometheus ID | jmx_jvm_heap_used_percent |
| Legacy ID | jvm.heap.used.percent |
| Metric Type | gauge |
| Unit | percent |
| Description | The ratio between Heap Used and Heap Committed. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |
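
The ratio described for `jmx_jvm_heap_used_percent` can be reproduced from the two underlying heap values. The sketch below is illustrative only: the function name is hypothetical, and it assumes the result is expressed on a 0 to 100 scale, which the table does not state explicitly.

```python
def heap_used_percent(heap_used: float, heap_committed: float) -> float:
    """Return Heap Used as a percentage of Heap Committed.

    Both arguments must be in the same unit. The 0-100 scale is an
    assumption made for this example.
    """
    if heap_committed <= 0:
        raise ValueError("heap_committed must be positive")
    return 100.0 * heap_used / heap_committed


# Hypothetical values: 256 units in use out of 1024 committed.
print(heap_used_percent(256, 1024))
```

Because the denominator is Heap Committed rather than Heap Max, this percentage can fall when the JVM commits more heap, even if Heap Used is unchanged.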

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_nonHeap_committed

| Prometheus ID | jmx_jvm_nonHeap_committed |
| Legacy ID | jvm.nonHeap.committed |
| Metric Type | counter |
| Unit | number |
| Description | The amount of memory currently allocated to the JVM for non-heap memory. Non-heap memory is used by Java to store loaded classes and other metadata. The JVM may release memory to the system, so Non-Heap Committed can decrease below Non-Heap Init, but it can never increase above Non-Heap Max. |
| Additional Notes | By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. To see it by host, process, container, and so on, use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_nonHeap_init

|Prometheus ID |jmx_jvm_nonHeap_init |
|Legacy ID |jvm.nonHeap.init |
|Metric Type |counter |
|Unit |number |
|Description |The initial amount of memory that the JVM requests from the operating system for non-heap memory during startup. The JVM may request additional memory from the operating system and may also release memory to the system over time. The value of Non-Heap Init may be undefined. |
|Additional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_nonHeap_max

|Prometheus ID |jmx_jvm_nonHeap_max |
|Legacy ID |jvm.nonHeap.max |
|Metric Type |counter |
|Unit |number |
|Description |The maximum amount of non-heap memory that can be allocated to the JVM. This memory is used by Java to store loaded classes and other metadata. |
|Additional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_nonHeap_used

|Prometheus ID |jmx_jvm_nonHeap_used |
|Legacy ID |jvm.nonHeap.used |
|Metric Type |counter |
|Unit |number |
|Description |The amount of allocated non-heap memory (i.e., Non-Heap Committed) currently in use. Non-heap memory is used by Java to store loaded classes and other metadata. |
|Additional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_nonHeap_used_percent

|Prometheus ID |jmx_jvm_nonHeap_used_percent |
|Legacy ID |jvm.nonHeap.used.percent |
|Metric Type |gauge |
|Unit |percent |
|Description |The ratio between Non-Heap Used and Non-Heap Committed. |
|Additional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|
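
The ratio described for the two "used percent" metrics can be illustrated with a small calculation. The following is a minimal sketch, not the agent's actual implementation: the function name `used_percent` and the sample values are hypothetical, and it assumes that a zero or undefined committed value should produce no result.

```python
from typing import Optional


def used_percent(used: float, committed: float) -> Optional[float]:
    """Return used / committed as a percentage.

    Returns None when committed is not positive (assumption: the
    ratio is treated as undefined in that case).
    """
    if committed <= 0:
        return None
    return 100.0 * used / committed


# Hypothetical sample: 48 MiB used out of 64 MiB committed.
print(used_percent(48 * 1024 * 1024, 64 * 1024 * 1024))  # 75.0
```

Because the denominator is the committed value rather than the maximum, a high percentage may only mean that the JVM has not yet requested more memory from the operating system.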

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_thread_count

|Prometheus ID |jmx_jvm_thread_count |
|Legacy ID |jvm.thread.count |
|Metric Type |gauge |
|Unit |number |
|Description |The current number of live daemon and non-daemon threads. |
|Additional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      jmx_jvm_thread_daemon

|Prometheus ID |jmx_jvm_thread_daemon |
|Legacy ID |jvm.thread.daemon |
|Metric Type |gauge |
|Unit |number |
|Description |The current number of live daemon threads. Daemon threads are used for background supporting tasks and are only needed while normal threads are executing. |
|Additional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|
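
The notes for the JVM metrics distinguish between a default average (memory metrics) and a default total (thread metrics) over the selected scope, and mention segmenting by host, process, or container. The sketch below illustrates that difference conceptually. It is an assumption-laden illustration only: the sample data, the `aggregate` and `segment_by_host` helpers, and the `"avg"`/`"sum"` mode names are all hypothetical and do not describe how the product computes its values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples as (host, value) pairs.
samples = [("host-a", 40.0), ("host-a", 60.0), ("host-b", 80.0)]


def _combine(values, how):
    # "avg" mirrors the default for percent metrics, "sum" the default for counts.
    return mean(values) if how == "avg" else sum(values)


def aggregate(samples, how):
    """Collapse every sample in the scope into a single value."""
    return _combine([value for _, value in samples], how)


def segment_by_host(samples, how):
    """Group samples by host first, then combine each group separately."""
    groups = defaultdict(list)
    for host, value in samples:
        groups[host].append(value)
    return {host: _combine(values, how) for host, values in groups.items()}


print(aggregate(samples, "avg"))        # 60.0
print(segment_by_host(samples, "avg"))  # {'host-a': 50.0, 'host-b': 80.0}
print(aggregate(samples, "sum"))        # 180.0
```

The segmented view exposes the outlier (`host-b`) that the scope-wide average hides.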

10.7 - Kubernetes

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_labels

|Prometheus ID |kube_daemonset_labels|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_current_number_scheduled

|Prometheus ID |kube_daemonset_status_current_number_scheduled |
|Legacy ID |kubernetes.daemonSet.pods.scheduled |
|Metric Type |gauge |
|Unit |number |
|Description |The number of nodes that are running at least one daemon Pod and are supposed to.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_desired_number_scheduled

|Prometheus ID |kube_daemonset_status_desired_number_scheduled |
|Legacy ID |kubernetes.daemonSet.pods.desired |
|Metric Type |gauge |
|Unit |number |
|Description |The number of nodes that should be running the daemon Pod.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_number_misscheduled

|Prometheus ID |kube_daemonset_status_number_misscheduled |
|Legacy ID |kubernetes.daemonSet.pods.misscheduled |
|Metric Type |gauge |
|Unit |number |
|Description |The number of nodes running a daemon Pod that are not supposed to.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_daemonset_status_number_ready

|Prometheus ID |kube_daemonset_status_number_ready |
|Legacy ID |kubernetes.daemonSet.pods.ready |
|Metric Type |gauge |
|Unit |number |
|Description |The number of nodes that should be running the daemon Pod and have one or more daemon Pods running and ready.|
|Additional Notes| |
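
The four DaemonSet status metrics are most useful when read together. As a rough sketch of how they might be compared, the snippet below takes one value per metric and lists any discrepancies. The `DaemonSetStatus` class, the `daemonset_issues` function, and the message wording are hypothetical; only the metric names in the comments come from the tables above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DaemonSetStatus:
    desired: int       # kube_daemonset_status_desired_number_scheduled
    current: int       # kube_daemonset_status_current_number_scheduled
    ready: int         # kube_daemonset_status_number_ready
    misscheduled: int  # kube_daemonset_status_number_misscheduled


def daemonset_issues(status: DaemonSetStatus) -> List[str]:
    """Return human-readable findings; an empty list means nothing stands out."""
    issues = []
    if status.current < status.desired:
        issues.append(f"{status.desired - status.current} node(s) missing the daemon Pod")
    if status.ready < status.desired:
        issues.append(f"{status.desired - status.ready} node(s) without a ready daemon Pod")
    if status.misscheduled > 0:
        issues.append(f"{status.misscheduled} node(s) running the daemon Pod unexpectedly")
    return issues


# Hypothetical values for a five-node DaemonSet.
for issue in daemonset_issues(DaemonSetStatus(desired=5, current=4, ready=3, misscheduled=1)):
    print(issue)
```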

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_labels

|Prometheus ID |kube_deployment_labels|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_spec_paused

|Prometheus ID |kube_deployment_spec_paused |
|Legacy ID |kubernetes.deployment.replicas.paused |
|Metric Type |gauge |
|Unit |number |
|Description |The number of paused Pods per deployment. These Pods will not be processed by the deployment controller.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_spec_replicas

|Prometheus ID |kube_deployment_spec_replicas |
|Legacy ID |kubernetes.deployment.replicas.desired |
|Metric Type |gauge |
|Unit |number |
|Description |The number of desired Pods per deployment.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas

|Prometheus ID |kube_deployment_status_replicas |
|Legacy ID |kubernetes.deployment.replicas.running |
|Metric Type |gauge |
|Unit |number |
|Description |The number of running Pods per deployment.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_available

|Prometheus ID |kube_deployment_status_replicas_available |
|Legacy ID |kubernetes.deployment.replicas.available |
|Metric Type |gauge |
|Unit |number |
|Description |The number of available Pods per deployment.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_unavailable

|Prometheus ID |kube_deployment_status_replicas_unavailable |
|Legacy ID |kubernetes.deployment.replicas.unavailable |
|Metric Type |gauge |
|Unit |number |
|Description |The number of unavailable Pods per deployment.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_deployment_status_replicas_updated

|Prometheus ID |kube_deployment_status_replicas_updated |
|Legacy ID |kubernetes.deployment.replicas.updated |
|Metric Type |gauge |
|Unit |number |
|Description |The number of updated Pods per deployment.|
|Additional Notes| |
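
One common way to use the deployment replica metrics is to compare the desired count against the updated and available counts. The check below is a minimal sketch of that idea under stated assumptions: the `rollout_complete` function and its criteria are hypothetical and are not a rule defined by this document or by Kubernetes itself.

```python
def rollout_complete(desired: int, updated: int, available: int, unavailable: int) -> bool:
    """Treat a rollout as complete when every desired Pod is updated and available.

    Arguments correspond to (assumed mapping):
      desired     -> kube_deployment_spec_replicas
      updated     -> kube_deployment_status_replicas_updated
      available   -> kube_deployment_status_replicas_available
      unavailable -> kube_deployment_status_replicas_unavailable
    """
    return updated == desired and available == desired and unavailable == 0


# Hypothetical values: one of three Pods is still being replaced.
print(rollout_complete(desired=3, updated=2, available=2, unavailable=1))  # False
print(rollout_complete(desired=3, updated=3, available=3, unavailable=0))  # True
```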

                                                                                                                                                                                                                                                                                                                                                                      kube_hpa_labels

|Prometheus ID |kube_hpa_labels|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_hpa_spec_max_replicas

|Prometheus ID |kube_hpa_spec_max_replicas |
|Legacy ID |kubernetes.hpa.replicas.max |
|Metric Type |gauge |
|Unit |number |
|Description |Upper limit for the number of Pods that can be set by the autoscaler.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_hpa_spec_min_replicas

|Prometheus ID |kube_hpa_spec_min_replicas |
|Legacy ID |kubernetes.hpa.replicas.min |
|Metric Type |gauge |
|Unit |number |
|Description |Lower limit for the number of Pods that can be set by the autoscaler.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_hpa_status_current_replicas

|Prometheus ID |kube_hpa_status_current_replicas |
|Legacy ID |kubernetes.hpa.replicas.current |
|Metric Type |gauge |
|Unit |number |
|Description |Current number of replicas of Pods managed by this autoscaler.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_hpa_status_desired_replicas

|Prometheus ID |kube_hpa_status_desired_replicas |
|Legacy ID |kubernetes.hpa.replicas.desired |
|Metric Type |gauge |
|Unit |number |
|Description |Desired number of replicas of Pods managed by this autoscaler.|
|Additional Notes| |
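
The autoscaler metrics describe a range (minimum and maximum) plus the current and desired replica counts within it. The sketch below shows one way these four values might be combined into a single state label. The `hpa_state` function and its labels are hypothetical and serve only to illustrate how the metrics relate to each other.

```python
def hpa_state(min_replicas: int, max_replicas: int, current: int, desired: int) -> str:
    """Classify an autoscaler from its four replica metrics (illustrative only)."""
    if current != desired:
        # The autoscaler has asked for a different count than is running.
        return "scaling up" if desired > current else "scaling down"
    if current >= max_replicas:
        # No headroom left: the autoscaler cannot add more Pods.
        return "at maximum"
    if current <= min_replicas:
        return "at minimum"
    return "steady"


# Hypothetical autoscaler with a range of 2 to 10 replicas.
print(hpa_state(2, 10, current=4, desired=6))    # scaling up
print(hpa_state(2, 10, current=10, desired=10))  # at maximum
```

An autoscaler that stays "at maximum" for long periods may indicate that the upper limit is too low for the workload.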

                                                                                                                                                                                                                                                                                                                                                                      kube_job_complete

|Prometheus ID |kube_job_complete |
|Legacy ID |kubernetes.job.numSucceeded |
|Metric Type |gauge |
|Unit |number |
|Description |The number of Pods that reached Phase Succeeded.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_failed

|Prometheus ID |kube_job_failed |
|Legacy ID |kubernetes.job.numFailed |
|Metric Type |gauge |
|Unit |number |
|Description |The number of Pods that reached Phase Failed.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_info

|Prometheus ID |kube_job_info|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_labels

|Prometheus ID |kube_job_labels|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_owner

|Prometheus ID |kube_job_owner|
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_spec_completions

|Prometheus ID |kube_job_spec_completions |
|Legacy ID |kubernetes.job.completions |
|Metric Type |gauge |
|Unit |number |
|Description |The desired number of successfully finished Pods that the job should run.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_spec_parallelism

|Prometheus ID |kube_job_spec_parallelism |
|Legacy ID |kubernetes.job.parallelism |
|Metric Type |gauge |
|Unit |number |
|Description |The maximum desired number of Pods that the job should run at any given time.|
|Additional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      kube_job_status_active

|Prometheus ID |kube_job_status_active |
|Legacy ID |kubernetes.job.status.active |
|Metric Type |gauge |
|Unit |number |
|Description |The number of actively running Pods.|
|Additional Notes| |
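
The job metrics can be combined to estimate how far a job has progressed. The sketch below divides the number of succeeded Pods by the desired completions. The `job_progress` function is hypothetical, and it assumes that progress is undefined when no completion count is set and should be capped at 1.0.

```python
from typing import Optional


def job_progress(succeeded: int, completions: int) -> Optional[float]:
    """Return the completed fraction of a job, or None if completions is not positive.

    succeeded   -> kube_job_complete (legacy: kubernetes.job.numSucceeded)
    completions -> kube_job_spec_completions
    """
    if completions <= 0:
        return None
    return min(1.0, succeeded / completions)


# Hypothetical job: 3 of 4 required Pods have succeeded.
print(job_progress(3, 4))  # 0.75
```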

kube_namespace_labels

| Prometheus ID | kube_namespace_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_namespace_sysdig_count

| Prometheus ID | kube_namespace_sysdig_count |
| Legacy ID | kubernetes.namespace.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of namespaces. |
| Additional Notes | |

kube_namespace_sysdig_deployment_count

| Prometheus ID | kube_namespace_sysdig_deployment_count |
| Legacy ID | kubernetes.namespace.deployment.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of deployments per namespace. |
| Additional Notes | |

kube_namespace_sysdig_hpa_count

| Prometheus ID | kube_namespace_sysdig_hpa_count |
| Legacy ID | kubernetes.namespace.hpa.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of HorizontalPodAutoscalers (HPAs) per namespace. |
| Additional Notes | |

kube_namespace_sysdig_job_count

| Prometheus ID | kube_namespace_sysdig_job_count |
| Legacy ID | kubernetes.namespace.job.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of jobs per namespace. |
| Additional Notes | |

kube_namespace_sysdig_persistentvolumeclaim_count

| Prometheus ID | kube_namespace_sysdig_persistentvolumeclaim_count |
| Legacy ID | kubernetes.namespace.persistentvolumeclaim.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of PersistentVolumeClaims per namespace. |
| Additional Notes | |

kube_namespace_sysdig_pod_available_count

| Prometheus ID | kube_namespace_sysdig_pod_available_count |
| Legacy ID | kubernetes.namespace.pod.available.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of available Pods per namespace. |
| Additional Notes | |

kube_namespace_sysdig_pod_desired_count

| Prometheus ID | kube_namespace_sysdig_pod_desired_count |
| Legacy ID | kubernetes.namespace.pod.desired.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of desired Pods per namespace. |
| Additional Notes | |

kube_namespace_sysdig_pod_running_count

| Prometheus ID | kube_namespace_sysdig_pod_running_count |
| Legacy ID | kubernetes.namespace.pod.running.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of running Pods per namespace. |
| Additional Notes | |
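
The per-namespace Pod gauges above are usually read against each other rather than alone. The sketch below compares the available count with the desired count; it assumes the two values have already been fetched for one namespace. The sample numbers and the helper name `availability_ratio` are hypothetical.

```python
def availability_ratio(available: float, desired: float) -> float:
    """Return available / desired, treating a namespace with no desired Pods as fully available."""
    if desired <= 0:
        return 1.0
    return available / desired


# Hypothetical sample values for a single namespace, keyed by Prometheus ID.
samples = {
    "kube_namespace_sysdig_pod_available_count": 9.0,
    "kube_namespace_sysdig_pod_desired_count": 12.0,
}

ratio = availability_ratio(
    samples["kube_namespace_sysdig_pod_available_count"],
    samples["kube_namespace_sysdig_pod_desired_count"],
)
print(f"{ratio:.2f}")
```

Returning 1.0 when nothing is desired is one possible convention; it avoids a division by zero for empty namespaces.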

kube_namespace_sysdig_replicaset_count

| Prometheus ID | kube_namespace_sysdig_replicaset_count |
| Legacy ID | kubernetes.namespace.replicaSet.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of ReplicaSets per namespace. |
| Additional Notes | |

kube_namespace_sysdig_resourcequota_count

| Prometheus ID | kube_namespace_sysdig_resourcequota_count |
| Legacy ID | kubernetes.namespace.resourcequota.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of resource quotas per namespace. |
| Additional Notes | |

kube_namespace_sysdig_service_count

| Prometheus ID | kube_namespace_sysdig_service_count |
| Legacy ID | kubernetes.namespace.service.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of services per namespace. |
| Additional Notes | |

kube_namespace_sysdig_statefulset_count

| Prometheus ID | kube_namespace_sysdig_statefulset_count |
| Legacy ID | kubernetes.namespace.statefulSet.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of StatefulSets per namespace. |
| Additional Notes | |

kube_node_info

| Prometheus ID | kube_node_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_labels

| Prometheus ID | kube_node_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_spec_unschedulable

| Prometheus ID | kube_node_spec_unschedulable |
| Legacy ID | kubernetes.node.unschedulable |
| Metric Type | gauge |
| Unit | number |
| Description | The number of nodes unavailable to schedule new Pods. |
| Additional Notes | |

kube_node_status_allocatable

| Prometheus ID | kube_node_status_allocatable |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_status_allocatable_cpu_cores

| Prometheus ID | kube_node_status_allocatable_cpu_cores |
| Legacy ID | kubernetes.node.allocatable.cpuCores |
| Metric Type | gauge |
| Unit | number |
| Description | The CPU resources of a node that are available for scheduling. |
| Additional Notes | |

kube_node_status_allocatable_memory_bytes

| Prometheus ID | kube_node_status_allocatable_memory_bytes |
| Legacy ID | kubernetes.node.allocatable.memBytes |
| Metric Type | gauge |
| Unit | data |
| Description | The memory resources of a node that are available for scheduling. |
| Additional Notes | |

kube_node_status_allocatable_pods

| Prometheus ID | kube_node_status_allocatable_pods |
| Legacy ID | kubernetes.node.allocatable.pods |
| Metric Type | gauge |
| Unit | number |
| Description | The Pod resources of a node that are available for scheduling. |
| Additional Notes | |

kube_node_status_capacity

| Prometheus ID | kube_node_status_capacity |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_status_capacity_cpu_cores

| Prometheus ID | kube_node_status_capacity_cpu_cores |
| Legacy ID | kubernetes.node.capacity.cpuCores |
| Metric Type | gauge |
| Unit | number |
| Description | The maximum CPU resources of the node. |
| Additional Notes | |

kube_node_status_capacity_memory_bytes

| Prometheus ID | kube_node_status_capacity_memory_bytes |
| Legacy ID | kubernetes.node.capacity.memBytes |
| Metric Type | gauge |
| Unit | data |
| Description | The maximum memory resources of the node. |
| Additional Notes | |

kube_node_status_capacity_pods

| Prometheus ID | kube_node_status_capacity_pods |
| Legacy ID | kubernetes.node.capacity.pods |
| Metric Type | gauge |
| Unit | number |
| Description | The maximum number of Pods on the node. |
| Additional Notes | |
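
The capacity metrics report a node's maximum resources, while the allocatable metrics report the part that is available for scheduling. As a rough sketch of how the two relate, the snippet below computes the share of capacity that is withheld from scheduling. The sample values and the helper name `reserved_fraction` are hypothetical; real values would come from your own queries.

```python
def reserved_fraction(capacity: float, allocatable: float) -> float:
    """Return the fraction of a node's capacity that is not available for scheduling."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - allocatable) / capacity


# Hypothetical sample values for a single node, keyed by Prometheus ID.
samples = {
    "kube_node_status_capacity_cpu_cores": 4.0,
    "kube_node_status_allocatable_cpu_cores": 3.5,
}

cpu_reserved = reserved_fraction(
    samples["kube_node_status_capacity_cpu_cores"],
    samples["kube_node_status_allocatable_cpu_cores"],
)
print(f"{cpu_reserved:.3f}")
```

The same helper applies to the memory and Pod pairs, since each capacity metric has an allocatable counterpart with the same unit.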

kube_node_status_condition

| Prometheus ID | kube_node_status_condition |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_sysdig_disk_pressure

| Prometheus ID | kube_node_sysdig_disk_pressure |
| Legacy ID | kubernetes.node.diskPressure |
| Metric Type | gauge |
| Unit | number |
| Description | The number of nodes with disk pressure. |
| Additional Notes | |

kube_node_sysdig_host

| Prometheus ID | kube_node_sysdig_host |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_node_sysdig_memory_pressure

| Prometheus ID | kube_node_sysdig_memory_pressure |
| Legacy ID | kubernetes.node.memoryPressure |
| Metric Type | gauge |
| Unit | number |
| Description | The number of nodes with memory pressure. |
| Additional Notes | |

kube_node_sysdig_network_unavailable

| Prometheus ID | kube_node_sysdig_network_unavailable |
| Legacy ID | kubernetes.node.networkUnavailable |
| Metric Type | gauge |
| Unit | number |
| Description | The number of nodes whose network is unavailable. |
| Additional Notes | |

kube_node_sysdig_ready

| Prometheus ID | kube_node_sysdig_ready |
| Legacy ID | kubernetes.node.ready |
| Metric Type | gauge |
| Unit | number |
| Description | The number of nodes that are ready. |
| Additional Notes | |

kube_persistentvolume_capacity_bytes

| Prometheus ID | kube_persistentvolume_capacity_bytes |
| Legacy ID | kubernetes.persistentvolume.storage |
| Metric Type | gauge |
| Unit | number |
| Description | The persistent volume’s capacity, in bytes. |
| Additional Notes | |

kube_persistentvolume_claim_ref

| Prometheus ID | kube_persistentvolume_claim_ref |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolume_info

| Prometheus ID | kube_persistentvolume_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolume_labels

| Prometheus ID | kube_persistentvolume_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolume_status_phase

| Prometheus ID | kube_persistentvolume_status_phase |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_access_mode

| Prometheus ID | kube_persistentvolumeclaim_access_mode |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_info

| Prometheus ID | kube_persistentvolumeclaim_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_labels

| Prometheus ID | kube_persistentvolumeclaim_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_resource_requests_storage_bytes

| Prometheus ID | kube_persistentvolumeclaim_resource_requests_storage_bytes |
| Legacy ID | kubernetes.persistentvolumeclaim.requests.storage |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_status_phase

| Prometheus ID | kube_persistentvolumeclaim_status_phase |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_persistentvolumeclaim_sysdig_storage

| Prometheus ID | kube_persistentvolumeclaim_sysdig_storage |
| Legacy ID | kubernetes.persistentvolumeclaim.storage |
| Metric Type | gauge |
| Unit | number |
| Description | The actual resources of the underlying volume. |
| Additional Notes | |
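The two storage metrics above can be read side by side: one reports what a claim requested, the other what the underlying volume actually provides. The following is a minimal Python sketch of that comparison, not part of the product. The function name `storage_mismatches`, the claim names, and the byte values are all hypothetical, and it assumes both metrics have already been fetched into dictionaries keyed by claim name.

```python
from typing import Dict, Tuple


def storage_mismatches(
    requested: Dict[str, float], actual: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Return claims whose actual storage differs from the requested storage.

    requested: values of kube_persistentvolumeclaim_resource_requests_storage_bytes
    actual:    values of kube_persistentvolumeclaim_sysdig_storage
    Both are keyed by a claim name (hypothetical keys, for illustration only).
    """
    return {
        name: (requested[name], actual[name])
        for name in requested
        # Skip claims with no reading for the underlying volume.
        if name in actual and actual[name] != requested[name]
    }


# Hypothetical sample values, in bytes.
requested = {"data-db-0": 10 * 1024**3, "logs": 5 * 1024**3}
actual = {"data-db-0": 20 * 1024**3, "logs": 5 * 1024**3}

for claim, (want, got) in storage_mismatches(requested, actual).items():
    print(f"{claim}: requested {want} bytes, volume provides {got} bytes")
```

A difference is not necessarily a problem, since a claim may be bound to a volume with more capacity than it asked for.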

kube_pod_container_info

| Prometheus ID | kube_pod_container_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_resource_limits

| Prometheus ID | kube_pod_container_resource_limits |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_resource_requests

| Prometheus ID | kube_pod_container_resource_requests |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_last_terminated_reason

| Prometheus ID | kube_pod_container_status_last_terminated_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_ready

| Prometheus ID | kube_pod_container_status_ready |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_restarts_total

| Prometheus ID | kube_pod_container_status_restarts_total |
| Legacy ID | |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |
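Unlike the gauges in this list, a counter only ever increases, so it is usually read as an increase over a time window rather than as an absolute value. The following is a minimal Python sketch of that idea, assuming a list of raw counter samples in time order. The helper name `counter_increase` and the sample values are hypothetical, and the sketch ignores the extrapolation that Prometheus applies in its own `increase()` function.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Sum the increase across ordered counter samples.

    A drop between two consecutive samples is treated as a counter reset,
    so the new value is counted as growth from zero.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter restarted from zero between these two samples.
            total += current
    return total


# Hypothetical samples of a restart counter; the drop from 3 to 0 is a reset.
restarts = [0, 1, 3, 0, 2]
print(counter_increase(restarts))  # prints 5.0
```

Summing only the deltas keeps the result meaningful even when the process exposing the counter restarts and the raw value falls back to zero.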

kube_pod_container_status_running

| Prometheus ID | kube_pod_container_status_running |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_terminated

| Prometheus ID | kube_pod_container_status_terminated |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_terminated_reason

| Prometheus ID | kube_pod_container_status_terminated_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_waiting

| Prometheus ID | kube_pod_container_status_waiting |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_container_status_waiting_reason

| Prometheus ID | kube_pod_container_status_waiting_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_info

| Prometheus ID | kube_pod_info |
| Legacy ID | kubernetes.pod.info |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_resource_limits

| Prometheus ID | kube_pod_init_container_resource_limits |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_resource_requests

| Prometheus ID | kube_pod_init_container_resource_requests |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_last_terminated_reason

| Prometheus ID | kube_pod_init_container_status_last_terminated_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_ready

| Prometheus ID | kube_pod_init_container_status_ready |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_restarts_total

| Prometheus ID | kube_pod_init_container_status_restarts_total |
| Legacy ID | |
| Metric Type | counter |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_running

| Prometheus ID | kube_pod_init_container_status_running |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_terminated

| Prometheus ID | kube_pod_init_container_status_terminated |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_terminated_reason

| Prometheus ID | kube_pod_init_container_status_terminated_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_waiting

| Prometheus ID | kube_pod_init_container_status_waiting |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_init_container_status_waiting_reason

| Prometheus ID | kube_pod_init_container_status_waiting_reason |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_labels

| Prometheus ID | kube_pod_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_owner

| Prometheus ID | kube_pod_owner |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_spec_volumes_persistentvolumeclaims_info

| Prometheus ID | kube_pod_spec_volumes_persistentvolumeclaims_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_spec_volumes_persistentvolumeclaims_readonly

| Prometheus ID | kube_pod_spec_volumes_persistentvolumeclaims_readonly |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_sysdig_containers_waiting

| Prometheus ID | kube_pod_sysdig_containers_waiting |
| Legacy ID | kubernetes.pod.containers.waiting |
| Metric Type | gauge |
| Unit | number |
| Description | The number of containers in a Pod that are in the waiting state. |
| Additional Notes | |

kube_pod_sysdig_resource_limits_cpu_cores

| Prometheus ID | kube_pod_sysdig_resource_limits_cpu_cores |
| Legacy ID | kubernetes.pod.resourceLimits.cpuCores |
| Metric Type | gauge |
| Unit | number |
| Description | The limit on CPU cores to be used by a container. |
| Additional Notes | |

kube_pod_sysdig_resource_limits_memory_bytes

| Prometheus ID | kube_pod_sysdig_resource_limits_memory_bytes |
| Legacy ID | kubernetes.pod.resourceLimits.memBytes |
| Metric Type | gauge |
| Unit | data |
| Description | The limit, in bytes, on memory to be used by a container. |
| Additional Notes | |

kube_pod_sysdig_resource_requests_cpu_cores

| Prometheus ID | kube_pod_sysdig_resource_requests_cpu_cores |
| Legacy ID | kubernetes.pod.resourceRequests.cpuCores |
| Metric Type | gauge |
| Unit | number |
| Description | The number of CPU cores requested by containers in the Pod. |
| Additional Notes | |

kube_pod_sysdig_resource_requests_memory_bytes

| Prometheus ID | kube_pod_sysdig_resource_requests_memory_bytes |
| Legacy ID | kubernetes.pod.resourceRequests.memBytes |
| Metric Type | gauge |
| Unit | data |
| Description | The number of memory bytes requested by containers in the Pod. |
| Additional Notes | |

kube_pod_sysdig_restart_count

| Prometheus ID | kube_pod_sysdig_restart_count |
| Legacy ID | kubernetes.pod.restart.count |
| Metric Type | gauge |
| Unit | number |
| Description | The number of container restarts for the Pod. |
| Additional Notes | |

kube_pod_sysdig_restart_rate

| Prometheus ID | kube_pod_sysdig_restart_rate |
| Legacy ID | kubernetes.pod.restart.rate |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_pod_sysdig_status_ready

| Prometheus ID | kube_pod_sysdig_status_ready |
| Legacy ID | kubernetes.pod.status.ready |
| Metric Type | gauge |
| Unit | number |
| Description | The number of Pods ready to serve requests. |
| Additional Notes | |

kube_replicaset_labels

| Prometheus ID | kube_replicaset_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_replicaset_owner

| Prometheus ID | kube_replicaset_owner |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_replicaset_spec_replicas

| Prometheus ID | kube_replicaset_spec_replicas |
| Legacy ID | kubernetes.replicaSet.replicas.desired |
| Metric Type | gauge |
| Unit | number |
| Description | The number of desired Pods per ReplicaSet. |
| Additional Notes | |

kube_replicaset_status_fully_labeled_replicas

| Prometheus ID | kube_replicaset_status_fully_labeled_replicas |
| Legacy ID | kubernetes.replicaSet.replicas.fullyLabeled |
| Metric Type | gauge |
| Unit | number |
| Description | The number of fully labeled Pods per ReplicaSet. |
| Additional Notes | |

kube_replicaset_status_ready_replicas

| Prometheus ID | kube_replicaset_status_ready_replicas |
| Legacy ID | kubernetes.replicaSet.replicas.ready |
| Metric Type | gauge |
| Unit | number |
| Description | The number of ready Pods per ReplicaSet. |
| Additional Notes | |

kube_replicaset_status_replicas

| Prometheus ID | kube_replicaset_status_replicas |
| Legacy ID | kubernetes.replicaSet.replicas.running |
| Metric Type | gauge |
| Unit | number |
| Description | The number of running Pods per ReplicaSet. |
| Additional Notes | |

kube_resourcequota

| Prometheus ID | kube_resourcequota |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_resourcequota_sysdig_limits_cpu_hard

| Prometheus ID | kube_resourcequota_sysdig_limits_cpu_hard |
| Legacy ID | kubernetes.resourcequota.limits.cpu.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced CPU limit quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_limits_cpu_used

| Prometheus ID | kube_resourcequota_sysdig_limits_cpu_used |
| Legacy ID | kubernetes.resourcequota.limits.cpu.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed CPU limit usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_limits_memory_hard

| Prometheus ID | kube_resourcequota_sysdig_limits_memory_hard |
| Legacy ID | kubernetes.resourcequota.limits.memory.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced memory limit quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_limits_memory_used

| Prometheus ID | kube_resourcequota_sysdig_limits_memory_used |
| Legacy ID | kubernetes.resourcequota.limits.memory.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed memory limit usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_persistentvolumeclaims_hard

| Prometheus ID | kube_resourcequota_sysdig_persistentvolumeclaims_hard |
| Legacy ID | kubernetes.resourcequota.persistentvolumeclaims.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced PersistentVolumeClaim quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_persistentvolumeclaims_used

| Prometheus ID | kube_resourcequota_sysdig_persistentvolumeclaims_used |
| Legacy ID | kubernetes.resourcequota.persistentvolumeclaims.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed PersistentVolumeClaim usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_pods_hard

| Prometheus ID | kube_resourcequota_sysdig_pods_hard |
| Legacy ID | kubernetes.resourcequota.pods.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced Pod quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_pods_used

| Prometheus ID | kube_resourcequota_sysdig_pods_used |
| Legacy ID | kubernetes.resourcequota.pods.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed Pod usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_requests_cpu_hard

| Prometheus ID | kube_resourcequota_sysdig_requests_cpu_hard |
| Legacy ID | kubernetes.resourcequota.requests.cpu.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced CPU request quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_requests_cpu_used

| Prometheus ID | kube_resourcequota_sysdig_requests_cpu_used |
| Legacy ID | kubernetes.resourcequota.requests.cpu.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed CPU request usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_requests_memory_hard

| Prometheus ID | kube_resourcequota_sysdig_requests_memory_hard |
| Legacy ID | kubernetes.resourcequota.requests.memory.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced memory request quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_requests_memory_used

| Prometheus ID | kube_resourcequota_sysdig_requests_memory_used |
| Legacy ID | kubernetes.resourcequota.requests.memory.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed memory request usage per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_services_hard

| Prometheus ID | kube_resourcequota_sysdig_services_hard |
| Legacy ID | kubernetes.resourcequota.services.hard |
| Metric Type | gauge |
| Unit | number |
| Description | Enforced service quota per namespace. |
| Additional Notes | |

kube_resourcequota_sysdig_services_used

| Prometheus ID | kube_resourcequota_sysdig_services_used |
| Legacy ID | kubernetes.resourcequota.services.used |
| Metric Type | gauge |
| Unit | number |
| Description | Current observed service usage per namespace. |
| Additional Notes | |
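
Each `kube_resourcequota_sysdig_*_hard` metric pairs with a `*_used` metric, so one common way to read them is as a utilization ratio (used divided by hard). The following is a minimal Python sketch of that arithmetic only. The `quota_utilization` helper and the sample values are hypothetical and are not part of any Sysdig API; how you fetch the metric values is left out.

```python
from typing import Optional


def quota_utilization(used: float, hard: float) -> Optional[float]:
    """Return used/hard as a fraction, or None when no positive quota is set."""
    if hard <= 0:
        # Avoid dividing by zero when the namespace has no enforced quota.
        return None
    return used / hard


# Hypothetical sample values for one namespace, keyed by metric name.
samples = {
    "kube_resourcequota_sysdig_pods_hard": 20,
    "kube_resourcequota_sysdig_pods_used": 15,
}

pods_utilization = quota_utilization(
    samples["kube_resourcequota_sysdig_pods_used"],
    samples["kube_resourcequota_sysdig_pods_hard"],
)
print(f"Pod quota utilization: {pods_utilization:.0%}")  # prints "Pod quota utilization: 75%"
```

The same helper applies to the CPU, memory, PersistentVolumeClaim, and service pairs, assuming both values of a pair are read for the same namespace.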

kube_service_info

| Prometheus ID | kube_service_info |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_service_labels

| Prometheus ID | kube_service_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_statefulset_labels

| Prometheus ID | kube_statefulset_labels |
| Legacy ID | |
| Metric Type | gauge |
| Unit | number |
| Description | |
| Additional Notes | |

kube_statefulset_replicas

| Prometheus ID | kube_statefulset_replicas |
| Legacy ID | kubernetes.statefulSet.replicas |
| Metric Type | gauge |
| Unit | number |
| Description | The desired number of replicas of the given Template. |
| Additional Notes | |

kube_statefulset_status_replicas

| Prometheus ID | kube_statefulset_status_replicas |
| Legacy ID | kubernetes.statefulSet.status.replicas |
| Metric Type | gauge |
| Unit | number |
| Description | The number of Pods created by the StatefulSet controller. |
| Additional Notes | |

kube_statefulset_status_replicas_current

| Prometheus ID | kube_statefulset_status_replicas_current |
| Legacy ID | kubernetes.statefulSet.status.replicas.current |
| Metric Type | gauge |
| Unit | number |
| Description | The number of Pods created by the StatefulSet controller from the StatefulSet version indicated by currentRevision. |
| Additional Notes | |

kube_statefulset_status_replicas_ready

| Prometheus ID | kube_statefulset_status_replicas_ready |
| Legacy ID | kubernetes.statefulSet.status.replicas.ready |
| Metric Type | gauge |
| Unit | number |
| Description | The number of Pods created by the StatefulSet controller that have a Ready condition. |
| Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_statefulset_status_replicas_updated

|Prometheus ID |kube_statefulset_status_replicas_updated |
|Legacy ID |kubernetes.statefulSet.status.replicas.updated |
|Metric Type |gauge |
|Unit |number |
|Description |The number of Pods created by the StatefulSet controller from the StatefulSet version indicated by updateRevision. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_storageclass_created

|Prometheus ID |kube_storageclass_created |
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_storageclass_info

|Prometheus ID |kube_storageclass_info |
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_storageclass_labels

|Prometheus ID |kube_storageclass_labels |
|Legacy ID | |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_pods_status_phase

|Prometheus ID |kube_workload_pods_status_phase |
|Legacy ID |kubernetes.workload.pods.status.phase |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_status_replicas_misscheduled

|Prometheus ID |kube_workload_status_replicas_misscheduled |
|Legacy ID |kubernetes.workload.status.replicas.misscheduled |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_status_replicas_scheduled

|Prometheus ID |kube_workload_status_replicas_scheduled |
|Legacy ID |kubernetes.workload.status.replicas.scheduled |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_status_replicas_updated

|Prometheus ID |kube_workload_status_replicas_updated |
|Legacy ID |kubernetes.workload.status.replicas.updated |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_status_running

|Prometheus ID |kube_workload_status_running |
|Legacy ID |kubernetes.workload.status.running |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      kube_workload_status_unavailable

|Prometheus ID |kube_workload_status_unavailable |
|Legacy ID |kubernetes.workload.status.unavailable |
|Metric Type |gauge |
|Unit |number |
|Description | |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      10.8 -

                                                                                                                                                                                                                                                                                                                                                                      Network

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_count

|Prometheus ID |sysdig_connection_net_request_count |
|Legacy ID |net.request.count |
|Metric Type |- |
|Unit |- |
|Description |The total number of network requests. This value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_connection_in_count

|Prometheus ID |sysdig_connection_net_connection_in_count |
|Legacy ID |net.connection.count.in |
|Metric Type |counter |
|Unit |number |
|Description |The number of currently established client (inbound) connections. |
|Additional Notes |This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_connection_out_count

|Prometheus ID |sysdig_connection_net_connection_out_count |
|Legacy ID |net.connection.count.out |
|Metric Type |counter |
|Unit |number |
|Description |The number of currently established server (outbound) connections. |
|Additional Notes |This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_connection_total_count

|Prometheus ID |sysdig_connection_net_connection_total_count |
|Legacy ID |net.connection.count.total |
|Metric Type |counter |
|Unit |number |
|Description |The number of currently established connections. This value may exceed the sum of the inbound and outbound metrics, because it covers client and server inter-host connections as well as internal-only connections. |
|Additional Notes |This metric is especially useful when segmented by protocol, port, or process. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_in_bytes

|Prometheus ID |sysdig_connection_net_in_bytes |
|Legacy ID |net.bytes.in |
|Metric Type |counter |
|Unit |data |
|Description |The number of inbound network bytes. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_out_bytes

|Prometheus ID |sysdig_connection_net_out_bytes |
|Legacy ID |net.bytes.out |
|Metric Type |counter |
|Unit |data |
|Description |The number of outbound network bytes. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_count

|Prometheus ID |sysdig_connection_net_request_count |
|Legacy ID |net.request.count |
|Metric Type |counter |
|Unit |number |
|Description |The total number of network requests. Note that this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_in_count

|Prometheus ID |sysdig_connection_net_request_in_count |
|Legacy ID |net.request.count.in |
|Metric Type |counter |
|Unit |number |
|Description |The number of inbound network requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_in_time

|Prometheus ID |sysdig_connection_net_request_in_time |
|Legacy ID |net.request.time.in |
|Metric Type |counter |
|Unit |time |
|Description |The average time to serve an inbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_out_count

|Prometheus ID |sysdig_connection_net_request_out_count |
|Legacy ID |net.request.count.out |
|Metric Type |counter |
|Unit |number |
|Description |The number of outbound network requests. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_out_time

|Prometheus ID |sysdig_connection_net_request_out_time |
|Legacy ID |net.request.time.out |
|Metric Type |counter |
|Unit |time |
|Description |The average time spent waiting for an outbound request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_request_time

|Prometheus ID |sysdig_connection_net_request_time |
|Legacy ID |net.request.time |
|Metric Type |counter |
|Unit |time |
|Description |The average time to serve a network request. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_connection_net_total_bytes

|Prometheus ID |sysdig_connection_net_total_bytes |
|Legacy ID |net.bytes.total |
|Metric Type |counter |
|Unit |data |
|Description |The total network bytes, including both inbound and outbound connections. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      10.9 -

                                                                                                                                                                                                                                                                                                                                                                      Program

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_cores_used

|Prometheus ID |sysdig_program_cpu_cores_used |
|Legacy ID |cpu.cores.used |
|Metric Type |gauge |
|Unit |number |
|Description |The CPU core usage of each program is obtained from cgroups, and is equal to the number of cores used by the program. For example, if a program uses two of an available four cores, the value of sysdig_program_cpu_cores_used will be two. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_cores_used_percent

|Prometheus ID |sysdig_program_cpu_cores_used_percent |
|Legacy ID |cpu.cores.used.percent |
|Metric Type |gauge |
|Unit |percent |
|Description |The CPU core usage percent for each program is obtained from cgroups, and is equal to the number of cores used multiplied by 100. For example, if a program uses three cores, the value of sysdig_program_cpu_cores_used_percent would be 300%. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_used_percent

|Prometheus ID |sysdig_program_cpu_used_percent |
|Legacy ID |cpu.used.percent |
|Metric Type |gauge |
|Unit |percent |
|Description |The CPU usage for each program is obtained from cgroups, and normalized by dividing by the number of cores to determine an overall percentage. For example, if the environment contains six cores on a host, and the processes are assigned two cores, Sysdig will report CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and containers. |
|Additional Notes | |
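
The arithmetic behind the two percentage metrics described above can be sketched as follows. This is only an illustration of the formulas given in the descriptions, not Sysdig's implementation; the function names and the `host_cores` parameter are hypothetical.

```python
def cores_used_percent(cores_used: float) -> float:
    """Cores in use expressed as a percentage, where one full core is 100%."""
    return cores_used * 100.0


def cpu_used_percent(cores_used: float, host_cores: int) -> float:
    """Cores in use normalized by the total number of cores on the host."""
    if host_cores <= 0:
        raise ValueError("host_cores must be positive")
    return cores_used / host_cores * 100.0


# A program using three cores reports 300%.
print(cores_used_percent(3))             # 300.0

# Two cores in use on a six-core host: 2/6 * 100% = 33.33%.
print(round(cpu_used_percent(2, 6), 2))  # 33.33
```

The first value can exceed 100% because it is not normalized, while the second stays within 0 to 100% as long as the cores in use do not exceed the host total.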

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_fd_used_percent

|Prometheus ID |sysdig_program_fd_used_percent |
|Legacy ID |fd.used.percent |
|Metric Type |gauge |
|Unit |percent |
|Description |The percentage of used file descriptors out of the maximum available. |
|Additional Notes |Usually, when a process reaches its FD limit, it stops operating properly and may crash. As a consequence, this is a metric you should monitor carefully or, even better, use for alerts. |
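
As a rough illustration of how a percentage like this could drive an alert, the sketch below computes used descriptors against a limit and compares the result to a threshold. The function names, the sample numbers, and the 80% threshold are assumptions chosen for the example, not values taken from Sysdig.

```python
def fd_used_percent(open_fds: int, fd_limit: int) -> float:
    """Percentage of file descriptors in use out of the maximum available."""
    if fd_limit <= 0:
        raise ValueError("fd_limit must be positive")
    return open_fds / fd_limit * 100.0


def should_alert(percent: float, threshold: float = 80.0) -> bool:
    """True when usage has reached the (assumed) alert threshold."""
    return percent >= threshold


usage = fd_used_percent(open_fds=900, fd_limit=1024)
print(usage)                # 87.890625
print(should_alert(usage))  # True
```

Alerting before the value reaches 100% leaves time to react before the process hits its limit.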

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_error_open_count

|Prometheus ID |sysdig_program_file_error_open_count |
|Legacy ID |file.error.open.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of errors caused by opening files. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_error_total_count

|Prometheus ID |sysdig_program_file_error_total_count |
|Legacy ID |file.error.total.count |
|Metric Type |counter |
|Unit |number |
|Description |The number of errors caused by file access. |
|Additional Notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_in_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_in_bytes | |Legacy ID |file.bytes.in | |Metric Type |counter | |Unit |data | |Description |The number of bytes read from file. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_in_iops

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_in_iops | |Legacy ID |file.iops.in | |Metric Type |counter | |Unit |number | |Description |The number of file read operations per second. | |Addional Notes|This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.|
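
The note above contrasts counting actual requests with deriving an operation count from byte totals. A minimal Python sketch of why the two can disagree follows. The byte-based estimate and the 4096-byte block size are assumptions made for illustration only; the source does not specify how other tools interpolate, and the request sizes are made up.

```python
import math


def measured_ops(request_sizes):
    """Count I/O requests directly: one operation per request, whatever its size."""
    return len(request_sizes)


def estimated_ops_from_bytes(request_sizes, assumed_block_size=4096):
    """Estimate operations from total bytes (hypothetical byte-based method)."""
    return math.ceil(sum(request_sizes) / assumed_block_size)


# Hypothetical read requests issued by one process within one second (bytes).
request_sizes = [100, 5000, 12000]

print(measured_ops(request_sizes))              # 3
print(estimated_ops_from_bytes(request_sizes))  # 5
```

Three requests were made, but the byte-based estimate reports five, so a tool that counts requests and a tool that interpolates from bytes would show different IOPS for the same workload.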

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_in_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_in_time | |Legacy ID |file.time.in | |Metric Type |counter | |Unit |time | |Description |The time spent in file reading. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_open_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_open_count | |Legacy ID |file.open.count | |Metric Type |counter | |Unit |number | |Description |The number of time the file has been opened.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_out_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_out_bytes | |Legacy ID |file.bytes.out | |Metric Type |counter | |Unit |data | |Description |The number of bytes written to file. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_out_iops

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_out_iops | |Legacy ID |file.iops.out | |Metric Type |counter | |Unit |number | |Description |The number of file write operations per second. | |Addional Notes|This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_out_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_out_time | |Legacy ID |file.time.out | |Metric Type |counter | |Unit |time | |Description |The time spent in file writing. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_total_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_total_bytes | |Legacy ID |file.bytes.total | |Metric Type |counter | |Unit |data | |Description |The number of bytes read from and written to file. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_total_iops

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_total_iops | |Legacy ID |file.iops.total | |Metric Type |counter | |Unit |number | |Description |The number of read and write file operations per second. | |Addional Notes|This is calculated by measuring the actual number of read and write requests made by a process. Therefore, it can differ from what other tools show, which is usually based on interpolating this value from the number of bytes read and written to the file system.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_total_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_file_total_time | |Legacy ID |file.time.total | |Metric Type |counter | |Unit |time | |Description |The time spent in file I/O. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|
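
The legacy IDs and Prometheus IDs for the file metrics do not follow a single mechanical renaming rule (for example, `file.bytes.in` becomes `sysdig_program_file_in_bytes`), so an explicit lookup table is a safer way to translate one into the other than string manipulation. The sketch below is one possible approach: the ID pairs are copied from the tables above, while the helper name `to_prometheus_id` is hypothetical and not part of any Sysdig tooling.

```python
# Legacy ID -> Prometheus ID pairs, copied from the file metric tables above.
LEGACY_TO_PROMETHEUS = {
    "file.error.open.count": "sysdig_program_file_error_open_count",
    "file.error.total.count": "sysdig_program_file_error_total_count",
    "file.bytes.in": "sysdig_program_file_in_bytes",
    "file.iops.in": "sysdig_program_file_in_iops",
    "file.time.in": "sysdig_program_file_in_time",
    "file.open.count": "sysdig_program_file_open_count",
    "file.bytes.out": "sysdig_program_file_out_bytes",
    "file.iops.out": "sysdig_program_file_out_iops",
    "file.time.out": "sysdig_program_file_out_time",
    "file.bytes.total": "sysdig_program_file_total_bytes",
    "file.iops.total": "sysdig_program_file_total_iops",
    "file.time.total": "sysdig_program_file_total_time",
}


def to_prometheus_id(legacy_id: str) -> str:
    """Return the Prometheus ID listed for a legacy program-level file metric."""
    try:
        return LEGACY_TO_PROMETHEUS[legacy_id]
    except KeyError:
        raise KeyError(f"no Prometheus ID listed for legacy metric {legacy_id!r}") from None


print(to_prometheus_id("file.bytes.in"))  # sysdig_program_file_in_bytes
```

Note that the same legacy ID may map to different Prometheus IDs at other scopes (host, container, and so on); this table covers only the `sysdig_program_*` entries listed here.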

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_info

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_info| |Legacy ID |info | |Metric Type |gauge | |Unit |number | |Description | | |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_memory_used_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_memory_used_bytes | |Legacy ID |memory.bytes.used | |Metric Type |gauge | |Unit |data | |Description |The amount of physical memory currently in use. | |Addional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, the metric can also be segmented by using ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_memory_used_percent

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_memory_used_percent | |Legacy ID |memory.used.percent | |Metric Type |gauge | |Unit |percent | |Description |The percentage of physical memory in use. | |Addional Notes|By default, this metric shows the average value for the selected scope. For instance, if you apply it to a group of machines, you will see the average value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|
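
To make the difference between the default scope-wide average and a segmented view concrete, here is a small Python sketch. It only illustrates the arithmetic; the host names and sample values are hypothetical, and the functions are not part of the Sysdig UI or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples of memory used percent: (host, value).
samples = [("host-a", 40.0), ("host-a", 60.0), ("host-b", 20.0)]


def scope_average(samples):
    """Default view: one average across everything in the selected scope."""
    return mean(value for _, value in samples)


def segment_by_host(samples):
    """Segmented view: a separate average for each host."""
    groups = defaultdict(list)
    for host, value in samples:
        groups[host].append(value)
    return {host: mean(values) for host, values in groups.items()}


print(scope_average(samples))    # 40.0
print(segment_by_host(samples))  # {'host-a': 50.0, 'host-b': 20.0}
```

The scope-wide figure of 40.0 hides the fact that one host sits at 50.0 and the other at 20.0, which is the kind of detail segmentation is meant to surface.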

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_connection_in_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_connection_in_count | |Legacy ID |net.connection.count.in | |Metric Type |counter | |Unit |number | |Description |The number of currently established client (inbound) connections. | |Addional Notes|This metric is especially useful when segmented by protocol, port or process.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_connection_out_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_connection_out_count | |Legacy ID |net.connection.count.out | |Metric Type |counter | |Unit |number | |Description |The number of currently established server (outbound) connections. | |Addional Notes|This metric is especially useful when segmented by protocol, port or process.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_connection_total_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_connection_total_count | |Legacy ID |net.connection.count.total | |Metric Type |counter | |Unit |number | |Description |The number of currently established connections. This value may exceed the sum of the inbound and outbound metrics since it represents client and server inter-host connections as well as internal only connections.| |Addional Notes|This metric is especially useful when segmented by protocol, port or process. |
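
Because the total can exceed the sum of the inbound and outbound counts, the gap between them gives a rough indication of how many connections are internal only. The Python sketch below shows that arithmetic; it assumes the gap is made up entirely of internal-only connections, which is an interpretation of the description above rather than a documented formula, and the sample values are hypothetical.

```python
def internal_only_connections(total: int, inbound: int, outbound: int) -> int:
    """Connections counted in the total but in neither directional metric.

    Assumes the whole gap is internal-only connections (an interpretation,
    not a documented formula).
    """
    gap = total - (inbound + outbound)
    if gap < 0:
        raise ValueError("total is smaller than inbound + outbound")
    return gap


# Hypothetical readings for one program at one point in time.
print(internal_only_connections(total=25, inbound=12, outbound=8))  # 5
```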

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_error_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_error_count | |Legacy ID |net.error.count | |Metric Type |counter | |Unit |number | |Description |The total number of network errors occurred in a second. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_in_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_in_bytes | |Legacy ID |net.bytes.in | |Metric Type |counter | |Unit |data | |Description |The number of inbound network bytes. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_out_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_out_bytes | |Legacy ID |net.bytes.out | |Metric Type |counter | |Unit |data | |Description |The number of outbound network bytes. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_count | |Legacy ID |net.request.count | |Metric Type |counter | |Unit |number | |Description |The total number of network requests. Note, this value may exceed the sum of inbound and outbound requests, because this count includes requests over internal connections.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_in_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_in_count | |Legacy ID |net.request.count.in | |Metric Type |counter | |Unit |number | |Description |The number of inbound network requests.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_in_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_in_time | |Legacy ID |net.request.time.in | |Metric Type |counter | |Unit |time | |Description |The average time to serve an inbound request.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_out_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_out_count | |Legacy ID |net.request.count.out | |Metric Type |counter | |Unit |number | |Description |The number of outbound network requests.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_out_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_out_time | |Legacy ID |net.request.time.out | |Metric Type |counter | |Unit |time | |Description |The average time spent waiting for an outbound request.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_request_time

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_request_time | |Legacy ID |net.request.time | |Metric Type |counter | |Unit |time | |Description |Average time to serve a network request.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_tcp_queue_len

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_tcp_queue_len | |Legacy ID |net.tcp.queue.len | |Metric Type |counter | |Unit |number | |Description |The length of the TCP request queue.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_net_total_bytes

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_net_total_bytes | |Legacy ID |net.bytes.total | |Metric Type |counter | |Unit |data | |Description |The total network bytes, including inbound and outbound connections, in a program. | |Addional Notes|By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI.|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_proc_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_proc_count | |Legacy ID |proc.count | |Metric Type |counter | |Unit |number | |Description |The number of processes on a host or container.| |Addional Notes| |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_syscall_count

                                                                                                                                                                                                                                                                                                                                                                      |Prometheus ID |sysdig_program_syscall_count | |Legacy ID |syscall.count | |Metric Type |gauge | |Unit |number | |Description |The total number of syscalls seen | |Addional Notes|Syscalls are resource intensive. This metric tracks how many have been made by a given process or container|

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_thread_count

|Prometheus ID |sysdig_program_thread_count |
|Legacy ID |thread.count |
|Metric Type |counter |
|Unit |number |
|Description |The total number of threads running in a program. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_timeseries_count_appcheck

|Prometheus ID |sysdig_program_timeseries_count_appcheck |
|Legacy ID |metricCount.appCheck |
|Metric Type |gauge |
|Unit |number |
|Description |The number of app check custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_timeseries_count_jmx

|Prometheus ID |sysdig_program_timeseries_count_jmx |
|Legacy ID |metricCount.jmx |
|Metric Type |gauge |
|Unit |number |
|Description |The number of JMX custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_timeseries_count_prometheus

|Prometheus ID |sysdig_program_timeseries_count_prometheus |
|Legacy ID |metricCount.prometheus |
|Metric Type |gauge |
|Unit |number |
|Description |The number of Prometheus custom metrics. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_up

|Prometheus ID |sysdig_program_up |
|Legacy ID |uptime |
|Metric Type |gauge |
|Unit |number |
|Description |The percentage of time the selected entity was up during the visualized time sample. This can be used to determine if a machine (or a group of machines) went down. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_used_percent

|Prometheus ID |sysdig_program_cpu_used_percent |
|Legacy ID |cpu.used.percent |
|Metric Type |- |
|Unit |- |
|Description |The CPU usage for each program is obtained from cgroups, and normalized by dividing by the number of cores to determine an overall percentage. |
|Additional Notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_memory_used_percent

|Prometheus ID |sysdig_program_memory_used_percent |
|Legacy ID |memory.used.percent |
|Metric Type |- |
|Unit |- |
|Description |The percentage of physical memory used. By default, this metric displays the average value for the defined scope. For example, if the scope is set to a group of machines, the metric value will be the average value for the whole group. |
|Additional Notes | |

10.9.1 - Program

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_cores_used

|Metadata |Value |
|publicId |sysdig_program_cpu_cores_used |
|legacyId |cpu.cores.used |
|description |The CPU core usage of each program is obtained from cgroups, and is equal to the number of cores used by the program. For example, if a program uses two of an available four cores, the value of sysdig_program_cpu_cores_used will be two. |
|notes | |

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_cores_used_percent

|Metadata |Value |
|publicId |sysdig_program_cpu_cores_used_percent |
|legacyId |cpu.cores.used.percent |
|description |The CPU core usage percent for each program is obtained from cgroups, and is equal to the number of cores used multiplied by 100. For example, if a program uses three cores, the value of sysdig_program_cpu_cores_used_percent would be 300%. |
|notes | |
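The relationship between the cores-used value and its percent form can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `cores_used_percent` is hypothetical and is not part of any Sysdig API.

```python
def cores_used_percent(cores_used: float) -> float:
    """Convert a cores-used reading to its percent form (one full core = 100%)."""
    # Hypothetical helper: mirrors "number of cores used multiplied by 100".
    return cores_used * 100.0


# A program using three cores reports 300%.
print(cores_used_percent(3))  # 300.0
```

Because the value is not divided by the number of cores on the host, it can exceed 100% on multi-core machines.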

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_cpu_used_percent

|Metadata |Value |
|publicId |sysdig_program_cpu_used_percent |
|legacyId |cpu.used.percent |
|description |The CPU usage for each program is obtained from cgroups, and normalized by dividing by the number of cores to determine an overall percentage. For example, if the environment contains six cores on a host, and the processes are assigned two cores, Sysdig will report CPU usage of 2/6 * 100% = 33.33%. This metric is calculated differently for hosts and containers. |
|notes | |
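The normalization in the worked example (two cores used on a six-core host) can be reproduced with a short Python sketch. It only restates the arithmetic from the description; the function name `cpu_used_percent` and its parameters are assumptions made for illustration, and the sketch does not model how the calculation differs between hosts and containers.

```python
def cpu_used_percent(cores_used: float, total_cores: int) -> float:
    """Normalize cores used by the number of available cores, as a percentage."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return cores_used / total_cores * 100.0


# Two cores used on a six-core host: 2/6 * 100% = 33.33%.
print(round(cpu_used_percent(2, 6), 2))  # 33.33
```

Unlike the cores-used percent, this normalized value stays within 0 to 100% as long as usage does not exceed the available cores.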

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_fd_used_percent

|Metadata |Value |
|publicId |sysdig_program_fd_used_percent |
|legacyId |fd.used.percent |
|description |The percentage of used file descriptors out of the maximum available. |
|notes |Usually, when a process reaches its FD limit, it stops operating properly and may crash. As a consequence, this is a metric you want to monitor carefully or, even better, use for alerts. |
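To make the percentage and the alerting idea concrete, here is a minimal Python sketch. The helper names and the 80% threshold are assumptions chosen for illustration; they are not Sysdig defaults, and in practice the alert would be configured in the product rather than in code.

```python
def fd_used_percent(open_fds: int, fd_limit: int) -> float:
    """Percentage of file descriptors in use out of the maximum available."""
    if fd_limit <= 0:
        raise ValueError("fd_limit must be positive")
    return open_fds / fd_limit * 100.0


def should_alert(percent: float, threshold: float = 80.0) -> bool:
    """Return True when usage reaches the (hypothetical) alert threshold."""
    return percent >= threshold


# A process with 900 descriptors open against a limit of 1024.
usage = fd_used_percent(900, 1024)
print(usage, should_alert(usage))
```

Alerting well below 100% leaves time to react before the process hits its limit.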

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_error_open_count

|Metadata |Value |
|publicId |sysdig_program_file_error_open_count |
|legacyId |file.error.open.count |
|description |The number of errors caused by opening files. |
|notes |By default, this metric shows the total value for the selected scope. For instance, if you apply it to a group of machines, you will see the total value for the whole group. However, you can easily segment the metric to see it by host, process, container, and so on. Just use ‘Segment by’ in the UI. |
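The difference between the scope-wide total and a segmented view can be illustrated with a small Python sketch. The sample data, host names, and function names below are invented for the example; the sketch models the idea of ‘Segment by’ and is not how the UI or backend is implemented.

```python
from collections import defaultdict

# Hypothetical samples: (host, file-open error count) pairs within one scope.
samples = [("host-a", 3), ("host-b", 5), ("host-a", 2)]


def total_for_scope(samples):
    """Default view: one total across everything in the selected scope."""
    return sum(value for _, value in samples)


def segment_by_host(samples):
    """Segmented view: the same samples, totalled separately per host."""
    totals = defaultdict(int)
    for host, value in samples:
        totals[host] += value
    return dict(totals)


print(total_for_scope(samples))   # 10
print(segment_by_host(samples))   # {'host-a': 5, 'host-b': 5}
```

The segmented totals always add up to the scope-wide total; segmentation only changes how the same data is grouped.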

                                                                                                                                                                                                                                                                                                                                                                      sysdig_program_file_error_total_count

                                                                                                                                                                                                                                                                                                                                                                      MetadataValue
                                                                                                                                                                                                                                                                                                                                                                      publicIdsysdig_progra