Windows
This integration is disabled by default. See Enable and Disable Integrations to enable it in your account.
This integration has 77 metrics.
List of Alerts
Alert | Description | Format |
---|---|---|
[Windows] High CPU Usage | CPU usage of the Windows instance reached 95% | Prometheus |
[Windows] High Disk Usage | Disk usage is over 95% in instance {{$labels.instance}} | Prometheus |
[Windows] High Physical Memory Usage | High physical memory usage in the instance | Prometheus |
[Windows] High Network Inbound Errors | High inbound network error rate in the instance | Prometheus |
[Windows] High Network Outbound Errors | High outbound network error rate in the instance | Prometheus |
[Windows] Increase of Disk writes time | Disk write time is increasing | Prometheus |
[Windows] Queue of Writes and reads Disk operations is growing | The queue of disk read and write operations is growing | Prometheus |
[Windows] High percent of swap space used | Swap space usage has reached a high level | Prometheus |
[Windows] Network bandwidth is reaching its limit | Network bandwidth usage is reaching its limit | Prometheus |
[Windows] High number of transitions virtual addresses into disk | The rate at which pages transition to resident memory without being written to disk has reached a problematic level | Prometheus |
List of Dashboards
Windows Host Overview
The dashboard provides information about the Windows host.
Windows Process Overview
The dashboard provides information about the Windows processes.
Windows Services Overview
The dashboard provides information about the Windows services.
Windows Node Overview (Legacy)
The dashboard provides information about the Windows nodes (legacy).
List of Metrics
Metric name |
---|
windows_cpu_core_frequency_mhz |
windows_cpu_time_total |
windows_cs_physical_memory_bytes |
windows_logical_disk_free_bytes |
windows_logical_disk_read_bytes_total |
windows_logical_disk_reads_total |
windows_logical_disk_requests_queued |
windows_logical_disk_size_bytes |
windows_logical_disk_split_ios_total |
windows_logical_disk_write_bytes_total |
windows_logical_disk_write_seconds_total |
windows_logical_disk_writes_total |
windows_memory_transition_faults_total |
windows_net_bytes_received_total |
windows_net_bytes_sent_total |
windows_net_bytes_total |
windows_net_current_bandwidth_bytes |
windows_net_packets_outbound_discarded_total |
windows_net_packets_outbound_errors |
windows_net_packets_outbound_errors_total |
windows_net_packets_received_discarded_total |
windows_net_packets_received_errors |
windows_net_packets_received_errors_total |
windows_net_packets_received_total |
windows_net_packets_sent_total |
windows_os_paging_free_bytes |
windows_os_paging_limit_bytes |
windows_os_physical_memory_free_bytes |
windows_os_processes |
windows_os_users |
windows_os_virtual_memory_bytes |
windows_os_virtual_memory_free_bytes |
windows_process_cpu_time_total |
windows_process_io_bytes_total |
windows_process_io_operations_total |
windows_process_threads |
windows_process_working_set_bytes |
windows_service_info |
windows_service_start_mode |
windows_service_state |
windows_service_status |
windows_system_context_switches_total |
windows_system_processor_queue_length |
windows_system_system_up_time |
windows_system_threads |
wmi_cpu_core_frequency_mhz |
wmi_cpu_time_total |
wmi_cs_physical_memory_bytes |
wmi_logical_disk_free_bytes |
wmi_logical_disk_read_bytes_total |
wmi_logical_disk_reads_total |
wmi_logical_disk_requests_queued |
wmi_logical_disk_size_bytes |
wmi_logical_disk_split_ios_total |
wmi_logical_disk_write_bytes_total |
wmi_logical_disk_writes_total |
wmi_net_bytes_received_total |
wmi_net_bytes_sent_total |
wmi_net_bytes_total |
wmi_net_current_bandwidth |
wmi_net_packets_outbound_discarded |
wmi_net_packets_outbound_errors |
wmi_net_packets_received_discarded |
wmi_net_packets_received_errors |
wmi_net_packets_received_total |
wmi_net_packets_sent_total |
wmi_os_paging_free_bytes |
wmi_os_paging_limit_bytes |
wmi_os_physical_memory_free_bytes |
wmi_os_processes |
wmi_os_users |
wmi_os_virtual_memory_bytes |
wmi_os_virtual_memory_free_bytes |
wmi_system_context_switches_total |
wmi_system_processor_queue_length |
wmi_system_system_up_time |
wmi_system_threads |
Prerequisites
Windows Prometheus Bundle
The Sysdig Windows Prometheus Bundle is a comprehensive package that installs and configures a Prometheus Agent and the Windows Exporter, allowing you to easily send metrics to your Sysdig Monitor account.
Getting Started
To begin monitoring your Windows machines, follow these steps:
- Download the binary installer from the latest release of this project
- Run the installer on your Windows machine
- Configure the Sysdig region and your Sysdig API token in the wizard
- Select the collectors that you want to enable to produce metrics
- Finish the installation
- Go to your Sysdig Monitor account and start using the Microsoft Windows dashboards and alerts
Automated installation
You can automate the installation of the Sysdig Windows Prometheus Bundle across multiple machines using the command line or PowerShell.
Use the following command as an example:
msiexec /i windows_exporter-1.0.0-x64.msi ENABLED_COLLECTORS=cpu,os SYSDIG_URL="https://api.sysdigcloud.com/prometheus/remote/write" SYSDIG_TOKEN="yyyyyyy-zzzz-zzzz-zzzz-xxxxxxxx" /qn
This command silently installs the Sysdig Windows Prometheus Bundle (/qn runs msiexec without a user interface) with the cpu and os collectors enabled, which makes it easy to deploy across your infrastructure.
By default, the Prometheus configuration file is installed at C:\Program Files\windows_exporter\prometheus.yml, which you can edit manually to include additional Prometheus jobs.
Options and parameters
From the command line, you can use these options:
- ENABLED_COLLECTORS: Comma-separated list of collectors.
- SYSDIG_URL: The Prometheus endpoint of your Sysdig Monitor region, in the form https://api.sysdigcloud.com/prometheus/remote/write. Consult the available regions here.
- COMPUTER_NAME (optional): Overrides the instance label in metrics generated by the Windows Exporter with a custom value. The default value is the computer name stored in the COMPUTERNAME Windows environment variable.
- PROMETHEUS_PORT (optional): The Prometheus port. The default value is ‘9090’.
- PROMETHEUS_LOG_ENABLED (optional): Enables logging for the Prometheus agent, which writes its log file to the windows_exporter folder. The default value is ‘0’.
- PROMETHEUS_LOG_LEVEL (optional): The level of the Prometheus log file, used when logging is enabled. The default value is ‘info’.
- WINDOWS_EXPORTER_LISTEN_ADDR (optional): The Windows Exporter IP address. The default value is ‘0.0.0.0’.
- WINDOWS_EXPORTER_LISTEN_PORT (optional): The Windows Exporter port. The default value is ‘9182’.
- WINDOWS_EXPORTER_EXTRA_FLAGS (optional): Additional Windows Exporter CLI flags. The default value is an empty string.
- WINDOWS_EXPORTER_FIREWALL_REMOTE_ADDR (optional): Comma-separated remote IP addresses for the Windows Firewall exception (allow list). The default value is an empty string (any remote address).
- TEXTFILE_DIR (only if the textfile collector is enabled): The local folder where the textfile collector looks for files.
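When rolling the bundle out to many machines, it can help to generate the msiexec command line from a table of settings. The following Python sketch shows one way to do that, assuming the property names listed above. The helper `build_install_args` and the COMPUTER_NAME value `web-01` are hypothetical illustrations, not part of the bundle.

```python
# Hypothetical helper (not part of the bundle): assembles the msiexec
# arguments for an unattended install from a dict of MSI properties.


def build_install_args(msi_path: str, properties: dict[str, str]) -> list[str]:
    """Return the arguments for `msiexec /i <msi> NAME="value" ... /qn`."""
    args = ["msiexec", "/i", msi_path]
    for name, value in properties.items():
        # Quote every value so values containing spaces or commas stay intact.
        args.append(f'{name}="{value}"')
    args.append("/qn")  # quiet mode: no user interface
    return args


# Example values; replace the URL and token with your own region and API token.
args = build_install_args(
    "windows_exporter-1.0.0-x64.msi",
    {
        "ENABLED_COLLECTORS": "cpu,os",
        "SYSDIG_URL": "https://api.sysdigcloud.com/prometheus/remote/write",
        "SYSDIG_TOKEN": "yyyyyyy-zzzz-zzzz-zzzz-xxxxxxxx",
        "COMPUTER_NAME": "web-01",  # hypothetical custom value for the instance label
    },
)
print(" ".join(args))
```

The sketch only prints the command, so you can review it before running it on each machine.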
Automated uninstallation
Use the following command to uninstall:
msiexec /x windows_exporter-1.0.0-x64.msi /qn
Installation
Installing an exporter is not required for this integration.
Monitoring and Troubleshooting Windows
This document describes important metrics and queries that you can use to monitor and troubleshoot Windows.
Windows Host Monitoring
CPU
Because CPU usage is critical, be aware of how CPU time is distributed across modes. With 100 * avg by (mode) (rate(windows_cpu_time_total[5m])) you can identify which mode (for example, user, privileged, or idle) is consuming the processor the most. One tip for this visualization is to watch the idle mode: the lower its percentage, the busier the processor.
On large machines with many cores, you can use the 100 * sum by (core) (rate(windows_cpu_time_total{mode != 'idle'}[5m])) query to check each core for peaks and verify that the load is distributed evenly across cores.
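To make the two CPU queries concrete, the following Python sketch reproduces their arithmetic on made-up counter samples. It is a simplification: `simple_rate` treats a counter drop as a reset, as Prometheus does, but skips the extrapolation that the real rate() applies at the window edges. The sample data is hypothetical.

```python
from collections import defaultdict

# Samples are (timestamp_seconds, counter_value) pairs for one series.
Samples = list[tuple[float, float]]


def simple_rate(samples: Samples) -> float:
    """Per-second increase of a counter; a drop in value is treated as a reset."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def cpu_percent_by_mode(series: dict[tuple[str, str], Samples]) -> dict[str, float]:
    """Mimics 100 * avg by (mode) (rate(windows_cpu_time_total[5m])).
    Keys of `series` are (core, mode) label pairs."""
    rates_by_mode: dict[str, list[float]] = defaultdict(list)
    for (_core, mode), samples in series.items():
        rates_by_mode[mode].append(simple_rate(samples))
    return {mode: 100 * sum(r) / len(r) for mode, r in rates_by_mode.items()}


def busy_percent_by_core(series: dict[tuple[str, str], Samples]) -> dict[str, float]:
    """Mimics 100 * sum by (core) (rate(windows_cpu_time_total{mode != 'idle'}[5m]))."""
    busy: dict[str, float] = defaultdict(float)
    for (core, mode), samples in series.items():
        if mode != "idle":
            busy[core] += simple_rate(samples)
    return {core: 100 * total for core, total in busy.items()}


# Hypothetical 5-minute window on a two-core host (CPU seconds per mode).
series = {
    ("0", "idle"): [(0, 0), (300, 240)],
    ("0", "user"): [(0, 0), (300, 60)],
    ("1", "idle"): [(0, 0), (300, 180)],
    ("1", "user"): [(0, 0), (300, 120)],
}
print(cpu_percent_by_mode(series))   # idle about 70%, user about 30%
print(busy_percent_by_core(series))  # core 0 about 20%, core 1 about 40%
```

Here core 1 is twice as busy as core 0, which is the kind of imbalance the per-core query is meant to surface.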
Memory
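Use the following queries to determine memory consumption on your Windows host: the first returns the percentage of physical memory in use, and the second returns the free physical memory in bytes.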
100* (windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes) / windows_cs_physical_memory_bytes
windows_os_physical_memory_free_bytes
Additionally, you can use the following alert expression, which fires when memory utilization exceeds 95%:
100 * (windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes) / windows_cs_physical_memory_bytes > 95
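As an illustration only, the following Python sketch evaluates the memory utilization expression and its 95% threshold for a single host, using made-up byte counts instead of live metric values. The function names are hypothetical.

```python
GIB = 1024 ** 3


def memory_used_percent(physical_bytes: float, free_bytes: float) -> float:
    """100 * (windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes)
    / windows_cs_physical_memory_bytes"""
    return 100 * (physical_bytes - free_bytes) / physical_bytes


def memory_alert_fires(physical_bytes: float, free_bytes: float, threshold: float = 95) -> bool:
    """True when the alert expression `... > 95` would return a sample."""
    return memory_used_percent(physical_bytes, free_bytes) > threshold


print(memory_used_percent(16 * GIB, 4 * GIB))   # 75.0
print(memory_alert_fires(16 * GIB, 0.5 * GIB))  # True (96.875% used)
```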
Disk
You can monitor disk capacity with windows_logical_disk_free_bytes and windows_logical_disk_size_bytes.
Use this query to detect when a disk is reaching its maximum capacity (more than 95% used):
100 * (windows_logical_disk_size_bytes - windows_logical_disk_free_bytes) / windows_logical_disk_size_bytes > 95
Another factor to consider when you measure disk usage is IOPS (input/output operations per second). To monitor write operations per second, use this query:
rate(windows_logical_disk_writes_total[5m])
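The following Python sketch, with made-up values, shows which volumes would match the 95% capacity expression and the write IOPS that rate(windows_logical_disk_writes_total[5m]) approximates from two counter samples. It assumes volumes are identified by a label such as "C:" and ignores counter resets and the extrapolation Prometheus applies.

```python
GB = 10 ** 9


def disk_used_percent(size_bytes: int, free_bytes: int) -> float:
    """100 * (size - free) / size, as in the capacity query."""
    return 100 * (size_bytes - free_bytes) / size_bytes


def volumes_over_threshold(volumes: dict[str, tuple[int, int]], threshold: float = 95) -> list[str]:
    """`volumes` maps a volume label (for example "C:") to (size_bytes, free_bytes)."""
    return sorted(
        volume
        for volume, (size, free) in volumes.items()
        if disk_used_percent(size, free) > threshold
    )


def write_iops(t0: float, writes0: int, t1: float, writes1: int) -> float:
    """Average write operations per second between two counter samples."""
    return (writes1 - writes0) / (t1 - t0)


volumes = {"C:": (500 * GB, 10 * GB), "D:": (1000 * GB, 400 * GB)}
print(volumes_over_threshold(volumes))        # ['C:'] (C: is 98% full)
print(write_iops(0, 120_000, 300, 150_000))   # 100.0
```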
Network
You can monitor the network error rate for inbound and outbound packets with the following queries:
100 * rate(windows_net_packets_received_errors[5m]) / (rate(windows_net_packets_received_errors[5m]) + rate(windows_net_packets_received_total[5m])>0) > 75
100 * rate(windows_net_packets_outbound_errors[5m]) / (rate(windows_net_packets_outbound_errors[5m]) + rate(windows_net_packets_sent_total[5m])>0) > 75
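The following Python sketch spells out what the inbound and outbound expressions compute for one interface, using hypothetical per-second rates. The denominator adds errors to received (or sent) packets, so the result is the share of errors among all counted packets, and the `> 0` comparison drops interfaces with no traffic instead of dividing by zero. The function names are illustrative.

```python
from typing import Optional


def error_rate_percent(errors_per_s: float, packets_per_s: float) -> Optional[float]:
    """Mimics 100 * rate(errors) / (rate(errors) + rate(packets) > 0).
    Returns None where PromQL would return no sample (no traffic at all)."""
    total = errors_per_s + packets_per_s
    if not total > 0:
        return None
    return 100 * errors_per_s / total


def error_alert_fires(errors_per_s: float, packets_per_s: float, threshold: float = 75) -> bool:
    """True when the `... > 75` expression would return a sample."""
    pct = error_rate_percent(errors_per_s, packets_per_s)
    return pct is not None and pct > threshold


print(error_rate_percent(9.0, 1.0))   # 90.0
print(error_rate_percent(0.0, 0.0))   # None
print(error_alert_fires(1.0, 999.0))  # False
```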
Windows Process Monitoring
You can track how much CPU and memory each process on your machine consumes with the metric windows_process_cpu_time_total for CPU and the metric windows_process_working_set_bytes for memory.
You can track input and output operations by process with the metric windows_process_io_operations_total. This metric helps you identify processes that can overload your system.
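To find the heaviest processes, you could rank them by the rate of windows_process_cpu_time_total, which is roughly what a PromQL query such as `topk(3, sum by (process) (rate(windows_process_cpu_time_total[5m])))` returns, assuming the exporter's `process` label. The Python sketch below applies the same ranking to two hypothetical snapshots of cumulative CPU seconds. Keying by process name alone is a simplification, since several processes can share a name.

```python
def top_cpu_processes(
    before: dict[str, float], after: dict[str, float], interval_s: float, n: int = 3
) -> list[tuple[str, float]]:
    """Return up to n (process, CPU seconds per second) pairs, highest first."""
    rates = {
        name: (after[name] - before[name]) / interval_s
        for name in after
        # Skip processes that are new, or whose counter went down (for example, after a restart).
        if name in before and after[name] >= before[name]
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical cumulative CPU seconds, sampled 300 seconds apart.
before = {"sqlservr": 1200.0, "w3wp": 300.0, "svchost": 50.0, "explorer": 10.0}
after = {"sqlservr": 1350.0, "w3wp": 330.0, "svchost": 53.0, "explorer": 10.3}
for name, cpu in top_cpu_processes(before, after, 300):
    print(f"{name}: {cpu:.2f} cores")
```

CPU seconds per second can be read as the number of cores a process keeps busy on average.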
Windows Service Monitoring
You can monitor the status and health of the services in your environment.
Use this query to count the running services, aggregated by status and instance:
count by (status,instance)((windows_service_status > 0) * on (name,instance) group_left(state) (windows_service_state{state=~"running"} > 0))
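The following Python sketch mirrors the logic of the running-services query on hypothetical sample series: keep the services whose running state series is active, match them with their active status series on name and instance, and count by status and instance. It assumes, as the query's `> 0` filters suggest, that each possible state and status value is its own series with value 1 when active and 0 otherwise. The service names and instances are made up.

```python
from collections import Counter

# Each series is (labels, value); a value of 1 marks the active state or status.
Series = tuple[dict[str, str], float]


def running_services_by_status(status: list[Series], state: list[Series]) -> Counter:
    """Count running services grouped by (status, instance)."""
    running = {
        (labels["name"], labels["instance"])
        for labels, value in state
        if labels["state"] == "running" and value > 0
    }
    return Counter(
        (labels["status"], labels["instance"])
        for labels, value in status
        if value > 0 and (labels["name"], labels["instance"]) in running
    )


state = [
    ({"name": "w3svc", "instance": "web-01", "state": "running"}, 1),
    ({"name": "w3svc", "instance": "web-01", "state": "stopped"}, 0),
    ({"name": "spooler", "instance": "web-01", "state": "running"}, 0),
    ({"name": "spooler", "instance": "web-01", "state": "stopped"}, 1),
    ({"name": "w3svc", "instance": "web-02", "state": "running"}, 1),
]
status = [
    ({"name": "w3svc", "instance": "web-01", "status": "ok"}, 1),
    ({"name": "spooler", "instance": "web-01", "status": "ok"}, 1),
    ({"name": "w3svc", "instance": "web-02", "status": "error"}, 1),
]
print(running_services_by_status(status, state))
```

Matching on both name and instance keeps services that share a name on different hosts apart.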
To identify the behaviors that are critical for your infrastructure, you need to know the properties and states of your services:
- For state, focus on stopped and running.
- For start mode, the values are auto, manual, and disabled.
- For status, the values to manage are ok and error.
With those properties defined, you can count the services in the error status with the following query:
count(windows_service_status{status=~"error"} > 0)
You can also list the services that are disabled with the following query:
sum by(name,instance) (windows_service_start_mode{start_mode=~"disabled"} > 0)
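Similarly, a small Python sketch can illustrate what the error-count and disabled-services queries select, again assuming one series per label value with value 1 when active. PromQL anchors `=~` regular expressions, so `=~"error"` matches exactly the value error, which is why the sketch uses plain equality. The service names and instances below are hypothetical.

```python
# Each series is (labels, value); a value of 1 marks the active label value.
Series = tuple[dict[str, str], float]


def count_error_services(status: list[Series]) -> int:
    """Like count(windows_service_status{status=~"error"} > 0)."""
    return sum(1 for labels, value in status if labels["status"] == "error" and value > 0)


def disabled_services(start_mode: list[Series]) -> list[tuple[str, str]]:
    """(name, instance) pairs kept by windows_service_start_mode{start_mode=~"disabled"} > 0."""
    return sorted(
        (labels["name"], labels["instance"])
        for labels, value in start_mode
        if labels["start_mode"] == "disabled" and value > 0
    )


status = [
    ({"name": "w3svc", "instance": "web-02", "status": "error"}, 1),
    ({"name": "spooler", "instance": "web-01", "status": "error"}, 0),
    ({"name": "spooler", "instance": "web-01", "status": "ok"}, 1),
]
start_mode = [
    ({"name": "spooler", "instance": "web-01", "start_mode": "disabled"}, 1),
    ({"name": "spooler", "instance": "web-01", "start_mode": "auto"}, 0),
    ({"name": "w3svc", "instance": "web-02", "start_mode": "auto"}, 1),
]
print(count_error_services(status))   # 1
print(disabled_services(start_mode))  # [('spooler', 'web-01')]
```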
Agent Configuration
This integration has no default agent job.