Windows
This integration is disabled by default. Please contact Sysdig Support to enable it in your account.
This integration has 77 metrics.
List of Alerts
Alert | Description | Format |
---|---|---|
[Windows] High CPU Usage | The CPU of the Windows instance reached 95% of use | Prometheus |
[Windows] High Disk Usage | Disk full over 95% in instance {{$labels.instance}} | Prometheus |
[Windows] High Physical Memory Usage | High physical memory usage in instance | Prometheus |
[Windows] High Network Inbound Errors | High inbound network error rate in instance | Prometheus |
[Windows] High Network Outbound Errors | High outbound network error rate in instance | Prometheus |
[Windows] Increase of Disk writes time | Increase of Disk writes time | Prometheus |
[Windows] Queue of Writes and reads Disk operations is growing | The queue for writes and reads disk operations is growing | Prometheus |
[Windows] High percent of swap space used | A high percentage of the swap space is in use | Prometheus |
[Windows] Network bandwidth is reaching its limit | Network bandwidth use is reaching its limit | Prometheus |
[Windows] High number of transitions virtual addresses into disk | The rate at which pages transition to resident memory without being written to disk has reached a problematic limit | Prometheus |
List of Dashboards
Windows Host Overview
The dashboard provides information about the Windows host.
Windows Process Overview
The dashboard provides information about the Windows processes.
Windows Services Overview
The dashboard provides information about the Windows services.
Windows Node Overview (Legacy)
The dashboard provides information about the Windows nodes (legacy).
List of Metrics
Metric name |
---|
windows_cpu_core_frequency_mhz |
windows_cpu_time_total |
windows_cs_physical_memory_bytes |
windows_logical_disk_free_bytes |
windows_logical_disk_read_bytes_total |
windows_logical_disk_reads_total |
windows_logical_disk_requests_queued |
windows_logical_disk_size_bytes |
windows_logical_disk_split_ios_total |
windows_logical_disk_write_bytes_total |
windows_logical_disk_write_seconds_total |
windows_logical_disk_writes_total |
windows_memory_transition_faults_total |
windows_net_bytes_received_total |
windows_net_bytes_sent_total |
windows_net_bytes_total |
windows_net_current_bandwidth_bytes |
windows_net_packets_outbound_discarded_total |
windows_net_packets_outbound_errors |
windows_net_packets_outbound_errors_total |
windows_net_packets_received_discarded_total |
windows_net_packets_received_errors |
windows_net_packets_received_errors_total |
windows_net_packets_received_total |
windows_net_packets_sent_total |
windows_os_paging_free_bytes |
windows_os_paging_limit_bytes |
windows_os_physical_memory_free_bytes |
windows_os_processes |
windows_os_users |
windows_os_virtual_memory_bytes |
windows_os_virtual_memory_free_bytes |
windows_process_cpu_time_total |
windows_process_io_bytes_total |
windows_process_io_operations_total |
windows_process_threads |
windows_process_working_set_bytes |
windows_service_info |
windows_service_start_mode |
windows_service_state |
windows_service_status |
windows_system_context_switches_total |
windows_system_processor_queue_length |
windows_system_system_up_time |
windows_system_threads |
wmi_cpu_core_frequency_mhz |
wmi_cpu_time_total |
wmi_cs_physical_memory_bytes |
wmi_logical_disk_free_bytes |
wmi_logical_disk_read_bytes_total |
wmi_logical_disk_reads_total |
wmi_logical_disk_requests_queued |
wmi_logical_disk_size_bytes |
wmi_logical_disk_split_ios_total |
wmi_logical_disk_write_bytes_total |
wmi_logical_disk_writes_total |
wmi_net_bytes_received_total |
wmi_net_bytes_sent_total |
wmi_net_bytes_total |
wmi_net_current_bandwidth |
wmi_net_packets_outbound_discarded |
wmi_net_packets_outbound_errors |
wmi_net_packets_received_discarded |
wmi_net_packets_received_errors |
wmi_net_packets_received_total |
wmi_net_packets_sent_total |
wmi_os_paging_free_bytes |
wmi_os_paging_limit_bytes |
wmi_os_physical_memory_free_bytes |
wmi_os_processes |
wmi_os_users |
wmi_os_virtual_memory_bytes |
wmi_os_virtual_memory_free_bytes |
wmi_system_context_switches_total |
wmi_system_processor_queue_length |
wmi_system_system_up_time |
wmi_system_threads |
Preparing the Integration
Enable Windows Prometheus Metrics
To collect metrics from Windows VMs, you need to install the Windows exporter and the Prometheus agent (the Prometheus server running in agent mode).
Windows exporter
This component connects to WMI and exposes Windows metrics in Prometheus metric format.
To install this exporter:
- Download the latest version from the Windows Exporter repository
- Configure the exporter
- Run the exporter with
$>.\exporter.exe --config.file=config.yaml
Exporter configuration
You can configure the Windows exporter using the config.yaml file as follows:
# Configuration and more info
# https://github.com/prometheus-community/windows_exporter
collectors:
  enabled: cpu,cs,logical_disk,net,os,service,system,textfile,process
collector:
  textfile:
    directory: C:\Path\metrics\
  # service:
  #   services-where: "Name='windows_exporter'"
log:
  level: warn
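Once the exporter runs with this configuration, one way to confirm that the enabled collectors expose data is to look at the metric names in its output. The following is a minimal sketch, not part of the exporter: the helper names `metric_names` and `missing_collectors` are hypothetical, the sample text is made up, and the check assumes that each collector's metrics start with `windows_<collector>_`, which matches the metric list in this document but may not hold for the textfile collector.

```python
import re

# Matches the metric name at the start of a sample line, for example
# 'windows_os_users 2' or 'windows_cpu_time_total{core="0,0",mode="idle"} 1.5'.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus exposition text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_collectors(exposition_text, collectors):
    """Return the collectors that have no 'windows_<collector>_' metric."""
    names = metric_names(exposition_text)
    return [c for c in collectors
            if not any(n.startswith(f"windows_{c}_") for n in names)]


# Made-up sample of exporter output, used only for illustration.
sample = """\
# HELP windows_os_users Number of users
# TYPE windows_os_users gauge
windows_os_users 2
windows_cpu_time_total{core="0,0",mode="idle"} 1234.5
"""
print(missing_collectors(sample, ["cpu", "os", "net"]))  # prints ['net']
```

In practice, the text would come from the exporter's /metrics endpoint rather than from a hard-coded string.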
Prometheus Agent
This component collects metrics from the Windows exporter and forwards them to a Prometheus remote write endpoint.
To install the agent:
- Download the latest version from the Prometheus repository
- Configure the agent
- Run the agent with
$>.\prometheus.exe --enable-feature=agent --config.file=prometheus.yaml
Agent Configuration
You can configure the Prometheus agent using the prometheus.yaml file as follows:
global:
  scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.
remote_write:
  - url: "https://api.sysdigcloud.com/prometheus/remote/write"
    bearer_token: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    proxy_url: "https://proxy.url:port" # Set the correct proxy url
scrape_configs:
  - job_name: "windows_exporter"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9182"]
    metric_relabel_configs:
      - source_labels: [instance]
        target_label: instance
        regex: '(.*)'
        replacement: 'windows-vm-demo'
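The metric_relabel_configs rule in this configuration overwrites the instance label of every scraped series with a fixed name. The sketch below is a simplified Python model of that rule, not Prometheus code: the function `apply_relabel` is hypothetical, it covers only the default replace action, and it handles only `$1`-style group references.

```python
import re


def apply_relabel(labels, source_labels, target_label, regex, replacement,
                  separator=";"):
    """Simplified model of a Prometheus 'replace' relabel rule."""
    # Prometheus joins the source label values with the separator (';' by default).
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex is anchored, so it must match the whole joined value.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style '$1' group references to Python's '\g<1>'.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


before = {"__name__": "windows_os_users", "instance": "localhost:9182"}
after = apply_relabel(before, ["instance"], "instance", "(.*)", "windows-vm-demo")
print(after["instance"])  # prints windows-vm-demo
```

Because the regex '(.*)' matches any value, every series ends up with the same instance label. With several Windows hosts, each agent would need its own replacement value to keep the hosts distinguishable.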
Installing
The installation of an exporter is not required for this integration.
Monitoring and Troubleshooting Windows
This document describes important metrics and queries that you can use to monitor and troubleshoot Windows.
Windows Host Monitoring
CPU
Because CPU usage is critical, be aware of how CPU time is distributed across modes. The following query shows which mode consumes the most processor time:
100 * avg by (mode) (rate(windows_cpu_time_total[5m]))
One tip for this visualization is to watch the idle mode: time that is not spent idle is time that the CPU is in use.
For environments with large machines and many cores, use the following query to check for peaks on each core and to verify that the cores share the load evenly:
100 * sum by (core) (rate(windows_cpu_time_total{mode != 'idle'}[5m]))
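To make the by-mode query more concrete, the following sketch reproduces its arithmetic by hand. It is an illustration under simplifying assumptions, not how Prometheus evaluates rate(): the function `cpu_percent_by_mode` and the counter values are hypothetical, and counter resets and extrapolation are ignored.

```python
from collections import defaultdict


def cpu_percent_by_mode(start, end, window_seconds):
    """Mirror 100 * avg by (mode) (rate(windows_cpu_time_total[window])).

    start and end map (core, mode) to the counter value, in seconds, at the
    beginning and end of the window. Counter resets are not handled.
    """
    rates = defaultdict(list)
    for (core, mode), first in start.items():
        # Per-core rate: CPU seconds spent in this mode per second of wall time.
        rates[mode].append((end[(core, mode)] - first) / window_seconds)
    # Average the per-core rates for each mode and express them as a percentage.
    return {mode: 100 * sum(values) / len(values) for mode, values in rates.items()}


# Hypothetical counter values for two cores over a 300-second window.
start = {("0,0", "idle"): 1000.0, ("0,0", "user"): 200.0,
         ("0,1", "idle"): 900.0, ("0,1", "user"): 300.0}
end = {("0,0", "idle"): 1270.0, ("0,0", "user"): 230.0,
       ("0,1", "idle"): 1050.0, ("0,1", "user"): 450.0}
for mode, percent in cpu_percent_by_mode(start, end, 300).items():
    print(f"{mode}: {percent:.1f}%")
```

With these sample values, the first core is 90% idle and the second is 50% idle, so the average across cores is 70% idle and 30% user.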
Memory
Use the following queries to determine memory consumption in your Windows host:
100 * (windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes) / windows_cs_physical_memory_bytes
windows_os_physical_memory_free_bytes
Additionally, you can use the following expression to alert when memory utilization exceeds a defined threshold (95% in this example):
100 * (windows_cs_physical_memory_bytes - windows_os_physical_memory_free_bytes) / windows_cs_physical_memory_bytes > 95
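As a quick check of the arithmetic behind this expression, the sketch below computes the same percentage in Python. The function names and the sample host (16 GiB of RAM with 0.5 GiB free) are hypothetical; only the formula and the 95% threshold come from the query.

```python
def memory_used_percent(total_bytes, free_bytes):
    """Percentage of physical memory in use, following the query's formula."""
    return 100 * (total_bytes - free_bytes) / total_bytes


def memory_alert(total_bytes, free_bytes, threshold=95):
    """True when utilization is strictly above the threshold, like '> 95'."""
    return memory_used_percent(total_bytes, free_bytes) > threshold


GIB = 1024 ** 3
# Hypothetical host with 16 GiB of RAM, of which 0.5 GiB is free.
print(memory_used_percent(16 * GIB, 0.5 * GIB))  # prints 96.875
print(memory_alert(16 * GIB, 0.5 * GIB))         # prints True
```

The comparison is strict, so a host at exactly 95% utilization does not trigger the alert.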
Disk
Disk capacity can be monitored with windows_logical_disk_free_bytes and windows_logical_disk_size_bytes.
With this query you can monitor if the disk is reaching its maximum capacity:
100 * (windows_logical_disk_size_bytes - windows_logical_disk_free_bytes) / windows_logical_disk_size_bytes > 95
Another factor to consider when you measure disk usage is IOPS. To monitor the write operations, use this query:
rate(windows_logical_disk_writes_total[5m])
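The write counter only ever increases until it resets, for example after a reboot, so a rate has to account for resets. The sketch below shows the idea in simplified form. The function `counter_rate` and the sample values are hypothetical, and unlike the PromQL rate() function, this version does not extrapolate to the edges of the time window.

```python
def counter_rate(samples):
    """Approximate per-second rate of a counter from (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example after a
    reboot), so the post-reset value is counted as new growth.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current if current < previous else current - previous
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical write counter scraped every 10 seconds; the host reboots
# between the third and fourth sample, so the counter restarts near zero.
samples = [(0, 5000), (10, 5400), (20, 5900), (30, 200), (40, 700)]
print(counter_rate(samples))  # prints 40.0
```

Here the total increase is 400 + 500 + 200 + 500 = 1600 writes over 40 seconds, which gives 40 writes per second despite the reset.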
Network
You can monitor the network error rate for inbound and outbound packets with the following queries:
100 * rate(windows_net_packets_received_errors[5m]) / (rate(windows_net_packets_received_errors[5m]) + rate(windows_net_packets_received_total[5m])>0) > 75
100 * rate(windows_net_packets_outbound_errors[5m]) / (rate(windows_net_packets_outbound_errors[5m]) + rate(windows_net_packets_sent_total[5m])>0) > 75
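Both queries share the same shape: errors divided by errors plus packets, with a `> 0` filter on the denominator and a 75% threshold. The sketch below restates that logic in Python to show why the filter matters. The function names and the sample rates are hypothetical.

```python
def error_percent(error_rate, packet_rate):
    """Errors as a percentage of errors plus packets, or None with no traffic."""
    total = error_rate + packet_rate
    if total <= 0:
        # Mirrors the '> 0' filter: PromQL drops the series instead of
        # dividing by zero, so no value is produced for an idle interface.
        return None
    return 100 * error_rate / total


def should_alert(error_rate, packet_rate, threshold=75):
    """True when the error percentage exists and is above the threshold."""
    percent = error_percent(error_rate, packet_rate)
    return percent is not None and percent > threshold


# Hypothetical per-second rates for three interfaces.
print(should_alert(90.0, 10.0))  # prints True  (90% errors)
print(should_alert(5.0, 95.0))   # prints False (5% errors)
print(should_alert(0.0, 0.0))    # prints False (idle interface, no data)
```

Without the guard on the denominator, an interface with no traffic would produce a division by zero rather than simply being left out of the result.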
Windows Process Monitoring
You can monitor the processes running on your machine. Use the metric windows_process_cpu_time_total to track the CPU time that each process consumes, and the metric windows_process_working_set_bytes to track its memory.
You can track input and output operations by process with the metric windows_process_io_operations_total. This metric helps you identify processes that can overload your system.
Windows Service Monitoring
You can track the status and health of the services in your environment.
Use this query to count the services that are running, aggregated by status:
count by (status,instance)((windows_service_status > 0) * on (name) group_left(state) (windows_service_state{state=~"running"} > 0))
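This query joins two metrics on the name label: it keeps the active status series of each service, but only for services whose running state series is active, and then counts the results by status and instance. The sketch below models that join in Python. The function `running_services_by_status`, the service names, and the sample values are hypothetical, and the model assumes that a value of 1 marks the active status or state.

```python
from collections import Counter


def running_services_by_status(status_series, state_series):
    """Count running services grouped by (status, instance).

    Each series is a (labels, value) pair, where labels is a dict.
    """
    # Right-hand side of the join: names of services whose "running"
    # state series has a value above zero.
    running = {labels["name"] for labels, value in state_series
               if labels["state"] == "running" and value > 0}
    counts = Counter()
    for labels, value in status_series:
        # Left-hand side: keep active status series, matched on the name label.
        if value > 0 and labels["name"] in running:
            counts[(labels["status"], labels["instance"])] += 1
    return dict(counts)


# Hypothetical series for three services on one instance.
host = "windows-vm-demo"
status_series = [
    ({"name": "spooler", "status": "ok", "instance": host}, 1),
    ({"name": "spooler", "status": "error", "instance": host}, 0),
    ({"name": "w32time", "status": "error", "instance": host}, 1),
    ({"name": "fax", "status": "ok", "instance": host}, 1),
]
state_series = [
    ({"name": "spooler", "state": "running", "instance": host}, 1),
    ({"name": "w32time", "state": "running", "instance": host}, 1),
    ({"name": "fax", "state": "running", "instance": host}, 0),
    ({"name": "fax", "state": "stopped", "instance": host}, 1),
]
print(running_services_by_status(status_series, state_series))
```

With this sample data, the result contains one running service with status ok and one with status error. The fax service is excluded because it is stopped.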
To identify the behavior that is critical for your infrastructure, you need to understand the properties and states of your services. For state, focus on stopped and running; for start mode, the values are auto, manual, and disabled; and for status, focus on ok and error.
With those properties defined, you can count the services in error status with the following query:
count(windows_service_status{status=~"error"} > 0)
You can also list the services that are disabled with the following query:
sum by(name,instance) (windows_service_start_mode{start_mode=~"disabled"} > 0)
Agent Configuration
This integration has no default agent job.