GCP Cloud PostgreSQL

Metrics, Dashboards, Alerts and more for GCP Cloud PostgreSQL Integration in Sysdig Monitor.
GCP Cloud PostgreSQL

You can enable this integration using the Connect a GCP Account option on the Monitor UI.

This integration has 32 metrics.

List of Alerts

AlertDescriptionFormat
[GCP Cloud PostgreSQL] Database DownThe PostgreSQL database is down.Prometheus
[GCP Cloud PostgreSQL] No High AvailabilityThe PostgreSQL database doesn’t have ‘High Availability’ enabled, or there are no available instances in other zones for automatic failover. The service will be down when the instance undergoes maintenance or has any other issues.Prometheus
[GCP Cloud PostgreSQL] High CPU UsageThe CPU usage is reaching the limit.Prometheus
[GCP Cloud PostgreSQL] High Memory UsageThe memory usage is reaching the limit.Prometheus
[GCP Cloud PostgreSQL] Disk Full in 48hThe disk will be full in 48 hours.Prometheus
[GCP Cloud PostgreSQL] Disk Full in 12hThe disk will be full in 12 hours.Prometheus
[GCP Cloud PostgreSQL] Replica LagThe PostgreSQL replica is lagging behind its primary instance. This issue is only applicable to replicas.Prometheus
[GCP Cloud PostgreSQL] Network LagThe time taken from primary binary log to IO thread on replica is high. This issue is only applicable to replicas.Prometheus
[GCP Cloud PostgreSQL] Low Cache Hit RateThe cache hit rate is low.Prometheus
[GCP Cloud PostgreSQL] DeadLocks In DatabaseDeadlocks detected in database.Prometheus
[GCP Cloud PostgreSQL] High Transaction ID UsageThe transaction ID usage is reaching the limit.Prometheus

List of Dashboards

GCP Cloud PostgreSQL

The dashboard provides information on the GCP Cloud PostgreSQL integration. GCP Cloud PostgreSQL

List of Metrics

Metric name
gcp_cloudsql_database_database_auto_failover_request_count
gcp_cloudsql_database_database_available_for_failover
gcp_cloudsql_database_database_cpu_utilization
gcp_cloudsql_database_database_disk_bytes_used_by_data_type
gcp_cloudsql_database_database_disk_read_ops_count
gcp_cloudsql_database_database_disk_utilization
gcp_cloudsql_database_database_disk_write_ops_count
gcp_cloudsql_database_database_instance_state
gcp_cloudsql_database_database_memory_utilization
gcp_cloudsql_database_database_network_received_bytes_count
gcp_cloudsql_database_database_network_sent_bytes_count
gcp_cloudsql_database_database_postgresql_blocks_read_count
gcp_cloudsql_database_database_postgresql_deadlock_count
gcp_cloudsql_database_database_postgresql_num_backends
gcp_cloudsql_database_database_postgresql_num_backends_by_application
gcp_cloudsql_database_database_postgresql_num_backends_by_state
gcp_cloudsql_database_database_postgresql_replication_replica_byte_lag
gcp_cloudsql_database_database_postgresql_temp_bytes_written_count
gcp_cloudsql_database_database_postgresql_temp_files_written_count
gcp_cloudsql_database_database_postgresql_transaction_count
gcp_cloudsql_database_database_postgresql_transaction_id_count
gcp_cloudsql_database_database_postgresql_transaction_id_utilization
gcp_cloudsql_database_database_postgresql_tuple_size
gcp_cloudsql_database_database_postgresql_tuples_fetched_count
gcp_cloudsql_database_database_postgresql_tuples_processed_count
gcp_cloudsql_database_database_postgresql_tuples_returned_count
gcp_cloudsql_database_database_postgresql_vacuum_oldest_transaction_age
gcp_cloudsql_database_database_replication_network_lag
gcp_cloudsql_database_database_replication_replica_lag
gcp_cloudsql_database_database_replication_state
gcp_cloudsql_database_database_up
gcp_cloudsql_database_database_uptime

Monitoring and Troubleshooting GCP Cloud PostgreSQL

This document describes important metrics and queries that you can use to monitor and troubleshoot GCP Cloud PostgreSQL.

Most of the metrics covered in this document are applicable to all the GCP Cloud SQL integrations (MySQL, PostgreSQL, SQL Server). There is a separate section for MySQL metrics.

The GCP metrics of type DELTA, which typically have the suffix _count in their name (but not always), return the cumulative value of 1 minute. They need to be divided by 60 to get the value per second.

Status and Replication

Status

Use the following query to get the status of the database. A value of 1 means UP, and a value of 0 means DOWN.

gcp_cloudsql_database_database_up

Use the following query to get the current serving state of the Cloud SQL instance. The state is stored inside the label state.

gcp_cloudsql_database_database_instance_state > 0

The state can be one of the following:

  • RUNNING: The instance is expected to be running. If an instance experiences unplanned (non-maintenance) downtime, the state will still be RUNNING, but the Database UP metric will report 0.
  • SUSPENDED: The instance is not available, for example, due to problems with billing.
  • RUNNABLE: The instance has been stopped by the owner. It is not currently running but is ready to be restarted.
  • PENDING_CREATE: The instance is being created.
  • MAINTENANCE: The instance is down for maintenance.
  • FAILED: The instance creation failed.
  • UNKNOWN_STATE: The state of the instance is unknown.

Replication

Failover

The following query returns a value > 0 if the failover operation is available on the instance.

gcp_cloudsql_database_database_available_for_failover

Use the following query to get the number of instance auto-failover requests per second.

gcp_cloudsql_database_database_auto_failover_request_count / 60
State

Use the following query to get the current serving state of replication. The state is stored inside the label state. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_replication_state > 0

The state can be one of the following:

  • RUNNING: Replication is active and running.
  • STOPPED: Replication is inactive and stopped.
  • ERROR: An error has occurred and replication is stopped.
  • SYNCING: The replication is syncing.
  • UNSYNCED: The replication is unsynced.
Replica and Network Lag

Use the following query to get the approximate number of seconds the read replica is behind its primary instance. This is only applicable to replicas.

gcp_cloudsql_database_database_replication_replica_lag

The following query indicates time taken from primary binary log to IO thread on replica. This is only applicable to replicas.

gcp_cloudsql_database_database_replication_network_lag

CPU, Memory and Storage

CPU

Use the following query to get the current CPU utilization represented as a percentage of the reserved CPU that is currently in use. The return values are typically numbers between 0 and 1 (but might exceed 1).

gcp_cloudsql_database_database_cpu_utilization

Memory

Use the following query to get the percentage of the memory quota that is currently in use. The return values are numbers between 0 and 1.

gcp_cloudsql_database_database_memory_utilization

Disk

Use the following query to get the percentage of the disk quota that is currently in use. The return values are numbers between 0 and 1.

gcp_cloudsql_database_database_disk_utilization

Use the following query to get the disk usage per data type. The data type is stored inside the label data_type. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_disk_bytes_used_by_data_type

Use the following queries to get the disk read and write IO operations per second.

gcp_cloudsql_database_database_disk_read_ops_count / 60

gcp_cloudsql_database_database_disk_write_ops_count / 60

Network

Use the following queries to get the bytes received and sent through the network per second.

gcp_cloudsql_database_database_network_received_bytes_count / 60

gcp_cloudsql_database_database_network_sent_bytes_count / 60

PostgreSQL Stats

Backends

Use the following query to get the number of connections to the Cloud SQL PostgreSQL instance.

gcp_cloudsql_database_database_postgresql_num_backends

Use the following query to get the number of connections to the Cloud SQL PostgreSQL instance, grouped by its state. The possible values inside the state label are idle, active, idle_in_transaction, idle_in_transaction_aborted, disabled or fastpath_function_call. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_postgresql_num_backends_by_state

Use the following query to get the number of connections to the Cloud SQL PostgreSQL instance, grouped by applications. The application is stored inside the label application.

gcp_cloudsql_database_database_postgresql_num_backends_by_application

Transactions

Use the following query to get the number of transactions per second. The transaction type is stored inside the label transaction_type.

gcp_cloudsql_database_database_postgresql_transaction_count / 60

Use the following query to get the number of transaction IDs per second. The possible values inside the action label are assigned or frozen. assigned indicates the number of transaction IDs assigned and consumed by the instance, whereas frozen indicates the count of transaction IDs replenished by the VACUUM’s freeze operation.

gcp_cloudsql_database_database_postgresql_transaction_id_count / 60

Use the following query to get the current utilization represented as a percentage of transaction IDs consumed by the Cloud SQL PostgreSQL instance. Values are numbers between 0 and 1.

gcp_cloudsql_database_database_postgresql_transaction_id_utilization

Use the following query to get the age of the oldest transaction yet to be vacuumed in the Cloud SQL PostgreSQL instance. The metric is measured as the number of transactions that have happened since the oldest transaction. The possible values of the oldest_transaction_type label are running, prepared, replication_slot and replica.

gcp_cloudsql_database_database_postgresql_vacuum_oldest_transaction_age

Tuples

Use the following query to get the number of tuples (rows) in the database. The possible values inside the tuple_state label are live or dead. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_postgresql_tuple_size

Use the following query to get the number of tuples(rows) processed per second for a given database for operations like insert, update, or delete. The operation type is stored inside the label operation_type. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_postgresql_tuples_processed_count / 60

Use the following query to get the total number of rows fetched per second as a result of queries per database in the PostgreSQL instance.

gcp_cloudsql_database_database_postgresql_tuples_fetched_count / 60

Use the following query to get the total number of rows scanned per second while processing the queries per database in the PostgreSQL instance.

gcp_cloudsql_database_database_postgresql_tuples_returned_count / 60

Disk Read, Deadlocks and Replication

Disk Read

Use the following query to get the number of disk blocks read by this database per second. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_postgresql_blocks_read_count{source="disk"} / 60

Use the following query to get the number of reads from buffer cache by this database per second. According to the GCP documentation, this metric is in BETA launch stage.

gcp_cloudsql_database_database_postgresql_blocks_read_count{source="buffer_cache"} / 60

Using the two previous metrics, calculate the hit cache ratio with the following query.

gcp_cloudsql_database_database_postgresql_blocks_read_count{source="buffer_cache"}
/
(gcp_cloudsql_database_database_postgresql_blocks_read_count{source="buffer_cache"} + ignoring(source) gcp_cloudsql_database_database_postgresql_blocks_read_count{source="disk"})

Use the following query to get the total amount of data in bytes written per second to temporary files by the queries per database.

gcp_cloudsql_database_database_postgresql_temp_bytes_written_count / 60

Use the following query to get the total number of temporary files used per second for writing data while performing algorithms such as join and sort.

gcp_cloudsql_database_database_postgresql_temp_files_written_count / 60
Deadlocks

Use the following query to get the number of deadlocks detected per second for this database. According to the GCP documentation, this metric is in the BETA launch stage.

gcp_cloudsql_database_database_postgresql_deadlock_count / 60
Replication

Use the following query to get the Replication lag in bytes. Reported from the master per replica. The possible values of the replica_lag_type label are replay_location, flush_location, write_location, and sent_location.

gcp_cloudsql_database_database_postgresql_replication_replica_byte_lag