Backup and Restore High Availability PostgreSQL Clusters
The backup and restore tool is useful in the following scenarios:
- The PostgreSQL cluster cannot start, for example, due to corrupted data. The databases need to be restored in a new Postgres HA instance.
- The Kubernetes cluster is not fully functional. You need to recreate all Postgres databases in a new cluster for fast recovery.
Back Up a PostgreSQL HA Cluster
Use the Installer to deploy the backup tool in the cluster so that it creates periodic backups in Amazon S3 or S3-compatible object storage. The most recent backup is later used to restore the databases. The backup tool is installed as a CronJob called pg-backup-ha-cronjob in the sysdigcloud namespace. By default, it creates a backup every 6 hours.
Prerequisites
- An S3 or S3-compatible bucket is provisioned.
- AWS Access Key ID and AWS Secret Access Key with appropriate privileges to use the bucket.
Backup Configuration
Use the configuration parameters in the values.yaml file associated with the Installer to configure the backup operation. The parameters are listed in the Configuration Parameters section.
Manually Trigger a Backup
You can also trigger a backup on demand using the following command:
kubectl create job pg-backup --from=cronjob/pg-backup-ha-cronjob -n sysdigcloud
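Job names must be unique within a namespace, so the command above fails if a Job named pg-backup already exists in sysdigcloud. The following is a minimal sketch of one way around that, appending a timestamp to the Job name. The function name trigger_backup and the pg-backup-<timestamp> naming scheme are illustrative assumptions, not part of the backup tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an on-demand backup Job with a unique name.
# The CronJob name and namespace come from this page; the
# "pg-backup-<timestamp>" naming scheme is an assumption for illustration.
trigger_backup() {
  local job_name
  job_name="pg-backup-$(date +%Y%m%d%H%M%S)"
  kubectl create job "$job_name" \
    --from=cronjob/pg-backup-ha-cronjob \
    -n sysdigcloud
}

# Usage: trigger_backup
```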
Verify Backup Operation
In the sysdigcloud namespace, run the following commands. The first command lists the backup pods. In the second command, replace <backup-pod-name> with the pod in which the latest backup job was executed.
kubectl get pods -n sysdigcloud | grep "pg-backup-ha-cronjob"
kubectl logs <backup-pod-name> -n sysdigcloud
The pod log provides details about the backup operation. A successful backup job should generate logs similar to this:
2023-11-07T23:06:21+00:00 - INFO - Checking envs
2023-11-07T23:06:21+00:00 - INFO - Validating S3 Bucket
2023-11-07T23:06:21+00:00 - INFO - Aws: S3 region is: us-east-1
2023-11-07T23:06:21+00:00 - INFO - Starting
2023-11-07T23:06:21+00:00 - INFO - Checking envs
2023-11-07T23:06:21+00:00 - INFO - Connecting to S3 and backing up
2023-11-07T23:06:21+00:00 - INFO - Done
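The log check can also be scripted. The sketch below reads pod logs on standard input and succeeds only if a line ending in "- INFO - Done" is present. Treating that line as the success marker is an assumption based on the sample output above, and the function name backup_log_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read backup pod logs on stdin and succeed only if
# a line ending in "- INFO - Done" is present. Using this line as the
# success marker is an assumption based on the sample log on this page.
backup_log_ok() {
  grep -Eq -- '- INFO - Done[[:space:]]*$'
}

# Usage (replace <backup-pod-name> as described above):
#   kubectl logs <backup-pod-name> -n sysdigcloud | backup_log_ok && echo "backup finished"
```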
Restore a PostgreSQL HA Backup
The restore tool runs as a Kubernetes Job. Because datastore restoration is carried out only on request, it is not bundled with the Installer binary. You trigger the database restore operation by applying the YAML file given in this topic. The duration of the restoration depends on the size of the databases.
Running the restore requires scaling down all deployments in the sysdigcloud namespace and the sysdigcloud-netsec-ingest StatefulSet. This helps prevent errors during database restoration.
This topic assumes that the most recent backup is available in the S3 bucket, at the path configured for the backup operation.
Scale Down the Workloads
Count the number of replicas of the StatefulSet sysdigcloud-netsec-ingest:
kubectl get sts sysdigcloud-netsec-ingest -n sysdigcloud
An example output:
NAME                        READY   AGE
sysdigcloud-netsec-ingest   1/1     4h11m
Note it down for future use.
Count the number of ready replicas for all the Sysdig deployments:
kubectl get deploy -n sysdigcloud
An example output:
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
ingress-default-backend   1/1     1            1           4h7m
registry-scanner-api      2/2     2            2           3h55m
sysdig-alert-manager      1/1     1            1           4h4m
Note it down for future use.
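Instead of copying the counts by hand, the deployment listing can be turned into ready-made scale-up commands. The following is a minimal sketch: it takes the desired replica count from the number after the slash in the READY column. The function name scale_up_commands and the output file scale-up.sh are hypothetical, and the StatefulSet still needs to be recorded separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get deploy -n sysdigcloud` output on
# stdin and print one scale-up command per deployment. The desired replica
# count is the number after the slash in the READY column (for example 2/2).
scale_up_commands() {
  awk 'NR > 1 && NF >= 2 {
    split($2, ready, "/")
    printf "kubectl scale deployment %s --replicas %s -n sysdigcloud\n", $1, ready[2]
  }'
}

# Usage, before scaling anything down:
#   kubectl get deploy -n sysdigcloud | scale_up_commands > scale-up.sh
```

Review the generated file before running it after the restore, because a deployment that was not fully rolled out when the listing was taken is still recorded with its desired count.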
Scale down the workloads:
kubectl scale deployment --replicas=0 --all -n sysdigcloud
kubectl scale sts sysdigcloud-netsec-ingest --replicas=0 -n sysdigcloud
Apply the Kubernetes Job
Apply the following example Kubernetes Job file to the cluster in the sysdigcloud namespace. Replace the example bucket, region, path, and AWS credential values with your own.
An example Job for Restore:
apiVersion: batch/v1
kind: Job
metadata:
  name: pg-restore-ha-job
  namespace: sysdigcloud
  generateName: pg-restore-ha
spec:
  ttlSecondsAfterFinished: 200
  template:
    spec:
      restartPolicy: Never
      containers:
        - image: quay.io/sysdig/postgres-backup-onprem:0.1.3
          name: pg-backup-ha
          command: ["/usr/local/bin/pg-restore.sh"]
          env:
            - name: TZ
              value: Etc/UTC
            - name: LOGICAL_BACKUP_PROVIDER
              value: s3
            - name: LOGICAL_BACKUP_S3_BUCKET
              value: example-bucket
            - name: LOGICAL_BACKUP_S3_REGION
              value: us-east-1
            - name: LOGICAL_BACKUP_PATH
              value: "demo-path"
            - name: PGPORT
              value: "5432"
            - name: PGHOST
              value: sysdigcloud-postgres-cluster
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: root.sysdigcloud-postgres-cluster.credentials.postgresql.acid.zalan.do
                  key: username
            - name: PGDATABASE
              value: postgres
            - name: PGSSLMODE
              value: require
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: root.sysdigcloud-postgres-cluster.credentials.postgresql.acid.zalan.do
                  key: password
            - name: AWS_ACCESS_KEY_ID
              value: XXXXXXXXXX
            - name: AWS_SECRET_ACCESS_KEY
              value: YYYYYYYYYY
      imagePullSecrets:
        - name: sysdigcloud-pull-secret
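Applying the manifest and waiting for the Job to finish might look like the sketch below. It assumes the manifest above was saved to a local file. The file name, the function name run_restore_job, and the 60-minute timeout are all hypothetical, so choose a timeout that suits the size of your databases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the restore Job manifest and wait for it to
# finish. The manifest path and the 60-minute timeout are assumptions.
run_restore_job() {
  local manifest="$1"
  kubectl apply -f "$manifest" -n sysdigcloud || return 1
  # Blocks until the Job reports the Complete condition or the timeout expires.
  kubectl wait --for=condition=complete --timeout=60m \
    job/pg-restore-ha-job -n sysdigcloud
}

# Usage: run_restore_job pg-restore-ha-job.yaml
```

Note that kubectl wait with condition=complete does not return early when the Job fails. In that case it exits with an error only once the timeout expires, so check the pod logs if the wait takes longer than expected.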
Verify Restore Operation
Run the following commands in the sysdigcloud namespace. The first command gets the name of the pod that runs the restore job. In the second command, replace <restore-pod-name> with that name.
kubectl get pods -n sysdigcloud | grep "pg-restore-ha-job"
kubectl logs <restore-pod-name> -n sysdigcloud
The pod logs indicate whether the job completed successfully. A successful restore job should generate logs similar to this:
2024-01-07T09:00:00+00:00 - INFO - Starting
2024-01-07T09:00:00+00:00 - INFO - Checking envs
2024-01-07T09:00:00+00:00 - INFO - Connecting to S3 and restoring
2024-01-07T09:20:00+00:00 - INFO - Done
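Besides reading the logs, the status of the Job object itself can be queried. The sketch below reads the .status.succeeded field of the pg-restore-ha-job Job and succeeds only when it equals 1. The function name restore_job_succeeded is hypothetical. Because the example manifest sets ttlSecondsAfterFinished: 200, the Job is removed about 200 seconds after it finishes, so the check has to run within that window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the restore Job reports one succeeded
# pod. An empty .status.succeeded means no pod has succeeded (yet); a failed
# lookup (for example, the Job was already cleaned up) also returns non-zero.
restore_job_succeeded() {
  local succeeded
  succeeded=$(kubectl get job pg-restore-ha-job -n sysdigcloud \
    -o jsonpath='{.status.succeeded}') || return 1
  [ "$succeeded" = "1" ]
}

# Usage: restore_job_succeeded && echo "restore completed"
```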
Scale Up the Workloads
When the job is complete, scale each deployment and the StatefulSet back up to the number of replicas noted earlier.
Deployments
For example:
kubectl scale deployment registry-scanner-api --replicas 2 -n sysdigcloud
kubectl scale deployment ingress-default-backend --replicas 1 -n sysdigcloud
kubectl scale deployment sysdig-alert-manager --replicas 1 -n sysdigcloud
StatefulSet
For example:
kubectl scale sts sysdigcloud-netsec-ingest --replicas=1 -n sysdigcloud
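After scaling up, it can be useful to confirm that every workload has actually become ready. The following is a minimal sketch using kubectl rollout status. The function name wait_for_workloads and the 10-minute timeout per workload are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until each given workload has finished rolling
# out. Arguments are resource references such as deployment/<name> or
# statefulset/<name>; the 10-minute timeout per workload is an assumption.
wait_for_workloads() {
  local ref
  for ref in "$@"; do
    kubectl rollout status "$ref" -n sysdigcloud --timeout=10m || return 1
  done
}

# Usage:
#   wait_for_workloads deployment/registry-scanner-api \
#     statefulset/sysdigcloud-netsec-ingest
```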
Configuration Parameters
Parameter | Description | Example |
---|---|---|
logicalBackupS3Bucket | The AWS S3 bucket name. | example-bucket |
logicalBackupS3Region | The AWS Region where the S3 bucket resides. | us-east-1 |
logicalBackupPath | The path to the backup files. | |
logicalBackupProvider | AWS S3 | S3 |
awsAccessKeyID | AWS Access Key ID | |
awsSecretAccessKey | AWS Secret Access Key | |
deploymentEnvironment | Determines whether the backup includes all databases or only the Sysdig databases. If the value is left blank, all databases are backed up. If the value is set to sysdig_databases, all databases are backed up except template0, template1, and postgres. | sysdig_databases |
enabled | Determines whether the backup is enabled. The value can be enabled or disabled. The default is enabled. | enabled |
schedule | The schedule, as a cron expression, on which the database backup runs. The example runs at minute 0 of every sixth hour. | "0 */6 * * *" |