Postgres on CloudNativePG
CloudNativePG (CNPG) is a Postgres operator that handles HA replicas, PITR, and Barman-managed S3 backups inside Kubernetes. The OrbitalReg chart no longer ships its own Postgres StatefulSet for production — point at a CNPG cluster instead.
This page is the externally-rendered companion to docs/operations/postgres-migration-cnpg.md.
Why CNPG
- Continuous WAL archiving — RPO of seconds, not the daily-snapshot RPO of a stand-alone install
- Point-in-time recovery — restore to any second within the retention window
- In-cluster failover — primary loss promotes a replica in under 30 seconds, no operator action
- Backup verification — the Backup verification job restores into an ephemeral CNPG cluster, which is only easy because CNPG's bootstrap-from-backup flow is first-class
Install CNPG
The CloudNativePG operator itself is installed once per Kubernetes cluster:
kubectl apply -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.24.0.yaml
Provision an OrbitalReg cluster
The OrbitalReg chart's values.postgres.cnpg.enabled=true mode templates a Cluster resource:
postgres:
  cnpg:
    enabled: true
    instances: 3
    storage:
      size: 100Gi
      storageClass: fast
    backup:
      enabled: true
      s3:
        endpoint: s3.example.com
        bucket: orbitalreg-postgres-backup
        existingSecret: orbitalreg-cnpg-s3
      retentionPolicy: "30d"
      schedule: "0 2 * * *"
The chart also creates a pg-superuser Secret and a read-write Service that OrbitalReg's API connects to via DATABASE_URL=postgres://…@<release>-postgres-rw:5432/orbitalreg.
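To make the Service naming concrete, here is a minimal sketch that derives the host, port, and database portion of DATABASE_URL from a Helm release name. The function name `rw_endpoint` and the release name in the usage comment are hypothetical; credentials are left out on purpose, since they come from the chart-managed Secret.

```shell
#!/usr/bin/env bash
# Build the host:port/database part of DATABASE_URL for a given Helm release.
# The "<release>-postgres-rw" pattern, port 5432, and database "orbitalreg"
# come from the text above; rw_endpoint itself is a hypothetical helper.
rw_endpoint() {
  release="$1"
  printf '%s-postgres-rw:5432/orbitalreg\n' "$release"
}

# Usage (hypothetical release name "orbitalreg"):
#   rw_endpoint orbitalreg
# A quick reachability check from a pod in the same namespace could then be:
#   pg_isready -h orbitalreg-postgres-rw -p 5432
```

Keeping the credentials out of the helper means the same function can be reused in scripts that read the user and password from the Secret at runtime.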
Migrate from stand-alone Postgres
The full migration playbook lives at docs/operations/postgres-migration-cnpg.md. The shape:
- Drain writes — set the API to read-only mode under Admin → Maintenance (no new uploads, but downloads keep working).
- Take a pg_dump of the existing database.
- Provision the CNPG cluster with the chart values above.
- Restore the dump into the new cluster's database.
- Update DATABASE_URL to point at the new RW Service.
- Roll out and re-enable writes.
End-to-end downtime for a 50-GB database is typically 10–20 minutes on a warm cluster; the dump-and-restore is the long pole.
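The dump and restore steps in the outline above might look roughly like the following sketch. It is not the playbook itself: `OLD_DATABASE_URL`, `NEW_DATABASE_URL`, the default dump file name, and the `DRY_RUN` switch are all assumptions made for illustration, and the flags are standard `pg_dump` and `pg_restore` options rather than anything OrbitalReg-specific.

```shell
#!/usr/bin/env bash
# Dump-and-restore sketch. OLD_DATABASE_URL and NEW_DATABASE_URL are
# hypothetical variable names holding the old and new connection strings.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

migrate_dump_restore() {
  dump_file="${1:-orbitalreg.dump}"
  # Custom-format dump so that pg_restore can read it on the other side.
  run pg_dump --format=custom --no-owner \
    --file="$dump_file" --dbname="$OLD_DATABASE_URL" || return 1
  # Restore into the database behind the new RW Service; stop on first error.
  run pg_restore --no-owner --exit-on-error \
    --dbname="$NEW_DATABASE_URL" "$dump_file"
}

# Usage: preview the commands before running them for real.
#   DRY_RUN=1 migrate_dump_restore /tmp/orbitalreg.dump
```

Running once with `DRY_RUN=1` is a cheap way to confirm both connection strings point where you expect before writes are drained.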
Day-2 operations
| Task | Command |
|---|---|
| Trigger an on-demand backup | kubectl cnpg backup <cluster> -n <ns> |
| List backups | kubectl get backup -n <ns> |
| Promote a replica | kubectl cnpg promote <cluster> <pod> |
| Inspect WAL lag | kubectl cnpg status <cluster> |
| Run the verify-restore drill | ./scripts/orbital-restore.sh --scenario verify --target-time "now" |
Capacity sizing
A reasonable starting shape:
| Workload size | CNPG instances | CPU per pod | Mem per pod | Storage |
|---|---|---|---|---|
| Small (≤ 10 GB) | 2 | 500m | 1 Gi | 50 Gi |
| Medium (≤ 100 GB) | 3 | 1 | 2 Gi | 200 Gi |
| Large (≤ 1 TB) | 3 | 2 | 4 Gi | 2 Ti |
OrbitalReg's hottest tables — artifacts, scan_findings, artifact_pulls — are bounded by retention; the retention runner keeps the row counts stable rather than growing without bound.
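The sizing table above can be encoded as a small lookup, which is handy when generating chart values from a script. This is only a sketch: the function name `sizing_for_gb` is hypothetical, it treats 1 TB as 1000 GB, and it prints the table's values as Kubernetes quantities.

```shell
#!/usr/bin/env bash
# Map a database size in whole GB to the starting shape from the sizing table.
# Prints: tier instances cpu memory storage
# Assumption: 1 TB is treated as 1000 GB for the upper bound of the large tier.
sizing_for_gb() {
  gb="$1"
  case "$gb" in
    ''|*[!0-9]*)
      echo "usage: sizing_for_gb <size-in-GB>" >&2
      return 2
      ;;
  esac
  if [ "$gb" -le 10 ]; then
    echo "small 2 500m 1Gi 50Gi"
  elif [ "$gb" -le 100 ]; then
    echo "medium 3 1 2Gi 200Gi"
  elif [ "$gb" -le 1000 ]; then
    echo "large 3 2 4Gi 2Ti"
  else
    # The table lists no starting shape beyond 1 TB.
    echo "no starting shape listed above 1 TB" >&2
    return 1
  fi
}

# Usage:
#   sizing_for_gb 50
```

Because the table is a starting point rather than a guarantee, the output is best treated as default values to be adjusted after observing real load.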
Related docs
- Disaster recovery — restore runbook
- Backup verification — weekly automated restore-into-ephemeral-cluster
- docs/operations/postgres-migration-cnpg.md — full migration recipe