Postgres on CloudNativePG

CloudNativePG (CNPG) is a Postgres operator that handles HA replicas, PITR, and Barman-managed S3 backups inside Kubernetes. The OrbitalReg chart no longer ships its own Postgres StatefulSet for production — point at a CNPG cluster instead.

This page is the externally-rendered companion to docs/operations/postgres-migration-cnpg.md.

Why CNPG

  • Continuous WAL archiving — RPO of seconds, not the daily-snapshot RPO of a stand-alone install
  • Point-in-time recovery — restore to any second within the retention window
  • In-cluster failover — on primary loss, CNPG promotes a replica in under 30 seconds with no manual intervention
  • Backup verification — the Backup verification job restores into an ephemeral CNPG cluster, which is practical only because CNPG treats bootstrap-from-backup as a first-class flow

Install CNPG

The CloudNativePG operator itself is installed once per cluster:

```bash
kubectl apply -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/main/releases/cnpg-1.24.0.yaml
```
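
Before provisioning a database cluster, it can help to confirm that the operator actually came up. The following is a minimal sketch that assumes the upstream manifest's default namespace (`cnpg-system`) and Deployment name (`cnpg-controller-manager`) are unchanged; `wait_for_cnpg_operator` is a hypothetical helper name, not part of CNPG or the chart.

```shell
#!/usr/bin/env bash
# Block until the CNPG operator Deployment has rolled out, or time out.
# Assumes the default namespace and Deployment name from the upstream manifest.
wait_for_cnpg_operator() {
  local timeout="${1:-120s}"
  kubectl rollout status deployment/cnpg-controller-manager \
    --namespace cnpg-system \
    --timeout="$timeout"
}

# Usage: wait_for_cnpg_operator 180s
```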

Provision an OrbitalReg cluster

Setting postgres.cnpg.enabled=true in the chart's values makes the chart template a CNPG Cluster resource:

```yaml
postgres:
  cnpg:
    enabled: true
    instances: 3
    storage:
      size: 100Gi
      storageClass: fast
    backup:
      enabled: true
      s3:
        endpoint: s3.example.com
        bucket: orbitalreg-postgres-backup
        existingSecret: orbitalreg-cnpg-s3
      retentionPolicy: "30d"
      schedule: "0 2 * * *"
```

The chart also creates a pg-superuser Secret and a read-write Service that OrbitalReg's API connects to via DATABASE_URL=postgres://…@<release>-postgres-rw:5432/orbitalreg.
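
The values above reference an existing Secret, `orbitalreg-cnpg-s3`, which has to exist before the backup configuration can work. One way to create it is sketched below. The key names `ACCESS_KEY_ID` and `ACCESS_SECRET_KEY` are an assumption (they follow the convention used in CNPG's own examples); check the chart's values reference for the keys it actually expects. The function name and the `S3_*` environment variables are hypothetical.

```shell
#!/usr/bin/env bash
# Create the S3 credentials Secret referenced by postgres.cnpg.backup.s3.existingSecret.
# Key names are assumed, not confirmed by the chart documentation.
create_backup_secret() {
  local namespace="$1"
  kubectl create secret generic orbitalreg-cnpg-s3 \
    --namespace "$namespace" \
    --from-literal=ACCESS_KEY_ID="${S3_ACCESS_KEY_ID:?set S3_ACCESS_KEY_ID}" \
    --from-literal=ACCESS_SECRET_KEY="${S3_SECRET_ACCESS_KEY:?set S3_SECRET_ACCESS_KEY}"
}

# Usage: S3_ACCESS_KEY_ID=... S3_SECRET_ACCESS_KEY=... create_backup_secret <ns>
```

Passing credentials as literals exposes them in the local process list for the duration of the command; on shared machines, `--from-env-file` is a safer alternative.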

Migrate from stand-alone Postgres

The full migration playbook lives at docs/operations/postgres-migration-cnpg.md. The shape:

  1. Drain writes — set the API to read-only mode under Admin → Maintenance (no new uploads, but downloads keep working).
  2. Take a pg_dump of the existing database.
  3. Provision the CNPG cluster with the chart values above.
  4. Restore the dump into the new cluster's database.
  5. Update DATABASE_URL to point at the new RW Service.
  6. Roll out and re-enable writes.
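
Steps 2 and 4 can be sketched as a single helper. This is an illustration under assumptions, not the playbook's exact procedure: `dump_and_restore` is a hypothetical name, both connection strings are supplied by the caller, and the parallelism of 4 is an arbitrary starting point.

```shell
#!/usr/bin/env bash
# Dump the stand-alone database and restore it into the CNPG cluster.
#   $1  connection string for the old database
#   $2  connection string for the new cluster's RW Service
#   $3  dump file path (optional)
dump_and_restore() {
  local old_url="$1" new_url="$2" dump_file="${3:-orbitalreg.dump}"
  # Custom format is compressed and allows a parallel restore.
  pg_dump --format=custom --file="$dump_file" "$old_url" || return 1
  # --no-owner avoids failures when role names differ between the two clusters.
  pg_restore --no-owner --jobs=4 --dbname="$new_url" "$dump_file"
}
```

Writes should already be drained (step 1) before the dump starts, otherwise rows written after the dump's snapshot are lost.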

End-to-end downtime for a 50 GB database is typically 10–20 minutes on a warm cluster; the dump and restore account for most of it.
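
To sanity-check that estimate against your own environment, the implied throughput can be worked out with a little arithmetic. The sketch below treats the whole downtime window as data movement, which is a simplifying assumption (rollout time is included in the quoted figure), and counts 1 GB as 1024 MB. `required_mb_per_s` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Combined dump-and-restore throughput (MB/s, rounded down) needed to move
# a database of $1 GB within $2 minutes.
required_mb_per_s() {
  local size_gb="$1" minutes="$2"
  echo $(( size_gb * 1024 / (minutes * 60) ))
}

required_mb_per_s 50 10   # prints 85
required_mb_per_s 50 20   # prints 42
```

If the measured dump or restore rate is well below this range, expect a longer window than the one quoted above.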

Day-2 operations

| Task | Command |
| --- | --- |
| Trigger an on-demand backup | kubectl cnpg backup <cluster> -n <ns> |
| List backups | kubectl get backup -n <ns> |
| Promote a replica | kubectl cnpg promote <cluster> <pod> |
| Inspect WAL lag | kubectl cnpg status <cluster> |
| Run the verify-restore drill | ./scripts/orbital-restore.sh --scenario verify --target-time "now" |
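
For context on what a point-in-time restore looks like at the CNPG level, the sketch below renders a Cluster manifest that bootstraps from the object store and stops at a target time. It is not the implementation of orbital-restore.sh. The bucket, endpoint, and Secret name come from the chart values shown earlier; the `https://` scheme, the bucket-root destination path, the Secret key names, and the single-instance 100Gi shape are assumptions, and `render_pitr_cluster` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print a CNPG Cluster manifest that recovers from backup to a point in time.
#   $1  name of the new (ephemeral) cluster
#   $2  name of the source cluster, as used in the backup path
#   $3  recovery target time (RFC 3339)
render_pitr_cluster() {
  local new_name="$1" source_cluster="$2" target_time="$3"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ${new_name}
spec:
  instances: 1
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: ${source_cluster}
      recoveryTarget:
        targetTime: "${target_time}"
  externalClusters:
    - name: ${source_cluster}
      barmanObjectStore:
        destinationPath: s3://orbitalreg-postgres-backup/
        endpointURL: https://s3.example.com
        s3Credentials:
          accessKeyId:
            name: orbitalreg-cnpg-s3
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: orbitalreg-cnpg-s3
            key: ACCESS_SECRET_KEY
EOF
}

# Usage: render_pitr_cluster verify-restore <cluster> 2024-06-01T12:00:00Z | kubectl apply -n <ns> -f -
```

Rendering to stdout keeps the manifest reviewable (or diffable) before anything is applied to the cluster.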

Capacity sizing

A reasonable starting shape:

| Workload size | CNPG instances | CPU per pod | Memory per pod | Storage |
| --- | --- | --- | --- | --- |
| Small (≤ 10 GB) | 2 | 500m | 1 Gi | 50 Gi |
| Medium (≤ 100 GB) | 3 | 1 | 2 Gi | 200 Gi |
| Large (≤ 1 TB) | 3 | 2 | 4 Gi | 2 Ti |
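
The table can be encoded as a small lookup for use in provisioning scripts. This sketch simply mirrors the rows above, treating 1 TB as 1000 GB; `suggest_cnpg_shape` is a hypothetical helper and the output format is arbitrary.

```shell
#!/usr/bin/env bash
# Map a database size in GB to the starting shape from the sizing table.
suggest_cnpg_shape() {
  local size_gb="$1"
  if [ "$size_gb" -le 10 ]; then
    echo "instances=2 cpu=500m memory=1Gi storage=50Gi"
  elif [ "$size_gb" -le 100 ]; then
    echo "instances=3 cpu=1 memory=2Gi storage=200Gi"
  elif [ "$size_gb" -le 1000 ]; then
    echo "instances=3 cpu=2 memory=4Gi storage=2Ti"
  else
    # The table gives no guidance above 1 TB.
    echo "no suggested shape above 1 TB" >&2
    return 1
  fi
}

# Usage: suggest_cnpg_shape 80
```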

OrbitalReg's hottest tables — artifacts, scan_findings, artifact_pulls — are bounded by retention; the retention runner keeps the row counts stable rather than growing without bound.

Released under the Apache-2.0 License.