OrbitalRegHighDiskUsage
Severity: info · For: 10m · Runbook owner: platform on-call
What it means
A persistent volume claim whose name matches `pvcSelector` (default regex: `.*orbitalreg.*`) has been above 85% utilisation for 10 minutes. Utilisation is computed from the kubelet metrics as `kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`.
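For reference, the ratio can be rebuilt as a PromQL expression. This sketch assumes the rule divides the two kubelet series, matches them on the `persistentvolumeclaim` label, and compares the result against 0.85. The shipped rule may add namespace filters or aggregation. `pvc_fill_query` is a hypothetical helper, not part of OrbitalReg.

```bash
# Hypothetical helper (not shipped with OrbitalReg): print a PromQL expression that
# approximates the alert condition for a PVC name regex and a fill threshold.
pvc_fill_query() {
  local selector="${1:-.*orbitalreg.*}"  # assumed to mirror the pvcSelector default
  local threshold="${2:-0.85}"           # 85% expressed as a fraction
  printf 'kubelet_volume_stats_used_bytes{persistentvolumeclaim=~"%s"} / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"%s"} > %s\n' \
    "$selector" "$selector" "$threshold"
}
```

Paste the output into the Prometheus expression browser. Alternatively, send it to the HTTP API with `curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$(pvc_fill_query)" | jq .`, where `PROM_URL` is a placeholder for your Prometheus endpoint.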
This is a heads-up, not an emergency. At 85% you have hours to days of runway depending on growth rate. At 100% Postgres stops writing WAL (OrbitalRegDBDown) and the API stops accepting uploads.
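The runway is plain arithmetic: free bytes divided by growth rate. The sketch below assumes a constant growth rate. One possible source for the rate is `deriv(kubelet_volume_stats_used_bytes[6h]) * 3600`, which gives bytes per hour. `runway_hours` is a hypothetical helper.

```bash
# Hypothetical helper: hours until the volume is full, assuming constant growth.
# Usage: runway_hours USED_BYTES CAPACITY_BYTES GROWTH_BYTES_PER_HOUR
runway_hours() {
  awk -v used="$1" -v cap="$2" -v rate="$3" 'BEGIN {
    if (rate + 0 <= 0) { print "inf"; exit }  # flat or shrinking usage: no fill ETA
    printf "%.1f\n", (cap - used) / rate      # free bytes / bytes per hour
  }'
}
```

For example, `runway_hours 91268055040 107374182400 1073741824` prints `15.0`. That case is 85 GiB used of 100 GiB, growing 1 GiB per hour.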
Likely causes
- The PVC was sized for early-deployment data volumes and is now carrying real load.
- Retention policies aren't running — orphaned artifacts piling up.
- WAL retention window in CNPG is longer than the volume can support (default 30d).
- Trash isn't being purged — Admin → Trash → "Empty" hasn't been clicked.
- Detection scratch directory isn't being cleaned between scans.
Diagnose
```bash
# Per-PVC capacity and binding status (kubectl does not report used bytes)
kubectl -n orbitalreg get pvc

# Actual fill of the Postgres data volume (CNPG mode)
kubectl -n orbitalreg exec orbitalreg-postgres-1 -- df -h /var/lib/postgresql/data

# Top contributors inside the Postgres PVC (CNPG mode).
# Wrapped in sh -c so the glob and the pipe are evaluated inside the pod.
kubectl -n orbitalreg exec orbitalreg-postgres-1 -- \
  sh -c 'du -sh /var/lib/postgresql/data/pgdata/* 2>/dev/null | sort -h'

# Total size of un-purged entries in the `trash` table (S3 dedup-aware)
kubectl -n orbitalreg exec deploy/orbitalreg-api -- sh -c '
  psql "$ORBITALREG_PG_DSN" -c "
    SELECT pg_size_pretty(SUM(size_bytes)) AS total, COUNT(*)
    FROM trash WHERE purged_at IS NULL;"
'

# Pending retention work
curl -sf -H "Authorization: Bearer $TOKEN" \
  https://orbitalreg.example.com/api/admin/retention/status | jq .
```

The Admin → Storage page renders the same data with growth-rate sparklines.
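When checking several pods, it can help to filter `df -P` output down to the volumes at or above the alert threshold. This minimal sketch assumes the POSIX `df -P` column layout and mount points without spaces. `over_threshold` is a hypothetical helper, and its default of 85 only mirrors the alert threshold.

```bash
# Hypothetical helper: read POSIX `df -P` output on stdin and print
# "<mount point> <use%>" for each filesystem at or above LIMIT percent (default 85).
over_threshold() {
  awk -v limit="${1:-85}" 'NR > 1 {        # skip the df header line
    pct = $5; sub(/%$/, "", pct)           # "88%" -> "88"
    if (pct + 0 >= limit + 0) print $6, pct "%"
  }'
}
```

Usage: `kubectl -n orbitalreg exec orbitalreg-postgres-1 -- df -P /var/lib/postgresql/data | over_threshold`.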
Fix
- Resize the PVC (the StorageClass must have `allowVolumeExpansion: true`):
  ```bash
  kubectl -n orbitalreg patch pvc <name> -p \
    '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
  ```
- Drain the trash: Admin → Trash → "Empty". This purges entries to the S3 garbage-collected state. That frees the database rows but not the S3 bytes, which S3 lifecycle rules reclaim.
- Run retention now — Admin → Retention → "Run all policies".
- Tighten WAL retention: lower the CNPG `Cluster.spec.backup.retentionPolicy` from `30d` to `14d`. The trade-off is a smaller PITR window.
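To choose a new size for the resize step, you can work backwards from a target fill level: new size = used bytes / target fraction, rounded up to a whole GiB. In this minimal sketch, the 60% default target is an assumption for illustration, not an OrbitalReg recommendation. `resize_target_gi` is hypothetical.

```bash
# Hypothetical helper: smallest whole-GiB request that puts current usage at a
# target fill fraction (default 0.60 is an assumed headroom goal).
# Usage: resize_target_gi USED_BYTES [TARGET_FRACTION]
resize_target_gi() {
  awk -v used="$1" -v target="${2:-0.60}" 'BEGIN {
    gi = used / target / (1024 ^ 3)   # bytes at target fill, converted to GiB
    r = int(gi); if (r < gi) r++      # round up to a whole GiB
    printf "%dGi\n", r
  }'
}
```

For example, `resize_target_gi 91268055040` (85 GiB used) prints `142Gi`. Kubernetes only lets a PVC grow, so if the result is below the current request, keep the current size.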
Escalate
- PVC at 95% and `allowVolumeExpansion: false` on the StorageClass → escalate to the platform-storage team. This needs a coordinated migration to a larger volume.
- Disk usage still growing after trash and retention have been run → likely a detection scratch leak. File it with the OrbitalReg backend team and attach `du -h --max-depth=2` output from the Postgres or API PVC.
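The two escalation rules can be written down as a small decision function. The one input you have to look up is the StorageClass's expansion flag. `kubectl get storageclass <class> -o jsonpath='{.allowVolumeExpansion}'` prints it, and the output is empty when the field is unset, which Kubernetes treats as false. `escalation_route` and the route names it prints are hypothetical labels for the teams named above.

```bash
# Hypothetical helper encoding the escalation rules; printed names are labels,
# not real queue or channel names.
# Usage: escalation_route FILL_PERCENT ALLOW_EXPANSION GROWING_AFTER_CLEANUP(yes|no)
escalation_route() {
  local pct="$1" expand="$2" growing="$3"
  if [ "$pct" -ge 95 ] && [ "$expand" != "true" ]; then
    echo "platform-storage"     # cannot expand in place: coordinated migration
  elif [ "$growing" = "yes" ]; then
    echo "orbitalreg-backend"   # likely detection scratch leak; attach du output
  else
    echo "platform-on-call"     # keep working through the Fix steps
  fi
}
```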