Operations
This section covers everything that happens after you've deployed OrbitalReg and put real traffic through it.
Backup + disaster recovery
The two questions that should have answers before a release goes live:
- Are my backups working? Backup verification documents the weekly-restore-into-ephemeral-cluster job, the Prometheus alert it fires when stale, and the admin UI card that surfaces the last-good-restore.
- What do I do when the cluster is gone? Disaster recovery is the runbook. Three scenarios — DB loss, S3 loss, total loss — with copy-pasteable commands.
Postgres on CloudNativePG is the recommended production-shape Postgres: HA replicas, PITR, Barman-managed S3 backups. The chart also documents migrating from a stand-alone Postgres install.
Observability
Observability covers metrics, logs, and traces:
- Prometheus metrics over /metrics (every chi handler is wrapped)
- ServiceMonitor CRD shipped in the chart
- JSON-structured logs to stdout (Loki + Promtail-friendly)
- Optional OpenTelemetry traces via OTLP/HTTP (off by default)
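To make "every handler is wrapped" concrete, here is a minimal sketch of a counting middleware in the `func(http.Handler) http.Handler` shape that chi accepts. It uses only the standard library so it runs on its own; the actual service presumably relies on a Prometheus client library instead. The metric name `http_requests_total`, its labels, and the `requestCounter` type are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// counterKey labels one counter series. The label set is an assumption.
type counterKey struct {
	method string
	code   int
}

// requestCounter is a hypothetical, stdlib-only stand-in for a counter vector.
type requestCounter struct {
	mu     sync.Mutex
	counts map[counterKey]int
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[counterKey]int)}
}

// statusRecorder remembers the status code the wrapped handler wrote.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// Wrap counts each request by method and response status.
func (c *requestCounter) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		next.ServeHTTP(rec, r)
		c.mu.Lock()
		c.counts[counterKey{r.Method, rec.code}]++
		c.mu.Unlock()
	})
}

// Render emits the counters in Prometheus text exposition format,
// sorted so the output is stable.
func (c *requestCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for k, n := range c.counts {
		lines = append(lines, fmt.Sprintf(
			"http_requests_total{method=%q,code=\"%d\"} %d", k.method, k.code, n))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n") + "\n"
}

// demo sends three requests through a wrapped handler and returns the metrics.
func demo() string {
	c := newRequestCounter()
	h := c.Wrap(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/healthz" {
			w.WriteHeader(http.StatusNotFound)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	for _, path := range []string{"/healthz", "/healthz", "/missing"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
	return c.Render()
}

func main() {
	fmt.Print(demo())
}
```

With chi, a middleware of this shape would be registered once on the router rather than per route, which is what makes blanket coverage of handlers practical.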
Monitoring is the metrics deep-dive: the full metric catalogue, the Grafana import recipe for the bundled overview + deep-dive dashboards, three Alertmanager routing examples (PagerDuty / Slack / email-only) for the bundled alert + recording rules, sample PromQL for the five most common triage questions, and a section-per-alert runbook.
Versioning policy
Versioning policy covers Calendar Versioning (YYYY.MAJOR.MINOR), the daily update-channel poll, and the 18-month support window per year-major. The Admin Overview page shows the currently-installed version, when it was built, and whether an update is available.
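As an illustration of how a YYYY.MAJOR.MINOR tag can be compared against the latest version on the update channel, the sketch below parses both tags and compares them numerically. `Version`, `ParseVersion`, and `UpdateAvailable` are hypothetical names, and the example tags are made up; the real update check may work differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is a parsed YYYY.MAJOR.MINOR tag.
type Version struct {
	Year, Major, Minor int
}

// ParseVersion splits a tag into its three numeric components.
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("want YYYY.MAJOR.MINOR, got %q", s)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return Version{}, fmt.Errorf("invalid component %q in %q", p, s)
		}
		nums[i] = n
	}
	return Version{nums[0], nums[1], nums[2]}, nil
}

// Compare returns -1, 0, or 1, ordering by year, then major, then minor.
func (v Version) Compare(o Version) int {
	a := [3]int{v.Year, v.Major, v.Minor}
	b := [3]int{o.Year, o.Major, o.Minor}
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// UpdateAvailable reports whether latest is strictly newer than installed.
func UpdateAvailable(installed, latest string) (bool, error) {
	cur, err := ParseVersion(installed)
	if err != nil {
		return false, err
	}
	next, err := ParseVersion(latest)
	if err != nil {
		return false, err
	}
	return cur.Compare(next) < 0, nil
}

func main() {
	ok, err := UpdateAvailable("2025.1.9", "2025.1.10")
	fmt.Println(ok, err) // prints "true <nil>"
}
```

Comparing components as integers matters: a plain string comparison would rank 2025.1.9 above 2025.1.10.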
Release pipeline
Release pipeline describes how every CalVer tag becomes three multi-arch container images on ghcr.io, signed with cosign-keyless OIDC and attested with a CycloneDX SBOM. The workflow runs entirely from GitHub Actions identity — no long-lived OrbitalReg key is involved — so a downstream auditor can verify provenance without a vendor handshake.
Demo seeder
Demo seeder (orbital seed) documents the test-user matrix the seeder writes alongside the demo projects, repos, and trust policies. Five tiered access profiles (alice / bob / carol / dave / eve) on the default tier let an operator (or an integration test) walk the per-project RBAC matrix without an auth roundtrip; full-plus adds three richer profiles (token-only CI bot, SAML-asserted maintainer, second org admin) for sales walkthroughs.
Air-gapped mode
Air-gapped operations is the runbook for installs that can't reach the public internet:
- Egress is blocked by default on fresh installs
- Each integration (webhooks, OSV, Sigstore Rekor, telemetry, OTel) has its own opt-in toggle under Admin → System → Egress allowlist
- Documentation, the chart, and container images all ship as air-gap-friendly bundles
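The default-deny behaviour with per-integration opt-in can be pictured with a small sketch. The `EgressPolicy` type and the integration identifiers below are hypothetical and only mirror the list above; they are not OrbitalReg's actual configuration keys or API.

```go
package main

import "fmt"

// Integration names one outbound integration. Identifiers are illustrative.
type Integration string

const (
	Webhooks  Integration = "webhooks"
	OSV       Integration = "osv"
	Rekor     Integration = "sigstore-rekor"
	Telemetry Integration = "telemetry"
	OTel      Integration = "otel"
)

// EgressPolicy is default-deny: the zero value permits nothing.
type EgressPolicy struct {
	allowed map[Integration]bool
}

// Allow opts a single integration in to outbound traffic.
func (p *EgressPolicy) Allow(i Integration) {
	if p.allowed == nil {
		p.allowed = make(map[Integration]bool)
	}
	p.allowed[i] = true
}

// Check returns an error unless the integration was explicitly allowed.
func (p *EgressPolicy) Check(i Integration) error {
	if !p.allowed[i] {
		return fmt.Errorf("egress blocked: %s is not on the allowlist", i)
	}
	return nil
}

func main() {
	var p EgressPolicy // fresh install: everything blocked
	p.Allow(OSV)       // operator opts in to OSV lookups only
	for _, i := range []Integration{Webhooks, OSV, Rekor, Telemetry, OTel} {
		fmt.Printf("%-15s allowed=%t\n", i, p.Check(i) == nil)
	}
}
```

Making the zero value deny everything means a missing or partially loaded configuration fails closed rather than open.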
Day-2 checklist
A short list of things that should be enabled before you go live:
| Item | Doc |
|---|---|
| Postgres backups verified end-to-end | Backup verification |
| At least one tested DR drill in the last 90 days | Disaster recovery |
| ServiceMonitor scraping the API | Observability |
| Alert rules with runbook links | Observability |
| TLS via cert-manager, with a renewal monitor | (operator's existing dashboard) |
| Air-gapped egress allowlist matches your security posture | Air-gapped operations |
| At least one project-owner exists per project | Core concepts |
| Retention policies on long-tail repos | Core concepts |
| Sigstore trust policies pinned for prod-deploy repos | Core concepts |