Structured logs
OrbitalReg emits one JSON log line per event from every subsystem (API handlers, scan dispatcher, retention sweeper, backup mirror, replication worker, …). The schema is the contract your SIEM ingestion (Splunk / Elastic / Loki / Grafana / Datadog) maps onto — keeping it stable across releases is treated as a public-API guarantee.
Phase A of PRODUCT-ROADMAP item 63 introduced the canonical field names and the internal/applog helper package, and migrated the scan dispatcher as the pilot subsystem. Phase B extended the migration to four further emitters — retention sweeper, backup-mirror dual-write + recovery worker, cross-instance Geo-Sync push/pull/apply workers, and the OIDC token-exchange handler — so every long-running background loop and every credential-mint hot path now writes the canonical schema. Phase C lands the end-to-end SIEM-ingestion recipes for Loki, Elasticsearch, and Splunk plus a shared sample-query catalogue — see SIEM setup below.
Output format
- JSON, one event per line, written to stdout.
- Timestamp lives under the stdlib slog default key time (RFC3339 with nanos).
- Severity under level (DEBUG/INFO/WARN/ERROR).
- Message under msg.
- Level threshold is configurable per-process via ORBITALREG_LOG_LEVEL=debug|info|warn|error (default info).
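The output rules above can be reproduced with nothing but the standard library. The sketch below is an illustration under assumptions, not OrbitalReg's actual boot code: parseLevel and newLogger are hypothetical names, and the fallback to INFO for unrecognised values is assumed from the documented default.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps an ORBITALREG_LOG_LEVEL value onto a slog.Level.
// Empty or unrecognised values fall back to INFO (assumed from the documented default).
func parseLevel(s string) slog.Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// newLogger is a hypothetical constructor: a stdlib JSON handler that writes
// one object per line to w, using the default time / level / msg keys.
func newLogger(w io.Writer, level slog.Level) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: level}))
}

func main() {
	logger := newLogger(os.Stdout, parseLevel(os.Getenv("ORBITALREG_LOG_LEVEL")))
	logger.Debug("dropped unless the threshold is debug")
	logger.Info("served", slog.String("component", "api"))
}
```

With the variable unset, only the Info line reaches stdout; setting ORBITALREG_LOG_LEVEL=debug lets the Debug line through as well.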
Canonical fields
| Key | Type | When emitted |
|---|---|---|
| component | string | always; names the subsystem (api, scan_dispatcher, retention_sweeper, backup_mirror, replication, geo_sync, notify, imports, oidc_exchange, maintenance, cnpg_mirror, seed) |
| subcomponent | string | optional; handler name for API events (admin.security, webhooks, …) |
| request_id | string | every API event; chi's request-id middleware value |
| correlation_id | string | optional; copied from the X-Correlation-Id request header |
| principal_kind | string | when an authenticated identity is on the context: user, service_account, or anonymous |
| principal_id | string | UUID of the user or service account |
| principal_email | string | for human users only |
| project_id | string | when the event scopes to a project |
| repo_id | string | when the event scopes to a repository |
| artifact_id | string | when the event scopes to a single artifact |
| latency_ms | int64 | duration of the operation (handler, scan, replication tick, …) |
| bytes_in | int64 | request body / pulled bytes |
| bytes_out | int64 | response body / pushed bytes |
| err | string | on failure; error message |
| err_code | string | on failure; stable snake_case discriminator (e.g. pg_query_failed, scan_timeout, s3_get_failed) |
The level / time / msg keys are set by the stdlib slog JSON handler and are not in OrbitalReg's control — treat them as part of the schema.
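On the consuming side, the table translates directly into a typed record. The following is a minimal sketch of how an ingestion test might decode one line and check the keys that every event carries; the Event struct and parseEvent function are hypothetical names introduced here, and only a subset of the canonical fields is modelled.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Event mirrors a subset of the canonical fields.
// Keys that are absent from a line decode to "" or 0.
type Event struct {
	Time      string `json:"time"`
	Level     string `json:"level"`
	Msg       string `json:"msg"`
	Component string `json:"component"`
	RequestID string `json:"request_id"`
	LatencyMS int64  `json:"latency_ms"`
	Err       string `json:"err"`
	ErrCode   string `json:"err_code"`
}

// parseEvent decodes one log line and rejects it when a key that the
// schema emits on every event (time, level, component) is missing.
func parseEvent(line string) (Event, error) {
	var ev Event
	if err := json.Unmarshal([]byte(line), &ev); err != nil {
		return Event{}, err
	}
	if ev.Time == "" || ev.Level == "" || ev.Component == "" {
		return Event{}, errors.New("missing required key (time, level, or component)")
	}
	return ev, nil
}

func main() {
	line := `{"time":"2026-05-04T14:32:18.412304Z","level":"WARN","msg":"claim query failed","component":"scan_dispatcher","err":"context deadline exceeded","err_code":"pg_query_failed"}`
	ev, err := parseEvent(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Component, ev.ErrCode)
}
```

A check like this is cheap to run against a captured sample before a collector upgrade, since any renamed key shows up as a zero value rather than as a silently empty dashboard panel.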
Example event
{
  "time": "2026-05-04T14:32:18.412304Z",
  "level": "WARN",
  "msg": "claim query failed",
  "component": "scan_dispatcher",
  "err": "context deadline exceeded",
  "err_code": "pg_query_failed"
}
{
  "time": "2026-05-04T14:32:19.005111Z",
  "level": "INFO",
  "msg": "served",
  "component": "api",
  "subcomponent": "admin.security",
  "request_id": "rq-7a13f0",
  "principal_kind": "user",
  "principal_id": "5f6c2a08-…",
  "principal_email": "alice@example.test",
  "project_id": "0a9d40b8-…",
  "latency_ms": 47,
  "bytes_out": 13128
}
Helpers (Go)
Workers and handlers build their *slog.Logger through the helpers in api/internal/applog so a typo in request_id → req_id is a compile error rather than a SIEM regex miss:
// Worker: pre-tags every line with `component=scan_dispatcher`.
logger := applog.WorkerLogger(applog.ComponentScanDispatcher)
logger.Warn("claim query failed",
	applog.Err(err),
	applog.ErrCode("pg_query_failed"))
// HTTP handler: pre-tags with `component=api` + `request_id`.
log := applog.RequestLogger(r)
log = applog.WithProject(log, projectID)
log.Info("served",
	applog.Latency(time.Since(t0)),
	applog.BytesOut(int64(n)))
Always reach for the constants (applog.KeyErr, applog.KeyProjectID, …) or the slog.String / slog.Int64 form; never inline a raw field-name string in a call site.
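To make the compile-time guarantee concrete, here is one way such helpers could be built. This is a sketch, not the actual internal/applog source: the function names and argument types are inferred from the call sites above, while the bodies, the constant values and the nil-error behaviour are assumptions.

```go
package main

import (
	"errors"
	"log/slog"
	"os"
	"time"
)

// One constant per canonical field name, so a misspelt key fails to compile.
const (
	KeyComponent = "component"
	KeyErr       = "err"
	KeyErrCode   = "err_code"
	KeyLatencyMS = "latency_ms"
	KeyBytesOut  = "bytes_out"
)

// Err renders an error under the err key. A nil error yields the zero Attr,
// which slog handlers skip (assumed behaviour for this sketch).
func Err(err error) slog.Attr {
	if err == nil {
		return slog.Attr{}
	}
	return slog.String(KeyErr, err.Error())
}

// ErrCode attaches the stable snake_case discriminator.
func ErrCode(code string) slog.Attr { return slog.String(KeyErrCode, code) }

// Latency converts a duration to whole milliseconds under latency_ms.
func Latency(d time.Duration) slog.Attr { return slog.Int64(KeyLatencyMS, d.Milliseconds()) }

// BytesOut records the response or pushed byte count.
func BytesOut(n int64) slog.Attr { return slog.Int64(KeyBytesOut, n) }

// WorkerLogger pre-tags every line with component=<name>.
func WorkerLogger(component string) *slog.Logger {
	base := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	return base.With(slog.String(KeyComponent, component))
}

func main() {
	logger := WorkerLogger("scan_dispatcher")
	logger.Warn("claim query failed",
		Err(errors.New("context deadline exceeded")),
		ErrCode("pg_query_failed"),
		Latency(47*time.Millisecond))
}
```

Because call sites only ever name a function or a constant, renaming a field becomes a single edit in one package, and a stale call site breaks the build instead of the dashboards.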
CI guard
make lint-logs (also a GitHub Actions workflow) refuses any log.Printf / log.Println / log.Fatal* / log.Panic* call inside api/. The stdlib log package emits plain text, which breaks JSON line-ingestion on every SIEM. Use log/slog (always) through the canonical helpers (always).
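The rule the guard enforces is simple enough to express with the standard go/ast package. The sketch below is not the implementation behind make lint-logs (which this document does not show); findStdlibLogCalls is a hypothetical name, and the check is purely syntactic: it only inspects files that import "log" without an alias and does not resolve shadowed identifiers.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// findStdlibLogCalls returns one "line: log.Func" entry per banned call in src.
func findStdlibLogCalls(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	// Skip files that never import the stdlib log package under its own name.
	imported := false
	for _, imp := range file.Imports {
		if imp.Path.Value == `"log"` && imp.Name == nil {
			imported = true
		}
	}
	if !imported {
		return nil, nil
	}
	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if !ok || pkg.Name != "log" {
			return true
		}
		name := sel.Sel.Name
		if name == "Printf" || name == "Println" ||
			strings.HasPrefix(name, "Fatal") || strings.HasPrefix(name, "Panic") {
			line := fset.Position(call.Pos()).Line
			hits = append(hits, fmt.Sprintf("%d: log.%s", line, name))
		}
		return true
	})
	return hits, nil
}

func main() {
	src := "package demo\n\nimport \"log\"\n\nfunc f() {\n\tlog.Printf(\"plain text\")\n}\n"
	hits, err := findStdlibLogCalls("demo.go", src)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println("demo.go:" + h)
	}
}
```

A production guard would more likely resolve imports through type information so that an aliased import or a local variable named log is handled correctly; the syntactic version is kept here because it fits in one screen.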
Per-component coverage
The migration walks the codebase one subsystem at a time so each batch is reviewable in isolation. The table below tracks which emitters now publish through internal/applog. A subsystem listed as shipped is guaranteed to tag component=<name> on every line it emits.
| Component name | Source path | Phase |
|---|---|---|
| api | internal/applog.RequestLogger | A |
| scan_dispatcher | internal/scan/dispatcher.go | A |
| retention_sweeper | internal/retention/sweeper.go | B |
| backup_mirror | internal/storage/backup.go, recovery.go | B |
| geo_sync | internal/geosync/{geosync,apply,pull}.go | B |
| oidc_exchange | internal/handlers/oidc_exchange.go | B |
| notify | internal/notify/ | C |
| imports | internal/jfrog/, internal/imports/* | C |
| cnpg_mirror | internal/cnpg/ | C |
| maintenance | internal/maintenance/ | C |
| seed | internal/seed/ | C |
A replication constant is reserved in internal/applog for a future low-level pgx-replication-stream worker; the cross-instance geo-sync workers tag as geo_sync rather than replication because they replicate orchestration events (project / repo / artifact / security-block upserts) rather than raw row deltas.
PII redaction (opt-in)
When a deployment lands in a jurisdiction whose data-protection regime forbids storing identifying values in operational logs (GDPR / DSGVO is the recurring driver for DACH customers), set ORBITALREG_LOG_REDACT_FIELDS to a comma-separated list of dotted field paths whose values should be rewritten to [REDACTED] before the JSON encoder serialises the event. The redaction layer wraps the slog handler at boot, so every emitter (request logger, worker logger, and bare slog.Default() callers) picks up the same scrub table without per-callsite plumbing. With the variable unset, the layer is a thin pass-through, so deployments that don't opt in pay essentially no per-record cost.
# Standard DACH-customer profile: scrub the email + last-IP fields.
ORBITALREG_LOG_REDACT_FIELDS=principal_email,attrs.email,attrs.ip
Path syntax:
- Top-level field: principal_email, repo_id, err. Matches any attribute emitted directly on the record (whether from logger.Info("…", slog.String("principal_email", v)) or from a preamble logger.With(slog.String("principal_email", v))).
- Nested attribute: attrs.email, request.headers.authorization. Matches the leaf key inside one or more slog.Group(...) / Logger.WithGroup(...) levels. The leaf key keeps its name (the value flips to [REDACTED]) so SIEM dashboards that pivot on field presence keep working.
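The path matching described above can be sketched with the standard library's slog.HandlerOptions.ReplaceAttr hook, which receives the enclosing group names alongside each attribute. Note the swap: OrbitalReg's layer is described as a wrapper around the handler, whereas this illustration uses ReplaceAttr for brevity. The names redactor, apply and newRedactingLogger are hypothetical, and exact-path matching is an assumption about the real semantics.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// redactor builds a ReplaceAttr hook from a comma-separated list of dotted paths.
func redactor(spec string) func(groups []string, a slog.Attr) slog.Attr {
	paths := make(map[string]bool)
	for _, p := range strings.Split(spec, ",") {
		if p = strings.TrimSpace(p); p != "" {
			paths[p] = true
		}
	}
	return func(groups []string, a slog.Attr) slog.Attr {
		// Join the enclosing group names and the leaf key into one dotted path.
		full := strings.Join(append(append([]string{}, groups...), a.Key), ".")
		if paths[full] {
			// Keep the key so field-presence dashboards still work; replace only the value.
			return slog.String(a.Key, "[REDACTED]")
		}
		return a
	}
}

// apply runs the hook on a single string attribute and returns the resulting value.
func apply(spec string, groups []string, key, value string) string {
	return redactor(spec)(groups, slog.String(key, value)).Value.String()
}

// newRedactingLogger installs the hook only when the scrub list is non-empty,
// so the default-off path stays a plain JSON handler.
func newRedactingLogger(w io.Writer, spec string) *slog.Logger {
	opts := &slog.HandlerOptions{}
	if strings.TrimSpace(spec) != "" {
		opts.ReplaceAttr = redactor(spec)
	}
	return slog.New(slog.NewJSONHandler(w, opts))
}

func main() {
	logger := newRedactingLogger(os.Stdout, os.Getenv("ORBITALREG_LOG_REDACT_FIELDS"))
	logger.Info("served",
		slog.String("principal_email", "alice@example.test"),
		slog.Group("attrs",
			slog.String("email", "alice@example.test"),
			slog.String("ip", "203.0.113.7")))
}
```

Run with ORBITALREG_LOG_REDACT_FIELDS=principal_email,attrs.email, the sketch scrubs the top-level email and the nested attrs.email while leaving attrs.ip untouched.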
The redaction set is checked at boot. If the env var is non-empty, a single confirmation line lands on the SIEM at startup so the operator can verify the opt-in took effect:
{"time":"2026-05-05T08:00:01Z","level":"INFO",
 "component":"api","msg":"log redaction enabled",
 "redact_fields":["attrs.email","attrs.ip","principal_email"]}
When you need redaction
| Customer profile | Recommendation |
|---|---|
| DACH / EU | Enable. GDPR / DSGVO require minimisation in operational logs. |
| US / global | Optional. Many customers prefer the un-redacted view for triage. |
| Air-gapped | Skip. Air-gap installs don't ship logs off-cluster, so the SIEM retention-window risk that drives the scrub doesn't apply. |
What stays in the logs
Even with the recommended set above enabled, these fields remain visible because they're not personally identifying — they're operational keys SREs need to correlate during incidents:
- request_id / correlation_id / trace_id / span_id: opaque per-request identifiers, no PII.
- principal_id: a user UUID, not the email; SIEM dashboards still group by user without exposing the address.
- project_id / repo_id / artifact_id: internal UUIDs.
If a deployment must scrub these too (e.g. multi-tenant fan-out into a shared log lake), add them to the env-var list. The redaction layer treats every canonical schema field uniformly, so any field name in the Canonical fields table is a valid target.
SIEM setup
Every event is one shape-stable JSON object on stdout. Whatever collects container logs in your environment (Promtail, Filebeat, Elastic Agent, the Splunk Universal Forwarder, Vector, Fluent Bit, the Datadog Agent, …) is the right tool — there is no separate log-shipping daemon inside the OrbitalReg image. Pick the recipe below that matches your stack; the Sample queries section answers the same operator questions in each query language.
The OrbitalReg container labels itself with app.kubernetes.io/name=orbitalreg-api (and …/component={api, scan-dispatcher, maintenance, …} on workers running in their own deployments). Recipes below key off that label so you can ingest the whole platform with one collector config.
Loki (Grafana Agent / Promtail / Alloy)
Loki ingests JSON natively — the json pipeline stage promotes the canonical field names to label-able / |=-filterable values without requiring a structured parser per field.
Promtail (promtail.yaml, in-cluster scrape config):
scrape_configs:
  - job_name: orbitalreg
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: orbitalreg-api
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
        target_label: workload
    pipeline_stages:
      - cri: {}
      - json:
          expressions:
            level: level
            component: component
            subcomponent: subcomponent
            request_id: request_id
            err_code: err_code
            project_id: project_id
            repo_id: repo_id
            principal_email: principal_email
            latency_ms: latency_ms
      - labels:
          level:
          component:
          err_code:
Keep request_id, project_id, repo_id, and principal_email as extracted-but-unlabelled values (extract them in the json stage without listing them under labels:). Promoting high-cardinality fields to Loki labels balloons the index; leave them in the log line, where LogQL | json parses them at query time.
Grafana Alloy (the modern replacement for Promtail) uses the same pipeline: loki.process takes the place of pipeline_stages, and loki.source.kubernetes takes the place of kubernetes_sd_configs. The field set is identical.
Bare-metal / Docker-compose hosts: point Promtail at the daemon's JSON-file logs with a docker_sd_configs block keyed off the com.docker.compose.service=api label.
Elasticsearch (Filebeat / Elastic Agent / Logstash)
Filebeat with the container input parses CRI-O / containerd logs and decodes the inner JSON in one step:
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/orbitalreg-api-*.log
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: /var/log/containers/
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
      add_error_key: true
  - rename:
      fields:
        - { from: "msg", to: "message" }
        - { from: "component", to: "service.name" }
      ignore_missing: true
output.elasticsearch:
  hosts: ["https://elastic.example.test:9200"]
  index: "orbitalreg-%{+yyyy.MM.dd}"
The rename processor maps component → service.name so Kibana's ECS-aware service-overview dashboards work out of the box; everything else stays under its OrbitalReg name at the document root (decode_json_fields with target: "" writes the decoded keys there rather than under a prefix).
Logstash users can do the same with a one-stage filter:
filter {
  json { source => "message" target => "ob" remove_field => ["message"] }
  mutate {
    rename => {
      "[ob][msg]" => "message"
      "[ob][component]" => "[service][name]"
    }
  }
  if [ob][latency_ms] { mutate { convert => { "[ob][latency_ms]" => "integer" } } }
}
Index template: pin the high-cardinality fields as keyword (don't let Elasticsearch infer text and waste an analyzer):
PUT _index_template/orbitalreg
{
  "index_patterns": ["orbitalreg-*"],
  "template": {
    "mappings": {
      "properties": {
        "ob.component": { "type": "keyword" },
        "ob.subcomponent": { "type": "keyword" },
        "ob.request_id": { "type": "keyword" },
        "ob.correlation_id": { "type": "keyword" },
        "ob.err_code": { "type": "keyword" },
        "ob.project_id": { "type": "keyword" },
        "ob.repo_id": { "type": "keyword" },
        "ob.artifact_id": { "type": "keyword" },
        "ob.principal_email": { "type": "keyword" },
        "ob.latency_ms": { "type": "long" },
        "ob.bytes_in": { "type": "long" },
        "ob.bytes_out": { "type": "long" }
      }
    }
  }
}
Splunk (Universal Forwarder / HEC / OpenTelemetry Collector)
Splunk's JSON ingestion runs at index time via INDEXED_EXTRACTIONS=json so canonical field names are searchable without a | spath:
inputs.conf (Universal Forwarder, K8s):
[monitor:///var/log/containers/orbitalreg-api-*.log]
disabled = false
sourcetype = orbitalreg:json
index = orbitalreg
props.conf:
[orbitalreg:json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = time
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%9N%Z
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = ([\r\n]+)\{
fields.conf (mark high-cardinality keys as searchable but not auto-tokenized):
[component]
INDEXED = true
[err_code]
INDEXED = true
[request_id]
INDEXED = true
[project_id]
INDEXED = true
[repo_id]
INDEXED = true
[principal_email]
INDEXED = true
HEC alternative: drop the forwarder and ship via an in-cluster OpenTelemetry Collector with the splunk_hec exporter. The same canonical schema arrives through HEC; sourcetype should still be set to orbitalreg:json so the props/fields rules above apply.
Sample queries
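The same operational questions in each query language. Each block is copy-pasteable once you substitute time ranges and IDs. The LogQL selectors assume an app="orbitalreg-api" stream label, which the Promtail recipe above does not set by itself; map the pod's app.kubernetes.io/name label to app in relabel_configs if your collector doesn't already. The Lucene examples use the ob. prefix from the Logstash recipe and the index template; with the Filebeat recipe, which decodes to the document root, drop the prefix. Both Elasticsearch recipes rename component to service.name, so filter on service.name instead of ob.component if you kept that rename.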
Errors in the scan dispatcher (last 1h)
# LogQL
{app="orbitalreg-api", component="scan_dispatcher", level="ERROR"}
  | json
  | line_format "{{.err_code}} {{.msg}}"
# Elasticsearch / Lucene (Discover)
ob.component:"scan_dispatcher" AND ob.level:"ERROR"
# Splunk SPL
index=orbitalreg sourcetype="orbitalreg:json"
  component=scan_dispatcher level=ERROR
| table _time err_code msg
Top error codes by count (last 24h)
# LogQL
sum by (err_code) (
  count_over_time(
    {app="orbitalreg-api", level=~"WARN|ERROR"}
      | json
      | err_code != ""
    [24h]
  )
)
# Elasticsearch / Lucene (Discover)
# Lens / aggregation: terms on ob.err_code, filter ob.level:(WARN OR ERROR)
ob.level:(WARN OR ERROR) AND _exists_:ob.err_code
# Splunk SPL
index=orbitalreg level IN (WARN, ERROR) err_code=*
| stats count by err_code | sort -count
Slowest API requests (P99 over last 1h, by route)
# LogQL
quantile_over_time(0.99,
  {app="orbitalreg-api", component="api"}
    | json
    | unwrap latency_ms
  [1h]) by (subcomponent)
# Elasticsearch / Lucene (Discover)
# Lens: percentile(99) of ob.latency_ms, broken down by ob.subcomponent
ob.component:"api" AND _exists_:ob.latency_ms
# Splunk SPL
index=orbitalreg component=api latency_ms=*
| stats perc99(latency_ms) AS p99 by subcomponent | sort -p99
Who pulled a specific artifact
# LogQL
{app="orbitalreg-api", component="api"}
  | json
  | artifact_id="6f9c2a08-…"
  | line_format "{{.principal_email}} {{.bytes_out}}B"
# Elasticsearch / Lucene (Discover)
ob.artifact_id:"6f9c2a08-…"
# Splunk SPL
index=orbitalreg artifact_id="6f9c2a08-…"
| table _time principal_email bytes_out request_id
All retention deletions for a project
# LogQL
{app="orbitalreg-api", component="retention_sweeper"}
  | json
  | project_id="0a9d40b8-…"
# Elasticsearch / Lucene (Discover)
ob.component:"retention_sweeper" AND ob.project_id:"0a9d40b8-…"
# Splunk SPL
index=orbitalreg component=retention_sweeper project_id="0a9d40b8-…"
| table _time repo_id artifact_id msg
OIDC token-exchange failures by issuer
# LogQL
sum by (oidc_issuer) (
  count_over_time(
    {app="orbitalreg-api", component="oidc_exchange", level="ERROR"}
      | json
    [24h]
  )
)
# Elasticsearch / Lucene (Discover)
ob.component:"oidc_exchange" AND ob.level:"ERROR"
# Aggregate: terms on ob.oidc_issuer
# Splunk SPL
index=orbitalreg component=oidc_exchange level=ERROR
| stats count by oidc_issuer err_code | sort -count
Geo-sync apply errors (receive side)
# LogQL
{app="orbitalreg-api", component="geo_sync"}
  | json
  | level="ERROR"
  | line_format "{{.err_code}} peer={{.peer_id}} kind={{.event_kind}}"
# Elasticsearch / Lucene (Discover)
ob.component:"geo_sync" AND ob.level:"ERROR"
# Splunk SPL
index=orbitalreg component=geo_sync level=ERROR
| stats count by err_code peer_id event_kind | sort -count
Trace one request end-to-end (handler → workers)
Every event triggered by an inbound API call carries the same request_id. Search by it across components to see the full hand-off chain — handler, scan submission, notification fan-out, geo-sync enqueue.
# LogQL
{app="orbitalreg-api"} | json | request_id="rq-7a13f0"
# Elasticsearch / Lucene (Discover)
ob.request_id:"rq-7a13f0"
# Splunk SPL
index=orbitalreg request_id="rq-7a13f0" | sort _time
Tuning notes
- Don't promote request_id / project_id / repo_id to Loki labels. Each unique value is a new label-set, and they will blow past Loki's max_streams_per_user within a day on a busy registry. Extract them in the json stage and filter at query time.
- Drop debug events at ingest if you keep ORBITALREG_LOG_LEVEL=debug on for a soak. Promtail's match stage with action: drop and Filebeat's drop_event.when.equals.level: DEBUG both do this without round-tripping the bytes.
- Time field: Loki and Splunk both pick up time automatically with the recipes above; Filebeat needs decode_json_fields plus a timestamp processor to honour it (otherwise the Filebeat ingest time wins). On a shipper outage this matters: without it, the back-fill arrives stamped with the wrong hour.
- Air-gapped installs (see Air-gapped operations) do not expose any egress channel for logs themselves. These recipes describe how an in-cluster collector ingests the JSON, and the in-cluster collector is the customer's responsibility (Loki, Elastic, Splunk all ship K8s-native deployments). OrbitalReg never POSTs its own logs to a vendor endpoint.