Structured logs

OrbitalReg emits one JSON log line per event from every subsystem (API handlers, scan dispatcher, retention sweeper, backup mirror, replication worker, …). The schema is the contract your SIEM ingestion (Splunk / Elastic / Loki / Grafana / Datadog) maps onto — keeping it stable across releases is treated as a public-API guarantee.

Phase A of PRODUCT-ROADMAP item 63 introduced the canonical field names and the internal/applog helper package, and migrated the scan dispatcher as the pilot subsystem. Phase B extended the migration to four further emitters — retention sweeper, backup-mirror dual-write + recovery worker, cross-instance Geo-Sync push/pull/apply workers, and the OIDC token-exchange handler — so every long-running background loop and every credential-mint hot path now writes the canonical schema. Phase C lands the end-to-end SIEM-ingestion recipes for Loki, Elasticsearch, and Splunk plus a shared sample-query catalogue — see SIEM setup below.

Output format

JSON, one event per line, written to stdout.
Timestamp under the stdlib slog default key time (RFC 3339 with fractional seconds, up to nanosecond precision).
Severity under level (DEBUG / INFO / WARN / ERROR).
Message under msg.
Level threshold is configurable per-process via ORBITALREG_LOG_LEVEL=debug|info|warn|error (default info).
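As a minimal sketch of how such an output could be wired up with the standard library alone: the handler writes JSON to stdout and takes its threshold from ORBITALREG_LOG_LEVEL. The parseLevel helper and its fallback to INFO for unknown values are assumptions for illustration, not OrbitalReg's actual boot code.

```go
package main

import (
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps an ORBITALREG_LOG_LEVEL value onto a slog.Level.
// Assumption: empty or unknown values fall back to INFO, the documented default.
func parseLevel(s string) slog.Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

func main() {
	// The stdlib JSON handler sets the time / level / msg keys itself.
	h := slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: parseLevel(os.Getenv("ORBITALREG_LOG_LEVEL")),
	})
	logger := slog.New(h)

	logger.Info("logger ready", slog.String("component", "api"))
	// Emitted only when ORBITALREG_LOG_LEVEL=debug.
	logger.Debug("verbose detail", slog.String("component", "api"))
}
```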

Canonical fields

Key	Type	When emitted
`component`	string	always — names the subsystem (`api`, `scan_dispatcher`, `retention_sweeper`, `backup_mirror`, `replication`, `geo_sync`, `notify`, `imports`, `oidc_exchange`, `maintenance`, `cnpg_mirror`, `seed`)
`subcomponent`	string	optional — handler name for API events (`admin.security`, `webhooks`, …)
`request_id`	string	every API event — chi's request-id middleware value
`correlation_id`	string	optional — copied from the `X-Correlation-Id` request header
`principal_kind`	string	when an authenticated identity is on the context: `user` | `service_account` | `anonymous`
`principal_id`	string	UUID of the user or service account
`principal_email`	string	for human users only
`project_id`	string	when the event scopes to a project
`repo_id`	string	when the event scopes to a repository
`artifact_id`	string	when the event scopes to a single artifact
`latency_ms`	int64	duration of the operation (handler, scan, replication tick, …)
`bytes_in`	int64	request body / pulled bytes
`bytes_out`	int64	response body / pushed bytes
`err`	string	on failure — error message
`err_code`	string	on failure — stable `snake_case` discriminator (e.g. `pg_query_failed`, `scan_timeout`, `s3_get_failed`)

The level / time / msg keys are set by the stdlib slog JSON handler and are not in OrbitalReg's control — treat them as part of the schema.

Example event

json

{
  "time": "2026-05-04T14:32:18.412304Z",
  "level": "WARN",
  "msg": "claim query failed",
  "component": "scan_dispatcher",
  "err": "context deadline exceeded",
  "err_code": "pg_query_failed"
}

json

{
  "time": "2026-05-04T14:32:19.005111Z",
  "level": "INFO",
  "msg": "served",
  "component": "api",
  "subcomponent": "admin.security",
  "request_id": "rq-7a13f0",
  "principal_kind": "user",
  "principal_id": "5f6c2a08-…",
  "principal_email": "alice@example.test",
  "project_id": "0a9d40b8-…",
  "latency_ms": 47,
  "bytes_out": 13128
}
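A consumer written in Go can decode events like the two above with a plain struct. The sketch below covers only a subset of the canonical fields; the Event type and decodeEvent function are illustrative names, not part of OrbitalReg. Fields absent from a line stay at their zero value, and keys the struct does not list are ignored by encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Event mirrors a subset of the canonical schema (illustrative type).
type Event struct {
	Time      time.Time `json:"time"`
	Level     string    `json:"level"`
	Msg       string    `json:"msg"`
	Component string    `json:"component"`
	RequestID string    `json:"request_id"`
	ProjectID string    `json:"project_id"`
	LatencyMS int64     `json:"latency_ms"`
	BytesOut  int64     `json:"bytes_out"`
	Err       string    `json:"err"`
	ErrCode   string    `json:"err_code"`
}

// decodeEvent parses one JSON log line. time.Time accepts RFC 3339
// timestamps with any number of fractional-second digits.
func decodeEvent(line string) (Event, error) {
	var ev Event
	err := json.Unmarshal([]byte(line), &ev)
	return ev, err
}

func main() {
	line := `{"time":"2026-05-04T14:32:18.412304Z","level":"WARN",` +
		`"msg":"claim query failed","component":"scan_dispatcher",` +
		`"err":"context deadline exceeded","err_code":"pg_query_failed"}`

	ev, err := decodeEvent(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Component, ev.ErrCode, ev.Time.UTC().Format(time.RFC3339Nano))
}
```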

Helpers (Go)

Workers and handlers build their *slog.Logger through the helpers in api/internal/applog, so a field-name typo (req_id for request_id) is a compile error rather than a silent miss in a SIEM query:

// Worker — pre-tags every line with `component=scan_dispatcher`.
logger := applog.WorkerLogger(applog.ComponentScanDispatcher)
logger.Warn("claim query failed",
    applog.Err(err),
    applog.ErrCode("pg_query_failed"))

// HTTP handler — pre-tags with `component=api` + `request_id`.
log := applog.RequestLogger(r)
log = applog.WithProject(log, projectID)
log.Info("served",
    applog.Latency(time.Since(t0)),
    applog.BytesOut(int64(n)))

Always use the typed helpers or the key constants (applog.KeyErr, applog.KeyProjectID, …), for example slog.String(applog.KeyProjectID, id) or the slog.Int64 equivalent. Never inline a raw field-name string at a call site.
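To illustrate why typed helpers turn a misspelled key into a compile error, here is a hypothetical, self-contained mirror of the pattern. The constant and function names echo the ones used above, but their bodies are assumptions: the real implementations live in internal/applog and may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
	"time"
)

// Hypothetical key constants; a misspelled constant name fails to compile,
// whereas a misspelled string literal would not.
const (
	KeyComponent = "component"
	KeyErr       = "err"
	KeyErrCode   = "err_code"
	KeyLatencyMS = "latency_ms"
)

// Typed helpers keep the key and the value type together.
func Err(err error) slog.Attr       { return slog.String(KeyErr, err.Error()) }
func ErrCode(code string) slog.Attr { return slog.String(KeyErrCode, code) }
func Latency(d time.Duration) slog.Attr {
	return slog.Int64(KeyLatencyMS, d.Milliseconds())
}

// render logs one WARN event through a component-tagged JSON logger and
// returns the decoded line so callers can inspect the emitted fields.
func render(component string, attrs ...any) map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil)).
		With(slog.String(KeyComponent, component))
	logger.Warn("claim query failed", attrs...)

	var out map[string]any
	if err := json.Unmarshal(buf.Bytes(), &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	out := render("scan_dispatcher",
		Err(errors.New("context deadline exceeded")),
		ErrCode("pg_query_failed"),
		Latency(47*time.Millisecond))
	fmt.Println(out[KeyComponent], out[KeyErrCode], out[KeyLatencyMS])
}
```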

CI guard

make lint-logs (also a GitHub Actions workflow) refuses any log.Printf / log.Println / log.Fatal* / log.Panic* call inside api/. The stdlib log package emits plain text, which breaks JSON line-ingestion on every SIEM. Always log through log/slog, and always via the canonical helpers.
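How make lint-logs is implemented is not described here. One way such a guard could work is to walk the Go syntax tree and flag the banned selectors, as sketched below. The bannedLogCalls function is hypothetical, and it simplifies by assuming the identifier log refers to the stdlib package (no handling of import aliases or shadowing).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// bannedLogCalls returns one "line: log.Func" entry per banned call in src.
func bannedLogCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}

	var hits []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if !ok || pkg.Name != "log" {
			return true
		}
		name := sel.Sel.Name
		if name == "Printf" || name == "Println" ||
			strings.HasPrefix(name, "Fatal") || strings.HasPrefix(name, "Panic") {
			line := fset.Position(call.Pos()).Line
			hits = append(hits, fmt.Sprintf("%d: log.%s", line, name))
		}
		return true
	})
	return hits, nil
}

func main() {
	src := "package x\n\nimport \"log\"\n\nfunc f() {\n" +
		"\tlog.Printf(\"boom %d\", 1)\n\tlog.Fatalf(\"dead\")\n}\n"

	hits, err := bannedLogCalls(src)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```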

Per-component coverage

The migration walks the codebase one subsystem at a time so each batch is reviewable in isolation. The table below tracks which emitters now publish through internal/applog. A subsystem listed as shipped is guaranteed to tag component=<name> on every line it emits.

Component name	Source path	Phase
`api`	`internal/applog.RequestLogger`	A
`scan_dispatcher`	`internal/scan/dispatcher.go`	A
`retention_sweeper`	`internal/retention/sweeper.go`	B
`backup_mirror`	`internal/storage/backup.go`, `recovery.go`	B
`geo_sync`	`internal/geosync/{geosync,apply,pull}.go`	B
`oidc_exchange`	`internal/handlers/oidc_exchange.go`	B
`notify`	`internal/notify/`	C
`imports`	`internal/jfrog/`, `internal/imports/*`	C
`cnpg_mirror`	`internal/cnpg/`	C
`maintenance`	`internal/maintenance/`	C
`seed`	`internal/seed/`	C

A replication constant is reserved in internal/applog for a future low-level pgx-replication-stream worker; the cross-instance geo-sync workers tag as geo_sync rather than replication because they replicate orchestration events (project / repo / artifact / security-block upserts) rather than raw row deltas.

PII redaction (opt-in)

When a deployment lands in a jurisdiction whose data-protection regime forbids storing identifying values in operational logs (GDPR / DSGVO is the recurring driver in DACH customers), set ORBITALREG_LOG_REDACT_FIELDS to the comma-separated list of dotted field paths that should be rewritten to [REDACTED] before the JSON encoder serialises the event. The redaction layer wraps the slog handler at boot so every emitter — request logger, worker logger, the bare slog.Default() callers — picks up the same scrub table without per-callsite plumbing. The default-off path is a thin pass-through, so production deployments that don't opt in pay zero per-record cost.

bash

# Standard DACH-customer profile: scrub the email + last-IP fields.
ORBITALREG_LOG_REDACT_FIELDS=principal_email,attrs.email,attrs.ip

Path syntax:

Top-level field: principal_email, repo_id, err — matches any attribute emitted directly on the record (whether from logger.Info("…", slog.String("principal_email", v)) or from a preamble logger.With(slog.String("principal_email", v))).
Nested attribute: attrs.email, request.headers.authorization — matches the leaf key inside one or more slog.Group(...) / Logger.WithGroup(...) levels. The leaf key keeps its name (the value flips to [REDACTED]) so SIEM dashboards that pivot on field presence keep working.
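The path matching described above can be sketched with the standard library's HandlerOptions.ReplaceAttr hook, which receives the list of open groups alongside each attribute. This is a substitute technique chosen for brevity: the text describes OrbitalReg's layer as a wrapper around the handler, and the redactor and logOnce names below are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
	"strings"
)

// redactor builds a ReplaceAttr function from a comma-separated list of
// dotted paths such as "principal_email,attrs.email".
func redactor(spec string) func([]string, slog.Attr) slog.Attr {
	paths := map[string]bool{}
	for _, p := range strings.Split(spec, ",") {
		if p = strings.TrimSpace(p); p != "" {
			paths[p] = true
		}
	}
	return func(groups []string, a slog.Attr) slog.Attr {
		// Copy groups before appending: slog forbids modifying the slice.
		full := strings.Join(append(append([]string{}, groups...), a.Key), ".")
		if paths[full] {
			// Keep the leaf key, replace only the value.
			return slog.String(a.Key, "[REDACTED]")
		}
		return a
	}
}

// logOnce emits one sample event through a redacting JSON handler and
// returns the decoded line.
func logOnce(spec string) map[string]any {
	var buf bytes.Buffer
	h := slog.NewJSONHandler(&buf, &slog.HandlerOptions{ReplaceAttr: redactor(spec)})
	slog.New(h).Info("served",
		slog.String("principal_email", "alice@example.test"),
		slog.String("principal_id", "5f6c2a08"),
		slog.Group("attrs",
			slog.String("email", "alice@example.test"),
			slog.String("ip", "203.0.113.7")))

	var out map[string]any
	if err := json.Unmarshal(buf.Bytes(), &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	out := logOnce("principal_email,attrs.email")
	fmt.Println(out["principal_email"], out["principal_id"], out["attrs"])
}
```

Because the key survives and only the value changes, a dashboard that checks for the presence of principal_email keeps working on redacted events.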

The redaction set is checked at boot. If the env var is non-empty, a single confirmation line lands on the SIEM at startup so the operator can verify the opt-in took effect:

json

{"time":"2026-05-05T08:00:01Z","level":"INFO",
 "component":"api","msg":"log redaction enabled",
 "redact_fields":["attrs.email","attrs.ip","principal_email"]}

When you need redaction

Customer profile	Recommendation
DACH / EU	Enable. GDPR / DSGVO require minimisation in operational logs.
US / global	Optional. Many customers prefer the un-redacted view for triage.
Air-gapped	Skip. Air-gap installs don't ship logs off-cluster, so the SIEM retention-window risk that drives the scrub doesn't apply.

What stays in the logs

Even with the recommended set above enabled, these fields remain visible because they're not personally identifying — they're operational keys SREs need to correlate during incidents:

request_id / correlation_id / trace_id / span_id: opaque per-request identifiers, no PII.
principal_id: a user UUID, not the email; SIEM dashboards can still group by user without exposing the address.
project_id / repo_id / artifact_id: internal UUIDs.

If a deployment must scrub these too (e.g. multi-tenant fan-out into a shared log lake), add them to the env-var list. The redaction layer treats every canonical schema field uniformly, so any field name in the Canonical fields table is a valid target.

SIEM setup

Every event is one shape-stable JSON object on stdout. Whatever collects container logs in your environment (Promtail, Filebeat, Elastic Agent, the Splunk Universal Forwarder, Vector, Fluent Bit, the Datadog Agent, …) is the right tool — there is no separate log-shipping daemon inside the OrbitalReg image. Pick the recipe below that matches your stack; the Sample queries section answers the same operator questions in each query language.

The OrbitalReg container labels itself with app.kubernetes.io/name=orbitalreg-api (and …/component={api,scan-dispatcher,maintenance,…} on workers running in their own deployments). Recipes below key off that label so you can ingest the whole platform with one collector config.

Loki (Grafana Agent / Promtail / Alloy)

Loki ingests JSON natively — the json pipeline stage promotes the canonical field names to label-able / |=-filterable values without requiring a structured parser per field.

Promtail (promtail.yaml, in-cluster scrape config):

yaml

scrape_configs:
  - job_name: orbitalreg
    kubernetes_sd_configs:
      - role: pod
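    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: orbitalreg-api
        action: keep
      # Expose the pod label as `app`; the sample queries select on it.
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        target_label: app
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
        target_label: workload
      # Tell Promtail which files to tail for each discovered pod.
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log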
    pipeline_stages:
      - cri: {}
      - json:
          expressions:
            level: level
            component: component
            subcomponent: subcomponent
            request_id: request_id
            err_code: err_code
            project_id: project_id
            repo_id: repo_id
            principal_email: principal_email
            latency_ms: latency_ms
      - labels:
          level:
          component:
          err_code:

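Do not promote request_id, project_id, repo_id, or principal_email to labels: extract them in the json stage if later stages need them, but leave them out of labels:. Promoting high-cardinality fields to Loki labels balloons the index. Values that are extracted but not promoted are visible only to the remaining pipeline stages and are not indexed; at query time LogQL's | json parser reads them straight from the log line. If your Loki version has structured metadata enabled, Promtail's structured_metadata stage can attach them to each entry instead.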

Grafana Alloy (the successor to Promtail and Grafana Agent) uses the same pipeline: the pipeline_stages block becomes a loki.process component, and kubernetes_sd_configs becomes a discovery.kubernetes component feeding loki.source.kubernetes. The field set is identical.

Bare-metal / Docker-compose hosts: point Promtail at the daemon's JSON-file logs with a docker_sd_configs block keyed off the com.docker.compose.service=api label.

Elasticsearch (Filebeat / Elastic Agent / Logstash)

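Filebeat with the container input parses CRI-O / containerd logs, and the decode_json_fields processor decodes the inner JSON in the same pass:

yaml

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/orbitalreg-api-*.log
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: /var/log/containers/
      # Decode under ob.* so field names match the index template and the
      # sample queries.
      - decode_json_fields:
          fields: ["message"]
          target: "ob"
          add_error_key: true
      # Drop the raw line only when decoding succeeded, so ob.msg can take
      # its place (rename fails if the target field already exists).
      - drop_fields:
          when:
            has_fields: ["ob.msg"]
          fields: ["message"]
      - rename:
          fields:
            - { from: "ob.msg", to: "message" }
          ignore_missing: true
      # Copy rather than rename so ob.component stays queryable.
      - copy_fields:
          fields:
            - { from: "ob.component", to: "service.name" }
          ignore_missing: true
          fail_on_error: false

output.elasticsearch:
  hosts: ["https://elastic.example.test:9200"]
  index: "orbitalreg-%{+yyyy.MM.dd}"

The copy_fields processor maps component → service.name so Kibana's ECS-aware service-overview dashboards work out of the box. Every canonical field stays under its OrbitalReg name in ob.*, the prefix that the index template and the sample queries below use.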

Logstash users can do the same with a one-stage filter:

ruby

filter {
  json { source => "message" target => "ob" remove_field => ["message"] }
  mutate {
    rename => { "[ob][msg]" => "message" }
    # Copy rather than rename so ob.component stays queryable.
    copy   => { "[ob][component]" => "[service][name]" }
  }
  if [ob][latency_ms] { mutate { convert => { "[ob][latency_ms]" => "integer" } } }
}

Index template — pin the high-cardinality fields as keyword (don't let Elasticsearch infer text and waste an analyzer):

json

PUT _index_template/orbitalreg
{
  "index_patterns": ["orbitalreg-*"],
  "template": {
    "mappings": {
      "properties": {
        "ob.component":       { "type": "keyword" },
        "ob.subcomponent":    { "type": "keyword" },
        "ob.request_id":      { "type": "keyword" },
        "ob.correlation_id":  { "type": "keyword" },
        "ob.err_code":        { "type": "keyword" },
        "ob.project_id":      { "type": "keyword" },
        "ob.repo_id":         { "type": "keyword" },
        "ob.artifact_id":     { "type": "keyword" },
        "ob.principal_email": { "type": "keyword" },
        "ob.latency_ms":      { "type": "long"    },
        "ob.bytes_in":        { "type": "long"    },
        "ob.bytes_out":       { "type": "long"    }
      }
    }
  }
}

Splunk (Universal Forwarder / HEC / OpenTelemetry Collector)

Splunk's JSON ingestion runs at index time via INDEXED_EXTRACTIONS=json so canonical field names are searchable without a | spath:

inputs.conf (Universal Forwarder, K8s):

ini

[monitor:///var/log/containers/orbitalreg-api-*.log]
disabled = false
sourcetype = orbitalreg:json
index = orbitalreg

props.conf:

ini

[orbitalreg:json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = time
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%9N%Z
SHOULD_LINEMERGE = false
TRUNCATE = 0
LINE_BREAKER = ([\r\n]+)\{

fields.conf (mark high-cardinality keys as searchable but not auto-tokenized):

ini

[component]
INDEXED = true
[err_code]
INDEXED = true
[request_id]
INDEXED = true
[project_id]
INDEXED = true
[repo_id]
INDEXED = true
[principal_email]
INDEXED = true

HEC alternative: drop the forwarder and ship via an in-cluster OpenTelemetry Collector with the splunkhec exporter. The same canonical schema arrives through HEC; sourcetype should still be set to orbitalreg:json so the props/fields rules above apply.

Sample queries

The same operational questions in each query language. Each block is copy-pasteable — just substitute time ranges and IDs.

Errors in the scan dispatcher (last 1h)

text

# LogQL
{app="orbitalreg-api", component="scan_dispatcher", level="ERROR"}
  | json
  | line_format "{{.err_code}} {{.msg}}"

text

# Elasticsearch / Lucene (Discover)
ob.component:"scan_dispatcher" AND ob.level:"ERROR"

spl

# Splunk SPL
index=orbitalreg sourcetype="orbitalreg:json"
  component=scan_dispatcher level=ERROR
| table _time err_code msg

Top error codes by count (last 24h)

text

sum by (err_code) (
  count_over_time(
    {app="orbitalreg-api", level=~"WARN|ERROR"}
      | json
      | err_code != ""
    [24h]
  )
)

text

# Lens / aggregation: terms on ob.err_code, filter ob.level:(WARN OR ERROR)
ob.level:(WARN OR ERROR) AND _exists_:ob.err_code

spl

index=orbitalreg level IN (WARN, ERROR) err_code=*
| stats count by err_code | sort -count

Slowest API requests (P99 over last 1h, by route)

text

quantile_over_time(0.99,
  {app="orbitalreg-api", component="api"}
    | json
    | unwrap latency_ms
[1h]) by (subcomponent)

text

# Lens: percentile(99) of ob.latency_ms, broken down by ob.subcomponent
ob.component:"api" AND _exists_:ob.latency_ms

spl

index=orbitalreg component=api latency_ms=*
| stats perc99(latency_ms) AS p99 by subcomponent | sort -p99

Who pulled a specific artifact

text

{app="orbitalreg-api", component="api"}
  | json
  | artifact_id="6f9c2a08-…"
  | line_format "{{.principal_email}} {{.bytes_out}}B"

text

ob.artifact_id:"6f9c2a08-…"

spl

index=orbitalreg artifact_id="6f9c2a08-…"
| table _time principal_email bytes_out request_id

All retention deletions for a project

text

{app="orbitalreg-api", component="retention_sweeper"}
  | json
  | project_id="0a9d40b8-…"

text

ob.component:"retention_sweeper" AND ob.project_id:"0a9d40b8-…"

spl

index=orbitalreg component=retention_sweeper project_id="0a9d40b8-…"
| table _time repo_id artifact_id msg

OIDC token-exchange failures by issuer

text

sum by (oidc_issuer) (
  count_over_time(
    {app="orbitalreg-api", component="oidc_exchange", level="ERROR"}
      | json
    [24h]
  )
)

text

# Aggregate: terms on ob.oidc_issuer
ob.component:"oidc_exchange" AND ob.level:"ERROR"

spl

index=orbitalreg component=oidc_exchange level=ERROR
| stats count by oidc_issuer err_code | sort -count

Geo-sync apply errors (receive side)

text

{app="orbitalreg-api", component="geo_sync"}
  | json
  | level="ERROR"
  | line_format "{{.err_code}} peer={{.peer_id}} kind={{.event_kind}}"

text

ob.component:"geo_sync" AND ob.level:"ERROR"

spl

index=orbitalreg component=geo_sync level=ERROR
| stats count by err_code peer_id event_kind | sort -count

Trace one request end-to-end (handler → workers)

Every event triggered by an inbound API call carries the same request_id. Search by it across components to see the full hand-off chain — handler, scan submission, notification fan-out, geo-sync enqueue.

text

{app="orbitalreg-api"} | json | request_id="rq-7a13f0"

text

ob.request_id:"rq-7a13f0"

spl

index=orbitalreg request_id="rq-7a13f0" | sort _time
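Where no SIEM is at hand (for example, when reading a saved log file during triage), the same correlation can be done locally. The sketch below is a hypothetical helper, not an OrbitalReg tool: it scans JSON lines, keeps those whose request_id matches, and orders them by the time field. It relies on bufio.Scanner's default line-length limit, which is an assumption about event size.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"
	"time"
)

// traceEvent holds only the fields needed to follow a hand-off chain.
type traceEvent struct {
	Time      time.Time `json:"time"`
	Component string    `json:"component"`
	Msg       string    `json:"msg"`
	RequestID string    `json:"request_id"`
}

// traceRequest returns "component: msg" for every event carrying requestID,
// ordered by timestamp. Lines that are not valid JSON are skipped.
func traceRequest(r io.Reader, requestID string) []string {
	var hits []traceEvent
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		var ev traceEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil || ev.RequestID != requestID {
			continue
		}
		hits = append(hits, ev)
	}
	// Compare parsed times: RFC 3339 strings with a varying number of
	// fractional digits do not sort correctly as plain text.
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Time.Before(hits[j].Time) })

	out := make([]string, 0, len(hits))
	for _, ev := range hits {
		out = append(out, ev.Component+": "+ev.Msg)
	}
	return out
}

// traceString is a convenience wrapper for in-memory input.
func traceString(logs, requestID string) []string {
	return traceRequest(strings.NewReader(logs), requestID)
}

func main() {
	logs := `{"time":"2026-05-04T14:32:19.5Z","component":"notify","msg":"fan-out queued","request_id":"rq-7a13f0"}
{"time":"2026-05-04T14:32:19.005111Z","component":"api","msg":"served","request_id":"rq-7a13f0"}
{"time":"2026-05-04T14:32:19.1Z","component":"api","msg":"served","request_id":"rq-000000"}`

	for _, line := range traceString(logs, "rq-7a13f0") {
		fmt.Println(line)
	}
}
```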

Tuning notes

Don't promote request_id / project_id / repo_id to Loki labels. Each unique label value creates a new stream, and on a busy registry these fields can blow past Loki's max_streams_per_user within a day. Filter on them at query time with | json instead.
Drop debug events at ingest if you keep ORBITALREG_LOG_LEVEL=debug on for a soak: Promtail's match stage with action: drop and Filebeat's drop_event processor (with a when.equals condition on the decoded level field) both discard them at the collector, before the bytes are shipped.
Time field: Loki and Splunk both pick up time automatically with the recipes above; Filebeat needs decode_json_fields plus a timestamp processor to honour it (otherwise the Filebeat ingest time wins). On a shipper outage this matters — without it, the back-fill arrives stamped with the wrong hour.
Air-gapped installs (see Air-gapped operations) do not expose any egress channel for logs themselves — these recipes describe how an in-cluster collector ingests the JSON, and the in-cluster collector is the customer's responsibility (Loki, Elastic, Splunk all ship K8s-native deployments). OrbitalReg never POSTs its own logs to a vendor endpoint.
