unkinben 4d594fbde7 feat(kanidm): vault-managed replication certs with auto-restart (#176)
- Store per-pod replication certs in Vault (kv/kubernetes/namespace/kanidm/default/repl-certs)
- VaultAuth + VaultStaticSecret sync certs to kanidm-repl-certs Secret
- busybox config-init init container injects peer certs from Secret into server.toml at startup
- Remove hardcoded partner_cert entries from per-pod server.toml templates
- Add automatic_refresh = true to all replication configs
- Add reloader.stakater.com/auto annotation to trigger rolling restart on ConfigMap/Secret changes
- Document domain UUID mismatch resolution and cert rotation in README

Reviewed-on: #176
2026-05-30 23:00:46 +10:00

kanidm

Three-replica kanidm identity server with Vault-managed replication certificates.

Architecture

  • Per-pod server-N.toml in resources/ — each has its own replication origin hardcoded
  • config-init busybox init container copies the right config and injects peer certs from the vault-synced kanidm-repl-certs Secret at pod startup
  • reloader.stakater.com/auto: "true" triggers a rolling restart when the ConfigMap or Secret changes
  • Vault path: kv/kubernetes/namespace/kanidm/default/repl-certs
    • Keys: kanidm-0, kanidm-1, kanidm-2 — each holds that pod's replication certificate
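
The injection step performed by config-init can be pictured roughly as follows. This is a minimal sketch, not the actual init script: the function name, the mount paths in the trailing comments, the repl:// URL pattern with its port, and the mutual-pull type are all assumptions for illustration. Only the kanidm-N key names, partner_cert, and automatic_refresh come from this repository.

```shell
#!/bin/sh
# Hypothetical sketch of the config-init step (POSIX sh, so it also runs under busybox).
# Assumes the Secret is mounted as one file per key: kanidm-0, kanidm-1, kanidm-2.

# Print a replication partner section for every pod except the local one.
# $1 = local pod name (e.g. kanidm-0), $2 = directory holding the cert files
render_peer_sections() {
    self="$1"
    cert_dir="$2"
    for cert_file in "$cert_dir"/kanidm-*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -f "$cert_file" ] || continue
        peer="$(basename "$cert_file")"
        # A pod never lists itself as a replication partner.
        if [ "$peer" = "$self" ]; then
            continue
        fi
        # The repl:// URL and port are placeholders, not the real service names.
        printf '\n[replication."repl://%s.kanidm:8444"]\n' "$peer"
        printf 'type = "mutual-pull"\n'
        printf 'partner_cert = "%s"\n' "$(tr -d '\n' < "$cert_file")"
        printf 'automatic_refresh = true\n'
    done
}

# Example wiring with assumed paths: copy this pod's template, then append the peers.
# cp "/resources/server-${HOSTNAME##*-}.toml" /config/server.toml
# render_peer_sections "$HOSTNAME" /secrets >> /config/server.toml
```

Generating the peer sections at startup, rather than baking them into the templates, is what lets a cert change in Vault take effect with nothing more than a pod restart.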

Initial setup

After the first pod starts, reset the passwords of the built-in admin and idm_admin accounts. Each command prints a newly generated password:

kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml admin
kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml idm_admin

Replication certificate rotation

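When a replication certificate needs to be renewed, write the new certificate to Vault. The VaultStaticSecret syncs it into the kanidm-repl-certs Secret, and Reloader then rolls the StatefulSet:

# Renew and get the new cert from the affected pod (replace N with 0, 1, or 2)
kubectl exec -it -n kanidm kanidm-N -- /sbin/kanidmd renew-replication-certificate -c /config/server.toml

# Write the new cert to Vault under that pod's key (Reloader triggers the restart automatically)
vault kv patch kv/kubernetes/namespace/kanidm/default/repl-certs "kanidm-N=<cert>"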
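
To confirm that a rotation has propagated, something like the following can be run afterwards. It is a hedged sketch: check_rotation is a hypothetical helper name, the 300s timeout is arbitrary, and it assumes the Secret stores each cert under the same key name as in Vault.

```shell
#!/bin/sh
# Hypothetical helper: confirm a rotated cert reached the Secret and the rollout finished.
# $1 = pod whose cert was rotated (e.g. kanidm-1)
check_rotation() {
    pod="$1"
    # Print the first 16 characters of the synced cert for a quick visual comparison.
    kubectl get secret -n kanidm kanidm-repl-certs \
        -o "jsonpath={.data.$pod}" | base64 -d | cut -c1-16
    echo
    # Block until the Reloader-triggered rolling restart has completed.
    kubectl rollout status statefulset/kanidm -n kanidm --timeout=300s
}
```

For example, `check_rotation kanidm-1` after patching the kanidm-1 key. If the printed prefix still matches the old cert, the Secret has not been resynced yet.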

Resolving domain UUID mismatch

If the pods were initialized independently, each has its own domain UUID and replication fails with "Consumer Domain UUID does not match". To fix this, reset kanidm-1 and kanidm-2 so that they resync from kanidm-0 (the authoritative node):

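# Scale down to avoid split-brain during reset
kubectl scale statefulset -n kanidm kanidm --replicas=1

# Wait until the replica pods are gone so their PVCs are no longer in use
kubectl wait --for=delete pod/kanidm-1 pod/kanidm-2 -n kanidm --timeout=120s

# Delete the stale PVCs for the replica pods
kubectl delete pvc -n kanidm data-kanidm-1 data-kanidm-2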

# Scale back up — replicas start with empty DBs and automatic_refresh=true
# will trigger a full sync from kanidm-0 once TLS peer certs are verified
kubectl scale statefulset -n kanidm kanidm --replicas=3
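
Once the replicas are back, a quick log scan can show whether the mismatch is gone. This is a rough sketch: count_mismatched_pods is a hypothetical helper, the five-minute window is an arbitrary choice, and it assumes the error text appears in the pod logs exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count pods whose recent logs still contain the mismatch error.
count_mismatched_pods() {
    bad=0
    for pod in kanidm-0 kanidm-1 kanidm-2; do
        if kubectl logs -n kanidm "$pod" --since=5m 2>/dev/null \
            | grep -q "Consumer Domain UUID does not match"; then
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}
```

A result of 0 a few minutes after the scale-up suggests the replicas accepted the refresh from kanidm-0. A non-zero count means at least one pod is still reporting the mismatch.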