15 Commits

Author SHA1 Message Date
unkinben 0199a422a0 Merge branch 'main' into fix-kanidm-replication-config
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
2026-05-30 23:33:36 +10:00
unkinben f11ec1056d fix(kanidm): remove invalid automatic_refresh from replication config (#179)
Reviewed-on: #179
2026-05-30 23:20:48 +10:00
unkinben 1b4b22cad8 fix(kanidm): remove invalid automatic_refresh from replication config
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
2026-05-30 23:15:41 +10:00
unkinben ed7feaf19a Update apps/base/kanidm/vaultauth.yaml (#177)
Fix the VaultAuth object

Reviewed-on: #177
2026-05-30 23:11:38 +10:00
unkinben 4d594fbde7 feat(kanidm): vault-managed replication certs with auto-restart (#176)
- Store per-pod replication certs in Vault (kv/kubernetes/namespace/kanidm/default/repl-certs)
- VaultAuth + VaultStaticSecret sync certs to kanidm-repl-certs Secret
- busybox config-init init container injects peer certs from Secret into server.toml at startup
- Remove hardcoded partner_cert entries from per-pod server.toml templates
- Add automatic_refresh = true to all replication configs
- Add reloader.stakater.com/auto annotation to trigger rolling restart on ConfigMap/Secret changes
- Document domain UUID mismatch resolution and cert rotation in README

Reviewed-on: #176
2026-05-30 23:00:46 +10:00
unkinben 1b781e0885 feat(woodpecker): set workflow pod priority class to power (#175)
## Summary
Sets `WOODPECKER_BACKEND_K8S_PRIORITY_CLASS: power` on the Woodpecker agent so all CI pipeline pods are scheduled with the `power` PriorityClass (value 100, preemptionPolicy: Never).

This means pipeline pods can be evicted when the cluster is under pressure but won't preempt other workloads.

## Dependency
Requires the `power` PriorityClass to exist on the cluster — deploy PR #174 (priority-classes app) first.

## Test plan
- Trigger a pipeline run and confirm pods are created with `priorityClassName: power`
- `kubectl get pod -n woodpecker -o jsonpath='{.items[*].spec.priorityClassName}'`

Reviewed-on: #175
2026-05-26 23:58:57 +10:00
unkinben ede25a3858 feat(platform): add priority-classes app with low/power/medium/high classes (#174)
## Summary
- New `apps/base/priority-classes/` app with four `PriorityClass` objects managed via the `platform` ArgoCD project
- Adds `apps/overlays/*/priority-classes` to the platform ApplicationSet generator
- Adds `priority-classes` namespace to platform AppProject destinations (required even for cluster-scoped resources)

| Class | Value | PreemptionPolicy | Intent |
|---|---|---|---|
| `low` | 100 | Never | Background work; evictable, won't preempt others |
| `power` | 100 | Never | Compute-heavy but expendable (e.g. AI/ML workloads) |
| `medium` | 10000 | PreemptLowerPriority | Standard services |
| `high` | 100000 | PreemptLowerPriority | Critical services; preempts lower-priority pods |

`PriorityClass` is already in the platform project's `clusterResourceWhitelist` so no project policy changes were needed.

## Test plan
- ArgoCD syncs `platform-priority-classes` successfully
- `kubectl get priorityclasses low power medium high` shows all four classes

Reviewed-on: #174
2026-05-26 23:41:54 +10:00
unkinben f5f713fe86 feat(artifactapi): add open-webui/open-webui to ghcr immutable patterns (#173)
Part of #155 (prerequisite for open-webui deployment PR #172).

## Summary
- Adds `^open-webui/open-webui` to the `ghcr` remote's `immutable_patterns` in `remote-docker.yaml` so version-pinned open-webui image pulls are cached indefinitely through artifactapi

## Test plan
- artifactapi serves `ghcr.io/open-webui/open-webui:<version>` with `X-Artifact-Source: cache` on second fetch

Reviewed-on: #173
2026-05-26 23:28:27 +10:00
unkinben 3990fbfe06 feat(vault): switch to Kubernetes service registration (#171)
Replaces Consul service registration with the native Kubernetes provider so Vault labels its own pods with active/standby/perf-standby status without requiring a Consul dependency.

## Changes
- `values.yaml`: swap `service_registration "consul"` for `service_registration "kubernetes" {}`, add `VAULT_K8S_NAMESPACE` and `VAULT_K8S_POD_NAME` env vars via downward API
- `role_k8s-service-registration.yaml`: Role + RoleBinding granting the `vault` service account `get`/`update`/`patch` on pods
- `kustomization.yaml`: include new RBAC file

Reviewed-on: #171
2026-05-26 00:06:56 +10:00
unkinben d358098fff chore: update replication certs (#170)
- add replication certs for kanidm-0, kanidm-1 and kanidm-2

Reviewed-on: #170
2026-05-25 23:52:06 +10:00
unkinben 201e601737 feat: update kanidm replicaiton (#169)
- split to per-server configs
- remove init containers that attempted to automate the replication config
- add README.md

Reviewed-on: #169
2026-05-25 23:25:48 +10:00
unkinben d230d87ec9 feat(artifactapi): add conftest to GitHub generic remote cache (#168)
## Summary

- Adds `open-policy-agent/conftest/.*/conftest_.*_Linux_x86_64.tar.gz$` to the `github` remote immutable patterns in artifactapi

## Why

conftest v0.68.2 (https://github.com/open-policy-agent/conftest/releases/tag/v0.68.2) is now used for OPA policy checks in CI (see #167). Caching the release tarball in artifactapi reduces external dependency on GitHub during builds.

Reviewed-on: #168
2026-05-25 22:44:57 +10:00
unkinben 6497dab25e fix(puppet): remove explicit clusterIP: null from puppetdb Service (#166)
## Summary

- Removes `clusterIP: null` from the `puppetdb` Service spec

## Why

Setting `clusterIP: null` makes ArgoCD's desired state explicit about the field being null. Kubernetes assigns a real IP on creation and the field is immutable afterward. The null vs assigned-IP mismatch causes permanent OutOfSync on the puppetdb Service. Removing the field means ArgoCD no longer claims ownership of `clusterIP`, so the API server's value is authoritative.

Reviewed-on: #166
2026-05-25 22:44:24 +10:00
unkinben f403c6b05d fix(kanidm): add explicit group/kind/weight to TLSRoute refs (#165)
## Summary

- Adds `group: gateway.networking.k8s.io` and `kind: Gateway` to `parentRefs`
- Adds `group: ""`, `kind: Service`, and `weight: 1` to `backendRefs`

## Why

The Gateway API controller defaults these fields when creating/updating TLSRoute objects, so the live state always has them. ArgoCD diffs desired vs live by string comparison, causing the `kanidm` TLSRoute to show permanent OutOfSync. Same root cause as #162 (HTTPRoutes).

Reviewed-on: #165
2026-05-25 22:43:52 +10:00
unkinben ac8b8212bd fix(consul): normalize cpu limit to canonical string form (#164)
## Summary

- Changes `server.resources.limits.cpu` from `1000m` to `"1"` in consul Helm values

## Why

`1000m` (1000 milliCPU) is equivalent to `1` CPU, but Kubernetes normalizes the value to `"1"` when storing. ArgoCD diffs desired vs live by string comparison, so the mismatch causes a permanent OutOfSync on the `consul-server` StatefulSet. Same root cause as #163.

Reviewed-on: #164
2026-05-25 22:43:35 +10:00
23 changed files with 269 additions and 89 deletions
@@ -6,6 +6,7 @@ remotes:
immutable_patterns:
- "^cloudnative-pg/cloudnative-pg"
- "^emberstack/helm-charts"
- "^open-webui/open-webui"
- "^openvoxproject/"
- "^stakater/reloader"
- "^stalwartlabs/stalwart"
@@ -36,6 +36,7 @@ remotes:
- "neovim/neovim/.*/nvim-linux-x86_64.tar.gz$"
- "nzbgetcom/nzbget/.*/nzbget-.*.x86_64.rpm$"
- "onedr0p/exportarr/.*/exportarr_.*_linux_amd64.tar.gz$"
- "open-policy-agent/conftest/.*/conftest_.*_Linux_x86_64.tar.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-consul_linux_amd64_.*.tar.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-nomad_linux_amd64_.*.tar.gz$"
- "prometheus-community/bind_exporter/.*/bind_exporter-.*.linux-amd64.tar.gz$"
+51
View File
@@ -0,0 +1,51 @@
# kanidm
Three-replica kanidm identity server with Vault-managed replication certificates.
## Architecture
- Per-pod `server-N.toml` in `resources/` — each has its own replication origin hardcoded
- `config-init` busybox init container copies the right config and injects peer certs from the
vault-synced `kanidm-repl-certs` Secret at pod startup
- `reloader.stakater.com/auto: "true"` triggers a rolling restart when the ConfigMap or Secret changes
- Vault path: `kv/kubernetes/namespace/kanidm/default/repl-certs`
- Keys: `kanidm-0`, `kanidm-1`, `kanidm-2` — each holds that pod's replication certificate
## Initial setup
After the first pod starts, generate the admin credentials:
```bash
kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml admin
kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml idm_admin
```
## Replication certificate rotation
When certs need to be renewed, update vault and reloader will roll the StatefulSet:
```bash
# Get new cert from a pod
kubectl exec -it -n kanidm kanidm-N -- /sbin/kanidmd renew-replication-certificate -c /config/server.toml
# Write updated cert to vault (reloader triggers restart automatically)
vault kv patch kv/kubernetes/namespace/kanidm/default/repl-certs "kanidm-N=<cert>"
```
## Resolving domain UUID mismatch
If pods initialized independently (each with a different domain UUID), replication will fail with
`Consumer Domain UUID does not match`. Fix by resetting kanidm-1 and kanidm-2 to sync from
kanidm-0 (the authoritative node):
```bash
# Scale down to avoid split-brain during reset
kubectl scale statefulset -n kanidm kanidm --replicas=1
# Delete the stale PVCs for the replica pods
kubectl delete pvc -n kanidm data-kanidm-1 data-kanidm-2
# Scale back up — replicas start with empty DBs and automatic_refresh=true
# will trigger a full sync from kanidm-0 once TLS peer certs are verified
kubectl scale statefulset -n kanidm kanidm --replicas=3
```
-40
View File
@@ -1,40 +0,0 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kanidm-config
namespace: kanidm
labels:
app.kubernetes.io/name: kanidm
app.kubernetes.io/instance: kanidm
data:
server.toml: |
version = "2"
domain = "auth.unkin.net"
origin = "https://auth.unkin.net"
bindaddress = "[::]:8443"
db_path = "/data/kanidm.db"
db_arc_size = 2048
tls_chain = "/data/tls/tls.crt"
tls_key = "/data/tls/tls.key"
log_level = "info"
[online_backup]
path = "/data/backups/"
schedule = "0 22 * * *"
versions = 7
[replication]
origin = "__REPL_ORIGIN__"
bindaddress = "[::]:8444"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kanidm-repl-certs
namespace: kanidm
labels:
app.kubernetes.io/name: kanidm
app.kubernetes.io/instance: kanidm
data: {}
+15 -2
View File
@@ -5,12 +5,25 @@ kind: Kustomization
resources:
- namespace.yaml
- serviceaccount.yaml
- rbac.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
- certificate.yaml
- configmap.yaml
- service.yaml
- statefulset.yaml
- poddisruptionbudget.yaml
- gateway.yaml
- httproute.yaml
- tlsroute.yaml
configMapGenerator:
- name: kanidm-config
namespace: kanidm
options:
disableNameSuffixHash: true
labels:
app.kubernetes.io/name: kanidm
app.kubernetes.io/instance: kanidm
files:
- server-0.toml=resources/server-0.toml
- server-1.toml=resources/server-1.toml
- server-2.toml=resources/server-2.toml
+19
View File
@@ -0,0 +1,19 @@
version = "2"
domain = "auth.unkin.net"
origin = "https://auth.unkin.net"
bindaddress = "[::]:8443"
db_path = "/data/kanidm.db"
db_arc_size = 2048
tls_chain = "/data/tls/tls.crt"
tls_key = "/data/tls/tls.key"
log_level = "info"
[online_backup]
path = "/data/backups/"
schedule = "0 22 * * *"
versions = 7
[replication]
origin = "repl://kanidm-0.kanidm-headless.kanidm.svc.cluster.local:8444"
bindaddress = "[::]:8444"
+19
View File
@@ -0,0 +1,19 @@
version = "2"
domain = "auth.unkin.net"
origin = "https://auth.unkin.net"
bindaddress = "[::]:8443"
db_path = "/data/kanidm.db"
db_arc_size = 2048
tls_chain = "/data/tls/tls.crt"
tls_key = "/data/tls/tls.key"
log_level = "info"
[online_backup]
path = "/data/backups/"
schedule = "0 22 * * *"
versions = 7
[replication]
origin = "repl://kanidm-1.kanidm-headless.kanidm.svc.cluster.local:8444"
bindaddress = "[::]:8444"
+19
View File
@@ -0,0 +1,19 @@
version = "2"
domain = "auth.unkin.net"
origin = "https://auth.unkin.net"
bindaddress = "[::]:8443"
db_path = "/data/kanidm.db"
db_arc_size = 2048
tls_chain = "/data/tls/tls.crt"
tls_key = "/data/tls/tls.key"
log_level = "info"
[online_backup]
path = "/data/backups/"
schedule = "0 22 * * *"
versions = 7
[replication]
origin = "repl://kanidm-2.kanidm-headless.kanidm.svc.cluster.local:8444"
bindaddress = "[::]:8444"
+12 -40
View File
@@ -4,6 +4,8 @@ kind: StatefulSet
metadata:
name: kanidm
namespace: kanidm
annotations:
reloader.stakater.com/auto: "true"
labels:
app.kubernetes.io/name: kanidm
app.kubernetes.io/instance: kanidm
@@ -36,23 +38,19 @@ spec:
fsGroup: 1000
initContainers:
- name: config-init
image: kanidm/server:1.10.3
image: busybox:1.36
command: ["/bin/sh", "-c"]
args:
- |
set -e
REPL_ORIGIN="repl://${POD_NAME}.kanidm-headless.kanidm.svc.cluster.local:8444"
sed "s|__REPL_ORIGIN__|${REPL_ORIGIN}|g" /config-template/server.toml > /config/server.toml
cp "/config-template/server-${POD_NAME##*-}.toml" /config/server.toml
for peer in kanidm-0 kanidm-1 kanidm-2; do
if [ "${peer}" = "${POD_NAME}" ]; then
continue
fi
[ "${peer}" = "${POD_NAME}" ] && continue
cert_file="/repl-certs/${peer}"
if [ -s "${cert_file}" ]; then
fqdn="${peer}.kanidm-headless.kanidm.svc.cluster.local"
printf '\n[replication."repl://%s:8444"]\ntype = "mutual-pull"\npartner_cert = "%s"\n' \
"${fqdn}" "$(cat ${cert_file})" >> /config/server.toml
fi
[ -s "${cert_file}" ] || continue
fqdn="${peer}.kanidm-headless.kanidm.svc.cluster.local"
printf '\n[replication."repl://%s:8444"]\ntype = "mutual-pull"\npartner_cert = "%s"\n' \
"${fqdn}" "$(cat ${cert_file})" >> /config/server.toml
done
env:
- name: POD_NAME
@@ -62,6 +60,7 @@ spec:
volumeMounts:
- name: config-template
mountPath: /config-template
readOnly: true
- name: config
mountPath: /config
- name: repl-certs
@@ -70,33 +69,6 @@ spec:
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
- name: repl-cert-publisher
image: bitnami/kubectl:1.33
restartPolicy: Always
command: ["/bin/sh", "-c"]
args:
- |
until kubectl exec "${POD_NAME}" -c kanidm -- /sbin/kanidmd renew-replication-certificate 2>/dev/null | grep -q '^# certificate:'; do
sleep 30
done
while true; do
cert=$(kubectl exec "${POD_NAME}" -c kanidm -- /sbin/kanidmd renew-replication-certificate 2>/dev/null \
| grep '^# certificate:' | sed 's/^# certificate: "\(.*\)"$/\1/')
if [ -n "${cert}" ]; then
kubectl patch configmap kanidm-repl-certs \
--type=merge \
-p "{\"data\":{\"${POD_NAME}\":\"${cert}\"}}"
fi
sleep 3600
done
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
containers:
- name: kanidm
image: kanidm/server:1.10.3
@@ -145,8 +117,8 @@ spec:
- name: config
emptyDir: {}
- name: repl-certs
configMap:
name: kanidm-repl-certs
secret:
secretName: kanidm-repl-certs
- name: tls
secret:
secretName: kanidm-tls
+7 -2
View File
@@ -13,9 +13,14 @@ spec:
- auth.unkin.net
- au.auth.unkin.net
parentRefs:
- name: kanidm
- group: gateway.networking.k8s.io
kind: Gateway
name: kanidm
sectionName: https-passthrough
rules:
- backendRefs:
- name: kanidm
- group: ""
kind: Service
name: kanidm
port: 8443
weight: 1
+18
View File
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
name: default
namespace: kanidm
spec:
allowedNamespaces:
- kanidm
kubernetes:
audiences:
- vault
role: default
serviceAccount: default
tokenExpirationSeconds: 600
method: kubernetes
mount: k8s/au/syd1
vaultConnectionRef: vso-system/default
+20
View File
@@ -0,0 +1,20 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: repl-certs
namespace: kanidm
labels:
app.kubernetes.io/name: kanidm
app.kubernetes.io/instance: kanidm
spec:
vaultAuthRef: default
mount: kv
type: kv-v2
path: kubernetes/namespace/kanidm/default/repl-certs
refreshAfter: 5m
destination:
name: kanidm-repl-certs
create: true
overwrite: true
hmacSecretData: true
@@ -0,0 +1,6 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- priorityclasses.yaml
@@ -0,0 +1,36 @@
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low
value: 100
preemptionPolicy: Never
globalDefault: false
description: "Low-importance workloads. Can be evicted under pressure but will not preempt other pods."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: power
value: 100
preemptionPolicy: Never
globalDefault: false
description: "Compute-heavy workloads with low scheduling importance. Evictable under pressure."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: medium
value: 10000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Standard workloads. Will preempt low-priority pods if the cluster is under pressure."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high
value: 100000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "High-importance services. Will preempt medium- and low-priority pods if necessary."
-1
View File
@@ -9,7 +9,6 @@ metadata:
name: puppetdb
namespace: puppet
spec:
clusterIP: null
ports:
- name: pdb-http
port: 8080
+1
View File
@@ -6,3 +6,4 @@ resources:
- namespace.yaml
- gateway.yaml
- httproute.yaml
- role_k8s-service-registration.yaml
@@ -0,0 +1,24 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: vault-k8s-service-registration
namespace: vault
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: vault-k8s-service-registration
namespace: vault
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: vault-k8s-service-registration
subjects:
- kind: ServiceAccount
name: vault
namespace: vault
+1 -1
View File
@@ -37,7 +37,7 @@ server:
cpu: 100m
limits:
memory: 2Gi
cpu: 1000m
cpu: "1"
client:
enabled: false
@@ -0,0 +1,6 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../../base/priority-classes
+9 -3
View File
@@ -40,9 +40,7 @@ server:
}
}
service_registration "consul" {
address = "consul-server.consul.svc.cluster.local:8500"
}
service_registration "kubernetes" {}
dataStorage:
enabled: true
@@ -50,6 +48,14 @@ server:
storageClass: cephrbd-fast-delete
accessMode: ReadWriteOnce
extraEnv:
- name: VAULT_K8S_NAMESPACE
value: vault
- name: VAULT_K8S_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
statefulSet:
securityContext:
container:
@@ -2,6 +2,7 @@ agent:
replicaCount: 3
env:
WOODPECKER_MAX_WORKFLOWS: "8"
WOODPECKER_BACKEND_K8S_PRIORITY_CLASS: power
WOODPECKER_BACKEND_K8S_STORAGE_CLASS: cephrbd-fast-delete
WOODPECKER_BACKEND_K8S_VOLUME_SIZE: 10G
WOODPECKER_BACKEND_K8S_STORAGE_RWX: false
+1
View File
@@ -22,6 +22,7 @@ spec:
- path: apps/overlays/*/jfrog
- path: apps/overlays/*/kanidm
- path: apps/overlays/*/node-feature-discovery
- path: apps/overlays/*/priority-classes
- path: apps/overlays/*/puppet
- path: apps/overlays/*/purelb
- path: apps/overlays/*/reflector-system
+2
View File
@@ -31,6 +31,8 @@ spec:
server: https://kubernetes.default.svc
- namespace: 'node-feature-discovery'
server: https://kubernetes.default.svc
- namespace: 'priority-classes'
server: https://kubernetes.default.svc
- namespace: 'purelb'
server: https://kubernetes.default.svc
- namespace: 'puppet'