Commit Graph

185 Commits

Author SHA1 Message Date
unkinben 01e73c3a21 Pull bind CRDs from operator repo instead of vendoring
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline failed
References the CRD bundle from the bind-operator repo by a stable raw URL
so the CRDs never drift from the operator, matching how other apps import
upstream manifests.

- replace the nine vendored crds/*.yaml with a single remote resource:
  git.unkin.net/unkin/bind-operator raw config/crd/install.yaml at v0.1.1
- bump the operator image to v0.1.1 so the running operator and its CRDs
  come from the same tag
2026-07-03 18:56:48 +10:00
unkinben c57b115400 Make external-dns tier authoritative (drop dynamic mode)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
The dynamic cluster mode was removed from the operator; RFC2136 update
capability is a per-zone property, not a cluster role. The external-dns
tier is an authoritative cluster whose zones set dynamicUpdate.

- switch binddns-externaldns BindCluster to mode authoritative
- regenerate bindcluster schema (enum: authoritative, resolver)
2026-07-03 18:36:04 +10:00
unkinben d11c2900de Deploy bind-operator and three BIND DNS tiers
Adds the bind-operator and the three BindClusters that replace the
Puppet-managed BIND estate (authoritative / resolver / external-dns).

- add apps/base/bind-system: 9 CRDs, operator Deployment, RBAC (ns bind-system)
- add apps/base/binddns-auth: authoritative BindCluster + catalog zone + TSIG key
- add apps/base/binddns-resolver: recursive-resolver BindCluster with forwarders
- add apps/base/binddns-externaldns: dynamic (RFC2136) BindCluster + TSIG key
- add au-syd1 overlays for all four apps
- register the four apps in the platform ApplicationSet
- add binddns-* namespaces to the platform AppProject destinations
- add schemas/bind.unkin.net/*.json so kubeconform validates the new CRs

DNS Services are LoadBalancer via PureLB. TSIG key material is generated by
the operator into Secrets at runtime (no plain Secrets in git).
2026-07-03 17:48:45 +10:00
unkinben 15225433e9 chore(artifactapi): deploy v3.7.3 (#215)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
## Why

artifactapi images \`v3.7.3\` are built and pushed to the registry, but au-syd1 is still running \`v3.6.5\`. This rolls the deployment forward to pick up the recent fixes.

## Changes

- \`api-deployment\`: \`artifactapi\` \`v3.6.5\` → \`v3.7.3\`
- \`ui-deployment\`: \`artifactapi-ui\` \`v3.6.5\` → \`v3.7.3\`

Included in v3.7.x since v3.6.5:
- Local-repo files now appear in the cached-objects UI (#99).
- Evicting a local RPM prunes its repodata metadata (#100).
- The bare domain redirects to the web UI at /ui (#101).

Reviewed-on: #215
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-07-03 15:14:28 +10:00
unkinben bbb9acba36 feat: add woodpecker service accounts for media terraform repos (#214)
Add Kubernetes ServiceAccounts in the woodpecker namespace for terraform-sonarr, terraform-radarr, and terraform-prowlarr CI pipelines.

Reviewed-on: #214
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 22:04:33 +10:00
benvin 48f32a044d fix: update TLSRoute to v1 (#213)
TLSRoutes are now in standard, no longer experimental

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #213
2026-06-28 17:50:27 +10:00
unkinben 7f1444fb38 Add Authentik identity provider deployment (#211)
## Summary
- Deploy Authentik (identity.unkin.net) via Helm chart 2026.5.3
- CNPG PostgreSQL cluster (3 instances) with separate rw/ro poolers (2 instances each)
- Redis with 5Gi persistent storage
- Gateway API for HTTPS (identity.unkin.net) and LDAPS (ldap.k8s.syd1.au.unkin.net, ldap.main.unkin.net)
- TLSRoute for LDAPS passthrough, HTTPRoute for external-dns record creation
- Vault secrets for postgres credentials, authentik secret key, and S3 storage credentials
- S3 storage via RadosGW (bucket: authentik)
- 3 server replicas, 2 worker replicas
- Woodpecker ServiceAccount for terraform-authentik CI
- Platform applicationset and project updated

## Dependencies
- terraform-git #15 (merged) — repo definition
- terraform-vault #78 (merged) — auth roles and Consul ACL

## Vault secrets needed before deploy
Write to `kv/kubernetes/namespace/authentik/default/`:
- `postgres-credentials`: username + password
- `authentik-credentials`: AUTHENTIK_SECRET_KEY
- `s3-credentials`: S3 access key + secret key

Reviewed-on: #211
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 17:42:49 +10:00
unkinben cfca1e5278 Add age-api deployment (#210)
## Summary
- Deploy age-api to the au-syd1 cluster
- Uses configMapGenerator for people config with jaidi, ben, and sudaporn
- Includes gateway, httproute, service, and deployment
- Image: git.unkin.net/unkin/age-api:v0.1.0

Reviewed-on: #210
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 12:19:38 +10:00
unkinben feaec2c8a9 chore: bump artifactapi + ui to v3.6.5 (#208)
Adds bandwidth saved stat to dashboard.

Reviewed-on: #208
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 22:27:55 +10:00
unkinben d1cc467455 chore: bump artifactapi + ui to v3.6.4 (#207)
Fixes helm chart URL path duplication for same-host repos (stakater).

Reviewed-on: #207
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 08:06:26 +10:00
unkinben 0e9ac4d390 chore: bump artifactapi + ui to v3.6.3 (#206)
Includes Docker Accept header forwarding, Content-Type fix, nginx base path fix, and version endpoint fix.

Reviewed-on: #206
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 07:51:13 +10:00
unkinben 722ced3256 chore: bump artifactapi + ui to v3.6.2 (#205)
Includes Docker Bearer token auth (#60) and UI BASE_PATH build_args fix (#59).

Reviewed-on: #205
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 00:20:27 +10:00
unkinben 92e6f0f13b chore: bump artifactapi + ui to v3.6.1 (#204)
Rebuilds UI with BASE_PATH=/ui so assets serve under /ui/.

Reviewed-on: #204
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 00:03:58 +10:00
unkinben 825c46c91b chore: bump artifactapi + ui to v3.6.0 (#203)
Bumps API and UI images from v3.5.0 to v3.6.0.

Reviewed-on: #203
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:57:52 +10:00
unkinben 2c9c79d8f1 fix: update UI health check paths to /ui (#202)
The UI now serves under /ui (artifactapi#58). Health probes need /ui instead of /.

Reviewed-on: #202
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:57:27 +10:00
unkinben f695657d9d refactor: simplify artifactapi routes (#201)
Route /ui → UI service, everything else → API service.

Replaces the growing list of per-prefix rules (/api, /v2, /health) with a single catch-all to the API. No more needing to add a route rule every time the API adds a new top-level path.

Reviewed-on: #201
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:39:24 +10:00
unkinben 5dee768170 fix: route /v2 and /health to artifactapi API service (#200)
The v3 route migration (#198) split routes into /api → API and / → UI, but /v2/ (Docker Registry V2 API) and /health now hit the UI catch-all instead of the API backend.

This breaks `docker pull artifactapi.k8s.syd1.au.unkin.net/...` with context deadline exceeded.

Adds /v2 and /health prefix rules before the UI catch-all.

Reviewed-on: #200
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:31:47 +10:00
benvin f120f3b426 fix: rename environment2 to environment (#199)
update the environment secret reference to match what has been
 deployed. this prevents a containerconfigerror

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #199
2026-06-26 22:55:24 +10:00
benvin f6d60bd02d feat: artifactapi route change (#198)
complete cutover to artifactapi 3

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #198
2026-06-26 22:50:27 +10:00
benvin aac1b654bb feat: migrate to artifactapi 3+ (#197)
What changed:
- Adds new v3 API and UI deployments (separate api-deployment.yaml, ui-deployment.yaml) alongside the existing monolithic artifactapi-deployment.yaml
- Adds CNPG PostgreSQL cluster + pooler to replace the standalone postgres deployment
- Adds new api-env configmap, new Vault secrets (postgres-credentials, environment), and a second VaultAuth (default1)
- Adds new services targeting the split api and ui selectors
- Adds HPAs for both new deployments
- Updates kustomization to include all new resources

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #197
2026-06-26 22:18:07 +10:00
benvin 1c6e087116 chore: cleanup artifactory3 mess (#196)
attempted to let claude deploy a new version of artifactory with
terrible results. this change is to remove that mess so I can start
again.

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #196
2026-06-21 17:40:17 +10:00
benvin 9e6efb7c78 🤦 (#195)
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #195
2026-06-21 17:30:47 +10:00
benvin cae42b4896 feat: manage postgres-credentials for artifactapi3 (#194)
pull credentials for postgres/cnpg from vault

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #194
2026-06-21 17:26:26 +10:00
benvin 349dc5fd01 chore: remove middleware resource (#193)
there is no crd for this, preventing the deployment of artifactapi 3

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #193
2026-06-21 09:10:49 +10:00
benvin 8cbd645332 feat: deploy artifactapi3 (#192)
just-enough to test terraform deployment and begin migration. have
change to cnpg for the database and a new bucket for storage

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #192
2026-06-20 12:22:22 +10:00
benvin ad2cdd3b63 fix: update woodpecker kustomization (#191)
Reviewed-on: #191
2026-06-17 21:34:02 +10:00
benvin 17782d716c feat: enable terraform-artifactapi jobs (#190)
woodpecker jobs for terraform-artifactapi use the service account of the
same name to run jobs, so that it can access specific secrets

- add terraform-artifactapi serviceaccount

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #190
2026-06-17 21:23:49 +10:00
unkinben 188c39f85d feat: add terraform-git service account for woodpecker CI (#189)
## Summary
- Add ServiceAccount terraform-git in woodpecker namespace for terraform-git CI pipelines
- Add to kustomization.yaml

## Test plan
- [ ] Verify ArgoCD syncs the new service account
- [ ] Verify woodpecker CI can use the service account

Reviewed-on: #189
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-07 20:36:55 +10:00
unkinben 0b7819bda3 chore: bump almalinux9 image tags (#188)
Bump almalinux9 image tags to 20260606

Reviewed-on: #188
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-07 00:35:12 +10:00
benvin 3c6330ebfd benvin/gitea (#187)
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #187
2026-06-06 19:47:16 +10:00
unkinben a3a56d0c2b chore: add almalinux-vault repos (#186)
- 9.7 is end of life, ensure that we can still query packages

Reviewed-on: #186
2026-06-02 23:13:45 +10:00
unkinben 4b1fbe1fe1 feat(kanidm): scale down to single replica, remove replication (#185)
Drop from 3 replicas to 1. Remove init container, repl-certs secret,
replication port, podAntiAffinity, server-1/2 configs, and replication
stanza from server-0.toml. Mount configmap directly via subPath.

Reviewed-on: #185
2026-06-02 22:41:28 +10:00
unkinben 666f3d055c feat: add sessionaffinity to kanidm service (#184)
- required as traefik is in passthrough mode

Reviewed-on: #184
2026-06-02 21:17:51 +10:00
unkinben 3dc8801070 fix(kanidm): fix automatic_refresh TOML generation in init container (#182)
## Summary

- The `\n` escape in a shell variable wasn't interpreted as a newline when passed as a `printf %s` argument
- This caused `automatic_refresh = true` to be appended to the `partner_cert` string value on the same line, breaking TOML parsing on kanidm-2
- Fixed by using separate `printf` calls per peer type, with `\n` in the format string (not a variable) where it is correctly interpreted

## Test plan

- [ ] kanidm-2 init container generates valid TOML with `automatic_refresh = true` on its own line under the kanidm-0 peer section
- [ ] kanidm-1 and kanidm-2 start successfully and auto-refresh domain UUID from kanidm-0

Reviewed-on: #182
2026-05-31 00:25:21 +10:00
unkinben 60f1f3130b fix(kanidm): replicate 1/2 from 0 only with automatic_refresh (#181)
kanidm-0 is the authoritative supplier; kanidm-1 and kanidm-2 pull
from kanidm-0 only. automatic_refresh = true on the kanidm-0 peer
entry for kanidm-1/2 so fresh nodes auto-sync domain UUID on restart.

Reviewed-on: #181
2026-05-31 00:20:30 +10:00
unkinben b6f8cb0633 feat: autorestart statefulset (#180)
- ensure kanidm is restarted with vault secrets

Reviewed-on: #180
2026-05-30 23:40:07 +10:00
unkinben f11ec1056d fix(kanidm): remove invalid automatic_refresh from replication config (#179)
Reviewed-on: #179
2026-05-30 23:20:48 +10:00
unkinben ed7feaf19a Update apps/base/kanidm/vaultauth.yaml (#177)
Fix the VaultAuth object

Reviewed-on: #177
2026-05-30 23:11:38 +10:00
unkinben 4d594fbde7 feat(kanidm): vault-managed replication certs with auto-restart (#176)
- Store per-pod replication certs in Vault (kv/kubernetes/namespace/kanidm/default/repl-certs)
- VaultAuth + VaultStaticSecret sync certs to kanidm-repl-certs Secret
- busybox config-init init container injects peer certs from Secret into server.toml at startup
- Remove hardcoded partner_cert entries from per-pod server.toml templates
- Add automatic_refresh = true to all replication configs
- Add reloader.stakater.com/auto annotation to trigger rolling restart on ConfigMap/Secret changes
- Document domain UUID mismatch resolution and cert rotation in README

Reviewed-on: #176
2026-05-30 23:00:46 +10:00
unkinben 1b781e0885 feat(woodpecker): set workflow pod priority class to power (#175)
## Summary
Sets `WOODPECKER_BACKEND_K8S_PRIORITY_CLASS: power` on the Woodpecker agent so all CI pipeline pods are scheduled with the `power` PriorityClass (value 100, preemptionPolicy: Never).

This means pipeline pods can be evicted when the cluster is under pressure but won't preempt other workloads.

## Dependency
Requires the `power` PriorityClass to exist on the cluster — deploy PR #174 (priority-classes app) first.

## Test plan
- Trigger a pipeline run and confirm pods are created with `priorityClassName: power`
- `kubectl get pod -n woodpecker -o jsonpath='{.items[*].spec.priorityClassName}'`

Reviewed-on: #175
2026-05-26 23:58:57 +10:00
unkinben ede25a3858 feat(platform): add priority-classes app with low/power/medium/high classes (#174)
## Summary
- New `apps/base/priority-classes/` app with four `PriorityClass` objects managed via the `platform` ArgoCD project
- Adds `apps/overlays/*/priority-classes` to the platform ApplicationSet generator
- Adds `priority-classes` namespace to platform AppProject destinations (required even for cluster-scoped resources)

| Class | Value | PreemptionPolicy | Intent |
|---|---|---|---|
| `low` | 100 | Never | Background work; evictable, won't preempt others |
| `power` | 100 | Never | Compute-heavy but expendable (e.g. AI/ML workloads) |
| `medium` | 10000 | PreemptLowerPriority | Standard services |
| `high` | 100000 | PreemptLowerPriority | Critical services; preempts lower-priority pods |

`PriorityClass` is already in the platform project's `clusterResourceWhitelist` so no project policy changes were needed.

## Test plan
- ArgoCD syncs `platform-priority-classes` successfully
- `kubectl get priorityclasses low power medium high` shows all four classes

Reviewed-on: #174
2026-05-26 23:41:54 +10:00
unkinben f5f713fe86 feat(artifactapi): add open-webui/open-webui to ghcr immutable patterns (#173)
Part of #155 (prerequisite for open-webui deployment PR #172).

## Summary
- Adds `^open-webui/open-webui` to the `ghcr` remote's `immutable_patterns` in `remote-docker.yaml` so version-pinned open-webui image pulls are cached indefinitely through artifactapi

## Test plan
- artifactapi serves `ghcr.io/open-webui/open-webui:<version>` with `X-Artifact-Source: cache` on second fetch

Reviewed-on: #173
2026-05-26 23:28:27 +10:00
unkinben 3990fbfe06 feat(vault): switch to Kubernetes service registration (#171)
Replaces Consul service registration with the native Kubernetes provider so Vault labels its own pods with active/standby/perf-standby status without requiring a Consul dependency.

## Changes
- `values.yaml`: swap `service_registration "consul"` for `service_registration "kubernetes" {}`, add `VAULT_K8S_NAMESPACE` and `VAULT_K8S_POD_NAME` env vars via downward API
- `role_k8s-service-registration.yaml`: Role + RoleBinding granting the `vault` service account `get`/`update`/`patch` on pods
- `kustomization.yaml`: include new RBAC file

Reviewed-on: #171
2026-05-26 00:06:56 +10:00
unkinben d358098fff chore: update replication certs (#170)
- add replication certs for kanidm-0, kanidm-1 and kanidm-2

Reviewed-on: #170
2026-05-25 23:52:06 +10:00
unkinben 201e601737 feat: update kanidm replicaiton (#169)
- split to per-server configs
- remove init containers that attempted to automate the replication config
- add README.md

Reviewed-on: #169
2026-05-25 23:25:48 +10:00
unkinben d230d87ec9 feat(artifactapi): add conftest to GitHub generic remote cache (#168)
## Summary

- Adds `open-policy-agent/conftest/.*/conftest_.*_Linux_x86_64.tar.gz$` to the `github` remote immutable patterns in artifactapi

## Why

conftest v0.68.2 (https://github.com/open-policy-agent/conftest/releases/tag/v0.68.2) is now used for OPA policy checks in CI (see #167). Caching the release tarball in artifactapi reduces external dependency on GitHub during builds.

Reviewed-on: #168
2026-05-25 22:44:57 +10:00
unkinben 6497dab25e fix(puppet): remove explicit clusterIP: null from puppetdb Service (#166)
## Summary

- Removes `clusterIP: null` from the `puppetdb` Service spec

## Why

Setting `clusterIP: null` makes ArgoCD's desired state explicit about the field being null. Kubernetes assigns a real IP on creation and the field is immutable afterward. The null vs assigned-IP mismatch causes permanent OutOfSync on the puppetdb Service. Removing the field means ArgoCD no longer claims ownership of `clusterIP`, so the API server's value is authoritative.

Reviewed-on: #166
2026-05-25 22:44:24 +10:00
unkinben f403c6b05d fix(kanidm): add explicit group/kind/weight to TLSRoute refs (#165)
## Summary

- Adds `group: gateway.networking.k8s.io` and `kind: Gateway` to `parentRefs`
- Adds `group: ""`, `kind: Service`, and `weight: 1` to `backendRefs`

## Why

The Gateway API controller defaults these fields when creating/updating TLSRoute objects, so the live state always has them. ArgoCD diffs desired vs live by string comparison, causing the `kanidm` TLSRoute to show permanent OutOfSync. Same root cause as #162 (HTTPRoutes).

Reviewed-on: #165
2026-05-25 22:43:52 +10:00
unkinben ac8b8212bd fix(consul): normalize cpu limit to canonical string form (#164)
## Summary

- Changes `server.resources.limits.cpu` from `1000m` to `"1"` in consul Helm values

## Why

`1000m` (1000 milliCPU) is equivalent to `1` CPU, but Kubernetes normalizes the value to `"1"` when storing. ArgoCD diffs desired vs live by string comparison, so the mismatch causes a permanent OutOfSync on the `consul-server` StatefulSet. Same root cause as #163.

Reviewed-on: #164
2026-05-25 22:43:35 +10:00
unkinben dd282f59fb fix(litellm): normalize postgres cluster resource values (#163)
## Summary

- Changes `limits.memory` from `1024Mi` to `1Gi` (same value, canonical form)
- Changes `limits.cpu` from `1` (integer) to `"1"` (string, canonical form)

## Why

Kubernetes normalizes resource quantities on write — `1024Mi` becomes `1Gi` and integer `1` becomes string `"1"`. ArgoCD diffs by string comparison, so these equivalent values cause a permanent OutOfSync on the `litellm-postgres` Cluster.

Reviewed-on: #163
2026-05-24 23:30:10 +10:00