Compare commits

196 Commits

Author SHA1 Message Date
unkinben ea17970276 Deploy ns-resolver BIND cluster
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
2026-07-03 20:52:21 +10:00
unkinben 4b8f9313c8 Deploy bind-operator (operator + CRDs) (#219)
First of a 4-PR split of the bind rollout (was #216). Deploys just the operator control plane so it can be verified before any DNS clusters exist.

## Why
Roll out incrementally: operator + CRDs first, then each BIND tier as its own PR.

## Changes
- `apps/base/bind-system`: operator Deployment (`git.unkin.net/unkin/bind-operator:v0.1.1`), RBAC, namespace; CRDs pulled from the operator repo by raw URL (`config/crd/install.yaml` @ v0.1.1)
- au-syd1 `bind-system` overlay
- register all four bind apps in `argocd/applicationsets/platform.yaml` (DNS overlays instantiate only when their dirs land in the follow-up PRs)
- add `binddns-*` namespaces to `argocd/projects/platform.yaml`
- add `schemas/bind.unkin.net/*.json` for kubeconform

## Deploy impact
Operator pod + CRDs only. No DNS services yet — the operator is idle until BindClusters exist.

## Follow-ups (merge after this)
binddns-auth, binddns-resolver, binddns-externaldns — one PR each.

Reviewed-on: #219
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-07-03 20:04:57 +10:00
unkinben bb330a0365 chore(artifactapi): deploy v3.7.4 (#218)
## Why

artifactapi `v3.7.4` images are built and pushed; au-syd1 is on `v3.7.3`. This rolls forward to ship the terraform provider registry.

## Changes

- `api-deployment`: `artifactapi` `v3.7.3` → `v3.7.4`
- `ui-deployment`: `artifactapi-ui` `v3.7.3` → `v3.7.4`

## What's new in v3.7.4

- Local terraform repos are now a real provider registry: `/.well-known/terraform.json` + `providers.v1` versions/download with GPG-signed SHA256SUMS (#102).
- The signing key self-provisions in the DB (`signing_keys` table) — no K8s secret to mount, so no deployment wiring needed.

Once synced, `terraform init` against `source = "artifactapi.k8s.syd1.au.unkin.net/<repo>/<type>"` works.
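
The lookup behind that `terraform init` can be sketched roughly as follows. This is an illustration of Terraform's provider registry protocol (service discovery, then the versions and download endpoints), not artifactapi's code. The discovery document value `/v1/providers/` and the names `myrepo` and `example` are assumptions made for the example.

```python
from urllib.parse import urljoin

HOST = "artifactapi.k8s.syd1.au.unkin.net"


def provider_urls(discovery: dict, namespace: str, ptype: str,
                  version: str, os_name: str, arch: str) -> dict:
    """Build the provider registry URLs Terraform would request."""
    # "providers.v1" in the discovery document gives the base path of the API.
    base = urljoin(f"https://{HOST}/", discovery["providers.v1"])
    if not base.endswith("/"):
        base += "/"
    return {
        # Lists the available versions of the provider.
        "versions": f"{base}{namespace}/{ptype}/versions",
        # Returns download metadata (package URL, SHA256SUMS, signature).
        "download": f"{base}{namespace}/{ptype}/{version}/download/{os_name}/{arch}",
    }


# Hypothetical body of /.well-known/terraform.json (assumed, not from the PR).
discovery = {"providers.v1": "/v1/providers/"}

# "myrepo" stands in for <repo> and "example" for <type>; both are made up.
urls = provider_urls(discovery, "myrepo", "example", "1.0.0", "linux", "amd64")
```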

Reviewed-on: #218
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-07-03 19:40:38 +10:00
unkinben 15225433e9 chore(artifactapi): deploy v3.7.3 (#215)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
## Why

artifactapi images `v3.7.3` are built and pushed to the registry, but au-syd1 is still running `v3.6.5`. This rolls the deployment forward to pick up the recent fixes.

## Changes

- `api-deployment`: `artifactapi` `v3.6.5` → `v3.7.3`
- `ui-deployment`: `artifactapi-ui` `v3.6.5` → `v3.7.3`

Included in v3.7.x since v3.6.5:
- Local-repo files now appear in the cached-objects UI (#99).
- Evicting a local RPM prunes its repodata metadata (#100).
- The bare domain redirects to the web UI at /ui (#101).

Reviewed-on: #215
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-07-03 15:14:28 +10:00
unkinben bbb9acba36 feat: add woodpecker service accounts for media terraform repos (#214)
Add Kubernetes ServiceAccounts in the woodpecker namespace for terraform-sonarr, terraform-radarr, and terraform-prowlarr CI pipelines.

Reviewed-on: #214
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 22:04:33 +10:00
benvin 48f32a044d fix: update TLSRoute to v1 (#213)
TLSRoutes are now in standard, no longer experimental

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #213
2026-06-28 17:50:27 +10:00
unkinben 7f1444fb38 Add Authentik identity provider deployment (#211)
## Summary
- Deploy Authentik (identity.unkin.net) via Helm chart 2026.5.3
- CNPG PostgreSQL cluster (3 instances) with separate rw/ro poolers (2 instances each)
- Redis with 5Gi persistent storage
- Gateway API for HTTPS (identity.unkin.net) and LDAPS (ldap.k8s.syd1.au.unkin.net, ldap.main.unkin.net)
- TLSRoute for LDAPS passthrough, HTTPRoute for external-dns record creation
- Vault secrets for postgres credentials, authentik secret key, and S3 storage credentials
- S3 storage via RadosGW (bucket: authentik)
- 3 server replicas, 2 worker replicas
- Woodpecker ServiceAccount for terraform-authentik CI
- Platform applicationset and project updated

## Dependencies
- terraform-git #15 (merged) — repo definition
- terraform-vault #78 (merged) — auth roles and Consul ACL

## Vault secrets needed before deploy
Write to `kv/kubernetes/namespace/authentik/default/`:
- `postgres-credentials`: username + password
- `authentik-credentials`: AUTHENTIK_SECRET_KEY
- `s3-credentials`: S3 access key + secret key

Reviewed-on: #211
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 17:42:49 +10:00
unkinben 784c3b5de1 Add JSON schema generation for kubeconform CRD validation (#212)
## Summary
- New `ci/generate-schemas.sh` script that generates JSON schemas from three sources:
  1. Live cluster CRDs via `kubectl get crds`
  2. Offline CRD manifests (ArgoCD v3.3.2, Gateway API v1.5.1)
  3. Kubernetes v1.33.7 swagger spec for native types
- Schemas follow Datree catalog convention (`<group>/<Kind>_<version>.json`)
- `validate-apps.sh` and `validate-clusters.sh` check local schemas first, falling back to remote
- Fixes TLSRoute (and other CRD) schema validation failures in kubeconform
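
The local-first lookup described above might look roughly like this. It is a sketch under assumptions, not the contents of `validate-apps.sh`: the function name, the remote base URL, and the exact casing of the file name are hypothetical and follow the `<group>/<Kind>_<version>.json` layout as written in the summary.

```python
import tempfile
from pathlib import Path


def schema_location(group: str, kind: str, version: str,
                    local_dir: str, remote_base: str) -> str:
    """Return the local schema path if it exists, else the remote URL."""
    rel = f"{group}/{kind}_{version}.json"
    local = Path(local_dir) / rel
    if local.is_file():
        return str(local)
    # Fall back to the remote catalog (base URL is an assumption).
    return f"{remote_base.rstrip('/')}/{rel}"


REMOTE = "https://schemas.example.invalid"  # hypothetical remote catalog

with tempfile.TemporaryDirectory() as schemas:
    # Simulate a generated schema for TLSRoute being present locally.
    target = Path(schemas) / "gateway.networking.k8s.io" / "TLSRoute_v1.json"
    target.parent.mkdir(parents=True)
    target.write_text("{}")

    hit = schema_location("gateway.networking.k8s.io", "TLSRoute", "v1", schemas, REMOTE)
    miss = schema_location("argoproj.io", "Application", "v1alpha1", schemas, REMOTE)
```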

## Sources
- ArgoCD: `artifactapi.../argoproj/argo-cd/refs/tags/v3.3.2/manifests/ha/install.yaml`
- Gateway API: `artifactapi.../kubernetes-sigs/gateway-api/releases/download/v1.5.1/standard-install.yaml`
- Kubernetes: `artifactapi.../kubernetes/kubernetes/refs/tags/v1.33.7/api/openapi-spec/swagger.json`

Reviewed-on: #212
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 17:26:08 +10:00
unkinben cfca1e5278 Add age-api deployment (#210)
## Summary
- Deploy age-api to the au-syd1 cluster
- Uses configMapGenerator for people config with jaidi, ben, and sudaporn
- Includes gateway, httproute, service, and deployment
- Image: git.unkin.net/unkin/age-api:v0.1.0

Reviewed-on: #210
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 12:19:38 +10:00
benvin 99a95f4e57 chore: source schema source for kubeconform (#209)
cache frequent lookups to prevent 400 errors from github. the schemas
are available via artifactapi.

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #209
2026-06-27 22:39:31 +10:00
unkinben feaec2c8a9 chore: bump artifactapi + ui to v3.6.5 (#208)
Adds bandwidth saved stat to dashboard.

Reviewed-on: #208
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 22:27:55 +10:00
unkinben d1cc467455 chore: bump artifactapi + ui to v3.6.4 (#207)
Fixes helm chart URL path duplication for same-host repos (stakater).

Reviewed-on: #207
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 08:06:26 +10:00
unkinben 0e9ac4d390 chore: bump artifactapi + ui to v3.6.3 (#206)
Includes Docker Accept header forwarding, Content-Type fix, nginx base path fix, and version endpoint fix.

Reviewed-on: #206
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 07:51:13 +10:00
unkinben 722ced3256 chore: bump artifactapi + ui to v3.6.2 (#205)
Includes Docker Bearer token auth (#60) and UI BASE_PATH build_args fix (#59).

Reviewed-on: #205
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 00:20:27 +10:00
unkinben 92e6f0f13b chore: bump artifactapi + ui to v3.6.1 (#204)
Rebuilds UI with BASE_PATH=/ui so assets serve under /ui/.

Reviewed-on: #204
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-27 00:03:58 +10:00
unkinben 825c46c91b chore: bump artifactapi + ui to v3.6.0 (#203)
Bumps API and UI images from v3.5.0 to v3.6.0.

Reviewed-on: #203
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:57:52 +10:00
unkinben 2c9c79d8f1 fix: update UI health check paths to /ui (#202)
The UI now serves under /ui (artifactapi#58). Health probes need /ui instead of /.

Reviewed-on: #202
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:57:27 +10:00
unkinben f695657d9d refactor: simplify artifactapi routes (#201)
Route /ui → UI service, everything else → API service.

Replaces the growing list of per-prefix rules (/api, /v2, /health) with a single catch-all to the API. No more needing to add a route rule every time the API adds a new top-level path.
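
The effect of the two remaining rules can be modelled in a few lines. This is a simplified sketch of Gateway API `PathPrefix` matching (element-wise prefixes, longest match wins), not the controller's implementation; the backend labels `ui` and `api` are placeholders for the real Service names.

```python
def path_prefix_matches(prefix: str, path: str) -> bool:
    """Element-wise prefix match: /ui matches /ui and /ui/x, but not /uix."""
    prefix = prefix.rstrip("/")
    if prefix == "":
        return True  # "/" matches every path
    return path == prefix or path.startswith(prefix + "/")


# The two rules after the refactor: /ui goes to the UI, everything else to the API.
RULES = [("/ui", "ui"), ("/", "api")]


def pick_backend(path: str) -> str:
    """Pick the backend of the longest matching prefix."""
    matches = [(p, b) for p, b in RULES if path_prefix_matches(p, path)]
    return max(matches, key=lambda m: len(m[0]))[1]
```

With this shape, a new top-level API path such as `/v2` or `/health` reaches the API backend without a dedicated rule.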

Reviewed-on: #201
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:39:24 +10:00
unkinben 5dee768170 fix: route /v2 and /health to artifactapi API service (#200)
The v3 route migration (#198) split routes into /api → API and / → UI, but /v2/ (Docker Registry V2 API) and /health now hit the UI catch-all instead of the API backend.

This breaks `docker pull artifactapi.k8s.syd1.au.unkin.net/...` with context deadline exceeded.

Adds /v2 and /health prefix rules before the UI catch-all.

Reviewed-on: #200
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-26 23:31:47 +10:00
benvin f120f3b426 fix: rename environment2 to environment (#199)
update the environment secret reference to match what has been
deployed. this prevents a CreateContainerConfigError

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #199
2026-06-26 22:55:24 +10:00
benvin f6d60bd02d feat: artifactapi route change (#198)
complete cutover to artifactapi 3

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #198
2026-06-26 22:50:27 +10:00
benvin aac1b654bb feat: migrate to artifactapi 3+ (#197)
What changed:
- Adds new v3 API and UI deployments (separate api-deployment.yaml, ui-deployment.yaml) alongside the existing monolithic artifactapi-deployment.yaml
- Adds CNPG PostgreSQL cluster + pooler to replace the standalone postgres deployment
- Adds new api-env configmap, new Vault secrets (postgres-credentials, environment), and a second VaultAuth (default1)
- Adds new services targeting the split api and ui selectors
- Adds HPAs for both new deployments
- Updates kustomization to include all new resources

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #197
2026-06-26 22:18:07 +10:00
benvin 1c6e087116 chore: cleanup artifactory3 mess (#196)
attempted to let claude deploy a new version of artifactory with
terrible results. this change is to remove that mess so I can start
again.

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #196
2026-06-21 17:40:17 +10:00
benvin 9e6efb7c78 🤦 (#195)
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #195
2026-06-21 17:30:47 +10:00
benvin cae42b4896 feat: manage postgres-credentials for artifactapi3 (#194)
pull credentials for postgres/cnpg from vault

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #194
2026-06-21 17:26:26 +10:00
benvin 349dc5fd01 chore: remove middleware resource (#193)
there is no crd for this, preventing the deployment of artifactapi 3

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #193
2026-06-21 09:10:49 +10:00
benvin 8cbd645332 feat: deploy artifactapi3 (#192)
just enough to test terraform deployment and begin migration. have
changed to cnpg for the database and a new bucket for storage

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #192
2026-06-20 12:22:22 +10:00
benvin ad2cdd3b63 fix: update woodpecker kustomization (#191)
Reviewed-on: #191
2026-06-17 21:34:02 +10:00
benvin 17782d716c feat: enable terraform-artifactapi jobs (#190)
woodpecker jobs for terraform-artifactapi use the service account of the
same name to run jobs, so that it can access specific secrets

- add terraform-artifactapi serviceaccount

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #190
2026-06-17 21:23:49 +10:00
unkinben 188c39f85d feat: add terraform-git service account for woodpecker CI (#189)
## Summary
- Add ServiceAccount terraform-git in woodpecker namespace for terraform-git CI pipelines
- Add to kustomization.yaml

## Test plan
- [ ] Verify ArgoCD syncs the new service account
- [ ] Verify woodpecker CI can use the service account

Reviewed-on: #189
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-07 20:36:55 +10:00
unkinben 0b7819bda3 chore: bump almalinux9 image tags (#188)
Bump almalinux9 image tags to 20260606

Reviewed-on: #188
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-07 00:35:12 +10:00
benvin 3c6330ebfd benvin/gitea (#187)
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #187
2026-06-06 19:47:16 +10:00
unkinben a3a56d0c2b chore: add almalinux-vault repos (#186)
- 9.7 is end of life, ensure that we can still query packages

Reviewed-on: #186
2026-06-02 23:13:45 +10:00
unkinben 4b1fbe1fe1 feat(kanidm): scale down to single replica, remove replication (#185)
Drop from 3 replicas to 1. Remove init container, repl-certs secret,
replication port, podAntiAffinity, server-1/2 configs, and replication
stanza from server-0.toml. Mount configmap directly via subPath.

Reviewed-on: #185
2026-06-02 22:41:28 +10:00
unkinben 666f3d055c feat: add sessionaffinity to kanidm service (#184)
- required as traefik is in passthrough mode

Reviewed-on: #184
2026-06-02 21:17:51 +10:00
unkinben 3dc8801070 fix(kanidm): fix automatic_refresh TOML generation in init container (#182)
## Summary

- The `\n` escape in a shell variable wasn't interpreted as a newline when passed as a `printf %s` argument
- This caused `automatic_refresh = true` to be appended to the `partner_cert` string value on the same line, breaking TOML parsing on kanidm-2
- Fixed by using separate `printf` calls per peer type, with `\n` in the format string (not a variable) where it is correctly interpreted
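
The difference can be reproduced with a small harness that runs both `printf` forms through a POSIX `sh`. This is a sketch, not the init container's script: the `sep` variable and the `CERT` placeholder are assumptions, and it needs `sh` on the PATH.

```python
import subprocess


def sh(script: str) -> str:
    """Run a POSIX shell snippet and return its stdout."""
    result = subprocess.run(["sh", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Broken form: "\n" is stored in a variable and passed as a %s argument,
# so printf prints a literal backslash followed by "n".
broken = sh(r"""sep='\n'; printf '%s%s%s' 'partner_cert = "CERT"' "$sep" 'automatic_refresh = true'""")

# Fixed form: "\n" sits in the format string, where printf interprets it.
fixed = sh(r"""printf '%s\n' 'partner_cert = "CERT"'; printf '%s\n' 'automatic_refresh = true'""")
```

In the broken output both settings share one line, which is the shape that broke TOML parsing; the fixed output puts `automatic_refresh = true` on its own line.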

## Test plan

- [ ] kanidm-2 init container generates valid TOML with `automatic_refresh = true` on its own line under the kanidm-0 peer section
- [ ] kanidm-1 and kanidm-2 start successfully and auto-refresh domain UUID from kanidm-0

Reviewed-on: #182
2026-05-31 00:25:21 +10:00
unkinben 60f1f3130b fix(kanidm): replicate 1/2 from 0 only with automatic_refresh (#181)
kanidm-0 is the authoritative supplier; kanidm-1 and kanidm-2 pull
from kanidm-0 only. automatic_refresh = true on the kanidm-0 peer
entry for kanidm-1/2 so fresh nodes auto-sync domain UUID on restart.

Reviewed-on: #181
2026-05-31 00:20:30 +10:00
unkinben b6f8cb0633 feat: autorestart statefulset (#180)
- ensure kanidm is restarted with vault secrets

Reviewed-on: #180
2026-05-30 23:40:07 +10:00
unkinben f11ec1056d fix(kanidm): remove invalid automatic_refresh from replication config (#179)
Reviewed-on: #179
2026-05-30 23:20:48 +10:00
unkinben ed7feaf19a Update apps/base/kanidm/vaultauth.yaml (#177)
Fix the VaultAuth object

Reviewed-on: #177
2026-05-30 23:11:38 +10:00
unkinben 4d594fbde7 feat(kanidm): vault-managed replication certs with auto-restart (#176)
- Store per-pod replication certs in Vault (kv/kubernetes/namespace/kanidm/default/repl-certs)
- VaultAuth + VaultStaticSecret sync certs to kanidm-repl-certs Secret
- busybox config-init init container injects peer certs from Secret into server.toml at startup
- Remove hardcoded partner_cert entries from per-pod server.toml templates
- Add automatic_refresh = true to all replication configs
- Add reloader.stakater.com/auto annotation to trigger rolling restart on ConfigMap/Secret changes
- Document domain UUID mismatch resolution and cert rotation in README

Reviewed-on: #176
2026-05-30 23:00:46 +10:00
unkinben 1b781e0885 feat(woodpecker): set workflow pod priority class to power (#175)
## Summary
Sets `WOODPECKER_BACKEND_K8S_PRIORITY_CLASS: power` on the Woodpecker agent so all CI pipeline pods are scheduled with the `power` PriorityClass (value 100, preemptionPolicy: Never).

This means pipeline pods can be evicted when the cluster is under pressure but won't preempt other workloads.

## Dependency
Requires the `power` PriorityClass to exist on the cluster — deploy PR #174 (priority-classes app) first.

## Test plan
- Trigger a pipeline run and confirm pods are created with `priorityClassName: power`
- `kubectl get pod -n woodpecker -o jsonpath='{.items[*].spec.priorityClassName}'`

Reviewed-on: #175
2026-05-26 23:58:57 +10:00
unkinben ede25a3858 feat(platform): add priority-classes app with low/power/medium/high classes (#174)
## Summary
- New `apps/base/priority-classes/` app with four `PriorityClass` objects managed via the `platform` ArgoCD project
- Adds `apps/overlays/*/priority-classes` to the platform ApplicationSet generator
- Adds `priority-classes` namespace to platform AppProject destinations (required even for cluster-scoped resources)

| Class | Value | PreemptionPolicy | Intent |
|---|---|---|---|
| `low` | 100 | Never | Background work; evictable, won't preempt others |
| `power` | 100 | Never | Compute-heavy but expendable (e.g. AI/ML workloads) |
| `medium` | 10000 | PreemptLowerPriority | Standard services |
| `high` | 100000 | PreemptLowerPriority | Critical services; preempts lower-priority pods |

`PriorityClass` is already in the platform project's `clusterResourceWhitelist` so no project policy changes were needed.
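
How the table translates into preemption behaviour can be sketched with a toy model. It only captures the rule that a pending pod may preempt a running one when its class allows preemption and has a strictly higher value; the real scheduler also weighs node fit, PodDisruptionBudgets, and more. `may_preempt` is a hypothetical helper, not a Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityClass:
    name: str
    value: int
    preemption_policy: str


# Values copied from the table above.
CLASSES = {c.name: c for c in [
    PriorityClass("low", 100, "Never"),
    PriorityClass("power", 100, "Never"),
    PriorityClass("medium", 10000, "PreemptLowerPriority"),
    PriorityClass("high", 100000, "PreemptLowerPriority"),
]}


def may_preempt(pending: str, running: str) -> bool:
    """True if a pending pod of one class could evict a running pod of another."""
    p, r = CLASSES[pending], CLASSES[running]
    return p.preemption_policy == "PreemptLowerPriority" and p.value > r.value
```

For example, `may_preempt("medium", "power")` is true, while `may_preempt("power", "low")` is false because `power` never preempts.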

## Test plan
- ArgoCD syncs `platform-priority-classes` successfully
- `kubectl get priorityclasses low power medium high` shows all four classes

Reviewed-on: #174
2026-05-26 23:41:54 +10:00
unkinben f5f713fe86 feat(artifactapi): add open-webui/open-webui to ghcr immutable patterns (#173)
Part of #155 (prerequisite for open-webui deployment PR #172).

## Summary
- Adds `^open-webui/open-webui` to the `ghcr` remote's `immutable_patterns` in `remote-docker.yaml` so version-pinned open-webui image pulls are cached indefinitely through artifactapi

## Test plan
- artifactapi serves `ghcr.io/open-webui/open-webui:<version>` with `X-Artifact-Source: cache` on second fetch

Reviewed-on: #173
2026-05-26 23:28:27 +10:00
unkinben 3990fbfe06 feat(vault): switch to Kubernetes service registration (#171)
Replaces Consul service registration with the native Kubernetes provider so Vault labels its own pods with active/standby/perf-standby status without requiring a Consul dependency.

## Changes
- `values.yaml`: swap `service_registration "consul"` for `service_registration "kubernetes" {}`, add `VAULT_K8S_NAMESPACE` and `VAULT_K8S_POD_NAME` env vars via downward API
- `role_k8s-service-registration.yaml`: Role + RoleBinding granting the `vault` service account `get`/`update`/`patch` on pods
- `kustomization.yaml`: include new RBAC file

Reviewed-on: #171
2026-05-26 00:06:56 +10:00
unkinben d358098fff chore: update replication certs (#170)
- add replication certs for kanidm-0, kanidm-1 and kanidm-2

Reviewed-on: #170
2026-05-25 23:52:06 +10:00
unkinben 201e601737 feat: update kanidm replication (#169)
- split to per-server configs
- remove init containers that attempted to automate the replication config
- add README.md

Reviewed-on: #169
2026-05-25 23:25:48 +10:00
unkinben d230d87ec9 feat(artifactapi): add conftest to GitHub generic remote cache (#168)
## Summary

- Adds `open-policy-agent/conftest/.*/conftest_.*_Linux_x86_64.tar.gz$` to the `github` remote immutable patterns in artifactapi
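
A quick way to see which release assets the pattern covers, assuming artifactapi applies it as an unanchored regular-expression search against the request path (that matching mode is an assumption, and `is_immutable` is a hypothetical helper):

```python
import re

# Pattern copied from the change above.
PATTERN = re.compile(r"open-policy-agent/conftest/.*/conftest_.*_Linux_x86_64.tar.gz$")


def is_immutable(path: str) -> bool:
    """True if the path would be treated as an immutable (cache-forever) artifact."""
    return PATTERN.search(path) is not None


RELEASE = "open-policy-agent/conftest/releases/download/v0.68.2/"
linux = is_immutable(RELEASE + "conftest_0.68.2_Linux_x86_64.tar.gz")
darwin = is_immutable(RELEASE + "conftest_0.68.2_Darwin_arm64.tar.gz")
checksums = is_immutable(RELEASE + "checksums.txt")
```

Only the Linux x86_64 tarball matches. The dots in `.tar.gz` are unescaped, so they match any character, which is harmless here but slightly looser than the literal file name.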

## Why

conftest v0.68.2 (https://github.com/open-policy-agent/conftest/releases/tag/v0.68.2) is now used for OPA policy checks in CI (see #167). Caching the release tarball in artifactapi reduces external dependency on GitHub during builds.

Reviewed-on: #168
2026-05-25 22:44:57 +10:00
unkinben 6497dab25e fix(puppet): remove explicit clusterIP: null from puppetdb Service (#166)
## Summary

- Removes `clusterIP: null` from the `puppetdb` Service spec

## Why

Setting `clusterIP: null` makes ArgoCD's desired state explicit about the field being null. Kubernetes assigns a real IP on creation and the field is immutable afterward. The null vs assigned-IP mismatch causes permanent OutOfSync on the puppetdb Service. Removing the field means ArgoCD no longer claims ownership of `clusterIP`, so the API server's value is authoritative.

Reviewed-on: #166
2026-05-25 22:44:24 +10:00
unkinben f403c6b05d fix(kanidm): add explicit group/kind/weight to TLSRoute refs (#165)
## Summary

- Adds `group: gateway.networking.k8s.io` and `kind: Gateway` to `parentRefs`
- Adds `group: ""`, `kind: Service`, and `weight: 1` to `backendRefs`

## Why

The Gateway API controller defaults these fields when creating/updating TLSRoute objects, so the live state always has them. ArgoCD diffs desired vs live by string comparison, causing the `kanidm` TLSRoute to show permanent OutOfSync. Same root cause as #162 (HTTPRoutes).

Reviewed-on: #165
2026-05-25 22:43:52 +10:00
unkinben ac8b8212bd fix(consul): normalize cpu limit to canonical string form (#164)
## Summary

- Changes `server.resources.limits.cpu` from `1000m` to `"1"` in consul Helm values

## Why

`1000m` (1000 milliCPU) is equivalent to `1` CPU, but Kubernetes normalizes the value to `"1"` when storing. ArgoCD diffs desired vs live by string comparison, so the mismatch causes a permanent OutOfSync on the `consul-server` StatefulSet. Same root cause as #163.

Reviewed-on: #164
2026-05-25 22:43:35 +10:00
unkinben dd282f59fb fix(litellm): normalize postgres cluster resource values (#163)
## Summary

- Changes `limits.memory` from `1024Mi` to `1Gi` (same value, canonical form)
- Changes `limits.cpu` from `1` (integer) to `"1"` (string, canonical form)

## Why

Kubernetes normalizes resource quantities on write — `1024Mi` becomes `1Gi` and integer `1` becomes string `"1"`. ArgoCD diffs by string comparison, so these equivalent values cause a permanent OutOfSync on the `litellm-postgres` Cluster.
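
The normalization can be approximated as below. This is a deliberately small sketch that handles only milli-CPU values and binary suffixes; the real Kubernetes quantity parser covers many more cases, and `canonical` is a hypothetical helper.

```python
BINARY_SUFFIXES = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]


def canonical(quantity) -> str:
    """Return an approximate canonical string form of a resource quantity."""
    s = str(quantity)  # integers such as 1 become the string "1"

    # Milli units: 1000m is a whole CPU and is stored as "1".
    if s.endswith("m") and s[:-1].isdigit():
        milli = int(s[:-1])
        return str(milli // 1000) if milli % 1000 == 0 else f"{milli}m"

    # Binary suffixes: move to a larger suffix while the number divides evenly.
    for i, suffix in enumerate(BINARY_SUFFIXES):
        if s.endswith(suffix) and s[:-2].isdigit():
            n = int(s[:-2])
            while n != 0 and n % 1024 == 0 and i < len(BINARY_SUFFIXES) - 1:
                n //= 1024
                i += 1
            return f"{n}{BINARY_SUFFIXES[i]}"

    return s
```

Writing manifests in the form this returns (`"1"`, `1Gi`) keeps the desired state textually identical to what the API server stores.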

Reviewed-on: #163
2026-05-24 23:30:10 +10:00
unkinben 1890dd4bda fix(gateways): add explicit group/kind/weight to all HTTPRoute refs (#162)
## Summary

- Adds `group: gateway.networking.k8s.io` and `kind: Gateway` to all `parentRefs` entries
- Adds `group: ""`, `kind: Service`, and `weight: 1` to all `backendRefs` entries
- Affects 9 HTTPRoute files across artifactapi, cattle-system, consul, kanidm, litellm, paperclip, puppet, and vault

## Why

ArgoCD diffs the desired manifest against the live Kubernetes object. The Gateway API controller defaults these fields when creating/updating objects, so the live state always has them — causing persistent OutOfSync for every HTTPRoute. Same root cause as #153 (certificateRefs).
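
The mismatch can be illustrated by applying the Gateway API defaults to a manifest that omits them. This is a sketch: `apply_gateway_defaults` is a hypothetical stand-in for the API server's defaulting, and the names `example-gw` and `example-svc` are made up.

```python
import copy


def apply_gateway_defaults(route: dict) -> dict:
    """Fill in the fields that are defaulted on parentRefs and backendRefs."""
    live = copy.deepcopy(route)
    for ref in live["spec"].get("parentRefs", []):
        ref.setdefault("group", "gateway.networking.k8s.io")
        ref.setdefault("kind", "Gateway")
    for rule in live["spec"].get("rules", []):
        for ref in rule.get("backendRefs", []):
            ref.setdefault("group", "")
            ref.setdefault("kind", "Service")
            ref.setdefault("weight", 1)
    return live


# Manifest as written before the fix: defaults left implicit.
implicit = {
    "spec": {
        "parentRefs": [{"name": "example-gw"}],
        "rules": [{"backendRefs": [{"name": "example-svc", "port": 80}]}],
    }
}

# Manifest as written after the fix: identical to what ends up live.
explicit = apply_gateway_defaults(implicit)
```

The implicit manifest differs from its live form, while the explicit one is unchanged by defaulting, so there is nothing left to report as drift.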

## Test plan

- [ ] All affected ArgoCD applications show Synced after merge

Reviewed-on: #162
2026-05-24 20:32:37 +10:00
unkinben 6815b66010 fix(kanidm): use dockerhub image instead of ghcr.io (#161)
## Summary

- Changes both `config-init` init container and `kanidm` container images from `ghcr.io/kanidm/server:1.10.3` to `kanidm/server:1.10.3`

## Why

`kanidm/server` is published on Docker Hub, not ghcr.io. RKE2 rewrites dockerhub pulls through the artifactapi mirror automatically.

## Test plan

- [ ] Pods roll successfully after ArgoCD sync
- [ ] Verify kanidm cluster replication still healthy

Reviewed-on: #161
2026-05-24 20:27:21 +10:00
unkinben 7cbec33588 fix(artifactapi): move kanidm to dockerhub remote (#160)
## Summary

- Removes `^kanidm/` from the `ghcr` remote immutable_patterns
- Adds `^kanidm/` to the `dockerhub` remote immutable_patterns

## Why

`kanidm/server` is published on Docker Hub, not ghcr.io. Pulling via the `ghcr` cache was failing with 403 on anonymous token fetch → 502 Bad Gateway.

## Test plan

- [ ] `docker pull artifactapi.k8s.syd1.au.unkin.net/dockerhub/kanidm/server:1.10.3` succeeds after artifactapi redeploys

Reviewed-on: #160
2026-05-24 20:24:33 +10:00
unkinben 3756208ccd benvin/kanidm (#159)
Reviewed-on: #159
2026-05-24 19:55:22 +10:00
unkinben 6ce92e8ead benvin/artifactapi-mail-images (#158)
Reviewed-on: #158
2026-05-24 14:44:38 +10:00
unkinben af79d86db6 feat(artifactapi): cache stalwart webadmin zip (#157)
## Summary

- Adds `stalwartlabs/webadmin/releases/latest/download/webadmin.zip` to `mutable_patterns` in the `github` generic remote so the stalwart webadmin UI can be fetched through artifactapi rather than directly from GitHub.

## Notes

- Uses `mutable_patterns` (not `immutable`) because `releases/latest` resolves to whichever release is current and changes over time.
- Access URL: `https://artifactapi.k8s.syd1.au.unkin.net/generic/github/stalwartlabs/webadmin/releases/latest/download/webadmin.zip`

Reviewed-on: #157
2026-05-24 12:55:16 +10:00
unkinben 5f4c9225bb feat(artifactapi): add mail stack images to docker registry cache (#156)
- ghcr: stalwartlabs/stalwart (Stalwart mail server)
- dockerhub: rspamd/rspamd (spam filter), tozd/postfix (MTA gateway)

Reviewed-on: #156
2026-05-24 12:42:27 +10:00
unkinben cbc2c1cb9f fix(gateways): add explicit group: "" to all certificateRefs entries (#153)
The Gateway API admission server defaults certificateRefs[].group to ""
when it is omitted. ArgoCD diffed the desired state (no group field) against
the live state (group: "") and flagged every gateway as out of sync.

Fix: explicitly set group: "" in all certificateRefs entries so the
rendered manifest matches the API server's canonical form exactly.

Affected: artifactapi, cattle-system, consul, litellm, paperclip,
puppet (puppetboard + puppetdb), vault.

Reviewed-on: #153
2026-05-23 23:47:24 +10:00
unkinben c6f9893804 fix(argocd): add vault and consul to platform project destinations (#152)
Vault and consul namespaces were missing from the platform AppProject
allowed destinations, causing ArgoCD sync failures with:
  destination server 'https://kubernetes.default.svc' and namespace
  'vault' do not match any of the allowed destinations in project 'platform'

Reviewed-on: #152
2026-05-23 23:27:24 +10:00
unkinben e43fb742ad feat(artifactapi): add kanidm to ghcr docker immutable patterns (#151)
Prerequisite for kanidm deployment (PR benvin/kanidm).

Reviewed-on: #151
2026-05-23 23:09:38 +10:00
unkinben 11ac2ae91e feat(consul): deploy HashiCorp Consul 1.22.7 via Helm chart (5-replica cluster) (#149)
## Summary

- Deploys HashiCorp Consul 1.22.7 using Helm chart 1.9.7 with 5 server replicas
- Configuration modelled on production consul: `datacenter=au-syd1`, `connect=true`, `raft_multiplier=10`, HTTP on 8500, GRPC on 8502, HTTPS disabled
- 5-replica server cluster with `bootstrapExpect=5`
- 10Gi cephrbd-fast-delete PVC per server pod
- Gateway API: HTTPS gateway + HTTPRoute (443→consul-consul-ui:80→8500) at `consul.k8s.syd1.au.unkin.net`
- PodDisruptionBudget patched from `policy/v1beta1` to `policy/v1` (k8s 1.25+ compatibility)
- ArgoCD platform ApplicationSet updated to include consul overlay path
- Clients disabled (server-only deployment)
- ConnectInject disabled (can be enabled later for service mesh)

## Requires

- PR #147 (artifactapi: add hashicorp/consul to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in `sandbox-consul`: all 5 server pods 1/1 Running, cluster formed
- [ ] After merge: ArgoCD syncs consul namespace
- [ ] Verify `consul.k8s.syd1.au.unkin.net` is accessible via Gateway

Reviewed-on: #149
2026-05-23 22:40:49 +10:00
unkinben d2be521878 feat(vault): deploy HashiCorp Vault 2.0.1 via Helm chart (5-replica HA raft) (#148)
## Summary

- Deploys HashiCorp Vault 2.0.1 using Helm chart 0.32.0 in HA raft mode (5 replicas)
- Configuration modelled on production vault: `disable_mlock=true`, headless-DNS retry_join for all 5 pods
- IPC_LOCK capability added via `server.statefulSet.securityContext.container`
- 10Gi cephrbd-fast-delete PVC per pod via `dataStorage`
- Gateway API: HTTPS gateway + HTTPRoute (443→vault service port 8200) at `vault.k8s.syd1.au.unkin.net`
- ArgoCD platform ApplicationSet updated to include vault overlay path
- Injector disabled (no agent sidecar injection needed)

## Requires

- PR #147 (artifactapi: add hashicorp/vault to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in `sandbox-vault`: all 5 pods Running, raft cluster forming
- [ ] After merge: ArgoCD syncs vault namespace
- [ ] Operator runs \`vault operator init\` to initialize, then unseals all 5 nodes
- [ ] Verify `vault.k8s.syd1.au.unkin.net` is accessible via Gateway

Reviewed-on: #148
2026-05-23 22:39:41 +10:00
unkinben bcd4c1a722 feat(cert-manager): upgrade to v1.20.2 and enable Gateway API support (#150)
## Summary

- Upgrades cert-manager from v1.19.2 to v1.20.2
- Enables `enableGatewayAPI: true` via the `ControllerConfiguration` config block

## Why

cert-manager's Gateway API integration was not enabled. Without it, `cert-manager.io/*` annotations on Gateway resources are ignored and no certificates are issued. This is required for the consul and vault PRs (#148, #149) to have their TLS certs automatically provisioned from their Gateway annotations.

In v1.20.2, `ExperimentalGatewayAPISupport` is BETA and defaults to true — enabling `enableGatewayAPI` in the controller config activates the gateway-shim controller.

## Test plan

- [ ] cert-manager rolls out cleanly (v1.20.2 pods become Ready)
- [ ] After rollout, existing Gateway-annotated services (artifactapi, puppet, litellm) retain valid certs
- [ ] New Gateway resources with `cert-manager.io/cluster-issuer` annotations trigger Certificate creation

Reviewed-on: #150
2026-05-23 22:38:39 +10:00
unkinben 6d9530b1ee feat(artifactapi): add hashicorp/consul and hashicorp/vault to docker immutable patterns (#147)
## Summary

- Adds `^hashicorp/consul` and `^hashicorp/vault` to the dockerhub immutable_patterns in artifactapi's remote-docker.yaml
- Replaces the more specific `^hashicorp/vault-secrets-operator` pattern since `^hashicorp/vault` subsumes it
- Required for the benvin/vault and benvin/consul branches (vault:2.0.1 and consul:1.22.7)
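
The subsumption claim is easy to check, assuming the patterns are applied as regular-expression searches against the image name (the matching mode is an assumption; the image list is illustrative):

```python
import re

BROAD = re.compile(r"^hashicorp/vault")
NARROW = re.compile(r"^hashicorp/vault-secrets-operator")

# Illustrative image names, not an inventory of what the cluster pulls.
images = [
    "hashicorp/vault",
    "hashicorp/vault-secrets-operator",
    "hashicorp/vault-k8s",
    "hashicorp/consul",
]

narrow_hits = {name for name in images if NARROW.search(name)}
broad_hits = {name for name in images if BROAD.search(name)}
```

Everything the narrow pattern matched is still matched by the broad one. The broad pattern also picks up any other image whose name starts with `hashicorp/vault`, such as `hashicorp/vault-k8s`.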

## Test plan

- [ ] Verify artifactapi accepts requests for hashicorp/vault and hashicorp/consul images after merge

Reviewed-on: #147
2026-05-23 18:21:25 +10:00
unkinben dcea768c15 feat(woodpecker): upgrade to v3.14.1 (chart 3.6.3) (#146)
Reviewed-on: #146
2026-05-23 18:00:55 +10:00
unkinben e05f9bfd83 feat: increase litellm resources (#144)
litellm performance has dropped and it has crashed several times. it
then autoscaled to its maximum, using the majority of the memory in the
cluster.

- reduce the rate at which litellm autoscales
- increase the requests/limits to match usage

Reviewed-on: #144
2026-05-23 17:59:43 +10:00
unkinben 445d8b6e7e feat: add HTTP→HTTPS redirect to Gateway API services (#145)
Add port 80 HTTP listener and redirect HTTPRoute to artifactapi,
cattle-system (rancher), litellm, paperclip, and puppetboard — restoring
the redirect behaviour that existed on the previous nginx/traefik Ingress
resources.

Reviewed-on: #145
2026-05-23 17:34:07 +10:00
unkinben c2637da068 feat(artifactapi): migrate Ingress to Gateway API (#129)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Notes

The original Ingress had nginx-specific annotations (`proxy-body-size: 10g`, `proxy-read-timeout: 600`) which are not portable to Gateway API. These can be re-introduced via a Traefik `Middleware` CRD if needed.

## Test plan

- [ ] ArgoCD syncs the app cleanly
- [ ] cert-manager issues the `artifactapi-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://artifactapi.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #129
2026-05-23 16:06:33 +10:00
unkinben 90ddd932fe feat(puppet): migrate puppetdb Ingress to Gateway API (#131)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway
- `ingress_puppetboard.yaml` is unchanged in this PR (separate PR)

## Test plan

- [ ] ArgoCD syncs the puppet app cleanly
- [ ] cert-manager issues the `puppetdb-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://puppetdb.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #131
2026-05-23 16:05:26 +10:00
unkinben 2c6d88aa6b feat(puppet): migrate puppetboard Ingress to Gateway API (#130)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway
- `ingress_puppetdb.yaml` is unchanged in this PR (separate PR)

## Test plan

- [ ] ArgoCD syncs the puppet app cleanly
- [ ] cert-manager issues the `puppetboard-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://puppetboard.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #130
2026-05-23 01:31:28 +10:00
unkinben 58368948d9 feat(paperclip): migrate Ingress to Gateway API (#133)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the paperclip app cleanly
- [ ] cert-manager issues the `paperclip-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://paperclip.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #133
2026-05-23 01:31:03 +10:00
unkinben 4f5c3f7ea0 feat(litellm): migrate Ingress to Gateway API (#134)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the litellm app cleanly
- [ ] cert-manager issues the `litellm-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://litellm.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #134
2026-05-23 01:29:54 +10:00
unkinben fd87cb96b5 feat(externaldns): upgrade to 1.21.1, fix sources for installed CRDs (#143)
## Changes

- Upgrade external-dns chart from 1.19.0 → 1.21.1 (app v0.19.0 → v0.21.0)
- Remove `gateway-tcproute` source — `TCPRoute` CRD is not installed, causing crash-loop
- Add `gateway-tlsroute` source — `TLSRoute` CRD is installed (Gateway API 1.5.1)
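
As a rough illustration, the resulting `sources` list in the chart values might look like the fragment below. Only the removal of `gateway-tcproute` and the addition of `gateway-tlsroute` come from this change; the other entries are assumptions about what the list already contained:

```yaml
# Sketch of external-dns Helm values; entries marked "assumed" are illustrative.
sources:
  - service              # assumed
  - gateway-httproute    # assumed to be retained from the earlier Gateway API change
  - gateway-grpcroute    # assumed to be retained from the earlier Gateway API change
  - gateway-tlsroute     # added: the TLSRoute CRD is installed
  # gateway-tcproute is removed: the TCPRoute CRD is not installed
```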

## Why

The pod was crash-looping every minute with `failed to list *v1alpha2.TCPRoute: the server could not find the requested resource` because the TCPRoute CRD doesn't exist in this cluster. TLSRoute was previously removed but its CRD does exist.

Reviewed-on: #143
2026-05-23 01:28:20 +10:00
unkinben d619f9195e benvin/externaldns_compatability (#142)
Reviewed-on: #142
2026-05-23 01:17:20 +10:00
unkinben 1944dbbfcd temp: enable debug logging on externaldns to diagnose TLSRoute sync timeout (#140)
Temporary: enable --log-level=debug to understand why the TLSRoute informer never reports HasSynced within the 1m interval. To be closed/reverted after root cause is found.
Reviewed-on: #140
2026-05-23 01:07:45 +10:00
unkinben 0940cc20f8 fix(traefik): listen on port 443 directly for Gateway API compatibility (#138)
## Problem

Gateway listeners with `port: 443` were rejected with `PortUnavailable: Cannot find entryPoint for Gateway: no matching entryPoint for port 443 and protocol "HTTPS"`.

Traefik matches Gateway listener ports against its internal entryPoint ports (pod-level), not the Service's `exposedPort`. The `websecure` entryPoint was configured on port `8443`, so port `443` listeners had no match.

## Fix

- `ports.websecure.port: 443` — Traefik now binds directly on 443
- `securityContext.capabilities.add: [NET_BIND_SERVICE]` — allows a non-root process to bind to privileged ports (<1024)

The Service `exposedPort` stays at `443`, so external connectivity is unchanged. All existing Gateway listeners (`port: 443`) are correct as-is.

Applies to both internal and external Traefik instances.
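
Put together, the two settings named above would look something like this in each instance's Helm values. This is a sketch of the relevant keys only; the rest of the values file is not shown:

```yaml
# Fragment of the Traefik Helm values (same shape for internal and external).
ports:
  websecure:
    port: 443          # entryPoint port inside the pod, now matching Gateway listeners
    exposedPort: 443   # Service port, unchanged
securityContext:
  capabilities:
    add: [NET_BIND_SERVICE]   # lets the non-root process bind ports below 1024
```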

## Test plan

- [ ] Traefik pods restart cleanly
- [ ] `kubectl get gateway -A` shows listeners as `Programmed: True`
- [ ] `https://rancher.k8s.syd1.au.unkin.net` (already merged) is reachable

Reviewed-on: #138
2026-05-23 00:44:13 +10:00
unkinben 20ce2b1b92 feat(cattle-system): migrate rancher Ingress to Gateway API (#132)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the cattle-system app cleanly
- [ ] cert-manager issues the `rancher-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://rancher.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #132
2026-05-23 00:24:57 +10:00
unkinben 64dc5a0242 fix(traefik): add instance labels to GatewayClasses (#137)
## Problem

GatewayClasses were `Unknown` even after controllerName was fixed. The `kubernetesGateway` `labelSelector` applies to all watched resources, including GatewayClasses themselves. Since neither GatewayClass had a `traefik.io/instance` label, both Traefik instances filtered them out and never accepted them.

## Fix

- `gatewayclass-internal.yaml`: add `traefik.io/instance: internal`
- `gatewayclass-external.yaml`: add `traefik.io/instance: external`
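
For illustration, the internal GatewayClass after this change would be along these lines (the external one differs only in name and label value):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: traefik-internal
  labels:
    traefik.io/instance: internal   # matched by the internal instance's labelSelector
spec:
  controllerName: traefik.io/gateway-controller
```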

## Test plan

- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #137
2026-05-23 00:23:18 +10:00
unkinben 57c14d32c0 fix(traefik): remove invalid controllerName flag causing CrashLoopBackOff (#136)
## URGENT — Traefik pods are CrashLoopBackOff

The merged PR #135 added `--providers.kubernetesgateway.controllerName` as an `additionalArguments` entry. Traefik v3.7.0 does not support this flag and fails immediately on startup.

Old replica sets are still running (one pod each) but new pods cannot come up.

## Fix

- Remove `additionalArguments` from both `values-internal.yaml` and `values-external.yaml`
- Revert GatewayClass `controllerName` back to `traefik.io/gateway-controller` (the hardcoded Traefik default — no override mechanism exists in v3.7.0)

## After merge

GatewayClasses will remain `Unknown` until a separate solution for internal/external separation is implemented (the `labelSelector` approach needs further investigation).

Reviewed-on: #136
2026-05-22 23:58:56 +10:00
unkinben 2df359c4a9 fix(traefik): set controllerName on GatewayClasses and Traefik providers (#135)
## Problem

Both GatewayClasses (`traefik-internal`, `traefik-external`) were stuck as `Unknown`. Neither Traefik deployment had `controllerName` set in `kubernetesGateway`, so both defaulted to `traefik.io/gateway-controller` — which matched neither GatewayClass.

## Fix

- `gatewayclass-internal.yaml`: `controllerName: traefik.io/gateway-controller-internal`
- `gatewayclass-external.yaml`: `controllerName: traefik.io/gateway-controller-external`
- `values-internal.yaml`: added `controllerName: traefik.io/gateway-controller-internal`
- `values-external.yaml`: added `controllerName: traefik.io/gateway-controller-external`

## Test plan

- [ ] ArgoCD syncs traefik-system cleanly
- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #135
2026-05-22 23:44:06 +10:00
unkinben f53a2dc4f8 fix: terraform_vault must be RFC1123 compliant (#128)
Reviewed-on: #128
2026-05-21 23:19:20 +10:00
unkinben c5dd3cc5cb feat: add terraform_vault role (#127)
this adds a service account for running the terraform_vault workflows,
so that they can access the jwt needed to generate a token

Reviewed-on: #127
2026-05-21 23:13:48 +10:00
unkinben 462b2b3f4f feat(externaldns): add Gateway API sources for httproute, tlsroute, grpcroute, tcproute, udproute (#126)
Reviewed-on: #126
2026-05-18 00:11:33 +10:00
unkinben 73c9b3f603 fix(traefik): replace invalid controllername flag with labelSelector for v3 (#125)
Remove --providers.kubernetesgateway.controllername which does not exist in
Traefik v3, update GatewayClass controllerName to the standard v3 value, and
use labelSelector on each instance's kubernetesGateway provider to differentiate
internal vs external traffic.

Reviewed-on: #125
2026-05-18 00:03:12 +10:00
unkinben 9a01a9ef19 fix: enable gateway/ingress class on platform project (#124)
- add missing classes to platform required to deploy traefik system

Reviewed-on: #124
2026-05-17 23:56:12 +10:00
unkinben 53553ddcfd feat: deploy internal/external traefik routers (#119)
deploy traefik for internal and external applications. port forwarding
from the external routers will only occur to the IP of the
traefik-external service.

- traefik-internal and traefik-external added
- each is a different deployment

Reviewed-on: #119
2026-05-17 23:44:50 +10:00
unkinben 5d3ff3a0f4 feat(artifactapi): allow kubeconform and kustomize from GitHub (#123)
Adds immutable patterns for yannh/kubeconform and kubernetes-sigs/kustomize
to fix 403 Forbidden errors when downloading their Linux amd64 releases.

Reviewed-on: #123
2026-05-17 12:19:27 +10:00
unkinben c3002dc3c1 feat(artifactapi): allow kubecolor releases from GitHub (#122)
Reviewed-on: #122
2026-05-11 23:39:48 +10:00
unkinben 27db33536a feat(artifactapi): allow almalinux, debian, and fedora from Docker Hub (#121)
Reviewed-on: #121
2026-05-10 22:56:39 +10:00
unkinben 8a7068a1c4 feat(artifactapi): add argo-helm as a remote and virtual helm member (#120)
Reviewed-on: #120
2026-05-10 22:53:43 +10:00
unkinben 1cefd3b78e feat: change argocd crds source to artifactapi (#118)
- migrate argocd crds to come from the artifactapi service

Reviewed-on: #118
2026-05-10 21:12:44 +10:00
unkinben 842d774fc3 feat: deploy gatewayapi crds (#117)
- enable gateway api crds

Reviewed-on: #117
2026-05-10 21:05:56 +10:00
unkinben 4c8827ce35 feat: add traefik/gatewayapi (#116)
enable access to charts/containers/api-specs so that we can migrate from
nginx-ingress to gateway api and traefik

Reviewed-on: #116
2026-05-10 17:07:33 +10:00
unkinben 5e03215f4d chore: migrate reloader/reflector to virtual/helm (#115)
Reviewed-on: #115
2026-05-05 21:42:23 +10:00
unkinben 02ee82da1e feat: update vso to 1.3.0 (#114)
- updates the vso helm chart from 1.2.0 to 1.3.0

Reviewed-on: #114
2026-05-05 00:01:58 +10:00
unkinben 18c519f979 chore: remove hashicorp helm repo (#113)
- no longer required, this is in virtual/helm repo in artifactapi

Reviewed-on: #113
2026-05-03 23:51:44 +10:00
unkinben dd0e297c14 chore: mount vault CA for helm TLS trust and add ArgoCD self-management (#112)
- Patch argocd-repo-server to mount vault-ca-cert and set SSL_CERT_DIR
  so helm subprocesses trust the internal CA when pulling charts
- Add argocd Application pointing at clusters/au-syd1/bootstrap so
  ArgoCD manages its own install going forward

Reviewed-on: #112
2026-05-03 22:47:53 +10:00
unkinben 6fb98d66b0 chore: add vault CA cert to argocd-tls-certs-cm for helm TLS trust (#111)
Patches argocd-tls-certs-cm with the Vault CA chain so ArgoCD can
verify TLS when pulling Helm charts from artifactapi.k8s.syd1.au.unkin.net.

Reviewed-on: #111
2026-05-03 17:13:25 +10:00
unkinben bcea7df925 chore: swap vso to virtual helm repo (#109)
- testing if there will be any changes after merging, before merging all of them

Reviewed-on: #109
2026-05-03 16:49:53 +10:00
unkinben f45194282b chore: add resource requests/limits to workflows (#110)
have seen some contention on woodpecker jobs, because they are not being
scheduled correctly. we need to set correct limits/requests so that they
can be accurately scheduled.

- set limits/requests for all workflows

Reviewed-on: #110
2026-05-03 16:49:46 +10:00
unkinben 260b2d4364 chore: mount vault CA cert for Node.js TLS trust in paperclip (#108)
Mount the vault-ca-cert secret and set NODE_EXTRA_CA_CERTS so Node.js
trusts the internal CA chain when making outbound TLS connections.
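
A sketch of the pod spec fragment this describes. The container name, volume name, mount path and the `ca.crt` key are assumptions; only the `vault-ca-cert` secret and `NODE_EXTRA_CA_CERTS` come from the change itself:

```yaml
# Fragment of a Deployment's pod template (spec.template.spec).
containers:
  - name: paperclip                    # assumed container name
    env:
      - name: NODE_EXTRA_CA_CERTS
        value: /etc/ssl/vault/ca.crt   # assumed path and key name
    volumeMounts:
      - name: vault-ca
        mountPath: /etc/ssl/vault
        readOnly: true
volumes:
  - name: vault-ca
    secret:
      secretName: vault-ca-cert
```

`NODE_EXTRA_CA_CERTS` extends Node.js's bundled trust store rather than replacing it, so public CAs remain trusted.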

Reviewed-on: #108
2026-05-03 00:10:08 +10:00
unkinben 156b545249 fix: set Host header on paperclip health probes to bypass hostname guard (#107)
The privateHostnameGuard middleware blocks requests where the Host header
is not in the allowlist. Kubelet httpGet probes use the pod IP as the
Host header, which is never in the allowlist. Setting Host: localhost
ensures probes are always permitted.
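
In probe terms this is the `httpHeaders` field on `httpGet`. A minimal sketch, where the path and port are placeholders rather than the values used in the repo:

```yaml
# Fragment of a container spec; path and port are assumed.
livenessProbe:
  httpGet:
    path: /health        # assumed health endpoint
    port: 3100           # assumed container port
    httpHeaders:
      - name: Host
        value: localhost # overrides the default Host (the pod IP)
```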

Reviewed-on: #107
2026-05-02 23:01:59 +10:00
unkinben 0883f327e9 chore: update trusted hostnames (#106)
- remove scheme from paperclip.k8s..
- add localhost (what probe is hitting)

Reviewed-on: #106
2026-05-02 22:40:21 +10:00
unkinben 04b7c04366 chore: fix livenessProbe for paperclip (#105)
Reviewed-on: #105
2026-05-02 22:28:52 +10:00
unkinben 9914186fd5 chore: additional paperclip environment variables (#104)
https://github.com/paperclipai/paperclip/issues/3121
Reviewed-on: https://git.unkin.net/unkin/argocd-apps/pulls/104
2026-05-02 22:11:38 +10:00
unkinben f55b7065f1 fix: rename pgpooler to include rw (#103)
- undo previous change (target pgcluster name)
- actually rename the pgpooler

Reviewed-on: #103
2026-05-02 21:39:51 +10:00
unkinben 87a5a271c3 fix: set pgpooler name to include -rw (#102)
- this matches the credentials set for paperclip

Reviewed-on: #102
2026-05-02 21:35:23 +10:00
unkinben 8e7bc289f6 chore: enable access to paperclip namespace (#101)
Reviewed-on: #101
2026-05-02 21:30:59 +10:00
unkinben e156cd10bd feat: deploy paperclip to au-syd1 via ArgoCD (aitooling project) (#100)
Adds base manifests and au-syd1 overlay for Paperclip (AI agent
orchestration platform), following the litellm deployment pattern.
Updates aitooling ApplicationSet to include the paperclip path.

Closes #99

Reviewed-on: #100
2026-05-02 21:27:51 +10:00
unkinben fe714694bf chore: bump artifactapi to 2.7.2 (#98)
Reviewed-on: #98
2026-05-02 17:19:56 +10:00
unkinben 6138afb98b feat: add litellm-env configmap with STORE_MODEL_IN_DB=True (#97)
Reviewed-on: #97
2026-05-01 22:17:53 +10:00
unkinben 949ddb76e4 chore: litellm ooming (#95)
- update memory and cpu resources

Reviewed-on: #95
2026-05-01 21:54:00 +10:00
unkinben 5372914803 feat: add litellm to new aitooling ArgoCD project (#94)
Deploys LiteLLM proxy with CNPG PostgreSQL (3-instance HA), PgBouncer
pooler, and Redis cache. Introduces a dedicated aitooling AppProject and
ApplicationSet to keep AI tooling services separate from platform infra.

Reviewed-on: #94
2026-05-01 21:40:26 +10:00
unkinben 67bb54f092 fix: artifactapi remotes (#93)
- split each yaml into its own mount

Reviewed-on: #93
2026-05-01 21:17:16 +10:00
unkinben fc568dc8b5 feat: split artifactapi config into conf.d and update to v2.7.1 (#92)
Split monolithic remotes.yaml into per-type-package files under
resources/conf.d/ to align with artifactapi v2.7.1 directory loading.
Updated schema: virtuals/locals use dedicated top-level keys, type field
removed. Added helm remotes for all kustomize helmCharts repos and
OCI patterns to docker remotes. CONFIG_PATH now points to the directory.

Reviewed-on: #92
2026-04-30 23:59:01 +10:00
unkinben 1c2c18697d feat: update artifactapi to 2.3.0 (#91)
- update to mutable/immutable ttl/patterns
- reorganised paths to correct patterns

Reviewed-on: #91
2026-04-27 13:16:02 +10:00
unkinben f2af65bc92 fix: update include patterns (#90)
- hadolint and nvim were wrong, updating

Reviewed-on: #90
2026-04-26 16:20:53 +10:00
unkinben fdca69d99a feat: update github remotes (#89)
- enable access to all tagged, master and main branches as tar/gzip
- enable access to additional tool releases

Reviewed-on: #89
2026-04-26 16:05:57 +10:00
unkinben f80be18220 benvin/dockerremotes (#88)
Reviewed-on: #88
2026-04-25 22:34:59 +10:00
unkinben 3a6d93bc3c feat: add woodpeckerci/plugin-docker-buildx to WOODPECKER_PLUGINS_PRIVILEGED (#87)
Plugin is no longer privileged by default in Woodpecker; explicitly list
both the standard and latest-insecure variants.

Reviewed-on: #87
2026-04-25 20:48:46 +10:00
unkinben 7535d655fe feat: add docker remotes to artifactapi (#86)
- set artifactapi to specific version
- add dockerhub and ghcr to remotes

Reviewed-on: #86
2026-04-25 17:40:35 +10:00
unkinben 3fc9cfa41a feat: add claude-code remote (#85)
Reviewed-on: #85
2026-04-25 11:20:47 +10:00
unkinben 7d555cd31a feat: migrate purelb to ArgoCD (#84)
Migrate PureLB load balancer from Terragrunt to ArgoCD/Kustomize.
Deploys purelb v0.13.0 with two LBNodeAgent and two ServiceGroup CRs
(common: 198.18.200.0/24, dmz: 198.18.199.0/24).
Adds LBNodeAgent and ServiceGroup to kubeconform skip list (no CRD catalog schema).

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #84
2026-04-07 19:52:17 +10:00
unkinben f0bdc0231a feat: migrate vso-system to ArgoCD (#81)
Migrate Vault Secrets Operator from Terragrunt to ArgoCD/Kustomize.
Deploys vault-secrets-operator v1.2.0 with 3 replicas, plus ClusterRole,
ClusterRoleBindings, and vault-admin ServiceAccount.

Note: static service account tokens (kubernetes.io/service-account-token)
cannot be stored in git; create manually or via Vault after deployment.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #81
2026-04-07 19:33:50 +10:00
unkinben b100f3034e feat: migrate observability to ArgoCD (#82)
Migrate Victoria Metrics cluster and agent from Terragrunt to ArgoCD/Kustomize.
Creates new observability AppProject and ApplicationSet.
Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with
HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape
configs) in the observability namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #82
2026-04-07 19:15:45 +10:00
unkinben c3a145acbf feat: remove jfrog container registry (#83)
it's not used and was never really installed correctly. going to change
to artifact-keeper, which promises to have the same capabilities and is
open source.

Reviewed-on: #83
2026-04-07 19:03:32 +10:00
unkinben 181bc152e7 feat: migrate vm-system to ArgoCD (#80)
Migrate Victoria Metrics operator from Terragrunt to ArgoCD/Kustomize.
Deploys victoria-metrics-operator v0.57.1 with 2 replicas in vm-system.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #80
2026-03-27 17:04:15 +11:00
unkinben 5bcbd7e1ba feat: migrate elastic-system to ArgoCD (#79)
Migrate ECK operator from Terragrunt to ArgoCD/Kustomize.
Deploys eck-operator v3.2.0 with 2 replicas and PodDisruptionBudget
in the elastic-system namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #79
2026-03-27 17:00:05 +11:00
unkinben 02195e6235 feat: migrate reposync to ArgoCD (#78)
Migrate repository sync cronjobs from Terragrunt to ArgoCD/Kustomize.
Adds four daily CronJobs (almalinux9-baseos, almalinux9-appstream, epel9,
openvox7) with associated PVCs and ConfigMaps in the reposync namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #78
2026-03-27 16:26:35 +11:00
unkinben 95c9302aa8 feat: enable downloading tea (#77)
- enable downloading the tea prebuilt binaries

Reviewed-on: #77
2026-03-26 14:02:15 +11:00
unkinben e269220228 fix: clone r10k config to /tmp/r10k-config instead of /shared (#76)
The g10k-code cronjob was failing with "Permission denied" because the
container (running as uid 999, non-root) attempted to create /shared in
the container root filesystem, which is not writable. Clone to /tmp
which is always writable by unprivileged users.

Reviewed-on: #76
2026-03-24 19:25:06 +11:00
unkinben 1388875685 fix: remove shared-config PVC from g10k cronjob, clone r10k config directly (#75)
The RWO puppetserver-shared-config PVC caused multi-attach errors when
the cronjob pod was scheduled on a different node than the previous run,
stalling the init container indefinitely. Since the config only needs to
exist for the duration of the job, remove the init container and PVC
entirely and clone the r10k config directly into /shared within the main
container before running g10k.

Reviewed-on: #75
2026-03-24 18:54:58 +11:00
unkinben 49224d4a1b fix: increase generate-types memory limit and remove invalid JVM env var (#74)
The container was OOMKilled on every run because the 256Mi limit was far
too low for `puppet generate types`. Remove PUPPETSERVER_JAVA_ARGS (only
relevant to the puppetserver JVM, not the puppet CLI) and raise the
memory limit to 1Gi / request 512Mi.
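
The resulting resources stanza on the generate-types container would be roughly:

```yaml
# Fragment of the CronJob's container spec; CPU settings are not shown.
resources:
  requests:
    memory: 512Mi
  limits:
    memory: 1Gi
```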

Reviewed-on: #74
2026-03-24 18:51:46 +11:00
unkinben 28dc8dc238 feat: update gems for puppet (#73)
- add deep_merge, ipaddr, and hiera-eyaml gems
- pin intel-device-plugins to 0.35.0

Reviewed-on: #73
2026-03-24 18:33:03 +11:00
unkinben 33420e1286 revert: remove filemapper gem install (#72)
filemapper is not available on RubyGems under that name and was causing
puppetserver-compiler to crash loop. The interfaces provider that
requires puppetx/filemapper is Debian-specific and should not be loaded
on RedHat-based puppetservers.

Reviewed-on: #72
2026-03-24 18:22:23 +11:00
unkinben 0fc1268c51 fix: install filemapper gem and deploy generate-types cronjob (#71)
The network module's interfaces provider requires puppetx/filemapper
which was not installed, causing catalog compilation failures with
"no such file to load -- puppetx/filemapper".

Adds filemapper to additional-ruby-gems.sh for puppetserver/compiler
pods, installs it directly in the generate-types cronjob (which has no
access to that script), and adds cronjob_generate-types.yaml to the
kustomization so the CronJob is actually deployed.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #71
2026-03-22 00:03:33 +11:00
unkinben c0d95b71a7 fix: connect puppetboard to puppetdb over SSL on port 8081 (#70)
Puppetboard was connecting to PuppetDB on port 8080 (plain HTTP), causing
403 Forbidden errors on the /metrics/v2 Jolokia endpoint which requires
HTTPS with a Puppet certificate. Also replaced the invalid
PUPPETDB_SSL_SKIP_VERIFY var with the correct PUPPETDB_SSL_VERIFY,
PUPPETDB_CERT, and PUPPETDB_KEY pointing to the certs already generated
by the cert-generator init container.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #70
2026-03-22 00:01:54 +11:00
unkinben 2a96d9e948 feat: add PuppetDB read-only database user and pooler (#69)
PuppetDB requires a separate read-only database user for its read pool.
Without it, it refuses to use the write user for read queries and all
/pdb/query/v4 calls fail with a 500.

- Add puppetdb_read role via CNPG managed.roles with password sourced
  from a new postgres-read-credentials Vault secret
- Grant CONNECT, USAGE, SELECT and default privileges to puppetdb_read
  via postInitApplicationSQL (must also be run manually on existing cluster)
- Add puppet-postgres-pooler-ro Pooler (type: ro) routing to replicas
- Add puppetdb-read-database-conf ConfigMap with read-database.conf
  mounted into /etc/puppetlabs/puppetdb/conf.d/ in the PuppetDB deployment
- Wire OPENVOXDB_READ_POSTGRES_* env vars from the new secret
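
A rough sketch of the CNPG side of this, using the `managed.roles` and `Pooler` fields from CloudNativePG. The cluster name, pooler instance count and pool mode are assumptions, as is the Kubernetes Secret being named `postgres-read-credentials` (the source names the Vault secret):

```yaml
# Fragment of the CNPG Cluster spec (other required fields omitted).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: puppet-postgres                     # assumed cluster name
spec:
  managed:
    roles:
      - name: puppetdb_read
        ensure: present
        login: true
        passwordSecret:
          name: postgres-read-credentials   # basic-auth Secret synced from Vault (assumed name)
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: puppet-postgres-pooler-ro
spec:
  cluster:
    name: puppet-postgres                   # assumed cluster name
  instances: 2                              # assumed
  type: ro                                  # route connections to replicas
  pgbouncer:
    poolMode: session                       # assumed
```

Because `postInitApplicationSQL` only runs when a cluster is first bootstrapped, the grants have to be applied by hand on an existing cluster, as the change notes.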

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #69
2026-03-21 23:31:01 +11:00
unkinben b49e8d3647 chore: change back to puppetdb:8081 (#68)
- puppetdb requires access via 8081 from puppetservers
- puppetservers do not trust the certificate via ingress

Reviewed-on: #68
2026-03-21 22:50:46 +11:00
unkinben 5f227939bc feat: add CronJob to generate Puppet types for all environments (#67)
- add a Kubernetes CronJob that runs every 5 minutes to automatically generate Puppet types for all environments in the code directory.

Reviewed-on: #67
2026-03-21 17:39:03 +11:00
unkinben ffc861daa7 fix: update puppet.conf with main/server/user (#66)
- master config section is not used
- server contains all settings specific to a server (puppet, puppet ca)
- user is for all puppet <command> tooling, like 'puppet generate'

Reviewed-on: #66
2026-03-21 17:16:15 +11:00
unkinben 47bd341371 chore: tidy initContainers (#65)
- make initcontainers easier to read/follow

Reviewed-on: #65
2026-03-21 17:16:07 +11:00
unkinben ee9ec23f6f chore: use docker not container (#64)
was referencing the main branch of upstream container, not the one I am
actually using. s/container/docker/

Reviewed-on: #64
2026-03-21 16:47:02 +11:00
unkinben 3f355bbfd3 feat: add custom entrypoint script for additional Ruby gems (#63)
Add support for installing additional Ruby gems via custom entrypoint script.
The script is mounted as a ConfigMap into /container-custom-entrypoint.d/
and will be executed during Puppetserver container startup.

Reviewed-on: #63
2026-03-21 16:01:46 +11:00
unkinben 00cbb6a817 fix: update ENC script CA certificate path (#62)
- Mount vault-ca-cert secret at /opt/vault-ca-cert.crt in both deployments
- Update cobbler-enc script to use correct CA certificate path
- Resolves OSError about missing TLS CA certificate bundle

Reviewed-on: #62
2026-03-20 23:05:35 +11:00
unkinben f474c5c530 feat: add shared bins volume for uv and cobbler-enc (#61)
- Add puppet-shared-bins PVC (10GB) for shared binaries
- Mount /opt/bin in both compiler and master deployments
- Add init container to install uv binary and cobbler script to shared volume
- Update cobbler-enc to use absolute path and uv cache directory
- Configure puppet.conf to reference cobbler-enc from /opt/bin

Reviewed-on: #61
2026-03-20 22:49:31 +11:00
unkinben c1ea6e1e81 fix: update puppet.conf to point to enc (#60)
enc script is in /etc/puppetlabs/puppet to ensure it's copied during the init container phase

Reviewed-on: #60
2026-03-20 21:34:40 +11:00
unkinben 3553e9f6dd refactor: simplify DNS alt names for puppetserver compiler (#59)
Remove individual compiler pod DNS names and use generic puppetserver-compiler name instead.

Reviewed-on: #59
2026-03-20 21:27:04 +11:00
unkinben 6decc45e65 fix: use http port for puppetdb (#58)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): puppetdb:8081
ERROR:pypuppetdb.api.base:Could not reach PuppetDB on puppetdb:8081 over HTTP.

- puppetdb_host assumes HTTP when not verifying ssl

Reviewed-on: #58
2026-03-20 21:26:52 +11:00
unkinben c2d23aaeae refactor: convert puppetserver compilers to deployment with configmap integration (#57)
- Convert StatefulSet to Deployment for better scaling flexibility
- Add initContainer to copy configmaps to shared RWX volume (10GB)
- Integrate puppetserver-compiler-config configmap for environment variables
- Configure configMapGenerator with stable names (disableNameSuffixHash)
- Update HPA to target Deployment instead of StatefulSet
- Simplify puppetboard SSL config to skip verification for internal connections

Reviewed-on: #57
2026-03-20 20:47:36 +11:00
unkinben f25117ab7f testing via ingress for puppetdb (#56)
Reviewed-on: #56
2026-03-20 00:00:41 +11:00
unkinben 47b894c450 enable debugging for puppetboard (#55)
Reviewed-on: #55
2026-03-19 23:56:49 +11:00
unkinben 059992f6a3 fix: external access to puppetdb (#53) (#54)
- use vault cert for puppetdb ingress

Reviewed-on: #53

Reviewed-on: #54
2026-03-19 23:32:27 +11:00
unkinben 6ffb0898a4 fix: external access to puppetdb (#53)
- use vault cert for puppetdb ingress

Reviewed-on: #53
2026-03-19 23:26:02 +11:00
unkinben 30d56030b5 fix: increase number of cnpg_pooler_connections (#52)
in previous puppet installs, the puppetdb api service opens MANY
connections. we need to increase the number to greater than 300.

Reviewed-on: #52
2026-03-19 18:37:03 +11:00
unkinben 504d4ae7c9 fix: enable PuppetDB HTTPS support with automatic SSL certificate generation (#51)
This enables secure HTTPS communication to PuppetDB, required for other puppet related services

- make use of USE_OPENVOXSERVER flag

Reviewed-on: #51
2026-03-19 17:06:49 +11:00
unkinben 24d09744e3 fix: configure PuppetDB HTTPS connections and add Puppetboard SSL support (#50)
- Update PuppetDB connections from HTTP (8080) to HTTPS (8081)
- Add automatic certificate generation for Puppetboard using Puppet CA
- Implement initContainers for proper certificate provisioning before app start
- Add dedicated PVC for Puppetboard certificates with RWX access
- Configure SSL verification and client authentication for secure PuppetDB access

Reviewed-on: #50
2026-03-19 16:34:41 +11:00
unkinben 301f8dcc1a fix: add NodeFeatureRule and Intel device plugin permissions to platform project (#49)
- Add nfd.k8s-sigs.io/NodeFeatureRule for node-feature-discovery
- Add deviceplugin.intel.com/* for Intel device plugins (GpuDevicePlugin, etc.)
- Add cert-manager.io resources (Certificate, Issuer) for Intel device plugins

Reviewed-on: #49
2026-03-19 02:20:32 +11:00
unkinben dfbb315522 feat: migrate node-feature-discovery and inteldeviceplugins-system to platform project (#48)
- Add node-feature-discovery and inteldeviceplugins-system to platform project
- Convert intel-nfd-rules from local Helm chart to static NodeFeatureRule manifests
- Add required Helm repositories (NFD OCI registry and Intel charts)
- Create base configurations with Helm charts and overlay structures
- Update platform ApplicationSet and project permissions

Reviewed-on: #48
2026-03-19 02:14:45 +11:00
unkinben d641f630e9 fix: change puppet compilers to use HTTP for internal puppetdb connections (#47)
This resolves SSL certificate verification failures preventing puppetdb access

- Update OPENVOXDB_SERVER_URLS from https://puppetdb:8081 to http://puppetdb:8080
- External access to puppetdb will still use HTTPS via ingress
- Internal cluster communication does not require encryption

Reviewed-on: #47
2026-03-19 01:51:11 +11:00
unkinben c157774033 fix: enable ServerSideApply for ArgoCD ApplicationSets (#46)
- resolve CRD annotation size limit errors by enabling server-side apply
- add storage ApplicationSet and project to kustomization files
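
For reference, server-side apply is enabled through a sync option on the generated Applications. A sketch of the relevant fragment, with the ApplicationSet name assumed and generators and the rest of the template omitted:

```yaml
# Fragment of an ApplicationSet; only the sync option is shown.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform               # assumed name
spec:
  template:
    spec:
      syncPolicy:
        syncOptions:
          - ServerSideApply=true   # avoids the last-applied-configuration annotation size limit
```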

Reviewed-on: #46
2026-03-19 01:37:56 +11:00
unkinben 90f793464b feat: migrate CSI drivers to dedicated storage project (#45)
- Migrate csi-cephfs from Terraform to ArgoCD
- Migrate csi-cephrbd from Terraform to ArgoCD
- Create dedicated storage project and ApplicationSet for CSI drivers
- Add csi-* pattern matching in storage ApplicationSet
- Remove CSI apps from platform project to separate concerns

Reviewed-on: #45
2026-03-19 01:29:31 +11:00
unkinben 06a8f98b5c feat: migrate cnpg-system from Terraform to ArgoCD (#44)
- Add cnpg-system base ArgoCD application with namespace
- Create cnpg-system overlay for au-syd1 with CloudNativePG Helm chart
- Update platform ApplicationSet to include cnpg-system deployment
- Configure cloudnative-pg operator v0.27.0 with HA and resource limits
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #44
2026-03-19 01:25:50 +11:00
unkinben 0bf6e80d6f feat: migrate externaldns from Terraform to ArgoCD (#43)
- Add externaldns base ArgoCD application with namespace and Vault integration
- Create externaldns overlay for au-syd1 with Helm chart configuration
- Update platform ApplicationSet to include externaldns deployment
- Configure external-dns v1.19.0 with RFC2136 provider for DNS updates
- Maintain one-to-one migration from Terraform configuration including TSIG secrets

Reviewed-on: #43
2026-03-19 01:22:39 +11:00
unkinben ed300fabed feat: migrate cert-manager from Terraform to ArgoCD (#42)
- Add cert-manager base ArgoCD application with namespace, RBAC resources
- Create cert-manager overlay for au-syd1 with Helm chart configuration
- Update platform ApplicationSet to include cert-manager deployment
- Configure cert-manager v1.19.2 with jetstack Helm repository
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #42
2026-03-19 01:18:19 +11:00
unkinben 656aedfc53 fix: enable unscoped permissions (#41)
- add access to create priorityclass resources in platform applicationset

Reviewed-on: #41
2026-03-19 01:03:54 +11:00
unkinben ea71ebb55b feat: migrate cattle-system (Rancher) from Terraform to ArgoCD (#39)
- Add cattle-system base ArgoCD application with namespace, Vault integration, and ingress
- Create cattle-system overlay for au-syd1 with Rancher Helm chart configuration
- Update platform ApplicationSet to include cattle-system deployment
- Update platform project to include Rancher Helm repository as source
- Configure Rancher v2.13.1 with HA, TLS, audit logging, and bootstrap secret from Vault
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #39
2026-03-19 00:56:39 +11:00
unkinben 5255c78927 chore: bump kubetest container (#40)
unkin/packer-images#43

Error: Error: chart requires kubeVersion: < 1.35.0-0 which is incompatible with Kubernetes v1.35.0

Reviewed-on: #40
2026-03-19 00:55:30 +11:00
unkinben 8207935d36 fix: cannot write to certificates namespace (#38)
- enable the platform application to write to certificates namespace

Reviewed-on: #38
2026-03-19 00:20:39 +11:00
unkinben 3f282fbdc2 feat: migrate certificates from Terraform to ArgoCD (#37)
- Add certificates base ArgoCD application with namespace and Vault CA certificate secret
- Create certificates overlay for au-syd1 with static certificate configuration
- Update platform ApplicationSet to include certificates deployment
- Configure Vault CA certificate with reflector annotations for cross-namespace replication
- Maintain one-to-one migration from Terraform configuration

Note: Skip no_plain_secrets hook as this is a public CA certificate that needs
to be replicated via reflector, not a sensitive secret

Reviewed-on: #37
2026-03-19 00:16:33 +11:00
unkinben 3961fe4e68 fix: annotations, not labels (#36)
<picard face palm gif>

- purelb requires annotations not labels

Reviewed-on: #36
2026-03-18 15:17:58 +11:00
unkinben e86cd7a6ae feat: ensure puppet is available externally (#35)
- change puppet/puppetca -> LoadBalancer
- dedicate IPs for puppet and puppetca loadbalancers
- name the puppetserver port
- remove puppet/puppetca ingress

Reviewed-on: #35
2026-03-18 15:07:25 +11:00
unkinben 88fe895409 fix: puppetboard port issues (#34)
service / ingress / deployment mismatch, attempt 2

Reviewed-on: #34
2026-03-18 14:31:43 +11:00
unkinben 687a7f1ffd fix: svc/puppetboard forwarding to wrong port (#33)
puppetboard uses `PUPPETBOARD_PORT` to specify the port, otherwise it
listens on tcp/80

```
ENV PUPPETBOARD_PORT 80
ENV PUPPETBOARD_HOST 0.0.0.0
ENV PUPPETBOARD_STATUS_ENDPOINT /status
ENV PUPPETBOARD_SETTINGS docker_settings.py
EXPOSE 80
```

- change svc/puppetboard to use tcp/80

Reviewed-on: #33
2026-03-18 14:25:00 +11:00
unkinben 64fb4da04c fix: puppetboard tcp is not a valid port (#32)
puppetdb_port has tcp:// in it, even though we pass the correct variable
in from a configmap.

```
ben@metabox ~/s/p/argocd-apps> kubectl --context admin run debug-pod --image=busybox --rm -it --restart=Never -n puppet -- env | grep -i puppetdb_port
PUPPETDB_PORT_8081_TCP_PORT=8081
PUPPETDB_PORT_8081_TCP_PROTO=tcp
PUPPETDB_PORT=tcp://10.43.101.142:8080
PUPPETDB_PORT_8080_TCP=tcp://10.43.101.142:8080
PUPPETDB_PORT_8080_TCP_ADDR=10.43.101.142
PUPPETDB_PORT_8081_TCP=tcp://10.43.101.142:8081
PUPPETDB_PORT_8080_TCP_PROTO=tcp
PUPPETDB_PORT_8081_TCP_ADDR=10.43.101.142
PUPPETDB_PORT_8080_TCP_PORT=8080
```

Reviewed-on: #32
2026-03-18 12:51:54 +11:00
unkinben 35f00858ae fix: puppet-compiler can't find ca (#31)
the puppetca is not pointing to the puppetmasters which prevents the
puppet-compilers from starting, preventing puppetdb/puppetboard from
starting.

- point puppetca service -> puppetserver-master

Reviewed-on: #31
2026-03-18 12:39:38 +11:00
unkinben 276d8c1d78 fix: update service names and references (#30)
updating all the names of services and their respective filenames to
better match the way puppet infra is used in my lab.

- puppet -> the compilers
- puppetca -> the master(s)
- puppetdb -> the puppetdb
- puppetboard -> puppetboard

updated references to these services in all other definitions I could find

note: need a good way to test these changes with argocd

Reviewed-on: #30
2026-03-18 12:19:57 +11:00
unkinben df1b9a5685 feat: complete puppet infrastructure (#29)
complete the implementation of puppet in kubernetes, taking many
features from the openvox helm chart and improving on them. changes from
helm are:
- using vault for storing secrets
- using g10k instead of r10k
- using a single shared g10k cronjob for all masters/compilers
- using a single shared /etc/puppetlabs/code directory (shared, cephfs)

changes:
- deploy puppet master and compiler servers with statefulset/deployment
- deploy puppetdb with postgresql backend, taking advantage of cnpg cluster and pooler
- deploy puppetboard
- all supporting configmaps, services, ingresses, and hpas
- added vaultstaticsecret for eyaml private keys
- configured secure mounting of eyaml keys at /var/lib/puppet/keys/
- updated base kustomization to include all 23 new puppet resource files

Reviewed-on: #29
2026-03-17 20:25:11 +11:00
unkinben 13de81a192 chore: cleanup r10k cache (#28)
g10k hardlinks, so it requires that the cache and code be in the same pvc.
updated r10k repository with cachedir in same pvc, and so now I can
remove these unused pvcs from argo.

unkin/puppet-r10k#4

Reviewed-on: #28
2026-03-17 19:05:21 +11:00
unkinben 02877b6385 fix: include puppet pvc yaml (#27)
- ensure the persistentvolumeclaims.yaml is included in kustomize

Reviewed-on: #27
2026-03-09 01:33:40 +11:00
unkinben b4d6fede98 chore: use specific images for ci tests (#26)
- kubetest contains required rpms
- base contains uv/make

Reviewed-on: #26
2026-03-09 01:13:33 +11:00
unkinben 14e3946d4b feat: initial puppet deployment (#25)
working towards a larger, redundant, autoscaling and simple puppet
implementation in kubernetes. this was originally based on the openvox
helm chart with several improvements (not all in this pr)

- use of cnpg instead of single bitnamilegacy postgres container
- use of g10k instead of r10k
- run one instance of g10k per namespace, instead of per-pod
- keep only one copy of the environments/branches (instead of per-pod)
- change g10k to native cronjob instead of hacky implementation
- use vault secrets

part one adds:

- cnpg puppetdb pgsql cluster
- cnpg puppetdb pgpooler
- persistent volume claims for puppet, puppetdb, the code repository, etc

Reviewed-on: #25
2026-03-09 01:10:30 +11:00
unkinben 68b753d7fa chore: reload woodpecker (#24)
- add reloader annotations to woodpecker agent/server

Reviewed-on: #24
2026-03-07 16:02:39 +11:00
unkinben d7b661a619 chore: set WOODPECKER_ADMIN (#23)
- enable admin features for myself

Reviewed-on: #23
2026-03-07 15:47:42 +11:00
unkinben 2f6a56d15e chore: add rarlab remote (#22)
- cache rarlab packages
- found they disappear when a new release is available

Reviewed-on: #22
2026-03-07 12:14:04 +11:00
unkinben 563b81c5d2 feat: updates for artifactapi (#21)
- remove replicas (rely on the horizontal pod autoscaler)
- add raw.githubusercontent.com remote

Reviewed-on: #21
2026-03-07 00:49:30 +11:00
unkinben e2ada738f8 fix: remove configmap hash (#20)
prevent the automatic hashing of configmaps

Reviewed-on: #20
2026-03-06 22:11:11 +11:00
unkinben 61b3546c2c fix: copy/paste error (#19)
- use correct role for artifactapi to access vault

Reviewed-on: #19
2026-03-06 21:46:01 +11:00
unkinben 05a88459a5 chore: migrate artifactapi to kustomize (#18)
- migrate terraform deployment to kustomize

Reviewed-on: #18
2026-03-06 21:35:47 +11:00
unkinben 0894e51ad5 feat: manage woodpecker-agent-secret in vault (#17)
- unkin/terraform-vault#60

Reviewed-on: #17
2026-03-06 18:33:21 +11:00
unkinben f9a8dca060 chore: change max workflows to string (#16)
WOODPECKER_MAX_WORKFLOWS shows no value in the pods environment, trying
as a string instead

Reviewed-on: #16
2026-03-03 23:14:05 +11:00
unkinben 46e11dd05e chore: increase agents to 3 (#15)
- increase woodpecker agents to 3 for parallel jobs

Reviewed-on: #15
2026-03-03 23:02:15 +11:00
unkinben 244d1b5baa fix: remove revision for pooler (#14)
- artifact from migrating yaml from k8s to argocd

Reviewed-on: #14
2026-03-03 22:50:45 +11:00
unkinben dbd8914013 feat: migrate woodpecker to argocd (#13)
- move woodpecker helm chart deployment to argocd
- move cnpg resources
- move vault resources

Reviewed-on: #13
2026-03-03 22:24:17 +11:00
553 changed files with 154402 additions and 91 deletions
+273
@@ -0,0 +1,273 @@
---
description: Pull master, read open issues, pick one, branch, implement, test, commit, PR, and comment.
---
# Solve a Gitea Issue
## Current repo state
```!
git status --short
echo "Current branch: $(git branch --show-current)"
echo "Remote: $(git remote get-url origin 2>/dev/null || echo 'none')"
```
## Open issues (with full body)
```!
echo "Fetching open issues..."
issue_ids=$(tea issues list --output simple 2>/dev/null | awk 'NF && $1 ~ /^[0-9]+$/ {print $1}')
if [ -z "$issue_ids" ]; then
  echo "No open issues found (or tea is not logged in)."
else
  for id in $issue_ids; do
    echo ""
    echo "══════════════════════════════════════"
    tea issues view "$id" --fields index,title,body 2>/dev/null \
      || tea issue "$id" 2>/dev/null \
      || echo " (could not read issue #$id)"
    echo "══════════════════════════════════════"
  done
fi
```
---
## Your task
Follow these steps **in order**. Do not skip steps.
### 1 — Choose an issue
Present the issues above to the user as a numbered list (index, one-line title). Ask which one to work on. Wait for the answer before continuing.
### 2 — Sync master
```bash
git checkout master
git pull
```
Confirm you are on master and up to date.
### 3 — Create a branch
Name the branch `benvin/issue-<N>-<short-slug>` where `<short-slug>` is 2 to 4 kebab-case words from the issue title.
```bash
git checkout -b benvin/issue-<N>-<slug>
```
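As a rough illustration of the naming rule, the slug could be derived mechanically from the issue title. This is only a sketch: `branch_name` is a hypothetical helper, not something in the repo, and it assumes the slug is capped at the first four words of the title.

```bash
# Hypothetical helper: build a branch name from an issue number and title.
branch_name() {
  local number="$1" title="$2" slug
  # Lowercase, turn every run of non-alphanumerics into one hyphen,
  # trim stray hyphens, then keep at most the first four words.
  slug=$(printf '%s' "$title" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -d- -f1-4)
  printf 'benvin/issue-%s-%s\n' "$number" "$slug"
}

# Usage: git checkout -b "$(branch_name 12 'Add deb package support')"
```

A mechanical slug keeps whichever words come first, so picking the words by hand may still give a clearer branch name.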
### 4 — Read the issue in full
Re-read the full issue body shown above. If any part is ambiguous, state your interpretation before coding.
**If you discover other problems while working:** do NOT solve them inline. Create a new Gitea issue with `tea issues create --title "..." --description "..."` and stay focused on the assigned issue.
### 5 — Implement the solution
Make the code changes needed to resolve the issue. Follow the conventions already in the repo:
- `main.py` route handlers each contain a single function call; logic lives in submodules.
- No comments unless the WHY is non-obvious.
- No new files unless the issue or architecture requires it.
- Security: no command injection, XSS, SQL injection, or secrets in code.
- **For performance improvements:** implement at the most generic call site possible so the fix applies to all current and future implementations, not just the one being tested.
### 6 — Update tests
Add or update tests that cover the new behaviour. Tests live in `tests/`. Check existing test structure before writing new ones — mirror the style and fixture patterns already in use.
### 7 — Update README
If the feature introduces new config keys, endpoints, or user-facing behaviour, document it in `README.md`. Keep additions concise — follow the existing section style.
### 8 — Run the full test suite
```bash
make test
```
All tests must pass. If any fail, fix them before proceeding. Do not skip or suppress failing tests.
### 9 — Live Docker test (new package type only)
**Skip this step if the issue does not add a new remote package type.**
If the issue adds a new package type (e.g. `deb`, `conda`, `cargo`, `rubygems`, or any type not already in `remotes.yaml`), do the following before committing.
#### 9a — Add a real test remote to remotes.yaml
Append a valid, publicly accessible remote of the new type to `remotes.yaml`. Use a real upstream URL and patterns that cover both an immutable file (versioned artifact) and a mutable file (index/metadata). Add a comment explaining which URLs to use for manual testing.
#### 9b — Start the stack
```bash
make docker-up
```
Wait until `curl -s http://localhost:8000/health` returns `{"status":"healthy"}`.
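The wait can be scripted as a bounded retry loop rather than checked by hand. A minimal sketch, assuming the health endpoint and JSON body quoted above; `wait_until` and `stack_is_healthy` are illustrative names, and the attempt count and delay are arbitrary.

```bash
# Retry a command until it succeeds or the attempts run out.
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds only when the health endpoint reports the healthy status.
stack_is_healthy() {
  curl -fsS http://localhost:8000/health \
    | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Usage: wait_until 30 2 stack_is_healthy || echo "stack did not become healthy"
```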
#### 9c — Test a mutable file (first fetch — cache miss)
Download the index or metadata file for the new remote. Confirm:
- HTTP 200
- `X-Artifact-Source: remote` header (or equivalent log line confirming a cache miss)
- Content looks correct (not empty, not an error page)
```bash
curl -sv "http://localhost:8000/api/v1/remote/<new-remote>/<mutable-path>" 2>&1 | grep -E "< HTTP|X-Artifact"
```
#### 9d — Test a mutable file (second fetch — cache hit)
Repeat the exact same request. Confirm:
- HTTP 200
- `X-Artifact-Source: cache`
```bash
curl -sv "http://localhost:8000/api/v1/remote/<new-remote>/<mutable-path>" 2>&1 | grep -E "< HTTP|X-Artifact"
```
#### 9e — Test an immutable file (first fetch — cache miss)
Download a versioned/immutable artifact. Confirm HTTP 200 and a cache-miss log line.
```bash
curl -sv "http://localhost:8000/api/v1/remote/<new-remote>/<immutable-path>" 2>&1 | grep -E "< HTTP|X-Artifact"
```
#### 9f — Test an immutable file (second fetch — cache hit)
Repeat. Confirm `X-Artifact-Source: cache`.
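The miss-then-hit checks in 9c to 9f can be compared programmatically by pulling the header value out of the verbose curl output. A sketch under the assumption that the proxy sets `X-Artifact-Source` as described above; `artifact_source` is a hypothetical helper name.

```bash
# Print the X-Artifact-Source value found in `curl -sv` output on stdin.
# Header names are matched case-insensitively and CRs are stripped first.
artifact_source() {
  tr -d '\r' \
    | awk 'tolower($0) ~ /^< x-artifact-source:/ { print $NF; exit }'
}

# Usage (stderr carries the headers, the body is discarded):
#   first=$(curl -sv "$url" 2>&1 >/dev/null | artifact_source)
#   second=$(curl -sv "$url" 2>&1 >/dev/null | artifact_source)
#   [ "$first" = "remote" ] && [ "$second" = "cache" ]
```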
#### 9g — Check container logs
```bash
make docker-logs
```
Scan for:
- `Cache MISS` on first fetches, `Cache HIT` on second fetches
- `Cache ADD SUCCESS` with correct sizes
- No unhandled exceptions or ERROR lines
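The scan above could be roughed out as a small filter over the log stream. This sketch assumes the log lines contain the literal strings `Cache MISS`, `Cache HIT` and `ERROR` exactly as listed; `scan_cache_logs` is an illustrative name, and it does not check the sizes reported by `Cache ADD SUCCESS`.

```bash
# Summarise proxy log lines read from stdin; exit non-zero if any ERROR appears.
scan_cache_logs() {
  awk '
    /Cache MISS/ { miss++ }
    /Cache HIT/  { hit++ }
    /ERROR/      { err++ }
    END {
      printf "miss=%d hit=%d error=%d\n", miss, hit, err
      exit (err > 0)
    }'
}

# Usage, assuming `make docker-logs` prints and exits rather than following:
#   make docker-logs 2>&1 | scan_cache_logs
```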
#### 9h — Exercise package-type tooling against the proxy
Use the native tooling for this package type to verify end-to-end behaviour. Examples:
| Package type | Command |
|---|---|
| `pypi` | `uv run --index-url http://localhost:8000/api/v1/remote/<remote>/simple <tool>` |
| `npm` | `npm install --registry http://localhost:8000/api/v1/remote/<remote>/ <pkg>` |
| `helm` | `helm repo add test http://localhost:8000/api/v1/remote/<remote> && helm search repo test && helm template test/<chart>` |
| `alpine` | `apk fetch --repository http://localhost:8000/api/v1/remote/<remote>/<branch>/<arch> <pkg>` |
| `rpm` | `dnf install --repofrompath ... <pkg>` or `repoquery` |
| `generic` | `curl` / `wget` as appropriate |
Confirm the tool resolves and downloads correctly through the proxy.
#### 9i — Tear down
```bash
make docker-down
```
Fix any failures found during 9b to 9h before moving on.
### 9.5 — Performance issues: measure before/after and gate the PR
**Skip this step if the issue is not a performance improvement.**
For performance issues, a PR is only warranted if there is a measurable gain. Use the Docker stack to compare before and after.
#### 9.5a — Baseline measurement (before)
Start the stack with the **unmodified** code (temporarily revert your change):
```bash
make docker-up
```
Warm or clear the cache as appropriate, then measure the relevant metric — e.g. concurrent request latency during a slow operation, response time for a specific endpoint, or throughput. Record the numbers.
#### 9.5b — Apply your change and rebuild
```bash
make docker-up # rebuilds the image
```
Repeat exactly the same measurement. Record the numbers.
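One way to get comparable before/after numbers is to time a fixed number of requests with curl's `time_total` write-out variable and average them. A sketch only: `mean` and `mean_latency` are hypothetical helpers, the default of 20 runs is arbitrary, and sequential requests will not capture the concurrent-latency case mentioned above.

```bash
# Print the mean of one number per line read from stdin.
mean() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.3f\n", sum / n }'
}

# Time N sequential GETs of a URL and print the mean total time in seconds.
mean_latency() {
  local url="$1" runs="${2:-20}" i=0
  while [ "$i" -lt "$runs" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done | mean
}

# Usage: mean_latency "http://localhost:8000/api/v1/remote/<remote>/<path>" 50
```

Running the same command with the same run count before and after the change keeps the two figures comparable.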
#### 9.5c — Decide
If the improvement is not clearly measurable, **do not open a PR**. Instead:
1. Update the issue with your findings.
2. Note any conditions under which the improvement would be observable.
3. Skip steps 11 to 14.
If the improvement is clear, proceed with the commit and PR. Include the before/after numbers in the PR description and the issue comment.
#### 9.5d — Tear down
```bash
make docker-down
```
### 10 — Build the wheel (smoke check)
```bash
uv build --wheel
```
Confirm the build succeeds.
### 11 — Stage and commit
Stage only the files you changed. Do not use `git add -A` or `git add .` — list files explicitly. Run:
```bash
git add <file1> <file2> ...
git commit
```
The commit message must:
- Start with a conventional-commit prefix (`feat:`, `fix:`, `refactor:`, `chore:`, etc.)
- Summarise the change in ≤ 72 characters on the first line
- Optionally include a short body explaining *why* (not *what*)
If the pre-commit hook auto-fixes files, re-stage the fixed files and commit again.
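The subject-line rules above are easy to check before committing. A minimal sketch, assuming an optional `(scope)` is acceptable (the commit log uses `chore(artifactapi):`) and that the prefix list beyond `feat`, `fix`, `refactor` and `chore` is my own guess at the "etc."; `valid_subject` is a hypothetical helper.

```bash
# Return 0 when a commit subject has a conventional prefix and fits in 72 chars.
valid_subject() {
  local subject="$1"
  [ "${#subject}" -le 72 ] || return 1
  printf '%s\n' "$subject" \
    | grep -Eq '^(feat|fix|refactor|chore|docs|test|perf|ci|build)(\([a-z0-9-]+\))?: .+'
}

# Usage: valid_subject "fix: remove configmap hash" && git commit -m "fix: remove configmap hash"
```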
### 12 — Push the branch
```bash
git push origin <branch-name>
```
### 13 — Open a pull request
```bash
tea pulls create \
--base master \
--head <branch-name> \
--title "<same as commit subject>" \
--description "Closes #<N>\n\n## Summary\n<bullet points>\n\n## Test plan\n<what was verified>"
```
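In bash, `\n` inside double quotes is passed through as a literal backslash and `n`, and the text does not say whether `tea` expands it. A sketch that sidesteps the question by building the description with real newlines; `pr_body` is a hypothetical helper and the section headings are copied from the command above.

```bash
# Build a PR description with real newlines (printf expands \n in its format).
pr_body() {
  local issue="$1" summary="$2" test_plan="$3"
  printf 'Closes #%s\n\n## Summary\n%s\n\n## Test plan\n%s\n' \
    "$issue" "$summary" "$test_plan"
}

# Usage:
#   tea pulls create --base master --head "$branch" --title "$subject" \
#     --description "$(pr_body 12 '- add deb remote type' '- make test, live Docker test')"
```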
### 14 — Comment on the issue
```bash
tea comment <N> "<resolution comment>"
```
The comment must cover:
- **How it was resolved** — what changed and why
- **Issues encountered** — any non-obvious problems hit during implementation
- **Potential future improvements** — what could be done next
### 15 — Return to master
```bash
git checkout master
```
Report the PR URL and a one-sentence summary to the user.
+2
@@ -7,6 +7,7 @@ repos:
- id: check-json
- id: check-added-large-files
args: ['--maxkb=500']
exclude: '^schemas/'
- id: check-merge-conflict
- id: check-shebang-scripts-are-executable
- id: check-symlinks
@@ -19,6 +20,7 @@ repos:
- id: end-of-file-fixer
- id: forbid-new-submodules
- id: pretty-format-json
args: ['--autofix']
- id: trailing-whitespace
# YAML linting
+11 -2
@@ -3,7 +3,16 @@ when:
steps:
- name: kubeconform
image: git.unkin.net/unkin/almalinux9-base:latest
image: git.unkin.net/unkin/almalinux9-kubetest:20260606
commands:
- dnf install make kustomize kubeconform helm -y
- make kubeconform
backend_options:
kubernetes:
serviceAccountName: default
resources:
requests:
memory: 512Mi
cpu: 1
limits:
memory: 2Gi
cpu: 2
+11 -2
@@ -3,7 +3,16 @@ when:
steps:
- name: pre-commit
image: git.unkin.net/unkin/almalinux9-base:latest
image: git.unkin.net/unkin/almalinux9-base:20260606
commands:
- dnf install uv make -y
- uvx pre-commit run --all-files
backend_options:
kubernetes:
serviceAccountName: default
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 1Gi
cpu: 1
+261
@@ -0,0 +1,261 @@
# AGENTS.md
## Project Overview
This is an **ArgoCD GitOps repository** that manages Kubernetes applications for the `au-syd1` cluster using a Kustomize + Helm pattern. Applications are deployed via ArgoCD ApplicationSets that watch directory patterns in this repo.
The migration pattern for this repo is: **Terragrunt/Terraform → ArgoCD** (see `migration.md` for full guide).
---
## Essential Commands
```bash
# Build and render manifests for a path (outputs to manifests/<path>/)
make build apps/overlays/au-syd1/<app-name>
make build clusters/au-syd1/bootstrap
# Validate all apps and clusters with kubeconform
make kubeconform
# Clean generated manifests
make clean
# Quick build + inspect without persisting output
kustomize build --enable-helm apps/overlays/au-syd1/<app-name>
# Check all resource kinds produced by an overlay
kustomize build --enable-helm apps/overlays/au-syd1/<app-name> | grep "^kind:" | sort | uniq -c
# Run pre-commit checks against all files
uvx pre-commit run --all-files
```
---
## Directory Structure
```
argocd-apps/
├── argocd/
│   ├── applicationsets/  # ArgoCD ApplicationSet definitions (platform.yaml, storage.yaml)
│   └── projects/  # ArgoCD AppProject definitions (platform.yaml, storage.yaml)
├── apps/
│   ├── base/  # Base Kustomize resources per app (no cluster-specific config)
│   │   └── <app-name>/
│   │       ├── kustomization.yaml
│   │       ├── namespace.yaml
│   │       ├── vaultauth.yaml  # (if Vault-managed secrets)
│   │       └── vaultstaticsecret.yaml
│   └── overlays/
│       └── au-syd1/  # Cluster-specific overlays
│           └── <app-name>/
│               ├── kustomization.yaml  # references base + helmCharts
│               └── values.yaml  # Helm values for this cluster
├── clusters/
│   └── au-syd1/
│       ├── apps/  # Entry point: references apps/base (ArgoCD app-of-apps)
│       └── bootstrap/  # ArgoCD install + initial Application manifest
├── ci/
│   ├── validate-apps.sh  # kubeconform over apps/overlays/*/kustomization.yaml
│   ├── validate-clusters.sh  # kubeconform over clusters/*/kustomization.yaml
│   └── validate-no-secrets.sh  # pre-commit hook: blocks plain Kubernetes Secrets
└── sources/  # Reference sources (Terraform configs, upstream charts, etc.)
    └── terraform-k8s/  # Original Terraform configs — reference when migrating
```
---
## Adding a New Application
Follow the steps below (detailed in `migration.md`):
### 1. Create base resources
```
apps/base/<app-name>/
├── kustomization.yaml
├── namespace.yaml
├── vaultauth.yaml # if needed
└── vaultstaticsecret.yaml # if needed
```
### 2. Create cluster overlay
```
apps/overlays/au-syd1/<app-name>/
├── kustomization.yaml
└── values.yaml
```
**Overlay kustomization.yaml pattern:**
```yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../base/<app-name>
helmCharts:
  - name: <chart-name>
    repo: <helm-repo-url>
    version: "<version>"
    releaseName: <release-name>
    namespace: <namespace>
    valuesFile: values.yaml
```
### 3. Register in ApplicationSet
Add a directory entry to `argocd/applicationsets/platform.yaml` (or `storage.yaml` for `csi-*` apps):
```yaml
- path: apps/overlays/*/<app-name>
```
### 4. Update AppProject
In `argocd/projects/platform.yaml` (or `storage.yaml`):
- Add the Helm repo URL to `sourceRepos`
- Add the namespace to `destinations`
- Add any required cluster-scoped resource types to `clusterResourceWhitelist`
### 5. Validate
```bash
kustomize build --enable-helm apps/overlays/au-syd1/<app-name>
make kubeconform
```
---
## Secret Management
**Plain Kubernetes `Secret` objects are blocked** by the pre-commit hook. Use Vault Operator CRDs instead:
### VaultAuth template
```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: default
  namespace: <namespace>
spec:
  method: kubernetes
  mount: k8s/au/syd1
  vaultConnectionRef: vso-system/default
  allowedNamespaces:
    - <namespace>
  kubernetes:
    role: <role>
    serviceAccount: <service-account>
    audiences:
      - vault
    tokenExpirationSeconds: 600
```
### VaultStaticSecret template
```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: <secret-name>
  namespace: <namespace>
spec:
  vaultAuthRef: default
  mount: kv
  type: kv-v2
  path: kubernetes/namespace/<namespace>/default/<secret-name>
  refreshAfter: 5m
  destination:
    name: <k8s-secret-name>
    create: true
    overwrite: true
  hmacSecretData: true
```
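To illustrate the kind of check that blocks plain `Secret` objects, a rendered manifest can be scanned for a top-level `kind: Secret`. This is a hypothetical sketch and not the contents of `ci/validate-no-secrets.sh`, which is not shown here; `has_no_plain_secret` is an invented name.

```bash
# Succeed when YAML on stdin declares no plain Kubernetes Secret.
# The anchored pattern only matches a top-level `kind: Secret`, so
# VaultStaticSecret and indented references (e.g. certificateRefs) pass.
has_no_plain_secret() {
  ! grep -Eq '^kind:[[:space:]]*Secret[[:space:]]*$'
}

# Usage:
#   kustomize build --enable-helm apps/overlays/au-syd1/<app-name> | has_no_plain_secret
```

Note that a Helm chart may legitimately render Secrets of its own, so a check like this fits hand-written base files better than fully rendered output.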
---
## YAML Conventions
- **2-space indentation** (enforced by yamllint)
- All files must end with a newline (`end-of-file-fixer`)
- No trailing whitespace
- YAML linting uses relaxed rules with `line-length: disable` (long base64/URLs are fine)
- yamllint ignores `chart` directories (vendored Helm charts)
- `---` document separator at top of every YAML file
- Multiple documents in one file are allowed (e.g., `vaultstaticsecret.yaml` often contains multiple secrets)
---
## Kubernetes Labels Pattern
Use standard `app.kubernetes.io/*` labels consistently:
```yaml
labels:
  app.kubernetes.io/component: <component>
  app.kubernetes.io/instance: <release-name>
  app.kubernetes.io/name: <app-name>
  app.kubernetes.io/version: <version>
```
---
## Resource Naming Conventions
Files in `apps/base/<app-name>/` follow the pattern:
```
<kind>_<name>.yaml
```
Examples:
- `deployment_puppetserver-master.yaml`
- `cronjob_g10k-code.yaml`
- `configmap_puppetboard-config.yaml`
- `horizontalpodautoscaler_puppetserver-compilers-autoscaler.yaml`
- `service_puppet-headless.yaml`
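The naming pattern amounts to the lowercased resource kind, an underscore, and the resource name. A small sketch of that mapping, with `resource_filename` as a hypothetical helper that is not part of the repo:

```bash
# Hypothetical helper: file name for a base resource, following <kind>_<name>.yaml.
resource_filename() {
  local kind="$1" name="$2"
  printf '%s_%s.yaml\n' \
    "$(printf '%s' "$kind" | tr '[:upper:]' '[:lower:]')" "$name"
}

# Usage: resource_filename Deployment puppetserver-master
```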
---
## Helm Chart Vendoring
Some overlays vendor Helm charts locally under `apps/overlays/au-syd1/<app-name>/charts/<chart-name>/`. When a chart is vendored, the overlay's `kustomization.yaml` references the local path. When not vendored, it references the OCI or HTTP repo directly.
Current Kubernetes target version: **1.33.7** (used by kubeconform in CI).
---
## Project Boundaries
| Project | ApplicationSet | App pattern |
|------------|---------------------------|--------------------------|
| `platform` | `argocd/applicationsets/platform.yaml` | Named apps (cert-manager, puppet, woodpecker, etc.) |
| `storage` | `argocd/applicationsets/storage.yaml` | `csi-*` apps |
The `clusters/au-syd1/apps/` entry-point is deployed as a standalone ArgoCD `Application` (not an ApplicationSet) called `au-syd1-apps`.
---
## CI / Pre-commit Hooks
Runs on every PR via Woodpecker CI (`.woodpecker/`):
| Check | Tool | Trigger |
|---|---|---|
| YAML lint + general file checks | `pre-commit` (yamllint + pre-commit-hooks) | PR |
| No plain Secrets | `ci/validate-no-secrets.sh` | PR (staged files) |
| Kubernetes manifest validation | `kubeconform` via `make kubeconform` | PR |
kubeconform skips: `CustomResourceDefinition`, `GpuDevicePlugin` (for apps validation).
---
## Git Workflow
- Branch naming: `benvin/<app-name>` (user prefix)
- **Never `git add .`** — add only relevant files explicitly
- If pre-commit modifies files, `git add -u` then `git commit --amend --no-edit`
- Use `git push --force-with-lease` after amending
---
## Security Policies
- `reloader.stakater.com/auto: "true"` annotation triggers rolling restarts on ConfigMap/Secret changes
- Security contexts follow least-privilege: `drop: [all]` then add only required capabilities
- `fsGroup: 999` on pod security context for Puppet workloads
- `runAsUser: 0` is used only for init containers that need to set file permissions, then regular containers run as non-root
+5 -1
@@ -1,4 +1,4 @@
.PHONY: build clean
.PHONY: build clean schemas
# Build a kustomization path to manifests directory
# Usage: make build clusters/au-syd1/bootstrap
@@ -6,6 +6,10 @@ build:
@mkdir -p manifests/$(filter-out $@,$(MAKECMDGOALS))
@kustomize build --enable-helm $(filter-out $@,$(MAKECMDGOALS)) --output manifests/$(filter-out $@,$(MAKECMDGOALS))
# Generate JSON schemas from CRDs and Kubernetes swagger spec (run manually, results committed)
schemas:
@ci/generate-schemas.sh schemas
# kubeconform
kubeconform:
@ci/validate-apps.sh && \
+45
@@ -0,0 +1,45 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: age-api
  namespace: age-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: age-api
  template:
    metadata:
      annotations:
        reloader.stakater.com/auto: "true"
      labels:
        app: age-api
    spec:
      containers:
        - name: age-api
          image: git.unkin.net/unkin/age-api:v0.1.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: CONFIG_PATH
              value: /etc/age-api/config.yaml
          resources:
            limits:
              cpu: 100m
              memory: 64Mi
            requests:
              cpu: 10m
              memory: 32Mi
          volumeMounts:
            - mountPath: /etc/age-api/config.yaml
              name: config
              subPath: config.yaml
      restartPolicy: Always
      volumes:
        - name: config
          configMap:
            name: age-api-config
+37
@@ -0,0 +1,37 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  labels:
    traefik.io/instance: internal
  annotations:
    cert-manager.io/cluster-issuer: vault-issuer
    cert-manager.io/common-name: age-api.k8s.syd1.au.unkin.net
    cert-manager.io/private-key-size: "4096"
    external-dns.alpha.kubernetes.io/hostname: age-api.k8s.syd1.au.unkin.net
    external-dns.alpha.kubernetes.io/target: 198.18.200.4
  name: age-api
  namespace: age-api
spec:
  gatewayClassName: traefik-internal
  listeners:
    - allowedRoutes:
        namespaces:
          from: Same
      hostname: age-api.k8s.syd1.au.unkin.net
      name: http
      port: 80
      protocol: HTTP
    - allowedRoutes:
        namespaces:
          from: Same
      hostname: age-api.k8s.syd1.au.unkin.net
      name: https
      port: 443
      protocol: HTTPS
      tls:
        certificateRefs:
          - group: ""
            kind: Secret
            name: age-api-tls
        mode: Terminate
+49
@@ -0,0 +1,49 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: age-api-http-redirect
  namespace: age-api
spec:
  hostnames:
    - age-api.k8s.syd1.au.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: age-api
      sectionName: http
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: age-api
  namespace: age-api
spec:
  hostnames:
    - age-api.k8s.syd1.au.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: age-api
      sectionName: https
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: age-api
          port: 80
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
+17
@@ -0,0 +1,17 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - gateway.yaml
  - httproute.yaml
  - namespace.yaml
  - service.yaml
configMapGenerator:
  - name: age-api-config
    files:
      - config.yaml=resources/config.yaml
    options:
      disableNameSuffixHash: true
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: age-api
+7
@@ -0,0 +1,7 @@
people:
  - name: jaidi
    birthtime: 1773135720
  - name: ben
    birthtime: 559663200
  - name: sudaporn
    birthtime: 686757600
+17
@@ -0,0 +1,17 @@
---
apiVersion: v1
kind: Service
metadata:
  name: age-api
  namespace: age-api
spec:
  internalTrafficPolicy: Cluster
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
  selector:
    app: age-api
  sessionAffinity: None
  type: ClusterIP
+91
@@ -0,0 +1,91 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: artifactapi
  annotations:
    reloader.stakater.com/auto: "true"
spec:
  selector:
    matchLabels:
      app: api
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: api
    spec:
      automountServiceAccountToken: true
      initContainers:
        - name: combine-certs
          image: alpine:3
          command:
            - sh
            - -c
            - cat /etc/ssl/certs/ca-certificates.crt /custom-ca/ca.crt > /combined-certs/ca-certificates.crt
          volumeMounts:
            - name: vault-ca-cert
              mountPath: /custom-ca
              readOnly: true
            - name: combined-certs
              mountPath: /combined-certs
      containers:
        - name: api
          image: git.unkin.net/unkin/artifactapi:v3.7.4
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8000
              name: http
              protocol: TCP
          envFrom:
            - configMapRef:
                name: api-env
                optional: false
            - secretRef:
                name: environment
                optional: false
          volumeMounts:
            - name: combined-certs
              mountPath: /etc/ssl/combined
              readOnly: true
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: http
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              cpu: "1"
              memory: 4Gi
            requests:
              cpu: 100m
              memory: 256Mi
      volumes:
        - name: vault-ca-cert
          secret:
            secretName: vault-ca-cert
            items:
              - key: ca.crt
                path: ca.crt
        - name: combined-certs
          emptyDir: {}
      restartPolicy: Always
+41
@@ -0,0 +1,41 @@
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: artifactapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      selectPolicy: Max
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 4
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
+91
@@ -0,0 +1,91 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: artifactapi
spec:
  affinity:
    podAntiAffinityType: preferred
  bootstrap:
    initdb:
      database: artifacts
      encoding: UTF8
      localeCType: C
      localeCollate: C
      owner: artifacts
      secret:
        name: postgres-credentials
  enablePDB: true
  enableSuperuserAccess: false
  failoverDelay: 0
  imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
  instances: 3
  logLevel: info
  maxSyncReplicas: 0
  minSyncReplicas: 0
  monitoring:
    customQueriesConfigMap:
      - key: queries
        name: cnpg-default-monitoring
    disableDefaultQueries: false
    enablePodMonitor: false
  postgresql:
    parameters:
      archive_mode: "on"
      archive_timeout: 5min
      dynamic_shared_memory_type: posix
      effective_cache_size: 256MB
      full_page_writes: "on"
      log_destination: csvlog
      log_directory: /controller/log
      log_filename: postgres
      log_rotation_age: "0"
      log_rotation_size: "0"
      log_truncate_on_rotation: "false"
      logging_collector: "on"
      max_connections: "200"
      max_parallel_workers: "16"
      max_replication_slots: "16"
      max_worker_processes: "16"
      shared_buffers: 128MB
      shared_memory_type: mmap
      ssl_max_protocol_version: TLSv1.3
      ssl_min_protocol_version: TLSv1.3
      wal_keep_size: 256MB
      wal_level: logical
      wal_log_hints: "on"
      wal_receiver_timeout: 5s
      wal_sender_timeout: 5s
    syncReplicaElectionConstraint:
      enabled: false
  primaryUpdateMethod: restart
  primaryUpdateStrategy: unsupervised
  probes:
    liveness:
      isolationCheck:
        connectionTimeout: 1000
        enabled: true
        requestTimeout: 1000
  replicationSlots:
    highAvailability:
      enabled: true
      slotPrefix: _cnpg_
    synchronizeReplicas:
      enabled: true
    updateInterval: 30
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 250m
      memory: 256Mi
  smartShutdownTimeout: 180
  startDelay: 3600
  stopDelay: 1800
  storage:
    resizeInUseVolumes: true
    size: 20Gi
    storageClass: cephrbd-fast-delete
  switchoverDelay: 3600
+33
@@ -0,0 +1,33 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: postgres-pooler
  namespace: artifactapi
spec:
  cluster:
    name: postgres
  instances: 2
  pgbouncer:
    parameters:
      default_pool_size: "100"
      max_client_conn: "400"
    paused: false
    poolMode: session
  template:
    metadata:
      labels:
        app: pooler
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - pooler
              topologyKey: kubernetes.io/hostname
      containers: []
  type: rw
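A back-of-the-envelope check of the pooler sizing against the `max_connections: "200"` set on the `postgres` Cluster. This is a sketch under stated assumptions, not a statement about how the deployment behaves in practice: it assumes a single (user, database) pair goes through PgBouncer, that `superuser_reserved_connections` is left at the PostgreSQL default of 3 (the manifest does not set it), and it ignores any connections the operator or the pooler opens for its own use. The `PoolerBudget` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolerBudget:
    pooler_instances: int        # spec.instances on the Pooler
    default_pool_size: int       # pgbouncer default_pool_size
    max_client_conn: int         # pgbouncer max_client_conn
    pg_max_connections: int      # max_connections on the Cluster
    user_db_pairs: int = 1       # assumption: one (user, database) pair
    superuser_reserved: int = 3  # PostgreSQL default, not set in the manifest

    @property
    def client_capacity(self) -> int:
        """Client connections all PgBouncer pods accept in total."""
        return self.pooler_instances * self.max_client_conn

    @property
    def server_side_peak(self) -> int:
        """Server connections the pods may open if every pool fills up."""
        return self.pooler_instances * self.default_pool_size * self.user_db_pairs

    @property
    def headroom(self) -> int:
        """Ordinary PostgreSQL connection slots left at that peak."""
        return self.pg_max_connections - self.superuser_reserved - self.server_side_peak


budget = PoolerBudget(pooler_instances=2, default_pool_size=100,
                      max_client_conn=400, pg_max_connections=200)
print(budget.client_capacity, budget.server_side_peak, budget.headroom)
```

With the values from the manifests this prints `800 200 -3`: under these assumptions the two pools could, at their peak, ask for slightly more server connections than PostgreSQL leaves for ordinary users, so it may be worth confirming the intended headroom.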
+16
@@ -0,0 +1,16 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-env
  namespace: artifactapi
data:
  DBHOST: postgres-pooler
  DBNAME: artifacts
  DBPORT: "5432"
  DBUSER: artifacts
  MINIO_BUCKET: artifactapi-prod-k8s-syd1-au
  MINIO_ENDPOINT: radosgw.service.consul
  MINIO_SECURE: "true"
  REDIS_URL: redis://redis:6379
  SSL_CERT_FILE: /etc/ssl/combined/ca-certificates.crt
+37
@@ -0,0 +1,37 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
labels:
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: artifactapi.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: artifactapi.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
name: artifactapi
namespace: artifactapi
spec:
gatewayClassName: traefik-internal
listeners:
- allowedRoutes:
namespaces:
from: Same
hostname: artifactapi.k8s.syd1.au.unkin.net
name: http
port: 80
protocol: HTTP
- allowedRoutes:
namespaces:
from: Same
hostname: artifactapi.k8s.syd1.au.unkin.net
name: https
port: 443
protocol: HTTPS
tls:
certificateRefs:
- group: ""
kind: Secret
name: artifactapi-tls
mode: Terminate
+59
@@ -0,0 +1,59 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-redirect
  namespace: artifactapi
spec:
  hostnames:
    - artifactapi.k8s.syd1.au.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: artifactapi
      sectionName: http
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
      matches:
        - path:
            type: PathPrefix
            value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: artifactapi
spec:
  hostnames:
    - artifactapi.k8s.syd1.au.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: artifactapi
      sectionName: https
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: ui
          port: 80
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /ui
    - backendRefs:
        - group: ""
          kind: Service
          name: artifactapi
          port: 80
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /
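The two `PathPrefix` rules in `api-route` overlap, since `/` also matches everything under `/ui`. A minimal sketch of how the Gateway API resolves this, assuming the documented behavior that prefixes match whole path elements and that the longest matching prefix takes precedence. The helper names are hypothetical, and the real matching is done by the gateway implementation (Traefik here), not by code like this.

```python
from typing import Optional

# (prefix, backend Service) pairs taken from the api-route rules.
RULES = [("/ui", "ui"), ("/", "artifactapi")]


def prefix_matches(prefix: str, path: str) -> bool:
    """PathPrefix matches whole path elements, so /ui does not match /uix."""
    if prefix == "/":
        return path.startswith("/")
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def pick_backend(path: str) -> Optional[str]:
    """Return the backend of the longest matching prefix, or None."""
    candidates = [(p, svc) for p, svc in RULES if prefix_matches(p, path)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: len(rule[0]))[1]
```

Under this model `/ui` and `/ui/assets/app.js` go to the `ui` Service, while `/uix` and `/api/v1/x` fall through to `artifactapi`.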
+19
@@ -0,0 +1,19 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- api-deployment.yaml
- api-hpa.yaml
- configmap.yaml
- cnpg_cluster.yaml
- cnpg_pooler.yaml
- gateway.yaml
- httproute.yaml
- namespace.yaml
- redis-deployment.yaml
- services.yaml
- ui-deployment.yaml
- ui-hpa.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: artifactapi
@@ -0,0 +1,56 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: artifactapi
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
command:
- redis-server
- --save
- "20"
- "1"
ports:
- containerPort: 6379
name: redis
protocol: TCP
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
livenessProbe:
exec:
command:
- redis-cli
- ping
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
exec:
command:
- redis-cli
- ping
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
restartPolicy: Always
+51
@@ -0,0 +1,51 @@
---
apiVersion: v1
kind: Service
metadata:
name: artifactapi
namespace: artifactapi
spec:
internalTrafficPolicy: Cluster
ports:
- name: http
port: 80
protocol: TCP
targetPort: http
selector:
app: api
sessionAffinity: None
type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
name: ui
namespace: artifactapi
spec:
internalTrafficPolicy: Cluster
ports:
- name: http
port: 80
protocol: TCP
targetPort: http
selector:
app: ui
sessionAffinity: None
type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: artifactapi
spec:
internalTrafficPolicy: Cluster
ports:
- name: redis
port: 6379
protocol: TCP
targetPort: redis
selector:
app: redis
sessionAffinity: None
type: ClusterIP
+58
@@ -0,0 +1,58 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ui
namespace: artifactapi
annotations:
reloader.stakater.com/auto: "true"
spec:
selector:
matchLabels:
app: ui
strategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app: ui
spec:
automountServiceAccountToken: true
containers:
- name: ui
image: git.unkin.net/unkin/artifactapi-ui:v3.7.4
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: http
protocol: TCP
livenessProbe:
failureThreshold: 3
httpGet:
path: /ui
port: http
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: /ui
port: http
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
restartPolicy: Always
+41
@@ -0,0 +1,41 @@
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: ui-hpa
namespace: artifactapi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: ui
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
behavior:
scaleUp:
stabilizationWindowSeconds: 0
selectPolicy: Max
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 4
periodSeconds: 30
scaleDown:
stabilizationWindowSeconds: 300
selectPolicy: Min
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
+18
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
name: default
namespace: artifactapi
spec:
allowedNamespaces:
- artifactapi
kubernetes:
audiences:
- vault
role: default
serviceAccount: default
tokenExpirationSeconds: 600
method: kubernetes
mount: k8s/au/syd1
vaultConnectionRef: vso-system/default
@@ -0,0 +1,34 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: postgres-credentials
namespace: artifactapi
spec:
destination:
create: true
name: postgres-credentials
overwrite: true
hmacSecretData: true
mount: kv
path: kubernetes/namespace/artifactapi/default/postgres-credentials
refreshAfter: 5m
type: kv-v2
vaultAuthRef: default
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: environment
namespace: artifactapi
spec:
destination:
create: true
name: environment
overwrite: true
hmacSecretData: true
mount: kv
path: kubernetes/namespace/artifactapi/default/environment
refreshAfter: 5m
type: kv-v2
vaultAuthRef: default
+91
@@ -0,0 +1,91 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres
namespace: authentik
spec:
affinity:
podAntiAffinityType: preferred
bootstrap:
initdb:
database: authentik
encoding: UTF8
localeCType: C
localeCollate: C
owner: authentik
secret:
name: postgres-credentials
enablePDB: true
enableSuperuserAccess: false
failoverDelay: 0
imageName: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
instances: 3
logLevel: info
maxSyncReplicas: 0
minSyncReplicas: 0
monitoring:
customQueriesConfigMap:
- key: queries
name: cnpg-default-monitoring
disableDefaultQueries: false
enablePodMonitor: false
postgresql:
parameters:
archive_mode: "on"
archive_timeout: 5min
dynamic_shared_memory_type: posix
effective_cache_size: 256MB
full_page_writes: "on"
log_destination: csvlog
log_directory: /controller/log
log_filename: postgres
log_rotation_age: "0"
log_rotation_size: "0"
log_truncate_on_rotation: "false"
logging_collector: "on"
max_connections: "200"
max_parallel_workers: "16"
max_replication_slots: "16"
max_worker_processes: "16"
shared_buffers: 128MB
shared_memory_type: mmap
ssl_max_protocol_version: TLSv1.3
ssl_min_protocol_version: TLSv1.3
wal_keep_size: 256MB
wal_level: logical
wal_log_hints: "on"
wal_receiver_timeout: 5s
wal_sender_timeout: 5s
syncReplicaElectionConstraint:
enabled: false
primaryUpdateMethod: restart
primaryUpdateStrategy: unsupervised
probes:
liveness:
isolationCheck:
connectionTimeout: 1000
enabled: true
requestTimeout: 1000
replicationSlots:
highAvailability:
enabled: true
slotPrefix: _cnpg_
synchronizeReplicas:
enabled: true
updateInterval: 30
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 250m
memory: 256Mi
smartShutdownTimeout: 180
startDelay: 3600
stopDelay: 1800
storage:
resizeInUseVolumes: true
size: 20Gi
storageClass: cephrbd-fast-delete
switchoverDelay: 3600
+66
@@ -0,0 +1,66 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
name: postgres-pooler-rw
namespace: authentik
spec:
cluster:
name: postgres
instances: 2
pgbouncer:
parameters:
default_pool_size: "100"
max_client_conn: "400"
paused: false
poolMode: session
template:
metadata:
labels:
app: pooler-rw
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- pooler-rw
topologyKey: kubernetes.io/hostname
containers: []
type: rw
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
name: postgres-pooler-ro
namespace: authentik
spec:
cluster:
name: postgres
instances: 2
pgbouncer:
parameters:
default_pool_size: "100"
max_client_conn: "400"
paused: false
poolMode: session
template:
metadata:
labels:
app: pooler-ro
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- pooler-ro
topologyKey: kubernetes.io/hostname
containers: []
type: ro
+57
@@ -0,0 +1,57 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
labels:
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: identity.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: identity.unkin.net,identity.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
name: authentik
namespace: authentik
spec:
gatewayClassName: traefik-internal
listeners:
- allowedRoutes:
namespaces:
from: Same
hostname: identity.unkin.net
name: http
port: 80
protocol: HTTP
- allowedRoutes:
namespaces:
from: Same
hostname: identity.unkin.net
name: https
port: 443
protocol: HTTPS
tls:
certificateRefs:
- group: ""
kind: Secret
name: authentik-tls
mode: Terminate
- allowedRoutes:
namespaces:
from: Same
hostname: identity.k8s.syd1.au.unkin.net
name: http-internal
port: 80
protocol: HTTP
- allowedRoutes:
namespaces:
from: Same
hostname: identity.k8s.syd1.au.unkin.net
name: https-internal
port: 443
protocol: HTTPS
tls:
certificateRefs:
- group: ""
kind: Secret
name: authentik-tls
mode: Terminate
+59
@@ -0,0 +1,59 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: authentik-http-redirect
namespace: authentik
spec:
hostnames:
- identity.unkin.net
- identity.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik
sectionName: http
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik
sectionName: http-internal
rules:
- filters:
- type: RequestRedirect
requestRedirect:
scheme: https
statusCode: 301
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: authentik
namespace: authentik
spec:
hostnames:
- identity.unkin.net
- identity.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik
sectionName: https
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik
sectionName: https-internal
rules:
- backendRefs:
- group: ""
kind: Service
name: authentik-server
port: 80
weight: 1
matches:
- path:
type: PathPrefix
value: /
+19
@@ -0,0 +1,19 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- cnpg_cluster.yaml
- cnpg_pooler.yaml
- gateway.yaml
- httproute.yaml
- ldap-gateway.yaml
- ldap-httproute.yaml
- ldap-service.yaml
- ldap-tlsroute.yaml
- namespace.yaml
- redis-deployment.yaml
- redis-pvc.yaml
- redis-service.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
+47
@@ -0,0 +1,47 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
labels:
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: ldap.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
name: authentik-ldap
namespace: authentik
spec:
gatewayClassName: traefik-internal
listeners:
- allowedRoutes:
namespaces:
from: Same
hostname: ldap.k8s.syd1.au.unkin.net
name: ldaps-internal
port: 636
protocol: TLS
tls:
mode: Passthrough
- allowedRoutes:
namespaces:
from: Same
hostname: ldap.main.unkin.net
name: ldaps-main
port: 636
protocol: TLS
tls:
mode: Passthrough
- allowedRoutes:
namespaces:
from: Same
hostname: ldap.k8s.syd1.au.unkin.net
name: http-dns
port: 80
protocol: HTTP
- allowedRoutes:
namespaces:
from: Same
hostname: ldap.main.unkin.net
name: http-dns-main
port: 80
protocol: HTTP
+32
@@ -0,0 +1,32 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: authentik-ldap-dns
namespace: authentik
annotations:
external-dns.alpha.kubernetes.io/hostname: ldap.k8s.syd1.au.unkin.net,ldap.main.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
hostnames:
- ldap.k8s.syd1.au.unkin.net
- ldap.main.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik-ldap
sectionName: http-dns
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik-ldap
sectionName: http-dns-main
rules:
- filters:
- type: RequestRedirect
requestRedirect:
scheme: https
statusCode: 301
matches:
- path:
type: PathPrefix
value: /
+18
@@ -0,0 +1,18 @@
---
apiVersion: v1
kind: Service
metadata:
name: authentik-ldap
namespace: authentik
spec:
internalTrafficPolicy: Cluster
ports:
- name: ldaps
port: 6636
protocol: TCP
targetPort: 6636
selector:
app.kubernetes.io/name: authentik
app.kubernetes.io/component: ldap
sessionAffinity: None
type: ClusterIP
+26
@@ -0,0 +1,26 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: TLSRoute
metadata:
name: authentik-ldaps
namespace: authentik
spec:
hostnames:
- ldap.k8s.syd1.au.unkin.net
- ldap.main.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik-ldap
sectionName: ldaps-internal
- group: gateway.networking.k8s.io
kind: Gateway
name: authentik-ldap
sectionName: ldaps-main
rules:
- backendRefs:
- group: ""
kind: Service
name: authentik-ldap
port: 6636
weight: 1
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: authentik
+58
@@ -0,0 +1,58 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: authentik
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
imagePullPolicy: IfNotPresent
args:
- --save
- "20"
- "1"
ports:
- containerPort: 6379
name: redis
protocol: TCP
livenessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 10
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
volumeMounts:
- mountPath: /data
name: redis-data
volumes:
- name: redis-data
persistentVolumeClaim:
claimName: redis-data
+13
@@ -0,0 +1,13 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-data
namespace: authentik
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: cephrbd-fast-delete
+17
@@ -0,0 +1,17 @@
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: authentik
spec:
internalTrafficPolicy: Cluster
ports:
- name: redis
port: 6379
protocol: TCP
targetPort: redis
selector:
app: redis
sessionAffinity: None
type: ClusterIP
+18
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
name: default
namespace: authentik
spec:
allowedNamespaces:
- authentik
kubernetes:
audiences:
- vault
role: default
serviceAccount: default
tokenExpirationSeconds: 600
method: kubernetes
mount: k8s/au/syd1
vaultConnectionRef: vso-system/default
@@ -0,0 +1,51 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: postgres-credentials
namespace: authentik
spec:
destination:
create: true
name: postgres-credentials
overwrite: true
hmacSecretData: true
mount: kv
path: kubernetes/namespace/authentik/default/postgres-credentials
refreshAfter: 5m
type: kv-v2
vaultAuthRef: default
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: authentik-credentials
namespace: authentik
spec:
destination:
create: true
name: authentik-credentials
overwrite: true
hmacSecretData: true
mount: kv
path: kubernetes/namespace/authentik/default/authentik-credentials
refreshAfter: 5m
type: kv-v2
vaultAuthRef: default
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: s3-credentials
namespace: authentik
spec:
destination:
create: true
name: s3-credentials
overwrite: true
hmacSecretData: true
mount: kv
path: kubernetes/namespace/authentik/default/s3-credentials
refreshAfter: 5m
type: kv-v2
vaultAuthRef: default
+57
@@ -0,0 +1,57 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: bind-operator
namespace: bind-system
labels:
app.kubernetes.io/name: bind-operator
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: bind-operator
template:
metadata:
labels:
app.kubernetes.io/name: bind-operator
spec:
serviceAccountName: bind-operator
securityContext:
runAsNonRoot: true
containers:
- name: operator
image: git.unkin.net/unkin/bind-operator:v0.1.1
args:
- --metrics-bind-address=:8080
- --health-probe-bind-address=:8081
- --leader-elect
ports:
- containerPort: 8080
name: metrics
- containerPort: 8081
name: health
readinessProbe:
httpGet:
path: /readyz
port: 8081
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /healthz
port: 8081
initialDelaySeconds: 15
periodSeconds: 20
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 500m
memory: 256Mi
+11
@@ -0,0 +1,11 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  # CRDs are pulled from the bind-operator repo at the matching tag rather than
  # vendored here, so they never drift from the operator.
  - https://git.unkin.net/unkin/bind-operator/raw/tag/v0.1.1/config/crd/install.yaml
  - rbac.yaml
  - deployment.yaml
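The comment in this kustomization relies on the CRD URL and the operator image carrying the same tag, but the tag is written in two places: the `raw/tag/v0.1.1/` URL here and the `bind-operator:v0.1.1` image in the operator Deployment. A minimal sketch of a check that could guard against bumping one without the other. The helper names are hypothetical, the two strings are copied from the manifests in this diff, and nothing in the source says such a check exists in CI.

```python
import re
from typing import Optional

# Copied from the kustomization and the operator Deployment in this diff.
CRD_URL = "https://git.unkin.net/unkin/bind-operator/raw/tag/v0.1.1/config/crd/install.yaml"
OPERATOR_IMAGE = "git.unkin.net/unkin/bind-operator:v0.1.1"


def crd_tag(url: str) -> Optional[str]:
    """Extract the tag from a .../raw/tag/<tag>/... URL, or None."""
    match = re.search(r"/raw/tag/([^/]+)/", url)
    return match.group(1) if match else None


def image_tag(image: str) -> Optional[str]:
    """Extract the tag from an image reference, or None if it has no tag."""
    _, sep, tag = image.rpartition(":")
    # A '/' after the last ':' means the colon belonged to a registry port.
    return tag if sep and "/" not in tag else None


def tags_aligned(url: str, image: str) -> bool:
    """True only when both references carry the same, non-empty tag."""
    url_tag = crd_tag(url)
    return url_tag is not None and url_tag == image_tag(image)
```

Calling `tags_aligned(CRD_URL, OPERATOR_IMAGE)` returns `True` for the values in this PR, and would return `False` if only the image were bumped in a later change.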
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: bind-system
+46
@@ -0,0 +1,46 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: bind-operator
namespace: bind-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: bind-operator
rules:
- apiGroups: ["bind.unkin.net"]
resources: ["*"]
verbs: ["*"]
- apiGroups: [""]
resources: ["services", "configmaps", "secrets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create", "get"]
- apiGroups: ["apps"]
resources: ["statefulsets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "patch"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: bind-operator
subjects:
- kind: ServiceAccount
name: bind-operator
namespace: bind-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: bind-operator
+37
@@ -0,0 +1,37 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
labels:
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: rancher.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: rancher.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: "198.18.200.4"
name: rancher
namespace: cattle-system
spec:
gatewayClassName: traefik-internal
listeners:
- allowedRoutes:
namespaces:
from: Same
hostname: rancher.k8s.syd1.au.unkin.net
name: http
port: 80
protocol: HTTP
- allowedRoutes:
namespaces:
from: Same
hostname: rancher.k8s.syd1.au.unkin.net
name: https
port: 443
protocol: HTTPS
tls:
certificateRefs:
- group: ""
kind: Secret
name: rancher-tls
mode: Terminate
+49
@@ -0,0 +1,49 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: rancher-http-redirect
namespace: cattle-system
spec:
hostnames:
- rancher.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: rancher
sectionName: http
rules:
- filters:
- type: RequestRedirect
requestRedirect:
scheme: https
statusCode: 301
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: rancher
namespace: cattle-system
spec:
hostnames:
- rancher.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: rancher
sectionName: https
rules:
- backendRefs:
- group: ""
kind: Service
name: rancher
port: 80
weight: 1
matches:
- path:
type: PathPrefix
value: /
@@ -0,0 +1,10 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
- gateway.yaml
- httproute.yaml
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: cattle-system
+18
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
name: rancher
namespace: cattle-system
spec:
method: kubernetes
mount: k8s/au/syd1
vaultConnectionRef: vso-system/default
allowedNamespaces:
- cattle-system
kubernetes:
role: rancher
serviceAccount: rancher
audiences:
- vault
tokenExpirationSeconds: 600
@@ -0,0 +1,15 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
name: rancher-bootstrap-secret
namespace: cattle-system
spec:
vaultAuthRef: rancher
mount: kv
type: kv-v2
path: service/kubernetes/au/syd1/rancher/bootstrap-password
refreshAfter: 5m
destination:
name: rancher-bootstrap-secret
create: true
+12
@@ -0,0 +1,12 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cert-manager-vault-token-creator
labels:
app.kubernetes.io/name: "cert-manager-config"
app.kubernetes.io/instance: "cert-manager-config"
rules:
- apiGroups: [""]
resources: ["serviceaccounts/token"]
verbs: ["create"]
@@ -0,0 +1,16 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cert-manager-vault-token-creator
labels:
app.kubernetes.io/name: "cert-manager-config"
app.kubernetes.io/instance: "cert-manager-config"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cert-manager-vault-token-creator
subjects:
- kind: ServiceAccount
name: cert-manager
namespace: cert-manager
@@ -0,0 +1,9 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- serviceaccount.yaml
- clusterrole.yaml
- clusterrolebinding.yaml
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: cert-manager
@@ -0,0 +1,11 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: vault-issuer
namespace: cert-manager
labels:
app.kubernetes.io/name: "cert-manager-config"
app.kubernetes.io/instance: "cert-manager-config"
app.kubernetes.io/component: "vault-issuer"
automountServiceAccountToken: true
@@ -0,0 +1,7 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- vault-ca-cert.yaml
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: certificates
+59
@@ -0,0 +1,59 @@
---
apiVersion: v1
kind: Secret
metadata:
name: vault-ca-cert
namespace: certificates
labels:
app.kubernetes.io/name: vault-ca-cert
app.kubernetes.io/part-of: vault-secrets-operator
annotations:
description: "Vault CA certificate replicated to all namespaces"
reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: ""
type: Opaque
stringData:
ca.crt: |
-----BEGIN CERTIFICATE-----
MIIDujCCAqKgAwIBAgIULZAR/QcvAnxdi04S6bXhNeazozYwDQYJKoZIhvcNAQEL
BQAwFDESMBAGA1UEAxMJdW5raW4ubmV0MB4XDTI0MDQyNzExMzcyMloXDTI5MDQy
NjExMzc1MlowKzEpMCcGA1UEAxMgdW5raW4ubmV0IEludGVybWVkaWF0ZSBBdXRo
b3JpdHkwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDDq0ZU2DnuYW5s
E3lPjVe2Ns6cPu64yx1GLVqB5VbOUs71ThRjPjvEwE98YtGMza8ok0CQSqS2qX8z
vnMbnVCaWKjCnem/dtQtB+8WCu5uQuNHhwqxgw1tD/klAkVLWGgTPDEgasvjDMkc
sW8in/BhtrV9YA/lQGpge+j9/MFXhlnvaLCPybFifPRX9Yc5CcnhSzLSzFPO4PJx
VH4Qu9eByyKHMTvgcCy6p9qjjzz+8dtAlxeIsgfTEdvtfCPowsF+v2XooutTsJt0
xUDvUDu4xV6tVCEOYRA2cZHkLRBhV289M0hocHrsGqMmA1+j0skwwt/6UkVHqlCT
mitItX+RAgMBAAGjgewwgekwDgYDVR0PAQH/BAQDAgEGMA8GA1UdEwEB/wQFMAMB
Af8wHQYDVR0OBBYEFEp/+grAdVqRSeb9xJjSeZYNW32MMB8GA1UdIwQYMBaAFBqc
v6Y+hfHt4EjgKa/uoQGEHTknMEcGCCsGAQUFBwEBBDswOTA3BggrBgEFBQcwAoYr
aHR0cHM6Ly92YXVsdC5zZXJ2aWNlLmNvbnN1bC92MS9wa2lfcm9vdC9jYTA9BgNV
HR8ENjA0MDKgMKAuhixodHRwczovL3ZhdWx0LnNlcnZpY2UuY29uc3VsL3YxL3Br
aV9yb290L2NybDANBgkqhkiG9w0BAQsFAAOCAQEAM0FS8tscZe7yly/gM7jO6lx5
muMFusifjUIrcQGnZBkoECeuUVPNTs3e/Th+XaxjCnmSpqSNT3z9Irr6Hhxf7n03
4+hpF3G0bf1yh4DRex/0ua3szvgo91RwyKVQM1BHIA1PwdF8csO+LT4FTMILzo4U
DdSVvDEIaxYYQCDNfAD81n+8lmFbabupfsKbkSTR+sNTS+TMnLpN8YwSXdB0e+RU
eEZRNVu0jKmbE8U/66Sc33YLe6cxbCclHA+G4giGwEP+lYZk+rFjmr6ci9bj5yyN
Sznr7xdW0ofOdACAQFFy5KTZqCDjIrvk12vUn4bSsXmWVIQEd+jPx6wuxD/rSw==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDLzCCAhegAwIBAgIUIDADwsHIrQ8dfncpechBdIUCQdIwDQYJKoZIhvcNAQEL
BQAwFDESMBAGA1UEAxMJdW5raW4ubmV0MB4XDTI0MDQyNzExMjcwMloXDTM0MDQy
NTExMjczMlowFDESMBAGA1UEAxMJdW5raW4ubmV0MIIBIjANBgkqhkiG9w0BAQEF
AAOCAQ8AMIIBCgKCAQEA3ENPv7R7gCUJAg8Q4hB2LEZSdvbK155YbcrguLDDnu6m
2fkJn8jYMMW3Z6/+Y04ouGwi6sKup8ggTb217sY+dC4IUZjotDPAhruxfXVQAh0v
Yr3RYoxVDrm4nRSFLo1RA4Qt+1KK299mHGQf9iAiwbsFp5mDrJT9uz15FE2uWmbK
8/onMyJC4fnkMihVN6NIgTtjpHYNm5aAJwxoWldTopgF0ucb7X3XVPNbKAmd3Avd
lsOo6m751zSZ0HvJOxgRSy7lvPzMuUfCQsOcmI4O4+Z2FL4Y7p+T9DvWkciC7L3i
tBiK30fPfGKNpWaof1ONCcPQNjMwWcEFXqSiWUOXkwIDAQABo3kwdzAOBgNVHQ8B
Af8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUGpy/pj6F8e3gSOAp
r+6hAYQdOScwHwYDVR0jBBgwFoAUGpy/pj6F8e3gSOApr+6hAYQdOScwFAYDVR0R
BA0wC4IJdW5raW4ubmV0MA0GCSqGSIb3DQEBCwUAA4IBAQA5xocILzuvD+R2Iub1
UnTdcVpgNcxJmESz0eX4UrkcBmddtuFINXvDTv5//XTFs78LsVVSf00xZ+2C62Xe
xRdCdluHN8VDCAKulP4XJY1BiZ7im0v+iMgPDKhq4OXb86WFYI/8J6uRm7oIAwj1
zhhKxMimkzli+yHB8ipL15W7l68CMUgmOjFA+EG6sbfadFpQTX/h6TVj3FQPkU/p
UJEm2XjlGNAKGJrNRU47PM4vRDv5Joyowp9zv/pHFXvUJladaJupMKRJQVWQz1US
EXE67rawG79s3vm8dDolnbli/IhPHtjDRIprxAwrMs5tt9cY0xsRkFBZVcAOjrpb
4gqd
-----END CERTIFICATE-----
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: cnpg-system
+53
@@ -0,0 +1,53 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: consul
namespace: consul
labels:
app.kubernetes.io/name: consul
app.kubernetes.io/instance: consul
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: consul.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
cert-manager.io/alt-names: consul.service.consul
external-dns.alpha.kubernetes.io/hostname: consul.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
gatewayClassName: traefik-internal
listeners:
- name: http
port: 80
protocol: HTTP
hostname: consul.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
- name: https
port: 443
protocol: HTTPS
hostname: consul.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
tls:
mode: Terminate
certificateRefs:
- group: ""
kind: Secret
name: consul-tls
- name: consul-svc
port: 443
protocol: HTTPS
hostname: consul.service.consul
allowedRoutes:
namespaces:
from: Same
tls:
mode: Terminate
certificateRefs:
- group: ""
kind: Secret
name: consul-tls
+83
@@ -0,0 +1,83 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: consul-http-redirect
namespace: consul
labels:
app.kubernetes.io/name: consul
app.kubernetes.io/instance: consul
spec:
hostnames:
- consul.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: consul
sectionName: http
rules:
- filters:
- type: RequestRedirect
requestRedirect:
scheme: https
statusCode: 301
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: consul
namespace: consul
labels:
app.kubernetes.io/name: consul
app.kubernetes.io/instance: consul
spec:
hostnames:
- consul.k8s.syd1.au.unkin.net
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: consul
sectionName: https
rules:
- backendRefs:
- group: ""
kind: Service
name: consul-ui
port: 80
weight: 1
matches:
- path:
type: PathPrefix
value: /
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: consul-svc
namespace: consul
labels:
app.kubernetes.io/name: consul
app.kubernetes.io/instance: consul
spec:
hostnames:
- consul.service.consul
parentRefs:
- group: gateway.networking.k8s.io
kind: Gateway
name: consul
sectionName: consul-svc
rules:
- backendRefs:
- group: ""
kind: Service
name: consul-ui
port: 80
weight: 1
matches:
- path:
type: PathPrefix
value: /
+8
@@ -0,0 +1,8 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- gateway.yaml
- httproute.yaml
+5
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: consul
+9
@@ -0,0 +1,9 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
- storageclass.yaml
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: csi-cephfs
@@ -0,0 +1,83 @@
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-raid6-delete
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  clusterID: "cephfs_csi_ssd_ec_6_2"
  fsName: "cephfs"
  subVolumeGroup: csi_ssd_ec_6_2
  csi.storage.k8s.io/provisioner-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/node-stage-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-publish-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-publish-secret-namespace: "csi-cephfs"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-raid6-retain
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  clusterID: "cephfs_csi_ssd_ec_6_2"
  fsName: "cephfs"
  subVolumeGroup: csi_ssd_ec_6_2
  csi.storage.k8s.io/provisioner-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/node-stage-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-publish-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-publish-secret-namespace: "csi-cephfs"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-raid5-delete
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  clusterID: "cephfs_csi_ssd_ec_4_1"
  fsName: "cephfs"
  subVolumeGroup: csi_ssd_ec_4_1
  csi.storage.k8s.io/provisioner-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/node-stage-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-publish-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-publish-secret-namespace: "csi-cephfs"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-raid5-retain
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  clusterID: "cephfs_csi_ssd_ec_4_1"
  fsName: "cephfs"
  subVolumeGroup: csi_ssd_ec_4_1
  csi.storage.k8s.io/provisioner-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/node-stage-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephfs"
  csi.storage.k8s.io/controller-publish-secret-name: "csi-cephfs-secret"
  csi.storage.k8s.io/controller-publish-secret-namespace: "csi-cephfs"
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: ceph-csi-cephfs
  namespace: csi-cephfs
spec:
  method: kubernetes
  mount: k8s/au/syd1
  vaultConnectionRef: vso-system/default
  allowedNamespaces:
    - csi-cephfs
  kubernetes:
    role: ceph-csi
    serviceAccount: ceph-csi-cephfs-csi-cephfs-provisioner
    audiences:
      - vault
    tokenExpirationSeconds: 600
@@ -0,0 +1,15 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: csi-cephfs-secret
  namespace: csi-cephfs
spec:
  vaultAuthRef: ceph-csi-cephfs
  mount: kv
  type: kv-v2
  path: service/kubernetes/au/syd1/csi/ceph-cephfs-secret
  refreshAfter: 5m
  destination:
    name: csi-cephfs-secret
    create: true
@@ -0,0 +1,9 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
- storageclass.yaml
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: csi-cephrbd
@@ -0,0 +1,39 @@
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephrbd-fast-delete
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  clusterID: "de96a98f-3d23-465a-a899-86d3d67edab8"
  pool: "kubernetes"
  imageFeatures: "layering"
  csi.storage.k8s.io/provisioner-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephrbd"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephrbd"
  csi.storage.k8s.io/node-stage-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephrbd"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephrbd-fast-retain
provisioner: rbd.csi.ceph.com
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  clusterID: "de96a98f-3d23-465a-a899-86d3d67edab8"
  pool: "kubernetes"
  imageFeatures: "layering"
  csi.storage.k8s.io/provisioner-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/provisioner-secret-namespace: "csi-cephrbd"
  csi.storage.k8s.io/controller-expand-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/controller-expand-secret-namespace: "csi-cephrbd"
  csi.storage.k8s.io/node-stage-secret-name: "csi-rbd-secret"
  csi.storage.k8s.io/node-stage-secret-namespace: "csi-cephrbd"
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: ceph-csi-rbd
  namespace: csi-cephrbd
spec:
  method: kubernetes
  mount: k8s/au/syd1
  vaultConnectionRef: vso-system/default
  allowedNamespaces:
    - csi-cephrbd
  kubernetes:
    role: ceph-csi
    serviceAccount: ceph-csi-rbd-csi-rbd-provisioner
    audiences:
      - vault
    tokenExpirationSeconds: 600
@@ -0,0 +1,15 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: csi-rbd-secret
  namespace: csi-cephrbd
spec:
  vaultAuthRef: ceph-csi-rbd
  mount: kv
  type: kv-v2
  path: service/kubernetes/au/syd1/csi/ceph-rbd-secret
  refreshAfter: 5m
  destination:
    name: csi-rbd-secret
    create: true
@@ -0,0 +1,6 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
@@ -0,0 +1,7 @@
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/name: elastic-system
  name: elastic-system
@@ -0,0 +1,8 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- vaultauth.yaml
- vaultstaticsecret.yaml
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: externaldns
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: default
  namespace: externaldns
spec:
  method: kubernetes
  mount: k8s/au/syd1
  vaultConnectionRef: vso-system/default
  allowedNamespaces:
    - externaldns
  kubernetes:
    role: externaldns
    serviceAccount: externaldns
    audiences:
      - vault
    tokenExpirationSeconds: 600
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: externaldns-tsig
  namespace: externaldns
spec:
  vaultAuthRef: default
  mount: kv
  type: kv-v2
  path: service/kubernetes/au/syd1/externaldns/tsig
  refreshAfter: 5m
  destination:
    name: externaldns-tsig
    create: true
  rolloutRestartTargets:
    - kind: Deployment
      name: externaldns
@@ -0,0 +1,19 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
helmCharts:
  - name: intel-device-plugins-operator
    repo: https://artifactapi.k8s.syd1.au.unkin.net/api/v1/virtual/helm
    version: "0.35.0"
    releaseName: intel-device-plugins-operator
    namespace: inteldeviceplugins-system
  - name: intel-device-plugins-gpu
    repo: https://artifactapi.k8s.syd1.au.unkin.net/api/v1/virtual/helm
    version: "0.34.1"
    releaseName: intel-gpu-plugin
    namespace: inteldeviceplugins-system
    valuesFile: values-gpu-plugin.yaml
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: inteldeviceplugins-system
@@ -0,0 +1,13 @@
---
name: intel-gpu-device-plugin
sharedDevNum: 4
logLevel: 2
enableMonitoring: true
allocationPolicy: "none"
image:
  hub: intel
  tag: "" # Use latest from chart
nodeSelector:
  intel.feature.node.kubernetes.io/gpu: 'true'
nodeFeatureRule: true
tolerations: []
@@ -0,0 +1,51 @@
# kanidm

Three-replica kanidm identity server with Vault-managed replication certificates.

## Architecture

- Per-pod `server-N.toml` in `resources/`; each file hardcodes its own replication origin
- `config-init` busybox init container copies the matching config and injects peer certs from the
  Vault-synced `kanidm-repl-certs` Secret at pod startup
- `reloader.stakater.com/auto: "true"` triggers a rolling restart when the ConfigMap or Secret changes
- Vault path: `kv/kubernetes/namespace/kanidm/default/repl-certs`
  - Keys: `kanidm-0`, `kanidm-1`, `kanidm-2`; each holds that pod's replication certificate
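
The per-pod config selection described above could look roughly like the sketch below. The actual `config-init` script is not part of this diff, so the function name `config_for_pod` and the `SRC_DIR`/`DST_DIR` mount paths are hypothetical. The sketch relies only on the fact that a StatefulSet pod's hostname is its pod name, which ends in the pod ordinal.

```bash
#!/bin/sh
# Hypothetical helper (not from the repo): map a StatefulSet pod name such as
# "kanidm-2" to the name of its per-pod config file, "server-2.toml".
config_for_pod() {
    pod_name="$1"
    ordinal="${pod_name##*-}"   # keep only the text after the last '-'
    case "$ordinal" in
        ''|*[!0-9]*)
            # Reject names that do not end in a numeric ordinal.
            echo "cannot derive ordinal from '$pod_name'" >&2
            return 1
            ;;
    esac
    printf 'server-%s.toml\n' "$ordinal"
}

# Assumed usage inside the init container; SRC_DIR and DST_DIR are
# placeholder mount paths, not values taken from the manifests:
#   cp "${SRC_DIR}/$(config_for_pod "$HOSTNAME")" "${DST_DIR}/server.toml"
```

Deriving the file name from the hostname keeps one pod template for all replicas, at the cost of failing at startup if a pod is ever named without a numeric suffix.
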

## Initial setup

After the first pod starts, generate the admin credentials:

```bash
kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml admin
kubectl exec -n kanidm kanidm-0 -- /sbin/kanidmd recover-account -c /config/server.toml idm_admin
```

## Replication certificate rotation

To renew a certificate, write the new value to Vault; Reloader then rolls the StatefulSet:

```bash
# Get a new cert from a pod
kubectl exec -it -n kanidm kanidm-N -- /sbin/kanidmd renew-replication-certificate -c /config/server.toml
# Write the updated cert to Vault (Reloader triggers the restart automatically)
vault kv patch kv/kubernetes/namespace/kanidm/default/repl-certs "kanidm-N=<cert>"
```

## Resolving domain UUID mismatch

If the pods initialized independently (each with a different domain UUID), replication fails with
`Consumer Domain UUID does not match`. Fix this by resetting kanidm-1 and kanidm-2 so that they sync
from kanidm-0 (the authoritative node):

```bash
# Scale down to avoid split-brain during the reset
kubectl scale statefulset -n kanidm kanidm --replicas=1
# Delete the stale PVCs for the replica pods
kubectl delete pvc -n kanidm data-kanidm-1 data-kanidm-2
# Scale back up: replicas start with empty DBs, and automatic_refresh=true
# triggers a full sync from kanidm-0 once TLS peer certs are verified
kubectl scale statefulset -n kanidm kanidm --replicas=3
```
@@ -0,0 +1,26 @@
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kanidm-tls
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  secretName: kanidm-tls
  issuerRef:
    kind: ClusterIssuer
    name: vault-issuer
  commonName: auth.unkin.net
  dnsNames:
    - auth.unkin.net
    - au.auth.unkin.net
    - kanidm.k8s.syd1.au.unkin.net
    - kanidm.kanidm.svc.cluster.local
    - kanidm-0.kanidm-headless.kanidm.svc.cluster.local
    - kanidm-1.kanidm-headless.kanidm.svc.cluster.local
    - kanidm-2.kanidm-headless.kanidm.svc.cluster.local
  privateKey:
    algorithm: RSA
    size: 4096
@@ -0,0 +1,30 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kanidm
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
    traefik.io/instance: internal
  annotations:
    external-dns.alpha.kubernetes.io/hostname: kanidm.k8s.syd1.au.unkin.net
    external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
  gatewayClassName: traefik-internal
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
    - name: https-passthrough
      port: 443
      protocol: TLS
      tls:
        mode: Passthrough
      allowedRoutes:
        namespaces:
          from: Same
@@ -0,0 +1,29 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kanidm-http-redirect
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  hostnames:
    - kanidm.k8s.syd1.au.unkin.net
    - auth.unkin.net
    - au.auth.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: kanidm
      sectionName: http
  rules:
    - filters:
        - type: RequestRedirect
          requestRedirect:
            scheme: https
            statusCode: 301
      matches:
        - path:
            type: PathPrefix
            value: /
@@ -0,0 +1,27 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - serviceaccount.yaml
  - vaultauth.yaml
  - vaultstaticsecret.yaml
  - certificate.yaml
  - service.yaml
  - statefulset.yaml
  - poddisruptionbudget.yaml
  - gateway.yaml
  - httproute.yaml
  - tlsroute.yaml
configMapGenerator:
  - name: kanidm-config
    namespace: kanidm
    options:
      disableNameSuffixHash: true
      labels:
        app.kubernetes.io/name: kanidm
        app.kubernetes.io/instance: kanidm
    files:
      - server-0.toml=resources/server-0.toml
@@ -0,0 +1,5 @@
---
apiVersion: v1
kind: Namespace
metadata:
  name: kanidm
@@ -0,0 +1,15 @@
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kanidm
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kanidm
      app.kubernetes.io/instance: kanidm
@@ -0,0 +1,37 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kanidm-repl
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["kanidm-repl-certs"]
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kanidm-repl
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
subjects:
  - kind: ServiceAccount
    name: kanidm
    namespace: kanidm
roleRef:
  kind: Role
  name: kanidm-repl
  apiGroup: rbac.authorization.k8s.io
@@ -0,0 +1,15 @@
version = "2"
domain = "auth.unkin.net"
origin = "https://auth.unkin.net"
bindaddress = "[::]:8443"
db_path = "/data/kanidm.db"
db_arc_size = 2048
tls_chain = "/data/tls/tls.crt"
tls_key = "/data/tls/tls.key"
log_level = "info"
[online_backup]
path = "/data/backups/"
schedule = "0 22 * * *"
versions = 7
@@ -0,0 +1,43 @@
---
apiVersion: v1
kind: Service
metadata:
  name: kanidm
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  ports:
    - name: https
      port: 8443
      targetPort: https
      protocol: TCP
  selector:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
---
apiVersion: v1
kind: Service
metadata:
  name: kanidm-headless
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: https
      port: 8443
      targetPort: https
      protocol: TCP
  selector:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
@@ -0,0 +1,9 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kanidm
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
@@ -0,0 +1,85 @@
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kanidm
  namespace: kanidm
  annotations:
    reloader.stakater.com/auto: "true"
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  serviceName: kanidm-headless
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kanidm
      app.kubernetes.io/instance: kanidm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kanidm
        app.kubernetes.io/instance: kanidm
    spec:
      serviceAccountName: kanidm
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        fsGroup: 1000
      containers:
        - name: kanidm
          image: kanidm/server:1.10.3
          command: ["/sbin/kanidmd"]
          args: ["server", "-c", "/config/server.toml"]
          ports:
            - name: https
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /data
            - name: config
              mountPath: /config/server.toml
              subPath: server-0.toml
              readOnly: true
            - name: tls
              mountPath: /data/tls
              readOnly: true
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
          resources:
            requests:
              memory: 256Mi
              cpu: 100m
            limits:
              memory: 1Gi
              cpu: 500m
          readinessProbe:
            tcpSocket:
              port: 8443
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 8443
            initialDelaySeconds: 30
            periodSeconds: 30
      volumes:
        - name: config
          configMap:
            name: kanidm-config
        - name: tls
          secret:
            secretName: kanidm-tls
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: cephrbd-fast-delete
        resources:
          requests:
            storage: 10Gi
@@ -0,0 +1,26 @@
---
apiVersion: gateway.networking.k8s.io/v1
kind: TLSRoute
metadata:
  name: kanidm
  namespace: kanidm
  labels:
    app.kubernetes.io/name: kanidm
    app.kubernetes.io/instance: kanidm
spec:
  hostnames:
    - kanidm.k8s.syd1.au.unkin.net
    - auth.unkin.net
    - au.auth.unkin.net
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: kanidm
      sectionName: https-passthrough
  rules:
    - backendRefs:
        - group: ""
          kind: Service
          name: kanidm
          port: 8443
          weight: 1
@@ -0,0 +1,18 @@
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: default
  namespace: kanidm
spec:
  allowedNamespaces:
    - kanidm
  kubernetes:
    audiences:
      - vault
    role: default
    serviceAccount: default
    tokenExpirationSeconds: 600
  method: kubernetes
  mount: k8s/au/syd1
  vaultConnectionRef: vso-system/default

Some files were not shown because too many files have changed in this diff.