Commit Graph

112 Commits

Author SHA1 Message Date
unkinben 4ec7c61757 fix(gateways): add explicit group: "" to all certificateRefs entries
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/kubeconform Pipeline was successful
The Gateway API admission server defaults certificateRefs[].group to ""
when it is omitted. ArgoCD diffed the desired state (no group field) against
the live state (group: "") and flagged every gateway as out of sync.

Fix: explicitly set group: "" in all certificateRefs entries so the
rendered manifest matches the API server's canonical form exactly.

Affected: artifactapi, cattle-system, consul, litellm, paperclip,
puppet (puppetboard + puppetdb), vault.
2026-05-23 23:39:42 +10:00
unkinben e43fb742ad feat(artifactapi): add kanidm to ghcr docker immutable patterns (#151)
Prerequisite for kanidm deployment (PR benvin/kanidm).

Reviewed-on: #151
2026-05-23 23:09:38 +10:00
unkinben 11ac2ae91e feat(consul): deploy HashiCorp Consul 1.22.7 via Helm chart (5-replica cluster) (#149)
## Summary

- Deploys HashiCorp Consul 1.22.7 using Helm chart 1.9.7 with 5 server replicas
- Configuration modelled on production consul: \`datacenter=au-syd1\`, \`connect=true\`, \`raft_multiplier=10\`, HTTP on 8500, GRPC on 8502, HTTPS disabled
- 5-replica server cluster with \`bootstrapExpect=5\`
- 10Gi cephrbd-fast-delete PVC per server pod
- Gateway API: HTTPS gateway + HTTPRoute (443→consul-consul-ui:80→8500) at \`consul.k8s.syd1.au.unkin.net\`
- PodDisruptionBudget patched from \`policy/v1beta1\` to \`policy/v1\` (k8s 1.25+ compatibility)
- ArgoCD platform ApplicationSet updated to include consul overlay path
- Clients disabled (server-only deployment)
- ConnectInject disabled (can be enabled later for service mesh)

## Requires

- PR #147 (artifactapi: add hashicorp/consul to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in \`sandbox-consul\`: all 5 server pods 1/1 Running, cluster formed
- [ ] After merge: ArgoCD syncs consul namespace
- [ ] Verify \`consul.k8s.syd1.au.unkin.net\` is accessible via Gateway

Reviewed-on: #149
2026-05-23 22:40:49 +10:00
unkinben d2be521878 feat(vault): deploy HashiCorp Vault 2.0.1 via Helm chart (5-replica HA raft) (#148)
## Summary

- Deploys HashiCorp Vault 2.0.1 using Helm chart 0.32.0 in HA raft mode (5 replicas)
- Configuration modelled on production vault: \`disable_mlock=true\`, headless-DNS retry_join for all 5 pods
- IPC_LOCK capability added via \`server.statefulSet.securityContext.container\`
- 10Gi cephrbd-fast-delete PVC per pod via \`dataStorage\`
- Gateway API: HTTPS gateway + HTTPRoute (443→vault service port 8200) at \`vault.k8s.syd1.au.unkin.net\`
- ArgoCD platform ApplicationSet updated to include vault overlay path
- Injector disabled (no agent sidecar injection needed)

## Requires

- PR #147 (artifactapi: add hashicorp/vault to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in \`sandbox-vault\`: all 5 pods Running, raft cluster forming
- [ ] After merge: ArgoCD syncs vault namespace
- [ ] Operator runs \`vault operator init\` to initialize, then unseals all 5 nodes
- [ ] Verify \`vault.k8s.syd1.au.unkin.net\` is accessible via Gateway

Reviewed-on: #148
2026-05-23 22:39:41 +10:00
unkinben 6d9530b1ee feat(artifactapi): add hashicorp/consul and hashicorp/vault to docker immutable patterns (#147)
## Summary

- Adds \`^hashicorp/consul\` and \`^hashicorp/vault\` to the dockerhub immutable_patterns in artifactapi's remote-docker.yaml
- Replaces the more specific \`^hashicorp/vault-secrets-operator\` pattern since \`^hashicorp/vault\` subsumes it
- Required for the benvin/vault and benvin/consul branches (vault:2.0.1 and consul:1.22.7)

## Test plan

- [ ] Verify artifactapi accepts requests for hashicorp/vault and hashicorp/consul images after merge

Reviewed-on: #147
2026-05-23 18:21:25 +10:00
unkinben e05f9bfd83 feat: increase litellm resources (#144)
finding litellm performance has dropped, crashed in multiple cases, and
then it had scaled to the maximum level using the majority of memory in
cluster.

- reduce the rate at which litellm autoscales
- increase the requests/limits to match usage

Reviewed-on: #144
2026-05-23 17:59:43 +10:00
unkinben 445d8b6e7e feat: add HTTP→HTTPS redirect to Gateway API services (#145)
Add port 80 HTTP listener and redirect HTTPRoute to artifactapi,
cattle-system (rancher), litellm, paperclip, and puppetboard — restoring
the redirect behaviour that existed on the previous nginx/traefik Ingress
resources.

Reviewed-on: #145
2026-05-23 17:34:07 +10:00
unkinben c2637da068 feat(artifactapi): migrate Ingress to Gateway API (#129)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Notes

The original Ingress had nginx-specific annotations (`proxy-body-size: 10g`, `proxy-read-timeout: 600`) which are not portable to Gateway API. These can be re-introduced via a Traefik `Middleware` CRD if needed.

## Test plan

- [ ] ArgoCD syncs the app cleanly
- [ ] cert-manager issues the `artifactapi-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://artifactapi.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #129
2026-05-23 16:06:33 +10:00
unkinben 90ddd932fe feat(puppet): migrate puppetdb Ingress to Gateway API (#131)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway
- `ingress_puppetboard.yaml` is unchanged in this PR (separate PR)

## Test plan

- [ ] ArgoCD syncs the puppet app cleanly
- [ ] cert-manager issues the `puppetdb-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://puppetdb.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #131
2026-05-23 16:05:26 +10:00
unkinben 2c6d88aa6b feat(puppet): migrate puppetboard Ingress to Gateway API (#130)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway
- `ingress_puppetdb.yaml` is unchanged in this PR (separate PR)

## Test plan

- [ ] ArgoCD syncs the puppet app cleanly
- [ ] cert-manager issues the `puppetboard-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://puppetboard.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #130
2026-05-23 01:31:28 +10:00
unkinben 58368948d9 feat(paperclip): migrate Ingress to Gateway API (#133)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the paperclip app cleanly
- [ ] cert-manager issues the `paperclip-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://paperclip.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #133
2026-05-23 01:31:03 +10:00
unkinben 4f5c3f7ea0 feat(litellm): migrate Ingress to Gateway API (#134)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the litellm app cleanly
- [ ] cert-manager issues the `litellm-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://litellm.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #134
2026-05-23 01:29:54 +10:00
unkinben 20ce2b1b92 feat(cattle-system): migrate rancher Ingress to Gateway API (#132)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway

## Test plan

- [ ] ArgoCD syncs the cattle-system app cleanly
- [ ] cert-manager issues the `rancher-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://rancher.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #132
2026-05-23 00:24:57 +10:00
unkinben 64dc5a0242 fix(traefik): add instance labels to GatewayClasses (#137)
## Problem

GatewayClasses were `Unknown` even after controllerName was fixed. The `kubernetesGateway` `labelSelector` applies to all watched resources, including GatewayClasses themselves. Since neither GatewayClass had a `traefik.io/instance` label, both Traefik instances filtered them out and never accepted them.

## Fix

- `gatewayclass-internal.yaml`: add `traefik.io/instance: internal`
- `gatewayclass-external.yaml`: add `traefik.io/instance: external`

## Test plan

- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #137
2026-05-23 00:23:18 +10:00
unkinben 57c14d32c0 fix(traefik): remove invalid controllerName flag causing CrashLoopBackOff (#136)
## URGENT — Traefik pods are CrashLoopBackOff

The merged PR #135 added `--providers.kubernetesgateway.controllerName` as an `additionalArguments` entry. Traefik v3.7.0 does not support this flag and fails immediately on startup.

Old replica sets are still running (one pod each) but new pods cannot come up.

## Fix

- Remove `additionalArguments` from both `values-internal.yaml` and `values-external.yaml`
- Revert GatewayClass `controllerName` back to `traefik.io/gateway-controller` (the hardcoded Traefik default — no override mechanism exists in v3.7.0)

## After merge

GatewayClasses will remain `Unknown` until a separate solution for internal/external separation is implemented (the `labelSelector` approach needs further investigation).

Reviewed-on: #136
2026-05-22 23:58:56 +10:00
unkinben 2df359c4a9 fix(traefik): set controllerName on GatewayClasses and Traefik providers (#135)
## Problem

Both GatewayClasses (`traefik-internal`, `traefik-external`) were stuck as `Unknown`. Neither Traefik deployment had `controllerName` set in `kubernetesGateway`, so both defaulted to `traefik.io/gateway-controller` — which matched neither GatewayClass.

## Fix

- `gatewayclass-internal.yaml`: `controllerName: traefik.io/gateway-controller-internal`
- `gatewayclass-external.yaml`: `controllerName: traefik.io/gateway-controller-external`
- `values-internal.yaml`: added `controllerName: traefik.io/gateway-controller-internal`
- `values-external.yaml`: added `controllerName: traefik.io/gateway-controller-external`

## Test plan

- [ ] ArgoCD syncs traefik-system cleanly
- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #135
2026-05-22 23:44:06 +10:00
unkinben f53a2dc4f8 fix: terraform_vault must be RFC1123 compliant (#128)
Reviewed-on: #128
2026-05-21 23:19:20 +10:00
unkinben c5dd3cc5cb feat: add terraform_vault role (#127)
this adds a service account that can be used to run the terraform_vault
workflows with, so that we can access the jwt to generate a token

Reviewed-on: #127
2026-05-21 23:13:48 +10:00
unkinben 73c9b3f603 fix(traefik): replace invalid controllername flag with labelSelector for v3 (#125)
Remove --providers.kubernetesgateway.controllername which does not exist in
Traefik v3, update GatewayClass controllerName to the standard v3 value, and
use labelSelector on each instance's kubernetesGateway provider to differentiate
internal vs external traffic.

Reviewed-on: #125
2026-05-18 00:03:12 +10:00
unkinben 53553ddcfd feat: deploy internal/external traefik routers (#119)
deploy traefik for internal and external applications. port forwarding
from the external routers will only occur to the IP of the
traefik-external service.

- traefik-internal and traefik-external added
- each is a different deployment

Reviewed-on: #119
2026-05-17 23:44:50 +10:00
unkinben 5d3ff3a0f4 feat(artifactapi): allow kubeconform and kustomize from GitHub (#123)
Adds immutable patterns for yannh/kubeconform and kubernetes-sigs/kustomize
to fix 403 Forbidden errors when downloading their Linux amd64 releases.

Reviewed-on: #123
2026-05-17 12:19:27 +10:00
unkinben c3002dc3c1 feat(artifactapi): allow kubecolor releases from GitHub (#122)
Reviewed-on: #122
2026-05-11 23:39:48 +10:00
unkinben 27db33536a feat(artifactapi): allow almalinux, debian, and fedora from Docker Hub (#121)
Reviewed-on: #121
2026-05-10 22:56:39 +10:00
unkinben 8a7068a1c4 feat(artifactapi): add argo-helm as a remote and virtual helm member (#120)
Reviewed-on: #120
2026-05-10 22:53:43 +10:00
unkinben 4c8827ce35 feat: add traefik/gatewayapi (#116)
enable access to charts/containers/api-specs so that we can migrate from
nginx-ingress to gateway api and traefik

Reviewed-on: #116
2026-05-10 17:07:33 +10:00
unkinben 5e03215f4d chore: migrate reloader/reflector to virtual/helm (#115)
Reviewed-on: #115
2026-05-05 21:42:23 +10:00
unkinben 260b2d4364 chore: mount vault CA cert for Node.js TLS trust in paperclip (#108)
Mount the vault-ca-cert secret and set NODE_EXTRA_CA_CERTS so Node.js
trusts the internal CA chain when making outbound TLS connections.

Reviewed-on: #108
2026-05-03 00:10:08 +10:00
unkinben 156b545249 fix: set Host header on paperclip health probes to bypass hostname guard (#107)
The privateHostnameGuard middleware blocks requests where the Host header
is not in the allowlist. Kubelet httpGet probes use the pod IP as the
Host header, which is never in the allowlist. Setting Host: localhost
ensures probes are always permitted.

Reviewed-on: #107
2026-05-02 23:01:59 +10:00
unkinben 0883f327e9 chore: update trusted hostnames (#106)
- remove scheme from paperclip.k8s..
- add localhost (what probe is hitting)

Reviewed-on: #106
2026-05-02 22:40:21 +10:00
unkinben 04b7c04366 chore: fix livenessProbe for paperclip (#105)
Reviewed-on: #105
2026-05-02 22:28:52 +10:00
unkinben 9914186fd5 chore: additional papaerclip environemnt variables (#104)
https://github.com/paperclipai/paperclip/issues/3121
Reviewed-on: https://git.unkin.net/unkin/argocd-apps/pulls/104
2026-05-02 22:11:38 +10:00
unkinben f55b7065f1 fix: rename pgpooler to include rw (#103)
- undo previous change (target pgcluster name)
- actually rename the pgpooler

Reviewed-on: #103
2026-05-02 21:39:51 +10:00
unkinben 87a5a271c3 fix: set pgpooler name to include -rw (#102)
- this matches the credentials set for paperclip

Reviewed-on: #102
2026-05-02 21:35:23 +10:00
unkinben e156cd10bd feat: deploy paperclip to au-syd1 via ArgoCD (aitooling project) (#100)
Adds base manifests and au-syd1 overlay for Paperclip (AI agent
orchestration platform), following the litellm deployment pattern.
Updates aitooling ApplicationSet to include the paperclip path.

Closes #99

Reviewed-on: #100
2026-05-02 21:27:51 +10:00
unkinben fe714694bf chore: bump artifactapi to 2.7.2 (#98)
Reviewed-on: #98
2026-05-02 17:19:56 +10:00
unkinben 6138afb98b feat: add litellm-env configmap with STORE_MODEL_IN_DB=True (#97)
Reviewed-on: #97
2026-05-01 22:17:53 +10:00
unkinben 949ddb76e4 chore: litellm ooming (#95)
- update memory and cpu resources

Reviewed-on: #95
2026-05-01 21:54:00 +10:00
unkinben 5372914803 feat: add litellm to new aitooling ArgoCD project (#94)
Deploys LiteLLM proxy with CNPG PostgreSQL (3-instance HA), PgBouncer
pooler, and Redis cache. Introduces a dedicated aitooling AppProject and
ApplicationSet to keep AI tooling services separate from platform infra.

Reviewed-on: #94
2026-05-01 21:40:26 +10:00
unkinben 67bb54f092 fix: artifactapi remotes (#93)
- split each yaml into its own mount

Reviewed-on: #93
2026-05-01 21:17:16 +10:00
unkinben fc568dc8b5 feat: split artifactapi config into conf.d and update to v2.7.1 (#92)
Split monolithic remotes.yaml into per-type-package files under
resources/conf.d/ to align with artifactapi v2.7.1 directory loading.
Updated schema: virtuals/locals use dedicated top-level keys, type field
removed. Added helm remotes for all kustomize helmCharts repos and
OCI patterns to docker remotes. CONFIG_PATH now points to the directory.

Reviewed-on: #92
2026-04-30 23:59:01 +10:00
unkinben 1c2c18697d feat: update artifactapi to 2.3.0 (#91)
- update to mutable/immutable ttl/patterns
- reoganised paths to correct patterns

Reviewed-on: #91
2026-04-27 13:16:02 +10:00
unkinben f2af65bc92 fix: update include patterns (#90)
- hadolint and nvim were wrong, updating

Reviewed-on: #90
2026-04-26 16:20:53 +10:00
unkinben fdca69d99a feat: update github remotes (#89)
- enable access to all tagged, master and main branches as tar/gzip
- enable access to additional tool releases

Reviewed-on: #89
2026-04-26 16:05:57 +10:00
unkinben f80be18220 benvin/dockerremotes (#88)
Reviewed-on: #88
2026-04-25 22:34:59 +10:00
unkinben 7535d655fe feat: add docker remotes to artifactapi (#86)
- set artifactapi to specific version
- add dockerhub and ghcr to remotes

Reviewed-on: #86
2026-04-25 17:40:35 +10:00
unkinben 3fc9cfa41a feat: add claude-code remote (#85)
Reviewed-on: #85
2026-04-25 11:20:47 +10:00
unkinben 7d555cd31a feat: migrate purelb to ArgoCD (#84)
Migrate PureLB load balancer from Terragrunt to ArgoCD/Kustomize.
Deploys purelb v0.13.0 with two LBNodeAgent and two ServiceGroup CRs
(common: 198.18.200.0/24, dmz: 198.18.199.0/24).
Adds LBNodeAgent and ServiceGroup to kubeconform skip list (no CRD catalog schema).

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #84
2026-04-07 19:52:17 +10:00
unkinben f0bdc0231a feat: migrate vso-system to ArgoCD (#81)
Migrate Vault Secrets Operator from Terragrunt to ArgoCD/Kustomize.
Deploys vault-secrets-operator v1.2.0 with 3 replicas, plus ClusterRole,
ClusterRoleBindings, and vault-admin ServiceAccount.

Note: static service account tokens (kubernetes.io/service-account-token)
cannot be stored in git; create manually or via Vault after deployment.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #81
2026-04-07 19:33:50 +10:00
unkinben b100f3034e feat: migrate observability to ArgoCD (#82)
Migrate Victoria Metrics cluster and agent from Terragrunt to ArgoCD/Kustomize.
Creates new observability AppProject and ApplicationSet.
Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with
HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape
configs) in the observability namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #82
2026-04-07 19:15:45 +10:00
unkinben c3a145acbf feat: remove jfrog container registry (#83)
its not used and never really installed correctly. going to change to
artifact-keeper which promises to have the same capabilities and is open
source.

Reviewed-on: #83
2026-04-07 19:03:32 +10:00