Commit Graph

113 Commits

Author SHA1 Message Date
unkinben d4b66bb651 fix: use chart logLevel value instead of duplicate extraArg
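A minimal sketch of what this change might look like, assuming the kubernetes-sigs external-dns Helm chart, which exposes a top-level `logLevel` value. The commented-out `extraArgs` entry is an assumed reconstruction of the duplicate that was removed, not copied from the repository.

```yaml
# Helm values for external-dns (file name and surrounding keys are assumptions)
logLevel: debug   # the chart renders this as --log-level=debug

# Removed: the same flag passed a second time through extraArgs
# extraArgs:
#   - --log-level=debug
```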
2026-05-23 01:08:49 +10:00
unkinben 1944dbbfcd temp: enable debug logging on externaldns to diagnose TLSRoute sync timeout (#140)
Temporary: enable --log-level=debug to understand why the TLSRoute informer never reports HasSynced within the 1m interval. To be closed/reverted after root cause is found.
Reviewed-on: #140
2026-05-23 01:07:45 +10:00
unkinben 0940cc20f8 fix(traefik): listen on port 443 directly for Gateway API compatibility (#138)
## Problem

Gateway listeners with `port: 443` were rejected with `PortUnavailable: Cannot find entryPoint for Gateway: no matching entryPoint for port 443 and protocol "HTTPS"`.

Traefik matches Gateway listener ports against its internal entryPoint ports (pod-level), not the Service's `exposedPort`. The `websecure` entryPoint was configured on port `8443`, so port `443` listeners had no match.

## Fix

- `ports.websecure.port: 443` — Traefik now binds directly on 443
- `securityContext.capabilities.add: [NET_BIND_SERVICE]` — allows a non-root process to bind to privileged ports (<1024)

The Service `exposedPort` stays at `443`, so external connectivity is unchanged. All existing Gateway listeners (`port: 443`) are correct as-is.

Applies to both internal and external Traefik instances.
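
The two settings above could be expressed in the Traefik Helm values roughly as follows. This is a sketch under the assumption that the chart's standard `ports` and `securityContext` keys are used; the comments are illustrative and the real values files may contain more.

```yaml
# Traefik Helm values (same change for the internal and external instances)
ports:
  websecure:
    port: 443          # entryPoint port inside the pod; previously 8443
    exposedPort: 443   # Service port; unchanged
securityContext:
  capabilities:
    add: [NET_BIND_SERVICE]   # lets the non-root process bind ports below 1024
```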

## Test plan

- [ ] Traefik pods restart cleanly
- [ ] `kubectl get gateway -A` shows listeners as `Programmed: True`
- [ ] `https://rancher.k8s.syd1.au.unkin.net` (already merged) is reachable

Reviewed-on: #138
2026-05-23 00:44:13 +10:00
unkinben 20ce2b1b92 feat(cattle-system): migrate rancher Ingress to Gateway API (#132)
## Summary

- Replace `Ingress` (nginx) with `Gateway` + `HTTPRoute` using `traefik-internal` GatewayClass
- TLS terminated at the Gateway listener; cert-manager provisions the certificate via `vault-issuer`
- external-dns annotations moved to the Gateway
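
A rough sketch of the resulting resources, for illustration only. The resource names, namespace, listener name, backend Service name and port, and the use of the `cert-manager.io/cluster-issuer` annotation (which assumes `vault-issuer` is a ClusterIssuer) are assumptions; only the GatewayClass, hostname, issuer name, and certificate name come from this PR.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: rancher                 # hypothetical name
  namespace: cattle-system
  annotations:
    cert-manager.io/cluster-issuer: vault-issuer   # assumes a ClusterIssuer
    # the external-dns annotations moved from the Ingress would also go here
spec:
  gatewayClassName: traefik-internal
  listeners:
    - name: https               # hypothetical listener name
      hostname: rancher.k8s.syd1.au.unkin.net
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate         # TLS is terminated at the Gateway listener
        certificateRefs:
          - name: rancher-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rancher                 # hypothetical name
  namespace: cattle-system
spec:
  parentRefs:
    - name: rancher             # attaches the route to the Gateway above
  hostnames:
    - rancher.k8s.syd1.au.unkin.net
  rules:
    - backendRefs:
        - name: rancher         # hypothetical backend Service
          port: 80              # hypothetical Service port
```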

## Test plan

- [ ] ArgoCD syncs the cattle-system app cleanly
- [ ] cert-manager issues the `rancher-tls` certificate
- [ ] external-dns creates the DNS record
- [ ] `https://rancher.k8s.syd1.au.unkin.net` is reachable

Reviewed-on: #132
2026-05-23 00:24:57 +10:00
unkinben 64dc5a0242 fix(traefik): add instance labels to GatewayClasses (#137)
## Problem

GatewayClasses were `Unknown` even after controllerName was fixed. The `kubernetesGateway` `labelSelector` applies to all watched resources, including GatewayClasses themselves. Since neither GatewayClass had a `traefik.io/instance` label, both Traefik instances filtered them out and never accepted them.

## Fix

- `gatewayclass-internal.yaml`: add `traefik.io/instance: internal`
- `gatewayclass-external.yaml`: add `traefik.io/instance: external`
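
For illustration, the internal GatewayClass might then look like the sketch below; the external one would differ only in its name and label value. The `apiVersion` is an assumption, and the `controllerName` is the Traefik default mentioned in #136.

```yaml
# gatewayclass-internal.yaml (sketch)
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: traefik-internal
  labels:
    traefik.io/instance: internal   # must match the internal instance's labelSelector
spec:
  controllerName: traefik.io/gateway-controller
```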

## Test plan

- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #137
2026-05-23 00:23:18 +10:00
unkinben 57c14d32c0 fix(traefik): remove invalid controllerName flag causing CrashLoopBackOff (#136)
## URGENT: Traefik pods are in CrashLoopBackOff

The merged PR #135 added `--providers.kubernetesgateway.controllerName` as an `additionalArguments` entry. Traefik v3.7.0 does not support this flag and fails immediately on startup.

Old replica sets are still running (one pod each) but new pods cannot come up.

## Fix

- Remove `additionalArguments` from both `values-internal.yaml` and `values-external.yaml`
- Revert GatewayClass `controllerName` back to `traefik.io/gateway-controller` (the hardcoded Traefik default — no override mechanism exists in v3.7.0)

## After merge

GatewayClasses will remain `Unknown` until a separate solution for internal/external separation is implemented (the `labelSelector` approach needs further investigation).

Reviewed-on: #136
2026-05-22 23:58:56 +10:00
unkinben 2df359c4a9 fix(traefik): set controllerName on GatewayClasses and Traefik providers (#135)
## Problem

Both GatewayClasses (`traefik-internal`, `traefik-external`) were stuck as `Unknown`. Neither Traefik deployment had `controllerName` set in `kubernetesGateway`, so both defaulted to `traefik.io/gateway-controller` — which matched neither GatewayClass.

## Fix

- `gatewayclass-internal.yaml`: `controllerName: traefik.io/gateway-controller-internal`
- `gatewayclass-external.yaml`: `controllerName: traefik.io/gateway-controller-external`
- `values-internal.yaml`: added `controllerName: traefik.io/gateway-controller-internal`
- `values-external.yaml`: added `controllerName: traefik.io/gateway-controller-external`

## Test plan

- [ ] ArgoCD syncs traefik-system cleanly
- [ ] `kubectl get gatewayclass` shows both as `Accepted: True`

Reviewed-on: #135
2026-05-22 23:44:06 +10:00
unkinben f53a2dc4f8 fix: terraform_vault must be RFC1123 compliant (#128)
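Kubernetes object names of this kind may contain only lowercase alphanumerics, `-`, and `.`, so an underscore is rejected. A hypothetical sketch of a compliant ServiceAccount follows; the name `terraform-vault` is an assumption, not taken from the commit.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: terraform-vault   # hypothetical; "terraform_vault" is not RFC 1123 compliant
```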
Reviewed-on: #128
2026-05-21 23:19:20 +10:00
unkinben c5dd3cc5cb feat: add terraform_vault role (#127)
Adds a service account that the terraform_vault workflows can run as,
so that we can access the JWT to generate a token.

Reviewed-on: #127
2026-05-21 23:13:48 +10:00
unkinben 462b2b3f4f feat(externaldns): add Gateway API sources for httproute, tlsroute, grpcroute, tcproute, udproute (#126)
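A sketch of how these sources might be listed, assuming the chart's `sources` value is used; any sources configured before this commit are not shown.

```yaml
# Helm values for external-dns (sketch; pre-existing sources omitted)
sources:
  - gateway-httproute
  - gateway-tlsroute
  - gateway-grpcroute
  - gateway-tcproute
  - gateway-udproute
```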
Reviewed-on: #126
2026-05-18 00:11:33 +10:00
unkinben 73c9b3f603 fix(traefik): replace invalid controllername flag with labelSelector for v3 (#125)
Remove --providers.kubernetesgateway.controllername which does not exist in
Traefik v3, update GatewayClass controllerName to the standard v3 value, and
use labelSelector on each instance's kubernetesGateway provider to differentiate
internal vs external traffic.

Reviewed-on: #125
2026-05-18 00:03:12 +10:00
unkinben 53553ddcfd feat: deploy internal/external traefik routers (#119)
deploy traefik for internal and external applications. port forwarding
from the external routers will only occur to the IP of the
traefik-external service.

- traefik-internal and traefik-external added
- each is a different deployment

Reviewed-on: #119
2026-05-17 23:44:50 +10:00
unkinben 5d3ff3a0f4 feat(artifactapi): allow kubeconform and kustomize from GitHub (#123)
Adds immutable patterns for yannh/kubeconform and kubernetes-sigs/kustomize
to fix 403 Forbidden errors when downloading their Linux amd64 releases.

Reviewed-on: #123
2026-05-17 12:19:27 +10:00
unkinben c3002dc3c1 feat(artifactapi): allow kubecolor releases from GitHub (#122)
Reviewed-on: #122
2026-05-11 23:39:48 +10:00
unkinben 27db33536a feat(artifactapi): allow almalinux, debian, and fedora from Docker Hub (#121)
Reviewed-on: #121
2026-05-10 22:56:39 +10:00
unkinben 8a7068a1c4 feat(artifactapi): add argo-helm as a remote and virtual helm member (#120)
Reviewed-on: #120
2026-05-10 22:53:43 +10:00
unkinben 4c8827ce35 feat: add traefik/gatewayapi (#116)
enable access to charts/containers/api-specs so that we can migrate from
nginx-ingress to gateway api and traefik

Reviewed-on: #116
2026-05-10 17:07:33 +10:00
unkinben 5e03215f4d chore: migrate reloader/reflector to virtual/helm (#115)
Reviewed-on: #115
2026-05-05 21:42:23 +10:00
unkinben 02ee82da1e feat: update vso to 1.3.0 (#114)
- updates the vso helm chart from 1.2.0 to 1.3.0

Reviewed-on: #114
2026-05-05 00:01:58 +10:00
unkinben bcea7df925 chore: swap vso to virtual helm repo (#109)
- testing if there will be any changes after merging, before merging all of them

Reviewed-on: #109
2026-05-03 16:49:53 +10:00
unkinben 260b2d4364 chore: mount vault CA cert for Node.js TLS trust in paperclip (#108)
Mount the vault-ca-cert secret and set NODE_EXTRA_CA_CERTS so Node.js
trusts the internal CA chain when making outbound TLS connections.
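
A minimal sketch of the pod spec fragment this describes. The container name, mount path, and secret key (`ca.crt`) are assumptions; only the secret name and the environment variable come from the commit message.

```yaml
# Fragment of the Deployment's pod spec (spec.template.spec)
containers:
  - name: paperclip                        # hypothetical container name
    env:
      - name: NODE_EXTRA_CA_CERTS
        value: /etc/ssl/vault-ca/ca.crt    # hypothetical path and key
    volumeMounts:
      - name: vault-ca-cert
        mountPath: /etc/ssl/vault-ca       # hypothetical mount path
        readOnly: true
volumes:
  - name: vault-ca-cert
    secret:
      secretName: vault-ca-cert
```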

Reviewed-on: #108
2026-05-03 00:10:08 +10:00
unkinben 156b545249 fix: set Host header on paperclip health probes to bypass hostname guard (#107)
The privateHostnameGuard middleware blocks requests where the Host header
is not in the allowlist. Kubelet httpGet probes use the pod IP as the
Host header, which is never in the allowlist. Setting Host: localhost
ensures probes are always permitted.
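
As a sketch, a probe with the overridden header could look like this. The path and port are hypothetical; `httpHeaders` is the standard Kubernetes field for custom probe headers.

```yaml
# Fragment of the paperclip container spec
livenessProbe:
  httpGet:
    path: /health        # hypothetical path
    port: 3000           # hypothetical port
    httpHeaders:
      - name: Host
        value: localhost # replaces the pod IP that kubelet would otherwise send
```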

Reviewed-on: #107
2026-05-02 23:01:59 +10:00
unkinben 0883f327e9 chore: update trusted hostnames (#106)
- remove scheme from paperclip.k8s..
- add localhost (what probe is hitting)

Reviewed-on: #106
2026-05-02 22:40:21 +10:00
unkinben 04b7c04366 chore: fix livenessProbe for paperclip (#105)
Reviewed-on: #105
2026-05-02 22:28:52 +10:00
unkinben 9914186fd5 chore: additional paperclip environment variables (#104)
https://github.com/paperclipai/paperclip/issues/3121
Reviewed-on: https://git.unkin.net/unkin/argocd-apps/pulls/104
2026-05-02 22:11:38 +10:00
unkinben f55b7065f1 fix: rename pgpooler to include rw (#103)
- undo previous change (target pgcluster name)
- actually rename the pgpooler

Reviewed-on: #103
2026-05-02 21:39:51 +10:00
unkinben 87a5a271c3 fix: set pgpooler name to include -rw (#102)
- this matches the credentials set for paperclip

Reviewed-on: #102
2026-05-02 21:35:23 +10:00
unkinben e156cd10bd feat: deploy paperclip to au-syd1 via ArgoCD (aitooling project) (#100)
Adds base manifests and au-syd1 overlay for Paperclip (AI agent
orchestration platform), following the litellm deployment pattern.
Updates aitooling ApplicationSet to include the paperclip path.

Closes #99

Reviewed-on: #100
2026-05-02 21:27:51 +10:00
unkinben fe714694bf chore: bump artifactapi to 2.7.2 (#98)
Reviewed-on: #98
2026-05-02 17:19:56 +10:00
unkinben 6138afb98b feat: add litellm-env configmap with STORE_MODEL_IN_DB=True (#97)
Reviewed-on: #97
2026-05-01 22:17:53 +10:00
unkinben 949ddb76e4 chore: litellm OOMing (#95)
- update memory and cpu resources

Reviewed-on: #95
2026-05-01 21:54:00 +10:00
unkinben 5372914803 feat: add litellm to new aitooling ArgoCD project (#94)
Deploys LiteLLM proxy with CNPG PostgreSQL (3-instance HA), PgBouncer
pooler, and Redis cache. Introduces a dedicated aitooling AppProject and
ApplicationSet to keep AI tooling services separate from platform infra.

Reviewed-on: #94
2026-05-01 21:40:26 +10:00
unkinben 67bb54f092 fix: artifactapi remotes (#93)
- split each yaml into its own mount

Reviewed-on: #93
2026-05-01 21:17:16 +10:00
unkinben fc568dc8b5 feat: split artifactapi config into conf.d and update to v2.7.1 (#92)
Split monolithic remotes.yaml into per-type-package files under
resources/conf.d/ to align with artifactapi v2.7.1 directory loading.
Updated schema: virtuals/locals use dedicated top-level keys, type field
removed. Added helm remotes for all kustomize helmCharts repos and
OCI patterns to docker remotes. CONFIG_PATH now points to the directory.

Reviewed-on: #92
2026-04-30 23:59:01 +10:00
unkinben 1c2c18697d feat: update artifactapi to 2.3.0 (#91)
- update to mutable/immutable ttl/patterns
- reorganised paths to correct patterns

Reviewed-on: #91
2026-04-27 13:16:02 +10:00
unkinben f2af65bc92 fix: update include patterns (#90)
- hadolint and nvim were wrong, updating

Reviewed-on: #90
2026-04-26 16:20:53 +10:00
unkinben fdca69d99a feat: update github remotes (#89)
- enable access to all tagged, master and main branches as tar/gzip
- enable access to additional tool releases

Reviewed-on: #89
2026-04-26 16:05:57 +10:00
unkinben f80be18220 benvin/dockerremotes (#88)
Reviewed-on: #88
2026-04-25 22:34:59 +10:00
unkinben 3a6d93bc3c feat: add woodpeckerci/plugin-docker-buildx to WOODPECKER_PLUGINS_PRIVILEGED (#87)
Plugin is no longer privileged by default in Woodpecker; explicitly list
both the standard and latest-insecure variants.

Reviewed-on: #87
2026-04-25 20:48:46 +10:00
unkinben 7535d655fe feat: add docker remotes to artifactapi (#86)
- set artifactapi to specific version
- add dockerhub and ghcr to remotes

Reviewed-on: #86
2026-04-25 17:40:35 +10:00
unkinben 3fc9cfa41a feat: add claude-code remote (#85)
Reviewed-on: #85
2026-04-25 11:20:47 +10:00
unkinben 7d555cd31a feat: migrate purelb to ArgoCD (#84)
Migrate PureLB load balancer from Terragrunt to ArgoCD/Kustomize.
Deploys purelb v0.13.0 with two LBNodeAgent and two ServiceGroup CRs
(common: 198.18.200.0/24, dmz: 198.18.199.0/24).
Adds LBNodeAgent and ServiceGroup to kubeconform skip list (no CRD catalog schema).

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #84
2026-04-07 19:52:17 +10:00
unkinben f0bdc0231a feat: migrate vso-system to ArgoCD (#81)
Migrate Vault Secrets Operator from Terragrunt to ArgoCD/Kustomize.
Deploys vault-secrets-operator v1.2.0 with 3 replicas, plus ClusterRole,
ClusterRoleBindings, and vault-admin ServiceAccount.

Note: static service account tokens (kubernetes.io/service-account-token)
cannot be stored in git; create manually or via Vault after deployment.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #81
2026-04-07 19:33:50 +10:00
unkinben b100f3034e feat: migrate observability to ArgoCD (#82)
Migrate Victoria Metrics cluster and agent from Terragrunt to ArgoCD/Kustomize.
Creates new observability AppProject and ApplicationSet.
Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with
HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape
configs) in the observability namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #82
2026-04-07 19:15:45 +10:00
unkinben c3a145acbf feat: remove jfrog container registry (#83)
it's not used and was never really installed correctly. going to change to
artifact-keeper, which promises to have the same capabilities and is open
source.

Reviewed-on: #83
2026-04-07 19:03:32 +10:00
unkinben 181bc152e7 feat: migrate vm-system to ArgoCD (#80)
Migrate Victoria Metrics operator from Terragrunt to ArgoCD/Kustomize.
Deploys victoria-metrics-operator v0.57.1 with 2 replicas in vm-system.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #80
2026-03-27 17:04:15 +11:00
unkinben 5bcbd7e1ba feat: migrate elastic-system to ArgoCD (#79)
Migrate ECK operator from Terragrunt to ArgoCD/Kustomize.
Deploys eck-operator v3.2.0 with 2 replicas and PodDisruptionBudget
in the elastic-system namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #79
2026-03-27 17:00:05 +11:00
unkinben 02195e6235 feat: migrate reposync to ArgoCD (#78)
Migrate repository sync cronjobs from Terragrunt to ArgoCD/Kustomize.
Adds four daily CronJobs (almalinux9-baseos, almalinux9-appstream, epel9,
openvox7) with associated PVCs and ConfigMaps in the reposync namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #78
2026-03-27 16:26:35 +11:00
unkinben 95c9302aa8 feat: enable downloading tea (#77)
- enable downloading the tea prebuilt binaries

Reviewed-on: #77
2026-03-26 14:02:15 +11:00
unkinben e269220228 fix: clone r10k config to /tmp/r10k-config instead of /shared (#76)
The g10k-code cronjob was failing with "Permission denied" because the
container (running as uid 999, non-root) attempted to create /shared in
the container root filesystem, which is not writable. Clone to /tmp
which is always writable by unprivileged users.

Reviewed-on: #76
2026-03-24 19:25:06 +11:00