argocd-apps

Author	SHA1	Message	Date
unkinben	cfca1e5278	Add age-api deployment (#210 ) ## Summary - Deploy age-api to the au-syd1 cluster - Uses configMapGenerator for people config with jaidi, ben, and sudaporn - Includes gateway, httproute, service, and deployment - Image: git.unkin.net/unkin/age-api:v0.1.0 Reviewed-on: #210 Co-authored-by: Ben Vincent <ben@unkin.net> Co-committed-by: Ben Vincent <ben@unkin.net>	2026-06-28 12:19:38 +10:00
unkinben	1b781e0885	feat(woodpecker): set workflow pod priority class to power (#175 ) ## Summary Sets `WOODPECKER_BACKEND_K8S_PRIORITY_CLASS: power` on the Woodpecker agent so all CI pipeline pods are scheduled with the `power` PriorityClass (value 100, preemptionPolicy: Never). This means pipeline pods can be evicted when the cluster is under pressure but won't preempt other workloads. ## Dependency Requires the `power` PriorityClass to exist on the cluster — deploy PR #174 (priority-classes app) first. ## Test plan - Trigger a pipeline run and confirm pods are created with `priorityClassName: power` - `kubectl get pod -n woodpecker -o jsonpath='{.items[*].spec.priorityClassName}'` Reviewed-on: #175	2026-05-26 23:58:57 +10:00
unkinben	ede25a3858	feat(platform): add priority-classes app with low/power/medium/high classes (#174 ) ## Summary - New `apps/base/priority-classes/` app with four `PriorityClass` objects managed via the `platform` ArgoCD project - Adds `apps/overlays/*/priority-classes` to the platform ApplicationSet generator - Adds `priority-classes` namespace to platform AppProject destinations (required even for cluster-scoped resources) | Class | Value | PreemptionPolicy | Intent | |---|---|---|---| | `low` | 100 | Never | Background work; evictable, won't preempt others | | `power` | 100 | Never | Compute-heavy but expendable (e.g. AI/ML workloads) | | `medium` | 10000 | PreemptLowerPriority | Standard services | | `high` | 100000 | PreemptLowerPriority | Critical services; preempts lower-priority pods | `PriorityClass` is already in the platform project's `clusterResourceWhitelist` so no project policy changes were needed. ## Test plan - ArgoCD syncs `platform-priority-classes` successfully - `kubectl get priorityclasses low power medium high` shows all four classes Reviewed-on: #174	2026-05-26 23:41:54 +10:00
unkinben	3990fbfe06	feat(vault): switch to Kubernetes service registration (#171 ) Replaces Consul service registration with the native Kubernetes provider so Vault labels its own pods with active/standby/perf-standby status without requiring a Consul dependency. ## Changes - `values.yaml`: swap `service_registration "consul"` for `service_registration "kubernetes" {}`, add `VAULT_K8S_NAMESPACE` and `VAULT_K8S_POD_NAME` env vars via downward API - `role_k8s-service-registration.yaml`: Role + RoleBinding granting the `vault` service account `get`/`update`/`patch` on pods - `kustomization.yaml`: include new RBAC file Reviewed-on: #171	2026-05-26 00:06:56 +10:00
unkinben	ac8b8212bd	fix(consul): normalize cpu limit to canonical string form (#164 ) ## Summary - Changes `server.resources.limits.cpu` from `1000m` to `"1"` in consul Helm values ## Why `1000m` (1000 milliCPU) is equivalent to `1` CPU, but Kubernetes normalizes the value to `"1"` when storing. ArgoCD diffs desired vs live by string comparison, so the mismatch causes a permanent OutOfSync on the `consul-server` StatefulSet. Same root cause as #163. Reviewed-on: #164	2026-05-25 22:43:35 +10:00
unkinben	3756208ccd	benvin/kanidm (#159 ) Reviewed-on: #159	2026-05-24 19:55:22 +10:00
unkinben	11ac2ae91e	feat(consul): deploy HashiCorp Consul 1.22.7 via Helm chart (5-replica cluster) (#149 ) ## Summary - Deploys HashiCorp Consul 1.22.7 using Helm chart 1.9.7 with 5 server replicas - Configuration modelled on production consul: `datacenter=au-syd1`, `connect=true`, `raft_multiplier=10`, HTTP on 8500, GRPC on 8502, HTTPS disabled - 5-replica server cluster with `bootstrapExpect=5` - 10Gi cephrbd-fast-delete PVC per server pod - Gateway API: HTTPS gateway + HTTPRoute (443→consul-consul-ui:80→8500) at `consul.k8s.syd1.au.unkin.net` - PodDisruptionBudget patched from `policy/v1beta1` to `policy/v1` (k8s 1.25+ compatibility) - ArgoCD platform ApplicationSet updated to include consul overlay path - Clients disabled (server-only deployment) - ConnectInject disabled (can be enabled later for service mesh) ## Requires - PR #147 (artifactapi: add hashicorp/consul to docker immutable patterns) to be merged first ## Test plan - [ ] Sandbox tested in `sandbox-consul`: all 5 server pods 1/1 Running, cluster formed - [ ] After merge: ArgoCD syncs consul namespace - [ ] Verify `consul.k8s.syd1.au.unkin.net` is accessible via Gateway Reviewed-on: #149	2026-05-23 22:40:49 +10:00
unkinben	d2be521878	feat(vault): deploy HashiCorp Vault 2.0.1 via Helm chart (5-replica HA raft) (#148 ) ## Summary - Deploys HashiCorp Vault 2.0.1 using Helm chart 0.32.0 in HA raft mode (5 replicas) - Configuration modelled on production vault: `disable_mlock=true`, headless-DNS retry_join for all 5 pods - IPC_LOCK capability added via `server.statefulSet.securityContext.container` - 10Gi cephrbd-fast-delete PVC per pod via `dataStorage` - Gateway API: HTTPS gateway + HTTPRoute (443→vault service port 8200) at `vault.k8s.syd1.au.unkin.net` - ArgoCD platform ApplicationSet updated to include vault overlay path - Injector disabled (no agent sidecar injection needed) ## Requires - PR #147 (artifactapi: add hashicorp/vault to docker immutable patterns) to be merged first ## Test plan - [ ] Sandbox tested in `sandbox-vault`: all 5 pods Running, raft cluster forming - [ ] After merge: ArgoCD syncs vault namespace - [ ] Operator runs `vault operator init` to initialize, then unseals all 5 nodes - [ ] Verify `vault.k8s.syd1.au.unkin.net` is accessible via Gateway Reviewed-on: #148	2026-05-23 22:39:41 +10:00
unkinben	bcd4c1a722	feat(cert-manager): upgrade to v1.20.2 and enable Gateway API support (#150 ) ## Summary - Upgrades cert-manager from v1.19.2 to v1.20.2 - Enables `enableGatewayAPI: true` via the `ControllerConfiguration` config block ## Why cert-manager's Gateway API integration was not enabled. Without it, `cert-manager.io/*` annotations on Gateway resources are ignored and no certificates are issued. This is required for the consul and vault PRs (#148, #149) to have their TLS certs automatically provisioned from their Gateway annotations. In v1.20.2, `ExperimentalGatewayAPISupport` is BETA and defaults to true — enabling `enableGatewayAPI` in the controller config activates the gateway-shim controller. ## Test plan - [ ] cert-manager rolls out cleanly (v1.20.2 pods become Ready) - [ ] After rollout, existing Gateway-annotated services (artifactapi, puppet, litellm) retain valid certs - [ ] New Gateway resources with `cert-manager.io/cluster-issuer` annotations trigger Certificate creation Reviewed-on: #150	2026-05-23 22:38:39 +10:00
unkinben	dcea768c15	feat(woodpecker): upgrade to v3.14.1 (chart 3.6.3) (#146 ) Reviewed-on: #146	2026-05-23 18:00:55 +10:00
unkinben	fd87cb96b5	feat(externaldns): upgrade to 1.21.1, fix sources for installed CRDs (#143 ) ## Changes - Upgrade external-dns chart from 1.19.0 → 1.21.1 (app v0.19.0 → v0.21.0) - Remove `gateway-tcproute` source — `TCPRoute` CRD is not installed, causing crash-loop - Add `gateway-tlsroute` source — `TLSRoute` CRD is installed (Gateway API 1.5.1) ## Why The pod was crash-looping every minute with `failed to list *v1alpha2.TCPRoute: the server could not find the requested resource` because the TCPRoute CRD doesn't exist in this cluster. TLSRoute was previously removed but its CRD does exist. Reviewed-on: #143	2026-05-23 01:28:20 +10:00
unkinben	d619f9195e	benvin/externaldns_compatability (#142 ) Reviewed-on: #142	2026-05-23 01:17:20 +10:00
unkinben	1944dbbfcd	temp: enable debug logging on externaldns to diagnose TLSRoute sync timeout (#140 ) Temporary: enable --log-level=debug to understand why the TLSRoute informer never reports HasSynced within the 1m interval. To be closed/reverted after root cause is found. Reviewed-on: #140	2026-05-23 01:07:45 +10:00
unkinben	0940cc20f8	fix(traefik): listen on port 443 directly for Gateway API compatibility (#138 ) ## Problem Gateway listeners with `port: 443` were rejected with `PortUnavailable: Cannot find entryPoint for Gateway: no matching entryPoint for port 443 and protocol "HTTPS"`. Traefik matches Gateway listener ports against its internal entryPoint ports (pod-level), not the Service's `exposedPort`. The `websecure` entryPoint was configured on port `8443`, so port `443` listeners had no match. ## Fix - `ports.websecure.port: 443` — Traefik now binds directly on 443 - `securityContext.capabilities.add: [NET_BIND_SERVICE]` — allows a non-root process to bind to privileged ports (<1024) The Service `exposedPort` stays at `443`, so external connectivity is unchanged. All existing Gateway listeners (`port: 443`) are correct as-is. Applies to both internal and external Traefik instances. ## Test plan - [ ] Traefik pods restart cleanly - [ ] `kubectl get gateway -A` shows listeners as `Programmed: True` - [ ] `https://rancher.k8s.syd1.au.unkin.net` (already merged) is reachable Reviewed-on: #138	2026-05-23 00:44:13 +10:00
unkinben	57c14d32c0	fix(traefik): remove invalid controllerName flag causing CrashLoopBackOff (#136 ) ## URGENT — Traefik pods are CrashLoopBackOff The merged PR #135 added `--providers.kubernetesgateway.controllerName` as an `additionalArguments` entry. Traefik v3.7.0 does not support this flag and fails immediately on startup. Old replica sets are still running (one pod each) but new pods cannot come up. ## Fix - Remove `additionalArguments` from both `values-internal.yaml` and `values-external.yaml` - Revert GatewayClass `controllerName` back to `traefik.io/gateway-controller` (the hardcoded Traefik default — no override mechanism exists in v3.7.0) ## After merge GatewayClasses will remain `Unknown` until a separate solution for internal/external separation is implemented (the `labelSelector` approach needs further investigation). Reviewed-on: #136	2026-05-22 23:58:56 +10:00
unkinben	2df359c4a9	fix(traefik): set controllerName on GatewayClasses and Traefik providers (#135 ) ## Problem Both GatewayClasses (`traefik-internal`, `traefik-external`) were stuck as `Unknown`. Neither Traefik deployment had `controllerName` set in `kubernetesGateway`, so both defaulted to `traefik.io/gateway-controller` — which matched neither GatewayClass. ## Fix - `gatewayclass-internal.yaml`: `controllerName: traefik.io/gateway-controller-internal` - `gatewayclass-external.yaml`: `controllerName: traefik.io/gateway-controller-external` - `values-internal.yaml`: added `controllerName: traefik.io/gateway-controller-internal` - `values-external.yaml`: added `controllerName: traefik.io/gateway-controller-external` ## Test plan - [ ] ArgoCD syncs traefik-system cleanly - [ ] `kubectl get gatewayclass` shows both as `Accepted: True` Reviewed-on: #135	2026-05-22 23:44:06 +10:00
unkinben	462b2b3f4f	feat(externaldns): add Gateway API sources for httproute, tlsroute, grpcroute, tcproute, udproute (#126 ) Reviewed-on: #126	2026-05-18 00:11:33 +10:00
unkinben	73c9b3f603	fix(traefik): replace invalid controllername flag with labelSelector for v3 (#125 ) Remove --providers.kubernetesgateway.controllername which does not exist in Traefik v3, update GatewayClass controllerName to the standard v3 value, and use labelSelector on each instance's kubernetesGateway provider to differentiate internal vs external traffic. Reviewed-on: #125	2026-05-18 00:03:12 +10:00
unkinben	53553ddcfd	feat: deploy internal/external traefik routers (#119 ) deploy traefik for internal and external applications. port forwarding from the external routers will only occur to the IP of the traefik-external service. - traefik-internal and traefik-external added - each is a different deployment Reviewed-on: #119	2026-05-17 23:44:50 +10:00
unkinben	5e03215f4d	chore: migrate reloader/reflector to virtual/helm (#115 ) Reviewed-on: #115	2026-05-05 21:42:23 +10:00
unkinben	02ee82da1e	feat: update vso to 1.3.0 (#114 ) - updates the vso helm chart from 1.2.0 to 1.3.0 Reviewed-on: #114	2026-05-05 00:01:58 +10:00
unkinben	bcea7df925	chore: swap vso to virtual helm repo (#109 ) - testing if there will be any changes after merging, before merging all of them Reviewed-on: #109	2026-05-03 16:49:53 +10:00
unkinben	e156cd10bd	feat: deploy paperclip to au-syd1 via ArgoCD (aitooling project) (#100 ) Adds base manifests and au-syd1 overlay for Paperclip (AI agent orchestration platform), following the litellm deployment pattern. Updates aitooling ApplicationSet to include the paperclip path. Closes #99 Reviewed-on: #100	2026-05-02 21:27:51 +10:00
unkinben	5372914803	feat: add litellm to new aitooling ArgoCD project (#94 ) Deploys LiteLLM proxy with CNPG PostgreSQL (3-instance HA), PgBouncer pooler, and Redis cache. Introduces a dedicated aitooling AppProject and ApplicationSet to keep AI tooling services separate from platform infra. Reviewed-on: #94	2026-05-01 21:40:26 +10:00
unkinben	3a6d93bc3c	feat: add woodpeckerci/plugin-docker-buildx to WOODPECKER_PLUGINS_PRIVILEGED (#87 ) Plugin is no longer privileged by default in Woodpecker; explicitly list both the standard and latest-insecure variants. Reviewed-on: #87	2026-04-25 20:48:46 +10:00
unkinben	7d555cd31a	feat: migrate purelb to ArgoCD (#84 ) Migrate PureLB load balancer from Terragrunt to ArgoCD/Kustomize. Deploys purelb v0.13.0 with two LBNodeAgent and two ServiceGroup CRs (common: 198.18.200.0/24, dmz: 198.18.199.0/24). Adds LBNodeAgent and ServiceGroup to kubeconform skip list (no CRD catalog schema). 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #84	2026-04-07 19:52:17 +10:00
unkinben	f0bdc0231a	feat: migrate vso-system to ArgoCD (#81 ) Migrate Vault Secrets Operator from Terragrunt to ArgoCD/Kustomize. Deploys vault-secrets-operator v1.2.0 with 3 replicas, plus ClusterRole, ClusterRoleBindings, and vault-admin ServiceAccount. Note: static service account tokens (kubernetes.io/service-account-token) cannot be stored in git; create manually or via Vault after deployment. 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #81	2026-04-07 19:33:50 +10:00
unkinben	b100f3034e	feat: migrate observability to ArgoCD (#82 ) Migrate Victoria Metrics cluster and agent from Terragrunt to ArgoCD/Kustomize. Creates new observability AppProject and ApplicationSet. Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape configs) in the observability namespace. 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #82	2026-04-07 19:15:45 +10:00
unkinben	c3a145acbf	feat: remove jfrog container registry (#83 ) it's not used and was never really installed correctly. going to change to artifact-keeper, which promises the same capabilities and is open source. Reviewed-on: #83	2026-04-07 19:03:32 +10:00
unkinben	181bc152e7	feat: migrate vm-system to ArgoCD (#80 ) Migrate Victoria Metrics operator from Terragrunt to ArgoCD/Kustomize. Deploys victoria-metrics-operator v0.57.1 with 2 replicas in vm-system. 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #80	2026-03-27 17:04:15 +11:00
unkinben	5bcbd7e1ba	feat: migrate elastic-system to ArgoCD (#79 ) Migrate ECK operator from Terragrunt to ArgoCD/Kustomize. Deploys eck-operator v3.2.0 with 2 replicas and PodDisruptionBudget in the elastic-system namespace. 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #79	2026-03-27 17:00:05 +11:00
unkinben	02195e6235	feat: migrate reposync to ArgoCD (#78 ) Migrate repository sync cronjobs from Terragrunt to ArgoCD/Kustomize. Adds four daily CronJobs (almalinux9-baseos, almalinux9-appstream, epel9, openvox7) with associated PVCs and ConfigMaps in the reposync namespace. 💘 Generated with Crush Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land> Reviewed-on: #78	2026-03-27 16:26:35 +11:00
unkinben	dfbb315522	feat: migrate node-feature-discovery and inteldeviceplugins-system to platform project (#48 ) - Add node-feature-discovery and inteldeviceplugins-system to platform project - Convert intel-nfd-rules from local Helm chart to static NodeFeatureRule manifests - Add required Helm repositories (NFD OCI registry and Intel charts) - Create base configurations with Helm charts and overlay structures - Update platform ApplicationSet and project permissions Reviewed-on: #48	2026-03-19 02:14:45 +11:00
unkinben	90f793464b	feat: migrate CSI drivers to dedicated storage project (#45 ) - Migrate csi-cephfs from Terraform to ArgoCD - Migrate csi-cephrbd from Terraform to ArgoCD - Create dedicated storage project and ApplicationSet for CSI drivers - Add csi-* pattern matching in storage ApplicationSet - Remove CSI apps from platform project to separate concerns Reviewed-on: #45	2026-03-19 01:29:31 +11:00
unkinben	06a8f98b5c	feat: migrate cnpg-system from Terraform to ArgoCD (#44 ) - Add cnpg-system base ArgoCD application with namespace - Create cnpg-system overlay for au-syd1 with CloudNativePG Helm chart - Update platform ApplicationSet to include cnpg-system deployment - Configure cloudnative-pg operator v0.27.0 with HA and resource limits - Maintain one-to-one migration from Terraform configuration Reviewed-on: #44	2026-03-19 01:25:50 +11:00
unkinben	0bf6e80d6f	feat: migrate externaldns from Terraform to ArgoCD (#43 ) - Add externaldns base ArgoCD application with namespace and Vault integration - Create externaldns overlay for au-syd1 with Helm chart configuration - Update platform ApplicationSet to include externaldns deployment - Configure external-dns v1.19.0 with RFC2136 provider for DNS updates - Maintain one-to-one migration from Terraform configuration including TSIG secrets Reviewed-on: #43	2026-03-19 01:22:39 +11:00
unkinben	ed300fabed	feat: migrate cert-manager from Terraform to ArgoCD (#42 ) - Add cert-manager base ArgoCD application with namespace, RBAC resources - Create cert-manager overlay for au-syd1 with Helm chart configuration - Update platform ApplicationSet to include cert-manager deployment - Configure cert-manager v1.19.2 with jetstack Helm repository - Maintain one-to-one migration from Terraform configuration Reviewed-on: #42	2026-03-19 01:18:19 +11:00
unkinben	ea71ebb55b	feat: migrate cattle-system (Rancher) from Terraform to ArgoCD (#39 ) - Add cattle-system base ArgoCD application with namespace, Vault integration, and ingress - Create cattle-system overlay for au-syd1 with Rancher Helm chart configuration - Update platform ApplicationSet to include cattle-system deployment - Update platform project to include Rancher Helm repository as source - Configure Rancher v2.13.1 with HA, TLS, audit logging, and bootstrap secret from Vault - Maintain one-to-one migration from Terraform configuration Reviewed-on: #39	2026-03-19 00:56:39 +11:00
unkinben	3f282fbdc2	feat: migrate certificates from Terraform to ArgoCD (#37 ) - Add certificates base ArgoCD application with namespace and Vault CA certificate secret - Create certificates overlay for au-syd1 with static certificate configuration - Update platform ApplicationSet to include certificates deployment - Configure Vault CA certificate with reflector annotations for cross-namespace replication - Maintain one-to-one migration from Terraform configuration Note: Skip no_plain_secrets hook as this is a public CA certificate that needs to be replicated via reflector, not a sensitive secret Reviewed-on: #37	2026-03-19 00:16:33 +11:00
unkinben	14e3946d4b	feat: initial puppet deployment (#25 ) working towards a larger, redundant, autoscaling and simple puppet implementation in kubernetes. this was originally based on the openvox helm chart with several improvements (not all in this pr) - use of cnpg instead of single bitnamilegacy postgres container - use of g10k instead of r10k - run one instance of g10k per namespace, instead of per-pod - keep only one copy of the environments/branches (instead of per-pod) - change g10k to native cronjob instead of hacky implementation - use vault secrets part one adds: - cnpg puppetdb pgsql cluster - cnpg puppetdb pgpooler - persistent volume claims for puppet, puppetdb, the code repository, etc Reviewed-on: #25	2026-03-09 01:10:30 +11:00
unkinben	68b753d7fa	chore: reload woodpecker (#24 ) - add reloader annotations to woodpecker agent/server Reviewed-on: #24	2026-03-07 16:02:39 +11:00
unkinben	d7b661a619	chore: set WOODPECKER_ADMIN (#23 ) - enable admin features for myself Reviewed-on: #23	2026-03-07 15:47:42 +11:00
unkinben	05a88459a5	chore: migrate artifactapi to kustomize (#18 ) - migrate terraform deployment to kustomize Reviewed-on: #18	2026-03-06 21:35:47 +11:00
unkinben	f9a8dca060	chore: change max workflows to string (#16 ) WOODPECKER_MAX_WORKFLOWS shows no value in the pod's environment; trying it as a string instead Reviewed-on: #16	2026-03-03 23:14:05 +11:00
unkinben	46e11dd05e	chore: increase agents to 3 (#15 ) - increase woodpecker agents to 3 for parallel jobs Reviewed-on: #15	2026-03-03 23:02:15 +11:00
unkinben	dbd8914013	feat: migrate woodpecker to argocd (#13 ) - move woodpecker helm chart deployment to argocd - move cnpg resources - move vault resources Reviewed-on: #13	2026-03-03 22:24:17 +11:00
unkinben	be9d485bfe	feat: testing jfrog-container-registry (#11 ) - trialing jfrog container registry Reviewed-on: #11	2026-03-02 23:07:47 +11:00
unkinben	ebb47348fe	fix: resolve issues with helm deployments (#8 ) - remove helm-patch files that are unused - change platform namespaces allowed to *-system - change chart name Reviewed-on: #8	2026-03-01 18:55:47 +11:00
unkinben	e873935634	feat: add reloader (#6 ) - deploy reloader via helm - only watch configmaps, secrets are reloaded by vso Reviewed-on: #6	2026-03-01 16:34:01 +11:00
unkinben	c52af7eb11	fix: helm-charts in overlay only (#5 ) weird issues with kustomize not being able to merge helm-charts between base/overlays - move the helm-charts to the overlay only Reviewed-on: #5	2026-03-01 16:01:32 +11:00


52 Commits