Commit Graph

42 Commits

Author SHA1 Message Date
unkinben d11c2900de Deploy bind-operator and three BIND DNS tiers
Adds the bind-operator and the three BindClusters that replace the
Puppet-managed BIND estate (authoritative / resolver / external-dns).

- add apps/base/bind-system: 9 CRDs, operator Deployment, RBAC (ns bind-system)
- add apps/base/binddns-auth: authoritative BindCluster + catalog zone + TSIG key
- add apps/base/binddns-resolver: recursive-resolver BindCluster with forwarders
- add apps/base/binddns-externaldns: dynamic (RFC2136) BindCluster + TSIG key
- add au-syd1 overlays for all four apps
- register the four apps in the platform ApplicationSet
- add binddns-* namespaces to the platform AppProject destinations
- add schemas/bind.unkin.net/*.json so kubeconform validates the new CRs

DNS Services are LoadBalancer via PureLB. TSIG key material is generated by
the operator into Secrets at runtime (no plain Secrets in git).
2026-07-03 17:48:45 +10:00
unkinben 7f1444fb38 Add Authentik identity provider deployment (#211)
## Summary
- Deploy Authentik (identity.unkin.net) via Helm chart 2026.5.3
- CNPG PostgreSQL cluster (3 instances) with separate rw/ro poolers (2 instances each)
- Redis with 5Gi persistent storage
- Gateway API for HTTPS (identity.unkin.net) and LDAPS (ldap.k8s.syd1.au.unkin.net, ldap.main.unkin.net)
- TLSRoute for LDAPS passthrough, HTTPRoute for external-dns record creation
- Vault secrets for postgres credentials, authentik secret key, and S3 storage credentials
- S3 storage via RadosGW (bucket: authentik)
- 3 server replicas, 2 worker replicas
- Woodpecker ServiceAccount for terraform-authentik CI
- Platform applicationset and project updated

## Dependencies
- terraform-git #15 (merged) — repo definition
- terraform-vault #78 (merged) — auth roles and Consul ACL

## Vault secrets needed before deploy
Write to `kv/kubernetes/namespace/authentik/default/`:
- `postgres-credentials`: username + password
- `authentik-credentials`: AUTHENTIK_SECRET_KEY
- `s3-credentials`: S3 access key + secret key

Reviewed-on: #211
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 17:42:49 +10:00
unkinben cfca1e5278 Add age-api deployment (#210)
## Summary
- Deploy age-api to the au-syd1 cluster
- Uses configMapGenerator for people config with jaidi, ben, and sudaporn
- Includes gateway, httproute, service, and deployment
- Image: git.unkin.net/unkin/age-api:v0.1.0

Reviewed-on: #210
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-28 12:19:38 +10:00
unkinben ede25a3858 feat(platform): add priority-classes app with low/power/medium/high classes (#174)
## Summary
- New `apps/base/priority-classes/` app with four `PriorityClass` objects managed via the `platform` ArgoCD project
- Adds `apps/overlays/*/priority-classes` to the platform ApplicationSet generator
- Adds `priority-classes` namespace to platform AppProject destinations (required even for cluster-scoped resources)

| Class | Value | PreemptionPolicy | Intent |
|---|---|---|---|
| `low` | 100 | Never | Background work; evictable, won't preempt others |
| `power` | 100 | Never | Compute-heavy but expendable (e.g. AI/ML workloads) |
| `medium` | 10000 | PreemptLowerPriority | Standard services |
| `high` | 100000 | PreemptLowerPriority | Critical services; preempts lower-priority pods |

`PriorityClass` is already in the platform project's `clusterResourceWhitelist` so no project policy changes were needed.

## Test plan
- ArgoCD syncs `platform-priority-classes` successfully
- `kubectl get priorityclasses low power medium high` shows all four classes

Reviewed-on: #174
2026-05-26 23:41:54 +10:00
unkinben 3756208ccd benvin/kanidm (#159)
Reviewed-on: #159
2026-05-24 19:55:22 +10:00
unkinben c6f9893804 fix(argocd): add vault and consul to platform project destinations (#152)
Vault and consul namespaces were missing from the platform AppProject
allowed destinations, causing ArgoCD sync failures with:
  destination server 'https://kubernetes.default.svc' and namespace
  'vault' do not match any of the allowed destinations in project 'platform'

Reviewed-on: #152
2026-05-23 23:27:24 +10:00
unkinben 11ac2ae91e feat(consul): deploy HashiCorp Consul 1.22.7 via Helm chart (5-replica cluster) (#149)
## Summary

- Deploys HashiCorp Consul 1.22.7 using Helm chart 1.9.7 with 5 server replicas
- Configuration modelled on production consul: \`datacenter=au-syd1\`, \`connect=true\`, \`raft_multiplier=10\`, HTTP on 8500, GRPC on 8502, HTTPS disabled
- 5-replica server cluster with \`bootstrapExpect=5\`
- 10Gi cephrbd-fast-delete PVC per server pod
- Gateway API: HTTPS gateway + HTTPRoute (443→consul-consul-ui:80→8500) at \`consul.k8s.syd1.au.unkin.net\`
- PodDisruptionBudget patched from \`policy/v1beta1\` to \`policy/v1\` (k8s 1.25+ compatibility)
- ArgoCD platform ApplicationSet updated to include consul overlay path
- Clients disabled (server-only deployment)
- ConnectInject disabled (can be enabled later for service mesh)

## Requires

- PR #147 (artifactapi: add hashicorp/consul to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in \`sandbox-consul\`: all 5 server pods 1/1 Running, cluster formed
- [ ] After merge: ArgoCD syncs consul namespace
- [ ] Verify \`consul.k8s.syd1.au.unkin.net\` is accessible via Gateway

Reviewed-on: #149
2026-05-23 22:40:49 +10:00
unkinben d2be521878 feat(vault): deploy HashiCorp Vault 2.0.1 via Helm chart (5-replica HA raft) (#148)
## Summary

- Deploys HashiCorp Vault 2.0.1 using Helm chart 0.32.0 in HA raft mode (5 replicas)
- Configuration modelled on production vault: \`disable_mlock=true\`, headless-DNS retry_join for all 5 pods
- IPC_LOCK capability added via \`server.statefulSet.securityContext.container\`
- 10Gi cephrbd-fast-delete PVC per pod via \`dataStorage\`
- Gateway API: HTTPS gateway + HTTPRoute (443→vault service port 8200) at \`vault.k8s.syd1.au.unkin.net\`
- ArgoCD platform ApplicationSet updated to include vault overlay path
- Injector disabled (no agent sidecar injection needed)

## Requires

- PR #147 (artifactapi: add hashicorp/vault to docker immutable patterns) to be merged first

## Test plan

- [ ] Sandbox tested in \`sandbox-vault\`: all 5 pods Running, raft cluster forming
- [ ] After merge: ArgoCD syncs vault namespace
- [ ] Operator runs \`vault operator init\` to initialize, then unseals all 5 nodes
- [ ] Verify \`vault.k8s.syd1.au.unkin.net\` is accessible via Gateway

Reviewed-on: #148
2026-05-23 22:39:41 +10:00
unkinben 9a01a9ef19 fix: enable gateway/ingress class on platform project (#124)
- add missing classes to platform required to deploy traefik system

Reviewed-on: #124
2026-05-17 23:56:12 +10:00
unkinben 53553ddcfd feat: deploy internal/external traefik routers (#119)
deploy traefik for internal and external applications. port forwarding
from the external routers will only occur to the IP of the
traefik-external service.

- traefik-internal and traefik-external added
- each is a different deployment

Reviewed-on: #119
2026-05-17 23:44:50 +10:00
unkinben 5e03215f4d chore: migrate reloader/reflector to virtual/helm (#115)
Reviewed-on: #115
2026-05-05 21:42:23 +10:00
unkinben 18c519f979 chore: remove hashicorp helm repo (#113)
- no longer required, this is in virtual/helm repo in artifactapi

Reviewed-on: #113
2026-05-03 23:51:44 +10:00
unkinben bcea7df925 chore: swap vso to virtual helm repo (#109)
- testing if there will be any changes after merging, before merging all of them

Reviewed-on: #109
2026-05-03 16:49:53 +10:00
unkinben 8e7bc289f6 chore: enable access to paperclip namespace (#101)
Reviewed-on: #101
2026-05-02 21:30:59 +10:00
unkinben e156cd10bd feat: deploy paperclip to au-syd1 via ArgoCD (aitooling project) (#100)
Adds base manifests and au-syd1 overlay for Paperclip (AI agent
orchestration platform), following the litellm deployment pattern.
Updates aitooling ApplicationSet to include the paperclip path.

Closes #99

Reviewed-on: #100
2026-05-02 21:27:51 +10:00
unkinben 5372914803 feat: add litellm to new aitooling ArgoCD project (#94)
Deploys LiteLLM proxy with CNPG PostgreSQL (3-instance HA), PgBouncer
pooler, and Redis cache. Introduces a dedicated aitooling AppProject and
ApplicationSet to keep AI tooling services separate from platform infra.

Reviewed-on: #94
2026-05-01 21:40:26 +10:00
unkinben 7d555cd31a feat: migrate purelb to ArgoCD (#84)
Migrate PureLB load balancer from Terragrunt to ArgoCD/Kustomize.
Deploys purelb v0.13.0 with two LBNodeAgent and two ServiceGroup CRs
(common: 198.18.200.0/24, dmz: 198.18.199.0/24).
Adds LBNodeAgent and ServiceGroup to kubeconform skip list (no CRD catalog schema).

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #84
2026-04-07 19:52:17 +10:00
unkinben f0bdc0231a feat: migrate vso-system to ArgoCD (#81)
Migrate Vault Secrets Operator from Terragrunt to ArgoCD/Kustomize.
Deploys vault-secrets-operator v1.2.0 with 3 replicas, plus ClusterRole,
ClusterRoleBindings, and vault-admin ServiceAccount.

Note: static service account tokens (kubernetes.io/service-account-token)
cannot be stored in git; create manually or via Vault after deployment.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #81
2026-04-07 19:33:50 +10:00
unkinben b100f3034e feat: migrate observability to ArgoCD (#82)
Migrate Victoria Metrics cluster and agent from Terragrunt to ArgoCD/Kustomize.
Creates new observability AppProject and ApplicationSet.
Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with
HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape
configs) in the observability namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #82
2026-04-07 19:15:45 +10:00
unkinben 181bc152e7 feat: migrate vm-system to ArgoCD (#80)
Migrate Victoria Metrics operator from Terragrunt to ArgoCD/Kustomize.
Deploys victoria-metrics-operator v0.57.1 with 2 replicas in vm-system.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #80
2026-03-27 17:04:15 +11:00
unkinben 5bcbd7e1ba feat: migrate elastic-system to ArgoCD (#79)
Migrate ECK operator from Terragrunt to ArgoCD/Kustomize.
Deploys eck-operator v3.2.0 with 2 replicas and PodDisruptionBudget
in the elastic-system namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #79
2026-03-27 17:00:05 +11:00
unkinben 02195e6235 feat: migrate reposync to ArgoCD (#78)
Migrate repository sync cronjobs from Terragrunt to ArgoCD/Kustomize.
Adds four daily CronJobs (almalinux9-baseos, almalinux9-appstream, epel9,
openvox7) with associated PVCs and ConfigMaps in the reposync namespace.

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>

Reviewed-on: #78
2026-03-27 16:26:35 +11:00
unkinben 301f8dcc1a fix: add NodeFeatureRule and Intel device plugin permissions to platform project (#49)
- Add nfd.k8s-sigs.io/NodeFeatureRule for node-feature-discovery
- Add deviceplugin.intel.com/* for Intel device plugins (GpuDevicePlugin, etc.)
- Add cert-manager.io resources (Certificate, Issuer) for Intel device plugins

Reviewed-on: #49
2026-03-19 02:20:32 +11:00
unkinben dfbb315522 feat: migrate node-feature-discovery and inteldeviceplugins-system to platform project (#48)
- Add node-feature-discovery and inteldeviceplugins-system to platform project
- Convert intel-nfd-rules from local Helm chart to static NodeFeatureRule manifests
- Add required Helm repositories (NFD OCI registry and Intel charts)
- Create base configurations with Helm charts and overlay structures
- Update platform ApplicationSet and project permissions

Reviewed-on: #48
2026-03-19 02:14:45 +11:00
unkinben c157774033 fix: enable ServerSideApply for ArgoCD ApplicationSets (#46)
- resolve CRD annotation size limit errors by enabling server-side apply
- add storage ApplicationSet and project to kustomization files

Reviewed-on: #46
2026-03-19 01:37:56 +11:00
unkinben 90f793464b feat: migrate CSI drivers to dedicated storage project (#45)
- Migrate csi-cephfs from Terraform to ArgoCD
- Migrate csi-cephrbd from Terraform to ArgoCD
- Create dedicated storage project and ApplicationSet for CSI drivers
- Add csi-* pattern matching in storage ApplicationSet
- Remove CSI apps from platform project to separate concerns

Reviewed-on: #45
2026-03-19 01:29:31 +11:00
unkinben 06a8f98b5c feat: migrate cnpg-system from Terraform to ArgoCD (#44)
- Add cnpg-system base ArgoCD application with namespace
- Create cnpg-system overlay for au-syd1 with CloudNativePG Helm chart
- Update platform ApplicationSet to include cnpg-system deployment
- Configure cloudnative-pg operator v0.27.0 with HA and resource limits
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #44
2026-03-19 01:25:50 +11:00
unkinben 0bf6e80d6f feat: migrate externaldns from Terraform to ArgoCD (#43)
- Add externaldns base ArgoCD application with namespace and Vault integration
- Create externaldns overlay for au-syd1 with Helm chart configuration
- Update platform ApplicationSet to include externaldns deployment
- Configure external-dns v1.19.0 with RFC2136 provider for DNS updates
- Maintain one-to-one migration from Terraform configuration including TSIG secrets

Reviewed-on: #43
2026-03-19 01:22:39 +11:00
unkinben ed300fabed feat: migrate cert-manager from Terraform to ArgoCD (#42)
- Add cert-manager base ArgoCD application with namespace, RBAC resources
- Create cert-manager overlay for au-syd1 with Helm chart configuration
- Update platform ApplicationSet to include cert-manager deployment
- Configure cert-manager v1.19.2 with jetstack Helm repository
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #42
2026-03-19 01:18:19 +11:00
unkinben 656aedfc53 fix: enable unscoped permissions (#41)
- add access to create priorityclass resourcees in platform applicationset

Reviewed-on: #41
2026-03-19 01:03:54 +11:00
unkinben ea71ebb55b feat: migrate cattle-system (Rancher) from Terraform to ArgoCD (#39)
- Add cattle-system base ArgoCD application with namespace, Vault integration, and ingress
- Create cattle-system overlay for au-syd1 with Rancher Helm chart configuration
- Update platform ApplicationSet to include cattle-system deployment
- Update platform project to include Rancher Helm repository as source
- Configure Rancher v2.13.1 with HA, TLS, audit logging, and bootstrap secret from Vault
- Maintain one-to-one migration from Terraform configuration

Reviewed-on: #39
2026-03-19 00:56:39 +11:00
unkinben 8207935d36 fix: cannot write to certificates namespace (#38)
- enable the platform application to write to certificates namespace

Reviewed-on: #38
2026-03-19 00:20:39 +11:00
unkinben 3f282fbdc2 feat: migrate certificates from Terraform to ArgoCD (#37)
- Add certificates base ArgoCD application with namespace and Vault CA certificate secret
- Create certificates overlay for au-syd1 with static certificate configuration
- Update platform ApplicationSet to include certificates deployment
- Configure Vault CA certificate with reflector annotations for cross-namespace replication
- Maintain one-to-one migration from Terraform configuration

Note: Skip no_plain_secrets hook as this is a public CA certificate that needs
to be replicated via reflector, not a sensitive secret

Reviewed-on: #37
2026-03-19 00:16:33 +11:00
unkinben 14e3946d4b feat: initial puppet deployment (#25)
working towards a larger, redundant, autoscaling and simple puppet
implementation in kubernetes. this was originally based on the openvox
helm chart with several improvements (not all in this pr)

- use of cnpg instead of single bitnamilegacy postgres container
- use for g10k instead of r10k
- run one instance of g10k per namespace, instead of per-pod
- store only keep one copy of the environments/branches (instead of per-pod)
- change g10k to native cronjob instead of hacky implementation
- use vault secrets

part one adds:

- cnpg puppetdb pgsql cluster
- cnpg puppetdb pgpooler
- persistent volume claims for puppet, puppetdb, the code repository, etc

Reviewed-on: #25
2026-03-09 01:10:30 +11:00
unkinben 05a88459a5 chore: migrate artifactapi to kustomize (#18)
- migrate terraform deployment to kustomize

Reviewed-on: #18
2026-03-06 21:35:47 +11:00
unkinben dbd8914013 feat: migrate woodpecker to argocd (#13)
- move woodpecker helm chart deployment to argocd
- move cnpg resources
- move vault resources

Reviewed-on: #13
2026-03-03 22:24:17 +11:00
unkinben be9d485bfe feat: testing jfrog-container-registry (#11)
- trialing jfrog container registry

Reviewed-on: #11
2026-03-02 23:07:47 +11:00
unkinben 0daa026f01 feat: add pre-commit workflow (#10)
- enforce pre-commit is run for all pull-requests

Reviewed-on: #10
2026-03-02 00:19:04 +11:00
unkinben ebb47348fe fix: resolve issues with helm deployments (#8)
- remove helm-patch files that are unused
- change platform namespaces allowed to *-system
- change chart name

Reviewed-on: #8
2026-03-01 18:55:47 +11:00
unkinben 4809dad90f chore: update managed applications (#7)
- ensure reloader is included in directory list
- remove untracked projects

Reviewed-on: #7
2026-03-01 17:18:54 +11:00
unkinben ce261f66c0 chore: rename apps (#4)
- rename apps from cluster-app to project-app

Reviewed-on: #4
2026-03-01 15:16:34 +11:00
unkinben 971835f845 feat: initial commit
- add structure to clusters, apps and argocd objects
- add bootstrapping features
2026-03-01 14:31:16 +11:00