The k8s au-syd1 VictoriaMetrics stack ran as two helm charts
(victoria-metrics-cluster + victoria-metrics-agent) and only scraped
in-cluster targets. The victoria-metrics-operator already runs in
vm-system, so move the stack onto operator-managed CRDs. This lets the
VMAgent consume VMServiceScrape/VMPodScrape (auto-converted from
Prometheus ServiceMonitors) and adds Consul service discovery so the
cluster scrapes the same puppet-prod targets as the puppet vmagent.

Changes:
- Add VMCluster `main`: vmstorage with 2 replicas (down from 3), replicationFactor
2, cephrbd-fast-delete 200Gi, 180d retention; vminsert/vmselect with 2 replicas
+ HPA (2-10 replicas, 60% CPU).
- Add VMAgent `main`: keeps the Kubernetes SD jobs (apiservers/nodes/cadvisor),
sets selectAllByDefault for VMServiceScrape/VMPodScrape, and adds a Consul SD
job against consul.service.consul (puppet Consul) that replicates the puppet
vmagent relabels (keep tag `metrics`, scheme from `metrics_scheme`, job from
`metrics_job`). TLS is verified against the reflected vault-ca-cert (no
insecure skip-verify).
- Expose vmselect/vminsert/vmagent via Gateway API (traefik-internal Gateway +
HTTPRoute, http->https redirect), same hostnames as before.
- Remove the two helm charts, their values files, and vendored charts.
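
For reference, the Consul SD job described above might look roughly like the
fragment below. This is a sketch, not the committed manifest: the job name, the
Consul API port, the CA mount path and key, and the assumption that
`metrics_scheme` and `metrics_job` are Consul service metadata keys (rather
than node metadata) are all illustrative. It also assumes the CA applies to the
Consul API connection.

```yaml
# Sketch only. Anything marked "assumed" is not taken from the commit.
- job_name: consul-puppet-prod              # assumed job name
  consul_sd_configs:
    - server: consul.service.consul:8501    # assumed HTTPS API port
      scheme: https
      tls_config:
        # assumed mount path and key for the reflected vault-ca-cert
        ca_file: /etc/vm/secrets/vault-ca-cert/ca.crt
  relabel_configs:
    # Keep only services tagged `metrics`. __meta_consul_tags is wrapped in
    # the tag separator (","), so the regex matches the surrounding commas.
    - source_labels: [__meta_consul_tags]
      regex: .*,metrics,.*
      action: keep
    # Take the scrape scheme from `metrics_scheme` when it is set.
    - source_labels: [__meta_consul_service_metadata_metrics_scheme]
      regex: (.+)
      target_label: __scheme__
    # Take the job label from `metrics_job` when it is set.
    - source_labels: [__meta_consul_service_metadata_metrics_job]
      regex: (.+)
      target_label: job
```

Because no `insecure_skip_verify` is set, certificate verification stays on.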

Migrate the VictoriaMetrics cluster and agent from Terragrunt to ArgoCD/Kustomize.
Creates a new observability AppProject and ApplicationSet.
Deploys victoria-metrics-cluster v0.33.0 (vmselect/vminsert/vmstorage with
HPA, PDB, ingress) and victoria-metrics-agent v0.30.0 (3 replicas, k8s scrape
configs) in the observability namespace.

💘 Generated with Crush
Assisted-by: Claude Sonnet 4.6 via Crush <crush@charm.land>
Reviewed-on: #82