observability: migrate VictoriaMetrics to operator CRDs + Consul SD #234
Why
The k8s au-syd1 VictoriaMetrics stack ran as two Helm charts and only scraped in-cluster targets. The victoria-metrics-operator already runs in vm-system, so this moves the stack onto operator-managed CRDs. That unlocks VMServiceScrape/VMPodScrape (auto-converted from Prometheus ServiceMonitors, used by a follow-up PR) and adds Consul service discovery so the cluster scrapes the same puppet-prod targets as the puppet vmagent. It also shrinks vmstorage from 3 to 2 replicas (Ceph-backed, replicationFactor 2).
Changes
- main: vmstorage 2 replicas (cephrbd-fast-delete 200Gi, 180d retention, replicationFactor 2), vminsert/vmselect 2 replicas + HPA (2–10, 60% cpu).
- main: retains the kubernetes SD jobs (apiservers/nodes/cadvisor), `selectAllByDefault` for VMServiceScrape/VMPodScrape, and a Consul SD job against `consul.service.consul` (resolves to the puppet Consul from pods) replicating the puppet vmagent relabels: keep tag `metrics`, `__scheme__` from `metrics_scheme`, `job` from `metrics_job`. TLS is verified against the reflected `vault-ca-cert` (no insecure skip-verify).
Notes
- `*.main.unkin.net` node FQDNs (needed for CA SAN match on scrape targets); `/targets` shows `job=consul`.
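
For reviewers, the three Consul relabel rules listed under Changes can be approximated with a small Python sketch. This is an illustration of the intended behaviour, not the vmagent config in this PR. It assumes that `metrics_scheme` and `metrics_job` are Consul service metadata keys, that a missing `metrics_scheme` falls back to `http`, and that a missing `metrics_job` leaves the scrape job name (`consul`) in place. The input dict shape, the `relabel` function, and the example hostname are all hypothetical.

```python
from typing import Optional


def relabel(target: dict, sd_job: str = "consul") -> Optional[dict]:
    """Apply the three relabel rules to one discovered Consul service.

    `target` is a hypothetical dict with keys: node_fqdn, port, tags, meta.
    Returns the resulting label set, or None if the target is dropped.
    """
    # Rule 1: keep only services that carry the `metrics` tag.
    if "metrics" not in target.get("tags", []):
        return None

    meta = target.get("meta", {})  # assumed: Consul service metadata
    return {
        # Scrape the node FQDN so the certificate SAN can match.
        "__address__": f'{target["node_fqdn"]}:{target["port"]}',
        # Rule 2: scheme from metrics_scheme (assumed default: http).
        "__scheme__": meta.get("metrics_scheme", "http"),
        # Rule 3: job from metrics_job (assumed default: the SD job name).
        "job": meta.get("metrics_job", sd_job),
    }


if __name__ == "__main__":
    # Hypothetical service instance; the hostname is a made-up example.
    example = {
        "node_fqdn": "node1.main.unkin.net",
        "port": 9100,
        "tags": ["metrics"],
        "meta": {"metrics_scheme": "https", "metrics_job": "node_exporter"},
    }
    print(relabel(example))
```

A service without the `metrics` tag returns `None`, which corresponds to the target never appearing on `/targets`.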