Files
argocd-apps/apps/base/observability/gateway.yaml
unkinben 5f8f2b1c70 observability: migrate VictoriaMetrics to operator CRDs + Consul SD (#234)
## Why
The k8s au-syd1 VictoriaMetrics stack ran as two helm charts and only scraped in-cluster targets. The victoria-metrics-operator already runs in vm-system, so this moves the stack onto operator-managed CRDs. That unlocks VMServiceScrape/VMPodScrape (auto-converted from Prometheus ServiceMonitors, used by a follow-up PR) and adds Consul service discovery so the cluster scrapes the **same puppet-prod targets** as the puppet vmagent. Also shrinks vmstorage 3 → 2 (Ceph-backed, replicationFactor 2).

## Changes
- Add **VMCluster `main`**: vmstorage 2 replicas (cephrbd-fast-delete 200Gi, 180d retention, replicationFactor 2), vminsert/vmselect 2 replicas + HPA (2–10, 60% cpu).
- Add **VMAgent `main`**: retains the kubernetes SD jobs (apiservers/nodes/cadvisor), `selectAllByDefault` for VMServiceScrape/VMPodScrape, and a **Consul SD job** against `consul.service.consul` (resolves to the puppet Consul from pods) replicating the puppet vmagent relabels — keep tag `metrics`, `__scheme__` from `metrics_scheme`, `job` from `metrics_job`. TLS is **verified against the reflected `vault-ca-cert`** (no insecure skip-verify).
- Expose vmselect/vminsert/vmagent via **Gateway API** (traefik-internal Gateway + HTTPRoute, http→https redirect), same hostnames.
- Remove the two helm charts, their values files, and vendored charts.

## Notes
- Data wipe on cutover is acceptable (confirmed) — old helm PVCs can be deleted.
- Verify at rollout: pods resolve `*.main.unkin.net` node FQDNs (needed for CA SAN match on scrape targets); `/targets` shows `job=consul`.

Reviewed-on: #234
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-07-05 22:17:32 +10:00

118 lines
3.1 KiB
YAML

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: vmselect
namespace: observability
labels:
app.kubernetes.io/name: vmselect
app.kubernetes.io/instance: victoria-metrics
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: vmselect.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: vmselect.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
gatewayClassName: traefik-internal
listeners:
- name: http
port: 80
protocol: HTTP
hostname: vmselect.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
- name: https
port: 443
protocol: HTTPS
hostname: vmselect.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
tls:
mode: Terminate
certificateRefs:
- group: ""
kind: Secret
name: vmselect-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: vminsert
namespace: observability
labels:
app.kubernetes.io/name: vminsert
app.kubernetes.io/instance: victoria-metrics
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: vminsert.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: vminsert.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
gatewayClassName: traefik-internal
listeners:
- name: http
port: 80
protocol: HTTP
hostname: vminsert.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
- name: https
port: 443
protocol: HTTPS
hostname: vminsert.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
tls:
mode: Terminate
certificateRefs:
- group: ""
kind: Secret
name: vminsert-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: vmagent
namespace: observability
labels:
app.kubernetes.io/name: vmagent
app.kubernetes.io/instance: victoria-metrics
traefik.io/instance: internal
annotations:
cert-manager.io/cluster-issuer: vault-issuer
cert-manager.io/common-name: vmagent.k8s.syd1.au.unkin.net
cert-manager.io/private-key-size: "4096"
external-dns.alpha.kubernetes.io/hostname: vmagent.k8s.syd1.au.unkin.net
external-dns.alpha.kubernetes.io/target: 198.18.200.4
spec:
gatewayClassName: traefik-internal
listeners:
- name: http
port: 80
protocol: HTTP
hostname: vmagent.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
- name: https
port: 443
protocol: HTTPS
hostname: vmagent.k8s.syd1.au.unkin.net
allowedRoutes:
namespaces:
from: Same
tls:
mode: Terminate
certificateRefs:
- group: ""
kind: Secret
name: vmagent-tls