Commit Graph

1167 Commits

Author SHA1 Message Date
benvin 97d21c81c5 feat: make rke2 registries.yaml conditional on manage_registries (#472)
Add/Remove the registries.yaml file based on the manage_registries
boolean. We are leaving it at default=false for now because the artifactapi
server was broken.

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #472
2026-06-27 07:50:31 +10:00
unkinben e140b300bb chore: bump almalinux9 image tags (#471)
Bump almalinux9 image tags to 20260606

Reviewed-on: #471
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-07 00:31:30 +10:00
benvin 57c844b7e8 feat: upgrade grafana from default to 13.0.2 (#470)
Pin grafana package version to 13.0.2 via a new version parameter on
profiles::metrics::grafana, wired through to the puppet-grafana class.

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #470
2026-06-06 23:46:16 +10:00
benvin 757de20682 feat: upgrade gitea from 1.22.0 to 1.26.2 (#469)
- update the installed release to 1.26.2
- change base_url to artifactapi
- update releases/checksums

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #469
2026-06-06 20:23:25 +10:00
unkinben 6ef1b20abd feat: add switch to change to almalinux-vault (#468)
- move old almalinux versions to query the almalinux-vault
- default to the almalinux remote

Reviewed-on: #468
2026-06-06 17:35:04 +10:00
unkinben b754d947d5 feat: add auth.unkin.net proxying to Kubernetes Traefik ingress (#467)
Add static haproxy2 backends for syd1 Kubernetes Traefik ingress
(external 198.18.199.0, internal 198.18.200.4) and route
auth.unkin.net to the internal backend with Let's Encrypt cert.

Reviewed-on: #467
2026-06-02 22:50:10 +10:00
unkinben ba35c8907c chore: increase inotify limits on rke2 nodes to fix fsnotify watcher errors (#466)
Reviewed-on: #466
2026-05-26 23:50:25 +10:00
unkinben ceacfc85ae feat: restart rke2 when registries.yaml is deployed (#465)
- ensure we restart rke2 to pick up registries.yaml changes
- add a comment to registries.yaml to force a restart

Reviewed-on: #465
2026-05-06 23:11:20 +10:00
unkinben 7e45e0d2e5 chore: expand puppet-validate to two cpus (#464)
puppet validate takes 5 mins on one core. doubling to two cores should
bring it down to about 2.5 mins

Reviewed-on: #464
2026-05-06 22:29:39 +10:00
unkinben 682f65e046 chore: setup proper resource requirements for puppet ci jobs (#463)
currently, all woodpecker jobs jam onto one host and have no resource
limits, resulting in one kubernetes host suddenly maxing out its cpu

- ensure we allocate resources for each woodpecker job

Reviewed-on: #463
2026-05-06 22:24:30 +10:00
unkinben 0d412aebdb chore: deploy rke2 registries.yaml (#462)
ensure all new docker pulls are actioned through artifactapi

Reviewed-on: #462
2026-05-06 22:17:59 +10:00
unkinben 4b9b28ddb7 chore: disable rp_filter on k8s nodes (#461)
- k8s control/compute are multihomed, must disable rp_filter

Reviewed-on: #461
2026-04-11 21:51:42 +10:00
unkinben 0451894b48 feat: add ceph service management profiles and facts (#459)
## Summary

- Adds `Unkin::Ceph::Utils` facter module detecting ceph service instances via `systemctl list-units`, exposing `is_ceph_mon`, `is_ceph_mgr`, `is_ceph_mds`, `is_ceph_osd` booleans and a `ceph_services` hash of unit names
- Adds `profiles::ceph::mon`, `mgr`, `mds`, `osd` — each with `Boolean $ensure_running` that iterates discovered service instances and manages them as running and enabled
- Works across incus nodes (mon/mgr/mds/osd) and k8s compute/control nodes (osd only); verified on prodnxsr0001 which correctly reports `is_ceph_osd: true` and `ceph_services: {osd: [ceph-osd@5]}`

## Test plan

- [x] Noop deploy against prodnxsr0001.main.unkin.net passed cleanly
- [x] `ceph_services` fact returns correct service map
- [x] `is_ceph_osd` returns `True`, `is_ceph_mon` returns `False` as expected
- [x] Test on an incus/ceph node with mon/mgr/mds services

Reviewed-on: #459
2026-04-07 19:02:17 +10:00
unkinben 3714691240 chore: enable access to dns (#460)
rebuilding the router, so taking the chance to untangle the ip ranges. I
previously had 198.18.21.0/24, 198.18.21.160/27 and 198.18.21.192/27 (both
/27s fall inside the /24) all on different interfaces.

- update the IPs that can reach the bind view for main.unkin.net
- keep both for the interim period

Reviewed-on: #460
2026-04-06 22:46:40 +10:00
unkinben dbe04a91e3 chore: change to ceph-public loopback (#458)
- use ceph public loopback port 9443 for dashboard

Reviewed-on: #458
2026-04-05 22:35:39 +10:00
unkinben 476c8115c5 fix: replace puppetdbquery with native PQL queries (#457)
Replace deprecated dalen-puppetdbquery module with native puppetdb_query
function using PQL syntax to resolve URI.escape compatibility issues.
This is required to migrate to Puppet 8 (and kubernetes).

Changes:
- Remove dalen-puppetdbquery dependency from Puppetfile
- Replace query_nodes() calls with puppetdb_query() using PQL syntax
- Update 27 function calls across 18 Puppet manifests
- Maintain equivalent functionality with improved compatibility

Reviewed-on: #457
2026-03-21 22:35:42 +11:00
unkinben 1d41d07b2d fix: allow transfer for external-dns (#456)
external-dns requires axfr support to remove old records. add the
capability for the externaldns tsig key.

Reviewed-on: #456
2026-03-18 20:00:22 +11:00
unkinben 029c998797 feat: improve ci performance (#455)
split all pre-commit checks into individual workflows, so that
woodpecker spawns a container/job for each. this vastly reduces the
time it takes for CI to complete checks for puppet

- create per-pre-commit-check pre-commit config files
- create per-pre-commit-check woodpecker workflows

Reviewed-on: #455
2026-03-17 17:38:22 +11:00
unkinben 0c0d4a3f61 chore: update r10k repo path (#454)
- change to use letsencrypt ssl path for simpler tls trust management

Reviewed-on: #454
2026-03-17 17:36:58 +11:00
unkinben 1e707b8b9a feat: puppetboard 7 python (#453)
auto-upgraded to puppetboard 7, which requires python 3.10 or newer. upgrade
the puppetboard venv from 3.9 (system python) -> 3.12

Reviewed-on: #453
2026-03-16 23:53:52 +11:00
unkinben 416c5ce7d9 chore: update puppet-bind repo url (#452)
changing this to `git.unkin.net` as that certificate is publicly
trusted, requiring no certificate changes for the r10k docker container

Reviewed-on: #452
2026-03-08 19:01:55 +11:00
unkinben 0377c40a07 chore: cleanup gitea actions workflows (#451)
- migrated workflows to woodpeckerci

Reviewed-on: #451
2026-02-28 17:50:41 +11:00
unkinben 8bb40dadce feat: add woodpecker ci jobs (#450)
- pre-commit job to run pre-commit against

Reviewed-on: #450
2026-02-28 17:30:23 +11:00
unkinben bc769aa1df feat: add ldap groups for kubernetes/vault (#449)
need to separate the permissions inside vault into different groups, one
per-permission.

- add group for each kubernetes role in vault

Reviewed-on: #449
2026-02-14 19:22:26 +11:00
unkinben 4e652ccbe6 chore: add alt-names to consul (#448)
- ensure the consul datacenter is added to alt-names

Reviewed-on: #448
2026-02-09 01:03:20 +11:00
unkinben 8c24c6582f feat: manage vault version (#446)
- add params for version and package name
- add param to cleanup openbao
- add version lock (if not latest)

Reviewed-on: #446
2026-02-08 22:26:22 +11:00
unkinben 6bfc63ca31 feat: enable plugins for vault/openbao (#447)
- install openbao-plugins
- add plugin_directory

Reviewed-on: #447
2026-02-08 19:19:33 +11:00
unkinben 69dc9e8f66 docs: add docs for cephfs (#445)
- specifically related to managing csi volumes for kubernetes

Reviewed-on: #445
2026-02-03 19:56:14 +11:00
unkinben c4d28d52bc chore: remove helm deploys from puppet (#444)
- migrate helm deployments to terraform

Reviewed-on: #444
2026-01-30 20:52:51 +11:00
unkinben 6219855fb1 chore: add additional user (#443)
- as per request

Reviewed-on: #443
2026-01-26 20:21:10 +11:00
unkinben 7215a6f534 chore: terraform state too large for body (#442)
- update consul/nginx max body size to 512MB

Reviewed-on: #442
2026-01-18 17:15:08 +11:00
unkinben 88efdbcdd3 chore: reduce synced repos (#441)
- remove repos now available via artifactapi

Reviewed-on: #441
2026-01-17 17:12:44 +11:00
unkinben 3c114371e0 chore: docs for ceph (#440)
- add docs for maintenance mode, bootstrapping an osd and removing an osd

Reviewed-on: #440
2026-01-17 13:26:44 +11:00
unkinben 1077bdcbc1 chore: update ceph gpgkey (#438)
- stop checking ceph gpgkey (fixme)
- use artifactapi for retrieving large rke image bundle

Reviewed-on: #438
2026-01-16 23:51:11 +11:00
unkinben 4e928585f5 fix: ceph repos remove dash (#437)
Reviewed-on: #437
2026-01-15 21:52:17 +11:00
unkinben dbe1398218 chore: centralise all yum repo configuration (#436)
- add 30+ repository definitions to AlmaLinux/all_releases.yaml with `ensure: absent` defaults
- update all role-specific hieradata files to use `ensure: present` pattern
- remove duplicated repository URL/GPG key configurations from individual roles
- maintain existing functionality while improving maintainability

Reviewed-on: #436
2026-01-15 21:35:13 +11:00
unkinben 9f5b1cec82 fix: thundering herd (#435)
- all puppet clients started at the same time, resulting in a thundering herd
- add a randomised start delay of up to 10 minutes

Reviewed-on: #435
2026-01-12 20:21:39 +11:00
unkinben 383bbb0507 fix: ensure join-api is functioning (#434)
- consul was directing new rke2 control nodes to a dead join api
- add an additional check to verify it's responding (not just up)

Reviewed-on: #434
2026-01-11 13:51:36 +11:00
unkinben 6f51bffeaa chore: bump radosgw client_max_body_size (#433)
Reviewed-on: #433
2026-01-07 23:27:09 +11:00
unkinben 57870658b5 feat: act runner updates (#432)
saving artifacts breaks in some actions because the runner switches
between different git hosts. using haproxy ensures the same backend
is always hit via stick-tables and cookies

- ensure runners use haproxy to reach git

we now package act_runner, so let's use the rpm

- change installation method to rpm instead of curl + untar
- add capability to versionlock act_runner
- fix paths to act_runner
- remove manually installed act_runner

Reviewed-on: #432
2026-01-03 21:51:47 +11:00
unkinben f8caa71f34 fix: increase artifact upload size for git (#431)
- rpmbuilder artifacts can be very large
- increase the 1GB limit to 5GB

Reviewed-on: #431
2025-12-30 22:52:43 +11:00
unkinben a2c56c9e46 chore: add almalinux 9.7 repositories (#430)
- ensure almalinux 9.7 is synced

Reviewed-on: #430
2025-12-30 22:48:54 +11:00
unkinben 40d8e924ee feat: enable managing root password (#429)
- update root password in common.eyaml
- add missing param to the accounts::root manifest
- remove the if block, as undef sshkeys has the same effect

Reviewed-on: #429
2025-12-28 20:12:12 +11:00
unkinben 0aec795aec feat: manage externaldns bind (#428)
- add module to manage externaldns bind for k8s
- add infra::dns::externaldns role
- add 198.18.19.20 as anycast for k8s external-dns service

Reviewed-on: #428
2025-11-22 23:25:55 +11:00
unkinben 9854403b02 feat: add syslog listener for vlinsert (#427)
- enable syslog capture via vlinsert
- add syslog.service.consul service

Reviewed-on: #427
2025-11-20 23:47:10 +11:00
unkinben 6400c89853 feat: add vmcluster static targets (#426)
- add ability to list static targets for vmagent to scrape
- add vyos router to be scraped

Reviewed-on: #426
2025-11-20 20:19:53 +11:00
unkinben 9eff241003 feat: add SMTP submission listener and enhance stalwart configuration (#425)
- add SMTP submission listener on port 587 with TLS requirement
- configure HAProxy frontend/backend for submission with send-proxy-v2 support
- add send-proxy-v2 support to all listeners
- add dynamic HAProxy node discovery for proxy trusted networks
- use service hostname instead of node FQDN for autoconfig/autodiscover
- remove redundant IMAP/IMAPS/SMTP alt-names from TLS certificates
- update VRRP CNAME configuration to use mail.main.unkin.net

Reviewed-on: #425
2025-11-09 18:48:06 +11:00
unkinben 35614060bd chore: replace stalwart S3 keys (#424)
- update stalwart S3 AK/SK after migrating to new zonegroup

Reviewed-on: #424
2025-11-08 22:56:24 +11:00
unkinben 1b0fd10fd7 fix: remove . from end of vrrp_cnames (#423)
- autoconfig/autodiscovery should not end with a dot

Reviewed-on: #423
2025-11-08 21:38:10 +11:00
unkinben 2c9fb3d86a chore: add stalwart required tls alt names (#422)
- add alt-names for service addresses stalwart is expected to reply to

Reviewed-on: #422
2025-11-08 21:28:41 +11:00