docs: add RKE2 image rewriting guide and expand pattern examples

Add a new "Docker Image Rewriting with RKE2" section covering:
- How the /v2/ proxy integrates with registries.yaml mirror rewrites
- Per-registry examples (docker.io, ghcr.io, registry.k8s.io, quay.io)
- include_patterns for restricting which images are cached
- TLS CA configuration for private certificate authorities
- Apply and verification commands

Expand the Configuration section with:
- Richer include_patterns examples (anchored, extension, architecture,
  Docker image name patterns, repodata directories)
- New index_patterns section explaining built-in defaults per package
  type and how to add custom patterns (Helm index.yaml, APT InRelease/
  Packages.gz, extra RPM comps.xml)
2026-04-25 20:20:42 +10:00
parent 8da43e610e
commit f3394b9ca6
+275 -10
@@ -161,27 +161,101 @@ remotes:
### Include Patterns
Include patterns are regular expressions that control which files can be accessed. Patterns use Python `re.search`, so they match anywhere in the path unless anchored with `^` or `$`. Only files matching at least one pattern are served; all others return HTTP 403.
```yaml
include_patterns:
  # Exact project + architecture: most restrictive
  - "^gruntwork-io/terragrunt/releases/download/.*/terragrunt_linux_amd64$"
  # Any release asset for a project, any version
  - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
  # File extension only: allow all files of a given type from any path
  - ".*\\.tar\\.gz$"
  - ".*\\.rpm$"
  - ".*\\.zip$"
  # Architecture subtree: allow everything under x86_64/
  - ".*/x86_64/.*"
  # Combined: architecture + extension
  - ".*/x86_64/.*\\.rpm$"
  - ".*/noarch/.*\\.rpm$"
  # Docker image names (used with package: docker remotes)
  - "^library/nginx"         # nginx official images only
  - "^rancher/"              # all rancher/* images
  - "^rancher/rke2-runtime"  # specific image
  # Repodata directories: allow all metadata for an RPM repo
  - ".*/repodata/.*$"
```
**Security note**: Omitting `include_patterns` entirely allows all files from that remote. Index files (e.g. `APKINDEX.tar.gz`, `repomd.xml`, tag manifests) always bypass pattern enforcement; they are served unconditionally so clients can discover available packages.
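
The allow/deny rule can be sketched in a few lines of Python. This is an illustration of `re.search` semantics only, not the artifact API's actual code: the function names, the sample paths, and the assumption that paths are matched without a leading slash are all hypothetical.

```python
import re
from typing import Optional, Sequence

# A few patterns from the YAML example (YAML "\\." is the regex "\.").
INCLUDE_PATTERNS = [
    r"^gruntwork-io/terragrunt/releases/download/.*/terragrunt_linux_amd64$",
    r".*\.tar\.gz$",
    r".*/x86_64/.*\.rpm$",
]


def is_allowed(path: str, patterns: Optional[Sequence[str]]) -> bool:
    """Return True when the path may be served.

    No include_patterns at all means everything is allowed; otherwise at
    least one pattern must match somewhere in the path (re.search).
    """
    if patterns is None:
        return True
    return any(re.search(p, path) for p in patterns)


def status_for(path: str, patterns: Optional[Sequence[str]]) -> int:
    """HTTP status a client would see under this simplified model."""
    return 200 if is_allowed(path, patterns) else 403


# Hypothetical request paths, chosen only to exercise the patterns.
print(status_for("tools/archive.tar.gz", INCLUDE_PATTERNS))
print(status_for("almalinux/9/BaseOS/aarch64/os/Packages/bash.rpm", INCLUDE_PATTERNS))
```

Because `re.search` is unanchored, the leading `.*` in patterns such as `.*\.tar\.gz$` is optional; `\.tar\.gz$` matches the same paths.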
### Index Patterns
Index patterns identify repository metadata files. Index files get special treatment:
- **Always served** regardless of `include_patterns`
- **Cached with `index_ttl`** instead of `file_ttl`
- **Automatically refreshed** when the TTL expires — the cached copy is evicted and re-fetched on next request
Built-in defaults per package type:
| Package type | Built-in index patterns |
|---|---|
| `alpine` | `APKINDEX\.tar\.gz$` |
| `rpm` | `repomd\.xml$`, `repodata/` metadata (xml, sqlite, yaml, asc, txt variants), `Packages\.gz$` |
| `docker` | Tag manifests (non-digest refs), `/tags/list` |
| `generic` | *(none)* |
Use `index_patterns` to add extra patterns on top of the defaults. Duplicates are ignored automatically.
```yaml
remotes:
  helm-charts:
    base_url: "https://charts.example.com"
    type: "remote"
    package: "generic"
    include_patterns:
      - ".*\\.tgz$"         # chart archives
    index_patterns:
      - "index\\.yaml$"     # Helm repo index, re-fetched on every TTL expiry
    cache:
      file_ttl: 0
      index_ttl: 600        # re-check the index every 10 minutes
  apt-mirror:
    base_url: "https://apt.example.com"
    type: "remote"
    package: "generic"
    include_patterns:
      - ".*\\.deb$"
    index_patterns:
      - "InRelease$"        # signed APT release file
      - "Release$"          # unsigned APT release file
      - "Packages\\.gz$"    # compressed package list
      - "Packages\\.xz$"
    cache:
      file_ttl: 0
      index_ttl: 3600       # hourly index refresh
  almalinux-with-extras:
    base_url: "https://mirror.example.com/almalinux"
    type: "remote"
    package: "rpm"          # inherits repomd.xml + repodata/* defaults
    include_patterns:
      - ".*/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
    index_patterns:
      - "comps\\.xml$"      # optional group metadata (adds to rpm defaults)
    cache:
      file_ttl: 0
      index_ttl: 7200
```
Pattern matching uses `re.search`, so `"index\\.yaml$"` matches `/stable/index.yaml` and `/index.yaml`. Anchor with `^` to restrict to the path root.
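
To make the interaction between defaults, custom patterns, and the two TTLs concrete, here is a minimal Python sketch. It is a model of the behaviour described in this section, not the artifact API's implementation: `BUILTIN_INDEX_PATTERNS` is an abbreviated, hypothetical copy of the table, and both function names are invented for illustration.

```python
import re
from typing import Dict, List, Sequence

# Abbreviated stand-in for the built-in defaults table (assumption).
BUILTIN_INDEX_PATTERNS: Dict[str, List[str]] = {
    "alpine": [r"APKINDEX\.tar\.gz$"],
    "rpm": [r"repomd\.xml$", r"Packages\.gz$"],
    "generic": [],
}


def effective_index_patterns(package: str, extra: Sequence[str]) -> List[str]:
    """Built-in defaults followed by custom patterns, with duplicates dropped."""
    merged = BUILTIN_INDEX_PATTERNS.get(package, []) + list(extra)
    return list(dict.fromkeys(merged))  # dict keys keep first-seen order


def ttl_for(path: str, package: str, extra: Sequence[str],
            file_ttl: int, index_ttl: int) -> int:
    """Pick index_ttl for index files and file_ttl for everything else."""
    patterns = effective_index_patterns(package, extra)
    is_index = any(re.search(p, path) for p in patterns)
    return index_ttl if is_index else file_ttl


# The helm-charts remote above: the index is re-checked, chart archives are not.
print(ttl_for("/stable/index.yaml", "generic", [r"index\.yaml$"], 0, 600))
print(ttl_for("/stable/mychart-1.0.0.tgz", "generic", [r"index\.yaml$"], 0, 600))
```

Note that an unanchored `index\.yaml$` also matches a name such as `myindex.yaml`; prefixing the pattern with `/` (`/index\.yaml$`) limits it to files named exactly `index.yaml`.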
### Cache Configuration
@@ -661,4 +735,195 @@ curl "http://localhost:8000/api/github/gruntwork-io/terragrunt/releases/download
- Use external managed databases for production workloads
- Configure backup strategies for persistent volumes
- Set up proper TLS certificates for ingress
- Consider using StatefulSets for databases with persistent storage
## Docker Image Rewriting with RKE2
RKE2 can route container image pulls through registry mirrors using `/etc/rancher/rke2/registries.yaml`. The artifact API implements the Docker Registry HTTP API v2 at `/v2/`, so it acts as a transparent caching mirror for any upstream registry.
### How it works
1. A pod requests `docker.io/library/nginx:latest`
2. RKE2 intercepts the pull and rewrites the image path using the `rewrite` rules
3. The rewritten request hits the artifact API (`/v2/dockerhub/library/nginx/manifests/latest`)
4. On first access the API fetches the manifest and layers from Docker Hub and caches them in S3
5. Subsequent pulls are served directly from cache, with no upstream traffic
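
The miss-then-hit behaviour in steps 4 and 5 can be modelled with a toy read-through cache. This sketch assumes nothing about the real service: `CachingMirror` is a hypothetical class, a plain dict stands in for S3, and the upstream fetch is an injected callable so the example runs offline.

```python
from typing import Callable, Dict, Tuple


class CachingMirror:
    """Toy read-through cache: a dict plays the role of the S3 bucket."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, path: str) -> Tuple[str, bytes]:
        """Return ("HIT" | "MISS", body) for a request path."""
        if path in self._cache:
            return "HIT", self._cache[path]
        # First access: go upstream once, then keep the result.
        self.upstream_calls += 1
        body = self._fetch_upstream(path)
        self._cache[path] = body
        return "MISS", body


# A fake upstream so nothing touches the network.
mirror = CachingMirror(lambda path: b"manifest for " + path.encode())
path = "/v2/dockerhub/library/nginx/manifests/latest"
print(mirror.get(path)[0])
print(mirror.get(path)[0])
print(mirror.upstream_calls)
```

The sketch ignores TTLs; in the real configuration `index_ttl` decides when a cached tag manifest is treated as stale.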
### registries.yaml
Place this file on every RKE2 node at `/etc/rancher/rke2/registries.yaml`. The `rewrite` field maps the original image path (as the upstream registry sees it) to the path the artifact API expects under `/v2/{remote_name}/...`.
#### Docker Hub
Docker Hub resolves unqualified image names like `nginx` as `library/nginx`. The rewrite prepends the remote name so the request lands on the correct remote.
```yaml
# /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
```
Corresponding `remotes.yaml` entry:
```yaml
remotes:
  dockerhub:
    base_url: "https://registry-1.docker.io"
    type: "remote"
    package: "docker"
    username: "your-dockerhub-username"
    password: "your-dockerhub-token"   # PAT with read scope
    cache:
      file_ttl: 0
      index_ttl: 300
```
A pull of `nginx:latest` becomes `/v2/dockerhub/library/nginx/manifests/latest` on the artifact API.
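
The mapping can be reproduced offline with a short Python sketch. It only mimics the rule shown above: `registries.yaml` writes capture groups as `$1`, while Python's `re.sub` expects `\g<1>`, so the replacement string is translated first. All three function names are hypothetical helpers, not part of RKE2 or the artifact API.

```python
import re


def normalize_docker_hub(name: str) -> str:
    """Docker Hub resolves single-segment names such as 'nginx' to 'library/nginx'."""
    return name if "/" in name else f"library/{name}"


def apply_rewrite(repository: str, pattern: str, replacement: str) -> str:
    """Apply one registries.yaml-style rewrite rule to a repository path."""
    # Translate "$1" into Python's "\g<1>" group reference syntax.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, repository, count=1)


def manifest_path(name: str, reference: str) -> str:
    """Path the mirror would receive for a Docker Hub image under the rule above."""
    repository = apply_rewrite(normalize_docker_hub(name), r"^(.*)$", "dockerhub/$1")
    return f"/v2/{repository}/manifests/{reference}"


print(manifest_path("nginx", "latest"))
# prints /v2/dockerhub/library/nginx/manifests/latest
```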
#### GitHub Container Registry (ghcr.io)
```yaml
mirrors:
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
```
```yaml
remotes:
  ghcr:
    base_url: "https://ghcr.io"
    type: "remote"
    package: "docker"
    username: "your-github-username"
    password: "ghp_your_github_pat"   # read:packages scope required
    cache:
      file_ttl: 0
      index_ttl: 300
```
A pull of `ghcr.io/rancher/rke2-runtime:v1.30.0-rke2r1` becomes `/v2/ghcr/rancher/rke2-runtime/manifests/v1.30.0-rke2r1`.
#### Multiple registries
```yaml
# /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
  registry.k8s.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "k8s-registry/$1"
  quay.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "quay/$1"
```
Each entry needs a matching remote in `remotes.yaml` using the name from the rewrite target (e.g. `k8s-registry`, `quay`).
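
A mismatch between the two files is easy to miss, so a small consistency check can help. The sketch below works on plain dicts shaped like the parsed YAML (loading the files is left out to keep it dependency-free). `missing_remotes` is a hypothetical helper, and it assumes every rewrite target starts with the remote name followed by `/`.

```python
from typing import Dict, List


def missing_remotes(mirrors: Dict[str, dict], remotes: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each mirrored registry to rewrite-target remotes that are not defined."""
    missing: Dict[str, List[str]] = {}
    for registry, cfg in mirrors.items():
        for target in cfg.get("rewrite", {}).values():
            # "quay/$1" -> "quay": the first path segment is the remote name.
            remote_name = target.split("/", 1)[0]
            if remote_name not in remotes:
                missing.setdefault(registry, []).append(remote_name)
    return missing


# Dicts mirroring the structure of registries.yaml and remotes.yaml.
mirrors = {
    "docker.io": {"rewrite": {"^(.*)$": "dockerhub/$1"}},
    "quay.io": {"rewrite": {"^(.*)$": "quay/$1"}},
}
remotes = {"dockerhub": {"package": "docker"}}

print(missing_remotes(mirrors, remotes))
# prints {'quay.io': ['quay']}
```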
#### Restricting which images are cached
Use `include_patterns` on the remote to allow only specific images through the proxy. Requests for images not matching any pattern return HTTP 403 to the node.
```yaml
remotes:
  dockerhub:
    base_url: "https://registry-1.docker.io"
    type: "remote"
    package: "docker"
    include_patterns:
      - "^library/nginx"     # official nginx only
      - "^library/redis"     # official redis only
      - "^rancher/"          # all rancher images
      - "^grafana/grafana"   # specific image
    cache:
      file_ttl: 0
      index_ttl: 300
```
Omit `include_patterns` to allow all images from that registry.
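
One subtlety with prefix patterns: since matching uses `re.search`, `^library/nginx` anchors only the start, so any repository whose name merely begins with `library/nginx` also passes. The sketch below shows the difference. Which string the API tests patterns against is not stated here, so the stricter pattern is written to work whether that string is the bare repository name or a longer request path. The repository `library/nginx-extras` is made up for the example.

```python
import re
from typing import Sequence


def allowed(repository: str, patterns: Sequence[str]) -> bool:
    """True if any pattern matches the repository string (re.search semantics)."""
    return any(re.search(p, repository) for p in patterns)


loose = [r"^library/nginx"]         # prefix match, as in the YAML above
strict = [r"^library/nginx(/|$)"]   # name must end, or continue with "/"

for repo in ("library/nginx", "library/nginx/manifests/latest", "library/nginx-extras"):
    print(repo, allowed(repo, loose), allowed(repo, strict))
```

If only the exact image should be mirrored, the stricter form is the safer default.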
#### TLS configuration
If the artifact API uses a private CA certificate, tell containerd about it in `registries.yaml`:
```yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
configs:
  "artifacts.example.com":
    tls:
      ca_file: /etc/ssl/certs/internal-ca.crt
```
### Applying the configuration
```bash
# Write registries.yaml on each node (server and agent)
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
EOF
# Restart the RKE2 service (server nodes)
sudo systemctl restart rke2-server
# Or on agent nodes
sudo systemctl restart rke2-agent
# Confirm containerd picked up the mirror config
sudo /var/lib/rancher/rke2/bin/crictl --config /var/lib/rancher/rke2/agent/etc/crictl.yaml info | jq '.config.registry.mirrors'
```
### Verifying pulls go through the cache
```bash
# Pull an image on a node
sudo /var/lib/rancher/rke2/bin/crictl --config /var/lib/rancher/rke2/agent/etc/crictl.yaml pull nginx:latest
# Check the artifact API received the request
kubectl logs deployment/artifactapi -n artifact-storage | grep "nginx"
# Expect: Cache MISS on first pull, Cache HIT on subsequent pulls
# Query the manifest endpoint directly — 200 means it's cached
curl -I https://artifacts.example.com/v2/dockerhub/library/nginx/manifests/latest
# Check what's stored in the cache
curl https://artifacts.example.com/ | jq '.remotes'
```