docs: add RKE2 image rewriting guide and expand pattern examples

Add a new "Docker Image Rewriting with RKE2" section covering:
- How the /v2/ proxy integrates with registries.yaml mirror rewrites
- Per-registry examples (docker.io, ghcr.io, registry.k8s.io, quay.io)
- include_patterns for restricting which images are cached
- TLS CA configuration for private certificate authorities
- Apply and verification commands

Expand the Configuration section with:
- Richer include_patterns examples (anchored, extension, architecture,
  Docker image name patterns, repodata directories)
- New index_patterns section explaining built-in defaults per package
  type and how to add custom patterns (Helm index.yaml, APT
  InRelease/Packages.gz, extra RPM comps.xml)

@@ -161,27 +161,101 @@ remotes:

### Include Patterns

Include patterns are regular expressions that control which files can be accessed. Patterns are matched with Python `re.search`, so they match anywhere in the path unless anchored with `^` or `$`. Only files matching at least one pattern are served; all others return HTTP 403.

```yaml
include_patterns:
  # Specific project patterns
  # Exact project + architecture (most restrictive)
  - "^gruntwork-io/terragrunt/releases/download/.*/terragrunt_linux_amd64$"

  # Any release asset for a project, any version
  - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"

  # File extension patterns
  # File extension only: allow all files of a given type from any path
  - ".*\\.tar\\.gz$"
  - ".*\\.rpm$"
  - ".*\\.zip$"

  # Architecture-specific patterns
  # Architecture subtree: allow everything under x86_64/
  - ".*/x86_64/.*"
  - ".*/linux-amd64/.*"

  # Version-specific patterns
  - "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"

  # Combined: architecture + extension
  - ".*/x86_64/.*\\.rpm$"
  - ".*/noarch/.*\\.rpm$"

  # Docker image names (used with package: docker remotes)
  - "^library/nginx"        # nginx official images only
  - "^rancher/"             # all rancher/* images
  - "^rancher/rke2-runtime" # specific image

  # Repodata directories: allow all metadata for an RPM repo
  - ".*/repodata/.*$"
```

**Security note**: Omitting `include_patterns` entirely allows all files from that remote. Index files (e.g. `APKINDEX.tar.gz`, `repomd.xml`, tag manifests) always bypass pattern enforcement: they are served unconditionally so clients can discover available packages.

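The matching rule described above can be sketched in a few lines of Python. This is only an illustration of `re.search` semantics, not the project's actual code: the function name `is_allowed`, the example paths, and the version numbers in them are hypothetical, and the index-file bypass is deliberately left out.

```python
import re

def is_allowed(path, include_patterns):
    """Return True if the path may be served under the given include patterns."""
    # Per the security note: no include_patterns means every file is allowed.
    if not include_patterns:
        return True
    # re.search matches anywhere in the path unless the pattern is anchored.
    return any(re.search(pattern, path) for pattern in include_patterns)

patterns = [
    r"^gruntwork-io/terragrunt/releases/download/.*/terragrunt_linux_amd64$",
    r".*/x86_64/.*\.rpm$",
]

# Hypothetical request paths, used only to exercise the patterns.
examples = [
    "gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64",
    "gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_darwin_arm64",
    "almalinux/9/BaseOS/x86_64/os/Packages/bash-5.1.8-6.el9.x86_64.rpm",
]

for path in examples:
    status = 200 if is_allowed(path, patterns) else 403
    print(status, path)
```

With these patterns the first and third paths are allowed and the second is rejected, because the anchored pattern only admits the `linux_amd64` binary.
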
### Index Patterns

Index patterns identify repository metadata files. Index files get special treatment:

- **Always served** regardless of `include_patterns`
- **Cached with `index_ttl`** instead of `file_ttl`
- **Automatically refreshed** when the TTL expires: the cached copy is evicted and re-fetched on the next request

Built-in defaults per package type:

| Package type | Built-in index patterns |
|---|---|
| `alpine` | `APKINDEX\.tar\.gz$` |
| `rpm` | `repomd\.xml$`, `repodata/` metadata (xml, sqlite, yaml, asc, txt variants), `Packages\.gz$` |
| `docker` | Tag manifests (non-digest refs), `/tags/list` |
| `generic` | *(none)* |

Use `index_patterns` to add extra patterns on top of the defaults. Duplicates are ignored automatically.

```yaml
remotes:
  helm-charts:
    base_url: "https://charts.example.com"
    type: "remote"
    package: "generic"
    include_patterns:
      - ".*\\.tgz$"          # chart archives
    index_patterns:
      - "index\\.yaml$"      # Helm repo index, re-fetched on every TTL expiry
    cache:
      file_ttl: 0
      index_ttl: 600         # re-check the index every 10 minutes

  apt-mirror:
    base_url: "https://apt.example.com"
    type: "remote"
    package: "generic"
    include_patterns:
      - ".*\\.deb$"
    index_patterns:
      - "InRelease$"         # signed APT release file
      - "Release$"           # unsigned APT release file
      - "Packages\\.gz$"     # compressed package list
      - "Packages\\.xz$"
    cache:
      file_ttl: 0
      index_ttl: 3600        # hourly index refresh

  almalinux-with-extras:
    base_url: "https://mirror.example.com/almalinux"
    type: "remote"
    package: "rpm"           # inherits repomd.xml + repodata/* defaults
    include_patterns:
      - ".*/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
    index_patterns:
      - "comps\\.xml$"       # optional group metadata (adds to rpm defaults)
    cache:
      file_ttl: 0
      index_ttl: 7200
```

Pattern matching uses `re.search`, so `"index\\.yaml$"` matches `/stable/index.yaml` and `/index.yaml`. Anchor with `^` to restrict to the path root.

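As a rough illustration of how defaults and extra `index_patterns` might combine, the sketch below merges the two lists without duplicates and picks which TTL setting applies to a path. All names here (`DEFAULT_INDEX_PATTERNS`, `effective_index_patterns`, `ttl_key`) are hypothetical, and the defaults shown are only a subset of the table above; the real implementation may differ.

```python
import re

# Hypothetical subset of the built-in defaults from the table above.
DEFAULT_INDEX_PATTERNS = {
    "alpine": [r"APKINDEX\.tar\.gz$"],
    "rpm": [r"repomd\.xml$", r"Packages\.gz$"],
    "generic": [],
}

def effective_index_patterns(package, extra=()):
    """Defaults for the package type plus extras, in order, without duplicates."""
    merged = []
    for pattern in [*DEFAULT_INDEX_PATTERNS.get(package, []), *extra]:
        if pattern not in merged:
            merged.append(pattern)
    return merged

def ttl_key(path, index_patterns):
    """Return the cache setting that applies to this path."""
    is_index = any(re.search(pattern, path) for pattern in index_patterns)
    return "index_ttl" if is_index else "file_ttl"

helm = effective_index_patterns("generic", [r"index\.yaml$"])
# The repeated repomd pattern is dropped because it is already a default.
rpm = effective_index_patterns("rpm", [r"comps\.xml$", r"repomd\.xml$"])

print(rpm)
print(ttl_key("/stable/index.yaml", helm))
print(ttl_key("/stable/mychart-1.2.3.tgz", helm))
```

Because the pattern is unanchored at the start, both `/stable/index.yaml` and `/index.yaml` are classified as index files, while the chart archive falls under `file_ttl`.
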
### Cache Configuration
@@ -662,3 +736,194 @@ curl "http://localhost:8000/api/github/gruntwork-io/terragrunt/releases/download
- Configure backup strategies for persistent volumes
- Set up proper TLS certificates for ingress
- Consider using StatefulSets for databases with persistent storage

## Docker Image Rewriting with RKE2

RKE2 can route container image pulls through registry mirrors using `/etc/rancher/rke2/registries.yaml`. The artifact API implements the Docker Registry HTTP API v2 at `/v2/`, so it acts as a transparent caching mirror for any upstream registry.

### How it works

1. A pod requests `docker.io/library/nginx:latest`
2. RKE2 intercepts the pull and rewrites the image path using the `rewrite` rules
3. The rewritten request hits the artifact API (`/v2/dockerhub/library/nginx/manifests/latest`)
4. On first access the API fetches the manifest and layers from Docker Hub and caches them in S3
5. Subsequent pulls are served directly from cache, with no upstream traffic

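Steps 2 and 3 can be mimicked in Python to see what path a rewrite rule produces. This is a sketch only: the real rewrite is performed on the node by the container runtime, and the helper names `rewrite_repository` and `manifest_path` are hypothetical. The only translation needed is that `registries.yaml` writes capture groups as `$1`, while Python's `re` expects `\g<1>`.

```python
import re

def rewrite_repository(repository, rules):
    """Apply the first matching rewrite rule; return the input unchanged otherwise."""
    for pattern, replacement in rules.items():
        if re.search(pattern, repository):
            # Convert $1-style group references to Python's \g<1> form.
            py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            # count=1 avoids a second, empty match at the end of the string.
            return re.sub(pattern, py_replacement, repository, count=1)
    return repository

def manifest_path(repository, reference):
    """Build the registry v2 manifest path for a repository and tag."""
    return f"/v2/{repository}/manifests/{reference}"

rules = {"^(.*)$": "dockerhub/$1"}
rewritten = rewrite_repository("library/nginx", rules)
print(manifest_path(rewritten, "latest"))
```

For `library/nginx` with tag `latest`, this yields the same path as step 3: `/v2/dockerhub/library/nginx/manifests/latest`.
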
### registries.yaml

Place this file on every RKE2 node at `/etc/rancher/rke2/registries.yaml`. The `rewrite` field maps the original image path (as the upstream registry sees it) to the path the artifact API expects under `/v2/{remote_name}/...`.

#### Docker Hub

Docker Hub resolves unqualified image names like `nginx` as `library/nginx`. The rewrite prepends the remote name so the request lands on the correct remote.

```yaml
# /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
```

Corresponding `remotes.yaml` entry:

```yaml
remotes:
  dockerhub:
    base_url: "https://registry-1.docker.io"
    type: "remote"
    package: "docker"
    username: "your-dockerhub-username"
    password: "your-dockerhub-token"   # PAT with read scope
    cache:
      file_ttl: 0
      index_ttl: 300
```

A pull of `nginx:latest` becomes `/v2/dockerhub/library/nginx/manifests/latest` on the artifact API.

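The expansion of short Docker Hub names can be illustrated with a small helper. This is a simplified sketch under stated assumptions: `normalize_dockerhub_name` is a hypothetical name, it only handles Docker Hub style `name[:tag]` references, and it ignores digest references (`@sha256:...`) and registry hostnames.

```python
def normalize_dockerhub_name(image):
    """Split 'name[:tag]' and add the implicit 'library/' namespace and 'latest' tag."""
    name, sep, tag = image.rpartition(":")
    # No colon, or a colon that is not part of a tag: treat the input as untagged.
    if not sep or "/" in tag:
        name, tag = image, "latest"
    # Single-component names live in the 'library' namespace on Docker Hub.
    if "/" not in name:
        name = f"library/{name}"
    return name, tag

for image in ["nginx:latest", "nginx", "grafana/grafana:10.0.0"]:
    name, tag = normalize_dockerhub_name(image)
    # 'dockerhub' is the remote name used in the rewrite rule above.
    print(f"/v2/dockerhub/{name}/manifests/{tag}")
```

Both `nginx:latest` and plain `nginx` map to the same manifest path, which is why an include pattern such as `^library/nginx` is needed to match the official image.
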
#### GitHub Container Registry (ghcr.io)

```yaml
mirrors:
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
```

```yaml
remotes:
  ghcr:
    base_url: "https://ghcr.io"
    type: "remote"
    package: "docker"
    username: "your-github-username"
    password: "ghp_your_github_pat"   # read:packages scope required
    cache:
      file_ttl: 0
      index_ttl: 300
```

A pull of `ghcr.io/rancher/rke2-runtime:v1.30.0-rke2r1` becomes `/v2/ghcr/rancher/rke2-runtime/manifests/v1.30.0-rke2r1`.

#### Multiple registries

```yaml
# /etc/rancher/rke2/registries.yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"

  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"

  registry.k8s.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "k8s-registry/$1"

  quay.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "quay/$1"
```

Each entry needs a matching remote in `remotes.yaml` using the name from the rewrite target (e.g. `k8s-registry`, `quay`).

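A quick consistency check between the two files could look like the sketch below. It assumes both files have already been loaded into plain dictionaries (for example with a YAML parser), and the function name `missing_remotes` is hypothetical; it simply takes the first path component of each rewrite target as the remote name.

```python
def missing_remotes(mirrors, remotes):
    """Return rewrite-target remote names that have no entry in remotes."""
    missing = []
    for mirror in mirrors.values():
        for target in mirror.get("rewrite", {}).values():
            # "k8s-registry/$1" -> "k8s-registry"
            remote_name = target.split("/", 1)[0]
            if remote_name not in remotes and remote_name not in missing:
                missing.append(remote_name)
    return missing

# Mirrors as in the registries.yaml example above (endpoints omitted for brevity).
mirrors = {
    "docker.io": {"rewrite": {"^(.*)$": "dockerhub/$1"}},
    "ghcr.io": {"rewrite": {"^(.*)$": "ghcr/$1"}},
    "registry.k8s.io": {"rewrite": {"^(.*)$": "k8s-registry/$1"}},
    "quay.io": {"rewrite": {"^(.*)$": "quay/$1"}},
}

# Only two remotes are defined so far in this hypothetical remotes.yaml.
remotes = {"dockerhub": {}, "ghcr": {}}

print(missing_remotes(mirrors, remotes))
```

Here the check reports `k8s-registry` and `quay`, the two rewrite targets that still need a remote definition.
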
#### Restricting which images are cached

Use `include_patterns` on the remote to allow only specific images through the proxy. Requests for images not matching any pattern return HTTP 403 to the node.

```yaml
remotes:
  dockerhub:
    base_url: "https://registry-1.docker.io"
    type: "remote"
    package: "docker"
    include_patterns:
      - "^library/nginx"     # official nginx only
      - "^library/redis"     # official redis only
      - "^rancher/"          # all rancher images
      - "^grafana/grafana"   # specific image
    cache:
      file_ttl: 0
      index_ttl: 300
```

Omit `include_patterns` to allow all images from that registry.

#### TLS configuration

If the artifact API uses a private CA certificate, tell containerd about it in `registries.yaml`:

```yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"

configs:
  "artifacts.example.com":
    tls:
      ca_file: /etc/ssl/certs/internal-ca.crt
```

### Applying the configuration

```bash
# Write registries.yaml on each node (server and agent)
sudo mkdir -p /etc/rancher/rke2
sudo tee /etc/rancher/rke2/registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
EOF

# Restart the RKE2 service (server nodes)
sudo systemctl restart rke2-server

# Or on agent nodes
sudo systemctl restart rke2-agent

# Confirm containerd picked up the mirror config
sudo /var/lib/rancher/rke2/bin/crictl info | jq '.config.registry.mirrors'
```

### Verifying pulls go through the cache

```bash
# Pull an image on a node
sudo /var/lib/rancher/rke2/bin/crictl pull nginx:latest

# Check the artifact API received the request
kubectl logs deployment/artifactapi -n artifact-storage | grep "nginx"
# Expect: Cache MISS on first pull, Cache HIT on subsequent pulls

# Query the manifest endpoint directly; 200 means the manifest is available through the proxy
curl -I https://artifacts.example.com/v2/dockerhub/library/nginx/manifests/latest

# Check what's stored in the cache
curl https://artifacts.example.com/ | jq '.remotes'
```