Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy, each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with allowlist/blocklist per-remote (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats, health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, artifactapi tui --endpoint <url>

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #47
This commit was merged in pull request #47.
@@ -1,582 +1,167 @@
|
||||
# Artifact Storage System
|
||||
# ArtifactAPI
|
||||
|
||||
FastAPI caching proxy that downloads and stores files from remote sources in S3-compatible storage.
|
||||
Caching proxy for package repositories. Single Go binary, 10 package types, content-addressable storage, managed by Terraform.
|
||||
|
||||
## Features
|
||||
## Quick Start
|
||||
|
||||
- Remote definitions via `remotes.yaml` — generic HTTP, Alpine APK, RPM, Docker, PyPI, npm, Helm, Puppet Forge, Terraform/OpenTofu registry
|
||||
- Virtual repositories — merge multiple remotes of the same package type into a single unified index
|
||||
- Immutable/mutable caching model with per-remote TTLs
|
||||
- Conditional revalidation (`If-None-Match` / `If-Modified-Since`) on TTL expiry
|
||||
- Stale-on-upstream-error: refreshes TTL when backend is unreachable rather than evicting
|
||||
- URL rewriting for PyPI simple index, npm metadata, and Helm `index.yaml`
|
||||
- Access control via regex patterns — unmatched paths return 403
|
||||
- Docker tag banning — block named tags (e.g. `latest`) while allowing digest pulls
|
||||
```bash
|
||||
# Start backing services
|
||||
docker compose up -d postgres redis minio
|
||||
|
||||
# Build and run
|
||||
make build
|
||||
./bin/artifactapi
|
||||
|
||||
# Frontend (separate container or dev server)
|
||||
cd ui && npm install && npm run dev
|
||||
```
|
||||
|
||||
API: `http://localhost:8000` | Frontend: `http://localhost:5173`
|
||||
|
||||
## Package Types
|
||||
|
||||
| Type | Mutable (auto-detected) | Immutable (auto-detected) |
|
||||
|---|---|---|
|
||||
| `generic` | nothing | everything |
|
||||
| `docker` | tag manifests, `/tags/list` | blobs, digest manifests |
|
||||
| `helm` | `index.yaml` | `.tgz` charts |
|
||||
| `pypi` | `simple/*` index pages | `.whl`, `.tar.gz` |
|
||||
| `npm` | package metadata | `.tgz` tarballs |
|
||||
| `rpm` | `repomd.xml`, `repodata/*` | `.rpm` |
|
||||
| `alpine` | `APKINDEX.tar.gz` | `.apk` |
|
||||
| `puppet` | `v3/modules/*`, `v3/releases*` | `.tar.gz` |
|
||||
| `terraform` | `*/versions` | `*/download/*/*` |
|
||||
| `goproxy` | `@v/list`, `@latest` | `.info`, `.mod`, `.zip` |
|
||||
|
||||
Providers classify paths automatically. Users only configure what to proxy and TTLs.
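As a rough illustration of the table above, path classification can be sketched as a per-type list of mutable regexes, with everything else treated as immutable. This is a minimal sketch, not the project's classifier: the `builtinMutable` map, the `isMutable` function, and the exact regexes are assumptions derived from the table, and only three package types are shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// builtinMutable holds illustrative mutable patterns per package type.
// The patterns are guesses based on the Package Types table; the real
// providers may use different expressions.
var builtinMutable = map[string][]*regexp.Regexp{
	"helm":   {regexp.MustCompile(`index\.yaml$`)},
	"alpine": {regexp.MustCompile(`APKINDEX\.tar\.gz$`)},
	"rpm": {
		regexp.MustCompile(`repomd\.xml$`),
		regexp.MustCompile(`(^|/)repodata/`),
	},
}

// isMutable reports whether path matches one of the illustrative mutable
// patterns for pkgType. Unknown types and unmatched paths count as immutable.
func isMutable(pkgType, path string) bool {
	for _, re := range builtinMutable[pkgType] {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isMutable("helm", "index.yaml"))                      // mutable index
	fmt.Println(isMutable("helm", "charts/cert-manager-v1.14.0.tgz")) // immutable chart
	fmt.Println(isMutable("generic", "index.yaml"))                   // generic has no mutable patterns
}
```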
|
||||
|
||||
## Terraform
|
||||
|
||||
Remotes and virtuals are managed by Terraform. Each package type has its own resource:
|
||||
|
||||
```hcl
|
||||
resource "artifactapi_remote_generic" "github" {
|
||||
name = "github"
|
||||
base_url = "https://github.com"
|
||||
|
||||
immutable_ttl = 0
|
||||
mutable_ttl = 7200
|
||||
|
||||
patterns = [
|
||||
"ducaale/xh/.*/xh-.*-x86_64-unknown-linux-musl.tar.gz$",
|
||||
"mikefarah/yq/.*/yq_linux_amd64$",
|
||||
]
|
||||
|
||||
mutable_patterns = [
|
||||
".*/archive/refs/heads/.*\\.tar\\.gz$",
|
||||
]
|
||||
}
|
||||
|
||||
resource "artifactapi_remote_docker" "dockerhub" {
|
||||
name = "dockerhub"
|
||||
base_url = "https://registry-1.docker.io"
|
||||
|
||||
immutable_ttl = 0
|
||||
mutable_ttl = 300
|
||||
ban_tags_enabled = true
|
||||
ban_tags = ["latest"]
|
||||
|
||||
patterns = [
|
||||
"^library/postgres",
|
||||
"^library/redis",
|
||||
]
|
||||
}
|
||||
|
||||
resource "artifactapi_remote_helm" "jetstack" {
|
||||
name = "jetstack"
|
||||
base_url = "https://charts.jetstack.io"
|
||||
|
||||
immutable_ttl = 0
|
||||
mutable_ttl = 3600
|
||||
}
|
||||
|
||||
resource "artifactapi_virtual" "helm" {
|
||||
name = "helm"
|
||||
package_type = "helm"
|
||||
members = [artifactapi_remote_helm.jetstack.name]
|
||||
}
|
||||
```
|
||||
|
||||
Provider: [terraform-provider-artifactapi](../terraform-provider-artifactapi)
|
||||
|
||||
## Access Control
|
||||
|
||||
| Field | Default | Behaviour |
|
||||
|---|---|---|
|
||||
| `patterns` | empty (proxy all) | If set, only matching paths are proxied. Acts as allowlist. |
|
||||
| `blocklist` | empty | Matching paths always denied. Checked first. |
|
||||
| `mutable_patterns` | empty | Override: force paths to mutable TTL. |
|
||||
| `immutable_patterns` | empty | Override: force paths to immutable TTL. |
|
||||
|
||||
No patterns + no blocklist = open proxy. Provider handles mutability classification automatically.
|
||||
|
||||
## API
|
||||
|
||||
### Proxy (v1)
|
||||
|
||||
```
|
||||
GET /api/v1/remote/{name}/{path} Proxy/cache artifact
|
||||
GET /api/v1/virtual/{name}/{path} Virtual repo (merged index)
|
||||
GET /v2/{name}/{path} Docker Registry v2
|
||||
```
|
||||
|
||||
### Management (v2)
|
||||
|
||||
```
|
||||
GET/POST /api/v2/remotes List / create remotes
|
||||
GET/PUT/DELETE /api/v2/remotes/{name} Read / update / delete remote
|
||||
GET/DELETE /api/v2/remotes/{name}/objects Browse / evict cached objects
|
||||
GET /api/v2/stats Overview stats
|
||||
GET /api/v2/health Service health
|
||||
POST /api/v2/probe Test a remote (fetch without streaming to client)
|
||||
GET /api/v2/events SSE event stream
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
client → /api/v1/remote/{remote}/{path}
|
||||
↓
|
||||
Redis: mutable TTL check
|
||||
↓ miss / expired
|
||||
S3: object exists?
|
||||
↓ no
|
||||
upstream remote → S3 + PostgreSQL metadata
|
||||
↓
|
||||
response (X-Artifact-Source: cache|remote)
|
||||
PostgreSQL ─── config (remotes, virtuals), artifact metadata, access log
|
||||
Redis ─── TTL keys, fetch locks, circuit breaker state
|
||||
S3/MinIO ─── content-addressable blob storage (blobs/sha256/{hash})
|
||||
```
|
||||
|
||||
Docker Registry traffic uses the `/v2/{remote}/{path}` endpoint implementing the Docker Registry HTTP API v2.
|
||||
S3 client supports MinIO, Ceph RGW, and AWS S3 (via minio-go).
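The `blobs/sha256/{hash}` layout in the diagram implies that the object key is derived from the content alone, so identical bytes fetched through different remotes land on the same key. A minimal sketch, assuming the hash is the lowercase hex SHA-256 of the full blob (the `blobKey` helper is illustrative, not the project's function):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobKey returns a content-addressed object key for data, following the
// blobs/sha256/{hash} layout shown in the architecture diagram.
func blobKey(data []byte) string {
	sum := sha256.Sum256(data)
	return "blobs/sha256/" + hex.EncodeToString(sum[:])
}

func main() {
	a := blobKey([]byte("same bytes"))
	b := blobKey([]byte("same bytes"))
	c := blobKey([]byte("other bytes"))
	fmt.Println(a == b) // identical content shares one key
	fmt.Println(a == c) // different content gets a different key
}
```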
|
||||
|
||||
### Code layout
|
||||
## Environment Variables
|
||||
|
||||
```
|
||||
src/artifactapi/
|
||||
├── main.py — FastAPI app + thin route declarations only
|
||||
├── config.py — ConfigManager (loads remotes.yaml)
|
||||
├── metrics.py — Prometheus + Redis metrics
|
||||
├── docker_auth.py — backwards-compat shim → auth/docker.py
|
||||
├── artifact/ — route handler implementations
|
||||
│ ├── proxy.py — GET /api/v1/remote (remote proxy, cache, revalidation)
|
||||
│ ├── virtual.py — GET /api/v1/virtual (virtual repo index merging)
|
||||
│ ├── local.py — PUT/HEAD/DELETE /api/v1/remote (local repos)
|
||||
│ ├── docker.py — /v2/ Docker Registry v2 proxy
|
||||
│ ├── discovery.py — /api/v1/artifacts discovery + bulk cache
|
||||
│ └── flush.py — PUT /cache/flush
|
||||
├── auth/
|
||||
│ ├── __init__.py — re-exports Docker auth helpers
|
||||
│ └── docker.py — Bearer token fetching + in-memory cache
|
||||
├── cache/
|
||||
│ ├── __init__.py — re-exports RedisCache
|
||||
│ └── redis.py — RedisCache (TTL keys, ETag metadata)
|
||||
├── database/
|
||||
│ ├── __init__.py — re-exports DatabaseManager
|
||||
│ └── postgres.py — DatabaseManager (artifact + local-file tables)
|
||||
├── storage/
|
||||
│ ├── __init__.py — re-exports S3Storage
|
||||
│ └── s3.py — S3Storage (MinIO/S3 abstraction)
|
||||
└── remote/
|
||||
├── __init__.py
|
||||
├── base.py — content-type detection
|
||||
├── generic.py — generic HTTP remotes
|
||||
├── helm.py — Helm index.yaml URL rewriting
|
||||
├── npm.py — npm metadata URL rewriting
|
||||
├── puppet.py — Puppet Forge JSON URL rewriting
|
||||
├── python.py — PyPI URL construction + HTML rewriting
|
||||
├── rpm.py — RPM remotes
|
||||
└── terraform.py — Terraform/OpenTofu registry URL construction + download URL rewriting
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Path | Description |
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `GET` | `/api/v1/remote/{remote}/{path}` | Fetch artifact (auto-cache on miss) |
|
||||
| `GET` | `/api/v1/virtual/{virtual}/{path}` | Fetch from virtual (merged) repository |
|
||||
| `GET` | `/api/v1/local/{local}/{path}` | Download from local repository |
|
||||
| `PUT` | `/api/v1/local/{local}/{path}` | Upload to local repository |
|
||||
| `HEAD` | `/api/v1/local/{local}/{path}` | Check existence (local) |
|
||||
| `DELETE` | `/api/v1/local/{local}/{path}` | Delete from local repository |
|
||||
| `GET` | `/v2/{remote}/{path}` | Docker Registry v2 proxy |
|
||||
| `PUT` | `/cache/flush` | Flush cache entries |
|
||||
| `GET` | `/health` | Health check |
|
||||
| `GET` | `/config` | View loaded configuration |
|
||||
| `GET` | `/` | API info and available remotes |
|
||||
| `LISTEN_ADDR` | `:8000` | Server listen address |
|
||||
| `DBHOST` | `localhost` | PostgreSQL host |
|
||||
| `DBPORT` | `5432` | PostgreSQL port |
|
||||
| `DBUSER` | `artifacts` | PostgreSQL user |
|
||||
| `DBPASS` | | PostgreSQL password |
|
||||
| `DBNAME` | `artifacts` | PostgreSQL database |
|
||||
| `REDIS_URL` | `redis://localhost:6379` | Redis URL |
|
||||
| `MINIO_ENDPOINT` | `localhost:9000` | S3 endpoint |
|
||||
| `MINIO_ACCESS_KEY` | | S3 access key |
|
||||
| `MINIO_SECRET_KEY` | | S3 secret key |
|
||||
| `MINIO_BUCKET` | `artifacts` | S3 bucket |
|
||||
| `MINIO_SECURE` | `false` | Use HTTPS for S3 |
|
||||
| `MINIO_REGION` | | S3 region (AWS) |
|
||||
|
||||
## Configuration
|
||||
|
||||
Runtime settings come from environment variables; remote definitions live in one or more YAML files pointed to by `CONFIG_PATH`.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Description |
|
||||
|---|---|
|
||||
| `CONFIG_PATH` | Path to a config YAML file **or** a directory of YAML files |
|
||||
| `DBHOST`, `DBPORT`, `DBUSER`, `DBPASS`, `DBNAME` | PostgreSQL connection |
|
||||
| `REDIS_URL` | Redis URL (e.g. `redis://localhost:6379`) |
|
||||
| `MINIO_ENDPOINT` | MinIO/S3 endpoint |
|
||||
| `MINIO_ACCESS_KEY` | S3 access key |
|
||||
| `MINIO_SECRET_KEY` | S3 secret key |
|
||||
| `MINIO_BUCKET` | S3 bucket name |
|
||||
| `MINIO_SECURE` | Use HTTPS (`true`/`false`) |
|
||||
|
||||
### Split configuration
|
||||
|
||||
`CONFIG_PATH` accepts three forms:
|
||||
|
||||
**Single file** (original behaviour):
|
||||
```
|
||||
CONFIG_PATH=/etc/artifactapi/remotes.yaml
|
||||
```
|
||||
|
||||
**Directory** — all `*.yaml` / `*.yml` files in the directory are loaded and merged alphabetically. `remotes` keys are merged across files; later files win on conflict:
|
||||
```
|
||||
CONFIG_PATH=/etc/artifactapi/conf.d/
|
||||
```
|
||||
|
||||
**Main file + `config_dir`** — the main file holds global settings and a `config_dir` pointer; each file in that directory contributes its own `remotes`. Relative `config_dir` paths are resolved relative to the main file:
|
||||
```yaml
|
||||
# /etc/artifactapi/config.yaml
|
||||
config_dir: conf.d # or an absolute path
|
||||
|
||||
# s3/redis/database settings go here (or in env vars)
|
||||
remotes: {} # optional base remotes
|
||||
```
|
||||
|
||||
### Configuration structure
|
||||
|
||||
Repositories are declared under three top-level keys matching their type:
|
||||
|
||||
```yaml
|
||||
remotes: # proxy (caching) remotes
|
||||
remote-name:
|
||||
base_url: "https://example.com"
|
||||
package: "generic" # generic, alpine, rpm, docker, pypi, npm, helm, puppet, terraform
|
||||
description: "..."
|
||||
immutable_patterns: # regex — cached forever
|
||||
- ".*\\.tar\\.gz$"
|
||||
mutable_patterns: # regex — expire after mutable_ttl
|
||||
- "index\\.yaml$"
|
||||
check_mutable_updates: false # send HEAD (If-None-Match) on TTL expiry
|
||||
cache:
|
||||
immutable_ttl: 0 # 0 = indefinitely
|
||||
mutable_ttl: 3600
|
||||
|
||||
virtuals: # virtual (merged-index) repositories
|
||||
virtual-name:
|
||||
package: "helm"
|
||||
members:
|
||||
- remote-a
|
||||
- remote-b
|
||||
|
||||
locals: # local upload repositories (no base_url)
|
||||
local-name:
|
||||
package: "generic"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 0
|
||||
```
|
||||
|
||||
## Remote Types
|
||||
|
||||
### generic
|
||||
|
||||
Arbitrary HTTP file servers — GitHub releases, HashiCorp, custom servers.
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
github:
|
||||
base_url: "https://github.com"
|
||||
package: "generic"
|
||||
immutable_patterns:
|
||||
- "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
|
||||
github-archive:
|
||||
base_url: "https://github.com"
|
||||
package: "generic"
|
||||
immutable_patterns:
|
||||
- ".*/archive/refs/tags/.*\\.tar\\.gz$" # tag archives never change
|
||||
mutable_patterns:
|
||||
- ".*/archive/refs/heads/main\\.tar\\.gz$" # branch archives can change
|
||||
check_mutable_updates: true
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 86400
|
||||
```
|
||||
|
||||
Access: `GET /api/v1/remote/github/owner/repo/releases/download/v1.0/binary.tar.gz`
|
||||
|
||||
### alpine
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
alpine:
|
||||
base_url: "https://dl-cdn.alpinelinux.org"
|
||||
package: "alpine"
|
||||
immutable_patterns:
|
||||
- ".*/x86_64/.*\\.apk$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 7200
|
||||
```
|
||||
|
||||
`APKINDEX.tar.gz` is a built-in mutable pattern — no `mutable_patterns` entry needed.
|
||||
|
||||
### rpm
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
almalinux:
|
||||
base_url: "https://mirror.example.com/almalinux"
|
||||
package: "rpm"
|
||||
immutable_patterns:
|
||||
- ".*/x86_64/.*\\.rpm$"
|
||||
- ".*/noarch/.*\\.rpm$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 7200
|
||||
```
|
||||
|
||||
`repomd.xml` and `repodata/` metadata files are built-in mutable patterns.
|
||||
|
||||
### docker
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
dockerhub:
|
||||
base_url: "https://registry-1.docker.io"
|
||||
package: "docker"
|
||||
# username / password optional for public images
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 300
|
||||
|
||||
ghcr:
|
||||
base_url: "https://ghcr.io"
|
||||
package: "docker"
|
||||
username: "your-github-username"
|
||||
password: "ghp_your_pat" # read:packages scope
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 300
|
||||
```
|
||||
|
||||
Tag manifests and `/tags/list` are built-in mutable patterns. Digest-addressed blobs are immutable.
|
||||
|
||||
#### Banning tags
|
||||
|
||||
Set `ban_tags_enabled: true` and list named tags in `ban_tags` to block specific tag references. Requests for a banned tag return `403`. Digest-addressed pulls (`sha256:…`) are never blocked, so images already in use can still be referenced by digest.
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
dockerhub:
|
||||
base_url: "https://registry-1.docker.io"
|
||||
package: "docker"
|
||||
ban_tags_enabled: true
|
||||
ban_tags:
|
||||
- latest # force pinned tags in CI/CD
|
||||
- edge
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 300
|
||||
```
|
||||
|
||||
`ban_tags_enabled` defaults to `false`. Setting it to `true` with an empty `ban_tags` list has no effect.
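The rules described in this section can be sketched as a small predicate: bans apply only when enabled, digest references are never blocked, and named tags are compared against the list. The `tagBanned` function is a hypothetical helper written for this example, and the exact-match comparison is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// tagBanned reports whether a manifest reference should be rejected with 403.
// Digest references (sha256:...) are never banned.
func tagBanned(enabled bool, banTags []string, reference string) bool {
	if !enabled || strings.HasPrefix(reference, "sha256:") {
		return false
	}
	for _, tag := range banTags {
		if tag == reference {
			return true
		}
	}
	return false
}

func main() {
	ban := []string{"latest", "edge"}
	fmt.Println(tagBanned(true, ban, "latest"))         // banned tag
	fmt.Println(tagBanned(true, ban, "16.2"))           // pinned tag passes
	fmt.Println(tagBanned(true, ban, "sha256:0123abc")) // digest pull passes
	fmt.Println(tagBanned(false, ban, "latest"))        // feature disabled
	fmt.Println(tagBanned(true, nil, "latest"))         // empty list has no effect
}
```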
|
||||
|
||||
For RKE2/containerd, configure `/etc/rancher/rke2/registries.yaml`:
|
||||
|
||||
```yaml
|
||||
mirrors:
|
||||
docker.io:
|
||||
endpoint:
|
||||
- "https://artifacts.example.com"
|
||||
rewrite:
|
||||
"^(.*)$": "dockerhub/$1"
|
||||
ghcr.io:
|
||||
endpoint:
|
||||
- "https://artifacts.example.com"
|
||||
rewrite:
|
||||
"^(.*)$": "ghcr/$1"
|
||||
```
|
||||
|
||||
### pypi
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
pypi:
|
||||
base_url: "https://files.pythonhosted.org"
|
||||
package: "pypi"
|
||||
check_mutable_updates: true
|
||||
immutable_patterns:
|
||||
- "packages/.*\\.whl$"
|
||||
- "packages/.*\\.whl\\.metadata$"
|
||||
- "packages/.*\\.tar\\.gz$"
|
||||
- "packages/.*\\.zip$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 600
|
||||
```
|
||||
|
||||
> **Note**: Simple index requests (`/simple/{package}/`) are always fetched from `https://pypi.org`, regardless of `base_url`. This is hardcoded — `base_url` only controls where package files are downloaded from. For self-hosted registries (Gitea, Nexus) where both index and files share the same host, set `base_url` to that host and the override does not apply.
|
||||
|
||||
URLs in simple index HTML are rewritten to route package file downloads back through the same remote.
|
||||
|
||||
Configure uv:
|
||||
|
||||
```toml
|
||||
# /etc/uv/uv.toml or ~/.config/uv/uv.toml
|
||||
[[index]]
|
||||
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
|
||||
default = true
|
||||
```
|
||||
|
||||
### npm
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
npm:
|
||||
base_url: "https://registry.npmjs.org"
|
||||
package: "npm"
|
||||
check_mutable_updates: true
|
||||
immutable_patterns:
|
||||
- "\\.tgz$"
|
||||
mutable_patterns:
|
||||
- "^(?!.*\\.tgz$).*"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 600
|
||||
```
|
||||
|
||||
`dist.tarball` URLs in package metadata JSON are rewritten to route tarball downloads back through the same remote.
|
||||
|
||||
Configure npm / yarn / pnpm:
|
||||
|
||||
```ini
|
||||
# .npmrc or ~/.npmrc
|
||||
registry=https://artifacts.example.com/api/v1/remote/npm/
|
||||
```
|
||||
|
||||
### helm
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
hashicorp-helm:
|
||||
base_url: "https://helm.releases.hashicorp.com"
|
||||
package: "helm"
|
||||
check_mutable_updates: true
|
||||
immutable_patterns:
|
||||
- "\\.tgz$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 3600
|
||||
```
|
||||
|
||||
`index.yaml` is a built-in mutable pattern. Chart URLs inside `index.yaml` are rewritten to route tarball downloads back through the same remote.
|
||||
|
||||
Configure Helm:
|
||||
## Development
|
||||
|
||||
```bash
|
||||
helm repo add hashicorp https://artifacts.example.com/api/v1/remote/hashicorp-helm
|
||||
helm repo update
|
||||
make build # Build binary
|
||||
make test # Unit tests
|
||||
make e2e # E2E tests (needs Docker)
|
||||
make lint # golangci-lint + go vet
|
||||
make fmt # gofmt + goimports
|
||||
```
|
||||
|
||||
### puppet
|
||||
|
||||
Proxy for [Puppet Forge](https://forge.puppet.com) (forgeapi.puppet.com). Module metadata is cached as mutable; versioned module tarballs are cached as immutable.
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
puppet-forge:
|
||||
base_url: "https://forgeapi.puppet.com"
|
||||
package: "puppet"
|
||||
check_mutable_updates: true
|
||||
immutable_patterns:
|
||||
- "^v3/files/.*\\.tar\\.gz$"
|
||||
cache:
|
||||
immutable_ttl: 0 # module tarballs cached indefinitely
|
||||
mutable_ttl: 600 # module metadata refreshed after 10 minutes
|
||||
```
|
||||
|
||||
`v3/modules/` and `v3/releases` are built-in mutable patterns — module metadata pages expire after `mutable_ttl` and are re-fetched on the next request.
|
||||
|
||||
**URL rewriting**: the proxy rewrites `file_uri` fields in Forge JSON responses from relative paths (`/v3/files/…`) to absolute proxy URLs. g10k resolves download URLs with Go's `url.ResolveReference`, so an absolute `file_uri` overrides the forge base entirely — tarballs download straight from the proxy without a second hop.
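The effect of `url.ResolveReference` described above can be checked with the standard library alone. The sketch below resolves a relative and an absolute `file_uri` against a forge base URL; the `resolve` helper and the example URLs are illustrative, and it is not g10k's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve resolves ref against base following RFC 3986, the same rule that
// url.ResolveReference implements.
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	base := "https://forgeapi.puppet.com"

	// A relative file_uri stays on the forge host.
	rel, _ := resolve(base, "/v3/files/puppetlabs-stdlib-9.7.0.tar.gz")
	fmt.Println(rel)

	// An absolute file_uri replaces the base entirely, so the download
	// goes straight to the proxy.
	abs, _ := resolve(base, "https://artifacts.example.com/api/v1/remote/puppet-forge/v3/files/puppetlabs-stdlib-9.7.0.tar.gz")
	fmt.Println(abs)
}
```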
|
||||
|
||||
**Client configuration — g10k**: set `forge_base_url` in the g10k config file:
|
||||
|
||||
```yaml
|
||||
# g10k.yaml
|
||||
cachedir: /tmp/g10k
|
||||
forge_base_url: https://artifacts.example.com/api/v1/remote/puppet-forge
|
||||
sources:
|
||||
control:
|
||||
remote: git@git.example.com:puppet/control.git
|
||||
basedir: /etc/puppetlabs/code/environments
|
||||
```
|
||||
|
||||
Alternatively, set the URL per-Puppetfile with the `forge.baseUrl` directive (works with `-puppetfile` mode and does not require a config file):
|
||||
|
||||
```ruby
|
||||
forge.baseUrl https://artifacts.example.com/api/v1/remote/puppet-forge
|
||||
|
||||
mod 'puppetlabs-stdlib', '9.7.0'
|
||||
mod 'puppetlabs-inifile', '6.2.0'
|
||||
```
|
||||
|
||||
### terraform
|
||||
|
||||
Proxy for [Terraform](https://registry.terraform.io) / [OpenTofu](https://opentofu.org) provider registries using the [Registry Protocol](https://developer.hashicorp.com/terraform/internals/provider-registry-protocol). Provider version listings are mutable; per-version download info is immutable.
|
||||
|
||||
Two remotes are needed: one for the registry API and one for the release CDN (where the actual `.zip` binaries live):
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
terraform-registry:
|
||||
base_url: "https://registry.terraform.io"
|
||||
package: "terraform"
|
||||
releases_remote: "hashicorp-releases" # name of the CDN remote below
|
||||
immutable_patterns:
|
||||
- "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$"
|
||||
cache:
|
||||
immutable_ttl: 0 # per-version download info cached indefinitely
|
||||
mutable_ttl: 300 # provider version lists refreshed after 5 minutes
|
||||
|
||||
hashicorp-releases:
|
||||
base_url: "https://releases.hashicorp.com"
|
||||
package: "generic"
|
||||
immutable_patterns:
|
||||
- ".*\\.zip$"
|
||||
- ".*SHA256SUMS(\\.sig)?$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 0
|
||||
```
|
||||
|
||||
`{namespace}/{type}/versions` is a built-in mutable pattern — the version list expires after `mutable_ttl` and is re-fetched on the next request.
|
||||
|
||||
**URL rewriting**: the `download_url`, `shasums_url`, and `shasums_signature_url` fields in per-version download info JSON are rewritten from `releases.hashicorp.com` to point at the remote named by `releases_remote`, so Terraform fetches binaries through the proxy.
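A minimal sketch of that rewrite for a single URL field, assuming the proxied URL follows the `/api/v1/remote/{name}/{path}` route and that the public proxy address is known. The `rewriteReleaseURL` helper and its `proxyBase` parameter are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rewriteReleaseURL points a releases.hashicorp.com URL at the named
// releases remote on the proxy. URLs on other hosts are returned unchanged.
func rewriteReleaseURL(raw, proxyBase, releasesRemote string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host != "releases.hashicorp.com" {
		return raw, nil
	}
	return strings.TrimRight(proxyBase, "/") + "/api/v1/remote/" + releasesRemote + u.Path, nil
}

func main() {
	out, err := rewriteReleaseURL(
		"https://releases.hashicorp.com/terraform-provider-aws/5.0.0/terraform-provider-aws_5.0.0_linux_amd64.zip",
		"https://artifacts.example.com",
		"hashicorp-releases",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```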
|
||||
|
||||
**Client configuration**: redirect Terraform's provider registry lookup via `.terraformrc` without changing any provider source addresses in your Terraform code:
|
||||
|
||||
```hcl
|
||||
# ~/.terraformrc (or /etc/terraform.rc, or TF_CLI_CONFIG_FILE)
|
||||
host "registry.terraform.io" {
|
||||
services = {
|
||||
"providers.v1" = "http://artifacts.example.com/api/v1/remote/terraform-registry/"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
With this in place, `terraform init` / `tofu init` fetches provider metadata from the proxy and downloads zips from the `hashicorp-releases` remote. No changes to `.tf` files are needed.
|
||||
|
||||
### virtual
|
||||
|
||||
A virtual repository presents a single unified index built from multiple member remotes of the same package type. Clients configure one endpoint and get access to all member remotes transparently.
|
||||
|
||||
All members must share the same `package` type as the virtual repo. Currently supported package types: `helm`.
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
helm-hashicorp:
|
||||
base_url: "https://helm.releases.hashicorp.com"
|
||||
package: "helm"
|
||||
immutable_patterns:
|
||||
- "\\.tgz$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 3600
|
||||
|
||||
helm-bitnami:
|
||||
base_url: "https://charts.bitnami.com/bitnami"
|
||||
package: "helm"
|
||||
immutable_patterns:
|
||||
- "\\.tgz$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 3600
|
||||
|
||||
virtuals:
|
||||
helm-all:
|
||||
package: "helm"
|
||||
members:
|
||||
- helm-hashicorp # listed first = highest priority
|
||||
- helm-bitnami
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. A request for the package index triggers a parallel fetch of each member's index from S3 cache, falling back to upstream if not yet cached.
|
||||
2. Member indexes are merged into a single index with URL rewriting so artifact download URLs continue to resolve through the individual member remote.
|
||||
3. The merged index is cached in Redis with a TTL equal to the minimum `mutable_ttl` across all members.
|
||||
|
||||
**Priority / conflict resolution:**
|
||||
|
||||
When the same artifact name and version appear in more than one member, the member listed **first** in `members` wins. Subsequent members contribute only artifacts not already present.
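The first-member-wins rule can be sketched as an ordered merge keyed on name and version. The `entry` type and `mergeFirstWins` function are hypothetical and stand in for whatever index representation the merger really uses.

```go
package main

import "fmt"

// entry is a hypothetical, simplified index entry.
type entry struct {
	Name    string
	Version string
	Remote  string // member remote that contributed the entry
}

// mergeFirstWins merges member indexes in the order given. When the same
// name and version appear more than once, the earliest member's entry is kept.
func mergeFirstWins(members [][]entry) []entry {
	seen := make(map[string]bool)
	var merged []entry
	for _, index := range members {
		for _, e := range index {
			key := e.Name + "@" + e.Version
			if seen[key] {
				continue // an earlier member already provided this artifact
			}
			seen[key] = true
			merged = append(merged, e)
		}
	}
	return merged
}

func main() {
	hashicorp := []entry{{"vault", "0.27.0", "helm-hashicorp"}}
	bitnami := []entry{
		{"vault", "0.27.0", "helm-bitnami"}, // duplicate, dropped
		{"redis", "18.1.0", "helm-bitnami"},
	}
	for _, e := range mergeFirstWins([][]entry{hashicorp, bitnami}) {
		fmt.Println(e.Name, e.Version, e.Remote)
	}
}
```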
|
||||
|
||||
**Partial failures:**
|
||||
|
||||
If a member is unreachable and has no cached index, it is skipped and a warning is logged. The merged index is still served from available members. If *no* members can be reached, the request returns `502`.
|
||||
|
||||
**Caching:**
|
||||
|
||||
The merged index is cached using `min(mutable_ttl)` across all members. Each member's raw index is cached in S3 under its own remote key; the virtual handler reuses those copies when available. On rebuild, each member's parsed index is also stored as a compact msgpack file (`index.msgpack`) alongside the raw YAML, eliminating the YAML parse cost on subsequent rebuilds.
|
||||
|
||||
**Helm example:**
|
||||
### TUI
|
||||
|
||||
```bash
|
||||
helm repo add all https://artifacts.example.com/api/v1/virtual/helm-all
|
||||
helm repo update
|
||||
./bin/artifactapi tui --endpoint http://localhost:8000
|
||||
```
|
||||
|
||||
Chart tarball URLs in the merged `index.yaml` are rewritten to point at the individual member remote (e.g. `…/api/v1/remote/helm-hashicorp/vault-0.27.0.tgz`), so downloads bypass the virtual endpoint entirely.
|
||||
|
||||
### local
|
||||
|
||||
```yaml
|
||||
locals:
|
||||
local-generic:
|
||||
package: "generic"
|
||||
description: "Local file repository"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 0
|
||||
```
|
||||
|
||||
No `base_url`. Files are uploaded via `PUT /api/v1/local/{name}/{path}` and downloaded via `GET /api/v1/local/{name}/{path}`.
|
||||
|
||||
## Caching Model
|
||||
|
||||
### Immutable patterns
|
||||
|
||||
Files matching `immutable_patterns` are cached for `immutable_ttl` seconds (0 = indefinitely). Use for versioned release artifacts that never change once published.
|
||||
|
||||
**Access control**: only paths matching an immutable or mutable pattern are served; all others return 403. Omitting `immutable_patterns` entirely allows all paths from that remote.
|
||||
|
||||
### Mutable patterns
|
||||
|
||||
Files matching `mutable_patterns` expire after `mutable_ttl` seconds and are re-fetched on the next request. Mutable files are always served regardless of `immutable_patterns`.
|
||||
|
||||
Each package type has built-in defaults that are merged with any user-defined `mutable_patterns`:
|
||||
|
||||
| Package type | Built-in mutable patterns |
|
||||
|---|---|
|
||||
| `alpine` | `APKINDEX\.tar\.gz$` |
|
||||
| `rpm` | `repomd\.xml$`, `repodata/` metadata variants, `Packages\.gz$` |
|
||||
| `docker` | Tag manifests (non-digest refs), `/tags/list` |
|
||||
| `pypi` | `simple/` (per-package and top-level index pages) |
|
||||
| `helm` | `index\.yaml$` |
|
||||
| `puppet` | `^v3/modules/`, `^v3/releases` |
|
||||
| `terraform` | `[^/]+/[^/]+/versions$` |
|
||||
| `npm` | *(none built-in — define via `mutable_patterns`)* |
|
||||
| `generic` | *(none)* |
|
||||
|
||||
### Conditional revalidation
|
||||
|
||||
Set `check_mutable_updates: true` to send `HEAD` with `If-None-Match` / `If-Modified-Since` on TTL expiry. A 304 response refreshes the TTL without re-downloading. Only applies to user-defined `mutable_patterns` — built-in patterns are always re-fetched unconditionally.
|
||||
|
||||
### Stale-on-upstream-error
|
||||
|
||||
When a mutable file expires and the upstream is unreachable (connection refused, DNS failure, timeout), the cached copy is kept and its TTL refreshed. HTTP error responses (4xx, 5xx) are not treated as network failures and proceed with normal expiry.
|
||||
|
||||
### Quarantine (supply-chain protection)
|
||||
|
||||
Set `quarantine_new: true` and `quarantine_days: N` on a remote to block immutable artifacts published within the last N days. Requests return `404` until the quarantine period expires, giving time to detect malicious packages before they are consumed.
|
||||
|
||||
```yaml
|
||||
remotes:
|
||||
pypi:
|
||||
base_url: "https://files.pythonhosted.org"
|
||||
package: "pypi"
|
||||
quarantine_new: true
|
||||
quarantine_days: 3 # block packages published in the last 3 days
|
||||
immutable_patterns:
|
||||
- "packages/.*\\.whl$"
|
||||
- "packages/.*\\.tar\\.gz$"
|
||||
cache:
|
||||
immutable_ttl: 0
|
||||
mutable_ttl: 600
|
||||
```
|
||||
|
||||
The upstream `Last-Modified` response header is used as a proxy for the publish date. Artifacts with no `Last-Modified` header are allowed through (fail-open). Mutable files (index pages, tag manifests) are never quarantined.
|
||||
|
||||