# Artifact Storage System

A FastAPI caching proxy that downloads files from remote sources and stores them in S3-compatible storage.

## Features

- Remote definitions via `remotes.yaml`: generic HTTP, Alpine APK, RPM, Docker, PyPI, npm, Helm
- Immutable/mutable caching model with per-remote TTLs
- Conditional revalidation (`If-None-Match` / `If-Modified-Since`) on TTL expiry
- Stale-on-upstream-error: refreshes the TTL when the upstream is unreachable rather than evicting
- URL rewriting for PyPI simple index, npm metadata, and Helm `index.yaml`
- Access control via regex patterns: unmatched paths return 403

## Architecture

```
client → /api/v1/remote/{remote}/{path}
             ↓
         Redis: mutable TTL check
             ↓ miss / expired
         S3: object exists?
             ↓ no
         upstream remote → S3 + PostgreSQL metadata
             ↓
         response (X-Artifact-Source: cache|remote)
```

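The order of checks in the diagram can be sketched in a few lines. This is one reading of the flow, not the project's code: `FakeStores`, `fetch`, and the `download` callback are hypothetical names, and the in-memory set and dict only stand in for Redis and S3.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStores:
    """In-memory stand-ins for Redis TTL keys and S3 objects (hypothetical)."""
    ttl_alive: set = field(default_factory=set)   # keys whose mutable TTL is still running
    objects: dict = field(default_factory=dict)   # object key -> bytes


def fetch(stores, key, is_mutable, download):
    """Return (body, source), where source mirrors X-Artifact-Source."""
    in_s3 = key in stores.objects
    # Immutable objects never need a TTL check; mutable ones need a live TTL key.
    fresh = (not is_mutable) or key in stores.ttl_alive
    if in_s3 and fresh:
        return stores.objects[key], "cache"
    body = download(key)              # fetch from the upstream remote
    stores.objects[key] = body        # store in S3 (metadata write omitted here)
    if is_mutable:
        stores.ttl_alive.add(key)     # start a new TTL window
    return body, "remote"
```

A second request for the same key returns `"cache"` until the TTL key for a mutable path disappears, at which point the upstream is contacted again.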
Docker Registry traffic uses the `/v2/{remote}/{path}` endpoint, which implements the Docker Registry HTTP API v2.

### Code layout

```
src/artifactapi/
├── main.py          # FastAPI app, route handlers
├── config.py        # ConfigManager (loads remotes.yaml)
├── storage.py       # S3Storage (MinIO/S3 abstraction)
├── docker_auth.py   # Docker Bearer token fetching
├── metrics.py       # Prometheus + Redis metrics
├── cache/
│   ├── __init__.py  # re-exports RedisCache
│   └── redis.py     # RedisCache (TTL keys, ETag metadata)
├── database/
│   ├── __init__.py  # re-exports DatabaseManager
│   └── postgres.py  # DatabaseManager (artifact + local-file tables)
└── remote/
    ├── __init__.py
    ├── base.py      # content-type detection
    ├── generic.py   # generic HTTP remotes
    ├── helm.py      # Helm index.yaml URL rewriting
    ├── npm.py       # npm metadata URL rewriting
    ├── python.py    # PyPI URL construction + HTML rewriting
    └── rpm.py       # RPM remotes
```

## API Endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/remote/{remote}/{path}` | Fetch artifact (auto-cache on miss) |
| `PUT` | `/api/v1/remote/{remote}/{path}` | Upload to local remote |
| `HEAD` | `/api/v1/remote/{remote}/{path}` | Check existence (local remotes) |
| `DELETE` | `/api/v1/remote/{remote}/{path}` | Delete from local remote |
| `GET` | `/v2/{remote}/{path}` | Docker Registry v2 proxy |
| `PUT` | `/cache/flush` | Flush cache entries |
| `GET` | `/health` | Health check |
| `GET` | `/config` | View loaded configuration |
| `GET` | `/` | API info and available remotes |

## Configuration

Runtime settings come from environment variables; remote definitions live in `remotes.yaml`.

### Environment Variables

| Variable | Description |
|---|---|
| `DBHOST`, `DBPORT`, `DBUSER`, `DBPASS`, `DBNAME` | PostgreSQL connection |
| `REDIS_URL` | Redis URL (e.g. `redis://localhost:6379`) |
| `MINIO_ENDPOINT` | MinIO/S3 endpoint |
| `MINIO_ACCESS_KEY` | S3 access key |
| `MINIO_SECRET_KEY` | S3 secret key |
| `MINIO_BUCKET` | S3 bucket name |
| `MINIO_SECURE` | Use HTTPS (`true`/`false`) |

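As an illustration of how these variables fit together, the sketch below collects them into one settings object. `Settings` and `load_settings` are hypothetical names, not the project's `ConfigManager`, and the sketch assumes every variable is required and that the PostgreSQL values are combined into a URL-style DSN.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    db_dsn: str
    redis_url: str
    minio_endpoint: str
    minio_access_key: str
    minio_secret_key: str
    minio_bucket: str
    minio_secure: bool


def load_settings(env=os.environ) -> Settings:
    """Build Settings from the table's variables; raises KeyError if one is missing."""
    dsn = "postgresql://{user}:{password}@{host}:{port}/{name}".format(
        user=quote(env["DBUSER"], safe=""),
        password=quote(env["DBPASS"], safe=""),  # percent-encode special characters
        host=env["DBHOST"],
        port=env["DBPORT"],
        name=env["DBNAME"],
    )
    return Settings(
        db_dsn=dsn,
        redis_url=env["REDIS_URL"],
        minio_endpoint=env["MINIO_ENDPOINT"],
        minio_access_key=env["MINIO_ACCESS_KEY"],
        minio_secret_key=env["MINIO_SECRET_KEY"],
        minio_bucket=env["MINIO_BUCKET"],
        # Only the literal string "true" (any case) enables HTTPS in this sketch.
        minio_secure=env["MINIO_SECURE"].strip().lower() == "true",
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.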
### remotes.yaml Structure

```yaml
remotes:
  remote-name:
    base_url: "https://example.com"
    type: "remote"                # "remote" or "local"
    package: "generic"            # generic, alpine, rpm, docker, pypi, npm, helm
    description: "..."
    immutable_patterns:           # regex; cached for immutable_ttl
      - ".*\\.tar\\.gz$"
    mutable_patterns:             # regex; expire after mutable_ttl
      - "index\\.yaml$"
    check_mutable_updates: false  # send HEAD (If-None-Match) on TTL expiry
    cache:
      immutable_ttl: 0            # seconds; 0 = indefinitely
      mutable_ttl: 3600           # seconds
```

## Remote Types

### generic

Arbitrary HTTP file servers such as GitHub releases, HashiCorp, or custom servers.

```yaml
remotes:
  github:
    base_url: "https://github.com"
    type: "remote"
    package: "generic"
    immutable_patterns:
      - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
    cache:
      immutable_ttl: 0

  github-archive:
    base_url: "https://github.com"
    type: "remote"
    package: "generic"
    immutable_patterns:
      - ".*/archive/refs/tags/.*\\.tar\\.gz$"       # tag archives never change
    mutable_patterns:
      - ".*/archive/refs/heads/main\\.tar\\.gz$"    # branch archives can change
    check_mutable_updates: true
    cache:
      immutable_ttl: 0
      mutable_ttl: 86400
```

Access: `GET /api/v1/remote/github/owner/repo/releases/download/v1.0/binary.tar.gz`

### alpine

```yaml
remotes:
  alpine:
    base_url: "https://dl-cdn.alpinelinux.org"
    type: "remote"
    package: "alpine"
    immutable_patterns:
      - ".*/x86_64/.*\\.apk$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 7200
```

`APKINDEX.tar.gz` is a built-in mutable pattern, so no `mutable_patterns` entry is needed.

### rpm

```yaml
remotes:
  almalinux:
    base_url: "https://mirror.example.com/almalinux"
    type: "remote"
    package: "rpm"
    immutable_patterns:
      - ".*/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 7200
```

`repomd.xml` and `repodata/` metadata files are built-in mutable patterns.

### docker

```yaml
remotes:
  dockerhub:
    base_url: "https://registry-1.docker.io"
    type: "remote"
    package: "docker"
    # username / password optional for public images
    cache:
      immutable_ttl: 0
      mutable_ttl: 300

  ghcr:
    base_url: "https://ghcr.io"
    type: "remote"
    package: "docker"
    username: "your-github-username"
    password: "ghp_your_pat"   # read:packages scope
    cache:
      immutable_ttl: 0
      mutable_ttl: 300
```

Tag manifests and `/tags/list` are built-in mutable patterns. Digest-addressed blobs are immutable.

For RKE2/containerd, configure `/etc/rancher/rke2/registries.yaml`:

```yaml
mirrors:
  docker.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "dockerhub/$1"
  ghcr.io:
    endpoint:
      - "https://artifacts.example.com"
    rewrite:
      "^(.*)$": "ghcr/$1"
```

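To see what the `rewrite` rules do to a pull, the sketch below applies the same regular expressions to a repository name and builds the manifest path that would reach the proxy's `/v2/{remote}/{path}` endpoint. `REWRITES` and `proxied_manifest_path` are hypothetical names used only for illustration; the rules are written with `\1` because Python's `re` module does not use the `$1` syntax from the YAML.

```python
import re

# Rewrite rules copied from the registries.yaml example, keyed by upstream registry.
REWRITES = {
    "docker.io": (r"^(.*)$", r"dockerhub/\1"),
    "ghcr.io": (r"^(.*)$", r"ghcr/\1"),
}


def proxied_manifest_path(registry: str, repository: str, reference: str) -> str:
    """Return the Registry v2 manifest path after prefixing the remote name."""
    pattern, replacement = REWRITES[registry]
    rewritten = re.sub(pattern, replacement, repository, count=1)
    return f"/v2/{rewritten}/manifests/{reference}"
```

With these rules, a pull of `docker.io/library/nginx:latest` asks the mirror for `/v2/dockerhub/library/nginx/manifests/latest`, so the first path segment selects the `dockerhub` remote.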
### pypi

```yaml
remotes:
  pypi:
    base_url: "https://files.pythonhosted.org"
    type: "remote"
    package: "pypi"
    check_mutable_updates: true
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
```

> **Note**: Simple index requests (`/simple/{package}/`) are always fetched from `https://pypi.org`, regardless of `base_url`. This is hardcoded; `base_url` only controls where package files are downloaded from. For self-hosted registries (Gitea, Nexus) where both index and files share the same host, set `base_url` to that host and the override does not apply.

URLs in simple index HTML are rewritten to route package file downloads back through the same remote.

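The rewriting step can be pictured as a prefix swap on each `href`. The sketch below is a simplified illustration, not the code in `remote/python.py`: `rewrite_simple_index` is a hypothetical name, and `PROXY_PREFIX` assumes the proxy is published at `https://artifacts.example.com` with a remote called `pypi`.

```python
UPSTREAM_FILES = "https://files.pythonhosted.org/"                  # base_url from the example
PROXY_PREFIX = "https://artifacts.example.com/api/v1/remote/pypi/"  # assumed public proxy URL


def rewrite_simple_index(html: str) -> str:
    """Point every href that targets the upstream file host at the proxy instead."""
    return html.replace('href="' + UPSTREAM_FILES, 'href="' + PROXY_PREFIX)
```

After the swap, the path that remains under the remote (`packages/...`) is what the `immutable_patterns` in the configuration above are matched against.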
Configure uv:

```toml
# /etc/uv/uv.toml or ~/.config/uv/uv.toml
[[index]]
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
default = true
```

### npm

```yaml
remotes:
  npm:
    base_url: "https://registry.npmjs.org"
    type: "remote"
    package: "npm"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    mutable_patterns:
      - "^(?!.*\\.tgz$).*"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
```

`dist.tarball` URLs in package metadata JSON are rewritten to route tarball downloads back through the same remote.

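A minimal sketch of that metadata rewrite follows. It is an illustration rather than the code in `remote/npm.py`: `rewrite_tarballs` is a hypothetical name, `PROXY` assumes the proxy URL used in the client configuration in this README, and only the `versions.*.dist.tarball` fields of the registry document are touched.

```python
import json

UPSTREAM = "https://registry.npmjs.org/"                     # base_url from the example
PROXY = "https://artifacts.example.com/api/v1/remote/npm/"   # assumed public proxy URL


def rewrite_tarballs(metadata_json: str) -> str:
    """Rewrite each version's dist.tarball so downloads go through the proxy."""
    doc = json.loads(metadata_json)
    for version in doc.get("versions", {}).values():
        dist = version.get("dist", {})
        url = dist.get("tarball", "")
        if url.startswith(UPSTREAM):
            dist["tarball"] = PROXY + url[len(UPSTREAM):]
    return json.dumps(doc)
```

Tarball URLs that point at some other host are left unchanged in this sketch.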
Configure npm / yarn / pnpm:

```ini
# .npmrc or ~/.npmrc
registry=https://artifacts.example.com/api/v1/remote/npm/
```

### helm

```yaml
remotes:
  hashicorp-helm:
    base_url: "https://helm.releases.hashicorp.com"
    type: "remote"
    package: "helm"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
```

`index.yaml` is a built-in mutable pattern. Chart URLs inside `index.yaml` are rewritten to route tarball downloads back through the same remote.

Configure Helm:

```bash
helm repo add hashicorp https://artifacts.example.com/api/v1/remote/hashicorp-helm
helm repo update
```

### local

```yaml
remotes:
  local-generic:
    type: "local"
    package: "generic"
    description: "Local file repository"
    cache:
      immutable_ttl: 0
      mutable_ttl: 0
```

Local remotes have no `base_url`. Files are uploaded via `PUT` and served via `GET`.

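A client-side sketch of that upload and download cycle, using only the standard library. The host name and the helper names are assumptions, and the sketch assumes the `PUT` endpoint accepts the raw file bytes as the request body, which this README does not specify.

```python
from urllib.request import Request

# Assumed public URL of the proxy plus the remote name from the example above.
BASE = "https://artifacts.example.com/api/v1/remote/local-generic/"


def upload_request(path: str, payload: bytes) -> Request:
    """Build a PUT request that stores payload under path in the local remote."""
    return Request(BASE + path, data=payload, method="PUT")


def download_request(path: str) -> Request:
    """Build the GET request that reads the same file back."""
    return Request(BASE + path, method="GET")
```

Either request can be sent with `urllib.request.urlopen(req)`.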
## Caching Model

### Immutable patterns

Files matching `immutable_patterns` are cached for `immutable_ttl` seconds (0 = indefinitely). Use for versioned release artifacts that never change once published.

**Access control**: only paths matching an immutable or mutable pattern are served; all others return 403. Omitting `immutable_patterns` entirely allows all paths from that remote.

### Mutable patterns

Files matching `mutable_patterns` expire after `mutable_ttl` seconds and are re-fetched on the next request. Mutable files are always served regardless of `immutable_patterns`.

Each package type has built-in defaults that are merged with any user-defined `mutable_patterns`:

| Package type | Built-in mutable patterns |
|---|---|
| `alpine` | `APKINDEX\.tar\.gz$` |
| `rpm` | `repomd\.xml$`, `repodata/` metadata variants, `Packages\.gz$` |
| `docker` | Tag manifests (non-digest refs), `/tags/list` |
| `pypi` | `simple/` (per-package and top-level index pages) |
| `helm` | `index\.yaml$` |
| `npm` | *(none built-in; define via `mutable_patterns`)* |
| `generic` | *(none)* |

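The rules above can be combined into one classification step, sketched below. The function and dictionary names are hypothetical, only three rows of the table are reproduced, and two details are assumptions: patterns are applied with `re.search`, and mutable patterns are checked before immutable ones.

```python
import re

# Subset of the built-in table above, for illustration only.
BUILTIN_MUTABLE = {
    "alpine": [r"APKINDEX\.tar\.gz$"],
    "helm": [r"index\.yaml$"],
    "generic": [],
}


def classify(path, package, immutable_patterns=None, mutable_patterns=()):
    """Return 'mutable', 'immutable', 'allowed', or 'forbidden' (HTTP 403)."""
    # Built-in defaults are merged with the user-defined mutable patterns.
    mutable = BUILTIN_MUTABLE.get(package, []) + list(mutable_patterns)
    if any(re.search(p, path) for p in mutable):
        return "mutable"
    if immutable_patterns is None:
        # No immutable_patterns key: every path is served. Which TTL applies
        # in that case is not stated here, so the sketch just says "allowed".
        return "allowed"
    if any(re.search(p, path) for p in immutable_patterns):
        return "immutable"
    return "forbidden"
```

For the `alpine` example, an `x86_64` package classifies as immutable, `APKINDEX.tar.gz` as mutable, and an `aarch64` package as forbidden because no pattern covers it.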
### Conditional revalidation

Set `check_mutable_updates: true` to send a `HEAD` request with `If-None-Match` / `If-Modified-Since` when the TTL expires. A 304 response refreshes the TTL without re-downloading. This only applies to user-defined `mutable_patterns`; built-in patterns are always re-fetched unconditionally.

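A small sketch of the two halves of that exchange: building the conditional headers from stored validators, and acting on the status code. Both function names and the returned action strings are hypothetical, and the sketch assumes the ETag and Last-Modified values were saved when the file was first cached.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build revalidation headers from whatever validators were stored."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def on_revalidation(status: int) -> str:
    """304 means the cached copy is still current; anything else triggers a re-fetch."""
    return "refresh_ttl" if status == 304 else "refetch"
```

If neither validator was stored, the header dict is empty and the request is an ordinary unconditional one.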
### Stale-on-upstream-error

When a mutable file expires and the upstream is unreachable (connection refused, DNS failure, timeout), the cached copy is kept and its TTL is refreshed. HTTP error responses (4xx, 5xx) are not treated as network failures and proceed with normal expiry.

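The distinction between network failures and HTTP errors can be sketched with standard-library exception types. This is an illustration under assumptions, not the project's handler: `on_expiry`, the `fetch` callback, and the returned action strings are hypothetical, and the real service may use a different HTTP client with different exception classes.

```python
import socket
import urllib.error


def on_expiry(fetch):
    """Decide what happens to an expired mutable entry after trying the upstream."""
    try:
        fetch()
    except urllib.error.HTTPError:
        # The upstream answered with 4xx/5xx: not a network failure.
        # HTTPError subclasses URLError, so it must be caught first.
        return "expire"
    except (urllib.error.URLError, ConnectionError, TimeoutError,
            socket.timeout, socket.gaierror):
        # Connection refused, DNS failure, or timeout: keep serving the stale copy.
        return "keep_stale_and_refresh_ttl"
    return "replace"
```

Catching the HTTP error before the broader network errors is what keeps a 503 from being mistaken for an unreachable upstream.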