Files
artifactapi/PLAN.md
T
benvin b46c116f6b
ci/woodpecker/tag/docker Pipeline was successful
Feat/v3 go rewrite (#47)
Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine,
  puppet, terraform, goproxy — each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with allowlist/blocklist per-remote (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats,
  health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, artifactapi tui --endpoint <url>

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #47
2026-06-07 19:30:35 +10:00

881 lines
39 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ArtifactAPI v3 — Go Rewrite Plan
## Context
ArtifactAPI is a production artifact proxy/cache serving ~42 remotes (Docker registries, Helm repos, RPM/Alpine repos, GitHub releases, PyPI, npm, Puppet Forge, Terraform registries, Go module proxies) across a Kubernetes cluster. The current Python (FastAPI) implementation works but has architectural debt: opaque hashed S3 paths, no UI for visibility, YAML config files that drift, no garbage collection, no access logging, and virtual repos limited to Helm only.
The v3 rewrite targets: a single Go binary (API + TUI), a separate React frontend (own Dockerfile), a Terraform provider (separate repo), content-addressable storage, and a cleaner data model that makes the cache inspectable and manageable.
**Repo**: Same repo (`git.unkin.net/unkin/artifactapi`), new branch.
**Module**: `git.unkin.net/unkin/artifactapi`
**Frontend**: React + Vite, separate Dockerfile, talks to API
**Terraform provider**: Separate repo (`terraform-provider-artifactapi`)
---
## Architecture Overview
```
┌───────────────────────────────────┐ ┌──────────────────────┐
│ Go Binary (API + TUI) │ │ Frontend Container │
│ │ │ │
│ ┌──────────┐ ┌───────────────┐ │ │ React + Vite SPA │
│ │ REST API │ │ Proxy Engine │ │◄───│ nginx / node serve │
│ │ /api/v2 │ │ /api/v1/... │ │ │ Dockerfile.ui │
│ │ │ │ /v2/... (OCI) │ │ └──────────────────────┘
│ └────┬─────┘ └──────┬────────┘ │
│ │ │ │ ┌──────────────────────┐
│ ┌────┴───────────────┴────────┐ │ │ Terraform Provider │
│ │ Data Layer │ │◄───│ (separate repo) │
│ │ PostgreSQL · Redis · S3 │ │ └──────────────────────┘
│ └─────────────────────────────┘ │
│ │ ┌──────────────────────┐
│ ┌─────────────────────────────┐ │ │ TUI (subcommand) │
│ │ artifactapi tui │──│───►│ artifactapi tui │
│ └─────────────────────────────┘ │ │ --endpoint <url> │
└───────────────────────────────────┘ └──────────────────────┘
```
Three independent deployment units:
1. **Go binary** — API server + TUI subcommand (single `Dockerfile`)
2. **React frontend** — SPA served by nginx (`Dockerfile.ui`), talks to `/api/v2`
3. **Terraform provider** — separate repo, calls `/api/v2` CRUD
---
## Project Structure (Modular)
```
artifactapi/
├── cmd/
│ └── artifactapi/
│ └── main.go # entrypoint: serve / tui subcommands
├── pkg/ # PUBLIC — importable by terraform provider, CLI tools
│ ├── models/ # shared domain types
│ │ ├── remote.go # Remote, RemoteConfig, PackageType enum
│ │ ├── virtual.go # Virtual, VirtualConfig
│ │ ├── artifact.go # Artifact, Blob, AccessLogEntry
│ │ ├── local.go # LocalFile, LocalRepo
│ │ └── stats.go # RemoteStats, OverviewStats
│ └── client/ # typed Go API client (used by TUI + Terraform provider)
│ ├── client.go # Client struct, base HTTP
│ ├── remotes.go # remote CRUD methods
│ ├── virtuals.go # virtual CRUD methods
│ ├── objects.go # object browse/evict methods
│ └── stats.go # stats methods
├── internal/ # PRIVATE — server internals
│ ├── server/
│ │ ├── server.go # HTTP server setup, router
│ │ └── middleware.go # logging, recovery, request-id, access logging
│ │
│ ├── api/
│ │ ├── v1/ # proxy endpoints (v1 compat)
│ │ │ ├── proxy.go # GET /api/v1/remote/{name}/{path}
│ │ │ ├── docker.go # /v2/{name}/{path}
│ │ │ ├── virtual.go # GET /api/v1/virtual/{name}/{path}
│ │ │ └── local.go # CRUD /api/v1/local/{name}/{path}
│ │ └── v2/ # management API
│ │ ├── remotes.go # CRUD + stats
│ │ ├── virtuals.go # CRUD
│ │ ├── objects.go # browse/evict cached objects
│ │ ├── stats.go # overview, top-remotes
│ │ ├── events.go # SSE stream
│ │ └── health.go # health, metrics
│ │
│ ├── provider/ # package-type providers (registry protocol handlers)
│ │ ├── provider.go # Provider interface + registry
│ │ ├── generic/
│ │ │ ├── generic.go
│ │ │ └── generic_test.go
│ │ ├── docker/
│ │ │ ├── docker.go # OCI Distribution v2 via go-containerregistry
│ │ │ ├── auth.go # Bearer token fetch + cache
│ │ │ └── docker_test.go
│ │ ├── helm/
│ │ │ ├── helm.go # index rewriting via helm.sh/helm/v3/pkg/repo
│ │ │ ├── merger.go # virtual index merge
│ │ │ └── helm_test.go
│ │ ├── pypi/
│ │ │ ├── pypi.go # simple index HTML rewriting
│ │ │ ├── merger.go # virtual simple index merge
│ │ │ └── pypi_test.go
│ │ ├── npm/
│ │ │ ├── npm.go # metadata JSON rewriting
│ │ │ └── npm_test.go
│ │ ├── rpm/
│ │ │ ├── rpm.go # repodata patterns
│ │ │ └── rpm_test.go
│ │ ├── alpine/
│ │ │ ├── alpine.go # APKINDEX patterns
│ │ │ └── alpine_test.go
│ │ ├── puppet/
│ │ │ ├── puppet.go # file_uri JSON rewriting
│ │ │ └── puppet_test.go
│ │ ├── terraform/
│ │ │ ├── terraform.go # registry protocol, download URL rewriting
│ │ │ └── terraform_test.go
│ │ └── goproxy/
│ │ ├── goproxy.go # Go module proxy protocol (GOPROXY)
│ │ └── goproxy_test.go
│ │
│ ├── proxy/
│ │ ├── engine.go # core fetch-or-cache logic
│ │ ├── engine_test.go
│ │ ├── classifier.go # immutable vs mutable classification
│ │ ├── classifier_test.go
│ │ ├── revalidator.go # conditional HEAD requests (ETag/Last-Modified)
│ │ └── circuit.go # per-remote circuit breaker
│ │
│ ├── storage/
│ │ ├── s3.go # S3 client (minio-go — works with MinIO, Ceph, AWS)
│ │ ├── s3_test.go
│ │ ├── cas.go # content-addressable store logic
│ │ └── cas_test.go
│ │
│ ├── cache/
│ │ ├── redis.go # TTL management, fetch locks
│ │ ├── redis_test.go
│ │ └── lock.go # distributed lock abstraction
│ │
│ ├── database/
│ │ ├── postgres.go # connection pool, migration runner
│ │ ├── queries/ # SQL query files or sqlc-generated code
│ │ │ ├── remotes.sql.go
│ │ │ ├── virtuals.sql.go
│ │ │ ├── artifacts.sql.go
│ │ │ └── access_log.sql.go
│ │ └── migrations/ # golang-migrate SQL files
│ │ ├── 001_initial.up.sql
│ │ └── 001_initial.down.sql
│ │
│ ├── metrics/
│ │ └── prometheus.go # counters, gauges, histograms
│ │
│ ├── gc/
│ │ ├── gc.go # background garbage collection goroutine
│ │ └── gc_test.go
│ │
│ ├── tui/
│ │ ├── app.go # Bubble Tea main model
│ │ ├── views/
│ │ │ ├── dashboard.go
│ │ │ ├── remotes.go
│ │ │ ├── objects.go
│ │ │ └── virtuals.go
│ │ └── components/
│ │ ├── table.go
│ │ └── statusbar.go
│ │
│ └── config/
│ └── env.go # environment variable parsing + validation
├── ui/ # React frontend — SEPARATE DOCKERFILE
│ ├── src/
│ │ ├── App.tsx
│ │ ├── pages/
│ │ │ ├── Dashboard.tsx
│ │ │ ├── Remotes.tsx
│ │ │ ├── RemoteDetail.tsx
│ │ │ ├── Virtuals.tsx
│ │ │ └── Objects.tsx
│ │ ├── components/
│ │ │ ├── RemoteTable.tsx
│ │ │ ├── ObjectBrowser.tsx
│ │ │ ├── StatsCard.tsx
│ │ │ └── EventFeed.tsx
│ │ └── api/
│ │ └── client.ts # typed API client
│ ├── package.json
│ ├── vite.config.ts
│ ├── tsconfig.json
│ ├── Dockerfile.ui # multi-stage: node build → nginx
│ └── nginx.conf # proxy /api/* to backend, serve SPA
├── e2e/ # end-to-end integration tests
│ ├── e2e_test.go # TestMain spins up docker-compose stack
│ ├── proxy_test.go # proxy through real remotes
│ ├── docker_test.go # Docker v2 protocol e2e
│ ├── management_test.go # v2 API CRUD
│ ├── virtual_test.go # virtual repo merge e2e
│ └── docker-compose.e2e.yml # postgres + redis + minio for tests
├── go.mod
├── go.sum
├── Makefile
├── Dockerfile # Go binary (API server + TUI)
├── Dockerfile.ui # symlink or copy → ui/Dockerfile.ui
└── docker-compose.yml
```
### Key Modularisation Decisions
- **`pkg/models/`** — Shared domain types importable by the Terraform provider and any external tooling. No dependencies on internal packages
- **`pkg/client/`** — Typed Go API client used by both the TUI and the Terraform provider. Depends only on `pkg/models/` and stdlib
- **`internal/provider/`** — Each package type is its own subpackage with isolated tests. A provider registry maps `PackageType → Provider`
- **`internal/database/queries/`** — Use [sqlc](https://sqlc.dev/) to generate type-safe query functions from SQL, or hand-written query files
- **`e2e/`** — Separate test binary that spins up a real docker-compose stack
---
## Go Ecosystem Libraries
Prefer existing, maintained Go modules over writing protocol handlers from scratch.
### Package-Type Libraries
| Package Type | Go Module | What It Gives Us |
|---|---|---|
| **Docker/OCI** | `github.com/google/go-containerregistry` | Full Registry v2/OCI client: manifest parsing, auth challenges, blob operations. `pkg/registry` can implement a v2 server. Reference: `github.com/regclient/regclient` |
| **Helm** | `helm.sh/helm/v3/pkg/repo` | Parse/generate `index.yaml`, `IndexFile`/`ChartVersion` types, URL entries. Used directly for merge |
| **Terraform** | `github.com/hashicorp/terraform-registry-address` | Provider/module address parsing, `ForRegistryProtocol()` URL generation. Protocol spec: provider registry protocol v1 |
| **Go Modules** | `github.com/goproxy/goproxy` | Minimalist GOPROXY protocol handler, implements full spec as `http.Handler`. Handles `/@v/list`, `/@v/{v}.info`, `/@v/{v}.mod`, `/@v/{v}.zip`, `/@latest` |
| **RPM** | `rs3.io/go/rpm/repomd` | Parse `repomd.xml`, `primary.xml` with proper XML namespace handling |
| **Alpine** | `gitlab.alpinelinux.org/alpine/go` | Official Alpine library: parse APKINDEX, `.apk` files |
| **PyPI** | stdlib `golang.org/x/net/html` | No dedicated Go PyPI library exists. Parse simple index HTML with `x/net/html`, extract `<a>` tags. Minimal — the rewriting is just href replacement |
| **npm** | stdlib `encoding/json` | npm metadata is JSON — parse with stdlib, rewrite `dist.tarball` URLs. No special library needed |
| **Puppet Forge** | stdlib `encoding/json` | Forge API is JSON — parse and rewrite `file_uri` fields. Community lib `github.com/johnmccabe/go-puppetforge` exists but is thin; stdlib suffices |
### Infrastructure Libraries
| Purpose | Go Module | Why This One |
|---|---|---|
| **HTTP router** | `github.com/go-chi/chi/v5` | Lightweight, stdlib `http.Handler` compatible, middleware chain |
| **PostgreSQL** | `github.com/jackc/pgx/v5` | Pure Go, connection pooling, COPY support, prepared statements |
| **SQL generation** | `github.com/sqlc-dev/sqlc` | Generate type-safe Go from SQL queries — no ORM, no reflection |
| **Redis** | `github.com/redis/go-redis/v9` | Full Redis client, pipelining, pub/sub |
| **S3 (MinIO/Ceph/AWS)** | `github.com/minio/minio-go/v7` | Native S3-compatible client. Works with MinIO, Ceph RGW, AWS S3, any S3-compatible backend out of the box. Lighter than aws-sdk-go-v2, purpose-built for S3 compat |
| **DB migrations** | `github.com/golang-migrate/migrate/v4` | SQL file-based migrations, CLI + library |
| **Prometheus** | `github.com/prometheus/client_golang` | Counters, gauges, histograms |
| **TUI** | `github.com/charmbracelet/bubbletea` | Elm-architecture TUI framework |
| **TUI styling** | `github.com/charmbracelet/lipgloss` | Terminal styling |
| **TUI components** | `github.com/charmbracelet/bubbles` | Table, text input, spinner, etc. |
| **Structured logging** | `log/slog` (stdlib) | Go 1.21+ structured logging, zero dependencies |
| **Testing** | `github.com/stretchr/testify` | Assertions + require for unit tests |
| **Test containers** | `github.com/testcontainers/testcontainers-go` | Spin up Postgres/Redis/MinIO in e2e tests |
### S3 Client: Multi-Backend Support
Using `minio-go/v7` as the S3 client because it natively supports:
- **MinIO** — primary development/production target
- **Ceph RGW** — S3-compatible via endpoint config
- **AWS S3** — via region + credential config
- **Any S3-compatible** — GCS (interop mode), Wasabi, DigitalOcean Spaces, etc.
No abstraction layer needed — `minio-go` handles endpoint differences internally. Config:
```go
client, _ := minio.New(endpoint, &minio.Options{
Creds: credentials.NewStaticV4(accessKey, secretKey, ""),
Secure: useTLS,
Region: region, // optional, for AWS
})
```
---
## Data Layer
### PostgreSQL Schema
```sql
-- Remotes: managed exclusively by Terraform
CREATE TABLE remotes (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL, -- generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy
base_url TEXT NOT NULL,
description TEXT DEFAULT '',
username TEXT DEFAULT '',
password TEXT DEFAULT '',
immutable_ttl INTEGER DEFAULT 0,
mutable_ttl INTEGER DEFAULT 3600,
check_mutable BOOLEAN DEFAULT TRUE,
immutable_patterns TEXT[] DEFAULT '{}', -- user-defined immutable patterns
mutable_patterns TEXT[] DEFAULT '{}', -- user-defined mutable patterns (merged with provider built-ins)
allowlist TEXT[] DEFAULT '{}', -- if empty, allow all paths; if non-empty, only matching paths proxied
blocklist TEXT[] DEFAULT '{}', -- always denied, checked before allowlist
ban_tags_enabled BOOLEAN DEFAULT FALSE,
ban_tags TEXT[] DEFAULT '{}',
quarantine_enabled BOOLEAN DEFAULT FALSE,
quarantine_days INTEGER DEFAULT 3,
stale_on_error BOOLEAN DEFAULT TRUE,
releases_remote TEXT DEFAULT '', -- terraform type: name of CDN remote for download URL rewriting
managed_by TEXT DEFAULT '', -- 'terraform' or empty
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Virtual repositories
CREATE TABLE virtuals (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL,
description TEXT DEFAULT '',
members TEXT[] NOT NULL,
managed_by TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Content-addressable blob storage tracking
CREATE TABLE blobs (
content_hash TEXT PRIMARY KEY,
s3_key TEXT NOT NULL,
size_bytes BIGINT NOT NULL,
content_type TEXT DEFAULT 'application/octet-stream',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Artifact metadata: maps (remote, path) → content blob
CREATE TABLE artifacts (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL REFERENCES remotes(name) ON DELETE CASCADE,
path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
upstream_etag TEXT DEFAULT '',
upstream_last_modified TIMESTAMPTZ,
first_seen_at TIMESTAMPTZ DEFAULT NOW(),
last_fetched_at TIMESTAMPTZ DEFAULT NOW(),
last_accessed_at TIMESTAMPTZ DEFAULT NOW(),
fetch_count BIGINT DEFAULT 1,
access_count BIGINT DEFAULT 1,
UNIQUE(remote_name, path)
);
CREATE INDEX idx_artifacts_remote ON artifacts(remote_name);
CREATE INDEX idx_artifacts_last_accessed ON artifacts(last_accessed_at);
-- Local file uploads
CREATE TABLE local_files (
id BIGSERIAL PRIMARY KEY,
repo_name TEXT NOT NULL,
file_path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(repo_name, file_path)
);
-- Access log (append-only, powers dashboards)
CREATE TABLE access_log (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL,
path TEXT NOT NULL,
cache_hit BOOLEAN NOT NULL,
size_bytes BIGINT DEFAULT 0,
upstream_ms INTEGER DEFAULT 0,
client_ip TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_access_log_remote_time ON access_log(remote_name, created_at);
```
### Redis Usage (Ephemeral Only)
| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| `ttl:{remote}:{path}` | STRING | remote's immutable/mutable TTL | Artifact freshness — existence = still fresh |
| `lock:{remote}:{path}` | STRING (NX) | 30s | Fetch lock — prevents thundering herd |
| `etag:{remote}:{path}` | STRING | same as TTL key | Cached ETag for conditional revalidation |
| `circuit:{remote}` | STRING | configurable | Circuit breaker — consecutive failure count |
Losing Redis = all TTLs expire = next request re-validates upstream. No data loss.
### S3 Layout (Content-Addressable)
```
artifacts-bucket/
├── blobs/sha256/{content_hash} # immutable CAS blobs
├── indexes/{remote}/{path} # mutable index files (helm, pypi, rpm, etc.)
├── indexes/{virtual}/{path} # merged virtual indexes
└── local/{repo}/{path} # user uploads (CAS-backed via blobs table)
```
---
## Terraform Remote Type (New in v2)
The `terraform` package type proxies the Terraform Provider Registry Protocol:
- **URL construction**: prepends `/v1/providers/` to request paths
- **Built-in mutable pattern**: `[^/]+/[^/]+/versions$` (version listings change over time)
- **Built-in immutable pattern**: `[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$` (per-version download info is fixed)
- **Response rewriting**: download info JSON — rewrites `download_url`, `shasums_url`, `shasums_signature_url` to route through a companion `releases_remote` (e.g., `hashicorp-releases` generic remote)
- **Config**: requires `releases_remote` field pointing to the CDN remote that serves the actual binaries
Uses `github.com/hashicorp/terraform-registry-address` for address parsing and protocol-compliant URL generation.
---
## Go Module Proxy Remote Type (New)
The `goproxy` package type implements the GOPROXY protocol (Go module proxy):
| Endpoint | Mutability | Description |
|---|---|---|
| `{module}/@v/list` | Mutable | Plain text list of known versions |
| `{module}/@latest` | Mutable | JSON metadata for latest version |
| `{module}/@v/{version}.info` | Immutable | JSON version metadata (`Version`, `Time`) |
| `{module}/@v/{version}.mod` | Immutable | `go.mod` file for that version |
| `{module}/@v/{version}.zip` | Immutable | Source archive for that version |
- **No URL rewriting needed** — responses are self-contained (no embedded URLs)
- **Config**: `base_url` points to upstream proxy (e.g., `https://proxy.golang.org`)
- **Client usage**: set `GOPROXY=https://artifactapi.example.com/api/v1/remote/goproxy`
- Uses `github.com/goproxy/goproxy` for protocol handling
---
## Allowlist / Blocklist / Automatic Mutable Patterns
### Access Control (Per-Remote)
| Field | Default | Behavior |
|---|---|---|
| `blocklist` | `[]` (empty) | If a path matches any blocklist pattern → **403 Forbidden**. Checked first |
| `allowlist` | `[]` (empty) | If empty → **allow everything**. If non-empty → only matching paths are proxied; everything else → **403** |
Evaluation order: blocklist → allowlist → proxy. No allowlist + no blocklist = open proxy (default).
### Automatic Mutable Patterns (Per-Provider Built-ins)
Each provider declares built-in mutable patterns that are **always merged** with user-defined `mutable_patterns`. Users never need to configure these — the provider knows which paths change over time.
| Provider | Built-in Mutable Patterns | Rationale |
|---|---|---|
| **generic** | *(none)* | No convention for what's mutable |
| **docker** | `/manifests/(?!sha256:)[^/]+$`, `/tags/list$` | Tag manifests change; digest manifests don't |
| **helm** | `index\.yaml$` | Chart index changes when new charts are published |
| **pypi** | `simple/` | Package index pages change with new releases |
| **npm** | `^[^/]+$` (package metadata, not `.tgz`) | Package metadata changes; tarballs are immutable |
| **rpm** | `repomd\.xml$`, `repodata/.*`, `Packages\.gz$` | Repo metadata rebuilt on every publish |
| **alpine** | `APKINDEX\.tar\.gz$` | Package index rebuilt on every publish |
| **puppet** | `^v3/modules/`, `^v3/releases` | Module metadata changes with new releases |
| **terraform** | `[^/]+/[^/]+/versions$` | Provider version listings grow over time |
| **goproxy** | `@v/list$`, `@latest$` | Version list and latest pointer change |
These are returned by `Provider.BuiltinMutablePatterns()` and merged at classification time:
```
effective_mutable = provider.BuiltinMutablePatterns() remote.mutable_patterns
```
If a path matches `effective_mutable` → use `mutable_ttl`. If it matches `remote.immutable_patterns` → use `immutable_ttl`. Immutable patterns take precedence over mutable when both match.
---
## API Design
### v1 Proxy Endpoints (Backwards Compatible)
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/remote/{name}/{path}` | Proxy/cache artifact |
| `GET` | `/api/v1/virtual/{name}/{path}` | Virtual repo proxy |
| `GET/HEAD` | `/v2/{name}/{path}` | Docker Registry v2 |
| `GET` | `/v2/` | Docker v2 ping |
| `GET/PUT/HEAD/DELETE` | `/api/v1/local/{name}/{path}` | Local repo CRUD |
### v2 Management API (New)
```
GET /api/v2/remotes → [{name, package_type, base_url, description, stats}]
GET /api/v2/remotes/{name} → {full config + stats + health}
POST /api/v2/remotes → create remote (Terraform provider)
PUT /api/v2/remotes/{name} → update remote (Terraform provider)
DELETE /api/v2/remotes/{name} → delete remote — cascades artifacts, GC cleans S3
GET /api/v2/virtuals → [{name, package_type, members, stats}]
GET /api/v2/virtuals/{name} → {full config + member details}
POST /api/v2/virtuals → create virtual
PUT /api/v2/virtuals/{name} → update virtual
DELETE /api/v2/virtuals/{name} → delete virtual
GET /api/v2/remotes/{name}/objects → paginated objects
?q=pattern&sort=size|accessed|age&page=1&per_page=50
DELETE /api/v2/remotes/{name}/objects/{path} → evict specific cached object
DELETE /api/v2/remotes/{name}/cache → flush cache
?type=all|indexes|blobs
GET /api/v2/stats → overview stats
GET /api/v2/stats/top-remotes → top remotes by size/requests/hit-rate
GET /api/v2/health → {status, postgres, redis, s3, uptime}
GET /metrics → Prometheus format
GET /api/v2/events → SSE stream
```
---
## Proxy Engine
### Request Flow
```
Client Request
Classify (immutable/mutable/denied)
├── blocklist match → 403
├── allowlist non-empty + no match → 403
Check Redis TTL key
├── exists (fresh) → serve from S3, log access
├── missing (expired or uncached)
│ │
│ ▼
│ Acquire fetch lock (Redis SETNX, 30s TTL)
│ │
│ ├── lock acquired
│ │ ├── mutable + check_mutable + have ETag → HEAD upstream
│ │ │ ├── 304 → refresh TTL, serve from S3
│ │ │ └── changed → full fetch
│ │ └── full fetch from upstream
│ │ → provider.RewriteResponse() if needed
│ │ → CAS store (hash → check blobs → upload if new)
│ │ → upsert artifact in Postgres
│ │ → set Redis TTL + release lock
│ │ → on upstream error + stale_on_error → refresh TTL, serve stale
│ │
│ └── lock not acquired → poll S3 briefly, serve if another pod fetched it
Stream response from S3, log access
```
### Circuit Breaker
Per-remote, tracked in Redis. Closed → Open (after N failures) → Half-open (after cooldown). Exposed via `GET /api/v2/remotes/{name}` health field.
### Content-Addressable Storage
1. Stream upstream → temp file, compute SHA256 inline
2. Check `blobs` table for hash
3. Exists → skip S3 upload, upsert `artifacts` row only
4. New → upload to `blobs/sha256/{hash}`, insert both rows
### Garbage Collection
Background goroutine (configurable interval, default 1h):
1. Orphaned blobs: delete S3 objects whose `content_hash` has no referencing `artifacts` or `local_files` rows
2. Cold artifacts: optional per-remote, delete artifacts not accessed in N days
3. Remote deletion: `ON DELETE CASCADE` handles Postgres; GC sweeps orphaned blobs
---
## Package Providers
### Provider Interface
```go
type Provider interface {
Type() models.PackageType
BuiltinMutablePatterns() []*regexp.Regexp
BuiltinImmutablePatterns() []*regexp.Regexp
ContentType(path string) string
UpstreamURL(remote models.Remote, path string) string
RewriteResponse(body []byte, remote models.Remote, proxyBaseURL string) ([]byte, error)
AuthHeaders(ctx context.Context, remote models.Remote) (http.Header, error)
}
type IndexMerger interface {
MergeIndexes(members []MemberIndex, proxyBaseURL string) ([]byte, error)
}
```
### Provider Registry
```go
var registry = map[models.PackageType]Provider{
models.PackageGeneric: &generic.Provider{},
models.PackageDocker: &docker.Provider{},
models.PackageHelm: &helm.Provider{},
models.PackagePyPI: &pypi.Provider{},
models.PackageNPM: &npm.Provider{},
models.PackageRPM: &rpm.Provider{},
models.PackageAlpine: &alpine.Provider{},
models.PackagePuppet: &puppet.Provider{},
models.PackageTerraform: &terraform.Provider{},
models.PackageGoProxy: &goproxy.Provider{},
}
func Get(t models.PackageType) (Provider, error) { ... }
```
Each provider lives in its own subpackage under `internal/provider/` with its own `_test.go`.
---
## Testing Strategy
### Unit Tests
Every package gets `_test.go` files alongside the source. Run with `go test ./...`.
| Package | What's Tested |
|---|---|
| `internal/provider/docker/` | Auth token parsing/caching, manifest classification, tag banning, URL construction, blob key generation |
| `internal/provider/helm/` | `index.yaml` parsing (using `helm.sh/helm/v3/pkg/repo`), URL rewriting, index merging |
| `internal/provider/pypi/` | Simple index HTML parsing, URL rewriting, index merging |
| `internal/provider/npm/` | Metadata JSON rewriting (`dist.tarball` URLs) |
| `internal/provider/terraform/` | Registry URL construction, download info JSON rewriting, `releases_remote` URL extraction |
| `internal/provider/rpm/` | Mutable pattern matching (repodata) |
| `internal/provider/alpine/` | Mutable pattern matching (APKINDEX) |
| `internal/provider/puppet/` | `file_uri` JSON rewriting |
| `internal/proxy/` | Classifier (immutable vs mutable vs denied), circuit breaker state transitions, revalidator logic |
| `internal/storage/` | CAS key generation, dedup detection, S3 operation mocking |
| `internal/cache/` | Redis TTL set/check, fetch lock acquire/release/contention |
| `internal/gc/` | Orphan detection queries, cold artifact selection |
| `pkg/models/` | Model validation, PackageType enum |
| `pkg/client/` | API client request/response serialization |
### End-to-End Tests
Located in `e2e/`. Use `testcontainers-go` to spin up real Postgres, Redis, and MinIO containers. The test binary starts the actual `artifactapi` server against these backends.
```go
// e2e/e2e_test.go
func TestMain(m *testing.M) {
// Start postgres, redis, minio via testcontainers-go
// Run migrations
// Start artifactapi server on random port
// Run tests
// Tear down
}
```
| Test File | What's Tested |
|---|---|
| `e2e/proxy_test.go` | Proxy a real GitHub release through generic remote, verify S3 storage, verify Redis TTL, verify Postgres artifact row, verify cache hit on second request |
| `e2e/docker_test.go` | Pull a real image manifest + blob through Docker v2 proxy, verify blob deduplication, tag banning |
| `e2e/management_test.go` | Full CRUD lifecycle: create remote via v2 API, proxy through it, list objects, evict object, flush cache, delete remote |
| `e2e/virtual_test.go` | Create two helm remotes + virtual, fetch merged index, verify priority ordering |
| `e2e/terraform_test.go` | Proxy terraform provider version listing + download info, verify URL rewriting to releases_remote |
| `e2e/goproxy_test.go` | Proxy Go module `@v/list`, `.info`, `.mod`, `.zip` through GOPROXY remote, verify mutable vs immutable classification |
| `e2e/gc_test.go` | Create artifact, delete remote, trigger GC, verify S3 blob cleaned up |
### Code Quality
- `gofmt` / `goimports` — enforced in CI, run on save
- `golangci-lint` — comprehensive linter suite (staticcheck, errcheck, govet, etc.)
- `go vet ./...` — run in CI
- Makefile targets: `make test`, `make lint`, `make e2e`, `make fmt`
---
## Terraform Provider (Separate Repo)
**Repo**: `terraform-provider-artifactapi`
**Uses**: `pkg/client/` and `pkg/models/` from the main module
```hcl
provider "artifactapi" {
endpoint = "https://artifactapi.k8s.syd1.au.unkin.net"
}
resource "artifactapi_remote" "terraform_registry" {
name = "terraform-registry"
package_type = "terraform"
base_url = "https://registry.terraform.io"
description = "Terraform provider registry"
releases_remote = artifactapi_remote.hashicorp_releases.name
immutable_patterns = [
"[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
]
cache {
immutable_ttl = 0
mutable_ttl = 300
}
}
resource "artifactapi_remote" "hashicorp_releases" {
name = "hashicorp-releases"
package_type = "generic"
base_url = "https://releases.hashicorp.com"
immutable_patterns = [
".*\\.zip$",
".*SHA256SUMS(\\.sig)?$",
]
cache {
immutable_ttl = 0
mutable_ttl = 0
}
}
resource "artifactapi_virtual" "helm" {
name = "helm"
package_type = "helm"
description = "All helm repos merged"
members = [
artifactapi_remote.jetstack.name,
artifactapi_remote.hashicorp_helm.name,
]
}
```
---
## Web UI (React + Vite — Separate Container)
### Deployment
Separate `Dockerfile.ui`: multi-stage build (node → nginx). Served as its own container/pod. nginx proxies `/api/*` to the Go backend.
### Pages
| Route | Content |
|---|---|
| `/` | Dashboard: total objects, storage used, dedup savings, bandwidth saved, top remotes chart, live SSE event feed, health indicators |
| `/remotes` | Remote table: name, type, description, object count, size, hit rate, health. Filter by type, sort any column |
| `/remotes/:name` | Config (read-only, "Managed by Terraform" badge), stats, object browser with search/sort/evict, flush actions |
| `/virtuals` | Virtual table: name, type, members, merged object count |
| `/virtuals/:name` | Member list with individual stats |
All config is read-only — managed by Terraform.
---
## TUI (Bubble Tea — Subcommand)
`artifactapi tui --endpoint http://localhost:8000` or via `ARTIFACTAPI_ENDPOINT` env.
Uses `pkg/client/` for all API calls (same client as Terraform provider).
| View | Key bindings |
|---|---|
| Dashboard | summary stats, top remotes |
| Remotes list | `j`/`k` navigate, `/` filter, `Enter` detail |
| Remote detail | config + stats, `Enter` → object browser |
| Object browser | `/` search, `d` evict, `f` flush |
| Virtuals | `j`/`k`, `Enter` detail |
---
## Improvements Over v2
| Area | v2 (Python) | v3 (Go) |
|---|---|---|
| S3 paths | Hashed, opaque | Content-addressed CAS |
| Config | YAML files, mtime reload | Terraform via API |
| Package types | 8 types | 10 types (+ terraform, goproxy) |
| Virtual repos | Helm only | Helm + PyPI, extensible |
| Deduplication | Docker blobs only | All types via CAS |
| Revalidation | Opt-in flag | Default for all mutable |
| Access logging | None | Per-artifact in Postgres |
| GC | None | Background goroutine |
| Upstream health | Per-request | Circuit breaker |
| S3 backends | MinIO only | MinIO, Ceph, AWS (minio-go) |
| UI | None | Web dashboard + TUI |
| Binary | Python + venv | Static Go binary |
| Frontend | N/A | Separate container (React) |
| Testing | Mocked unit tests | Unit + e2e with real backends |
---
## Implementation Phases
### Phase 1: Core Engine + Models
- Go module, Makefile (`make build test lint fmt e2e`), Dockerfile, docker-compose
- `pkg/models/` — all domain types
- PostgreSQL schema + migrations
- S3 storage layer with CAS (`minio-go/v7`)
- Redis cache layer (TTL, locks)
- Proxy engine: fetch-or-cache, classifier, revalidator
- Generic + Docker providers (most complexity: OCI auth, CAS, tag banning)
- Health + metrics endpoints
- Unit tests for all packages
- **Milestone**: proxy Docker + generic, cache in S3, track in Postgres
### Phase 2: All Providers
- Helm (using `helm.sh/helm/v3/pkg/repo`)
- PyPI (stdlib `x/net/html`)
- npm (stdlib `encoding/json`)
- RPM (using `rs3.io/go/rpm/repomd`)
- Alpine (using `gitlab.alpinelinux.org/alpine/go`)
- Puppet Forge (stdlib `encoding/json`)
- Terraform (using `hashicorp/terraform-registry-address`)
- Go Modules / GOPROXY (using `github.com/goproxy/goproxy`)
- Unit tests per provider
- **Milestone**: feature parity with v2 + goproxy
### Phase 3: Management API + Virtual Repos + GC
- `pkg/client/` — shared Go API client
- v2 CRUD endpoints
- Virtual repo engine: `IndexMerger` for Helm + PyPI
- Circuit breaker
- Access logging middleware
- GC goroutine
- **Milestone**: full API, virtuals, GC
### Phase 4: End-to-End Tests
- `e2e/` test suite with `testcontainers-go`
- Proxy, Docker, management, virtual, terraform, GC tests
- CI pipeline: `make e2e`
- **Milestone**: comprehensive e2e coverage
### Phase 5: Terraform Provider
- Separate repo: `terraform-provider-artifactapi`
- Imports `pkg/client/` and `pkg/models/`
- `artifactapi_remote` + `artifactapi_virtual` resources + data sources
- Import support
- **Milestone**: manage all config via Terraform
### Phase 6: Web UI
- React + Vite in `ui/`
- `Dockerfile.ui` (multi-stage → nginx)
- Dashboard, remotes, objects, virtuals pages
- SSE event feed
- **Milestone**: full web UI in separate container
### Phase 7: TUI
- Bubble Tea in `internal/tui/`
- Uses `pkg/client/`
- Dashboard, remotes, objects, virtuals views
- **Milestone**: TUI feature parity with web UI
### Phase 8: Migration + Cutover
- Migration tool: v2 YAML → Terraform HCL + `terraform import` commands
- S3 rehash script: `{remote}/{hash16}/{file}``blobs/sha256/{content_hash}`
- Parallel run, response comparison
- Cutover
---
## Makefile Targets
```makefile
.PHONY: build test lint fmt e2e docker docker-ui
build: ## Build Go binary
go build -o bin/artifactapi ./cmd/artifactapi
test: ## Run unit tests
go test ./...
lint: ## Run golangci-lint + go vet
golangci-lint run ./...
go vet ./...
fmt: ## Format code (gofmt + goimports)
gofmt -w .
goimports -w .
e2e: ## Run end-to-end tests (requires Docker)
go test -tags=e2e -count=1 -timeout=5m ./e2e/...
docker: ## Build API server Docker image
docker build -t artifactapi .
docker-ui: ## Build frontend Docker image
docker build -t artifactapi-ui -f ui/Dockerfile.ui ui/
compose: ## Start full stack (API + UI + Postgres + Redis + MinIO)
docker compose up -d
```