Feat/v3 go rewrite (#47)

Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine,
  puppet, terraform, goproxy — each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with allowlist/blocklist per-remote (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats,
  health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, artifactapi tui --endpoint <url>

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #47
This commit was merged in pull request #47.
2026-06-07 19:30:35 +10:00
parent f25bf6cb29
commit b46c116f6b
160 changed files with 11448 additions and 7907 deletions
@@ -0,0 +1,880 @@
# ArtifactAPI v3 — Go Rewrite Plan
## Context
ArtifactAPI is a production artifact proxy/cache serving ~42 remotes (Docker registries, Helm repos, RPM/Alpine repos, GitHub releases, PyPI, npm, Puppet Forge, Terraform registries, Go module proxies) across a Kubernetes cluster. The current Python (FastAPI) implementation works but has architectural debt: opaque hashed S3 paths, no UI for visibility, YAML config files that drift, no garbage collection, no access logging, and virtual repos limited to Helm only.
The v3 rewrite targets: a single Go binary (API + TUI), a separate React frontend (own Dockerfile), a Terraform provider (separate repo), content-addressable storage, and a cleaner data model that makes the cache inspectable and manageable.
**Repo**: Same repo (`git.unkin.net/unkin/artifactapi`), new branch.
**Module**: `git.unkin.net/unkin/artifactapi`
**Frontend**: React + Vite, separate Dockerfile, talks to API
**Terraform provider**: Separate repo (`terraform-provider-artifactapi`)
---
## Architecture Overview
```
┌───────────────────────────────────┐    ┌──────────────────────┐
│       Go Binary (API + TUI)       │    │  Frontend Container  │
│                                   │    │                      │
│  ┌──────────┐  ┌───────────────┐  │    │  React + Vite SPA    │
│  │ REST API │  │ Proxy Engine  │  │◄───│  nginx / node serve  │
│  │ /api/v2  │  │ /api/v1/...   │  │    │  Dockerfile.ui       │
│  │          │  │ /v2/... (OCI) │  │    └──────────────────────┘
│  └────┬─────┘  └──────┬────────┘  │
│       │               │           │    ┌──────────────────────┐
│  ┌────┴───────────────┴────────┐  │    │  Terraform Provider  │
│  │         Data Layer          │  │◄───│  (separate repo)     │
│  │  PostgreSQL · Redis · S3    │  │    └──────────────────────┘
│  └─────────────────────────────┘  │
│                                   │    ┌──────────────────────┐
│  ┌─────────────────────────────┐  │    │  TUI (subcommand)    │
│  │       artifactapi tui       │──│───►│  artifactapi tui     │
│  └─────────────────────────────┘  │    │  --endpoint <url>    │
└───────────────────────────────────┘    └──────────────────────┘
```
Three independent deployment units:
1. **Go binary** — API server + TUI subcommand (single `Dockerfile`)
2. **React frontend** — SPA served by nginx (`Dockerfile.ui`), talks to `/api/v2`
3. **Terraform provider** — separate repo, calls `/api/v2` CRUD
---
## Project Structure (Modular)
```
artifactapi/
├── cmd/
│   └── artifactapi/
│       └── main.go  # entrypoint: serve / tui subcommands
├── pkg/  # PUBLIC: importable by terraform provider, CLI tools
│   ├── models/  # shared domain types
│   │   ├── remote.go  # Remote, RemoteConfig, PackageType enum
│   │   ├── virtual.go  # Virtual, VirtualConfig
│   │   ├── artifact.go  # Artifact, Blob, AccessLogEntry
│   │   ├── local.go  # LocalFile, LocalRepo
│   │   └── stats.go  # RemoteStats, OverviewStats
│   └── client/  # typed Go API client (used by TUI + Terraform provider)
│       ├── client.go  # Client struct, base HTTP
│       ├── remotes.go  # remote CRUD methods
│       ├── virtuals.go  # virtual CRUD methods
│       ├── objects.go  # object browse/evict methods
│       └── stats.go  # stats methods
├── internal/  # PRIVATE: server internals
│   ├── server/
│   │   ├── server.go  # HTTP server setup, router
│   │   └── middleware.go  # logging, recovery, request-id, access logging
│   │
│   ├── api/
│   │   ├── v1/  # proxy endpoints (v1 compat)
│   │   │   ├── proxy.go  # GET /api/v1/remote/{name}/{path}
│   │   │   ├── docker.go  # /v2/{name}/{path}
│   │   │   ├── virtual.go  # GET /api/v1/virtual/{name}/{path}
│   │   │   └── local.go  # CRUD /api/v1/local/{name}/{path}
│   │   └── v2/  # management API
│   │       ├── remotes.go  # CRUD + stats
│   │       ├── virtuals.go  # CRUD
│   │       ├── objects.go  # browse/evict cached objects
│   │       ├── stats.go  # overview, top-remotes
│   │       ├── events.go  # SSE stream
│   │       └── health.go  # health, metrics
│   │
│   ├── provider/  # package-type providers (registry protocol handlers)
│   │   ├── provider.go  # Provider interface + registry
│   │   ├── generic/
│   │   │   ├── generic.go
│   │   │   └── generic_test.go
│   │   ├── docker/
│   │   │   ├── docker.go  # OCI Distribution v2 via go-containerregistry
│   │   │   ├── auth.go  # Bearer token fetch + cache
│   │   │   └── docker_test.go
│   │   ├── helm/
│   │   │   ├── helm.go  # index rewriting via helm.sh/helm/v3/pkg/repo
│   │   │   ├── merger.go  # virtual index merge
│   │   │   └── helm_test.go
│   │   ├── pypi/
│   │   │   ├── pypi.go  # simple index HTML rewriting
│   │   │   ├── merger.go  # virtual simple index merge
│   │   │   └── pypi_test.go
│   │   ├── npm/
│   │   │   ├── npm.go  # metadata JSON rewriting
│   │   │   └── npm_test.go
│   │   ├── rpm/
│   │   │   ├── rpm.go  # repodata patterns
│   │   │   └── rpm_test.go
│   │   ├── alpine/
│   │   │   ├── alpine.go  # APKINDEX patterns
│   │   │   └── alpine_test.go
│   │   ├── puppet/
│   │   │   ├── puppet.go  # file_uri JSON rewriting
│   │   │   └── puppet_test.go
│   │   ├── terraform/
│   │   │   ├── terraform.go  # registry protocol, download URL rewriting
│   │   │   └── terraform_test.go
│   │   └── goproxy/
│   │       ├── goproxy.go  # Go module proxy protocol (GOPROXY)
│   │       └── goproxy_test.go
│   │
│   ├── proxy/
│   │   ├── engine.go  # core fetch-or-cache logic
│   │   ├── engine_test.go
│   │   ├── classifier.go  # immutable vs mutable classification
│   │   ├── classifier_test.go
│   │   ├── revalidator.go  # conditional HEAD requests (ETag/Last-Modified)
│   │   └── circuit.go  # per-remote circuit breaker
│   │
│   ├── storage/
│   │   ├── s3.go  # S3 client (minio-go: works with MinIO, Ceph, AWS)
│   │   ├── s3_test.go
│   │   ├── cas.go  # content-addressable store logic
│   │   └── cas_test.go
│   │
│   ├── cache/
│   │   ├── redis.go  # TTL management, fetch locks
│   │   ├── redis_test.go
│   │   └── lock.go  # distributed lock abstraction
│   │
│   ├── database/
│   │   ├── postgres.go  # connection pool, migration runner
│   │   ├── queries/  # SQL query files or sqlc-generated code
│   │   │   ├── remotes.sql.go
│   │   │   ├── virtuals.sql.go
│   │   │   ├── artifacts.sql.go
│   │   │   └── access_log.sql.go
│   │   └── migrations/  # golang-migrate SQL files
│   │       ├── 001_initial.up.sql
│   │       └── 001_initial.down.sql
│   │
│   ├── metrics/
│   │   └── prometheus.go  # counters, gauges, histograms
│   │
│   ├── gc/
│   │   ├── gc.go  # background garbage collection goroutine
│   │   └── gc_test.go
│   │
│   ├── tui/
│   │   ├── app.go  # Bubble Tea main model
│   │   ├── views/
│   │   │   ├── dashboard.go
│   │   │   ├── remotes.go
│   │   │   ├── objects.go
│   │   │   └── virtuals.go
│   │   └── components/
│   │       ├── table.go
│   │       └── statusbar.go
│   │
│   └── config/
│       └── env.go  # environment variable parsing + validation
├── ui/  # React frontend: SEPARATE DOCKERFILE
│   ├── src/
│   │   ├── App.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Remotes.tsx
│   │   │   ├── RemoteDetail.tsx
│   │   │   ├── Virtuals.tsx
│   │   │   └── Objects.tsx
│   │   ├── components/
│   │   │   ├── RemoteTable.tsx
│   │   │   ├── ObjectBrowser.tsx
│   │   │   ├── StatsCard.tsx
│   │   │   └── EventFeed.tsx
│   │   └── api/
│   │       └── client.ts  # typed API client
│   ├── package.json
│   ├── vite.config.ts
│   ├── tsconfig.json
│   ├── Dockerfile.ui  # multi-stage: node build → nginx
│   └── nginx.conf  # proxy /api/* to backend, serve SPA
├── e2e/  # end-to-end integration tests
│   ├── e2e_test.go  # TestMain spins up docker-compose stack
│   ├── proxy_test.go  # proxy through real remotes
│   ├── docker_test.go  # Docker v2 protocol e2e
│   ├── management_test.go  # v2 API CRUD
│   ├── virtual_test.go  # virtual repo merge e2e
│   └── docker-compose.e2e.yml  # postgres + redis + minio for tests
├── go.mod
├── go.sum
├── Makefile
├── Dockerfile  # Go binary (API server + TUI)
├── Dockerfile.ui  # symlink or copy → ui/Dockerfile.ui
└── docker-compose.yml
```
### Key Modularisation Decisions
- **`pkg/models/`** — Shared domain types importable by the Terraform provider and any external tooling. No dependencies on internal packages
- **`pkg/client/`** — Typed Go API client used by both the TUI and the Terraform provider. Depends only on `pkg/models/` and stdlib
- **`internal/provider/`** — Each package type is its own subpackage with isolated tests. A provider registry maps `PackageType → Provider`
- **`internal/database/queries/`** — Use [sqlc](https://sqlc.dev/) to generate type-safe query functions from SQL, or hand-written query files
- **`e2e/`** — Separate test binary that spins up a real docker-compose stack
---
## Go Ecosystem Libraries
Prefer existing, maintained Go modules over writing protocol handlers from scratch.
### Package-Type Libraries
| Package Type | Go Module | What It Gives Us |
|---|---|---|
| **Docker/OCI** | `github.com/google/go-containerregistry` | Full Registry v2/OCI client: manifest parsing, auth challenges, blob operations. `pkg/registry` can implement a v2 server. Reference: `github.com/regclient/regclient` |
| **Helm** | `helm.sh/helm/v3/pkg/repo` | Parse/generate `index.yaml`, `IndexFile`/`ChartVersion` types, URL entries. Used directly for merge |
| **Terraform** | `github.com/hashicorp/terraform-registry-address` | Provider/module address parsing, `ForRegistryProtocol()` URL generation. Protocol spec: provider registry protocol v1 |
| **Go Modules** | `github.com/goproxy/goproxy` | Minimalist GOPROXY protocol handler, implements full spec as `http.Handler`. Handles `/@v/list`, `/@v/{v}.info`, `/@v/{v}.mod`, `/@v/{v}.zip`, `/@latest` |
| **RPM** | `rs3.io/go/rpm/repomd` | Parse `repomd.xml`, `primary.xml` with proper XML namespace handling |
| **Alpine** | `gitlab.alpinelinux.org/alpine/go` | Official Alpine library: parse APKINDEX, `.apk` files |
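| **PyPI** | `golang.org/x/net/html` | No dedicated Go PyPI library exists. Parse simple index HTML with `golang.org/x/net/html` (maintained by the Go project, but a separate module rather than part of the standard library) and extract `<a>` tags. Minimal: the rewriting is just href replacement |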
| **npm** | stdlib `encoding/json` | npm metadata is JSON — parse with stdlib, rewrite `dist.tarball` URLs. No special library needed |
| **Puppet Forge** | stdlib `encoding/json` | Forge API is JSON — parse and rewrite `file_uri` fields. Community lib `github.com/johnmccabe/go-puppetforge` exists but is thin; stdlib suffices |
### Infrastructure Libraries
| Purpose | Go Module | Why This One |
|---|---|---|
| **HTTP router** | `github.com/go-chi/chi/v5` | Lightweight, stdlib `http.Handler` compatible, middleware chain |
| **PostgreSQL** | `github.com/jackc/pgx/v5` | Pure Go, connection pooling, COPY support, prepared statements |
| **SQL generation** | `github.com/sqlc-dev/sqlc` | Generate type-safe Go from SQL queries — no ORM, no reflection |
| **Redis** | `github.com/redis/go-redis/v9` | Full Redis client, pipelining, pub/sub |
| **S3 (MinIO/Ceph/AWS)** | `github.com/minio/minio-go/v7` | Native S3-compatible client. Works with MinIO, Ceph RGW, AWS S3, any S3-compatible backend out of the box. Lighter than aws-sdk-go-v2, purpose-built for S3 compat |
| **DB migrations** | `github.com/golang-migrate/migrate/v4` | SQL file-based migrations, CLI + library |
| **Prometheus** | `github.com/prometheus/client_golang` | Counters, gauges, histograms |
| **TUI** | `github.com/charmbracelet/bubbletea` | Elm-architecture TUI framework |
| **TUI styling** | `github.com/charmbracelet/lipgloss` | Terminal styling |
| **TUI components** | `github.com/charmbracelet/bubbles` | Table, text input, spinner, etc. |
| **Structured logging** | `log/slog` (stdlib) | Go 1.21+ structured logging, zero dependencies |
| **Testing** | `github.com/stretchr/testify` | Assertions + require for unit tests |
| **Test containers** | `github.com/testcontainers/testcontainers-go` | Spin up Postgres/Redis/MinIO in e2e tests |
### S3 Client: Multi-Backend Support
Using `minio-go/v7` as the S3 client because it natively supports:
- **MinIO** — primary development/production target
- **Ceph RGW** — S3-compatible via endpoint config
- **AWS S3** — via region + credential config
- **Any S3-compatible** — GCS (interop mode), Wasabi, DigitalOcean Spaces, etc.
No abstraction layer needed — `minio-go` handles endpoint differences internally. Config:
```go
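// endpoint, accessKey, secretKey, useTLS and region come from configuration.
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
	Secure: useTLS,
	Region: region, // optional, for AWS
})
if err != nil {
	return err // do not discard the error: a bad endpoint is reported here
}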
```
---
## Data Layer
### PostgreSQL Schema
```sql
-- Remotes: managed exclusively by Terraform
CREATE TABLE remotes (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL, -- generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy
base_url TEXT NOT NULL,
description TEXT DEFAULT '',
username TEXT DEFAULT '',
password TEXT DEFAULT '',
immutable_ttl INTEGER DEFAULT 0,
mutable_ttl INTEGER DEFAULT 3600,
check_mutable BOOLEAN DEFAULT TRUE,
immutable_patterns TEXT[] DEFAULT '{}', -- user-defined immutable patterns
mutable_patterns TEXT[] DEFAULT '{}', -- user-defined mutable patterns (merged with provider built-ins)
allowlist TEXT[] DEFAULT '{}', -- if empty, allow all paths; if non-empty, only matching paths proxied
blocklist TEXT[] DEFAULT '{}', -- always denied, checked before allowlist
ban_tags_enabled BOOLEAN DEFAULT FALSE,
ban_tags TEXT[] DEFAULT '{}',
quarantine_enabled BOOLEAN DEFAULT FALSE,
quarantine_days INTEGER DEFAULT 3,
stale_on_error BOOLEAN DEFAULT TRUE,
releases_remote TEXT DEFAULT '', -- terraform type: name of CDN remote for download URL rewriting
managed_by TEXT DEFAULT '', -- 'terraform' or empty
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Virtual repositories
CREATE TABLE virtuals (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL,
description TEXT DEFAULT '',
members TEXT[] NOT NULL,
managed_by TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Content-addressable blob storage tracking
CREATE TABLE blobs (
content_hash TEXT PRIMARY KEY,
s3_key TEXT NOT NULL,
size_bytes BIGINT NOT NULL,
content_type TEXT DEFAULT 'application/octet-stream',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Artifact metadata: maps (remote, path) → content blob
CREATE TABLE artifacts (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL REFERENCES remotes(name) ON DELETE CASCADE,
path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
upstream_etag TEXT DEFAULT '',
upstream_last_modified TIMESTAMPTZ,
first_seen_at TIMESTAMPTZ DEFAULT NOW(),
last_fetched_at TIMESTAMPTZ DEFAULT NOW(),
last_accessed_at TIMESTAMPTZ DEFAULT NOW(),
fetch_count BIGINT DEFAULT 1,
access_count BIGINT DEFAULT 1,
UNIQUE(remote_name, path)
);
CREATE INDEX idx_artifacts_remote ON artifacts(remote_name);
CREATE INDEX idx_artifacts_last_accessed ON artifacts(last_accessed_at);
-- Local file uploads
CREATE TABLE local_files (
id BIGSERIAL PRIMARY KEY,
repo_name TEXT NOT NULL,
file_path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(repo_name, file_path)
);
-- Access log (append-only, powers dashboards)
CREATE TABLE access_log (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL,
path TEXT NOT NULL,
cache_hit BOOLEAN NOT NULL,
size_bytes BIGINT DEFAULT 0,
upstream_ms INTEGER DEFAULT 0,
client_ip TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_access_log_remote_time ON access_log(remote_name, created_at);
```
### Redis Usage (Ephemeral Only)
| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| `ttl:{remote}:{path}` | STRING | remote's immutable/mutable TTL | Artifact freshness — existence = still fresh |
| `lock:{remote}:{path}` | STRING (NX) | 30s | Fetch lock — prevents thundering herd |
| `etag:{remote}:{path}` | STRING | same as TTL key | Cached ETag for conditional revalidation |
| `circuit:{remote}` | STRING | configurable | Circuit breaker — consecutive failure count |
Losing Redis = all TTLs expire = next request re-validates upstream. No data loss.
### S3 Layout (Content-Addressable)
```
artifacts-bucket/
├── blobs/sha256/{content_hash} # immutable CAS blobs
├── indexes/{remote}/{path} # mutable index files (helm, pypi, rpm, etc.)
├── indexes/{virtual}/{path} # merged virtual indexes
└── local/{repo}/{path} # user uploads (CAS-backed via blobs table)
```
---
## Terraform Remote Type (New in v2)
The `terraform` package type proxies the Terraform Provider Registry Protocol:
- **URL construction**: prepends `/v1/providers/` to request paths
- **Built-in mutable pattern**: `[^/]+/[^/]+/versions$` (version listings change over time)
- **Built-in immutable pattern**: `[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$` (per-version download info is fixed)
- **Response rewriting**: download info JSON — rewrites `download_url`, `shasums_url`, `shasums_signature_url` to route through a companion `releases_remote` (e.g., `hashicorp-releases` generic remote)
- **Config**: requires `releases_remote` field pointing to the CDN remote that serves the actual binaries
Uses `github.com/hashicorp/terraform-registry-address` for address parsing and protocol-compliant URL generation.
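
The response rewriting described above can be sketched with the standard library alone. This is a minimal illustration, not the provider's actual code: the function name `RewriteDownloadInfo` is hypothetical, and the proxied URL shape (`{proxyBaseURL}/api/v1/remote/{releases_remote}/{upstream path}`) is an assumption derived from the v1 proxy route. It drops the upstream host and any query string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// urlFields are the download-info keys the plan says must be rewritten.
var urlFields = []string{"download_url", "shasums_url", "shasums_signature_url"}

// RewriteDownloadInfo (hypothetical name) points the URL fields of a
// download-info document at the companion releases remote. The proxied URL
// shape is an assumption based on the /api/v1/remote/{name}/{path} route.
func RewriteDownloadInfo(body []byte, proxyBaseURL, releasesRemote string) ([]byte, error) {
	// RawMessage preserves the values of all fields we do not touch.
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("decode download info: %w", err)
	}
	base := strings.TrimRight(proxyBaseURL, "/") + "/api/v1/remote/" + url.PathEscape(releasesRemote)
	for _, field := range urlFields {
		var upstream string
		if raw, ok := doc[field]; !ok || json.Unmarshal(raw, &upstream) != nil || upstream == "" {
			continue // field absent, not a string, or empty: leave it alone
		}
		u, err := url.Parse(upstream)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", field, err)
		}
		rewritten, err := json.Marshal(base + u.EscapedPath())
		if err != nil {
			return nil, err
		}
		doc[field] = rewritten
	}
	return json.Marshal(doc)
}

func main() {
	in := []byte(`{"os":"linux","download_url":"https://releases.hashicorp.com/terraform-provider-random/3.6.0/random.zip"}`)
	out, err := RewriteDownloadInfo(in, "https://artifactapi.example.com", "hashicorp-releases")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Decoding into `map[string]json.RawMessage` rather than a struct keeps unknown fields in the response, which matters if the registry adds keys later.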
---
## Go Module Proxy Remote Type (New)
The `goproxy` package type implements the GOPROXY protocol (Go module proxy):
| Endpoint | Mutability | Description |
|---|---|---|
| `{module}/@v/list` | Mutable | Plain text list of known versions |
| `{module}/@latest` | Mutable | JSON metadata for latest version |
| `{module}/@v/{version}.info` | Immutable | JSON version metadata (`Version`, `Time`) |
| `{module}/@v/{version}.mod` | Immutable | `go.mod` file for that version |
| `{module}/@v/{version}.zip` | Immutable | Source archive for that version |
- **No URL rewriting needed** — responses are self-contained (no embedded URLs)
- **Config**: `base_url` points to upstream proxy (e.g., `https://proxy.golang.org`)
- **Client usage**: set `GOPROXY=https://artifactapi.example.com/api/v1/remote/goproxy`
- Uses `github.com/goproxy/goproxy` for protocol handling
---
## Allowlist / Blocklist / Automatic Mutable Patterns
### Access Control (Per-Remote)
| Field | Default | Behavior |
|---|---|---|
| `blocklist` | `[]` (empty) | If a path matches any blocklist pattern → **403 Forbidden**. Checked first |
| `allowlist` | `[]` (empty) | If empty → **allow everything**. If non-empty → only matching paths are proxied; everything else → **403** |
Evaluation order: blocklist → allowlist → proxy. No allowlist + no blocklist = open proxy (default).
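
The evaluation order can be sketched as a small pure function. This is an illustrative sketch only: `AccessRules` and `NewAccessRules` are hypothetical names, and the patterns are assumed to be Go regular expressions matched against the request path.

```go
package main

import (
	"fmt"
	"regexp"
)

// AccessRules is a hypothetical stand-in for a remote's compiled
// blocklist and allowlist columns.
type AccessRules struct {
	blocklist []*regexp.Regexp
	allowlist []*regexp.Regexp
}

// compileAll compiles every pattern or reports the first invalid one.
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// NewAccessRules builds rules from raw pattern strings.
func NewAccessRules(blocklist, allowlist []string) (AccessRules, error) {
	block, err := compileAll(blocklist)
	if err != nil {
		return AccessRules{}, err
	}
	allow, err := compileAll(allowlist)
	if err != nil {
		return AccessRules{}, err
	}
	return AccessRules{blocklist: block, allowlist: allow}, nil
}

// Allowed applies the documented order: blocklist first, then allowlist.
// An empty allowlist admits every path that is not blocked.
func (r AccessRules) Allowed(path string) bool {
	for _, re := range r.blocklist {
		if re.MatchString(path) {
			return false
		}
	}
	if len(r.allowlist) == 0 {
		return true
	}
	for _, re := range r.allowlist {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	rules, err := NewAccessRules([]string{`\.exe$`}, []string{`^releases/`})
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"releases/tool.tar.gz", "releases/tool.exe", "nightly/tool.tar.gz"} {
		fmt.Printf("%-22s allowed=%v\n", p, rules.Allowed(p))
	}
}
```

A `false` result corresponds to the 403 response in the table; the zero value of `AccessRules` behaves as the open-proxy default.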
### Automatic Mutable Patterns (Per-Provider Built-ins)
Each provider declares built-in mutable patterns that are **always merged** with user-defined `mutable_patterns`. Users never need to configure these — the provider knows which paths change over time.
| Provider | Built-in Mutable Patterns | Rationale |
|---|---|---|
| **generic** | *(none)* | No convention for what's mutable |
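| **docker** | `/manifests/[^/]+$`, `/tags/list$` | Tag manifests change; digest manifests don't. Go's `regexp` package (RE2 syntax) has no lookahead, so a pattern like `(?!sha256:)` will not compile. One option is to exclude digest manifests through a built-in immutable pattern such as `/manifests/sha256:[0-9a-f]{64}$`, which wins because immutable takes precedence |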
| **helm** | `index\.yaml$` | Chart index changes when new charts are published |
| **pypi** | `simple/` | Package index pages change with new releases |
| **npm** | `^[^/]+$` (package metadata, not `.tgz`) | Package metadata changes; tarballs are immutable |
| **rpm** | `repomd\.xml$`, `repodata/.*`, `Packages\.gz$` | Repo metadata rebuilt on every publish |
| **alpine** | `APKINDEX\.tar\.gz$` | Package index rebuilt on every publish |
| **puppet** | `^v3/modules/`, `^v3/releases` | Module metadata changes with new releases |
| **terraform** | `[^/]+/[^/]+/versions$` | Provider version listings grow over time |
| **goproxy** | `@v/list$`, `@latest$` | Version list and latest pointer change |
These are returned by `Provider.BuiltinMutablePatterns()` and merged at classification time:
```
effective_mutable = provider.BuiltinMutablePatterns() ∪ remote.mutable_patterns
```
If a path matches `effective_mutable` → use `mutable_ttl`. If it matches `remote.immutable_patterns` → use `immutable_ttl`. Immutable patterns take precedence over mutable when both match.
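
A rough sketch of the merge and the precedence rule, using hypothetical names (`Classifier`, `NewClassifier`, `Class`). The plan does not say which TTL applies to a path that matches neither list, so the sketch reports such paths as `unclassified` rather than guessing.

```go
package main

import (
	"fmt"
	"regexp"
)

// Class is the outcome of classifying a path.
type Class string

const (
	Immutable    Class = "immutable"
	Mutable      Class = "mutable"
	Unclassified Class = "unclassified" // matches neither list; handling is not specified in the plan
)

// Classifier (hypothetical) holds the effective pattern lists for one remote.
type Classifier struct {
	immutable []*regexp.Regexp
	mutable   []*regexp.Regexp
}

func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// NewClassifier merges provider built-ins with the remote's own mutable
// patterns, mirroring the effective_mutable rule.
func NewClassifier(builtinMutable, userMutable, userImmutable []string) (*Classifier, error) {
	merged := append(append([]string{}, builtinMutable...), userMutable...)
	mutable, err := compileAll(merged)
	if err != nil {
		return nil, err
	}
	immutable, err := compileAll(userImmutable)
	if err != nil {
		return nil, err
	}
	return &Classifier{immutable: immutable, mutable: mutable}, nil
}

// Classify checks immutable patterns first so they win when both match.
func (c *Classifier) Classify(path string) Class {
	for _, re := range c.immutable {
		if re.MatchString(path) {
			return Immutable
		}
	}
	for _, re := range c.mutable {
		if re.MatchString(path) {
			return Mutable
		}
	}
	return Unclassified
}

func main() {
	// Helm built-in plus a user-defined immutable pattern for chart archives.
	c, err := NewClassifier([]string{`index\.yaml$`}, nil, []string{`\.tgz$`})
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"charts/index.yaml", "charts/app-1.2.3.tgz", "README.md"} {
		fmt.Printf("%-22s %s\n", p, c.Classify(p))
	}
}
```

Compiling patterns once at construction time also surfaces invalid expressions when a remote is created or updated, instead of on the first request.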
---
## API Design
### v1 Proxy Endpoints (Backwards Compatible)
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/remote/{name}/{path}` | Proxy/cache artifact |
| `GET` | `/api/v1/virtual/{name}/{path}` | Virtual repo proxy |
| `GET/HEAD` | `/v2/{name}/{path}` | Docker Registry v2 |
| `GET` | `/v2/` | Docker v2 ping |
| `GET/PUT/HEAD/DELETE` | `/api/v1/local/{name}/{path}` | Local repo CRUD |
### v2 Management API (New)
```
GET /api/v2/remotes → [{name, package_type, base_url, description, stats}]
GET /api/v2/remotes/{name} → {full config + stats + health}
POST /api/v2/remotes → create remote (Terraform provider)
PUT /api/v2/remotes/{name} → update remote (Terraform provider)
DELETE /api/v2/remotes/{name} → delete remote — cascades artifacts, GC cleans S3
GET /api/v2/virtuals → [{name, package_type, members, stats}]
GET /api/v2/virtuals/{name} → {full config + member details}
POST /api/v2/virtuals → create virtual
PUT /api/v2/virtuals/{name} → update virtual
DELETE /api/v2/virtuals/{name} → delete virtual
GET /api/v2/remotes/{name}/objects → paginated objects
?q=pattern&sort=size|accessed|age&page=1&per_page=50
DELETE /api/v2/remotes/{name}/objects/{path} → evict specific cached object
DELETE /api/v2/remotes/{name}/cache → flush cache
?type=all|indexes|blobs
GET /api/v2/stats → overview stats
GET /api/v2/stats/top-remotes → top remotes by size/requests/hit-rate
GET /api/v2/health → {status, postgres, redis, s3, uptime}
GET /metrics → Prometheus format
GET /api/v2/events → SSE stream
```
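
As an illustration of how a client such as `pkg/client/` might address the object browser endpoint, the sketch below builds the request URL from the documented query parameters. `ObjectQuery` and `ObjectsURL` are hypothetical names, and omitting unset parameters is an assumption about server defaults.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// ObjectQuery (hypothetical) mirrors the documented query parameters of
// GET /api/v2/remotes/{name}/objects.
type ObjectQuery struct {
	Pattern string // sent as q
	Sort    string // size, accessed, or age
	Page    int
	PerPage int
}

// ObjectsURL builds the listing URL. Unset fields are left out of the query
// on the assumption that the server applies its own defaults.
func ObjectsURL(endpoint, remote string, q ObjectQuery) (string, error) {
	switch q.Sort {
	case "", "size", "accessed", "age":
	default:
		return "", fmt.Errorf("unsupported sort %q", q.Sort)
	}
	raw, err := url.JoinPath(endpoint, "api", "v2", "remotes", remote, "objects")
	if err != nil {
		return "", err
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	params := url.Values{}
	if q.Pattern != "" {
		params.Set("q", q.Pattern)
	}
	if q.Sort != "" {
		params.Set("sort", q.Sort)
	}
	if q.Page > 0 {
		params.Set("page", strconv.Itoa(q.Page))
	}
	if q.PerPage > 0 {
		params.Set("per_page", strconv.Itoa(q.PerPage))
	}
	u.RawQuery = params.Encode() // Encode sorts keys and escapes values
	return u.String(), nil
}

func main() {
	u, err := ObjectsURL("https://artifactapi.example.com", "dockerhub",
		ObjectQuery{Pattern: "nginx", Sort: "size", Page: 1, PerPage: 50})
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```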
---
## Proxy Engine
### Request Flow
```
Client Request
    │
    ▼
Classify (immutable/mutable/denied)
    ├── blocklist match → 403
    ├── allowlist non-empty + no match → 403
    │
    ▼
Check Redis TTL key
    ├── exists (fresh) → serve from S3, log access
    ├── missing (expired or uncached)
    │     │
    │     ▼
    │   Acquire fetch lock (Redis SETNX, 30s TTL)
    │     │
    │     ├── lock acquired
    │     │     ├── mutable + check_mutable + have ETag → HEAD upstream
    │     │     │     ├── 304 → refresh TTL, serve from S3
    │     │     │     └── changed → full fetch
    │     │     └── full fetch from upstream
    │     │           → provider.RewriteResponse() if needed
    │     │           → CAS store (hash → check blobs → upload if new)
    │     │           → upsert artifact in Postgres
    │     │           → set Redis TTL + release lock
    │     │           → on upstream error + stale_on_error → refresh TTL, serve stale
    │     │
    │     └── lock not acquired → poll S3 briefly, serve if another pod fetched it
    │
    ▼
Stream response from S3, log access
```
### Circuit Breaker
Per-remote, tracked in Redis. Closed → Open (after N failures) → Half-open (after cooldown). Exposed via `GET /api/v2/remotes/{name}` health field.
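
The state transitions can be sketched in memory. This is only a model of the state machine: the plan keeps the failure count in Redis, whereas the hypothetical `Breaker` type below keeps it in a struct and is not safe for concurrent use. Re-opening after a failed trial request in the half-open state is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// State is the breaker state exposed in the health field.
type State string

const (
	Closed   State = "closed"
	Open     State = "open"
	HalfOpen State = "half-open"
)

// Breaker is a hypothetical in-memory model of one remote's circuit breaker.
type Breaker struct {
	Threshold int           // consecutive failures before opening (the plan's N)
	Cooldown  time.Duration // time spent open before a trial request is allowed

	failures int
	openedAt time.Time
	tripped  bool
}

// State derives the current state from the failure history and the clock.
func (b *Breaker) State(now time.Time) State {
	switch {
	case !b.tripped:
		return Closed
	case now.Sub(b.openedAt) >= b.Cooldown:
		return HalfOpen
	default:
		return Open
	}
}

// Allow reports whether an upstream request may be attempted.
func (b *Breaker) Allow(now time.Time) bool { return b.State(now) != Open }

// RecordSuccess closes the breaker and clears the failure count.
func (b *Breaker) RecordSuccess() { b.failures, b.tripped = 0, false }

// RecordFailure counts a failure. It opens the breaker at the threshold and
// re-opens it (restarting the cooldown) if a half-open trial fails.
func (b *Breaker) RecordFailure(now time.Time) {
	b.failures++
	if b.tripped || b.failures >= b.Threshold {
		b.tripped, b.openedAt = true, now
	}
}

// epoch is a fixed start time so the demo does not depend on the wall clock.
var epoch = time.Unix(0, 0)

func main() {
	b := &Breaker{Threshold: 2, Cooldown: 30 * time.Second}
	b.RecordFailure(epoch)
	b.RecordFailure(epoch)
	fmt.Println(b.State(epoch))                       // open
	fmt.Println(b.State(epoch.Add(30 * time.Second))) // half-open
	b.RecordSuccess()
	fmt.Println(b.State(epoch.Add(30 * time.Second))) // closed
}
```

Passing `now` explicitly keeps the transitions testable without sleeping.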
### Content-Addressable Storage
1. Stream upstream → temp file, compute SHA256 inline
2. Check `blobs` table for hash
3. Exists → skip S3 upload, upsert `artifacts` row only
4. New → upload to `blobs/sha256/{hash}`, insert both rows
### Garbage Collection
Background goroutine (configurable interval, default 1h):
1. Orphaned blobs: delete S3 objects whose `content_hash` has no referencing `artifacts` or `local_files` rows
2. Cold artifacts: optional per-remote, delete artifacts not accessed in N days
3. Remote deletion: `ON DELETE CASCADE` handles Postgres; GC sweeps orphaned blobs
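
Orphan detection (step 1) reduces to a set difference. The sketch below shows that logic on plain slices; `OrphanedBlobs` is a hypothetical helper, and a real implementation would more likely express the same thing as a SQL query over `blobs`, `artifacts` and `local_files`.

```go
package main

import (
	"fmt"
	"sort"
)

// OrphanedBlobs returns, in sorted order, the blob hashes that are referenced
// by neither the artifacts rows nor the local_files rows.
func OrphanedBlobs(blobs, artifactRefs, localFileRefs []string) []string {
	referenced := make(map[string]struct{}, len(artifactRefs)+len(localFileRefs))
	for _, h := range artifactRefs {
		referenced[h] = struct{}{}
	}
	for _, h := range localFileRefs {
		referenced[h] = struct{}{}
	}
	var orphans []string
	for _, h := range blobs {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	blobs := []string{"hash-c", "hash-a", "hash-b"}
	artifacts := []string{"hash-a"}  // still referenced by a cached artifact
	localFiles := []string{"hash-b"} // still referenced by a local upload
	fmt.Println(OrphanedBlobs(blobs, artifacts, localFiles))
}
```

Because blobs are shared across remotes, a blob only becomes collectable once its last reference disappears, which is why deleting a remote does not delete S3 objects directly.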
---
## Package Providers
### Provider Interface
```go
type Provider interface {
	Type() models.PackageType
	BuiltinMutablePatterns() []*regexp.Regexp
	BuiltinImmutablePatterns() []*regexp.Regexp
	ContentType(path string) string
	UpstreamURL(remote models.Remote, path string) string
	RewriteResponse(body []byte, remote models.Remote, proxyBaseURL string) ([]byte, error)
	AuthHeaders(ctx context.Context, remote models.Remote) (http.Header, error)
}

type IndexMerger interface {
	MergeIndexes(members []MemberIndex, proxyBaseURL string) ([]byte, error)
}
```
### Provider Registry
```go
var registry = map[models.PackageType]Provider{
	models.PackageGeneric:   &generic.Provider{},
	models.PackageDocker:    &docker.Provider{},
	models.PackageHelm:      &helm.Provider{},
	models.PackagePyPI:      &pypi.Provider{},
	models.PackageNPM:       &npm.Provider{},
	models.PackageRPM:       &rpm.Provider{},
	models.PackageAlpine:    &alpine.Provider{},
	models.PackagePuppet:    &puppet.Provider{},
	models.PackageTerraform: &terraform.Provider{},
	models.PackageGoProxy:   &goproxy.Provider{},
}

func Get(t models.PackageType) (Provider, error) { ... }
```
Each provider lives in its own subpackage under `internal/provider/` with its own `_test.go`.
---
## Testing Strategy
### Unit Tests
Every package gets `_test.go` files alongside the source. Run with `go test ./...`.
| Package | What's Tested |
|---|---|
| `internal/provider/docker/` | Auth token parsing/caching, manifest classification, tag banning, URL construction, blob key generation |
| `internal/provider/helm/` | `index.yaml` parsing (using `helm.sh/helm/v3/pkg/repo`), URL rewriting, index merging |
| `internal/provider/pypi/` | Simple index HTML parsing, URL rewriting, index merging |
| `internal/provider/npm/` | Metadata JSON rewriting (`dist.tarball` URLs) |
| `internal/provider/terraform/` | Registry URL construction, download info JSON rewriting, `releases_remote` URL extraction |
| `internal/provider/rpm/` | Mutable pattern matching (repodata) |
| `internal/provider/alpine/` | Mutable pattern matching (APKINDEX) |
| `internal/provider/puppet/` | `file_uri` JSON rewriting |
| `internal/proxy/` | Classifier (immutable vs mutable vs denied), circuit breaker state transitions, revalidator logic |
| `internal/storage/` | CAS key generation, dedup detection, S3 operation mocking |
| `internal/cache/` | Redis TTL set/check, fetch lock acquire/release/contention |
| `internal/gc/` | Orphan detection queries, cold artifact selection |
| `pkg/models/` | Model validation, PackageType enum |
| `pkg/client/` | API client request/response serialization |
### End-to-End Tests
Located in `e2e/`. Use `testcontainers-go` to spin up real Postgres, Redis, and MinIO containers. The test binary starts the actual `artifactapi` server against these backends.
```go
// e2e/e2e_test.go
func TestMain(m *testing.M) {
	// Start postgres, redis, minio via testcontainers-go
	// Run migrations
	// Start artifactapi server on random port
	code := m.Run() // run tests; without this call no test in the package executes
	// Tear down
	os.Exit(code)
}
```
| Test File | What's Tested |
|---|---|
| `e2e/proxy_test.go` | Proxy a real GitHub release through generic remote, verify S3 storage, verify Redis TTL, verify Postgres artifact row, verify cache hit on second request |
| `e2e/docker_test.go` | Pull a real image manifest + blob through Docker v2 proxy, verify blob deduplication, tag banning |
| `e2e/management_test.go` | Full CRUD lifecycle: create remote via v2 API, proxy through it, list objects, evict object, flush cache, delete remote |
| `e2e/virtual_test.go` | Create two helm remotes + virtual, fetch merged index, verify priority ordering |
| `e2e/terraform_test.go` | Proxy terraform provider version listing + download info, verify URL rewriting to releases_remote |
| `e2e/goproxy_test.go` | Proxy Go module `@v/list`, `.info`, `.mod`, `.zip` through GOPROXY remote, verify mutable vs immutable classification |
| `e2e/gc_test.go` | Create artifact, delete remote, trigger GC, verify S3 blob cleaned up |
### Code Quality
- `gofmt` / `goimports` — enforced in CI, run on save
- `golangci-lint` — comprehensive linter suite (staticcheck, errcheck, govet, etc.)
- `go vet ./...` — run in CI
- Makefile targets: `make test`, `make lint`, `make e2e`, `make fmt`
---
## Terraform Provider (Separate Repo)
**Repo**: `terraform-provider-artifactapi`
**Uses**: `pkg/client/` and `pkg/models/` from the main module
```hcl
provider "artifactapi" {
  endpoint = "https://artifactapi.k8s.syd1.au.unkin.net"
}

resource "artifactapi_remote" "terraform_registry" {
  name            = "terraform-registry"
  package_type    = "terraform"
  base_url        = "https://registry.terraform.io"
  description     = "Terraform provider registry"
  releases_remote = artifactapi_remote.hashicorp_releases.name

  immutable_patterns = [
    "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 300
  }
}

resource "artifactapi_remote" "hashicorp_releases" {
  name         = "hashicorp-releases"
  package_type = "generic"
  base_url     = "https://releases.hashicorp.com"

  immutable_patterns = [
    ".*\\.zip$",
    ".*SHA256SUMS(\\.sig)?$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 0
  }
}

resource "artifactapi_virtual" "helm" {
  name         = "helm"
  package_type = "helm"
  description  = "All helm repos merged"
  members = [
    artifactapi_remote.jetstack.name,
    artifactapi_remote.hashicorp_helm.name,
  ]
}
```
---
## Web UI (React + Vite — Separate Container)
### Deployment
Separate `Dockerfile.ui`: multi-stage build (node → nginx). Served as its own container/pod. nginx proxies `/api/*` to the Go backend.
### Pages
| Route | Content |
|---|---|
| `/` | Dashboard: total objects, storage used, dedup savings, bandwidth saved, top remotes chart, live SSE event feed, health indicators |
| `/remotes` | Remote table: name, type, description, object count, size, hit rate, health. Filter by type, sort any column |
| `/remotes/:name` | Config (read-only, "Managed by Terraform" badge), stats, object browser with search/sort/evict, flush actions |
| `/virtuals` | Virtual table: name, type, members, merged object count |
| `/virtuals/:name` | Member list with individual stats |
All config is read-only — managed by Terraform.
---
## TUI (Bubble Tea — Subcommand)
`artifactapi tui --endpoint http://localhost:8000` or via `ARTIFACTAPI_ENDPOINT` env.
Uses `pkg/client/` for all API calls (same client as Terraform provider).
| View | Content and key bindings |
|---|---|
| Dashboard | summary stats, top remotes |
| Remotes list | `j`/`k` navigate, `/` filter, `Enter` detail |
| Remote detail | config + stats, `Enter` → object browser |
| Object browser | `/` search, `d` evict, `f` flush |
| Virtuals | `j`/`k`, `Enter` detail |
---
## Improvements Over v2
| Area | v2 (Python) | v3 (Go) |
|---|---|---|
| S3 paths | Hashed, opaque | Content-addressed CAS |
| Config | YAML files, mtime reload | Terraform via API |
| Package types | 8 types | 10 types (+ terraform, goproxy) |
| Virtual repos | Helm only | Helm + PyPI, extensible |
| Deduplication | Docker blobs only | All types via CAS |
| Revalidation | Opt-in flag | Default for all mutable |
| Access logging | None | Per-artifact in Postgres |
| GC | None | Background goroutine |
| Upstream health | Per-request | Circuit breaker |
| S3 backends | MinIO only | MinIO, Ceph, AWS (minio-go) |
| UI | None | Web dashboard + TUI |
| Binary | Python + venv | Static Go binary |
| Frontend | N/A | Separate container (React) |
| Testing | Mocked unit tests | Unit + e2e with real backends |
---
## Implementation Phases
### Phase 1: Core Engine + Models
- Go module, Makefile (`make build test lint fmt e2e`), Dockerfile, docker-compose
- `pkg/models/` — all domain types
- PostgreSQL schema + migrations
- S3 storage layer with CAS (`minio-go/v7`)
- Redis cache layer (TTL, locks)
- Proxy engine: fetch-or-cache, classifier, revalidator
- Generic + Docker providers (most complexity: OCI auth, CAS, tag banning)
- Health + metrics endpoints
- Unit tests for all packages
- **Milestone**: proxy Docker + generic, cache in S3, track in Postgres
### Phase 2: All Providers
- Helm (using `helm.sh/helm/v3/pkg/repo`)
- PyPI (`golang.org/x/net/html`)
- npm (stdlib `encoding/json`)
- RPM (using `rs3.io/go/rpm/repomd`)
- Alpine (using `gitlab.alpinelinux.org/alpine/go`)
- Puppet Forge (stdlib `encoding/json`)
- Terraform (using `hashicorp/terraform-registry-address`)
- Go Modules / GOPROXY (using `github.com/goproxy/goproxy`)
- Unit tests per provider
- **Milestone**: feature parity with v2 + goproxy
### Phase 3: Management API + Virtual Repos + GC
- `pkg/client/` — shared Go API client
- v2 CRUD endpoints
- Virtual repo engine: `IndexMerger` for Helm + PyPI
- Circuit breaker
- Access logging middleware
- GC goroutine
- **Milestone**: full API, virtuals, GC
### Phase 4: End-to-End Tests
- `e2e/` test suite with `testcontainers-go`
- Proxy, Docker, management, virtual, terraform, GC tests
- CI pipeline: `make e2e`
- **Milestone**: comprehensive e2e coverage
### Phase 5: Terraform Provider
- Separate repo: `terraform-provider-artifactapi`
- Imports `pkg/client/` and `pkg/models/`
- `artifactapi_remote` + `artifactapi_virtual` resources + data sources
- Import support
- **Milestone**: manage all config via Terraform
### Phase 6: Web UI
- React + Vite in `ui/`
- `Dockerfile.ui` (multi-stage → nginx)
- Dashboard, remotes, objects, virtuals pages
- SSE event feed
- **Milestone**: full web UI in separate container
### Phase 7: TUI
- Bubble Tea in `internal/tui/`
- Uses `pkg/client/`
- Dashboard, remotes, objects, virtuals views
- **Milestone**: TUI feature parity with web UI
### Phase 8: Migration + Cutover
- Migration tool: v2 YAML → Terraform HCL + `terraform import` commands
- S3 rehash script: `{remote}/{hash16}/{file}` → `blobs/sha256/{content_hash}`
- Parallel run, response comparison
- Cutover
---
## Makefile Targets
```makefile
.PHONY: build test lint fmt e2e docker docker-ui compose

build: ## Build Go binary
	go build -o bin/artifactapi ./cmd/artifactapi

test: ## Run unit tests
	go test ./...

lint: ## Run golangci-lint + go vet
	golangci-lint run ./...
	go vet ./...

fmt: ## Format code (gofmt + goimports)
	gofmt -w .
	goimports -w .

e2e: ## Run end-to-end tests (requires Docker)
	go test -tags=e2e -count=1 -timeout=5m ./e2e/...

docker: ## Build API server Docker image
	docker build -t artifactapi .

docker-ui: ## Build frontend Docker image
	docker build -t artifactapi-ui -f ui/Dockerfile.ui ui/

compose: ## Start full stack (API + UI + Postgres + Redis + MinIO)
	docker compose up -d
```