Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy, each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with per-remote allowlist/blocklist (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats, health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, `artifactapi tui --endpoint <url>`

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)
ArtifactAPI v3 — Go Rewrite Plan
Context
ArtifactAPI is a production artifact proxy/cache serving ~42 remotes (Docker registries, Helm repos, RPM/Alpine repos, GitHub releases, PyPI, npm, Puppet Forge, Terraform registries, Go module proxies) across a Kubernetes cluster. The current Python (FastAPI) implementation works but carries architectural debt: opaque hashed S3 paths, no UI for visibility, YAML config files that drift, no garbage collection, no access logging, and virtual repos that support only Helm.
The v3 rewrite targets: a single Go binary (API + TUI), a separate React frontend (own Dockerfile), a Terraform provider (separate repo), content-addressable storage, and a cleaner data model that makes the cache inspectable and manageable.
Repo: Same repo (git.unkin.net/unkin/artifactapi), new branch.
Module: git.unkin.net/unkin/artifactapi
Frontend: React + Vite, separate Dockerfile, talks to API
Terraform provider: Separate repo (terraform-provider-artifactapi)
Architecture Overview
┌───────────────────────────────────┐ ┌──────────────────────┐
│ Go Binary (API + TUI) │ │ Frontend Container │
│ │ │ │
│ ┌──────────┐ ┌───────────────┐ │ │ React + Vite SPA │
│ │ REST API │ │ Proxy Engine │ │◄───│ nginx / node serve │
│ │ /api/v2 │ │ /api/v1/... │ │ │ Dockerfile.ui │
│ │ │ │ /v2/... (OCI) │ │ └──────────────────────┘
│ └────┬─────┘ └──────┬────────┘ │
│ │ │ │ ┌──────────────────────┐
│ ┌────┴───────────────┴────────┐ │ │ Terraform Provider │
│ │ Data Layer │ │◄───│ (separate repo) │
│ │ PostgreSQL · Redis · S3 │ │ └──────────────────────┘
│ └─────────────────────────────┘ │
│ │ ┌──────────────────────┐
│ ┌─────────────────────────────┐ │ │ TUI (subcommand) │
│ │ artifactapi tui │──│───►│ artifactapi tui │
│ └─────────────────────────────┘ │ │ --endpoint <url> │
└───────────────────────────────────┘ └──────────────────────┘
Three independent deployment units:
- Go binary: API server + TUI subcommand (single `Dockerfile`)
- React frontend: SPA served by nginx (`Dockerfile.ui`), talks to `/api/v2`
- Terraform provider: separate repo, calls the `/api/v2` CRUD endpoints
Project Structure (Modular)
artifactapi/
├── cmd/
│ └── artifactapi/
│ └── main.go # entrypoint: serve / tui subcommands
│
├── pkg/ # PUBLIC — importable by terraform provider, CLI tools
│ ├── models/ # shared domain types
│ │ ├── remote.go # Remote, RemoteConfig, PackageType enum
│ │ ├── virtual.go # Virtual, VirtualConfig
│ │ ├── artifact.go # Artifact, Blob, AccessLogEntry
│ │ ├── local.go # LocalFile, LocalRepo
│ │ └── stats.go # RemoteStats, OverviewStats
│ └── client/ # typed Go API client (used by TUI + Terraform provider)
│ ├── client.go # Client struct, base HTTP
│ ├── remotes.go # remote CRUD methods
│ ├── virtuals.go # virtual CRUD methods
│ ├── objects.go # object browse/evict methods
│ └── stats.go # stats methods
│
├── internal/ # PRIVATE — server internals
│ ├── server/
│ │ ├── server.go # HTTP server setup, router
│ │ └── middleware.go # logging, recovery, request-id, access logging
│ │
│ ├── api/
│ │ ├── v1/ # proxy endpoints (v1 compat)
│ │ │ ├── proxy.go # GET /api/v1/remote/{name}/{path}
│ │ │ ├── docker.go # /v2/{name}/{path}
│ │ │ ├── virtual.go # GET /api/v1/virtual/{name}/{path}
│ │ │ └── local.go # CRUD /api/v1/local/{name}/{path}
│ │ └── v2/ # management API
│ │ ├── remotes.go # CRUD + stats
│ │ ├── virtuals.go # CRUD
│ │ ├── objects.go # browse/evict cached objects
│ │ ├── stats.go # overview, top-remotes
│ │ ├── events.go # SSE stream
│ │ └── health.go # health, metrics
│ │
│ ├── provider/ # package-type providers (registry protocol handlers)
│ │ ├── provider.go # Provider interface + registry
│ │ ├── generic/
│ │ │ ├── generic.go
│ │ │ └── generic_test.go
│ │ ├── docker/
│ │ │ ├── docker.go # OCI Distribution v2 via go-containerregistry
│ │ │ ├── auth.go # Bearer token fetch + cache
│ │ │ └── docker_test.go
│ │ ├── helm/
│ │ │ ├── helm.go # index rewriting via helm.sh/helm/v3/pkg/repo
│ │ │ ├── merger.go # virtual index merge
│ │ │ └── helm_test.go
│ │ ├── pypi/
│ │ │ ├── pypi.go # simple index HTML rewriting
│ │ │ ├── merger.go # virtual simple index merge
│ │ │ └── pypi_test.go
│ │ ├── npm/
│ │ │ ├── npm.go # metadata JSON rewriting
│ │ │ └── npm_test.go
│ │ ├── rpm/
│ │ │ ├── rpm.go # repodata patterns
│ │ │ └── rpm_test.go
│ │ ├── alpine/
│ │ │ ├── alpine.go # APKINDEX patterns
│ │ │ └── alpine_test.go
│ │ ├── puppet/
│ │ │ ├── puppet.go # file_uri JSON rewriting
│ │ │ └── puppet_test.go
│ │ ├── terraform/
│ │ │ ├── terraform.go # registry protocol, download URL rewriting
│ │ │ └── terraform_test.go
│ │ └── goproxy/
│ │ ├── goproxy.go # Go module proxy protocol (GOPROXY)
│ │ └── goproxy_test.go
│ │
│ ├── proxy/
│ │ ├── engine.go # core fetch-or-cache logic
│ │ ├── engine_test.go
│ │ ├── classifier.go # immutable vs mutable classification
│ │ ├── classifier_test.go
│ │ ├── revalidator.go # conditional HEAD requests (ETag/Last-Modified)
│ │ └── circuit.go # per-remote circuit breaker
│ │
│ ├── storage/
│ │ ├── s3.go # S3 client (minio-go — works with MinIO, Ceph, AWS)
│ │ ├── s3_test.go
│ │ ├── cas.go # content-addressable store logic
│ │ └── cas_test.go
│ │
│ ├── cache/
│ │ ├── redis.go # TTL management, fetch locks
│ │ ├── redis_test.go
│ │ └── lock.go # distributed lock abstraction
│ │
│ ├── database/
│ │ ├── postgres.go # connection pool, migration runner
│ │ ├── queries/ # SQL query files or sqlc-generated code
│ │ │ ├── remotes.sql.go
│ │ │ ├── virtuals.sql.go
│ │ │ ├── artifacts.sql.go
│ │ │ └── access_log.sql.go
│ │ └── migrations/ # golang-migrate SQL files
│ │ ├── 001_initial.up.sql
│ │ └── 001_initial.down.sql
│ │
│ ├── metrics/
│ │ └── prometheus.go # counters, gauges, histograms
│ │
│ ├── gc/
│ │ ├── gc.go # background garbage collection goroutine
│ │ └── gc_test.go
│ │
│ ├── tui/
│ │ ├── app.go # Bubble Tea main model
│ │ ├── views/
│ │ │ ├── dashboard.go
│ │ │ ├── remotes.go
│ │ │ ├── objects.go
│ │ │ └── virtuals.go
│ │ └── components/
│ │ ├── table.go
│ │ └── statusbar.go
│ │
│ └── config/
│ └── env.go # environment variable parsing + validation
│
├── ui/ # React frontend — SEPARATE DOCKERFILE
│ ├── src/
│ │ ├── App.tsx
│ │ ├── pages/
│ │ │ ├── Dashboard.tsx
│ │ │ ├── Remotes.tsx
│ │ │ ├── RemoteDetail.tsx
│ │ │ ├── Virtuals.tsx
│ │ │ └── Objects.tsx
│ │ ├── components/
│ │ │ ├── RemoteTable.tsx
│ │ │ ├── ObjectBrowser.tsx
│ │ │ ├── StatsCard.tsx
│ │ │ └── EventFeed.tsx
│ │ └── api/
│ │ └── client.ts # typed API client
│ ├── package.json
│ ├── vite.config.ts
│ ├── tsconfig.json
│ ├── Dockerfile.ui # multi-stage: node build → nginx
│ └── nginx.conf # proxy /api/* to backend, serve SPA
│
├── e2e/ # end-to-end integration tests
│ ├── e2e_test.go # TestMain spins up docker-compose stack
│ ├── proxy_test.go # proxy through real remotes
│ ├── docker_test.go # Docker v2 protocol e2e
│ ├── management_test.go # v2 API CRUD
│ ├── virtual_test.go # virtual repo merge e2e
│ └── docker-compose.e2e.yml # postgres + redis + minio for tests
│
├── go.mod
├── go.sum
├── Makefile
├── Dockerfile # Go binary (API server + TUI)
├── Dockerfile.ui # symlink or copy → ui/Dockerfile.ui
└── docker-compose.yml
Key Modularisation Decisions
- `pkg/models/`: Shared domain types importable by the Terraform provider and any external tooling. No dependencies on internal packages.
- `pkg/client/`: Typed Go API client used by both the TUI and the Terraform provider. Depends only on `pkg/models/` and the stdlib.
- `internal/provider/`: Each package type is its own subpackage with isolated tests. A provider registry maps `PackageType → Provider`.
- `internal/database/queries/`: Use sqlc to generate type-safe query functions from SQL, or hand-written query files.
- `e2e/`: Separate test binary that spins up a real docker-compose stack.
Go Ecosystem Libraries
Prefer existing, maintained Go modules over writing protocol handlers from scratch.
Package-Type Libraries
| Package Type | Go Module | What It Gives Us |
|---|---|---|
| Docker/OCI | `github.com/google/go-containerregistry` | Full Registry v2/OCI client: manifest parsing, auth challenges, blob operations. `pkg/registry` can implement a v2 server. Reference: `github.com/regclient/regclient` |
| Helm | `helm.sh/helm/v3/pkg/repo` | Parse/generate `index.yaml`, `IndexFile`/`ChartVersion` types, URL entries. Used directly for merge |
| Terraform | `github.com/hashicorp/terraform-registry-address` | Provider/module address parsing, `ForRegistryProtocol()` URL generation. Protocol spec: provider registry protocol v1 |
| Go Modules | `github.com/goproxy/goproxy` | Minimalist GOPROXY protocol handler; implements the full spec as an `http.Handler`. Handles `/@v/list`, `/@v/{v}.info`, `/@v/{v}.mod`, `/@v/{v}.zip`, `/@latest` |
| RPM | `rs3.io/go/rpm/repomd` | Parse `repomd.xml` and `primary.xml` with proper XML namespace handling |
| Alpine | `gitlab.alpinelinux.org/alpine/go` | Official Alpine library: parse `APKINDEX` and `.apk` files |
| PyPI | `golang.org/x/net/html` (Go sub-repository, not stdlib) | No dedicated Go PyPI library exists. Parse the simple index HTML with `x/net/html` and extract `<a>` tags. Minimal: the rewriting is just `href` replacement |
| npm | stdlib `encoding/json` | npm metadata is JSON: parse with stdlib, rewrite `dist.tarball` URLs. No special library needed |
| Puppet Forge | stdlib `encoding/json` | Forge API is JSON: parse and rewrite `file_uri` fields. Community lib `github.com/johnmccabe/go-puppetforge` exists but is thin; stdlib suffices |
Infrastructure Libraries
| Purpose | Go Module | Why This One |
|---|---|---|
| HTTP router | `github.com/go-chi/chi/v5` | Lightweight, stdlib `http.Handler` compatible, middleware chain |
| PostgreSQL | `github.com/jackc/pgx/v5` | Pure Go, connection pooling, COPY support, prepared statements |
| SQL generation | `github.com/sqlc-dev/sqlc` | Generates type-safe Go from SQL queries: no ORM, no reflection |
| Redis | `github.com/redis/go-redis/v9` | Full Redis client, pipelining, pub/sub |
| S3 (MinIO/Ceph/AWS) | `github.com/minio/minio-go/v7` | Native S3-compatible client. Works out of the box with MinIO, Ceph RGW, AWS S3 and any other S3-compatible backend. Lighter than `aws-sdk-go-v2`, purpose-built for S3 compatibility |
| DB migrations | `github.com/golang-migrate/migrate/v4` | SQL file-based migrations, CLI + library |
| Prometheus | `github.com/prometheus/client_golang` | Counters, gauges, histograms |
| TUI | `github.com/charmbracelet/bubbletea` | Elm-architecture TUI framework |
| TUI styling | `github.com/charmbracelet/lipgloss` | Terminal styling |
| TUI components | `github.com/charmbracelet/bubbles` | Table, text input, spinner, etc. |
| Structured logging | `log/slog` (stdlib) | Go 1.21+ structured logging, zero dependencies |
| Testing | `github.com/stretchr/testify` | Assertions + `require` for unit tests |
| Test containers | `github.com/testcontainers/testcontainers-go` | Spin up Postgres/Redis/MinIO in e2e tests |
S3 Client: Multi-Backend Support
Using minio-go/v7 as the S3 client because it natively supports:
- MinIO — primary development/production target
- Ceph RGW — S3-compatible via endpoint config
- AWS S3 — via region + credential config
- Any S3-compatible — GCS (interop mode), Wasabi, DigitalOcean Spaces, etc.
No abstraction layer needed — minio-go handles endpoint differences internally. Config:
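// endpoint is host[:port] without a scheme, e.g. "minio:9000"
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
	Secure: useTLS,
	Region: region, // optional, for AWS
})
if err != nil {
	return fmt.Errorf("create S3 client: %w", err)
}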
Data Layer
PostgreSQL Schema
-- Remotes: managed exclusively by Terraform
CREATE TABLE remotes (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL, -- generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy
base_url TEXT NOT NULL,
description TEXT DEFAULT '',
username TEXT DEFAULT '',
password TEXT DEFAULT '',
immutable_ttl INTEGER DEFAULT 0,
mutable_ttl INTEGER DEFAULT 3600,
check_mutable BOOLEAN DEFAULT TRUE,
immutable_patterns TEXT[] DEFAULT '{}', -- user-defined immutable patterns
mutable_patterns TEXT[] DEFAULT '{}', -- user-defined mutable patterns (merged with provider built-ins)
allowlist TEXT[] DEFAULT '{}', -- if empty, allow all paths; if non-empty, only matching paths proxied
blocklist TEXT[] DEFAULT '{}', -- always denied, checked before allowlist
ban_tags_enabled BOOLEAN DEFAULT FALSE,
ban_tags TEXT[] DEFAULT '{}',
quarantine_enabled BOOLEAN DEFAULT FALSE,
quarantine_days INTEGER DEFAULT 3,
stale_on_error BOOLEAN DEFAULT TRUE,
releases_remote TEXT DEFAULT '', -- terraform type: name of CDN remote for download URL rewriting
managed_by TEXT DEFAULT '', -- 'terraform' or empty
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Virtual repositories
CREATE TABLE virtuals (
name TEXT PRIMARY KEY,
package_type TEXT NOT NULL,
description TEXT DEFAULT '',
members TEXT[] NOT NULL,
managed_by TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Content-addressable blob storage tracking
CREATE TABLE blobs (
content_hash TEXT PRIMARY KEY,
s3_key TEXT NOT NULL,
size_bytes BIGINT NOT NULL,
content_type TEXT DEFAULT 'application/octet-stream',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Artifact metadata: maps (remote, path) → content blob
CREATE TABLE artifacts (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL REFERENCES remotes(name) ON DELETE CASCADE,
path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
upstream_etag TEXT DEFAULT '',
upstream_last_modified TIMESTAMPTZ,
first_seen_at TIMESTAMPTZ DEFAULT NOW(),
last_fetched_at TIMESTAMPTZ DEFAULT NOW(),
last_accessed_at TIMESTAMPTZ DEFAULT NOW(),
fetch_count BIGINT DEFAULT 1,
access_count BIGINT DEFAULT 1,
UNIQUE(remote_name, path)
);
CREATE INDEX idx_artifacts_remote ON artifacts(remote_name);
CREATE INDEX idx_artifacts_last_accessed ON artifacts(last_accessed_at);
-- Local file uploads
CREATE TABLE local_files (
id BIGSERIAL PRIMARY KEY,
repo_name TEXT NOT NULL,
file_path TEXT NOT NULL,
content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(repo_name, file_path)
);
-- Access log (append-only, powers dashboards)
CREATE TABLE access_log (
id BIGSERIAL PRIMARY KEY,
remote_name TEXT NOT NULL,
path TEXT NOT NULL,
cache_hit BOOLEAN NOT NULL,
size_bytes BIGINT DEFAULT 0,
upstream_ms INTEGER DEFAULT 0,
client_ip TEXT DEFAULT '',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_access_log_remote_time ON access_log(remote_name, created_at);
Redis Usage (Ephemeral Only)
| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| `ttl:{remote}:{path}` | STRING | Remote's immutable/mutable TTL | Artifact freshness: existence = still fresh |
| `lock:{remote}:{path}` | STRING (NX) | 30s | Fetch lock: prevents thundering herd |
| `etag:{remote}:{path}` | STRING | Same as TTL key | Cached ETag for conditional revalidation |
| `circuit:{remote}` | STRING | Configurable | Circuit breaker: consecutive failure count |
Losing Redis = all TTLs expire = next request re-validates upstream. No data loss.
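The fetch-lock row in the table above can be illustrated with an in-memory stand-in. This is a minimal sketch, not the Redis implementation. The hypothetical `lockStore` mimics `SET key token NX EX 30` (acquire only if no unexpired lock exists). Its release is compare-and-delete, so a pod whose lock has already expired cannot delete a lock that another pod now holds. With go-redis v9, the acquire side would map to `SetNX` with an expiration; doing the compare-and-delete with a small Lua script is an assumption about the eventual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchLockTTL matches the 30s TTL of lock:{remote}:{path}.
const fetchLockTTL = 30 * time.Second

// fakeClock makes expiry deterministic in examples and tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type lockEntry struct {
	token   string
	expires time.Time
}

// lockStore is a hypothetical in-memory stand-in for the Redis lock keys.
type lockStore struct {
	mu    sync.Mutex
	now   func() time.Time
	locks map[string]lockEntry
}

func newLockStore(now func() time.Time) *lockStore {
	return &lockStore{now: now, locks: make(map[string]lockEntry)}
}

func lockKey(remote, path string) string { return "lock:" + remote + ":" + path }

// TryAcquire behaves like SET key token NX EX ttl: it succeeds only when
// no unexpired lock exists for key.
func (s *lockStore) TryAcquire(key, token string, ttl time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := s.now()
	if e, ok := s.locks[key]; ok && now.Before(e.expires) {
		return false
	}
	s.locks[key] = lockEntry{token: token, expires: now.Add(ttl)}
	return true
}

// Release is compare-and-delete: only the current, unexpired holder may
// remove the lock.
func (s *lockStore) Release(key, token string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.locks[key]
	if !ok || e.token != token || !s.now().Before(e.expires) {
		return false
	}
	delete(s.locks, key)
	return true
}

func main() {
	clk := &fakeClock{t: time.Unix(0, 0)}
	s := newLockStore(clk.Now)
	k := lockKey("dockerhub", "library/nginx/manifests/latest")
	fmt.Println(s.TryAcquire(k, "pod-a", fetchLockTTL)) // true
	fmt.Println(s.TryAcquire(k, "pod-b", fetchLockTTL)) // false: pod-a holds it
	clk.Advance(fetchLockTTL)
	fmt.Println(s.Release(k, "pod-a"))                  // false: pod-a's lock expired
	fmt.Println(s.TryAcquire(k, "pod-b", fetchLockTTL)) // true
}
```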
S3 Layout (Content-Addressable)
artifacts-bucket/
├── blobs/sha256/{content_hash} # immutable CAS blobs
├── indexes/{remote}/{path} # mutable index files (helm, pypi, rpm, etc.)
├── indexes/{virtual}/{path} # merged virtual indexes
└── local/{repo}/{path} # user uploads (CAS-backed via blobs table)
Terraform Remote Type (New in v2)
The terraform package type proxies the Terraform Provider Registry Protocol:
- URL construction: prepends `/v1/providers/` to request paths
- Built-in mutable pattern: `[^/]+/[^/]+/versions$` (version listings change over time)
- Built-in immutable pattern: `[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$` (per-version download info is fixed)
- Response rewriting: in the download-info JSON, rewrites `download_url`, `shasums_url` and `shasums_signature_url` to route through a companion `releases_remote` (e.g., the `hashicorp-releases` generic remote)
- Config: requires a `releases_remote` field pointing to the CDN remote that serves the actual binaries
Uses github.com/hashicorp/terraform-registry-address for address parsing and protocol-compliant URL generation.
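The download-info rewrite needs only `encoding/json`. This is a minimal sketch under stated assumptions:

- `rewriteDownloadInfo` is a hypothetical name.
- The proxied URL is assumed to be `{proxyBaseURL}/api/v1/remote/{releases_remote}/{upstream path}`, i.e. the v1 proxy route.
- Query strings on the upstream URLs are dropped.

Decoding into a `map[string]any` lets fields the proxy does not touch (`os`, `arch`, `shasum`, `signing_keys` and so on) pass through unchanged. `json.Marshal` writes the keys back in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// cdnFields are the download-info fields that point at the release CDN.
var cdnFields = []string{"download_url", "shasums_url", "shasums_signature_url"}

// rewriteDownloadInfo re-points the CDN URLs in a provider download-info
// response at the companion releases_remote. Other fields pass through.
func rewriteDownloadInfo(body []byte, proxyBaseURL, releasesRemote string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("decode download info: %w", err)
	}
	base := strings.TrimRight(proxyBaseURL, "/") + "/api/v1/remote/" + releasesRemote
	for _, field := range cdnFields {
		raw, ok := doc[field].(string)
		if !ok || raw == "" {
			continue
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", field, err)
		}
		// Keep only the upstream path; the releases_remote's base_url
		// supplies scheme and host when the proxy fetches the file.
		doc[field] = base + u.EscapedPath()
	}
	return json.Marshal(doc)
}

func main() {
	in := []byte(`{"os":"linux","arch":"amd64","download_url":"https://releases.hashicorp.com/terraform-provider-null/3.2.2/terraform-provider-null_3.2.2_linux_amd64.zip"}`)
	out, err := rewriteDownloadInfo(in, "https://artifactapi.example.com", "hashicorp-releases")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```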
Go Module Proxy Remote Type (New)
The goproxy package type implements the GOPROXY protocol (Go module proxy):
| Endpoint | Mutability | Description |
|---|---|---|
| `{module}/@v/list` | Mutable | Plain-text list of known versions |
| `{module}/@latest` | Mutable | JSON metadata for the latest version |
| `{module}/@v/{version}.info` | Immutable | JSON version metadata (`Version`, `Time`) |
| `{module}/@v/{version}.mod` | Immutable | `go.mod` file for that version |
| `{module}/@v/{version}.zip` | Immutable | Source archive for that version |
- No URL rewriting needed — responses are self-contained (no embedded URLs)
- Config: `base_url` points to the upstream proxy (e.g., `https://proxy.golang.org`)
- Client usage: set `GOPROXY=https://artifactapi.example.com/api/v1/remote/goproxy`
- Uses `github.com/goproxy/goproxy` for protocol handling
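Each GOPROXY endpoint is identified by its path suffix, so classification needs no regular expressions. This is a minimal sketch with hypothetical names (`parseGoproxyPath`, `isMutable`). It splits a request path into module, endpoint and version following the table above. It assumes module paths arrive already case-encoded by the `go` command and does not decode them. It uses `strings.CutSuffix`, which requires Go 1.20+.

```go
package main

import (
	"fmt"
	"strings"
)

type goproxyEndpoint int

const (
	endpointUnknown goproxyEndpoint = iota
	endpointList                    // {module}/@v/list
	endpointLatest                  // {module}/@latest
	endpointInfo                    // {module}/@v/{version}.info
	endpointMod                     // {module}/@v/{version}.mod
	endpointZip                     // {module}/@v/{version}.zip
)

var versionSuffixes = []struct {
	suffix string
	ep     goproxyEndpoint
}{{".info", endpointInfo}, {".mod", endpointMod}, {".zip", endpointZip}}

// parseGoproxyPath splits a GOPROXY request path into module, endpoint and version.
func parseGoproxyPath(p string) (module string, ep goproxyEndpoint, version string) {
	p = strings.TrimPrefix(p, "/")
	if m, ok := strings.CutSuffix(p, "/@latest"); ok {
		return m, endpointLatest, ""
	}
	m, rest, ok := strings.Cut(p, "/@v/")
	if !ok {
		return "", endpointUnknown, ""
	}
	if rest == "list" {
		return m, endpointList, ""
	}
	for _, s := range versionSuffixes {
		if v, ok := strings.CutSuffix(rest, s.suffix); ok && v != "" {
			return m, s.ep, v
		}
	}
	return "", endpointUnknown, ""
}

// isMutable mirrors the table: only the version list and @latest change.
func isMutable(ep goproxyEndpoint) bool {
	return ep == endpointList || ep == endpointLatest
}

func main() {
	for _, p := range []string{
		"golang.org/x/mod/@v/list",
		"golang.org/x/mod/@latest",
		"golang.org/x/mod/@v/v0.17.0.zip",
	} {
		m, ep, v := parseGoproxyPath(p)
		fmt.Printf("module=%s version=%q mutable=%v\n", m, v, isMutable(ep))
	}
}
```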
Allowlist / Blocklist / Automatic Mutable Patterns
Access Control (Per-Remote)
| Field | Default | Behavior |
|---|---|---|
| `blocklist` | `[]` (empty) | If a path matches any blocklist pattern → 403 Forbidden. Checked first |
| `allowlist` | `[]` (empty) | If empty → allow everything. If non-empty → only matching paths are proxied; everything else → 403 |
Evaluation order: blocklist → allowlist → proxy. No allowlist + no blocklist = open proxy (default).
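The evaluation order fits in a single function. This is a minimal sketch, assuming patterns use Go's RE2 syntax and match anywhere in the path unless anchored with `^`/`$` (the semantics of `MatchString`). `allowed` and `compileAll` are hypothetical names. Compiling patterns when a remote is created or updated, rather than per request, would let the v2 API reject an invalid pattern before it reaches the proxy path. An example is a PCRE-style lookahead, which RE2 does not support.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAll compiles a remote's blocklist or allowlist (TEXT[] in Postgres).
func compileAll(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// allowed applies blocklist → allowlist → proxy. A false result maps to 403.
func allowed(path string, blocklist, allowlist []*regexp.Regexp) bool {
	for _, re := range blocklist {
		if re.MatchString(path) {
			return false // blocklist is checked first and always wins
		}
	}
	if len(allowlist) == 0 {
		return true // empty allowlist: open proxy
	}
	for _, re := range allowlist {
		if re.MatchString(path) {
			return true
		}
	}
	return false // non-empty allowlist and no match
}

func main() {
	block, err := compileAll([]string{`\.exe$`})
	if err != nil {
		panic(err)
	}
	allow, err := compileAll([]string{`^hashicorp/`})
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"hashicorp/terraform.zip", "hashicorp/setup.exe", "other/file.zip"} {
		fmt.Println(p, allowed(p, block, allow))
	}
}
```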
Automatic Mutable Patterns (Per-Provider Built-ins)
Each provider declares built-in mutable patterns that are always merged with user-defined mutable_patterns. Users never need to configure these — the provider knows which paths change over time.
| Provider | Built-in Mutable Patterns | Rationale |
|---|---|---|
| generic | (none) | No convention for what's mutable |
| docker | `/manifests/[^/]+$`, `/tags/list$` (digest manifests are carved out by a built-in immutable pattern such as `/manifests/sha256:[0-9a-f]{64}$`, because Go's RE2 `regexp` does not support the `(?!sha256:)` lookahead) | Tag manifests change; digest manifests don't |
| helm | `index\.yaml$` | Chart index changes when new charts are published |
| pypi | `simple/` | Package index pages change with new releases |
| npm | `^[^/]+$` (package metadata, not `.tgz`) | Package metadata changes; tarballs are immutable |
| rpm | `repomd\.xml$`, `repodata/.*`, `Packages\.gz$` | Repo metadata rebuilt on every publish |
| alpine | `APKINDEX\.tar\.gz$` | Package index rebuilt on every publish |
| puppet | `^v3/modules/`, `^v3/releases` | Module metadata changes with new releases |
| terraform | `[^/]+/[^/]+/versions$` | Provider version listings grow over time |
| goproxy | `@v/list$`, `@latest$` | Version list and latest pointer change |
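These are returned by `Provider.BuiltinMutablePatterns()` and merged at classification time. Built-in immutable patterns from `Provider.BuiltinImmutablePatterns()` (for example the terraform download-info pattern) are merged with `remote.immutable_patterns` in the same way:
effective_mutable = provider.BuiltinMutablePatterns() ∪ remote.mutable_patterns
effective_immutable = provider.BuiltinImmutablePatterns() ∪ remote.immutable_patterns
If a path matches effective_immutable → use immutable_ttl. If it matches effective_mutable → use mutable_ttl. Immutable patterns take precedence over mutable when both match.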
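This is a minimal sketch of the precedence rule, using hypothetical names (`patternSet`, `classify`, `dockerPatterns`). Two points are assumptions rather than statements from the plan:

- Paths that match neither set are treated as immutable. The built-in table implies this, since Docker blobs, Helm chart archives and npm tarballs match no mutable pattern.
- The Docker digest carve-out uses an immutable pattern, because Go's RE2 engine rejects the `(?!sha256:)` lookahead at compile time.

```go
package main

import (
	"fmt"
	"regexp"
)

type class int

const (
	classImmutable class = iota // use immutable_ttl
	classMutable                // use mutable_ttl
)

// patternSet holds a remote's effective patterns: provider built-ins
// unioned with the remote's own immutable_patterns / mutable_patterns.
type patternSet struct {
	immutable []*regexp.Regexp
	mutable   []*regexp.Regexp
}

func union(a, b []*regexp.Regexp) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(a)+len(b))
	return append(append(out, a...), b...)
}

func matchesAny(res []*regexp.Regexp, path string) bool {
	for _, re := range res {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// classify: immutable wins over mutable; unmatched paths default to
// immutable (an assumption).
func (s patternSet) classify(path string) class {
	if matchesAny(s.immutable, path) {
		return classImmutable
	}
	if matchesAny(s.mutable, path) {
		return classMutable
	}
	return classImmutable
}

// dockerPatterns is a hypothetical docker built-in set without lookahead.
func dockerPatterns() patternSet {
	return patternSet{
		immutable: []*regexp.Regexp{regexp.MustCompile(`/manifests/sha256:[0-9a-f]{64}$`)},
		mutable: []*regexp.Regexp{
			regexp.MustCompile(`/manifests/[^/]+$`),
			regexp.MustCompile(`/tags/list$`),
		},
	}
}

func main() {
	s := dockerPatterns()
	// Merge a remote's own mutable_patterns (hypothetical value) with the built-ins.
	s.mutable = union(s.mutable, []*regexp.Regexp{regexp.MustCompile(`/_catalog$`)})
	for _, p := range []string{
		"library/nginx/manifests/latest",
		"library/nginx/blobs/sha256:0123",
	} {
		fmt.Println(p, "mutable:", s.classify(p) == classMutable)
	}
}
```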
API Design
v1 Proxy Endpoints (Backwards Compatible)
| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/remote/{name}/{path}` | Proxy/cache artifact |
| GET | `/api/v1/virtual/{name}/{path}` | Virtual repo proxy |
| GET/HEAD | `/v2/{name}/{path}` | Docker Registry v2 |
| GET | `/v2/` | Docker v2 ping |
| GET/PUT/HEAD/DELETE | `/api/v1/local/{name}/{path}` | Local repo CRUD |
v2 Management API (New)
GET /api/v2/remotes → [{name, package_type, base_url, description, stats}]
GET /api/v2/remotes/{name} → {full config + stats + health}
POST /api/v2/remotes → create remote (Terraform provider)
PUT /api/v2/remotes/{name} → update remote (Terraform provider)
DELETE /api/v2/remotes/{name} → delete remote — cascades artifacts, GC cleans S3
GET /api/v2/virtuals → [{name, package_type, members, stats}]
GET /api/v2/virtuals/{name} → {full config + member details}
POST /api/v2/virtuals → create virtual
PUT /api/v2/virtuals/{name} → update virtual
DELETE /api/v2/virtuals/{name} → delete virtual
GET /api/v2/remotes/{name}/objects → paginated objects
?q=pattern&sort=size|accessed|age&page=1&per_page=50
DELETE /api/v2/remotes/{name}/objects/{path} → evict specific cached object
DELETE /api/v2/remotes/{name}/cache → flush cache
?type=all|indexes|blobs
GET /api/v2/stats → overview stats
GET /api/v2/stats/top-remotes → top remotes by size/requests/hit-rate
GET /api/v2/health → {status, postgres, redis, s3, uptime}
GET /metrics → Prometheus format
GET /api/v2/events → SSE stream
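`GET /api/v2/events` can be served with the standard library alone. This is a minimal sketch, assuming each event carries a type name and a pre-encoded, single-line JSON payload. SSE frames end at a blank line, so a payload containing raw newlines would have to be split across several `data:` lines. `event`, `sseHandler` and `runOnce` are hypothetical names. How events are produced (for example, from access logging) is not shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// event is a hypothetical SSE payload: Type becomes the "event:" field,
// Data (single-line JSON) the "data:" field.
type event struct {
	Type string
	Data string
}

// sseHandler streams events from ch until the client goes away or ch closes.
func sseHandler(ch <-chan event) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return // client disconnected
			case ev, open := <-ch:
				if !open {
					return
				}
				fmt.Fprintf(w, "event: %s\ndata: %s\n\n", ev.Type, ev.Data)
				flusher.Flush() // push each event immediately
			}
		}
	}
}

// runOnce replays evs through the handler against an in-memory recorder.
func runOnce(evs ...event) (contentType, body string) {
	ch := make(chan event, len(evs))
	for _, ev := range evs {
		ch <- ev
	}
	close(ch)
	rec := httptest.NewRecorder()
	sseHandler(ch)(rec, httptest.NewRequest(http.MethodGet, "/api/v2/events", nil))
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	ct, body := runOnce(event{Type: "fetch", Data: `{"remote":"dockerhub","cache_hit":false}`})
	fmt.Println(ct)
	fmt.Print(body)
}
```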
Proxy Engine
Request Flow
Client Request
│
▼
Classify (immutable/mutable/denied)
│
├── blocklist match → 403
├── allowlist non-empty + no match → 403
│
▼
Check Redis TTL key
│
├── exists (fresh) → serve from S3, log access
│
├── missing (expired or uncached)
│ │
│ ▼
│ Acquire fetch lock (Redis SET NX EX 30: set-if-absent with the 30s TTL in one atomic command)
│ │
│ ├── lock acquired
│ │ ├── mutable + check_mutable + have ETag → HEAD upstream
│ │ │ ├── 304 → refresh TTL, serve from S3
│ │ │ └── changed → full fetch
│ │ └── full fetch from upstream
│ │ → provider.RewriteResponse() if needed
│ │ → CAS store (hash → check blobs → upload if new)
│ │ → upsert artifact in Postgres
│ │ → set Redis TTL + release lock
│ │ → on upstream error + stale_on_error → refresh TTL, serve stale
│ │
│ └── lock not acquired → poll S3 briefly, serve if another pod fetched it
│
▼
Stream response from S3, log access
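The revalidation branch (mutable + `check_mutable` + stored ETag) boils down to a conditional HEAD. This is a minimal standard-library sketch; `revalidate`, `revalidateETag` and `etagServer` are hypothetical names. It handles responses as follows:

- 304 means "still valid, refresh the TTL".
- Any 2xx means "changed, do a full fetch". This also covers upstreams that ignore validators on HEAD.
- Anything else is an upstream error, which the caller can feed into stale-on-error and the circuit breaker.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// revalidate sends a conditional HEAD for a cached object. notModified is
// true on 304: the cached copy is still valid and only the TTL needs refreshing.
func revalidate(ctx context.Context, c *http.Client, url, etag string, lastModified time.Time) (notModified bool, err error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if err != nil {
		return false, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	if !lastModified.IsZero() {
		req.Header.Set("If-Modified-Since", lastModified.UTC().Format(http.TimeFormat))
	}
	resp, err := c.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	switch {
	case resp.StatusCode == http.StatusNotModified:
		return true, nil
	case resp.StatusCode >= 200 && resp.StatusCode < 300:
		return false, nil // changed, or validators ignored: full fetch
	default:
		return false, fmt.Errorf("revalidate %s: upstream status %d", url, resp.StatusCode)
	}
}

// revalidateETag is a convenience wrapper when only an ETag was cached.
func revalidateETag(c *http.Client, url, etag string) (bool, error) {
	return revalidate(context.Background(), c, url, etag, time.Time{})
}

// etagServer is a local stand-in upstream that honours If-None-Match.
func etagServer(current string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", current)
		if r.Header.Get("If-None-Match") == current {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := etagServer(`"abc"`)
	defer srv.Close()
	for _, etag := range []string{`"abc"`, `"old"`} {
		notModified, err := revalidateETag(srv.Client(), srv.URL+"/index.yaml", etag)
		fmt.Println(etag, "not modified:", notModified, "err:", err)
	}
}
```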
Circuit Breaker
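Per-remote, tracked in Redis. Closed → Open (after N consecutive failures) → Half-open (after a cooldown, when trial requests are let through). A successful trial closes the breaker again; a failed one reopens it. Exposed via the health field of GET /api/v2/remotes/{name}.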
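The transitions can be sketched as a small state machine. This in-memory version (hypothetical `breaker` type, hypothetical 30s cooldown) only illustrates the logic. The plan keeps the state in Redis under `circuit:{remote}` so that every pod sees the same breaker. For simplicity, the sketch lets all requests through while half-open rather than exactly one trial request.

```go
package main

import (
	"fmt"
	"time"
)

type breakerState int

const (
	stateClosed breakerState = iota
	stateOpen
	stateHalfOpen
)

func (s breakerState) String() string {
	return [...]string{"closed", "open", "half-open"}[s]
}

// defaultCooldown is a hypothetical value; the plan leaves it configurable.
const defaultCooldown = 30 * time.Second

// breaker tracks one remote's upstream health.
type breaker struct {
	threshold int           // N consecutive failures that open the breaker
	cooldown  time.Duration // time spent open before trial requests
	failures  int
	state     breakerState
	openedAt  time.Time
}

// Allow reports whether a request may go upstream at time now.
func (b *breaker) Allow(now time.Time) bool {
	if b.state == stateOpen && now.Sub(b.openedAt) >= b.cooldown {
		b.state = stateHalfOpen
	}
	return b.state != stateOpen
}

// Record feeds back the outcome of an upstream call made at time now.
func (b *breaker) Record(ok bool, now time.Time) {
	if ok {
		b.failures, b.state = 0, stateClosed
		return
	}
	b.failures++
	if b.state == stateHalfOpen || b.failures >= b.threshold {
		b.state, b.openedAt = stateOpen, now
	}
}

// at converts seconds to a time, for readable examples and tests.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

func main() {
	b := &breaker{threshold: 2, cooldown: defaultCooldown}
	b.Record(false, at(0))
	b.Record(false, at(1))
	fmt.Println(b.state)
	fmt.Println(b.Allow(at(5)))  // false: still cooling down
	fmt.Println(b.Allow(at(31))) // true: half-open trial
	fmt.Println(b.state)
}
```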
Content-Addressable Storage
- Stream upstream → temp file, compute SHA256 inline
- Check the `blobs` table for the hash
- Exists → skip the S3 upload, upsert the `artifacts` row only
- New → upload to `blobs/sha256/{hash}`, insert both rows
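The first step (spool to a temp file while hashing) can be sketched with the standard library. This is a minimal sketch, not the planned `internal/storage/cas.go`. `spool`, `spoolBytes` and `spooledBlob` are hypothetical names, and the `blobs` lookup and S3 upload that follow are omitted. Because the hash is computed in the same pass as the write, the content hash is known as soon as the body has been read, without a second read of the file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// spooledBlob is an upstream body that has been written to a temp file.
type spooledBlob struct {
	Path string // temp file holding the body
	Hash string // hex-encoded SHA-256 of the body
	Size int64
}

// S3Key returns the CAS object key for the blob.
func (b spooledBlob) S3Key() string { return "blobs/sha256/" + b.Hash }

// Discard removes the temp file once the blob is uploaded or deduplicated.
func (b spooledBlob) Discard() error { return os.Remove(b.Path) }

// spool copies body into a temp file in dir, hashing it in the same pass.
func spool(body io.Reader, dir string) (spooledBlob, error) {
	f, err := os.CreateTemp(dir, "artifact-*")
	if err != nil {
		return spooledBlob{}, err
	}
	h := sha256.New()
	n, err := io.Copy(io.MultiWriter(f, h), body)
	if cerr := f.Close(); err == nil {
		err = cerr // surface write errors reported at close
	}
	if err != nil {
		os.Remove(f.Name())
		return spooledBlob{}, fmt.Errorf("spool upstream body: %w", err)
	}
	return spooledBlob{Path: f.Name(), Hash: hex.EncodeToString(h.Sum(nil)), Size: n}, nil
}

// spoolBytes is a convenience for bodies already held in memory.
func spoolBytes(b []byte, dir string) (spooledBlob, error) {
	return spool(bytes.NewReader(b), dir)
}

func main() {
	b, err := spool(strings.NewReader("hello"), "")
	if err != nil {
		panic(err)
	}
	defer b.Discard()
	fmt.Println(b.Size, b.S3Key())
}
```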
Garbage Collection
Background goroutine (configurable interval, default 1h):
- Orphaned blobs: delete S3 objects (and their `blobs` rows) whose `content_hash` has no referencing `artifacts` or `local_files` rows
- Cold artifacts: optional, per remote; delete artifacts not accessed in N days
- Remote deletion: `ON DELETE CASCADE` handles Postgres; GC then sweeps the orphaned blobs
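Orphan selection is a set difference, sketched here in memory. The grace period is an assumption not found in the plan. In the CAS flow, a new `blobs` row can exist briefly before its `artifacts` row is inserted, and a sweep during that window would otherwise treat the blob as orphaned. In PostgreSQL, the same selection would be an anti-join, shown as the hypothetical `orphanQuery` constant. The schema above has no index on `artifacts(content_hash)` or `local_files(content_hash)`; such a query would benefit from one.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// orphanQuery is a hypothetical PostgreSQL form of the same selection.
const orphanQuery = `
SELECT b.content_hash, b.s3_key
FROM blobs b
WHERE b.created_at < NOW() - $1::interval
  AND NOT EXISTS (SELECT 1 FROM artifacts a WHERE a.content_hash = b.content_hash)
  AND NOT EXISTS (SELECT 1 FROM local_files l WHERE l.content_hash = b.content_hash)`

// defaultGrace is an assumed grace period.
const defaultGrace = time.Hour

type blobRow struct {
	Hash      string
	CreatedAt time.Time
}

// orphanedBlobs returns hashes that nothing references and that are older
// than grace, sorted for stable output.
func orphanedBlobs(blobs []blobRow, referenced map[string]bool, now time.Time, grace time.Duration) []string {
	var out []string
	for _, b := range blobs {
		if referenced[b.Hash] || now.Sub(b.CreatedAt) < grace {
			continue
		}
		out = append(out, b.Hash)
	}
	sort.Strings(out)
	return out
}

func main() {
	now := time.Now()
	blobs := []blobRow{
		{Hash: "aaa", CreatedAt: now.Add(-2 * time.Hour)},   // unreferenced and old: orphan
		{Hash: "bbb", CreatedAt: now.Add(-2 * time.Hour)},   // still referenced
		{Hash: "ccc", CreatedAt: now.Add(-5 * time.Minute)}, // inside the grace period
	}
	fmt.Println(orphanedBlobs(blobs, map[string]bool{"bbb": true}, now, defaultGrace))
}
```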
Package Providers
Provider Interface
type Provider interface {
	Type() models.PackageType
	BuiltinMutablePatterns() []*regexp.Regexp
	BuiltinImmutablePatterns() []*regexp.Regexp
	ContentType(path string) string
	UpstreamURL(remote models.Remote, path string) string
	RewriteResponse(body []byte, remote models.Remote, proxyBaseURL string) ([]byte, error)
	AuthHeaders(ctx context.Context, remote models.Remote) (http.Header, error)
}

type IndexMerger interface {
	MergeIndexes(members []MemberIndex, proxyBaseURL string) ([]byte, error)
}
Provider Registry
var registry = map[models.PackageType]Provider{
	models.PackageGeneric:   &generic.Provider{},
	models.PackageDocker:    &docker.Provider{},
	models.PackageHelm:      &helm.Provider{},
	models.PackagePyPI:      &pypi.Provider{},
	models.PackageNPM:       &npm.Provider{},
	models.PackageRPM:       &rpm.Provider{},
	models.PackageAlpine:    &alpine.Provider{},
	models.PackagePuppet:    &puppet.Provider{},
	models.PackageTerraform: &terraform.Provider{},
	models.PackageGoProxy:   &goproxy.Provider{},
}

func Get(t models.PackageType) (Provider, error) { ... }
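Each provider lives in its own subpackage under `internal/provider/` with its own `_test.go`. The registry map imports every provider subpackage. Shared types that those subpackages must reference, such as `MemberIndex` in the `IndexMerger` signature, therefore cannot live in the same package as the registry without creating an import cycle, which Go rejects. Placing them in `pkg/models/` or in a separate leaf package avoids this.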
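The e2e suite checks "priority ordering" for virtuals, but the rule itself is not spelled out in the plan. This is a minimal sketch assuming that member order is priority and that the first member providing a given key wins. `indexEntry` and `mergeByPriority` are hypothetical, and so is the `mirror` remote in the example. Parsing real Helm `index.yaml` files or PyPI simple pages into entries is the job of the `IndexMerger` implementations and is not shown.

```go
package main

import "fmt"

// indexEntry is a hypothetical, already-parsed index entry: one file link
// from a PyPI simple page, or one chart version from a Helm index.
type indexEntry struct {
	Key  string // dedup key, e.g. a filename or "chart-version"
	Href string // URL the merged index should point at
}

// mergeByPriority merges member indexes in member order; when several
// members provide the same key, the earliest member wins.
func mergeByPriority(members ...[]indexEntry) []indexEntry {
	seen := make(map[string]bool)
	var out []indexEntry
	for _, entries := range members {
		for _, e := range entries {
			if seen[e.Key] {
				continue
			}
			seen[e.Key] = true
			out = append(out, e)
		}
	}
	return out
}

func main() {
	jetstack := []indexEntry{{Key: "cert-manager-v1.14.0", Href: "/api/v1/remote/jetstack/cert-manager-v1.14.0.tgz"}}
	mirror := []indexEntry{
		{Key: "cert-manager-v1.14.0", Href: "/api/v1/remote/mirror/cert-manager-v1.14.0.tgz"},
		{Key: "vault-0.27.0", Href: "/api/v1/remote/mirror/vault-0.27.0.tgz"},
	}
	for _, e := range mergeByPriority(jetstack, mirror) {
		fmt.Println(e.Key, e.Href)
	}
}
```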
Testing Strategy
Unit Tests
Every package gets _test.go files alongside the source. Run with go test ./....
| Package | What's Tested |
|---|---|
| `internal/provider/docker/` | Auth token parsing/caching, manifest classification, tag banning, URL construction, blob key generation |
| `internal/provider/helm/` | `index.yaml` parsing (using `helm.sh/helm/v3/pkg/repo`), URL rewriting, index merging |
| `internal/provider/pypi/` | Simple index HTML parsing, URL rewriting, index merging |
| `internal/provider/npm/` | Metadata JSON rewriting (`dist.tarball` URLs) |
| `internal/provider/terraform/` | Registry URL construction, download info JSON rewriting, `releases_remote` URL extraction |
| `internal/provider/rpm/` | Mutable pattern matching (repodata) |
| `internal/provider/alpine/` | Mutable pattern matching (APKINDEX) |
| `internal/provider/puppet/` | `file_uri` JSON rewriting |
| `internal/proxy/` | Classifier (immutable vs mutable vs denied), circuit breaker state transitions, revalidator logic |
| `internal/storage/` | CAS key generation, dedup detection, S3 operation mocking |
| `internal/cache/` | Redis TTL set/check, fetch lock acquire/release/contention |
| `internal/gc/` | Orphan detection queries, cold artifact selection |
| `pkg/models/` | Model validation, `PackageType` enum |
| `pkg/client/` | API client request/response serialization |
End-to-End Tests
Located in e2e/. Use testcontainers-go to spin up real Postgres, Redis, and MinIO containers. The test binary starts the actual artifactapi server against these backends.
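// e2e/e2e_test.go
//go:build e2e

package e2e

import "testing"

func TestMain(m *testing.M) {
	// Start postgres, redis, minio via testcontainers-go
	// Run migrations
	// Start artifactapi server on random port
	m.Run() // run tests; since Go 1.15 the exit code is passed to os.Exit when TestMain returns
	// Tear down
}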
| Test File | What's Tested |
|---|---|
| `e2e/proxy_test.go` | Proxy a real GitHub release through a generic remote; verify S3 storage, the Redis TTL, the Postgres artifact row, and a cache hit on the second request |
| `e2e/docker_test.go` | Pull a real image manifest + blob through the Docker v2 proxy; verify blob deduplication and tag banning |
| `e2e/management_test.go` | Full CRUD lifecycle: create remote via v2 API, proxy through it, list objects, evict object, flush cache, delete remote |
| `e2e/virtual_test.go` | Create two helm remotes + a virtual, fetch the merged index, verify priority ordering |
| `e2e/terraform_test.go` | Proxy terraform provider version listing + download info, verify URL rewriting to `releases_remote` |
| `e2e/goproxy_test.go` | Proxy Go module `@v/list`, `.info`, `.mod`, `.zip` through a GOPROXY remote, verify mutable vs immutable classification |
| `e2e/gc_test.go` | Create artifact, delete remote, trigger GC, verify the S3 blob is cleaned up |
Code Quality
- `gofmt`/`goimports`: enforced in CI, run on save
- `golangci-lint`: comprehensive linter suite (staticcheck, errcheck, govet, etc.)
- `go vet ./...`: run in CI
- Makefile targets: `make test`, `make lint`, `make e2e`, `make fmt`
Terraform Provider (Separate Repo)
Repo: terraform-provider-artifactapi
Uses: pkg/client/ and pkg/models/ from the main module
provider "artifactapi" {
endpoint = "https://artifactapi.k8s.syd1.au.unkin.net"
}
resource "artifactapi_remote" "terraform_registry" {
name = "terraform-registry"
package_type = "terraform"
base_url = "https://registry.terraform.io"
description = "Terraform provider registry"
releases_remote = artifactapi_remote.hashicorp_releases.name
immutable_patterns = [
"[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
]
cache {
immutable_ttl = 0
mutable_ttl = 300
}
}
resource "artifactapi_remote" "hashicorp_releases" {
name = "hashicorp-releases"
package_type = "generic"
base_url = "https://releases.hashicorp.com"
immutable_patterns = [
".*\\.zip$",
".*SHA256SUMS(\\.sig)?$",
]
cache {
immutable_ttl = 0
mutable_ttl = 0
}
}
resource "artifactapi_virtual" "helm" {
name = "helm"
package_type = "helm"
description = "All helm repos merged"
members = [
artifactapi_remote.jetstack.name,
artifactapi_remote.hashicorp_helm.name,
]
}
Web UI (React + Vite — Separate Container)
Deployment
Separate Dockerfile.ui: multi-stage build (node → nginx). Served as its own container/pod. nginx proxies /api/* to the Go backend.
Pages
| Route | Content |
|---|---|
| `/` | Dashboard: total objects, storage used, dedup savings, bandwidth saved, top remotes chart, live SSE event feed, health indicators |
| `/remotes` | Remote table: name, type, description, object count, size, hit rate, health. Filter by type, sort any column |
| `/remotes/:name` | Config (read-only, "Managed by Terraform" badge), stats, object browser with search/sort/evict, flush actions |
| `/virtuals` | Virtual table: name, type, members, merged object count |
| `/virtuals/:name` | Member list with individual stats |
All config is read-only — managed by Terraform.
TUI (Bubble Tea — Subcommand)
Run `artifactapi tui --endpoint http://localhost:8000`, or set the endpoint with the `ARTIFACTAPI_ENDPOINT` environment variable.
Uses pkg/client/ for all API calls (same client as Terraform provider).
| View | Contents and key bindings |
|---|---|
| Dashboard | summary stats, top remotes |
| Remotes list | j/k navigate, / filter, Enter detail |
| Remote detail | config + stats, Enter → object browser |
| Object browser | / search, d evict, f flush |
| Virtuals | j/k, Enter detail |
Improvements Over v2
| Area | v2 (Python) | v3 (Go) |
|---|---|---|
| S3 paths | Hashed, opaque | Content-addressed CAS |
| Config | YAML files, mtime reload | Terraform via API |
| Package types | 8 types | 10 types (+ terraform, goproxy) |
| Virtual repos | Helm only | Helm + PyPI, extensible |
| Deduplication | Docker blobs only | All types via CAS |
| Revalidation | Opt-in flag | Default for all mutable |
| Access logging | None | Per-request log in Postgres |
| GC | None | Background goroutine |
| Upstream health | Per-request | Circuit breaker |
| S3 backends | MinIO only | MinIO, Ceph, AWS (minio-go) |
| UI | None | Web dashboard + TUI |
| Binary | Python + venv | Static Go binary |
| Frontend | N/A | Separate container (React) |
| Testing | Mocked unit tests | Unit + e2e with real backends |
Implementation Phases
Phase 1: Core Engine + Models
- Go module, Makefile (`make build test lint fmt e2e`), Dockerfile, docker-compose
- `pkg/models/`: all domain types
- PostgreSQL schema + migrations
- S3 storage layer with CAS (`minio-go/v7`)
- Redis cache layer (TTL, locks)
- Proxy engine: fetch-or-cache, classifier, revalidator
- Generic + Docker providers (most complexity: OCI auth, CAS, tag banning)
- Health + metrics endpoints
- Unit tests for all packages
- Milestone: proxy Docker + generic, cache in S3, track in Postgres
Phase 2: All Providers
- Helm (using `helm.sh/helm/v3/pkg/repo`)
- PyPI (using `golang.org/x/net/html`)
- npm (stdlib `encoding/json`)
- RPM (using `rs3.io/go/rpm/repomd`)
- Alpine (using `gitlab.alpinelinux.org/alpine/go`)
- Puppet Forge (stdlib `encoding/json`)
- Terraform (using `hashicorp/terraform-registry-address`)
- Go Modules / GOPROXY (using `github.com/goproxy/goproxy`)
- Unit tests per provider
- Milestone: feature parity with v2 + goproxy
Phase 3: Management API + Virtual Repos + GC
- `pkg/client/`: shared Go API client
- v2 CRUD endpoints
- Virtual repo engine: `IndexMerger` for Helm + PyPI
- Circuit breaker
- Access logging middleware
- GC goroutine
- Milestone: full API, virtuals, GC
Phase 4: End-to-End Tests
- `e2e/` test suite with `testcontainers-go`
- Proxy, Docker, management, virtual, terraform, GC tests
- CI pipeline: `make e2e`
- Milestone: comprehensive e2e coverage
Phase 5: Terraform Provider
- Separate repo: `terraform-provider-artifactapi`
- Imports `pkg/client/` and `pkg/models/`
- `artifactapi_remote` + `artifactapi_virtual` resources + data sources
- Import support
- Milestone: manage all config via Terraform
Phase 6: Web UI
- React + Vite in `ui/`
- `Dockerfile.ui` (multi-stage → nginx)
- Dashboard, remotes, objects, virtuals pages
- SSE event feed
- Milestone: full web UI in separate container
Phase 7: TUI
- Bubble Tea in `internal/tui/`
- Uses `pkg/client/`
- Dashboard, remotes, objects, virtuals views
- Milestone: TUI feature parity with web UI
Phase 8: Migration + Cutover
- Migration tool: v2 YAML → Terraform HCL + `terraform import` commands
- S3 rehash script: `{remote}/{hash16}/{file}` → `blobs/sha256/{content_hash}`
- Parallel run, response comparison
- Cutover
Makefile Targets
.PHONY: build test lint fmt e2e docker docker-ui compose

build: ## Build Go binary
	go build -o bin/artifactapi ./cmd/artifactapi

test: ## Run unit tests
	go test ./...

lint: ## Run golangci-lint + go vet
	golangci-lint run ./...
	go vet ./...

fmt: ## Format code (gofmt + goimports)
	gofmt -w .
	goimports -w .

e2e: ## Run end-to-end tests (requires Docker)
	go test -tags=e2e -count=1 -timeout=5m ./e2e/...

docker: ## Build API server Docker image
	docker build -t artifactapi .

docker-ui: ## Build frontend Docker image
	docker build -t artifactapi-ui -f ui/Dockerfile.ui ui/

compose: ## Start full stack (API + UI + Postgres + Redis + MinIO)
	docker compose up -d