Feat/v3 go rewrite (#47)
Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine,
  puppet, terraform, goproxy — each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with allowlist/blocklist per-remote (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats,
  health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, artifactapi tui --endpoint <url>

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #47
2026-06-07 19:30:35 +10:00


ArtifactAPI v3 — Go Rewrite Plan

Context

ArtifactAPI is a production artifact proxy/cache serving ~42 remotes (Docker registries, Helm repos, RPM/Alpine repos, GitHub releases, PyPI, npm, Puppet Forge, Terraform registries, Go module proxies) across a Kubernetes cluster. The current Python (FastAPI) implementation works but has architectural debt: opaque hashed S3 paths, no UI for visibility, YAML config files that drift, no garbage collection, no access logging, and virtual repos limited to Helm only.

The v3 rewrite targets: a single Go binary (API + TUI), a separate React frontend (own Dockerfile), a Terraform provider (separate repo), content-addressable storage, and a cleaner data model that makes the cache inspectable and manageable.

  • Repo: Same repo (git.unkin.net/unkin/artifactapi), new branch
  • Module: git.unkin.net/unkin/artifactapi
  • Frontend: React + Vite, separate Dockerfile, talks to API
  • Terraform provider: Separate repo (terraform-provider-artifactapi)


Architecture Overview

┌───────────────────────────────────┐    ┌──────────────────────┐
│        Go Binary (API + TUI)      │    │   Frontend Container │
│                                   │    │                      │
│  ┌──────────┐  ┌───────────────┐  │    │  React + Vite SPA    │
│  │ REST API │  │ Proxy Engine  │  │◄───│  nginx / node serve  │
│  │ /api/v2  │  │ /api/v1/...   │  │    │  Dockerfile.ui       │
│  │          │  │ /v2/... (OCI) │  │    └──────────────────────┘
│  └────┬─────┘  └──────┬────────┘  │
│       │               │           │    ┌──────────────────────┐
│  ┌────┴───────────────┴────────┐  │    │ Terraform Provider   │
│  │        Data Layer           │  │◄───│ (separate repo)      │
│  │  PostgreSQL · Redis · S3    │  │    └──────────────────────┘
│  └─────────────────────────────┘  │
│                                   │    ┌──────────────────────┐
│  ┌─────────────────────────────┐  │    │ TUI (subcommand)     │
│  │  artifactapi tui            │──│───►│ artifactapi tui      │
│  └─────────────────────────────┘  │    │ --endpoint <url>     │
└───────────────────────────────────┘    └──────────────────────┘

Three independent deployment units:

  1. Go binary — API server + TUI subcommand (single Dockerfile)
  2. React frontend — SPA served by nginx (Dockerfile.ui), talks to /api/v2
  3. Terraform provider — separate repo, calls /api/v2 CRUD

Project Structure (Modular)

artifactapi/
├── cmd/
│   └── artifactapi/
│       └── main.go                     # entrypoint: serve / tui subcommands
│
├── pkg/                                # PUBLIC — importable by terraform provider, CLI tools
│   ├── models/                         # shared domain types
│   │   ├── remote.go                   # Remote, RemoteConfig, PackageType enum
│   │   ├── virtual.go                  # Virtual, VirtualConfig
│   │   ├── artifact.go                 # Artifact, Blob, AccessLogEntry
│   │   ├── local.go                    # LocalFile, LocalRepo
│   │   └── stats.go                    # RemoteStats, OverviewStats
│   └── client/                         # typed Go API client (used by TUI + Terraform provider)
│       ├── client.go                   # Client struct, base HTTP
│       ├── remotes.go                  # remote CRUD methods
│       ├── virtuals.go                 # virtual CRUD methods
│       ├── objects.go                  # object browse/evict methods
│       └── stats.go                    # stats methods
│
├── internal/                           # PRIVATE — server internals
│   ├── server/
│   │   ├── server.go                   # HTTP server setup, router
│   │   └── middleware.go               # logging, recovery, request-id, access logging
│   │
│   ├── api/
│   │   ├── v1/                         # proxy endpoints (v1 compat)
│   │   │   ├── proxy.go               # GET /api/v1/remote/{name}/{path}
│   │   │   ├── docker.go              # /v2/{name}/{path}
│   │   │   ├── virtual.go             # GET /api/v1/virtual/{name}/{path}
│   │   │   └── local.go               # CRUD /api/v1/local/{name}/{path}
│   │   └── v2/                         # management API
│   │       ├── remotes.go             # CRUD + stats
│   │       ├── virtuals.go            # CRUD
│   │       ├── objects.go             # browse/evict cached objects
│   │       ├── stats.go               # overview, top-remotes
│   │       ├── events.go              # SSE stream
│   │       └── health.go              # health, metrics
│   │
│   ├── provider/                       # package-type providers (registry protocol handlers)
│   │   ├── provider.go                # Provider interface + registry
│   │   ├── generic/
│   │   │   ├── generic.go
│   │   │   └── generic_test.go
│   │   ├── docker/
│   │   │   ├── docker.go              # OCI Distribution v2 via go-containerregistry
│   │   │   ├── auth.go                # Bearer token fetch + cache
│   │   │   └── docker_test.go
│   │   ├── helm/
│   │   │   ├── helm.go                # index rewriting via helm.sh/helm/v3/pkg/repo
│   │   │   ├── merger.go              # virtual index merge
│   │   │   └── helm_test.go
│   │   ├── pypi/
│   │   │   ├── pypi.go                # simple index HTML rewriting
│   │   │   ├── merger.go              # virtual simple index merge
│   │   │   └── pypi_test.go
│   │   ├── npm/
│   │   │   ├── npm.go                 # metadata JSON rewriting
│   │   │   └── npm_test.go
│   │   ├── rpm/
│   │   │   ├── rpm.go                 # repodata patterns
│   │   │   └── rpm_test.go
│   │   ├── alpine/
│   │   │   ├── alpine.go              # APKINDEX patterns
│   │   │   └── alpine_test.go
│   │   ├── puppet/
│   │   │   ├── puppet.go              # file_uri JSON rewriting
│   │   │   └── puppet_test.go
│   │   ├── terraform/
│   │   │   ├── terraform.go           # registry protocol, download URL rewriting
│   │   │   └── terraform_test.go
│   │   └── goproxy/
│   │       ├── goproxy.go             # Go module proxy protocol (GOPROXY)
│   │       └── goproxy_test.go
│   │
│   ├── proxy/
│   │   ├── engine.go                  # core fetch-or-cache logic
│   │   ├── engine_test.go
│   │   ├── classifier.go              # immutable vs mutable classification
│   │   ├── classifier_test.go
│   │   ├── revalidator.go             # conditional HEAD requests (ETag/Last-Modified)
│   │   └── circuit.go                 # per-remote circuit breaker
│   │
│   ├── storage/
│   │   ├── s3.go                      # S3 client (minio-go — works with MinIO, Ceph, AWS)
│   │   ├── s3_test.go
│   │   ├── cas.go                     # content-addressable store logic
│   │   └── cas_test.go
│   │
│   ├── cache/
│   │   ├── redis.go                   # TTL management, fetch locks
│   │   ├── redis_test.go
│   │   └── lock.go                    # distributed lock abstraction
│   │
│   ├── database/
│   │   ├── postgres.go                # connection pool, migration runner
│   │   ├── queries/                   # SQL query files or sqlc-generated code
│   │   │   ├── remotes.sql.go
│   │   │   ├── virtuals.sql.go
│   │   │   ├── artifacts.sql.go
│   │   │   └── access_log.sql.go
│   │   └── migrations/               # golang-migrate SQL files
│   │       ├── 001_initial.up.sql
│   │       └── 001_initial.down.sql
│   │
│   ├── metrics/
│   │   └── prometheus.go             # counters, gauges, histograms
│   │
│   ├── gc/
│   │   ├── gc.go                      # background garbage collection goroutine
│   │   └── gc_test.go
│   │
│   ├── tui/
│   │   ├── app.go                     # Bubble Tea main model
│   │   ├── views/
│   │   │   ├── dashboard.go
│   │   │   ├── remotes.go
│   │   │   ├── objects.go
│   │   │   └── virtuals.go
│   │   └── components/
│   │       ├── table.go
│   │       └── statusbar.go
│   │
│   └── config/
│       └── env.go                     # environment variable parsing + validation
│
├── ui/                                 # React frontend — SEPARATE DOCKERFILE
│   ├── src/
│   │   ├── App.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Remotes.tsx
│   │   │   ├── RemoteDetail.tsx
│   │   │   ├── Virtuals.tsx
│   │   │   └── Objects.tsx
│   │   ├── components/
│   │   │   ├── RemoteTable.tsx
│   │   │   ├── ObjectBrowser.tsx
│   │   │   ├── StatsCard.tsx
│   │   │   └── EventFeed.tsx
│   │   └── api/
│   │       └── client.ts              # typed API client
│   ├── package.json
│   ├── vite.config.ts
│   ├── tsconfig.json
│   ├── Dockerfile.ui                  # multi-stage: node build → nginx
│   └── nginx.conf                     # proxy /api/* to backend, serve SPA
│
├── e2e/                                # end-to-end integration tests
│   ├── e2e_test.go                    # TestMain spins up docker-compose stack
│   ├── proxy_test.go                  # proxy through real remotes
│   ├── docker_test.go                 # Docker v2 protocol e2e
│   ├── management_test.go            # v2 API CRUD
│   ├── virtual_test.go               # virtual repo merge e2e
│   └── docker-compose.e2e.yml        # postgres + redis + minio for tests
│
├── go.mod
├── go.sum
├── Makefile
├── Dockerfile                          # Go binary (API server + TUI)
├── Dockerfile.ui                       # symlink or copy → ui/Dockerfile.ui
└── docker-compose.yml

Key Modularisation Decisions

  • pkg/models/ — Shared domain types importable by the Terraform provider and any external tooling. No dependencies on internal packages
  • pkg/client/ — Typed Go API client used by both the TUI and the Terraform provider. Depends only on pkg/models/ and stdlib
  • internal/provider/ — Each package type is its own subpackage with isolated tests. A provider registry maps PackageType → Provider
  • internal/database/queries/ — Use sqlc to generate type-safe query functions from SQL, or hand-written query files
  • e2e/ — Separate test binary that spins up a real docker-compose stack

Go Ecosystem Libraries

Prefer existing, maintained Go modules over writing protocol handlers from scratch.

Package-Type Libraries

| Package Type | Go Module | What It Gives Us |
|---|---|---|
| Docker/OCI | github.com/google/go-containerregistry | Full Registry v2/OCI client: manifest parsing, auth challenges, blob operations. pkg/registry can implement a v2 server. Reference: github.com/regclient/regclient |
| Helm | helm.sh/helm/v3/pkg/repo | Parse/generate index.yaml, IndexFile/ChartVersion types, URL entries. Used directly for merge |
| Terraform | github.com/hashicorp/terraform-registry-address | Provider/module address parsing, ForRegistryProtocol() URL generation. Protocol spec: provider registry protocol v1 |
| Go Modules | github.com/goproxy/goproxy | Minimalist GOPROXY protocol handler, implements full spec as http.Handler. Handles /@v/list, /@v/{v}.info, /@v/{v}.mod, /@v/{v}.zip, /@latest |
| RPM | rs3.io/go/rpm/repomd | Parse repomd.xml, primary.xml with proper XML namespace handling |
| Alpine | gitlab.alpinelinux.org/alpine/go | Official Alpine library: parse APKINDEX, .apk files |
| PyPI | golang.org/x/net/html (golang.org/x module, not stdlib) | No dedicated Go PyPI library exists. Parse simple index HTML with x/net/html, extract <a> tags. Minimal: the rewriting is just href replacement |
| npm | stdlib encoding/json | npm metadata is JSON: parse with stdlib, rewrite dist.tarball URLs. No special library needed |
| Puppet Forge | stdlib encoding/json | Forge API is JSON: parse and rewrite file_uri fields. Community lib github.com/johnmccabe/go-puppetforge exists but is thin; stdlib suffices |

Infrastructure Libraries

| Purpose | Go Module | Why This One |
|---|---|---|
| HTTP router | github.com/go-chi/chi/v5 | Lightweight, stdlib http.Handler compatible, middleware chain |
| PostgreSQL | github.com/jackc/pgx/v5 | Pure Go, connection pooling, COPY support, prepared statements |
| SQL generation | github.com/sqlc-dev/sqlc | Generate type-safe Go from SQL queries: no ORM, no reflection |
| Redis | github.com/redis/go-redis/v9 | Full Redis client, pipelining, pub/sub |
| S3 (MinIO/Ceph/AWS) | github.com/minio/minio-go/v7 | Native S3-compatible client. Works with MinIO, Ceph RGW, AWS S3, any S3-compatible backend out of the box. Lighter than aws-sdk-go-v2, purpose-built for S3 compat |
| DB migrations | github.com/golang-migrate/migrate/v4 | SQL file-based migrations, CLI + library |
| Prometheus | github.com/prometheus/client_golang | Counters, gauges, histograms |
| TUI | github.com/charmbracelet/bubbletea | Elm-architecture TUI framework |
| TUI styling | github.com/charmbracelet/lipgloss | Terminal styling |
| TUI components | github.com/charmbracelet/bubbles | Table, text input, spinner, etc. |
| Structured logging | log/slog (stdlib) | Go 1.21+ structured logging, zero dependencies |
| Testing | github.com/stretchr/testify | Assertions + require for unit tests |
| Test containers | github.com/testcontainers/testcontainers-go | Spin up Postgres/Redis/MinIO in e2e tests |

S3 Client: Multi-Backend Support

Using minio-go/v7 as the S3 client because it natively supports:

  • MinIO — primary development/production target
  • Ceph RGW — S3-compatible via endpoint config
  • AWS S3 — via region + credential config
  • Any S3-compatible — GCS (interop mode), Wasabi, DigitalOcean Spaces, etc.

No abstraction layer needed — minio-go handles endpoint differences internally. Config:

client, err := minio.New(endpoint, &minio.Options{
    Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
    Secure: useTLS,
    Region: region, // optional, for AWS
})
if err != nil {
    return fmt.Errorf("create S3 client: %w", err)
}

Data Layer

PostgreSQL Schema

-- Remotes: managed exclusively by Terraform
CREATE TABLE remotes (
    name            TEXT PRIMARY KEY,
    package_type    TEXT NOT NULL,  -- generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy
    base_url        TEXT NOT NULL,
    description     TEXT DEFAULT '',
    username        TEXT DEFAULT '',
    password        TEXT DEFAULT '',
    immutable_ttl   INTEGER DEFAULT 0,
    mutable_ttl     INTEGER DEFAULT 3600,
    check_mutable   BOOLEAN DEFAULT TRUE,
    immutable_patterns  TEXT[] DEFAULT '{}',  -- user-defined immutable patterns
    mutable_patterns    TEXT[] DEFAULT '{}',  -- user-defined mutable patterns (merged with provider built-ins)
    allowlist        TEXT[] DEFAULT '{}',     -- if empty, allow all paths; if non-empty, only matching paths proxied
    blocklist        TEXT[] DEFAULT '{}',     -- always denied, checked before allowlist
    ban_tags_enabled    BOOLEAN DEFAULT FALSE,
    ban_tags            TEXT[] DEFAULT '{}',
    quarantine_enabled  BOOLEAN DEFAULT FALSE,
    quarantine_days     INTEGER DEFAULT 3,
    stale_on_error      BOOLEAN DEFAULT TRUE,
    releases_remote     TEXT DEFAULT '',  -- terraform type: name of CDN remote for download URL rewriting
    managed_by          TEXT DEFAULT '',  -- 'terraform' or empty
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Virtual repositories
CREATE TABLE virtuals (
    name            TEXT PRIMARY KEY,
    package_type    TEXT NOT NULL,
    description     TEXT DEFAULT '',
    members         TEXT[] NOT NULL,
    managed_by      TEXT DEFAULT '',
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Content-addressable blob storage tracking
CREATE TABLE blobs (
    content_hash    TEXT PRIMARY KEY,
    s3_key          TEXT NOT NULL,
    size_bytes      BIGINT NOT NULL,
    content_type    TEXT DEFAULT 'application/octet-stream',
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

-- Artifact metadata: maps (remote, path) → content blob
CREATE TABLE artifacts (
    id              BIGSERIAL PRIMARY KEY,
    remote_name     TEXT NOT NULL REFERENCES remotes(name) ON DELETE CASCADE,
    path            TEXT NOT NULL,
    content_hash    TEXT NOT NULL REFERENCES blobs(content_hash),
    upstream_etag   TEXT DEFAULT '',
    upstream_last_modified TIMESTAMPTZ,
    first_seen_at   TIMESTAMPTZ DEFAULT NOW(),
    last_fetched_at TIMESTAMPTZ DEFAULT NOW(),
    last_accessed_at TIMESTAMPTZ DEFAULT NOW(),
    fetch_count     BIGINT DEFAULT 1,
    access_count    BIGINT DEFAULT 1,
    UNIQUE(remote_name, path)
);

CREATE INDEX idx_artifacts_remote ON artifacts(remote_name);
CREATE INDEX idx_artifacts_last_accessed ON artifacts(last_accessed_at);

-- Local file uploads
CREATE TABLE local_files (
    id              BIGSERIAL PRIMARY KEY,
    repo_name       TEXT NOT NULL,
    file_path       TEXT NOT NULL,
    content_hash    TEXT NOT NULL REFERENCES blobs(content_hash),
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(repo_name, file_path)
);

-- Access log (append-only, powers dashboards)
CREATE TABLE access_log (
    id              BIGSERIAL PRIMARY KEY,
    remote_name     TEXT NOT NULL,
    path            TEXT NOT NULL,
    cache_hit       BOOLEAN NOT NULL,
    size_bytes      BIGINT DEFAULT 0,
    upstream_ms     INTEGER DEFAULT 0,
    client_ip       TEXT DEFAULT '',
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_access_log_remote_time ON access_log(remote_name, created_at);

Redis Usage (Ephemeral Only)

| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| ttl:{remote}:{path} | STRING | remote's immutable/mutable TTL | Artifact freshness: existence = still fresh |
| lock:{remote}:{path} | STRING (NX) | 30s | Fetch lock: prevents thundering herd |
| etag:{remote}:{path} | STRING | same as TTL key | Cached ETag for conditional revalidation |
| circuit:{remote} | STRING | configurable | Circuit breaker: consecutive failure count |

Losing Redis = all TTLs expire = next request re-validates upstream. No data loss.

S3 Layout (Content-Addressable)

artifacts-bucket/
├── blobs/sha256/{content_hash}     # immutable CAS blobs
├── indexes/{remote}/{path}          # mutable index files (helm, pypi, rpm, etc.)
├── indexes/{virtual}/{path}         # merged virtual indexes
└── local/{repo}/{path}              # user uploads (CAS-backed via blobs table)

Terraform Remote Type (New in v3)

The terraform package type proxies the Terraform Provider Registry Protocol:

  • URL construction: prepends /v1/providers/ to request paths
  • Built-in mutable pattern: [^/]+/[^/]+/versions$ (version listings change over time)
  • Built-in immutable pattern: [^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$ (per-version download info is fixed)
  • Response rewriting: download info JSON — rewrites download_url, shasums_url, shasums_signature_url to route through a companion releases_remote (e.g., hashicorp-releases generic remote)
  • Config: requires releases_remote field pointing to the CDN remote that serves the actual binaries

Uses github.com/hashicorp/terraform-registry-address for address parsing and protocol-compliant URL generation.
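
The response rewriting described above could look roughly like the following sketch. It is not the actual provider code: the function name rewriteDownloadInfo is hypothetical, and it assumes that the path of each upstream URL maps one-to-one onto a path under /api/v1/remote/{releases_remote}/, which the plan does not spell out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// rewriteDownloadInfo rewrites the three URL fields of a provider download
// response so that they point at the proxy's releases remote.
// Assumption: the upstream URL path is reused unchanged as the remote path.
func rewriteDownloadInfo(body []byte, proxyBaseURL, releasesRemote string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("decode download info: %w", err)
	}
	for _, field := range []string{"download_url", "shasums_url", "shasums_signature_url"} {
		raw, ok := doc[field].(string)
		if !ok || raw == "" {
			continue // field absent or not a string: leave it untouched
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", field, err)
		}
		doc[field] = strings.TrimRight(proxyBaseURL, "/") +
			"/api/v1/remote/" + releasesRemote + u.Path
	}
	return json.Marshal(doc)
}

func main() {
	in := []byte(`{"os":"linux","download_url":"https://releases.hashicorp.com/terraform-provider-aws/5.0.0/terraform-provider-aws_5.0.0_linux_amd64.zip"}`)
	out, err := rewriteDownloadInfo(in, "https://artifactapi.example.com", "hashicorp-releases")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Decoding into a generic map keeps any fields the sketch does not know about, at the cost of re-serialising keys in sorted order.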


Go Module Proxy Remote Type (New)

The goproxy package type implements the GOPROXY protocol (Go module proxy):

| Endpoint | Mutability | Description |
|---|---|---|
| {module}/@v/list | Mutable | Plain text list of known versions |
| {module}/@latest | Mutable | JSON metadata for latest version |
| {module}/@v/{version}.info | Immutable | JSON version metadata (Version, Time) |
| {module}/@v/{version}.mod | Immutable | go.mod file for that version |
| {module}/@v/{version}.zip | Immutable | Source archive for that version |

  • No URL rewriting needed: responses are self-contained (no embedded URLs)
  • Config: base_url points to upstream proxy (e.g., https://proxy.golang.org)
  • Client usage: set GOPROXY=https://artifactapi.example.com/api/v1/remote/goproxy
  • Uses github.com/goproxy/goproxy for protocol handling

Allowlist / Blocklist / Automatic Mutable Patterns

Access Control (Per-Remote)

| Field | Default | Behavior |
|---|---|---|
| blocklist | [] (empty) | If a path matches any blocklist pattern → 403 Forbidden. Checked first |
| allowlist | [] (empty) | If empty → allow everything. If non-empty → only matching paths are proxied; everything else → 403 |

Evaluation order: blocklist → allowlist → proxy. No allowlist + no blocklist = open proxy (default).
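
A minimal sketch of that evaluation order, assuming the patterns are compiled with Go's regexp package. The names allowed and compile are illustrative helpers, not part of the planned classifier API.

```go
package main

import (
	"fmt"
	"regexp"
)

// compile is a small helper that turns pattern strings into regexps.
func compile(patterns ...string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

// allowed applies the documented order: blocklist first, then allowlist.
// A false result corresponds to a 403 response.
func allowed(path string, blocklist, allowlist []*regexp.Regexp) bool {
	for _, re := range blocklist {
		if re.MatchString(path) {
			return false // blocklist always wins
		}
	}
	if len(allowlist) == 0 {
		return true // empty allowlist means allow everything
	}
	for _, re := range allowlist {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	block := compile(`\.exe$`)
	allow := compile(`^hashicorp/`)
	for _, p := range []string{"hashicorp/aws/versions", "hashicorp/tool.exe", "other/pkg"} {
		fmt.Println(p, allowed(p, block, allow))
	}
}
```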

Automatic Mutable Patterns (Per-Provider Built-ins)

Each provider declares built-in mutable patterns that are always merged with user-defined mutable_patterns. Users never need to configure these — the provider knows which paths change over time.

| Provider | Built-in Mutable Patterns | Rationale |
|---|---|---|
| generic | (none) | No convention for what's mutable |
| docker | /manifests/[^/]+$, /tags/list$ | Tag manifests change; digest manifests don't. Go's regexp (RE2) has no (?!...) lookahead, so digest references are excluded by a built-in immutable pattern such as /manifests/sha256:[0-9a-f]{64}$, which takes precedence |
| helm | index\.yaml$ | Chart index changes when new charts are published |
| pypi | simple/ | Package index pages change with new releases |
| npm | ^[^/]+$ (package metadata, not .tgz) | Package metadata changes; tarballs are immutable |
| rpm | repomd\.xml$, repodata/.*, Packages\.gz$ | Repo metadata rebuilt on every publish |
| alpine | APKINDEX\.tar\.gz$ | Package index rebuilt on every publish |
| puppet | ^v3/modules/, ^v3/releases | Module metadata changes with new releases |
| terraform | [^/]+/[^/]+/versions$ | Provider version listings grow over time |
| goproxy | @v/list$, @latest$ | Version list and latest pointer change |

These are returned by Provider.BuiltinMutablePatterns() and merged at classification time:

effective_mutable = provider.BuiltinMutablePatterns() ∪ remote.mutable_patterns

If a path matches effective_mutable → use mutable_ttl. If it matches remote.immutable_patterns → use immutable_ttl. Immutable patterns take precedence over mutable when both match.
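
The precedence rule can be sketched as below. The example uses Docker-style paths and expresses the digest exception as an immutable pattern, because Go's regexp package (RE2 syntax) does not support lookahead. The plan does not say what happens when no pattern matches, so the "unmatched" result here is an assumption, as are the names classify, matchAny and compile.

```go
package main

import (
	"fmt"
	"regexp"
)

type class string

const (
	immutable class = "immutable"
	mutable   class = "mutable"
	unmatched class = "unmatched" // assumption: the plan defines no default
)

// compile is a small helper that turns pattern strings into regexps.
func compile(patterns ...string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

func matchAny(path string, patterns []*regexp.Regexp) bool {
	for _, re := range patterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// classify merges built-in and user mutable patterns, and lets immutable
// patterns win when both sets match.
func classify(path string, builtinMutable, userMutable, immutablePatterns []*regexp.Regexp) class {
	if matchAny(path, immutablePatterns) {
		return immutable
	}
	effective := append(append([]*regexp.Regexp{}, builtinMutable...), userMutable...)
	if matchAny(path, effective) {
		return mutable
	}
	return unmatched
}

func main() {
	builtin := compile(`/manifests/[^/]+$`, `/tags/list$`)
	digests := compile(`/manifests/sha256:[0-9a-f]+$`)
	for _, p := range []string{
		"library/nginx/manifests/latest",
		"library/nginx/manifests/sha256:abc123",
		"library/nginx/blobs/sha256:abc123",
	} {
		fmt.Println(p, "=>", classify(p, builtin, nil, digests))
	}
}
```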


API Design

v1 Proxy Endpoints (Backwards Compatible)

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/remote/{name}/{path} | Proxy/cache artifact |
| GET | /api/v1/virtual/{name}/{path} | Virtual repo proxy |
| GET/HEAD | /v2/{name}/{path} | Docker Registry v2 |
| GET | /v2/ | Docker v2 ping |
| GET/PUT/HEAD/DELETE | /api/v1/local/{name}/{path} | Local repo CRUD |

v2 Management API (New)

GET    /api/v2/remotes                              → [{name, package_type, base_url, description, stats}]
GET    /api/v2/remotes/{name}                       → {full config + stats + health}
POST   /api/v2/remotes                              → create remote (Terraform provider)
PUT    /api/v2/remotes/{name}                        → update remote (Terraform provider)
DELETE /api/v2/remotes/{name}                        → delete remote — cascades artifacts, GC cleans S3

GET    /api/v2/virtuals                             → [{name, package_type, members, stats}]
GET    /api/v2/virtuals/{name}                      → {full config + member details}
POST   /api/v2/virtuals                             → create virtual
PUT    /api/v2/virtuals/{name}                       → update virtual
DELETE /api/v2/virtuals/{name}                       → delete virtual

GET    /api/v2/remotes/{name}/objects               → paginated objects
         ?q=pattern&sort=size|accessed|age&page=1&per_page=50
DELETE /api/v2/remotes/{name}/objects/{path}         → evict specific cached object
DELETE /api/v2/remotes/{name}/cache                 → flush cache
         ?type=all|indexes|blobs

GET    /api/v2/stats                                → overview stats
GET    /api/v2/stats/top-remotes                    → top remotes by size/requests/hit-rate

GET    /api/v2/health                               → {status, postgres, redis, s3, uptime}
GET    /metrics                                     → Prometheus format
GET    /api/v2/events                               → SSE stream

Proxy Engine

Request Flow

Client Request
    │
    ▼
Classify (immutable/mutable/denied)
    │
    ├── blocklist match → 403
    ├── allowlist non-empty + no match → 403
    │
    ▼
Check Redis TTL key
    │
    ├── exists (fresh) → serve from S3, log access
    │
    ├── missing (expired or uncached)
    │   │
    │   ▼
    │   Acquire fetch lock (Redis SET NX, 30s TTL)
    │   │
    │   ├── lock acquired
    │   │   ├── mutable + check_mutable + have ETag → HEAD upstream
    │   │   │   ├── 304 → refresh TTL, serve from S3
    │   │   │   └── changed → full fetch
    │   │   └── full fetch from upstream
    │   │       → provider.RewriteResponse() if needed
    │   │       → CAS store (hash → check blobs → upload if new)
    │   │       → upsert artifact in Postgres
    │   │       → set Redis TTL + release lock
    │   │       → on upstream error + stale_on_error → refresh TTL, serve stale
    │   │
    │   └── lock not acquired → poll S3 briefly, serve if another pod fetched it
    │
    ▼
Stream response from S3, log access
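
The branching in the flow above can be condensed into a pure decision function. This is only a sketch of the control flow: the request struct, its field names and the action names are hypothetical, and the real engine would obtain these inputs from Redis and Postgres.

```go
package main

import "fmt"

type action string

const (
	serveCached action = "serve-from-s3"
	revalidate  action = "head-upstream"
	fullFetch   action = "full-fetch"
	waitForPeer action = "poll-s3"
)

// request captures the inputs the flow branches on (illustrative names).
type request struct {
	fresh        bool // Redis TTL key exists
	lockAcquired bool // this pod won the fetch lock
	mutable      bool // classifier result
	checkMutable bool // remote's check_mutable setting
	haveETag     bool // an ETag from a previous fetch is known
}

// decide returns the next step for a request that passed access control.
func decide(r request) action {
	switch {
	case r.fresh:
		return serveCached
	case !r.lockAcquired:
		return waitForPeer // another pod is fetching
	case r.mutable && r.checkMutable && r.haveETag:
		return revalidate // a 304 refreshes the TTL, otherwise full fetch
	default:
		return fullFetch
	}
}

func main() {
	fmt.Println(decide(request{fresh: true}))
	fmt.Println(decide(request{lockAcquired: true, mutable: true, checkMutable: true, haveETag: true}))
	fmt.Println(decide(request{lockAcquired: true}))
	fmt.Println(decide(request{}))
}
```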

Circuit Breaker

Per-remote, tracked in Redis. Closed → Open (after N failures) → Half-open (after cooldown). Exposed via GET /api/v2/remotes/{name} health field.
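
The state transitions can be illustrated with an in-memory sketch. The plan tracks this state in Redis, so the struct below is only a stand-in, and the threshold, the cooldown value and the choice to close the breaker on the first success are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

type state string

const (
	stateClosed   state = "closed"
	stateOpen     state = "open"
	stateHalfOpen state = "half-open"
)

const demoCooldown = 30 * time.Second // illustrative value

// at returns a fixed timestamp n seconds after the Unix epoch (demo helper).
func at(n int64) time.Time { return time.Unix(n, 0) }

// breaker is an in-memory stand-in for the per-remote state kept in Redis.
type breaker struct {
	threshold int
	cooldown  time.Duration
	failures  int
	openedAt  time.Time
	st        state
}

func newBreaker(threshold int, cooldown time.Duration) *breaker {
	return &breaker{threshold: threshold, cooldown: cooldown, st: stateClosed}
}

// State reports the state at time now, moving open to half-open once the
// cooldown has elapsed.
func (b *breaker) State(now time.Time) state {
	if b.st == stateOpen && now.Sub(b.openedAt) >= b.cooldown {
		b.st = stateHalfOpen
	}
	return b.st
}

// Record updates the breaker with the outcome of one upstream request.
func (b *breaker) Record(ok bool, now time.Time) {
	if ok {
		b.failures = 0
		b.st = stateClosed
		return
	}
	b.failures++
	if b.st == stateHalfOpen || b.failures >= b.threshold {
		b.st = stateOpen
		b.openedAt = now
	}
}

func main() {
	b := newBreaker(2, demoCooldown)
	b.Record(false, at(0))
	b.Record(false, at(1))
	fmt.Println(b.State(at(10))) // within the cooldown
	fmt.Println(b.State(at(31))) // cooldown elapsed
	b.Record(true, at(32))
	fmt.Println(b.State(at(32)))
}
```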

Content-Addressable Storage

  1. Stream upstream → temp file, compute SHA256 inline
  2. Check blobs table for hash
  3. Exists → skip S3 upload, upsert artifacts row only
  4. New → upload to blobs/sha256/{hash}, insert both rows
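
The four steps above can be sketched with the standard library alone. The blobStore type is an in-memory stand-in for the blobs table and the S3 bucket, and the actual upload is replaced by a counter, so this shows the hashing and dedup logic rather than the planned storage code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// blobStore stands in for the blobs table plus the S3 bucket.
type blobStore struct {
	known   map[string]bool // content_hash values already stored
	uploads int             // number of simulated S3 uploads
}

// store spools r to a temp file while hashing it, then "uploads" only
// content that has not been seen before.
func (s *blobStore) store(r io.Reader) (key string, deduped bool, err error) {
	tmp, err := os.CreateTemp("", "artifact-*")
	if err != nil {
		return "", false, err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	h := sha256.New()
	// Step 1: write to the temp file and the hash in a single pass.
	if _, err := io.Copy(io.MultiWriter(tmp, h), r); err != nil {
		return "", false, err
	}
	hash := hex.EncodeToString(h.Sum(nil))
	key = "blobs/sha256/" + hash

	// Steps 2 and 3: a known hash skips the upload.
	if s.known[hash] {
		return key, true, nil
	}
	// Step 4: a real implementation would upload tmp to S3 here.
	s.known[hash] = true
	s.uploads++
	return key, false, nil
}

// storeString is a convenience wrapper used by the demo below.
func (s *blobStore) storeString(content string) (string, bool, error) {
	return s.store(strings.NewReader(content))
}

func main() {
	s := &blobStore{known: map[string]bool{}}
	for _, c := range []string{"hello", "hello", "world"} {
		key, deduped, err := s.storeString(c)
		if err != nil {
			panic(err)
		}
		fmt.Println(key, "deduped:", deduped)
	}
	fmt.Println("uploads:", s.uploads)
}
```

Spooling to a temp file keeps memory use flat for large artifacts, since the hash is only known once the whole body has been read.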

Garbage Collection

Background goroutine (configurable interval, default 1h):

  1. Orphaned blobs: delete S3 objects whose content_hash has no referencing artifacts or local_files rows
  2. Cold artifacts: optional per-remote, delete artifacts not accessed in N days
  3. Remote deletion: ON DELETE CASCADE handles Postgres; GC sweeps orphaned blobs
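
Orphan detection (step 1) might be expressed as a query against the schema above, with the same logic shown in memory for illustration. Both the SQL text and the function name orphans are assumptions, not the planned implementation. A real sweep would also need to guard against a blob being referenced between the query and the S3 delete.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanQuery is one possible query for unreferenced blobs, based on the
// blobs, artifacts and local_files tables in the schema.
const orphanQuery = `
SELECT b.content_hash, b.s3_key
FROM blobs b
WHERE NOT EXISTS (SELECT 1 FROM artifacts a WHERE a.content_hash = b.content_hash)
  AND NOT EXISTS (SELECT 1 FROM local_files l WHERE l.content_hash = b.content_hash)`

// orphans returns the blob hashes that neither artifacts nor local files
// reference, sorted for stable output.
func orphans(blobs, artifactRefs, localRefs []string) []string {
	referenced := make(map[string]bool)
	for _, h := range artifactRefs {
		referenced[h] = true
	}
	for _, h := range localRefs {
		referenced[h] = true
	}
	var out []string
	for _, h := range blobs {
		if !referenced[h] {
			out = append(out, h)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(orphanQuery)
	fmt.Println(orphans([]string{"c", "a", "b", "d"}, []string{"a"}, []string{"b"}))
}
```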

Package Providers

Provider Interface

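// Provider implements the protocol specifics of one package type.
type Provider interface {
    Type() models.PackageType
    // BuiltinMutablePatterns are always merged with the remote's mutable_patterns.
    BuiltinMutablePatterns() []*regexp.Regexp
    BuiltinImmutablePatterns() []*regexp.Regexp
    ContentType(path string) string
    // UpstreamURL maps a proxy request path to the upstream URL.
    UpstreamURL(remote models.Remote, path string) string
    // RewriteResponse rewrites embedded URLs so they route through the proxy.
    RewriteResponse(body []byte, remote models.Remote, proxyBaseURL string) ([]byte, error)
    AuthHeaders(ctx context.Context, remote models.Remote) (http.Header, error)
}

// IndexMerger is implemented by providers that merge virtual repo indexes.
type IndexMerger interface {
    MergeIndexes(members []MemberIndex, proxyBaseURL string) ([]byte, error)
}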

Provider Registry

var registry = map[models.PackageType]Provider{
    models.PackageGeneric:   &generic.Provider{},
    models.PackageDocker:    &docker.Provider{},
    models.PackageHelm:      &helm.Provider{},
    models.PackagePyPI:      &pypi.Provider{},
    models.PackageNPM:       &npm.Provider{},
    models.PackageRPM:       &rpm.Provider{},
    models.PackageAlpine:    &alpine.Provider{},
    models.PackagePuppet:    &puppet.Provider{},
    models.PackageTerraform: &terraform.Provider{},
    models.PackageGoProxy:   &goproxy.Provider{},
}

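// Get returns the provider registered for t, or an error for an unknown type.
func Get(t models.PackageType) (Provider, error) {
    p, ok := registry[t]
    if !ok {
        return nil, fmt.Errorf("unknown package type %v", t)
    }
    return p, nil
}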

Each provider lives in its own subpackage under internal/provider/ with its own _test.go.


Testing Strategy

Unit Tests

Every package gets _test.go files alongside the source. Run with go test ./....

| Package | What's Tested |
|---|---|
| internal/provider/docker/ | Auth token parsing/caching, manifest classification, tag banning, URL construction, blob key generation |
| internal/provider/helm/ | index.yaml parsing (using helm.sh/helm/v3/pkg/repo), URL rewriting, index merging |
| internal/provider/pypi/ | Simple index HTML parsing, URL rewriting, index merging |
| internal/provider/npm/ | Metadata JSON rewriting (dist.tarball URLs) |
| internal/provider/terraform/ | Registry URL construction, download info JSON rewriting, releases_remote URL extraction |
| internal/provider/rpm/ | Mutable pattern matching (repodata) |
| internal/provider/alpine/ | Mutable pattern matching (APKINDEX) |
| internal/provider/puppet/ | file_uri JSON rewriting |
| internal/proxy/ | Classifier (immutable vs mutable vs denied), circuit breaker state transitions, revalidator logic |
| internal/storage/ | CAS key generation, dedup detection, S3 operation mocking |
| internal/cache/ | Redis TTL set/check, fetch lock acquire/release/contention |
| internal/gc/ | Orphan detection queries, cold artifact selection |
| pkg/models/ | Model validation, PackageType enum |
| pkg/client/ | API client request/response serialization |

End-to-End Tests

Located in e2e/. Use testcontainers-go to spin up real Postgres, Redis, and MinIO containers. The test binary starts the actual artifactapi server against these backends.

// e2e/e2e_test.go
func TestMain(m *testing.M) {
    // Start postgres, redis, minio via testcontainers-go
    // Run migrations
    // Start artifactapi server on random port
    code := m.Run() // run tests
    // Tear down containers and the server
    os.Exit(code)
}

| Test File | What's Tested |
|---|---|
| e2e/proxy_test.go | Proxy a real GitHub release through generic remote, verify S3 storage, verify Redis TTL, verify Postgres artifact row, verify cache hit on second request |
| e2e/docker_test.go | Pull a real image manifest + blob through Docker v2 proxy, verify blob deduplication, tag banning |
| e2e/management_test.go | Full CRUD lifecycle: create remote via v2 API, proxy through it, list objects, evict object, flush cache, delete remote |
| e2e/virtual_test.go | Create two helm remotes + virtual, fetch merged index, verify priority ordering |
| e2e/terraform_test.go | Proxy terraform provider version listing + download info, verify URL rewriting to releases_remote |
| e2e/goproxy_test.go | Proxy Go module @v/list, .info, .mod, .zip through GOPROXY remote, verify mutable vs immutable classification |
| e2e/gc_test.go | Create artifact, delete remote, trigger GC, verify S3 blob cleaned up |

Code Quality

  • gofmt / goimports — enforced in CI, run on save
  • golangci-lint — comprehensive linter suite (staticcheck, errcheck, govet, etc.)
  • go vet ./... — run in CI
  • Makefile targets: make test, make lint, make e2e, make fmt

Terraform Provider (Separate Repo)

  • Repo: terraform-provider-artifactapi
  • Uses: pkg/client/ and pkg/models/ from the main module

provider "artifactapi" {
  endpoint = "https://artifactapi.k8s.syd1.au.unkin.net"
}

resource "artifactapi_remote" "terraform_registry" {
  name            = "terraform-registry"
  package_type    = "terraform"
  base_url        = "https://registry.terraform.io"
  description     = "Terraform provider registry"
  releases_remote = artifactapi_remote.hashicorp_releases.name

  immutable_patterns = [
    "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 300
  }
}

resource "artifactapi_remote" "hashicorp_releases" {
  name         = "hashicorp-releases"
  package_type = "generic"
  base_url     = "https://releases.hashicorp.com"

  immutable_patterns = [
    ".*\\.zip$",
    ".*SHA256SUMS(\\.sig)?$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 0
  }
}

resource "artifactapi_virtual" "helm" {
  name         = "helm"
  package_type = "helm"
  description  = "All helm repos merged"
  members      = [
    artifactapi_remote.jetstack.name,
    artifactapi_remote.hashicorp_helm.name,
  ]
}
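
To make the immutable_patterns above concrete, here is a minimal Go sketch of how such patterns could classify request paths. It assumes the patterns are applied as unanchored regular expressions against the path relative to base_url, which the plan does not state. The isImmutable helper and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the resources above, with HCL string escaping (\\.)
// reduced to plain regex escaping (\.).
var (
	registryPatterns = []*regexp.Regexp{
		regexp.MustCompile(`[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$`),
	}
	releasesPatterns = []*regexp.Regexp{
		regexp.MustCompile(`.*\.zip$`),
		regexp.MustCompile(`.*SHA256SUMS(\.sig)?$`),
	}
)

// isImmutable reports whether path matches at least one pattern.
// Assumption: a path that matches no pattern is treated as mutable.
func isImmutable(patterns []*regexp.Regexp, path string) bool {
	for _, re := range patterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical sample paths, chosen only to exercise the patterns.
	fmt.Println(isImmutable(registryPatterns, "v1/providers/hashicorp/aws/5.0.0/download/linux/amd64"))
	fmt.Println(isImmutable(registryPatterns, "v1/providers/hashicorp/aws/versions"))
	fmt.Println(isImmutable(releasesPatterns, "terraform/1.9.0/terraform_1.9.0_linux_amd64.zip"))
	fmt.Println(isImmutable(releasesPatterns, "terraform/index.json"))
}
```

Under these assumptions, the download path and the .zip file would be classified as immutable, while the version listing and index.json would fall through to the mutable TTL.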

Web UI (React + Vite — Separate Container)

Deployment

Separate Dockerfile.ui: multi-stage build (node → nginx). Served as its own container/pod. nginx proxies /api/* to the Go backend.

Pages

| Route | Content |
| --- | --- |
| / | Dashboard: total objects, storage used, dedup savings, bandwidth saved, top remotes chart, live SSE event feed, health indicators |
| /remotes | Remote table: name, type, description, object count, size, hit rate, health. Filter by type, sort any column |
| /remotes/:name | Config (read-only, "Managed by Terraform" badge), stats, object browser with search/sort/evict, flush actions |
| /virtuals | Virtual table: name, type, members, merged object count |
| /virtuals/:name | Member list with individual stats |

All config is read-only — managed by Terraform.


TUI (Bubble Tea — Subcommand)

Run artifactapi tui --endpoint http://localhost:8000, or set the ARTIFACTAPI_ENDPOINT environment variable instead of passing the flag.

Uses pkg/client/ for all API calls (same client as Terraform provider).
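
A minimal sketch of the endpoint lookup might look like the following. It assumes the --endpoint flag takes precedence over ARTIFACTAPI_ENDPOINT and that a missing endpoint is an error; neither is specified in the plan. resolveEndpoint is a hypothetical helper, not part of pkg/client/.

```go
package main

import (
	"fmt"
	"os"
)

const envEndpoint = "ARTIFACTAPI_ENDPOINT"

// resolveEndpoint picks the API endpoint for the TUI.
// Assumed precedence: explicit flag value first, then the environment.
// getenv is injected so the lookup can be exercised without touching the
// real process environment.
func resolveEndpoint(flagValue string, getenv func(string) string) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	if v := getenv(envEndpoint); v != "" {
		return v, nil
	}
	return "", fmt.Errorf("no endpoint: pass --endpoint or set %s", envEndpoint)
}

func main() {
	endpoint, err := resolveEndpoint("", os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using endpoint:", endpoint)
}
```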

| View | Content and key bindings |
| --- | --- |
| Dashboard | summary stats, top remotes |
| Remotes list | j/k navigate, / filter, Enter detail |
| Remote detail | config + stats, Enter → object browser |
| Object browser | / search, d evict, f flush |
| Virtuals | j/k, Enter detail |

Improvements Over v2

| Area | v2 (Python) | v3 (Go) |
| --- | --- | --- |
| S3 paths | Hashed, opaque | Content-addressed CAS |
| Config | YAML files, mtime reload | Terraform via API |
| Package types | 8 types | 10 types (+ terraform, goproxy) |
| Virtual repos | Helm only | Helm + PyPI, extensible |
| Deduplication | Docker blobs only | All types via CAS |
| Revalidation | Opt-in flag | Default for all mutable |
| Access logging | None | Per-artifact in Postgres |
| GC | None | Background goroutine |
| Upstream health | Per-request | Circuit breaker |
| S3 backends | MinIO only | MinIO, Ceph, AWS (minio-go) |
| UI | None | Web dashboard + TUI |
| Binary | Python + venv | Static Go binary |
| Frontend | N/A | Separate container (React) |
| Testing | Mocked unit tests | Unit + e2e with real backends |

Implementation Phases

Phase 1: Core Engine + Models

  • Go module, Makefile (make build test lint fmt e2e), Dockerfile, docker-compose
  • pkg/models/ — all domain types
  • PostgreSQL schema + migrations
  • S3 storage layer with CAS (minio-go/v7)
  • Redis cache layer (TTL, locks)
  • Proxy engine: fetch-or-cache, classifier, revalidator
  • Generic + Docker providers (most complexity: OCI auth, CAS, tag banning)
  • Health + metrics endpoints
  • Unit tests for all packages
  • Milestone: proxy Docker + generic, cache in S3, track in Postgres

Phase 2: All Providers

  • Helm (using helm.sh/helm/v3/pkg/repo)
  • PyPI (using golang.org/x/net/html)
  • npm (stdlib encoding/json)
  • RPM (using rs3.io/go/rpm/repomd)
  • Alpine (using gitlab.alpinelinux.org/alpine/go)
  • Puppet Forge (stdlib encoding/json)
  • Terraform (using hashicorp/terraform-registry-address)
  • Go Modules / GOPROXY (using github.com/goproxy/goproxy)
  • Unit tests per provider
  • Milestone: feature parity with v2 + terraform + goproxy

Phase 3: Management API + Virtual Repos + GC

  • pkg/client/ — shared Go API client
  • v2 CRUD endpoints
  • Virtual repo engine: IndexMerger for Helm + PyPI
  • Circuit breaker
  • Access logging middleware
  • GC goroutine
  • Milestone: full API, virtuals, GC

Phase 4: End-to-End Tests

  • e2e/ test suite with testcontainers-go
  • Proxy, Docker, management, virtual, terraform, goproxy, GC tests
  • CI pipeline: make e2e
  • Milestone: comprehensive e2e coverage

Phase 5: Terraform Provider

  • Separate repo: terraform-provider-artifactapi
  • Imports pkg/client/ and pkg/models/
  • artifactapi_remote + artifactapi_virtual resources + data sources
  • Import support
  • Milestone: manage all config via Terraform

Phase 6: Web UI

  • React + Vite in ui/
  • Dockerfile.ui (multi-stage → nginx)
  • Dashboard, remotes, objects, virtuals pages
  • SSE event feed
  • Milestone: full web UI in separate container

Phase 7: TUI

  • Bubble Tea in internal/tui/
  • Uses pkg/client/
  • Dashboard, remotes, objects, virtuals views
  • Milestone: TUI feature parity with web UI

Phase 8: Migration + Cutover

  • Migration tool: v2 YAML → Terraform HCL + terraform import commands
  • S3 rehash script: {remote}/{hash16}/{file} → blobs/sha256/{content_hash}
  • Parallel run, response comparison
  • Cutover
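
The target key layout for the rehash step can be sketched in a few lines of Go. This assumes {content_hash} is the lowercase hex SHA-256 of the blob's bytes, which matches the blobs/sha256/ prefix but is not stated explicitly. casKey is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// casKey returns the content-addressed object key for a blob, following the
// blobs/sha256/{content_hash} layout from the plan.
// Assumption: the hash is rendered as lowercase hex.
func casKey(content []byte) string {
	sum := sha256.Sum256(content)
	return "blobs/sha256/" + hex.EncodeToString(sum[:])
}

func main() {
	// Identical bytes map to the same key regardless of which remote they
	// came from, which is what allows deduplication across remotes.
	a := casKey([]byte("hello"))
	b := casKey([]byte("hello"))
	fmt.Println(a)
	fmt.Println("same key:", a == b)
}
```

A rehash script built on this would read each object under the old {remote}/{hash16}/{file} key, compute its casKey, and copy it to the new location only if that key does not already exist.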

Makefile Targets

.PHONY: build test lint fmt e2e docker docker-ui compose

build:                  ## Build Go binary
	go build -o bin/artifactapi ./cmd/artifactapi

test:                   ## Run unit tests
	go test ./...

lint:                   ## Run golangci-lint + go vet
	golangci-lint run ./...
	go vet ./...

fmt:                    ## Format code (gofmt + goimports)
	gofmt -w .
	goimports -w .

e2e:                    ## Run end-to-end tests (requires Docker)
	go test -tags=e2e -count=1 -timeout=5m ./e2e/...

docker:                 ## Build API server Docker image
	docker build -t artifactapi .

docker-ui:              ## Build frontend Docker image
	docker build -t artifactapi-ui -f ui/Dockerfile.ui ui/

compose:                ## Start full stack (API + UI + Postgres + Redis + MinIO)
	docker compose up -d