# ArtifactAPI v3 — Go Rewrite Plan

## Context

ArtifactAPI is a production artifact proxy/cache serving ~42 remotes (Docker registries, Helm repos, RPM/Alpine repos, GitHub releases, PyPI, npm, Puppet Forge, Terraform registries, Go module proxies) across a Kubernetes cluster. The current Python (FastAPI) implementation works but has architectural debt: opaque hashed S3 paths, no UI for visibility, YAML config files that drift, no garbage collection, no access logging, and virtual repos limited to Helm only.

The v3 rewrite targets: a single Go binary (API + TUI), a separate React frontend (own Dockerfile), a Terraform provider (separate repo), content-addressable storage, and a cleaner data model that makes the cache inspectable and manageable.

**Repo**: Same repo (`git.unkin.net/unkin/artifactapi`), new branch.
**Module**: `git.unkin.net/unkin/artifactapi`
**Frontend**: React + Vite, separate Dockerfile, talks to API
**Terraform provider**: Separate repo (`terraform-provider-artifactapi`)

---

## Architecture Overview

```
┌───────────────────────────────────┐    ┌──────────────────────┐
│  Go Binary (API + TUI)            │    │  Frontend Container  │
│                                   │    │                      │
│  ┌──────────┐  ┌───────────────┐  │    │  React + Vite SPA    │
│  │ REST API │  │ Proxy Engine  │  │◄───│  nginx / node serve  │
│  │ /api/v2  │  │ /api/v1/...   │  │    │  Dockerfile.ui       │
│  │          │  │ /v2/... (OCI) │  │    └──────────────────────┘
│  └────┬─────┘  └──────┬────────┘  │
│       │               │           │    ┌──────────────────────┐
│  ┌────┴───────────────┴────────┐  │    │  Terraform Provider  │
│  │  Data Layer                 │  │◄───│  (separate repo)     │
│  │  PostgreSQL · Redis · S3    │  │    └──────────────────────┘
│  └─────────────────────────────┘  │
│                                   │    ┌──────────────────────┐
│  ┌─────────────────────────────┐  │    │  TUI (subcommand)    │
│  │  artifactapi tui            │──│───►│  artifactapi tui     │
│  └─────────────────────────────┘  │    │  --endpoint <url>    │
└───────────────────────────────────┘    └──────────────────────┘
```

Three independent deployment units:
1. **Go binary**: API server + TUI subcommand (single `Dockerfile`)
2. **React frontend**: SPA served by nginx (`Dockerfile.ui`), talks to `/api/v2`
3. **Terraform provider**: separate repo, calls `/api/v2` CRUD

---

## Project Structure (Modular)

```
artifactapi/
├── cmd/
│   └── artifactapi/
│       └── main.go              # entrypoint: serve / tui subcommands
│
├── pkg/                         # PUBLIC: importable by terraform provider, CLI tools
│   ├── models/                  # shared domain types
│   │   ├── remote.go            # Remote, RemoteConfig, PackageType enum
│   │   ├── virtual.go           # Virtual, VirtualConfig
│   │   ├── artifact.go          # Artifact, Blob, AccessLogEntry
│   │   ├── local.go             # LocalFile, LocalRepo
│   │   └── stats.go             # RemoteStats, OverviewStats
│   └── client/                  # typed Go API client (used by TUI + Terraform provider)
│       ├── client.go            # Client struct, base HTTP
│       ├── remotes.go           # remote CRUD methods
│       ├── virtuals.go          # virtual CRUD methods
│       ├── objects.go           # object browse/evict methods
│       └── stats.go             # stats methods
│
├── internal/                    # PRIVATE: server internals
│   ├── server/
│   │   ├── server.go            # HTTP server setup, router
│   │   └── middleware.go        # logging, recovery, request-id, access logging
│   │
│   ├── api/
│   │   ├── v1/                  # proxy endpoints (v1 compat)
│   │   │   ├── proxy.go         # GET /api/v1/remote/{name}/{path}
│   │   │   ├── docker.go        # /v2/{name}/{path}
│   │   │   ├── virtual.go       # GET /api/v1/virtual/{name}/{path}
│   │   │   └── local.go         # CRUD /api/v1/local/{name}/{path}
│   │   └── v2/                  # management API
│   │       ├── remotes.go       # CRUD + stats
│   │       ├── virtuals.go      # CRUD
│   │       ├── objects.go       # browse/evict cached objects
│   │       ├── stats.go         # overview, top-remotes
│   │       ├── events.go        # SSE stream
│   │       └── health.go        # health, metrics
│   │
│   ├── provider/                # package-type providers (registry protocol handlers)
│   │   ├── provider.go          # Provider interface + registry
│   │   ├── generic/
│   │   │   ├── generic.go
│   │   │   └── generic_test.go
│   │   ├── docker/
│   │   │   ├── docker.go        # OCI Distribution v2 via go-containerregistry
│   │   │   ├── auth.go          # Bearer token fetch + cache
│   │   │   └── docker_test.go
│   │   ├── helm/
│   │   │   ├── helm.go          # index rewriting via helm.sh/helm/v3/pkg/repo
│   │   │   ├── merger.go        # virtual index merge
│   │   │   └── helm_test.go
│   │   ├── pypi/
│   │   │   ├── pypi.go          # simple index HTML rewriting
│   │   │   ├── merger.go        # virtual simple index merge
│   │   │   └── pypi_test.go
│   │   ├── npm/
│   │   │   ├── npm.go           # metadata JSON rewriting
│   │   │   └── npm_test.go
│   │   ├── rpm/
│   │   │   ├── rpm.go           # repodata patterns
│   │   │   └── rpm_test.go
│   │   ├── alpine/
│   │   │   ├── alpine.go        # APKINDEX patterns
│   │   │   └── alpine_test.go
│   │   ├── puppet/
│   │   │   ├── puppet.go        # file_uri JSON rewriting
│   │   │   └── puppet_test.go
│   │   ├── terraform/
│   │   │   ├── terraform.go     # registry protocol, download URL rewriting
│   │   │   └── terraform_test.go
│   │   └── goproxy/
│   │       ├── goproxy.go       # Go module proxy protocol (GOPROXY)
│   │       └── goproxy_test.go
│   │
│   ├── proxy/
│   │   ├── engine.go            # core fetch-or-cache logic
│   │   ├── engine_test.go
│   │   ├── classifier.go        # immutable vs mutable classification
│   │   ├── classifier_test.go
│   │   ├── revalidator.go       # conditional HEAD requests (ETag/Last-Modified)
│   │   └── circuit.go           # per-remote circuit breaker
│   │
│   ├── storage/
│   │   ├── s3.go                # S3 client (minio-go; works with MinIO, Ceph, AWS)
│   │   ├── s3_test.go
│   │   ├── cas.go               # content-addressable store logic
│   │   └── cas_test.go
│   │
│   ├── cache/
│   │   ├── redis.go             # TTL management, fetch locks
│   │   ├── redis_test.go
│   │   └── lock.go              # distributed lock abstraction
│   │
│   ├── database/
│   │   ├── postgres.go          # connection pool, migration runner
│   │   ├── queries/             # SQL query files or sqlc-generated code
│   │   │   ├── remotes.sql.go
│   │   │   ├── virtuals.sql.go
│   │   │   ├── artifacts.sql.go
│   │   │   └── access_log.sql.go
│   │   └── migrations/          # golang-migrate SQL files
│   │       ├── 001_initial.up.sql
│   │       └── 001_initial.down.sql
│   │
│   ├── metrics/
│   │   └── prometheus.go        # counters, gauges, histograms
│   │
│   ├── gc/
│   │   ├── gc.go                # background garbage collection goroutine
│   │   └── gc_test.go
│   │
│   ├── tui/
│   │   ├── app.go               # Bubble Tea main model
│   │   ├── views/
│   │   │   ├── dashboard.go
│   │   │   ├── remotes.go
│   │   │   ├── objects.go
│   │   │   └── virtuals.go
│   │   └── components/
│   │       ├── table.go
│   │       └── statusbar.go
│   │
│   └── config/
│       └── env.go               # environment variable parsing + validation
│
├── ui/                          # React frontend: SEPARATE DOCKERFILE
│   ├── src/
│   │   ├── App.tsx
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Remotes.tsx
│   │   │   ├── RemoteDetail.tsx
│   │   │   ├── Virtuals.tsx
│   │   │   └── Objects.tsx
│   │   ├── components/
│   │   │   ├── RemoteTable.tsx
│   │   │   ├── ObjectBrowser.tsx
│   │   │   ├── StatsCard.tsx
│   │   │   └── EventFeed.tsx
│   │   └── api/
│   │       └── client.ts        # typed API client
│   ├── package.json
│   ├── vite.config.ts
│   ├── tsconfig.json
│   ├── Dockerfile.ui            # multi-stage: node build → nginx
│   └── nginx.conf               # proxy /api/* to backend, serve SPA
│
├── e2e/                         # end-to-end integration tests
│   ├── e2e_test.go              # TestMain spins up docker-compose stack
│   ├── proxy_test.go            # proxy through real remotes
│   ├── docker_test.go           # Docker v2 protocol e2e
│   ├── management_test.go       # v2 API CRUD
│   ├── virtual_test.go          # virtual repo merge e2e
│   └── docker-compose.e2e.yml   # postgres + redis + minio for tests
│
├── go.mod
├── go.sum
├── Makefile
├── Dockerfile                   # Go binary (API server + TUI)
├── Dockerfile.ui                # symlink or copy → ui/Dockerfile.ui
└── docker-compose.yml
```

### Key Modularisation Decisions

- **`pkg/models/`**: Shared domain types importable by the Terraform provider and any external tooling. No dependencies on internal packages
- **`pkg/client/`**: Typed Go API client used by both the TUI and the Terraform provider. Depends only on `pkg/models/` and stdlib
- **`internal/provider/`**: Each package type is its own subpackage with isolated tests. A provider registry maps `PackageType → Provider`
- **`internal/database/queries/`**: Either type-safe query functions generated from SQL with [sqlc](https://sqlc.dev/), or hand-written query files
- **`e2e/`**: Separate test binary that spins up a real docker-compose stack

---

## Go Ecosystem Libraries

Prefer existing, maintained Go modules over writing protocol handlers from scratch.

### Package-Type Libraries

| Package Type | Go Module | What It Gives Us |
|---|---|---|
| **Docker/OCI** | `github.com/google/go-containerregistry` | Full Registry v2/OCI client: manifest parsing, auth challenges, blob operations. `pkg/registry` can implement a v2 server. Reference: `github.com/regclient/regclient` |
| **Helm** | `helm.sh/helm/v3/pkg/repo` | Parse/generate `index.yaml`, `IndexFile`/`ChartVersion` types, URL entries. Used directly for merge |
| **Terraform** | `github.com/hashicorp/terraform-registry-address` | Provider/module address parsing, `ForRegistryProtocol()` URL generation. Protocol spec: provider registry protocol v1 |
| **Go Modules** | `github.com/goproxy/goproxy` | Minimalist GOPROXY protocol handler, implements full spec as `http.Handler`. Handles `/@v/list`, `/@v/{v}.info`, `/@v/{v}.mod`, `/@v/{v}.zip`, `/@latest` |
| **RPM** | `rs3.io/go/rpm/repomd` | Parse `repomd.xml`, `primary.xml` with proper XML namespace handling |
| **Alpine** | `gitlab.alpinelinux.org/alpine/go` | Official Alpine library: parse APKINDEX, `.apk` files |
| **PyPI** | `golang.org/x/net/html` | No dedicated Go PyPI library exists. Parse simple index HTML with `x/net/html` (a `golang.org/x` module, not part of the standard library) and extract `<a>` tags. Minimal: the rewriting is just href replacement |
| **npm** | stdlib `encoding/json` | npm metadata is JSON: parse with stdlib, rewrite `dist.tarball` URLs. No special library needed |
| **Puppet Forge** | stdlib `encoding/json` | Forge API is JSON: parse and rewrite `file_uri` fields. Community lib `github.com/johnmccabe/go-puppetforge` exists but is thin; stdlib suffices |

### Infrastructure Libraries

| Purpose | Go Module | Why This One |
|---|---|---|
| **HTTP router** | `github.com/go-chi/chi/v5` | Lightweight, stdlib `http.Handler` compatible, middleware chain |
| **PostgreSQL** | `github.com/jackc/pgx/v5` | Pure Go, connection pooling, COPY support, prepared statements |
| **SQL generation** | `github.com/sqlc-dev/sqlc` | Generate type-safe Go from SQL queries: no ORM, no reflection |
| **Redis** | `github.com/redis/go-redis/v9` | Full Redis client, pipelining, pub/sub |
| **S3 (MinIO/Ceph/AWS)** | `github.com/minio/minio-go/v7` | Native S3-compatible client. Works with MinIO, Ceph RGW, AWS S3, any S3-compatible backend out of the box. Lighter than aws-sdk-go-v2, purpose-built for S3 compat |
| **DB migrations** | `github.com/golang-migrate/migrate/v4` | SQL file-based migrations, CLI + library |
| **Prometheus** | `github.com/prometheus/client_golang` | Counters, gauges, histograms |
| **TUI** | `github.com/charmbracelet/bubbletea` | Elm-architecture TUI framework |
| **TUI styling** | `github.com/charmbracelet/lipgloss` | Terminal styling |
| **TUI components** | `github.com/charmbracelet/bubbles` | Table, text input, spinner, etc. |
| **Structured logging** | `log/slog` (stdlib) | Go 1.21+ structured logging, zero dependencies |
| **Testing** | `github.com/stretchr/testify` | Assertions + require for unit tests |
| **Test containers** | `github.com/testcontainers/testcontainers-go` | Spin up Postgres/Redis/MinIO in e2e tests |

### S3 Client: Multi-Backend Support

`minio-go/v7` is the S3 client because it natively supports:
- **MinIO**: primary development/production target
- **Ceph RGW**: S3-compatible via endpoint config
- **AWS S3**: via region + credential config
- **Any S3-compatible**: GCS (interop mode), Wasabi, DigitalOcean Spaces, etc.

No abstraction layer is needed because `minio-go` handles endpoint differences internally. Config:
```go
// Fragment: assumes minio-go/v7 and its pkg/credentials are imported.
client, err := minio.New(endpoint, &minio.Options{
	Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
	Secure: useTLS,
	Region: region, // optional, for AWS
})
if err != nil {
	return fmt.Errorf("create S3 client: %w", err)
}
```

---

## Data Layer

### PostgreSQL Schema

```sql
-- Remotes: managed exclusively by Terraform
CREATE TABLE remotes (
    name               TEXT PRIMARY KEY,
    package_type       TEXT NOT NULL,  -- generic, docker, helm, pypi, npm, rpm, alpine, puppet, terraform, goproxy
    base_url           TEXT NOT NULL,
    description        TEXT DEFAULT '',
    username           TEXT DEFAULT '',
    password           TEXT DEFAULT '',
    immutable_ttl      INTEGER DEFAULT 0,
    mutable_ttl        INTEGER DEFAULT 3600,
    check_mutable      BOOLEAN DEFAULT TRUE,
    immutable_patterns TEXT[] DEFAULT '{}',  -- user-defined immutable patterns
    mutable_patterns   TEXT[] DEFAULT '{}',  -- user-defined mutable patterns (merged with provider built-ins)
    allowlist          TEXT[] DEFAULT '{}',  -- if empty, allow all paths; if non-empty, only matching paths proxied
    blocklist          TEXT[] DEFAULT '{}',  -- always denied, checked before allowlist
    ban_tags_enabled   BOOLEAN DEFAULT FALSE,
    ban_tags           TEXT[] DEFAULT '{}',
    quarantine_enabled BOOLEAN DEFAULT FALSE,
    quarantine_days    INTEGER DEFAULT 3,
    stale_on_error     BOOLEAN DEFAULT TRUE,
    releases_remote    TEXT DEFAULT '',  -- terraform type: name of CDN remote for download URL rewriting
    managed_by         TEXT DEFAULT '',  -- 'terraform' or empty
    created_at         TIMESTAMPTZ DEFAULT NOW(),
    updated_at         TIMESTAMPTZ DEFAULT NOW()
);

-- Virtual repositories
CREATE TABLE virtuals (
    name         TEXT PRIMARY KEY,
    package_type TEXT NOT NULL,
    description  TEXT DEFAULT '',
    members      TEXT[] NOT NULL,
    managed_by   TEXT DEFAULT '',
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    updated_at   TIMESTAMPTZ DEFAULT NOW()
);

-- Content-addressable blob storage tracking
CREATE TABLE blobs (
    content_hash TEXT PRIMARY KEY,
    s3_key       TEXT NOT NULL,
    size_bytes   BIGINT NOT NULL,
    content_type TEXT DEFAULT 'application/octet-stream',
    created_at   TIMESTAMPTZ DEFAULT NOW()
);

-- Artifact metadata: maps (remote, path) → content blob
CREATE TABLE artifacts (
    id                     BIGSERIAL PRIMARY KEY,
    remote_name            TEXT NOT NULL REFERENCES remotes(name) ON DELETE CASCADE,
    path                   TEXT NOT NULL,
    content_hash           TEXT NOT NULL REFERENCES blobs(content_hash),
    upstream_etag          TEXT DEFAULT '',
    upstream_last_modified TIMESTAMPTZ,
    first_seen_at          TIMESTAMPTZ DEFAULT NOW(),
    last_fetched_at        TIMESTAMPTZ DEFAULT NOW(),
    last_accessed_at       TIMESTAMPTZ DEFAULT NOW(),
    fetch_count            BIGINT DEFAULT 1,
    access_count           BIGINT DEFAULT 1,
    UNIQUE(remote_name, path)
);

CREATE INDEX idx_artifacts_remote ON artifacts(remote_name);
CREATE INDEX idx_artifacts_last_accessed ON artifacts(last_accessed_at);

-- Local file uploads
CREATE TABLE local_files (
    id           BIGSERIAL PRIMARY KEY,
    repo_name    TEXT NOT NULL,
    file_path    TEXT NOT NULL,
    content_hash TEXT NOT NULL REFERENCES blobs(content_hash),
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(repo_name, file_path)
);

-- Access log (append-only, powers dashboards)
CREATE TABLE access_log (
    id          BIGSERIAL PRIMARY KEY,
    remote_name TEXT NOT NULL,
    path        TEXT NOT NULL,
    cache_hit   BOOLEAN NOT NULL,
    size_bytes  BIGINT DEFAULT 0,
    upstream_ms INTEGER DEFAULT 0,
    client_ip   TEXT DEFAULT '',
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_access_log_remote_time ON access_log(remote_name, created_at);
```

### Redis Usage (Ephemeral Only)

| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| `ttl:{remote}:{path}` | STRING | remote's immutable/mutable TTL | Artifact freshness: the key existing means the artifact is still fresh |
| `lock:{remote}:{path}` | STRING (NX) | 30s | Fetch lock: prevents thundering herd |
| `etag:{remote}:{path}` | STRING | same as TTL key | Cached ETag for conditional revalidation |
| `circuit:{remote}` | STRING | configurable | Circuit breaker: consecutive failure count |

Losing Redis means every TTL key is gone, so the next request for each artifact revalidates against upstream. No data is lost.

### S3 Layout (Content-Addressable)

```
artifacts-bucket/
├── blobs/sha256/{content_hash}   # immutable CAS blobs
├── indexes/{remote}/{path}       # mutable index files (helm, pypi, rpm, etc.)
├── indexes/{virtual}/{path}      # merged virtual indexes
└── local/{repo}/{path}           # user uploads (CAS-backed via blobs table)
```

---

## Terraform Remote Type (New in v2)

The `terraform` package type proxies the Terraform Provider Registry Protocol:

- **URL construction**: prepends `/v1/providers/` to request paths
- **Built-in mutable pattern**: `[^/]+/[^/]+/versions$` (version listings change over time)
- **Built-in immutable pattern**: `[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$` (per-version download info is fixed)
- **Response rewriting**: in the download info JSON, rewrites `download_url`, `shasums_url`, and `shasums_signature_url` to route through a companion `releases_remote` (e.g., a `hashicorp-releases` generic remote)
- **Config**: requires a `releases_remote` field pointing to the CDN remote that serves the actual binaries

Uses `github.com/hashicorp/terraform-registry-address` for address parsing and protocol-compliant URL generation.

---

## Go Module Proxy Remote Type (New)

The `goproxy` package type implements the GOPROXY protocol (Go module proxy):

| Endpoint | Mutability | Description |
|---|---|---|
| `{module}/@v/list` | Mutable | Plain text list of known versions |
| `{module}/@latest` | Mutable | JSON metadata for latest version |
| `{module}/@v/{version}.info` | Immutable | JSON version metadata (`Version`, `Time`) |
| `{module}/@v/{version}.mod` | Immutable | `go.mod` file for that version |
| `{module}/@v/{version}.zip` | Immutable | Source archive for that version |

- **No URL rewriting needed**: responses are self-contained (no embedded URLs)
- **Config**: `base_url` points to upstream proxy (e.g., `https://proxy.golang.org`)
- **Client usage**: set `GOPROXY=https://artifactapi.example.com/api/v1/remote/goproxy`
- Uses `github.com/goproxy/goproxy` for protocol handling

---

## Allowlist / Blocklist / Automatic Mutable Patterns

### Access Control (Per-Remote)

| Field | Default | Behavior |
|---|---|---|
| `blocklist` | `[]` (empty) | If a path matches any blocklist pattern → **403 Forbidden**. Checked first |
| `allowlist` | `[]` (empty) | If empty → **allow everything**. If non-empty → only matching paths are proxied; everything else → **403** |

Evaluation order: blocklist → allowlist → proxy. With both lists empty (the default), the remote is an open proxy.

### Automatic Mutable Patterns (Per-Provider Built-ins)

Each provider declares built-in mutable patterns that are **always merged** with user-defined `mutable_patterns`. Users never need to configure these, because the provider knows which paths change over time.

| Provider | Built-in Mutable Patterns | Rationale |
|---|---|---|
| **generic** | *(none)* | No convention for what's mutable |
| **docker** | `/manifests/[^/]+$` unless the reference starts with `sha256:`, `/tags/list$` | Tag manifests change; digest manifests don't. Go's `regexp` package (RE2) does not support the negative lookahead `(?!sha256:)`, so the digest exclusion has to be a separate check |
| **helm** | `index\.yaml$` | Chart index changes when new charts are published |
| **pypi** | `simple/` | Package index pages change with new releases |
| **npm** | `^[^/]+$` (package metadata, not `.tgz`) | Package metadata changes; tarballs are immutable |
| **rpm** | `repomd\.xml$`, `repodata/.*`, `Packages\.gz$` | Repo metadata rebuilt on every publish |
| **alpine** | `APKINDEX\.tar\.gz$` | Package index rebuilt on every publish |
| **puppet** | `^v3/modules/`, `^v3/releases` | Module metadata changes with new releases |
| **terraform** | `[^/]+/[^/]+/versions$` | Provider version listings grow over time |
| **goproxy** | `@v/list$`, `@latest$` | Version list and latest pointer change |

These are returned by `Provider.BuiltinMutablePatterns()` and merged at classification time:
```
effective_mutable = provider.BuiltinMutablePatterns() ∪ remote.mutable_patterns
```

A path that matches `remote.immutable_patterns` uses `immutable_ttl`; otherwise a path that matches `effective_mutable` uses `mutable_ttl`. Immutable patterns take precedence when both match.

---

## API Design

### v1 Proxy Endpoints (Backwards Compatible)

| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/remote/{name}/{path}` | Proxy/cache artifact |
| `GET` | `/api/v1/virtual/{name}/{path}` | Virtual repo proxy |
| `GET/HEAD` | `/v2/{name}/{path}` | Docker Registry v2 |
| `GET` | `/v2/` | Docker v2 ping |
| `GET/PUT/HEAD/DELETE` | `/api/v1/local/{name}/{path}` | Local repo CRUD |

### v2 Management API (New)

```
GET    /api/v2/remotes                         → [{name, package_type, base_url, description, stats}]
GET    /api/v2/remotes/{name}                  → {full config + stats + health}
POST   /api/v2/remotes                         → create remote (Terraform provider)
PUT    /api/v2/remotes/{name}                  → update remote (Terraform provider)
DELETE /api/v2/remotes/{name}                  → delete remote (cascades artifacts, GC cleans S3)

GET    /api/v2/virtuals                        → [{name, package_type, members, stats}]
GET    /api/v2/virtuals/{name}                 → {full config + member details}
POST   /api/v2/virtuals                        → create virtual
PUT    /api/v2/virtuals/{name}                 → update virtual
DELETE /api/v2/virtuals/{name}                 → delete virtual

GET    /api/v2/remotes/{name}/objects          → paginated objects
         ?q=pattern&sort=size|accessed|age&page=1&per_page=50
DELETE /api/v2/remotes/{name}/objects/{path}   → evict specific cached object
DELETE /api/v2/remotes/{name}/cache            → flush cache
         ?type=all|indexes|blobs

GET    /api/v2/stats                           → overview stats
GET    /api/v2/stats/top-remotes               → top remotes by size/requests/hit-rate

GET    /api/v2/health                          → {status, postgres, redis, s3, uptime}
GET    /metrics                                → Prometheus format
GET    /api/v2/events                          → SSE stream
```

---

## Proxy Engine

### Request Flow

```
Client Request
    │
    ▼
Classify (immutable/mutable/denied)
    │
    ├── blocklist match → 403
    ├── allowlist non-empty + no match → 403
    │
    ▼
Check Redis TTL key
    │
    ├── exists (fresh) → serve from S3, log access
    │
    ├── missing (expired or uncached)
    │     │
    │     ▼
    │   Acquire fetch lock (Redis SETNX, 30s TTL)
    │     │
    │     ├── lock acquired
    │     │     ├── mutable + check_mutable + have ETag → HEAD upstream
    │     │     │     ├── 304 → refresh TTL, serve from S3
    │     │     │     └── changed → full fetch
    │     │     └── full fetch from upstream
    │     │           → provider.RewriteResponse() if needed
    │     │           → CAS store (hash → check blobs → upload if new)
    │     │           → upsert artifact in Postgres
    │     │           → set Redis TTL + release lock
    │     │           → on upstream error + stale_on_error → refresh TTL, serve stale
    │     │
    │     └── lock not acquired → poll S3 briefly, serve if another pod fetched it
    │
    ▼
Stream response from S3, log access
```

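The branching in the flow above, after access control has passed, can be reduced to a pure decision function. The sketch below is an illustration only: the `cacheState` struct, the `nextStep` function, and the step names are hypothetical and cover just the choice of branch, not the I/O behind each one.

```go
package main

import "fmt"

// cacheState captures the inputs the flow branches on.
type cacheState struct {
	Fresh        bool // Redis TTL key exists
	LockAcquired bool // this pod won the fetch lock
	Mutable      bool // path classified as mutable
	CheckMutable bool // remote has check_mutable enabled
	HaveETag     bool // an ETag is cached for conditional revalidation
}

type step string

const (
	ServeCached     step = "serve-from-s3"
	PollForPeer     step = "poll-s3-for-peer-fetch"
	ConditionalHead step = "conditional-head-upstream"
	FullFetch       step = "full-fetch"
)

// nextStep mirrors the diagram: fresh entries are served directly, lock
// losers wait for the winner, and lock winners revalidate when they can.
func nextStep(s cacheState) step {
	switch {
	case s.Fresh:
		return ServeCached
	case !s.LockAcquired:
		return PollForPeer
	case s.Mutable && s.CheckMutable && s.HaveETag:
		return ConditionalHead
	default:
		return FullFetch
	}
}

func main() {
	fmt.Println(nextStep(cacheState{Fresh: true}))
	fmt.Println(nextStep(cacheState{LockAcquired: true, Mutable: true, CheckMutable: true, HaveETag: true}))
	fmt.Println(nextStep(cacheState{LockAcquired: true}))
	fmt.Println(nextStep(cacheState{}))
}
```
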
### Circuit Breaker

Each remote has its own circuit breaker, tracked in Redis. It moves from Closed to Open after N consecutive failures, then to Half-open after a cooldown. The current state is exposed in the health field of `GET /api/v2/remotes/{name}`.

### Content-Addressable Storage

1. Stream upstream → temp file, compute SHA256 inline
2. Check `blobs` table for hash
3. Exists → skip S3 upload, upsert `artifacts` row only
4. New → upload to `blobs/sha256/{hash}`, insert both rows

### Garbage Collection

Background goroutine (configurable interval, default 1h):
1. Orphaned blobs: delete S3 objects whose `content_hash` has no referencing `artifacts` or `local_files` rows
2. Cold artifacts: optional per-remote, delete artifacts not accessed in N days
3. Remote deletion: `ON DELETE CASCADE` handles Postgres; GC sweeps orphaned blobs

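The orphan rule in step 1 is a set difference. The sketch below shows it over in-memory slices; in the real service this would presumably be a SQL query against `blobs`, `artifacts`, and `local_files`, and the function name `orphanedBlobs` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedBlobs returns, in sorted order, every blob hash that is referenced
// by neither an artifacts row nor a local_files row.
func orphanedBlobs(blobs, artifactRefs, localFileRefs []string) []string {
	referenced := make(map[string]struct{}, len(artifactRefs)+len(localFileRefs))
	for _, h := range artifactRefs {
		referenced[h] = struct{}{}
	}
	for _, h := range localFileRefs {
		referenced[h] = struct{}{}
	}

	var orphans []string
	for _, h := range blobs {
		if _, ok := referenced[h]; !ok {
			orphans = append(orphans, h)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	blobs := []string{"aaa", "bbb", "ccc"}
	fmt.Println(orphanedBlobs(blobs, []string{"aaa"}, []string{"ccc"}))
}
```
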
---

## Package Providers

### Provider Interface

```go
type Provider interface {
	Type() models.PackageType
	BuiltinMutablePatterns() []*regexp.Regexp
	BuiltinImmutablePatterns() []*regexp.Regexp
	ContentType(path string) string
	UpstreamURL(remote models.Remote, path string) string
	RewriteResponse(body []byte, remote models.Remote, proxyBaseURL string) ([]byte, error)
	AuthHeaders(ctx context.Context, remote models.Remote) (http.Header, error)
}

type IndexMerger interface {
	MergeIndexes(members []MemberIndex, proxyBaseURL string) ([]byte, error)
}
```

### Provider Registry

```go
var registry = map[models.PackageType]Provider{
	models.PackageGeneric:   &generic.Provider{},
	models.PackageDocker:    &docker.Provider{},
	models.PackageHelm:      &helm.Provider{},
	models.PackagePyPI:      &pypi.Provider{},
	models.PackageNPM:       &npm.Provider{},
	models.PackageRPM:       &rpm.Provider{},
	models.PackageAlpine:    &alpine.Provider{},
	models.PackagePuppet:    &puppet.Provider{},
	models.PackageTerraform: &terraform.Provider{},
	models.PackageGoProxy:   &goproxy.Provider{},
}

func Get(t models.PackageType) (Provider, error) { ... }
```

Each provider lives in its own subpackage under `internal/provider/` with its own `_test.go`.

---

## Testing Strategy

### Unit Tests

Every package gets `_test.go` files alongside the source. Run with `go test ./...`.

| Package | What's Tested |
|---|---|
| `internal/provider/docker/` | Auth token parsing/caching, manifest classification, tag banning, URL construction, blob key generation |
| `internal/provider/helm/` | `index.yaml` parsing (using `helm.sh/helm/v3/pkg/repo`), URL rewriting, index merging |
| `internal/provider/pypi/` | Simple index HTML parsing, URL rewriting, index merging |
| `internal/provider/npm/` | Metadata JSON rewriting (`dist.tarball` URLs) |
| `internal/provider/terraform/` | Registry URL construction, download info JSON rewriting, `releases_remote` URL extraction |
| `internal/provider/rpm/` | Mutable pattern matching (repodata) |
| `internal/provider/alpine/` | Mutable pattern matching (APKINDEX) |
| `internal/provider/puppet/` | `file_uri` JSON rewriting |
| `internal/proxy/` | Classifier (immutable vs mutable vs denied), circuit breaker state transitions, revalidator logic |
| `internal/storage/` | CAS key generation, dedup detection, S3 operation mocking |
| `internal/cache/` | Redis TTL set/check, fetch lock acquire/release/contention |
| `internal/gc/` | Orphan detection queries, cold artifact selection |
| `pkg/models/` | Model validation, PackageType enum |
| `pkg/client/` | API client request/response serialization |

### End-to-End Tests

Located in `e2e/`. Use `testcontainers-go` to spin up real Postgres, Redis, and MinIO containers. The test binary starts the actual `artifactapi` server against these backends.

```go
// e2e/e2e_test.go
func TestMain(m *testing.M) {
	// Start postgres, redis, minio via testcontainers-go
	// Run migrations
	// Start artifactapi server on random port
	// Run tests
	// Tear down
}
```

| Test File | What's Tested |
|---|---|
| `e2e/proxy_test.go` | Proxy a real GitHub release through generic remote, verify S3 storage, verify Redis TTL, verify Postgres artifact row, verify cache hit on second request |
| `e2e/docker_test.go` | Pull a real image manifest + blob through Docker v2 proxy, verify blob deduplication, tag banning |
| `e2e/management_test.go` | Full CRUD lifecycle: create remote via v2 API, proxy through it, list objects, evict object, flush cache, delete remote |
| `e2e/virtual_test.go` | Create two helm remotes + virtual, fetch merged index, verify priority ordering |
| `e2e/terraform_test.go` | Proxy terraform provider version listing + download info, verify URL rewriting to `releases_remote` |
| `e2e/goproxy_test.go` | Proxy Go module `@v/list`, `.info`, `.mod`, `.zip` through GOPROXY remote, verify mutable vs immutable classification |
| `e2e/gc_test.go` | Create artifact, delete remote, trigger GC, verify S3 blob cleaned up |

### Code Quality

- `gofmt` / `goimports`: enforced in CI, run on save
- `golangci-lint`: comprehensive linter suite (staticcheck, errcheck, govet, etc.)
- `go vet ./...`: run in CI
- Makefile targets: `make test`, `make lint`, `make e2e`, `make fmt`

---

## Terraform Provider (Separate Repo)

**Repo**: `terraform-provider-artifactapi`
**Uses**: `pkg/client/` and `pkg/models/` from the main module

```hcl
provider "artifactapi" {
  endpoint = "https://artifactapi.k8s.syd1.au.unkin.net"
}

resource "artifactapi_remote" "terraform_registry" {
  name            = "terraform-registry"
  package_type    = "terraform"
  base_url        = "https://registry.terraform.io"
  description     = "Terraform provider registry"
  releases_remote = artifactapi_remote.hashicorp_releases.name

  immutable_patterns = [
    "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 300
  }
}

resource "artifactapi_remote" "hashicorp_releases" {
  name         = "hashicorp-releases"
  package_type = "generic"
  base_url     = "https://releases.hashicorp.com"

  immutable_patterns = [
    ".*\\.zip$",
    ".*SHA256SUMS(\\.sig)?$",
  ]

  cache {
    immutable_ttl = 0
    mutable_ttl   = 0
  }
}

resource "artifactapi_virtual" "helm" {
  name         = "helm"
  package_type = "helm"
  description  = "All helm repos merged"
  members = [
    artifactapi_remote.jetstack.name,
    artifactapi_remote.hashicorp_helm.name,
  ]
}
```

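
The `immutable_patterns` in the HCL above are regular expressions applied to request paths. As a minimal sketch of how such patterns might classify a path, the Go snippet below compiles the same expressions and tests a few sample paths. The `isImmutable` and `compileAll` helpers, the unanchored-search semantics, and the sample paths are assumptions made for illustration; the project's actual classifier may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// isImmutable reports whether path matches any of the given patterns.
// Hypothetical helper: it assumes unanchored search semantics.
func isImmutable(path string, patterns []*regexp.Regexp) bool {
	for _, re := range patterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// compileAll compiles each expression and panics on invalid syntax.
func compileAll(exprs ...string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		out = append(out, regexp.MustCompile(e))
	}
	return out
}

func main() {
	// Same expressions as the HCL, with the HCL string escaping removed.
	registry := compileAll(`[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$`)
	releases := compileAll(`.*\.zip$`, `.*SHA256SUMS(\.sig)?$`)

	// Sample paths (assumed, not taken from the plan).
	fmt.Println(isImmutable("v1/providers/hashicorp/aws/5.0.0/download/linux/amd64", registry)) // true
	fmt.Println(isImmutable("v1/providers/hashicorp/aws/versions", registry))                   // false
	fmt.Println(isImmutable("terraform/1.9.0/terraform_1.9.0_SHA256SUMS.sig", releases))        // true
	fmt.Println(isImmutable("terraform/index.json", releases))                                  // false
}
```
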
---

## Web UI (React + Vite — Separate Container)

### Deployment

Separate `Dockerfile.ui`: multi-stage build (node → nginx). Served as its own container/pod. nginx proxies `/api/*` to the Go backend.

### Pages

| Route | Content |
|---|---|
| `/` | Dashboard: total objects, storage used, dedup savings, bandwidth saved, top remotes chart, live SSE event feed, health indicators |
| `/remotes` | Remote table: name, type, description, object count, size, hit rate, health. Filter by type, sort any column |
| `/remotes/:name` | Config (read-only, "Managed by Terraform" badge), stats, object browser with search/sort/evict, flush actions |
| `/virtuals` | Virtual table: name, type, members, merged object count |
| `/virtuals/:name` | Member list with individual stats |

All config is read-only — managed by Terraform.

---

## TUI (Bubble Tea — Subcommand)

`artifactapi tui --endpoint http://localhost:8000` or via `ARTIFACTAPI_ENDPOINT` env.

Uses `pkg/client/` for all API calls (same client as Terraform provider).

| View | Content and key bindings |
|---|---|
| Dashboard | summary stats, top remotes |
| Remotes list | `j`/`k` navigate, `/` filter, `Enter` detail |
| Remote detail | config + stats, `Enter` → object browser |
| Object browser | `/` search, `d` evict, `f` flush |
| Virtuals | `j`/`k`, `Enter` detail |

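
The TUI takes its endpoint from `--endpoint` or from `ARTIFACTAPI_ENDPOINT`. One way to resolve the two is sketched below with the standard `flag` package. The `resolveEndpoint` helper, the flag-over-environment precedence, and the error when neither is set are assumptions of this sketch; the real subcommand may use a different CLI library or a default value.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveEndpoint parses args for --endpoint and falls back to the
// ARTIFACTAPI_ENDPOINT variable (read through getenv) when the flag is
// absent. Flag-over-env precedence is an assumption of this sketch.
func resolveEndpoint(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("tui", flag.ContinueOnError)
	endpoint := fs.String("endpoint", "", "ArtifactAPI base URL")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *endpoint != "" {
		return *endpoint, nil
	}
	if env := getenv("ARTIFACTAPI_ENDPOINT"); env != "" {
		return env, nil
	}
	return "", fmt.Errorf("no endpoint: pass --endpoint or set ARTIFACTAPI_ENDPOINT")
}

func main() {
	// Fixed arguments keep the example deterministic; a real command
	// would pass os.Args[1:] instead.
	endpoint, err := resolveEndpoint([]string{"--endpoint", "http://localhost:8000"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("using endpoint:", endpoint)
}
```
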
---

## Improvements Over v2

| Area | v2 (Python) | v3 (Go) |
|---|---|---|
| S3 paths | Hashed, opaque | Content-addressed (CAS) |
| Config | YAML files, mtime reload | Terraform via API |
| Package types | 8 types | 10 types (+ terraform, goproxy) |
| Virtual repos | Helm only | Helm + PyPI, extensible |
| Deduplication | Docker blobs only | All types via CAS |
| Revalidation | Opt-in flag | Default for all mutable |
| Access logging | None | Per-artifact in Postgres |
| GC | None | Background goroutine |
| Upstream health | Per-request | Circuit breaker |
| S3 backends | MinIO only | MinIO, Ceph RGW, AWS S3 (via `minio-go`) |
| UI | None | Web dashboard + TUI |
| Binary | Python + venv | Static Go binary |
| Frontend | N/A | Separate container (React) |
| Testing | Mocked unit tests | Unit + e2e with real backends |

---

## Implementation Phases

### Phase 1: Core Engine + Models
- Go module, Makefile (`make build test lint fmt e2e`), Dockerfile, docker-compose
- `pkg/models/` — all domain types
- PostgreSQL schema + migrations
- S3 storage layer with CAS (`minio-go/v7`)
- Redis cache layer (TTL, locks)
- Proxy engine: fetch-or-cache, classifier, revalidator
- Generic + Docker providers (most complexity: OCI auth, CAS, tag banning)
- Health + metrics endpoints
- Unit tests for all packages
- **Milestone**: proxy Docker + generic, cache in S3, track in Postgres

### Phase 2: All Providers
- Helm (using `helm.sh/helm/v3/pkg/repo`)
- PyPI (`golang.org/x/net/html`)
- npm (stdlib `encoding/json`)
- RPM (using `rs3.io/go/rpm/repomd`)
- Alpine (using `gitlab.alpinelinux.org/alpine/go`)
- Puppet Forge (stdlib `encoding/json`)
- Terraform (using `hashicorp/terraform-registry-address`)
- Go Modules / GOPROXY (using `github.com/goproxy/goproxy`)
- Unit tests per provider
- **Milestone**: feature parity with v2 + goproxy

### Phase 3: Management API + Virtual Repos + GC
- `pkg/client/` — shared Go API client
- v2 CRUD endpoints
- Virtual repo engine: `IndexMerger` for Helm + PyPI
- Circuit breaker
- Access logging middleware
- GC goroutine
- **Milestone**: full API, virtuals, GC

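
To make the virtual repo merge in Phase 3 concrete, here is a rough Go sketch of merging member indexes in priority order. The `Index` type is a simplified stand-in for a Helm `index.yaml`, and the rule that an earlier member wins when two members publish the same chart version is an assumption; the plan only says that priority ordering is verified, and the actual `IndexMerger` interface is not shown here.

```go
package main

import "fmt"

// Index maps chart name -> version -> download URL. It is a simplified,
// hypothetical stand-in for a parsed Helm index.
type Index map[string]map[string]string

// mergeIndexes merges member indexes in priority order: when two members
// publish the same chart version, the earlier member's entry is kept.
func mergeIndexes(members ...Index) Index {
	merged := Index{}
	for _, idx := range members {
		for chart, versions := range idx {
			if merged[chart] == nil {
				merged[chart] = map[string]string{}
			}
			for version, url := range versions {
				if _, exists := merged[chart][version]; !exists {
					merged[chart][version] = url
				}
			}
		}
	}
	return merged
}

func main() {
	// Sample data, invented for the example.
	jetstack := Index{
		"cert-manager": {"1.15.0": "jetstack/cert-manager-1.15.0.tgz"},
	}
	mirror := Index{
		"cert-manager": {
			"1.15.0": "mirror/cert-manager-1.15.0.tgz",
			"1.14.0": "mirror/cert-manager-1.14.0.tgz",
		},
		"vault": {"0.28.0": "mirror/vault-0.28.0.tgz"},
	}

	merged := mergeIndexes(jetstack, mirror)
	fmt.Println(merged["cert-manager"]["1.15.0"])         // jetstack/cert-manager-1.15.0.tgz
	fmt.Println(len(merged["cert-manager"]), len(merged)) // 2 2
}
```
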
### Phase 4: End-to-End Tests
- `e2e/` test suite with `testcontainers-go`
- Proxy, Docker, management, virtual, terraform, goproxy, GC tests
- CI pipeline: `make e2e`
- **Milestone**: comprehensive e2e coverage

### Phase 5: Terraform Provider
- Separate repo: `terraform-provider-artifactapi`
- Imports `pkg/client/` and `pkg/models/`
- `artifactapi_remote` + `artifactapi_virtual` resources + data sources
- Import support
- **Milestone**: manage all config via Terraform

### Phase 6: Web UI
- React + Vite in `ui/`
- `Dockerfile.ui` (multi-stage → nginx)
- Dashboard, remotes, objects, virtuals pages
- SSE event feed
- **Milestone**: full web UI in separate container

### Phase 7: TUI
- Bubble Tea in `internal/tui/`
- Uses `pkg/client/`
- Dashboard, remotes, objects, virtuals views
- **Milestone**: TUI feature parity with web UI

### Phase 8: Migration + Cutover
- Migration tool: v2 YAML → Terraform HCL + `terraform import` commands
- S3 rehash script: `{remote}/{hash16}/{file}` → `blobs/sha256/{content_hash}`
- Parallel run, response comparison
- Cutover

---

## Makefile Targets

```makefile
.PHONY: build test lint fmt e2e docker docker-ui compose

build: ## Build Go binary
	go build -o bin/artifactapi ./cmd/artifactapi

test: ## Run unit tests
	go test ./...

lint: ## Run golangci-lint + go vet
	golangci-lint run ./...
	go vet ./...

fmt: ## Format code (gofmt + goimports)
	gofmt -w .
	goimports -w .

e2e: ## Run end-to-end tests (requires Docker)
	go test -tags=e2e -count=1 -timeout=5m ./e2e/...

docker: ## Build API server Docker image
	docker build -t artifactapi .

docker-ui: ## Build frontend Docker image
	docker build -t artifactapi-ui -f ui/Dockerfile.ui ui/

compose: ## Start full stack (API + UI + Postgres + Redis + MinIO)
	docker compose up -d
```