Commit Graph

20 Commits

Author SHA1 Message Date
benvin b46c116f6b Feat/v3 go rewrite (#47)
ci/woodpecker/tag/docker Pipeline was successful
Complete rewrite of ArtifactAPI from Python/FastAPI to Go as a single binary.

Core engine:
- 10 package providers: generic, docker, helm, pypi, npm, rpm, alpine,
  puppet, terraform, goproxy — each with built-in mutable patterns
- Content-addressable storage (SHA256 dedup across all remotes)
- Three-tier caching: Redis (TTL/locks) → S3/MinIO (blobs) → upstream
- Classifier with allowlist/blocklist per-remote (empty = allow all)
- Circuit breaker, conditional revalidation, stale-on-error
- Background garbage collection for orphaned blobs
- Access logging to PostgreSQL

API:
- v1 proxy endpoints (backwards compatible)
- v2 management API: CRUD remotes/virtuals, object browser, stats,
  health, SSE events, probe/test endpoint
- Virtual repos with index merging (Helm YAML + PyPI HTML)

Frontend (React + Vite, separate Dockerfile):
- Dashboard with stats, health indicators, top remotes
- Remotes list with type filter, remote detail with config/patterns
- Object browser with pagination and evict
- Test Remote page: probe any remote path, see headers/size/timing
- Virtuals page with expandable member lists

TUI (Bubble Tea):
- Dashboard, remotes list/detail, object browser, virtuals
- Vim-style navigation, artifactapi tui --endpoint <url>

Infrastructure:
- S3 client supports MinIO, Ceph RGW, AWS S3 (minio-go)
- PostgreSQL schema with migrations
- Docker Compose: API + UI + Postgres 17 + Redis 7 + MinIO
- Makefile with Go version check, build/test/lint/fmt/e2e targets
- Distroless Docker image (~15MB)

Testing:
- Unit tests for models, classifier, providers, mergers
- E2E tests with testcontainers-go (real Postgres/Redis/MinIO)

Terraform config:
- All 40 production remotes + helm virtual as HCL
- Provider repo: terraform-provider-artifactapi v0.0.1 (separate)

---------

Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #47
2026-06-07 19:30:35 +10:00
unkinben 99cc71f56c feat: add Terraform/OpenTofu registry remote type (#45)
## Summary

- New `terraform` package type implementing the [Terraform Registry Protocol](https://developer.hashicorp.com/terraform/internals/provider-registry-protocol)
- `construct_url` prepends `/v1/providers/` so paths like `hashicorp/vault/versions` map to `registry.terraform.io/v1/providers/hashicorp/vault/versions`
- `resolve_content` rewrites `download_url`, `shasums_url`, and `shasums_signature_url` in per-version download info JSON to route through a companion `releases_remote` (generic remote proxying `releases.hashicorp.com`)
- Built-in mutable pattern for `{namespace}/{type}/versions` — version lists expire and are re-fetched; per-version download info is immutable
- Client configuration via `.terraformrc` / `.tofurc` host block — no changes to `.tf` provider source addresses needed

## Test plan

- [x] 8 unit tests covering mutable detection, URL rewriting, binary pass-through, `construct_url` correctness, and cache miss behaviour
- [x] End-to-end: OpenTofu 1.10.3 pulling `hashicorp/vault v4.5.0` through docker-compose stack — `tofu init` succeeded, provider installed and signed
- [x] Verified `download_url` / `shasums_url` rewritten to `hashicorp-releases` proxy in cached response
- [x] All 339 tests pass

Reviewed-on: #45
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
2026-06-06 23:51:52 +10:00
unkinben 9287cf7cf2 feat: add Puppet Forge remote type (#44)
## Summary

- Adds \`package: puppet\` for proxying Puppet Forge (forgeapi.puppet.com)
- \`remote/puppet.py\` rewrites JSON responses: absolute forge URLs → proxy URLs, and relative \`/v3/files/\` \`file_uri\` paths → absolute proxy URLs. g10k uses Go's \`url.ResolveReference\`, so an absolute \`file_uri\` overrides the base URL entirely — tarballs are fetched directly from the proxy without a second hop
- Built-in mutable patterns: \`^v3/modules/\` and \`^v3/releases\` (module metadata); tarballs at \`v3/files/\` are configured as immutable via \`immutable_patterns\`
- 9 new tests covering mutable detection, URL rewriting (relative \`file_uri\` and absolute forge URLs), content-type, tarball pass-through, and pattern blocking

## Client configuration

**g10k config file** (\`forge_base_url\` at root level):
\`\`\`yaml
cachedir: /tmp/g10k
forge_base_url: https://artifacts.example.com/api/v1/remote/puppet-forge
sources:
  control:
    remote: git@git.example.com:puppet/control.git
    basedir: /etc/puppetlabs/code/environments
\`\`\`

**Puppetfile** (\`forge.baseUrl\` directive, works with \`-puppetfile\` mode):
\`\`\`ruby
forge.baseUrl https://artifacts.example.com/api/v1/remote/puppet-forge

mod 'puppetlabs-stdlib', '9.7.0'
\`\`\`

## Test plan

- [x] 331 unit tests pass (\`make test\`)
- [x] End-to-end: g10k 0.9.10 on AlmaLinux 9 via \`forge_base_url\` — stdlib 9.7.0, inifile 6.2.0, concat 9.1.0 installed; proxy logs confirm cache MISS → fetch → ADD for metadata and tarballs
- [x] End-to-end: \`forge.baseUrl\` Puppetfile directive with \`-puppetfile\` mode — same result

Reviewed-on: #44
2026-05-17 10:56:50 +10:00
unkinben ff2aefeef4 feat: add ban_tags_enabled/ban_tags to docker remotes to block named tags (#43)
ci/woodpecker/tag/docker Pipeline was successful
Adds two per-remote config keys for docker remotes:

  ban_tags_enabled: false   # opt-in, default off
  ban_tags:
    - latest
    - edge

When ban_tags_enabled is true and a manifest request arrives for a named
tag in ban_tags, the proxy returns 403. sha256-addressed pulls are never
blocked, so images already pulled can still be referenced by digest.
Blob requests are unaffected.

Reviewed-on: #43
2026-05-10 22:13:11 +10:00
unkinben 8a7f26b193 feat: cache parsed member indexes as msgpack to skip YAML re-parse on rebuild (#40)
ci/woodpecker/tag/docker Pipeline was successful
Closes #36

## Summary

- After fetching a member's `index.yaml` (from upstream or S3), the handler now parses it and stores a compact msgpack file (`index.msgpack`) alongside the raw YAML in S3
- On subsequent virtual rebuilds (member caches valid, virtual TTL expired), the handler loads the msgpack file instead of re-parsing raw YAML — eliminating the costliest phase
- `_entries_to_msgpack_safe()` converts datetime/date objects to ISO strings before packing (msgpack cannot natively serialize Python datetimes)
- `_merge_helm_indexes()` accepts `list[dict | None]` as pre-parsed entries; falls back to raw YAML parse when msgpack is unavailable
- `_VirtualHandler.merge()` protocol updated to pass pre-parsed entries to all future handler implementations
- Broken msgpack is detected and rebuilt from raw YAML automatically

## Performance

Phase breakdown (19-member helm-all virtual, 14 MB total):

| Phase | Time | % |
|---|---|---|
| YAML parse (eliminated) | 6314 ms | 60% |
| URL rewrite + dedup | 33 ms | 0.3% |
| YAML dump | 4124 ms | 39% |

| Scenario | Before (CSafeLoader only, #34) | After |
|---|---|---|
| Cold rebuild (upstream fetch) | ~21s | ~26s (+5s for msgpack build, one-time) |
| **Warm rebuild (S3 hit, virtual expired)** | **~9.6s** | **~5.9s (38% faster)** |
| Virtual cache hit | ~0.03s | ~0.03s |

Log line confirms msgpack hits: `msgpack=19/19`

## Test plan

- 297 tests pass
- `TestEntriesToMsgpackSafe`: datetime/date serialization, empty input, round-trip
- `TestMergeHelmIndexesWithParsed`: pre-parsed path produces identical output to raw-bytes path
- `TestGetMemberIndexMsgpack`: msgpack hit, cold-build, broken msgpack fallback, upstream failure
- Docker warm-rebuild measured at 5.9s vs 9.6s baseline

Reviewed-on: #40
2026-05-02 17:15:31 +10:00
unkinben 1656664dfa refactor: split config into remotes/virtuals/locals sections (#31)
ci/woodpecker/tag/docker Pipeline was successful
Repository types now live under dedicated top-level keys instead of a
shared remotes: block distinguished by a type field:

  remotes:   caching proxy remotes (no type field needed)
  virtuals:  virtual merged-index repositories
  locals:    local upload repositories

Routes for local repos move from /api/v1/remote/ to /api/v1/local/.
config.py gains get_virtual_config() and get_local_config() lookups.
Root endpoint now reports all three sections. Drop root conf.d/ (was
an exact duplicate of examples/conf.d-method/).

Reviewed-on: #31
2026-04-30 23:50:20 +10:00
unkinben c7baae8d0d feat: add virtual repository support for unified index merging (#30)
Adds a new virtual repo type that merges indexes from multiple member remotes
of the same package type. Currently supports helm (index.yaml merge with URL
rewriting). Member fetches run in parallel; merged index is Redis-cached at
min(mutable_ttl) across members.

Reviewed-on: #30
2026-04-29 23:01:14 +10:00
unkinben 64266f40e9 feat: support config.d directory for split configuration (closes #20)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
CONFIG_PATH now accepts a directory path (all *.yaml files merged) or a
main file with a config_dir key pointing to a drop-in directory. Remotes
are merged alphabetically across files; later files win on conflicts.
2026-04-28 23:21:02 +10:00
unkinben 3bd3ca8b74 feat: quarantine new releases to prevent supply chain attacks
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Add per-remote quarantine support: when quarantine_new=true and quarantine_days=N,
immutable artifacts published within the last N days are blocked with 404 until
the quarantine window expires.

- ConfigManager.get_quarantine_config() reads quarantine_new/quarantine_days
- RedisCache.store/get_artifact_published() persist Last-Modified per artifact
- proxy._check_quarantine() enforces the window; fails open when date is unknown
- proxy._fetch_last_modified() HEAD-requests upstream to discover publish date
- Docker proxy route wires quarantine checks on both cache-hit and cache-miss
- remotes.yaml: quarantine_new/quarantine_days added to pypi example (3-day window)
- README: documents quarantine configuration
2026-04-28 23:01:52 +10:00
unkinben e6d9b175ce refactor: extract route handler logic into artifact/ subpackage
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Each route in main.py is now a single-line delegation to an artifact submodule:
- artifact/proxy.py  — remote artifact GET, caching, mutable revalidation
- artifact/local.py  — local repo upload/check/delete
- artifact/docker.py — Docker Registry v2 proxy + ping
- artifact/discovery.py — GitHub release discovery + bulk cache
- artifact/flush.py  — cache flush

UpstreamUnreachable, cache_single_artifact, _upstream_reachable and
check_upstream_changed moved from main.py to artifact/proxy.py.
Tests updated to patch at their new locations.

All 187 tests pass.
2026-04-28 22:21:01 +10:00
unkinben 0daca40156 refactor: add storage/s3 and auth/docker submodules
- storage/s3.py: S3Storage moved from storage.py; storage/__init__.py re-exports it
- auth/docker.py: Docker Bearer token logic moved from docker_auth.py
- docker_auth.py: thin shim re-exporting all public symbols (including _token_cache)
  for backwards compatibility with existing test and import paths
- main.py: now imports get_docker_token_for_response from .auth

All 187 tests pass.
2026-04-28 22:15:04 +10:00
unkinben 0df726467a refactor: split cache, database, and remote logic into submodules
cache/redis.py, database/postgres.py, and remote/{base,generic,helm,npm,python,rpm}.py
replace the flat modules. All public symbols re-exported from their package
__init__.py for backwards compatibility. No functional changes; all 187 tests pass.

Closes #19
2026-04-28 22:09:58 +10:00
unkinben 0c780c1bd1 chore: cleanup the readme
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2026-04-28 21:57:14 +10:00
unkinben 3352a3e886 refactor: simplify pypi and npm URL rewriting — single remote, no redundant config keys
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- npm: remove npm_files_url/npm_files_remote; rewrite uses base_url and
  remote name directly (same approach as helm)
- npm: replace hardcoded .tgz extension check with immutable_patterns match
- pypi: collapse pypi + pypi-files into a single remote (base_url points
  to files.pythonhosted.org); simple/ requests are transparently fetched
  from pypi.org with no extra config required
- pypi: remove pypi_files_url/pypi_files_remote from pypi and pypi-gitea
- pypi: rewrite check now uses immutable_patterns (consistent with npm)
- Update README for both pypi and npm sections
- Update tests and fixtures to reflect single-remote pypi config
2026-04-27 22:42:23 +10:00
unkinben 4ca89b9159 feat: add helm chart repository caching proxy
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add helm package type with index.yaml as mutable (TTL-based) and
  .tgz chart tarballs as immutable
- Rewrite chart URLs in index.yaml to serve tarballs via proxy cache
- Add text/yaml content-type detection for .yaml/.yml files
- Add hashicorp-helm example remote in remotes.yaml
- Update README with Helm chart repository proxy section
- Add tests for helm mutable patterns and route behaviour
2026-04-27 22:17:31 +10:00
unkinben d585ab425c feat: add npm remote type with metadata URL rewriting and caching
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add `npm` package type to config with no built-in mutable defaults;
  users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and
  immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball
  downloads pass through the same proxy remote instead of hitting
  npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since
  both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config,
  client setup (npm/yarn/pnpm), rewriting behaviour, and
  mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type,
  scoped packages, cache miss, and tarball immutability
2026-04-27 20:28:31 +10:00
unkinben 5de912db75 docs: describe PyPI remote usage with uv system/user uv.toml
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2026-04-27 14:37:41 +10:00
unkinben fe837dabf7 feat: keep stale mutables when upstream is unreachable; update README
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
When a mutable file's TTL expires and the upstream backend cannot be
contacted (network error or timeout), the cached copy is kept and its
TTL refreshed instead of being evicted. This keeps RPM repodata, Alpine
indexes, branch archives, and other mutable data available during
upstream outages.

Adds UpstreamUnreachable exception and _upstream_reachable() helper.
check_upstream_changed() now raises UpstreamUnreachable on network
errors (was silently returning True). handle_expired_mutable() catches
the exception on the check_mutable_updates path and calls
_upstream_reachable() on the plain-expiry path.

README updated to current immutable/mutable terminology and documents
all new caching features.
2026-04-27 11:38:50 +10:00
unkinben f3394b9ca6 docs: add RKE2 image rewriting guide and expand pattern examples
Add a new "Docker Image Rewriting with RKE2" section covering:
- How the /v2/ proxy integrates with registries.yaml mirror rewrites
- Per-registry examples (docker.io, ghcr.io, registry.k8s.io, quay.io)
- include_patterns for restricting which images are cached
- TLS CA configuration for private certificate authorities
- Apply and verification commands

Expand the Configuration section with:
- Richer include_patterns examples (anchored, extension, architecture,
  Docker image name patterns, repodata directories)
- New index_patterns section explaining built-in defaults per package
  type and how to add custom patterns (Helm index.yaml, APT InRelease/
  Packages.gz, extra RPM comps.xml)
2026-04-25 20:20:42 +10:00
unkinben 46711eec6a Initial implementation of generic artifact storage system
- FastAPI-based caching proxy for remote file servers
- YAML configuration for multiple remotes (GitHub, Gitea, HashiCorp, etc.)
- Direct URL API: /api/v1/remote/{remote}/{path} with auto-download and caching
- Pattern-based access control with regex filtering
- S3/MinIO backend storage with predictable paths
- Docker Compose setup with MinIO for local development
2026-01-06 21:13:13 +11:00