## Summary
- Upload Python wheels/sdists to local PyPI repos with filename validation
- PEP 503 simple index computed on-demand from stored files
- Package names normalized per PEP 503 (lowercase, hyphens)
- Overwrites rejected (409 Conflict)
## Test plan
- [x] Build wheel with `uv build` → upload → verify simple index HTML → `uv pip install` from local repo
- [x] Bad filename rejection (400)
- [x] Overwrite rejection (409)
- [x] Hash integrity verification on download
Reviewed-on: #50
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Introduces repo_type (remote/local) as a separate axis from package_type
so that any package type can be hosted locally. A terraform local repo
is package_type=terraform + repo_type=local.
- Remote model gains RepoType field (defaults to "remote")
- Database schema adds repo_type column with migration for existing DBs
- V1 proxy adds /api/v1/local/{name}/* route for serving local files
- V2 upload via PUT /api/v2/remotes/{name}/files/{ns}/{type}/{file}.zip
validates filename matches terraform-provider-{type}_{ver}_{os}_{arch}.zip
and returns 409 on duplicate (no overwrites)
- index.json and {version}.json are computed on-the-fly from uploaded zips
rather than stored as separate files
- V2 create validates repo_type and requires base_url only for remotes
---------
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #49
- Objects page renders paths as a collapsible tree instead of flat list
with expand/collapse all, aggregated size/hits per directory
- Dashboard gains top-files-by-hits and top-files-by-bandwidth tables
- Backend: new /api/v2/stats/top-files-by-hits and
/api/v2/stats/top-files-by-bandwidth endpoints
- Raised per_page max to 5000 for objects listing
---------
Co-authored-by: Ben Vincent <ben@unkin.net>
Reviewed-on: #48
## Summary
- New `terraform` package type implementing the [Terraform Registry Protocol](https://developer.hashicorp.com/terraform/internals/provider-registry-protocol)
- `construct_url` prepends `/v1/providers/` so paths like `hashicorp/vault/versions` map to `registry.terraform.io/v1/providers/hashicorp/vault/versions`
- `resolve_content` rewrites `download_url`, `shasums_url`, and `shasums_signature_url` in per-version download info JSON to route through a companion `releases_remote` (generic remote proxying `releases.hashicorp.com`)
- Built-in mutable pattern for `{namespace}/{type}/versions` — version lists expire and are re-fetched; per-version download info is immutable
- Client configuration via `.terraformrc` / `.tofurc` host block — no changes to `.tf` provider source addresses needed
## Test plan
- [x] 8 unit tests covering mutable detection, URL rewriting, binary pass-through, `construct_url` correctness, and cache miss behaviour
- [x] End-to-end: OpenTofu 1.10.3 pulling `hashicorp/vault v4.5.0` through docker-compose stack — `tofu init` succeeded, provider installed and signed
- [x] Verified `download_url` / `shasums_url` rewritten to `hashicorp-releases` proxy in cached response
- [x] All 339 tests pass
Reviewed-on: #45
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Adds two per-remote config keys for docker remotes:
ban_tags_enabled: false # opt-in, default off
ban_tags:
- latest
- edge
When ban_tags_enabled is true and a manifest request arrives for a named
tag in ban_tags, the proxy returns 403. sha256-addressed pulls are never
blocked, so images already pulled can still be referenced by digest.
Blob requests are unaffected.
Reviewed-on: #43
Tag manifests (e.g. library/nginx/manifests/latest) and their sha256-addressed
counterparts were stored at separate S3 keys with no cross-reference, so a
sha256 manifest request always missed cache even when the identical content had
just been stored under the tag key.
After serving any mutable (tag) manifest, compute the sha256 of the response
body and write it under the digest key (manifests/sha256:<hex>) if absent. The
next sha256-addressed pull hits cache immediately.
Also adds a short-lived Redis distributed lock (SET NX EX 30) around upstream
fetches so that concurrent pods racing for the same cold key poll storage for
up to 5 s before issuing a duplicate upstream request, eliminating the
thundering herd on deploy events.
Includes unit tests for both the lock primitives (acquire/release, fail-open
when Redis is unavailable) and the docker proxy behaviour (cross-link written
on tag hit, not written for sha256 requests, lock acquired/released, poll path
serves from cache without upstream fetch, fallback fetch when poll times out).
Reviewed-on: #42
Closes#34
## Summary
- At module load time, a `try/except` selects `yaml.CSafeLoader` / `yaml.CDumper` (C extensions) when libyaml is available, otherwise falls back to `yaml.SafeLoader` / `yaml.Dumper`
- `_HelmDumper` inherits from whichever dumper base was selected — custom datetime/date representers are registered the same way as before
- `_merge_helm_indexes` uses `yaml.load(raw_data, Loader=_YamlLoader)` instead of `yaml.safe_load`
- No change to `yaml.dump(...)` call — it already passes `Dumper=_HelmDumper`, which now inherits from the C base when available
- Five new tests in `TestYamlExtensionSelection` cover: loader/dumper base are classes, `_HelmDumper` inherits from the selected base, C extensions used when available, loader can parse YAML
## Measured performance gain
19-member `helm-all` virtual repo, real upstream data, Docker (AlmaLinux 9):
| | `merge=` time |
|---|---|
| Before (SafeLoader + Dumper) | **38,877ms** |
| After (CSafeLoader + CDumper) | **9,625ms** |
| Speedup | **4.0×** |
Local microbenchmark (500 charts × 10 versions × 19 members, 3 runs avg):
- Before: **40.8s** → After: **6.1s** (**6.7×** faster)
## Test plan
- [x] 283 unit tests pass (`make test`)
- [x] Wheel builds cleanly (`uv build --wheel`)
- [x] C extension confirmed available in AlmaLinux 9 container: `yaml.CSafeLoader: <class 'yaml.cyaml.CSafeLoader'>`
- [x] Baseline Docker timing measured with pure-Python path forced: merge=38,877ms
- [x] After Docker timing measured with C extension path: merge=9,625ms
Reviewed-on: #39
Closes#35
## Summary
- Wraps `handler.merge(...)` in `await asyncio.to_thread(...)` so the CPU-bound YAML parse/merge/dump runs in the thread pool instead of blocking the event loop
- Change is at the generic `handle()` dispatch site — applies to all current and future `_VirtualHandler` implementations without modification
- Also fixes a pre-existing bug in `examples/single-file/remotes.yaml` where `base_url` and `package` keys were merged onto a single line, preventing `docker-compose up` from starting the app
## Measured performance gain
19-member `helm-all` virtual repo, single uvicorn worker, cache miss (38s merge):
| | Concurrent `/health` latency |
|---|---|
| Before (blocking) | **37,721ms** for first request (stalled) |
| After (thread pool) | **8–63ms** for all requests |
## Test plan
- [x] 278 unit tests pass (`make test`)
- [x] Live concurrency test: cache miss merge started in background, 5 concurrent `/health` checks measured — all <65ms
- [x] Baseline comparison: same test with blocking call — first health check stalled 37.7s
Reviewed-on: #38
Closes#33
## Summary
- `_merge_helm_indexes` now parses each member's raw YAML first, then rewrites `urls` entries in-place via the new `_rewrite_urls` helper
- **Relative URLs** (e.g. `rancher-2.13.1.tgz`) are prepended with `{proxy_base}/api/v1/remote/{member_name}/`
- **Absolute URLs** matching `base_url` are rewritten to the proxy path (existing behaviour, now correct)
- **Absolute URLs** with a different prefix are left unchanged
- Removes the `_helm.resolve_content` raw-bytes detour from the virtual merge path; `remote/helm.py` is unchanged (still used for direct remote proxying)
## Test plan
- [x] 278 unit tests pass (`make test`)
- [x] New `TestRewriteUrls` class covering relative, absolute-match, absolute-no-match, leading-slash, and multi-URL cases
- [x] New `test_relative_urls_rewritten_to_proxy` in `TestMergeHelmIndexes`
- [x] Updated `test_first_member_wins_on_duplicate_name_and_version` to assert on proxy remote name (not upstream hostname)
- [x] Live Docker test: Rancher `index.yaml` relative URLs rewritten correctly to `http://localhost:8000/api/v1/remote/rancher-stable/rancher-2.14.1.tgz` etc.
- [x] `helm-all` virtual (19 members) returns HTTP 200 with 395k-line merged index on cache miss
Reviewed-on: #37
Repository types now live under dedicated top-level keys instead of a
shared remotes: block distinguished by a type field:
remotes: caching proxy remotes (no type field needed)
virtuals: virtual merged-index repositories
locals: local upload repositories
Routes for local repos move from /api/v1/remote/ to /api/v1/local/.
config.py gains get_virtual_config() and get_local_config() lookups.
Root endpoint now reports all three sections. Drop root conf.d/ (was
an exact duplicate of examples/conf.d-method/).
Reviewed-on: #31
Adds a new virtual repo type that merges indexes from multiple member remotes
of the same package type. Currently supports helm (index.yaml merge with URL
rewriting). Member fetches run in parallel; merged index is Redis-cached at
min(mutable_ttl) across members.
Reviewed-on: #30
examples/single-file/remotes.yaml — original monolithic config
examples/conf.d-method/ — one yaml per remote (alpine, github, pypi)
docker-compose updated to mount from examples/single-file/.
CONFIG_PATH now accepts a directory path (all *.yaml files merged) or a
main file with a config_dir key pointing to a drop-in directory. Remotes
are merged alphabetically across files; later files win on conflicts.
Add per-remote quarantine support: when quarantine_new=true and quarantine_days=N,
immutable artifacts published within the last N days are blocked with 404 until
the quarantine window expires.
- ConfigManager.get_quarantine_config() reads quarantine_new/quarantine_days
- RedisCache.store/get_artifact_published() persist Last-Modified per artifact
- proxy._check_quarantine() enforces the window; fails open when date is unknown
- proxy._fetch_last_modified() HEAD-requests upstream to discover publish date
- Docker proxy route wires quarantine checks on both cache-hit and cache-miss
- remotes.yaml: quarantine_new/quarantine_days added to pypi example (3-day window)
- README: documents quarantine configuration
Each route in main.py is now a single-line delegation to an artifact submodule:
- artifact/proxy.py — remote artifact GET, caching, mutable revalidation
- artifact/local.py — local repo upload/check/delete
- artifact/docker.py — Docker Registry v2 proxy + ping
- artifact/discovery.py — GitHub release discovery + bulk cache
- artifact/flush.py — cache flush
UpstreamUnreachable, cache_single_artifact, _upstream_reachable and
check_upstream_changed moved from main.py to artifact/proxy.py.
Tests updated to patch at their new locations.
All 187 tests pass.
- storage/s3.py: S3Storage moved from storage.py; storage/__init__.py re-exports it
- auth/docker.py: Docker Bearer token logic moved from docker_auth.py
- docker_auth.py: thin shim re-exporting all public symbols (including _token_cache)
for backwards compatibility with existing test and import paths
- main.py: now imports get_docker_token_for_response from .auth
All 187 tests pass.
cache/redis.py, database/postgres.py, and remote/{base,generic,helm,npm,python,rpm}.py
replace the flat modules. All public symbols re-exported from their package
__init__.py for backwards compatibility. No functional changes; all 187 tests pass.
Closes#19
- npm: remove npm_files_url/npm_files_remote; rewrite uses base_url and
remote name directly (same approach as helm)
- npm: replace hardcoded .tgz extension check with immutable_patterns match
- pypi: collapse pypi + pypi-files into a single remote (base_url points
to files.pythonhosted.org); simple/ requests are transparently fetched
from pypi.org with no extra config required
- pypi: remove pypi_files_url/pypi_files_remote from pypi and pypi-gitea
- pypi: rewrite check now uses immutable_patterns (consistent with npm)
- Update README for both pypi and npm sections
- Update tests and fixtures to reflect single-remote pypi config
- Add helm package type with index.yaml as mutable (TTL-based) and
.tgz chart tarballs as immutable
- Rewrite chart URLs in index.yaml to serve tarballs via proxy cache
- Add text/yaml content-type detection for .yaml/.yml files
- Add hashicorp-helm example remote in remotes.yaml
- Update README with Helm chart repository proxy section
- Add tests for helm mutable patterns and route behaviour
- Add `npm` package type to config with no built-in mutable defaults;
users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and
immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball
downloads pass through the same proxy remote instead of hitting
npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since
both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config,
client setup (npm/yarn/pnpm), rewriting behaviour, and
mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type,
scoped packages, cache miss, and tarball immutability
- Add 'pypi' package type to config.py; simple/ paths are mutable by default
- Refactor content-type detection into _get_content_type() helper; add .whl
- Add _resolve_content() which rewrites files host URLs in simple index HTML
to go through the proxy (pypi_files_url / pypi_files_remote config keys),
and returns text/html content-type for simple index responses
- Add basic auth support for non-Docker remotes (username + password/token
in remote config); thread auth through _upstream_reachable and
check_upstream_changed so mutable TTL checks also authenticate
- Add 'pypi' remote (pypi.org simple index) and 'pypi-files' remote
(files.pythonhosted.org) to remotes.yaml; add 'pypi-gitea' example for
Gitea package registries where index and files share the same base URL
- Add unit tests: simple index URL rewriting, HTML content-type, .whl/.tar.gz
content-types, mutable index detection, and immutable pattern enforcement
When a mutable file's TTL expires and the upstream backend cannot be
contacted (network error or timeout), the cached copy is kept and its
TTL refreshed instead of being evicted. This keeps RPM repodata, Alpine
indexes, branch archives, and other mutable data available during
upstream outages.
Adds UpstreamUnreachable exception and _upstream_reachable() helper.
check_upstream_changed() now raises UpstreamUnreachable on network
errors (was silently returning True). handle_expired_mutable() catches
the exception on the check_mutable_updates path and calls
_upstream_reachable() on the plain-expiry path.
README updated to current immutable/mutable terminology and documents
all new caching features.
Deduplicates the expired-mutable TTL/redownload branching logic that
was copied verbatim between get_artifact and docker_v2_proxy. Adds the
missing happy-path test for a changed mutable file that is successfully
re-fetched from upstream.
When check_mutable_updates: true is set on a remote, expired user-defined
mutable files are revalidated before re-downloading:
- On expiry a conditional HEAD is sent with If-None-Match / If-Modified-Since
- 304 Not Modified: TTL is refreshed in Redis, S3 cache is untouched
- 200 / no conditional support: cache is invalidated and file re-downloaded
- Network error: safe fallback — assume changed, re-download
ETag and Last-Modified from upstream responses are stored in Redis under
mutable:meta:<remote>:<hash> (no expiry, cleaned up on re-download or
cache flush). The flag only applies to user-configured mutable_patterns;
built-in package-type defaults (APKINDEX, repomd.xml, Docker manifests)
are always re-fetched unconditionally.
cache/flush also clears mutable:meta:* keys alongside index:* keys.
Remove remotes.yaml from .gitignore and add header comments explaining
the immutable_patterns/mutable_patterns/cache keys. Marks the file
clearly as an example to copy and adapt; warns against committing
real credentials.
Replace the include_patterns/index_patterns split with a clearer
immutable_patterns/mutable_patterns model:
- immutable_patterns: artifacts cached indefinitely (no TTL)
- mutable_patterns: artifacts that expire and are re-fetched after
cache.mutable_ttl seconds (replaces cache.index_ttl)
_PACKAGE_INDEX_PATTERNS renamed to _PACKAGE_MUTABLE_PATTERNS; all
built-in package-type index patterns (APKINDEX, repomd, manifests, etc.)
default to the remote's mutable_ttl (default 1 hour).
cache.file_ttl renamed to cache.immutable_ttl for consistency.
Adds github-archive remote to remotes.yaml as a worked example showing
tag archives as immutable and branch archives as mutable (1-day TTL).
docker-compose.yml: fix VERSION=dev → 2.2.2.dev0 (valid PEP 440),
add :z SELinux label to volume mounts.
- Rebase Dockerfile onto almalinux9-base, install via uv tool install
- Remove dev artifacts (remotes.yaml, ca-bundle.pem) from image
- Mount gitignored dev files via docker-compose volumes instead
- Add .dockerignore to keep secrets out of build context
- Track docker-compose.yml in git (no secrets; dev files mounted as volumes)