Compare commits

31 Commits

Author SHA1 Message Date
unkinben ff2aefeef4 feat: add ban_tags_enabled/ban_tags to docker remotes to block named tags (#43)
Adds two per-remote config keys for docker remotes:

  ban_tags_enabled: false   # opt-in, default off
  ban_tags:
    - latest
    - edge

When ban_tags_enabled is true and a manifest request arrives for a named
tag in ban_tags, the proxy returns 403. sha256-addressed pulls are never
blocked, so images already pulled can still be referenced by digest.
Blob requests are unaffected.
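
The rule above could be sketched as follows. This is only an illustration: the function name, its signature, and the digest regex are assumptions, not the project's actual code.

```python
import re

# Hypothetical helper: a manifest reference is either a tag or a sha256 digest.
_DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def is_banned_manifest(reference: str, ban_tags_enabled: bool, ban_tags: list[str]) -> bool:
    """Return True when a manifest request should be answered with 403."""
    if not ban_tags_enabled:
        return False  # the feature is opt-in and off by default
    if _DIGEST_RE.match(reference):
        return False  # digest-addressed pulls are never blocked
    return reference in ban_tags
```

Blob requests would never reach this check, since the commit message says only manifest requests are inspected.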

Reviewed-on: #43
2026-05-10 22:13:11 +10:00
unkinben a115904bbc fix: cross-link tag manifests to digest keys and add fetch lock to prevent thundering herd (#42)
Tag manifests (e.g. library/nginx/manifests/latest) and their sha256-addressed
counterparts were stored at separate S3 keys with no cross-reference, so a
sha256 manifest request always missed cache even when the identical content had
just been stored under the tag key.

After serving any mutable (tag) manifest, compute the sha256 of the response
body and write it under the digest key (manifests/sha256:<hex>) if absent. The
next sha256-addressed pull hits cache immediately.
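
A minimal sketch of the cross-link step, assuming a plain dict stands in for the S3 bucket. The helper names are hypothetical; only the key layout (`manifests/sha256:<hex>`) comes from the description above.

```python
import hashlib


def digest_key_for(tag_key: str, body: bytes) -> str:
    """Derive the digest-addressed key for a manifest stored under a tag key."""
    prefix, _, _tag = tag_key.rpartition("/")  # e.g. "library/nginx/manifests"
    return f"{prefix}/sha256:{hashlib.sha256(body).hexdigest()}"


def cross_link(store: dict[str, bytes], tag_key: str, body: bytes) -> str:
    """Write the manifest body under its digest key unless it is already there."""
    key = digest_key_for(tag_key, body)
    store.setdefault(key, body)  # "if absent": never overwrite an existing object
    return key
```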

Also adds a short-lived Redis distributed lock (SET NX EX 30) around upstream
fetches. Pods that lose the race for the same cold key poll storage for up to
5 s and only issue their own upstream request if the object has not appeared
by then, which avoids the thundering herd on deploy events.
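
The locking flow might look roughly like the sketch below. All function names, the `lock:` key prefix, and the `InMemoryClient` stand-in are assumptions for illustration. The only call shape borrowed from a real client is `set(name, value, nx=True, ex=...)`, which matches redis-py. A production release step would normally also check the token before deleting.

```python
import time
import uuid
from typing import Callable, Optional

LOCK_TTL_S = 30       # SET NX EX 30, as in the commit message
POLL_TIMEOUT_S = 5.0  # losers poll storage for up to 5 s


class InMemoryClient:
    """Illustrative stand-in for a Redis client (expiry is ignored)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.data:
            return None
        self.data[name] = value
        return True

    def delete(self, name):
        return 1 if self.data.pop(name, None) is not None else 0


def acquire_lock(client, key: str) -> Optional[str]:
    """Return a token if the lock was taken, None if another holder has it."""
    token = uuid.uuid4().hex
    try:
        ok = client.set(key, token, nx=True, ex=LOCK_TTL_S)
    except Exception:
        return token  # fail open when the lock backend is unavailable
    return token if ok else None


def fetch_with_lock(client, key: str,
                    read_cache: Callable[[str], Optional[bytes]],
                    fetch_upstream: Callable[[str], bytes],
                    poll_interval: float = 0.1,
                    timeout: float = POLL_TIMEOUT_S) -> bytes:
    lock_key = f"lock:{key}"
    if acquire_lock(client, lock_key) is None:
        # Another pod is fetching: poll storage instead of hitting upstream.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            cached = read_cache(key)
            if cached is not None:
                return cached
            time.sleep(poll_interval)
        return fetch_upstream(key)  # fallback fetch when the poll times out
    try:
        return fetch_upstream(key)
    finally:
        try:
            client.delete(lock_key)
        except Exception:
            pass  # the lock expires on its own after LOCK_TTL_S
```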

Includes unit tests for both the lock primitives (acquire/release, fail-open
when Redis is unavailable) and the docker proxy behaviour (cross-link written
on tag hit, not written for sha256 requests, lock acquired/released, poll path
serves from cache without upstream fetch, fallback fetch when poll times out).

Reviewed-on: #42
2026-05-10 22:12:54 +10:00
unkinben 8a7f26b193 feat: cache parsed member indexes as msgpack to skip YAML re-parse on rebuild (#40)
Closes #36

## Summary

- After fetching a member's `index.yaml` (from upstream or S3), the handler now parses it and stores a compact msgpack file (`index.msgpack`) alongside the raw YAML in S3
- On subsequent virtual rebuilds (member caches valid, virtual TTL expired), the handler loads the msgpack file instead of re-parsing raw YAML — eliminating the costliest phase
- `_entries_to_msgpack_safe()` converts datetime/date objects to ISO strings before packing (msgpack cannot natively serialize Python datetimes)
- `_merge_helm_indexes()` accepts `list[dict | None]` as pre-parsed entries; falls back to raw YAML parse when msgpack is unavailable
- `_VirtualHandler.merge()` protocol updated to pass pre-parsed entries to all future handler implementations
- Broken msgpack is detected and rebuilt from raw YAML automatically
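
As a rough illustration of the datetime conversion described above, a recursive walk like the following would make parsed entries safe to pack. This is a sketch, not the project's `_entries_to_msgpack_safe()`. Its real signature is not shown in this PR, so the name and shape here are assumptions.

```python
from datetime import date, datetime
from typing import Any


def entries_to_msgpack_safe(value: Any) -> Any:
    """Recursively replace datetime/date objects with ISO 8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, dict):
        return {k: entries_to_msgpack_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [entries_to_msgpack_safe(v) for v in value]
    return value  # str, int, float, bool, None pass through unchanged
```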

## Performance

Phase breakdown (19-member helm-all virtual, 14 MB total):

| Phase | Time | % |
|---|---|---|
| YAML parse (eliminated) | 6314 ms | 60% |
| URL rewrite + dedup | 33 ms | 0.3% |
| YAML dump | 4124 ms | 39% |

| Scenario | Before (CSafeLoader only, #34) | After |
|---|---|---|
| Cold rebuild (upstream fetch) | ~21s | ~26s (+5s for msgpack build, one-time) |
| **Warm rebuild (S3 hit, virtual expired)** | **~9.6s** | **~5.9s (38% faster)** |
| Virtual cache hit | ~0.03s | ~0.03s |

Log line confirms msgpack hits: `msgpack=19/19`

## Test plan

- 297 tests pass
- `TestEntriesToMsgpackSafe`: datetime/date serialization, empty input, round-trip
- `TestMergeHelmIndexesWithParsed`: pre-parsed path produces identical output to raw-bytes path
- `TestGetMemberIndexMsgpack`: msgpack hit, cold-build, broken msgpack fallback, upstream failure
- Docker warm-rebuild measured at 5.9s vs 9.6s baseline

Reviewed-on: #40
2026-05-02 17:15:31 +10:00
unkinben 15f934cd0b perf: use yaml.CSafeLoader/CDumper for 4x faster virtual index merge (#39)
Closes #34

## Summary

- At module load time, a `try/except` selects `yaml.CSafeLoader` / `yaml.CDumper` (C extensions) when libyaml is available, otherwise falls back to `yaml.SafeLoader` / `yaml.Dumper`
- `_HelmDumper` inherits from whichever dumper base was selected — custom datetime/date representers are registered the same way as before
- `_merge_helm_indexes` uses `yaml.load(raw_data, Loader=_YamlLoader)` instead of `yaml.safe_load`
- No change to `yaml.dump(...)` call — it already passes `Dumper=_HelmDumper`, which now inherits from the C base when available
- Five new tests in `TestYamlExtensionSelection` cover: loader/dumper base are classes, `_HelmDumper` inherits from the selected base, C extensions used when available, loader can parse YAML

## Measured performance gain

19-member `helm-all` virtual repo, real upstream data, Docker (AlmaLinux 9):

| Configuration | `merge=` time |
|---|---|
| Before (SafeLoader + Dumper) | **38,877ms** |
| After (CSafeLoader + CDumper) | **9,625ms** |
| Speedup | **4.0×** |

Local microbenchmark (500 charts × 10 versions × 19 members, 3 runs avg):
- Before: **40.8s** → After: **6.1s** (**6.7×** faster)

## Test plan

- [x] 283 unit tests pass (`make test`)
- [x] Wheel builds cleanly (`uv build --wheel`)
- [x] C extension confirmed available in AlmaLinux 9 container: `yaml.CSafeLoader: <class 'yaml.cyaml.CSafeLoader'>`
- [x] Baseline Docker timing measured with pure-Python path forced: merge=38,877ms
- [x] After Docker timing measured with C extension path: merge=9,625ms

Reviewed-on: #39
2026-05-02 11:51:00 +10:00
unkinben 7b6c69b70f perf: offload virtual repo merge to thread pool via asyncio.to_thread (#38)
Closes #35

## Summary

- Wraps `handler.merge(...)` in `await asyncio.to_thread(...)` so the CPU-bound YAML parse/merge/dump runs in the thread pool instead of blocking the event loop
- Change is at the generic `handle()` dispatch site — applies to all current and future `_VirtualHandler` implementations without modification
- Also fixes a pre-existing bug in `examples/single-file/remotes.yaml` where `base_url` and `package` keys were merged onto a single line, preventing `docker-compose up` from starting the app
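
The effect of the `asyncio.to_thread` change can be reproduced in isolation with a sketch like this one. `slow_merge` and `health` are hypothetical stand-ins, and `time.sleep` only simulates the blocking merge.

```python
import asyncio
import time


def slow_merge(seconds: float) -> str:
    """Stand-in for a blocking merge call (hypothetical)."""
    time.sleep(seconds)  # blocks the calling thread, like a long parse/dump
    return "merged"


async def health() -> float:
    """Return how long a trivial coroutine waited to be scheduled."""
    start = time.perf_counter()
    await asyncio.sleep(0)
    return time.perf_counter() - start


async def main() -> tuple[str, float]:
    # Run the blocking call in the default thread pool, not on the event loop.
    merge_task = asyncio.create_task(asyncio.to_thread(slow_merge, 0.2))
    await asyncio.sleep(0.01)  # give the worker thread time to start
    latency = await health()   # the loop is still free to serve this
    return await merge_task, latency
```

Calling `slow_merge(0.2)` directly inside `main()` would instead stall `health()` until the merge returned.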

## Measured performance gain

19-member `helm-all` virtual repo, single uvicorn worker, cache miss (38s merge):

| Configuration | Concurrent `/health` latency |
|---|---|
| Before (blocking) | **37,721ms** for first request (stalled) |
| After (thread pool) | **8–63ms** for all requests |

## Test plan

- [x] 278 unit tests pass (`make test`)
- [x] Live concurrency test: cache miss merge started in background, 5 concurrent `/health` checks measured — all <65ms
- [x] Baseline comparison: same test with blocking call — first health check stalled 37.7s

Reviewed-on: #38
2026-05-02 01:35:45 +10:00
unkinben 624d858062 fix: rewrite helm index.yaml URLs post-parse to handle relative URLs (#37)
Closes #33

## Summary

- `_merge_helm_indexes` now parses each member's raw YAML first, then rewrites `urls` entries in-place via the new `_rewrite_urls` helper
- **Relative URLs** (e.g. `rancher-2.13.1.tgz`) are prepended with `{proxy_base}/api/v1/remote/{member_name}/`
- **Absolute URLs** matching `base_url` are rewritten to the proxy path (existing behaviour, now correct)
- **Absolute URLs** with a different prefix are left unchanged
- Removes the `_helm.resolve_content` raw-bytes detour from the virtual merge path; `remote/helm.py` is unchanged (still used for direct remote proxying)
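
The three URL cases above can be sketched as below. This is not the project's `_rewrite_urls`: the signature is assumed, and stripping a leading slash from relative URLs is a guess at the "leading-slash" case named in the test plan.

```python
def rewrite_urls(urls: list[str], base_url: str, proxy_base: str, member_name: str) -> list[str]:
    """Rewrite chart URLs so that downloads go through the proxy."""
    proxy_prefix = f"{proxy_base.rstrip('/')}/api/v1/remote/{member_name}/"
    upstream_prefix = base_url.rstrip("/") + "/"
    rewritten = []
    for url in urls:
        if "://" not in url:
            # Relative URL: prepend the proxy path for this member.
            rewritten.append(proxy_prefix + url.lstrip("/"))
        elif url.startswith(upstream_prefix):
            # Absolute URL under base_url: swap the prefix for the proxy path.
            rewritten.append(proxy_prefix + url[len(upstream_prefix):])
        else:
            # Absolute URL on another host or path: leave unchanged.
            rewritten.append(url)
    return rewritten
```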

## Test plan

- [x] 278 unit tests pass (`make test`)
- [x] New `TestRewriteUrls` class covering relative, absolute-match, absolute-no-match, leading-slash, and multi-URL cases
- [x] New `test_relative_urls_rewritten_to_proxy` in `TestMergeHelmIndexes`
- [x] Updated `test_first_member_wins_on_duplicate_name_and_version` to assert on proxy remote name (not upstream hostname)
- [x] Live Docker test: Rancher `index.yaml` relative URLs rewritten correctly to `http://localhost:8000/api/v1/remote/rancher-stable/rancher-2.14.1.tgz` etc.
- [x] `helm-all` virtual (19 members) returns HTTP 200 with 395k-line merged index on cache miss

Reviewed-on: #37
2026-05-02 01:22:16 +10:00
unkinben 1656664dfa refactor: split config into remotes/virtuals/locals sections (#31)
Repository types now live under dedicated top-level keys instead of a
shared remotes: block distinguished by a type field:

  remotes:   caching proxy remotes (no type field needed)
  virtuals:  virtual merged-index repositories
  locals:    local upload repositories

Routes for local repos move from /api/v1/remote/ to /api/v1/local/.
config.py gains get_virtual_config() and get_local_config() lookups.
Root endpoint now reports all three sections. Drop root conf.d/ (was
an exact duplicate of examples/conf.d-method/).

Reviewed-on: #31
2026-04-30 23:50:20 +10:00
unkinben c7baae8d0d feat: add virtual repository support for unified index merging (#30)
Adds a new virtual repo type that merges indexes from multiple member remotes
of the same package type. Currently supports helm (index.yaml merge with URL
rewriting). Member fetches run in parallel; merged index is Redis-cached at
min(mutable_ttl) across members.
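
A simplified sketch of the merge and TTL choice, operating on already-parsed `entries` mappings. The function names are hypothetical, and the first-member-wins rule for duplicate name/version pairs is inferred from a test name elsewhere in this commit list rather than from this commit.

```python
def merge_entries(member_entries: list[dict[str, list[dict]]]) -> dict[str, list[dict]]:
    """Merge per-member chart entries; the first member wins on duplicates."""
    merged: dict[str, list[dict]] = {}
    seen: set[tuple[str, str]] = set()
    for entries in member_entries:          # members in configured order
        for chart, versions in entries.items():
            for item in versions:
                key = (chart, item["version"])
                if key in seen:
                    continue                # an earlier member already supplied it
                seen.add(key)
                merged.setdefault(chart, []).append(item)
    return merged


def virtual_ttl(member_mutable_ttls: list[int]) -> int:
    """The merged index is cached for the shortest member mutable_ttl."""
    return min(member_mutable_ttls)
```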

Reviewed-on: #30
2026-04-29 23:01:14 +10:00
unkinben 4789635e87 Merge pull request 'chore: move example config files into examples/' (#27) from benvin/examples-directory into master
Reviewed-on: #27
2026-04-28 23:47:03 +10:00
unkinben ba52fedd27 chore: restructure examples into single-file and conf.d-method subdirs
examples/single-file/remotes.yaml  — original monolithic config
examples/conf.d-method/            — one yaml per remote (alpine, github, pypi)

docker-compose updated to mount from examples/single-file/.
2026-04-28 23:46:06 +10:00
unkinben 76633403b2 chore: move example config files into examples/
Keeps the repo root clean — example remotes.yaml lives in examples/.
docker-compose.yml updated to mount from the new path.
2026-04-28 23:44:14 +10:00
unkinben cae3503ac4 Merge pull request 'feat: support config.d directory for split configuration (closes #20)' (#26) from benvin/issue-20-config-dir-split into master
Reviewed-on: #26
2026-04-28 23:39:56 +10:00
unkinben 3f098df428 chore: add conf.d example split-config files
Three example files (alpine, github, pypi) demonstrating per-remote
YAML files for the conf.d directory mode.
2026-04-28 23:29:41 +10:00
unkinben 64266f40e9 feat: support config.d directory for split configuration (closes #20)
CONFIG_PATH now accepts a directory path (all *.yaml files merged) or a
main file with a config_dir key pointing to a drop-in directory. Remotes
are merged alphabetically across files; later files win on conflicts.
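
The merge order could be sketched like this, taking already-parsed YAML mappings keyed by filename. The function name is hypothetical, and replacing a whole remote on conflict (rather than deep-merging its keys) is an assumption.

```python
def merge_remotes(files: dict[str, dict]) -> dict[str, dict]:
    """Merge the `remotes` section of each file in alphabetical filename order."""
    merged: dict[str, dict] = {}
    for filename in sorted(files):
        # Later files overwrite remotes defined by earlier ones.
        merged.update(files[filename].get("remotes", {}))
    return merged
```
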
2026-04-28 23:21:02 +10:00
unkinben be25fc19f7 Merge pull request 'feat: quarantine new releases (supply-chain attack prevention)' (#25) from benvin/issue-22-quarantine into master
Reviewed-on: #25
2026-04-28 23:13:28 +10:00
unkinben 3bd3ca8b74 feat: quarantine new releases to prevent supply chain attacks
Add per-remote quarantine support: when quarantine_new=true and quarantine_days=N,
immutable artifacts published within the last N days are blocked with 404 until
the quarantine window expires.

- ConfigManager.get_quarantine_config() reads quarantine_new/quarantine_days
- RedisCache.store/get_artifact_published() persist Last-Modified per artifact
- proxy._check_quarantine() enforces the window; fails open when date is unknown
- proxy._fetch_last_modified() HEAD-requests upstream to discover publish date
- Docker proxy route wires quarantine checks on both cache-hit and cache-miss
- remotes.yaml: quarantine_new/quarantine_days added to pypi example (3-day window)
- README: documents quarantine configuration
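
The quarantine decision might be sketched as below. This is not the body of `proxy._check_quarantine()`: the function name, signature, and the treatment of an artifact that is exactly `quarantine_days` old are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_quarantined(published: Optional[datetime], quarantine_new: bool,
                   quarantine_days: int, now: Optional[datetime] = None) -> bool:
    """Return True when an immutable artifact should be blocked (served as 404)."""
    if not quarantine_new or quarantine_days <= 0:
        return False
    if published is None:
        return False  # fail open when the publish date is unknown
    now = now or datetime.now(timezone.utc)
    return now - published < timedelta(days=quarantine_days)
```
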
2026-04-28 23:01:52 +10:00
unkinben 373366e695 Merge pull request 'refactor: split codebase into submodules (closes #19)' (#24) from benvin/issue-19-submodules into master
Reviewed-on: #24
2026-04-28 22:47:38 +10:00
unkinben e6d9b175ce refactor: extract route handler logic into artifact/ subpackage
Each route in main.py is now a single-line delegation to an artifact submodule:
- artifact/proxy.py  — remote artifact GET, caching, mutable revalidation
- artifact/local.py  — local repo upload/check/delete
- artifact/docker.py — Docker Registry v2 proxy + ping
- artifact/discovery.py — GitHub release discovery + bulk cache
- artifact/flush.py  — cache flush

UpstreamUnreachable, cache_single_artifact, _upstream_reachable and
check_upstream_changed moved from main.py to artifact/proxy.py.
Tests updated to patch at their new locations.

All 187 tests pass.
2026-04-28 22:21:01 +10:00
unkinben 0daca40156 refactor: add storage/s3 and auth/docker submodules
- storage/s3.py: S3Storage moved from storage.py; storage/__init__.py re-exports it
- auth/docker.py: Docker Bearer token logic moved from docker_auth.py
- docker_auth.py: thin shim re-exporting all public symbols (including _token_cache)
  for backwards compatibility with existing test and import paths
- main.py: now imports get_docker_token_for_response from .auth

All 187 tests pass.
2026-04-28 22:15:04 +10:00
unkinben 0df726467a refactor: split cache, database, and remote logic into submodules
cache/redis.py, database/postgres.py, and remote/{base,generic,helm,npm,python,rpm}.py
replace the flat modules. All public symbols re-exported from their package
__init__.py for backwards compatibility. No functional changes; all 187 tests pass.

Closes #19
2026-04-28 22:09:58 +10:00
unkinben b8bc7f8714 Merge pull request 'chore: cleanup the readme' (#23) from benvin/readme-refactor into master
Reviewed-on: #23
2026-04-28 22:00:32 +10:00
unkinben 0c780c1bd1 chore: cleanup the readme
2026-04-28 21:57:14 +10:00
unkinben 173b5d8b10 Merge pull request 'refactor: simplify pypi and npm URL rewriting' (#18) from benvin/simplify-remote-url-rewriting into master
Reviewed-on: #18
2026-04-27 22:43:33 +10:00
unkinben 3352a3e886 refactor: simplify pypi and npm URL rewriting — single remote, no redundant config keys
- npm: remove npm_files_url/npm_files_remote; rewrite uses base_url and
  remote name directly (same approach as helm)
- npm: replace hardcoded .tgz extension check with immutable_patterns match
- pypi: collapse pypi + pypi-files into a single remote (base_url points
  to files.pythonhosted.org); simple/ requests are transparently fetched
  from pypi.org with no extra config required
- pypi: remove pypi_files_url/pypi_files_remote from pypi and pypi-gitea
- pypi: rewrite check now uses immutable_patterns (consistent with npm)
- Update README for both pypi and npm sections
- Update tests and fixtures to reflect single-remote pypi config
2026-04-27 22:42:23 +10:00
unkinben 8adcbac405 Merge pull request 'feat: add helm chart repository caching proxy' (#17) from benvin/helm-remote into master
Reviewed-on: #17
2026-04-27 22:22:36 +10:00
unkinben 4ca89b9159 feat: add helm chart repository caching proxy
- Add helm package type with index.yaml as mutable (TTL-based) and
  .tgz chart tarballs as immutable
- Rewrite chart URLs in index.yaml to serve tarballs via proxy cache
- Add text/yaml content-type detection for .yaml/.yml files
- Add hashicorp-helm example remote in remotes.yaml
- Update README with Helm chart repository proxy section
- Add tests for helm mutable patterns and route behaviour
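
The mutable/immutable split for helm paths can be illustrated with a small classifier. Everything here is an assumption made for the sketch: the function name, the use of `re.search`, checking immutable patterns first, and the `index\.yaml$` pattern standing in for the built-in helm default.

```python
import re


def classify(path: str, immutable_patterns: list[str], mutable_patterns: list[str]) -> str:
    """Classify a request path as immutable, mutable, or unmatched."""
    if any(re.search(p, path) for p in immutable_patterns):
        return "immutable"  # cached without expiry
    if any(re.search(p, path) for p in mutable_patterns):
        return "mutable"    # cached for mutable_ttl seconds
    return "unmatched"
```
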
2026-04-27 22:17:31 +10:00
unkinben 25b85ddc92 Merge pull request 'feat: add npm registry caching proxy' (#16) from benvin/npm-remote into master
Reviewed-on: #16
2026-04-27 20:30:18 +10:00
unkinben d585ab425c feat: add npm remote type with metadata URL rewriting and caching
- Add `npm` package type to config with no built-in mutable defaults;
  users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and
  immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball
  downloads pass through the same proxy remote instead of hitting
  npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since
  both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config,
  client setup (npm/yarn/pnpm), rewriting behaviour, and
  mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type,
  scoped packages, cache miss, and tarball immutability
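
A sketch of the `dist.tarball` rewrite, assuming the standard npm packument layout (`versions.<version>.dist.tarball`). The function name and the proxy URL shape are hypothetical.

```python
import json


def rewrite_npm_tarballs(metadata: bytes, base_url: str, proxy_remote_url: str) -> bytes:
    """Point every dist.tarball URL under base_url at the proxy remote instead."""
    doc = json.loads(metadata)
    upstream = base_url.rstrip("/") + "/"
    proxy = proxy_remote_url.rstrip("/") + "/"
    for version in doc.get("versions", {}).values():
        dist = version.get("dist", {})
        tarball = dist.get("tarball", "")
        if tarball.startswith(upstream):
            dist["tarball"] = proxy + tarball[len(upstream):]
    return json.dumps(doc).encode()
```
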
2026-04-27 20:28:31 +10:00
unkinben 6b1a6c9eb4 Merge pull request 'feat: add PyPI remote type with URL rewriting and basic auth' (#15) from benvin/pypi-remote into master
Reviewed-on: #15
2026-04-27 14:46:27 +10:00
unkinben 5de912db75 docs: describe PyPI remote usage with uv system/user uv.toml
2026-04-27 14:37:41 +10:00
unkinben 8e9d313892 feat: add pypi remote type with URL rewriting and basic auth
- Add 'pypi' package type to config.py; simple/ paths are mutable by default
- Refactor content-type detection into _get_content_type() helper; add .whl
- Add _resolve_content() which rewrites files host URLs in simple index HTML
  to go through the proxy (pypi_files_url / pypi_files_remote config keys),
  and returns text/html content-type for simple index responses
- Add basic auth support for non-Docker remotes (username + password/token
  in remote config); thread auth through _upstream_reachable and
  check_upstream_changed so mutable TTL checks also authenticate
- Add 'pypi' remote (pypi.org simple index) and 'pypi-files' remote
  (files.pythonhosted.org) to remotes.yaml; add 'pypi-gitea' example for
  Gitea package registries where index and files share the same base URL
- Add unit tests: simple index URL rewriting, HTML content-type, .whl/.tar.gz
  content-types, mutable index detection, and immutable pattern enforcement
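
The simple-index rewrite could be approximated with a plain prefix replacement, as sketched here. The function name, parameters, and the proxy URL layout are assumptions. The actual `_resolve_content()` may parse the HTML differently.

```python
def resolve_simple_index(html: str, files_url: str, proxy_files_url: str) -> tuple[str, str]:
    """Rewrite file-host links in a simple index page; return (body, content type)."""
    upstream = files_url.rstrip("/") + "/"
    proxy = proxy_files_url.rstrip("/") + "/"
    # Every link to the files host is redirected through the proxy remote.
    return html.replace(upstream, proxy), "text/html"
```
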
2026-04-27 14:31:33 +10:00
39 changed files with 4288 additions and 1954 deletions
+374 -818
File diff suppressed because it is too large
+1 -1
@@ -10,7 +10,7 @@ services:
     ports:
       - "8000:8000"
     volumes:
-      - ./remotes.yaml:/app/remotes.yaml:ro,z
+      - ./examples/single-file/remotes.yaml:/app/remotes.yaml:ro,z
       - ./ca-bundle.pem:/app/ca-bundle.pem:ro,z
     environment:
       - CONFIG_PATH=/app/remotes.yaml
+10
@@ -0,0 +1,10 @@
remotes:
  alpine:
    base_url: "https://dl-cdn.alpinelinux.org"
    package: "alpine"
    description: "Alpine Linux APK package repository"
    immutable_patterns:
      - ".*/x86_64/.*\\.apk$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 7200
+11
@@ -0,0 +1,11 @@
remotes:
  github:
    base_url: "https://github.com"
    package: "generic"
    description: "GitHub releases and files"
    immutable_patterns:
      - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
      - "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 0
+16
@@ -0,0 +1,16 @@
remotes:
  pypi:
    base_url: "https://files.pythonhosted.org"
    package: "pypi"
    description: "Python Package Index"
    check_mutable_updates: true
    quarantine_new: true
    quarantine_days: 3
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
+487
@@ -0,0 +1,487 @@
# Example remotes configuration — copy and adapt for your environment.
#
# immutable_patterns: artifacts cached forever (e.g. release binaries, versioned tags).
# mutable_patterns:   artifacts that expire after cache.mutable_ttl seconds and are
#                     re-fetched from upstream on next request (e.g. index files,
#                     branch archives). Defaults to the package-type built-ins when
#                     not set (APKINDEX, repomd.xml, Docker manifests, etc.).
# cache:
#   immutable_ttl: TTL for immutable files (0 = forever, rarely needed to change).
#   mutable_ttl:   TTL in seconds for mutable files. Omit to use the default (3600).
#
# quarantine_new:  Set to true to block immutable artifacts published within the last
#                  quarantine_days days. Requests return 404 until the quarantine period
#                  expires. Fails open when the publish date cannot be determined.
# quarantine_days: Number of days to quarantine newly published artifacts (requires
#                  quarantine_new: true). The upstream Last-Modified header is used as
#                  the publish date.
#
# WARNING: this file may contain credentials — do not commit real values.
#
# Global configuration
#s3:
#  endpoint: "localhost:9000"
#  access_key: "minioadmin"
#  secret_key: "minioadmin"
#  bucket: "artifacts"
#  secure: false
#
#redis:
#  url: "redis://localhost:6379/0"
#
#database:
#  url: "postgresql://artifacts:artifacts123@localhost:5432/artifacts"
#
remotes:
  github:
    base_url: "https://github.com"
    package: "generic"
    description: "GitHub releases and files"
    immutable_patterns:
      - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
      - "lxc/incus/.*\\.tar\\.gz$"
      - "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "VictoriaMetrics/VictoriaMetrics/.*/vmutils-linux-amd64-.*\\.tar\\.gz$"
      - "VictoriaMetrics/VictoriaMetrics/.*/victoria-metrics-linux-amd64-.*-cluster\\.tar\\.gz$"
      - "VictoriaMetrics/VictoriaMetrics/.*/victoria-logs-linux-amd64-.*\\.tar\\.gz$"
      - "VictoriaMetrics/VictoriaMetrics/.*/vlutils-linux-amd64-.*\\.tar\\.gz$"
      - "prometheus-community/bind_exporter/.*/bind_exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "prometheus-community/pgbouncer_exporter/.*/pgbouncer_exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "prometheus-community/postgres_exporter/.*/postgres_exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "onedr0p/exportarr/.*/exportarr_.*_linux_amd64\\.tar\\.gz$"
      - "tynany/frr_exporter/.*/frr_exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "camptocamp/prometheus-puppetdb-exporter/.*/prometheus-puppetdb-exporter-.*\\.linux-amd64\\.tar\\.gz$"
      - "grafana/jsonnet-language-server/.*/jsonnet-language-server_.*_linux_amd64$"
      - "helmfile/helmfile/.*/helmfile_.*_linux_amd64\\.tar\\.gz$"
      - "helmfile/vals/.*/vals_.*_linux_amd64\\.tar\\.gz$"
      - "openbao/openbao-plugins/.*/openbao-plugin-secrets-consul_linux_amd64_.*\\.tar\\.gz$"
      - "openbao/openbao-plugins/.*/openbao-plugin-secrets-nomad_linux_amd64_.*\\.tar\\.gz$"
      - "apple/foundationdb/.*/libfdb_c\\.x86_64\\.so$"
      - "stalwartlabs/stalwart/.*/stalwart-cli-x86_64-unknown-linux-gnu\\.tar\\.gz$"
      - "stalwartlabs/stalwart/.*/stalwart-foundationdb-x86_64-unknown-linux-gnu\\.tar\\.gz$"
      - "stalwartlabs/stalwart/.*/stalwart-x86_64-unknown-linux-gnu\\.tar\\.gz$"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 0
  github-archive:
    base_url: "https://github.com"
    package: "generic"
    description: "GitHub repository archive tarballs"
    immutable_patterns:
      # Tag archives are immutable — a tag never changes
      - ".*/archive/refs/tags/.*\\.tar\\.gz$"
    mutable_patterns:
      # Branch archives can change on every push
      - ".*/archive/refs/heads/main\\.tar\\.gz$"
      - ".*/archive/refs/heads/master\\.tar\\.gz$"
    # Before re-downloading an expired branch archive, check whether it has
    # actually changed (304 Not Modified → just refresh the TTL, no transfer).
    # Only applies to user-defined mutable_patterns, not package-type defaults.
    check_mutable_updates: true
    cache:
      immutable_ttl: 0 # Tag archives cached indefinitely
      mutable_ttl: 86400 # Branch archives refreshed after 1 day
  gitea-dl:
    base_url: "https://dl.gitea.com"
    package: "generic"
    description: "Gitea download site"
    immutable_patterns:
      - "act_runner/.*/act_runner-.*-linux-amd64$"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 0
  hashicorp-releases:
    base_url: "https://releases.hashicorp.com"
    package: "generic"
    description: "HashiCorp product releases"
    immutable_patterns:
      - "terraform/.*terraform_.*_linux_amd64\\.zip$"
      - "terraform/.*terraform_.*_windows_amd64\\.zip$"
      - "terraform/.*terraform_.*_darwin_amd64\\.zip$"
      - "vault/.*vault_.*_linux_amd64\\.zip$"
      - "vault/.*vault_.*_windows_amd64\\.zip$"
      - "vault/.*vault_.*_darwin_amd64\\.zip$"
      - "consul-cni/.*/consul-cni_.*_linux_amd64\\.zip$"
      - "consul/.*/consul_.*_linux_amd64\\.zip$"
      - "nomad-autoscaler/.*/nomad-autoscaler_.*_linux_amd64\\.zip$"
      - "nomad/.*/nomad_.*_linux_amd64\\.zip$"
      - "packer/.*/packer_.*_linux_amd64\\.zip$"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 0
  alpine:
    base_url: "https://dl-cdn.alpinelinux.org"
    package: "alpine"
    description: "Alpine Linux APK package repository"
    immutable_patterns:
      - ".*/x86_64/.*\\.apk$"
    # check_mutable_updates not set: APKINDEX.tar.gz is a package-type default
    # and is always re-fetched on expiry — conditional checks are skipped for
    # built-in mutable patterns regardless of this flag.
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 7200 # Index files (APKINDEX.tar.gz) cached for 2 hours
  almalinux:
    base_url: "https://gsl-syd.mm.fcix.net/almalinux"
    package: "rpm"
    description: "AlmaLinux RPM package repository"
    immutable_patterns:
      - ".*/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
      - ".*/repodata/.*$"
      - ".*\\.rpm$" # Allow all RPM files
    # repomd.xml / repodata are package-type defaults — always re-fetched on
    # expiry. check_mutable_updates would only apply to any custom
    # mutable_patterns added here.
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 7200 # Metadata files cached for 2 hours
  epel:
    base_url: "http://mirror.aarnet.edu.au/pub/epel"
    package: "rpm"
    description: "EPEL (Extra Packages for Enterprise Linux)"
    immutable_patterns:
      - "8/Everything/x86_64/.*\\.rpm$"
      - "9/Everything/x86_64/.*\\.rpm$"
      - "10/Everything/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
      - ".*/repodata/.*$"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 7200 # Metadata files cached for 2 hours
  fedora:
    base_url: "https://gsl-syd.mm.fcix.net/fedora/linux"
    package: "rpm"
    description: "Fedora Linux RPM package repository"
    immutable_patterns:
      - "releases/.*/Everything/x86_64/.*\\.rpm$"
      - "updates/.*/Everything/x86_64/.*\\.rpm$"
      - "development/.*/Everything/x86_64/.*\\.rpm$"
      - ".*/noarch/.*\\.rpm$"
      - "updates/.*/Everything/x86_64/repodata/.*$"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 300 # Metadata files cached for 5 minutes
  ghcr:
    base_url: "https://ghcr.io"
    package: "docker"
    description: "GitHub Container Registry"
    # username: "your-github-username"
    # password: "your-github-pat" # needs read:packages scope
    # Docker manifest/tag-list patterns are package-type defaults — always
    # re-fetched on expiry. check_mutable_updates only applies to any custom
    # mutable_patterns you add (e.g. a metadata endpoint).
    cache:
      immutable_ttl: 0
      mutable_ttl: 300
  dockerhub:
    base_url: "https://registry-1.docker.io"
    package: "docker"
    description: "Docker Hub registry"
    cache:
      immutable_ttl: 0
      mutable_ttl: 300
  pypi:
    base_url: "https://files.pythonhosted.org"
    package: "pypi"
    description: "Python Package Index — simple index and package files via a single remote"
    # simple/ requests are transparently fetched from pypi.org; package files come from
    # files.pythonhosted.org (base_url). URLs in the simple index are rewritten to this remote.
    check_mutable_updates: true
    # Block packages published within the last 3 days (supply-chain attack mitigation).
    # Immutable artifacts (wheel/sdist) newer than quarantine_days return 404 until
    # the window passes. Disable by setting quarantine_new: false or removing both keys.
    quarantine_new: true
    quarantine_days: 3
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
      - "packages/.*\\.egg$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600 # Simple index pages refreshed after 10 minutes
  pypi-gitea:
    base_url: "https://gitea.example.com/api/packages/myorg/pypi"
    package: "pypi"
    description: "Private Gitea PyPI registry — simple index and files at the same host"
    # username: "your-gitea-username"
    # password: "your-personal-access-token" # needs package:read scope
    check_mutable_updates: true
    immutable_patterns:
      - "files/.*\\.whl$"
      - "files/.*\\.whl\\.metadata$"
      - "files/.*\\.tar\\.gz$"
      - "files/.*\\.zip$"
      - "files/.*\\.egg$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
  npm:
    base_url: "https://registry.npmjs.org"
    package: "npm"
    description: "npm registry — package metadata with tarball URL rewriting"
    check_mutable_updates: true
    immutable_patterns:
      - \.tgz$
    mutable_patterns:
      - ^(?!.*\.tgz$).*
    cache:
      immutable_ttl: 0
      mutable_ttl: 600 # Package metadata refreshed after 10 minutes
  hashicorp-helm:
    base_url: "https://helm.releases.hashicorp.com"
    package: "helm"
    description: "HashiCorp Helm chart repository (Vault, Consul, Nomad, etc.)"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0 # Chart tarballs are versioned — cache forever
      mutable_ttl: 3600 # index.yaml refreshed after 1 hour
  metallb:
    base_url: "https://metallb.github.io/metallb"
    package: "helm"
    description: "MetalLB load balancer Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  jetstack:
    base_url: "https://charts.jetstack.io"
    package: "helm"
    description: "Jetstack Helm charts (cert-manager)"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  rancher-stable:
    base_url: "https://releases.rancher.com/server-charts/stable"
    package: "helm"
    description: "Rancher stable Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  purelb:
    base_url: "https://gitlab.com/api/v4/projects/20400619/packages/helm/stable"
    package: "helm"
    description: "PureLB load balancer Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  istio:
    base_url: "https://istio-release.storage.googleapis.com/charts"
    package: "helm"
    description: "Istio service mesh Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  cnpg:
    base_url: "https://cloudnative-pg.github.io/charts"
    package: "helm"
    description: "CloudNativePG operator Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  ceph-csi:
    base_url: "https://ceph.github.io/csi-charts"
    package: "helm"
    description: "Ceph CSI driver Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  external-dns:
    base_url: "https://kubernetes-sigs.github.io/external-dns/"
    package: "helm"
    description: "ExternalDNS Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  intel-helm:
    base_url: "https://intel.github.io/helm-charts/"
    package: "helm"
    description: "Intel Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  elastic:
    base_url: "https://helm.elastic.co"
    package: "helm"
    description: "Elastic stack Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  k8up-io:
    base_url: "https://k8up-io.github.io/k8up"
    package: "helm"
    description: "K8up backup operator Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  victoriametrics:
    base_url: "https://victoriametrics.github.io/helm-charts/"
    package: "helm"
    description: "VictoriaMetrics observability Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  grafana:
    base_url: "https://grafana.github.io/helm-charts"
    package: "helm"
    description: "Grafana observability Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  helm-openldap:
    base_url: "https://jp-gouin.github.io/helm-openldap/"
    package: "helm"
    description: "OpenLDAP Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  woodpecker:
    base_url: "https://woodpecker-ci.org/"
    package: "helm"
    description: "Woodpecker CI Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
  stakater:
    base_url: "https://stakater.github.io/stakater-charts"
    package: "helm"
    description: "Stakater Helm charts"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600
jfrog:
base_url: "https://charts.jfrog.io/"
package: "helm"
description: "JFrog Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
openvox:
base_url: "https://openvoxproject.github.io/openvox-helm-chart"
package: "helm"
description: "OpenVox Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
virtuals:
  helm-all:
    package: "helm"
    description: "Virtual repository merging all helm remotes — member order is priority order for duplicate chart+version"
    members:
      - hashicorp-helm
      - metallb
      - jetstack
      - rancher-stable
      - purelb
      - istio
      - cnpg
      - ceph-csi
      - external-dns
      - intel-helm
      - elastic
      - k8up-io
      - victoriametrics
      - grafana
      - helm-openldap
      - woodpecker
      - stakater
      - jfrog
      - openvox
locals:
  local-generic:
    package: "generic"
    description: "Local generic file repository"
    cache:
      immutable_ttl: 0 # Files cached indefinitely
      mutable_ttl: 0
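The `helm-all` description states that member order decides which remote wins when two members publish the same chart and version. The following is a minimal sketch of such a first-wins merge, assuming each member's `index.yaml` has already been parsed into a mapping of chart name to a list of version entries. The function `merge_member_indexes`, the `source` field, and the sample data are hypothetical and are not taken from the project's code.

```python
def merge_member_indexes(member_indexes):
    """Merge parsed Helm index entries; earlier members win on duplicate chart+version.

    member_indexes: list of (member_name, entries) in priority order, where
    entries maps chart name -> list of dicts that each carry a "version" key.
    """
    merged = {}
    seen = set()  # (chart, version) pairs already claimed by a higher-priority member
    for member_name, entries in member_indexes:
        for chart, versions in entries.items():
            for entry in versions:
                ident = (chart, entry["version"])
                if ident in seen:
                    continue  # a member listed earlier already provides this version
                seen.add(ident)
                merged.setdefault(chart, []).append({**entry, "source": member_name})
    return merged


# Hypothetical sample data: both members publish vault 1.0.0.
indexes = [
    ("hashicorp-helm", {"vault": [{"version": "1.0.0"}]}),
    ("jfrog", {"vault": [{"version": "1.0.0"}, {"version": "1.1.0"}]}),
]
merged = merge_member_indexes(indexes)
```

With this ordering, `vault` 1.0.0 is attributed to `hashicorp-helm` and only 1.1.0 comes from `jfrog`; swapping the two members in the list would reverse that.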
+1
@@ -14,6 +14,7 @@ dependencies = [
"lxml>=4.9.0",
"prometheus-client>=0.19.0",
"python-multipart>=0.0.6",
"msgpack>=1.0.0",
]
requires-python = ">=3.11"
readme = "README.md"
-203
@@ -1,203 +0,0 @@
# Example remotes configuration — copy and adapt for your environment.
#
# immutable_patterns: artifacts cached forever (e.g. release binaries, versioned tags).
# mutable_patterns: artifacts that expire after cache.mutable_ttl seconds and are
# re-fetched from upstream on next request (e.g. index files,
# branch archives). Defaults to the package-type built-ins when
# not set (APKINDEX, repomd.xml, Docker manifests, etc.).
# cache:
# immutable_ttl: TTL for immutable files (0 = forever, rarely needed to change).
# mutable_ttl: TTL in seconds for mutable files. Omit to use the default (3600).
#
# WARNING: this file may contain credentials — do not commit real values.
#
# Global configuration
#s3:
# endpoint: "localhost:9000"
# access_key: "minioadmin"
# secret_key: "minioadmin"
# bucket: "artifacts"
# secure: false
#
#redis:
# url: "redis://localhost:6379/0"
#
#database:
# url: "postgresql://artifacts:artifacts123@localhost:5432/artifacts"
#
remotes:
github:
base_url: "https://github.com"
type: "remote"
package: "generic"
description: "GitHub releases and files"
immutable_patterns:
- "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
- "lxc/incus/.*\\.tar\\.gz$"
- "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/vmutils-linux-amd64-.*\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/victoria-metrics-linux-amd64-.*-cluster\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/victoria-logs-linux-amd64-.*\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/vlutils-linux-amd64-.*\\.tar\\.gz$"
- "prometheus-community/bind_exporter/.*/bind_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "prometheus-community/pgbouncer_exporter/.*/pgbouncer_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "prometheus-community/postgres_exporter/.*/postgres_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "onedr0p/exportarr/.*/exportarr_.*_linux_amd64\\.tar\\.gz$"
- "tynany/frr_exporter/.*/frr_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "camptocamp/prometheus-puppetdb-exporter/.*/prometheus-puppetdb-exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "grafana/jsonnet-language-server/.*/jsonnet-language-server_.*_linux_amd64$"
- "helmfile/helmfile/.*/helmfile_.*_linux_amd64\\.tar\\.gz$"
- "helmfile/vals/.*/vals_.*_linux_amd64\\.tar\\.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-consul_linux_amd64_.*\\.tar\\.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-nomad_linux_amd64_.*\\.tar\\.gz$"
- "apple/foundationdb/.*/libfdb_c\\.x86_64\\.so$"
- "stalwartlabs/stalwart/.*/stalwart-cli-x86_64-unknown-linux-gnu\\.tar\\.gz$"
- "stalwartlabs/stalwart/.*/stalwart-foundationdb-x86_64-unknown-linux-gnu\\.tar\\.gz$"
- "stalwartlabs/stalwart/.*/stalwart-x86_64-unknown-linux-gnu\\.tar\\.gz$"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
github-archive:
base_url: "https://github.com"
type: "remote"
package: "generic"
description: "GitHub repository archive tarballs"
immutable_patterns:
# Tag archives are immutable — a tag never changes
- ".*/archive/refs/tags/.*\\.tar\\.gz$"
mutable_patterns:
# Branch archives can change on every push
- ".*/archive/refs/heads/main\\.tar\\.gz$"
- ".*/archive/refs/heads/master\\.tar\\.gz$"
# Before re-downloading an expired branch archive, check whether it has
# actually changed (304 Not Modified → just refresh the TTL, no transfer).
# Only applies to user-defined mutable_patterns, not package-type defaults.
check_mutable_updates: true
cache:
immutable_ttl: 0 # Tag archives cached indefinitely
mutable_ttl: 86400 # Branch archives refreshed after 1 day
gitea-dl:
base_url: "https://dl.gitea.com"
type: "remote"
package: "generic"
description: "Gitea download site"
immutable_patterns:
- "act_runner/.*/act_runner-.*-linux-amd64$"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
hashicorp-releases:
base_url: "https://releases.hashicorp.com"
type: "remote"
package: "generic"
description: "HashiCorp product releases"
immutable_patterns:
- "terraform/.*terraform_.*_linux_amd64\\.zip$"
- "terraform/.*terraform_.*_windows_amd64\\.zip$"
- "terraform/.*terraform_.*_darwin_amd64\\.zip$"
- "vault/.*vault_.*_linux_amd64\\.zip$"
- "vault/.*vault_.*_windows_amd64\\.zip$"
- "vault/.*vault_.*_darwin_amd64\\.zip$"
- "consul-cni/.*/consul-cni_.*_linux_amd64\\.zip$"
- "consul/.*/consul_.*_linux_amd64\\.zip$"
- "nomad-autoscaler/.*/nomad-autoscaler_.*_linux_amd64\\.zip$"
- "nomad/.*/nomad_.*_linux_amd64\\.zip$"
- "packer/.*/packer_.*_linux_amd64\\.zip$"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
type: "remote"
package: "alpine"
description: "Alpine Linux APK package repository"
immutable_patterns:
- ".*/x86_64/.*\\.apk$"
# check_mutable_updates not set: APKINDEX.tar.gz is a package-type default
# and is always re-fetched on expiry — conditional checks are skipped for
# built-in mutable patterns regardless of this flag.
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Index files (APKINDEX.tar.gz) cached for 2 hours
almalinux:
base_url: "https://gsl-syd.mm.fcix.net/almalinux"
type: "remote"
package: "rpm"
description: "AlmaLinux RPM package repository"
immutable_patterns:
- ".*/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
- ".*\\.rpm$" # Allow all RPM files
# repomd.xml / repodata are package-type defaults — always re-fetched on
# expiry. check_mutable_updates would only apply to any custom
# mutable_patterns added here.
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Metadata files cached for 2 hours
epel:
base_url: "http://mirror.aarnet.edu.au/pub/epel"
type: "remote"
package: "rpm"
description: "EPEL (Extra Packages for Enterprise Linux)"
immutable_patterns:
- "8/Everything/x86_64/.*\\.rpm$"
- "9/Everything/x86_64/.*\\.rpm$"
- "10/Everything/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Metadata files cached for 2 hours
fedora:
base_url: "https://gsl-syd.mm.fcix.net/fedora/linux"
type: "remote"
package: "rpm"
description: "Fedora Linux RPM package repository"
immutable_patterns:
- "releases/.*/Everything/x86_64/.*\\.rpm$"
- "updates/.*/Everything/x86_64/.*\\.rpm$"
- "development/.*/Everything/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- "updates/.*/Everything/x86_64/repodata/.*$"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 300 # Metadata files cached for 5 minutes
ghcr:
base_url: "https://ghcr.io"
type: "remote"
package: "docker"
description: "GitHub Container Registry"
# username: "your-github-username"
# password: "your-github-pat" # needs read:packages scope
# Docker manifest/tag-list patterns are package-type defaults — always
# re-fetched on expiry. check_mutable_updates only applies to any custom
# mutable_patterns you add (e.g. a metadata endpoint).
cache:
immutable_ttl: 0
mutable_ttl: 300
dockerhub:
base_url: "https://registry-1.docker.io"
type: "remote"
package: "docker"
description: "Docker Hub registry"
cache:
immutable_ttl: 0
mutable_ttl: 300
local-generic:
type: "local"
package: "generic"
description: "Local generic file repository"
cache:
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
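The header comments of the removed example file describe how `immutable_patterns`, `mutable_patterns`, and the two TTLs interact. The sketch below illustrates that classification under stated assumptions: mutable patterns are checked first, a non-empty immutable list acts as an allow-list, and a TTL of 0 means "never expires". The names `classify` and `is_expired` are hypothetical helpers for illustration only.

```python
import re
import time


def classify(path, immutable_patterns, mutable_patterns):
    """Return "mutable", "immutable", or "blocked" for a request path."""
    if any(re.search(p, path) for p in mutable_patterns):
        return "mutable"
    # Assumption: a non-empty immutable list doubles as an allow-list.
    if immutable_patterns and not any(re.search(p, path) for p in immutable_patterns):
        return "blocked"
    return "immutable"


def is_expired(cached_at, ttl, now=None):
    """Report whether an entry cached at `cached_at` (epoch seconds) has outlived `ttl`."""
    if ttl == 0:
        return False  # 0 = cache forever
    now = time.time() if now is None else now
    return now - cached_at >= ttl


# Patterns copied from the github-archive remote above.
immutable = [r".*/archive/refs/tags/.*\.tar\.gz$"]
mutable = [r".*/archive/refs/heads/main\.tar\.gz$"]
tag_kind = classify("owner/repo/archive/refs/tags/v1.tar.gz", immutable, mutable)
branch_kind = classify("owner/repo/archive/refs/heads/main.tar.gz", immutable, mutable)
```

Here the tag archive is classified as immutable and the `main` branch archive as mutable, so only the latter would be subject to the 86400-second `mutable_ttl`.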
+3
@@ -0,0 +1,3 @@
from . import discovery, docker, flush, local, proxy
__all__ = ["discovery", "docker", "flush", "local", "proxy"]
+82
@@ -0,0 +1,82 @@
import logging
import re
from typing import Any
from urllib.parse import urlparse
import httpx
from fastapi import HTTPException
from .proxy import cache_single_artifact
logger = logging.getLogger(__name__)


async def _discover_github_releases(remote: str, include_pattern: str) -> list[str]:
    match = re.match(r"github\.com/([^/]+)/([^/]+)", remote)
    if not match:
        raise HTTPException(status_code=400, detail="Invalid GitHub remote format")
    owner, repo = match.groups()
    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(f"https://api.github.com/repos/{owner}/{repo}/releases")
        if response.status_code != 200:
            raise HTTPException(status_code=response.status_code, detail=f"Failed to fetch releases: {response.text}")
        releases = response.json()
    regex = re.compile(include_pattern.replace("*", ".*"))
    return [
        asset["browser_download_url"]
        for release in releases
        for asset in release.get("assets", [])
        if regex.search(asset["browser_download_url"])
    ]


async def _discover(remote: str, include_pattern: str) -> list[str]:
    if "github.com" in remote:
        return await _discover_github_releases(remote, include_pattern)
    raise HTTPException(status_code=400, detail=f"Unsupported remote: {remote}")


async def cache_artifacts(remote: str, include_pattern: str, storage) -> dict[str, Any]:
    try:
        matching_urls = await _discover(remote, include_pattern)
        if not matching_urls:
            return {"message": "No matching artifacts found", "cached_count": 0, "artifacts": []}
        cached_artifacts = []
        for url in matching_urls:
            result = await cache_single_artifact(url, "", "", storage, {})
            cached_artifacts.append(result)
        cached_count = sum(1 for a in cached_artifacts if a["status"] in ["cached", "already_cached"])
        return {
            "message": f"Processed {len(matching_urls)} artifacts, {cached_count} successfully cached",
            "cached_count": cached_count,
            "artifacts": cached_artifacts,
        }
    except HTTPException:
        # Preserve the 4xx raised by _discover instead of rewrapping it as a 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


async def list_artifacts(remote: str, include_pattern: str, storage) -> dict[str, Any]:
    try:
        matching_urls = await _discover(remote, include_pattern)
        cached_artifacts = []
        for url in matching_urls:
            parsed = urlparse(url)
            key = storage.get_object_key(remote, parsed.path)
            if storage.exists(key):
                cached_artifacts.append({"url": url, "cached_url": storage.get_url(key), "key": key})
        return {
            "remote": remote,
            "pattern": include_pattern,
            "total_found": len(matching_urls),
            "cached_count": len(cached_artifacts),
            "artifacts": cached_artifacts,
        }
    except HTTPException:
        # Preserve the 4xx raised by _discover instead of rewrapping it as a 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
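The discovery code turns `include_pattern` into a regex by replacing `*` with `.*`. The sketch below compares that conversion with the standard library's `fnmatch.translate`, to show one consequence: in the replace-based form a literal `.` still matches any character and nothing is anchored. The helper names and the sample URL are hypothetical, and whether callers pass globs or full regexes is an assumption; if they pass regexes, switching to `fnmatch.translate` would change behaviour.

```python
import fnmatch
import re


def naive_glob_regex(pattern):
    # Mirrors the replace-based conversion: "." stays a regex wildcard, no anchoring.
    return re.compile(pattern.replace("*", ".*"))


def strict_glob_regex(pattern):
    # fnmatch.translate escapes regex metacharacters and anchors the end of the string.
    return re.compile(fnmatch.translate(pattern))


# Hypothetical asset URL whose name ends in "Xtar" rather than ".tar".
url = "https://github.com/owner/repo/releases/download/v1/tool_linux_amd64Xtar"
naive_hit = naive_glob_regex("*_linux_amd64.tar").search(url) is not None
strict_hit = strict_glob_regex("*_linux_amd64.tar").match(url) is not None
```

The replace-based pattern accepts the URL because `.` matches the `X`, while the translated glob rejects it.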
+138
@@ -0,0 +1,138 @@
import asyncio
import hashlib
import json
import logging
import re
from fastapi import HTTPException, Request, Response
from . import proxy as _proxy
logger = logging.getLogger(__name__)


def ping() -> Response:
    return Response(
        content="{}",
        media_type="application/json",
        headers={"Docker-Distribution-Api-Version": "registry/2.0"},
    )
async def proxy(request: Request, remote_name: str, path: str, storage, cache, config, metrics) -> Response:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("package") != "docker":
raise HTTPException(status_code=400, detail=f"Remote '{remote_name}' is not a docker remote")
patterns = config.get_immutable_patterns(remote_name, "")
if patterns:
path_parts = path.split("/")
image_name = "/".join(path_parts[:2]) if len(path_parts) >= 2 else path
if not any(re.search(p, path) or re.search(p, image_name) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
if remote_config.get("ban_tags_enabled", False):
ban_tags = remote_config.get("ban_tags", [])
if ban_tags:
tag_match = re.search(r"/manifests/([^/]+)$", path)
if tag_match:
tag = tag_match.group(1)
if not tag.startswith("sha256:") and tag in ban_tags:
logger.info(f"TAG BANNED: {remote_name}/{path} (tag: {tag})")
raise HTTPException(status_code=403, detail=f"Tag '{tag}' is not permitted on this remote")
base_url = remote_config.get("base_url", "").rstrip("/")
remote_url = f"{base_url}/v2/{path}"
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
is_mutable = cache.is_mutable_file(path, config.get_mutable_patterns(remote_name))
if cached_key and is_mutable:
if not cache.is_index_valid(remote_name, path):
if not await _proxy.handle_expired_mutable(remote_name, path, remote_url, config, cache, storage):
cached_key = None
lock_acquired = False
if not cached_key:
lock_acquired = cache.acquire_fetch_lock(remote_name, path)
if not lock_acquired:
# Another pod is already fetching — poll storage briefly before issuing a duplicate upstream request
for _ in range(10):
await asyncio.sleep(0.5)
probe_key = storage.get_object_key(remote_name, path)
if storage.exists(probe_key):
cached_key = probe_key
break
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
try:
result = await _proxy.cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
if not is_mutable:
published = result.get("last_modified")
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
finally:
if lock_acquired:
cache.release_fetch_lock(remote_name, path)
elif not is_mutable:
published = cache.get_artifact_published(remote_name, path)
if not published:
published = await _proxy._fetch_last_modified(remote_url, remote_config)
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
artifact_data = storage.download_object(storage.get_object_key(remote_name, path))
is_blob = "/blobs/" in path
if is_blob:
content_type = "application/octet-stream"
else:
try:
manifest_json = json.loads(artifact_data)
content_type = manifest_json.get("mediaType")
if not content_type:
if "manifests" in manifest_json:
content_type = "application/vnd.oci.image.index.v1+json"
else:
content_type = "application/vnd.oci.image.manifest.v1+json"
except Exception:
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
# Cross-link tag manifests to their sha256 digest key so digest-addressed pulls hit cache
if is_mutable and "/manifests/" in path:
digest_path = re.sub(r"/manifests/[^/]+$", f"/manifests/{digest}", path)
digest_key = storage.get_object_key(remote_name, digest_path)
if not storage.exists(digest_key):
storage.upload(digest_key, artifact_data)
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
"Content-Length": str(len(artifact_data)),
}
if request.method == "HEAD":
return Response(status_code=200, headers=headers, media_type=content_type)
metrics.record_cache_hit(remote_name, len(artifact_data))
return Response(content=artifact_data, media_type=content_type, headers=headers)
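The docker proxy above takes a fetch lock before going upstream and, when the lock is held elsewhere, polls storage up to 10 times at 0.5 s intervals. The lock primitives themselves are not shown in this diff, so the following is only a sketch of how they could look, assuming a client that offers redis-py's `set(name, value, nx=..., ex=...)` and `delete(name)`. The `fetch_lock:` key format, the `FakeRedis` class, the `wait_for_cache` helper, and the choice to return `True` when the client is missing are all assumptions.

```python
import asyncio


class FakeRedis:
    """Hypothetical in-memory stand-in for the two redis-py calls used below."""

    def __init__(self):
        self.keys = set()

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.keys:
            return None  # redis-py returns None when NX prevents the write
        self.keys.add(name)
        return True

    def delete(self, name):
        self.keys.discard(name)


def acquire_fetch_lock(client, remote_name, path, ttl=30):
    """Try to take the lock; fail open (return True) if the client is missing or errors."""
    if client is None:
        return True
    try:
        return bool(client.set(f"fetch_lock:{remote_name}:{path}", "1", nx=True, ex=ttl))
    except Exception:
        return True


def release_fetch_lock(client, remote_name, path):
    if client is not None:
        client.delete(f"fetch_lock:{remote_name}:{path}")


async def wait_for_cache(exists, attempts=10, delay=0.5):
    """Poll exists() until it returns True or the attempts run out."""
    for _ in range(attempts):
        await asyncio.sleep(delay)
        if exists():
            return True
    return False


client = FakeRedis()
first = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
second = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
release_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
third = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
```

The first caller wins the lock, the second is refused until the release, and the expiry (`ex`) bounds how long a crashed holder can block others.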
+66
@@ -0,0 +1,66 @@
import logging
from fastapi import HTTPException
logger = logging.getLogger(__name__)


def handle(remote: str | None, cache_type: str, cache, storage) -> dict:
    try:
        result = {"remote": remote, "cache_type": cache_type, "flushed": {"redis_keys": 0, "s3_objects": 0, "operations": []}}

        if cache_type in ["all", "index", "metrics"] and cache.available and cache.client:
            patterns = []
            if cache_type in ["all", "index"]:
                if remote:
                    patterns += [f"index:{remote}:*", f"mutable:meta:{remote}:*"]
                else:
                    patterns += ["index:*", "mutable:meta:*"]
            if cache_type in ["all", "metrics"]:
                patterns.append(f"metrics:*:{remote}" if remote else "metrics:*")
            for pattern in patterns:
                keys = cache.client.keys(pattern)
                if keys:
                    cache.client.delete(*keys)
                    result["flushed"]["redis_keys"] += len(keys)
                    logger.info(f"Cache flush: deleted {len(keys)} Redis keys matching '{pattern}'")
            if result["flushed"]["redis_keys"] > 0:
                result["flushed"]["operations"].append(f"Deleted {result['flushed']['redis_keys']} Redis keys")

        if cache_type in ["all", "files"]:
            try:
                list_params = {"Bucket": storage.bucket}
                if remote:
                    list_params["Prefix"] = f"{remote}/"
                response = storage.client.list_objects_v2(**list_params)
                if "Contents" in response:
                    objects_to_delete = [obj["Key"] for obj in response["Contents"]]
                    for key in objects_to_delete:
                        try:
                            storage.client.delete_object(Bucket=storage.bucket, Key=key)
                            result["flushed"]["s3_objects"] += 1
                        except Exception as e:
                            logger.warning(f"Failed to delete S3 object {key}: {e}")
                    if objects_to_delete:
                        scope = f" for remote '{remote}'" if remote else ""
                        result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects{scope}")
                        logger.info(f"Cache flush: deleted {len(objects_to_delete)} S3 objects{scope}")
            except Exception as e:
                result["flushed"]["operations"].append(f"S3 flush failed: {str(e)}")
                logger.error(f"Cache flush S3 error: {e}")

        if not result["flushed"]["operations"]:
            result["flushed"]["operations"].append("No cache entries found to flush")
        return result
    except Exception as e:
        logger.error(f"Cache flush error: {e}")
        raise HTTPException(status_code=500, detail=f"Cache flush failed: {str(e)}")
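The flush handler issues a single `list_objects_v2` call. S3's `ListObjectsV2` returns at most 1000 keys per response, so a larger prefix would only be partly flushed. The sketch below shows one way to follow continuation tokens, assuming `storage.client` behaves like a boto3 S3 client. `list_all_keys` and the `FakeS3` stand-in (which pages two keys at a time) are hypothetical.

```python
def list_all_keys(client, bucket, prefix=None):
    """Collect every key under prefix, following continuation tokens."""
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix
    keys = []
    while True:
        response = client.list_objects_v2(**params)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # Ask for the next page on the following iteration.
        params["ContinuationToken"] = response["NextContinuationToken"]


class FakeS3:
    """Hypothetical stand-in that serves matching keys two per page."""

    def __init__(self, keys):
        self._keys = keys

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = matching[start:start + 2]
        response = {
            "Contents": [{"Key": k} for k in page],
            "IsTruncated": start + 2 < len(matching),
        }
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(start + 2)
        return response


all_keys = list_all_keys(FakeS3(["a/1", "a/2", "a/3", "b/1"]), "artifacts", prefix="a/")
```

With the fake client, the three `a/` keys arrive over two pages and are returned as one list.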
+113
@@ -0,0 +1,113 @@
import hashlib
import logging
import os
from fastapi import HTTPException, Response, UploadFile
from fastapi.responses import JSONResponse
logger = logging.getLogger(__name__)
def download(remote_name: str, path: str, storage, database, config) -> Response:
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
content = storage.download_object(metadata["s3_key"])
return Response(
content=content,
media_type=metadata.get("content_type", "application/octet-stream"),
headers={"Content-Disposition": f"attachment; filename={os.path.basename(path)}"},
)
async def upload(remote_name: str, path: str, file: UploadFile, storage, database, config) -> JSONResponse:
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
content = await file.read()
sha256_sum = hashlib.sha256(content).hexdigest()
if database.file_exists(remote_name, path):
raise HTTPException(status_code=409, detail="File already exists")
s3_key = f"local/{remote_name}/{path}"
content_type = file.content_type or "application/octet-stream"
try:
storage.upload(s3_key, content)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {e}")
success = database.add_local_file(
repository_name=remote_name,
file_path=path,
s3_key=s3_key,
size_bytes=len(content),
sha256_sum=sha256_sum,
content_type=content_type,
)
if not success:
storage.delete_object(s3_key)
raise HTTPException(status_code=500, detail="Failed to save file metadata")
return JSONResponse(
{
"message": "File uploaded successfully",
"file_path": path,
"size_bytes": len(content),
"sha256_sum": sha256_sum,
"content_type": content_type,
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
def check_exists(remote_name: str, path: str, database, config) -> Response:
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
return Response(
headers={
"Content-Length": str(metadata["size_bytes"]),
"Content-Type": metadata.get("content_type", "application/octet-stream"),
"X-SHA256": metadata["sha256_sum"],
"X-Created-At": metadata["created_at"].isoformat() if metadata["created_at"] else "",
"X-Uploaded-At": metadata["uploaded_at"].isoformat() if metadata["uploaded_at"] else "",
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Check failed: {str(e)}")
def delete(remote_name: str, path: str, storage, database, config) -> JSONResponse:
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
s3_key = database.delete_local_file(remote_name, path)
if not s3_key:
raise HTTPException(status_code=404, detail="File not found")
if not storage.delete_object(s3_key):
logger.warning(f"Failed to delete S3 object {s3_key} after database removal")
return JSONResponse({"message": "File deleted successfully"})
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Delete failed: {str(e)}")
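The `upload` handler writes the object to storage first and deletes it again if the metadata insert fails, so storage and database do not drift apart. A minimal sketch of that compensating-delete pattern follows, using in-memory stand-ins. `FakeStorage`, `store_local_file`, and the `save_metadata` callback are hypothetical names, not the project's API.

```python
import hashlib


class FakeStorage:
    """Hypothetical in-memory object store."""

    def __init__(self):
        self.objects = {}

    def upload(self, key, content):
        self.objects[key] = content

    def delete_object(self, key):
        return self.objects.pop(key, None) is not None


def store_local_file(storage, save_metadata, repo, path, content):
    """Upload bytes, then record metadata; undo the upload if the record fails."""
    s3_key = f"local/{repo}/{path}"
    digest = hashlib.sha256(content).hexdigest()
    storage.upload(s3_key, content)
    if not save_metadata(s3_key, len(content), digest):
        storage.delete_object(s3_key)  # compensating delete keeps both stores aligned
        return None
    return digest


storage = FakeStorage()
ok = store_local_file(storage, lambda *args: True, "local-generic", "tools/x.bin", b"hello")
failed = store_local_file(storage, lambda *args: False, "local-generic", "tools/y.bin", b"hello")
```

After both calls only `tools/x.bin` remains in storage, because the second upload was rolled back when its metadata write reported failure.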
+318
@@ -0,0 +1,318 @@
import base64
import logging
import os
import re
from datetime import UTC, datetime, timedelta
from email.utils import parsedate_to_datetime
import httpx
from fastapi import HTTPException, Request, Response
from ..auth import get_docker_token_for_response
from ..remote import helm as _helm
from ..remote import npm as _npm
from ..remote import python as _pypi
from ..remote.base import get_content_type
logger = logging.getLogger(__name__)


class UpstreamUnreachable(Exception):
    """Raised when the upstream backend cannot be contacted (network or timeout error)."""


def _check_quarantine(remote_name: str, last_modified_str: str | None, config) -> None:
    """Raise HTTP 404 if the artifact is within the per-remote quarantine window.

    Fails open (allows the request) when the publish date cannot be determined.
    """
    enabled, days = config.get_quarantine_config(remote_name)
    if not enabled or not days:
        return
    if not last_modified_str:
        return  # cannot determine age → allow
    try:
        publish_date = parsedate_to_datetime(last_modified_str)
    except Exception:
        return  # unparseable → allow
    cutoff = datetime.now(UTC) - timedelta(days=days)
    if publish_date > cutoff:
        available_on = (publish_date + timedelta(days=days)).date()
        raise HTTPException(
            status_code=404,
            detail=(
                f"Package quarantined: published {publish_date.date()}, available after {available_on} ({days}-day new-release quarantine)"
            ),
        )


async def _fetch_last_modified(remote_url: str, remote_cfg: dict) -> str | None:
    """HEAD the upstream URL and return the Last-Modified header, or None on any failure."""
    auth = _basic_auth_header(remote_cfg)
    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.head(remote_url, headers=auth, timeout=10.0)
            return response.headers.get("Last-Modified")
    except Exception:
        return None


def _basic_auth_header(remote_cfg: dict) -> dict[str, str]:
    username = remote_cfg.get("username")
    password = remote_cfg.get("password")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    return {}


def _resolve_content(
    data: bytes,
    path: str,
    filename: str,
    remote_config: dict,
    request: Request,
    remote_name: str = "",
) -> tuple[bytes, str]:
    package = remote_config.get("package")
    proxy_base = str(request.base_url).rstrip("/")
    base_url = remote_config.get("base_url", "").rstrip("/")
    if package == "pypi":
        return _pypi.resolve_content(data, path, filename, remote_config.get("immutable_patterns", []), base_url, proxy_base, remote_name)
    if package == "npm":
        return _npm.resolve_content(data, path, filename, remote_config.get("immutable_patterns", []), base_url, proxy_base, remote_name)
    if package == "helm":
        return _helm.resolve_content(data, path, filename, base_url, proxy_base, remote_name)
    return data, get_content_type(filename)


def construct_url(remote_config: dict, path: str) -> str:
    base_url = remote_config.get("base_url", "").rstrip("/")
    if remote_config.get("package") == "docker":
        return f"{base_url}/v2/{path}"
    if remote_config.get("package") == "pypi":
        return _pypi.construct_url(base_url, path)
    return f"{base_url}/{path}"
async def cache_single_artifact(url: str, remote_name: str, path: str, storage, remote_config: dict) -> dict:
key = storage.get_object_key(remote_name, path)
if storage.exists(key):
logger.info(f"Cache ALREADY EXISTS: {url} (key: {key})")
return {"url": url, "cached_url": storage.get_url(key), "status": "already_cached"}
try:
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
headers = {}
username = remote_config.get("username")
password = remote_config.get("password")
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
"application/vnd.docker.distribution.manifest.v2+json,"
"application/vnd.oci.image.manifest.v1+json,"
"application/vnd.oci.image.index.v1+json,"
"application/vnd.docker.distribution.manifest.list.v2+json"
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
elif username and password:
headers["Authorization"] = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url, headers=headers)
if response.status_code == 401 and is_docker:
www_auth = response.headers.get("WWW-Authenticate", "")
token = await get_docker_token_for_response(www_auth, username, password)
if token:
headers["Authorization"] = f"Bearer {token}"
response = await client.get(url, headers=headers)
response.raise_for_status()
storage.upload(key, response.content)
logger.info(f"Cache ADD SUCCESS: {url} (size: {len(response.content)} bytes, key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"storage_path": f"s3://{storage.bucket}/{key}",
"size": len(response.content),
"status": "cached",
"etag": response.headers.get("ETag"),
"last_modified": response.headers.get("Last-Modified"),
}
except Exception as e:
return {"url": url, "status": "error", "error": str(e)}


async def _upstream_reachable(url: str, auth_headers: dict | None = None) -> bool:
    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            await client.head(url, headers=auth_headers or {}, timeout=10.0)
        return True
    except (httpx.NetworkError, httpx.TimeoutException):
        return False
    except Exception:
        return True


async def check_upstream_changed(remote_url: str, remote_name: str, path: str, cache, auth_headers: dict | None = None) -> bool:
    meta = cache.get_mutable_meta(remote_name, path)
    if not meta:
        return True
    headers = dict(auth_headers or {})
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    if not (meta.get("etag") or meta.get("last_modified")):
        return True
    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.head(remote_url, headers=headers)
            return response.status_code != 304
    except (httpx.NetworkError, httpx.TimeoutException) as exc:
        raise UpstreamUnreachable(str(exc)) from exc


async def handle_expired_mutable(remote_name: str, path: str, remote_url: str, config, cache, storage) -> bool:
    """Handle an expired mutable file. Returns True if the cached copy is still valid."""
    mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
    remote_cfg = config.get_remote_config(remote_name) or {}
    auth = _basic_auth_header(remote_cfg)
    check_updates = remote_cfg.get("check_mutable_updates", False)
    user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
    if user_mutable:
        try:
            changed = await check_upstream_changed(remote_url, remote_name, path, cache, auth)
        except UpstreamUnreachable:
            cache.mark_index_cached(remote_name, path, mutable_ttl)
            logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
            return True
        if not changed:
            cache.mark_index_cached(remote_name, path, mutable_ttl)
            logger.info(f"Mutable file UNCHANGED: {remote_name}/{path} - TTL refreshed ({mutable_ttl}s)")
            return True
        logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
    else:
        if not await _upstream_reachable(remote_url, auth):
            cache.mark_index_cached(remote_name, path, mutable_ttl)
            logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
            return True
        logger.info(f"Mutable file EXPIRED: {remote_name}/{path} - removing from cache")
    cache.cleanup_expired_index(storage, remote_name, path)
    return False


async def handle(request: Request, remote_name: str, path: str, storage, cache, config, database, metrics) -> Response:
    remote_config = config.get_remote_config(remote_name)
    if not remote_config:
        raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")

    path_parts = path.split("/")
    if len(path_parts) >= 2:
        repo_path = f"{path_parts[0]}/{path_parts[1]}"
        file_path = "/".join(path_parts[2:])
    else:
        repo_path = path
        file_path = path

    mutable_patterns = config.get_mutable_patterns(remote_name)
    if not cache.is_mutable_file(file_path, mutable_patterns) and not cache.is_mutable_file(path, mutable_patterns):
        patterns = config.get_immutable_patterns(remote_name, repo_path)
        if patterns and not any(re.search(p, file_path) or re.search(p, path) for p in patterns):
            logger.info(f"PATTERN BLOCKED: {remote_name}/{path} - not matching include patterns")
            raise HTTPException(status_code=403, detail="Artifact not allowed by configuration patterns")

    # Validate base_url before building the upstream URL from it.
    if not remote_config.get("base_url"):
        raise HTTPException(status_code=500, detail=f"No base_url configured for remote '{remote_name}'")
    remote_url = construct_url(remote_config, path)

    cached_key = storage.get_object_key(remote_name, path)
    if not storage.exists(cached_key):
        cached_key = None

    filename = os.path.basename(path)
    is_mutable = cache.is_mutable_file(path, mutable_patterns)

    if cached_key and is_mutable:
        if not cache.is_index_valid(remote_name, path):
            if not await handle_expired_mutable(remote_name, path, remote_url, config, cache, storage):
                cached_key = None

    if cached_key:
        if not is_mutable:
            published = cache.get_artifact_published(remote_name, path)
            if not published:
                published = await _fetch_last_modified(remote_url, remote_config)
                if published:
                    cache.store_artifact_published(remote_name, path, published)
            _check_quarantine(remote_name, published, config)
        try:
            artifact_data = storage.download_object(cached_key)
            artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
            logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
            metrics.record_cache_hit(remote_name, len(artifact_data))
            database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
            return Response(
                content=artifact_data,
                media_type=content_type,
                headers={
                    "Content-Disposition": f"attachment; filename={filename}",
                    "X-Artifact-Source": "cache",
                    "X-Artifact-Size": str(len(artifact_data)),
                },
            )
        except HTTPException:
            raise
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Error retrieving cached artifact: {str(e)}")

    logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
    result = await cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
    if result["status"] == "error":
        logger.error(f"Cache ADD FAILED: {remote_name}/{path} - {result['error']}")
        raise HTTPException(status_code=502, detail=f"Failed to fetch artifact: {result['error']}")

    if result["status"] == "cached" and is_mutable:
        cache_config = config.get_cache_config(remote_name)
        mutable_ttl = cache_config.get("mutable_ttl", 3600)
        cache.mark_index_cached(remote_name, path, mutable_ttl)
        logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
        if result.get("etag") or result.get("last_modified"):
            cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))

    if not is_mutable:
        published = result.get("last_modified")
        if published:
            cache.store_artifact_published(remote_name, path, published)
        _check_quarantine(remote_name, published, config)

    try:
        cache_key = storage.get_object_key(remote_name, path)
        artifact_data = storage.download_object(cache_key)
        artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
        metrics.record_cache_miss(remote_name, len(artifact_data))
        database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
        return Response(
            content=artifact_data,
            media_type=content_type,
            headers={
                "Content-Disposition": f"attachment; filename={filename}",
                "X-Artifact-Source": "remote",
                "X-Artifact-Size": str(len(artifact_data)),
            },
        )
    except HTTPException:
        # Mirror the cache-hit path: do not wrap deliberate HTTP errors in a 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error serving artifact: {str(e)}")
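
The branching in `handle_expired_mutable` can be summarised as a small decision function. This is only a sketch for reading the code above: the name `expired_mutable_action` and the three string labels are hypothetical and do not appear in the commit.

```python
def expired_mutable_action(check_updates, reachable, changed=None):
    """Summarise what happens to an expired mutable file (illustrative only).

    check_updates: the remote has check_mutable_updates enabled and the path
                   matches a user-defined mutable pattern.
    reachable:     the upstream answered the probe request.
    changed:       result of the conditional check (only meaningful when
                   check_updates is True and the upstream was reachable).
    """
    if not reachable:
        # Upstream is down: extend the TTL and keep serving the cached copy.
        return "serve-stale"
    if check_updates and changed is False:
        # Upstream reports no change: refresh the TTL, keep the cached copy.
        return "refresh-ttl"
    # Either the file changed, or no conditional check is configured.
    return "refetch"
```

In the first two cases the real function returns True (the cached copy is still served); in the last it cleans up the cached entry and returns False so the caller fetches from upstream again.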
+317
@@ -0,0 +1,317 @@
import asyncio
import base64
import logging
import time
from datetime import UTC, date, datetime
from typing import Protocol, runtime_checkable

import httpx
import msgpack as _msgpack
import yaml
from fastapi import HTTPException, Request, Response

logger = logging.getLogger(__name__)

try:
    _YamlLoader = yaml.CSafeLoader
    _YamlDumperBase = yaml.CDumper
except AttributeError:
    _YamlLoader = yaml.SafeLoader
    _YamlDumperBase = yaml.Dumper


class _HelmDumper(_YamlDumperBase):
    """YAML dumper that serializes datetime/date objects back to ISO 8601 strings.

    yaml.safe_load converts timestamp-shaped YAML scalars (e.g. chart `created`
    fields) to Python datetime objects. Without a custom representer, yaml.dump
    would render them as "2022-12-16 11:08:49+00:00" (space, not T), which
    Go's YAML parser cannot unmarshal into time.Time.
    """


def _repr_datetime(dumper: yaml.Dumper, data: datetime) -> yaml.ScalarNode:
    if data.tzinfo is not None:
        # "Z" denotes UTC, so normalise aware datetimes before appending it.
        data = data.astimezone(UTC)
    s = data.strftime("%Y-%m-%dT%H:%M:%S.%f") + ("Z" if data.tzinfo else "")
    return dumper.represent_scalar("tag:yaml.org,2002:str", s)


def _repr_date(dumper: yaml.Dumper, data: date) -> yaml.ScalarNode:
    return dumper.represent_scalar("tag:yaml.org,2002:str", data.isoformat())


_HelmDumper.add_representer(datetime, _repr_datetime)
_HelmDumper.add_representer(date, _repr_date)


def _entries_to_msgpack_safe(entries: dict) -> dict:
    """Convert datetime/date values to ISO strings for msgpack serialization."""
    result = {}
    for chart, versions in entries.items():
        safe_versions = []
        for v in versions:
            safe_v = {}
            for k, val in v.items():
                if isinstance(val, datetime):
                    safe_v[k] = val.isoformat()
                elif isinstance(val, date):
                    safe_v[k] = val.isoformat()
                else:
                    safe_v[k] = val
            safe_versions.append(safe_v)
        result[chart] = safe_versions
    return result


async def _get_member_index(
    member_name: str,
    member_cfg: dict,
    path: str,
    storage,
    cache,
) -> tuple[str, dict, int, bytes | None, dict | None]:
    """Fetch or retrieve cached index.yaml for one member remote.

    Returns (member_name, member_cfg, ttl, raw_bytes, parsed_entries).
    raw_bytes is None if the member is unreachable and no unexpired cached copy exists in S3.
    parsed_entries is the pre-parsed entries dict (from msgpack cache), or None.
    """
    member_ttl = member_cfg.get("cache", {}).get("mutable_ttl", 3600)
    s3_key = storage.get_object_key(member_name, path)
    msgpack_key = storage.get_object_key(member_name, "index.msgpack")

    raw_data: bytes | None = None
    parsed_entries: dict | None = None

    if storage.exists(s3_key) and cache.is_index_valid(member_name, path):
        try:
            raw_data = storage.download_object(s3_key)
            logger.info(f"Virtual: cache hit for member '{member_name}'")
        except Exception:
            raw_data = None

        if raw_data is not None and storage.exists(msgpack_key):
            try:
                packed = storage.download_object(msgpack_key)
                parsed_entries = _msgpack.unpackb(packed, raw=False)
                logger.debug(f"Virtual: msgpack hit for member '{member_name}'")
            except Exception:
                parsed_entries = None

    if raw_data is None:
        base_url = member_cfg.get("base_url", "").rstrip("/")
        upstream_url = f"{base_url}/index.yaml"
        headers = {}
        username = member_cfg.get("username")
        password = member_cfg.get("password")
        if username and password:
            token = base64.b64encode(f"{username}:{password}".encode()).decode()
            headers["Authorization"] = f"Basic {token}"
        try:
            async with httpx.AsyncClient(follow_redirects=True) as client:
                response = await client.get(upstream_url, headers=headers, timeout=30.0)
                response.raise_for_status()
                raw_data = response.content
        except Exception as e:
            logger.warning(f"Virtual: failed to fetch index.yaml from member '{member_name}': {e}")
            return member_name, member_cfg, member_ttl, None, None

        try:
            storage.upload(s3_key, raw_data)
            cache.mark_index_cached(member_name, path, member_ttl)
        except Exception as e:
            logger.warning(f"Virtual: failed to cache index.yaml for member '{member_name}': {e}")

    if parsed_entries is None and raw_data is not None:
        try:
            index = yaml.load(raw_data, Loader=_YamlLoader)
            safe_entries = _entries_to_msgpack_safe(index.get("entries") or {})
            storage.upload(msgpack_key, _msgpack.packb(safe_entries, use_bin_type=True))
            parsed_entries = safe_entries
        except Exception as e:
            logger.warning(f"Virtual: failed to build msgpack cache for '{member_name}': {e}")

    return member_name, member_cfg, member_ttl, raw_data, parsed_entries


def _rewrite_urls(urls: list, base_url: str, proxy_base: str, member_name: str) -> list:
    proxy_remote = f"{proxy_base}/api/v1/remote/{member_name}"
    rewritten = []
    for url in urls:
        if url.startswith(("http://", "https://")):
            if base_url and url.startswith(base_url):
                url = proxy_remote + url[len(base_url) :]
        else:
            url = f"{proxy_remote}/{url.lstrip('/')}"
        rewritten.append(url)
    return rewritten


def _merge_helm_indexes(
    raw_indexes: list[bytes],
    parsed_entries_list: list[dict | None],
    member_names: list[str],
    member_configs: list[dict],
    proxy_base: str,
) -> bytes:
    """Merge helm index.yaml files with per-member URL rewriting.

    Priority is determined by position in member_names: earlier members win
    when the same chart name + version appears in multiple remotes.
    Uses pre-parsed msgpack entries when available to skip YAML parsing.
    """
    merged_entries: dict[str, list] = {}

    for raw_data, pre_parsed, member_name, member_cfg in zip(raw_indexes, parsed_entries_list, member_names, member_configs):
        base_url = member_cfg.get("base_url", "").rstrip("/")

        if pre_parsed is not None:
            entries = pre_parsed
        else:
            try:
                index = yaml.load(raw_data, Loader=_YamlLoader)
            except Exception as e:
                logger.warning(f"Virtual: failed to parse index.yaml from member '{member_name}': {e}")
                continue
            entries = index.get("entries") or {}

        for chart_name, versions in entries.items():
            for version_entry in versions:
                version_entry["urls"] = _rewrite_urls(
                    version_entry.get("urls") or [],
                    base_url,
                    proxy_base,
                    member_name,
                )

            if chart_name not in merged_entries:
                merged_entries[chart_name] = list(versions)
            else:
                existing = {(v.get("name"), v.get("version")) for v in merged_entries[chart_name]}
                for version_entry in versions:
                    key = (version_entry.get("name"), version_entry.get("version"))
                    if key not in existing:
                        merged_entries[chart_name].append(version_entry)
                        existing.add(key)

    merged = {
        "apiVersion": "v1",
        "entries": merged_entries,
        "generated": datetime.now(UTC).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    }
    return yaml.dump(merged, Dumper=_HelmDumper, default_flow_style=False, allow_unicode=True).encode()


@runtime_checkable
class _VirtualHandler(Protocol):
    def accepts_path(self, path: str) -> bool: ...

    def merge(
        self,
        raw_indexes: list[bytes],
        parsed_entries: list[dict | None],
        member_names: list[str],
        member_configs: list[dict],
        proxy_base: str,
    ) -> bytes: ...

    def path_error(self) -> str: ...


class _HelmHandler:
    def accepts_path(self, path: str) -> bool:
        return path == "index.yaml"

    def merge(
        self,
        raw_indexes: list[bytes],
        parsed_entries: list[dict | None],
        member_names: list[str],
        member_configs: list[dict],
        proxy_base: str,
    ) -> bytes:
        return _merge_helm_indexes(raw_indexes, parsed_entries, member_names, member_configs, proxy_base)

    def path_error(self) -> str:
        return "Virtual helm repositories only serve index.yaml; chart tarballs are served directly by member remotes"


_HANDLERS: dict[str, _VirtualHandler] = {
    "helm": _HelmHandler(),
}


async def handle(request: Request, virtual_name: str, path: str, storage, cache, config) -> Response:
    virtual_cfg = config.get_virtual_config(virtual_name)
    if not virtual_cfg:
        raise HTTPException(status_code=404, detail=f"Virtual repository '{virtual_name}' not configured")

    package = virtual_cfg.get("package")
    handler = _HANDLERS.get(package)
    if handler is None:
        raise HTTPException(status_code=400, detail=f"Virtual repositories with package '{package}' are not yet supported")

    if not handler.accepts_path(path):
        raise HTTPException(status_code=404, detail=handler.path_error())

    members = virtual_cfg.get("members", [])
    if not members:
        raise HTTPException(status_code=500, detail=f"Virtual repository '{virtual_name}' has no members configured")

    virtual_key = storage.get_object_key(virtual_name, path)
    if cache.is_index_valid(virtual_name, path) and storage.exists(virtual_key):
        data = storage.download_object(virtual_key)
        logger.info(f"Virtual HIT: {virtual_name}/{path}")
        return Response(content=data, media_type="text/yaml")

    # Resolve configs first (config reads are sync/cheap)
    member_entries = []
    for member_name in members:
        member_cfg = config.get_remote_config(member_name)
        if not member_cfg:
            logger.warning(f"Virtual '{virtual_name}': member '{member_name}' not found in config, skipping")
            continue
        member_entries.append((member_name, member_cfg))

    # Fetch all member indexes in parallel; asyncio.gather preserves input order
    proxy_base = str(request.base_url).rstrip("/")
    t_fetch = time.perf_counter()
    results = await asyncio.gather(*[_get_member_index(name, cfg, path, storage, cache) for name, cfg in member_entries])
    fetch_ms = int((time.perf_counter() - t_fetch) * 1000)

    raw_indexes: list[bytes] = []
    used_parsed: list[dict | None] = []
    used_members: list[str] = []
    used_configs: list[dict] = []
    min_ttl: int | None = None

    for member_name, member_cfg, member_ttl, raw_data, parsed_entries in results:
        if min_ttl is None or member_ttl < min_ttl:
            min_ttl = member_ttl
        if raw_data is None:
            logger.warning(f"Virtual '{virtual_name}': skipping unreachable member '{member_name}'")
            continue
        raw_indexes.append(raw_data)
        used_parsed.append(parsed_entries)
        used_members.append(member_name)
        used_configs.append(member_cfg)

    if not raw_indexes:
        raise HTTPException(status_code=502, detail=f"Virtual repository '{virtual_name}': no member indices could be fetched")

    if min_ttl is None:
        min_ttl = 3600

    t_merge = time.perf_counter()
    merged = await asyncio.to_thread(handler.merge, raw_indexes, used_parsed, used_members, used_configs, proxy_base)
    merge_ms = int((time.perf_counter() - t_merge) * 1000)

    try:
        t_store = time.perf_counter()
        storage.upload(virtual_key, merged)
        cache.mark_index_cached(virtual_name, path, min_ttl)
        store_ms = int((time.perf_counter() - t_store) * 1000)
        msgpack_hits = sum(1 for p in used_parsed if p is not None)
        logger.info(
            f"Virtual MISS: {virtual_name}/{path} rebuilt from {used_members} "
            f"(fetch={fetch_ms}ms merge={merge_ms}ms store={store_ms}ms ttl={min_ttl}s "
            f"msgpack={msgpack_hits}/{len(used_members)})"
        )
    except Exception as e:
        logger.warning(f"Virtual: failed to store merged index for '{virtual_name}': {e}")

    return Response(content=merged, media_type="text/yaml")
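
To illustrate the merge rules described in the `_merge_helm_indexes` docstring (earlier members win on a duplicate chart version, and every chart URL is rewritten to point at the proxy), here is a minimal standalone sketch. It works on plain dicts rather than YAML, dedupes on version only, and `merge_entries`, the host names, and the sample charts are all made up for the example.

```python
def rewrite_urls(urls, base_url, proxy_base, member_name):
    """Point chart URLs at the proxy route for the owning member."""
    proxy_remote = f"{proxy_base}/api/v1/remote/{member_name}"
    out = []
    for url in urls:
        if url.startswith(("http://", "https://")):
            # Absolute URL: only rewrite it when it lives under the member's base_url.
            if base_url and url.startswith(base_url):
                url = proxy_remote + url[len(base_url):]
        else:
            # Relative URL: resolve it against the proxy route.
            url = f"{proxy_remote}/{url.lstrip('/')}"
        out.append(url)
    return out


def merge_entries(members, proxy_base):
    """members is an ordered list of (name, base_url, entries); earlier members win."""
    merged = {}
    for name, base_url, entries in members:
        for chart, versions in entries.items():
            seen = {v["version"] for v in merged.get(chart, [])}
            for v in versions:
                if v["version"] in seen:
                    continue  # an earlier member already provides this version
                urls = rewrite_urls(v.get("urls") or [], base_url, proxy_base, name)
                merged.setdefault(chart, []).append({**v, "urls": urls})
                seen.add(v["version"])
    return merged


members = [
    ("internal", "https://charts.internal.test",
     {"nginx": [{"version": "1.0.0", "urls": ["nginx-1.0.0.tgz"]}]}),
    ("public", "https://charts.example.test",
     {"nginx": [
         {"version": "1.0.0", "urls": ["https://charts.example.test/nginx-1.0.0.tgz"]},
         {"version": "1.1.0", "urls": ["https://charts.example.test/nginx-1.1.0.tgz"]},
     ]}),
]
merged = merge_entries(members, "http://proxy.test")
```

Here `nginx` 1.0.0 comes from `internal` because it is listed first, while 1.1.0 is only available from `public`; both URLs end up under `http://proxy.test/api/v1/remote/<member>/`.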
+3
@@ -0,0 +1,3 @@
from .docker import fetch_token, get_docker_token_for_response, parse_www_authenticate
__all__ = ["fetch_token", "get_docker_token_for_response", "parse_www_authenticate"]
+96
@@ -0,0 +1,96 @@
import logging
import re
import time

import httpx

logger = logging.getLogger(__name__)

# In-memory token cache: key -> (token, expires_at)
_token_cache: dict[str, tuple[str, float]] = {}

_WWW_AUTH_RE = re.compile(
    r'Bearer\s+realm="(?P<realm>[^"]+)"'
    r'(?:,service="(?P<service>[^"]*)")?'
    r'(?:,scope="(?P<scope>[^"]*)")?',
    re.IGNORECASE,
)


def _cache_key(realm: str, service: str, scope: str, username: str | None) -> str:
    return f"{realm}|{service}|{scope}|{username or ''}"


def _get_cached_token(key: str) -> str | None:
    entry = _token_cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    _token_cache.pop(key, None)
    return None


def _store_token(key: str, token: str, expires_in: int) -> None:
    # Expire 30s early to avoid using a token right as it expires
    _token_cache[key] = (token, time.time() + max(expires_in - 30, 10))


async def fetch_token(
    realm: str,
    service: str,
    scope: str,
    username: str | None = None,
    password: str | None = None,
) -> str | None:
    """Fetch a Bearer token from a Docker registry auth server."""
    key = _cache_key(realm, service, scope, username)
    cached = _get_cached_token(key)
    if cached:
        return cached

    params: dict[str, str] = {}
    if service:
        params["service"] = service
    if scope:
        params["scope"] = scope

    auth = (username, password) if username and password else None

    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.get(realm, params=params, auth=auth)
            response.raise_for_status()
            data = response.json()
    except Exception as e:
        logger.warning(f"Docker token fetch failed ({realm}): {e}")
        return None

    token = data.get("token") or data.get("access_token")
    if not token:
        logger.warning(f"Docker token response missing token field: {data}")
        return None

    expires_in = int(data.get("expires_in", 300))
    _store_token(key, token, expires_in)
    logger.debug(f"Docker token obtained (realm={realm}, service={service}, scope={scope}, expires_in={expires_in}s)")
    return token


def parse_www_authenticate(header: str) -> tuple[str, str, str] | None:
    """Parse WWW-Authenticate: Bearer header. Returns (realm, service, scope) or None."""
    m = _WWW_AUTH_RE.search(header)
    if not m:
        return None
    return m.group("realm"), m.group("service") or "", m.group("scope") or ""


async def get_docker_token_for_response(
    www_authenticate: str,
    username: str | None = None,
    password: str | None = None,
) -> str | None:
    """Given a WWW-Authenticate header value, fetch and return a Bearer token."""
    parsed = parse_www_authenticate(www_authenticate)
    if not parsed:
        return None
    realm, service, scope = parsed
    return await fetch_token(realm, service, scope, username, password)
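
A quick way to see what `parse_www_authenticate` accepts is to run the same regular expression against sample headers. The sketch below copies the pattern from the file above; the header values and the `effective_ttl` helper name are invented for illustration (the latter just restates the arithmetic in `_store_token`).

```python
import re

# Same pattern as in the module above.
_WWW_AUTH_RE = re.compile(
    r'Bearer\s+realm="(?P<realm>[^"]+)"'
    r'(?:,service="(?P<service>[^"]*)")?'
    r'(?:,scope="(?P<scope>[^"]*)")?',
    re.IGNORECASE,
)


def parse_www_authenticate(header):
    m = _WWW_AUTH_RE.search(header)
    if not m:
        return None
    return m.group("realm"), m.group("service") or "", m.group("scope") or ""


def effective_ttl(expires_in):
    # Hypothetical helper: tokens are cached 30s short of expiry, but at least 10s.
    return max(expires_in - 30, 10)


compact = (
    'Bearer realm="https://auth.example.test/token",'
    'service="registry.example.test",'
    'scope="repository:library/nginx:pull"'
)
spaced = 'Bearer realm="https://auth.example.test/token", service="registry.example.test"'

print(parse_www_authenticate(compact))
print(parse_www_authenticate(spaced))
print(parse_www_authenticate('Basic realm="registry"'))
print(effective_ttl(300), effective_ttl(20))
```

Note that the pattern expects the parameters in the order realm, service, scope with no whitespace after the commas. For the `spaced` header only the realm is captured and service and scope come back as empty strings, which may matter for registries that format the challenge that way.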
+3
@@ -0,0 +1,3 @@
from .redis import RedisCache
__all__ = ["RedisCache"]
+41 -10
@@ -11,7 +11,6 @@ class RedisCache:
         try:
             self.client = redis.from_url(self.redis_url, decode_responses=True)
-            # Test connection
             self.client.ping()
             self.available = True
         except Exception as e:
@@ -20,7 +19,6 @@ class RedisCache:
             self.available = False

     def is_mutable_file(self, file_path: str, patterns: list[str] | None = None) -> bool:
-        """Return True if file_path matches any of the mutable patterns."""
         if patterns is None:
             patterns = []
         return any(re.search(p, file_path) for p in patterns)
@@ -32,10 +30,8 @@ class RedisCache:
         return f"mutable:meta:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"

     def is_index_valid(self, remote_name: str, path: str) -> bool:
-        """Check if mutable file is still within its TTL window."""
         if not self.available:
             return False
-
         try:
             key = self.get_index_cache_key(remote_name, path)
             return self.client.exists(key) > 0
@@ -43,10 +39,8 @@ class RedisCache:
             return False

     def mark_index_cached(self, remote_name: str, path: str, ttl: int = 300) -> None:
-        """Set or refresh the TTL key for a mutable file."""
         if not self.available:
             return
-
         try:
             key = self.get_index_cache_key(remote_name, path)
             self.client.setex(key, ttl, str(int(time.time())))
@@ -54,7 +48,6 @@ class RedisCache:
             pass

     def store_mutable_meta(self, remote_name: str, path: str, etag: str | None, last_modified: str | None) -> None:
-        """Persist ETag and Last-Modified for future conditional requests."""
         if not self.available:
             return
         data = {}
@@ -70,7 +63,6 @@ class RedisCache:
             pass

     def get_mutable_meta(self, remote_name: str, path: str) -> dict:
-        """Return stored ETag/Last-Modified for a mutable file, or {}."""
         if not self.available:
             return {}
         try:
@@ -86,15 +78,54 @@ class RedisCache:
         except Exception:
             pass

+    def get_artifact_published_key(self, remote_name: str, path: str) -> str:
+        return f"pkg:published:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
+
+    def store_artifact_published(self, remote_name: str, path: str, last_modified: str) -> None:
+        """Persist the upstream Last-Modified header for a (typically immutable) artifact."""
+        if not self.available:
+            return
+        try:
+            self.client.set(self.get_artifact_published_key(remote_name, path), last_modified)
+        except Exception:
+            pass
+
+    def get_artifact_published(self, remote_name: str, path: str) -> str | None:
+        """Return the stored Last-Modified string for an artifact, or None."""
+        if not self.available:
+            return None
+        try:
+            return self.client.get(self.get_artifact_published_key(remote_name, path))
+        except Exception:
+            return None
+
+    def acquire_fetch_lock(self, remote_name: str, path: str, ttl: int = 30) -> bool:
+        """Try to acquire a short-lived fetch lock. Returns True if acquired, False if held by another caller."""
+        if not self.available:
+            return True  # fail open: no Redis → behave as if we always hold the lock
+        key = f"fetchlock:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
+        try:
+            return bool(self.client.set(key, 1, nx=True, ex=ttl))
+        except Exception:
+            return True
+
+    def release_fetch_lock(self, remote_name: str, path: str) -> None:
+        if not self.available:
+            return
+        key = f"fetchlock:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
+        try:
+            self.client.delete(key)
+        except Exception:
+            pass
+
     def cleanup_expired_index(self, storage, remote_name: str, path: str) -> None:
-        """Remove an expired mutable file from S3 and clear its Redis meta."""
         if not self.available:
             return
         try:
             import os

-            from .config import ConfigManager
+            from ..config import ConfigManager

             config_path = os.environ.get("CONFIG_PATH")
             if config_path:
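
The fetch lock added in this file relies on Redis `SET key value NX EX ttl`: the first caller creates the key and gets the lock, later callers see the key already exists, and the expiry frees the lock if the holder crashes. The sketch below reproduces that behaviour against a tiny in-memory stand-in so it runs without a Redis server. `FakeRedis` is a hypothetical test double written for this example, and the free functions mirror the `RedisCache` methods rather than copy them.

```python
import hashlib
import time


class FakeRedis:
    """In-memory stand-in for the two Redis calls used by the lock (test double)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= now:
            # Expired keys behave as if they were never set.
            del self._data[key]
            entry = None
        if nx and entry is not None:
            return None  # like redis-py: None when NX prevents the write
        self._data[key] = (value, (now + ex) if ex is not None else None)
        return True

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def lock_key(remote_name, path):
    return f"fetchlock:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"


def acquire_fetch_lock(client, remote_name, path, ttl=30):
    if client is None:
        return True  # fail open: without Redis every caller fetches on its own
    try:
        return bool(client.set(lock_key(remote_name, path), 1, nx=True, ex=ttl))
    except Exception:
        return True


def release_fetch_lock(client, remote_name, path):
    if client is not None:
        client.delete(lock_key(remote_name, path))


client = FakeRedis()
first = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
second = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
release_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
third = acquire_fetch_lock(client, "dockerhub", "library/nginx/manifests/latest")
```

One design point worth keeping in mind: the release is an unconditional delete, so a caller whose fetch outlives the TTL could remove a lock that another caller has since taken. For a short-lived, best-effort lock like this one that trade-off may well be acceptable.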
+99 -14
@@ -1,3 +1,4 @@
+import glob
 import json
 import os
@@ -18,36 +19,99 @@ _PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
         r"/manifests/(?!sha256:)[^/]+$",
         r"/tags/list$",
     ],
+    "pypi": [
+        r"simple/",  # Per-package and top-level simple index pages
+    ],
+    "npm": [],
+    "helm": [
+        r"index\.yaml$",
+    ],
     "generic": [],
 }


 class ConfigManager:
-    def __init__(self, config_file: str = "remotes.yaml"):
-        self.config_file = config_file
-        self._last_modified = 0
+    def __init__(self, config_path: str = "remotes.yaml"):
+        self.config_path = config_path
+        self._config_dir: str | None = None
+        self._last_modified: float = 0.0
         self.config = self._load_config()

-    def _load_config(self) -> dict:
+    def _load_single_file(self, path: str) -> dict:
         try:
-            with open(self.config_file) as f:
-                if self.config_file.endswith(".yaml") or self.config_file.endswith(".yml"):
-                    return yaml.safe_load(f)
-                else:
-                    return json.load(f)
+            with open(path) as f:
+                if path.endswith((".yaml", ".yml")):
+                    return yaml.safe_load(f) or {}
+                return json.load(f)
         except FileNotFoundError:
-            return {"remotes": {}}
+            return {}
+
+    @staticmethod
+    def _merge(base: dict, overlay: dict) -> dict:
+        result = {**base}
+        for key, value in overlay.items():
+            if key in ("remotes", "virtuals", "locals") and isinstance(base.get(key), dict) and isinstance(value, dict):
+                result[key] = {**base.get(key, {}), **value}
+            else:
+                result[key] = value
+        return result
+
+    def _load_from_dir(self, dir_path: str) -> dict:
+        merged: dict = {}
+        files = sorted(glob.glob(os.path.join(dir_path, "*.yaml")) + glob.glob(os.path.join(dir_path, "*.yml")))
+        for path in files:
+            merged = self._merge(merged, self._load_single_file(path))
+        return merged
+
+    def _load_config(self) -> dict:
+        self._config_dir = None
+        if os.path.isdir(self.config_path):
+            return self._load_from_dir(self.config_path) or {"remotes": {}, "virtuals": {}, "locals": {}}
+        config = self._load_single_file(self.config_path)
+        if not config:
+            return {"remotes": {}, "virtuals": {}, "locals": {}}
+        config_dir = config.pop("config_dir", None)
+        if config_dir:
+            if not os.path.isabs(config_dir):
+                config_dir = os.path.join(os.path.dirname(os.path.abspath(self.config_path)), config_dir)
+            self._config_dir = config_dir
+            config = self._merge(config, self._load_from_dir(config_dir))
+        return config
+
+    def _file_mtimes(self) -> list[float]:
+        mtimes: list[float] = []
+        if os.path.isdir(self.config_path):
+            for f in glob.glob(os.path.join(self.config_path, "*.yaml")) + glob.glob(os.path.join(self.config_path, "*.yml")):
+                try:
+                    mtimes.append(os.path.getmtime(f))
+                except OSError:
+                    pass
+        else:
+            try:
+                mtimes.append(os.path.getmtime(self.config_path))
+            except OSError:
+                pass
+            if self._config_dir and os.path.isdir(self._config_dir):
+                for f in glob.glob(os.path.join(self._config_dir, "*.yaml")) + glob.glob(os.path.join(self._config_dir, "*.yml")):
+                    try:
+                        mtimes.append(os.path.getmtime(f))
+                    except OSError:
+                        pass
+        return mtimes

     def _check_reload(self) -> None:
-        """Check if config file has been modified and reload if needed"""
         try:
-            import os
-
-            current_modified = os.path.getmtime(self.config_file)
+            current_modified = max(self._file_mtimes(), default=0.0)
             if current_modified > self._last_modified:
                 self._last_modified = current_modified
                 self.config = self._load_config()
-                print(f"Config reloaded from {self.config_file}")
+                print(f"Config reloaded from {self.config_path}")
         except OSError:
             pass
@@ -55,6 +119,14 @@ class ConfigManager:
         self._check_reload()
         return self.config.get("remotes", {}).get(remote_name)

+    def get_virtual_config(self, virtual_name: str) -> dict | None:
+        self._check_reload()
+        return self.config.get("virtuals", {}).get(virtual_name)
+
+    def get_local_config(self, local_name: str) -> dict | None:
+        self._check_reload()
+        return self.config.get("locals", {}).get(local_name)
+
     def get_immutable_patterns(self, remote_name: str, repo_path: str = "") -> list[str]:
         remote_config = self.get_remote_config(remote_name)
         if not remote_config:
@@ -152,3 +224,16 @@ class ConfigManager:
             return {}

         return remote_config.get("cache", {})
+
+    def get_quarantine_config(self, remote_name: str) -> tuple[bool, int]:
+        """Return (enabled, quarantine_days) for a remote.
+
+        When enabled=True and quarantine_days>0, immutable artifacts published
+        within the last quarantine_days days are blocked with a 404.
+        """
+        remote_config = self.get_remote_config(remote_name)
+        if not remote_config:
+            return False, 0
+        enabled = bool(remote_config.get("quarantine_new", False))
+        days = int(remote_config.get("quarantine_days", 0))
+        return enabled, days
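
The `_merge` helper added to `ConfigManager` combines config files one level deep: entries under `remotes`, `virtuals` and `locals` are unioned by name, but a remote defined in two files is replaced wholesale by the later file rather than merged key by key. The standalone sketch below copies that logic into a free function to show the effect; the sample remotes and URLs are made up.

```python
def merge(base, overlay):
    """Shallow merge with one extra level for the three repository sections."""
    result = {**base}
    for key, value in overlay.items():
        if key in ("remotes", "virtuals", "locals") and isinstance(base.get(key), dict) and isinstance(value, dict):
            # Union the named entries; later files override same-named entries.
            result[key] = {**base[key], **value}
        else:
            result[key] = value
    return result


first_file = {
    "remotes": {
        "pypi": {"base_url": "https://pypi.example.test", "cache": {"mutable_ttl": 60}},
    },
}
second_file = {
    "remotes": {
        "pypi": {"base_url": "https://mirror.example.test"},
        "npm": {"base_url": "https://npm.example.test"},
    },
    "virtuals": {"charts": {"package": "helm", "members": ["a", "b"]}},
}

merged = merge(first_file, second_file)
```

After the merge, `pypi` no longer has its `cache` block because the second file's definition replaced the first one entirely, while `npm` and the `charts` virtual were simply added. Since `_load_from_dir` processes files in sorted order, the file that sorts last decides the final definition of a duplicated name.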
+3
@@ -0,0 +1,3 @@
from .postgres import DatabaseManager
__all__ = ["DatabaseManager"]
@@ -9,7 +9,6 @@ class DatabaseManager:
         self._init_database()

     def _init_database(self):
-        """Initialize database connection and create schema if needed"""
         try:
             self.connection = psycopg2.connect(self.db_url)
             self.connection.autocommit = True
@@ -21,10 +20,8 @@ class DatabaseManager:
             self.available = False

     def _create_schema(self):
-        """Create tables if they don't exist"""
         try:
             with self.connection.cursor() as cursor:
-                # Create table to map S3 keys to remote names
                 cursor.execute("""
                     CREATE TABLE IF NOT EXISTS artifact_mappings (
                         id SERIAL PRIMARY KEY,
@@ -51,7 +48,6 @@ class DatabaseManager:
                     )
                 """)

-                # Create indexes separately
                 cursor.execute("CREATE INDEX IF NOT EXISTS idx_s3_key ON artifact_mappings (s3_key)")
                 cursor.execute("CREATE INDEX IF NOT EXISTS idx_remote_name ON artifact_mappings (remote_name)")
                 cursor.execute("CREATE INDEX IF NOT EXISTS idx_local_repo_path ON local_files (repository_name, file_path)")
@@ -61,7 +57,6 @@ class DatabaseManager:
             print(f"Error creating schema: {e}")

     def record_artifact_mapping(self, s3_key: str, remote_name: str, file_path: str, size_bytes: int):
-        """Record mapping between S3 key and remote"""
         if not self.available:
             return
@@ -83,7 +78,6 @@ class DatabaseManager:
             print(f"Error recording artifact mapping: {e}")

     def get_storage_by_remote(self) -> dict[str, int]:
-        """Get storage size breakdown by remote from database"""
         if not self.available:
             return {}
@@ -101,7 +95,6 @@ class DatabaseManager:
             return {}

     def get_remote_for_s3_key(self, s3_key: str) -> str | None:
-        """Get remote name for given S3 key"""
         if not self.available:
             return None
@@ -126,7 +119,6 @@ class DatabaseManager:
         sha256_sum: str,
         content_type: str = None,
     ):
-        """Add a file to local repository"""
         if not self.available:
             return False
@@ -153,7 +145,6 @@ class DatabaseManager:
             return False

     def get_local_file_metadata(self, repository_name: str, file_path: str):
-        """Get metadata for a local file"""
         if not self.available:
             return None
@@ -185,7 +176,6 @@ class DatabaseManager:
             return None

     def list_local_files(self, repository_name: str, prefix: str = ""):
-        """List files in local repository with optional path prefix"""
         if not self.available:
             return []
@@ -229,7 +219,6 @@ class DatabaseManager:
             return []

     def delete_local_file(self, repository_name: str, file_path: str):
-        """Delete a file from local repository"""
         if not self.available:
             return False
@@ -251,7 +240,6 @@ class DatabaseManager:
             return None

     def file_exists(self, repository_name: str, file_path: str):
-        """Check if file exists in local repository"""
         if not self.available:
             return False
+17 -94
@@ -1,96 +1,19 @@
-import logging
-import re
-import time
-
-import httpx
-
-logger = logging.getLogger(__name__)
-
-# In-memory token cache: key -> (token, expires_at)
-_token_cache: dict[str, tuple[str, float]] = {}
-
-_WWW_AUTH_RE = re.compile(
-    r'Bearer\s+realm="(?P<realm>[^"]+)"'
-    r'(?:,service="(?P<service>[^"]*)")?'
-    r'(?:,scope="(?P<scope>[^"]*)")?',
-    re.IGNORECASE,
+from .auth.docker import (
+    _cache_key,
+    _get_cached_token,
+    _store_token,
+    _token_cache,
+    fetch_token,
+    get_docker_token_for_response,
+    parse_www_authenticate,
 )

-
-def _cache_key(realm: str, service: str, scope: str, username: str | None) -> str:
-    return f"{realm}|{service}|{scope}|{username or ''}"
-
-
-def _get_cached_token(key: str) -> str | None:
-    entry = _token_cache.get(key)
-    if entry and entry[1] > time.time():
-        return entry[0]
-    _token_cache.pop(key, None)
-    return None
-
-
-def _store_token(key: str, token: str, expires_in: int) -> None:
-    # Expire 30s early to avoid using a token right as it expires
-    _token_cache[key] = (token, time.time() + max(expires_in - 30, 10))
-
-
-async def fetch_token(
-    realm: str,
-    service: str,
-    scope: str,
-    username: str | None = None,
-    password: str | None = None,
-) -> str | None:
-    """Fetch a Bearer token from a Docker registry auth server."""
-    key = _cache_key(realm, service, scope, username)
-    cached = _get_cached_token(key)
-    if cached:
-        return cached
-
-    params: dict[str, str] = {}
-    if service:
-        params["service"] = service
-    if scope:
-        params["scope"] = scope
-
-    auth = (username, password) if username and password else None
-
-    try:
-        async with httpx.AsyncClient(follow_redirects=True) as client:
-            response = await client.get(realm, params=params, auth=auth)
-            response.raise_for_status()
-            data = response.json()
-    except Exception as e:
-        logger.warning(f"Docker token fetch failed ({realm}): {e}")
-        return None
-
-    token = data.get("token") or data.get("access_token")
-    if not token:
-        logger.warning(f"Docker token response missing token field: {data}")
-        return None
-
-    expires_in = int(data.get("expires_in", 300))
-    _store_token(key, token, expires_in)
-    logger.debug(f"Docker token obtained (realm={realm}, service={service}, scope={scope}, expires_in={expires_in}s)")
-    return token
-
-
-def parse_www_authenticate(header: str) -> tuple[str, str, str] | None:
-    """Parse WWW-Authenticate: Bearer header. Returns (realm, service, scope) or None."""
-    m = _WWW_AUTH_RE.search(header)
-    if not m:
-        return None
-    return m.group("realm"), m.group("service") or "", m.group("scope") or ""
+__all__ = [
+    "_cache_key",
+    "_get_cached_token",
+    "_store_token",
+    "_token_cache",
+    "fetch_token",
+    "get_docker_token_for_response",
+    "parse_www_authenticate",
+]
async def get_docker_token_for_response(
www_authenticate: str,
username: str | None = None,
password: str | None = None,
) -> str | None:
"""Given a WWW-Authenticate header value, fetch and return a Bearer token."""
parsed = parse_www_authenticate(www_authenticate)
if not parsed:
return None
realm, service, scope = parsed
return await fetch_token(realm, service, scope, username, password)
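Note on the token helpers touched by this file's diff: the removed lines cache each token in memory and expire it 30 s early, with a 10 s floor, and they parse the `WWW-Authenticate` challenge with a single regex. The sketch below reproduces just that behaviour in isolation so it can be exercised without a registry. It mirrors the removed code rather than the new `.auth.docker` module, whose contents are not shown here. The `now` parameter and the un-prefixed helper names are assumptions added for illustration so the expiry can be checked without sleeping.

```python
import re
import time
from typing import Optional

# Same pattern as the removed module: realm first, then optional service and
# scope, in that order and separated by bare commas (no whitespace).
_WWW_AUTH_RE = re.compile(
    r'Bearer\s+realm="(?P<realm>[^"]+)"'
    r'(?:,service="(?P<service>[^"]*)")?'
    r'(?:,scope="(?P<scope>[^"]*)")?',
    re.IGNORECASE,
)

# key -> (token, expires_at)
_token_cache: dict = {}


def parse_www_authenticate(header: str) -> Optional[tuple]:
    """Return (realm, service, scope), or None if this is not a Bearer challenge."""
    m = _WWW_AUTH_RE.search(header)
    if not m:
        return None
    return m.group("realm"), m.group("service") or "", m.group("scope") or ""


def store_token(key: str, token: str, expires_in: int, now: Optional[float] = None) -> None:
    """Cache a token, expiring it 30 s early but never sooner than 10 s from now."""
    now = time.time() if now is None else now  # `now` is a hypothetical test hook
    _token_cache[key] = (token, now + max(expires_in - 30, 10))


def get_cached_token(key: str, now: Optional[float] = None) -> Optional[str]:
    """Return a still-valid token, or drop the stale entry and return None."""
    now = time.time() if now is None else now
    entry = _token_cache.get(key)
    if entry and entry[1] > now:
        return entry[0]
    _token_cache.pop(key, None)
    return None
```

One property worth knowing when reviewing the move: because the regex expects `,service=` and `,scope=` with no whitespace and in that order, a challenge written as `Bearer realm="r", service="s"` still matches, but service and scope come back as empty strings.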
+50 -721
@@ -1,13 +1,8 @@
import hashlib
import json
import logging import logging
import os import os
import re
from typing import Any
import httpx from fastapi import FastAPI, File, Query, Request, UploadFile
from fastapi import FastAPI, File, HTTPException, Query, Request, Response, UploadFile from fastapi.responses import PlainTextResponse
from fastapi.responses import JSONResponse, PlainTextResponse
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from pydantic import BaseModel from pydantic import BaseModel
@@ -16,50 +11,41 @@ try:
__version__ = version("artifactapi") __version__ = version("artifactapi")
except ImportError: except ImportError:
# Fallback for development when package isn't installed
__version__ = "dev" __version__ = "dev"
from .artifact import discovery, flush, local, proxy, virtual
from .artifact import docker as docker_handler
from .cache import RedisCache from .cache import RedisCache
from .config import ConfigManager from .config import ConfigManager
from .database import DatabaseManager from .database import DatabaseManager
from .docker_auth import get_docker_token_for_response
from .metrics import MetricsManager from .metrics import MetricsManager
from .storage import S3Storage from .storage import S3Storage
class ArtifactRequest(BaseModel):
remote: str
include_pattern: str
class UpstreamUnreachable(Exception):
"""Raised when the upstream backend cannot be contacted (network or timeout error)."""
# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s") logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
app = FastAPI(title="Artifact Storage API", version=__version__) app = FastAPI(title="Artifact Storage API", version=__version__)
# Initialize components using config
config_path = os.environ.get("CONFIG_PATH") config_path = os.environ.get("CONFIG_PATH")
if not config_path: if not config_path:
raise ValueError("CONFIG_PATH environment variable is required") raise ValueError("CONFIG_PATH environment variable is required")
config = ConfigManager(config_path) config = ConfigManager(config_path)
# Get configurations
s3_config = config.get_s3_config() s3_config = config.get_s3_config()
redis_config = config.get_redis_config() redis_config = config.get_redis_config()
db_config = config.get_database_config() db_config = config.get_database_config()
# Initialize services
storage = S3Storage(**s3_config) storage = S3Storage(**s3_config)
cache = RedisCache(redis_config["url"]) cache = RedisCache(redis_config["url"])
database = DatabaseManager(db_config["url"]) database = DatabaseManager(db_config["url"])
metrics = MetricsManager(cache, database) metrics = MetricsManager(cache, database)
class ArtifactRequest(BaseModel):
remote: str
include_pattern: str
@app.get("/") @app.get("/")
def read_root(): def read_root():
config._check_reload() config._check_reload()
@@ -67,6 +53,8 @@ def read_root():
"message": "Artifact Storage API", "message": "Artifact Storage API",
"version": app.version, "version": app.version,
"remotes": list(config.config.get("remotes", {}).keys()), "remotes": list(config.config.get("remotes", {}).keys()),
"virtuals": list(config.config.get("virtuals", {}).keys()),
"locals": list(config.config.get("locals", {}).keys()),
} }
@@ -75,735 +63,76 @@ def health_check():
return {"status": "healthy"} return {"status": "healthy"}
@app.get("/config")
def get_config():
return config.config
@app.get("/metrics")
def get_metrics(json: bool | None = Query(False, description="Return JSON format instead of Prometheus")):
config._check_reload()
if json:
return metrics.get_metrics(storage, config)
metrics.get_metrics(storage, config)
return PlainTextResponse(generate_latest().decode("utf-8"), media_type=CONTENT_TYPE_LATEST)
@app.put("/cache/flush") @app.put("/cache/flush")
def flush_cache( def flush_cache(
remote: str = Query(default=None, description="Specific remote to flush (optional)"), remote: str = Query(default=None, description="Specific remote to flush (optional)"),
cache_type: str = Query(default="all", description="Type to flush: 'all', 'index', 'files', 'metrics'"), cache_type: str = Query(default="all", description="Type to flush: 'all', 'index', 'files', 'metrics'"),
): ):
"""Flush cache entries for specified remote or all remotes""" return flush.handle(remote, cache_type, cache, storage)
try:
result = {"remote": remote, "cache_type": cache_type, "flushed": {"redis_keys": 0, "s3_objects": 0, "operations": []}}
# Flush Redis entries based on cache_type
if cache_type in ["all", "index", "metrics"] and cache.available and cache.client:
patterns = []
if cache_type in ["all", "index"]:
if remote:
patterns.append(f"index:{remote}:*")
patterns.append(f"mutable:meta:{remote}:*")
else:
patterns.append("index:*")
patterns.append("mutable:meta:*")
if cache_type in ["all", "metrics"]:
if remote:
patterns.append(f"metrics:*:{remote}")
else:
patterns.append("metrics:*")
for pattern in patterns:
keys = cache.client.keys(pattern)
if keys:
cache.client.delete(*keys)
result["flushed"]["redis_keys"] += len(keys)
logger.info(f"Cache flush: Deleted {len(keys)} Redis keys matching '{pattern}'")
if result["flushed"]["redis_keys"] > 0:
result["flushed"]["operations"].append(f"Deleted {result['flushed']['redis_keys']} Redis keys")
# Flush S3 objects if requested
if cache_type in ["all", "files"]:
try:
# Use prefix filtering for remote-specific deletion
list_params = {"Bucket": storage.bucket}
if remote:
list_params["Prefix"] = f"{remote}/"
response = storage.client.list_objects_v2(**list_params)
if "Contents" in response:
objects_to_delete = [obj["Key"] for obj in response["Contents"]]
for key in objects_to_delete:
try:
storage.client.delete_object(Bucket=storage.bucket, Key=key)
result["flushed"]["s3_objects"] += 1
except Exception as e:
logger.warning(f"Failed to delete S3 object {key}: {e}")
if objects_to_delete:
scope = f" for remote '{remote}'" if remote else ""
result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects{scope}")
logger.info(f"Cache flush: Deleted {len(objects_to_delete)} S3 objects{scope}")
except Exception as e:
result["flushed"]["operations"].append(f"S3 flush failed: {str(e)}")
logger.error(f"Cache flush S3 error: {e}")
if not result["flushed"]["operations"]:
result["flushed"]["operations"].append("No cache entries found to flush")
return result
except Exception as e:
logger.error(f"Cache flush error: {e}")
raise HTTPException(status_code=500, detail=f"Cache flush failed: {str(e)}")
async def construct_remote_url(remote_name: str, path: str) -> str:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
base_url = remote_config.get("base_url")
if not base_url:
raise HTTPException(status_code=500, detail=f"No base_url configured for remote '{remote_name}'")
# Handle Docker registry URLs
if remote_config.get("package") == "docker":
# Convert Docker paths to v2 API format
# e.g., library/nginx/manifests/latest -> v2/library/nginx/manifests/latest
return f"{base_url}/v2/{path}"
return f"{base_url}/{path}"
async def check_artifact_patterns(remote_name: str, repo_path: str, file_path: str, full_path: str) -> bool:
# Mutable files (index files) are always allowed through
mutable_patterns = config.get_mutable_patterns(remote_name)
if cache.is_mutable_file(file_path, mutable_patterns) or cache.is_mutable_file(full_path, mutable_patterns):
return True
# Check immutable include patterns
patterns = config.get_immutable_patterns(remote_name, repo_path)
if not patterns:
return True # Allow all if no patterns configured
pattern_matched = False
for pattern in patterns:
# Check both file_path and full_path to handle different pattern types
if re.search(pattern, file_path) or re.search(pattern, full_path):
pattern_matched = True
break
if not pattern_matched:
return False
return True
async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
# Use hierarchical path-based key
key = storage.get_object_key(remote_name, path)
if storage.exists(key):
logger.info(f"Cache ALREADY EXISTS: {url} (key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"status": "already_cached",
}
try:
remote_config = config.get_remote_config(remote_name) or {}
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
# Prepare headers for Docker registry requests
headers = {}
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
"application/vnd.docker.distribution.manifest.v2+json,"
"application/vnd.oci.image.manifest.v1+json,"
"application/vnd.oci.image.index.v1+json,"
"application/vnd.docker.distribution.manifest.list.v2+json"
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url, headers=headers)
# Handle Docker Bearer token challenge
if response.status_code == 401 and is_docker:
www_auth = response.headers.get("WWW-Authenticate", "")
username = remote_config.get("username")
password = remote_config.get("password")
token = await get_docker_token_for_response(www_auth, username, password)
if token:
headers["Authorization"] = f"Bearer {token}"
response = await client.get(url, headers=headers)
response.raise_for_status()
storage_path = storage.upload(key, response.content)
logger.info(f"Cache ADD SUCCESS: {url} (size: {len(response.content)} bytes, key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"storage_path": storage_path,
"size": len(response.content),
"status": "cached",
"etag": response.headers.get("ETag"),
"last_modified": response.headers.get("Last-Modified"),
}
except Exception as e:
return {"url": url, "status": "error", "error": str(e)}
async def _upstream_reachable(url: str) -> bool:
"""HEAD with a short timeout. Returns False only on network/timeout errors."""
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
await client.head(url, timeout=10.0)
return True
except (httpx.NetworkError, httpx.TimeoutException):
return False
except Exception:
return True # 4xx/5xx means backend is up
async def check_upstream_changed(remote_url: str, remote_name: str, path: str) -> bool:
"""Conditional HEAD against upstream. Returns False only on a definitive 304.
Raises UpstreamUnreachable if the backend cannot be contacted."""
meta = cache.get_mutable_meta(remote_name, path)
if not meta:
return True
headers = {}
if meta.get("etag"):
headers["If-None-Match"] = meta["etag"]
if meta.get("last_modified"):
headers["If-Modified-Since"] = meta["last_modified"]
if not headers:
return True
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.head(remote_url, headers=headers)
return response.status_code != 304
except (httpx.NetworkError, httpx.TimeoutException) as exc:
raise UpstreamUnreachable(str(exc)) from exc
async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -> bool:
"""Handle an expired mutable file. Returns True if the cached copy is still valid."""
mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
remote_cfg = config.get_remote_config(remote_name) or {}
check_updates = remote_cfg.get("check_mutable_updates", False)
user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
if user_mutable:
try:
changed = await check_upstream_changed(remote_url, remote_name, path)
except UpstreamUnreachable:
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
return True
if not changed:
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file UNCHANGED: {remote_name}/{path} - TTL refreshed ({mutable_ttl}s)")
return True
logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
else:
if not await _upstream_reachable(remote_url):
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
return True
logger.info(f"Mutable file EXPIRED: {remote_name}/{path} - removing from cache")
cache.cleanup_expired_index(storage, remote_name, path)
return False
@app.get("/api/v1/remote/{remote_name}/{path:path}")
async def get_artifact(remote_name: str, path: str):
# Check if remote is configured
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
# Check if this is a local repository
if remote_config.get("type") == "local":
# Handle local repository download
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
# Get file from S3
content = storage.download_object(metadata["s3_key"])
if content is None:
raise HTTPException(status_code=500, detail="File not accessible")
# Determine content type
content_type = metadata.get("content_type", "application/octet-stream")
return Response(
content=content,
media_type=content_type,
headers={"Content-Disposition": f"attachment; filename={os.path.basename(path)}"},
)
# Extract repository path for pattern checking
path_parts = path.split("/")
if len(path_parts) >= 2:
repo_path = f"{path_parts[0]}/{path_parts[1]}"
file_path = "/".join(path_parts[2:])
else:
repo_path = path
file_path = path
# Check if artifact matches configured patterns
if not await check_artifact_patterns(remote_name, repo_path, file_path, path):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path} - not matching include patterns")
raise HTTPException(status_code=403, detail="Artifact not allowed by configuration patterns")
# Construct the remote URL
remote_url = await construct_remote_url(remote_name, path)
# Check if artifact is already cached
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
# For mutable files, check Redis TTL validity
filename = os.path.basename(path)
is_mutable = cache.is_mutable_file(path, config.get_mutable_patterns(remote_name))
if cached_key and is_mutable:
if not cache.is_index_valid(remote_name, path):
if not await handle_expired_mutable(remote_name, path, remote_url):
cached_key = None
if cached_key:
# Return cached artifact
try:
artifact_data = storage.download_object(cached_key)
filename = os.path.basename(path)
# Log cache hit
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
# Determine content type based on file extension
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache hit metrics
metrics.record_cache_hit(remote_name, len(artifact_data))
# Record artifact mapping in database if not already recorded
database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "cache",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error retrieving cached artifact: {str(e)}")
# Artifact not cached, cache it first
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path)
if result["status"] == "error":
logger.error(f"Cache ADD FAILED: {remote_name}/{path} - {result['error']}")
raise HTTPException(status_code=502, detail=f"Failed to fetch artifact: {result['error']}")
# Mark mutable files as cached in Redis with TTL
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
# Now return the cached artifact
try:
cache_key = storage.get_object_key(remote_name, path)
artifact_data = storage.download_object(cache_key)
filename = os.path.basename(path)
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache miss metrics
metrics.record_cache_miss(remote_name, len(artifact_data))
# Record artifact mapping in database
cache_key = storage.get_object_key(remote_name, path)
database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "remote",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error serving artifact: {str(e)}")
@app.get("/v2/") @app.get("/v2/")
async def docker_v2_ping(): async def docker_v2_ping():
return Response( return docker_handler.ping()
content="{}",
media_type="application/json",
headers={"Docker-Distribution-Api-Version": "registry/2.0"},
)
@app.api_route("/v2/{remote_name}/{path:path}", methods=["GET", "HEAD"]) @app.api_route("/v2/{remote_name}/{path:path}", methods=["GET", "HEAD"])
async def docker_v2_proxy(request: Request, remote_name: str, path: str): async def docker_v2_proxy(request: Request, remote_name: str, path: str):
remote_config = config.get_remote_config(remote_name) return await docker_handler.proxy(request, remote_name, path, storage, cache, config, metrics)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("package") != "docker":
raise HTTPException(status_code=400, detail=f"Remote '{remote_name}' is not a docker remote")
# Check immutable_patterns against the image name (e.g. "library/nginx")
patterns = config.get_immutable_patterns(remote_name, "")
if patterns:
path_parts = path.split("/")
image_name = "/".join(path_parts[:2]) if len(path_parts) >= 2 else path
if not any(re.search(p, path) or re.search(p, image_name) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
remote_url = await construct_remote_url(remote_name, path)
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
is_mutable = cache.is_mutable_file(path, config.get_mutable_patterns(remote_name))
if cached_key and is_mutable:
if not cache.is_index_valid(remote_name, path):
if not await handle_expired_mutable(remote_name, path, remote_url):
cached_key = None
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
artifact_data = storage.download_object(storage.get_object_key(remote_name, path))
is_blob = "/blobs/" in path
if is_blob:
content_type = "application/octet-stream"
else:
try:
manifest_json = json.loads(artifact_data)
content_type = manifest_json.get("mediaType")
if not content_type:
if "manifests" in manifest_json:
content_type = "application/vnd.oci.image.index.v1+json"
else:
content_type = "application/vnd.oci.image.manifest.v1+json"
except Exception:
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
"Content-Length": str(len(artifact_data)),
}
if request.method == "HEAD":
return Response(status_code=200, headers=headers, media_type=content_type)
metrics.record_cache_hit(remote_name, len(artifact_data))
return Response(content=artifact_data, media_type=content_type, headers=headers)
async def discover_artifacts(remote: str, include_pattern: str) -> list[str]: @app.get("/api/v1/virtual/{virtual_name}/{path:path}")
if "github.com" in remote: async def get_virtual_artifact(request: Request, virtual_name: str, path: str):
return await discover_github_releases(remote, include_pattern) return await virtual.handle(request, virtual_name, path, storage, cache, config)
else:
raise HTTPException(status_code=400, detail=f"Unsupported remote: {remote}")
async def discover_github_releases(remote: str, include_pattern: str) -> list[str]: @app.get("/api/v1/remote/{remote_name}/{path:path}")
match = re.match(r"github\.com/([^/]+)/([^/]+)", remote) async def get_artifact(request: Request, remote_name: str, path: str):
if not match: return await proxy.handle(request, remote_name, path, storage, cache, config, database, metrics)
raise HTTPException(status_code=400, detail="Invalid GitHub remote format")
owner, repo = match.groups()
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(f"https://api.github.com/repos/{owner}/{repo}/releases")
if response.status_code != 200:
raise HTTPException(
status_code=response.status_code,
detail=f"Failed to fetch releases: {response.text}",
)
releases = response.json()
matching_urls = []
pattern = include_pattern.replace("*", ".*")
regex = re.compile(pattern)
for release in releases:
for asset in release.get("assets", []):
download_url = asset["browser_download_url"]
if regex.search(download_url):
matching_urls.append(download_url)
return matching_urls
@app.put("/api/v1/remote/{remote_name}/{path:path}") @app.get("/api/v1/local/{local_name}/{path:path}")
async def upload_file(remote_name: str, path: str, file: UploadFile = File(...)): def get_local_artifact(local_name: str, path: str):
"""Upload a file to local repository""" return local.download(local_name, path, storage, database, config)
# Check if remote is configured and is local
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Upload only supported for local repositories")
try:
# Read file content
content = await file.read()
# Calculate SHA256
sha256_sum = hashlib.sha256(content).hexdigest()
# Check if file already exists (prevent overwrite)
if database.file_exists(remote_name, path):
raise HTTPException(status_code=409, detail="File already exists")
# Generate S3 key
s3_key = f"local/{remote_name}/{path}"
# Determine content type
content_type = file.content_type or "application/octet-stream"
# Upload to S3
try:
storage.upload(s3_key, content)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {e}")
# Add to database
success = database.add_local_file(
repository_name=remote_name,
file_path=path,
s3_key=s3_key,
size_bytes=len(content),
sha256_sum=sha256_sum,
content_type=content_type,
)
if not success:
# Clean up S3 if database insert failed
storage.delete_object(s3_key)
raise HTTPException(status_code=500, detail="Failed to save file metadata")
return JSONResponse(
{
"message": "File uploaded successfully",
"file_path": path,
"size_bytes": len(content),
"sha256_sum": sha256_sum,
"content_type": content_type,
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
@app.head("/api/v1/remote/{remote_name}/{path:path}") @app.put("/api/v1/local/{local_name}/{path:path}")
def check_file_exists(remote_name: str, path: str): async def upload_local_file(local_name: str, path: str, file: UploadFile = File(...)):
"""Check if file exists (for CI jobs) - supports local repositories only""" return await local.upload(local_name, path, file, storage, database, config)
# Check if remote is configured
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
# Handle local repository
if remote_config.get("type") == "local":
try:
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
return Response(
headers={
"Content-Length": str(metadata["size_bytes"]),
"Content-Type": metadata.get("content_type", "application/octet-stream"),
"X-SHA256": metadata["sha256_sum"],
"X-Created-At": metadata["created_at"].isoformat() if metadata["created_at"] else "",
"X-Uploaded-At": metadata["uploaded_at"].isoformat() if metadata["uploaded_at"] else "",
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Check failed: {str(e)}")
else:
# For remote repositories, just return 405 Method Not Allowed
raise HTTPException(status_code=405, detail="HEAD method only supported for local repositories")
@app.delete("/api/v1/remote/{remote_name}/{path:path}") @app.head("/api/v1/local/{local_name}/{path:path}")
def delete_file(remote_name: str, path: str): def check_local_file_exists(local_name: str, path: str):
"""Delete a file from local repository""" return local.check_exists(local_name, path, database, config)
# Check if remote is configured and is local
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Delete only supported for local repositories")
try: @app.delete("/api/v1/local/{local_name}/{path:path}")
# Get S3 key before deleting from database def delete_local_file(local_name: str, path: str):
s3_key = database.delete_local_file(remote_name, path) return local.delete(local_name, path, storage, database, config)
if not s3_key:
raise HTTPException(status_code=404, detail="File not found")
# Delete from S3
if not storage.delete_object(s3_key):
# File was deleted from database but not from S3 - log warning but continue
print(f"Warning: Failed to delete S3 object {s3_key}")
return JSONResponse({"message": "File deleted successfully"})
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Delete failed: {str(e)}")
 @app.post("/api/v1/artifacts/cache")
-async def cache_artifact(request: ArtifactRequest) -> dict[str, Any]:
-    try:
-        matching_urls = await discover_artifacts(request.remote, request.include_pattern)
-        if not matching_urls:
-            return {
-                "message": "No matching artifacts found",
-                "cached_count": 0,
-                "artifacts": [],
-            }
-        cached_artifacts = []
-        for url in matching_urls:
-            result = await cache_single_artifact(url, "", "")
-            cached_artifacts.append(result)
-        cached_count = sum(1 for artifact in cached_artifacts if artifact["status"] in ["cached", "already_cached"])
-        return {
-            "message": f"Processed {len(matching_urls)} artifacts, {cached_count} successfully cached",
-            "cached_count": cached_count,
-            "artifacts": cached_artifacts,
-        }
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
+async def cache_artifact(request: ArtifactRequest):
+    return await discovery.cache_artifacts(request.remote, request.include_pattern, storage)
 @app.get("/api/v1/artifacts/{remote:path}")
-async def list_cached_artifacts(remote: str, include_pattern: str = ".*") -> dict[str, Any]:
-    try:
-        matching_urls = await discover_artifacts(remote, include_pattern)
-        cached_artifacts = []
-        for url in matching_urls:
-            # Extract path from URL for hierarchical key generation
-            from urllib.parse import urlparse
-            parsed = urlparse(url)
-            path = parsed.path
-            key = storage.get_object_key(remote, path)
-            if storage.exists(key):
-                cached_artifacts.append({"url": url, "cached_url": storage.get_url(key), "key": key})
-        return {
-            "remote": remote,
-            "pattern": include_pattern,
-            "total_found": len(matching_urls),
-            "cached_count": len(cached_artifacts),
-            "artifacts": cached_artifacts,
-        }
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
-@app.get("/metrics")
-def get_metrics(
-    json: bool | None = Query(False, description="Return JSON format instead of Prometheus"),
-):
-    """Get comprehensive metrics about the artifact storage system"""
-    config._check_reload()
-    if json:
-        # Return JSON format
-        return metrics.get_metrics(storage, config)
-    else:
-        # Return Prometheus format
-        metrics.get_metrics(storage, config)  # Update gauges
-        prometheus_data = generate_latest().decode("utf-8")
-        return PlainTextResponse(prometheus_data, media_type=CONTENT_TYPE_LATEST)
-@app.get("/config")
-def get_config():
-    return config.config
+async def list_cached_artifacts(remote: str, include_pattern: str = ".*"):
+    return await discovery.list_artifacts(remote, include_pattern, storage)
 def main():
+13 -7
@@ -87,9 +87,10 @@ class MetricsManager:
         # Get from database if available
         db_sizes = self.database_manager.get_storage_by_remote()
         if db_sizes:
-            # Initialize all configured remotes to 0
+            # Initialize all configured remotes and locals to 0
             remote_sizes = {}
-            for remote in config_manager.config.get("remotes", {}).keys():
+            all_names = list(config_manager.config.get("remotes", {}).keys()) + list(config_manager.config.get("locals", {}).keys())
+            for remote in all_names:
                 remote_sizes[remote] = db_sizes.get(remote, 0)
             # Update Prometheus gauges
@@ -101,10 +102,10 @@ class MetricsManager:
         # Fallback to S3 scanning if database not available
         try:
             remote_sizes = {}
-            remotes = config_manager.config.get("remotes", {}).keys()
+            all_names = list(config_manager.config.get("remotes", {}).keys()) + list(config_manager.config.get("locals", {}).keys())
-            # Initialize all remotes to 0
+            # Initialize all remotes and locals to 0
-            for remote in remotes:
+            for remote in all_names:
                 remote_sizes[remote] = 0
             paginator = storage.client.get_paginator("list_objects_v2")
@@ -174,8 +175,13 @@ class MetricsManager:
         metrics["requests"]["cache_hit_ratio"] = cache_hits / total_requests if total_requests > 0 else 0.0
         metrics["bandwidth"]["saved_bytes"] = bandwidth_saved
-        # Get per-remote metrics
+        # Get per-repo metrics
-        for remote in config_manager.config.get("remotes", {}).keys():
+        all_repos = {
+            **config_manager.config.get("remotes", {}),
+            **config_manager.config.get("virtuals", {}),
+            **config_manager.config.get("locals", {}),
+        }
+        for remote in all_repos.keys():
             remote_cache_hits = int(self.redis_client.client.get(f"metrics:cache_hits:{remote}") or 0)
             remote_cache_misses = int(self.redis_client.client.get(f"metrics:cache_misses:{remote}") or 0)
             remote_total = remote_cache_hits + remote_cache_misses
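The last hunk above widens per-repo metrics from remotes only to remotes, virtuals and locals by merging the three config sections into one dict. A minimal sketch of that merge, assuming the config is a plain dict; the `all_repo_names` helper and the `sample_config` contents are hypothetical and not part of the diff:

```python
def all_repo_names(config: dict) -> list[str]:
    """Return repo names from remotes, virtuals and locals, in merge order."""
    merged = {
        **config.get("remotes", {}),
        **config.get("virtuals", {}),
        **config.get("locals", {}),
    }
    return list(merged.keys())


# Hypothetical config, shaped like the three top-level sections in the diff.
sample_config = {
    "remotes": {"helm-test": {}, "npm-test": {}},
    "virtuals": {"helm-virtual-test": {}},
    "locals": {"local-test": {}},
}
names = all_repo_names(sample_config)
```

Because the sections are merged with dict unpacking, a name that appears in more than one section collapses to a single entry, so its metrics would be reported once.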
+4
@@ -0,0 +1,4 @@
+from . import generic, helm, npm, python, rpm
+from .base import get_content_type
+
+__all__ = ["generic", "helm", "npm", "python", "rpm", "get_content_type"]
+16
@@ -0,0 +1,16 @@
+def get_content_type(filename: str) -> str:
+    if filename.endswith((".tar.gz", ".tgz")):
+        return "application/gzip"
+    if filename.endswith(".zip") or filename.endswith(".whl"):
+        return "application/zip"
+    if filename.endswith(".exe"):
+        return "application/x-msdownload"
+    if filename.endswith(".rpm"):
+        return "application/x-rpm"
+    if filename.endswith(".xml"):
+        return "application/xml"
+    if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
+        return "application/gzip"
+    if filename.endswith((".yaml", ".yml")):
+        return "text/yaml"
+    return "application/octet-stream"
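The suffix checks above rely on `str.endswith` accepting a tuple, and on compound suffixes such as `.xml.gz` not matching the plain `.xml` test that precedes them. A trimmed sketch of the same lookup, keeping only a few branches from the diff; the sample filenames are illustrative assumptions:

```python
def get_content_type(filename: str) -> str:
    # Trimmed copy of the mapping in the diff, enough to show the lookup order.
    if filename.endswith((".tar.gz", ".tgz")):
        return "application/gzip"
    if filename.endswith(".xml"):
        return "application/xml"
    if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
        return "application/gzip"
    return "application/octet-stream"


# Hypothetical filenames, one per branch.
examples = {
    name: get_content_type(name)
    for name in ("chart-1.0.tgz", "repomd.xml", "primary.xml.gz", "README")
}
```

As written in the diff, `.xml.bz2` and `.xml.xz` are also reported as `application/gzip`; a caller that needs the exact compression type would presumably have to special-case those suffixes.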
+3
@@ -0,0 +1,3 @@
+from .base import get_content_type
+
+__all__ = ["get_content_type"]
+18
@@ -0,0 +1,18 @@
+from .base import get_content_type
+
+
+def resolve_content(
+    data: bytes,
+    path: str,
+    filename: str,
+    base_url: str,
+    proxy_url: str,
+    remote_name: str,
+) -> tuple[bytes, str]:
+    if filename == "index.yaml":
+        data = data.replace(
+            base_url.encode(),
+            f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
+        )
+        return data, "text/yaml"
+    return data, get_content_type(filename)
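The Helm handler above rewrites upstream URLs inside `index.yaml` so that chart downloads are routed back through the proxy. A small sketch of just that byte-level substitution; the `rewrite_index` helper, the proxy address and the index content are assumptions made for illustration:

```python
def rewrite_index(data: bytes, base_url: str, proxy_url: str, remote_name: str) -> bytes:
    """Point every occurrence of base_url at the proxy's remote endpoint."""
    return data.replace(
        base_url.encode(),
        f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
    )


# Hypothetical index fragment and proxy address.
index = b"urls:\n- https://charts.example.com/vault-0.29.1.tgz\n"
rewritten = rewrite_index(
    index,
    base_url="https://charts.example.com",
    proxy_url="http://proxy.local:8000",
    remote_name="helm-test",
)
```

This is a plain byte replacement, so it only matches when `base_url` appears in the index exactly as configured (same scheme, same trailing-slash form).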
+21
@@ -0,0 +1,21 @@
+import re
+
+from .base import get_content_type
+
+
+def resolve_content(
+    data: bytes,
+    path: str,
+    filename: str,
+    immutable_patterns: list[str],
+    base_url: str,
+    proxy_url: str,
+    remote_name: str,
+) -> tuple[bytes, str]:
+    if not any(re.search(p, path) for p in immutable_patterns):
+        data = data.replace(
+            base_url.encode(),
+            f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
+        )
+        return data, "application/json"
+    return data, get_content_type(filename)
+32
@@ -0,0 +1,32 @@
+import re
+
+from .base import get_content_type
+
+
+def construct_url(base_url: str, path: str) -> str:
+    """Build the upstream URL for a PyPI request.
+
+    PyPI splits simple/ index pages (pypi.org) from file downloads
+    (files.pythonhosted.org), so simple/ requests are redirected to pypi.org.
+    """
+    if base_url.rstrip("/") == "https://files.pythonhosted.org" and "simple/" in path:
+        return f"https://pypi.org/{path}"
+    return f"{base_url}/{path}"
+
+
+def resolve_content(
+    data: bytes,
+    path: str,
+    filename: str,
+    immutable_patterns: list[str],
+    base_url: str,
+    proxy_url: str,
+    remote_name: str,
+) -> tuple[bytes, str]:
+    if not any(re.search(p, path) for p in immutable_patterns):
+        data = data.replace(
+            base_url.encode(),
+            f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
+        )
+        return data, "text/html; charset=utf-8"
+    return data, get_content_type(filename)
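To make the host split in `construct_url` concrete, the sketch below repeats the function from the diff and calls it with one index path and one file path. The package path is a made-up example, not a real PyPI location:

```python
def construct_url(base_url: str, path: str) -> str:
    # Same logic as in the diff: simple/ index pages live on pypi.org,
    # file downloads stay on the configured base URL.
    if base_url.rstrip("/") == "https://files.pythonhosted.org" and "simple/" in path:
        return f"https://pypi.org/{path}"
    return f"{base_url}/{path}"


index_url = construct_url("https://files.pythonhosted.org", "simple/requests/")
# Hypothetical file path, for illustration only.
file_url = construct_url(
    "https://files.pythonhosted.org",
    "packages/ab/cd/requests-2.31.0-py3-none-any.whl",
)
```

Note that the fall-through branch joins with a literal `/`, so a `base_url` configured with a trailing slash would produce a double slash in file URLs.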
+3
@@ -0,0 +1,3 @@
+from .base import get_content_type
+
+__all__ = ["get_content_type"]
+3
@@ -0,0 +1,3 @@
+from .s3 import S3Storage
+
+__all__ = ["S3Storage"]
@@ -41,7 +41,6 @@ class S3Storage:
         self.client = boto3.client("s3", **client_kwargs)
-        # Try to ensure bucket exists, but don't fail if MinIO isn't ready yet
         try:
             self._ensure_bucket_exists()
         except Exception as e:
@@ -55,25 +54,21 @@ class S3Storage:
             self.client.create_bucket(Bucket=self.bucket)
     def get_object_key(self, remote_name: str, path: str) -> str:
-        # Extract directory path and filename
         clean_path = path.lstrip("/")
         filename = os.path.basename(clean_path)
         directory_path = os.path.dirname(clean_path)
-        # Special handling for Docker registry blobs (use digest as key for deduplication)
+        # Docker blobs are keyed by digest for deduplication across images
         if "/blobs/sha256:" in clean_path:
-            # Extract the SHA256 digest for Docker blobs
             parts = clean_path.split("/blobs/sha256:")
             if len(parts) == 2:
                 digest = parts[1]
                 return f"{remote_name}/blobs/sha256/{digest}"
-        # Hash the directory path to keep keys manageable while preserving remote structure
         if directory_path:
             path_hash = hashlib.sha256(directory_path.encode()).hexdigest()[:16]
             return f"{remote_name}/{path_hash}/{filename}"
         else:
-            # If no directory, just use remote and filename
             return f"{remote_name}/{filename}"
     def exists(self, key: str) -> bool:
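The key scheme in `get_object_key` has three cases: Docker blobs keyed by digest, nested paths keyed by a 16-character hash of the directory, and flat paths keyed by filename. A standalone sketch of the same logic as a free function; the function name `object_key` and the sample paths are assumptions for illustration:

```python
import hashlib
import os


def object_key(remote_name: str, path: str) -> str:
    """Standalone version of the key derivation shown in the diff."""
    clean_path = path.lstrip("/")
    # Docker blobs: the digest alone identifies the object, whatever the image.
    if "/blobs/sha256:" in clean_path:
        parts = clean_path.split("/blobs/sha256:")
        if len(parts) == 2:
            return f"{remote_name}/blobs/sha256/{parts[1]}"
    filename = os.path.basename(clean_path)
    directory_path = os.path.dirname(clean_path)
    if directory_path:
        path_hash = hashlib.sha256(directory_path.encode()).hexdigest()[:16]
        return f"{remote_name}/{path_hash}/{filename}"
    return f"{remote_name}/{filename}"


# Hypothetical request paths.
blob_a = object_key("docker-test", "/v2/library/nginx/blobs/sha256:abc123")
blob_b = object_key("docker-test", "/v2/library/redis/blobs/sha256:abc123")
flat = object_key("generic-test", "/release.tar.gz")
nested = object_key("generic-test", "/v1.0/release.tar.gz")
```

Two images that share a layer resolve to the same blob key, which is what gives deduplication within a remote; files with the same name in different directories get distinct keys through the directory hash.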
+69 -9
@@ -20,59 +20,119 @@ TEST_REMOTES = {
     "remotes": {
         "alpine-test": {
             "base_url": "https://dl-cdn.alpinelinux.org",
-            "type": "remote",
             "package": "alpine",
             "immutable_patterns": [".*/x86_64/.*\\.apk$"],
             "cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
         },
         "rpm-test": {
             "base_url": "https://example.com/rpm",
-            "type": "remote",
             "package": "rpm",
             "immutable_patterns": [".*/x86_64/.*\\.rpm$", ".*/repodata/.*$"],
             "cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
         },
         "docker-test": {
             "base_url": "https://registry.example.com",
-            "type": "remote",
             "package": "docker",
             "cache": {"immutable_ttl": 0, "mutable_ttl": 300},
         },
         "docker-restricted": {
             "base_url": "https://registry.example.com",
-            "type": "remote",
             "package": "docker",
             "immutable_patterns": ["^library/nginx"],
             "cache": {"immutable_ttl": 0, "mutable_ttl": 300},
         },
+        "docker-bantags-test": {
+            "base_url": "https://registry.example.com",
+            "package": "docker",
+            "ban_tags_enabled": True,
+            "ban_tags": ["latest", "edge"],
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 300},
+        },
         "generic-test": {
             "base_url": "https://releases.example.com",
-            "type": "remote",
             "package": "generic",
             "immutable_patterns": [".*\\.tar\\.gz$"],
             "cache": {"immutable_ttl": 0, "mutable_ttl": 0},
         },
         "custom-index-test": {
             "base_url": "https://example.com",
-            "type": "remote",
             "package": "generic",
             "mutable_patterns": ["metadata\\.json$"],
             "cache": {"immutable_ttl": 0, "mutable_ttl": 600},
         },
         "check-mutable-test": {
             "base_url": "https://example.com",
-            "type": "remote",
             "package": "generic",
             "mutable_patterns": ["metadata\\.json$"],
             "check_mutable_updates": True,
             "cache": {"immutable_ttl": 0, "mutable_ttl": 600},
         },
+        "pypi-test": {
+            "base_url": "https://files.pythonhosted.org",
+            "package": "pypi",
+            "immutable_patterns": [
+                r"packages/.*\.whl$",
+                r"packages/.*\.whl\.metadata$",
+                r"packages/.*\.tar\.gz$",
+            ],
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 600},
+        },
+        "npm-test": {
+            "base_url": "https://registry.npmjs.org",
+            "package": "npm",
+            "immutable_patterns": [r"\.tgz$"],
+            "mutable_patterns": [r"^(?!.*\.tgz$).*"],
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 600},
+        },
+        "helm-test": {
+            "base_url": "https://helm.releases.hashicorp.com",
+            "package": "helm",
+            "immutable_patterns": [r"\.tgz$"],
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
+        },
+        "quarantine-test": {
+            "base_url": "https://releases.example.com",
+            "package": "generic",
+            "immutable_patterns": [r".*\.tar\.gz$"],
+            "quarantine_new": True,
+            "quarantine_days": 3,
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 0},
+        },
+        "quarantine-disabled": {
+            "base_url": "https://releases.example.com",
+            "package": "generic",
+            "immutable_patterns": [r".*\.tar\.gz$"],
+            "quarantine_new": False,
+            "quarantine_days": 3,
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 0},
+        },
+        "helm-member-2": {
+            "base_url": "https://charts.example.com",
+            "package": "helm",
+            "immutable_patterns": [r"\.tgz$"],
+            "cache": {"immutable_ttl": 0, "mutable_ttl": 1800},
+        },
+    },
+    "locals": {
         "local-test": {
-            "type": "local",
             "package": "generic",
             "cache": {"immutable_ttl": 0, "mutable_ttl": 0},
         },
-    }
+    },
+    "virtuals": {
+        "helm-virtual-test": {
+            "package": "helm",
+            "members": ["helm-test", "helm-member-2"],
+        },
+        "unsupported-virtual-test": {
+            "package": "rpm",
+            "members": ["rpm-test"],
+        },
+        "empty-virtual-test": {
+            "package": "helm",
+            "members": [],
+        },
+    },
 }
 # ---------------------------------------------------------------------------
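The `docker-bantags-test` fixture above exercises the `ban_tags_enabled` / `ban_tags` keys described in the commit message for #43: named tags in the list are refused, while digest-addressed manifests and blob requests pass through. The proxy's actual check is not shown in this part of the diff, so the following is only a sketch of that rule; the `is_tag_banned` helper and the request paths are hypothetical:

```python
def is_tag_banned(remote_cfg: dict, path: str) -> bool:
    """Return True when a manifest request names a banned tag."""
    if not remote_cfg.get("ban_tags_enabled", False):
        return False  # opt-in: disabled by default
    if "/manifests/" not in path:
        return False  # blob requests are unaffected
    reference = path.rsplit("/manifests/", 1)[1]
    if reference.startswith("sha256:"):
        return False  # digest-addressed pulls are never blocked
    return reference in remote_cfg.get("ban_tags", [])


cfg = {"ban_tags_enabled": True, "ban_tags": ["latest", "edge"]}
banned = is_tag_banned(cfg, "library/nginx/manifests/latest")
pinned = is_tag_banned(cfg, "library/nginx/manifests/1.27")
by_digest = is_tag_banned(cfg, "library/nginx/manifests/sha256:abc123")
blob = is_tag_banned(cfg, "library/nginx/blobs/sha256:abc123")
```

A caller would presumably translate a `True` result into the 403 response mentioned in the commit message.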
+115
@@ -283,3 +283,118 @@ class TestMutableMeta:
     def test_delete_no_op_when_unavailable(self, unavailable_cache):
         unavailable_cache.delete_mutable_meta("remote", "path")  # must not raise
# ---------------------------------------------------------------------------
# artifact published date (quarantine support)
# ---------------------------------------------------------------------------
class TestArtifactPublished:
def test_key_format_is_deterministic(self, bare_cache):
path = "some/path/package-1.0.tar.gz"
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
assert bare_cache.get_artifact_published_key("myremote", path) == f"pkg:published:myremote:{expected_hash}"
def test_key_hash_is_16_chars(self, bare_cache):
key = bare_cache.get_artifact_published_key("remote", "path/to/file.whl")
assert len(key.split(":")[-1]) == 16
def test_different_paths_produce_different_keys(self, bare_cache):
k1 = bare_cache.get_artifact_published_key("remote", "pkg-1.0.tar.gz")
k2 = bare_cache.get_artifact_published_key("remote", "pkg-2.0.tar.gz")
assert k1 != k2
def test_store_calls_set_with_correct_value(self, cache_with_redis, mock_redis_client):
lm = "Mon, 01 Jan 2024 00:00:00 GMT"
cache_with_redis.store_artifact_published("remote", "path/pkg.tar.gz", lm)
expected_key = cache_with_redis.get_artifact_published_key("remote", "path/pkg.tar.gz")
mock_redis_client.set.assert_called_once_with(expected_key, lm)
def test_get_returns_stored_value(self, cache_with_redis, mock_redis_client):
lm = "Tue, 15 Mar 2022 12:00:00 GMT"
mock_redis_client.get.return_value = lm
result = cache_with_redis.get_artifact_published("remote", "path/pkg.tar.gz")
assert result == lm
def test_get_returns_none_when_not_stored(self, cache_with_redis, mock_redis_client):
mock_redis_client.get.return_value = None
result = cache_with_redis.get_artifact_published("remote", "path/pkg.tar.gz")
assert result is None
def test_store_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.store_artifact_published("remote", "path", "Mon, 01 Jan 2024 00:00:00 GMT")
def test_get_returns_none_when_unavailable(self, unavailable_cache):
assert unavailable_cache.get_artifact_published("remote", "path") is None
# ---------------------------------------------------------------------------
# fetch lock (thundering-herd deduplication)
# ---------------------------------------------------------------------------
class TestFetchLock:
def test_acquire_returns_true_when_lock_obtained(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
result = cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest")
assert result is True
def test_acquire_calls_set_nx_with_ttl(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest", ttl=15)
_, kwargs = mock_redis_client.set.call_args
assert kwargs.get("nx") is True
assert kwargs.get("ex") == 15
def test_acquire_returns_false_when_lock_already_held(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = None # Redis SET NX → None when key exists
result = cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest")
assert result is False
def test_acquire_fails_open_when_unavailable(self, unavailable_cache):
# caller must be allowed to proceed when Redis is down
assert unavailable_cache.acquire_fetch_lock("myremote", "some/path") is True
def test_acquire_fails_open_on_redis_exception(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.side_effect = Exception("connection reset")
assert cache_with_redis.acquire_fetch_lock("myremote", "some/path") is True
def test_lock_key_embeds_path_hash(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
path = "library/nginx/manifests/latest"
cache_with_redis.acquire_fetch_lock("myremote", path)
args, _ = mock_redis_client.set.call_args
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
assert args[0] == f"fetchlock:myremote:{expected_hash}"
def test_lock_key_hash_is_16_chars(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "some/long/path/file.tar.gz")
args, _ = mock_redis_client.set.call_args
# key format: fetchlock:<remote>:<16-char hash>
parts = args[0].split(":")
assert len(parts) == 3
assert len(parts[2]) == 16
def test_different_paths_produce_different_lock_keys(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "path/a/manifests/latest")
key_a = mock_redis_client.set.call_args[0][0]
mock_redis_client.set.reset_mock()
cache_with_redis.acquire_fetch_lock("myremote", "path/b/manifests/latest")
key_b = mock_redis_client.set.call_args[0][0]
assert key_a != key_b
def test_release_deletes_correct_key(self, cache_with_redis, mock_redis_client):
path = "library/nginx/manifests/latest"
cache_with_redis.release_fetch_lock("myremote", path)
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
mock_redis_client.delete.assert_called_once_with(f"fetchlock:myremote:{expected_hash}")
def test_release_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.release_fetch_lock("myremote", "some/path") # must not raise
def test_release_no_op_on_redis_exception(self, cache_with_redis, mock_redis_client):
mock_redis_client.delete.side_effect = Exception("timeout")
cache_with_redis.release_fetch_lock("myremote", "some/path") # must not raise
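The `TestFetchLock` cases above pin down the lock contract: the key is `fetchlock:<remote>:<first 16 hex chars of sha256(path)>`, acquisition uses `SET` with `nx` and `ex`, and both operations fail open when Redis is missing or raises. The implementation itself is not part of this excerpt, so the sketch below is one way to satisfy those tests, written as free functions. `FakeRedis`, the stored value `"1"` and the function signatures are assumptions; the default TTL of 30 seconds follows the commit message for #42:

```python
import hashlib


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only set() and delete()."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # mirrors SET NX returning None when the key exists
        self.store[key] = value
        return True

    def delete(self, key):
        self.store.pop(key, None)


def lock_key(remote: str, path: str) -> str:
    return f"fetchlock:{remote}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"


def acquire_fetch_lock(client, remote: str, path: str, ttl: int = 30) -> bool:
    if client is None:
        return True  # fail open: no Redis means the caller may fetch
    try:
        return bool(client.set(lock_key(remote, path), "1", nx=True, ex=ttl))
    except Exception:
        return True  # fail open on Redis errors as well


def release_fetch_lock(client, remote: str, path: str) -> None:
    if client is None:
        return
    try:
        client.delete(lock_key(remote, path))
    except Exception:
        pass  # releasing is best effort; the TTL expires the key anyway


client = FakeRedis()
path = "library/nginx/manifests/latest"
first = acquire_fetch_lock(client, "myremote", path)
second = acquire_fetch_lock(client, "myremote", path)
release_fetch_lock(client, "myremote", path)
third = acquire_fetch_lock(client, "myremote", path)
no_redis = acquire_fetch_lock(None, "myremote", "some/path")
```

Failing open trades strict deduplication for availability: when Redis is down, every pod fetches from upstream rather than blocking on a lock it cannot take.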
+252 -13
@@ -27,24 +27,24 @@ def make_config(tmp_path):
 class TestGetMutablePatterns:
     def test_alpine_returns_package_defaults(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "alpine", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "alpine", "base_url": "https://x.com"}})
         patterns = cfg.get_mutable_patterns("r")
         assert r"APKINDEX\.tar\.gz$" in patterns
     def test_rpm_returns_package_defaults(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "rpm", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "rpm", "base_url": "https://x.com"}})
         patterns = cfg.get_mutable_patterns("r")
         assert r"repomd\.xml$" in patterns
         assert any("repodata" in p for p in patterns)
     def test_docker_returns_package_defaults(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "docker", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "docker", "base_url": "https://x.com"}})
         patterns = cfg.get_mutable_patterns("r")
         assert any("manifests" in p for p in patterns)
         assert any("tags/list" in p for p in patterns)
     def test_generic_returns_empty_list(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "generic", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
         assert cfg.get_mutable_patterns("r") == []
     def test_unknown_remote_returns_empty_list(self, make_config):
@@ -52,12 +52,12 @@ class TestGetMutablePatterns:
         assert cfg.get_mutable_patterns("nonexistent") == []
     def test_missing_package_field_defaults_to_generic(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"base_url": "https://x.com"}})
         assert cfg.get_mutable_patterns("r") == []
     def test_unknown_package_type_returns_empty_list(self, make_config):
         # A mis-spelled package type silently returns [] — this is a known footgun
-        cfg = make_config({"r": {"type": "remote", "package": "deb", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "deb", "base_url": "https://x.com"}})
         assert cfg.get_mutable_patterns("r") == []
     def test_extra_patterns_appended_after_defaults(self, make_config):
@@ -133,6 +133,58 @@ class TestGetMutablePatterns:
         assert r"repomd\.xml$" in patterns
         assert r"custom-meta\.xml$" in patterns
+    def test_npm_has_no_package_defaults(self, make_config):
+        cfg = make_config({"r": {"package": "npm", "base_url": "https://x.com"}})
+        assert cfg.get_mutable_patterns("r") == []
+    def test_npm_explicit_mutable_pattern_matches_metadata(self, make_config):
+        import re
+        cfg = make_config(
+            {
+                "r": {
+                    "type": "remote",
+                    "package": "npm",
+                    "base_url": "https://x.com",
+                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
+                }
+            }
+        )
+        patterns = cfg.get_mutable_patterns("r")
+        assert any(re.search(p, "express") for p in patterns)
+        assert any(re.search(p, "@babel/core") for p in patterns)
+    def test_helm_returns_index_yaml_as_mutable(self, make_config):
+        cfg = make_config({"r": {"package": "helm", "base_url": "https://helm.example.com"}})
+        patterns = cfg.get_mutable_patterns("r")
+        assert r"index\.yaml$" in patterns
+    def test_helm_chart_tarballs_not_mutable_by_default(self, make_config):
+        import re
+        cfg = make_config({"r": {"package": "helm", "base_url": "https://helm.example.com"}})
+        patterns = cfg.get_mutable_patterns("r")
+        # Only index.yaml is mutable; .tgz chart tarballs are not
+        assert not any(re.search(p, "vault-0.29.1.tgz") for p in patterns)
+        assert not any(re.search(p, "consul-1.5.0.tgz") for p in patterns)
+    def test_npm_explicit_mutable_pattern_excludes_tarballs(self, make_config):
+        import re
+        cfg = make_config(
+            {
+                "r": {
+                    "type": "remote",
+                    "package": "npm",
+                    "base_url": "https://x.com",
+                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
+                }
+            }
+        )
+        patterns = cfg.get_mutable_patterns("r")
+        assert not any(re.search(p, "express-4.18.2.tgz") for p in patterns)
+        assert not any(re.search(p, "express/-/express-4.18.2.tgz") for p in patterns)
 # ---------------------------------------------------------------------------
 # get_immutable_patterns
@@ -158,7 +210,7 @@ class TestGetImmutablePatterns:
         assert cfg.get_immutable_patterns("nonexistent") == []
     def test_returns_empty_when_no_patterns_configured(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "generic", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
         assert cfg.get_immutable_patterns("r") == []
     def test_multiple_patterns_returned(self, make_config):
@@ -229,7 +281,7 @@ class TestGetUserMutablePatterns:
     def test_excludes_package_defaults(self, make_config):
         # Package defaults (APKINDEX etc.) must NOT appear here
-        cfg = make_config({"r": {"type": "remote", "package": "alpine", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "alpine", "base_url": "https://x.com"}})
         assert cfg.get_user_mutable_patterns("r") == []
     def test_returns_empty_for_missing_remote(self, make_config):
@@ -237,7 +289,7 @@ class TestGetUserMutablePatterns:
         assert cfg.get_user_mutable_patterns("nonexistent") == []
     def test_returns_empty_when_key_absent(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "generic", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
         assert cfg.get_user_mutable_patterns("r") == []
@@ -265,7 +317,7 @@ class TestGetCacheConfig:
         assert cfg.get_cache_config("nonexistent") == {}
     def test_returns_empty_dict_when_no_cache_key(self, make_config):
-        cfg = make_config({"r": {"type": "remote", "package": "generic", "base_url": "https://x.com"}})
+        cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
         assert cfg.get_cache_config("r") == {}
@@ -277,11 +329,11 @@ class TestGetCacheConfig:
 class TestConfigReload:
     def test_reloads_when_file_mtime_advances(self, tmp_path):
         cfg_file = tmp_path / "remotes.yaml"
-        cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"type": "remote", "package": "generic", "base_url": "https://x.com"}}}))
+        cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
         cfg = ConfigManager(str(cfg_file))
         assert "repo-a" in cfg.config["remotes"]
-        cfg_file.write_text(yaml.dump({"remotes": {"repo-b": {"type": "remote", "package": "generic", "base_url": "https://y.com"}}}))
+        cfg_file.write_text(yaml.dump({"remotes": {"repo-b": {"package": "generic", "base_url": "https://y.com"}}}))
         future_mtime = cfg._last_modified + 1
         os.utime(str(cfg_file), (future_mtime, future_mtime))
@@ -292,10 +344,197 @@ class TestConfigReload:
     def test_no_reload_when_file_unchanged(self, tmp_path):
         cfg_file = tmp_path / "remotes.yaml"
-        cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"type": "remote", "package": "generic", "base_url": "https://x.com"}}}))
+        cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
         cfg = ConfigManager(str(cfg_file))
         # Call check_reload without touching the file — should not reload
         cfg._check_reload()
         assert "repo-a" in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# get_quarantine_config
# ---------------------------------------------------------------------------
class TestGetQuarantineConfig:
def test_returns_false_zero_when_not_configured(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
enabled, days = cfg.get_quarantine_config("r")
assert enabled is False
assert days == 0
def test_returns_false_zero_for_missing_remote(self, make_config):
cfg = make_config({})
enabled, days = cfg.get_quarantine_config("nonexistent")
assert enabled is False
assert days == 0
def test_enabled_true_and_days_returned(self, make_config):
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
"quarantine_days": 7,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is True
assert days == 7
def test_quarantine_new_false_returns_disabled(self, make_config):
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": False,
"quarantine_days": 7,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is False
assert days == 7
def test_enabled_with_zero_days_returns_zero(self, make_config):
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
"quarantine_days": 0,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is True
assert days == 0
# ---------------------------------------------------------------------------
# Directory mode (CONFIG_PATH points to a directory)
# ---------------------------------------------------------------------------
def _remote(base_url: str = "https://x.com") -> dict:
return {"package": "generic", "base_url": base_url}
class TestConfigDirMode:
def test_loads_all_yaml_files(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remotes": {"repo-b": _remote("https://y.com")}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" in cfg.config["remotes"]
def test_later_file_overrides_earlier_on_same_key(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://first.com")}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://second.com")}}))
cfg = ConfigManager(str(tmp_path))
assert cfg.config["remotes"]["r"]["base_url"] == "https://second.com"
def test_empty_directory_returns_empty_remotes(self, tmp_path):
cfg = ConfigManager(str(tmp_path))
assert cfg.config == {"remotes": {}, "virtuals": {}, "locals": {}}
def test_ignores_non_yaml_files(self, tmp_path):
(tmp_path / "notes.txt").write_text("not yaml")
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert list(cfg.config["remotes"].keys()) == ["repo-a"]
def test_reload_picks_up_new_file(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" not in cfg.config["remotes"]
new_file = tmp_path / "b.yaml"
new_file.write_text(yaml.dump({"remotes": {"repo-b": _remote("https://y.com")}}))
future_mtime = cfg._last_modified + 1
os.utime(str(new_file), (future_mtime, future_mtime))
cfg._check_reload()
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# config_dir key (main file contains a config_dir pointer)
# ---------------------------------------------------------------------------


class TestConfigDirKey:
    def test_merges_remotes_from_config_dir(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        (conf_d / "remotes.yaml").write_text(yaml.dump({"remotes": {"repo-extra": _remote("https://extra.com")}}))
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"repo-main": _remote()}}))
        cfg = ConfigManager(str(main))
        assert "repo-main" in cfg.config["remotes"]
        assert "repo-extra" in cfg.config["remotes"]

    def test_relative_config_dir_resolved_from_main_file(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        (conf_d / "r.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": "conf.d", "remotes": {}}))
        cfg = ConfigManager(str(main))
        assert "repo-a" in cfg.config["remotes"]

    def test_config_dir_key_not_present_in_loaded_config(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {}}))
        cfg = ConfigManager(str(main))
        assert "config_dir" not in cfg.config

    def test_dir_remote_overrides_main_file_remote(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        (conf_d / "override.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://new.com")}}))
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"r": _remote("https://old.com")}}))
        cfg = ConfigManager(str(main))
        assert cfg.config["remotes"]["r"]["base_url"] == "https://new.com"

    def test_empty_config_dir_uses_main_file_only(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"repo-main": _remote()}}))
        cfg = ConfigManager(str(main))
        assert list(cfg.config["remotes"].keys()) == ["repo-main"]

    def test_reload_picks_up_changed_dir_file(self, tmp_path):
        conf_d = tmp_path / "conf.d"
        conf_d.mkdir()
        dir_file = conf_d / "r.yaml"
        dir_file.write_text(yaml.dump({"remotes": {"repo-v1": _remote()}}))
        main = tmp_path / "config.yaml"
        main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {}}))
        cfg = ConfigManager(str(main))
        assert "repo-v1" in cfg.config["remotes"]
        dir_file.write_text(yaml.dump({"remotes": {"repo-v2": _remote("https://v2.com")}}))
        future_mtime = cfg._last_modified + 1
        os.utime(str(dir_file), (future_mtime, future_mtime))
        cfg._check_reload()
        assert "repo-v2" in cfg.config["remotes"]
        assert "repo-v1" not in cfg.config["remotes"]
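The config tests above pin down a precedence rule: entries loaded later replace earlier entries with the same name, and files from `config_dir` win over the main file. A minimal sketch of that rule, assuming plain dicts in place of parsed YAML and a hypothetical `merge_configs` helper (this is not the actual `ConfigManager` code, and the file ordering is an assumption):

```python
# Hypothetical helper: illustrates the precedence the tests assert,
# not the real ConfigManager implementation.
SECTIONS = ("remotes", "virtuals", "locals")


def merge_configs(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win per entry name."""
    merged = {section: {} for section in SECTIONS}
    for layer in layers:
        for section in SECTIONS:
            # A missing or empty section contributes nothing.
            merged[section].update(layer.get(section) or {})
    return merged


main_file = {"remotes": {"r": {"base_url": "https://old.com"}}}
dir_file = {"remotes": {"r": {"base_url": "https://new.com"}}}

# The main file is applied first, the directory file second, so it overrides.
result = merge_configs(main_file, dir_file)
```

Merging per section with `dict.update` replaces a whole remote definition rather than deep-merging its fields, which matches what `test_dir_remote_overrides_main_file_remote` checks.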
+657 -44
@@ -2,6 +2,7 @@
 import hashlib
 import json
+from datetime import UTC
 from unittest.mock import ANY, AsyncMock, MagicMock, patch
 import pytest
@@ -204,7 +205,7 @@ class TestDockerProxy:
         deps["cache"].is_mutable_file.return_value = True
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "cached"},
         ) as mock_fetch:
@@ -226,7 +227,7 @@ class TestDockerProxy:
         deps["cache"].is_mutable_file.return_value = True
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "cached"},
         ):
@@ -248,9 +249,9 @@ class TestDockerProxy:
         deps["cache"].is_index_valid.return_value = False  # but TTL expired
         deps["storage"].download_object.return_value = manifest
-        with patch("artifactapi.main._upstream_reachable", new_callable=AsyncMock, return_value=True):
+        with patch("artifactapi.artifact.proxy._upstream_reachable", new_callable=AsyncMock, return_value=True):
             with patch(
-                "artifactapi.main.cache_single_artifact",
+                "artifactapi.artifact.proxy.cache_single_artifact",
                 new_callable=AsyncMock,
                 return_value={"status": "cached"},
             ) as mock_fetch:
@@ -259,6 +260,211 @@ class TestDockerProxy:
         mock_fetch.assert_called_once()
         assert response.status_code == 200
+
+    # --- Issue 1: sha256 digest cross-linking ---
+
+    def test_tag_manifest_is_stored_under_digest_key_on_cache_hit(self, client, patched_deps):
+        # When serving a cached tag manifest the handler must also write the content
+        # under the sha256 digest key so subsequent sha256-addressed pulls hit cache.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        # First exists call (tag manifest): hit. Second (digest key): miss → triggers upload.
+        deps["storage"].exists.side_effect = [True, False]
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/v2/docker-test/library/nginx/manifests/v1.25.3")
+        assert response.status_code == 200
+        deps["storage"].upload.assert_called_once_with(deps["storage"].get_object_key.return_value, manifest)
+
+    def test_tag_manifest_digest_key_not_written_when_already_exists(self, client, patched_deps):
+        # When the digest key already exists in storage upload must not be called.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        # Both the tag key and the digest key already present.
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        client.get("/v2/docker-test/library/nginx/manifests/v1.25.3")
+        deps["storage"].upload.assert_not_called()
+
+    def test_sha256_manifest_request_is_not_cross_linked(self, client, patched_deps):
+        # sha256-addressed manifests are immutable — the cross-link logic must not apply.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = False  # sha256 manifest is immutable
+        with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
+            client.get("/v2/docker-test/library/nginx/manifests/sha256:" + "a" * 64)
+        deps["storage"].upload.assert_not_called()
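The cross-linking behaviour exercised by the Issue 1 tests can be sketched as follows. This is an illustration only: a plain dict stands in for the S3-backed storage, and `digest_key_for` and `cross_link` are hypothetical names, not functions from the codebase. The key layout (`manifests/sha256:<hex>`) follows the commit description.

```python
import hashlib


def digest_key_for(manifest_key: str, body: bytes) -> str:
    """Return the digest-addressed sibling key for a tag manifest key."""
    prefix, _, _reference = manifest_key.rpartition("/")
    return f"{prefix}/sha256:{hashlib.sha256(body).hexdigest()}"


def cross_link(store: dict, manifest_key: str, body: bytes) -> bool:
    """Write body under its digest key if absent; return True when written."""
    reference = manifest_key.rpartition("/")[2]
    if reference.startswith("sha256:"):
        return False  # digest-addressed requests are immutable: nothing to link
    key = digest_key_for(manifest_key, body)
    if key in store:
        return False  # digest key already present: skip the redundant write
    store[key] = body
    return True
```

Hashing the exact response bytes matters here: a registry digest is computed over the raw manifest body, so the body should not be re-serialised before hashing.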
+
+    # --- Issue 2: thundering herd distributed lock ---
+
+    def test_lock_acquired_and_released_on_upstream_fetch(self, client, patched_deps):
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.side_effect = [False, False]  # initial miss; digest key also absent
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].acquire_fetch_lock.return_value = True
+        with patch(
+            "artifactapi.artifact.proxy.cache_single_artifact",
+            new_callable=AsyncMock,
+            return_value={"status": "cached"},
+        ):
+            response = client.get("/v2/docker-test/library/nginx/manifests/latest")
+        deps["cache"].acquire_fetch_lock.assert_called_once()
+        deps["cache"].release_fetch_lock.assert_called_once()
+        assert response.status_code == 200
+
+    def test_lock_released_even_when_fetch_returns_error(self, client, patched_deps):
+        deps = patched_deps
+        deps["storage"].exists.return_value = False
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].acquire_fetch_lock.return_value = True
+        with patch(
+            "artifactapi.artifact.proxy.cache_single_artifact",
+            new_callable=AsyncMock,
+            return_value={"status": "error", "error": "upstream down"},
+        ):
+            response = client.get("/v2/docker-test/library/nginx/manifests/latest")
+        deps["cache"].release_fetch_lock.assert_called_once()
+        assert response.status_code == 502
+
+    def test_thundering_herd_polls_storage_when_lock_not_acquired(self, client, patched_deps):
+        # When the lock is held by another pod the handler must poll storage and serve
+        # from cache once the competing fetch completes, without issuing its own upstream request.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        # Initial cache check: miss. First poll iteration: another pod has written it.
+        # Third call is for the digest cross-link check (is_mutable=True path); digest key exists.
+        deps["storage"].exists.side_effect = [False, True, True]
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        deps["cache"].acquire_fetch_lock.return_value = False  # lock held by peer
+        with patch("artifactapi.artifact.docker.asyncio.sleep", new_callable=AsyncMock):
+            with patch(
+                "artifactapi.artifact.proxy.cache_single_artifact",
+                new_callable=AsyncMock,
+            ) as mock_fetch:
+                response = client.get("/v2/docker-test/library/nginx/manifests/latest")
+        mock_fetch.assert_not_called()
+        assert response.status_code == 200
+
+    def test_thundering_herd_falls_through_to_fetch_if_poll_times_out(self, client, patched_deps):
+        # If the item never appears in storage during the poll window the handler must
+        # still issue its own upstream fetch as a fallback.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        # All exists calls return False — item never appears during polling.
+        deps["storage"].exists.return_value = False
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].acquire_fetch_lock.return_value = False  # lock held by peer
+        with patch("artifactapi.artifact.docker.asyncio.sleep", new_callable=AsyncMock):
+            with patch(
+                "artifactapi.artifact.proxy.cache_single_artifact",
+                new_callable=AsyncMock,
+                return_value={"status": "cached"},
+            ) as mock_fetch:
+                response = client.get("/v2/docker-test/library/nginx/manifests/latest")
+        mock_fetch.assert_called_once()
+        assert response.status_code == 200
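The control flow these lock tests describe (acquire the lock, otherwise poll storage, then fall back to fetching) might look roughly like the sketch below. It is a simplified stand-in under stated assumptions: an in-memory `set` replaces the Redis `SET NX EX 30` lock from the commit description, a dict replaces object storage, and `fetch_with_lock` and its parameters are hypothetical names rather than the handler's real signature.

```python
import asyncio


async def fetch_with_lock(key, store, locks, fetch_upstream,
                          poll_seconds=5.0, poll_interval=0.1):
    """Serve key from store, or fetch it upstream under a per-key lock."""
    if key in store:
        return store[key]

    got_lock = key not in locks  # stand-in for Redis SET <key> NX EX 30
    if got_lock:
        locks.add(key)
    else:
        # A peer holds the lock: poll storage instead of duplicating the fetch.
        waited = 0.0
        while waited < poll_seconds:
            await asyncio.sleep(poll_interval)
            waited += poll_interval
            if key in store:
                return store[key]

    try:
        # Reached by the lock holder, and by waiters whose poll timed out.
        store[key] = await fetch_upstream(key)
        return store[key]
    finally:
        if got_lock:
            locks.discard(key)  # release even if the upstream fetch raised
```

Releasing in a `finally` block mirrors `test_lock_released_even_when_fetch_returns_error`; the poll timeout keeps a crashed lock holder from stalling every other request for the lifetime of the lock.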
+
+# ---------------------------------------------------------------------------
+# Docker ban_tags feature
+# ---------------------------------------------------------------------------
+
+class TestDockerBanTags:
+    def test_banned_tag_returns_403(self, client, patched_deps):
+        response = client.get("/v2/docker-bantags-test/library/nginx/manifests/latest")
+        assert response.status_code == 403
+        assert "latest" in response.json()["detail"]
+
+    def test_second_banned_tag_returns_403(self, client, patched_deps):
+        response = client.get("/v2/docker-bantags-test/library/nginx/manifests/edge")
+        assert response.status_code == 403
+        assert "edge" in response.json()["detail"]
+
+    def test_allowed_tag_proceeds(self, client, patched_deps):
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/v2/docker-bantags-test/library/nginx/manifests/1.25.3")
+        assert response.status_code == 200
+
+    def test_digest_pull_bypasses_ban(self, client, patched_deps):
+        # sha256-addressed pulls must never be blocked by the tag ban list
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = False
+        digest = "sha256:" + "a" * 64
+        with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
+            response = client.get(f"/v2/docker-bantags-test/library/nginx/manifests/{digest}")
+        assert response.status_code == 200
+
+    def test_ban_tags_disabled_by_default(self, client, patched_deps):
+        # docker-test has no ban_tags_enabled — "latest" must pass through
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/v2/docker-test/library/nginx/manifests/latest")
+        assert response.status_code == 200
+
+    def test_ban_tags_enabled_but_empty_list_allows_all(self, client, patched_deps):
+        # If ban_tags_enabled is true but ban_tags is empty nothing should be blocked.
+        # docker-test doesn't have ban_tags_enabled, but we can verify via the
+        # docker-bantags-test remote with an unlisted tag.
+        deps = patched_deps
+        manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = manifest
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/v2/docker-bantags-test/library/nginx/manifests/stable")
+        assert response.status_code == 200
+
+    def test_ban_check_does_not_apply_to_blobs(self, client, patched_deps):
+        # Blob paths don't contain /manifests/ — the ban check must not interfere
+        deps = patched_deps
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = b"\x00" * 100
+        deps["cache"].is_mutable_file.return_value = False
+        with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
+            response = client.get("/v2/docker-bantags-test/library/nginx/blobs/sha256:" + "b" * 64)
+        assert response.status_code == 200
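Taken together, the `TestDockerBanTags` cases imply a small decision rule: only manifest requests for a named tag can be blocked, and only when the remote opts in. A minimal sketch of that rule, where `is_banned` is a hypothetical helper and the remote config is a plain dict using the `ban_tags_enabled` / `ban_tags` keys from the commit description:

```python
def is_banned(remote_cfg: dict, path: str) -> bool:
    """True when path is a manifest request for a tag listed in ban_tags."""
    if not remote_cfg.get("ban_tags_enabled", False):
        return False  # opt-in feature, off by default
    _, sep, reference = path.rpartition("/manifests/")
    if not sep:
        return False  # not a manifest path (for example a blob request)
    if reference.startswith("sha256:"):
        return False  # digest-addressed pulls are never blocked
    return reference in (remote_cfg.get("ban_tags") or [])
```

In the proxy a `True` result would translate to the 403 response the tests expect; the sketch leaves the HTTP layer out.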
 # ---------------------------------------------------------------------------
 # Generic artifact route /api/v1/remote/{remote}/{path}
@@ -352,7 +558,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = False
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "cached"},
         ) as mock_fetch:
@@ -369,7 +575,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = False
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "cached"},
         ):
@@ -384,7 +590,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = True
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "cached"},
         ):
@@ -399,7 +605,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = False
         with patch(
-            "artifactapi.main.cache_single_artifact",
+            "artifactapi.artifact.proxy.cache_single_artifact",
             new_callable=AsyncMock,
             return_value={"status": "error", "error": "upstream unreachable"},
         ):
@@ -430,7 +636,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_index_valid.return_value = False
         deps["cache"].get_mutable_meta.return_value = {"etag": '"abc"'}
-        with patch("artifactapi.main.check_upstream_changed", new_callable=AsyncMock, return_value=False):
+        with patch("artifactapi.artifact.proxy.check_upstream_changed", new_callable=AsyncMock, return_value=False):
             response = client.get("/api/v1/remote/check-mutable-test/metadata.json")
         assert response.status_code == 200
@@ -446,8 +652,8 @@ class TestGenericArtifactRoute:
         deps["cache"].is_index_valid.return_value = False
         deps["cache"].get_mutable_meta.return_value = {"etag": '"abc"'}
-        with patch("artifactapi.main.check_upstream_changed", new_callable=AsyncMock, return_value=True):
-            with patch("artifactapi.main.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
+        with patch("artifactapi.artifact.proxy.check_upstream_changed", new_callable=AsyncMock, return_value=True):
+            with patch("artifactapi.artifact.proxy.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
                 mock_cache.return_value = {"status": "error", "error": "upstream gone"}
                 response = client.get("/api/v1/remote/check-mutable-test/metadata.json")
@@ -462,8 +668,8 @@ class TestGenericArtifactRoute:
         deps["cache"].is_index_valid.return_value = False
         deps["cache"].get_mutable_meta.return_value = {"etag": '"abc"'}
-        with patch("artifactapi.main.check_upstream_changed", new_callable=AsyncMock, return_value=True):
-            with patch("artifactapi.main.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
+        with patch("artifactapi.artifact.proxy.check_upstream_changed", new_callable=AsyncMock, return_value=True):
+            with patch("artifactapi.artifact.proxy.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
                 mock_cache.return_value = {"status": "cached", "etag": '"def"', "last_modified": None}
                 response = client.get("/api/v1/remote/check-mutable-test/metadata.json")
@@ -472,7 +678,7 @@ class TestGenericArtifactRoute:
     def test_mutable_backend_unreachable_on_check_updates_keeps_stale(self, client, patched_deps):
         """When check_mutable_updates=True and backend is unreachable, stale copy is kept and TTL refreshed."""
-        from artifactapi.main import UpstreamUnreachable
+        from artifactapi.artifact.proxy import UpstreamUnreachable
         deps = patched_deps
         deps["storage"].exists.return_value = True
@@ -481,7 +687,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_index_valid.return_value = False
         deps["cache"].get_mutable_meta.return_value = {"etag": '"abc"'}
-        with patch("artifactapi.main.check_upstream_changed", side_effect=UpstreamUnreachable("connection refused")):
+        with patch("artifactapi.artifact.proxy.check_upstream_changed", side_effect=UpstreamUnreachable("connection refused")):
             response = client.get("/api/v1/remote/check-mutable-test/metadata.json")
         assert response.status_code == 200
@@ -496,7 +702,7 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = True
         deps["cache"].is_index_valid.return_value = False
-        with patch("artifactapi.main._upstream_reachable", new_callable=AsyncMock, return_value=False):
+        with patch("artifactapi.artifact.proxy._upstream_reachable", new_callable=AsyncMock, return_value=False):
             response = client.get("/api/v1/remote/alpine-test/alpine/v3.18/x86_64/APKINDEX.tar.gz")
         assert response.status_code == 200
@@ -510,8 +716,8 @@ class TestGenericArtifactRoute:
         deps["cache"].is_mutable_file.return_value = True
         deps["cache"].is_index_valid.return_value = False
-        with patch("artifactapi.main.check_upstream_changed", new_callable=AsyncMock) as mock_check:
-            with patch("artifactapi.main.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
+        with patch("artifactapi.artifact.proxy.check_upstream_changed", new_callable=AsyncMock) as mock_check:
+            with patch("artifactapi.artifact.proxy.cache_single_artifact", new_callable=AsyncMock) as mock_cache:
                 mock_cache.return_value = {"status": "error", "error": "upstream gone"}
                 client.get("/api/v1/remote/custom-index-test/metadata.json")
@@ -522,68 +728,53 @@ class TestGenericArtifactRoute:
         deps["database"].get_local_file_metadata.return_value = None
         deps["database"].available = True
-        response = client.get("/api/v1/remote/local-test/path/to/nonexistent.bin")
+        response = client.get("/api/v1/local/local-test/path/to/nonexistent.bin")
         assert response.status_code == 404
 # ---------------------------------------------------------------------------
-# Upload route PUT /api/v1/remote/{remote}/{path}
+# Upload route PUT /api/v1/local/{local}/{path}
 # ---------------------------------------------------------------------------
 class TestUploadRoute:
-    def test_unknown_remote_returns_404(self, client, patched_deps):
+    def test_unknown_local_returns_404(self, client, patched_deps):
         response = client.put(
-            "/api/v1/remote/nonexistent/path/to/file.tar.gz",
+            "/api/v1/local/nonexistent/path/to/file.tar.gz",
             files={"file": ("file.tar.gz", b"content", "application/octet-stream")},
         )
         assert response.status_code == 404
-    def test_non_local_remote_returns_400(self, client, patched_deps):
-        response = client.put(
-            "/api/v1/remote/generic-test/path/to/file.tar.gz",
-            files={"file": ("file.tar.gz", b"content", "application/octet-stream")},
-        )
-        assert response.status_code == 400
 # ---------------------------------------------------------------------------
-# HEAD route HEAD /api/v1/remote/{remote}/{path}
+# HEAD route HEAD /api/v1/local/{local}/{path}
 # ---------------------------------------------------------------------------
 class TestHeadRoute:
-    def test_non_local_remote_returns_405(self, client, patched_deps):
-        response = client.head("/api/v1/remote/generic-test/path/to/file.tar.gz")
-        assert response.status_code == 405
     def test_local_repo_file_not_found_returns_404(self, client, patched_deps):
         deps = patched_deps
         deps["database"].get_local_file_metadata.return_value = None
         deps["database"].available = True
-        response = client.head("/api/v1/remote/local-test/path/to/nonexistent.bin")
+        response = client.head("/api/v1/local/local-test/path/to/nonexistent.bin")
         assert response.status_code == 404
-    def test_unknown_remote_returns_404(self, client, patched_deps):
-        response = client.head("/api/v1/remote/nonexistent/path/to/file.bin")
+    def test_unknown_local_returns_404(self, client, patched_deps):
+        response = client.head("/api/v1/local/nonexistent/path/to/file.bin")
         assert response.status_code == 404
 # ---------------------------------------------------------------------------
-# DELETE route DELETE /api/v1/remote/{remote}/{path}
+# DELETE route DELETE /api/v1/local/{local}/{path}
 # ---------------------------------------------------------------------------
 class TestDeleteRoute:
-    def test_unknown_remote_returns_404(self, client, patched_deps):
-        response = client.delete("/api/v1/remote/nonexistent/path/to/file.tar.gz")
+    def test_unknown_local_returns_404(self, client, patched_deps):
+        response = client.delete("/api/v1/local/nonexistent/path/to/file.tar.gz")
         assert response.status_code == 404
-    def test_non_local_remote_returns_400(self, client, patched_deps):
-        response = client.delete("/api/v1/remote/generic-test/path/to/file.tar.gz")
-        assert response.status_code == 400
 # ---------------------------------------------------------------------------
 # Cache flush PUT /cache/flush
@@ -652,3 +843,425 @@ class TestConfigEndpoint:
         data = response.json()
         assert "remotes" in data
         assert "alpine-test" in data["remotes"]
+
+# ---------------------------------------------------------------------------
+# PyPI remote /api/v1/remote/pypi-test/...
+# ---------------------------------------------------------------------------
+
+class TestPyPIRemote:
+    def test_simple_index_is_mutable(self, client, patched_deps):
+        """simple/ paths are detected as mutable (package-type default)."""
+        deps = patched_deps
+        html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = html
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
+        assert response.status_code == 200
+        deps["cache"].mark_index_cached.assert_not_called()
+
+    def test_simple_index_urls_rewritten_to_proxy(self, client, patched_deps):
+        """files.pythonhosted.org URLs in a cached simple index are rewritten to our proxy."""
+        deps = patched_deps
+        html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = html
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
+        assert response.status_code == 200
+        assert b"files.pythonhosted.org" not in response.content
+        assert b"/api/v1/remote/pypi-test/packages/requests-2.31.0.tar.gz" in response.content
+
+    def test_simple_index_content_type_is_html(self, client, patched_deps):
+        deps = patched_deps
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = b"<html></html>"
+        deps["cache"].is_mutable_file.return_value = True
+        deps["cache"].is_index_valid.return_value = True
+        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
+        assert response.status_code == 200
+        assert "text/html" in response.headers["content-type"]
+
+    def test_simple_index_cache_miss_fetches_upstream(self, client, patched_deps):
+        deps = patched_deps
+        html = b"<html><body><a href='https://files.pythonhosted.org/packages/p-1.0.whl'>...</a></body></html>"
+        deps["storage"].exists.return_value = False
+        deps["storage"].download_object.return_value = html
+        deps["cache"].is_mutable_file.return_value = True
+        with patch(
+            "artifactapi.artifact.proxy.cache_single_artifact",
+            new_callable=AsyncMock,
+            return_value={"status": "cached"},
+        ) as mock_fetch:
+            response = client.get("/api/v1/remote/pypi-test/simple/requests/")
+        mock_fetch.assert_called_once()
+        assert response.status_code == 200
+        assert b"files.pythonhosted.org" not in response.content
+
+    def test_wheel_file_immutable_returns_correct_content_type(self, client, patched_deps):
+        deps = patched_deps
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = b"PK wheel bytes"
+        deps["cache"].is_mutable_file.return_value = False
+        response = client.get("/api/v1/remote/pypi-test/packages/requests-2.31.0-py3-none-any.whl")
+        assert response.status_code == 200
+        assert "application/zip" in response.headers["content-type"]
+        assert response.headers["X-Artifact-Source"] == "cache"
+
+    def test_sdist_immutable_returns_correct_content_type(self, client, patched_deps):
+        deps = patched_deps
+        deps["storage"].exists.return_value = True
+        deps["storage"].download_object.return_value = b"tar bytes"
+        deps["cache"].is_mutable_file.return_value = False
+        response = client.get("/api/v1/remote/pypi-test/packages/requests-2.31.0.tar.gz")
+        assert response.status_code == 200
+        assert "application/gzip" in response.headers["content-type"]
+
+    def test_unknown_extension_on_pypi_remote_returns_403(self, client, patched_deps):
+        """Paths that don't match immutable_patterns and aren't mutable are blocked."""
+        response = client.get("/api/v1/remote/pypi-test/packages/requests.unknown")
+        assert response.status_code == 403
# ---------------------------------------------------------------------------
# npm remote /api/v1/remote/npm-test/...
# ---------------------------------------------------------------------------
class TestNpmRemote:
    def test_package_metadata_is_mutable(self, client, patched_deps):
        """Top-level package metadata paths are detected as mutable."""
        deps = patched_deps
        meta = b'{"name":"express","versions":{}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_metadata_tarball_urls_rewritten_to_proxy(self, client, patched_deps):
        """registry.npmjs.org tarball URLs in metadata JSON are rewritten to our proxy."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/express/-/express-4.18.2.tgz" in response.content

    def test_metadata_content_type_is_json(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b'{"name":"express"}'
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert "application/json" in response.headers["content-type"]

    def test_scoped_package_metadata_rewritten(self, client, patched_deps):
        """@scope/package metadata URLs are also rewritten back to the same npm-test remote."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/@babel/core/-/core-7.21.0.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/@babel/core")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/@babel/core/-/core-7.21.0.tgz" in response.content

    def test_tarball_not_rewritten(self, client, patched_deps):
        """Tarball requests (.tgz) bypass URL rewriting and return binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b tgz bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_metadata_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz"}}'
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/npm-test/lodash")
        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content

    def test_tarball_immutable_allowed_on_npm_remote(self, client, patched_deps):
        """Tarballs (.tgz) match immutable_patterns and are served without rewriting."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tgz bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
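The npm tests above only pin down the observable result: upstream tarball URLs disappear from the metadata and proxy routes appear in their place. A minimal sketch of one way to get that result is a byte-level prefix replacement. The function name `rewrite_npm_metadata` and its parameters are hypothetical and are not taken from the proxy's code.

```python
def rewrite_npm_metadata(body: bytes, upstream_base: str, proxy_base: str, remote_name: str) -> bytes:
    """Replace every occurrence of the upstream registry base URL with the proxy route.

    Hypothetical helper: the real proxy may parse the JSON instead of replacing bytes.
    """
    upstream = upstream_base.rstrip("/").encode()
    proxy_root = proxy_base.rstrip("/")
    proxied = f"{proxy_root}/api/v1/remote/{remote_name}".encode()
    return body.replace(upstream, proxied)


meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
rewritten = rewrite_npm_metadata(
    meta,
    upstream_base="https://registry.npmjs.org",
    proxy_base="http://proxy.example.com",  # assumed proxy address, for illustration only
    remote_name="npm-test",
)
print(rewritten.decode())
```

Because the replacement works on the URL prefix, scoped packages such as `@babel/core` need no special handling in this sketch.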
# ---------------------------------------------------------------------------
# Helm remote /api/v1/remote/helm-test/...
# ---------------------------------------------------------------------------
class TestHelmRemote:
    def test_index_yaml_is_mutable(self, client, patched_deps):
        """index.yaml is detected as mutable (package-type default)."""
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n  vault:\n    - urls:\n        - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_index_yaml_urls_rewritten_to_proxy(self, client, patched_deps):
        """base_url chart URLs in a cached index.yaml are rewritten to our proxy."""
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n  vault:\n    - urls:\n        - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        assert b"helm.releases.hashicorp.com" not in response.content
        assert b"/api/v1/remote/helm-test/vault-0.29.1.tgz" in response.content

    def test_index_yaml_content_type_is_yaml(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"apiVersion: v1\nentries: {}\n"
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        assert "text/yaml" in response.headers["content-type"]

    def test_chart_tarball_immutable_returns_gzip_content_type(self, client, patched_deps):
        """Versioned chart tarballs match immutable_patterns and are served as binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b chart bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/helm-test/vault-0.29.1.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_index_yaml_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n  vault:\n    - urls:\n        - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/helm-test/index.yaml")
        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"helm.releases.hashicorp.com" not in response.content

    def test_non_tgz_non_yaml_path_blocked_by_pattern(self, client, patched_deps):
        """Paths that don't match immutable_patterns and aren't mutable are blocked."""
        deps = patched_deps
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/helm-test/vault.zip")
        assert response.status_code == 403
# ---------------------------------------------------------------------------
# Quarantine (quarantine-test remote: quarantine_new=True, quarantine_days=3)
# ---------------------------------------------------------------------------
class TestQuarantine:
    def _recent_date(self, days_ago=1):
        """Return an HTTP-format date string N days in the past (within quarantine window)."""
        from datetime import datetime, timedelta
        from email.utils import format_datetime

        dt = datetime.now(UTC) - timedelta(days=days_ago)
        return format_datetime(dt, usegmt=True)

    def _old_date(self, days_ago=10):
        """Return an HTTP-format date string N days in the past (outside quarantine window)."""
        from datetime import datetime, timedelta
        from email.utils import format_datetime

        dt = datetime.now(UTC) - timedelta(days=days_ago)
        return format_datetime(dt, usegmt=True)

    def test_cache_miss_recent_artifact_quarantined(self, client, patched_deps):
        """Cache miss: artifact published within quarantine window → 404."""
        deps = patched_deps
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached", "last_modified": self._recent_date()},
        ):
            response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 404
        assert "quarantined" in response.json()["detail"].lower()

    def test_cache_miss_old_artifact_allowed(self, client, patched_deps):
        """Cache miss: artifact published outside quarantine window → 200."""
        deps = patched_deps
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached", "last_modified": self._old_date()},
        ):
            response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 200

    def test_cache_miss_no_last_modified_fails_open(self, client, patched_deps):
        """Cache miss: no Last-Modified header → fail open (200, not quarantined)."""
        deps = patched_deps
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached", "last_modified": None},
        ):
            response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 200

    def test_cache_hit_recent_artifact_quarantined(self, client, patched_deps):
        """Cache hit: stored publish date within quarantine window → 404."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        deps["cache"].get_artifact_published.return_value = self._recent_date()
        response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 404
        assert "quarantined" in response.json()["detail"].lower()

    def test_cache_hit_old_artifact_allowed(self, client, patched_deps):
        """Cache hit: stored publish date outside quarantine window → 200."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        deps["cache"].get_artifact_published.return_value = self._old_date()
        response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 200

    def test_cache_hit_no_stored_date_fetches_upstream(self, client, patched_deps):
        """Cache hit: no stored date → HEAD upstream to get Last-Modified."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        deps["cache"].get_artifact_published.return_value = None
        with patch(
            "artifactapi.artifact.proxy._fetch_last_modified",
            new_callable=AsyncMock,
            return_value=self._old_date(),
        ) as mock_fetch:
            response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        mock_fetch.assert_called_once()
        assert response.status_code == 200

    def test_quarantine_disabled_allows_recent_artifact(self, client, patched_deps):
        """quarantine_new=False: recent artifacts are not blocked."""
        deps = patched_deps
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached", "last_modified": self._recent_date()},
        ):
            response = client.get("/api/v1/remote/quarantine-disabled/some/path/package-1.0.tar.gz")
        assert response.status_code == 200

    def test_quarantine_detail_includes_available_date(self, client, patched_deps):
        """The 404 detail should include the date when the artifact becomes available."""
        deps = patched_deps
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = b"content"
        deps["cache"].is_mutable_file.return_value = False
        with patch(
            "artifactapi.artifact.proxy.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached", "last_modified": self._recent_date()},
        ):
            response = client.get("/api/v1/remote/quarantine-test/some/path/package-1.0.tar.gz")
        assert response.status_code == 404
        detail = response.json()["detail"]
        assert "available after" in detail
        assert "3-day" in detail
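The quarantine tests describe a window check on an HTTP-format publish date: recent artifacts are blocked, old ones pass, and a missing date fails open. A minimal sketch of that decision follows, assuming the date arrives as a `Last-Modified`-style string. The function name `quarantine_release_time` and the injectable `now` argument are hypothetical and exist only to make the example deterministic.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def quarantine_release_time(last_modified, quarantine_days, now=None):
    """Return when the artifact leaves quarantine, or None if it is not quarantined.

    Hypothetical helper; a missing publish date fails open, as the tests expect.
    """
    if not last_modified:
        return None
    published = parsedate_to_datetime(last_modified)
    if published.tzinfo is None:
        # Treat naive results as UTC so the comparison below is well defined.
        published = published.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    available_at = published + timedelta(days=quarantine_days)
    return available_at if now < available_at else None


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent = format_datetime(now - timedelta(days=1), usegmt=True)
old = format_datetime(now - timedelta(days=10), usegmt=True)

print(quarantine_release_time(recent, 3, now))  # still inside the 3-day window
print(quarantine_release_time(old, 3, now))     # window has passed
```

Returning the release time rather than a bare boolean gives the caller what it needs to build an "available after" message like the one the last test checks for.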
@@ -0,0 +1,830 @@
"""Unit tests for the virtual repository handler (artifact/virtual.py)."""

from datetime import UTC, date, datetime
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
import yaml

from artifactapi.artifact.virtual import (
    _HANDLERS,
    _entries_to_msgpack_safe,
    _get_member_index,
    _HelmDumper,
    _HelmHandler,
    _merge_helm_indexes,
    _rewrite_urls,
    _VirtualHandler,
    _YamlDumperBase,
    _YamlLoader,
)
# ---------------------------------------------------------------------------
# Shared sample data
# ---------------------------------------------------------------------------
_INDEX_A = b"""\
apiVersion: v1
entries:
  vault:
    - name: vault
      version: "0.27.0"
      urls:
        - https://helm.releases.hashicorp.com/vault-0.27.0.tgz
  consul:
    - name: consul
      version: "1.2.0"
      urls:
        - https://helm.releases.hashicorp.com/consul-1.2.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""
_INDEX_B = b"""\
apiVersion: v1
entries:
  nginx:
    - name: nginx
      version: "15.0.0"
      urls:
        - https://charts.example.com/nginx-15.0.0.tgz
  vault:
    - name: vault
      version: "0.27.0"
      urls:
        - https://charts.example.com/vault-0.27.0.tgz
    - name: vault
      version: "0.26.0"
      urls:
        - https://charts.example.com/vault-0.26.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""
_INDEX_SIMPLE = b"""\
apiVersion: v1
entries:
  mychart:
    - name: mychart
      version: "1.0.0"
      urls:
        - https://helm.releases.hashicorp.com/mychart-1.0.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""
_INDEX_RELATIVE = b"""\
apiVersion: v1
entries:
  rancher:
    - name: rancher
      version: "2.13.1"
      urls:
        - rancher-2.13.1.tgz
generated: "2023-01-01T00:00:00.000Z"
"""
_CFG_A = {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}
_CFG_B = {"base_url": "https://charts.example.com", "cache": {"mutable_ttl": 1800}}
# ---------------------------------------------------------------------------
# _YamlLoader / _YamlDumperBase — C extension selection
# ---------------------------------------------------------------------------
class TestYamlExtensionSelection:
    def test_loader_is_a_class(self):
        assert isinstance(_YamlLoader, type)

    def test_dumper_base_is_a_class(self):
        assert isinstance(_YamlDumperBase, type)

    def test_helm_dumper_uses_selected_base(self):
        assert issubclass(_HelmDumper, _YamlDumperBase)

    def test_c_extensions_used_when_available(self):
        try:
            assert _YamlLoader is yaml.CSafeLoader
            assert _YamlDumperBase is yaml.CDumper
        except AttributeError:
            assert _YamlLoader is yaml.SafeLoader
            assert _YamlDumperBase is yaml.Dumper

    def test_loader_can_parse_yaml(self):
        result = yaml.load(b"key: value", Loader=_YamlLoader)
        assert result == {"key": "value"}
# ---------------------------------------------------------------------------
# _HelmDumper — datetime/date YAML serialization
# ---------------------------------------------------------------------------
class TestHelmDumper:
    def _dump(self, value):
        return yaml.dump({"v": value}, Dumper=_HelmDumper)

    def test_datetime_with_tz_includes_Z_suffix(self):
        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        assert "Z" in self._dump(dt)

    def test_datetime_without_tz_has_no_Z_suffix(self):
        dt = datetime(2023, 6, 15, 12, 0, 0)
        assert "Z" not in self._dump(dt)

    def test_datetime_uses_T_separator_not_space(self):
        dt = datetime(2023, 6, 15, 12, 30, 0, tzinfo=UTC)
        assert "T12:30:00" in self._dump(dt)

    def test_date_serialized_as_iso_string(self):
        assert "2023-01-15" in self._dump(date(2023, 1, 15))

    def test_datetime_round_trips_as_string_not_python_datetime(self):
        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        parsed = yaml.safe_load(self._dump(dt))
        # yaml.safe_load must not re-parse this as a datetime object
        assert isinstance(parsed["v"], str)

    def test_date_round_trips_as_string_not_python_date(self):
        parsed = yaml.safe_load(self._dump(date(2023, 1, 15)))
        assert isinstance(parsed["v"], str)
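The dumper tests constrain only the text that ends up in the YAML: a `T` separator, a `Z` suffix for timezone-aware values, and plain ISO dates. The string formatting alone can be sketched without PyYAML as below. The function name `helm_timestamp` is hypothetical, and how `_HelmDumper` actually registers its representers is not shown in this diff.

```python
from datetime import date, datetime, timezone


def helm_timestamp(value):
    """Format a date or datetime as ISO 8601, using a Z suffix for timezone-aware datetimes.

    Hypothetical helper illustrating the output shape the tests check for.
    """
    # datetime is a subclass of date, so it has to be tested first.
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            as_utc = value.astimezone(timezone.utc).replace(tzinfo=None)
            return as_utc.isoformat() + "Z"
        return value.isoformat()
    return value.isoformat()


print(helm_timestamp(datetime(2023, 6, 15, 12, 30, 0, tzinfo=timezone.utc)))
print(helm_timestamp(datetime(2023, 6, 15, 12, 0, 0)))
print(helm_timestamp(date(2023, 1, 15)))
```

For the value to round-trip through `yaml.safe_load` as a string, a dumper would also have to emit it quoted, which is outside the scope of this sketch.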
# ---------------------------------------------------------------------------
# _HelmHandler
# ---------------------------------------------------------------------------
class TestHelmHandler:
    def setup_method(self):
        self.handler = _HelmHandler()

    def test_accepts_index_yaml(self):
        assert self.handler.accepts_path("index.yaml") is True

    def test_rejects_tgz_path(self):
        assert self.handler.accepts_path("vault-0.27.0.tgz") is False

    def test_rejects_subdirectory_index(self):
        assert self.handler.accepts_path("charts/index.yaml") is False

    def test_rejects_empty_path(self):
        assert self.handler.accepts_path("") is False

    def test_path_error_is_non_empty_string(self):
        msg = self.handler.path_error()
        assert isinstance(msg, str) and len(msg) > 0

    def test_merge_returns_bytes(self):
        result = self.handler.merge([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com")
        assert isinstance(result, bytes)

    def test_merge_delegates_to_merge_helm_indexes(self):
        with patch("artifactapi.artifact.virtual._merge_helm_indexes", return_value=b"merged") as mock_fn:
            result = self.handler.merge([b"data"], [None], ["m"], [{}], "http://proxy")
        mock_fn.assert_called_once_with([b"data"], [None], ["m"], [{}], "http://proxy")
        assert result == b"merged"
# ---------------------------------------------------------------------------
# _HANDLERS registry
# ---------------------------------------------------------------------------
class TestHandlersRegistry:
    def test_helm_handler_is_registered(self):
        assert "helm" in _HANDLERS
        assert isinstance(_HANDLERS["helm"], _HelmHandler)

    def test_helm_handler_satisfies_protocol(self):
        assert isinstance(_HANDLERS["helm"], _VirtualHandler)
# ---------------------------------------------------------------------------
# _rewrite_urls
# ---------------------------------------------------------------------------
class TestRewriteUrls:
    def _rewrite(
        self,
        urls,
        base_url="https://upstream.example.com",
        proxy_base="http://proxy.example.com",
        member_name="my-remote",
    ):
        return _rewrite_urls(urls, base_url, proxy_base, member_name)

    def test_absolute_url_matching_base_is_rewritten(self):
        result = self._rewrite(["https://upstream.example.com/chart-1.0.0.tgz"])
        assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]

    def test_relative_url_is_prepended_with_proxy_remote(self):
        result = self._rewrite(["chart-1.0.0.tgz"])
        assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]

    def test_relative_url_with_leading_slash(self):
        result = self._rewrite(["/chart-1.0.0.tgz"])
        assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]

    def test_absolute_url_not_matching_base_is_unchanged(self):
        result = self._rewrite(["https://other.example.com/chart-1.0.0.tgz"])
        assert result == ["https://other.example.com/chart-1.0.0.tgz"]

    def test_empty_url_list_returns_empty(self):
        assert self._rewrite([]) == []

    def test_multiple_urls_all_rewritten(self):
        urls = ["https://upstream.example.com/a-1.0.0.tgz", "b-2.0.0.tgz"]
        result = self._rewrite(urls)
        assert result[0] == "http://proxy.example.com/api/v1/remote/my-remote/a-1.0.0.tgz"
        assert result[1] == "http://proxy.example.com/api/v1/remote/my-remote/b-2.0.0.tgz"
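Taken together, the `_rewrite_urls` tests specify three cases: absolute URLs under `base_url` are rewritten, other absolute URLs pass through, and relative URLs are prefixed with the proxy route. The sketch below satisfies those cases. It is written from the tests alone, so the name `rewrite_urls` and the use of `"://"` to detect absolute URLs are assumptions rather than the module's actual logic.

```python
def rewrite_urls(urls, base_url, proxy_base, member_name):
    """Rewrite chart URLs so they point at the proxy route for one member remote.

    Illustrative stand-in for _rewrite_urls, inferred from its unit tests.
    """
    prefix = proxy_base.rstrip("/") + "/api/v1/remote/" + member_name
    base = base_url.rstrip("/") + "/"
    rewritten = []
    for url in urls:
        if url.startswith(base):
            # Absolute URL under this member's upstream: keep only the path tail.
            rewritten.append(prefix + "/" + url[len(base):])
        elif "://" in url:
            # Absolute URL on some other host: leave untouched.
            rewritten.append(url)
        else:
            # Relative URL, with or without a leading slash.
            rewritten.append(prefix + "/" + url.lstrip("/"))
    return rewritten


result = rewrite_urls(
    ["https://upstream.example.com/a-1.0.0.tgz", "/b-2.0.0.tgz", "https://other.example.com/c.tgz"],
    "https://upstream.example.com",
    "http://proxy.example.com",
    "my-remote",
)
print(result)
```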
# ---------------------------------------------------------------------------
# _merge_helm_indexes
# ---------------------------------------------------------------------------
class TestMergeHelmIndexes:
    def _merge(self, raw_indexes, member_names, member_configs, proxy_base="http://proxy.example.com"):
        return _merge_helm_indexes(raw_indexes, [None] * len(raw_indexes), member_names, member_configs, proxy_base)

    def _parse(self, raw):
        return yaml.safe_load(raw)

    def test_single_member_all_charts_present(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]

    def test_two_members_non_overlapping_charts_all_present(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]
        assert "nginx" in index["entries"]

    def test_first_member_wins_on_duplicate_name_and_version(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        v027 = next(e for e in index["entries"]["vault"] if e["version"] == "0.27.0")
        assert "member-a" in v027["urls"][0]

    def test_absolute_urls_rewritten_to_proxy(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        url = index["entries"]["vault"][0]["urls"][0]
        assert url == "http://proxy.example.com/api/v1/remote/member-a/vault-0.27.0.tgz"

    def test_relative_urls_rewritten_to_proxy(self):
        cfg = {"base_url": "https://releases.rancher.com/server-charts/stable", "cache": {"mutable_ttl": 3600}}
        index = self._parse(self._merge([_INDEX_RELATIVE], ["rancher-stable"], [cfg]))
        url = index["entries"]["rancher"][0]["urls"][0]
        assert url == "http://proxy.example.com/api/v1/remote/rancher-stable/rancher-2.13.1.tgz"

    def test_different_versions_of_same_chart_both_included(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        versions = {e["version"] for e in index["entries"]["vault"]}
        assert "0.27.0" in versions
        assert "0.26.0" in versions

    def test_malformed_yaml_from_member_is_skipped(self):
        index = self._parse(self._merge([_INDEX_A, b"{bad yaml"], ["member-a", "bad"], [_CFG_A, _CFG_B]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]

    def test_output_has_apiVersion_v1(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert index["apiVersion"] == "v1"

    def test_output_has_generated_field(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert "generated" in index

    def test_output_is_valid_yaml(self):
        raw = self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B])
        assert isinstance(yaml.safe_load(raw), dict)

    def test_empty_index_from_member_produces_no_entries(self):
        empty = b"apiVersion: v1\nentries: {}\ngenerated: '2023-01-01T00:00:00.000Z'\n"
        index = self._parse(self._merge([empty], ["member-a"], [_CFG_A]))
        assert index["entries"] == {}
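The merge tests imply a first-member-wins rule keyed on chart name and version, with distinct versions from later members still included. A minimal sketch of that rule on already-parsed `entries` mappings is shown below. The function name `merge_entries` and the `source` field are hypothetical, and YAML parsing and URL rewriting are deliberately left out.

```python
def merge_entries(member_entries):
    """Merge per-member `entries` mappings; the first member to supply a (chart, version) wins.

    Hypothetical sketch operating on parsed dicts rather than raw index.yaml bytes.
    """
    merged = {}
    seen = set()
    for entries in member_entries:
        # A member whose index failed to parse can be represented as None and is skipped.
        for chart, versions in (entries or {}).items():
            for item in versions or []:
                key = (chart, item.get("version"))
                if key in seen:
                    continue
                seen.add(key)
                merged.setdefault(chart, []).append(item)
    return merged


member_a = {"vault": [{"version": "0.27.0", "source": "member-a"}]}
member_b = {
    "vault": [
        {"version": "0.27.0", "source": "member-b"},
        {"version": "0.26.0", "source": "member-b"},
    ],
    "nginx": [{"version": "15.0.0", "source": "member-b"}],
}
merged = merge_entries([member_a, None, member_b])
print(merged)
```

Member order therefore acts as a priority list: putting the preferred remote first in the virtual's member list decides which copy of a duplicated chart version is served.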
# ---------------------------------------------------------------------------
# _get_member_index (async)
# ---------------------------------------------------------------------------
class TestGetMemberIndex:
    @pytest.fixture
    def storage(self):
        m = MagicMock()
        m.get_object_key.return_value = "member/key/index.yaml"
        m.exists.return_value = False
        m.download_object.return_value = b"cached bytes"
        return m

    @pytest.fixture
    def cache(self):
        m = MagicMock()
        m.is_index_valid.return_value = False
        return m

    @pytest.fixture
    def member_cfg(self):
        return {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}

    def _fake_response(self, content=b"upstream bytes"):
        r = MagicMock()
        r.content = content
        r.raise_for_status = MagicMock()
        return r

    def _patch_httpx(self, response):
        mock_client_cls = patch("artifactapi.artifact.virtual.httpx.AsyncClient")
        p = mock_client_cls.start()
        mock_client = AsyncMock()
        p.return_value.__aenter__.return_value = mock_client
        mock_client.get.return_value = response
        return mock_client_cls, mock_client

    async def test_cache_hit_returns_stored_bytes(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        _, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"cached bytes"

    async def test_cache_hit_does_not_fetch_upstream(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        mock_cls.assert_not_called()

    async def test_cache_hit_storage_error_falls_through_to_upstream(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        storage.download_object.side_effect = Exception("S3 read error")
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response(b"fresh bytes")
            _, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"fresh bytes"

    async def test_cache_miss_fetches_from_upstream(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"upstream bytes"

    async def test_cache_miss_stores_result_in_s3(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        storage.upload.assert_called_once()

    async def test_cache_miss_marks_cache_with_configured_ttl(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        cache.mark_index_cached.assert_called_once_with("m", "index.yaml", 3600)

    async def test_cache_miss_with_auth_sends_basic_auth_header(self, storage, cache):
        cfg = {
            "base_url": "https://private.example.com",
            "username": "user",
            "password": "pass",
            "cache": {"mutable_ttl": 3600},
        }
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", cfg, "index.yaml", storage, cache)
        headers = mock_client.get.call_args.kwargs["headers"]
        assert "Authorization" in headers
        assert headers["Authorization"].startswith("Basic ")

    async def test_no_credentials_sends_no_auth_header(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        headers = mock_client.get.call_args.kwargs["headers"]
        assert "Authorization" not in headers

    async def test_upstream_fetch_failure_returns_none(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.side_effect = Exception("connection refused")
            _, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data is None

    async def test_s3_upload_failure_still_returns_data(self, storage, cache, member_cfg):
        storage.upload.side_effect = Exception("S3 write error")
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"upstream bytes"

    async def test_returns_ttl_from_config(self, storage, cache):
        cfg = {"base_url": "https://example.com", "cache": {"mutable_ttl": 900}}
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, ttl, _, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
        assert ttl == 900

    async def test_defaults_ttl_to_3600_when_not_configured(self, storage, cache):
        cfg = {"base_url": "https://example.com"}
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, ttl, _, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
        assert ttl == 3600
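Two small pieces of configuration handling are exercised by the `_get_member_index` tests: a Basic `Authorization` header is sent only when credentials are configured, and the TTL falls back to 3600 seconds when `cache.mutable_ttl` is absent. A minimal sketch of both follows. The helper names `build_upstream_headers` and `mutable_ttl` are hypothetical, and the real function may assemble its request differently.

```python
import base64


def build_upstream_headers(member_cfg):
    """Return request headers, adding HTTP Basic auth only when both credentials are set.

    Hypothetical helper; the config keys mirror the ones used in the tests.
    """
    headers = {}
    username = member_cfg.get("username")
    password = member_cfg.get("password")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return headers


def mutable_ttl(member_cfg, default=3600):
    """Read cache.mutable_ttl from a member config, falling back to the default."""
    return member_cfg.get("cache", {}).get("mutable_ttl", default)


private_cfg = {"base_url": "https://private.example.com", "username": "user", "password": "pass"}
public_cfg = {"base_url": "https://example.com", "cache": {"mutable_ttl": 900}}

print(build_upstream_headers(private_cfg))
print(build_upstream_headers(public_cfg))
print(mutable_ttl(private_cfg), mutable_ttl(public_cfg))
```

A virtual repository that merges several members could then cache the merged index for `min(...)` of the member TTLs, which is the behaviour the later `test_cache_miss_uses_min_ttl_across_members` test checks.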
# ---------------------------------------------------------------------------
# Virtual route GET /api/v1/virtual/{name}/{path}
# ---------------------------------------------------------------------------
@pytest.fixture
def mock_storage_v():
    m = MagicMock()
    m.get_object_key.return_value = "virtual/helm-virtual-test/index.yaml"
    m.exists.return_value = False
    m.download_object.return_value = b"apiVersion: v1\nentries: {}\n"
    return m


@pytest.fixture
def mock_cache_v():
    m = MagicMock()
    m.is_index_valid.return_value = False
    m.available = False
    m.client = None
    return m


@pytest.fixture
def patched_virtual_deps(mock_storage_v, mock_cache_v):
    import artifactapi.main as main_mod

    with (
        patch.object(main_mod, "storage", mock_storage_v),
        patch.object(main_mod, "cache", mock_cache_v),
    ):
        yield {"storage": mock_storage_v, "cache": mock_cache_v}
class TestVirtualRoute:
    def test_unknown_virtual_name_returns_404(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/no-such-virtual/index.yaml")
        assert response.status_code == 404

    def test_non_virtual_name_returns_404(self, client, patched_virtual_deps):
        # helm-test is in remotes, not virtuals
        response = client.get("/api/v1/virtual/helm-test/index.yaml")
        assert response.status_code == 404

    def test_unsupported_package_returns_400(self, client, patched_virtual_deps):
        # unsupported-virtual-test has package "rpm"
        response = client.get("/api/v1/virtual/unsupported-virtual-test/index.yaml")
        assert response.status_code == 400

    def test_non_index_path_returns_404(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/helm-virtual-test/vault-0.27.0.tgz")
        assert response.status_code == 404

    def test_no_members_returns_500(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/empty-virtual-test/index.yaml")
        assert response.status_code == 500

    def test_virtual_cache_hit_returns_200(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200

    def test_virtual_cache_hit_content_type_is_yaml(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert "text/yaml" in response.headers["content-type"]

    def test_virtual_cache_hit_returns_stored_content(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        deps["storage"].download_object.return_value = b"apiVersion: v1\nentries: {}\n"
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.content == b"apiVersion: v1\nentries: {}\n"

    def test_virtual_cache_hit_skips_member_fetch(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        mock_get.assert_not_called()

    def test_cache_miss_returns_200_with_yaml_content_type(self, client, patched_virtual_deps):
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200
        assert "text/yaml" in response.headers["content-type"]

    def test_cache_miss_response_contains_merged_entries(self, client, patched_virtual_deps):
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        index = yaml.safe_load(response.content)
        assert "mychart" in index["entries"]

    def test_cache_miss_stores_result_in_s3(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        deps["storage"].upload.assert_called_once()

    def test_cache_miss_marks_index_cached(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        deps["cache"].mark_index_cached.assert_called_once()

    def test_cache_miss_uses_min_ttl_across_members(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.side_effect = [
                ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None),
                ("helm-member-2", _CFG_B, 1800, _INDEX_SIMPLE, None),
            ]
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        _, _, ttl = deps["cache"].mark_index_cached.call_args[0]
        assert ttl == 1800

    def test_all_members_unreachable_returns_502(self, client, patched_virtual_deps):
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, None, None)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 502
def test_one_member_unreachable_still_returns_200(self, client, patched_virtual_deps):
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.side_effect = [
("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None),
("helm-member-2", _CFG_B, 1800, None, None),
]
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 200
def test_member_not_in_config_is_skipped(self, client, patched_virtual_deps):
import artifactapi.main as main_mod
real_get = main_mod.config.get_remote_config
def patched_get(name):
return None if name == "helm-member-2" else real_get(name)
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch.object(main_mod.config, "get_remote_config", side_effect=patched_get),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
# only helm-test was available — should succeed
assert response.status_code == 200
mock_get.assert_called_once()
def test_s3_store_failure_still_returns_200(self, client, patched_virtual_deps):
deps = patched_virtual_deps
deps["storage"].upload.side_effect = Exception("S3 write error")
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 200
# ---------------------------------------------------------------------------
# _entries_to_msgpack_safe
# ---------------------------------------------------------------------------


class TestEntriesToMsgpackSafe:
    def test_plain_string_values_pass_through(self):
        entries = {"chart": [{"name": "chart", "version": "1.0.0", "urls": ["http://x/c.tgz"]}]}
        result = _entries_to_msgpack_safe(entries)
        assert result["chart"][0]["version"] == "1.0.0"

    def test_datetime_converted_to_iso_string(self):
        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": dt}]}
        result = _entries_to_msgpack_safe(entries)
        assert isinstance(result["chart"][0]["created"], str)
        assert "2023-06-15" in result["chart"][0]["created"]

    def test_date_converted_to_iso_string(self):
        entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": date(2023, 6, 15)}]}
        result = _entries_to_msgpack_safe(entries)
        assert result["chart"][0]["created"] == "2023-06-15"

    def test_empty_entries_returns_empty_dict(self):
        assert _entries_to_msgpack_safe({}) == {}

    def test_multiple_versions_all_converted(self):
        dt = datetime(2023, 1, 1, tzinfo=UTC)
        entries = {
            "chart": [
                {"name": "chart", "version": "1.0.0", "created": dt},
                {"name": "chart", "version": "2.0.0", "created": dt},
            ]
        }
        result = _entries_to_msgpack_safe(entries)
        for v in result["chart"]:
            assert isinstance(v["created"], str)

    def test_result_is_msgpack_serializable(self):
        import msgpack

        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": dt, "urls": ["http://x/c.tgz"]}]}
        safe = _entries_to_msgpack_safe(entries)
        packed = msgpack.packb(safe, use_bin_type=True)
        unpacked = msgpack.unpackb(packed, raw=False)
        assert unpacked["chart"][0]["created"] == safe["chart"][0]["created"]
# ---------------------------------------------------------------------------
# _merge_helm_indexes: pre-parsed entries path
# ---------------------------------------------------------------------------


class TestMergeHelmIndexesWithParsed:
    """Verify that pre-parsed entries (from msgpack) produce the same output as raw YAML."""

    def _parse_entries(self, raw: bytes) -> dict:
        index = yaml.safe_load(raw)
        return index.get("entries") or {}

    def test_parsed_entries_produce_same_charts_as_raw(self):
        parsed = self._parse_entries(_INDEX_A)
        raw_result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com"))
        parsed_result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [parsed], ["member-a"], [_CFG_A], "http://proxy.example.com"))
        assert set(raw_result["entries"].keys()) == set(parsed_result["entries"].keys())

    def test_parsed_entries_urls_are_rewritten(self):
        parsed = self._parse_entries(_INDEX_A)
        result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [parsed], ["member-a"], [_CFG_A], "http://proxy.example.com"))
        url = result["entries"]["vault"][0]["urls"][0]
        assert "member-a" in url
        assert "proxy.example.com" in url

    def test_none_parsed_falls_back_to_raw_bytes(self):
        result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com"))
        assert "vault" in result["entries"]

    def test_mixed_parsed_and_raw_merge_correctly(self):
        parsed_a = self._parse_entries(_INDEX_A)
        result = yaml.safe_load(
            _merge_helm_indexes(
                [_INDEX_A, _INDEX_B],
                [parsed_a, None],
                ["member-a", "member-b"],
                [_CFG_A, _CFG_B],
                "http://proxy.example.com",
            )
        )
        assert "vault" in result["entries"]
        assert "nginx" in result["entries"]
# ---------------------------------------------------------------------------
# _get_member_index: msgpack cache behaviour
# ---------------------------------------------------------------------------


class TestGetMemberIndexMsgpack:
    @pytest.fixture
    def storage(self):
        m = MagicMock()
        m.get_object_key.side_effect = lambda name, path: f"{name}/{path}"
        m.exists.return_value = False
        m.download_object.return_value = _INDEX_SIMPLE
        return m

    @pytest.fixture
    def cache(self):
        m = MagicMock()
        m.is_index_valid.return_value = False
        return m

    @pytest.fixture
    def member_cfg(self):
        return {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}

    def _fake_response(self, content=_INDEX_SIMPLE):
        r = MagicMock()
        r.content = content
        r.raise_for_status = MagicMock()
        return r

    async def test_cache_hit_with_msgpack_returns_parsed_entries(self, storage, cache, member_cfg):
        import msgpack

        entries = {"mychart": [{"name": "mychart", "version": "1.0.0", "urls": ["http://x/c.tgz"]}]}
        packed = msgpack.packb(entries, use_bin_type=True)
        storage.exists.side_effect = lambda key: True
        cache.is_index_valid.return_value = True
        storage.download_object.side_effect = lambda key: packed if key.endswith("index.msgpack") else _INDEX_SIMPLE
        _, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert parsed == entries

    async def test_cache_miss_builds_msgpack_and_returns_parsed(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == _INDEX_SIMPLE
        assert isinstance(parsed, dict)
        assert "mychart" in parsed

    async def test_broken_msgpack_rebuilds_from_raw_yaml(self, storage, cache, member_cfg):
        storage.exists.side_effect = lambda key: True
        cache.is_index_valid.return_value = True
        storage.download_object.side_effect = lambda key: b"not-valid-msgpack" if key.endswith("index.msgpack") else _INDEX_SIMPLE
        _, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == _INDEX_SIMPLE
        # Falls back to YAML parse and rebuilds msgpack; entries are returned
        assert isinstance(parsed, dict)
        assert "mychart" in parsed

    async def test_upstream_failure_returns_none_for_both(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.side_effect = Exception("timeout")
            _, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data is None
        assert parsed is None