Compare commits

..

9 Commits

Author SHA1 Message Date
unkinben 43927a7666 feat: add Terraform/OpenTofu registry remote type (#45)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Implements the Terraform Registry Protocol as a proxy remote type so
Terraform and OpenTofu can pull providers through the caching layer
without changing provider source addresses.

- New `terraform` package type with `construct_url` (prepends
  `/v1/providers/`) and `resolve_content` (rewrites `download_url`,
  `shasums_url`, `shasums_signature_url` to route through a companion
  `releases_remote`)
- Built-in mutable pattern for provider version lists
  (`{ns}/{type}/versions`)
- `releases_remote` config option links the registry remote to a
  separate generic remote proxying the release CDN
- Client config: `.terraformrc` / `.tofurc` host block redirects
  `registry.terraform.io` to the proxy without touching `.tf` files
- 8 unit tests + end-to-end test (OpenTofu 1.10 pulling hashicorp/vault
  4.5.0 through docker-compose stack)
- Example config and README section added
2026-05-17 11:30:35 +10:00
unkinben 9287cf7cf2 feat: add Puppet Forge remote type (#44)
## Summary

- Adds \`package: puppet\` for proxying Puppet Forge (forgeapi.puppet.com)
- \`remote/puppet.py\` rewrites JSON responses: absolute forge URLs → proxy URLs, and relative \`/v3/files/\` \`file_uri\` paths → absolute proxy URLs. g10k uses Go's \`url.ResolveReference\`, so an absolute \`file_uri\` overrides the base URL entirely — tarballs are fetched directly from the proxy without a second hop
- Built-in mutable patterns: \`^v3/modules/\` and \`^v3/releases\` (module metadata); tarballs at \`v3/files/\` are configured as immutable via \`immutable_patterns\`
- 9 new tests covering mutable detection, URL rewriting (relative \`file_uri\` and absolute forge URLs), content-type, tarball pass-through, and pattern blocking

## Client configuration

**g10k config file** (\`forge_base_url\` at root level):
\`\`\`yaml
cachedir: /tmp/g10k
forge_base_url: https://artifacts.example.com/api/v1/remote/puppet-forge
sources:
  control:
    remote: git@git.example.com:puppet/control.git
    basedir: /etc/puppetlabs/code/environments
\`\`\`

**Puppetfile** (\`forge.baseUrl\` directive, works with \`-puppetfile\` mode):
\`\`\`ruby
forge.baseUrl https://artifacts.example.com/api/v1/remote/puppet-forge

mod 'puppetlabs-stdlib', '9.7.0'
\`\`\`

## Test plan

- [x] 331 unit tests pass (\`make test\`)
- [x] End-to-end: g10k 0.9.10 on AlmaLinux 9 via \`forge_base_url\` — stdlib 9.7.0, inifile 6.2.0, concat 9.1.0 installed; proxy logs confirm cache MISS → fetch → ADD for metadata and tarballs
- [x] End-to-end: \`forge.baseUrl\` Puppetfile directive with \`-puppetfile\` mode — same result

Reviewed-on: #44
2026-05-17 10:56:50 +10:00
unkinben ff2aefeef4 feat: add ban_tags_enabled/ban_tags to docker remotes to block named tags (#43)
ci/woodpecker/tag/docker Pipeline was successful
Adds two per-remote config keys for docker remotes:

  ban_tags_enabled: false   # opt-in, default off
  ban_tags:
    - latest
    - edge

When ban_tags_enabled is true and a manifest request arrives for a named
tag in ban_tags, the proxy returns 403. sha256-addressed pulls are never
blocked, so images already pulled can still be referenced by digest.
Blob requests are unaffected.

Reviewed-on: #43
2026-05-10 22:13:11 +10:00
unkinben a115904bbc fix: cross-link tag manifests to digest keys and add fetch lock to prevent thundering herd (#42)
Tag manifests (e.g. library/nginx/manifests/latest) and their sha256-addressed
counterparts were stored at separate S3 keys with no cross-reference, so a
sha256 manifest request always missed cache even when the identical content had
just been stored under the tag key.

After serving any mutable (tag) manifest, compute the sha256 of the response
body and write it under the digest key (manifests/sha256:<hex>) if absent. The
next sha256-addressed pull hits cache immediately.

Also adds a short-lived Redis distributed lock (SET NX EX 30) around upstream
fetches so that concurrent pods racing for the same cold key poll storage for
up to 5 s before issuing a duplicate upstream request, eliminating the
thundering herd on deploy events.

Includes unit tests for both the lock primitives (acquire/release, fail-open
when Redis is unavailable) and the docker proxy behaviour (cross-link written
on tag hit, not written for sha256 requests, lock acquired/released, poll path
serves from cache without upstream fetch, fallback fetch when poll times out).

Reviewed-on: #42
2026-05-10 22:12:54 +10:00
unkinben 8a7f26b193 feat: cache parsed member indexes as msgpack to skip YAML re-parse on rebuild (#40)
ci/woodpecker/tag/docker Pipeline was successful
Closes #36

## Summary

- After fetching a member's `index.yaml` (from upstream or S3), the handler now parses it and stores a compact msgpack file (`index.msgpack`) alongside the raw YAML in S3
- On subsequent virtual rebuilds (member caches valid, virtual TTL expired), the handler loads the msgpack file instead of re-parsing raw YAML — eliminating the costliest phase
- `_entries_to_msgpack_safe()` converts datetime/date objects to ISO strings before packing (msgpack cannot natively serialize Python datetimes)
- `_merge_helm_indexes()` accepts `list[dict | None]` as pre-parsed entries; falls back to raw YAML parse when msgpack is unavailable
- `_VirtualHandler.merge()` protocol updated to pass pre-parsed entries to all future handler implementations
- Broken msgpack is detected and rebuilt from raw YAML automatically

## Performance

Phase breakdown (19-member helm-all virtual, 14 MB total):

| Phase | Time | % |
|---|---|---|
| YAML parse (eliminated) | 6314 ms | 60% |
| URL rewrite + dedup | 33 ms | 0.3% |
| YAML dump | 4124 ms | 39% |

| Scenario | Before (CSafeLoader only, #34) | After |
|---|---|---|
| Cold rebuild (upstream fetch) | ~21s | ~26s (+5s for msgpack build, one-time) |
| **Warm rebuild (S3 hit, virtual expired)** | **~9.6s** | **~5.9s (38% faster)** |
| Virtual cache hit | ~0.03s | ~0.03s |

Log line confirms msgpack hits: `msgpack=19/19`

## Test plan

- 297 tests pass
- `TestEntriesToMsgpackSafe`: datetime/date serialization, empty input, round-trip
- `TestMergeHelmIndexesWithParsed`: pre-parsed path produces identical output to raw-bytes path
- `TestGetMemberIndexMsgpack`: msgpack hit, cold-build, broken msgpack fallback, upstream failure
- Docker warm-rebuild measured at 5.9s vs 9.6s baseline

Reviewed-on: #40
2026-05-02 17:15:31 +10:00
unkinben 15f934cd0b perf: use yaml.CSafeLoader/CDumper for 4x faster virtual index merge (#39)
Closes #34

## Summary

- At module load time, a `try/except` selects `yaml.CSafeLoader` / `yaml.CDumper` (C extensions) when libyaml is available, otherwise falls back to `yaml.SafeLoader` / `yaml.Dumper`
- `_HelmDumper` inherits from whichever dumper base was selected — custom datetime/date representers are registered the same way as before
- `_merge_helm_indexes` uses `yaml.load(raw_data, Loader=_YamlLoader)` instead of `yaml.safe_load`
- No change to `yaml.dump(...)` call — it already passes `Dumper=_HelmDumper`, which now inherits from the C base when available
- Five new tests in `TestYamlExtensionSelection` cover: loader/dumper base are classes, `_HelmDumper` inherits from the selected base, C extensions used when available, loader can parse YAML

## Measured performance gain

19-member `helm-all` virtual repo, real upstream data, Docker (AlmaLinux 9):

| | `merge=` time |
|---|---|
| Before (SafeLoader + Dumper) | **38,877ms** |
| After (CSafeLoader + CDumper) | **9,625ms** |
| Speedup | **4.0×** |

Local microbenchmark (500 charts × 10 versions × 19 members, 3 runs avg):
- Before: **40.8s** → After: **6.1s** (**6.7×** faster)

## Test plan

- [x] 283 unit tests pass (`make test`)
- [x] Wheel builds cleanly (`uv build --wheel`)
- [x] C extension confirmed available in AlmaLinux 9 container: `yaml.CSafeLoader: <class 'yaml.cyaml.CSafeLoader'>`
- [x] Baseline Docker timing measured with pure-Python path forced: merge=38,877ms
- [x] After Docker timing measured with C extension path: merge=9,625ms

Reviewed-on: #39
2026-05-02 11:51:00 +10:00
unkinben 7b6c69b70f perf: offload virtual repo merge to thread pool via asyncio.to_thread (#38)
Closes #35

## Summary

- Wraps `handler.merge(...)` in `await asyncio.to_thread(...)` so the CPU-bound YAML parse/merge/dump runs in the thread pool instead of blocking the event loop
- Change is at the generic `handle()` dispatch site — applies to all current and future `_VirtualHandler` implementations without modification
- Also fixes a pre-existing bug in `examples/single-file/remotes.yaml` where `base_url` and `package` keys were merged onto a single line, preventing `docker-compose up` from starting the app

## Measured performance gain

19-member `helm-all` virtual repo, single uvicorn worker, cache miss (38s merge):

| | Concurrent `/health` latency |
|---|---|
| Before (blocking) | **37,721ms** for first request (stalled) |
| After (thread pool) | **8–63ms** for all requests |

## Test plan

- [x] 278 unit tests pass (`make test`)
- [x] Live concurrency test: cache miss merge started in background, 5 concurrent `/health` checks measured — all <65ms
- [x] Baseline comparison: same test with blocking call — first health check stalled 37.7s

Reviewed-on: #38
2026-05-02 01:35:45 +10:00
unkinben 624d858062 fix: rewrite helm index.yaml URLs post-parse to handle relative URLs (#37)
Closes #33

## Summary

- `_merge_helm_indexes` now parses each member's raw YAML first, then rewrites `urls` entries in-place via the new `_rewrite_urls` helper
- **Relative URLs** (e.g. `rancher-2.13.1.tgz`) are prepended with `{proxy_base}/api/v1/remote/{member_name}/`
- **Absolute URLs** matching `base_url` are rewritten to the proxy path (existing behaviour, now correct)
- **Absolute URLs** with a different prefix are left unchanged
- Removes the `_helm.resolve_content` raw-bytes detour from the virtual merge path; `remote/helm.py` is unchanged (still used for direct remote proxying)

## Test plan

- [x] 278 unit tests pass (`make test`)
- [x] New `TestRewriteUrls` class covering relative, absolute-match, absolute-no-match, leading-slash, and multi-URL cases
- [x] New `test_relative_urls_rewritten_to_proxy` in `TestMergeHelmIndexes`
- [x] Updated `test_first_member_wins_on_duplicate_name_and_version` to assert on proxy remote name (not upstream hostname)
- [x] Live Docker test: Rancher `index.yaml` relative URLs rewritten correctly to `http://localhost:8000/api/v1/remote/rancher-stable/rancher-2.14.1.tgz` etc.
- [x] `helm-all` virtual (19 members) returns HTTP 200 with 395k-line merged index on cache miss

Reviewed-on: #37
2026-05-02 01:22:16 +10:00
unkinben 1656664dfa refactor: split config into remotes/virtuals/locals sections (#31)
ci/woodpecker/tag/docker Pipeline was successful
Repository types now live under dedicated top-level keys instead of a
shared remotes: block distinguished by a type field:

  remotes:   caching proxy remotes (no type field needed)
  virtuals:  virtual merged-index repositories
  locals:    local upload repositories

Routes for local repos move from /api/v1/remote/ to /api/v1/local/.
config.py gains get_virtual_config() and get_local_config() lookups.
Root endpoint now reports all three sections. Drop root conf.d/ (was
an exact duplicate of examples/conf.d-method/).

Reviewed-on: #31
2026-04-30 23:50:20 +10:00
25 changed files with 1470 additions and 500 deletions
+143 -118
View File
@@ -4,13 +4,14 @@ FastAPI caching proxy that downloads and stores files from remote sources in S3-
## Features
- Remote definitions via `remotes.yaml` — generic HTTP, Alpine APK, RPM, Docker, PyPI, npm, Helm
- Remote definitions via `remotes.yaml` — generic HTTP, Alpine APK, RPM, Docker, PyPI, npm, Helm, Puppet Forge, Terraform/OpenTofu registry
- Virtual repositories — merge multiple remotes of the same package type into a single unified index
- Immutable/mutable caching model with per-remote TTLs
- Conditional revalidation (`If-None-Match` / `If-Modified-Since`) on TTL expiry
- Stale-on-upstream-error: refreshes TTL when backend is unreachable rather than evicting
- URL rewriting for PyPI simple index, npm metadata, and Helm `index.yaml`
- Access control via regex patterns — unmatched paths return 403
- Docker tag banning — block named tags (e.g. `latest`) while allowing digest pulls
## Architecture
@@ -61,8 +62,10 @@ src/artifactapi/
├── generic.py — generic HTTP remotes
├── helm.py — Helm index.yaml URL rewriting
├── npm.py — npm metadata URL rewriting
├── puppet.py — Puppet Forge JSON URL rewriting
├── python.py — PyPI URL construction + HTML rewriting
── rpm.py — RPM remotes
── rpm.py — RPM remotes
└── terraform.py — Terraform/OpenTofu registry URL construction + download URL rewriting
```
## API Endpoints
@@ -70,10 +73,11 @@ src/artifactapi/
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/remote/{remote}/{path}` | Fetch artifact (auto-cache on miss) |
| `PUT` | `/api/v1/remote/{remote}/{path}` | Upload to local remote |
| `HEAD` | `/api/v1/remote/{remote}/{path}` | Check existence (local remotes) |
| `DELETE` | `/api/v1/remote/{remote}/{path}` | Delete from local remote |
| `GET` | `/api/v1/virtual/{virtual}/{path}` | Fetch from virtual (merged) repository |
| `GET` | `/api/v1/local/{local}/{path}` | Download from local repository |
| `PUT` | `/api/v1/local/{local}/{path}` | Upload to local repository |
| `HEAD` | `/api/v1/local/{local}/{path}` | Check existence (local) |
| `DELETE` | `/api/v1/local/{local}/{path}` | Delete from local repository |
| `GET` | `/v2/{remote}/{path}` | Docker Registry v2 proxy |
| `PUT` | `/cache/flush` | Flush cache entries |
| `GET` | `/health` | Health check |
@@ -120,15 +124,15 @@ config_dir: conf.d # or an absolute path
remotes: {} # optional base remotes
```
### remotes.yaml Structure
### Configuration structure
The top-level key declares the repository type — no `type:` field needed:
Repositories are declared under three top-level keys matching their type:
```yaml
remote:
remotes: # proxy (caching) remotes
remote-name:
base_url: "https://example.com"
package: "generic" # generic, alpine, rpm, docker, pypi, npm, helm
package: "generic" # generic, alpine, rpm, docker, pypi, npm, helm, puppet, terraform
description: "..."
immutable_patterns: # regex — cached forever
- ".*\\.tar\\.gz$"
@@ -139,16 +143,19 @@ remote:
immutable_ttl: 0 # 0 = indefinitely
mutable_ttl: 3600
virtual:
virtuals: # virtual (merged-index) repositories
virtual-name:
package: "helm"
members:
- remote-name-1
- remote-name-2
- remote-a
- remote-b
local:
locals: # local upload repositories (no base_url)
local-name:
package: "generic"
cache:
immutable_ttl: 0
mutable_ttl: 0
```
## Remote Types
@@ -158,7 +165,7 @@ local:
Arbitrary HTTP file servers — GitHub releases, HashiCorp, custom servers.
```yaml
remote:
remotes:
github:
base_url: "https://github.com"
package: "generic"
@@ -185,7 +192,7 @@ Access: `GET /api/v1/remote/github/owner/repo/releases/download/v1.0/binary.tar.
### alpine
```yaml
remote:
remotes:
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
package: "alpine"
@@ -201,7 +208,7 @@ remote:
### rpm
```yaml
remote:
remotes:
almalinux:
base_url: "https://mirror.example.com/almalinux"
package: "rpm"
@@ -218,7 +225,7 @@ remote:
### docker
```yaml
remote:
remotes:
dockerhub:
base_url: "https://registry-1.docker.io"
package: "docker"
@@ -239,6 +246,26 @@ remote:
Tag manifests and `/tags/list` are built-in mutable patterns. Digest-addressed blobs are immutable.
#### Banning tags
Set `ban_tags_enabled: true` and list named tags in `ban_tags` to block specific tag references. Requests for a banned tag return `403`. Digest-addressed pulls (`sha256:…`) are never blocked, so images already in use can still be referenced by digest.
```yaml
remotes:
dockerhub:
base_url: "https://registry-1.docker.io"
package: "docker"
ban_tags_enabled: true
ban_tags:
- latest # force pinned tags in CI/CD
- edge
cache:
immutable_ttl: 0
mutable_ttl: 300
```
`ban_tags_enabled` defaults to `false`. Setting it to `true` with an empty `ban_tags` list has no effect.
For RKE2/containerd, configure `/etc/rancher/rke2/registries.yaml`:
```yaml
@@ -258,7 +285,7 @@ mirrors:
### pypi
```yaml
remote:
remotes:
pypi:
base_url: "https://files.pythonhosted.org"
package: "pypi"
@@ -289,7 +316,7 @@ default = true
### npm
```yaml
remote:
remotes:
npm:
base_url: "https://registry.npmjs.org"
package: "npm"
@@ -315,7 +342,7 @@ registry=https://artifacts.example.com/api/v1/remote/npm/
### helm
```yaml
remote:
remotes:
hashicorp-helm:
base_url: "https://helm.releases.hashicorp.com"
package: "helm"
@@ -336,6 +363,94 @@ helm repo add hashicorp https://artifacts.example.com/api/v1/remote/hashicorp-he
helm repo update
```
### puppet
Proxy for [Puppet Forge](https://forge.puppet.com) (forgeapi.puppet.com). Module metadata is cached as mutable; versioned module tarballs are cached as immutable.
```yaml
remotes:
puppet-forge:
base_url: "https://forgeapi.puppet.com"
package: "puppet"
check_mutable_updates: true
immutable_patterns:
- "^v3/files/.*\\.tar\\.gz$"
cache:
immutable_ttl: 0 # module tarballs cached indefinitely
mutable_ttl: 600 # module metadata refreshed after 10 minutes
```
`v3/modules/` and `v3/releases` are built-in mutable patterns — module metadata pages expire after `mutable_ttl` and are re-fetched on the next request.
**URL rewriting**: the proxy rewrites `file_uri` fields in Forge JSON responses from relative paths (`/v3/files/…`) to absolute proxy URLs. g10k resolves download URLs with Go's `url.ResolveReference`, so an absolute `file_uri` overrides the forge base entirely — tarballs download straight from the proxy without a second hop.
**Client configuration — g10k**: set `forge_base_url` in the g10k config file:
```yaml
# g10k.yaml
cachedir: /tmp/g10k
forge_base_url: https://artifacts.example.com/api/v1/remote/puppet-forge
sources:
control:
remote: git@git.example.com:puppet/control.git
basedir: /etc/puppetlabs/code/environments
```
Alternatively, set the URL per-Puppetfile with the `forge.baseUrl` directive (works with `-puppetfile` mode and does not require a config file):
```ruby
forge.baseUrl https://artifacts.example.com/api/v1/remote/puppet-forge
mod 'puppetlabs-stdlib', '9.7.0'
mod 'puppetlabs-inifile', '6.2.0'
```
### terraform
Proxy for [Terraform](https://registry.terraform.io) / [OpenTofu](https://opentofu.org) provider registries using the [Registry Protocol](https://developer.hashicorp.com/terraform/internals/provider-registry-protocol). Provider version listings are mutable; per-version download info is immutable.
Two remotes are needed: one for the registry API and one for the release CDN (where the actual `.zip` binaries live):
```yaml
remotes:
terraform-registry:
base_url: "https://registry.terraform.io"
package: "terraform"
releases_remote: "hashicorp-releases" # name of the CDN remote below
immutable_patterns:
- "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$"
cache:
immutable_ttl: 0 # per-version download info cached indefinitely
mutable_ttl: 300 # provider version lists refreshed after 5 minutes
hashicorp-releases:
base_url: "https://releases.hashicorp.com"
package: "generic"
immutable_patterns:
- ".*\\.zip$"
- ".*SHA256SUMS(\\.sig)?$"
cache:
immutable_ttl: 0
mutable_ttl: 0
```
`{namespace}/{type}/versions` is a built-in mutable pattern — the version list expires after `mutable_ttl` and is re-fetched on the next request.
**URL rewriting**: the `download_url`, `shasums_url`, and `shasums_signature_url` fields in per-version download info JSON are rewritten from `releases.hashicorp.com` to point at the remote named by `releases_remote`, so Terraform fetches binaries through the proxy.
**Client configuration**: redirect Terraform's provider registry lookup via `.terraformrc` without changing any provider source addresses in your Terraform code:
```hcl
# ~/.terraformrc (or /etc/terraform.rc, or TF_CLI_CONFIG_FILE)
host "registry.terraform.io" {
services = {
"providers.v1" = "http://artifacts.example.com/api/v1/remote/terraform-registry/"
}
}
```
With this in place, `terraform init` / `tofu init` fetches provider metadata from the proxy and downloads zips from the `hashicorp-releases` remote. No changes to `.tf` files are needed.
### virtual
A virtual repository presents a single unified index built from multiple member remotes of the same package type. Clients configure one endpoint and get access to all member remotes transparently.
@@ -343,7 +458,7 @@ A virtual repository presents a single unified index built from multiple member
All members must share the same `package` type as the virtual repo. Currently supported package types: `helm`.
```yaml
remote:
remotes:
helm-hashicorp:
base_url: "https://helm.releases.hashicorp.com"
package: "helm"
@@ -362,7 +477,7 @@ remote:
immutable_ttl: 0
mutable_ttl: 3600
virtual:
virtuals:
helm-all:
package: "helm"
members:
@@ -386,7 +501,7 @@ If a member is unreachable and has no cached index, it is skipped and a warning
**Caching:**
The merged index is cached using `min(mutable_ttl)` across all members. Each member's raw index is cached in S3 under its own remote key by the normal proxy rules; the virtual handler reuses those copies when available.
The merged index is cached using `min(mutable_ttl)` across all members. Each member's raw index is cached in S3 under its own remote key; the virtual handler reuses those copies when available. On rebuild, each member's parsed index is also stored as a compact msgpack file (`index.msgpack`) alongside the raw YAML, eliminating the YAML parse cost on subsequent rebuilds.
**Helm example:**
@@ -400,7 +515,7 @@ Chart tarball URLs in the merged `index.yaml` are rewritten to point at the indi
### local
```yaml
local:
locals:
local-generic:
package: "generic"
description: "Local file repository"
@@ -409,99 +524,7 @@ local:
mutable_ttl: 0
```
No `base_url`. Files are uploaded via `PUT` and served via `GET`.
## Migration
### Splitting a single remotes file into per-type files
The old format used a single `remotes:` map with an explicit `type:` field on each entry. The new format uses top-level type keys (`remote:`, `virtual:`, `local:`) and supports splitting across multiple files via `config_dir`.
**Before** (`remotes.yaml`):
```yaml
remotes:
dockerhub:
base_url: "https://registry-1.docker.io"
type: "remote"
package: "docker"
cache:
immutable_ttl: 0
mutable_ttl: 300
hashicorp-helm:
base_url: "https://helm.releases.hashicorp.com"
type: "remote"
package: "helm"
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
helm-all:
type: "virtual"
package: "helm"
members:
- hashicorp-helm
local-files:
type: "local"
package: "generic"
```
**After** — one file per type + package type, with a main config pointing at the directory:
`config.yaml`:
```yaml
config_dir: conf.d
```
`conf.d/remote-docker.yaml`:
```yaml
remote:
dockerhub:
base_url: "https://registry-1.docker.io"
package: "docker"
cache:
immutable_ttl: 0
mutable_ttl: 300
```
`conf.d/remote-helm.yaml`:
```yaml
remote:
hashicorp-helm:
base_url: "https://helm.releases.hashicorp.com"
package: "helm"
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
```
`conf.d/virtual-helm.yaml`:
```yaml
virtual:
helm-all:
package: "helm"
members:
- hashicorp-helm
```
`conf.d/local-generic.yaml`:
```yaml
local:
local-files:
package: "generic"
```
Set `CONFIG_PATH` to the main file:
```
CONFIG_PATH=/etc/artifactapi/config.yaml
```
Files in `conf.d/` are merged alphabetically; later files win on conflicts within the same remote name.
No `base_url`. Files are uploaded via `PUT /api/v1/local/{name}/{path}` and downloaded via `GET /api/v1/local/{name}/{path}`.
## Caching Model
@@ -524,6 +547,8 @@ Each package type has built-in defaults that are merged with any user-defined `m
| `docker` | Tag manifests (non-digest refs), `/tags/list` |
| `pypi` | `simple/` (per-package and top-level index pages) |
| `helm` | `index\.yaml$` |
| `puppet` | `^v3/modules/`, `^v3/releases` |
| `terraform` | `[^/]+/[^/]+/versions$` |
| `npm` | *(none built-in — define via `mutable_patterns`)* |
| `generic` | *(none)* |
@@ -540,7 +565,7 @@ When a mutable file expires and the upstream is unreachable (connection refused,
Set `quarantine_new: true` and `quarantine_days: N` on a remote to block immutable artifacts published within the last N days. Requests return `404` until the quarantine period expires, giving time to detect malicious packages before they are consumed.
```yaml
remote:
remotes:
pypi:
base_url: "https://files.pythonhosted.org"
package: "pypi"
-11
View File
@@ -1,11 +0,0 @@
remotes:
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
type: "remote"
package: "alpine"
description: "Alpine Linux APK package repository"
immutable_patterns:
- ".*/x86_64/.*\\.apk$"
cache:
immutable_ttl: 0
mutable_ttl: 7200
-12
View File
@@ -1,12 +0,0 @@
remotes:
github:
base_url: "https://github.com"
type: "remote"
package: "generic"
description: "GitHub releases and files"
immutable_patterns:
- "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
- "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
cache:
immutable_ttl: 0
mutable_ttl: 0
-17
View File
@@ -1,17 +0,0 @@
remotes:
pypi:
base_url: "https://files.pythonhosted.org"
type: "remote"
package: "pypi"
description: "Python Package Index"
check_mutable_updates: true
quarantine_new: true
quarantine_days: 3
immutable_patterns:
- "packages/.*\\.whl$"
- "packages/.*\\.whl\\.metadata$"
- "packages/.*\\.tar\\.gz$"
- "packages/.*\\.zip$"
cache:
immutable_ttl: 0
mutable_ttl: 600
+1 -1
View File
@@ -1,4 +1,4 @@
remote:
remotes:
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
package: "alpine"
+1 -1
View File
@@ -1,4 +1,4 @@
remote:
remotes:
github:
base_url: "https://github.com"
package: "generic"
+1 -1
View File
@@ -1,4 +1,4 @@
remote:
remotes:
pypi:
base_url: "https://files.pythonhosted.org"
package: "pypi"
+74 -23
View File
@@ -1,10 +1,5 @@
# Example remotes configuration — copy and adapt for your environment.
#
# Top-level keys determine the repository type:
# remote: — proxy to an upstream URL, cache responses in S3
# virtual: — merge indexes from multiple member remotes into one
# local: — store files uploaded via PUT, serve via GET
#
# immutable_patterns: artifacts cached forever (e.g. release binaries, versioned tags).
# mutable_patterns: artifacts that expire after cache.mutable_ttl seconds and are
# re-fetched from upstream on next request (e.g. index files,
@@ -37,8 +32,7 @@
#database:
# url: "postgresql://artifacts:artifacts123@localhost:5432/artifacts"
#
remote:
remotes:
github:
base_url: "https://github.com"
package: "generic"
@@ -67,7 +61,7 @@ remote:
- "stalwartlabs/stalwart/.*/stalwart-foundationdb-x86_64-unknown-linux-gnu\\.tar\\.gz$"
- "stalwartlabs/stalwart/.*/stalwart-x86_64-unknown-linux-gnu\\.tar\\.gz$"
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
github-archive:
@@ -86,8 +80,8 @@ remote:
# Only applies to user-defined mutable_patterns, not package-type defaults.
check_mutable_updates: true
cache:
immutable_ttl: 0 # Tag archives cached indefinitely
mutable_ttl: 86400 # Branch archives refreshed after 1 day
immutable_ttl: 0 # Tag archives cached indefinitely
mutable_ttl: 86400 # Branch archives refreshed after 1 day
gitea-dl:
base_url: "https://dl.gitea.com"
@@ -96,7 +90,7 @@ remote:
immutable_patterns:
- "act_runner/.*/act_runner-.*-linux-amd64$"
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
hashicorp-releases:
@@ -116,7 +110,7 @@ remote:
- "nomad/.*/nomad_.*_linux_amd64\\.zip$"
- "packer/.*/packer_.*_linux_amd64\\.zip$"
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
alpine:
@@ -129,7 +123,7 @@ remote:
# and is always re-fetched on expiry — conditional checks are skipped for
# built-in mutable patterns regardless of this flag.
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Index files (APKINDEX.tar.gz) cached for 2 hours
almalinux:
@@ -140,9 +134,12 @@ remote:
- ".*/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
- ".*\\.rpm$"
- ".*\\.rpm$" # Allow all RPM files
# repomd.xml / repodata are package-type defaults — always re-fetched on
# expiry. check_mutable_updates would only apply to any custom
# mutable_patterns added here.
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Metadata files cached for 2 hours
epel:
@@ -156,8 +153,8 @@ remote:
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
cache:
immutable_ttl: 0
mutable_ttl: 7200
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 7200 # Metadata files cached for 2 hours
fedora:
base_url: "https://gsl-syd.mm.fcix.net/fedora/linux"
@@ -170,7 +167,7 @@ remote:
- ".*/noarch/.*\\.rpm$"
- "updates/.*/Everything/x86_64/repodata/.*$"
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 300 # Metadata files cached for 5 minutes
ghcr:
@@ -179,6 +176,9 @@ remote:
description: "GitHub Container Registry"
# username: "your-github-username"
# password: "your-github-pat" # needs read:packages scope
# Docker manifest/tag-list patterns are package-type defaults — always
# re-fetched on expiry. check_mutable_updates only applies to any custom
# mutable_patterns you add (e.g. a metadata endpoint).
cache:
immutable_ttl: 0
mutable_ttl: 300
@@ -195,7 +195,12 @@ remote:
base_url: "https://files.pythonhosted.org"
package: "pypi"
description: "Python Package Index — simple index and package files via a single remote"
# simple/ requests are transparently fetched from pypi.org; package files come from
# files.pythonhosted.org (base_url). URLs in the simple index are rewritten to this remote.
check_mutable_updates: true
# Block packages published within the last 3 days (supply-chain attack mitigation).
# Immutable artifacts (wheel/sdist) newer than quarantine_days return 404 until
# the window passes. Disable by setting quarantine_new: false or removing both keys.
quarantine_new: true
quarantine_days: 3
immutable_patterns:
@@ -246,8 +251,8 @@ remote:
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
immutable_ttl: 0 # Chart tarballs are versioned — cache forever
mutable_ttl: 3600 # index.yaml refreshed after 1 hour
metallb:
base_url: "https://metallb.github.io/metallb"
@@ -447,7 +452,53 @@ remote:
immutable_ttl: 0
mutable_ttl: 3600
virtual:
puppet-forge:
base_url: "https://forgeapi.puppet.com"
package: "puppet"
description: "Puppet Forge module registry"
# Module metadata (v3/modules/, v3/releases) is mutable by default.
# Configure r10k / librarian-puppet with this remote as the Forge URL:
# http://your-proxy/api/v1/remote/puppet-forge
check_mutable_updates: true
immutable_patterns:
- "^v3/files/.*\\.tar\\.gz$"
cache:
immutable_ttl: 0 # Module tarballs cached indefinitely
mutable_ttl: 600 # Module metadata refreshed after 10 minutes
terraform-registry:
base_url: "https://registry.terraform.io"
package: "terraform"
description: "Terraform/OpenTofu provider registry (Registry Protocol)"
# Provider version lists are mutable by default.
# Point Terraform at this remote via .terraformrc:
# host "registry.terraform.io" {
# services = {
# "providers.v1" = "http://your-proxy/api/v1/remote/terraform-registry/"
# }
# }
# releases_remote must match the name of the hashicorp-releases remote below,
# so download_url / shasums_url in per-version download info are rewritten.
releases_remote: "hashicorp-releases"
immutable_patterns:
- "[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$"
cache:
immutable_ttl: 0 # Per-version download info cached indefinitely
mutable_ttl: 300 # Provider versions list refreshed after 5 minutes
hashicorp-releases:
base_url: "https://releases.hashicorp.com"
package: "generic"
description: "HashiCorp releases CDN — provider zips, SHA256SUMS, and signatures"
immutable_patterns:
- ".*\\.zip$"
- ".*SHA256SUMS(\\.sig)?$"
cache:
immutable_ttl: 0 # Release artifacts cached indefinitely
mutable_ttl: 0
virtuals:
helm-all:
package: "helm"
description: "Virtual repository merging all helm remotes — member order is priority order for duplicate chart+version"
@@ -472,10 +523,10 @@ virtual:
- jfrog
- openvox
local:
locals:
local-generic:
package: "generic"
description: "Local generic file repository"
cache:
immutable_ttl: 0
immutable_ttl: 0 # Files cached indefinitely
mutable_ttl: 0
+1
View File
@@ -14,6 +14,7 @@ dependencies = [
"lxml>=4.9.0",
"prometheus-client>=0.19.0",
"python-multipart>=0.0.6",
"msgpack>=1.0.0",
]
requires-python = ">=3.11"
readme = "README.md"
+50 -15
View File
@@ -1,3 +1,4 @@
import asyncio
import hashlib
import json
import logging
@@ -33,6 +34,16 @@ async def proxy(request: Request, remote_name: str, path: str, storage, cache, c
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
if remote_config.get("ban_tags_enabled", False):
ban_tags = remote_config.get("ban_tags", [])
if ban_tags:
tag_match = re.search(r"/manifests/([^/]+)$", path)
if tag_match:
tag = tag_match.group(1)
if not tag.startswith("sha256:") and tag in ban_tags:
logger.info(f"TAG BANNED: {remote_name}/{path} (tag: {tag})")
raise HTTPException(status_code=403, detail=f"Tag '{tag}' is not permitted on this remote")
base_url = remote_config.get("base_url", "").rstrip("/")
remote_url = f"{base_url}/v2/{path}"
@@ -47,23 +58,39 @@ async def proxy(request: Request, remote_name: str, path: str, storage, cache, c
if not await _proxy.handle_expired_mutable(remote_name, path, remote_url, config, cache, storage):
cached_key = None
lock_acquired = False
if not cached_key:
lock_acquired = cache.acquire_fetch_lock(remote_name, path)
if not lock_acquired:
# Another pod is already fetching — poll storage briefly before issuing a duplicate upstream request
for _ in range(10):
await asyncio.sleep(0.5)
probe_key = storage.get_object_key(remote_name, path)
if storage.exists(probe_key):
cached_key = probe_key
break
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await _proxy.cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
if not is_mutable:
published = result.get("last_modified")
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
try:
result = await _proxy.cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
if not is_mutable:
published = result.get("last_modified")
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
finally:
if lock_acquired:
cache.release_fetch_lock(remote_name, path)
elif not is_mutable:
published = cache.get_artifact_published(remote_name, path)
if not published:
@@ -90,6 +117,14 @@ async def proxy(request: Request, remote_name: str, path: str, storage, cache, c
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
# Cross-link tag manifests to their sha256 digest key so digest-addressed pulls hit cache
if is_mutable and "/manifests/" in path:
digest_path = re.sub(r"/manifests/[^/]+$", f"/manifests/{digest}", path)
digest_key = storage.get_object_key(remote_name, digest_path)
if not storage.exists(digest_key):
storage.upload(digest_key, artifact_data)
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
+21 -16
View File
@@ -1,5 +1,6 @@
import hashlib
import logging
import os
from fastapi import HTTPException, Response, UploadFile
from fastapi.responses import JSONResponse
@@ -7,12 +8,23 @@ from fastapi.responses import JSONResponse
logger = logging.getLogger(__name__)
def download(remote_name: str, path: str, storage, database, config) -> Response:
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
content = storage.download_object(metadata["s3_key"])
return Response(
content=content,
media_type=metadata.get("content_type", "application/octet-stream"),
headers={"Content-Disposition": f"attachment; filename={os.path.basename(path)}"},
)
async def upload(remote_name: str, path: str, file: UploadFile, storage, database, config) -> JSONResponse:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Upload only supported for local repositories")
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
content = await file.read()
@@ -59,12 +71,8 @@ async def upload(remote_name: str, path: str, file: UploadFile, storage, databas
def check_exists(remote_name: str, path: str, database, config) -> Response:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=405, detail="HEAD method only supported for local repositories")
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
metadata = database.get_local_file_metadata(remote_name, path)
@@ -87,11 +95,8 @@ def check_exists(remote_name: str, path: str, database, config) -> Response:
def delete(remote_name: str, path: str, storage, database, config) -> JSONResponse:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Delete only supported for local repositories")
if not config.get_local_config(remote_name):
raise HTTPException(status_code=404, detail=f"Local repository '{remote_name}' not configured")
try:
s3_key = database.delete_local_file(remote_name, path)
+9 -13
View File
@@ -11,7 +11,9 @@ from fastapi import HTTPException, Request, Response
from ..auth import get_docker_token_for_response
from ..remote import helm as _helm
from ..remote import npm as _npm
from ..remote import puppet as _puppet
from ..remote import python as _pypi
from ..remote import terraform as _terraform
from ..remote.base import get_content_type
logger = logging.getLogger(__name__)
@@ -84,6 +86,11 @@ def _resolve_content(
return _npm.resolve_content(data, path, filename, remote_config.get("immutable_patterns", []), base_url, proxy_base, remote_name)
if package == "helm":
return _helm.resolve_content(data, path, filename, base_url, proxy_base, remote_name)
if package == "puppet":
return _puppet.resolve_content(data, path, filename, base_url, proxy_base, remote_name)
if package == "terraform":
releases_remote = remote_config.get("releases_remote")
return _terraform.resolve_content(data, path, filename, base_url, proxy_base, remote_name, releases_remote)
return data, get_content_type(filename)
@@ -93,6 +100,8 @@ def construct_url(remote_config: dict, path: str) -> str:
return f"{base_url}/v2/{path}"
if remote_config.get("package") == "pypi":
return _pypi.construct_url(base_url, path)
if remote_config.get("package") == "terraform":
return _terraform.construct_url(base_url, path)
return f"{base_url}/{path}"
@@ -218,19 +227,6 @@ async def handle(request: Request, remote_name: str, path: str, storage, cache,
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") == "local":
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
content = storage.download_object(metadata["s3_key"])
if content is None:
raise HTTPException(status_code=500, detail="File not accessible")
return Response(
content=content,
media_type=metadata.get("content_type", "application/octet-stream"),
headers={"Content-Disposition": f"attachment; filename={os.path.basename(path)}"},
)
path_parts = path.split("/")
if len(path_parts) >= 2:
repo_path = f"{path_parts[0]}/{path_parts[1]}"
+115 -25
View File
@@ -6,15 +6,21 @@ from datetime import UTC, date, datetime
from typing import Protocol, runtime_checkable
import httpx
import msgpack as _msgpack
import yaml
from fastapi import HTTPException, Request, Response
from ..remote import helm as _helm
logger = logging.getLogger(__name__)
try:
_YamlLoader = yaml.CSafeLoader
_YamlDumperBase = yaml.CDumper
except AttributeError:
_YamlLoader = yaml.SafeLoader
_YamlDumperBase = yaml.Dumper
class _HelmDumper(yaml.Dumper):
class _HelmDumper(_YamlDumperBase):
"""YAML dumper that serializes datetime/date objects back to ISO 8601 strings.
yaml.safe_load converts timestamp-shaped YAML scalars (e.g. chart `created`
@@ -37,21 +43,43 @@ _HelmDumper.add_representer(datetime, _repr_datetime)
_HelmDumper.add_representer(date, _repr_date)
def _entries_to_msgpack_safe(entries: dict) -> dict:
"""Convert datetime/date values to ISO strings for msgpack serialization."""
result = {}
for chart, versions in entries.items():
safe_versions = []
for v in versions:
safe_v = {}
for k, val in v.items():
if isinstance(val, datetime):
safe_v[k] = val.isoformat()
elif isinstance(val, date):
safe_v[k] = val.isoformat()
else:
safe_v[k] = val
safe_versions.append(safe_v)
result[chart] = safe_versions
return result
async def _get_member_index(
member_name: str,
member_cfg: dict,
path: str,
storage,
cache,
) -> tuple[str, dict, int, bytes | None]:
) -> tuple[str, dict, int, bytes | None, dict | None]:
"""Fetch or retrieve cached index.yaml for one member remote.
Returns (member_name, member_cfg, ttl, raw_bytes).
Returns (member_name, member_cfg, ttl, raw_bytes, parsed_entries).
raw_bytes is None if the member is unreachable and not in S3.
parsed_entries is the pre-parsed entries dict (from msgpack cache), or None.
"""
member_ttl = member_cfg.get("cache", {}).get("mutable_ttl", 3600)
s3_key = storage.get_object_key(member_name, path)
msgpack_key = storage.get_object_key(member_name, "index.msgpack")
raw_data: bytes | None = None
parsed_entries: dict | None = None
if storage.exists(s3_key) and cache.is_index_valid(member_name, path):
try:
@@ -59,6 +87,13 @@ async def _get_member_index(
logger.info(f"Virtual: cache hit for member '{member_name}'")
except Exception:
raw_data = None
if raw_data is not None and storage.exists(msgpack_key):
try:
packed = storage.download_object(msgpack_key)
parsed_entries = _msgpack.unpackb(packed, raw=False)
logger.debug(f"Virtual: msgpack hit for member '{member_name}'")
except Exception:
parsed_entries = None
if raw_data is None:
base_url = member_cfg.get("base_url", "").rstrip("/")
@@ -76,35 +111,74 @@ async def _get_member_index(
raw_data = response.content
except Exception as e:
logger.warning(f"Virtual: failed to fetch index.yaml from member '{member_name}': {e}")
return member_name, member_cfg, member_ttl, None
return member_name, member_cfg, member_ttl, None, None
try:
storage.upload(s3_key, raw_data)
cache.mark_index_cached(member_name, path, member_ttl)
except Exception as e:
logger.warning(f"Virtual: failed to cache index.yaml for member '{member_name}': {e}")
return member_name, member_cfg, member_ttl, raw_data
if parsed_entries is None and raw_data is not None:
try:
index = yaml.load(raw_data, Loader=_YamlLoader)
safe_entries = _entries_to_msgpack_safe(index.get("entries") or {})
storage.upload(msgpack_key, _msgpack.packb(safe_entries, use_bin_type=True))
parsed_entries = safe_entries
except Exception as e:
logger.warning(f"Virtual: failed to build msgpack cache for '{member_name}': {e}")
return member_name, member_cfg, member_ttl, raw_data, parsed_entries
def _merge_helm_indexes(raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes:
def _rewrite_urls(urls: list, base_url: str, proxy_base: str, member_name: str) -> list:
proxy_remote = f"{proxy_base}/api/v1/remote/{member_name}"
rewritten = []
for url in urls:
if url.startswith(("http://", "https://")):
if base_url and url.startswith(base_url):
url = proxy_remote + url[len(base_url) :]
else:
url = f"{proxy_remote}/{url.lstrip('/')}"
rewritten.append(url)
return rewritten
def _merge_helm_indexes(
raw_indexes: list[bytes],
parsed_entries_list: list[dict | None],
member_names: list[str],
member_configs: list[dict],
proxy_base: str,
) -> bytes:
"""Merge helm index.yaml files with per-member URL rewriting.
Priority is determined by position in member_names: earlier members win
when the same chart name + version appears in multiple remotes.
Uses pre-parsed msgpack entries when available to skip YAML parsing.
"""
merged_entries: dict[str, list] = {}
for raw_data, member_name, member_cfg in zip(raw_indexes, member_names, member_configs):
for raw_data, pre_parsed, member_name, member_cfg in zip(raw_indexes, parsed_entries_list, member_names, member_configs):
base_url = member_cfg.get("base_url", "").rstrip("/")
rewritten, _ = _helm.resolve_content(raw_data, "index.yaml", "index.yaml", base_url, proxy_base, member_name)
try:
index = yaml.safe_load(rewritten)
except Exception as e:
logger.warning(f"Virtual: failed to parse index.yaml from member '{member_name}': {e}")
continue
if pre_parsed is not None:
entries = pre_parsed
else:
try:
index = yaml.load(raw_data, Loader=_YamlLoader)
except Exception as e:
logger.warning(f"Virtual: failed to parse index.yaml from member '{member_name}': {e}")
continue
entries = index.get("entries") or {}
for chart_name, versions in (index.get("entries") or {}).items():
for chart_name, versions in entries.items():
for version_entry in versions:
version_entry["urls"] = _rewrite_urls(
version_entry.get("urls") or [],
base_url,
proxy_base,
member_name,
)
if chart_name not in merged_entries:
merged_entries[chart_name] = list(versions)
else:
@@ -126,7 +200,14 @@ def _merge_helm_indexes(raw_indexes: list[bytes], member_names: list[str], membe
@runtime_checkable
class _VirtualHandler(Protocol):
def accepts_path(self, path: str) -> bool: ...
def merge(self, raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes: ...
def merge(
self,
raw_indexes: list[bytes],
parsed_entries: list[dict | None],
member_names: list[str],
member_configs: list[dict],
proxy_base: str,
) -> bytes: ...
def path_error(self) -> str: ...
@@ -134,8 +215,15 @@ class _HelmHandler:
def accepts_path(self, path: str) -> bool:
return path == "index.yaml"
def merge(self, raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes:
return _merge_helm_indexes(raw_indexes, member_names, member_configs, proxy_base)
def merge(
self,
raw_indexes: list[bytes],
parsed_entries: list[dict | None],
member_names: list[str],
member_configs: list[dict],
proxy_base: str,
) -> bytes:
return _merge_helm_indexes(raw_indexes, parsed_entries, member_names, member_configs, proxy_base)
def path_error(self) -> str:
return "Virtual helm repositories only serve index.yaml; chart tarballs are served directly by member remotes"
@@ -147,11 +235,9 @@ _HANDLERS: dict[str, _VirtualHandler] = {
async def handle(request: Request, virtual_name: str, path: str, storage, cache, config) -> Response:
virtual_cfg = config.get_remote_config(virtual_name)
virtual_cfg = config.get_virtual_config(virtual_name)
if not virtual_cfg:
raise HTTPException(status_code=404, detail=f"Virtual repository '{virtual_name}' not configured")
if virtual_cfg.get("type") != "virtual":
raise HTTPException(status_code=400, detail=f"'{virtual_name}' is not a virtual repository")
package = virtual_cfg.get("package")
handler = _HANDLERS.get(package)
@@ -188,17 +274,19 @@ async def handle(request: Request, virtual_name: str, path: str, storage, cache,
fetch_ms = int((time.perf_counter() - t_fetch) * 1000)
raw_indexes: list[bytes] = []
used_parsed: list[dict | None] = []
used_members: list[str] = []
used_configs: list[dict] = []
min_ttl: int | None = None
for member_name, member_cfg, member_ttl, raw_data in results:
for member_name, member_cfg, member_ttl, raw_data, parsed_entries in results:
if min_ttl is None or member_ttl < min_ttl:
min_ttl = member_ttl
if raw_data is None:
logger.warning(f"Virtual '{virtual_name}': skipping unreachable member '{member_name}'")
continue
raw_indexes.append(raw_data)
used_parsed.append(parsed_entries)
used_members.append(member_name)
used_configs.append(member_cfg)
@@ -209,7 +297,7 @@ async def handle(request: Request, virtual_name: str, path: str, storage, cache,
min_ttl = 3600
t_merge = time.perf_counter()
merged = handler.merge(raw_indexes, used_members, used_configs, proxy_base)
merged = await asyncio.to_thread(handler.merge, raw_indexes, used_parsed, used_members, used_configs, proxy_base)
merge_ms = int((time.perf_counter() - t_merge) * 1000)
try:
@@ -217,9 +305,11 @@ async def handle(request: Request, virtual_name: str, path: str, storage, cache,
storage.upload(virtual_key, merged)
cache.mark_index_cached(virtual_name, path, min_ttl)
store_ms = int((time.perf_counter() - t_store) * 1000)
msgpack_hits = sum(1 for p in used_parsed if p is not None)
logger.info(
f"Virtual MISS: {virtual_name}/{path} rebuilt from {used_members} "
f"(fetch={fetch_ms}ms merge={merge_ms}ms store={store_ms}ms ttl={min_ttl}s)"
f"(fetch={fetch_ms}ms merge={merge_ms}ms store={store_ms}ms ttl={min_ttl}s "
f"msgpack={msgpack_hits}/{len(used_members)})"
)
except Exception as e:
logger.warning(f"Virtual: failed to store merged index for '{virtual_name}': {e}")
+19
View File
@@ -99,6 +99,25 @@ class RedisCache:
except Exception:
return None
def acquire_fetch_lock(self, remote_name: str, path: str, ttl: int = 30) -> bool:
"""Try to acquire a short-lived fetch lock. Returns True if acquired, False if held by another caller."""
if not self.available:
return True # fail open: no Redis → behave as if we always hold the lock
key = f"fetchlock:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
try:
return bool(self.client.set(key, 1, nx=True, ex=ttl))
except Exception:
return True
def release_fetch_lock(self, remote_name: str, path: str) -> None:
if not self.available:
return
key = f"fetchlock:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
try:
self.client.delete(key)
except Exception:
pass
def cleanup_expired_index(self, storage, remote_name: str, path: str) -> None:
if not self.available:
return
+21 -23
View File
@@ -4,21 +4,6 @@ import os
import yaml
_TYPE_KEYS = ("remote", "virtual", "local")
def _normalize_loaded(raw: dict) -> dict:
"""Convert {remote: {...}, virtual: {...}, local: {...}} into {remotes: {name: {type: ..., ...}}}."""
remotes = {}
for type_key in _TYPE_KEYS:
for name, cfg in (raw.get(type_key) or {}).items():
remotes[name] = {"type": type_key, **cfg}
result = {k: v for k, v in raw.items() if k not in _TYPE_KEYS}
if remotes:
result["remotes"] = remotes
return result
_PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
"alpine": [
r"APKINDEX\.tar\.gz$",
@@ -41,6 +26,13 @@ _PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
"helm": [
r"index\.yaml$",
],
"puppet": [
r"^v3/modules/",
r"^v3/releases",
],
"terraform": [
r"[^/]+/[^/]+/versions$",
],
"generic": [],
}
@@ -56,10 +48,8 @@ class ConfigManager:
try:
with open(path) as f:
if path.endswith((".yaml", ".yml")):
raw = yaml.safe_load(f) or {}
else:
raw = json.load(f)
return _normalize_loaded(raw)
return yaml.safe_load(f) or {}
return json.load(f)
except FileNotFoundError:
return {}
@@ -67,8 +57,8 @@ class ConfigManager:
def _merge(base: dict, overlay: dict) -> dict:
result = {**base}
for key, value in overlay.items():
if key == "remotes" and isinstance(base.get("remotes"), dict) and isinstance(value, dict):
result["remotes"] = {**base.get("remotes", {}), **value}
if key in ("remotes", "virtuals", "locals") and isinstance(base.get(key), dict) and isinstance(value, dict):
result[key] = {**base.get(key, {}), **value}
else:
result[key] = value
return result
@@ -84,11 +74,11 @@ class ConfigManager:
self._config_dir = None
if os.path.isdir(self.config_path):
return self._load_from_dir(self.config_path) or {"remotes": {}}
return self._load_from_dir(self.config_path) or {"remotes": {}, "virtuals": {}, "locals": {}}
config = self._load_single_file(self.config_path)
if not config:
return {"remotes": {}}
return {"remotes": {}, "virtuals": {}, "locals": {}}
config_dir = config.pop("config_dir", None)
if config_dir:
@@ -136,6 +126,14 @@ class ConfigManager:
self._check_reload()
return self.config.get("remotes", {}).get(remote_name)
def get_virtual_config(self, virtual_name: str) -> dict | None:
self._check_reload()
return self.config.get("virtuals", {}).get(virtual_name)
def get_local_config(self, local_name: str) -> dict | None:
self._check_reload()
return self.config.get("locals", {}).get(local_name)
def get_immutable_patterns(self, remote_name: str, repo_path: str = "") -> list[str]:
remote_config = self.get_remote_config(remote_name)
if not remote_config:
+21 -10
View File
@@ -49,7 +49,13 @@ class ArtifactRequest(BaseModel):
@app.get("/")
def read_root():
config._check_reload()
return {"message": "Artifact Storage API", "version": app.version, "remotes": list(config.config.get("remotes", {}).keys())}
return {
"message": "Artifact Storage API",
"version": app.version,
"remotes": list(config.config.get("remotes", {}).keys()),
"virtuals": list(config.config.get("virtuals", {}).keys()),
"locals": list(config.config.get("locals", {}).keys()),
}
@app.get("/health")
@@ -99,19 +105,24 @@ async def get_artifact(request: Request, remote_name: str, path: str):
return await proxy.handle(request, remote_name, path, storage, cache, config, database, metrics)
@app.put("/api/v1/remote/{remote_name}/{path:path}")
async def upload_file(remote_name: str, path: str, file: UploadFile = File(...)):
return await local.upload(remote_name, path, file, storage, database, config)
@app.get("/api/v1/local/{local_name}/{path:path}")
def get_local_artifact(local_name: str, path: str):
return local.download(local_name, path, storage, database, config)
@app.head("/api/v1/remote/{remote_name}/{path:path}")
def check_file_exists(remote_name: str, path: str):
return local.check_exists(remote_name, path, database, config)
@app.put("/api/v1/local/{local_name}/{path:path}")
async def upload_local_file(local_name: str, path: str, file: UploadFile = File(...)):
return await local.upload(local_name, path, file, storage, database, config)
@app.delete("/api/v1/remote/{remote_name}/{path:path}")
def delete_file(remote_name: str, path: str):
return local.delete(remote_name, path, storage, database, config)
@app.head("/api/v1/local/{local_name}/{path:path}")
def check_local_file_exists(local_name: str, path: str):
return local.check_exists(local_name, path, database, config)
@app.delete("/api/v1/local/{local_name}/{path:path}")
def delete_local_file(local_name: str, path: str):
return local.delete(local_name, path, storage, database, config)
@app.post("/api/v1/artifacts/cache")
+13 -7
View File
@@ -87,9 +87,10 @@ class MetricsManager:
# Get from database if available
db_sizes = self.database_manager.get_storage_by_remote()
if db_sizes:
# Initialize all configured remotes to 0
# Initialize all configured remotes and locals to 0
remote_sizes = {}
for remote in config_manager.config.get("remotes", {}).keys():
all_names = list(config_manager.config.get("remotes", {}).keys()) + list(config_manager.config.get("locals", {}).keys())
for remote in all_names:
remote_sizes[remote] = db_sizes.get(remote, 0)
# Update Prometheus gauges
@@ -101,10 +102,10 @@ class MetricsManager:
# Fallback to S3 scanning if database not available
try:
remote_sizes = {}
remotes = config_manager.config.get("remotes", {}).keys()
all_names = list(config_manager.config.get("remotes", {}).keys()) + list(config_manager.config.get("locals", {}).keys())
# Initialize all remotes to 0
for remote in remotes:
# Initialize all remotes and locals to 0
for remote in all_names:
remote_sizes[remote] = 0
paginator = storage.client.get_paginator("list_objects_v2")
@@ -174,8 +175,13 @@ class MetricsManager:
metrics["requests"]["cache_hit_ratio"] = cache_hits / total_requests if total_requests > 0 else 0.0
metrics["bandwidth"]["saved_bytes"] = bandwidth_saved
# Get per-remote metrics
for remote in config_manager.config.get("remotes", {}).keys():
# Get per-repo metrics
all_repos = {
**config_manager.config.get("remotes", {}),
**config_manager.config.get("virtuals", {}),
**config_manager.config.get("locals", {}),
}
for remote in all_repos.keys():
remote_cache_hits = int(self.redis_client.client.get(f"metrics:cache_hits:{remote}") or 0)
remote_cache_misses = int(self.redis_client.client.get(f"metrics:cache_misses:{remote}") or 0)
remote_total = remote_cache_hits + remote_cache_misses
+2 -2
View File
@@ -1,4 +1,4 @@
from . import generic, helm, npm, python, rpm
from . import generic, helm, npm, puppet, python, rpm, terraform
from .base import get_content_type
__all__ = ["generic", "helm", "npm", "python", "rpm", "get_content_type"]
__all__ = ["generic", "helm", "npm", "puppet", "python", "rpm", "terraform", "get_content_type"]
+24
View File
@@ -0,0 +1,24 @@
from .base import get_content_type
def resolve_content(
data: bytes,
path: str,
filename: str,
base_url: str,
proxy_url: str,
remote_name: str,
) -> tuple[bytes, str]:
if not path.startswith("v3/files/"):
proxy_remote_url = f"{proxy_url}/api/v1/remote/{remote_name}"
# Rewrite any absolute forge API URLs
data = data.replace(base_url.encode(), proxy_remote_url.encode())
# Rewrite relative file_uri paths ("/v3/files/...") to absolute proxy URLs.
# g10k resolves file_uri against only the forge host, so a relative path
# would drop our /api/v1/remote/<name> prefix.
data = data.replace(
b'"/v3/files/',
f'"{proxy_remote_url}/v3/files/'.encode(),
)
return data, "application/json"
return data, get_content_type(filename)
+36
View File
@@ -0,0 +1,36 @@
import json
import re
from urllib.parse import urlparse
from .base import get_content_type
_DOWNLOAD_PATH = re.compile(r"^[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$")
def construct_url(base_url: str, path: str) -> str:
return f"{base_url}/v1/providers/{path}"
def resolve_content(
data: bytes,
path: str,
filename: str,
_base_url: str,
proxy_url: str,
_remote_name: str,
releases_remote: str | None = None,
) -> tuple[bytes, str]:
if filename.endswith((".zip", ".sig")):
return data, get_content_type(filename)
if releases_remote and _DOWNLOAD_PATH.match(path):
releases_proxy = f"{proxy_url}/api/v1/remote/{releases_remote}"
try:
obj = json.loads(data)
for field in ("download_url", "shasums_url", "shasums_signature_url"):
if field in obj:
parsed = urlparse(obj[field])
obj[field] = f"{releases_proxy}{parsed.path}"
data = json.dumps(obj).encode()
except (json.JSONDecodeError, KeyError):
pass
return data, "application/json"
+37 -22
View File
@@ -20,61 +20,55 @@ TEST_REMOTES = {
"remotes": {
"alpine-test": {
"base_url": "https://dl-cdn.alpinelinux.org",
"type": "remote",
"package": "alpine",
"immutable_patterns": [".*/x86_64/.*\\.apk$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"rpm-test": {
"base_url": "https://example.com/rpm",
"type": "remote",
"package": "rpm",
"immutable_patterns": [".*/x86_64/.*\\.rpm$", ".*/repodata/.*$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"docker-test": {
"base_url": "https://registry.example.com",
"type": "remote",
"package": "docker",
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"docker-restricted": {
"base_url": "https://registry.example.com",
"type": "remote",
"package": "docker",
"immutable_patterns": ["^library/nginx"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"docker-bantags-test": {
"base_url": "https://registry.example.com",
"package": "docker",
"ban_tags_enabled": True,
"ban_tags": ["latest", "edge"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"generic-test": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [".*\\.tar\\.gz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"custom-index-test": {
"base_url": "https://example.com",
"type": "remote",
"package": "generic",
"mutable_patterns": ["metadata\\.json$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"check-mutable-test": {
"base_url": "https://example.com",
"type": "remote",
"package": "generic",
"mutable_patterns": ["metadata\\.json$"],
"check_mutable_updates": True,
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"local-test": {
"type": "local",
"package": "generic",
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"pypi-test": {
"base_url": "https://files.pythonhosted.org",
"type": "remote",
"package": "pypi",
"immutable_patterns": [
r"packages/.*\.whl$",
@@ -85,7 +79,6 @@ TEST_REMOTES = {
},
"npm-test": {
"base_url": "https://registry.npmjs.org",
"type": "remote",
"package": "npm",
"immutable_patterns": [r"\.tgz$"],
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
@@ -93,14 +86,12 @@ TEST_REMOTES = {
},
"helm-test": {
"base_url": "https://helm.releases.hashicorp.com",
"type": "remote",
"package": "helm",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"quarantine-test": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [r".*\.tar\.gz$"],
"quarantine_new": True,
@@ -109,7 +100,6 @@ TEST_REMOTES = {
},
"quarantine-disabled": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [r".*\.tar\.gz$"],
"quarantine_new": False,
@@ -118,27 +108,52 @@ TEST_REMOTES = {
},
"helm-member-2": {
"base_url": "https://charts.example.com",
"type": "remote",
"package": "helm",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 1800},
},
"puppet-test": {
"base_url": "https://forgeapi.puppet.com",
"package": "puppet",
"immutable_patterns": [r"^v3/files/.*\.tar\.gz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"terraform-registry-test": {
"base_url": "https://registry.terraform.io",
"package": "terraform",
"immutable_patterns": [
r"[^/]+/[^/]+/[^/]+/download/[^/]+/[^/]+$",
],
"releases_remote": "hashicorp-releases-test",
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"hashicorp-releases-test": {
"base_url": "https://releases.hashicorp.com",
"package": "generic",
"immutable_patterns": [r".*\.zip$", r".*SHA256SUMS(\.sig)?$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
},
"locals": {
"local-test": {
"package": "generic",
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
},
"virtuals": {
"helm-virtual-test": {
"type": "virtual",
"package": "helm",
"members": ["helm-test", "helm-member-2"],
},
"unsupported-virtual-test": {
"type": "virtual",
"package": "rpm",
"members": ["rpm-test"],
},
"empty-virtual-test": {
"type": "virtual",
"package": "helm",
"members": [],
},
}
},
}
# ---------------------------------------------------------------------------
+71
View File
@@ -327,3 +327,74 @@ class TestArtifactPublished:
def test_get_returns_none_when_unavailable(self, unavailable_cache):
assert unavailable_cache.get_artifact_published("remote", "path") is None
# ---------------------------------------------------------------------------
# fetch lock (thundering-herd deduplication)
# ---------------------------------------------------------------------------
class TestFetchLock:
def test_acquire_returns_true_when_lock_obtained(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
result = cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest")
assert result is True
def test_acquire_calls_set_nx_with_ttl(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest", ttl=15)
_, kwargs = mock_redis_client.set.call_args
assert kwargs.get("nx") is True
assert kwargs.get("ex") == 15
def test_acquire_returns_false_when_lock_already_held(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = None # Redis SET NX → None when key exists
result = cache_with_redis.acquire_fetch_lock("myremote", "library/nginx/manifests/latest")
assert result is False
def test_acquire_fails_open_when_unavailable(self, unavailable_cache):
# caller must be allowed to proceed when Redis is down
assert unavailable_cache.acquire_fetch_lock("myremote", "some/path") is True
def test_acquire_fails_open_on_redis_exception(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.side_effect = Exception("connection reset")
assert cache_with_redis.acquire_fetch_lock("myremote", "some/path") is True
def test_lock_key_embeds_path_hash(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
path = "library/nginx/manifests/latest"
cache_with_redis.acquire_fetch_lock("myremote", path)
args, _ = mock_redis_client.set.call_args
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
assert args[0] == f"fetchlock:myremote:{expected_hash}"
def test_lock_key_hash_is_16_chars(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "some/long/path/file.tar.gz")
args, _ = mock_redis_client.set.call_args
# key format: fetchlock:<remote>:<16-char hash>
parts = args[0].split(":")
assert len(parts) == 3
assert len(parts[2]) == 16
def test_different_paths_produce_different_lock_keys(self, cache_with_redis, mock_redis_client):
mock_redis_client.set.return_value = True
cache_with_redis.acquire_fetch_lock("myremote", "path/a/manifests/latest")
key_a = mock_redis_client.set.call_args[0][0]
mock_redis_client.set.reset_mock()
cache_with_redis.acquire_fetch_lock("myremote", "path/b/manifests/latest")
key_b = mock_redis_client.set.call_args[0][0]
assert key_a != key_b
def test_release_deletes_correct_key(self, cache_with_redis, mock_redis_client):
path = "library/nginx/manifests/latest"
cache_with_redis.release_fetch_lock("myremote", path)
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
mock_redis_client.delete.assert_called_once_with(f"fetchlock:myremote:{expected_hash}")
def test_release_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.release_fetch_lock("myremote", "some/path") # must not raise
def test_release_no_op_on_redis_exception(self, cache_with_redis, mock_redis_client):
mock_redis_client.delete.side_effect = Exception("timeout")
cache_with_redis.release_fetch_lock("myremote", "some/path") # must not raise
+40 -98
View File
@@ -10,11 +10,11 @@ from artifactapi.config import ConfigManager
@pytest.fixture
def make_config(tmp_path):
"""Factory: write a remote dict to a temp YAML and return a ConfigManager."""
"""Factory: write a remotes dict to a temp YAML and return a ConfigManager."""
def _make(remotes_dict):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": remotes_dict}))
cfg_file.write_text(yaml.dump({"remotes": remotes_dict}))
return ConfigManager(str(cfg_file))
return _make
@@ -64,6 +64,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [r"custom\.json$"],
@@ -80,6 +81,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [],
@@ -93,6 +95,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [existing],
@@ -106,6 +109,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"mutable_patterns": [r"meta\.json$", r"index\.yaml$"],
@@ -118,6 +122,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "rpm",
"base_url": "https://x.com",
"mutable_patterns": [r"custom-meta\.xml$"],
@@ -138,6 +143,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "npm",
"base_url": "https://x.com",
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
@@ -168,6 +174,7 @@ class TestGetMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "npm",
"base_url": "https://x.com",
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
@@ -189,6 +196,7 @@ class TestGetImmutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
@@ -210,6 +218,7 @@ class TestGetImmutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "rpm",
"base_url": "https://x.com",
"immutable_patterns": patterns,
@@ -222,6 +231,7 @@ class TestGetImmutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
@@ -237,6 +247,7 @@ class TestGetImmutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
@@ -259,6 +270,7 @@ class TestGetUserMutablePatterns:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [r"custom\.json$"],
@@ -291,6 +303,7 @@ class TestGetCacheConfig:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"cache": {"immutable_ttl": 0, "mutable_ttl": 7200},
@@ -316,11 +329,11 @@ class TestGetCacheConfig:
class TestConfigReload:
def test_reloads_when_file_mtime_advances(self, tmp_path):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(cfg_file))
assert "repo-a" in cfg.config["remotes"]
cfg_file.write_text(yaml.dump({"remote": {"repo-b": {"package": "generic", "base_url": "https://y.com"}}}))
cfg_file.write_text(yaml.dump({"remotes": {"repo-b": {"package": "generic", "base_url": "https://y.com"}}}))
future_mtime = cfg._last_modified + 1
os.utime(str(cfg_file), (future_mtime, future_mtime))
@@ -331,7 +344,7 @@ class TestConfigReload:
def test_no_reload_when_file_unchanged(self, tmp_path):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg_file.write_text(yaml.dump({"remotes": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(cfg_file))
# Call check_reload without touching the file — should not reload
@@ -362,6 +375,7 @@ class TestGetQuarantineConfig:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
@@ -377,6 +391,7 @@ class TestGetQuarantineConfig:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": False,
@@ -392,6 +407,7 @@ class TestGetQuarantineConfig:
cfg = make_config(
{
"r": {
"type": "remote",
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
@@ -415,36 +431,36 @@ def _remote(base_url: str = "https://x.com") -> dict:
class TestConfigDirMode:
def test_loads_all_yaml_files(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remote": {"repo-b": _remote("https://y.com")}}))
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remotes": {"repo-b": _remote("https://y.com")}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" in cfg.config["remotes"]
def test_later_file_overrides_earlier_on_same_key(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://first.com")}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://second.com")}}))
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://first.com")}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://second.com")}}))
cfg = ConfigManager(str(tmp_path))
assert cfg.config["remotes"]["r"]["base_url"] == "https://second.com"
def test_empty_directory_returns_empty_remotes(self, tmp_path):
cfg = ConfigManager(str(tmp_path))
assert cfg.config == {"remotes": {}}
assert cfg.config == {"remotes": {}, "virtuals": {}, "locals": {}}
def test_ignores_non_yaml_files(self, tmp_path):
(tmp_path / "notes.txt").write_text("not yaml")
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert list(cfg.config["remotes"].keys()) == ["repo-a"]
def test_reload_picks_up_new_file(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
(tmp_path / "a.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" not in cfg.config["remotes"]
new_file = tmp_path / "b.yaml"
new_file.write_text(yaml.dump({"remote": {"repo-b": _remote("https://y.com")}}))
new_file.write_text(yaml.dump({"remotes": {"repo-b": _remote("https://y.com")}}))
future_mtime = cfg._last_modified + 1
os.utime(str(new_file), (future_mtime, future_mtime))
@@ -463,9 +479,9 @@ class TestConfigDirKey:
def test_merges_remotes_from_config_dir(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "remotes.yaml").write_text(yaml.dump({"remote": {"repo-extra": _remote("https://extra.com")}}))
(conf_d / "remotes.yaml").write_text(yaml.dump({"remotes": {"repo-extra": _remote("https://extra.com")}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"repo-main": _remote()}}))
main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"repo-main": _remote()}}))
cfg = ConfigManager(str(main))
assert "repo-main" in cfg.config["remotes"]
assert "repo-extra" in cfg.config["remotes"]
@@ -473,9 +489,9 @@ class TestConfigDirKey:
def test_relative_config_dir_resolved_from_main_file(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "r.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
(conf_d / "r.yaml").write_text(yaml.dump({"remotes": {"repo-a": _remote()}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": "conf.d"}))
main.write_text(yaml.dump({"config_dir": "conf.d", "remotes": {}}))
cfg = ConfigManager(str(main))
assert "repo-a" in cfg.config["remotes"]
@@ -483,16 +499,16 @@ class TestConfigDirKey:
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {}}))
main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {}}))
cfg = ConfigManager(str(main))
assert "config_dir" not in cfg.config
def test_dir_remote_overrides_main_file_remote(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "override.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://new.com")}}))
(conf_d / "override.yaml").write_text(yaml.dump({"remotes": {"r": _remote("https://new.com")}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"r": _remote("https://old.com")}}))
main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"r": _remote("https://old.com")}}))
cfg = ConfigManager(str(main))
assert cfg.config["remotes"]["r"]["base_url"] == "https://new.com"
@@ -500,7 +516,7 @@ class TestConfigDirKey:
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"repo-main": _remote()}}))
main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {"repo-main": _remote()}}))
cfg = ConfigManager(str(main))
assert list(cfg.config["remotes"].keys()) == ["repo-main"]
@@ -508,13 +524,13 @@ class TestConfigDirKey:
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
dir_file = conf_d / "r.yaml"
dir_file.write_text(yaml.dump({"remote": {"repo-v1": _remote()}}))
dir_file.write_text(yaml.dump({"remotes": {"repo-v1": _remote()}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {}}))
main.write_text(yaml.dump({"config_dir": str(conf_d), "remotes": {}}))
cfg = ConfigManager(str(main))
assert "repo-v1" in cfg.config["remotes"]
dir_file.write_text(yaml.dump({"remote": {"repo-v2": _remote("https://v2.com")}}))
dir_file.write_text(yaml.dump({"remotes": {"repo-v2": _remote("https://v2.com")}}))
future_mtime = cfg._last_modified + 1
os.utime(str(dir_file), (future_mtime, future_mtime))
@@ -522,77 +538,3 @@ class TestConfigDirKey:
assert "repo-v2" in cfg.config["remotes"]
assert "repo-v1" not in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# YAML format normalisation — top-level type keys
# ---------------------------------------------------------------------------
class TestYamlTypeKeys:
def test_remote_key_injects_type_remote(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"remote": {"my-remote": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-remote"]["type"] == "remote"
def test_virtual_key_injects_type_virtual(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"virtual": {"my-virtual": {"package": "helm", "members": ["a", "b"]}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-virtual"]["type"] == "virtual"
assert cfg.config["remotes"]["my-virtual"]["members"] == ["a", "b"]
def test_local_key_injects_type_local(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"local": {"my-local": {"package": "generic"}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-local"]["type"] == "local"
def test_mixed_file_all_three_types(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(
yaml.dump(
{
"remote": {"r": {"package": "helm", "base_url": "https://helm.example.com"}},
"virtual": {"v": {"package": "helm", "members": ["r"]}},
"local": {"l": {"package": "generic"}},
}
)
)
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["r"]["type"] == "remote"
assert cfg.config["remotes"]["v"]["type"] == "virtual"
assert cfg.config["remotes"]["l"]["type"] == "local"
def test_type_field_not_required_in_yaml(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"remote": {"r": {"package": "alpine", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(f))
raw = cfg.config["remotes"]["r"]
# type is injected by the loader; the original dict had no type key
assert "type" in raw
assert raw["type"] == "remote"
def test_other_fields_preserved_after_normalisation(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(
yaml.dump(
{
"remote": {
"r": {
"package": "helm",
"base_url": "https://helm.example.com",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 1800},
}
}
}
)
)
cfg = ConfigManager(str(f))
remote = cfg.config["remotes"]["r"]
assert remote["package"] == "helm"
assert remote["base_url"] == "https://helm.example.com"
assert remote["cache"] == {"immutable_ttl": 0, "mutable_ttl": 1800}
assert r"\.tgz$" in remote["immutable_patterns"]
+477 -26
View File
@@ -260,6 +260,211 @@ class TestDockerProxy:
mock_fetch.assert_called_once()
assert response.status_code == 200
# --- Issue 1: sha256 digest cross-linking ---
def test_tag_manifest_is_stored_under_digest_key_on_cache_hit(self, client, patched_deps):
# When serving a cached tag manifest the handler must also write the content
# under the sha256 digest key so subsequent sha256-addressed pulls hit cache.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
# First exists call (tag manifest): hit. Second (digest key): miss → triggers upload.
deps["storage"].exists.side_effect = [True, False]
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/v2/docker-test/library/nginx/manifests/v1.25.3")
assert response.status_code == 200
deps["storage"].upload.assert_called_once_with(deps["storage"].get_object_key.return_value, manifest)
def test_tag_manifest_digest_key_not_written_when_already_exists(self, client, patched_deps):
# When the digest key already exists in storage upload must not be called.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
# Both the tag key and the digest key already present.
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
client.get("/v2/docker-test/library/nginx/manifests/v1.25.3")
deps["storage"].upload.assert_not_called()
def test_sha256_manifest_request_is_not_cross_linked(self, client, patched_deps):
# sha256-addressed manifests are immutable — the cross-link logic must not apply.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = False # sha256 manifest is immutable
with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
client.get("/v2/docker-test/library/nginx/manifests/sha256:" + "a" * 64)
deps["storage"].upload.assert_not_called()
# --- Issue 2: thundering herd distributed lock ---
def test_lock_acquired_and_released_on_upstream_fetch(self, client, patched_deps):
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.side_effect = [False, False] # initial miss; digest key also absent
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].acquire_fetch_lock.return_value = True
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "cached"},
):
response = client.get("/v2/docker-test/library/nginx/manifests/latest")
deps["cache"].acquire_fetch_lock.assert_called_once()
deps["cache"].release_fetch_lock.assert_called_once()
assert response.status_code == 200
def test_lock_released_even_when_fetch_returns_error(self, client, patched_deps):
deps = patched_deps
deps["storage"].exists.return_value = False
deps["cache"].is_mutable_file.return_value = True
deps["cache"].acquire_fetch_lock.return_value = True
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "error", "error": "upstream down"},
):
response = client.get("/v2/docker-test/library/nginx/manifests/latest")
deps["cache"].release_fetch_lock.assert_called_once()
assert response.status_code == 502
def test_thundering_herd_polls_storage_when_lock_not_acquired(self, client, patched_deps):
# When the lock is held by another pod the handler must poll storage and serve
# from cache once the competing fetch completes, without issuing its own upstream request.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
# Initial cache check: miss. First poll iteration: another pod has written it.
# Third call is for the digest cross-link check (is_mutable=True path); digest key exists.
deps["storage"].exists.side_effect = [False, True, True]
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
deps["cache"].acquire_fetch_lock.return_value = False # lock held by peer
with patch("artifactapi.artifact.docker.asyncio.sleep", new_callable=AsyncMock):
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
) as mock_fetch:
response = client.get("/v2/docker-test/library/nginx/manifests/latest")
mock_fetch.assert_not_called()
assert response.status_code == 200
def test_thundering_herd_falls_through_to_fetch_if_poll_times_out(self, client, patched_deps):
# If the item never appears in storage during the poll window the handler must
# still issue its own upstream fetch as a fallback.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
# All exists calls return False — item never appears during polling.
deps["storage"].exists.return_value = False
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].acquire_fetch_lock.return_value = False # lock held by peer
with patch("artifactapi.artifact.docker.asyncio.sleep", new_callable=AsyncMock):
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "cached"},
) as mock_fetch:
response = client.get("/v2/docker-test/library/nginx/manifests/latest")
mock_fetch.assert_called_once()
assert response.status_code == 200
# ---------------------------------------------------------------------------
# Docker ban_tags feature
# ---------------------------------------------------------------------------
class TestDockerBanTags:
def test_banned_tag_returns_403(self, client, patched_deps):
response = client.get("/v2/docker-bantags-test/library/nginx/manifests/latest")
assert response.status_code == 403
assert "latest" in response.json()["detail"]
def test_second_banned_tag_returns_403(self, client, patched_deps):
response = client.get("/v2/docker-bantags-test/library/nginx/manifests/edge")
assert response.status_code == 403
assert "edge" in response.json()["detail"]
def test_allowed_tag_proceeds(self, client, patched_deps):
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/v2/docker-bantags-test/library/nginx/manifests/1.25.3")
assert response.status_code == 200
def test_digest_pull_bypasses_ban(self, client, patched_deps):
# sha256-addressed pulls must never be blocked by the tag ban list
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = False
digest = "sha256:" + "a" * 64
with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
response = client.get(f"/v2/docker-bantags-test/library/nginx/manifests/{digest}")
assert response.status_code == 200
def test_ban_tags_disabled_by_default(self, client, patched_deps):
# docker-test has no ban_tags_enabled — "latest" must pass through
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/v2/docker-test/library/nginx/manifests/latest")
assert response.status_code == 200
def test_ban_tags_enabled_but_empty_list_allows_all(self, client, patched_deps):
# If ban_tags_enabled is true but ban_tags is empty nothing should be blocked.
# docker-test doesn't have ban_tags_enabled, but we can verify via the
# docker-bantags-test remote with an unlisted tag.
deps = patched_deps
manifest = json.dumps({"mediaType": "application/vnd.oci.image.manifest.v1+json", "layers": []}).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = manifest
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/v2/docker-bantags-test/library/nginx/manifests/stable")
assert response.status_code == 200
def test_ban_check_does_not_apply_to_blobs(self, client, patched_deps):
# Blob paths don't contain /manifests/ — the ban check must not interfere
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b"\x00" * 100
deps["cache"].is_mutable_file.return_value = False
with patch("artifactapi.artifact.proxy._fetch_last_modified", new_callable=AsyncMock, return_value=None):
response = client.get("/v2/docker-bantags-test/library/nginx/blobs/sha256:" + "b" * 64)
assert response.status_code == 200
# ---------------------------------------------------------------------------
# Generic artifact route /api/v1/remote/{remote}/{path}
@@ -523,68 +728,53 @@ class TestGenericArtifactRoute:
deps["database"].get_local_file_metadata.return_value = None
deps["database"].available = True
response = client.get("/api/v1/remote/local-test/path/to/nonexistent.bin")
response = client.get("/api/v1/local/local-test/path/to/nonexistent.bin")
assert response.status_code == 404
# ---------------------------------------------------------------------------
# Upload route PUT /api/v1/remote/{remote}/{path}
# Upload route PUT /api/v1/local/{local}/{path}
# ---------------------------------------------------------------------------
class TestUploadRoute:
def test_unknown_remote_returns_404(self, client, patched_deps):
def test_unknown_local_returns_404(self, client, patched_deps):
response = client.put(
"/api/v1/remote/nonexistent/path/to/file.tar.gz",
"/api/v1/local/nonexistent/path/to/file.tar.gz",
files={"file": ("file.tar.gz", b"content", "application/octet-stream")},
)
assert response.status_code == 404
def test_non_local_remote_returns_400(self, client, patched_deps):
response = client.put(
"/api/v1/remote/generic-test/path/to/file.tar.gz",
files={"file": ("file.tar.gz", b"content", "application/octet-stream")},
)
assert response.status_code == 400
# ---------------------------------------------------------------------------
# HEAD route HEAD /api/v1/remote/{remote}/{path}
# HEAD route HEAD /api/v1/local/{local}/{path}
# ---------------------------------------------------------------------------
class TestHeadRoute:
def test_non_local_remote_returns_405(self, client, patched_deps):
response = client.head("/api/v1/remote/generic-test/path/to/file.tar.gz")
assert response.status_code == 405
def test_local_repo_file_not_found_returns_404(self, client, patched_deps):
deps = patched_deps
deps["database"].get_local_file_metadata.return_value = None
deps["database"].available = True
response = client.head("/api/v1/remote/local-test/path/to/nonexistent.bin")
response = client.head("/api/v1/local/local-test/path/to/nonexistent.bin")
assert response.status_code == 404
def test_unknown_remote_returns_404(self, client, patched_deps):
response = client.head("/api/v1/remote/nonexistent/path/to/file.bin")
def test_unknown_local_returns_404(self, client, patched_deps):
response = client.head("/api/v1/local/nonexistent/path/to/file.bin")
assert response.status_code == 404
# ---------------------------------------------------------------------------
# DELETE route DELETE /api/v1/remote/{remote}/{path}
# DELETE route DELETE /api/v1/local/{local}/{path}
# ---------------------------------------------------------------------------
class TestDeleteRoute:
def test_unknown_remote_returns_404(self, client, patched_deps):
response = client.delete("/api/v1/remote/nonexistent/path/to/file.tar.gz")
def test_unknown_local_returns_404(self, client, patched_deps):
response = client.delete("/api/v1/local/nonexistent/path/to/file.tar.gz")
assert response.status_code == 404
def test_non_local_remote_returns_400(self, client, patched_deps):
response = client.delete("/api/v1/remote/generic-test/path/to/file.tar.gz")
assert response.status_code == 400
# ---------------------------------------------------------------------------
# Cache flush PUT /cache/flush
@@ -927,6 +1117,267 @@ class TestHelmRemote:
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Puppet Forge remote /api/v1/remote/puppet-test/...
# ---------------------------------------------------------------------------
class TestPuppetRemote:
def test_module_metadata_is_mutable(self, client, patched_deps):
"""v3/modules/ paths are detected as mutable (package-type default)."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b'{"current_release":{"file_uri":"/v3/files/puppetlabs-stdlib-9.7.0.tar.gz"}}'
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/puppet-test/v3/modules/puppetlabs-stdlib")
assert response.status_code == 200
deps["cache"].mark_index_cached.assert_not_called()
def test_releases_path_is_mutable(self, client, patched_deps):
"""v3/releases paths are detected as mutable (package-type default)."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b'{"file_uri":"/v3/files/puppetlabs-stdlib-9.7.0.tar.gz"}'
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/puppet-test/v3/releases/puppetlabs-stdlib-9.7.0")
assert response.status_code == 200
def test_relative_file_uri_rewritten_to_absolute_proxy_url(self, client, patched_deps):
"""Relative /v3/files/ paths in JSON responses are rewritten to absolute proxy URLs."""
deps = patched_deps
meta = b'{"current_release":{"file_uri":"/v3/files/puppetlabs-stdlib-9.7.0.tar.gz","version":"9.7.0"}}'
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = meta
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/puppet-test/v3/modules/puppetlabs-stdlib")
assert response.status_code == 200
assert b'"/v3/files/' not in response.content
assert b"/api/v1/remote/puppet-test/v3/files/puppetlabs-stdlib-9.7.0.tar.gz" in response.content
def test_absolute_forge_url_rewritten_to_proxy(self, client, patched_deps):
"""Absolute forgeapi.puppet.com URLs in JSON are rewritten to the proxy URL."""
deps = patched_deps
meta = b'{"uri":"https://forgeapi.puppet.com/v3/modules/puppetlabs-stdlib"}'
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = meta
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/puppet-test/v3/modules/puppetlabs-stdlib")
assert response.status_code == 200
assert b"forgeapi.puppet.com" not in response.content
assert b"/api/v1/remote/puppet-test" in response.content
def test_metadata_content_type_is_json(self, client, patched_deps):
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b'{"current_release":{}}'
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/puppet-test/v3/modules/puppetlabs-concat")
assert response.status_code == 200
assert "application/json" in response.headers["content-type"]
def test_tarball_served_without_rewriting(self, client, patched_deps):
"""Module tarballs (v3/files/*.tar.gz) are served as binary without URL rewriting."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b"\x1f\x8b tarball bytes"
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/puppet-test/v3/files/puppetlabs-stdlib-9.7.0.tar.gz")
assert response.status_code == 200
assert "application/gzip" in response.headers["content-type"]
assert response.headers["X-Artifact-Source"] == "cache"
def test_tarball_not_blocked_by_immutable_pattern(self, client, patched_deps):
"""v3/files/*.tar.gz matches the configured immutable_patterns and is allowed."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b"\x1f\x8b tarball bytes"
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/puppet-test/v3/files/puppetlabs-inifile-6.2.0.tar.gz")
assert response.status_code == 200
def test_unknown_path_blocked(self, client, patched_deps):
"""Paths outside v3/modules, v3/releases, and v3/files are blocked."""
deps = patched_deps
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/puppet-test/v3/users/puppetlabs")
assert response.status_code == 403
def test_metadata_cache_miss_fetches_upstream(self, client, patched_deps):
deps = patched_deps
meta = b'{"current_release":{"file_uri":"/v3/files/puppetlabs-stdlib-9.7.0.tar.gz"}}'
deps["storage"].exists.return_value = False
deps["storage"].download_object.return_value = meta
deps["cache"].is_mutable_file.return_value = True
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "cached"},
) as mock_fetch:
response = client.get("/api/v1/remote/puppet-test/v3/modules/puppetlabs-stdlib")
mock_fetch.assert_called_once()
assert response.status_code == 200
assert b'"/v3/files/' not in response.content
# ---------------------------------------------------------------------------
# Terraform registry remote (terraform-registry-test)
# ---------------------------------------------------------------------------
class TestTerraformRemote:
def test_versions_path_is_mutable(self, client, patched_deps):
"""Provider versions listing is detected as mutable."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b'{"versions":[]}'
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/versions")
assert response.status_code == 200
deps["cache"].mark_index_cached.assert_not_called()
def test_versions_returns_json_content_type(self, client, patched_deps):
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b'{"versions":[]}'
deps["cache"].is_mutable_file.return_value = True
deps["cache"].is_index_valid.return_value = True
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/versions")
assert response.status_code == 200
assert "application/json" in response.headers["content-type"]
def test_download_info_download_url_rewritten(self, client, patched_deps):
"""download_url in download-info JSON is rewritten to point to the releases proxy."""
deps = patched_deps
download_info = json.dumps(
{
"os": "linux",
"arch": "amd64",
"filename": "terraform-provider-vault_0.28.0_linux_amd64.zip",
"download_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_linux_amd64.zip",
"shasums_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS",
"shasums_signature_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS.sig",
}
).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = download_info
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/0.28.0/download/linux/amd64")
assert response.status_code == 200
data = response.json()
assert "releases.hashicorp.com" not in data["download_url"]
assert "/api/v1/remote/hashicorp-releases-test/" in data["download_url"]
def test_download_info_shasums_url_rewritten(self, client, patched_deps):
"""shasums_url is also rewritten to the releases proxy."""
deps = patched_deps
download_info = json.dumps(
{
"os": "linux",
"arch": "amd64",
"filename": "terraform-provider-vault_0.28.0_linux_amd64.zip",
"download_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_linux_amd64.zip",
"shasums_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS",
"shasums_signature_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS.sig",
}
).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = download_info
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/0.28.0/download/linux/amd64")
assert response.status_code == 200
data = response.json()
assert "/api/v1/remote/hashicorp-releases-test/" in data["shasums_url"]
assert "/api/v1/remote/hashicorp-releases-test/" in data["shasums_signature_url"]
assert "releases.hashicorp.com" not in data["shasums_url"]
assert "releases.hashicorp.com" not in data["shasums_signature_url"]
def test_download_info_path_preserved(self, client, patched_deps):
"""The path portion of the upstream URL is preserved when rewriting."""
deps = patched_deps
zip_path = "/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_linux_amd64.zip"
download_info = json.dumps(
{
"download_url": f"https://releases.hashicorp.com{zip_path}",
"shasums_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS",
"shasums_signature_url": "https://releases.hashicorp.com/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_SHA256SUMS.sig",
}
).encode()
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = download_info
deps["cache"].is_mutable_file.return_value = False
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/0.28.0/download/linux/amd64")
assert response.status_code == 200
data = response.json()
assert data["download_url"].endswith(zip_path)
def test_zip_served_as_binary(self, client, patched_deps):
"""Provider zip files are served as binary without JSON rewriting."""
deps = patched_deps
deps["storage"].exists.return_value = True
deps["storage"].download_object.return_value = b"PK\x03\x04 zip bytes"
deps["cache"].is_mutable_file.return_value = False
response = client.get(
"/api/v1/remote/hashicorp-releases-test/terraform-provider-vault/0.28.0/terraform-provider-vault_0.28.0_linux_amd64.zip"
)
assert response.status_code == 200
assert response.headers["X-Artifact-Source"] == "cache"
def test_construct_url_prepends_v1_providers(self, client, patched_deps):
"""Upstream URL for the terraform package type prepends /v1/providers/."""
deps = patched_deps
deps["storage"].exists.return_value = False
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "cached"},
) as mock_fetch:
deps["storage"].download_object.return_value = b'{"versions":[]}'
deps["cache"].is_mutable_file.return_value = True
client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/versions")
called_url = mock_fetch.call_args[0][0]
assert called_url == "https://registry.terraform.io/v1/providers/hashicorp/vault/versions"
def test_versions_cache_miss_fetches_upstream(self, client, patched_deps):
deps = patched_deps
deps["storage"].exists.return_value = False
deps["storage"].download_object.return_value = b'{"versions":[]}'
deps["cache"].is_mutable_file.return_value = True
with patch(
"artifactapi.artifact.proxy.cache_single_artifact",
new_callable=AsyncMock,
return_value={"status": "cached"},
) as mock_fetch:
response = client.get("/api/v1/remote/terraform-registry-test/hashicorp/vault/versions")
mock_fetch.assert_called_once()
assert response.status_code == 200
# ---------------------------------------------------------------------------
# Quarantine (quarantine-test remote: quarantine_new=True, quarantine_days=3)
# ---------------------------------------------------------------------------
+293 -59
View File
@@ -8,11 +8,15 @@ import yaml
from artifactapi.artifact.virtual import (
_HANDLERS,
_entries_to_msgpack_safe,
_get_member_index,
_HelmDumper,
_HelmHandler,
_merge_helm_indexes,
_rewrite_urls,
_VirtualHandler,
_YamlDumperBase,
_YamlLoader,
)
# ---------------------------------------------------------------------------
@@ -66,12 +70,47 @@ entries:
generated: "2023-01-01T00:00:00.000Z"
"""
_INDEX_RELATIVE = b"""\
apiVersion: v1
entries:
rancher:
- name: rancher
version: "2.13.1"
urls:
- rancher-2.13.1.tgz
generated: "2023-01-01T00:00:00.000Z"
"""
_CFG_A = {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}
_CFG_B = {"base_url": "https://charts.example.com", "cache": {"mutable_ttl": 1800}}
def _identity_resolve(data, *args, **kwargs):
return data, None
# ---------------------------------------------------------------------------
# _YamlLoader / _YamlDumperBase — C extension selection
# ---------------------------------------------------------------------------
class TestYamlExtensionSelection:
def test_loader_is_a_class(self):
assert isinstance(_YamlLoader, type)
def test_dumper_base_is_a_class(self):
assert isinstance(_YamlDumperBase, type)
def test_helm_dumper_uses_selected_base(self):
assert issubclass(_HelmDumper, _YamlDumperBase)
def test_c_extensions_used_when_available(self):
try:
assert _YamlLoader is yaml.CSafeLoader
assert _YamlDumperBase is yaml.CDumper
except AttributeError:
assert _YamlLoader is yaml.SafeLoader
assert _YamlDumperBase is yaml.Dumper
def test_loader_can_parse_yaml(self):
result = yaml.load(b"key: value", Loader=_YamlLoader)
assert result == {"key": "value"}
# ---------------------------------------------------------------------------
@@ -135,14 +174,13 @@ class TestHelmHandler:
assert isinstance(msg, str) and len(msg) > 0
def test_merge_returns_bytes(self):
with patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve):
result = self.handler.merge([_INDEX_A], ["member-a"], [_CFG_A], "http://proxy.example.com")
result = self.handler.merge([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com")
assert isinstance(result, bytes)
def test_merge_delegates_to_merge_helm_indexes(self):
with patch("artifactapi.artifact.virtual._merge_helm_indexes", return_value=b"merged") as mock_fn:
result = self.handler.merge([b"data"], ["m"], [{}], "http://proxy")
mock_fn.assert_called_once_with([b"data"], ["m"], [{}], "http://proxy")
result = self.handler.merge([b"data"], [None], ["m"], [{}], "http://proxy")
mock_fn.assert_called_once_with([b"data"], [None], ["m"], [{}], "http://proxy")
assert result == b"merged"
@@ -160,6 +198,41 @@ class TestHandlersRegistry:
assert isinstance(_HANDLERS["helm"], _VirtualHandler)
# ---------------------------------------------------------------------------
# _rewrite_urls
# ---------------------------------------------------------------------------
class TestRewriteUrls:
def _rewrite(self, urls, base_url="https://upstream.example.com", proxy_base="http://proxy.example.com", member_name="my-remote"):
return _rewrite_urls(urls, base_url, proxy_base, member_name)
def test_absolute_url_matching_base_is_rewritten(self):
result = self._rewrite(["https://upstream.example.com/chart-1.0.0.tgz"])
assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]
def test_relative_url_is_prepended_with_proxy_remote(self):
result = self._rewrite(["chart-1.0.0.tgz"])
assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]
def test_relative_url_with_leading_slash(self):
result = self._rewrite(["/chart-1.0.0.tgz"])
assert result == ["http://proxy.example.com/api/v1/remote/my-remote/chart-1.0.0.tgz"]
def test_absolute_url_not_matching_base_is_unchanged(self):
result = self._rewrite(["https://other.example.com/chart-1.0.0.tgz"])
assert result == ["https://other.example.com/chart-1.0.0.tgz"]
def test_empty_url_list_returns_empty(self):
assert self._rewrite([]) == []
def test_multiple_urls_all_rewritten(self):
urls = ["https://upstream.example.com/a-1.0.0.tgz", "b-2.0.0.tgz"]
result = self._rewrite(urls)
assert result[0] == "http://proxy.example.com/api/v1/remote/my-remote/a-1.0.0.tgz"
assert result[1] == "http://proxy.example.com/api/v1/remote/my-remote/b-2.0.0.tgz"
# ---------------------------------------------------------------------------
# _merge_helm_indexes
# ---------------------------------------------------------------------------
@@ -167,8 +240,7 @@ class TestHandlersRegistry:
class TestMergeHelmIndexes:
def _merge(self, raw_indexes, member_names, member_configs, proxy_base="http://proxy.example.com"):
with patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve):
return _merge_helm_indexes(raw_indexes, member_names, member_configs, proxy_base)
return _merge_helm_indexes(raw_indexes, [None] * len(raw_indexes), member_names, member_configs, proxy_base)
def _parse(self, raw):
return yaml.safe_load(raw)
@@ -187,7 +259,18 @@ class TestMergeHelmIndexes:
def test_first_member_wins_on_duplicate_name_and_version(self):
index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
v027 = next(e for e in index["entries"]["vault"] if e["version"] == "0.27.0")
assert "helm.releases.hashicorp.com" in v027["urls"][0]
assert "member-a" in v027["urls"][0]
def test_absolute_urls_rewritten_to_proxy(self):
index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
url = index["entries"]["vault"][0]["urls"][0]
assert url == "http://proxy.example.com/api/v1/remote/member-a/vault-0.27.0.tgz"
def test_relative_urls_rewritten_to_proxy(self):
cfg = {"base_url": "https://releases.rancher.com/server-charts/stable", "cache": {"mutable_ttl": 3600}}
index = self._parse(self._merge([_INDEX_RELATIVE], ["rancher-stable"], [cfg]))
url = index["entries"]["rancher"][0]["urls"][0]
assert url == "http://proxy.example.com/api/v1/remote/rancher-stable/rancher-2.13.1.tgz"
def test_different_versions_of_same_chart_both_included(self):
index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
@@ -260,7 +343,7 @@ class TestGetMemberIndex:
storage.exists.return_value = True
cache.is_index_valid.return_value = True
_, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
_, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == b"cached bytes"
@@ -283,7 +366,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response(b"fresh bytes")
_, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
_, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == b"fresh bytes"
@@ -293,7 +376,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response()
_, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
_, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == b"upstream bytes"
@@ -352,7 +435,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.side_effect = Exception("connection refused")
_, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
_, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data is None
@@ -364,7 +447,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response()
_, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
_, _, _, raw_data, _ = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == b"upstream bytes"
@@ -375,7 +458,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response()
_, _, ttl, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
_, _, ttl, _, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
assert ttl == 900
@@ -386,7 +469,7 @@ class TestGetMemberIndex:
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response()
_, _, ttl, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
_, _, ttl, _, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
assert ttl == 3600
@@ -430,10 +513,10 @@ class TestVirtualRoute:
response = client.get("/api/v1/virtual/no-such-virtual/index.yaml")
assert response.status_code == 404
def test_non_virtual_type_returns_400(self, client, patched_virtual_deps):
# helm-test is type "remote", not "virtual"
def test_non_virtual_name_returns_404(self, client, patched_virtual_deps):
# helm-test is in remotes, not virtuals
response = client.get("/api/v1/virtual/helm-test/index.yaml")
assert response.status_code == 400
assert response.status_code == 404
def test_unsupported_package_returns_400(self, client, patched_virtual_deps):
# unsupported-virtual-test has package "rpm"
@@ -484,22 +567,16 @@ class TestVirtualRoute:
mock_get.assert_not_called()
def test_cache_miss_returns_200_with_yaml_content_type(self, client, patched_virtual_deps):
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 200
assert "text/yaml" in response.headers["content-type"]
def test_cache_miss_response_contains_merged_entries(self, client, patched_virtual_deps):
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
index = yaml.safe_load(response.content)
@@ -507,35 +584,26 @@ class TestVirtualRoute:
def test_cache_miss_stores_result_in_s3(self, client, patched_virtual_deps):
deps = patched_virtual_deps
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
deps["storage"].upload.assert_called_once()
def test_cache_miss_marks_index_cached(self, client, patched_virtual_deps):
deps = patched_virtual_deps
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
deps["cache"].mark_index_cached.assert_called_once()
def test_cache_miss_uses_min_ttl_across_members(self, client, patched_virtual_deps):
deps = patched_virtual_deps
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.side_effect = [
("helm-test", _CFG_A, 3600, _INDEX_SIMPLE),
("helm-member-2", _CFG_B, 1800, _INDEX_SIMPLE),
("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None),
("helm-member-2", _CFG_B, 1800, _INDEX_SIMPLE, None),
]
client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
@@ -544,19 +612,16 @@ class TestVirtualRoute:
def test_all_members_unreachable_returns_502(self, client, patched_virtual_deps):
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, None)
mock_get.return_value = ("helm-test", _CFG_A, 3600, None, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 502
def test_one_member_unreachable_still_returns_200(self, client, patched_virtual_deps):
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.side_effect = [
("helm-test", _CFG_A, 3600, _INDEX_SIMPLE),
("helm-member-2", _CFG_B, 1800, None),
("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None),
("helm-member-2", _CFG_B, 1800, None, None),
]
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
@@ -572,10 +637,9 @@ class TestVirtualRoute:
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
patch.object(main_mod.config, "get_remote_config", side_effect=patched_get),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
# only helm-test was available — should succeed
@@ -586,11 +650,181 @@ class TestVirtualRoute:
deps = patched_virtual_deps
deps["storage"].upload.side_effect = Exception("S3 write error")
with (
patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
):
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE, None)
response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
assert response.status_code == 200
# ---------------------------------------------------------------------------
# _entries_to_msgpack_safe
# ---------------------------------------------------------------------------
class TestEntriesToMsgpackSafe:
def test_plain_string_values_pass_through(self):
entries = {"chart": [{"name": "chart", "version": "1.0.0", "urls": ["http://x/c.tgz"]}]}
result = _entries_to_msgpack_safe(entries)
assert result["chart"][0]["version"] == "1.0.0"
def test_datetime_converted_to_iso_string(self):
dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": dt}]}
result = _entries_to_msgpack_safe(entries)
assert isinstance(result["chart"][0]["created"], str)
assert "2023-06-15" in result["chart"][0]["created"]
def test_date_converted_to_iso_string(self):
entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": date(2023, 6, 15)}]}
result = _entries_to_msgpack_safe(entries)
assert result["chart"][0]["created"] == "2023-06-15"
def test_empty_entries_returns_empty_dict(self):
assert _entries_to_msgpack_safe({}) == {}
def test_multiple_versions_all_converted(self):
dt = datetime(2023, 1, 1, tzinfo=UTC)
entries = {
"chart": [
{"name": "chart", "version": "1.0.0", "created": dt},
{"name": "chart", "version": "2.0.0", "created": dt},
]
}
result = _entries_to_msgpack_safe(entries)
for v in result["chart"]:
assert isinstance(v["created"], str)
def test_result_is_msgpack_serializable(self):
import msgpack
dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
entries = {"chart": [{"name": "chart", "version": "1.0.0", "created": dt, "urls": ["http://x/c.tgz"]}]}
safe = _entries_to_msgpack_safe(entries)
packed = msgpack.packb(safe, use_bin_type=True)
unpacked = msgpack.unpackb(packed, raw=False)
assert unpacked["chart"][0]["created"] == safe["chart"][0]["created"]
# ---------------------------------------------------------------------------
# _merge_helm_indexes — pre-parsed entries path
# ---------------------------------------------------------------------------
class TestMergeHelmIndexesWithParsed:
"""Verify that pre-parsed entries (from msgpack) produce the same output as raw YAML."""
def _parse_entries(self, raw: bytes) -> dict:
index = yaml.safe_load(raw)
return index.get("entries") or {}
def test_parsed_entries_produce_same_charts_as_raw(self):
parsed = self._parse_entries(_INDEX_A)
raw_result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com"))
parsed_result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [parsed], ["member-a"], [_CFG_A], "http://proxy.example.com"))
assert set(raw_result["entries"].keys()) == set(parsed_result["entries"].keys())
def test_parsed_entries_urls_are_rewritten(self):
parsed = self._parse_entries(_INDEX_A)
result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [parsed], ["member-a"], [_CFG_A], "http://proxy.example.com"))
url = result["entries"]["vault"][0]["urls"][0]
assert "member-a" in url
assert "proxy.example.com" in url
def test_none_parsed_falls_back_to_raw_bytes(self):
result = yaml.safe_load(_merge_helm_indexes([_INDEX_A], [None], ["member-a"], [_CFG_A], "http://proxy.example.com"))
assert "vault" in result["entries"]
def test_mixed_parsed_and_raw_merge_correctly(self):
parsed_a = self._parse_entries(_INDEX_A)
result = yaml.safe_load(
_merge_helm_indexes(
[_INDEX_A, _INDEX_B],
[parsed_a, None],
["member-a", "member-b"],
[_CFG_A, _CFG_B],
"http://proxy.example.com",
)
)
assert "vault" in result["entries"]
assert "nginx" in result["entries"]
# ---------------------------------------------------------------------------
# _get_member_index — msgpack cache behaviour
# ---------------------------------------------------------------------------
class TestGetMemberIndexMsgpack:
@pytest.fixture
def storage(self):
m = MagicMock()
m.get_object_key.side_effect = lambda name, path: f"{name}/{path}"
m.exists.return_value = False
m.download_object.return_value = _INDEX_SIMPLE
return m
@pytest.fixture
def cache(self):
m = MagicMock()
m.is_index_valid.return_value = False
return m
@pytest.fixture
def member_cfg(self):
return {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}
def _fake_response(self, content=_INDEX_SIMPLE):
r = MagicMock()
r.content = content
r.raise_for_status = MagicMock()
return r
async def test_cache_hit_with_msgpack_returns_parsed_entries(self, storage, cache, member_cfg):
import msgpack
entries = {"mychart": [{"name": "mychart", "version": "1.0.0", "urls": ["http://x/c.tgz"]}]}
packed = msgpack.packb(entries, use_bin_type=True)
storage.exists.side_effect = lambda key: True
cache.is_index_valid.return_value = True
storage.download_object.side_effect = lambda key: packed if key.endswith("index.msgpack") else _INDEX_SIMPLE
_, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert parsed == entries
async def test_cache_miss_builds_msgpack_and_returns_parsed(self, storage, cache, member_cfg):
with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
mock_client = AsyncMock()
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = self._fake_response()
_, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == _INDEX_SIMPLE
assert isinstance(parsed, dict)
assert "mychart" in parsed
async def test_broken_msgpack_rebuilds_from_raw_yaml(self, storage, cache, member_cfg):
storage.exists.side_effect = lambda key: True
cache.is_index_valid.return_value = True
storage.download_object.side_effect = lambda key: b"not-valid-msgpack" if key.endswith("index.msgpack") else _INDEX_SIMPLE
_, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data == _INDEX_SIMPLE
# Falls back to YAML parse and rebuilds msgpack — entries are returned
assert isinstance(parsed, dict)
assert "mychart" in parsed
async def test_upstream_failure_returns_none_for_both(self, storage, cache, member_cfg):
with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
mock_client = AsyncMock()
mock_cls.return_value.__aenter__.return_value = mock_client
mock_client.get.side_effect = Exception("timeout")
_, _, _, raw_data, parsed = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
assert raw_data is None
assert parsed is None