Compare commits


7 Commits

Author SHA1 Message Date
unkinben 8adcbac405 Merge pull request 'feat: add helm chart repository caching proxy' (#17) from benvin/helm-remote into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #17
2026-04-27 22:22:36 +10:00
unkinben 4ca89b9159 feat: add helm chart repository caching proxy
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add helm package type with index.yaml as mutable (TTL-based) and
  .tgz chart tarballs as immutable
- Rewrite chart URLs in index.yaml to serve tarballs via proxy cache
- Add text/yaml content-type detection for .yaml/.yml files
- Add hashicorp-helm example remote in remotes.yaml
- Update README with Helm chart repository proxy section
- Add tests for helm mutable patterns and route behaviour
2026-04-27 22:17:31 +10:00
unkinben 25b85ddc92 Merge pull request 'feat: add npm registry caching proxy' (#16) from benvin/npm-remote into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #16
2026-04-27 20:30:18 +10:00
unkinben d585ab425c feat: add npm remote type with metadata URL rewriting and caching
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add `npm` package type to config with no built-in mutable defaults;
  users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and
  immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball
  downloads pass through the same proxy remote instead of hitting
  npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since
  both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config,
  client setup (npm/yarn/pnpm), rewriting behaviour, and
  mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type,
  scoped packages, cache miss, and tarball immutability
2026-04-27 20:28:31 +10:00
unkinben 6b1a6c9eb4 Merge pull request 'feat: add PyPI remote type with URL rewriting and basic auth' (#15) from benvin/pypi-remote into master
Reviewed-on: #15
2026-04-27 14:46:27 +10:00
unkinben 5de912db75 docs: describe PyPI remote usage with uv system/user uv.toml
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2026-04-27 14:37:41 +10:00
unkinben 8e9d313892 feat: add pypi remote type with URL rewriting and basic auth
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add 'pypi' package type to config.py; simple/ paths are mutable by default
- Refactor content-type detection into _get_content_type() helper; add .whl
- Add _resolve_content() which rewrites files host URLs in simple index HTML
  to go through the proxy (pypi_files_url / pypi_files_remote config keys),
  and returns text/html content-type for simple index responses
- Add basic auth support for non-Docker remotes (username + password/token
  in remote config); thread auth through _upstream_reachable and
  check_upstream_changed so mutable TTL checks also authenticate
- Add 'pypi' remote (pypi.org simple index) and 'pypi-files' remote
  (files.pythonhosted.org) to remotes.yaml; add 'pypi-gitea' example for
  Gitea package registries where index and files share the same base URL
- Add unit tests: simple index URL rewriting, HTML content-type, .whl/.tar.gz
  content-types, mutable index detection, and immutable pattern enforcement
2026-04-27 14:31:33 +10:00
7 changed files with 744 additions and 46 deletions
+214
@@ -13,6 +13,8 @@ A generic FastAPI-based artifact caching system that downloads and stores files
- **Stale-on-Upstream-Error**: Expired mutable files are kept and their TTL refreshed when the backend cannot be reached, so cached data remains available during upstream outages
- **S3 Storage**: MinIO/S3 backend with predictable paths
- **Docker Registry Proxy**: Full Docker Registry HTTP API v2 for transparent container image caching
- **npm Package Proxy**: Caching proxy for the npm registry with metadata URL rewriting so tarballs also pass through cache
- **Helm Chart Repository Proxy**: Caching proxy for Helm chart repositories with `index.yaml` URL rewriting so chart tarballs also pass through cache
- **Content-Type Detection**: Automatic MIME type detection for downloads
## Architecture
@@ -932,3 +934,215 @@ curl -I https://artifacts.example.com/v2/dockerhub/library/nginx/manifests/lates
# Check what's stored in the cache
curl https://artifacts.example.com/ | jq '.remotes'
```
## Python Package Proxy with uv
The `pypi` package type turns the artifact API into a caching PyPI proxy. Simple index pages (`/simple/{package}/`) are mutable and expire after `mutable_ttl`; package files (wheels, sdists, metadata) are immutable and cached forever. URLs in the simple index HTML are rewritten on the fly to point back through the proxy, so both the index lookup and the file download are served from cache.
### remotes.yaml
```yaml
remotes:
  pypi:
    base_url: "https://pypi.org"
    type: "remote"
    package: "pypi"
    pypi_files_url: "https://files.pythonhosted.org"  # host to rewrite in index HTML
    pypi_files_remote: "pypi-files"                   # our proxy remote to replace it with
    check_mutable_updates: true
    cache:
      immutable_ttl: 0
      mutable_ttl: 600  # re-check simple indexes after 10 minutes
  pypi-files:
    base_url: "https://files.pythonhosted.org"
    type: "remote"
    package: "generic"
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
      - "packages/.*\\.egg$"
    cache:
      immutable_ttl: 0  # package files are content-addressed — cache forever
  # Self-hosted Gitea PyPI registry (index and files share the same base URL)
  pypi-gitea:
    base_url: "https://gitea.example.com/api/packages/myorg/pypi"
    type: "remote"
    package: "pypi"
    # username: "your-gitea-username"
    # password: "your-personal-access-token"  # needs package:read scope
    pypi_files_url: "https://gitea.example.com/api/packages/myorg/pypi"
    pypi_files_remote: "pypi-gitea"  # point back to itself — Gitea serves both index and files
    check_mutable_updates: true
    immutable_patterns:
      - "files/.*\\.whl$"
      - "files/.*\\.whl\\.metadata$"
      - "files/.*\\.tar\\.gz$"
      - "files/.*\\.zip$"
      - "files/.*\\.egg$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
```
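On authenticated registries such as the `pypi-gitea` example, uncommenting `username` and `password` makes the proxy send HTTP Basic credentials upstream. This applies both to downloads and to the HEAD checks used when a mutable entry's TTL expires. The sketch below shows how such a header can be derived from a remote's config. The function name `basic_auth_header` and the credentials are illustrative; they mirror the `_basic_auth_header` helper added in this change rather than quoting it.

```python
import base64


def basic_auth_header(remote_cfg: dict) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from a remote's config.

    Returns an empty dict when either credential is missing, so callers can
    merge the result into their request headers unconditionally.
    """
    username = remote_cfg.get("username")
    password = remote_cfg.get("password")
    if not (username and password):
        return {}
    # Basic auth is "user:password" encoded as base64 (not encrypted).
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header({"username": "ci-bot", "password": "pat-123"})
```

Because base64 is an encoding rather than encryption, the upstream registry should be reached over HTTPS when credentials are configured.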
### Configuring uv system- or user-wide
uv reads `uv.toml` from two locations outside any project, applied in order from broadest to narrowest scope:
| Scope | Path (Linux/macOS) |
|---|---|
| System | `/etc/uv/uv.toml` |
| User | `~/.config/uv/uv.toml` |
Use these files to route **all** package installs on a machine through the proxy without touching individual projects or their `pyproject.toml`.
**`/etc/uv/uv.toml`** — applies to every user on the host:
```toml
# Replace the default PyPI index with the caching proxy
[[index]]
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
default = true
# Optionally add a private index (uv consults named indexes before the default one)
[[index]]
url = "https://artifacts.example.com/api/v1/remote/pypi-gitea/simple"
name = "gitea"
```
**`~/.config/uv/uv.toml`** — same syntax, single-user scope:
```toml
[[index]]
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
default = true
```
Setting `default = true` replaces uv's built-in PyPI index. The first install of a package fetches it from upstream and populates the cache; every subsequent install — from any machine or fresh environment pointing at the same proxy — is served directly from S3.
### How the rewriting works
When uv requests the simple index for a package, the proxy:
1. Fetches `https://pypi.org/simple/{package}/` (or returns a valid cached copy within `mutable_ttl`)
2. Rewrites every `https://files.pythonhosted.org/...` href to `https://artifacts.example.com/api/v1/remote/pypi-files/...`
3. Returns the rewritten HTML to uv
uv then downloads wheels and `.whl.metadata` files via the rewritten URLs, which also pass through the proxy and are cached as immutable artifacts.
For self-hosted registries like Gitea, both the index and the file downloads share the same base URL. Setting `pypi_files_url` to the remote's own `base_url` and `pypi_files_remote` to the remote's own name causes file links to be rewritten back through the same proxy entry.
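The rewrite is a plain byte-level prefix substitution over the cached HTML, not an HTML parse. The sketch below reproduces that behaviour. It assumes the proxy is reachable at `https://artifacts.example.com`; in the service itself, this base comes from the incoming request's URL. `rewrite_simple_index` is a hypothetical name. The same function covers the Gitea case, because `files_remote` may name the index remote itself.

```python
def rewrite_simple_index(
    html: bytes, files_url: str, files_remote: str, proxy_base: str
) -> bytes:
    """Point every upstream file URL in a simple-index page at the proxy.

    The upstream files prefix (trailing slash stripped) is swapped for
    {proxy_base}/api/v1/remote/{files_remote}. Everything after the prefix,
    including any #sha256=... fragment, is left untouched.
    """
    upstream = files_url.rstrip("/").encode()
    proxied = f"{proxy_base.rstrip('/')}/api/v1/remote/{files_remote}".encode()
    return html.replace(upstream, proxied)


# A trimmed, illustrative simple-index link (hash value is made up).
page = (
    b'<a href="https://files.pythonhosted.org/packages/ab/cd/'
    b'requests-2.31.0-py3-none-any.whl#sha256=abc">requests-2.31.0</a>'
)
rewritten = rewrite_simple_index(
    page, "https://files.pythonhosted.org", "pypi-files", "https://artifacts.example.com"
)
```

Because this is a raw substring replace, every occurrence of the prefix on the page is rewritten, not only those inside `href` attributes.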
## npm Package Proxy
The `npm` package type turns the artifact API into a caching npm registry proxy. Since the npm registry serves both metadata and tarballs from the same host, a single remote handles everything. Package metadata (e.g. `GET /express`) is mutable and expires after `mutable_ttl`; tarballs (`.tgz`) are immutable and cached forever. `dist.tarball` URLs in metadata JSON are rewritten on the fly to point back through the same remote, so both the metadata lookup and the tarball download are served from cache.
### remotes.yaml
```yaml
remotes:
  npm:
    base_url: "https://registry.npmjs.org"
    type: "remote"
    package: "npm"
    npm_files_url: "https://registry.npmjs.org"  # URL prefix to rewrite in metadata JSON
    npm_files_remote: "npm"                      # rewrite back to this same remote
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"  # versioned tarballs are content-addressed — cache forever
    mutable_patterns:
      - "^(?!.*\\.tgz$).*"  # everything else (package metadata) expires after mutable_ttl
    cache:
      immutable_ttl: 0
      mutable_ttl: 600  # re-check package metadata after 10 minutes
```
### Configuring npm / yarn / pnpm
**npm** — per-project `.npmrc` or `~/.npmrc`:
```ini
registry=https://artifacts.example.com/api/v1/remote/npm/
```
**yarn** (v2+) in `~/.yarnrc.yml`:
```yaml
npmRegistryServer: "https://artifacts.example.com/api/v1/remote/npm/"
```
**pnpm** in `.npmrc`:
```ini
registry=https://artifacts.example.com/api/v1/remote/npm/
```
### How the rewriting works
When a client requests package metadata, the proxy:
1. Fetches `https://registry.npmjs.org/{package}` (or returns a cached copy within `mutable_ttl`)
2. Rewrites every `https://registry.npmjs.org/...` tarball URL to `https://artifacts.example.com/api/v1/remote/npm/...`
3. Returns the rewritten JSON to the client
The client then downloads the tarball via the rewritten URL, which hits the same `npm` remote and is cached as an immutable artifact. Subsequent installs of the same package version are served entirely from S3.
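To confirm that rewriting took effect, you can walk the `versions` object of a metadata document and check that every `dist.tarball` starts with the proxy prefix. The helper below is a hypothetical sketch. It works on already-fetched bytes, so it makes no network calls, and it assumes the standard npm metadata shape (`versions.{version}.dist.tarball`). The proxy prefix is illustrative.

```python
import json


def foreign_tarballs(metadata: bytes, proxy_prefix: str) -> list[str]:
    """Return tarball URLs in npm metadata JSON that do not point at the proxy.

    An empty list means every version's dist.tarball was rewritten.
    """
    doc = json.loads(metadata)
    leaked = []
    for version in doc.get("versions", {}).values():
        tarball = version.get("dist", {}).get("tarball", "")
        if tarball and not tarball.startswith(proxy_prefix):
            leaked.append(tarball)
    return leaked


prefix = "https://artifacts.example.com/api/v1/remote/npm/"
rewritten = json.dumps({
    "name": "express",
    "versions": {
        "4.18.2": {"dist": {"tarball": prefix + "express/-/express-4.18.2.tgz"}},
    },
}).encode()
leaks = foreign_tarballs(rewritten, prefix)  # [] when rewriting worked
```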
### Mutable vs immutable paths
| Path pattern | Type | Example |
|---|---|---|
| `/{package}` | Mutable (TTL) | `/express` |
| `/@{scope}/{package}` | Mutable (TTL) | `/@babel/core` |
| `/-/all` | Mutable (TTL) | `/-/all` |
| `/{package}/-/{package}-{version}.tgz` | Immutable (forever) | `/express/-/express-4.18.2.tgz` |
| `/@{scope}/{pkg}/-/{pkg}-{ver}.tgz` | Immutable (forever) | `/@babel/core/-/core-7.21.0.tgz` |
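The two patterns in the npm `remotes.yaml` example split every request path into two groups: anything ending in `.tgz` is immutable, and everything else is mutable. The sketch below checks the table's example paths against those patterns. It assumes, as the test suite does, that patterns are applied with `re.search` to the path relative to the remote. `classify` is an illustrative name, not part of the API.

```python
import re

# Patterns copied from the npm remote example in remotes.yaml.
IMMUTABLE = [r"\.tgz$"]
MUTABLE = [r"^(?!.*\.tgz$).*"]


def classify(path: str) -> str:
    """Label a request path the way the npm remote's patterns would."""
    if any(re.search(p, path) for p in IMMUTABLE):
        return "immutable"
    if any(re.search(p, path) for p in MUTABLE):
        return "mutable"
    # Unreachable with these two complementary patterns; on remotes that
    # define immutable_patterns, the tests expect such paths to get a 403.
    return "unmatched"


for path in ["express", "@babel/core", "-/all",
             "express/-/express-4.18.2.tgz", "@babel/core/-/core-7.21.0.tgz"]:
    print(f"{path:32} {classify(path)}")
```

The two patterns are complementary, so checking the immutable list first or the mutable list first gives the same result.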
## Helm Chart Repository Proxy
The `helm` package type turns the artifact API into a caching Helm chart repository proxy. A single remote handles both the mutable `index.yaml` and the immutable versioned chart tarballs, since they are served from the same upstream host. Chart URLs inside `index.yaml` are rewritten on the fly to point back through the same remote, so both the index lookup and the chart download are served from cache.
### remotes.yaml
```yaml
remotes:
  hashicorp-helm:
    base_url: "https://helm.releases.hashicorp.com"
    type: "remote"
    package: "helm"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"  # chart tarballs — cache forever
    cache:
      immutable_ttl: 0
      mutable_ttl: 3600  # index.yaml refreshed after 1 hour
```
### Configuring Helm
Point Helm at the proxy with `helm repo add`:
```bash
helm repo add hashicorp https://artifacts.example.com/api/v1/remote/hashicorp-helm
helm repo update
helm search repo hashicorp/vault
helm install vault hashicorp/vault
```
### How the rewriting works
When a client requests `index.yaml`, the proxy:
1. Fetches `https://helm.releases.hashicorp.com/index.yaml` (or returns a cached copy within `mutable_ttl`)
2. Rewrites every `https://helm.releases.hashicorp.com/...` chart URL to `https://artifacts.example.com/api/v1/remote/hashicorp-helm/...`
3. Returns the rewritten YAML to the client
The client then downloads chart tarballs via the rewritten URLs, which hit the same `hashicorp-helm` remote and are cached as immutable artifacts. Subsequent installs of the same chart version are served entirely from S3.
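The Helm remote has no separate files-URL setting; the substitution prefix is the remote's own `base_url`. As a result, absolute chart URLs that point at any other host are left unchanged, and those downloads bypass the cache. The minimal sketch below shows this. It uses a hypothetical function name, a trimmed `index.yaml` fragment, and an invented second host (`charts.example.org`).

```python
def rewrite_helm_index(
    index_yaml: bytes, base_url: str, proxy_base: str, remote_name: str
) -> bytes:
    """Replace the remote's own base_url with its proxy path in index.yaml."""
    upstream = base_url.rstrip("/").encode()
    proxied = f"{proxy_base.rstrip('/')}/api/v1/remote/{remote_name}".encode()
    return index_yaml.replace(upstream, proxied)


index = b"""apiVersion: v1
entries:
  vault:
  - version: 0.29.1
    urls:
    - https://helm.releases.hashicorp.com/vault-0.29.1.tgz
  other:
  - version: 1.0.0
    urls:
    - https://charts.example.org/other-1.0.0.tgz
"""
rewritten = rewrite_helm_index(
    index, "https://helm.releases.hashicorp.com",
    "https://artifacts.example.com", "hashicorp-helm",
)
# vault now points at the proxy; the charts.example.org URL is left as-is.
```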
### Mutable vs immutable paths
| Path | Type | Example |
|---|---|---|
| `index.yaml` | Mutable (TTL) | `index.yaml` |
| `{chart}-{version}.tgz` | Immutable (forever) | `vault-0.29.1.tgz` |
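Combined with the `cache` settings in the Helm example, the table amounts to a freshness rule. Chart tarballs with `immutable_ttl: 0` never expire, while `index.yaml` is revalidated once `mutable_ttl` seconds have passed. The sketch below expresses that rule as a pure function. The name, the timestamp arguments and the handling of a non-zero `immutable_ttl` are assumptions for illustration, not the proxy's actual bookkeeping. The 3600-second fallback matches the default that `handle_expired_mutable` uses when `mutable_ttl` is unset.

```python
def is_fresh(kind: str, cached_at: float, now: float, cache_cfg: dict) -> bool:
    """Decide whether a cached object can be served without revalidation.

    Hypothetical helper: immutable_ttl == 0 means "never expire", as the
    remotes.yaml comments state. Any kind other than "immutable" is treated
    as mutable and stays fresh for mutable_ttl seconds after caching.
    """
    age = now - cached_at
    if kind == "immutable":
        ttl = cache_cfg.get("immutable_ttl", 0)
        return ttl == 0 or age < ttl
    return age < cache_cfg.get("mutable_ttl", 3600)


helm_cache = {"immutable_ttl": 0, "mutable_ttl": 3600}
```

When `index.yaml` is no longer fresh, the proxy revalidates it upstream. If the upstream cannot be reached, the stale copy is served and its TTL is extended, as described under Stale-on-Upstream-Error.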
+79
@@ -194,6 +194,85 @@ remotes:
      immutable_ttl: 0
      mutable_ttl: 300
  pypi:
    base_url: "https://pypi.org"
    type: "remote"
    package: "pypi"
    description: "Python Package Index — simple repository API"
    # pypi_files_url: the upstream host used in simple-index hrefs (default: files.pythonhosted.org)
    # pypi_files_remote: our proxy remote that will serve those files (default: pypi-files)
    pypi_files_url: "https://files.pythonhosted.org"
    pypi_files_remote: "pypi-files"
    check_mutable_updates: true
    cache:
      immutable_ttl: 0
      mutable_ttl: 600  # Simple index pages refreshed after 10 minutes
  pypi-gitea:
    base_url: "https://gitea.example.com/api/packages/myorg/pypi"
    type: "remote"
    package: "pypi"
    description: "Private Gitea PyPI registry"
    # username: "your-gitea-username"
    # password: "your-personal-access-token"  # needs package:read scope
    # Files are served from the same Gitea instance — rewrite back to this same remote
    pypi_files_url: "https://gitea.example.com/api/packages/myorg/pypi"
    pypi_files_remote: "pypi-gitea"
    check_mutable_updates: true
    immutable_patterns:
      - "files/.*\\.whl$"
      - "files/.*\\.whl\\.metadata$"
      - "files/.*\\.tar\\.gz$"
      - "files/.*\\.zip$"
      - "files/.*\\.egg$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
  pypi-files:
    base_url: "https://files.pythonhosted.org"
    type: "remote"
    package: "generic"
    description: "Python Package Index — file storage (wheels, sdists)"
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
      - "packages/.*\\.egg$"
    cache:
      immutable_ttl: 0  # Package files are content-addressed — cache forever
  npm:
    base_url: "https://registry.npmjs.org"
    type: "remote"
    package: "npm"
    description: "npm registry — package metadata with tarball URL rewriting"
    # npm_files_url: the upstream host used in metadata tarball hrefs (default: https://registry.npmjs.org)
    # npm_files_remote: our proxy remote that will serve those tarballs (default: npm-files)
    npm_files_url: "https://registry.npmjs.org"
    npm_files_remote: "npm"
    check_mutable_updates: true
    immutable_patterns:
      - \.tgz$
    mutable_patterns:
      - ^(?!.*\.tgz$).*
    cache:
      immutable_ttl: 0
      mutable_ttl: 600  # Package metadata refreshed after 10 minutes
  hashicorp-helm:
    base_url: "https://helm.releases.hashicorp.com"
    type: "remote"
    package: "helm"
    description: "HashiCorp Helm chart repository (Vault, Consul, Nomad, etc.)"
    check_mutable_updates: true
    immutable_patterns:
      - "\\.tgz$"
    cache:
      immutable_ttl: 0  # Chart tarballs are versioned — cache forever
      mutable_ttl: 3600  # index.yaml refreshed after 1 hour
  local-generic:
    type: "local"
    package: "generic"
+7
@@ -18,6 +18,13 @@ _PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
        r"/manifests/(?!sha256:)[^/]+$",
        r"/tags/list$",
    ],
    "pypi": [
        r"simple/",  # Per-package and top-level simple index pages
    ],
    "npm": [],
    "helm": [
        r"index\.yaml$",
    ],
    "generic": [],
}
+83 -45
@@ -1,3 +1,4 @@
import base64
import hashlib
import json
import logging
@@ -208,8 +209,11 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
remote_config = config.get_remote_config(remote_name) or {}
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
# Prepare headers for Docker registry requests
# Prepare headers
headers = {}
username = remote_config.get("username")
password = remote_config.get("password")
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
@@ -220,6 +224,8 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
elif username and password:
headers["Authorization"] = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url, headers=headers)
@@ -254,11 +260,20 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
return {"url": url, "status": "error", "error": str(e)}
async def _upstream_reachable(url: str) -> bool:
def _basic_auth_header(remote_cfg: dict) -> dict[str, str]:
    username = remote_cfg.get("username")
    password = remote_cfg.get("password")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    return {}
async def _upstream_reachable(url: str, auth_headers: dict | None = None) -> bool:
"""HEAD with a short timeout. Returns False only on network/timeout errors."""
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
await client.head(url, timeout=10.0)
await client.head(url, headers=auth_headers or {}, timeout=10.0)
return True
except (httpx.NetworkError, httpx.TimeoutException):
return False
@@ -266,19 +281,19 @@ async def _upstream_reachable(url: str) -> bool:
return True # 4xx/5xx means backend is up
async def check_upstream_changed(remote_url: str, remote_name: str, path: str) -> bool:
async def check_upstream_changed(remote_url: str, remote_name: str, path: str, auth_headers: dict | None = None) -> bool:
"""Conditional HEAD against upstream. Returns False only on a definitive 304.
Raises UpstreamUnreachable if the backend cannot be contacted."""
meta = cache.get_mutable_meta(remote_name, path)
if not meta:
return True
headers = {}
headers = dict(auth_headers or {})
if meta.get("etag"):
headers["If-None-Match"] = meta["etag"]
if meta.get("last_modified"):
headers["If-Modified-Since"] = meta["last_modified"]
if not headers:
if not (meta.get("etag") or meta.get("last_modified")):
return True
try:
@@ -294,12 +309,13 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
remote_cfg = config.get_remote_config(remote_name) or {}
auth = _basic_auth_header(remote_cfg)
check_updates = remote_cfg.get("check_mutable_updates", False)
user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
if user_mutable:
try:
changed = await check_upstream_changed(remote_url, remote_name, path)
changed = await check_upstream_changed(remote_url, remote_name, path, auth)
except UpstreamUnreachable:
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
@@ -310,7 +326,7 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
return True
logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
else:
if not await _upstream_reachable(remote_url):
if not await _upstream_reachable(remote_url, auth):
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
return True
@@ -320,8 +336,64 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
return False
def _get_content_type(filename: str) -> str:
    if filename.endswith((".tar.gz", ".tgz")):
        return "application/gzip"
    if filename.endswith(".zip") or filename.endswith(".whl"):
        return "application/zip"
    if filename.endswith(".exe"):
        return "application/x-msdownload"
    if filename.endswith(".rpm"):
        return "application/x-rpm"
    if filename.endswith(".xml"):
        return "application/xml"
    if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
        return "application/gzip"
    if filename.endswith((".yaml", ".yml")):
        return "text/yaml"
    return "application/octet-stream"


def _resolve_content(
    data: bytes,
    path: str,
    filename: str,
    remote_config: dict,
    request: Request,
    remote_name: str = "",
) -> tuple[bytes, str]:
    """Return (possibly-rewritten data, content_type) for a cached artifact."""
    if remote_config.get("package") == "pypi" and "simple/" in path:
        files_url = remote_config.get("pypi_files_url", "https://files.pythonhosted.org")
        files_remote = remote_config.get("pypi_files_remote", "pypi-files")
        proxy_base = str(request.base_url).rstrip("/")
        data = data.replace(
            files_url.rstrip("/").encode(),
            f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
        )
        return data, "text/html; charset=utf-8"
    if remote_config.get("package") == "npm" and not path.endswith(".tgz"):
        files_url = remote_config.get("npm_files_url", "https://registry.npmjs.org")
        files_remote = remote_config.get("npm_files_remote", "npm-files")
        proxy_base = str(request.base_url).rstrip("/")
        data = data.replace(
            files_url.rstrip("/").encode(),
            f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
        )
        return data, "application/json"
    if remote_config.get("package") == "helm" and filename == "index.yaml":
        proxy_base = str(request.base_url).rstrip("/")
        base_url = remote_config.get("base_url", "").rstrip("/")
        data = data.replace(
            base_url.encode(),
            f"{proxy_base}/api/v1/remote/{remote_name}".encode(),
        )
        return data, "text/yaml"
    return data, _get_content_type(filename)


@app.get("/api/v1/remote/{remote_name}/{path:path}")
async def get_artifact(remote_name: str, path: str):
async def get_artifact(request: Request, remote_name: str, path: str):
# Check if remote is configured
remote_config = config.get_remote_config(remote_name)
if not remote_config:
@@ -384,29 +456,11 @@ async def get_artifact(remote_name: str, path: str):
try:
artifact_data = storage.download_object(cached_key)
filename = os.path.basename(path)
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
# Log cache hit
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
# Determine content type based on file extension
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache hit metrics
metrics.record_cache_hit(remote_name, len(artifact_data))
# Record artifact mapping in database if not already recorded
database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
return Response(
@@ -443,25 +497,9 @@ async def get_artifact(remote_name: str, path: str):
cache_key = storage.get_object_key(remote_name, path)
artifact_data = storage.download_object(cache_key)
filename = os.path.basename(path)
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache miss metrics
metrics.record_cache_miss(remote_name, len(artifact_data))
# Record artifact mapping in database
cache_key = storage.get_object_key(remote_name, path)
database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
+36
@@ -72,6 +72,42 @@ TEST_REMOTES = {
"package": "generic",
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"pypi-test": {
"base_url": "https://pypi.org",
"type": "remote",
"package": "pypi",
"pypi_files_url": "https://files.pythonhosted.org",
"pypi_files_remote": "pypi-files-test",
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"pypi-files-test": {
"base_url": "https://files.pythonhosted.org",
"type": "remote",
"package": "generic",
"immutable_patterns": [
"packages/.*\\.whl$",
"packages/.*\\.whl\\.metadata$",
"packages/.*\\.tar\\.gz$",
],
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"npm-test": {
"base_url": "https://registry.npmjs.org",
"type": "remote",
"package": "npm",
"npm_files_url": "https://registry.npmjs.org",
"npm_files_remote": "npm-test",
"immutable_patterns": [r"\.tgz$"],
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"helm-test": {
"base_url": "https://helm.releases.hashicorp.com",
"type": "remote",
"package": "helm",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
}
}
+52
@@ -133,6 +133,58 @@ class TestGetMutablePatterns:
assert r"repomd\.xml$" in patterns
assert r"custom-meta\.xml$" in patterns

    def test_npm_has_no_package_defaults(self, make_config):
        cfg = make_config({"r": {"type": "remote", "package": "npm", "base_url": "https://x.com"}})
        assert cfg.get_mutable_patterns("r") == []

    def test_npm_explicit_mutable_pattern_matches_metadata(self, make_config):
        import re
        cfg = make_config(
            {
                "r": {
                    "type": "remote",
                    "package": "npm",
                    "base_url": "https://x.com",
                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
                }
            }
        )
        patterns = cfg.get_mutable_patterns("r")
        assert any(re.search(p, "express") for p in patterns)
        assert any(re.search(p, "@babel/core") for p in patterns)

    def test_helm_returns_index_yaml_as_mutable(self, make_config):
        cfg = make_config({"r": {"type": "remote", "package": "helm", "base_url": "https://helm.example.com"}})
        patterns = cfg.get_mutable_patterns("r")
        assert r"index\.yaml$" in patterns

    def test_helm_chart_tarballs_not_mutable_by_default(self, make_config):
        import re
        cfg = make_config({"r": {"type": "remote", "package": "helm", "base_url": "https://helm.example.com"}})
        patterns = cfg.get_mutable_patterns("r")
        # Only index.yaml is mutable; .tgz chart tarballs are not
        assert not any(re.search(p, "vault-0.29.1.tgz") for p in patterns)
        assert not any(re.search(p, "consul-1.5.0.tgz") for p in patterns)

    def test_npm_explicit_mutable_pattern_excludes_tarballs(self, make_config):
        import re
        cfg = make_config(
            {
                "r": {
                    "type": "remote",
                    "package": "npm",
                    "base_url": "https://x.com",
                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
                }
            }
        )
        patterns = cfg.get_mutable_patterns("r")
        assert not any(re.search(p, "express-4.18.2.tgz") for p in patterns)
        assert not any(re.search(p, "express/-/express-4.18.2.tgz") for p in patterns)

# ---------------------------------------------------------------------------
# get_immutable_patterns
+272
@@ -652,3 +652,275 @@ class TestConfigEndpoint:
data = response.json()
assert "remotes" in data
assert "alpine-test" in data["remotes"]
# ---------------------------------------------------------------------------
# PyPI remote /api/v1/remote/pypi-test/...
# ---------------------------------------------------------------------------
class TestPyPIRemote:
    def test_simple_index_is_mutable(self, client, patched_deps):
        """simple/ paths are detected as mutable (package-type default)."""
        deps = patched_deps
        html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = html
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_simple_index_urls_rewritten_to_proxy(self, client, patched_deps):
        """files.pythonhosted.org URLs in a cached simple index are rewritten to our proxy."""
        deps = patched_deps
        html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = html
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
        assert response.status_code == 200
        assert b"files.pythonhosted.org" not in response.content
        assert b"/api/v1/remote/pypi-files-test/packages/requests-2.31.0.tar.gz" in response.content

    def test_simple_index_content_type_is_html(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"<html></html>"
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
        assert response.status_code == 200
        assert "text/html" in response.headers["content-type"]

    def test_simple_index_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        html = b"<html><body><a href='https://files.pythonhosted.org/packages/p-1.0.whl'>...</a></body></html>"
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = html
        deps["cache"].is_mutable_file.return_value = True
        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/pypi-test/simple/requests/")
        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"files.pythonhosted.org" not in response.content

    def test_wheel_file_immutable_returns_correct_content_type(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"PK wheel bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/pypi-files-test/packages/requests-2.31.0-py3-none-any.whl")
        assert response.status_code == 200
        assert "application/zip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_sdist_immutable_returns_correct_content_type(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tar bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/pypi-files-test/packages/requests-2.31.0.tar.gz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]

    def test_blocked_path_on_files_remote_returns_403(self, client, patched_deps):
        """Paths that don't match immutable_patterns on pypi-files-test are blocked."""
        response = client.get("/api/v1/remote/pypi-files-test/packages/requests.unknown")
        assert response.status_code == 403




# ---------------------------------------------------------------------------
# npm remote  /api/v1/remote/npm-test/...
# ---------------------------------------------------------------------------


class TestNpmRemote:
    def test_package_metadata_is_mutable(self, client, patched_deps):
        """Top-level package metadata paths are detected as mutable."""
        deps = patched_deps
        meta = b'{"name":"express","versions":{}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_metadata_tarball_urls_rewritten_to_proxy(self, client, patched_deps):
        """registry.npmjs.org tarball URLs in metadata JSON are rewritten to our proxy."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/express/-/express-4.18.2.tgz" in response.content

    def test_metadata_content_type_is_json(self, client, patched_deps):
        """Package metadata responses are served as application/json."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b'{"name":"express"}'
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert "application/json" in response.headers["content-type"]

    def test_scoped_package_metadata_rewritten(self, client, patched_deps):
        """@scope/package metadata URLs are also rewritten back to the same npm-test remote."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/@babel/core/-/core-7.21.0.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/npm-test/@babel/core")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/@babel/core/-/core-7.21.0.tgz" in response.content

    def test_tarball_not_rewritten(self, client, patched_deps):
        """Tarball requests (.tgz) bypass URL rewriting and return binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b tgz bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_metadata_cache_miss_fetches_upstream(self, client, patched_deps):
        """On a cache miss, cache_single_artifact is called once and tarball URLs are still rewritten."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz"}}'
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/npm-test/lodash")
            mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content

    def test_tarball_immutable_allowed_on_npm_remote(self, client, patched_deps):
        """Tarballs (.tgz) match immutable_patterns and are served without rewriting."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tgz bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]


# ---------------------------------------------------------------------------
# Helm remote  /api/v1/remote/helm-test/...
# ---------------------------------------------------------------------------


class TestHelmRemote:
    def test_index_yaml_is_mutable(self, client, patched_deps):
        """index.yaml is detected as mutable (package-type default)."""
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n vault:\n - urls:\n - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_index_yaml_urls_rewritten_to_proxy(self, client, patched_deps):
        """base_url chart URLs in a cached index.yaml are rewritten to our proxy."""
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n vault:\n - urls:\n - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        assert b"helm.releases.hashicorp.com" not in response.content
        assert b"/api/v1/remote/helm-test/vault-0.29.1.tgz" in response.content

    def test_index_yaml_content_type_is_yaml(self, client, patched_deps):
        """index.yaml is served with a text/yaml content type."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"apiVersion: v1\nentries: {}\n"
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/remote/helm-test/index.yaml")
        assert response.status_code == 200
        assert "text/yaml" in response.headers["content-type"]

    def test_chart_tarball_immutable_returns_gzip_content_type(self, client, patched_deps):
        """Versioned chart tarballs match immutable_patterns and are served as binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b chart bytes"
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/helm-test/vault-0.29.1.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_index_yaml_cache_miss_fetches_upstream(self, client, patched_deps):
        """On a cache miss, cache_single_artifact is called once and chart URLs are still rewritten."""
        deps = patched_deps
        index = b"apiVersion: v1\nentries:\n vault:\n - urls:\n - https://helm.releases.hashicorp.com/vault-0.29.1.tgz\n"
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = index
        deps["cache"].is_mutable_file.return_value = True
        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/helm-test/index.yaml")
            mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"helm.releases.hashicorp.com" not in response.content

    def test_non_tgz_non_yaml_path_blocked_by_pattern(self, client, patched_deps):
        """Paths that don't match immutable_patterns and aren't mutable are blocked."""
        deps = patched_deps
        deps["cache"].is_mutable_file.return_value = False
        response = client.get("/api/v1/remote/helm-test/vault.zip")
        assert response.status_code == 403