Compare commits
5 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 25b85ddc92 | |
| | d585ab425c | |
| | 6b1a6c9eb4 | |
| | 5de912db75 | |
| | 8e9d313892 | |
@@ -13,6 +13,7 @@ A generic FastAPI-based artifact caching system that downloads and stores files
|
|||||||
- **Stale-on-Upstream-Error**: Expired mutable files are kept and their TTL refreshed when the backend cannot be reached, so cached data remains available during upstream outages
|
- **Stale-on-Upstream-Error**: Expired mutable files are kept and their TTL refreshed when the backend cannot be reached, so cached data remains available during upstream outages
|
||||||
- **S3 Storage**: MinIO/S3 backend with predictable paths
|
- **S3 Storage**: MinIO/S3 backend with predictable paths
|
||||||
- **Docker Registry Proxy**: Full Docker Registry HTTP API v2 for transparent container image caching
|
- **Docker Registry Proxy**: Full Docker Registry HTTP API v2 for transparent container image caching
|
||||||
|
- **npm Package Proxy**: Caching proxy for the npm registry with metadata URL rewriting so tarballs also pass through cache
|
||||||
- **Content-Type Detection**: Automatic MIME type detection for downloads
|
- **Content-Type Detection**: Automatic MIME type detection for downloads
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
@@ -931,4 +932,168 @@ curl -I https://artifacts.example.com/v2/dockerhub/library/nginx/manifests/lates
|
|||||||
|
|
||||||
# Check what's stored in the cache
|
# Check what's stored in the cache
|
||||||
curl https://artifacts.example.com/ | jq '.remotes'
|
curl https://artifacts.example.com/ | jq '.remotes'
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Python Package Proxy with uv
|
||||||
|
|
||||||
|
The `pypi` package type turns the artifact API into a caching PyPI proxy. Simple index pages (`/simple/{package}/`) are mutable and expire after `mutable_ttl`; package files (wheels, sdists, metadata) are immutable and cached forever. URLs in the simple index HTML are rewritten on the fly to point back through the proxy, so both the index lookup and the file download are served from cache.
|
||||||
|
|
||||||
|
### remotes.yaml
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
remotes:
|
||||||
|
pypi:
|
||||||
|
base_url: "https://pypi.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "pypi"
|
||||||
|
pypi_files_url: "https://files.pythonhosted.org" # host to rewrite in index HTML
|
||||||
|
pypi_files_remote: "pypi-files" # our proxy remote to replace it with
|
||||||
|
check_mutable_updates: true
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600 # re-check simple indexes after 10 minutes
|
||||||
|
|
||||||
|
pypi-files:
|
||||||
|
base_url: "https://files.pythonhosted.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "generic"
|
||||||
|
immutable_patterns:
|
||||||
|
- "packages/.*\\.whl$"
|
||||||
|
- "packages/.*\\.whl\\.metadata$"
|
||||||
|
- "packages/.*\\.tar\\.gz$"
|
||||||
|
- "packages/.*\\.zip$"
|
||||||
|
- "packages/.*\\.egg$"
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0 # package files are content-addressed — cache forever
|
||||||
|
|
||||||
|
# Self-hosted Gitea PyPI registry (index and files share the same base URL)
|
||||||
|
pypi-gitea:
|
||||||
|
base_url: "https://gitea.example.com/api/packages/myorg/pypi"
|
||||||
|
type: "remote"
|
||||||
|
package: "pypi"
|
||||||
|
# username: "your-gitea-username"
|
||||||
|
# password: "your-personal-access-token" # needs package:read scope
|
||||||
|
pypi_files_url: "https://gitea.example.com/api/packages/myorg/pypi"
|
||||||
|
pypi_files_remote: "pypi-gitea" # point back to itself — Gitea serves both index and files
|
||||||
|
check_mutable_updates: true
|
||||||
|
immutable_patterns:
|
||||||
|
- "files/.*\\.whl$"
|
||||||
|
- "files/.*\\.whl\\.metadata$"
|
||||||
|
- "files/.*\\.tar\\.gz$"
|
||||||
|
- "files/.*\\.zip$"
|
||||||
|
- "files/.*\\.egg$"
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600
|
||||||
|
```
|
||||||
|
|
||||||
|
### Configuring uv system- or user-wide
|
||||||
|
|
||||||
|
uv reads `uv.toml` from two locations outside any project. Settings from both files are merged, and the user-level file takes precedence over the system-level file:
|
||||||
|
|
||||||
|
| Scope | Path (Linux/macOS) |
|
||||||
|
|---|---|
|
||||||
|
| System | `/etc/uv/uv.toml` |
|
||||||
|
| User | `~/.config/uv/uv.toml` |
|
||||||
|
|
||||||
|
Use these files to route **all** package installs on a machine through the proxy without touching individual projects or their `pyproject.toml`.
|
||||||
|
|
||||||
|
**`/etc/uv/uv.toml`** — applies to every user on the host:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
# Replace the default PyPI index with the caching proxy
|
||||||
|
[[index]]
|
||||||
|
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
|
||||||
|
default = true
|
||||||
|
|
||||||
|
# Optionally add a private index (uv consults it before falling back to the default index)
|
||||||
|
[[index]]
|
||||||
|
url = "https://artifacts.example.com/api/v1/remote/pypi-gitea/simple"
|
||||||
|
name = "gitea"
|
||||||
|
```
|
||||||
|
|
||||||
|
**`~/.config/uv/uv.toml`** — same syntax, single-user scope:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[[index]]
|
||||||
|
url = "https://artifacts.example.com/api/v1/remote/pypi/simple"
|
||||||
|
default = true
|
||||||
|
```
|
||||||
|
|
||||||
|
Setting `default = true` replaces uv's built-in PyPI index. The first install of a package fetches it from upstream and populates the cache; every subsequent install — from any machine or fresh environment pointing at the same proxy — is served directly from S3.
|
||||||
|
|
||||||
|
### How the rewriting works
|
||||||
|
|
||||||
|
When uv requests the simple index for a package, the proxy:
|
||||||
|
|
||||||
|
1. Fetches `https://pypi.org/simple/{package}/` (or returns a valid cached copy within `mutable_ttl`)
|
||||||
|
2. Rewrites every `https://files.pythonhosted.org/...` href to `https://artifacts.example.com/api/v1/remote/pypi-files/...`
|
||||||
|
3. Returns the rewritten HTML to uv
|
||||||
|
|
||||||
|
uv then downloads wheels and `.whl.metadata` files via the rewritten URLs, which also pass through the proxy and are cached as immutable artifacts.
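
The rewrite step above can be sketched as a plain byte substitution, which is how `_resolve_content` in this change set approaches it. The function name `rewrite_simple_index` and the sample HTML are illustrative assumptions, not part of the project:

```python
def rewrite_simple_index(html: bytes, files_url: str, proxy_base: str, files_remote: str) -> bytes:
    """Point every upstream file link back at the proxy (hypothetical helper)."""
    # Strip trailing slashes so the two prefixes join cleanly with the file path.
    upstream = files_url.rstrip("/").encode()
    replacement = f"{proxy_base.rstrip('/')}/api/v1/remote/{files_remote}".encode()
    # Replaces every occurrence of the upstream prefix, not only href attributes.
    return html.replace(upstream, replacement)


# Sample index fragment (made up for illustration).
page = b'<a href="https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz">requests-2.31.0.tar.gz</a>'

rewritten = rewrite_simple_index(
    page,
    files_url="https://files.pythonhosted.org",
    proxy_base="https://artifacts.example.com",
    files_remote="pypi-files",
)
print(rewritten.decode())
```

Because the substitution is done at serve time, the cached copy in storage can stay identical to the upstream page; only the response body differs.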
|
||||||
|
|
||||||
|
For self-hosted registries like Gitea, both the index and file downloads share the same base URL. Setting `pypi_files_url` and `pypi_files_remote` to the same remote causes file links to be rewritten back through the same proxy entry.
|
||||||
|
|
||||||
|
## npm Package Proxy
|
||||||
|
|
||||||
|
The `npm` package type turns the artifact API into a caching npm registry proxy. Since the npm registry serves both metadata and tarballs from the same host, a single remote handles everything. Package metadata (e.g. `GET /express`) is mutable and expires after `mutable_ttl`; tarballs (`.tgz`) are immutable and cached forever. `dist.tarball` URLs in metadata JSON are rewritten on the fly to point back through the same remote, so both the metadata lookup and the tarball download are served from cache.
|
||||||
|
|
||||||
|
### remotes.yaml
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
remotes:
|
||||||
|
npm:
|
||||||
|
base_url: "https://registry.npmjs.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "npm"
|
||||||
|
npm_files_url: "https://registry.npmjs.org" # URL prefix to rewrite in metadata JSON
|
||||||
|
npm_files_remote: "npm" # rewrite back to this same remote
|
||||||
|
check_mutable_updates: true
|
||||||
|
immutable_patterns:
|
||||||
|
- "\\.tgz$" # versioned tarballs are immutable once published, so cache forever
|
||||||
|
mutable_patterns:
|
||||||
|
- "^(?!.*\\.tgz$).*" # everything else (package metadata) expires after mutable_ttl
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600 # re-check package metadata after 10 minutes
|
||||||
|
```
|
||||||
|
|
||||||
|
### Configuring npm / yarn / pnpm
|
||||||
|
|
||||||
|
**npm** — per-project `.npmrc` or `~/.npmrc`:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
registry=https://artifacts.example.com/api/v1/remote/npm/
|
||||||
|
```
|
||||||
|
|
||||||
|
**yarn** — `~/.yarnrc.yml`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
npmRegistryServer: "https://artifacts.example.com/api/v1/remote/npm/"
|
||||||
|
```
|
||||||
|
|
||||||
|
**pnpm** — `.npmrc`:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
registry=https://artifacts.example.com/api/v1/remote/npm/
|
||||||
|
```
|
||||||
|
|
||||||
|
### How the rewriting works
|
||||||
|
|
||||||
|
When a client requests package metadata, the proxy:
|
||||||
|
|
||||||
|
1. Fetches `https://registry.npmjs.org/{package}` (or returns a cached copy within `mutable_ttl`)
|
||||||
|
2. Rewrites every `https://registry.npmjs.org/...` tarball URL to `https://artifacts.example.com/api/v1/remote/npm/...`
|
||||||
|
3. Returns the rewritten JSON to the client
|
||||||
|
|
||||||
|
The client then downloads the tarball via the rewritten URL, which hits the same `npm` remote and is cached as an immutable artifact. Subsequent installs of the same package version are served entirely from S3.
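
As a rough illustration of the three steps above, the sketch below applies the same prefix substitution to a trimmed-down metadata document. The helper name `rewrite_npm_metadata` and the sample JSON are assumptions made for this example; real registry responses contain many more fields:

```python
import json


def rewrite_npm_metadata(body: bytes, files_url: str, proxy_base: str, files_remote: str) -> bytes:
    """Rewrite upstream tarball URLs so they pass through the proxy (hypothetical helper)."""
    upstream = files_url.rstrip("/").encode()
    replacement = f"{proxy_base.rstrip('/')}/api/v1/remote/{files_remote}".encode()
    # Byte-level replace: any occurrence of the upstream prefix is rewritten,
    # not only the dist.tarball fields.
    return body.replace(upstream, replacement)


# Minimal stand-in for a package metadata response.
metadata = json.dumps(
    {
        "name": "express",
        "versions": {
            "4.18.2": {
                "dist": {"tarball": "https://registry.npmjs.org/express/-/express-4.18.2.tgz"}
            }
        },
    }
).encode()

rewritten = json.loads(
    rewrite_npm_metadata(
        metadata,
        files_url="https://registry.npmjs.org",
        proxy_base="https://artifacts.example.com",
        files_remote="npm",
    )
)
tarball = rewritten["versions"]["4.18.2"]["dist"]["tarball"]
print(tarball)
```

Since the substitution touches every occurrence of the prefix, a field such as a homepage URL that happens to start with the registry host would be rewritten as well; a JSON-aware rewrite would avoid that at the cost of parsing each response.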
|
||||||
|
|
||||||
|
### Mutable vs immutable paths
|
||||||
|
|
||||||
|
| Path pattern | Type | Example |
|
||||||
|
|---|---|---|
|
||||||
|
| `/{package}` | Mutable (TTL) | `/express` |
|
||||||
|
| `/@{scope}/{package}` | Mutable (TTL) | `/@babel/core` |
|
||||||
|
| `/-/all` | Mutable (TTL) | `/-/all` |
|
||||||
|
| `/{package}/-/{package}-{version}.tgz` | Immutable (forever) | `/express/-/express-4.18.2.tgz` |
|
||||||
|
| `/@{scope}/{pkg}/-/{pkg}-{ver}.tgz` | Immutable (forever) | `/@babel/core/-/core-7.21.0.tgz` |
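
The table can be checked against the two patterns from `remotes.yaml` with a short script. This is a minimal sketch: the `classify` function is hypothetical, and the order in which the service consults immutable and mutable patterns is an assumption here:

```python
import re

# Patterns copied from the npm remote definition.
IMMUTABLE_PATTERNS = [r"\.tgz$"]
MUTABLE_PATTERNS = [r"^(?!.*\.tgz$).*"]


def classify(path: str) -> str:
    """Label a request path as immutable or mutable (hypothetical helper)."""
    # Assumption: immutable patterns are checked first.
    if any(re.search(p, path) for p in IMMUTABLE_PATTERNS):
        return "immutable"
    if any(re.search(p, path) for p in MUTABLE_PATTERNS):
        return "mutable"
    return "unclassified"


for path in [
    "express",
    "@babel/core",
    "-/all",
    "express/-/express-4.18.2.tgz",
    "@babel/core/-/core-7.21.0.tgz",
]:
    print(f"{path}: {classify(path)}")
```

The negative lookahead in the mutable pattern rejects any path ending in `.tgz`, so the two pattern lists never match the same path.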
|
||||||
@@ -194,6 +194,73 @@ remotes:
|
|||||||
immutable_ttl: 0
|
immutable_ttl: 0
|
||||||
mutable_ttl: 300
|
mutable_ttl: 300
|
||||||
|
|
||||||
|
pypi:
|
||||||
|
base_url: "https://pypi.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "pypi"
|
||||||
|
description: "Python Package Index — simple repository API"
|
||||||
|
# pypi_files_url: the upstream host used in simple-index hrefs (default: https://files.pythonhosted.org)
|
||||||
|
# pypi_files_remote: our proxy remote that will serve those files (default: pypi-files)
|
||||||
|
pypi_files_url: "https://files.pythonhosted.org"
|
||||||
|
pypi_files_remote: "pypi-files"
|
||||||
|
check_mutable_updates: true
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600 # Simple index pages refreshed after 10 minutes
|
||||||
|
|
||||||
|
pypi-gitea:
|
||||||
|
base_url: "https://gitea.example.com/api/packages/myorg/pypi"
|
||||||
|
type: "remote"
|
||||||
|
package: "pypi"
|
||||||
|
description: "Private Gitea PyPI registry"
|
||||||
|
# username: "your-gitea-username"
|
||||||
|
# password: "your-personal-access-token" # needs package:read scope
|
||||||
|
# Files are served from the same Gitea instance — rewrite back to this same remote
|
||||||
|
pypi_files_url: "https://gitea.example.com/api/packages/myorg/pypi"
|
||||||
|
pypi_files_remote: "pypi-gitea"
|
||||||
|
check_mutable_updates: true
|
||||||
|
immutable_patterns:
|
||||||
|
- "files/.*\\.whl$"
|
||||||
|
- "files/.*\\.whl\\.metadata$"
|
||||||
|
- "files/.*\\.tar\\.gz$"
|
||||||
|
- "files/.*\\.zip$"
|
||||||
|
- "files/.*\\.egg$"
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600
|
||||||
|
|
||||||
|
pypi-files:
|
||||||
|
base_url: "https://files.pythonhosted.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "generic"
|
||||||
|
description: "Python Package Index — file storage (wheels, sdists)"
|
||||||
|
immutable_patterns:
|
||||||
|
- "packages/.*\\.whl$"
|
||||||
|
- "packages/.*\\.whl\\.metadata$"
|
||||||
|
- "packages/.*\\.tar\\.gz$"
|
||||||
|
- "packages/.*\\.zip$"
|
||||||
|
- "packages/.*\\.egg$"
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0 # Package files are content-addressed — cache forever
|
||||||
|
|
||||||
|
npm:
|
||||||
|
base_url: "https://registry.npmjs.org"
|
||||||
|
type: "remote"
|
||||||
|
package: "npm"
|
||||||
|
description: "npm registry — package metadata with tarball URL rewriting"
|
||||||
|
# npm_files_url: the upstream host used in metadata tarball hrefs (default: https://registry.npmjs.org)
|
||||||
|
# npm_files_remote: our proxy remote that will serve those tarballs (default: npm-files)
|
||||||
|
npm_files_url: "https://registry.npmjs.org"
|
||||||
|
npm_files_remote: "npm"
|
||||||
|
check_mutable_updates: true
|
||||||
|
immutable_patterns:
|
||||||
|
- \.tgz$
|
||||||
|
mutable_patterns:
|
||||||
|
- ^(?!.*\.tgz$).*
|
||||||
|
cache:
|
||||||
|
immutable_ttl: 0
|
||||||
|
mutable_ttl: 600 # Package metadata refreshed after 10 minutes
|
||||||
|
|
||||||
local-generic:
|
local-generic:
|
||||||
type: "local"
|
type: "local"
|
||||||
package: "generic"
|
package: "generic"
|
||||||
|
|||||||
@@ -18,6 +18,10 @@ _PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
|
|||||||
r"/manifests/(?!sha256:)[^/]+$",
|
r"/manifests/(?!sha256:)[^/]+$",
|
||||||
r"/tags/list$",
|
r"/tags/list$",
|
||||||
],
|
],
|
||||||
|
"pypi": [
|
||||||
|
r"simple/", # Per-package and top-level simple index pages
|
||||||
|
],
|
||||||
|
"npm": [],
|
||||||
"generic": [],
|
"generic": [],
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
+72
-45
@@ -1,3 +1,4 @@
|
|||||||
|
import base64
|
||||||
import hashlib
|
import hashlib
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
@@ -208,8 +209,11 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
|
|||||||
remote_config = config.get_remote_config(remote_name) or {}
|
remote_config = config.get_remote_config(remote_name) or {}
|
||||||
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
|
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
|
||||||
|
|
||||||
# Prepare headers for Docker registry requests
|
# Prepare headers
|
||||||
headers = {}
|
headers = {}
|
||||||
|
username = remote_config.get("username")
|
||||||
|
password = remote_config.get("password")
|
||||||
|
|
||||||
if is_docker:
|
if is_docker:
|
||||||
if "/manifests/" in url:
|
if "/manifests/" in url:
|
||||||
headers["Accept"] = (
|
headers["Accept"] = (
|
||||||
@@ -220,6 +224,8 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
|
|||||||
)
|
)
|
||||||
elif "/blobs/" in url:
|
elif "/blobs/" in url:
|
||||||
headers["Accept"] = "application/octet-stream"
|
headers["Accept"] = "application/octet-stream"
|
||||||
|
elif username and password:
|
||||||
|
headers["Authorization"] = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
|
||||||
|
|
||||||
async with httpx.AsyncClient(follow_redirects=True) as client:
|
async with httpx.AsyncClient(follow_redirects=True) as client:
|
||||||
response = await client.get(url, headers=headers)
|
response = await client.get(url, headers=headers)
|
||||||
@@ -254,11 +260,20 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
|
|||||||
return {"url": url, "status": "error", "error": str(e)}
|
return {"url": url, "status": "error", "error": str(e)}
|
||||||
|
|
||||||
|
|
||||||
async def _upstream_reachable(url: str) -> bool:
|
def _basic_auth_header(remote_cfg: dict) -> dict[str, str]:
|
||||||
|
username = remote_cfg.get("username")
|
||||||
|
password = remote_cfg.get("password")
|
||||||
|
if username and password:
|
||||||
|
token = base64.b64encode(f"{username}:{password}".encode()).decode()
|
||||||
|
return {"Authorization": f"Basic {token}"}
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
async def _upstream_reachable(url: str, auth_headers: dict | None = None) -> bool:
|
||||||
"""HEAD with a short timeout. Returns False only on network/timeout errors."""
|
"""HEAD with a short timeout. Returns False only on network/timeout errors."""
|
||||||
try:
|
try:
|
||||||
async with httpx.AsyncClient(follow_redirects=True) as client:
|
async with httpx.AsyncClient(follow_redirects=True) as client:
|
||||||
await client.head(url, timeout=10.0)
|
await client.head(url, headers=auth_headers or {}, timeout=10.0)
|
||||||
return True
|
return True
|
||||||
except (httpx.NetworkError, httpx.TimeoutException):
|
except (httpx.NetworkError, httpx.TimeoutException):
|
||||||
return False
|
return False
|
||||||
@@ -266,19 +281,19 @@ async def _upstream_reachable(url: str) -> bool:
|
|||||||
return True # 4xx/5xx means backend is up
|
return True # 4xx/5xx means backend is up
|
||||||
|
|
||||||
|
|
||||||
async def check_upstream_changed(remote_url: str, remote_name: str, path: str) -> bool:
|
async def check_upstream_changed(remote_url: str, remote_name: str, path: str, auth_headers: dict | None = None) -> bool:
|
||||||
"""Conditional HEAD against upstream. Returns False only on a definitive 304.
|
"""Conditional HEAD against upstream. Returns False only on a definitive 304.
|
||||||
Raises UpstreamUnreachable if the backend cannot be contacted."""
|
Raises UpstreamUnreachable if the backend cannot be contacted."""
|
||||||
meta = cache.get_mutable_meta(remote_name, path)
|
meta = cache.get_mutable_meta(remote_name, path)
|
||||||
if not meta:
|
if not meta:
|
||||||
return True
|
return True
|
||||||
|
|
||||||
headers = {}
|
headers = dict(auth_headers or {})
|
||||||
if meta.get("etag"):
|
if meta.get("etag"):
|
||||||
headers["If-None-Match"] = meta["etag"]
|
headers["If-None-Match"] = meta["etag"]
|
||||||
if meta.get("last_modified"):
|
if meta.get("last_modified"):
|
||||||
headers["If-Modified-Since"] = meta["last_modified"]
|
headers["If-Modified-Since"] = meta["last_modified"]
|
||||||
if not headers:
|
if not (meta.get("etag") or meta.get("last_modified")):
|
||||||
return True
|
return True
|
||||||
|
|
||||||
try:
|
try:
|
||||||
@@ -294,12 +309,13 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
|
|||||||
mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
|
mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
|
||||||
|
|
||||||
remote_cfg = config.get_remote_config(remote_name) or {}
|
remote_cfg = config.get_remote_config(remote_name) or {}
|
||||||
|
auth = _basic_auth_header(remote_cfg)
|
||||||
check_updates = remote_cfg.get("check_mutable_updates", False)
|
check_updates = remote_cfg.get("check_mutable_updates", False)
|
||||||
user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
|
user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
|
||||||
|
|
||||||
if user_mutable:
|
if user_mutable:
|
||||||
try:
|
try:
|
||||||
changed = await check_upstream_changed(remote_url, remote_name, path)
|
changed = await check_upstream_changed(remote_url, remote_name, path, auth)
|
||||||
except UpstreamUnreachable:
|
except UpstreamUnreachable:
|
||||||
cache.mark_index_cached(remote_name, path, mutable_ttl)
|
cache.mark_index_cached(remote_name, path, mutable_ttl)
|
||||||
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
|
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
|
||||||
@@ -310,7 +326,7 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
|
|||||||
return True
|
return True
|
||||||
logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
|
logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
|
||||||
else:
|
else:
|
||||||
if not await _upstream_reachable(remote_url):
|
if not await _upstream_reachable(remote_url, auth):
|
||||||
cache.mark_index_cached(remote_name, path, mutable_ttl)
|
cache.mark_index_cached(remote_name, path, mutable_ttl)
|
||||||
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
|
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
|
||||||
return True
|
return True
|
||||||
@@ -320,8 +336,53 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -
|
|||||||
return False
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
def _get_content_type(filename: str) -> str:
|
||||||
|
if filename.endswith((".tar.gz", ".tgz")):
|
||||||
|
return "application/gzip"
|
||||||
|
if filename.endswith(".zip") or filename.endswith(".whl"):
|
||||||
|
return "application/zip"
|
||||||
|
if filename.endswith(".exe"):
|
||||||
|
return "application/x-msdownload"
|
||||||
|
if filename.endswith(".rpm"):
|
||||||
|
return "application/x-rpm"
|
||||||
|
if filename.endswith(".xml"):
|
||||||
|
return "application/xml"
|
||||||
|
if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
|
||||||
|
return "application/gzip"
|
||||||
|
return "application/octet-stream"
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_content(
|
||||||
|
data: bytes,
|
||||||
|
path: str,
|
||||||
|
filename: str,
|
||||||
|
remote_config: dict,
|
||||||
|
request: Request,
|
||||||
|
) -> tuple[bytes, str]:
|
||||||
|
"""Return (possibly-rewritten data, content_type) for a cached artifact."""
|
||||||
|
if remote_config.get("package") == "pypi" and "simple/" in path:
|
||||||
|
files_url = remote_config.get("pypi_files_url", "https://files.pythonhosted.org")
|
||||||
|
files_remote = remote_config.get("pypi_files_remote", "pypi-files")
|
||||||
|
proxy_base = str(request.base_url).rstrip("/")
|
||||||
|
data = data.replace(
|
||||||
|
files_url.rstrip("/").encode(),
|
||||||
|
f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
|
||||||
|
)
|
||||||
|
return data, "text/html; charset=utf-8"
|
||||||
|
if remote_config.get("package") == "npm" and not path.endswith(".tgz"):
|
||||||
|
files_url = remote_config.get("npm_files_url", "https://registry.npmjs.org")
|
||||||
|
files_remote = remote_config.get("npm_files_remote", "npm-files")
|
||||||
|
proxy_base = str(request.base_url).rstrip("/")
|
||||||
|
data = data.replace(
|
||||||
|
files_url.rstrip("/").encode(),
|
||||||
|
f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
|
||||||
|
)
|
||||||
|
return data, "application/json"
|
||||||
|
return data, _get_content_type(filename)
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/v1/remote/{remote_name}/{path:path}")
|
@app.get("/api/v1/remote/{remote_name}/{path:path}")
|
||||||
async def get_artifact(remote_name: str, path: str):
|
async def get_artifact(request: Request, remote_name: str, path: str):
|
||||||
# Check if remote is configured
|
# Check if remote is configured
|
||||||
remote_config = config.get_remote_config(remote_name)
|
remote_config = config.get_remote_config(remote_name)
|
||||||
if not remote_config:
|
if not remote_config:
|
||||||
@@ -384,29 +445,11 @@ async def get_artifact(remote_name: str, path: str):
|
|||||||
try:
|
try:
|
||||||
artifact_data = storage.download_object(cached_key)
|
artifact_data = storage.download_object(cached_key)
|
||||||
filename = os.path.basename(path)
|
filename = os.path.basename(path)
|
||||||
|
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request)
|
||||||
|
|
||||||
# Log cache hit
|
|
||||||
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
|
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
|
||||||
|
|
||||||
# Determine content type based on file extension
|
|
||||||
content_type = "application/octet-stream"
|
|
||||||
if filename.endswith(".tar.gz"):
|
|
||||||
content_type = "application/gzip"
|
|
||||||
elif filename.endswith(".zip"):
|
|
||||||
content_type = "application/zip"
|
|
||||||
elif filename.endswith(".exe"):
|
|
||||||
content_type = "application/x-msdownload"
|
|
||||||
elif filename.endswith(".rpm"):
|
|
||||||
content_type = "application/x-rpm"
|
|
||||||
elif filename.endswith(".xml"):
|
|
||||||
content_type = "application/xml"
|
|
||||||
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
|
|
||||||
content_type = "application/gzip"
|
|
||||||
|
|
||||||
# Record cache hit metrics
|
|
||||||
metrics.record_cache_hit(remote_name, len(artifact_data))
|
metrics.record_cache_hit(remote_name, len(artifact_data))
|
||||||
|
|
||||||
# Record artifact mapping in database if not already recorded
|
|
||||||
database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
|
database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
|
||||||
|
|
||||||
return Response(
|
return Response(
|
||||||
@@ -443,25 +486,9 @@ async def get_artifact(remote_name: str, path: str):
|
|||||||
cache_key = storage.get_object_key(remote_name, path)
|
cache_key = storage.get_object_key(remote_name, path)
|
||||||
artifact_data = storage.download_object(cache_key)
|
artifact_data = storage.download_object(cache_key)
|
||||||
filename = os.path.basename(path)
|
filename = os.path.basename(path)
|
||||||
|
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request)
|
||||||
|
|
||||||
content_type = "application/octet-stream"
|
|
||||||
if filename.endswith(".tar.gz"):
|
|
||||||
content_type = "application/gzip"
|
|
||||||
elif filename.endswith(".zip"):
|
|
||||||
content_type = "application/zip"
|
|
||||||
elif filename.endswith(".exe"):
|
|
||||||
content_type = "application/x-msdownload"
|
|
||||||
elif filename.endswith(".rpm"):
|
|
||||||
content_type = "application/x-rpm"
|
|
||||||
elif filename.endswith(".xml"):
|
|
||||||
content_type = "application/xml"
|
|
||||||
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
|
|
||||||
content_type = "application/gzip"
|
|
||||||
|
|
||||||
# Record cache miss metrics
|
|
||||||
metrics.record_cache_miss(remote_name, len(artifact_data))
|
metrics.record_cache_miss(remote_name, len(artifact_data))
|
||||||
|
|
||||||
# Record artifact mapping in database
|
|
||||||
cache_key = storage.get_object_key(remote_name, path)
|
cache_key = storage.get_object_key(remote_name, path)
|
||||||
database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
|
database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
|
||||||
|
|
||||||
|
|||||||
@@ -72,6 +72,35 @@ TEST_REMOTES = {
|
|||||||
"package": "generic",
|
"package": "generic",
|
||||||
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
|
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
|
||||||
},
|
},
|
||||||
|
"pypi-test": {
|
||||||
|
"base_url": "https://pypi.org",
|
||||||
|
"type": "remote",
|
||||||
|
"package": "pypi",
|
||||||
|
"pypi_files_url": "https://files.pythonhosted.org",
|
||||||
|
"pypi_files_remote": "pypi-files-test",
|
||||||
|
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
|
||||||
|
},
|
||||||
|
"pypi-files-test": {
|
||||||
|
"base_url": "https://files.pythonhosted.org",
|
||||||
|
"type": "remote",
|
||||||
|
"package": "generic",
|
||||||
|
"immutable_patterns": [
|
||||||
|
"packages/.*\\.whl$",
|
||||||
|
"packages/.*\\.whl\\.metadata$",
|
||||||
|
"packages/.*\\.tar\\.gz$",
|
||||||
|
],
|
||||||
|
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
|
||||||
|
},
|
||||||
|
"npm-test": {
|
||||||
|
"base_url": "https://registry.npmjs.org",
|
||||||
|
"type": "remote",
|
||||||
|
"package": "npm",
|
||||||
|
"npm_files_url": "https://registry.npmjs.org",
|
||||||
|
"npm_files_remote": "npm-test",
|
||||||
|
"immutable_patterns": [r"\.tgz$"],
|
||||||
|
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
|
||||||
|
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
|
||||||
|
},
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -133,6 +133,44 @@ class TestGetMutablePatterns:
|
|||||||
assert r"repomd\.xml$" in patterns
|
assert r"repomd\.xml$" in patterns
|
||||||
assert r"custom-meta\.xml$" in patterns
|
assert r"custom-meta\.xml$" in patterns
|
||||||
|
|
||||||
|
def test_npm_has_no_package_defaults(self, make_config):
|
||||||
|
cfg = make_config({"r": {"type": "remote", "package": "npm", "base_url": "https://x.com"}})
|
||||||
|
assert cfg.get_mutable_patterns("r") == []
|
||||||
|
|
||||||
|
def test_npm_explicit_mutable_pattern_matches_metadata(self, make_config):
|
||||||
|
import re
|
||||||
|
|
||||||
|
cfg = make_config(
|
||||||
|
{
|
||||||
|
"r": {
|
||||||
|
"type": "remote",
|
||||||
|
"package": "npm",
|
||||||
|
"base_url": "https://x.com",
|
||||||
|
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
)
|
||||||
|
patterns = cfg.get_mutable_patterns("r")
|
||||||
|
assert any(re.search(p, "express") for p in patterns)
|
||||||
|
assert any(re.search(p, "@babel/core") for p in patterns)
|
||||||
|
|
||||||
|
def test_npm_explicit_mutable_pattern_excludes_tarballs(self, make_config):
|
||||||
|
import re
|
||||||
|
|
||||||
|
cfg = make_config(
|
||||||
|
{
|
||||||
|
"r": {
|
||||||
|
"type": "remote",
|
||||||
|
"package": "npm",
|
||||||
|
"base_url": "https://x.com",
|
||||||
|
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
)
|
||||||
|
patterns = cfg.get_mutable_patterns("r")
|
||||||
|
assert not any(re.search(p, "express-4.18.2.tgz") for p in patterns)
|
||||||
|
assert not any(re.search(p, "express/-/express-4.18.2.tgz") for p in patterns)
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# get_immutable_patterns
|
# get_immutable_patterns
|
||||||
|
|||||||
@@ -652,3 +652,192 @@ class TestConfigEndpoint:
|
|||||||
data = response.json()
|
data = response.json()
|
||||||
assert "remotes" in data
|
assert "remotes" in data
|
||||||
assert "alpine-test" in data["remotes"]
|
assert "alpine-test" in data["remotes"]
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# PyPI remote /api/v1/remote/pypi-test/...
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class TestPyPIRemote:
|
||||||
|
def test_simple_index_is_mutable(self, client, patched_deps):
|
||||||
|
"""simple/ paths are detected as mutable (package-type default)."""
|
||||||
|
deps = patched_deps
|
||||||
|
html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
|
||||||
|
deps["storage"].exists.return_value = True
|
||||||
|
deps["storage"].download_object.return_value = html
|
||||||
|
deps["cache"].is_mutable_file.return_value = True
|
||||||
|
deps["cache"].is_index_valid.return_value = True
|
||||||
|
|
||||||
|
response = client.get("/api/v1/remote/pypi-test/simple/requests/")
|
||||||
|
assert response.status_code == 200
|
||||||
|
deps["cache"].mark_index_cached.assert_not_called()
|
||||||
|
|
||||||
|
def test_simple_index_urls_rewritten_to_proxy(self, client, patched_deps):
|
||||||
|
"""files.pythonhosted.org URLs in a cached simple index are rewritten to our proxy."""
|
||||||
|
deps = patched_deps
|
||||||
|
html = b"<html><body><a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a></body></html>"
|
||||||
|
deps["storage"].exists.return_value = True
|
||||||
|
deps["storage"].download_object.return_value = html
|
||||||
|
deps["cache"].is_mutable_file.return_value = True
|
||||||
|
deps["cache"].is_index_valid.return_value = True
|
||||||
|
|
||||||
|
response = client.get("/api/v1/remote/pypi-test/simple/requests/")
|
||||||
|
assert response.status_code == 200
|
||||||
|
assert b"files.pythonhosted.org" not in response.content
|
||||||
|
assert b"/api/v1/remote/pypi-files-test/packages/requests-2.31.0.tar.gz" in response.content
|
||||||
|
|
||||||
|
    def test_simple_index_content_type_is_html(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"<html></html>"
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/pypi-test/simple/requests/")
        assert response.status_code == 200
        assert "text/html" in response.headers["content-type"]

    def test_simple_index_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        html = b"<html><body><a href='https://files.pythonhosted.org/packages/p-1.0.whl'>...</a></body></html>"
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = html
        deps["cache"].is_mutable_file.return_value = True

        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/pypi-test/simple/requests/")

        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"files.pythonhosted.org" not in response.content

    def test_wheel_file_immutable_returns_correct_content_type(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"PK wheel bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/pypi-files-test/packages/requests-2.31.0-py3-none-any.whl")
        assert response.status_code == 200
        assert "application/zip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_sdist_immutable_returns_correct_content_type(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tar bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/pypi-files-test/packages/requests-2.31.0.tar.gz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]

    def test_blocked_path_on_files_remote_returns_403(self, client, patched_deps):
        """Paths that don't match immutable_patterns on pypi-files-test are blocked."""
        response = client.get("/api/v1/remote/pypi-files-test/packages/requests.unknown")
        assert response.status_code == 403


# ---------------------------------------------------------------------------
# npm remote /api/v1/remote/npm-test/...
# ---------------------------------------------------------------------------


class TestNpmRemote:
    def test_package_metadata_is_mutable(self, client, patched_deps):
        """Top-level package metadata paths are detected as mutable."""
        deps = patched_deps
        meta = b'{"name":"express","versions":{}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_metadata_tarball_urls_rewritten_to_proxy(self, client, patched_deps):
        """registry.npmjs.org tarball URLs in metadata JSON are rewritten to our proxy."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/express/-/express-4.18.2.tgz" in response.content

    def test_metadata_content_type_is_json(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b'{"name":"express"}'
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert "application/json" in response.headers["content-type"]

    def test_scoped_package_metadata_rewritten(self, client, patched_deps):
        """@scope/package metadata URLs are also rewritten back to the same npm-test remote."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/@babel/core/-/core-7.21.0.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/@babel/core")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/@babel/core/-/core-7.21.0.tgz" in response.content

    def test_tarball_not_rewritten(self, client, patched_deps):
        """Tarball requests (.tgz) bypass URL rewriting and return binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b tgz bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_metadata_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz"}}'
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True

        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/npm-test/lodash")

        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content

    def test_tarball_immutable_allowed_on_npm_remote(self, client, patched_deps):
        """Tarballs (.tgz) match immutable_patterns and are served without rewriting."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tgz bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
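
The URL rewriting that the PyPI and npm tests above assert can be pictured with a minimal sketch. This is not the project's implementation: the helper name `rewrite_upstream_urls`, the `PROXY` base URL, and the plain byte-substitution approach are all assumptions made for illustration. It only reproduces the observable result the tests check, namely that the upstream host disappears from the body and the proxy path appears in its place.

```python
def rewrite_upstream_urls(body: bytes, upstream_base: str, proxy_base: str) -> bytes:
    """Replace every occurrence of upstream_base in body with proxy_base.

    Hypothetical helper: the real service may parse the HTML/JSON instead
    of doing a plain byte substitution.
    """
    old = upstream_base.rstrip("/").encode()
    new = proxy_base.rstrip("/").encode()
    return body.replace(old, new)


# Assumed public base URL of the proxy (not taken from the test module).
PROXY = "https://artifacts.example.com/api/v1/remote"

# PyPI: file links in the simple index point at a separate files remote.
pypi_index = b"<a href='https://files.pythonhosted.org/packages/requests-2.31.0.tar.gz'>...</a>"
pypi_rewritten = rewrite_upstream_urls(
    pypi_index, "https://files.pythonhosted.org", f"{PROXY}/pypi-files-test"
)

# npm: tarball URLs in the metadata JSON point back at the same remote.
npm_meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
npm_rewritten = rewrite_upstream_urls(
    npm_meta, "https://registry.npmjs.org", f"{PROXY}/npm-test"
)
```

Under these assumptions the two remotes differ only in where the rewritten links land: PyPI index links go to a second remote (`pypi-files-test`), while npm tarball links stay on the remote that served the metadata (`npm-test`).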