feat: add npm remote type with metadata URL rewriting and caching

- Add `npm` package type to config with no built-in mutable defaults; users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball downloads pass through the same proxy remote instead of hitting npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config, client setup (npm/yarn/pnpm), rewriting behaviour, and mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type, scoped packages, cache miss, and tarball immutability

@@ -13,6 +13,7 @@ A generic FastAPI-based artifact caching system that downloads and stores files
- **Stale-on-Upstream-Error**: Expired mutable files are kept and their TTL refreshed when the backend cannot be reached, so cached data remains available during upstream outages
- **S3 Storage**: MinIO/S3 backend with predictable paths
- **Docker Registry Proxy**: Full Docker Registry HTTP API v2 for transparent container image caching
- **npm Package Proxy**: Caching proxy for the npm registry with metadata URL rewriting so tarballs also pass through cache
- **Content-Type Detection**: Automatic MIME type detection for downloads

## Architecture

@@ -1031,4 +1032,68 @@ When uv requests the simple index for a package, the proxy:

uv then downloads wheels and `.whl.metadata` files via the rewritten URLs, which also pass through the proxy and are cached as immutable artifacts.

For self-hosted registries like Gitea, both the index and file downloads share the same base URL. Setting `pypi_files_url` and `pypi_files_remote` to the same remote causes file links to be rewritten back through the same proxy entry.

## npm Package Proxy

The `npm` package type turns the artifact API into a caching npm registry proxy. Since the npm registry serves both metadata and tarballs from the same host, a single remote handles everything. Package metadata (e.g. `GET /express`) is mutable and expires after `mutable_ttl`; tarballs (`.tgz`) are immutable and cached forever. `dist.tarball` URLs in metadata JSON are rewritten on the fly to point back through the same remote, so both the metadata lookup and the tarball download are served from cache.

### remotes.yaml

```yaml
remotes:
  npm:
    base_url: "https://registry.npmjs.org"
    type: "remote"
    package: "npm"
    npm_files_url: "https://registry.npmjs.org"  # URL prefix to rewrite in metadata JSON
    npm_files_remote: "npm"                      # rewrite back to this same remote
    check_mutable_updates: true
    immutable_patterns:
      - '\.tgz$'            # versioned tarballs are content-addressed; cache forever
    mutable_patterns:
      - '^(?!.*\.tgz$).*'   # everything else (package metadata) expires after mutable_ttl
    cache:
      immutable_ttl: 0
      mutable_ttl: 600      # re-check package metadata after 10 minutes
```

### Configuring npm / yarn / pnpm

**npm**, in a per-project `.npmrc` or `~/.npmrc`:

```ini
registry=https://artifacts.example.com/api/v1/remote/npm/
```

**yarn** (Yarn 2+), in `~/.yarnrc.yml`:

```yaml
npmRegistryServer: "https://artifacts.example.com/api/v1/remote/npm/"
```

**pnpm**, in `.npmrc`:

```ini
registry=https://artifacts.example.com/api/v1/remote/npm/
```

### How the rewriting works

When a client requests package metadata, the proxy:

1. Fetches `https://registry.npmjs.org/{package}` (or returns a cached copy within `mutable_ttl`)
2. Rewrites every `https://registry.npmjs.org/...` tarball URL to `https://artifacts.example.com/api/v1/remote/npm/...` by replacing the `npm_files_url` prefix in the JSON body
3. Returns the rewritten JSON to the client

The client then downloads the tarball via the rewritten URL, which hits the same `npm` remote and is cached as an immutable artifact. Subsequent installs of the same package version are served entirely from S3.
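
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual code: `rewrite_tarball_urls` and its parameter names are hypothetical, and the sketch assumes the rewrite is a plain byte-level prefix replacement on the metadata body.

```python
import json


def rewrite_tarball_urls(body: bytes, files_url: str, proxy_base: str, remote: str) -> bytes:
    """Replace the upstream registry prefix with the proxy prefix (hypothetical helper)."""
    upstream = files_url.rstrip("/").encode()
    proxied = f"{proxy_base.rstrip('/')}/api/v1/remote/{remote}".encode()
    return body.replace(upstream, proxied)


# Metadata as the upstream registry would return it (trimmed to one field).
upstream_meta = json.dumps(
    {"dist": {"tarball": "https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}
).encode()

rewritten = json.loads(
    rewrite_tarball_urls(
        upstream_meta,
        files_url="https://registry.npmjs.org",
        proxy_base="https://artifacts.example.com",
        remote="npm",
    )
)
print(rewritten["dist"]["tarball"])
# https://artifacts.example.com/api/v1/remote/npm/express/-/express-4.18.2.tgz
```

Because the replacement is textual, any other occurrence of the upstream prefix in the metadata body is rewritten as well, not only `dist.tarball`.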

### Mutable vs immutable paths

| Path pattern | Type | Example |
|---|---|---|
| `/{package}` | Mutable (TTL) | `/express` |
| `/@{scope}/{package}` | Mutable (TTL) | `/@babel/core` |
| `/-/all` | Mutable (TTL) | `/-/all` |
| `/{package}/-/{package}-{version}.tgz` | Immutable (forever) | `/express/-/express-4.18.2.tgz` |
| `/@{scope}/{pkg}/-/{pkg}-{ver}.tgz` | Immutable (forever) | `/@babel/core/-/core-7.21.0.tgz` |
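
To see how the two example patterns split these paths, here is a small Python sketch. The `classify` helper is hypothetical; it assumes patterns are applied with `re.search` to the path without its leading slash and that immutable patterns are checked first (the two example patterns are mutually exclusive, so the order does not change the result here).

```python
import re

# Patterns taken from the example remotes.yaml entry.
IMMUTABLE_PATTERNS = [r"\.tgz$"]
MUTABLE_PATTERNS = [r"^(?!.*\.tgz$).*"]


def classify(path: str) -> str:
    """Return "immutable", "mutable", or "unmatched" for a request path (hypothetical helper)."""
    if any(re.search(p, path) for p in IMMUTABLE_PATTERNS):
        return "immutable"
    if any(re.search(p, path) for p in MUTABLE_PATTERNS):
        return "mutable"
    return "unmatched"


for path in (
    "express",
    "@babel/core",
    "-/all",
    "express/-/express-4.18.2.tgz",
    "@babel/core/-/core-7.21.0.tgz",
):
    print(f"{path}: {classify(path)}")
```

The first three paths print as `mutable` and the two tarball paths as `immutable`, matching the table.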

@@ -243,6 +243,24 @@ remotes:
    cache:
      immutable_ttl: 0  # Package files are content-addressed; cache forever

  npm:
    base_url: "https://registry.npmjs.org"
    type: "remote"
    package: "npm"
    description: "npm registry - package metadata with tarball URL rewriting"
    # npm_files_url: the upstream host used in metadata tarball hrefs (default: https://registry.npmjs.org)
    # npm_files_remote: our proxy remote that will serve those tarballs (default: npm-files)
    npm_files_url: "https://registry.npmjs.org"
    npm_files_remote: "npm"
    check_mutable_updates: true
    immutable_patterns:
      - \.tgz$
    mutable_patterns:
      - ^(?!.*\.tgz$).*
    cache:
      immutable_ttl: 0
      mutable_ttl: 600  # Package metadata refreshed after 10 minutes

  local-generic:
    type: "local"
    package: "generic"

@@ -21,6 +21,7 @@ _PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
    "pypi": [
        r"simple/",  # Per-package and top-level simple index pages
    ],
    "npm": [],
    "generic": [],
}

+10
-1
@@ -337,7 +337,7 @@ async def handle_expired_mutable(remote_name: str, path: str, remote_url: str) -


def _get_content_type(filename: str) -> str:
    if filename.endswith((".tar.gz", ".tgz")):
        return "application/gzip"
    if filename.endswith(".zip") or filename.endswith(".whl"):
        return "application/zip"
@@ -369,6 +369,15 @@ def _resolve_content(
            f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
        )
        return data, "text/html; charset=utf-8"
    if remote_config.get("package") == "npm" and not path.endswith(".tgz"):
        files_url = remote_config.get("npm_files_url", "https://registry.npmjs.org")
        files_remote = remote_config.get("npm_files_remote", "npm-files")
        proxy_base = str(request.base_url).rstrip("/")
        data = data.replace(
            files_url.rstrip("/").encode(),
            f"{proxy_base}/api/v1/remote/{files_remote}".encode(),
        )
        return data, "application/json"
    return data, _get_content_type(filename)


@@ -91,6 +91,16 @@ TEST_REMOTES = {
            ],
            "cache": {"immutable_ttl": 0, "mutable_ttl": 0},
        },
        "npm-test": {
            "base_url": "https://registry.npmjs.org",
            "type": "remote",
            "package": "npm",
            "npm_files_url": "https://registry.npmjs.org",
            "npm_files_remote": "npm-test",
            "immutable_patterns": [r"\.tgz$"],
            "mutable_patterns": [r"^(?!.*\.tgz$).*"],
            "cache": {"immutable_ttl": 0, "mutable_ttl": 600},
        },
    }
}

@@ -133,6 +133,44 @@ class TestGetMutablePatterns:
        assert r"repomd\.xml$" in patterns
        assert r"custom-meta\.xml$" in patterns

    def test_npm_has_no_package_defaults(self, make_config):
        cfg = make_config({"r": {"type": "remote", "package": "npm", "base_url": "https://x.com"}})
        assert cfg.get_mutable_patterns("r") == []

    def test_npm_explicit_mutable_pattern_matches_metadata(self, make_config):
        import re

        cfg = make_config(
            {
                "r": {
                    "type": "remote",
                    "package": "npm",
                    "base_url": "https://x.com",
                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
                }
            }
        )
        patterns = cfg.get_mutable_patterns("r")
        assert any(re.search(p, "express") for p in patterns)
        assert any(re.search(p, "@babel/core") for p in patterns)

    def test_npm_explicit_mutable_pattern_excludes_tarballs(self, make_config):
        import re

        cfg = make_config(
            {
                "r": {
                    "type": "remote",
                    "package": "npm",
                    "base_url": "https://x.com",
                    "mutable_patterns": [r"^(?!.*\.tgz$).*"],
                }
            }
        )
        patterns = cfg.get_mutable_patterns("r")
        assert not any(re.search(p, "express-4.18.2.tgz") for p in patterns)
        assert not any(re.search(p, "express/-/express-4.18.2.tgz") for p in patterns)


# ---------------------------------------------------------------------------
# get_immutable_patterns

@@ -741,3 +741,103 @@ class TestPyPIRemote:
        """Paths that don't match immutable_patterns on pypi-files-test are blocked."""
        response = client.get("/api/v1/remote/pypi-files-test/packages/requests.unknown")
        assert response.status_code == 403


# ---------------------------------------------------------------------------
# npm remote /api/v1/remote/npm-test/...
# ---------------------------------------------------------------------------


class TestNpmRemote:
    def test_package_metadata_is_mutable(self, client, patched_deps):
        """Top-level package metadata paths are detected as mutable."""
        deps = patched_deps
        meta = b'{"name":"express","versions":{}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        deps["cache"].mark_index_cached.assert_not_called()

    def test_metadata_tarball_urls_rewritten_to_proxy(self, client, patched_deps):
        """registry.npmjs.org tarball URLs in metadata JSON are rewritten to our proxy."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/express/-/express-4.18.2.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/express/-/express-4.18.2.tgz" in response.content

    def test_metadata_content_type_is_json(self, client, patched_deps):
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b'{"name":"express"}'
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/express")
        assert response.status_code == 200
        assert "application/json" in response.headers["content-type"]

    def test_scoped_package_metadata_rewritten(self, client, patched_deps):
        """@scope/package metadata URLs are also rewritten back to the same npm-test remote."""
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/@babel/core/-/core-7.21.0.tgz"}}'
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True
        deps["cache"].is_index_valid.return_value = True

        response = client.get("/api/v1/remote/npm-test/@babel/core")
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content
        assert b"/api/v1/remote/npm-test/@babel/core/-/core-7.21.0.tgz" in response.content

    def test_tarball_not_rewritten(self, client, patched_deps):
        """Tarball requests (.tgz) bypass URL rewriting and return binary."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"\x1f\x8b tgz bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]
        assert response.headers["X-Artifact-Source"] == "cache"

    def test_metadata_cache_miss_fetches_upstream(self, client, patched_deps):
        deps = patched_deps
        meta = b'{"dist":{"tarball":"https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz"}}'
        deps["storage"].exists.return_value = False
        deps["storage"].download_object.return_value = meta
        deps["cache"].is_mutable_file.return_value = True

        with patch(
            "artifactapi.main.cache_single_artifact",
            new_callable=AsyncMock,
            return_value={"status": "cached"},
        ) as mock_fetch:
            response = client.get("/api/v1/remote/npm-test/lodash")

        mock_fetch.assert_called_once()
        assert response.status_code == 200
        assert b"registry.npmjs.org" not in response.content

    def test_tarball_immutable_allowed_on_npm_remote(self, client, patched_deps):
        """Tarballs (.tgz) match immutable_patterns and are served without rewriting."""
        deps = patched_deps
        deps["storage"].exists.return_value = True
        deps["storage"].download_object.return_value = b"tgz bytes"
        deps["cache"].is_mutable_file.return_value = False

        response = client.get("/api/v1/remote/npm-test/express/-/express-4.18.2.tgz")
        assert response.status_code == 200
        assert "application/gzip" in response.headers["content-type"]