Compare commits

..

13 Commits

Author SHA1 Message Date
unkinben b3d12f4962 docs: add SPEC.md with repository model and caching requirements 2026-04-25 18:31:27 +10:00
unkinben 92b9f9a03e refactor: use package: docker instead of type: docker
Align with intended type=local|remote|virtual / package=docker|rpm|alpine|generic
model. All docker-specific logic now keyed on package field; type field
correctly reflects the repository kind (remote vs local).
2026-04-25 18:27:31 +10:00
unkinben 7930023de8 Merge pull request 'feat: enforce include_patterns on docker /v2/ proxy route' (#10) from benvin/docker-include-patterns into master
Reviewed-on: #10
2026-04-25 18:14:50 +10:00
unkinben 869a1f8c02 feat: enforce include_patterns on docker /v2/ proxy route
Adds pattern checking to docker_v2_proxy before any upstream fetch.
Patterns match against the full path and the image name (first two
path segments), bypassing the index-file exemption that check_artifact_patterns
applies — so restrictions apply equally to manifests, blobs, and tag lists.
Returns 403 when no pattern matches, consistent with the non-docker route.
2026-04-25 18:09:12 +10:00
unkinben 1b2ee0d37f Merge pull request 'benvin/docker-caching' (#9) from benvin/docker-caching into master
Reviewed-on: #9
2026-04-25 17:33:18 +10:00
unkinben 33e7365a88 fix: set SETUPTOOLS_SCM_PRETEND_VERSION in Dockerfile for hatch-vcs 2026-04-25 17:31:36 +10:00
unkinben cf854a2ace chore: derive version from git tags via hatch-vcs
Replace hardcoded version in pyproject.toml with hatch-vcs so the
package version is read from git tags at build time. Dockerfile
accepts a VERSION build arg and passes it as HATCH_VCS_PRETEND_VERSION
for builds without a git checkout. Makefile _tag target now rebuilds
the container with the correct version automatically.
2026-04-25 16:53:37 +10:00
unkinben 4c1f77e679 Merge pull request 'feat: add Docker registry proxy support with proper cache classification' (#8) from benvin/docker-caching into master
Reviewed-on: #8
2026-04-25 16:37:38 +10:00
unkinben 4651183ed1 feat: add Docker registry proxy support with proper cache classification
- Add /v2/ endpoint implementing OCI Distribution API for native docker pull support
- Add docker_auth.py with Bearer token challenge handling and in-memory token cache
- Classify tag-based manifests (/manifests/<tag>) as index (short TTL, mutable)
- Classify digest-pinned manifests (/manifests/sha256:...) and blobs as file cache (indefinite, immutable)
- Deduplicate blob storage by keying on sha256 digest rather than image path
- Support username/password auth per docker remote in remotes.yaml
2026-04-25 16:35:27 +10:00
unkinben 5733d52e51 Merge pull request 'feature/cache-flush-api-enhancement' (#7) from feature/cache-flush-api-enhancement into master
Reviewed-on: #7
2026-01-25 11:34:43 +11:00
unkinben bf8a176dda Bump version: 2.0.4 → 2.0.5 2026-01-25 11:09:29 +11:00
unkinben 3e8e819ecf feat: enhance cache flush API for remote-specific S3 clearing
Add support for remote-specific S3 object deletion using hierarchical key prefixes.
When a remote parameter is provided, only objects under that remote's hierarchy
(e.g., 'fedora/*') are deleted, preserving cached files for other remotes.

Key improvements:
- Use S3 list_objects_v2 Prefix parameter for targeted deletion
- Enhanced logging to indicate scope of cache clearing operations
- Backward compatible - existing behavior preserved when no remote specified

Resolves artifactapi-u46
2026-01-25 11:08:46 +11:00
unkinben de04e4d2b2 Merge pull request 'benvin/path-based-storage' (#6) from benvin/path-based-storage into master
Reviewed-on: #6
2026-01-25 00:00:53 +11:00
8 changed files with 408 additions and 14 deletions
+3
View File
@@ -32,6 +32,9 @@ COPY --chown=appuser:appuser pyproject.toml uv.lock README.md ./
# Switch to appuser and install Python dependencies
USER appuser
ARG VERSION=dev
ENV HATCH_VCS_PRETEND_VERSION=${VERSION} \
SETUPTOOLS_SCM_PRETEND_VERSION=${VERSION}
RUN uv sync --frozen
# Copy application source
+24
View File
@@ -45,3 +45,27 @@ docker-clean:
docker system prune -f
docker-restart: docker-down docker-up
# Bump helpers — reads the latest semver tag and creates the next one.
# If no tag exists yet, starts from v0.0.0.
_LATEST := $(shell git tag --sort=-v:refname | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$$' | head -1)
_BASE := $(if $(_LATEST),$(_LATEST),v0.0.0)
_MAJ := $(shell echo $(_BASE) | sed 's/^v//' | cut -d. -f1)
_MIN := $(shell echo $(_BASE) | sed 's/^v//' | cut -d. -f2)
_PAT := $(shell echo $(_BASE) | sed 's/^v//' | cut -d. -f3)
patch:
@NEW=v$(_MAJ).$(_MIN).$(shell expr $(_PAT) + 1); \
git tag $$NEW && echo "Tagged $$NEW" && $(MAKE) _tag TAG=$$NEW
minor:
@NEW=v$(_MAJ).$(shell expr $(_MIN) + 1).0; \
git tag $$NEW && echo "Tagged $$NEW" && $(MAKE) _tag TAG=$$NEW
major:
@NEW=v$(shell expr $(_MAJ) + 1).0.0; \
git tag $$NEW && echo "Tagged $$NEW" && $(MAKE) _tag TAG=$$NEW
_tag:
git push origin $(TAG)
docker-compose build --no-cache --build-arg VERSION=$(TAG:v%=%)
+137
View File
@@ -0,0 +1,137 @@
# ArtifactAPI Specification
## Repository model
Every repository entry in `remotes.yaml` has two orthogonal fields:
| field | values | meaning |
|---|---|---|
| `type` | `local`, `remote`, `virtual` | repository kind — how the repo is served |
| `package` | `docker`, `rpm`, `alpine`, `generic` | package format — what protocol and caching rules to apply |
**type**
- `local` — files are uploaded directly to the API and stored in S3; no upstream.
- `remote` — proxies and caches content from an upstream URL (`base_url`).
- `virtual` — aggregates multiple repositories (not yet implemented).
**package**
- `docker` — upstream speaks the OCI Distribution API (Bearer auth, manifest/blob paths).
- `rpm` — upstream is an RPM repository; repodata files are index files.
- `alpine` — upstream is an Alpine APK repository; `APKINDEX.tar.gz` is an index file.
- `generic` — plain HTTP file download; no format-specific logic.
---
## Caching
Two cache classes determine retention:
| class | stored | TTL |
|---|---|---|
| **file** | S3 object, no Redis entry | `file_ttl``0` means indefinite |
| **index** | S3 object + Redis TTL key | `index_ttl` — when the Redis key expires the S3 object is deleted and re-fetched |
Index files are mutable metadata that must expire. File-class objects are treated as immutable and cached indefinitely (unless `file_ttl` is set).
---
## Docker package rules
### URL construction
Remote URLs are prefixed with `/v2/` for `package: docker` remotes:
```
{base_url}/v2/{path}
```
e.g. `library/nginx/manifests/latest``https://registry-1.docker.io/v2/library/nginx/manifests/latest`
### Authentication
Docker registries use Bearer token challenges. On a `401 Unauthorized` response, the API:
1. Parses the `WWW-Authenticate: Bearer` header for `realm`, `service`, and `scope`.
2. Fetches a token from the auth realm, supplying `username`/`password` from the remote config if present.
3. Retries the request with `Authorization: Bearer <token>`.
Tokens are cached in-memory keyed by `(realm, service, scope, username)` and expire 30 seconds before their stated `expires_in`.
### Cache classification
| path pattern | mutable | class | TTL source |
|---|---|---|---|
| `/manifests/<tag>` | yes | index | `index_ttl` |
| `/tags/list` | yes | index | `index_ttl` |
| `/manifests/sha256:<digest>` | no | file | `file_ttl` |
| `/blobs/sha256:<digest>` | no | file | `file_ttl` |
Tag-based manifests and tag lists are mutable and cached as index. Digest-pinned manifests and blobs are content-addressed and cached indefinitely as files.
### Blob deduplication
Blobs are stored under a digest-keyed path shared across all images on the same remote:
```
{remote_name}/blobs/sha256/{digest}
```
The same layer pulled by different images is stored once.
### Accept headers
| path | `Accept` header sent upstream |
|---|---|
| `/manifests/…` | `application/vnd.docker.distribution.manifest.v2+json`, `application/vnd.oci.image.manifest.v1+json`, `application/vnd.oci.image.index.v1+json`, `application/vnd.docker.distribution.manifest.list.v2+json` |
| `/blobs/…` | `application/octet-stream` |
---
## OCI Distribution API endpoint
The API exposes a native Docker registry interface so clients can use `docker pull` directly:
```
GET /v2/ — version ping
GET /v2/{remote}/{image}/manifests/{ref} — fetch manifest
HEAD /v2/{remote}/{image}/manifests/{ref} — manifest metadata
GET /v2/{remote}/{image}/blobs/{digest} — fetch blob
HEAD /v2/{remote}/{image}/blobs/{digest} — blob metadata
```
Responses include `Docker-Distribution-Api-Version`, `Docker-Content-Digest`, and the correct OCI `Content-Type` (detected from the manifest `mediaType` field).
Only remotes with `package: docker` are accessible via this endpoint. All other remotes return `400`.
---
## include_patterns
`include_patterns` is a list of Python regexes applied to every request before any upstream fetch or cache lookup.
**Generic remotes (`/api/v1/remote/…`):**
- Patterns match against the file path and the full path.
- Index files (mutable metadata) bypass pattern checks and are always allowed.
**Docker remotes (`/v2/…`):**
- Patterns match against the image name (first two path segments, e.g. `library/nginx`) and the full path.
- The index-file exemption does **not** apply — patterns restrict whole images, including their manifests and tag lists.
- No patterns configured → all images allowed.
Returns `403` when a request is blocked.
---
## Versioning
The package version is derived from git tags via `hatch-vcs`. Tags follow the format `v{MAJOR}.{MINOR}.{PATCH}`.
Docker images are built with the version injected at build time:
```
SETUPTOOLS_SCM_PRETEND_VERSION=<version> uv sync --frozen
```
The `Makefile` provides `patch`, `minor`, and `major` targets that tag the current commit and rebuild the container image.
+5 -9
View File
@@ -1,6 +1,6 @@
[project]
name = "artifactapi"
version = "2.0.4"
dynamic = ["version"]
description = "Generic artifact caching system with support for various package managers"
dependencies = [
@@ -23,9 +23,12 @@ license = {text = "MIT"}
artifactapi = "artifactapi.main:main"
[build-system]
requires = ["hatchling"]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
[tool.hatch.version]
source = "vcs"
[tool.hatch.metadata]
allow-direct-references = true
@@ -40,11 +43,4 @@ dev = [
"isort>=5.12.0",
"mypy>=1.6.0",
"ruff>=0.1.0",
"bump-my-version>=1.2.0",
]
[tool.bumpversion]
current_version = "2.0.4"
commit = true
tag = true
message = "Bump version: {current_version} → {new_version}"
+7
View File
@@ -30,6 +30,13 @@ class RedisCache:
".yaml.xz", ".yaml.gz", ".yaml.bz2", ".yaml.zst",
".asc", ".txt"
)))
# Docker tag-based manifests are mutable (index); digest-pinned are immutable (file)
or (
"/manifests/" in file_path
and not file_path.split("/manifests/", 1)[1].startswith("sha256:")
)
or "/tags/list" in file_path
or file_path.endswith("/tags/list")
)
def get_index_cache_key(self, remote_name: str, path: str) -> str:
+96
View File
@@ -0,0 +1,96 @@
import time
import logging
import re
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
# In-memory token cache: key -> (token, expires_at)
_token_cache: dict[str, tuple[str, float]] = {}
_WWW_AUTH_RE = re.compile(
r'Bearer\s+realm="(?P<realm>[^"]+)"'
r'(?:,service="(?P<service>[^"]*)")?'
r'(?:,scope="(?P<scope>[^"]*)")?',
re.IGNORECASE,
)
def _cache_key(realm: str, service: str, scope: str, username: Optional[str]) -> str:
return f"{realm}|{service}|{scope}|{username or ''}"
def _get_cached_token(key: str) -> Optional[str]:
entry = _token_cache.get(key)
if entry and entry[1] > time.time():
return entry[0]
_token_cache.pop(key, None)
return None
def _store_token(key: str, token: str, expires_in: int) -> None:
# Expire 30s early to avoid using a token right as it expires
_token_cache[key] = (token, time.time() + max(expires_in - 30, 10))
async def fetch_token(
realm: str,
service: str,
scope: str,
username: Optional[str] = None,
password: Optional[str] = None,
) -> Optional[str]:
"""Fetch a Bearer token from a Docker registry auth server."""
key = _cache_key(realm, service, scope, username)
cached = _get_cached_token(key)
if cached:
return cached
params: dict[str, str] = {}
if service:
params["service"] = service
if scope:
params["scope"] = scope
auth = (username, password) if username and password else None
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(realm, params=params, auth=auth)
response.raise_for_status()
data = response.json()
except Exception as e:
logger.warning(f"Docker token fetch failed ({realm}): {e}")
return None
token = data.get("token") or data.get("access_token")
if not token:
logger.warning(f"Docker token response missing token field: {data}")
return None
expires_in = int(data.get("expires_in", 300))
_store_token(key, token, expires_in)
logger.debug(f"Docker token obtained (realm={realm}, service={service}, scope={scope}, expires_in={expires_in}s)")
return token
def parse_www_authenticate(header: str) -> Optional[tuple[str, str, str]]:
"""Parse WWW-Authenticate: Bearer header. Returns (realm, service, scope) or None."""
m = _WWW_AUTH_RE.search(header)
if not m:
return None
return m.group("realm"), m.group("service") or "", m.group("scope") or ""
async def get_docker_token_for_response(
www_authenticate: str,
username: Optional[str] = None,
password: Optional[str] = None,
) -> Optional[str]:
"""Given a WWW-Authenticate header value, fetch and return a Bearer token."""
parsed = parse_www_authenticate(www_authenticate)
if not parsed:
return None
realm, service, scope = parsed
return await fetch_token(realm, service, scope, username, password)
+128 -5
View File
@@ -1,10 +1,11 @@
import os
import re
import json
import hashlib
import logging
from typing import Dict, Any, Optional
import httpx
from fastapi import FastAPI, HTTPException, Response, Query, File, UploadFile
from fastapi import FastAPI, HTTPException, Response, Request, Query, File, UploadFile
from fastapi.responses import PlainTextResponse, JSONResponse
from pydantic import BaseModel
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
@@ -21,6 +22,7 @@ from .database import DatabaseManager
from .storage import S3Storage
from .cache import RedisCache
from .metrics import MetricsManager
from .docker_auth import get_docker_token_for_response
class ArtifactRequest(BaseModel):
@@ -116,7 +118,12 @@ def flush_cache(
# Flush S3 objects if requested
if cache_type in ["all", "files"]:
try:
response = storage.client.list_objects_v2(Bucket=storage.bucket)
# Use prefix filtering for remote-specific deletion
list_params = {"Bucket": storage.bucket}
if remote:
list_params["Prefix"] = f"{remote}/"
response = storage.client.list_objects_v2(**list_params)
if 'Contents' in response:
objects_to_delete = [obj['Key'] for obj in response['Contents']]
@@ -128,8 +135,9 @@ def flush_cache(
logger.warning(f"Failed to delete S3 object {key}: {e}")
if objects_to_delete:
result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects")
logger.info(f"Cache flush: Deleted {len(objects_to_delete)} S3 objects")
scope = f" for remote '{remote}'" if remote else ""
result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects{scope}")
logger.info(f"Cache flush: Deleted {len(objects_to_delete)} S3 objects{scope}")
except Exception as e:
result["flushed"]["operations"].append(f"S3 flush failed: {str(e)}")
@@ -158,6 +166,12 @@ async def construct_remote_url(remote_name: str, path: str) -> str:
status_code=500, detail=f"No base_url configured for remote '{remote_name}'"
)
# Handle Docker registry URLs
if remote_config.get("package") == "docker":
# Convert Docker paths to v2 API format
# e.g., library/nginx/manifests/latest -> v2/library/nginx/manifests/latest
return f"{base_url}/v2/{path}"
return f"{base_url}/{path}"
@@ -200,8 +214,35 @@ async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
}
try:
remote_config = config.get_remote_config(remote_name) or {}
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
# Prepare headers for Docker registry requests
headers = {}
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
"application/vnd.docker.distribution.manifest.v2+json,"
"application/vnd.oci.image.manifest.v1+json,"
"application/vnd.oci.image.index.v1+json,"
"application/vnd.docker.distribution.manifest.list.v2+json"
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url)
response = await client.get(url, headers=headers)
# Handle Docker Bearer token challenge
if response.status_code == 401 and is_docker:
www_auth = response.headers.get("WWW-Authenticate", "")
username = remote_config.get("username")
password = remote_config.get("password")
token = await get_docker_token_for_response(www_auth, username, password)
if token:
headers["Authorization"] = f"Bearer {token}"
response = await client.get(url, headers=headers)
response.raise_for_status()
storage_path = storage.upload(key, response.content)
@@ -394,6 +435,88 @@ async def get_artifact(remote_name: str, path: str):
raise HTTPException(status_code=500, detail=f"Error serving artifact: {str(e)}")
@app.get("/v2/")
async def docker_v2_ping():
return Response(
content="{}",
media_type="application/json",
headers={"Docker-Distribution-Api-Version": "registry/2.0"},
)
@app.api_route("/v2/{remote_name}/{path:path}", methods=["GET", "HEAD"])
async def docker_v2_proxy(request: Request, remote_name: str, path: str):
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("package") != "docker":
raise HTTPException(status_code=400, detail=f"Remote '{remote_name}' is not a docker remote")
# Check include_patterns against the image name (e.g. "library/nginx")
patterns = config.get_repository_patterns(remote_name, "")
if patterns:
path_parts = path.split("/")
image_name = "/".join(path_parts[:2]) if len(path_parts) >= 2 else path
if not any(re.search(p, path) or re.search(p, image_name) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
remote_url = await construct_remote_url(remote_name, path)
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
is_index = cache.is_index_file(path)
if cached_key and is_index:
if not cache.is_index_valid(remote_name, path):
logger.info(f"Index EXPIRED: {remote_name}/{path} - removing from cache")
cache.cleanup_expired_index(storage, remote_name, path)
cached_key = None
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_index:
cache_config = config.get_cache_config(remote_name)
index_ttl = cache_config.get("index_ttl", 300)
cache.mark_index_cached(remote_name, path, index_ttl)
logger.info(f"Index file cached with TTL: {remote_name}/{path} (ttl: {index_ttl}s)")
artifact_data = storage.download_object(storage.get_object_key(remote_name, path))
is_blob = "/blobs/" in path
if is_blob:
content_type = "application/octet-stream"
else:
try:
manifest_json = json.loads(artifact_data)
content_type = manifest_json.get("mediaType")
if not content_type:
if "manifests" in manifest_json:
content_type = "application/vnd.oci.image.index.v1+json"
else:
content_type = "application/vnd.oci.image.manifest.v1+json"
except Exception:
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
"Content-Length": str(len(artifact_data)),
}
if request.method == "HEAD":
return Response(status_code=200, headers=headers, media_type=content_type)
metrics.record_cache_hit(remote_name, len(artifact_data))
return Response(content=artifact_data, media_type=content_type, headers=headers)
async def discover_artifacts(remote: str, include_pattern: str) -> list[str]:
if "github.com" in remote:
return await discover_github_releases(remote, include_pattern)
+8
View File
@@ -60,6 +60,14 @@ class S3Storage:
filename = os.path.basename(clean_path)
directory_path = os.path.dirname(clean_path)
# Special handling for Docker registry blobs (use digest as key for deduplication)
if "/blobs/sha256:" in clean_path:
# Extract the SHA256 digest for Docker blobs
parts = clean_path.split("/blobs/sha256:")
if len(parts) == 2:
digest = parts[1]
return f"{remote_name}/blobs/sha256/{digest}"
# Hash the directory path to keep keys manageable while preserving remote structure
if directory_path:
path_hash = hashlib.sha256(directory_path.encode()).hexdigest()[:16]