Compare commits

44 Commits

Author SHA1 Message Date
unkinben 34160032fc feat: use top-level key for repo type instead of type field
Replace the flat `remotes:` map (with `type: "remote"/"virtual"/"local"`) with
separate top-level sections — `remote:`, `virtual:`, `local:` — so the repo
type is declared structurally and the `type:` field is no longer needed.

Config loader normalises the new format to the existing internal representation
(injecting `type` into each remote dict), so all handler code is unchanged.
Adds a TestYamlTypeKeys suite covering all three type keys, mixed files, and
field preservation. Includes README migration guide for splitting a single
remotes file into per-type-and-package conf.d files.
2026-04-29 23:24:54 +10:00
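The normalisation described in commit 34160032fc can be pictured roughly as follows. This is a minimal sketch, not the project's loader: the function name `normalise_type_keys` and the `uploads` remote are hypothetical, and the real code may handle edge cases (duplicate names, a legacy `remotes:` section) differently.

```python
from copy import deepcopy

# Top-level sections introduced by the commit; each one names a repo type.
TYPE_KEYS = ("remote", "virtual", "local")


def normalise_type_keys(raw: dict) -> dict:
    """Flatten `remote:` / `virtual:` / `local:` sections into one `remotes` map.

    Hypothetical helper: it injects `type` into every entry so that code
    written against the older flat format keeps working.
    """
    remotes = {}
    for type_key in TYPE_KEYS:
        for name, settings in (raw.get(type_key) or {}).items():
            entry = deepcopy(settings)   # leave the caller's data untouched
            entry["type"] = type_key     # the type now comes from the section name
            remotes[name] = entry
    return {"remotes": remotes}


config = normalise_type_keys(
    {
        "remote": {
            "alpine": {
                "base_url": "https://dl-cdn.alpinelinux.org",
                "package": "alpine",
            }
        },
        "local": {"uploads": {"package": "generic"}},  # illustrative entry only
    }
)
```

Every other field is copied through unchanged, which mirrors the "field preservation" cases the commit says its test suite covers.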
unkinben c7baae8d0d feat: add virtual repository support for unified index merging (#30)
Adds a new virtual repo type that merges indexes from multiple member remotes
of the same package type. Currently supports helm (index.yaml merge with URL
rewriting). Member fetches run in parallel; merged index is Redis-cached at
min(mutable_ttl) across members.

Reviewed-on: #30
2026-04-29 23:01:14 +10:00
unkinben 4789635e87 Merge pull request 'chore: move example config files into examples/' (#27) from benvin/examples-directory into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #27
2026-04-28 23:47:03 +10:00
unkinben ba52fedd27 chore: restructure examples into single-file and conf.d-method subdirs
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
examples/single-file/remotes.yaml  — original monolithic config
examples/conf.d-method/            — one yaml per remote (alpine, github, pypi)

docker-compose updated to mount from examples/single-file/.
2026-04-28 23:46:06 +10:00
unkinben 76633403b2 chore: move example config files into examples/
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Keeps the repo root clean — example remotes.yaml lives in examples/.
docker-compose.yml updated to mount from the new path.
2026-04-28 23:44:14 +10:00
unkinben cae3503ac4 Merge pull request 'feat: support config.d directory for split configuration (closes #20)' (#26) from benvin/issue-20-config-dir-split into master
Reviewed-on: #26
2026-04-28 23:39:56 +10:00
unkinben 3f098df428 chore: add conf.d example split-config files
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Three example files (alpine, github, pypi) demonstrating per-remote
YAML files for the conf.d directory mode.
2026-04-28 23:29:41 +10:00
unkinben 64266f40e9 feat: support config.d directory for split configuration (closes #20)
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
CONFIG_PATH now accepts a directory path (all *.yaml files merged) or a
main file with a config_dir key pointing to a drop-in directory. Remotes
are merged alphabetically across files; later files win on conflicts.
2026-04-28 23:21:02 +10:00
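A rough sketch of the merge order described in commit 64266f40e9, using already-parsed files so it runs without a YAML library. The helper name `merge_remotes` and the file names are hypothetical, and the sketch assumes a later file replaces a conflicting remote wholesale; the commit message does not say whether the real merge is shallow or deep.

```python
def merge_remotes(files: dict[str, dict]) -> dict:
    """Merge the `remotes` maps of several parsed config files.

    `files` maps a file name to its parsed content. Files are visited in
    alphabetical order, so a remote defined in a later file overrides one
    with the same name from an earlier file.
    """
    merged: dict = {}
    for filename in sorted(files):
        merged.update(files[filename].get("remotes") or {})
    return merged


# Hypothetical drop-in directory content, keyed by file name.
files = {
    "20-pypi.yaml": {"remotes": {"pypi": {"cache": {"mutable_ttl": 600}}}},
    "10-base.yaml": {
        "remotes": {
            "pypi": {"cache": {"mutable_ttl": 60}},
            "alpine": {"package": "alpine"},
        }
    },
}
merged = merge_remotes(files)
```

Here `20-pypi.yaml` sorts after `10-base.yaml`, so its `pypi` entry is the one that survives.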
unkinben be25fc19f7 Merge pull request 'feat: quarantine new releases (supply-chain attack prevention)' (#25) from benvin/issue-22-quarantine into master
Reviewed-on: #25
2026-04-28 23:13:28 +10:00
unkinben 3bd3ca8b74 feat: quarantine new releases to prevent supply chain attacks
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Add per-remote quarantine support: when quarantine_new=true and quarantine_days=N,
immutable artifacts published within the last N days are blocked with 404 until
the quarantine window expires.

- ConfigManager.get_quarantine_config() reads quarantine_new/quarantine_days
- RedisCache.store/get_artifact_published() persist Last-Modified per artifact
- proxy._check_quarantine() enforces the window; fails open when date is unknown
- proxy._fetch_last_modified() HEAD-requests upstream to discover publish date
- Docker proxy route wires quarantine checks on both cache-hit and cache-miss
- remotes.yaml: quarantine_new/quarantine_days added to pypi example (3-day window)
- README: documents quarantine configuration
2026-04-28 23:01:52 +10:00
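The quarantine window from commit 3bd3ca8b74 reduces to a date comparison. The sketch below is an illustration under stated assumptions, not the body of `proxy._check_quarantine()`: the function name `is_quarantined` and its signature are hypothetical, and it assumes the publish date is a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_quarantined(
    published: Optional[datetime],
    quarantine_days: int,
    now: Optional[datetime] = None,
) -> bool:
    """Return True while an artifact is still inside its quarantine window.

    An unknown publish date fails open (not quarantined), matching the
    behaviour the commit message describes.
    """
    if published is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - published < timedelta(days=quarantine_days)


now = datetime(2026, 4, 28, tzinfo=timezone.utc)
fresh = is_quarantined(now - timedelta(days=1), 3, now=now)    # inside the window
settled = is_quarantined(now - timedelta(days=5), 3, now=now)  # window has expired
unknown = is_quarantined(None, 3, now=now)                     # fails open
```

A caller would answer a quarantined request with 404, as the commit message states, and serve the artifact normally otherwise.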
unkinben 373366e695 Merge pull request 'refactor: split codebase into submodules (closes #19)' (#24) from benvin/issue-19-submodules into master
Reviewed-on: #24
2026-04-28 22:47:38 +10:00
unkinben e6d9b175ce refactor: extract route handler logic into artifact/ subpackage
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Each route in main.py is now a single-line delegation to an artifact submodule:
- artifact/proxy.py  — remote artifact GET, caching, mutable revalidation
- artifact/local.py  — local repo upload/check/delete
- artifact/docker.py — Docker Registry v2 proxy + ping
- artifact/discovery.py — GitHub release discovery + bulk cache
- artifact/flush.py  — cache flush

UpstreamUnreachable, cache_single_artifact, _upstream_reachable and
check_upstream_changed moved from main.py to artifact/proxy.py.
Tests updated to patch at their new locations.

All 187 tests pass.
2026-04-28 22:21:01 +10:00
unkinben 0daca40156 refactor: add storage/s3 and auth/docker submodules
- storage/s3.py: S3Storage moved from storage.py; storage/__init__.py re-exports it
- auth/docker.py: Docker Bearer token logic moved from docker_auth.py
- docker_auth.py: thin shim re-exporting all public symbols (including _token_cache)
  for backwards compatibility with existing test and import paths
- main.py: now imports get_docker_token_for_response from .auth

All 187 tests pass.
2026-04-28 22:15:04 +10:00
unkinben 0df726467a refactor: split cache, database, and remote logic into submodules
cache/redis.py, database/postgres.py, and remote/{base,generic,helm,npm,python,rpm}.py
replace the flat modules. All public symbols re-exported from their package
__init__.py for backwards compatibility. No functional changes; all 187 tests pass.

Closes #19
2026-04-28 22:09:58 +10:00
unkinben b8bc7f8714 Merge pull request 'chore: cleanup the readme' (#23) from benvin/readme-refactor into master
Reviewed-on: #23
2026-04-28 22:00:32 +10:00
unkinben 0c780c1bd1 chore: cleanup the readme
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2026-04-28 21:57:14 +10:00
unkinben 173b5d8b10 Merge pull request 'refactor: simplify pypi and npm URL rewriting' (#18) from benvin/simplify-remote-url-rewriting into master
Reviewed-on: #18
2026-04-27 22:43:33 +10:00
unkinben 3352a3e886 refactor: simplify pypi and npm URL rewriting — single remote, no redundant config keys
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- npm: remove npm_files_url/npm_files_remote; rewrite uses base_url and
  remote name directly (same approach as helm)
- npm: replace hardcoded .tgz extension check with immutable_patterns match
- pypi: collapse pypi + pypi-files into a single remote (base_url points
  to files.pythonhosted.org); simple/ requests are transparently fetched
  from pypi.org with no extra config required
- pypi: remove pypi_files_url/pypi_files_remote from pypi and pypi-gitea
- pypi: rewrite check now uses immutable_patterns (consistent with npm)
- Update README for both pypi and npm sections
- Update tests and fixtures to reflect single-remote pypi config
2026-04-27 22:42:23 +10:00
unkinben 8adcbac405 Merge pull request 'feat: add helm chart repository caching proxy' (#17) from benvin/helm-remote into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #17
2026-04-27 22:22:36 +10:00
unkinben 4ca89b9159 feat: add helm chart repository caching proxy
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add helm package type with index.yaml as mutable (TTL-based) and
  .tgz chart tarballs as immutable
- Rewrite chart URLs in index.yaml to serve tarballs via proxy cache
- Add text/yaml content-type detection for .yaml/.yml files
- Add hashicorp-helm example remote in remotes.yaml
- Update README with Helm chart repository proxy section
- Add tests for helm mutable patterns and route behaviour
2026-04-27 22:17:31 +10:00
unkinben 25b85ddc92 Merge pull request 'feat: add npm registry caching proxy' (#16) from benvin/npm-remote into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #16
2026-04-27 20:30:18 +10:00
unkinben d585ab425c feat: add npm remote type with metadata URL rewriting and caching
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add `npm` package type to config with no built-in mutable defaults;
  users set explicit mutable_patterns (e.g. ^(?!.*\.tgz$).*) and
  immutable_patterns (e.g. \.tgz$) in remotes.yaml
- Rewrite dist.tarball URLs in metadata JSON on the fly so tarball
  downloads pass through the same proxy remote instead of hitting
  npmjs.org directly
- Single-remote design: npm_files_remote points back to itself since
  both metadata and tarballs are served from registry.npmjs.org
- Add .tgz to _get_content_type (application/gzip)
- Add example npm remote to remotes.yaml
- Add npm proxy section to README covering remotes.yaml config,
  client setup (npm/yarn/pnpm), rewriting behaviour, and
  mutable vs immutable path table
- Add tests for mutable pattern matching, URL rewriting, content-type,
  scoped packages, cache miss, and tarball immutability
2026-04-27 20:28:31 +10:00
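The two example patterns quoted in commit d585ab425c partition npm paths cleanly. The sketch below shows how they behave; the `classify` helper is hypothetical, and it assumes patterns are applied with `re.search`, which the commit message does not confirm.

```python
import re

# Patterns taken from the commit message's examples.
MUTABLE_PATTERNS = [re.compile(r"^(?!.*\.tgz$).*")]  # anything not ending in .tgz
IMMUTABLE_PATTERNS = [re.compile(r"\.tgz$")]         # tarballs


def classify(path: str) -> str:
    """Label a request path as immutable, mutable, or unmatched."""
    if any(p.search(path) for p in IMMUTABLE_PATTERNS):
        return "immutable"
    if any(p.search(path) for p in MUTABLE_PATTERNS):
        return "mutable"
    return "unmatched"


tarball = classify("lodash/-/lodash-4.17.21.tgz")
metadata = classify("lodash")
scoped = classify("@types/node")
```

Package metadata documents change whenever a new version is published, so they fall on the mutable side, while a versioned tarball never changes once released.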
unkinben 6b1a6c9eb4 Merge pull request 'feat: add PyPI remote type with URL rewriting and basic auth' (#15) from benvin/pypi-remote into master
Reviewed-on: #15
2026-04-27 14:46:27 +10:00
unkinben 5de912db75 docs: describe PyPI remote usage with uv system/user uv.toml
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2026-04-27 14:37:41 +10:00
unkinben 8e9d313892 feat: add pypi remote type with URL rewriting and basic auth
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Add 'pypi' package type to config.py; simple/ paths are mutable by default
- Refactor content-type detection into _get_content_type() helper; add .whl
- Add _resolve_content() which rewrites files host URLs in simple index HTML
  to go through the proxy (pypi_files_url / pypi_files_remote config keys),
  and returns text/html content-type for simple index responses
- Add basic auth support for non-Docker remotes (username + password/token
  in remote config); thread auth through _upstream_reachable and
  check_upstream_changed so mutable TTL checks also authenticate
- Add 'pypi' remote (pypi.org simple index) and 'pypi-files' remote
  (files.pythonhosted.org) to remotes.yaml; add 'pypi-gitea' example for
  Gitea package registries where index and files share the same base URL
- Add unit tests: simple index URL rewriting, HTML content-type, .whl/.tar.gz
  content-types, mutable index detection, and immutable pattern enforcement
2026-04-27 14:31:33 +10:00
unkinben 70cd439961 Merge pull request 'feat: immutable/mutable caching patterns with conditional revalidation and stale fallback' (#14) from benvin/immutable-mutable-patterns into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #14
2026-04-27 11:44:49 +10:00
unkinben fe837dabf7 feat: keep stale mutables when upstream is unreachable; update README
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
When a mutable file's TTL expires and the upstream backend cannot be
contacted (network error or timeout), the cached copy is kept and its
TTL refreshed instead of being evicted. This keeps RPM repodata, Alpine
indexes, branch archives, and other mutable data available during
upstream outages.

Adds UpstreamUnreachable exception and _upstream_reachable() helper.
check_upstream_changed() now raises UpstreamUnreachable on network
errors (was silently returning True). handle_expired_mutable() catches
the exception on the check_mutable_updates path and calls
_upstream_reachable() on the plain-expiry path.

README updated to current immutable/mutable terminology and documents
all new caching features.
2026-04-27 11:38:50 +10:00
unkinben 78296dae8f refactor: extract handle_expired_mutable helper; add redownload success test
Deduplicates the expired-mutable TTL/redownload branching logic that
was copied verbatim between get_artifact and docker_v2_proxy. Adds the
missing happy-path test for a changed mutable file that is successfully
re-fetched from upstream.
2026-04-27 11:13:15 +10:00
unkinben 8fe4bac2b9 feat: add check_mutable_updates flag for conditional upstream revalidation
When check_mutable_updates: true is set on a remote, expired user-defined
mutable files are revalidated before re-downloading:

- On expiry a conditional HEAD is sent with If-None-Match / If-Modified-Since
- 304 Not Modified: TTL is refreshed in Redis, S3 cache is untouched
- 200 / no conditional support: cache is invalidated and file re-downloaded
- Network error: safe fallback — assume changed, re-download

ETag and Last-Modified from upstream responses are stored in Redis under
mutable:meta:<remote>:<hash> (no expiry, cleaned up on re-download or
cache flush). The flag only applies to user-configured mutable_patterns;
built-in package-type defaults (APKINDEX, repomd.xml, Docker manifests)
are always re-fetched unconditionally.

cache/flush also clears mutable:meta:* keys alongside index:* keys.
2026-04-27 11:00:09 +10:00
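The revalidation rules listed in commit 8fe4bac2b9 amount to a small decision table. The following is a sketch only: both function names are hypothetical, the network call itself is left out, and a `None` status stands in for a network error.

```python
from typing import Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Build the validators for the conditional HEAD request."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def revalidation_action(status: Optional[int]) -> str:
    """Map the HEAD outcome to the action described in this commit.

    304 refreshes the TTL and leaves the cached object alone; any other
    status, or a network error (None), is treated as "changed".
    """
    if status == 304:
        return "refresh_ttl"
    return "redownload"


headers = conditional_headers('"abc123"', "Mon, 27 Apr 2026 00:00:00 GMT")
```

This reflects the behaviour of this commit alone. The later commit fe837dabf7 changes the network-error branch so that the stale copy is kept instead of being re-downloaded.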
unkinben 8bc9285117 chore: track remotes.yaml as a documented example config
Remove remotes.yaml from .gitignore and add header comments explaining
the immutable_patterns/mutable_patterns/cache keys. Marks the file
clearly as an example to copy and adapt; warns against committing
real credentials.
2026-04-27 10:58:59 +10:00
unkinben ce01a94141 feat: rename include/index patterns to immutable/mutable with per-remote TTL
Replace the include_patterns/index_patterns split with a clearer
immutable_patterns/mutable_patterns model:

- immutable_patterns: artifacts cached indefinitely (no TTL)
- mutable_patterns: artifacts that expire and are re-fetched after
  cache.mutable_ttl seconds (replaces cache.index_ttl)

_PACKAGE_INDEX_PATTERNS renamed to _PACKAGE_MUTABLE_PATTERNS; all
built-in package-type index patterns (APKINDEX, repomd, manifests, etc.)
default to the remote's mutable_ttl (default 1 hour).

cache.file_ttl renamed to cache.immutable_ttl for consistency.
Adds github-archive remote to remotes.yaml as a worked example showing
tag archives as immutable and branch archives as mutable (1-day TTL).

docker-compose.yml: fix VERSION=dev → 2.2.2.dev0 (valid PEP 440),
add :z SELinux label to volume mounts.
2026-04-27 00:40:13 +10:00
unkinben 4619ae18d8 Merge pull request 'chore: remove build from tag' (#13) from benvin/docker-compose-build into master
Reviewed-on: #13
2026-04-25 22:29:48 +10:00
unkinben ac51d3a51d chore: remove build from tag
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- stop building the image on tag events
2026-04-25 22:27:59 +10:00
unkinben 2887ce4476 Merge pull request 'build: align Dockerfile with packer build and add docker-compose dev mounts' (#12) from benvin/packer-aligned-dockerfile into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #12
2026-04-25 22:23:59 +10:00
unkinben 9e52929d73 build: align Dockerfile with packer build and add docker-compose dev mounts
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Rebase Dockerfile onto almalinux9-base, install via uv tool install
- Remove dev artifacts (remotes.yaml, ca-bundle.pem) from image
- Mount gitignored dev files via docker-compose volumes instead
- Add .dockerignore to keep secrets out of build context
- Track docker-compose.yml in git (no secrets; dev files mounted as volumes)
2026-04-25 22:17:36 +10:00
unkinben 788d469063 Merge pull request 'benvin/configurable-index-patterns' (#11) from benvin/configurable-index-patterns into master
ci/woodpecker/tag/docker Pipeline failed
Reviewed-on: #11
2026-04-25 21:04:25 +10:00
unkinben 1cbe836f1b ci: add Woodpecker pipelines for pre-commit, tests, and Docker build
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
2026-04-25 21:02:39 +10:00
unkinben f3394b9ca6 docs: add RKE2 image rewriting guide and expand pattern examples
Add a new "Docker Image Rewriting with RKE2" section covering:
- How the /v2/ proxy integrates with registries.yaml mirror rewrites
- Per-registry examples (docker.io, ghcr.io, registry.k8s.io, quay.io)
- include_patterns for restricting which images are cached
- TLS CA configuration for private certificate authorities
- Apply and verification commands

Expand the Configuration section with:
- Richer include_patterns examples (anchored, extension, architecture,
  Docker image name patterns, repodata directories)
- New index_patterns section explaining built-in defaults per package
  type and how to add custom patterns (Helm index.yaml, APT InRelease/
  Packages.gz, extra RPM comps.xml)
2026-04-25 20:20:42 +10:00
unkinben 8da43e610e tests: resolve all peer-review issues across test suite
Address every substantive critique from the peer review:

test_cache: replace tautological same-inputs key test with hardcoded
hash assertion; assert setex call + TTL in mark_index_cached test;
assert client is None for unavailable no-op; rename Packages.gz test
to document intentional behaviour; add alpine sig/tmp negatives; add
hyphenated and date-tag docker positive cases; add key hash-length
assertion.

test_config: replace live-constant comparisons with literal string
assertions for alpine/rpm/docker; add unknown package type test;
add dict-keyed repositories branch coverage (per-repo override and
fallback); fix cache config to full equality check; add explicit empty
index_patterns test.

test_docker_auth: fix case-insensitive test to verify realm value;
add field-order (scope before service) limitation test; add pipe-char
collision documentation test; add missing fetch_token edge cases
(no token field, HTTPStatusError, missing expires_in default 300);
replace rubber-stamp delegate test with end-to-end parse→fetch test.

test_storage: replace split prefix/suffix assertions with structural
3-part check + pinned sha256 assertion; fix Docker blob digests to
64-char hex; add secure=True URL test; add upload return value test;
add download_object 404-on-ClientError test; remove redundant subset
test.

test_routes: add metrics.record_cache_hit/miss assertions; add
mark_index_cached assertion after cache miss on index (docker + generic);
add Content-Disposition, X-Artifact-Size header checks; add rpm/xml
content-type tests; add flush test that verifies Redis keys are deleted
when cache is available; add smoke coverage for upload (PUT), HEAD, DELETE,
/metrics, and /config routes.
2026-04-25 19:58:33 +10:00
unkinben 3a13d76f7e chore: add .tox, .pytest_cache, .pre-commit-cache, .ruff_cache to .gitignore
2026-04-25 19:21:43 +10:00
unkinben 2d0e2c64e6 feat: add test suite, tox, pre-commit, and ruff formatting
- tests/: 107 unit tests across config, cache, docker_auth, storage,
  and FastAPI routes; all passing under pytest-asyncio auto mode
- tox.ini: runs pytest via uvx --with tox-uv tox (py311)
- .pre-commit-config.yaml: ruff lint + ruff-format at v0.15.12
- pyproject.toml: pytest config (asyncio_mode=auto), ruff config
  (line-length=140), tox/pre-commit added to dev extras
- Makefile: test/tox/pre-commit targets via uvx --python 3.11
- Source files reformatted by ruff-format (no logic changes)
2026-04-25 19:21:05 +10:00
unkinben 2414ddfdd3 feat: make index file patterns configurable per remote
Replace hardcoded is_index_file logic with regex patterns driven by
remotes.yaml. Package-level defaults (alpine/rpm/docker) are merged with
any extra patterns listed under index_patterns in the remote config.
2026-04-25 18:40:45 +10:00
unkinben b3d12f4962 docs: add SPEC.md with repository model and caching requirements
2026-04-25 18:31:27 +10:00
unkinben 92b9f9a03e refactor: use package: docker instead of type: docker
Align with intended type=local|remote|virtual / package=docker|rpm|alpine|generic
model. All docker-specific logic now keyed on package field; type field
correctly reflects the repository kind (remote vs local).
2026-04-25 18:27:31 +10:00
56 changed files with 6148 additions and 1709 deletions
+15
@@ -0,0 +1,15 @@
.git/
.venv/
dist/
tests/
remotes.yaml
ca-bundle.pem
.env
*.log
docker-compose.yml
.woodpecker/
.tox/
.ruff_cache/
.pytest_cache/
.pre-commit-cache/
minio_data/
+12 -2
@@ -35,7 +35,6 @@ env/
# Environment variables
.env
remotes.yaml
# Logs
*.log
@@ -43,9 +42,20 @@ remotes.yaml
# uv
uv.lock
# tox
.tox/
# pytest
.pytest_cache/
# pre-commit
.pre-commit-cache/
# ruff
.ruff_cache/
# Docker volumes
minio_data/
# Local configuration overrides
docker-compose.yml
ca-bundle.pem
+7
@@ -0,0 +1,7 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.12
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- id: ruff-format
+9
@@ -0,0 +1,9 @@
when:
- event: pull_request
steps:
- name: docker-build
image: woodpeckerci/plugin-docker-buildx
settings:
repo: git.unkin.net/unkin/artifactapi
dry_run: true
+18
@@ -0,0 +1,18 @@
when:
- event: tag
ref: refs/tags/v*
steps:
- name: docker
image: woodpeckerci/plugin-docker-buildx
settings:
registry: git.unkin.net
repo: git.unkin.net/unkin/artifactapi
username: droneci
password:
from_secret: DRONECI_PASSWORD
tags:
- ${CI_COMMIT_TAG}
- latest
build_args:
- VERSION=${CI_COMMIT_TAG##v}
+9
@@ -0,0 +1,9 @@
when:
- event: pull_request
steps:
- name: pre-commit
image: git.unkin.net/unkin/almalinux9-base:20260308
commands:
- uvx pre-commit run --all-files
+8
@@ -0,0 +1,8 @@
when:
- event: pull_request
steps:
- name: test
image: git.unkin.net/unkin/almalinux9-base:20260308
commands:
- uvx --python 3.11 --with tox-uv tox
+15 -45
@@ -1,53 +1,23 @@
# Use Alpine Linux as base image
FROM python:3.11-alpine
FROM git.unkin.net/unkin/almalinux9-base:latest
# Set working directory
WORKDIR /app
ARG VERSION=0.0.0.dev0
# Install system dependencies
RUN apk add --no-cache \
gcc \
musl-dev \
libffi-dev \
postgresql-dev \
curl \
wget \
tar
COPY . /build
# Install uv
ARG PACKAGE_VERSION=0.9.21
RUN wget -O /app/uv-x86_64-unknown-linux-musl.tar.gz https://github.com/astral-sh/uv/releases/download/${PACKAGE_VERSION}/uv-x86_64-unknown-linux-musl.tar.gz && \
tar xf /app/uv-x86_64-unknown-linux-musl.tar.gz -C /app && \
mv /app/uv-x86_64-unknown-linux-musl/uv /usr/local/bin/uv && \
rm -rf /app/uv-x86_64-unknown-linux-musl* && \
chmod +x /usr/local/bin/uv && \
uv --version
RUN HATCH_VCS_PRETEND_VERSION=${VERSION} \
SETUPTOOLS_SCM_PRETEND_VERSION=${VERSION} \
uv build --wheel --directory /build && \
useradd -m -r -s /bin/sh appuser
# Create non-root user first
RUN adduser -D -s /bin/sh appuser && \
chown -R appuser:appuser /app
# Copy dependency files and change ownership
COPY --chown=appuser:appuser pyproject.toml uv.lock README.md ./
# Switch to appuser and install Python dependencies
USER appuser
ARG VERSION=dev
ENV HATCH_VCS_PRETEND_VERSION=${VERSION} \
SETUPTOOLS_SCM_PRETEND_VERSION=${VERSION}
RUN uv sync --frozen
RUN uv tool install --from /build/dist/*.whl artifactapi
# Copy application source
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser remotes.yaml ./
COPY --chown=appuser:appuser ca-bundle.pem ./
USER root
RUN rm -rf /build
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
# Run the application
CMD ["uv", "run", "python", "-m", "src.artifactapi.main"]
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 CMD curl -f http://localhost:8000/health || exit 1
USER appuser
ENV PATH="/home/appuser/.local/bin:$PATH"
WORKDIR /app
CMD ["artifactapi"]
+9 -4
@@ -1,7 +1,7 @@
.PHONY: build install dev clean test lint format docker-build docker-up docker-down docker-logs docker-rebuild docker-clean docker-restart
.PHONY: build install dev clean test lint format pre-commit tox docker-build docker-up docker-down docker-logs docker-rebuild docker-clean docker-restart
build:
docker build --no-cache -t artifactapi:latest .
docker build -t artifactapi:dev .
install: build
@@ -17,7 +17,13 @@ clean:
rm -rf *.egg-info/
test:
uv run pytest
uvx --python 3.11 --with tox-uv tox
tox:
uvx --python 3.11 --with tox-uv tox
pre-commit:
uvx --python 3.11 pre-commit run --all-files
lint:
uv run ruff check --fix .
@@ -68,4 +74,3 @@ major:
_tag:
git push origin $(TAG)
docker-compose build --no-cache --build-arg VERSION=$(TAG:v%=%)
+507 -614
File diff suppressed because it is too large
+137
@@ -0,0 +1,137 @@
# ArtifactAPI Specification
## Repository model
Every repository entry in `remotes.yaml` has two orthogonal fields:
| field | values | meaning |
|---|---|---|
| `type` | `local`, `remote`, `virtual` | repository kind — how the repo is served |
| `package` | `docker`, `rpm`, `alpine`, `generic` | package format — what protocol and caching rules to apply |
**type**
- `local` — files are uploaded directly to the API and stored in S3; no upstream.
- `remote` — proxies and caches content from an upstream URL (`base_url`).
- `virtual` — aggregates multiple repositories (not yet implemented).
**package**
- `docker` — upstream speaks the OCI Distribution API (Bearer auth, manifest/blob paths).
- `rpm` — upstream is an RPM repository; repodata files are index files.
- `alpine` — upstream is an Alpine APK repository; `APKINDEX.tar.gz` is an index file.
- `generic` — plain HTTP file download; no format-specific logic.
---
## Caching
Two cache classes determine retention:
| class | stored | TTL |
|---|---|---|
| **file** | S3 object, no Redis entry | `file_ttl` (`0` means indefinite) |
| **index** | S3 object + Redis TTL key | `index_ttl` — when the Redis key expires the S3 object is deleted and re-fetched |
Index files are mutable metadata that must expire. File-class objects are treated as immutable and cached indefinitely (unless `file_ttl` is set).
---
## Docker package rules
### URL construction
Remote URLs are prefixed with `/v2/` for `package: docker` remotes:
```
{base_url}/v2/{path}
```
e.g. `library/nginx/manifests/latest` → `https://registry-1.docker.io/v2/library/nginx/manifests/latest`
### Authentication
Docker registries use Bearer token challenges. On a `401 Unauthorized` response, the API:
1. Parses the `WWW-Authenticate: Bearer` header for `realm`, `service`, and `scope`.
2. Fetches a token from the auth realm, supplying `username`/`password` from the remote config if present.
3. Retries the request with `Authorization: Bearer <token>`.
Tokens are cached in-memory keyed by `(realm, service, scope, username)` and expire 30 seconds before their stated `expires_in`.
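The token caching rule above can be sketched as a small in-memory cache. This is an illustration, not the project's `auth/docker.py`: the `TokenCache` class and its methods are hypothetical, and the 300-second default for a missing `expires_in` is taken from the test-suite commit message rather than from this spec.

```python
import time


class TokenCache:
    """In-memory Bearer token cache keyed by (realm, service, scope, username)."""

    def __init__(self, margin: float = 30.0, clock=time.monotonic):
        self._entries = {}
        self._margin = margin  # seconds shaved off the stated lifetime
        self._clock = clock    # injectable so the expiry logic is testable

    def put(self, key: tuple, token: str, expires_in: float = 300.0) -> None:
        # Treat the token as expired `margin` seconds before it really is.
        self._entries[key] = (token, self._clock() + expires_in - self._margin)

    def get(self, key: tuple):
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, deadline = entry
        if self._clock() >= deadline:
            del self._entries[key]  # drop the stale token
            return None
        return token


fake_now = [0.0]  # mutable stand-in for the clock
cache = TokenCache(clock=lambda: fake_now[0])
key = (
    "https://auth.docker.io/token",
    "registry.docker.io",
    "repository:library/nginx:pull",
    None,  # anonymous pull, so no username
)
cache.put(key, "token-abc", expires_in=300)
```

With a 300-second token the cache stops returning it after 270 seconds, so a request never goes upstream with a token that is about to lapse.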
### Cache classification
| path pattern | mutable | class | TTL source |
|---|---|---|---|
| `/manifests/<tag>` | yes | index | `index_ttl` |
| `/tags/list` | yes | index | `index_ttl` |
| `/manifests/sha256:<digest>` | no | file | `file_ttl` |
| `/blobs/sha256:<digest>` | no | file | `file_ttl` |
Tag-based manifests and tag lists are mutable and cached as index. Digest-pinned manifests and blobs are content-addressed and cached indefinitely as files.
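The classification table can be expressed as a short function. This is a sketch under assumptions: the name `cache_class` is hypothetical, digests are assumed to be 64 lowercase hex characters, and paths outside the table return `None` because the spec does not say how they are treated.

```python
import re
from typing import Optional

# Digest-pinned manifests and blobs are content-addressed.
_DIGEST_REF = re.compile(r"/(?:manifests|blobs)/sha256:[0-9a-f]{64}$")


def cache_class(path: str) -> Optional[str]:
    """Return "file" or "index" for a Docker path, or None if it is not in the table."""
    if _DIGEST_REF.search(path):
        return "file"   # immutable, governed by file_ttl
    if "/manifests/" in path or path.endswith("/tags/list"):
        return "index"  # mutable, governed by index_ttl
    return None


digest = "a" * 64  # placeholder digest for illustration
by_tag = cache_class("library/nginx/manifests/latest")
by_digest = cache_class(f"library/nginx/manifests/sha256:{digest}")
tag_list = cache_class("library/nginx/tags/list")
blob = cache_class(f"library/nginx/blobs/sha256:{digest}")
```

The digest check runs first, so a digest-pinned manifest is never mistaken for a tag-based one.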
### Blob deduplication
Blobs are stored under a digest-keyed path shared across all images on the same remote:
```
{remote_name}/blobs/sha256/{digest}
```
The same layer pulled by different images is stored once.
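A minimal sketch of the digest-keyed path, assuming `{digest}` in the template is the hex part of a `sha256:<hex>` reference. The helper name `blob_storage_key` and the remote name `dockerhub` are hypothetical.

```python
def blob_storage_key(remote_name: str, digest: str) -> str:
    """Build the shared storage key for a blob from its `sha256:<hex>` digest."""
    algorithm, _, hex_digest = digest.partition(":")
    if algorithm != "sha256" or len(hex_digest) != 64:
        raise ValueError(f"unsupported digest: {digest!r}")
    return f"{remote_name}/blobs/sha256/{hex_digest}"


layer = "sha256:" + "b" * 64  # placeholder digest for illustration
# The image name plays no part in the key, so two images that share a
# layer resolve to the same stored object.
key_for_any_image = blob_storage_key("dockerhub", layer)
```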
### Accept headers
| path | `Accept` header sent upstream |
|---|---|
| `/manifests/…` | `application/vnd.docker.distribution.manifest.v2+json`, `application/vnd.oci.image.manifest.v1+json`, `application/vnd.oci.image.index.v1+json`, `application/vnd.docker.distribution.manifest.list.v2+json` |
| `/blobs/…` | `application/octet-stream` |
---
## OCI Distribution API endpoint
The API exposes a native Docker registry interface so clients can use `docker pull` directly:
```
GET /v2/ — version ping
GET /v2/{remote}/{image}/manifests/{ref} — fetch manifest
HEAD /v2/{remote}/{image}/manifests/{ref} — manifest metadata
GET /v2/{remote}/{image}/blobs/{digest} — fetch blob
HEAD /v2/{remote}/{image}/blobs/{digest} — blob metadata
```
Responses include `Docker-Distribution-Api-Version`, `Docker-Content-Digest`, and the correct OCI `Content-Type` (detected from the manifest `mediaType` field).
Only remotes with `package: docker` are accessible via this endpoint. All other remotes return `400`.
---
## include_patterns
`include_patterns` is a list of Python regexes applied to every request before any upstream fetch or cache lookup.
**Generic remotes (`/api/v1/remote/…`):**
- Patterns match against the file path and the full path.
- Index files (mutable metadata) bypass pattern checks and are always allowed.
**Docker remotes (`/v2/…`):**
- Patterns match against the image name (first two path segments, e.g. `library/nginx`) and the full path.
- The index-file exemption does **not** apply — patterns restrict whole images, including their manifests and tag lists.
- No patterns configured → all images allowed.
Returns `403` when a request is blocked.
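For Docker remotes, the rules above can be sketched as follows. The function name `docker_request_allowed` is hypothetical, and the sketch assumes patterns are applied with `re.search`; the spec only says they are Python regexes.

```python
import re


def docker_request_allowed(path: str, include_patterns: list[str]) -> bool:
    """Apply include_patterns to a Docker path such as `library/nginx/manifests/latest`."""
    if not include_patterns:
        return True  # no patterns configured: every image is allowed
    image_name = "/".join(path.split("/")[:2])  # first two path segments
    return any(
        re.search(pattern, image_name) or re.search(pattern, path)
        for pattern in include_patterns
    )


patterns = [r"^library/nginx$"]
nginx_ok = docker_request_allowed("library/nginx/manifests/latest", patterns)
redis_ok = docker_request_allowed("library/redis/manifests/latest", patterns)
open_ok = docker_request_allowed("library/redis/manifests/latest", [])
```

A caller would translate a `False` result into the `403` response. Because there is no index-file exemption here, a blocked image's manifests and tag lists are refused along with its blobs.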
---
## Versioning
The package version is derived from git tags via `hatch-vcs`. Tags follow the format `v{MAJOR}.{MINOR}.{PATCH}`.
Docker images are built with the version injected at build time:
```
SETUPTOOLS_SCM_PRETEND_VERSION=<version> uv sync --frozen
```
The `Makefile` provides `patch`, `minor`, and `major` targets that tag the current commit and rebuild the container image.
+11
@@ -0,0 +1,11 @@
remotes:
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
type: "remote"
package: "alpine"
description: "Alpine Linux APK package repository"
immutable_patterns:
- ".*/x86_64/.*\\.apk$"
cache:
immutable_ttl: 0
mutable_ttl: 7200
+12
@@ -0,0 +1,12 @@
remotes:
github:
base_url: "https://github.com"
type: "remote"
package: "generic"
description: "GitHub releases and files"
immutable_patterns:
- "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
- "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
cache:
immutable_ttl: 0
mutable_ttl: 0
+17
@@ -0,0 +1,17 @@
remotes:
pypi:
base_url: "https://files.pythonhosted.org"
type: "remote"
package: "pypi"
description: "Python Package Index"
check_mutable_updates: true
quarantine_new: true
quarantine_days: 3
immutable_patterns:
- "packages/.*\\.whl$"
- "packages/.*\\.whl\\.metadata$"
- "packages/.*\\.tar\\.gz$"
- "packages/.*\\.zip$"
cache:
immutable_ttl: 0
mutable_ttl: 600
+91
@@ -0,0 +1,91 @@
version: '3.8'

services:
  artifactapi:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - VERSION=2.2.2.dev0
    ports:
      - "8000:8000"
    volumes:
      - ./examples/single-file/remotes.yaml:/app/remotes.yaml:ro,z
      - ./ca-bundle.pem:/app/ca-bundle.pem:ro,z
    environment:
      - CONFIG_PATH=/app/remotes.yaml
      - DBHOST=postgres
      - DBPORT=5432
      - DBUSER=artifacts
      - DBPASS=artifacts123
      - DBNAME=artifacts
      - REDIS_URL=redis://redis:6379
      - MINIO_ENDPOINT=minio:9000
      - MINIO_ACCESS_KEY=minioadmin
      - MINIO_SECRET_KEY=minioadmin
      - MINIO_BUCKET=artifacts
      - MINIO_SECURE=false
      - REQUESTS_CA_BUNDLE=/app/ca-bundle.pem
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --save 20 1
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 10s
      retries: 3

  postgres:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: artifacts
      POSTGRES_USER: artifacts
      POSTGRES_PASSWORD: artifacts123
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U artifacts -d artifacts"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  minio_data:
  redis_data:
  postgres_data:
+10
@@ -0,0 +1,10 @@
remote:
  alpine:
    base_url: "https://dl-cdn.alpinelinux.org"
    package: "alpine"
    description: "Alpine Linux APK package repository"
    immutable_patterns:
      - ".*/x86_64/.*\\.apk$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 7200
+11
@@ -0,0 +1,11 @@
remote:
  github:
    base_url: "https://github.com"
    package: "generic"
    description: "GitHub releases and files"
    immutable_patterns:
      - "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
      - "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 0
+16
@@ -0,0 +1,16 @@
remote:
  pypi:
    base_url: "https://files.pythonhosted.org"
    package: "pypi"
    description: "Python Package Index"
    check_mutable_updates: true
    quarantine_new: true
    quarantine_days: 3
    immutable_patterns:
      - "packages/.*\\.whl$"
      - "packages/.*\\.whl\\.metadata$"
      - "packages/.*\\.tar\\.gz$"
      - "packages/.*\\.zip$"
    cache:
      immutable_ttl: 0
      mutable_ttl: 600
+481
@@ -0,0 +1,481 @@
# Example remotes configuration — copy and adapt for your environment.
#
# Top-level keys determine the repository type:
# remote: — proxy to an upstream URL, cache responses in S3
# virtual: — merge indexes from multiple member remotes into one
# local: — store files uploaded via PUT, serve via GET
#
# immutable_patterns: artifacts cached forever (e.g. release binaries, versioned tags).
# mutable_patterns: artifacts that expire after cache.mutable_ttl seconds and are
# re-fetched from upstream on next request (e.g. index files,
# branch archives). Defaults to the package-type built-ins when
# not set (APKINDEX, repomd.xml, Docker manifests, etc.).
# cache:
# immutable_ttl: TTL for immutable files (0 = forever, rarely needed to change).
# mutable_ttl: TTL in seconds for mutable files. Omit to use the default (3600).
#
# quarantine_new: Set to true to block immutable artifacts published within the last
# quarantine_days days. Requests return 404 until the quarantine period
# expires. Fails open when the publish date cannot be determined.
# quarantine_days: Number of days to quarantine newly published artifacts (requires
# quarantine_new: true). The upstream Last-Modified header is used as
# the publish date.
#
# WARNING: this file may contain credentials — do not commit real values.
#
# Global configuration
#s3:
# endpoint: "localhost:9000"
# access_key: "minioadmin"
# secret_key: "minioadmin"
# bucket: "artifacts"
# secure: false
#
#redis:
# url: "redis://localhost:6379/0"
#
#database:
# url: "postgresql://artifacts:artifacts123@localhost:5432/artifacts"
#
remote:
github:
base_url: "https://github.com"
package: "generic"
description: "GitHub releases and files"
immutable_patterns:
- "gruntwork-io/terragrunt/.*terragrunt_linux_amd64.*"
- "lxc/incus/.*\\.tar\\.gz$"
- "prometheus/node_exporter/.*/node_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/vmutils-linux-amd64-.*\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/victoria-metrics-linux-amd64-.*-cluster\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/victoria-logs-linux-amd64-.*\\.tar\\.gz$"
- "VictoriaMetrics/VictoriaMetrics/.*/vlutils-linux-amd64-.*\\.tar\\.gz$"
- "prometheus-community/bind_exporter/.*/bind_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "prometheus-community/pgbouncer_exporter/.*/pgbouncer_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "prometheus-community/postgres_exporter/.*/postgres_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "onedr0p/exportarr/.*/exportarr_.*_linux_amd64\\.tar\\.gz$"
- "tynany/frr_exporter/.*/frr_exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "camptocamp/prometheus-puppetdb-exporter/.*/prometheus-puppetdb-exporter-.*\\.linux-amd64\\.tar\\.gz$"
- "grafana/jsonnet-language-server/.*/jsonnet-language-server_.*_linux_amd64$"
- "helmfile/helmfile/.*/helmfile_.*_linux_amd64\\.tar\\.gz$"
- "helmfile/vals/.*/vals_.*_linux_amd64\\.tar\\.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-consul_linux_amd64_.*\\.tar\\.gz$"
- "openbao/openbao-plugins/.*/openbao-plugin-secrets-nomad_linux_amd64_.*\\.tar\\.gz$"
- "apple/foundationdb/.*/libfdb_c\\.x86_64\\.so$"
- "stalwartlabs/stalwart/.*/stalwart-cli-x86_64-unknown-linux-gnu\\.tar\\.gz$"
- "stalwartlabs/stalwart/.*/stalwart-foundationdb-x86_64-unknown-linux-gnu\\.tar\\.gz$"
- "stalwartlabs/stalwart/.*/stalwart-x86_64-unknown-linux-gnu\\.tar\\.gz$"
cache:
immutable_ttl: 0
mutable_ttl: 0
github-archive:
base_url: "https://github.com"
package: "generic"
description: "GitHub repository archive tarballs"
immutable_patterns:
# Tag archives are immutable — a tag never changes
- ".*/archive/refs/tags/.*\\.tar\\.gz$"
mutable_patterns:
# Branch archives can change on every push
- ".*/archive/refs/heads/main\\.tar\\.gz$"
- ".*/archive/refs/heads/master\\.tar\\.gz$"
# Before re-downloading an expired branch archive, check whether it has
# actually changed (304 Not Modified → just refresh the TTL, no transfer).
# Only applies to user-defined mutable_patterns, not package-type defaults.
check_mutable_updates: true
cache:
immutable_ttl: 0 # Tag archives cached indefinitely
mutable_ttl: 86400 # Branch archives refreshed after 1 day
gitea-dl:
base_url: "https://dl.gitea.com"
package: "generic"
description: "Gitea download site"
immutable_patterns:
- "act_runner/.*/act_runner-.*-linux-amd64$"
cache:
immutable_ttl: 0
mutable_ttl: 0
hashicorp-releases:
base_url: "https://releases.hashicorp.com"
package: "generic"
description: "HashiCorp product releases"
immutable_patterns:
- "terraform/.*terraform_.*_linux_amd64\\.zip$"
- "terraform/.*terraform_.*_windows_amd64\\.zip$"
- "terraform/.*terraform_.*_darwin_amd64\\.zip$"
- "vault/.*vault_.*_linux_amd64\\.zip$"
- "vault/.*vault_.*_windows_amd64\\.zip$"
- "vault/.*vault_.*_darwin_amd64\\.zip$"
- "consul-cni/.*/consul-cni_.*_linux_amd64\\.zip$"
- "consul/.*/consul_.*_linux_amd64\\.zip$"
- "nomad-autoscaler/.*/nomad-autoscaler_.*_linux_amd64\\.zip$"
- "nomad/.*/nomad_.*_linux_amd64\\.zip$"
- "packer/.*/packer_.*_linux_amd64\\.zip$"
cache:
immutable_ttl: 0
mutable_ttl: 0
alpine:
base_url: "https://dl-cdn.alpinelinux.org"
package: "alpine"
description: "Alpine Linux APK package repository"
immutable_patterns:
- ".*/x86_64/.*\\.apk$"
# check_mutable_updates not set: APKINDEX.tar.gz is a package-type default
# and is always re-fetched on expiry — conditional checks are skipped for
# built-in mutable patterns regardless of this flag.
cache:
immutable_ttl: 0
mutable_ttl: 7200 # Index files (APKINDEX.tar.gz) cached for 2 hours
almalinux:
base_url: "https://gsl-syd.mm.fcix.net/almalinux"
package: "rpm"
description: "AlmaLinux RPM package repository"
immutable_patterns:
- ".*/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
- ".*\\.rpm$"
cache:
immutable_ttl: 0
mutable_ttl: 7200 # Metadata files cached for 2 hours
epel:
base_url: "http://mirror.aarnet.edu.au/pub/epel"
package: "rpm"
description: "EPEL (Extra Packages for Enterprise Linux)"
immutable_patterns:
- "8/Everything/x86_64/.*\\.rpm$"
- "9/Everything/x86_64/.*\\.rpm$"
- "10/Everything/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- ".*/repodata/.*$"
cache:
immutable_ttl: 0
mutable_ttl: 7200
fedora:
base_url: "https://gsl-syd.mm.fcix.net/fedora/linux"
package: "rpm"
description: "Fedora Linux RPM package repository"
immutable_patterns:
- "releases/.*/Everything/x86_64/.*\\.rpm$"
- "updates/.*/Everything/x86_64/.*\\.rpm$"
- "development/.*/Everything/x86_64/.*\\.rpm$"
- ".*/noarch/.*\\.rpm$"
- "updates/.*/Everything/x86_64/repodata/.*$"
cache:
immutable_ttl: 0
mutable_ttl: 300 # Metadata files cached for 5 minutes
ghcr:
base_url: "https://ghcr.io"
package: "docker"
description: "GitHub Container Registry"
# username: "your-github-username"
# password: "your-github-pat" # needs read:packages scope
cache:
immutable_ttl: 0
mutable_ttl: 300
dockerhub:
base_url: "https://registry-1.docker.io"
package: "docker"
description: "Docker Hub registry"
cache:
immutable_ttl: 0
mutable_ttl: 300
pypi:
base_url: "https://files.pythonhosted.org"
package: "pypi"
description: "Python Package Index — simple index and package files via a single remote"
check_mutable_updates: true
quarantine_new: true
quarantine_days: 3
immutable_patterns:
- "packages/.*\\.whl$"
- "packages/.*\\.whl\\.metadata$"
- "packages/.*\\.tar\\.gz$"
- "packages/.*\\.zip$"
- "packages/.*\\.egg$"
cache:
immutable_ttl: 0
mutable_ttl: 600 # Simple index pages refreshed after 10 minutes
pypi-gitea:
base_url: "https://gitea.example.com/api/packages/myorg/pypi"
package: "pypi"
description: "Private Gitea PyPI registry — simple index and files at the same host"
# username: "your-gitea-username"
# password: "your-personal-access-token" # needs package:read scope
check_mutable_updates: true
immutable_patterns:
- "files/.*\\.whl$"
- "files/.*\\.whl\\.metadata$"
- "files/.*\\.tar\\.gz$"
- "files/.*\\.zip$"
- "files/.*\\.egg$"
cache:
immutable_ttl: 0
mutable_ttl: 600
npm:
base_url: "https://registry.npmjs.org"
package: "npm"
description: "npm registry — package metadata with tarball URL rewriting"
check_mutable_updates: true
immutable_patterns:
- \.tgz$
mutable_patterns:
- ^(?!.*\.tgz$).*
cache:
immutable_ttl: 0
mutable_ttl: 600 # Package metadata refreshed after 10 minutes
hashicorp-helm:
base_url: "https://helm.releases.hashicorp.com"
package: "helm"
description: "HashiCorp Helm chart repository (Vault, Consul, Nomad, etc.)"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
metallb:
base_url: "https://metallb.github.io/metallb"
package: "helm"
description: "MetalLB load balancer Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
jetstack:
base_url: "https://charts.jetstack.io"
package: "helm"
description: "Jetstack Helm charts (cert-manager)"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
rancher-stable:
base_url: "https://releases.rancher.com/server-charts/stable"
package: "helm"
description: "Rancher stable Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
purelb:
base_url: "https://gitlab.com/api/v4/projects/20400619/packages/helm/stable"
package: "helm"
description: "PureLB load balancer Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
istio:
base_url: "https://istio-release.storage.googleapis.com/charts"
package: "helm"
description: "Istio service mesh Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
cnpg:
base_url: "https://cloudnative-pg.github.io/charts"
package: "helm"
description: "CloudNativePG operator Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
ceph-csi:
base_url: "https://ceph.github.io/csi-charts"
package: "helm"
description: "Ceph CSI driver Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
external-dns:
base_url: "https://kubernetes-sigs.github.io/external-dns/"
package: "helm"
description: "ExternalDNS Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
intel-helm:
base_url: "https://intel.github.io/helm-charts/"
package: "helm"
description: "Intel Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
elastic:
base_url: "https://helm.elastic.co"
package: "helm"
description: "Elastic stack Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
k8up-io:
base_url: "https://k8up-io.github.io/k8up"
package: "helm"
description: "K8up backup operator Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
victoriametrics:
base_url: "https://victoriametrics.github.io/helm-charts/"
package: "helm"
description: "VictoriaMetrics observability Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
grafana:
base_url: "https://grafana.github.io/helm-charts"
package: "helm"
description: "Grafana observability Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
helm-openldap:
base_url: "https://jp-gouin.github.io/helm-openldap/"
package: "helm"
description: "OpenLDAP Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
woodpecker:
base_url: "https://woodpecker-ci.org/"
package: "helm"
description: "Woodpecker CI Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
stakater:
base_url: "https://stakater.github.io/stakater-charts"
package: "helm"
description: "Stakater Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
jfrog:
base_url: "https://charts.jfrog.io/"
package: "helm"
description: "JFrog Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
openvox:
base_url: "https://openvoxproject.github.io/openvox-helm-chart"
package: "helm"
description: "OpenVox Helm charts"
check_mutable_updates: true
immutable_patterns:
- "\\.tgz$"
cache:
immutable_ttl: 0
mutable_ttl: 3600
virtual:
helm-all:
package: "helm"
description: "Virtual repository merging all helm remotes — member order is priority order for duplicate chart+version"
members:
- hashicorp-helm
- metallb
- jetstack
- rancher-stable
- purelb
- istio
- cnpg
- ceph-csi
- external-dns
- intel-helm
- elastic
- k8up-io
- victoriametrics
- grafana
- helm-openldap
- woodpecker
- stakater
- jfrog
- openvox
local:
local-generic:
package: "generic"
description: "Local generic file repository"
cache:
immutable_ttl: 0
mutable_ttl: 0
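The header comment of this file says the top-level key determines the repository type, and the commit message says the loader injects `type` into each remote dict so handler code is unchanged. A minimal sketch of that normalisation follows, operating on an already-parsed mapping. The function name `normalise_remotes` and the duplicate-name check are assumptions for illustration, not the project's loader.

```python
from typing import Any

# Top-level keys that declare the repository type structurally.
REPO_TYPES = ("remote", "virtual", "local")


def normalise_remotes(doc: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Flatten remote:/virtual:/local: sections into one name -> settings map.

    Each entry gains a "type" field matching the section it came from.
    """
    remotes: dict[str, dict[str, Any]] = {}
    for repo_type in REPO_TYPES:
        for name, settings in (doc.get(repo_type) or {}).items():
            if name in remotes:
                # Assumption: names must be unique across all three sections.
                raise ValueError(f"duplicate repository name: {name}")
            remotes[name] = {**settings, "type": repo_type}
    return remotes


parsed = {
    "remote": {"alpine": {"package": "alpine", "base_url": "https://dl-cdn.alpinelinux.org"}},
    "virtual": {"helm-all": {"package": "helm", "members": ["jetstack", "grafana"]}},
    "local": {"local-generic": {"package": "generic"}},
}
flat = normalise_remotes(parsed)
print(flat["helm-all"]["type"])  # virtual
```

With this shape, code that checks `remote_config.get("type") != "local"` keeps working against the new file layout.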
+14 -1
@@ -42,5 +42,18 @@ dev = [
"black>=23.9.0",
"isort>=5.12.0",
"mypy>=1.6.0",
- "ruff>=0.1.0",
+ "ruff>=0.4.0",
"tox>=4.0.0",
"pre-commit>=3.0.0",
]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
[tool.ruff]
line-length = 140
[tool.ruff.lint]
select = ["E", "F", "I", "UP"]
ignore = ["E501"]
+3
@@ -0,0 +1,3 @@
from . import discovery, docker, flush, local, proxy
__all__ = ["discovery", "docker", "flush", "local", "proxy"]
+82
@@ -0,0 +1,82 @@
import logging
import re
from typing import Any
from urllib.parse import urlparse
import httpx
from fastapi import HTTPException
from .proxy import cache_single_artifact
logger = logging.getLogger(__name__)
async def _discover_github_releases(remote: str, include_pattern: str) -> list[str]:
    match = re.match(r"github\.com/([^/]+)/([^/]+)", remote)
    if not match:
        raise HTTPException(status_code=400, detail="Invalid GitHub remote format")

    owner, repo = match.groups()
    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(f"https://api.github.com/repos/{owner}/{repo}/releases")
        if response.status_code != 200:
            raise HTTPException(status_code=response.status_code, detail=f"Failed to fetch releases: {response.text}")
        releases = response.json()

    regex = re.compile(include_pattern.replace("*", ".*"))
    return [
        asset["browser_download_url"]
        for release in releases
        for asset in release.get("assets", [])
        if regex.search(asset["browser_download_url"])
    ]


async def _discover(remote: str, include_pattern: str) -> list[str]:
    if "github.com" in remote:
        return await _discover_github_releases(remote, include_pattern)
    raise HTTPException(status_code=400, detail=f"Unsupported remote: {remote}")


async def cache_artifacts(remote: str, include_pattern: str, storage) -> dict[str, Any]:
    try:
        matching_urls = await _discover(remote, include_pattern)
        if not matching_urls:
            return {"message": "No matching artifacts found", "cached_count": 0, "artifacts": []}

        cached_artifacts = []
        for url in matching_urls:
            result = await cache_single_artifact(url, "", "", storage, {})
            cached_artifacts.append(result)

        cached_count = sum(1 for a in cached_artifacts if a["status"] in ["cached", "already_cached"])
        return {
            "message": f"Processed {len(matching_urls)} artifacts, {cached_count} successfully cached",
            "cached_count": cached_count,
            "artifacts": cached_artifacts,
        }
    except HTTPException:
        # Keep deliberate 4xx responses from discovery instead of masking them as 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


async def list_artifacts(remote: str, include_pattern: str, storage) -> dict[str, Any]:
    try:
        matching_urls = await _discover(remote, include_pattern)

        cached_artifacts = []
        for url in matching_urls:
            parsed = urlparse(url)
            key = storage.get_object_key(remote, parsed.path)
            if storage.exists(key):
                cached_artifacts.append({"url": url, "cached_url": storage.get_url(key), "key": key})

        return {
            "remote": remote,
            "pattern": include_pattern,
            "total_found": len(matching_urls),
            "cached_count": len(cached_artifacts),
            "artifacts": cached_artifacts,
        }
    except HTTPException:
        # Keep deliberate 4xx responses from discovery instead of masking them as 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
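`_discover_github_releases` builds its regex by replacing `*` with `.*`, which leaves other metacharacters such as `.` active. If `include_pattern` is meant to be a shell-style glob (an assumption, since the intended syntax is not stated here), the standard library's `fnmatch.translate` escapes them. A sketch, with the hypothetical helper `glob_to_regex` and made-up URLs:

```python
import fnmatch
import re


def glob_to_regex(include_pattern: str) -> re.Pattern[str]:
    """Compile a shell-style glob into a regex that must match the whole string."""
    return re.compile(fnmatch.translate(include_pattern))


# Example URLs invented for illustration only.
urls = [
    "https://example.invalid/releases/download/v1.0/tool-1.0.linux-amd64.tar.gz",
    "https://example.invalid/releases/download/v1.0/tool-1.0.linux-amd64.tar.gz.sha256",
    "https://example.invalid/releases/download/v1.0/tool-1.0.linux-amd64xtarxgz",
]

regex = glob_to_regex("*linux-amd64.tar.gz")
matches = [url for url in urls if regex.match(url)]
print(len(matches))  # 1
```

Because the translated pattern is anchored to the end of the string, a leading `*` is needed to match a full download URL, and `regex.match` replaces `regex.search`.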
+103
@@ -0,0 +1,103 @@
import hashlib
import json
import logging
import re
from fastapi import HTTPException, Request, Response
from . import proxy as _proxy
logger = logging.getLogger(__name__)
def ping() -> Response:
    return Response(
        content="{}",
        media_type="application/json",
        headers={"Docker-Distribution-Api-Version": "registry/2.0"},
    )
async def proxy(request: Request, remote_name: str, path: str, storage, cache, config, metrics) -> Response:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("package") != "docker":
raise HTTPException(status_code=400, detail=f"Remote '{remote_name}' is not a docker remote")
patterns = config.get_immutable_patterns(remote_name, "")
if patterns:
path_parts = path.split("/")
image_name = "/".join(path_parts[:2]) if len(path_parts) >= 2 else path
if not any(re.search(p, path) or re.search(p, image_name) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
base_url = remote_config.get("base_url", "").rstrip("/")
remote_url = f"{base_url}/v2/{path}"
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
is_mutable = cache.is_mutable_file(path, config.get_mutable_patterns(remote_name))
if cached_key and is_mutable:
if not cache.is_index_valid(remote_name, path):
if not await _proxy.handle_expired_mutable(remote_name, path, remote_url, config, cache, storage):
cached_key = None
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await _proxy.cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
if not is_mutable:
published = result.get("last_modified")
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
elif not is_mutable:
published = cache.get_artifact_published(remote_name, path)
if not published:
published = await _proxy._fetch_last_modified(remote_url, remote_config)
if published:
cache.store_artifact_published(remote_name, path, published)
_proxy._check_quarantine(remote_name, published, config)
artifact_data = storage.download_object(storage.get_object_key(remote_name, path))
is_blob = "/blobs/" in path
if is_blob:
content_type = "application/octet-stream"
else:
try:
manifest_json = json.loads(artifact_data)
content_type = manifest_json.get("mediaType")
if not content_type:
if "manifests" in manifest_json:
content_type = "application/vnd.oci.image.index.v1+json"
else:
content_type = "application/vnd.oci.image.manifest.v1+json"
except Exception:
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
"Content-Length": str(len(artifact_data)),
}
if request.method == "HEAD":
return Response(status_code=200, headers=headers, media_type=content_type)
metrics.record_cache_hit(remote_name, len(artifact_data))
return Response(content=artifact_data, media_type=content_type, headers=headers)
+66
@@ -0,0 +1,66 @@
import logging
from fastapi import HTTPException
logger = logging.getLogger(__name__)
def handle(remote: str | None, cache_type: str, cache, storage) -> dict:
try:
result = {"remote": remote, "cache_type": cache_type, "flushed": {"redis_keys": 0, "s3_objects": 0, "operations": []}}
if cache_type in ["all", "index", "metrics"] and cache.available and cache.client:
patterns = []
if cache_type in ["all", "index"]:
if remote:
patterns += [f"index:{remote}:*", f"mutable:meta:{remote}:*"]
else:
patterns += ["index:*", "mutable:meta:*"]
if cache_type in ["all", "metrics"]:
patterns.append(f"metrics:*:{remote}" if remote else "metrics:*")
for pattern in patterns:
keys = cache.client.keys(pattern)
if keys:
cache.client.delete(*keys)
result["flushed"]["redis_keys"] += len(keys)
logger.info(f"Cache flush: deleted {len(keys)} Redis keys matching '{pattern}'")
if result["flushed"]["redis_keys"] > 0:
result["flushed"]["operations"].append(f"Deleted {result['flushed']['redis_keys']} Redis keys")
if cache_type in ["all", "files"]:
try:
list_params = {"Bucket": storage.bucket}
if remote:
list_params["Prefix"] = f"{remote}/"
response = storage.client.list_objects_v2(**list_params)
if "Contents" in response:
objects_to_delete = [obj["Key"] for obj in response["Contents"]]
for key in objects_to_delete:
try:
storage.client.delete_object(Bucket=storage.bucket, Key=key)
result["flushed"]["s3_objects"] += 1
except Exception as e:
logger.warning(f"Failed to delete S3 object {key}: {e}")
if objects_to_delete:
scope = f" for remote '{remote}'" if remote else ""
result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects{scope}")
logger.info(f"Cache flush: deleted {len(objects_to_delete)} S3 objects{scope}")
except Exception as e:
result["flushed"]["operations"].append(f"S3 flush failed: {str(e)}")
logger.error(f"Cache flush S3 error: {e}")
if not result["flushed"]["operations"]:
result["flushed"]["operations"].append("No cache entries found to flush")
return result
except Exception as e:
logger.error(f"Cache flush error: {e}")
raise HTTPException(status_code=500, detail=f"Cache flush failed: {str(e)}")
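One limitation worth noting in the handler above: a single `list_objects_v2` call returns at most 1,000 keys, so larger prefixes would be flushed only partially. A sketch of following continuation tokens is shown below, assuming `storage.client` is a boto3-compatible S3 client; the helper name `iter_object_keys` is hypothetical.

```python
from collections.abc import Iterator


def iter_object_keys(client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under prefix, following continuation tokens."""
    params = {"Bucket": bucket}
    if prefix:
        params["Prefix"] = prefix
    while True:
        response = client.list_objects_v2(**params)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page on the following iteration.
        params["ContinuationToken"] = response["NextContinuationToken"]
```

The delete loop could then iterate over `iter_object_keys(storage.client, storage.bucket, f"{remote}/")` instead of the first page's `Contents`.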
+108
@@ -0,0 +1,108 @@
import hashlib
import logging
from fastapi import HTTPException, Response, UploadFile
from fastapi.responses import JSONResponse
logger = logging.getLogger(__name__)
async def upload(remote_name: str, path: str, file: UploadFile, storage, database, config) -> JSONResponse:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Upload only supported for local repositories")
try:
content = await file.read()
sha256_sum = hashlib.sha256(content).hexdigest()
if database.file_exists(remote_name, path):
raise HTTPException(status_code=409, detail="File already exists")
s3_key = f"local/{remote_name}/{path}"
content_type = file.content_type or "application/octet-stream"
try:
storage.upload(s3_key, content)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {e}")
success = database.add_local_file(
repository_name=remote_name,
file_path=path,
s3_key=s3_key,
size_bytes=len(content),
sha256_sum=sha256_sum,
content_type=content_type,
)
if not success:
storage.delete_object(s3_key)
raise HTTPException(status_code=500, detail="Failed to save file metadata")
return JSONResponse(
{
"message": "File uploaded successfully",
"file_path": path,
"size_bytes": len(content),
"sha256_sum": sha256_sum,
"content_type": content_type,
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
def check_exists(remote_name: str, path: str, database, config) -> Response:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=405, detail="HEAD method only supported for local repositories")
try:
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
return Response(
headers={
"Content-Length": str(metadata["size_bytes"]),
"Content-Type": metadata.get("content_type", "application/octet-stream"),
"X-SHA256": metadata["sha256_sum"],
"X-Created-At": metadata["created_at"].isoformat() if metadata["created_at"] else "",
"X-Uploaded-At": metadata["uploaded_at"].isoformat() if metadata["uploaded_at"] else "",
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Check failed: {str(e)}")
def delete(remote_name: str, path: str, storage, database, config) -> JSONResponse:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "local":
raise HTTPException(status_code=400, detail="Delete only supported for local repositories")
try:
s3_key = database.delete_local_file(remote_name, path)
if not s3_key:
raise HTTPException(status_code=404, detail="File not found")
if not storage.delete_object(s3_key):
logger.warning(f"Failed to delete S3 object {s3_key} after database removal")
return JSONResponse({"message": "File deleted successfully"})
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Delete failed: {str(e)}")
+331
@@ -0,0 +1,331 @@
import base64
import logging
import os
import re
from datetime import UTC, datetime, timedelta
from email.utils import parsedate_to_datetime
import httpx
from fastapi import HTTPException, Request, Response
from ..auth import get_docker_token_for_response
from ..remote import helm as _helm
from ..remote import npm as _npm
from ..remote import python as _pypi
from ..remote.base import get_content_type
logger = logging.getLogger(__name__)
class UpstreamUnreachable(Exception):
    """Raised when the upstream backend cannot be contacted (network or timeout error)."""


def _check_quarantine(remote_name: str, last_modified_str: str | None, config) -> None:
    """Raise HTTP 404 if the artifact is within the per-remote quarantine window.

    Fails open (allows the request) when the publish date cannot be determined.
    """
    enabled, days = config.get_quarantine_config(remote_name)
    if not enabled or not days:
        return
    if not last_modified_str:
        return  # cannot determine age → allow
    try:
        publish_date = parsedate_to_datetime(last_modified_str)
    except Exception:
        return  # unparseable → allow
    if publish_date.tzinfo is None:
        # A "-0000" zone parses to a naive datetime; treat it as UTC so the
        # comparison below cannot raise TypeError.
        publish_date = publish_date.replace(tzinfo=UTC)
    cutoff = datetime.now(UTC) - timedelta(days=days)
    if publish_date > cutoff:
        available_on = (publish_date + timedelta(days=days)).date()
        raise HTTPException(
            status_code=404,
            detail=(
                f"Package quarantined: published {publish_date.date()}, available after {available_on} ({days}-day new-release quarantine)"
            ),
        )


async def _fetch_last_modified(remote_url: str, remote_cfg: dict) -> str | None:
    """HEAD the upstream URL and return the Last-Modified header, or None on any failure."""
    auth = _basic_auth_header(remote_cfg)
    try:
        async with httpx.AsyncClient(follow_redirects=True) as client:
            response = await client.head(remote_url, headers=auth, timeout=10.0)
            return response.headers.get("Last-Modified")
    except Exception:
        return None


def _basic_auth_header(remote_cfg: dict) -> dict[str, str]:
    username = remote_cfg.get("username")
    password = remote_cfg.get("password")
    if username and password:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}
    return {}


def _resolve_content(
    data: bytes,
    path: str,
    filename: str,
    remote_config: dict,
    request: Request,
    remote_name: str = "",
) -> tuple[bytes, str]:
    package = remote_config.get("package")
    proxy_base = str(request.base_url).rstrip("/")
    base_url = remote_config.get("base_url", "").rstrip("/")
    if package == "pypi":
        return _pypi.resolve_content(data, path, filename, remote_config.get("immutable_patterns", []), base_url, proxy_base, remote_name)
    if package == "npm":
        return _npm.resolve_content(data, path, filename, remote_config.get("immutable_patterns", []), base_url, proxy_base, remote_name)
    if package == "helm":
        return _helm.resolve_content(data, path, filename, base_url, proxy_base, remote_name)
    return data, get_content_type(filename)


def construct_url(remote_config: dict, path: str) -> str:
    base_url = remote_config.get("base_url", "").rstrip("/")
    if remote_config.get("package") == "docker":
        return f"{base_url}/v2/{path}"
    if remote_config.get("package") == "pypi":
        return _pypi.construct_url(base_url, path)
    return f"{base_url}/{path}"
async def cache_single_artifact(url: str, remote_name: str, path: str, storage, remote_config: dict) -> dict:
key = storage.get_object_key(remote_name, path)
if storage.exists(key):
logger.info(f"Cache ALREADY EXISTS: {url} (key: {key})")
return {"url": url, "cached_url": storage.get_url(key), "status": "already_cached"}
try:
is_docker = remote_config.get("package") == "docker" or "/v2/" in url
headers = {}
username = remote_config.get("username")
password = remote_config.get("password")
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
"application/vnd.docker.distribution.manifest.v2+json,"
"application/vnd.oci.image.manifest.v1+json,"
"application/vnd.oci.image.index.v1+json,"
"application/vnd.docker.distribution.manifest.list.v2+json"
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
elif username and password:
headers["Authorization"] = "Basic " + base64.b64encode(f"{username}:{password}".encode()).decode()
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url, headers=headers)
if response.status_code == 401 and is_docker:
www_auth = response.headers.get("WWW-Authenticate", "")
token = await get_docker_token_for_response(www_auth, username, password)
if token:
headers["Authorization"] = f"Bearer {token}"
response = await client.get(url, headers=headers)
response.raise_for_status()
storage.upload(key, response.content)
logger.info(f"Cache ADD SUCCESS: {url} (size: {len(response.content)} bytes, key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"storage_path": f"s3://{storage.bucket}/{key}",
"size": len(response.content),
"status": "cached",
"etag": response.headers.get("ETag"),
"last_modified": response.headers.get("Last-Modified"),
}
except Exception as e:
return {"url": url, "status": "error", "error": str(e)}
async def _upstream_reachable(url: str, auth_headers: dict | None = None) -> bool:
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
await client.head(url, headers=auth_headers or {}, timeout=10.0)
return True
except (httpx.NetworkError, httpx.TimeoutException):
return False
except Exception:
return True
async def check_upstream_changed(remote_url: str, remote_name: str, path: str, cache, auth_headers: dict | None = None) -> bool:
meta = cache.get_mutable_meta(remote_name, path)
if not meta:
return True
headers = dict(auth_headers or {})
if meta.get("etag"):
headers["If-None-Match"] = meta["etag"]
if meta.get("last_modified"):
headers["If-Modified-Since"] = meta["last_modified"]
if not (meta.get("etag") or meta.get("last_modified")):
return True
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.head(remote_url, headers=headers)
return response.status_code != 304
except (httpx.NetworkError, httpx.TimeoutException) as exc:
raise UpstreamUnreachable(str(exc)) from exc
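Review note: the revalidation in check_upstream_changed can be read as two pure steps, building the conditional request headers and interpreting the status code. The sketch below isolates them without any HTTP client. The helper names build_conditional_headers and upstream_changed are hypothetical and do not exist in this change set.

```python
from __future__ import annotations


def build_conditional_headers(
    meta: dict[str, str],
    auth_headers: dict[str, str] | None = None,
) -> dict[str, str] | None:
    """Return headers for a conditional HEAD, or None when no validator is stored."""
    headers = dict(auth_headers or {})
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    if not (meta.get("etag") or meta.get("last_modified")):
        # Nothing to validate against, so the caller should treat the file as changed.
        return None
    return headers


def upstream_changed(status_code: int) -> bool:
    # Only 304 Not Modified proves the cached copy is still current.
    return status_code != 304
```

Any status other than 304 counts as "changed" here, including 4xx and 5xx responses, which mirrors the comparison in the function above.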
async def handle_expired_mutable(remote_name: str, path: str, remote_url: str, config, cache, storage) -> bool:
"""Handle an expired mutable file. Returns True if the cached copy is still valid."""
mutable_ttl = config.get_cache_config(remote_name).get("mutable_ttl", 3600)
remote_cfg = config.get_remote_config(remote_name) or {}
auth = _basic_auth_header(remote_cfg)
check_updates = remote_cfg.get("check_mutable_updates", False)
user_mutable = check_updates and cache.is_mutable_file(path, config.get_user_mutable_patterns(remote_name))
if user_mutable:
try:
changed = await check_upstream_changed(remote_url, remote_name, path, cache, auth)
except UpstreamUnreachable:
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
return True
if not changed:
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file UNCHANGED: {remote_name}/{path} - TTL refreshed ({mutable_ttl}s)")
return True
logger.info(f"Mutable file CHANGED: {remote_name}/{path} - re-downloading")
else:
if not await _upstream_reachable(remote_url, auth):
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.warning(f"Mutable STALE (backend unreachable): {remote_name}/{path} - TTL extended ({mutable_ttl}s)")
return True
logger.info(f"Mutable file EXPIRED: {remote_name}/{path} - removing from cache")
cache.cleanup_expired_index(storage, remote_name, path)
return False
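Review note: handle_expired_mutable reduces to a small decision table. The sketch below restates it as a pure function so the three outcomes are easy to compare. The enum Upstream and the function keep_cached_copy are hypothetical names used only for illustration.

```python
from enum import Enum


class Upstream(Enum):
    UNREACHABLE = "unreachable"
    UNCHANGED = "unchanged"
    CHANGED = "changed"


def keep_cached_copy(revalidate: bool, upstream: Upstream) -> bool:
    """Return True when the expired cached copy should keep being served.

    revalidate stands for "check_mutable_updates is on and the path matches
    a user-configured mutable pattern".
    """
    if upstream is Upstream.UNREACHABLE:
        # Serve stale content rather than fail while the backend is down.
        return True
    if revalidate and upstream is Upstream.UNCHANGED:
        # A 304 from upstream: refresh the TTL and keep the object.
        return True
    # Either upstream changed, or revalidation is off and the TTL simply expired.
    return False
```

Without revalidation, a reachable upstream always leads to eviction, even if the content happens to be identical.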
async def handle(request: Request, remote_name: str, path: str, storage, cache, config, database, metrics) -> Response:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") == "local":
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
content = storage.download_object(metadata["s3_key"])
if content is None:
raise HTTPException(status_code=500, detail="File not accessible")
return Response(
content=content,
media_type=metadata.get("content_type", "application/octet-stream"),
headers={"Content-Disposition": f"attachment; filename={os.path.basename(path)}"},
)
path_parts = path.split("/")
if len(path_parts) >= 2:
repo_path = f"{path_parts[0]}/{path_parts[1]}"
file_path = "/".join(path_parts[2:])
else:
repo_path = path
file_path = path
mutable_patterns = config.get_mutable_patterns(remote_name)
if not cache.is_mutable_file(file_path, mutable_patterns) and not cache.is_mutable_file(path, mutable_patterns):
patterns = config.get_immutable_patterns(remote_name, repo_path)
if patterns and not any(re.search(p, file_path) or re.search(p, path) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path} - not matching include patterns")
raise HTTPException(status_code=403, detail="Artifact not allowed by configuration patterns")
remote_url = construct_url(remote_config, path)
if not remote_config.get("base_url"):
raise HTTPException(status_code=500, detail=f"No base_url configured for remote '{remote_name}'")
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
filename = os.path.basename(path)
is_mutable = cache.is_mutable_file(path, mutable_patterns)
if cached_key and is_mutable:
if not cache.is_index_valid(remote_name, path):
if not await handle_expired_mutable(remote_name, path, remote_url, config, cache, storage):
cached_key = None
if cached_key:
if not is_mutable:
published = cache.get_artifact_published(remote_name, path)
if not published:
published = await _fetch_last_modified(remote_url, remote_config)
if published:
cache.store_artifact_published(remote_name, path, published)
_check_quarantine(remote_name, published, config)
try:
artifact_data = storage.download_object(cached_key)
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
metrics.record_cache_hit(remote_name, len(artifact_data))
database.record_artifact_mapping(cached_key, remote_name, path, len(artifact_data))
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "cache",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error retrieving cached artifact: {str(e)}")
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path, storage, remote_config)
if result["status"] == "error":
logger.error(f"Cache ADD FAILED: {remote_name}/{path} - {result['error']}")
raise HTTPException(status_code=502, detail=f"Failed to fetch artifact: {result['error']}")
if result["status"] == "cached" and is_mutable:
cache_config = config.get_cache_config(remote_name)
mutable_ttl = cache_config.get("mutable_ttl", 3600)
cache.mark_index_cached(remote_name, path, mutable_ttl)
logger.info(f"Mutable file cached with TTL: {remote_name}/{path} (ttl: {mutable_ttl}s)")
if result.get("etag") or result.get("last_modified"):
cache.store_mutable_meta(remote_name, path, result.get("etag"), result.get("last_modified"))
if not is_mutable:
published = result.get("last_modified")
if published:
cache.store_artifact_published(remote_name, path, published)
_check_quarantine(remote_name, published, config)
try:
cache_key = storage.get_object_key(remote_name, path)
artifact_data = storage.download_object(cache_key)
artifact_data, content_type = _resolve_content(artifact_data, path, filename, remote_config, request, remote_name)
metrics.record_cache_miss(remote_name, len(artifact_data))
database.record_artifact_mapping(cache_key, remote_name, path, len(artifact_data))
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "remote",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error serving artifact: {str(e)}")
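Review note: the pattern gate near the top of handle (path split, mutable check, immutable allow-list) is the part most likely to surprise users, so here is a minimal stand-alone sketch of it. The function names split_repo_path and is_allowed are hypothetical, and the sample patterns in the usage note are illustrative rather than taken from a shipped config.

```python
import re


def split_repo_path(path: str) -> tuple[str, str]:
    """Split 'owner/repo/rest/of/file' into ('owner/repo', 'rest/of/file')."""
    parts = path.split("/")
    if len(parts) >= 2:
        return f"{parts[0]}/{parts[1]}", "/".join(parts[2:])
    return path, path


def is_allowed(path: str, mutable_patterns: list[str], immutable_patterns: list[str]) -> bool:
    """Mutable files bypass the allow-list; an empty allow-list permits everything."""
    _, file_path = split_repo_path(path)
    candidates = (file_path, path)
    if any(re.search(p, c) for p in mutable_patterns for c in candidates):
        return True
    if not immutable_patterns:
        return True
    return any(re.search(p, c) for p in immutable_patterns for c in candidates)
```

For example, with mutable pattern `APKINDEX\.tar\.gz$` and immutable pattern `\.apk$`, an index file is served while a stray readme.txt would be rejected with the 403 shown above.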
+227
@@ -0,0 +1,227 @@
import asyncio
import base64
import logging
import time
from datetime import UTC, date, datetime
from typing import Protocol, runtime_checkable
import httpx
import yaml
from fastapi import HTTPException, Request, Response
from ..remote import helm as _helm
logger = logging.getLogger(__name__)
class _HelmDumper(yaml.Dumper):
"""YAML dumper that serializes datetime/date objects back to ISO 8601 strings.
yaml.safe_load converts timestamp-shaped YAML scalars (e.g. chart `created`
fields) to Python datetime objects. Without a custom representer, yaml.dump
would render them as "2022-12-16 11:08:49+00:00" (space, not T), which
Go's YAML parser cannot unmarshal into time.Time.
"""
def _repr_datetime(dumper: yaml.Dumper, data: datetime) -> yaml.ScalarNode:
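# Normalize aware values to UTC first; a trailing "Z" is only correct for a zero offset.
s = (data.astimezone(UTC) if data.tzinfo else data).strftime("%Y-%m-%dT%H:%M:%S.%f") + ("Z" if data.tzinfo else "")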
return dumper.represent_scalar("tag:yaml.org,2002:str", s)
def _repr_date(dumper: yaml.Dumper, data: date) -> yaml.ScalarNode:
return dumper.represent_scalar("tag:yaml.org,2002:str", data.isoformat())
_HelmDumper.add_representer(datetime, _repr_datetime)
_HelmDumper.add_representer(date, _repr_date)
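Review note: the formatting rule behind the datetime representer can be checked with the standard library alone. The sketch below is not the dumper itself; helm_timestamp is a hypothetical helper, and converting aware values to UTC before appending "Z" is an assumption about the intended behavior.

```python
from datetime import datetime, timedelta, timezone


def helm_timestamp(value: datetime) -> str:
    """Format a datetime with a 'T' separator, using 'Z' only for UTC-normalized values."""
    if value.tzinfo is not None:
        # Convert to UTC so the trailing "Z" is truthful for non-zero offsets.
        value = value.astimezone(timezone.utc)
        return value.strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    # Naive values carry no zone information, so no suffix is added.
    return value.strftime("%Y-%m-%dT%H:%M:%S.%f")


aware = datetime(2022, 12, 16, 11, 8, 49, tzinfo=timezone.utc)
offset = datetime(2022, 12, 16, 13, 8, 49, tzinfo=timezone(timedelta(hours=2)))
print(str(aware))              # space-separated form that the docstring warns about
print(helm_timestamp(aware))   # 'T'-separated form
print(helm_timestamp(offset))  # same instant, normalized to UTC
```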
async def _get_member_index(
member_name: str,
member_cfg: dict,
path: str,
storage,
cache,
) -> tuple[str, dict, int, bytes | None]:
"""Fetch or retrieve cached index.yaml for one member remote.
Returns (member_name, member_cfg, ttl, raw_bytes).
raw_bytes is None if the member is unreachable and not in S3.
"""
member_ttl = member_cfg.get("cache", {}).get("mutable_ttl", 3600)
s3_key = storage.get_object_key(member_name, path)
raw_data: bytes | None = None
if storage.exists(s3_key) and cache.is_index_valid(member_name, path):
try:
raw_data = storage.download_object(s3_key)
logger.info(f"Virtual: cache hit for member '{member_name}'")
except Exception:
raw_data = None
if raw_data is None:
base_url = member_cfg.get("base_url", "").rstrip("/")
upstream_url = f"{base_url}/index.yaml"
headers = {}
username = member_cfg.get("username")
password = member_cfg.get("password")
if username and password:
token = base64.b64encode(f"{username}:{password}".encode()).decode()
headers["Authorization"] = f"Basic {token}"
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(upstream_url, headers=headers, timeout=30.0)
response.raise_for_status()
raw_data = response.content
except Exception as e:
logger.warning(f"Virtual: failed to fetch index.yaml from member '{member_name}': {e}")
return member_name, member_cfg, member_ttl, None
try:
storage.upload(s3_key, raw_data)
cache.mark_index_cached(member_name, path, member_ttl)
except Exception as e:
logger.warning(f"Virtual: failed to cache index.yaml for member '{member_name}': {e}")
return member_name, member_cfg, member_ttl, raw_data
def _merge_helm_indexes(raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes:
"""Merge helm index.yaml files with per-member URL rewriting.
Priority is determined by position in member_names: earlier members win
when the same chart name + version appears in multiple remotes.
"""
merged_entries: dict[str, list] = {}
for raw_data, member_name, member_cfg in zip(raw_indexes, member_names, member_configs):
base_url = member_cfg.get("base_url", "").rstrip("/")
rewritten, _ = _helm.resolve_content(raw_data, "index.yaml", "index.yaml", base_url, proxy_base, member_name)
try:
index = yaml.safe_load(rewritten) or {}  # an empty document loads as None
except Exception as e:
logger.warning(f"Virtual: failed to parse index.yaml from member '{member_name}': {e}")
continue
for chart_name, versions in (index.get("entries") or {}).items():
if chart_name not in merged_entries:
merged_entries[chart_name] = list(versions)
else:
existing = {(v.get("name"), v.get("version")) for v in merged_entries[chart_name]}
for version_entry in versions:
key = (version_entry.get("name"), version_entry.get("version"))
if key not in existing:
merged_entries[chart_name].append(version_entry)
existing.add(key)
merged = {
"apiVersion": "v1",
"entries": merged_entries,
"generated": datetime.now(UTC).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
}
return yaml.dump(merged, Dumper=_HelmDumper, default_flow_style=False, allow_unicode=True).encode()
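Review note: the "earlier members win" rule is easier to verify on already-parsed dictionaries, without YAML or URL rewriting. The sketch below reproduces only the entry-merging loop; merge_entries is a hypothetical name and the chart data is made up.

```python
def merge_entries(indexes: list[dict]) -> dict[str, list[dict]]:
    """Merge parsed helm indexes; the first index to list a (name, version) pair keeps it."""
    merged: dict[str, list[dict]] = {}
    for index in indexes:
        for chart_name, versions in (index.get("entries") or {}).items():
            if chart_name not in merged:
                # First member to mention this chart contributes all of its versions.
                merged[chart_name] = list(versions)
                continue
            seen = {(v.get("name"), v.get("version")) for v in merged[chart_name]}
            for entry in versions:
                key = (entry.get("name"), entry.get("version"))
                if key not in seen:
                    merged[chart_name].append(entry)
                    seen.add(key)
    return merged


first = {"entries": {"nginx": [{"name": "nginx", "version": "1.0.0", "urls": ["a"]}]}}
second = {
    "entries": {
        "nginx": [
            {"name": "nginx", "version": "1.0.0", "urls": ["b"]},
            {"name": "nginx", "version": "1.1.0", "urls": ["b"]},
        ],
        "redis": [{"name": "redis", "version": "7.0.0", "urls": ["b"]}],
    }
}
merged = merge_entries([first, second])
```

Here nginx 1.0.0 keeps the URL from the first member, while 1.1.0 and redis come from the second.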
@runtime_checkable
class _VirtualHandler(Protocol):
def accepts_path(self, path: str) -> bool: ...
def merge(self, raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes: ...
def path_error(self) -> str: ...
class _HelmHandler:
def accepts_path(self, path: str) -> bool:
return path == "index.yaml"
def merge(self, raw_indexes: list[bytes], member_names: list[str], member_configs: list[dict], proxy_base: str) -> bytes:
return _merge_helm_indexes(raw_indexes, member_names, member_configs, proxy_base)
def path_error(self) -> str:
return "Virtual helm repositories only serve index.yaml; chart tarballs are served directly by member remotes"
_HANDLERS: dict[str, _VirtualHandler] = {
"helm": _HelmHandler(),
}
async def handle(request: Request, virtual_name: str, path: str, storage, cache, config) -> Response:
virtual_cfg = config.get_remote_config(virtual_name)
if not virtual_cfg:
raise HTTPException(status_code=404, detail=f"Virtual repository '{virtual_name}' not configured")
if virtual_cfg.get("type") != "virtual":
raise HTTPException(status_code=400, detail=f"'{virtual_name}' is not a virtual repository")
package = virtual_cfg.get("package")
handler = _HANDLERS.get(package)
if handler is None:
raise HTTPException(status_code=400, detail=f"Virtual repositories with package '{package}' are not yet supported")
if not handler.accepts_path(path):
raise HTTPException(status_code=404, detail=handler.path_error())
members = virtual_cfg.get("members", [])
if not members:
raise HTTPException(status_code=500, detail=f"Virtual repository '{virtual_name}' has no members configured")
virtual_key = storage.get_object_key(virtual_name, path)
if cache.is_index_valid(virtual_name, path) and storage.exists(virtual_key):
data = storage.download_object(virtual_key)
logger.info(f"Virtual HIT: {virtual_name}/{path}")
return Response(content=data, media_type="text/yaml")
# Resolve configs first (config reads are sync/cheap)
member_entries = []
for member_name in members:
member_cfg = config.get_remote_config(member_name)
if not member_cfg:
logger.warning(f"Virtual '{virtual_name}': member '{member_name}' not found in config, skipping")
continue
member_entries.append((member_name, member_cfg))
# Fetch all member indexes in parallel; asyncio.gather preserves input order
proxy_base = str(request.base_url).rstrip("/")
t_fetch = time.perf_counter()
results = await asyncio.gather(*[_get_member_index(name, cfg, path, storage, cache) for name, cfg in member_entries])
fetch_ms = int((time.perf_counter() - t_fetch) * 1000)
raw_indexes: list[bytes] = []
used_members: list[str] = []
used_configs: list[dict] = []
min_ttl: int | None = None
for member_name, member_cfg, member_ttl, raw_data in results:
if min_ttl is None or member_ttl < min_ttl:
min_ttl = member_ttl
if raw_data is None:
logger.warning(f"Virtual '{virtual_name}': skipping unreachable member '{member_name}'")
continue
raw_indexes.append(raw_data)
used_members.append(member_name)
used_configs.append(member_cfg)
if not raw_indexes:
raise HTTPException(status_code=502, detail=f"Virtual repository '{virtual_name}': no member indices could be fetched")
if min_ttl is None:
min_ttl = 3600
t_merge = time.perf_counter()
merged = handler.merge(raw_indexes, used_members, used_configs, proxy_base)
merge_ms = int((time.perf_counter() - t_merge) * 1000)
try:
t_store = time.perf_counter()
storage.upload(virtual_key, merged)
cache.mark_index_cached(virtual_name, path, min_ttl)
store_ms = int((time.perf_counter() - t_store) * 1000)
logger.info(
f"Virtual MISS: {virtual_name}/{path} rebuilt from {used_members} "
f"(fetch={fetch_ms}ms merge={merge_ms}ms store={store_ms}ms ttl={min_ttl}s)"
)
except Exception as e:
logger.warning(f"Virtual: failed to store merged index for '{virtual_name}': {e}")
return Response(content=merged, media_type="text/yaml")
+3
@@ -0,0 +1,3 @@
from .docker import fetch_token, get_docker_token_for_response, parse_www_authenticate
__all__ = ["fetch_token", "get_docker_token_for_response", "parse_www_authenticate"]
+96
@@ -0,0 +1,96 @@
import logging
import re
import time
import httpx
logger = logging.getLogger(__name__)
# In-memory token cache: key -> (token, expires_at)
_token_cache: dict[str, tuple[str, float]] = {}
_WWW_AUTH_RE = re.compile(
r'Bearer\s+realm="(?P<realm>[^"]+)"'
r'(?:,service="(?P<service>[^"]*)")?'
r'(?:,scope="(?P<scope>[^"]*)")?',
re.IGNORECASE,
)
def _cache_key(realm: str, service: str, scope: str, username: str | None) -> str:
return f"{realm}|{service}|{scope}|{username or ''}"
def _get_cached_token(key: str) -> str | None:
entry = _token_cache.get(key)
if entry and entry[1] > time.time():
return entry[0]
_token_cache.pop(key, None)
return None
def _store_token(key: str, token: str, expires_in: int) -> None:
# Expire 30s early so a token is never used right as it expires, but always cache for at least 10s
_token_cache[key] = (token, time.time() + max(expires_in - 30, 10))
async def fetch_token(
realm: str,
service: str,
scope: str,
username: str | None = None,
password: str | None = None,
) -> str | None:
"""Fetch a Bearer token from a Docker registry auth server."""
key = _cache_key(realm, service, scope, username)
cached = _get_cached_token(key)
if cached:
return cached
params: dict[str, str] = {}
if service:
params["service"] = service
if scope:
params["scope"] = scope
auth = (username, password) if username and password else None
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(realm, params=params, auth=auth)
response.raise_for_status()
data = response.json()
except Exception as e:
logger.warning(f"Docker token fetch failed ({realm}): {e}")
return None
token = data.get("token") or data.get("access_token")
if not token:
logger.warning(f"Docker token response missing token field: {data}")
return None
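# The Docker token spec defaults to 60 seconds when expires_in is omitted; also guard against null.
expires_in = int(data.get("expires_in") or 60)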
_store_token(key, token, expires_in)
logger.debug(f"Docker token obtained (realm={realm}, service={service}, scope={scope}, expires_in={expires_in}s)")
return token
def parse_www_authenticate(header: str) -> tuple[str, str, str] | None:
"""Parse WWW-Authenticate: Bearer header. Returns (realm, service, scope) or None."""
m = _WWW_AUTH_RE.search(header)
if not m:
return None
return m.group("realm"), m.group("service") or "", m.group("scope") or ""
async def get_docker_token_for_response(
www_authenticate: str,
username: str | None = None,
password: str | None = None,
) -> str | None:
"""Given a WWW-Authenticate header value, fetch and return a Bearer token."""
parsed = parse_www_authenticate(www_authenticate)
if not parsed:
return None
realm, service, scope = parsed
return await fetch_token(realm, service, scope, username, password)
-91
@@ -1,91 +0,0 @@
import time
import hashlib
import redis
class RedisCache:
def __init__(self, redis_url: str):
self.redis_url = redis_url
try:
self.client = redis.from_url(self.redis_url, decode_responses=True)
# Test connection
self.client.ping()
self.available = True
except Exception as e:
print(f"Redis not available: {e}")
self.client = None
self.available = False
def is_index_file(self, file_path: str) -> bool:
"""Check if the file is an index file that should have TTL"""
return (
file_path.endswith("APKINDEX.tar.gz")
or file_path.endswith("Packages.gz")
or file_path.endswith("repomd.xml")
or ("repodata/" in file_path
and file_path.endswith((
".xml", ".xml.gz", ".xml.bz2", ".xml.xz", ".xml.zck", ".xml.zst",
".sqlite", ".sqlite.gz", ".sqlite.bz2", ".sqlite.xz", ".sqlite.zck", ".sqlite.zst",
".yaml.xz", ".yaml.gz", ".yaml.bz2", ".yaml.zst",
".asc", ".txt"
)))
# Docker tag-based manifests are mutable (index); digest-pinned are immutable (file)
or (
"/manifests/" in file_path
and not file_path.split("/manifests/", 1)[1].startswith("sha256:")
)
or "/tags/list" in file_path
or file_path.endswith("/tags/list")
)
def get_index_cache_key(self, remote_name: str, path: str) -> str:
"""Generate cache key for index files"""
return f"index:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
def is_index_valid(
self, remote_name: str, path: str, ttl_override: int = None
) -> bool:
"""Check if index file is still valid (not expired)"""
if not self.available:
return False
try:
key = self.get_index_cache_key(remote_name, path)
return self.client.exists(key) > 0
except Exception:
return False
def mark_index_cached(self, remote_name: str, path: str, ttl: int = 300) -> None:
"""Mark index file as cached with TTL"""
if not self.available:
return
try:
key = self.get_index_cache_key(remote_name, path)
self.client.setex(key, ttl, str(int(time.time())))
except Exception:
pass
def cleanup_expired_index(self, storage, remote_name: str, path: str) -> None:
"""Remove expired index from S3 storage"""
if not self.available:
return
try:
# Construct the URL the same way as in the main flow
from .config import ConfigManager
import os
config_path = os.environ.get("CONFIG_PATH")
if config_path:
config = ConfigManager(config_path)
remote_config = config.get_remote_config(remote_name)
if remote_config:
base_url = remote_config.get("base_url")
if base_url:
# Use hierarchical path-based key (same as cache_single_artifact)
s3_key = storage.get_object_key(remote_name, path)
if storage.exists(s3_key):
storage.client.delete_object(Bucket=storage.bucket, Key=s3_key)
except Exception:
pass
+3
@@ -0,0 +1,3 @@
from .redis import RedisCache
__all__ = ["RedisCache"]
+124
@@ -0,0 +1,124 @@
import hashlib
import re
import time
import redis
class RedisCache:
def __init__(self, redis_url: str):
self.redis_url = redis_url
try:
self.client = redis.from_url(self.redis_url, decode_responses=True)
self.client.ping()
self.available = True
except Exception as e:
print(f"Redis not available: {e}")
self.client = None
self.available = False
def is_mutable_file(self, file_path: str, patterns: list[str] | None = None) -> bool:
if patterns is None:
patterns = []
return any(re.search(p, file_path) for p in patterns)
def get_index_cache_key(self, remote_name: str, path: str) -> str:
return f"index:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
def get_mutable_meta_key(self, remote_name: str, path: str) -> str:
return f"mutable:meta:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
def is_index_valid(self, remote_name: str, path: str) -> bool:
if not self.available:
return False
try:
key = self.get_index_cache_key(remote_name, path)
return self.client.exists(key) > 0
except Exception:
return False
def mark_index_cached(self, remote_name: str, path: str, ttl: int = 300) -> None:
if not self.available:
return
try:
key = self.get_index_cache_key(remote_name, path)
self.client.setex(key, ttl, str(int(time.time())))
except Exception:
pass
def store_mutable_meta(self, remote_name: str, path: str, etag: str | None, last_modified: str | None) -> None:
if not self.available:
return
data = {}
if etag:
data["etag"] = etag
if last_modified:
data["last_modified"] = last_modified
if not data:
return
try:
self.client.hset(self.get_mutable_meta_key(remote_name, path), mapping=data)
except Exception:
pass
def get_mutable_meta(self, remote_name: str, path: str) -> dict:
if not self.available:
return {}
try:
return self.client.hgetall(self.get_mutable_meta_key(remote_name, path)) or {}
except Exception:
return {}
def delete_mutable_meta(self, remote_name: str, path: str) -> None:
if not self.available:
return
try:
self.client.delete(self.get_mutable_meta_key(remote_name, path))
except Exception:
pass
def get_artifact_published_key(self, remote_name: str, path: str) -> str:
return f"pkg:published:{remote_name}:{hashlib.sha256(path.encode()).hexdigest()[:16]}"
def store_artifact_published(self, remote_name: str, path: str, last_modified: str) -> None:
"""Persist the upstream Last-Modified header for a (typically immutable) artifact."""
if not self.available:
return
try:
self.client.set(self.get_artifact_published_key(remote_name, path), last_modified)
except Exception:
pass
def get_artifact_published(self, remote_name: str, path: str) -> str | None:
"""Return the stored Last-Modified string for an artifact, or None."""
if not self.available:
return None
try:
return self.client.get(self.get_artifact_published_key(remote_name, path))
except Exception:
return None
def cleanup_expired_index(self, storage, remote_name: str, path: str) -> None:
if not self.available:
return
try:
import os
from ..config import ConfigManager
config_path = os.environ.get("CONFIG_PATH")
if config_path:
config = ConfigManager(config_path)
remote_config = config.get_remote_config(remote_name)
if remote_config:
base_url = remote_config.get("base_url")
if base_url:
s3_key = storage.get_object_key(remote_name, path)
if storage.exists(s3_key):
storage.client.delete_object(Bucket=storage.bucket, Key=s3_key)
except Exception:
pass
self.delete_mutable_meta(remote_name, path)
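Review note: all three Redis key builders in RedisCache share one shape: a prefix, the remote name, and the first 16 hex characters of the path's SHA-256. The sketch below factors that into a single hypothetical helper, cache_key, to make the scheme explicit.

```python
import hashlib


def cache_key(prefix: str, remote_name: str, path: str) -> str:
    """Build '<prefix>:<remote>:<16 hex chars of sha256(path)>'."""
    digest = hashlib.sha256(path.encode()).hexdigest()[:16]
    return f"{prefix}:{remote_name}:{digest}"


index_key = cache_key("index", "alpine", "v3.19/main/x86_64/APKINDEX.tar.gz")
meta_key = cache_key("mutable:meta", "alpine", "v3.19/main/x86_64/APKINDEX.tar.gz")
```

Truncating to 16 hex characters keeps 64 bits of the hash, which keeps keys short. Collisions between two paths of the same remote are improbable at this size but not impossible.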
+160 -32
@@ -1,64 +1,156 @@
import os
import glob
import json
import os
import yaml
from typing import Optional
_TYPE_KEYS = ("remote", "virtual", "local")
def _normalize_loaded(raw: dict) -> dict:
"""Convert {remote: {...}, virtual: {...}, local: {...}} into {remotes: {name: {type: ..., ...}}}."""
remotes = {}
for type_key in _TYPE_KEYS:
for name, cfg in (raw.get(type_key) or {}).items():
remotes[name] = {"type": type_key, **cfg}
result = {k: v for k, v in raw.items() if k not in _TYPE_KEYS}
if remotes:
result["remotes"] = remotes
return result
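Review note: a worked input and output makes the normalization easier to follow. The sketch below repeats the function body under the hypothetical name normalize and runs it on a made-up config. The remote names and URL are illustrative only.

```python
_TYPE_KEYS = ("remote", "virtual", "local")


def normalize(raw: dict) -> dict:
    """Fold the remote:/virtual:/local: sections into one remotes: map with a type field."""
    remotes = {}
    for type_key in _TYPE_KEYS:
        for name, cfg in (raw.get(type_key) or {}).items():
            # As written, an explicit "type" inside cfg would override type_key.
            remotes[name] = {"type": type_key, **cfg}
    result = {k: v for k, v in raw.items() if k not in _TYPE_KEYS}
    if remotes:
        result["remotes"] = remotes
    return result


new_style = {
    "remote": {"upstream-charts": {"package": "helm", "base_url": "https://charts.example.test"}},
    "virtual": {"charts": {"package": "helm", "members": ["upstream-charts"]}},
    "config_dir": "conf.d",
}
normalized = normalize(new_style)
```

A file that still uses the old flat remotes: map passes through unchanged, because none of the three type keys is present.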
_PACKAGE_MUTABLE_PATTERNS: dict[str, list[str]] = {
"alpine": [
r"APKINDEX\.tar\.gz$",
],
"rpm": [
r"repomd\.xml$",
r"repodata/.*\.(xml|xml\.gz|xml\.bz2|xml\.xz|xml\.zck|xml\.zst"
r"|sqlite|sqlite\.gz|sqlite\.bz2|sqlite\.xz|sqlite\.zck|sqlite\.zst"
r"|yaml\.xz|yaml\.gz|yaml\.bz2|yaml\.zst|asc|txt)$",
r"Packages\.gz$",
],
"docker": [
r"/manifests/(?!sha256:)[^/]+$",
r"/tags/list$",
],
"pypi": [
r"simple/", # Per-package and top-level simple index pages
],
"npm": [],
"helm": [
r"index\.yaml$",
],
"generic": [],
}
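Review note: the docker patterns rely on a negative lookahead to separate tag manifests (mutable) from digest-pinned ones (immutable). The sketch below exercises just those two patterns with re.search, the same call that is_mutable_file uses. The paths are made-up examples.

```python
import re

DOCKER_MUTABLE = [
    r"/manifests/(?!sha256:)[^/]+$",  # tag references such as "latest"
    r"/tags/list$",
]


def is_mutable(path: str, patterns: list[str]) -> bool:
    return any(re.search(p, path) for p in patterns)


by_tag = is_mutable("library/alpine/manifests/latest", DOCKER_MUTABLE)
by_digest = is_mutable("library/alpine/manifests/sha256:abc123", DOCKER_MUTABLE)
tag_list = is_mutable("library/alpine/tags/list", DOCKER_MUTABLE)
blob = is_mutable("library/alpine/blobs/sha256:abc123", DOCKER_MUTABLE)
```

Only the tag manifest and the tag list match, so digest-addressed manifests and blobs are cached without a TTL.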
class ConfigManager:
def __init__(self, config_file: str = "remotes.yaml"):
self.config_file = config_file
self._last_modified = 0
def __init__(self, config_path: str = "remotes.yaml"):
self.config_path = config_path
self._config_dir: str | None = None
self._last_modified: float = 0.0
self.config = self._load_config()
def _load_config(self) -> dict:
def _load_single_file(self, path: str) -> dict:
try:
with open(self.config_file, "r") as f:
if self.config_file.endswith(".yaml") or self.config_file.endswith(
".yml"
):
return yaml.safe_load(f)
with open(path) as f:
if path.endswith((".yaml", ".yml")):
raw = yaml.safe_load(f) or {}
else:
return json.load(f)
raw = json.load(f)
return _normalize_loaded(raw)
except FileNotFoundError:
return {}
@staticmethod
def _merge(base: dict, overlay: dict) -> dict:
result = {**base}
for key, value in overlay.items():
if key == "remotes" and isinstance(base.get("remotes"), dict) and isinstance(value, dict):
result["remotes"] = {**base.get("remotes", {}), **value}
else:
result[key] = value
return result
def _load_from_dir(self, dir_path: str) -> dict:
merged: dict = {}
files = sorted(glob.glob(os.path.join(dir_path, "*.yaml")) + glob.glob(os.path.join(dir_path, "*.yml")))
for path in files:
merged = self._merge(merged, self._load_single_file(path))
return merged
def _load_config(self) -> dict:
self._config_dir = None
if os.path.isdir(self.config_path):
return self._load_from_dir(self.config_path) or {"remotes": {}}
config = self._load_single_file(self.config_path)
if not config:
return {"remotes": {}}
def _check_reload(self) -> None:
"""Check if config file has been modified and reload if needed"""
try:
import os
config_dir = config.pop("config_dir", None)
if config_dir:
if not os.path.isabs(config_dir):
config_dir = os.path.join(os.path.dirname(os.path.abspath(self.config_path)), config_dir)
self._config_dir = config_dir
config = self._merge(config, self._load_from_dir(config_dir))
current_modified = os.path.getmtime(self.config_file)
return config
def _file_mtimes(self) -> list[float]:
mtimes: list[float] = []
if os.path.isdir(self.config_path):
for f in glob.glob(os.path.join(self.config_path, "*.yaml")) + glob.glob(os.path.join(self.config_path, "*.yml")):
try:
mtimes.append(os.path.getmtime(f))
except OSError:
pass
else:
try:
mtimes.append(os.path.getmtime(self.config_path))
except OSError:
pass
if self._config_dir and os.path.isdir(self._config_dir):
for f in glob.glob(os.path.join(self._config_dir, "*.yaml")) + glob.glob(os.path.join(self._config_dir, "*.yml")):
try:
mtimes.append(os.path.getmtime(f))
except OSError:
pass
return mtimes
def _check_reload(self) -> None:
try:
current_modified = max(self._file_mtimes(), default=0.0)
if current_modified > self._last_modified:
self._last_modified = current_modified
self.config = self._load_config()
print(f"Config reloaded from {self.config_file}")
print(f"Config reloaded from {self.config_path}")
except OSError:
pass
def get_remote_config(self, remote_name: str) -> Optional[dict]:
def get_remote_config(self, remote_name: str) -> dict | None:
self._check_reload()
return self.config.get("remotes", {}).get(remote_name)
def get_repository_patterns(self, remote_name: str, repo_path: str) -> list:
def get_immutable_patterns(self, remote_name: str, repo_path: str = "") -> list[str]:
remote_config = self.get_remote_config(remote_name)
if not remote_config:
return []
repositories = remote_config.get("repositories", {})
# Handle both dict (GitHub style) and list (Alpine style) repositories
if isinstance(repositories, dict):
repo_config = repositories.get(repo_path)
if repo_config:
patterns = repo_config.get("include_patterns", [])
patterns = repo_config.get("immutable_patterns", [])
else:
patterns = remote_config.get("include_patterns", [])
elif isinstance(repositories, list):
# For Alpine, repositories is just a list of allowed repo names
# Pattern matching is handled by the main include_patterns
patterns = remote_config.get("include_patterns", [])
patterns = remote_config.get("immutable_patterns", [])
else:
patterns = remote_config.get("include_patterns", [])
patterns = remote_config.get("immutable_patterns", [])
return patterns
@@ -92,9 +184,7 @@ class ConfigManager:
if not redis_url:
raise ValueError("REDIS_URL environment variable is required")
return {
"url": redis_url
}
return {"url": redis_url}
def get_database_config(self) -> dict:
"""Get database configuration from environment variables"""
@@ -105,12 +195,37 @@ class ConfigManager:
db_name = os.getenv("DBNAME")
if not all([db_host, db_port, db_user, db_pass, db_name]):
missing = [var for var, val in [("DBHOST", db_host), ("DBPORT", db_port), ("DBUSER", db_user), ("DBPASS", db_pass), ("DBNAME", db_name)] if not val]
missing = [
var
for var, val in [("DBHOST", db_host), ("DBPORT", db_port), ("DBUSER", db_user), ("DBPASS", db_pass), ("DBNAME", db_name)]
if not val
]
raise ValueError(f"All database environment variables are required: {', '.join(missing)}")
db_url = f"postgresql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"
return {"url": db_url}
def get_user_mutable_patterns(self, remote_name: str) -> list[str]:
"""Return only user-configured mutable_patterns, excluding package-type defaults."""
remote_config = self.get_remote_config(remote_name)
if not remote_config:
return []
return remote_config.get("mutable_patterns", [])
def get_mutable_patterns(self, remote_name: str) -> list[str]:
"""Return mutable-file patterns for a remote (TTL is configured per-remote in cache.mutable_ttl).
Merges the package-level defaults with any extra patterns listed under
``mutable_patterns`` in the remote's config.
"""
remote_config = self.get_remote_config(remote_name)
if not remote_config:
return []
package = remote_config.get("package", "generic")
defaults = _PACKAGE_MUTABLE_PATTERNS.get(package, [])
extra = remote_config.get("mutable_patterns", [])
return defaults + [p for p in extra if p not in defaults]
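The merge in `get_mutable_patterns` can be exercised in isolation. The sketch below copies the final expression into a standalone function; the name `merge_patterns` and the pattern strings are assumptions made for illustration, not values from `_PACKAGE_MUTABLE_PATTERNS`:

```python
def merge_patterns(defaults: list[str], extra: list[str]) -> list[str]:
    """Append user patterns to the defaults, skipping any already in the defaults."""
    return defaults + [p for p in extra if p not in defaults]


# Hypothetical pattern values, for illustration only.
defaults = [r"index\.yaml$"]
extra = [r"index\.yaml$", r"latest\.json$", r"latest\.json$"]
merged = merge_patterns(defaults, extra)
```

Defaults keep their order and come first. Note that the filter only compares against `defaults`, so a pattern repeated inside `extra` itself is kept twice; that is harmless for matching but worth knowing when inspecting the merged list.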
def get_cache_config(self, remote_name: str) -> dict:
"""Get cache configuration for a specific remote"""
remote_config = self.get_remote_config(remote_name)
@@ -118,3 +233,16 @@ class ConfigManager:
return {}
return remote_config.get("cache", {})
def get_quarantine_config(self, remote_name: str) -> tuple[bool, int]:
"""Return (enabled, quarantine_days) for a remote.
When enabled=True and quarantine_days>0, immutable artifacts published
within the last quarantine_days days are blocked with a 404.
"""
remote_config = self.get_remote_config(remote_name)
if not remote_config:
return False, 0
enabled = bool(remote_config.get("quarantine_new", False))
days = int(remote_config.get("quarantine_days", 0))
return enabled, days
+3

@@ -0,0 +1,3 @@
from .postgres import DatabaseManager
__all__ = ["DatabaseManager"]
@@ -1,5 +1,3 @@
import os
from typing import Optional
import psycopg2
from psycopg2.extras import RealDictCursor
@@ -11,7 +9,6 @@ class DatabaseManager:
self._init_database()
def _init_database(self):
"""Initialize database connection and create schema if needed"""
try:
self.connection = psycopg2.connect(self.db_url)
self.connection.autocommit = True
@@ -23,10 +20,8 @@ class DatabaseManager:
self.available = False
def _create_schema(self):
"""Create tables if they don't exist"""
try:
with self.connection.cursor() as cursor:
# Create table to map S3 keys to remote names
cursor.execute("""
CREATE TABLE IF NOT EXISTS artifact_mappings (
id SERIAL PRIMARY KEY,
@@ -53,27 +48,15 @@ class DatabaseManager:
)
""")
# Create indexes separately
cursor.execute(
"CREATE INDEX IF NOT EXISTS idx_s3_key ON artifact_mappings (s3_key)"
)
cursor.execute(
"CREATE INDEX IF NOT EXISTS idx_remote_name ON artifact_mappings (remote_name)"
)
cursor.execute(
"CREATE INDEX IF NOT EXISTS idx_local_repo_path ON local_files (repository_name, file_path)"
)
cursor.execute(
"CREATE INDEX IF NOT EXISTS idx_local_s3_key ON local_files (s3_key)"
)
cursor.execute("CREATE INDEX IF NOT EXISTS idx_s3_key ON artifact_mappings (s3_key)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_remote_name ON artifact_mappings (remote_name)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_local_repo_path ON local_files (repository_name, file_path)")
cursor.execute("CREATE INDEX IF NOT EXISTS idx_local_s3_key ON local_files (s3_key)")
print("Database schema initialized")
except Exception as e:
print(f"Error creating schema: {e}")
def record_artifact_mapping(
self, s3_key: str, remote_name: str, file_path: str, size_bytes: int
):
"""Record mapping between S3 key and remote"""
def record_artifact_mapping(self, s3_key: str, remote_name: str, file_path: str, size_bytes: int):
if not self.available:
return
@@ -95,7 +78,6 @@ class DatabaseManager:
print(f"Error recording artifact mapping: {e}")
def get_storage_by_remote(self) -> dict[str, int]:
"""Get storage size breakdown by remote from database"""
if not self.available:
return {}
@@ -112,8 +94,7 @@ class DatabaseManager:
print(f"Error getting storage by remote: {e}")
return {}
def get_remote_for_s3_key(self, s3_key: str) -> Optional[str]:
"""Get remote name for given S3 key"""
def get_remote_for_s3_key(self, s3_key: str) -> str | None:
if not self.available:
return None
@@ -138,7 +119,6 @@ class DatabaseManager:
sha256_sum: str,
content_type: str = None,
):
"""Add a file to local repository"""
if not self.available:
return False
@@ -165,7 +145,6 @@ class DatabaseManager:
return False
def get_local_file_metadata(self, repository_name: str, file_path: str):
"""Get metadata for a local file"""
if not self.available:
return None
@@ -197,7 +176,6 @@ class DatabaseManager:
return None
def list_local_files(self, repository_name: str, prefix: str = ""):
"""List files in local repository with optional path prefix"""
if not self.available:
return []
@@ -241,7 +219,6 @@ class DatabaseManager:
return []
def delete_local_file(self, repository_name: str, file_path: str):
"""Delete a file from local repository"""
if not self.available:
return False
@@ -263,7 +240,6 @@ class DatabaseManager:
return None
def file_exists(self, repository_name: str, file_path: str):
"""Check if file exists in local repository"""
if not self.available:
return False
+17 -94
@@ -1,96 +1,19 @@
import time
import logging
import re
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
# In-memory token cache: key -> (token, expires_at)
_token_cache: dict[str, tuple[str, float]] = {}
_WWW_AUTH_RE = re.compile(
r'Bearer\s+realm="(?P<realm>[^"]+)"'
r'(?:,service="(?P<service>[^"]*)")?'
r'(?:,scope="(?P<scope>[^"]*)")?',
re.IGNORECASE,
from .auth.docker import (
_cache_key,
_get_cached_token,
_store_token,
_token_cache,
fetch_token,
get_docker_token_for_response,
parse_www_authenticate,
)
def _cache_key(realm: str, service: str, scope: str, username: Optional[str]) -> str:
return f"{realm}|{service}|{scope}|{username or ''}"
def _get_cached_token(key: str) -> Optional[str]:
entry = _token_cache.get(key)
if entry and entry[1] > time.time():
return entry[0]
_token_cache.pop(key, None)
return None
def _store_token(key: str, token: str, expires_in: int) -> None:
# Expire 30s early to avoid using a token right as it expires
_token_cache[key] = (token, time.time() + max(expires_in - 30, 10))
async def fetch_token(
realm: str,
service: str,
scope: str,
username: Optional[str] = None,
password: Optional[str] = None,
) -> Optional[str]:
"""Fetch a Bearer token from a Docker registry auth server."""
key = _cache_key(realm, service, scope, username)
cached = _get_cached_token(key)
if cached:
return cached
params: dict[str, str] = {}
if service:
params["service"] = service
if scope:
params["scope"] = scope
auth = (username, password) if username and password else None
try:
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(realm, params=params, auth=auth)
response.raise_for_status()
data = response.json()
except Exception as e:
logger.warning(f"Docker token fetch failed ({realm}): {e}")
return None
token = data.get("token") or data.get("access_token")
if not token:
logger.warning(f"Docker token response missing token field: {data}")
return None
expires_in = int(data.get("expires_in", 300))
_store_token(key, token, expires_in)
logger.debug(f"Docker token obtained (realm={realm}, service={service}, scope={scope}, expires_in={expires_in}s)")
return token
def parse_www_authenticate(header: str) -> Optional[tuple[str, str, str]]:
"""Parse WWW-Authenticate: Bearer header. Returns (realm, service, scope) or None."""
m = _WWW_AUTH_RE.search(header)
if not m:
return None
return m.group("realm"), m.group("service") or "", m.group("scope") or ""
async def get_docker_token_for_response(
www_authenticate: str,
username: Optional[str] = None,
password: Optional[str] = None,
) -> Optional[str]:
"""Given a WWW-Authenticate header value, fetch and return a Bearer token."""
parsed = parse_www_authenticate(www_authenticate)
if not parsed:
return None
realm, service, scope = parsed
return await fetch_token(realm, service, scope, username, password)
__all__ = [
"_cache_key",
"_get_cached_token",
"_store_token",
"_token_cache",
"fetch_token",
"get_docker_token_for_response",
"parse_www_authenticate",
]
+61 -730
@@ -1,28 +1,44 @@
import os
import re
import json
import hashlib
import logging
from typing import Dict, Any, Optional
import httpx
from fastapi import FastAPI, HTTPException, Response, Request, Query, File, UploadFile
from fastapi.responses import PlainTextResponse, JSONResponse
import os
from fastapi import FastAPI, File, Query, Request, UploadFile
from fastapi.responses import PlainTextResponse
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from pydantic import BaseModel
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
try:
from importlib.metadata import version
__version__ = version("artifactapi")
except ImportError:
# Fallback for development when package isn't installed
__version__ = "dev"
from .artifact import discovery, flush, local, proxy, virtual
from .artifact import docker as docker_handler
from .cache import RedisCache
from .config import ConfigManager
from .database import DatabaseManager
from .storage import S3Storage
from .cache import RedisCache
from .metrics import MetricsManager
from .docker_auth import get_docker_token_for_response
from .storage import S3Storage
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
app = FastAPI(title="Artifact Storage API", version=__version__)
config_path = os.environ.get("CONFIG_PATH")
if not config_path:
raise ValueError("CONFIG_PATH environment variable is required")
config = ConfigManager(config_path)
s3_config = config.get_s3_config()
redis_config = config.get_redis_config()
db_config = config.get_database_config()
storage = S3Storage(**s3_config)
cache = RedisCache(redis_config["url"])
database = DatabaseManager(db_config["url"])
metrics = MetricsManager(cache, database)
class ArtifactRequest(BaseModel):
@@ -30,41 +46,10 @@ class ArtifactRequest(BaseModel):
include_pattern: str
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
app = FastAPI(title="Artifact Storage API", version=__version__)
# Initialize components using config
config_path = os.environ.get("CONFIG_PATH")
if not config_path:
raise ValueError("CONFIG_PATH environment variable is required")
config = ConfigManager(config_path)
# Get configurations
s3_config = config.get_s3_config()
redis_config = config.get_redis_config()
db_config = config.get_database_config()
# Initialize services
storage = S3Storage(**s3_config)
cache = RedisCache(redis_config["url"])
database = DatabaseManager(db_config["url"])
metrics = MetricsManager(cache, database)
@app.get("/")
def read_root():
config._check_reload()
return {
"message": "Artifact Storage API",
"version": app.version,
"remotes": list(config.config.get("remotes", {}).keys()),
}
return {"message": "Artifact Storage API", "version": app.version, "remotes": list(config.config.get("remotes", {}).keys())}
@app.get("/health")
@@ -72,725 +57,71 @@ def health_check():
return {"status": "healthy"}
@app.get("/config")
def get_config():
return config.config
@app.get("/metrics")
def get_metrics(json: bool | None = Query(False, description="Return JSON format instead of Prometheus")):
config._check_reload()
if json:
return metrics.get_metrics(storage, config)
metrics.get_metrics(storage, config)
return PlainTextResponse(generate_latest().decode("utf-8"), media_type=CONTENT_TYPE_LATEST)
@app.put("/cache/flush")
def flush_cache(
remote: str = Query(default=None, description="Specific remote to flush (optional)"),
cache_type: str = Query(default="all", description="Type to flush: 'all', 'index', 'files', 'metrics'")
cache_type: str = Query(default="all", description="Type to flush: 'all', 'index', 'files', 'metrics'"),
):
"""Flush cache entries for specified remote or all remotes"""
try:
result = {
"remote": remote,
"cache_type": cache_type,
"flushed": {
"redis_keys": 0,
"s3_objects": 0,
"operations": []
}
}
# Flush Redis entries based on cache_type
if cache_type in ["all", "index", "metrics"] and cache.available and cache.client:
patterns = []
if cache_type in ["all", "index"]:
if remote:
patterns.append(f"index:{remote}:*")
else:
patterns.append("index:*")
if cache_type in ["all", "metrics"]:
if remote:
patterns.append(f"metrics:*:{remote}")
else:
patterns.append("metrics:*")
for pattern in patterns:
keys = cache.client.keys(pattern)
if keys:
cache.client.delete(*keys)
result["flushed"]["redis_keys"] += len(keys)
logger.info(f"Cache flush: Deleted {len(keys)} Redis keys matching '{pattern}'")
if result["flushed"]["redis_keys"] > 0:
result["flushed"]["operations"].append(f"Deleted {result['flushed']['redis_keys']} Redis keys")
# Flush S3 objects if requested
if cache_type in ["all", "files"]:
try:
# Use prefix filtering for remote-specific deletion
list_params = {"Bucket": storage.bucket}
if remote:
list_params["Prefix"] = f"{remote}/"
response = storage.client.list_objects_v2(**list_params)
if 'Contents' in response:
objects_to_delete = [obj['Key'] for obj in response['Contents']]
for key in objects_to_delete:
try:
storage.client.delete_object(Bucket=storage.bucket, Key=key)
result["flushed"]["s3_objects"] += 1
except Exception as e:
logger.warning(f"Failed to delete S3 object {key}: {e}")
if objects_to_delete:
scope = f" for remote '{remote}'" if remote else ""
result["flushed"]["operations"].append(f"Deleted {len(objects_to_delete)} S3 objects{scope}")
logger.info(f"Cache flush: Deleted {len(objects_to_delete)} S3 objects{scope}")
except Exception as e:
result["flushed"]["operations"].append(f"S3 flush failed: {str(e)}")
logger.error(f"Cache flush S3 error: {e}")
if not result["flushed"]["operations"]:
result["flushed"]["operations"].append("No cache entries found to flush")
return result
except Exception as e:
logger.error(f"Cache flush error: {e}")
raise HTTPException(status_code=500, detail=f"Cache flush failed: {str(e)}")
async def construct_remote_url(remote_name: str, path: str) -> str:
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(
status_code=404, detail=f"Remote '{remote_name}' not configured"
)
base_url = remote_config.get("base_url")
if not base_url:
raise HTTPException(
status_code=500, detail=f"No base_url configured for remote '{remote_name}'"
)
# Handle Docker registry URLs
if remote_config.get("type") == "docker":
# Convert Docker paths to v2 API format
# e.g., library/nginx/manifests/latest -> v2/library/nginx/manifests/latest
return f"{base_url}/v2/{path}"
return f"{base_url}/{path}"
async def check_artifact_patterns(
remote_name: str, repo_path: str, file_path: str, full_path: str
) -> bool:
# First check if this is an index file - always allow index files
if cache.is_index_file(file_path) or cache.is_index_file(full_path):
return True
# Then check basic include patterns
patterns = config.get_repository_patterns(remote_name, repo_path)
if not patterns:
return True # Allow all if no patterns configured
pattern_matched = False
for pattern in patterns:
# Check both file_path and full_path to handle different pattern types
if re.search(pattern, file_path) or re.search(pattern, full_path):
pattern_matched = True
break
if not pattern_matched:
return False
# All remotes now use pattern-based filtering only - no additional checks needed
return True
async def cache_single_artifact(url: str, remote_name: str, path: str) -> dict:
# Use hierarchical path-based key
key = storage.get_object_key(remote_name, path)
if storage.exists(key):
logger.info(f"Cache ALREADY EXISTS: {url} (key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"status": "already_cached",
}
try:
remote_config = config.get_remote_config(remote_name) or {}
is_docker = remote_config.get("type") == "docker" or "/v2/" in url
# Prepare headers for Docker registry requests
headers = {}
if is_docker:
if "/manifests/" in url:
headers["Accept"] = (
"application/vnd.docker.distribution.manifest.v2+json,"
"application/vnd.oci.image.manifest.v1+json,"
"application/vnd.oci.image.index.v1+json,"
"application/vnd.docker.distribution.manifest.list.v2+json"
)
elif "/blobs/" in url:
headers["Accept"] = "application/octet-stream"
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(url, headers=headers)
# Handle Docker Bearer token challenge
if response.status_code == 401 and is_docker:
www_auth = response.headers.get("WWW-Authenticate", "")
username = remote_config.get("username")
password = remote_config.get("password")
token = await get_docker_token_for_response(www_auth, username, password)
if token:
headers["Authorization"] = f"Bearer {token}"
response = await client.get(url, headers=headers)
response.raise_for_status()
storage_path = storage.upload(key, response.content)
logger.info(f"Cache ADD SUCCESS: {url} (size: {len(response.content)} bytes, key: {key})")
return {
"url": url,
"cached_url": storage.get_url(key),
"storage_path": storage_path,
"size": len(response.content),
"status": "cached",
}
except Exception as e:
return {"url": url, "status": "error", "error": str(e)}
@app.get("/api/v1/remote/{remote_name}/{path:path}")
async def get_artifact(remote_name: str, path: str):
# Check if remote is configured
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(
status_code=404, detail=f"Remote '{remote_name}' not configured"
)
# Check if this is a local repository
if remote_config.get("type") == "local":
# Handle local repository download
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
# Get file from S3
content = storage.download_object(metadata["s3_key"])
if content is None:
raise HTTPException(status_code=500, detail="File not accessible")
# Determine content type
content_type = metadata.get("content_type", "application/octet-stream")
return Response(
content=content,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={os.path.basename(path)}"
},
)
# Extract repository path for pattern checking
path_parts = path.split("/")
if len(path_parts) >= 2:
repo_path = f"{path_parts[0]}/{path_parts[1]}"
file_path = "/".join(path_parts[2:])
else:
repo_path = path
file_path = path
# Check if artifact matches configured patterns
if not await check_artifact_patterns(remote_name, repo_path, file_path, path):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path} - not matching include patterns")
raise HTTPException(
status_code=403, detail="Artifact not allowed by configuration patterns"
)
# Construct the remote URL
remote_url = await construct_remote_url(remote_name, path)
# Check if artifact is already cached
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
# For index files, check Redis TTL validity
filename = os.path.basename(path)
is_index = cache.is_index_file(path) # Check full path, not just filename
if cached_key and is_index:
# Index file exists, but check if it's still valid
if not cache.is_index_valid(remote_name, path):
# Index has expired, remove it from S3
logger.info(f"Index EXPIRED: {remote_name}/{path} - removing from cache")
cache.cleanup_expired_index(storage, remote_name, path)
cached_key = None # Force re-download
if cached_key:
# Return cached artifact
try:
artifact_data = storage.download_object(cached_key)
filename = os.path.basename(path)
# Log cache hit
logger.info(f"Cache HIT: {remote_name}/{path} (size: {len(artifact_data)} bytes, key: {cached_key})")
# Determine content type based on file extension
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache hit metrics
metrics.record_cache_hit(remote_name, len(artifact_data))
# Record artifact mapping in database if not already recorded
database.record_artifact_mapping(
cached_key, remote_name, path, len(artifact_data)
)
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "cache",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except Exception as e:
raise HTTPException(
status_code=500, detail=f"Error retrieving cached artifact: {str(e)}"
)
# Artifact not cached, cache it first
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path)
if result["status"] == "error":
logger.error(f"Cache ADD FAILED: {remote_name}/{path} - {result['error']}")
raise HTTPException(
status_code=502, detail=f"Failed to fetch artifact: {result['error']}"
)
# Mark index files as cached in Redis if this was a new download
if result["status"] == "cached" and is_index:
# Get TTL from remote config
cache_config = config.get_cache_config(remote_name)
index_ttl = cache_config.get("index_ttl", 300) # Default 5 minutes
cache.mark_index_cached(remote_name, path, index_ttl)
logger.info(f"Index file cached with TTL: {remote_name}/{path} (ttl: {index_ttl}s)")
# Now return the cached artifact
try:
cache_key = storage.get_object_key(remote_name, path)
artifact_data = storage.download_object(cache_key)
filename = os.path.basename(path)
content_type = "application/octet-stream"
if filename.endswith(".tar.gz"):
content_type = "application/gzip"
elif filename.endswith(".zip"):
content_type = "application/zip"
elif filename.endswith(".exe"):
content_type = "application/x-msdownload"
elif filename.endswith(".rpm"):
content_type = "application/x-rpm"
elif filename.endswith(".xml"):
content_type = "application/xml"
elif filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
content_type = "application/gzip"
# Record cache miss metrics
metrics.record_cache_miss(remote_name, len(artifact_data))
# Record artifact mapping in database
cache_key = storage.get_object_key(remote_name, path)
database.record_artifact_mapping(
cache_key, remote_name, path, len(artifact_data)
)
return Response(
content=artifact_data,
media_type=content_type,
headers={
"Content-Disposition": f"attachment; filename={filename}",
"X-Artifact-Source": "remote",
"X-Artifact-Size": str(len(artifact_data)),
},
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Error serving artifact: {str(e)}")
return flush.handle(remote, cache_type, cache, storage)
@app.get("/v2/")
async def docker_v2_ping():
return Response(
content="{}",
media_type="application/json",
headers={"Docker-Distribution-Api-Version": "registry/2.0"},
)
return docker_handler.ping()
@app.api_route("/v2/{remote_name}/{path:path}", methods=["GET", "HEAD"])
async def docker_v2_proxy(request: Request, remote_name: str, path: str):
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(status_code=404, detail=f"Remote '{remote_name}' not configured")
if remote_config.get("type") != "docker":
raise HTTPException(status_code=400, detail=f"Remote '{remote_name}' is not a docker remote")
# Check include_patterns against the image name (e.g. "library/nginx")
patterns = config.get_repository_patterns(remote_name, "")
if patterns:
path_parts = path.split("/")
image_name = "/".join(path_parts[:2]) if len(path_parts) >= 2 else path
if not any(re.search(p, path) or re.search(p, image_name) for p in patterns):
logger.info(f"PATTERN BLOCKED: {remote_name}/{path}")
raise HTTPException(status_code=403, detail="Image not allowed by configuration patterns")
remote_url = await construct_remote_url(remote_name, path)
cached_key = storage.get_object_key(remote_name, path)
if not storage.exists(cached_key):
cached_key = None
is_index = cache.is_index_file(path)
if cached_key and is_index:
if not cache.is_index_valid(remote_name, path):
logger.info(f"Index EXPIRED: {remote_name}/{path} - removing from cache")
cache.cleanup_expired_index(storage, remote_name, path)
cached_key = None
if not cached_key:
logger.info(f"Cache MISS: {remote_name}/{path} - fetching from remote: {remote_url}")
result = await cache_single_artifact(remote_url, remote_name, path)
if result["status"] == "error":
raise HTTPException(status_code=502, detail=f"Failed to fetch: {result['error']}")
if result["status"] == "cached" and is_index:
cache_config = config.get_cache_config(remote_name)
index_ttl = cache_config.get("index_ttl", 300)
cache.mark_index_cached(remote_name, path, index_ttl)
logger.info(f"Index file cached with TTL: {remote_name}/{path} (ttl: {index_ttl}s)")
artifact_data = storage.download_object(storage.get_object_key(remote_name, path))
is_blob = "/blobs/" in path
if is_blob:
content_type = "application/octet-stream"
else:
try:
manifest_json = json.loads(artifact_data)
content_type = manifest_json.get("mediaType")
if not content_type:
if "manifests" in manifest_json:
content_type = "application/vnd.oci.image.index.v1+json"
else:
content_type = "application/vnd.oci.image.manifest.v1+json"
except Exception:
content_type = "application/vnd.oci.image.manifest.v1+json"
digest = f"sha256:{hashlib.sha256(artifact_data).hexdigest()}"
headers = {
"Docker-Distribution-Api-Version": "registry/2.0",
"Docker-Content-Digest": digest,
"Content-Length": str(len(artifact_data)),
}
if request.method == "HEAD":
return Response(status_code=200, headers=headers, media_type=content_type)
metrics.record_cache_hit(remote_name, len(artifact_data))
return Response(content=artifact_data, media_type=content_type, headers=headers)
return await docker_handler.proxy(request, remote_name, path, storage, cache, config, metrics)
async def discover_artifacts(remote: str, include_pattern: str) -> list[str]:
if "github.com" in remote:
return await discover_github_releases(remote, include_pattern)
else:
raise HTTPException(status_code=400, detail=f"Unsupported remote: {remote}")
@app.get("/api/v1/virtual/{virtual_name}/{path:path}")
async def get_virtual_artifact(request: Request, virtual_name: str, path: str):
return await virtual.handle(request, virtual_name, path, storage, cache, config)
async def discover_github_releases(remote: str, include_pattern: str) -> list[str]:
match = re.match(r"github\.com/([^/]+)/([^/]+)", remote)
if not match:
raise HTTPException(status_code=400, detail="Invalid GitHub remote format")
owner, repo = match.groups()
async with httpx.AsyncClient(follow_redirects=True) as client:
response = await client.get(
f"https://api.github.com/repos/{owner}/{repo}/releases"
)
if response.status_code != 200:
raise HTTPException(
status_code=response.status_code,
detail=f"Failed to fetch releases: {response.text}",
)
releases = response.json()
matching_urls = []
pattern = include_pattern.replace("*", ".*")
regex = re.compile(pattern)
for release in releases:
for asset in release.get("assets", []):
download_url = asset["browser_download_url"]
if regex.search(download_url):
matching_urls.append(download_url)
return matching_urls
@app.get("/api/v1/remote/{remote_name}/{path:path}")
async def get_artifact(request: Request, remote_name: str, path: str):
return await proxy.handle(request, remote_name, path, storage, cache, config, database, metrics)
@app.put("/api/v1/remote/{remote_name}/{path:path}")
async def upload_file(remote_name: str, path: str, file: UploadFile = File(...)):
"""Upload a file to local repository"""
# Check if remote is configured and is local
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(
status_code=404, detail=f"Remote '{remote_name}' not configured"
)
if remote_config.get("type") != "local":
raise HTTPException(
status_code=400, detail="Upload only supported for local repositories"
)
try:
# Read file content
content = await file.read()
# Calculate SHA256
sha256_sum = hashlib.sha256(content).hexdigest()
# Check if file already exists (prevent overwrite)
if database.file_exists(remote_name, path):
raise HTTPException(status_code=409, detail="File already exists")
# Generate S3 key
s3_key = f"local/{remote_name}/{path}"
# Determine content type
content_type = file.content_type or "application/octet-stream"
# Upload to S3
try:
storage.upload(s3_key, content)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {e}")
# Add to database
success = database.add_local_file(
repository_name=remote_name,
file_path=path,
s3_key=s3_key,
size_bytes=len(content),
sha256_sum=sha256_sum,
content_type=content_type,
)
if not success:
# Clean up S3 if database insert failed
storage.delete_object(s3_key)
raise HTTPException(status_code=500, detail="Failed to save file metadata")
return JSONResponse(
{
"message": "File uploaded successfully",
"file_path": path,
"size_bytes": len(content),
"sha256_sum": sha256_sum,
"content_type": content_type,
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
return await local.upload(remote_name, path, file, storage, database, config)
@app.head("/api/v1/remote/{remote_name}/{path:path}")
def check_file_exists(remote_name: str, path: str):
"""Check if file exists (for CI jobs) - supports local repositories only"""
# Check if remote is configured
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(
status_code=404, detail=f"Remote '{remote_name}' not configured"
)
# Handle local repository
if remote_config.get("type") == "local":
try:
metadata = database.get_local_file_metadata(remote_name, path)
if not metadata:
raise HTTPException(status_code=404, detail="File not found")
return Response(
headers={
"Content-Length": str(metadata["size_bytes"]),
"Content-Type": metadata.get(
"content_type", "application/octet-stream"
),
"X-SHA256": metadata["sha256_sum"],
"X-Created-At": metadata["created_at"].isoformat()
if metadata["created_at"]
else "",
"X-Uploaded-At": metadata["uploaded_at"].isoformat()
if metadata["uploaded_at"]
else "",
}
)
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Check failed: {str(e)}")
else:
# For remote repositories, just return 405 Method Not Allowed
raise HTTPException(
status_code=405, detail="HEAD method only supported for local repositories"
)
return local.check_exists(remote_name, path, database, config)
@app.delete("/api/v1/remote/{remote_name}/{path:path}")
def delete_file(remote_name: str, path: str):
"""Delete a file from local repository"""
# Check if remote is configured and is local
remote_config = config.get_remote_config(remote_name)
if not remote_config:
raise HTTPException(
status_code=404, detail=f"Remote '{remote_name}' not configured"
)
if remote_config.get("type") != "local":
raise HTTPException(
status_code=400, detail="Delete only supported for local repositories"
)
try:
# Get S3 key before deleting from database
s3_key = database.delete_local_file(remote_name, path)
if not s3_key:
raise HTTPException(status_code=404, detail="File not found")
# Delete from S3
if not storage.delete_object(s3_key):
# File was deleted from database but not from S3 - log warning but continue
print(f"Warning: Failed to delete S3 object {s3_key}")
return JSONResponse({"message": "File deleted successfully"})
except HTTPException:
raise
except Exception as e:
raise HTTPException(status_code=500, detail=f"Delete failed: {str(e)}")
return local.delete(remote_name, path, storage, database, config)
@app.post("/api/v1/artifacts/cache")
async def cache_artifact(request: ArtifactRequest) -> Dict[str, Any]:
try:
matching_urls = await discover_artifacts(
request.remote, request.include_pattern
)
if not matching_urls:
return {
"message": "No matching artifacts found",
"cached_count": 0,
"artifacts": [],
}
cached_artifacts = []
for url in matching_urls:
result = await cache_single_artifact(url, "", "")
cached_artifacts.append(result)
cached_count = sum(
1
for artifact in cached_artifacts
if artifact["status"] in ["cached", "already_cached"]
)
return {
"message": f"Processed {len(matching_urls)} artifacts, {cached_count} successfully cached",
"cached_count": cached_count,
"artifacts": cached_artifacts,
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
async def cache_artifact(request: ArtifactRequest):
return await discovery.cache_artifacts(request.remote, request.include_pattern, storage)
@app.get("/api/v1/artifacts/{remote:path}")
async def list_cached_artifacts(
remote: str, include_pattern: str = ".*"
) -> Dict[str, Any]:
try:
matching_urls = await discover_artifacts(remote, include_pattern)
cached_artifacts = []
for url in matching_urls:
# Extract path from URL for hierarchical key generation
from urllib.parse import urlparse
parsed = urlparse(url)
path = parsed.path
key = storage.get_object_key(remote, path)
if storage.exists(key):
cached_artifacts.append(
{"url": url, "cached_url": storage.get_url(key), "key": key}
)
return {
"remote": remote,
"pattern": include_pattern,
"total_found": len(matching_urls),
"cached_count": len(cached_artifacts),
"artifacts": cached_artifacts,
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/metrics")
def get_metrics(
json: Optional[bool] = Query(
False, description="Return JSON format instead of Prometheus"
),
):
"""Get comprehensive metrics about the artifact storage system"""
config._check_reload()
if json:
# Return JSON format
return metrics.get_metrics(storage, config)
else:
# Return Prometheus format
metrics.get_metrics(storage, config) # Update gauges
prometheus_data = generate_latest().decode("utf-8")
return PlainTextResponse(prometheus_data, media_type=CONTENT_TYPE_LATEST)
@app.get("/config")
def get_config():
return config.config
async def list_cached_artifacts(remote: str, include_pattern: str = ".*"):
return await discovery.list_artifacts(remote, include_pattern, storage)
def main():
+18 -51
@@ -1,22 +1,14 @@
from datetime import datetime
from typing import Dict, Any
from typing import Any
from prometheus_client import Counter, Gauge
# Prometheus metrics
request_counter = Counter(
"artifact_requests_total", "Total artifact requests", ["remote", "status"]
)
request_counter = Counter("artifact_requests_total", "Total artifact requests", ["remote", "status"])
cache_hit_counter = Counter("artifact_cache_hits_total", "Total cache hits", ["remote"])
cache_miss_counter = Counter(
"artifact_cache_misses_total", "Total cache misses", ["remote"]
)
bandwidth_saved_counter = Counter(
"artifact_bandwidth_saved_bytes_total", "Total bandwidth saved", ["remote"]
)
storage_size_gauge = Gauge(
"artifact_storage_size_bytes", "Storage size by remote", ["remote"]
)
cache_miss_counter = Counter("artifact_cache_misses_total", "Total cache misses", ["remote"])
bandwidth_saved_counter = Counter("artifact_bandwidth_saved_bytes_total", "Total bandwidth saved", ["remote"])
storage_size_gauge = Gauge("artifact_storage_size_bytes", "Storage size by remote", ["remote"])
redis_keys_gauge = Gauge("artifact_redis_keys_total", "Total Redis keys")
@@ -44,9 +36,7 @@ class MetricsManager:
# Increment per-remote counters
self.redis_client.client.incr(f"metrics:cache_hits:{remote_name}")
self.redis_client.client.incr(f"metrics:total_requests:{remote_name}")
self.redis_client.client.incrby(
f"metrics:bandwidth_saved:{remote_name}", size_bytes
)
self.redis_client.client.incrby(f"metrics:bandwidth_saved:{remote_name}", size_bytes)
except Exception:
pass
@@ -91,7 +81,7 @@ class MetricsManager:
except Exception:
return 0
def get_s3_size_by_remote(self, storage, config_manager) -> Dict[str, int]:
def get_s3_size_by_remote(self, storage, config_manager) -> dict[str, int]:
"""Get size of stored data per remote using database mappings"""
if self.database_manager and self.database_manager.available:
# Get from database if available
@@ -146,7 +136,7 @@ class MetricsManager:
except Exception:
return {}
def get_metrics(self, storage, config_manager) -> Dict[str, Any]:
def get_metrics(self, storage, config_manager) -> dict[str, Any]:
"""Get comprehensive metrics"""
# Update Redis keys gauge
redis_key_count = self.get_redis_key_count()
@@ -173,54 +163,31 @@ class MetricsManager:
if self.redis_client and self.redis_client.available:
try:
# Get global metrics
cache_hits = int(
self.redis_client.client.get("metrics:cache_hits") or 0
)
cache_misses = int(
self.redis_client.client.get("metrics:cache_misses") or 0
)
cache_hits = int(self.redis_client.client.get("metrics:cache_hits") or 0)
cache_misses = int(self.redis_client.client.get("metrics:cache_misses") or 0)
total_requests = cache_hits + cache_misses
bandwidth_saved = int(
self.redis_client.client.get("metrics:bandwidth_saved") or 0
)
bandwidth_saved = int(self.redis_client.client.get("metrics:bandwidth_saved") or 0)
metrics["requests"]["cache_hits"] = cache_hits
metrics["requests"]["cache_misses"] = cache_misses
metrics["requests"]["total_requests"] = total_requests
metrics["requests"]["cache_hit_ratio"] = (
cache_hits / total_requests if total_requests > 0 else 0.0
)
metrics["requests"]["cache_hit_ratio"] = cache_hits / total_requests if total_requests > 0 else 0.0
metrics["bandwidth"]["saved_bytes"] = bandwidth_saved
# Get per-remote metrics
for remote in config_manager.config.get("remotes", {}).keys():
remote_cache_hits = int(
self.redis_client.client.get(f"metrics:cache_hits:{remote}")
or 0
)
remote_cache_misses = int(
self.redis_client.client.get(f"metrics:cache_misses:{remote}")
or 0
)
remote_cache_hits = int(self.redis_client.client.get(f"metrics:cache_hits:{remote}") or 0)
remote_cache_misses = int(self.redis_client.client.get(f"metrics:cache_misses:{remote}") or 0)
remote_total = remote_cache_hits + remote_cache_misses
remote_bandwidth_saved = int(
self.redis_client.client.get(
f"metrics:bandwidth_saved:{remote}"
)
or 0
)
remote_bandwidth_saved = int(self.redis_client.client.get(f"metrics:bandwidth_saved:{remote}") or 0)
metrics["per_remote"][remote] = {
"cache_hits": remote_cache_hits,
"cache_misses": remote_cache_misses,
"total_requests": remote_total,
"cache_hit_ratio": remote_cache_hits / remote_total
if remote_total > 0
else 0.0,
"cache_hit_ratio": remote_cache_hits / remote_total if remote_total > 0 else 0.0,
"bandwidth_saved_bytes": remote_bandwidth_saved,
"storage_size_bytes": metrics["storage"]["size_by_remote"].get(
remote, 0
),
"storage_size_bytes": metrics["storage"]["size_by_remote"].get(remote, 0),
}
except Exception:
+4
@@ -0,0 +1,4 @@
from . import generic, helm, npm, python, rpm
from .base import get_content_type
__all__ = ["generic", "helm", "npm", "python", "rpm", "get_content_type"]
+16
@@ -0,0 +1,16 @@
def get_content_type(filename: str) -> str:
if filename.endswith((".tar.gz", ".tgz")):
return "application/gzip"
if filename.endswith(".zip") or filename.endswith(".whl"):
return "application/zip"
if filename.endswith(".exe"):
return "application/x-msdownload"
if filename.endswith(".rpm"):
return "application/x-rpm"
if filename.endswith(".xml"):
return "application/xml"
if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
return "application/gzip"
if filename.endswith((".yaml", ".yml")):
return "text/yaml"
return "application/octet-stream"
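The suffix checks above rely on `str.endswith` accepting a tuple, and on compound suffixes such as `.xml.gz` not matching the plain `.xml` branch. A minimal sketch for checking that behaviour follows; the function body is copied from the diff so the snippet runs on its own, and the sample filenames are made up for illustration.

```python
def get_content_type(filename: str) -> str:
    # Copied from the diff above so this snippet is self-contained.
    if filename.endswith((".tar.gz", ".tgz")):
        return "application/gzip"
    if filename.endswith(".zip") or filename.endswith(".whl"):
        return "application/zip"
    if filename.endswith(".exe"):
        return "application/x-msdownload"
    if filename.endswith(".rpm"):
        return "application/x-rpm"
    if filename.endswith(".xml"):
        return "application/xml"
    if filename.endswith((".xml.gz", ".xml.bz2", ".xml.xz")):
        return "application/gzip"
    if filename.endswith((".yaml", ".yml")):
        return "text/yaml"
    return "application/octet-stream"


# Hypothetical filenames, chosen only to exercise the branches.
samples = [
    "chart-1.0.0.tgz",
    "pkg-1.0-py3-none-any.whl",
    "repomd.xml",
    "primary.xml.gz",
    "other.xml.bz2",
    "index.yaml",
    "README",
]
results = {name: get_content_type(name) for name in samples}
for name, ctype in results.items():
    print(f"{name:28} {ctype}")
```

As the function is written, the `.xml.bz2` and `.xml.xz` variants report `application/gzip` as well, and anything unrecognised falls through to `application/octet-stream`.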
+3
@@ -0,0 +1,3 @@
from .base import get_content_type
__all__ = ["get_content_type"]
+18
View File
@@ -0,0 +1,18 @@
from .base import get_content_type
def resolve_content(
data: bytes,
path: str,
filename: str,
base_url: str,
proxy_url: str,
remote_name: str,
) -> tuple[bytes, str]:
if filename == "index.yaml":
data = data.replace(
base_url.encode(),
f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
)
return data, "text/yaml"
return data, get_content_type(filename)
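The helm resolver above rewrites upstream chart URLs inside `index.yaml` so that clients fetch charts through the proxy. A minimal sketch of that substitution follows, using a trimmed copy of the function; the proxy address `http://localhost:8000` and the index contents are assumptions made for illustration, and the fallback content type is a stand-in for `get_content_type()`.

```python
def resolve_content(data: bytes, filename: str, base_url: str,
                    proxy_url: str, remote_name: str) -> tuple[bytes, str]:
    # Trimmed copy of the helm resolver: only index.yaml is rewritten.
    if filename == "index.yaml":
        data = data.replace(
            base_url.encode(),
            f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
        )
        return data, "text/yaml"
    # Stand-in for get_content_type(filename) in the real module.
    return data, "application/octet-stream"


# Hypothetical upstream index with one absolute chart URL.
index = (
    b"entries:\n"
    b"  demo:\n"
    b"  - urls:\n"
    b"    - https://charts.example.com/demo-1.0.0.tgz\n"
)

rewritten, ctype = resolve_content(
    index,
    "index.yaml",
    base_url="https://charts.example.com",
    proxy_url="http://localhost:8000",  # assumed proxy address
    remote_name="helm-test",
)
print(rewritten.decode())

# Non-index files pass through untouched.
chart, chart_type = resolve_content(
    b"binary", "demo-1.0.0.tgz",
    "https://charts.example.com", "http://localhost:8000", "helm-test",
)
```

Because this is a plain byte substitution, it only touches absolute URLs that start with `base_url` exactly as configured; a `base_url` written with a trailing slash would also consume the separator before the chart filename.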
+21
@@ -0,0 +1,21 @@
import re
from .base import get_content_type
def resolve_content(
data: bytes,
path: str,
filename: str,
immutable_patterns: list[str],
base_url: str,
proxy_url: str,
remote_name: str,
) -> tuple[bytes, str]:
if not any(re.search(p, path) for p in immutable_patterns):
data = data.replace(
base_url.encode(),
f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
)
return data, "application/json"
return data, get_content_type(filename)
+32
View File
@@ -0,0 +1,32 @@
import re
from .base import get_content_type
def construct_url(base_url: str, path: str) -> str:
"""Build the upstream URL for a PyPI request.
PyPI splits simple/ index pages (pypi.org) from file downloads
(files.pythonhosted.org), so simple/ requests are redirected to pypi.org.
"""
if base_url.rstrip("/") == "https://files.pythonhosted.org" and "simple/" in path:
return f"https://pypi.org/{path}"
return f"{base_url}/{path}"
def resolve_content(
data: bytes,
path: str,
filename: str,
immutable_patterns: list[str],
base_url: str,
proxy_url: str,
remote_name: str,
) -> tuple[bytes, str]:
if not any(re.search(p, path) for p in immutable_patterns):
data = data.replace(
base_url.encode(),
f"{proxy_url}/api/v1/remote/{remote_name}".encode(),
)
return data, "text/html; charset=utf-8"
return data, get_content_type(filename)
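The `construct_url` helper above routes `simple/` index requests to pypi.org while leaving file downloads on the configured host. A short sketch of both cases follows, with the function copied from the diff; the package paths are hypothetical.

```python
def construct_url(base_url: str, path: str) -> str:
    # Copied from the diff above so this snippet is self-contained.
    if base_url.rstrip("/") == "https://files.pythonhosted.org" and "simple/" in path:
        return f"https://pypi.org/{path}"
    return f"{base_url}/{path}"


base = "https://files.pythonhosted.org"

# Index page: redirected to pypi.org.
index_url = construct_url(base, "simple/requests/")

# File download: stays on the configured base URL (hypothetical path).
file_url = construct_url(base, "packages/ab/cd/requests-2.31.0-py3-none-any.whl")

# Any other base URL is used as-is, even for simple/ paths.
mirror_url = construct_url("https://mirror.example.com", "simple/requests/")

print(index_url)
print(file_url)
print(mirror_url)
```

The redirect only triggers when the base URL is exactly `https://files.pythonhosted.org` (ignoring a trailing slash), so private mirrors keep serving their own index pages.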
+3
@@ -0,0 +1,3 @@
from .base import get_content_type
__all__ = ["get_content_type"]
+3
View File
@@ -0,0 +1,3 @@
from .s3 import S3Storage
__all__ = ["S3Storage"]
@@ -1,5 +1,6 @@
import os
import hashlib
import os
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError
@@ -21,27 +22,25 @@ class S3Storage:
self.bucket = bucket
self.secure = secure
ca_bundle = os.environ.get('REQUESTS_CA_BUNDLE') or os.environ.get('SSL_CERT_FILE')
config_kwargs = {
"request_checksum_calculation": "when_required",
"response_checksum_validation": "when_required"
}
ca_bundle = os.environ.get("REQUESTS_CA_BUNDLE") or os.environ.get("SSL_CERT_FILE")
config_kwargs = {"request_checksum_calculation": "when_required", "response_checksum_validation": "when_required"}
client_kwargs = {
"endpoint_url": f"http{'s' if self.secure else ''}://{self.endpoint}",
"aws_access_key_id": self.access_key,
"aws_secret_access_key": self.secret_key,
"config": Config(**config_kwargs)
"config": Config(**config_kwargs),
}
if ca_bundle and os.path.exists(ca_bundle):
client_kwargs["verify"] = ca_bundle
print(f"Debug: Using CA bundle: {ca_bundle}")
else:
print(f"Debug: No CA bundle found. REQUESTS_CA_BUNDLE={os.environ.get('REQUESTS_CA_BUNDLE')}, SSL_CERT_FILE={os.environ.get('SSL_CERT_FILE')}")
print(
f"Debug: No CA bundle found. REQUESTS_CA_BUNDLE={os.environ.get('REQUESTS_CA_BUNDLE')}, SSL_CERT_FILE={os.environ.get('SSL_CERT_FILE')}"
)
self.client = boto3.client("s3", **client_kwargs)
# Try to ensure bucket exists, but don't fail if MinIO isn't ready yet
try:
self._ensure_bucket_exists()
except Exception as e:
@@ -55,25 +54,21 @@ class S3Storage:
self.client.create_bucket(Bucket=self.bucket)
def get_object_key(self, remote_name: str, path: str) -> str:
# Extract directory path and filename
clean_path = path.lstrip('/')
clean_path = path.lstrip("/")
filename = os.path.basename(clean_path)
directory_path = os.path.dirname(clean_path)
# Special handling for Docker registry blobs (use digest as key for deduplication)
# Docker blobs are keyed by digest for deduplication across images
if "/blobs/sha256:" in clean_path:
# Extract the SHA256 digest for Docker blobs
parts = clean_path.split("/blobs/sha256:")
if len(parts) == 2:
digest = parts[1]
return f"{remote_name}/blobs/sha256/{digest}"
# Hash the directory path to keep keys manageable while preserving remote structure
if directory_path:
path_hash = hashlib.sha256(directory_path.encode()).hexdigest()[:16]
return f"{remote_name}/{path_hash}/{filename}"
else:
# If no directory, just use remote and filename
return f"{remote_name}/{filename}"
def exists(self, key: str) -> bool:
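The key scheme in `get_object_key` can be tried outside the class. The sketch below lifts the new version of the method into a free function (an adaptation for illustration, not how the module exposes it); remote names and paths are hypothetical.

```python
import hashlib
import os


def get_object_key(remote_name: str, path: str) -> str:
    # Free-function adaptation of S3Storage.get_object_key from the diff.
    clean_path = path.lstrip("/")
    filename = os.path.basename(clean_path)
    directory_path = os.path.dirname(clean_path)

    # Docker blobs are keyed by digest, so identical layers share one object.
    if "/blobs/sha256:" in clean_path:
        parts = clean_path.split("/blobs/sha256:")
        if len(parts) == 2:
            return f"{remote_name}/blobs/sha256/{parts[1]}"

    # Other paths: hash the directory part, keep the filename readable.
    if directory_path:
        path_hash = hashlib.sha256(directory_path.encode()).hexdigest()[:16]
        return f"{remote_name}/{path_hash}/{filename}"
    return f"{remote_name}/{filename}"


blob_key = get_object_key("docker-test", "library/nginx/blobs/sha256:abc123")
same_blob = get_object_key("docker-test", "library/redis/blobs/sha256:abc123")
flat_key = get_object_key("generic-test", "/file.tar.gz")
nested_key = get_object_key("generic-test", "v1/linux/file.tar.gz")

print(blob_key)
print(flat_key)
print(nested_key)
```

Two images that reference the same digest resolve to the same key within a remote, which is what makes the deduplication work; files in different directories get different 16-character hash segments.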
+205
@@ -0,0 +1,205 @@
"""
Pytest configuration and shared fixtures.
Module-level setup (env vars + connection patches) runs before any test
module is imported, so the FastAPI app initialises against mocks rather
than real S3 / Redis / PostgreSQL services.
"""
import os
import tempfile
from unittest.mock import MagicMock, patch
import yaml
# ---------------------------------------------------------------------------
# Test remote configuration
# ---------------------------------------------------------------------------
TEST_REMOTES = {
"remotes": {
"alpine-test": {
"base_url": "https://dl-cdn.alpinelinux.org",
"type": "remote",
"package": "alpine",
"immutable_patterns": [".*/x86_64/.*\\.apk$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"rpm-test": {
"base_url": "https://example.com/rpm",
"type": "remote",
"package": "rpm",
"immutable_patterns": [".*/x86_64/.*\\.rpm$", ".*/repodata/.*$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"docker-test": {
"base_url": "https://registry.example.com",
"type": "remote",
"package": "docker",
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"docker-restricted": {
"base_url": "https://registry.example.com",
"type": "remote",
"package": "docker",
"immutable_patterns": ["^library/nginx"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 300},
},
"generic-test": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [".*\\.tar\\.gz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"custom-index-test": {
"base_url": "https://example.com",
"type": "remote",
"package": "generic",
"mutable_patterns": ["metadata\\.json$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"check-mutable-test": {
"base_url": "https://example.com",
"type": "remote",
"package": "generic",
"mutable_patterns": ["metadata\\.json$"],
"check_mutable_updates": True,
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"local-test": {
"type": "local",
"package": "generic",
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"pypi-test": {
"base_url": "https://files.pythonhosted.org",
"type": "remote",
"package": "pypi",
"immutable_patterns": [
r"packages/.*\.whl$",
r"packages/.*\.whl\.metadata$",
r"packages/.*\.tar\.gz$",
],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"npm-test": {
"base_url": "https://registry.npmjs.org",
"type": "remote",
"package": "npm",
"immutable_patterns": [r"\.tgz$"],
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 600},
},
"helm-test": {
"base_url": "https://helm.releases.hashicorp.com",
"type": "remote",
"package": "helm",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 3600},
},
"quarantine-test": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [r".*\.tar\.gz$"],
"quarantine_new": True,
"quarantine_days": 3,
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"quarantine-disabled": {
"base_url": "https://releases.example.com",
"type": "remote",
"package": "generic",
"immutable_patterns": [r".*\.tar\.gz$"],
"quarantine_new": False,
"quarantine_days": 3,
"cache": {"immutable_ttl": 0, "mutable_ttl": 0},
},
"helm-member-2": {
"base_url": "https://charts.example.com",
"type": "remote",
"package": "helm",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 1800},
},
"helm-virtual-test": {
"type": "virtual",
"package": "helm",
"members": ["helm-test", "helm-member-2"],
},
"unsupported-virtual-test": {
"type": "virtual",
"package": "rpm",
"members": ["rpm-test"],
},
"empty-virtual-test": {
"type": "virtual",
"package": "helm",
"members": [],
},
}
}
# ---------------------------------------------------------------------------
# Write temp config and set env vars BEFORE importing the package
# ---------------------------------------------------------------------------
_tmpdir = tempfile.mkdtemp()
_config_path = os.path.join(_tmpdir, "remotes.yaml")
with open(_config_path, "w") as _f:
yaml.dump(TEST_REMOTES, _f)
os.environ.update(
{
"CONFIG_PATH": _config_path,
"MINIO_ENDPOINT": "localhost:9000",
"MINIO_ACCESS_KEY": "testkey",
"MINIO_SECRET_KEY": "testsecret",
"MINIO_BUCKET": "testbucket",
"REDIS_URL": "redis://localhost:6379/0",
"DBHOST": "localhost",
"DBPORT": "5432",
"DBUSER": "test",
"DBPASS": "test",
"DBNAME": "test",
}
)
# Patch external service connections before the package is imported.
# These stay active for the whole session (process exits after tests finish).
_boto3_patch = patch("boto3.client", return_value=MagicMock())
_redis_patch = patch("redis.from_url", return_value=MagicMock())
_psycopg2_patch = patch("psycopg2.connect", return_value=MagicMock())
_boto3_patch.start()
_redis_patch.start()
_psycopg2_patch.start()
# ---------------------------------------------------------------------------
# Shared fixtures
# ---------------------------------------------------------------------------
import pytest # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
@pytest.fixture(scope="session")
def app():
from artifactapi.main import app as fastapi_app
return fastapi_app
@pytest.fixture(scope="session")
def client(app):
return TestClient(app)
@pytest.fixture
def config_path():
return _config_path
@pytest.fixture
def test_remotes():
return TEST_REMOTES
+329
@@ -0,0 +1,329 @@
"""Tests for RedisCache, focusing on is_mutable_file with configurable patterns."""
import hashlib
from unittest.mock import ANY, MagicMock, patch
import pytest
from artifactapi.cache import RedisCache
from artifactapi.config import _PACKAGE_MUTABLE_PATTERNS
@pytest.fixture
def bare_cache():
"""RedisCache instance bypassing __init__ (no Redis needed for pure-logic tests)."""
return RedisCache.__new__(RedisCache)
@pytest.fixture
def unavailable_cache():
"""RedisCache where Redis is not reachable."""
with patch("redis.from_url", side_effect=Exception("connection refused")):
return RedisCache("redis://localhost:6379/0")
@pytest.fixture
def mock_redis_client():
return MagicMock()
@pytest.fixture
def cache_with_redis(mock_redis_client):
"""RedisCache backed by a MagicMock Redis client."""
with patch("redis.from_url", return_value=mock_redis_client):
c = RedisCache("redis://localhost:6379/0")
c.client = mock_redis_client
c.available = True
return c
# ---------------------------------------------------------------------------
# is_mutable_file — alpine patterns
# ---------------------------------------------------------------------------
class TestIsMutableFileAlpine:
def test_apkindex_tarball_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert bare_cache.is_mutable_file("alpine/v3.18/x86_64/APKINDEX.tar.gz", patterns)
def test_nested_apkindex_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert bare_cache.is_mutable_file("mirrors/dl-cdn/alpine/v3.19/community/x86_64/APKINDEX.tar.gz", patterns)
def test_apk_package_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert not bare_cache.is_mutable_file("alpine/v3.18/x86_64/musl-1.2.4-r2.apk", patterns)
def test_random_tarball_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert not bare_cache.is_mutable_file("some/path/archive.tar.gz", patterns)
def test_apkindex_signature_file_is_not_index(self, bare_cache):
# Signature file adjacent to the index should not be treated as an index
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert not bare_cache.is_mutable_file("alpine/v3.18/x86_64/APKINDEX.tar.gz.sig", patterns)
def test_apkindex_tmp_file_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["alpine"]
assert not bare_cache.is_mutable_file("alpine/v3.18/x86_64/APKINDEX.tar.gz.tmp", patterns)
# ---------------------------------------------------------------------------
# is_mutable_file — rpm patterns
# ---------------------------------------------------------------------------
class TestIsMutableFileRpm:
def test_repomd_xml_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("almalinux/9/x86_64/repomd.xml", patterns)
def test_repodata_primary_xml_gz_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("repo/repodata/primary.xml.gz", patterns)
def test_repodata_sqlite_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("repo/repodata/primary.sqlite", patterns)
def test_repodata_sqlite_bz2_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("repo/repodata/other.sqlite.bz2", patterns)
def test_repodata_yaml_xz_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("repo/repodata/comps.yaml.xz", patterns)
def test_packages_gz_pattern_matches_any_path(self, bare_cache):
# The Packages.gz$ regex is a carryover from the original hardcoded logic and
# deliberately matches any path ending in Packages.gz — including Debian-style paths.
# This test documents that intentional behaviour.
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert bare_cache.is_mutable_file("debian/dists/stable/main/binary-amd64/Packages.gz", patterns)
def test_rpm_package_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert not bare_cache.is_mutable_file("almalinux/9/x86_64/Packages/bash-5.1.8.x86_64.rpm", patterns)
def test_arbitrary_xml_outside_repodata_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["rpm"]
assert not bare_cache.is_mutable_file("some/path/config.xml", patterns)
# ---------------------------------------------------------------------------
# is_mutable_file — docker patterns
# ---------------------------------------------------------------------------
class TestIsMutableFileDocker:
def test_tag_manifest_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert bare_cache.is_mutable_file("library/nginx/manifests/latest", patterns)
def test_version_tag_manifest_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert bare_cache.is_mutable_file("library/nginx/manifests/1.25.3", patterns)
def test_hyphenated_tag_manifest_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert bare_cache.is_mutable_file("library/nginx/manifests/latest-rc", patterns)
def test_numeric_date_tag_manifest_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert bare_cache.is_mutable_file("library/nginx/manifests/20240101", patterns)
def test_digest_manifest_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
digest = "sha256:" + "a" * 64
assert not bare_cache.is_mutable_file(f"library/nginx/manifests/{digest}", patterns)
def test_tags_list_is_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert bare_cache.is_mutable_file("library/nginx/tags/list", patterns)
def test_blob_is_not_index(self, bare_cache):
patterns = _PACKAGE_MUTABLE_PATTERNS["docker"]
assert not bare_cache.is_mutable_file("library/nginx/blobs/sha256:abc123", patterns)
# ---------------------------------------------------------------------------
# is_mutable_file — edge cases
# ---------------------------------------------------------------------------
class TestIsMutableFileEdgeCases:
def test_empty_patterns_nothing_is_index(self, bare_cache):
assert not bare_cache.is_mutable_file("APKINDEX.tar.gz", [])
assert not bare_cache.is_mutable_file("repomd.xml", [])
assert not bare_cache.is_mutable_file("library/nginx/manifests/latest", [])
def test_none_patterns_nothing_is_index(self, bare_cache):
assert not bare_cache.is_mutable_file("APKINDEX.tar.gz", None)
assert not bare_cache.is_mutable_file("repomd.xml", None)
def test_custom_patterns_match(self, bare_cache):
patterns = [r"metadata\.json$", r"index\.yaml$"]
assert bare_cache.is_mutable_file("repo/metadata.json", patterns)
assert bare_cache.is_mutable_file("repo/subdir/index.yaml", patterns)
assert not bare_cache.is_mutable_file("repo/data.tar.gz", patterns)
def test_custom_pattern_does_not_match_standard_index(self, bare_cache):
patterns = [r"metadata\.json$"]
assert not bare_cache.is_mutable_file("APKINDEX.tar.gz", patterns)
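The edge-case tests above pin down the contract of `is_mutable_file`: `None` or an empty list matches nothing, and each pattern is treated as a regular expression. The implementation is not part of this section, so the following is only a guess that satisfies those assertions, assuming `re.search` semantics.

```python
import re
from typing import Optional


def is_mutable_file(path: str, patterns: Optional[list[str]]) -> bool:
    # Assumed implementation: any pattern found anywhere in the path counts.
    # None and [] both fall through to False.
    return any(re.search(p, path) for p in (patterns or []))


custom = [r"metadata\.json$", r"index\.yaml$"]

checks = {
    "custom_match": is_mutable_file("repo/metadata.json", custom),
    "custom_nested": is_mutable_file("repo/subdir/index.yaml", custom),
    "custom_miss": is_mutable_file("repo/data.tar.gz", custom),
    "empty_list": is_mutable_file("APKINDEX.tar.gz", []),
    "none": is_mutable_file("repomd.xml", None),
}
print(checks)
```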
# ---------------------------------------------------------------------------
# get_index_cache_key
# ---------------------------------------------------------------------------
class TestGetIndexCacheKey:
def test_key_format_is_deterministic(self, bare_cache):
# Assert against a pre-computed value to pin the hash algorithm,
# truncation length, and format string in one assertion.
path = "alpine/v3.18/x86_64/APKINDEX.tar.gz"
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
key = bare_cache.get_index_cache_key("alpine-test", path)
assert key == f"index:alpine-test:{expected_hash}"
def test_different_paths_produce_different_keys(self, bare_cache):
k1 = bare_cache.get_index_cache_key("alpine-test", "alpine/v3.18/x86_64/APKINDEX.tar.gz")
k2 = bare_cache.get_index_cache_key("alpine-test", "alpine/v3.19/x86_64/APKINDEX.tar.gz")
assert k1 != k2
def test_different_remotes_produce_different_keys(self, bare_cache):
k1 = bare_cache.get_index_cache_key("remote-a", "path/to/APKINDEX.tar.gz")
k2 = bare_cache.get_index_cache_key("remote-b", "path/to/APKINDEX.tar.gz")
assert k1 != k2
def test_key_starts_with_index_prefix_and_remote(self, bare_cache):
key = bare_cache.get_index_cache_key("myremote", "some/path")
assert key.startswith("index:myremote:")
def test_key_hash_segment_is_16_chars(self, bare_cache):
key = bare_cache.get_index_cache_key("myremote", "some/path/file.xml")
# Format: index:<remote>:<16-char hash> — the fixed length matters for key-space hygiene
parts = key.split(":")
assert len(parts) == 3
assert len(parts[2]) == 16
# ---------------------------------------------------------------------------
# mark_index_cached / is_index_valid
# ---------------------------------------------------------------------------
class TestIndexValidity:
def test_mark_index_cached_calls_setex_with_correct_ttl(self, cache_with_redis, mock_redis_client):
cache_with_redis.mark_index_cached("remote", "path/APKINDEX.tar.gz", 300)
expected_key = cache_with_redis.get_index_cache_key("remote", "path/APKINDEX.tar.gz")
mock_redis_client.setex.assert_called_once_with(expected_key, 300, ANY)
def test_present_key_is_valid(self, cache_with_redis, mock_redis_client):
mock_redis_client.exists.return_value = 1
assert cache_with_redis.is_index_valid("remote", "path/APKINDEX.tar.gz")
def test_missing_key_is_not_valid(self, cache_with_redis, mock_redis_client):
mock_redis_client.exists.return_value = 0
assert not cache_with_redis.is_index_valid("remote", "path/APKINDEX.tar.gz")
def test_unavailable_redis_is_not_valid(self, unavailable_cache):
assert not unavailable_cache.is_index_valid("remote", "some/path")
def test_mark_cached_no_op_when_unavailable(self, unavailable_cache):
# client is None when Redis is unavailable — setex cannot be called
assert unavailable_cache.client is None
unavailable_cache.mark_index_cached("remote", "some/path", 300) # must not raise
# ---------------------------------------------------------------------------
# mutable meta (ETag / Last-Modified storage)
# ---------------------------------------------------------------------------
class TestMutableMeta:
def test_meta_key_format(self, bare_cache):
path = "repo/metadata.json"
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
assert bare_cache.get_mutable_meta_key("myremote", path) == f"mutable:meta:myremote:{expected_hash}"
def test_meta_key_hash_is_16_chars(self, bare_cache):
key = bare_cache.get_mutable_meta_key("remote", "some/path/file.json")
assert len(key.split(":")[-1]) == 16
def test_store_and_retrieve_etag(self, cache_with_redis, mock_redis_client):
mock_redis_client.hgetall.return_value = {"etag": '"abc123"'}
cache_with_redis.store_mutable_meta("remote", "path/meta.json", '"abc123"', None)
mock_redis_client.hset.assert_called_once()
meta = cache_with_redis.get_mutable_meta("remote", "path/meta.json")
assert meta["etag"] == '"abc123"'
def test_store_and_retrieve_last_modified(self, cache_with_redis, mock_redis_client):
lm = "Mon, 01 Jan 2024 00:00:00 GMT"
mock_redis_client.hgetall.return_value = {"last_modified": lm}
cache_with_redis.store_mutable_meta("remote", "path/meta.json", None, lm)
meta = cache_with_redis.get_mutable_meta("remote", "path/meta.json")
assert meta["last_modified"] == lm
def test_store_no_op_when_both_none(self, cache_with_redis, mock_redis_client):
cache_with_redis.store_mutable_meta("remote", "path/meta.json", None, None)
mock_redis_client.hset.assert_not_called()
def test_store_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.store_mutable_meta("remote", "path", "etag", None) # must not raise
def test_get_returns_empty_when_unavailable(self, unavailable_cache):
assert unavailable_cache.get_mutable_meta("remote", "path") == {}
def test_delete_removes_meta_key(self, cache_with_redis, mock_redis_client):
expected_key = cache_with_redis.get_mutable_meta_key("remote", "path/meta.json")
cache_with_redis.delete_mutable_meta("remote", "path/meta.json")
mock_redis_client.delete.assert_called_once_with(expected_key)
def test_delete_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.delete_mutable_meta("remote", "path") # must not raise
# ---------------------------------------------------------------------------
# artifact published date (quarantine support)
# ---------------------------------------------------------------------------
class TestArtifactPublished:
def test_key_format_is_deterministic(self, bare_cache):
path = "some/path/package-1.0.tar.gz"
expected_hash = hashlib.sha256(path.encode()).hexdigest()[:16]
assert bare_cache.get_artifact_published_key("myremote", path) == f"pkg:published:myremote:{expected_hash}"
def test_key_hash_is_16_chars(self, bare_cache):
key = bare_cache.get_artifact_published_key("remote", "path/to/file.whl")
assert len(key.split(":")[-1]) == 16
def test_different_paths_produce_different_keys(self, bare_cache):
k1 = bare_cache.get_artifact_published_key("remote", "pkg-1.0.tar.gz")
k2 = bare_cache.get_artifact_published_key("remote", "pkg-2.0.tar.gz")
assert k1 != k2
def test_store_calls_set_with_correct_value(self, cache_with_redis, mock_redis_client):
lm = "Mon, 01 Jan 2024 00:00:00 GMT"
cache_with_redis.store_artifact_published("remote", "path/pkg.tar.gz", lm)
expected_key = cache_with_redis.get_artifact_published_key("remote", "path/pkg.tar.gz")
mock_redis_client.set.assert_called_once_with(expected_key, lm)
def test_get_returns_stored_value(self, cache_with_redis, mock_redis_client):
lm = "Tue, 15 Mar 2022 12:00:00 GMT"
mock_redis_client.get.return_value = lm
result = cache_with_redis.get_artifact_published("remote", "path/pkg.tar.gz")
assert result == lm
def test_get_returns_none_when_not_stored(self, cache_with_redis, mock_redis_client):
mock_redis_client.get.return_value = None
result = cache_with_redis.get_artifact_published("remote", "path/pkg.tar.gz")
assert result is None
def test_store_no_op_when_unavailable(self, unavailable_cache):
unavailable_cache.store_artifact_published("remote", "path", "Mon, 01 Jan 2024 00:00:00 GMT")
def test_get_returns_none_when_unavailable(self, unavailable_cache):
assert unavailable_cache.get_artifact_published("remote", "path") is None
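The key-format tests in this file all assert the same shape: a fixed prefix, the remote name, and the first 16 hex characters of the SHA-256 of the path. A small sketch of that derivation follows; `make_key` is a hypothetical helper written for this example, not a method of `RedisCache`.

```python
import hashlib


def make_key(prefix: str, remote: str, path: str) -> str:
    # Hypothetical helper mirroring the format the tests assert:
    # <prefix>:<remote>:<first 16 hex chars of sha256(path)>
    digest = hashlib.sha256(path.encode()).hexdigest()[:16]
    return f"{prefix}:{remote}:{digest}"


path = "alpine/v3.18/x86_64/APKINDEX.tar.gz"
index_key = make_key("index", "alpine-test", path)
meta_key = make_key("mutable:meta", "alpine-test", path)
published_key = make_key("pkg:published", "alpine-test", path)
other_key = make_key("index", "alpine-test", "alpine/v3.19/x86_64/APKINDEX.tar.gz")

for key in (index_key, meta_key, published_key):
    print(key)
```

Truncating the digest keeps keys short and of fixed length regardless of how deep the upstream path is, at the cost of a small, usually negligible, collision risk.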
+598
@@ -0,0 +1,598 @@
"""Tests for ConfigManager, focusing on get_mutable_patterns and get_immutable_patterns."""
import os
import pytest
import yaml
from artifactapi.config import ConfigManager
@pytest.fixture
def make_config(tmp_path):
"""Factory: write a remote dict to a temp YAML and return a ConfigManager."""
def _make(remotes_dict):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": remotes_dict}))
return ConfigManager(str(cfg_file))
return _make
# ---------------------------------------------------------------------------
# get_mutable_patterns
# ---------------------------------------------------------------------------
class TestGetMutablePatterns:
def test_alpine_returns_package_defaults(self, make_config):
cfg = make_config({"r": {"package": "alpine", "base_url": "https://x.com"}})
patterns = cfg.get_mutable_patterns("r")
assert r"APKINDEX\.tar\.gz$" in patterns
def test_rpm_returns_package_defaults(self, make_config):
cfg = make_config({"r": {"package": "rpm", "base_url": "https://x.com"}})
patterns = cfg.get_mutable_patterns("r")
assert r"repomd\.xml$" in patterns
assert any("repodata" in p for p in patterns)
def test_docker_returns_package_defaults(self, make_config):
cfg = make_config({"r": {"package": "docker", "base_url": "https://x.com"}})
patterns = cfg.get_mutable_patterns("r")
assert any("manifests" in p for p in patterns)
assert any("tags/list" in p for p in patterns)
def test_generic_returns_empty_list(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
assert cfg.get_mutable_patterns("r") == []
def test_unknown_remote_returns_empty_list(self, make_config):
cfg = make_config({})
assert cfg.get_mutable_patterns("nonexistent") == []
def test_missing_package_field_defaults_to_generic(self, make_config):
cfg = make_config({"r": {"base_url": "https://x.com"}})
assert cfg.get_mutable_patterns("r") == []
def test_unknown_package_type_returns_empty_list(self, make_config):
# An unknown or misspelled package type silently returns []; this is a known footgun
cfg = make_config({"r": {"package": "deb", "base_url": "https://x.com"}})
assert cfg.get_mutable_patterns("r") == []
def test_extra_patterns_appended_after_defaults(self, make_config):
cfg = make_config(
{
"r": {
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [r"custom\.json$"],
}
}
)
patterns = cfg.get_mutable_patterns("r")
assert r"APKINDEX\.tar\.gz$" in patterns
assert r"custom\.json$" in patterns
# Defaults come first
assert patterns.index(r"APKINDEX\.tar\.gz$") < patterns.index(r"custom\.json$")
def test_explicit_empty_extra_patterns_returns_defaults(self, make_config):
cfg = make_config(
{
"r": {
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [],
}
}
)
assert r"APKINDEX\.tar\.gz$" in cfg.get_mutable_patterns("r")
def test_duplicate_extra_pattern_not_added_twice(self, make_config):
existing = r"APKINDEX\.tar\.gz$"
cfg = make_config(
{
"r": {
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [existing],
}
}
)
patterns = cfg.get_mutable_patterns("r")
assert patterns.count(existing) == 1
def test_generic_with_only_extra_patterns(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"mutable_patterns": [r"meta\.json$", r"index\.yaml$"],
}
}
)
assert cfg.get_mutable_patterns("r") == [r"meta\.json$", r"index\.yaml$"]
def test_rpm_extra_patterns_merged(self, make_config):
cfg = make_config(
{
"r": {
"package": "rpm",
"base_url": "https://x.com",
"mutable_patterns": [r"custom-meta\.xml$"],
}
}
)
patterns = cfg.get_mutable_patterns("r")
assert r"repomd\.xml$" in patterns
assert r"custom-meta\.xml$" in patterns
def test_npm_has_no_package_defaults(self, make_config):
cfg = make_config({"r": {"package": "npm", "base_url": "https://x.com"}})
assert cfg.get_mutable_patterns("r") == []
def test_npm_explicit_mutable_pattern_matches_metadata(self, make_config):
import re
cfg = make_config(
{
"r": {
"package": "npm",
"base_url": "https://x.com",
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
}
}
)
patterns = cfg.get_mutable_patterns("r")
assert any(re.search(p, "express") for p in patterns)
assert any(re.search(p, "@babel/core") for p in patterns)
def test_helm_returns_index_yaml_as_mutable(self, make_config):
cfg = make_config({"r": {"package": "helm", "base_url": "https://helm.example.com"}})
patterns = cfg.get_mutable_patterns("r")
assert r"index\.yaml$" in patterns
def test_helm_chart_tarballs_not_mutable_by_default(self, make_config):
import re
cfg = make_config({"r": {"package": "helm", "base_url": "https://helm.example.com"}})
patterns = cfg.get_mutable_patterns("r")
# Only index.yaml is mutable; .tgz chart tarballs are not
assert not any(re.search(p, "vault-0.29.1.tgz") for p in patterns)
assert not any(re.search(p, "consul-1.5.0.tgz") for p in patterns)
def test_npm_explicit_mutable_pattern_excludes_tarballs(self, make_config):
import re
cfg = make_config(
{
"r": {
"package": "npm",
"base_url": "https://x.com",
"mutable_patterns": [r"^(?!.*\.tgz$).*"],
}
}
)
patterns = cfg.get_mutable_patterns("r")
assert not any(re.search(p, "express-4.18.2.tgz") for p in patterns)
assert not any(re.search(p, "express/-/express-4.18.2.tgz") for p in patterns)
# ---------------------------------------------------------------------------
# get_immutable_patterns
# ---------------------------------------------------------------------------
class TestGetImmutablePatterns:
def test_returns_immutable_patterns(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
}
}
)
assert cfg.get_immutable_patterns("r") == [r".*\.tar\.gz$"]
def test_returns_empty_for_missing_remote(self, make_config):
cfg = make_config({})
assert cfg.get_immutable_patterns("nonexistent") == []
def test_returns_empty_when_no_patterns_configured(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
assert cfg.get_immutable_patterns("r") == []
def test_multiple_patterns_returned(self, make_config):
patterns = [r".*\.rpm$", r".*/repodata/.*$"]
cfg = make_config(
{
"r": {
"package": "rpm",
"base_url": "https://x.com",
"immutable_patterns": patterns,
}
}
)
assert cfg.get_immutable_patterns("r") == patterns
def test_dict_keyed_repositories_returns_per_repo_patterns(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
"repositories": {
"/path/to/repo": {"immutable_patterns": [r".*\.rpm$"]},
},
}
}
)
assert cfg.get_immutable_patterns("r", "/path/to/repo") == [r".*\.rpm$"]
def test_dict_keyed_repositories_falls_back_to_remote_patterns(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"immutable_patterns": [r".*\.tar\.gz$"],
"repositories": {
"/path/to/repo": {"immutable_patterns": [r".*\.rpm$"]},
},
}
}
)
assert cfg.get_immutable_patterns("r", "/unknown/path") == [r".*\.tar\.gz$"]
# ---------------------------------------------------------------------------
# get_user_mutable_patterns
# ---------------------------------------------------------------------------
class TestGetUserMutablePatterns:
def test_returns_only_user_patterns(self, make_config):
cfg = make_config(
{
"r": {
"package": "alpine",
"base_url": "https://x.com",
"mutable_patterns": [r"custom\.json$"],
}
}
)
assert cfg.get_user_mutable_patterns("r") == [r"custom\.json$"]
def test_excludes_package_defaults(self, make_config):
# Package defaults (APKINDEX etc.) must NOT appear here
cfg = make_config({"r": {"package": "alpine", "base_url": "https://x.com"}})
assert cfg.get_user_mutable_patterns("r") == []
def test_returns_empty_for_missing_remote(self, make_config):
cfg = make_config({})
assert cfg.get_user_mutable_patterns("nonexistent") == []
def test_returns_empty_when_key_absent(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
assert cfg.get_user_mutable_patterns("r") == []
# ---------------------------------------------------------------------------
# get_cache_config
# ---------------------------------------------------------------------------
class TestGetCacheConfig:
def test_returns_cache_section(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"cache": {"immutable_ttl": 0, "mutable_ttl": 7200},
}
}
)
assert cfg.get_cache_config("r") == {"immutable_ttl": 0, "mutable_ttl": 7200}
def test_returns_empty_dict_for_missing_remote(self, make_config):
cfg = make_config({})
assert cfg.get_cache_config("nonexistent") == {}
def test_returns_empty_dict_when_no_cache_key(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
assert cfg.get_cache_config("r") == {}
# ---------------------------------------------------------------------------
# Config file reload
# ---------------------------------------------------------------------------
class TestConfigReload:
def test_reloads_when_file_mtime_advances(self, tmp_path):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(cfg_file))
assert "repo-a" in cfg.config["remotes"]
cfg_file.write_text(yaml.dump({"remote": {"repo-b": {"package": "generic", "base_url": "https://y.com"}}}))
future_mtime = cfg._last_modified + 1
os.utime(str(cfg_file), (future_mtime, future_mtime))
cfg._check_reload()
assert "repo-b" in cfg.config["remotes"]
assert "repo-a" not in cfg.config["remotes"]
def test_no_reload_when_file_unchanged(self, tmp_path):
cfg_file = tmp_path / "remotes.yaml"
cfg_file.write_text(yaml.dump({"remote": {"repo-a": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(cfg_file))
# Call _check_reload() without touching the file; the loaded config should be left as-is
cfg._check_reload()
assert "repo-a" in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# get_quarantine_config
# ---------------------------------------------------------------------------
class TestGetQuarantineConfig:
def test_returns_false_zero_when_not_configured(self, make_config):
cfg = make_config({"r": {"package": "generic", "base_url": "https://x.com"}})
enabled, days = cfg.get_quarantine_config("r")
assert enabled is False
assert days == 0
def test_returns_false_zero_for_missing_remote(self, make_config):
cfg = make_config({})
enabled, days = cfg.get_quarantine_config("nonexistent")
assert enabled is False
assert days == 0
def test_enabled_true_and_days_returned(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
"quarantine_days": 7,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is True
assert days == 7
def test_quarantine_new_false_returns_disabled(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": False,
"quarantine_days": 7,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is False
assert days == 7
def test_enabled_with_zero_days_returns_zero(self, make_config):
cfg = make_config(
{
"r": {
"package": "generic",
"base_url": "https://x.com",
"quarantine_new": True,
"quarantine_days": 0,
}
}
)
enabled, days = cfg.get_quarantine_config("r")
assert enabled is True
assert days == 0
# ---------------------------------------------------------------------------
# Directory mode (CONFIG_PATH points to a directory)
# ---------------------------------------------------------------------------
def _remote(base_url: str = "https://x.com") -> dict:
    return {"package": "generic", "base_url": base_url}
class TestConfigDirMode:
def test_loads_all_yaml_files(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remote": {"repo-b": _remote("https://y.com")}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" in cfg.config["remotes"]
def test_later_file_overrides_earlier_on_same_key(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://first.com")}}))
(tmp_path / "b.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://second.com")}}))
cfg = ConfigManager(str(tmp_path))
assert cfg.config["remotes"]["r"]["base_url"] == "https://second.com"
def test_empty_directory_returns_empty_remotes(self, tmp_path):
cfg = ConfigManager(str(tmp_path))
assert cfg.config == {"remotes": {}}
def test_ignores_non_yaml_files(self, tmp_path):
(tmp_path / "notes.txt").write_text("not yaml")
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert list(cfg.config["remotes"].keys()) == ["repo-a"]
def test_reload_picks_up_new_file(self, tmp_path):
(tmp_path / "a.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
cfg = ConfigManager(str(tmp_path))
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" not in cfg.config["remotes"]
new_file = tmp_path / "b.yaml"
new_file.write_text(yaml.dump({"remote": {"repo-b": _remote("https://y.com")}}))
future_mtime = cfg._last_modified + 1
os.utime(str(new_file), (future_mtime, future_mtime))
cfg._check_reload()
assert "repo-a" in cfg.config["remotes"]
assert "repo-b" in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# config_dir key (main file contains a config_dir pointer)
# ---------------------------------------------------------------------------
class TestConfigDirKey:
def test_merges_remotes_from_config_dir(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "remotes.yaml").write_text(yaml.dump({"remote": {"repo-extra": _remote("https://extra.com")}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"repo-main": _remote()}}))
cfg = ConfigManager(str(main))
assert "repo-main" in cfg.config["remotes"]
assert "repo-extra" in cfg.config["remotes"]
def test_relative_config_dir_resolved_from_main_file(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "r.yaml").write_text(yaml.dump({"remote": {"repo-a": _remote()}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": "conf.d"}))
cfg = ConfigManager(str(main))
assert "repo-a" in cfg.config["remotes"]
def test_config_dir_key_not_present_in_loaded_config(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {}}))
cfg = ConfigManager(str(main))
assert "config_dir" not in cfg.config
def test_dir_remote_overrides_main_file_remote(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
(conf_d / "override.yaml").write_text(yaml.dump({"remote": {"r": _remote("https://new.com")}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"r": _remote("https://old.com")}}))
cfg = ConfigManager(str(main))
assert cfg.config["remotes"]["r"]["base_url"] == "https://new.com"
def test_empty_config_dir_uses_main_file_only(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {"repo-main": _remote()}}))
cfg = ConfigManager(str(main))
assert list(cfg.config["remotes"].keys()) == ["repo-main"]
def test_reload_picks_up_changed_dir_file(self, tmp_path):
conf_d = tmp_path / "conf.d"
conf_d.mkdir()
dir_file = conf_d / "r.yaml"
dir_file.write_text(yaml.dump({"remote": {"repo-v1": _remote()}}))
main = tmp_path / "config.yaml"
main.write_text(yaml.dump({"config_dir": str(conf_d), "remote": {}}))
cfg = ConfigManager(str(main))
assert "repo-v1" in cfg.config["remotes"]
dir_file.write_text(yaml.dump({"remote": {"repo-v2": _remote("https://v2.com")}}))
future_mtime = cfg._last_modified + 1
os.utime(str(dir_file), (future_mtime, future_mtime))
cfg._check_reload()
assert "repo-v2" in cfg.config["remotes"]
assert "repo-v1" not in cfg.config["remotes"]
# ---------------------------------------------------------------------------
# YAML format normalisation — top-level type keys
# ---------------------------------------------------------------------------
class TestYamlTypeKeys:
def test_remote_key_injects_type_remote(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"remote": {"my-remote": {"package": "generic", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-remote"]["type"] == "remote"
def test_virtual_key_injects_type_virtual(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"virtual": {"my-virtual": {"package": "helm", "members": ["a", "b"]}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-virtual"]["type"] == "virtual"
assert cfg.config["remotes"]["my-virtual"]["members"] == ["a", "b"]
def test_local_key_injects_type_local(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"local": {"my-local": {"package": "generic"}}}))
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["my-local"]["type"] == "local"
def test_mixed_file_all_three_types(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(
yaml.dump(
{
"remote": {"r": {"package": "helm", "base_url": "https://helm.example.com"}},
"virtual": {"v": {"package": "helm", "members": ["r"]}},
"local": {"l": {"package": "generic"}},
}
)
)
cfg = ConfigManager(str(f))
assert cfg.config["remotes"]["r"]["type"] == "remote"
assert cfg.config["remotes"]["v"]["type"] == "virtual"
assert cfg.config["remotes"]["l"]["type"] == "local"
def test_type_field_not_required_in_yaml(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(yaml.dump({"remote": {"r": {"package": "alpine", "base_url": "https://x.com"}}}))
cfg = ConfigManager(str(f))
raw = cfg.config["remotes"]["r"]
# type is injected by the loader; the original dict had no type key
assert "type" in raw
assert raw["type"] == "remote"
def test_other_fields_preserved_after_normalisation(self, tmp_path):
f = tmp_path / "r.yaml"
f.write_text(
yaml.dump(
{
"remote": {
"r": {
"package": "helm",
"base_url": "https://helm.example.com",
"immutable_patterns": [r"\.tgz$"],
"cache": {"immutable_ttl": 0, "mutable_ttl": 1800},
}
}
}
)
)
cfg = ConfigManager(str(f))
remote = cfg.config["remotes"]["r"]
assert remote["package"] == "helm"
assert remote["base_url"] == "https://helm.example.com"
assert remote["cache"] == {"immutable_ttl": 0, "mutable_ttl": 1800}
assert r"\.tgz$" in remote["immutable_patterns"]
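The TestYamlTypeKeys cases above pin down the normalisation described in the commit message: each top-level `remote:`, `virtual:`, or `local:` section is folded into the internal `remotes` map, with `type` injected into every entry. A minimal sketch of that step follows, assuming nothing about the real loader; `normalise_type_keys` and `TYPE_KEYS` are hypothetical names used only for illustration, and the sketch ignores `config_dir` merging and reload handling.

```python
from typing import Any

# Hypothetical names: this is NOT the project's loader, only an illustration
# of the behaviour the tests assert.
TYPE_KEYS = ("remote", "virtual", "local")


def normalise_type_keys(raw: dict[str, Any]) -> dict[str, Any]:
    """Fold top-level remote/virtual/local sections into one 'remotes' map."""
    remotes: dict[str, dict[str, Any]] = {}
    for type_key in TYPE_KEYS:
        # A missing or empty section contributes nothing.
        for name, entry in (raw.get(type_key) or {}).items():
            # Copy the entry so the caller's dict is not mutated, then inject
            # the type that the section name implies.
            remotes[name] = {**entry, "type": type_key}
    return {"remotes": remotes}
```

All other fields (`package`, `base_url`, `cache`, pattern lists) pass through untouched, which is what `test_other_fields_preserved_after_normalisation` checks.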
@@ -0,0 +1,273 @@
"""Tests for docker_auth: WWW-Authenticate parsing and token caching."""
import time
from unittest.mock import AsyncMock, MagicMock, patch
import httpx
import pytest
from artifactapi import docker_auth
from artifactapi.docker_auth import (
_cache_key,
_get_cached_token,
_store_token,
fetch_token,
get_docker_token_for_response,
parse_www_authenticate,
)
@pytest.fixture(autouse=True)
def clear_token_cache():
"""Isolate tests: wipe the module-level token cache before and after each test."""
docker_auth._token_cache.clear()
yield
docker_auth._token_cache.clear()
# ---------------------------------------------------------------------------
# parse_www_authenticate
# ---------------------------------------------------------------------------
class TestParseWwwAuthenticate:
def test_full_bearer_header(self):
header = 'Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/nginx:pull"'
result = parse_www_authenticate(header)
assert result is not None
realm, service, scope = result
assert realm == "https://auth.docker.io/token"
assert service == "registry.docker.io"
assert scope == "repository:library/nginx:pull"
def test_realm_only(self):
header = 'Bearer realm="https://auth.example.com/token"'
result = parse_www_authenticate(header)
assert result is not None
realm, service, scope = result
assert realm == "https://auth.example.com/token"
assert service == ""
assert scope == ""
def test_realm_and_service_only(self):
header = 'Bearer realm="https://auth.example.com",service="registry.example.com"'
result = parse_www_authenticate(header)
assert result is not None
_, service, scope = result
assert service == "registry.example.com"
assert scope == ""
def test_invalid_scheme_returns_none(self):
assert parse_www_authenticate('Basic realm="example"') is None
def test_empty_header_returns_none(self):
assert parse_www_authenticate("") is None
def test_case_insensitive_bearer_parses_realm(self):
header = 'bearer realm="https://auth.example.com/token"'
result = parse_www_authenticate(header)
assert result is not None
realm, _, _ = result
assert realm == "https://auth.example.com/token"
def test_field_order_scope_before_service_drops_service(self):
# The regex requires realm,service,scope order; scope before service
# results in service being silently dropped. This test documents the known limitation.
header = 'Bearer realm="https://auth.example.com",scope="repo:pull",service="svc"'
result = parse_www_authenticate(header)
assert result is not None
realm, service, scope = result
assert realm == "https://auth.example.com"
assert scope == "repo:pull"
assert service == "" # silently dropped when out of order
# ---------------------------------------------------------------------------
# _cache_key
# ---------------------------------------------------------------------------
class TestCacheKey:
def test_key_contains_all_components(self):
key = _cache_key("https://realm.com", "svc", "scope", "user")
assert "https://realm.com" in key
assert "svc" in key
assert "scope" in key
assert "user" in key
def test_none_username_uses_empty_string(self):
key = _cache_key("https://realm.com", "svc", "scope", None)
assert key.endswith("|")
def test_different_services_give_different_keys(self):
k1 = _cache_key("realm", "svc1", "scope", None)
k2 = _cache_key("realm", "svc2", "scope", None)
assert k1 != k2
def test_different_scopes_give_different_keys(self):
k1 = _cache_key("realm", "svc", "scope:read", None)
k2 = _cache_key("realm", "svc", "scope:write", None)
assert k1 != k2
def test_pipe_in_field_value_can_collide_with_adjacent_fields(self):
# The "|" separator is not escaped, so a pipe embedded in one field is
# indistinguishable from a field boundary: _cache_key("a|b", "c", "d", None)
# equals _cache_key("a", "b|c", "d", None). This is a known limitation; the
# test documents the behaviour rather than endorsing it.
k1 = _cache_key("a|b", "c", "d", None)
k2 = _cache_key("a", "b|c", "d", None)
assert k1 == k2
# ---------------------------------------------------------------------------
# _get_cached_token / _store_token
# ---------------------------------------------------------------------------
class TestTokenCaching:
def test_get_returns_none_when_not_cached(self):
assert _get_cached_token("no-such-key") is None
def test_get_returns_token_when_valid(self):
_store_token("mykey", "tok-abc", 300)
assert _get_cached_token("mykey") == "tok-abc"
def test_get_returns_none_when_expired(self):
docker_auth._token_cache["mykey"] = ("old-token", time.time() - 1)
assert _get_cached_token("mykey") is None
def test_expired_entry_is_removed_from_cache(self):
docker_auth._token_cache["mykey"] = ("old-token", time.time() - 1)
_get_cached_token("mykey")
assert "mykey" not in docker_auth._token_cache
def test_store_expires_30s_before_stated_time(self):
before = time.time()
_store_token("mykey", "tok", 100)
_, expires_at = docker_auth._token_cache["mykey"]
# expires_in - 30 = 70; allow ±2 s clock wiggle
assert before + 68 <= expires_at <= before + 72
def test_store_enforces_minimum_10s_expiry(self):
before = time.time()
_store_token("mykey", "tok", 5) # expires_in - 30 would be negative
_, expires_at = docker_auth._token_cache["mykey"]
assert expires_at >= before + 10
# ---------------------------------------------------------------------------
# fetch_token (async, mocks httpx)
# ---------------------------------------------------------------------------
def _make_mock_http_client(token_payload: dict):
    mock_response = MagicMock()
    mock_response.raise_for_status = MagicMock()
    mock_response.json.return_value = token_payload
    mock_client = AsyncMock()
    mock_client.get = AsyncMock(return_value=mock_response)
    ctx = MagicMock()
    ctx.__aenter__ = AsyncMock(return_value=mock_client)
    ctx.__aexit__ = AsyncMock(return_value=False)
    return ctx, mock_client
class TestFetchToken:
async def test_returns_token_field(self):
ctx, _ = _make_mock_http_client({"token": "bearer-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token == "bearer-tok"
async def test_falls_back_to_access_token_field(self):
ctx, _ = _make_mock_http_client({"access_token": "access-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token == "access-tok"
async def test_returns_none_when_response_missing_token_field(self):
ctx, _ = _make_mock_http_client({"not_token": "value", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token is None
async def test_defaults_expires_in_to_300_when_missing(self):
ctx, _ = _make_mock_http_client({"token": "tok"}) # no expires_in key
before = time.time()
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token == "tok"
key = _cache_key("https://auth.example.com", "svc", "scope", None)
_, expires_at = docker_auth._token_cache[key]
# Default expires_in=300, so the entry expires at time.time() + max(300 - 30, 10), i.e. 270 s from now
assert before + 268 <= expires_at <= before + 272
async def test_uses_cache_on_second_call_without_http(self):
ctx, mock_client = _make_mock_http_client({"token": "cached-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
await fetch_token("https://auth.example.com", "svc", "scope")
mock_client.get.reset_mock()
token = await fetch_token("https://auth.example.com", "svc", "scope")
mock_client.get.assert_not_called()
assert token == "cached-tok"
async def test_returns_none_on_network_error(self):
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=Exception("connection refused"))
ctx = MagicMock()
ctx.__aenter__ = AsyncMock(return_value=mock_client)
ctx.__aexit__ = AsyncMock(return_value=False)
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token is None
async def test_returns_none_on_http_status_error(self):
mock_response = MagicMock()
mock_response.raise_for_status.side_effect = httpx.HTTPStatusError("401 Unauthorized", request=MagicMock(), response=MagicMock())
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=mock_response)
ctx = MagicMock()
ctx.__aenter__ = AsyncMock(return_value=mock_client)
ctx.__aexit__ = AsyncMock(return_value=False)
with patch("httpx.AsyncClient", return_value=ctx):
token = await fetch_token("https://auth.example.com", "svc", "scope")
assert token is None
async def test_passes_credentials_as_auth_tuple(self):
ctx, mock_client = _make_mock_http_client({"token": "authed-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
await fetch_token("https://auth.example.com", "svc", "scope", "user", "pass")
call_kwargs = mock_client.get.call_args.kwargs
assert call_kwargs.get("auth") == ("user", "pass")
async def test_no_auth_when_no_credentials(self):
ctx, mock_client = _make_mock_http_client({"token": "anon-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
await fetch_token("https://auth.example.com", "svc", "scope")
call_kwargs = mock_client.get.call_args.kwargs
assert call_kwargs.get("auth") is None
# ---------------------------------------------------------------------------
# get_docker_token_for_response
# ---------------------------------------------------------------------------
class TestGetDockerTokenForResponse:
async def test_returns_none_for_non_bearer_header(self):
token = await get_docker_token_for_response('Basic realm="example"')
assert token is None
async def test_end_to_end_parse_and_fetch(self):
"""parse_www_authenticate → fetch_token wired together end-to-end."""
header = 'Bearer realm="https://auth.example.com",service="svc",scope="repo:pull"'
ctx, mock_client = _make_mock_http_client({"token": "e2e-tok", "expires_in": 300})
with patch("httpx.AsyncClient", return_value=ctx):
token = await get_docker_token_for_response(header, "user", "pass")
assert token == "e2e-tok"
call_kwargs = mock_client.get.call_args.kwargs
assert call_kwargs["params"]["service"] == "svc"
assert call_kwargs["params"]["scope"] == "repo:pull"
assert call_kwargs["auth"] == ("user", "pass")
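`test_field_order_scope_before_service_drops_service` above documents that the current parser depends on `realm`, `service`, `scope` appearing in that order. One possible way to lift that limitation is to collect every `key="value"` pair first and look the fields up by name. The sketch below is an assumption, not the module's implementation: `parse_bearer_challenge` and `_PARAM_RE` are hypothetical names, and it only handles quoted values without escaped quotes.

```python
import re
from typing import Optional

# Matches key="value" pairs; values containing escaped quotes are not handled.
_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header: str) -> Optional[tuple[str, str, str]]:
    """Return (realm, service, scope) from a Bearer challenge, in any field order."""
    scheme, _, rest = header.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None  # Basic, empty, or any other scheme
    # Parameter names are treated case-insensitively.
    params = {key.lower(): value for key, value in _PARAM_RE.findall(rest)}
    realm = params.get("realm")
    if not realm:
        return None  # a token endpoint is required to fetch anything
    # Missing service/scope default to empty strings, as the tests above expect.
    return realm, params.get("service", ""), params.get("scope", "")
```

Adopting something like this would change the outcome of the out-of-order test, which currently asserts that `service` comes back empty.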
File diff suppressed because it is too large
@@ -0,0 +1,132 @@
"""Tests for S3Storage: get_object_key (pure logic) and I/O methods."""
import hashlib
from unittest.mock import MagicMock, patch
import pytest
from botocore.exceptions import ClientError
from fastapi import HTTPException
from artifactapi.storage import S3Storage
@pytest.fixture
def storage():
"""S3Storage with a mocked boto3 client."""
with patch("boto3.client", return_value=MagicMock()):
s = S3Storage(
endpoint="localhost:9000",
access_key="testkey",
secret_key="testsecret",
bucket="testbucket",
secure=False,
)
s.client = MagicMock()
return s
# ---------------------------------------------------------------------------
# get_object_key
# ---------------------------------------------------------------------------
class TestGetObjectKey:
def test_key_has_three_part_structure(self, storage):
# remote / hash-segment / filename
key = storage.get_object_key("myremote", "some/path/to/file.rpm")
parts = key.split("/")
assert len(parts) == 3
assert parts[0] == "myremote"
assert parts[2] == "file.rpm"
assert len(parts[1]) == 16 # SHA-256 hex truncated to 16 chars
def test_key_uses_sha256_of_directory_path(self, storage):
# Pin the hash algorithm, truncation length, and format in one assertion
key = storage.get_object_key("myremote", "some/path/to/file.rpm")
expected_hash = hashlib.sha256(b"some/path/to").hexdigest()[:16]
assert key == f"myremote/{expected_hash}/file.rpm"
def test_different_remotes_give_different_keys(self, storage):
k1 = storage.get_object_key("remote-a", "path/to/file.rpm")
k2 = storage.get_object_key("remote-b", "path/to/file.rpm")
assert k1 != k2
def test_different_directories_give_different_keys(self, storage):
k1 = storage.get_object_key("myremote", "path/version-1/file.rpm")
k2 = storage.get_object_key("myremote", "path/version-2/file.rpm")
assert k1 != k2
assert k1.split("/")[-1] == k2.split("/")[-1] == "file.rpm"
def test_leading_slash_stripped(self, storage):
k1 = storage.get_object_key("myremote", "/path/to/file.rpm")
k2 = storage.get_object_key("myremote", "path/to/file.rpm")
assert k1 == k2
def test_file_with_no_directory(self, storage):
key = storage.get_object_key("myremote", "file.rpm")
assert key == "myremote/file.rpm"
def test_docker_blob_uses_digest_path(self, storage):
digest = "a" * 64 # realistic 64-char SHA-256 hex string
path = f"library/nginx/blobs/sha256:{digest}"
key = storage.get_object_key("dockerhub", path)
assert key == f"dockerhub/blobs/sha256/{digest}"
def test_docker_blob_deduplication_across_images(self, storage):
"""Same blob digest pulled from different images maps to the same S3 key."""
digest = "deadbeef" * 8 # 64-char hex
k1 = storage.get_object_key("dockerhub", f"library/nginx/blobs/sha256:{digest}")
k2 = storage.get_object_key("dockerhub", f"library/ubuntu/blobs/sha256:{digest}")
assert k1 == k2
def test_docker_blob_different_digests_different_keys(self, storage):
k1 = storage.get_object_key("dockerhub", "library/nginx/blobs/sha256:" + "a" * 64)
k2 = storage.get_object_key("dockerhub", "library/nginx/blobs/sha256:" + "b" * 64)
assert k1 != k2
def test_docker_blob_different_remotes_different_keys(self, storage):
digest = "abc" * 21 + "d" # 64-char hex
k1 = storage.get_object_key("remote-a", f"library/nginx/blobs/sha256:{digest}")
k2 = storage.get_object_key("remote-b", f"library/nginx/blobs/sha256:{digest}")
assert k1 != k2
# ---------------------------------------------------------------------------
# get_url
# ---------------------------------------------------------------------------
class TestGetUrl:
def test_returns_http_url_for_insecure_endpoint(self, storage):
url = storage.get_url("myremote/abc123/file.rpm")
assert url == "http://localhost:9000/testbucket/myremote/abc123/file.rpm"
def test_returns_http_url_for_secure_storage(self):
with patch("boto3.client", return_value=MagicMock()):
s = S3Storage(endpoint="s3.example.com", access_key="k", secret_key="s", bucket="b", secure=True)
s.client = MagicMock()
# get_url always returns an http:// URL, even when secure=True (direct internal access address, not the S3 protocol)
assert s.get_url("path/to/file.rpm") == "http://s3.example.com/b/path/to/file.rpm"


# ---------------------------------------------------------------------------
# upload / download_object
# ---------------------------------------------------------------------------


class TestUpload:
    def test_upload_returns_s3_uri(self, storage):
        storage.client.put_object.return_value = {}
        result = storage.upload("myremote/abc123/file.rpm", b"content")
        assert result == "s3://testbucket/myremote/abc123/file.rpm"


class TestDownloadObject:
    def test_download_object_raises_404_on_client_error(self, storage):
        storage.client.get_object.side_effect = ClientError(
            {"Error": {"Code": "NoSuchKey", "Message": "The specified key does not exist"}},
            "GetObject",
        )
        with pytest.raises(HTTPException) as exc_info:
            storage.download_object("nonexistent/key")
        assert exc_info.value.status_code == 404
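
The Docker blob tests above pin down one behaviour: a path ending in `blobs/sha256:<digest>` maps to a key under the remote that omits the image name, so identical layers are stored once per remote. A minimal sketch of such a mapping follows. `sketch_object_key` and `_BLOB_RE` are hypothetical names used only for illustration, not the project's `S3Storage.get_object_key`, and the non-blob fallback is a placeholder assumption rather than the real key scheme.

```python
import re

# Hypothetical pattern: a path that ends in "blobs/sha256:<64 hex chars>".
_BLOB_RE = re.compile(r"(?:^|/)blobs/sha256:([0-9a-f]{64})$")


def sketch_object_key(remote: str, path: str) -> str:
    """Illustrative key derivation; not the project's implementation."""
    match = _BLOB_RE.search(path)
    if match:
        # Drop the image name so every image sharing a layer maps to one key.
        return f"{remote}/blobs/sha256/{match.group(1)}"
    # Placeholder fallback (assumption): the real method may treat
    # directory components differently.
    return f"{remote}/{path}"
```

Keeping the remote name as the key prefix is what makes the same digest on two remotes produce two distinct keys, as the last blob test expects.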
+596
@@ -0,0 +1,596 @@
"""Unit tests for the virtual repository handler (artifact/virtual.py)."""

from datetime import UTC, date, datetime
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
import yaml

from artifactapi.artifact.virtual import (
    _HANDLERS,
    _get_member_index,
    _HelmDumper,
    _HelmHandler,
    _merge_helm_indexes,
    _VirtualHandler,
)

# ---------------------------------------------------------------------------
# Shared sample data
# ---------------------------------------------------------------------------

_INDEX_A = b"""\
apiVersion: v1
entries:
  vault:
    - name: vault
      version: "0.27.0"
      urls:
        - https://helm.releases.hashicorp.com/vault-0.27.0.tgz
  consul:
    - name: consul
      version: "1.2.0"
      urls:
        - https://helm.releases.hashicorp.com/consul-1.2.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""

_INDEX_B = b"""\
apiVersion: v1
entries:
  nginx:
    - name: nginx
      version: "15.0.0"
      urls:
        - https://charts.example.com/nginx-15.0.0.tgz
  vault:
    - name: vault
      version: "0.27.0"
      urls:
        - https://charts.example.com/vault-0.27.0.tgz
    - name: vault
      version: "0.26.0"
      urls:
        - https://charts.example.com/vault-0.26.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""

_INDEX_SIMPLE = b"""\
apiVersion: v1
entries:
  mychart:
    - name: mychart
      version: "1.0.0"
      urls:
        - https://helm.releases.hashicorp.com/mychart-1.0.0.tgz
generated: "2023-01-01T00:00:00.000Z"
"""

_CFG_A = {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}
_CFG_B = {"base_url": "https://charts.example.com", "cache": {"mutable_ttl": 1800}}


def _identity_resolve(data, *args, **kwargs):
    return data, None


# ---------------------------------------------------------------------------
# _HelmDumper: datetime/date YAML serialization
# ---------------------------------------------------------------------------


class TestHelmDumper:
    def _dump(self, value):
        return yaml.dump({"v": value}, Dumper=_HelmDumper)

    def test_datetime_with_tz_includes_Z_suffix(self):
        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        assert "Z" in self._dump(dt)

    def test_datetime_without_tz_has_no_Z_suffix(self):
        dt = datetime(2023, 6, 15, 12, 0, 0)
        assert "Z" not in self._dump(dt)

    def test_datetime_uses_T_separator_not_space(self):
        dt = datetime(2023, 6, 15, 12, 30, 0, tzinfo=UTC)
        assert "T12:30:00" in self._dump(dt)

    def test_date_serialized_as_iso_string(self):
        assert "2023-01-15" in self._dump(date(2023, 1, 15))

    def test_datetime_round_trips_as_string_not_python_datetime(self):
        dt = datetime(2023, 6, 15, 12, 0, 0, tzinfo=UTC)
        parsed = yaml.safe_load(self._dump(dt))
        # yaml.safe_load must not re-parse this as a datetime object
        assert isinstance(parsed["v"], str)

    def test_date_round_trips_as_string_not_python_date(self):
        parsed = yaml.safe_load(self._dump(date(2023, 1, 15)))
        assert isinstance(parsed["v"], str)


# ---------------------------------------------------------------------------
# _HelmHandler
# ---------------------------------------------------------------------------


class TestHelmHandler:
    def setup_method(self):
        self.handler = _HelmHandler()

    def test_accepts_index_yaml(self):
        assert self.handler.accepts_path("index.yaml") is True

    def test_rejects_tgz_path(self):
        assert self.handler.accepts_path("vault-0.27.0.tgz") is False

    def test_rejects_subdirectory_index(self):
        assert self.handler.accepts_path("charts/index.yaml") is False

    def test_rejects_empty_path(self):
        assert self.handler.accepts_path("") is False

    def test_path_error_is_non_empty_string(self):
        msg = self.handler.path_error()
        assert isinstance(msg, str) and len(msg) > 0

    def test_merge_returns_bytes(self):
        with patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve):
            result = self.handler.merge([_INDEX_A], ["member-a"], [_CFG_A], "http://proxy.example.com")
        assert isinstance(result, bytes)

    def test_merge_delegates_to_merge_helm_indexes(self):
        with patch("artifactapi.artifact.virtual._merge_helm_indexes", return_value=b"merged") as mock_fn:
            result = self.handler.merge([b"data"], ["m"], [{}], "http://proxy")
        mock_fn.assert_called_once_with([b"data"], ["m"], [{}], "http://proxy")
        assert result == b"merged"


# ---------------------------------------------------------------------------
# _HANDLERS registry
# ---------------------------------------------------------------------------


class TestHandlersRegistry:
    def test_helm_handler_is_registered(self):
        assert "helm" in _HANDLERS
        assert isinstance(_HANDLERS["helm"], _HelmHandler)

    def test_helm_handler_satisfies_protocol(self):
        assert isinstance(_HANDLERS["helm"], _VirtualHandler)


# ---------------------------------------------------------------------------
# _merge_helm_indexes
# ---------------------------------------------------------------------------


class TestMergeHelmIndexes:
    def _merge(self, raw_indexes, member_names, member_configs, proxy_base="http://proxy.example.com"):
        with patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve):
            return _merge_helm_indexes(raw_indexes, member_names, member_configs, proxy_base)

    def _parse(self, raw):
        return yaml.safe_load(raw)

    def test_single_member_all_charts_present(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]

    def test_two_members_non_overlapping_charts_all_present(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]
        assert "nginx" in index["entries"]

    def test_first_member_wins_on_duplicate_name_and_version(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        v027 = next(e for e in index["entries"]["vault"] if e["version"] == "0.27.0")
        assert "helm.releases.hashicorp.com" in v027["urls"][0]

    def test_different_versions_of_same_chart_both_included(self):
        index = self._parse(self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B]))
        versions = {e["version"] for e in index["entries"]["vault"]}
        assert "0.27.0" in versions
        assert "0.26.0" in versions

    def test_malformed_yaml_from_member_is_skipped(self):
        index = self._parse(self._merge([_INDEX_A, b"{bad yaml"], ["member-a", "bad"], [_CFG_A, _CFG_B]))
        assert "vault" in index["entries"]
        assert "consul" in index["entries"]

    def test_output_has_apiVersion_v1(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert index["apiVersion"] == "v1"

    def test_output_has_generated_field(self):
        index = self._parse(self._merge([_INDEX_A], ["member-a"], [_CFG_A]))
        assert "generated" in index

    def test_output_is_valid_yaml(self):
        raw = self._merge([_INDEX_A, _INDEX_B], ["member-a", "member-b"], [_CFG_A, _CFG_B])
        assert isinstance(yaml.safe_load(raw), dict)

    def test_empty_index_from_member_produces_no_entries(self):
        empty = b"apiVersion: v1\nentries: {}\ngenerated: '2023-01-01T00:00:00.000Z'\n"
        index = self._parse(self._merge([empty], ["member-a"], [_CFG_A]))
        assert index["entries"] == {}


# ---------------------------------------------------------------------------
# _get_member_index (async)
# ---------------------------------------------------------------------------


class TestGetMemberIndex:
    @pytest.fixture
    def storage(self):
        m = MagicMock()
        m.get_object_key.return_value = "member/key/index.yaml"
        m.exists.return_value = False
        m.download_object.return_value = b"cached bytes"
        return m

    @pytest.fixture
    def cache(self):
        m = MagicMock()
        m.is_index_valid.return_value = False
        return m

    @pytest.fixture
    def member_cfg(self):
        return {"base_url": "https://helm.releases.hashicorp.com", "cache": {"mutable_ttl": 3600}}

    def _fake_response(self, content=b"upstream bytes"):
        r = MagicMock()
        r.content = content
        r.raise_for_status = MagicMock()
        return r

    def _patch_httpx(self, response):
        # Starts a patch on httpx.AsyncClient; the caller must call patcher.stop().
        patcher = patch("artifactapi.artifact.virtual.httpx.AsyncClient")
        mock_cls = patcher.start()
        mock_client = AsyncMock()
        mock_cls.return_value.__aenter__.return_value = mock_client
        mock_client.get.return_value = response
        return patcher, mock_client

    async def test_cache_hit_returns_stored_bytes(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        _, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"cached bytes"

    async def test_cache_hit_does_not_fetch_upstream(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        mock_cls.assert_not_called()

    async def test_cache_hit_storage_error_falls_through_to_upstream(self, storage, cache, member_cfg):
        storage.exists.return_value = True
        cache.is_index_valid.return_value = True
        storage.download_object.side_effect = Exception("S3 read error")
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response(b"fresh bytes")
            _, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"fresh bytes"

    async def test_cache_miss_fetches_from_upstream(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"upstream bytes"

    async def test_cache_miss_stores_result_in_s3(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        storage.upload.assert_called_once()

    async def test_cache_miss_marks_cache_with_configured_ttl(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        cache.mark_index_cached.assert_called_once_with("m", "index.yaml", 3600)

    async def test_cache_miss_with_auth_sends_basic_auth_header(self, storage, cache):
        cfg = {
            "base_url": "https://private.example.com",
            "username": "user",
            "password": "pass",
            "cache": {"mutable_ttl": 3600},
        }
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", cfg, "index.yaml", storage, cache)
        headers = mock_client.get.call_args.kwargs["headers"]
        assert "Authorization" in headers
        assert headers["Authorization"].startswith("Basic ")

    async def test_no_credentials_sends_no_auth_header(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        headers = mock_client.get.call_args.kwargs["headers"]
        assert "Authorization" not in headers

    async def test_upstream_fetch_failure_returns_none(self, storage, cache, member_cfg):
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.side_effect = Exception("connection refused")
            _, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data is None

    async def test_s3_upload_failure_still_returns_data(self, storage, cache, member_cfg):
        storage.upload.side_effect = Exception("S3 write error")
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, _, raw_data = await _get_member_index("m", member_cfg, "index.yaml", storage, cache)
        assert raw_data == b"upstream bytes"

    async def test_returns_ttl_from_config(self, storage, cache):
        cfg = {"base_url": "https://example.com", "cache": {"mutable_ttl": 900}}
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, ttl, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
        assert ttl == 900

    async def test_defaults_ttl_to_3600_when_not_configured(self, storage, cache):
        cfg = {"base_url": "https://example.com"}
        with patch("artifactapi.artifact.virtual.httpx.AsyncClient") as mock_cls:
            mock_client = AsyncMock()
            mock_cls.return_value.__aenter__.return_value = mock_client
            mock_client.get.return_value = self._fake_response()
            _, _, ttl, _ = await _get_member_index("m", cfg, "index.yaml", storage, cache)
        assert ttl == 3600


# ---------------------------------------------------------------------------
# Virtual route GET /api/v1/virtual/{name}/{path}
# ---------------------------------------------------------------------------


@pytest.fixture
def mock_storage_v():
    m = MagicMock()
    m.get_object_key.return_value = "virtual/helm-virtual-test/index.yaml"
    m.exists.return_value = False
    m.download_object.return_value = b"apiVersion: v1\nentries: {}\n"
    return m


@pytest.fixture
def mock_cache_v():
    m = MagicMock()
    m.is_index_valid.return_value = False
    m.available = False
    m.client = None
    return m


@pytest.fixture
def patched_virtual_deps(mock_storage_v, mock_cache_v):
    import artifactapi.main as main_mod

    with (
        patch.object(main_mod, "storage", mock_storage_v),
        patch.object(main_mod, "cache", mock_cache_v),
    ):
        yield {"storage": mock_storage_v, "cache": mock_cache_v}


class TestVirtualRoute:
    def test_unknown_virtual_name_returns_404(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/no-such-virtual/index.yaml")
        assert response.status_code == 404

    def test_non_virtual_type_returns_400(self, client, patched_virtual_deps):
        # helm-test is type "remote", not "virtual"
        response = client.get("/api/v1/virtual/helm-test/index.yaml")
        assert response.status_code == 400

    def test_unsupported_package_returns_400(self, client, patched_virtual_deps):
        # unsupported-virtual-test has package "rpm"
        response = client.get("/api/v1/virtual/unsupported-virtual-test/index.yaml")
        assert response.status_code == 400

    def test_non_index_path_returns_404(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/helm-virtual-test/vault-0.27.0.tgz")
        assert response.status_code == 404

    def test_no_members_returns_500(self, client, patched_virtual_deps):
        response = client.get("/api/v1/virtual/empty-virtual-test/index.yaml")
        assert response.status_code == 500

    def test_virtual_cache_hit_returns_200(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200

    def test_virtual_cache_hit_content_type_is_yaml(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert "text/yaml" in response.headers["content-type"]

    def test_virtual_cache_hit_returns_stored_content(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        deps["storage"].download_object.return_value = b"apiVersion: v1\nentries: {}\n"
        response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.content == b"apiVersion: v1\nentries: {}\n"

    def test_virtual_cache_hit_skips_member_fetch(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].exists.return_value = True
        deps["cache"].is_index_valid.return_value = True
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        mock_get.assert_not_called()

    def test_cache_miss_returns_200_with_yaml_content_type(self, client, patched_virtual_deps):
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200
        assert "text/yaml" in response.headers["content-type"]

    def test_cache_miss_response_contains_merged_entries(self, client, patched_virtual_deps):
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        index = yaml.safe_load(response.content)
        assert "mychart" in index["entries"]

    def test_cache_miss_stores_result_in_s3(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        deps["storage"].upload.assert_called_once()

    def test_cache_miss_marks_index_cached(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        deps["cache"].mark_index_cached.assert_called_once()

    def test_cache_miss_uses_min_ttl_across_members(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.side_effect = [
                ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE),
                ("helm-member-2", _CFG_B, 1800, _INDEX_SIMPLE),
            ]
            client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        _, _, ttl = deps["cache"].mark_index_cached.call_args[0]
        assert ttl == 1800

    def test_all_members_unreachable_returns_502(self, client, patched_virtual_deps):
        with patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get:
            mock_get.return_value = ("helm-test", _CFG_A, 3600, None)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 502

    def test_one_member_unreachable_still_returns_200(self, client, patched_virtual_deps):
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.side_effect = [
                ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE),
                ("helm-member-2", _CFG_B, 1800, None),
            ]
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200

    def test_member_not_in_config_is_skipped(self, client, patched_virtual_deps):
        import artifactapi.main as main_mod

        real_get = main_mod.config.get_remote_config

        def patched_get(name):
            return None if name == "helm-member-2" else real_get(name)

        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
            patch.object(main_mod.config, "get_remote_config", side_effect=patched_get),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        # only helm-test was available, so the request should succeed
        assert response.status_code == 200
        mock_get.assert_called_once()

    def test_s3_store_failure_still_returns_200(self, client, patched_virtual_deps):
        deps = patched_virtual_deps
        deps["storage"].upload.side_effect = Exception("S3 write error")
        with (
            patch("artifactapi.artifact.virtual._get_member_index", new_callable=AsyncMock) as mock_get,
            patch("artifactapi.artifact.virtual._helm.resolve_content", side_effect=_identity_resolve),
        ):
            mock_get.return_value = ("helm-test", _CFG_A, 3600, _INDEX_SIMPLE)
            response = client.get("/api/v1/virtual/helm-virtual-test/index.yaml")
        assert response.status_code == 200
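
The merge tests in this file describe three rules: the first member wins when two members publish the same chart name and version, distinct versions from all members are kept, and a member whose index could not be parsed is skipped. The route tests add that the merged index is cached at the smallest member TTL. A minimal sketch of those rules on already-parsed index dicts follows. `sketch_merge_entries` and `sketch_merged_ttl` are hypothetical helpers for illustration, not the project's `_merge_helm_indexes`, and the sketch leaves out YAML parsing, URL rewriting, and the `generated` field.

```python
def sketch_merge_entries(indexes):
    """Merge parsed Helm index dicts; earlier members win on (chart, version)."""
    merged = {}
    seen = set()
    for index in indexes:
        if not isinstance(index, dict):
            continue  # a member whose index failed to parse is skipped
        for chart, versions in (index.get("entries") or {}).items():
            for entry in versions or []:
                key = (chart, entry.get("version"))
                if key in seen:
                    continue  # an earlier member already supplied this version
                seen.add(key)
                merged.setdefault(chart, []).append(entry)
    return {"apiVersion": "v1", "entries": merged}


def sketch_merged_ttl(member_ttls):
    """Cache the merged index no longer than its shortest-lived member."""
    return min(member_ttls)
```

Tracking `(chart, version)` pairs in a set keeps the precedence rule independent of how many versions each member lists.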
+8
@@ -0,0 +1,8 @@
[tox]
envlist = py311
isolated_build = true

[testenv]
extras = dev
commands =
    pytest {posargs:tests}