Commit Graph

47 Commits

Author SHA1 Message Date
unkinben fe837dabf7 feat: keep stale mutables when upstream is unreachable; update README
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
When a mutable file's TTL expires and the upstream backend cannot be
contacted (network error or timeout), the cached copy is kept and its
TTL refreshed instead of being evicted. This keeps RPM repodata, Alpine
indexes, branch archives, and other mutable data available during
upstream outages.

Adds UpstreamUnreachable exception and _upstream_reachable() helper.
check_upstream_changed() now raises UpstreamUnreachable on network
errors (was silently returning True). handle_expired_mutable() catches
the exception on the check_mutable_updates path and calls
_upstream_reachable() on the plain-expiry path.

README updated to current immutable/mutable terminology and documents
all new caching features.
2026-04-27 11:38:50 +10:00
unkinben 78296dae8f refactor: extract handle_expired_mutable helper; add redownload success test
Deduplicates the expired-mutable TTL/redownload branching logic that
was copied verbatim between get_artifact and docker_v2_proxy. Adds the
missing happy-path test for a changed mutable file that is successfully
re-fetched from upstream.
2026-04-27 11:13:15 +10:00
unkinben 8fe4bac2b9 feat: add check_mutable_updates flag for conditional upstream revalidation
When check_mutable_updates: true is set on a remote, expired user-defined
mutable files are revalidated before re-downloading:

- On expiry a conditional HEAD is sent with If-None-Match / If-Modified-Since
- 304 Not Modified: TTL is refreshed in Redis, S3 cache is untouched
- 200 / no conditional support: cache is invalidated and file re-downloaded
- Network error: safe fallback — assume changed, re-download

ETag and Last-Modified from upstream responses are stored in Redis under
mutable:meta:<remote>:<hash> (no expiry, cleaned up on re-download or
cache flush). The flag only applies to user-configured mutable_patterns;
built-in package-type defaults (APKINDEX, repomd.xml, Docker manifests)
are always re-fetched unconditionally.

cache/flush also clears mutable:meta:* keys alongside index:* keys.
2026-04-27 11:00:09 +10:00
unkinben 8bc9285117 chore: track remotes.yaml as a documented example config
Remove remotes.yaml from .gitignore and add header comments explaining
the immutable_patterns/mutable_patterns/cache keys. Marks the file
clearly as an example to copy and adapt; warns against committing
real credentials.
2026-04-27 10:58:59 +10:00
unkinben ce01a94141 feat: rename include/index patterns to immutable/mutable with per-remote TTL
Replace the include_patterns/index_patterns split with a clearer
immutable_patterns/mutable_patterns model:

- immutable_patterns: artifacts cached indefinitely (no TTL)
- mutable_patterns: artifacts that expire and are re-fetched after
  cache.mutable_ttl seconds (replaces cache.index_ttl)

_PACKAGE_INDEX_PATTERNS renamed to _PACKAGE_MUTABLE_PATTERNS; all
built-in package-type index patterns (APKINDEX, repomd, manifests, etc.)
default to the remote's mutable_ttl (default 1 hour).

cache.file_ttl renamed to cache.immutable_ttl for consistency.
Adds github-archive remote to remotes.yaml as a worked example showing
tag archives as immutable and branch archives as mutable (1-day TTL).

docker-compose.yml: fix VERSION=dev → 2.2.2.dev0 (valid PEP 440),
add :z SELinux label to volume mounts.
2026-04-27 00:40:13 +10:00
unkinben 4619ae18d8 Merge pull request 'chore: remove build from tag' (#13) from benvin/docker-compose-build into master
Reviewed-on: #13
2026-04-25 22:29:48 +10:00
unkinben ac51d3a51d chore: remove build from tag
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- stop building the image on tag events
2026-04-25 22:27:59 +10:00
unkinben 2887ce4476 Merge pull request 'build: align Dockerfile with packer build and add docker-compose dev mounts' (#12) from benvin/packer-aligned-dockerfile into master
ci/woodpecker/tag/docker Pipeline was successful
Reviewed-on: #12
v2.2.1
2026-04-25 22:23:59 +10:00
unkinben 9e52929d73 build: align Dockerfile with packer build and add docker-compose dev mounts
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
- Rebase Dockerfile onto almalinux9-base, install via uv tool install
- Remove dev artifacts (remotes.yaml, ca-bundle.pem) from image
- Mount gitignored dev files via docker-compose volumes instead
- Add .dockerignore to keep secrets out of build context
- Track docker-compose.yml in git (no secrets; dev files mounted as volumes)
2026-04-25 22:17:36 +10:00
unkinben 788d469063 Merge pull request 'benvin/configurable-index-patterns' (#11) from benvin/configurable-index-patterns into master
ci/woodpecker/tag/docker Pipeline failed
Reviewed-on: #11
v2.2.0
2026-04-25 21:04:25 +10:00
unkinben 1cbe836f1b ci: add Woodpecker pipelines for pre-commit, tests, and Docker build
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/test Pipeline was successful
2026-04-25 21:02:39 +10:00
unkinben f3394b9ca6 docs: add RKE2 image rewriting guide and expand pattern examples
Add a new "Docker Image Rewriting with RKE2" section covering:
- How the /v2/ proxy integrates with registries.yaml mirror rewrites
- Per-registry examples (docker.io, ghcr.io, registry.k8s.io, quay.io)
- include_patterns for restricting which images are cached
- TLS CA configuration for private certificate authorities
- Apply and verification commands

Expand the Configuration section with:
- Richer include_patterns examples (anchored, extension, architecture,
  Docker image name patterns, repodata directories)
- New index_patterns section explaining built-in defaults per package
  type and how to add custom patterns (Helm index.yaml, APT InRelease/
  Packages.gz, extra RPM comps.xml)
2026-04-25 20:20:42 +10:00
unkinben 8da43e610e tests: resolve all peer-review issues across test suite
Address every substantive critique from the peer review:

test_cache: replace tautological same-inputs key test with hardcoded
hash assertion; assert setex call + TTL in mark_index_cached test;
assert client is None for unavailable no-op; rename Packages.gz test
to document intentional behaviour; add alpine sig/tmp negatives; add
hyphenated and date-tag docker positive cases; add key hash-length
assertion.

test_config: replace live-constant comparisons with literal string
assertions for alpine/rpm/docker; add unknown package type test;
add dict-keyed repositories branch coverage (per-repo override and
fallback); fix cache config to full equality check; add explicit empty
index_patterns test.

test_docker_auth: fix case-insensitive test to verify realm value;
add field-order (scope before service) limitation test; add pipe-char
collision documentation test; add missing fetch_token edge cases
(no token field, HTTPStatusError, missing expires_in default 300);
replace rubber-stamp delegate test with end-to-end parse→fetch test.

test_storage: replace split prefix/suffix assertions with structural
3-part check + pinned sha256 assertion; fix Docker blob digests to
64-char hex; add secure=True URL test; add upload return value test;
add download_object 404-on-ClientError test; remove redundant subset
test.

test_routes: add metrics.record_cache_hit/miss assertions; add
mark_index_cached assertion after cache miss on index (docker + generic);
add Content-Disposition, X-Artifact-Size header checks; add rpm/xml
content-type tests; add flush test that verifies Redis keys are deleted
when cache is available; add smoke coverage for upload (PUT), HEAD, DELETE,
/metrics, and /config routes.
2026-04-25 19:58:33 +10:00
unkinben 3a13d76f7e chore: add .tox, .pytest_cache, .pre-commit-cache, .ruff_cache to .gitignore 2026-04-25 19:21:43 +10:00
unkinben 2d0e2c64e6 feat: add test suite, tox, pre-commit, and ruff formatting
- tests/: 107 unit tests across config, cache, docker_auth, storage,
  and FastAPI routes; all passing under pytest-asyncio auto mode
- tox.ini: runs pytest via uvx --with tox-uv tox (py311)
- .pre-commit-config.yaml: ruff lint + ruff-format at v0.15.12
- pyproject.toml: pytest config (asyncio_mode=auto), ruff config
  (line-length=140), tox/pre-commit added to dev extras
- Makefile: test/tox/pre-commit targets via uvx --python 3.11
- Source files reformatted by ruff-format (no logic changes)
2026-04-25 19:21:05 +10:00
unkinben 2414ddfdd3 feat: make index file patterns configurable per remote
Replace hardcoded is_index_file logic with regex patterns driven by
remotes.yaml. Package-level defaults (alpine/rpm/docker) are merged with
any extra patterns listed under index_patterns in the remote config.
2026-04-25 18:40:45 +10:00
unkinben b3d12f4962 docs: add SPEC.md with repository model and caching requirements v2.1.3 2026-04-25 18:31:27 +10:00
unkinben 92b9f9a03e refactor: use package: docker instead of type: docker
Align with intended type=local|remote|virtual / package=docker|rpm|alpine|generic
model. All docker-specific logic now keyed on package field; type field
correctly reflects the repository kind (remote vs local).
2026-04-25 18:27:31 +10:00
unkinben 7930023de8 Merge pull request 'feat: enforce include_patterns on docker /v2/ proxy route' (#10) from benvin/docker-include-patterns into master
Reviewed-on: #10
v2.1.2
2026-04-25 18:14:50 +10:00
unkinben 869a1f8c02 feat: enforce include_patterns on docker /v2/ proxy route
Adds pattern checking to docker_v2_proxy before any upstream fetch.
Patterns match against the full path and the image name (first two
path segments), bypassing the index-file exemption that check_artifact_patterns
applies — so restrictions apply equally to manifests, blobs, and tag lists.
Returns 403 when no pattern matches, consistent with the non-docker route.
2026-04-25 18:09:12 +10:00
unkinben 1b2ee0d37f Merge pull request 'benvin/docker-caching' (#9) from benvin/docker-caching into master
Reviewed-on: #9
v2.1.1
2026-04-25 17:33:18 +10:00
unkinben 33e7365a88 fix: set SETUPTOOLS_SCM_PRETEND_VERSION in Dockerfile for hatch-vcs 2026-04-25 17:31:36 +10:00
unkinben cf854a2ace chore: derive version from git tags via hatch-vcs
Replace hardcoded version in pyproject.toml with hatch-vcs so the
package version is read from git tags at build time. Dockerfile
accepts a VERSION build arg and passes it as HATCH_VCS_PRETEND_VERSION
for builds without a git checkout. Makefile _tag target now rebuilds
the container with the correct version automatically.
v2.1.0
2026-04-25 16:53:37 +10:00
unkinben 4c1f77e679 Merge pull request 'feat: add Docker registry proxy support with proper cache classification' (#8) from benvin/docker-caching into master
Reviewed-on: #8
2026-04-25 16:37:38 +10:00
unkinben 4651183ed1 feat: add Docker registry proxy support with proper cache classification
- Add /v2/ endpoint implementing OCI Distribution API for native docker pull support
- Add docker_auth.py with Bearer token challenge handling and in-memory token cache
- Classify tag-based manifests (/manifests/<tag>) as index (short TTL, mutable)
- Classify digest-pinned manifests (/manifests/sha256:...) and blobs as file cache (indefinite, immutable)
- Deduplicate blob storage by keying on sha256 digest rather than image path
- Support username/password auth per docker remote in remotes.yaml
2026-04-25 16:35:27 +10:00
unkinben 5733d52e51 Merge pull request 'feature/cache-flush-api-enhancement' (#7) from feature/cache-flush-api-enhancement into master
Reviewed-on: #7
2026-01-25 11:34:43 +11:00
unkinben bf8a176dda Bump version: 2.0.4 → 2.0.5 2026-01-25 11:09:29 +11:00
unkinben 3e8e819ecf feat: enhance cache flush API for remote-specific S3 clearing
Add support for remote-specific S3 object deletion using hierarchical key prefixes.
When a remote parameter is provided, only objects under that remote's hierarchy
(e.g., 'fedora/*') are deleted, preserving cached files for other remotes.

Key improvements:
- Use S3 list_objects_v2 Prefix parameter for targeted deletion
- Enhanced logging to indicate scope of cache clearing operations
- Backward compatible - existing behavior preserved when no remote specified

Resolves artifactapi-u46
2026-01-25 11:08:46 +11:00
unkinben de04e4d2b2 Merge pull request 'benvin/path-based-storage' (#6) from benvin/path-based-storage into master
Reviewed-on: #6
2026-01-25 00:00:53 +11:00
unkinben 16c8bd60eb Bump version: 2.0.3 → 2.0.4 v2.0.4 2026-01-24 23:59:48 +11:00
unkinben d39550c4e8 chore: migrate from bumpver to bump-my-version for better reliability 2026-01-24 23:59:05 +11:00
unkinben 424de5cc13 fix: sync package version with bumpver version (2.0.3) 2026-01-24 23:54:42 +11:00
unkinben 3f54874421 Bump version 2.0.2 → 2.0.3 2026-01-24 23:53:19 +11:00
unkinben 6e8912eed1 chore: ignore local config overrides (docker-compose.yml, ca-bundle.pem) 2026-01-24 23:53:08 +11:00
unkinben 5a0e8b4e0b feat: implement hierarchical S3 keys and automated version management
This commit introduces two major improvements:

1. **Hierarchical S3 Key Structure**:
   - Replace URL-based hashing with remote-name/hash(directory_path)/filename format
   - Enables remote-specific cache operations and intuitive S3 organization
   - Cache keys now independent of mirror URL changes
   - Example: fedora/886d215f6d1a0108/eccodes-2.44.0-1.fc42.x86_64.rpm

2. **Automated Version Management**:
   - Add bumpver for semantic version bumping
   - Single source of truth in pyproject.toml
   - FastAPI dynamically reads version from package metadata
   - Eliminates manual version synchronization between files

Changes:
- storage.py: New get_object_key(remote_name, path) method with directory hashing
- main.py: Dynamic version import and updated cache key generation calls
- cache.py: Updated to use new hierarchical key structure
- pyproject.toml: Added bumpver config and dev dependency

Breaking change: S3 key format changed, existing cache will need regeneration
2026-01-24 23:51:03 +11:00
unkinben 1a71a2c9fa Merge pull request 'feat: add cache flush API and fix cache key consistency' (#5) from benvin/cache_flush into master
Reviewed-on: #5
2026-01-13 19:02:52 +11:00
unkinben d2ecc6b1a0 feat: add cache flush API and fix cache key consistency
- Add PUT /cache/flush endpoint for selective cache management
- Fix critical cache key mismatch in cleanup_expired_index()
- Update index file detection to use full path instead of filename
- Bump version to 2.0.2

The key changes include adding a comprehensive cache flush API that supports selective flushing by cache type (index, files, metrics) and fixing a critical bug where cache keys were
inconsistent between storage and cleanup operations, preventing proper cache invalidation.
2026-01-13 19:02:31 +11:00
unkinben e4013e6a2a Merge pull request 'feat: index caching' (#4) from benvin/index_caching into master
Reviewed-on: #4
2026-01-13 18:14:39 +11:00
unkinben 9defc78e21 feat: index caching
- improve index detection for rpms
- improve logging
2026-01-13 18:13:47 +11:00
unkinben f40675f3d2 Merge pull request 'feat: add fedora index files' (#3) from benvin/fedora_indexes into master
Reviewed-on: #3
2026-01-10 17:02:58 +11:00
unkinben b54e6c3e0c feat: add fedora index files
- ensure files matching xml.zck and xml.zst are marked as index files
2026-01-10 17:01:39 +11:00
unkinben 79a8553e9c Merge pull request 'Fix S3 SSL certificate validation and boto3 checksum compatibility' (#2) from benvin/boto3_fixes into master
Reviewed-on: #2
2026-01-08 23:55:42 +11:00
unkinben b7205e09a3 Fix S3 SSL certificate validation and boto3 checksum compatibility
- Add support for custom CA bundle via REQUESTS_CA_BUNDLE/SSL_CERT_FILE environment variables
- Configure boto3 client with custom SSL verification to support Ceph RadosGW through nginx proxy
- Maintain boto3 checksum validation configuration for compatibility with third-party S3 providers
- Resolves XAmzContentSHA256Mismatch errors when connecting to RadosGW endpoints

Fixes #4400 compatibility issue with boto3 v1.36+ stricter checksum validation
2026-01-08 23:54:39 +11:00
unkinben 1fb6b89a5f Merge pull request 'Fix boto3 XAmzContentSHA256Mismatch errors with Ceph RadosGW' (#1) from fix/boto3-checksum-validation into master
Reviewed-on: #1
2026-01-08 23:07:51 +11:00
unkinben f0f3cfda50 Fix boto3 XAmzContentSHA256Mismatch errors with Ceph RadosGW
- Configure boto3 client with when_required checksum calculation/validation
- Resolves compatibility issues between boto3 v1.36+ and S3-compatible providers
- Fixes 502 errors when uploading artifacts to storage backend

Refs: https://github.com/boto/boto3/issues/4400
2026-01-08 23:04:19 +11:00
unkinben 666ad08518 Add CONFIG_PATH environment variable for configurable config file location
- Replace hardcoded remotes.yaml path with CONFIG_PATH environment variable
- Update docker-compose.yml to set CONFIG_PATH=/app/remotes.yaml
- Application now requires CONFIG_PATH to be set, enabling flexible deployment
2026-01-06 21:35:59 +11:00
unkinben 46711eec6a Initial implementation of generic artifact storage system
- FastAPI-based caching proxy for remote file servers
- YAML configuration for multiple remotes (GitHub, Gitea, HashiCorp, etc.)
- Direct URL API: /api/v1/remote/{remote}/{path} with auto-download and caching
- Pattern-based access control with regex filtering
- S3/MinIO backend storage with predictable paths
- Docker Compose setup with MinIO for local development
2026-01-06 21:13:13 +11:00