The circuit breaker was fully implemented and tested but never called, so
a failing upstream was hit on every request. Engine.Fetch now short-
circuits when the breaker is open (serving stale if available, else 503),
records a failure on each UpstreamError, and resets on success.
Refs #74
Fixes#67
## Why
The proxy used `http.DefaultClient` for all upstream GET/HEAD and bearer-token requests. It has no timeouts, so a slow or hung upstream holds a goroutine and connection indefinitely.
## Changes
- Add a shared `upstreamClient` (`internal/proxy/httpclient.go`) with dial, TLS-handshake, response-header and idle-connection timeouts, plus connection pooling.
- Deliberately no overall `Client.Timeout`, so large artifact bodies can still stream; total time is bounded by the request context.
- Route all four upstream calls in the engine through it.
## Validation
- `make e2e` passes.
Reviewed-on: #83
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#75
## Why
On a fetch-lock miss, `Engine.Fetch` slept a flat 500ms once, tried the store, and otherwise fell through to fetch upstream unlocked. A cold-cache stampede therefore still hit upstream once per waiter.
## Changes
- Add `waitForStore`, which polls the store every 100ms for up to 5s (stopping on context cancellation) so waiters pick up the lock leaders populated result.
- Only fall through to an upstream fetch if the leader has not populated the store within the wait budget.
## Validation
- `make e2e` passes.
Reviewed-on: #93
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#76
## Why
Every proxied request spawned a goroutine running a 5s-timeout single-row INSERT. Under load this is unbounded goroutines and connection-pool pressure.
## Changes
- Add `database.AccessLogEntry` + `InsertAccessLogBatch` (bulk `COPY`).
- The engine starts one background writer that drains a buffered channel and flushes every 128 entries or 2s.
- `logAccess` is now a non-blocking channel send (drops on full buffer), so the request path never blocks on the DB. Best-effort telemetry: a small tail may be lost on abrupt shutdown.
## Validation
- `make e2e` passes.
Reviewed-on: #91
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#70
## Why
Docker `HEAD` routes mapped to `handleProxy`, which ran a full `Fetch` + `io.Copy` — downloading the entire blob (and fetching upstream on a miss) only for net/http to discard the body. HEAD existence checks (manifests, blobs) are common.
## Changes
- Add `Engine.Head`: answers cached artifacts/indexes from store metadata (no blob download); on a miss issues an upstream `HEAD` (with bearer-token handling) and never caches a body.
- Route `HEAD /v2/{remote}/*` to a dedicated `handleProxyHead` that writes headers only.
- Add e2e tests for HEAD on a blocklisted path (403) and an unknown remote (404).
## Note
`headUpstream` uses `http.DefaultClient` to build cleanly on master; it will pick up the shared timeout-configured client from #67 once that merges.
## Validation
- `make e2e` passes (includes new HEAD tests).
Reviewed-on: #89
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#77
## Why
Each upstream 401 re-ran the full token-endpoint request, even though a single Docker pull triggers many blob/manifest requests sharing one scope.
## Changes
- Add Redis `GetToken`/`SetToken`.
- `fetchBearerToken` now also parses `expires_in` and returns a TTL.
- New `Engine.cachedBearerToken` reuses a cached token keyed by remote + challenge (hashed), caching for `expires_in` minus a safety margin (default 60s when absent).
## Validation
- `make e2e` passes.
Reviewed-on: #92
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#66
## Why
`fetchFromUpstream` read every upstream response with `io.ReadAll`, hashed it in memory, uploaded from memory and served from memory. A single large immutable blob (Docker layer, RPM, tarball, Go module zip) — or several concurrent ones — could OOM the process. The streaming, tempfile-backed CAS already existed but the proxy path bypassed it (and `Engine.cas` was assigned but unused).
## Changes
- Immutable fetches now stream through `CAS.Store` (tempfile -> sha256 -> S3), so memory stays bounded regardless of artifact size, and are served back from the store.
- Mutable indexes stay on the in-memory path (small, and subject to `RewriteResponse`).
- Skipping `RewriteResponse` for immutable content is behaviour-preserving: the proxy path always passes an empty `proxyBaseURL`, under which every providers `RewriteResponse` is a no-op.
- Remove the now-unused in-memory `sha256Hash` helper.
## Validation
- `make e2e` passes.
- Live smoke test against Postgres/Redis/MinIO: proxied a 12 MB blob through a generic remote — fetch #1 `X-Artifact-Source: remote`, fetch #2 `X-Artifact-Source: cache`, both byte-identical (sha256) to the origin.
Reviewed-on: #94
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#68
## Why
`isNetworkError` type-asserted `err.(*UpstreamError)` directly. If the error is ever wrapped, stale-on-error handling silently stops triggering.
## Changes
- Use `errors.As` to detect `*UpstreamError` through wrapping.
## Validation
- `make e2e` passes.
Reviewed-on: #84
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Fixes#78
## Why
`serveFromStore` first called `store.Download` with the bare content hash as the S3 key, which never matches real object keys (`blobs/sha256/<hash>`). Every cached blob serve therefore paid an extra guaranteed-404 round-trip before retrying with the correct `BlobKey`.
## Changes
- Remove the dead first `Download` attempt; go straight to the `BlobKey` lookup, then fall back to the index key.
## Validation
- `make e2e` passes (proxy cache-hit paths exercised end-to-end).
Reviewed-on: #82
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
## Problems
1. Docker daemon sends specific Accept headers to negotiate manifest format, but the proxy dropped them — registries defaulted to OCI format, causing "mediaType should be manifest.v2+json not oci.image.index" errors
2. Upstream Content-Type was only used when the provider returned "application/octet-stream" — Docker manifests got the wrong Content-Type
## Fixes
- Forward client Accept header to upstream (both initial request and Bearer token retry)
- Always prefer upstream Content-Type when present
- Fetch signature now accepts variadic clientHeaders for backwards compat
## E2E tested
- DockerHub: redis:7-alpine, alpine:3 — skopeo inspect OK
- GHCR: OCI-only images work with docker pull (GHCR 404s Docker v2 Accept, which is expected)
- Quay: prometheus/node-exporter — skopeo inspect OK
Reviewed-on: #62
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>
Docker Hub (and other registries) return 401 with a `Www-Authenticate: Bearer realm=...` challenge even for public images. The proxy now:
1. Detects 401 + Bearer challenge
2. Parses realm/service/scope from the header
3. Fetches an anonymous token (or authenticated if username/password configured)
4. Retries the original request with the Bearer token
Fixes: `docker pull artifactapi.../dockerhub/library/redis:latest` returning "unauthorized: upstream returned 401"
Reviewed-on: #60
Co-authored-by: Ben Vincent <ben@unkin.net>
Co-committed-by: Ben Vincent <ben@unkin.net>