perf: stream proxied artifacts instead of buffering the full body in memory #94

Merged
benvin merged 1 commit from benvin/stream-immutable-blobs into master 2026-07-02 21:33:42 +10:00

Fixes #66

Why

fetchFromUpstream read every upstream response with io.ReadAll, hashed it in memory, uploaded from memory and served from memory. A single large immutable blob (Docker layer, RPM, tarball, Go module zip) — or several concurrent ones — could OOM the process. The streaming, tempfile-backed CAS already existed but the proxy path bypassed it (and Engine.cas was assigned but unused).

Changes

  • Immutable fetches now stream through CAS.Store (tempfile -> sha256 -> S3), so memory stays bounded regardless of artifact size, and are served back from the store.
  • Mutable indexes stay on the in-memory path (small, and subject to RewriteResponse).
  • Skipping RewriteResponse for immutable content is behaviour-preserving: the proxy path always passes an empty proxyBaseURL, under which every provider's RewriteResponse is a no-op.
  • Remove the now-unused in-memory sha256Hash helper.

Validation

  • make e2e passes.
  • Live smoke test against Postgres/Redis/MinIO: proxied a 12 MB blob through a generic remote. Fetch #1 returned X-Artifact-Source: remote and fetch #2 returned X-Artifact-Source: cache; both responses were byte-identical (sha256) to the origin.
unkinben added 1 commit 2026-07-02 00:49:25 +10:00
perf: stream immutable blobs through the CAS instead of buffering
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
2fc9ae6a31
fetchFromUpstream read every upstream response fully into memory before
storing and serving it, so a large blob (or several concurrent ones) could
OOM the process. Immutable artifacts now stream through the tempfile-backed
CAS (tee to sha256 + S3), keeping memory bounded regardless of size, and
are served back from the store. Mutable indexes are small and may be
rewritten, so they keep the in-memory path. Immutable content is never
rewritten in the proxy path (proxyBaseURL is empty), so skipping
RewriteResponse there is behaviour-preserving.

Refs #66
benvin merged commit f3680951b7 into master 2026-07-02 21:33:42 +10:00
benvin deleted branch benvin/stream-immutable-blobs 2026-07-02 21:33:42 +10:00
Reference: unkin/artifactapi#94