feat: background cache warmer for virtual repositories #28

2026-04-29T21:41:29+10:00

unkinben commented

Problem

Virtual repository `index.yaml` rebuilds are currently on-demand: when the Redis TTL expires, the next request triggers a full rebuild — parallel HTTP fetches across all member remotes, plus a YAML parse/merge/dump cycle. With 19 members this takes ~22s fetch + ~44s merge = ~66s total, blocking the client.

Proposed solution

Add a background task (FastAPI `BackgroundTasks` or a dedicated async loop) that proactively re-warms the virtual index before it expires, so that in normal operation the virtual handler serves from S3 and does not rebuild inline.

Behaviour

On startup, trigger an initial warm for each configured virtual repository.
Schedule each refresh after `min(member_ttls) * 0.9` has elapsed (with 10% of the shortest member TTL remaining) so there is always a fresh merged index ready in S3.
The `GET /api/v1/virtual/{name}/index.yaml` handler checks Redis + S3 and serves immediately; if the cache is cold (first boot race, S3 eviction), fall back to the current inline rebuild and log a warning.
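The scheduling rule and the cold-cache fallback above could look roughly like the sketch below. It is illustrative only: `refresh_delay`, `serve_index`, `read_cached` and `rebuild_inline` are hypothetical names, and the real handler would read from Redis and S3 rather than take callables.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("virtual_index")


def refresh_delay(member_ttls: list[float], factor: float = 0.9) -> float:
    """Seconds to wait before the next refresh: 90% of the shortest member TTL."""
    if not member_ttls:
        raise ValueError("a virtual repository needs at least one member TTL")
    return min(member_ttls) * factor


def serve_index(
    name: str,
    read_cached: Callable[[str], Optional[bytes]],  # hypothetical Redis + S3 lookup
    rebuild_inline: Callable[[str], bytes],         # hypothetical current inline rebuild
) -> bytes:
    """Serve the warmed index if present, otherwise rebuild inline and warn."""
    cached = read_cached(name)
    if cached is not None:
        return cached
    # Cold cache (first boot race, S3 eviction): fall back and make it visible.
    logger.warning("virtual index %r is cold, rebuilding inline", name)
    return rebuild_inline(name)
```

For example, member TTLs of 3600, 600 and 1800 seconds would give a refresh roughly every 540 seconds.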

Implementation notes

One async background loop per virtual repository, sleeping between refreshes.
Re-use the existing `_get_member_index` + `_merge_helm_indexes` logic from `artifact/virtual.py`.
Member index fetches remain parallel via `asyncio.gather`.
Metrics: expose last-refresh timestamp and duration per virtual repo via the existing `/metrics` endpoint.
Config: a per-virtual-repo `background_refresh: true` flag, on by default once implemented.
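A minimal sketch of the per-repository loop described in these notes, assuming an in-memory dict in place of S3. `fetch_member_index` and `merge_indexes` are hypothetical stand-ins for `_get_member_index` and `_merge_helm_indexes`, whose real signatures are not shown in this issue, and `RefreshStats` is an assumed holder for the values that `/metrics` would expose.

```python
import asyncio
import logging
import time
from dataclasses import dataclass

logger = logging.getLogger("virtual_warmer")


@dataclass
class RefreshStats:
    """Per-repo values a /metrics endpoint could expose (assumed shape)."""
    last_refresh_ts: float = 0.0
    last_duration_s: float = 0.0
    refresh_count: int = 0


async def fetch_member_index(member: str) -> dict:
    """Hypothetical stand-in for `_get_member_index`; returns a tiny fake index."""
    await asyncio.sleep(0)
    return {"entries": {member: [{"version": "1.0.0"}]}}


def merge_indexes(indexes: list[dict]) -> dict:
    """Hypothetical stand-in for `_merge_helm_indexes`."""
    merged: dict = {"entries": {}}
    for index in indexes:
        for chart, versions in index["entries"].items():
            merged["entries"].setdefault(chart, []).extend(versions)
    return merged


async def warm_once(name: str, members: list[str], store: dict, stats: RefreshStats) -> None:
    """Fetch all member indexes in parallel, merge them, store the result."""
    started = time.monotonic()
    indexes = await asyncio.gather(*(fetch_member_index(m) for m in members))
    store[name] = merge_indexes(list(indexes))
    stats.last_duration_s = time.monotonic() - started
    stats.last_refresh_ts = time.time()
    stats.refresh_count += 1


async def warm_forever(name: str, members: list[str], store: dict,
                       stats: RefreshStats, interval_s: float) -> None:
    """One loop per virtual repository: warm, sleep, repeat."""
    while True:
        try:
            await warm_once(name, members, store, stats)
        except Exception:
            # A failed refresh should not kill the loop; the old index stays in place.
            logger.exception("refresh of virtual repo %r failed", name)
        await asyncio.sleep(interval_s)


async def demo() -> tuple[dict, RefreshStats]:
    store: dict = {}
    stats = RefreshStats()
    task = asyncio.create_task(warm_forever("stable", ["a", "b"], store, stats, 0.01))
    await asyncio.sleep(0.05)  # let the loop run a few refreshes
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return store, stats
```

Since the merge is reported to take tens of seconds, calling it directly inside the event loop would likely stall other requests. One option, not evaluated here, is to run it via `asyncio.to_thread` or in a separate process.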

Related

Observed timings with 19 member repos: `fetch=22749ms merge=44083ms store=109ms`. The merge/dump of ~1 M lines of YAML is the dominant cost and would move entirely off the request path with this change.

Reference: unkin/artifactapi#28