feat: cache parsed member indexes as msgpack to skip YAML re-parse on rebuild #40

Merged
unkinben merged 1 commit from benvin/issue-36-msgpack-member-cache into master 2026-05-02 17:15:31 +10:00

Closes #36

Summary

  • After fetching a member's index.yaml (from upstream or S3), the handler now parses it and stores a compact msgpack file (index.msgpack) alongside the raw YAML in S3
  • On subsequent virtual rebuilds (member caches valid, virtual TTL expired), the handler loads the msgpack file instead of re-parsing raw YAML, which removes the costliest phase of the merge (YAML parsing, about 60% of merge time)
  • _entries_to_msgpack_safe() converts datetime/date objects to ISO strings before packing (msgpack cannot natively serialize Python datetimes)
  • _merge_helm_indexes() accepts list[dict | None] as pre-parsed entries; falls back to raw YAML parse when msgpack is unavailable
  • _VirtualHandler.merge() protocol updated to pass pre-parsed entries to all future handler implementations
  • Broken msgpack is detected and rebuilt from raw YAML automatically
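
The datetime handling described above can be illustrated with a minimal sketch. This is not the PR's `_entries_to_msgpack_safe()`; the function name `to_msgpack_safe`, the recursive approach, and the sample entry are assumptions made for illustration. The sketch stops before packing so that it runs with the standard library only; the converted structure is what would then be handed to the msgpack packer.

```python
from datetime import date, datetime
from typing import Any


def to_msgpack_safe(value: Any) -> Any:
    """Recursively replace datetime/date objects with ISO 8601 strings.

    Hypothetical helper: the real implementation in the PR may differ.
    """
    if isinstance(value, (datetime, date)):
        # datetime is a subclass of date; both provide isoformat().
        return value.isoformat()
    if isinstance(value, dict):
        return {key: to_msgpack_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_msgpack_safe(item) for item in value]
    # Strings, numbers, booleans and None pass through unchanged.
    return value


# Made-up entry shaped loosely like a Helm index "entries" mapping.
entries = {
    "nginx": [
        {"version": "1.2.3", "created": datetime(2026, 5, 2, 17, 4, 44)},
    ]
}
safe_entries = to_msgpack_safe(entries)
```

Converting up front keeps the packed file free of custom extension types, at the cost of readers receiving timestamps as strings rather than datetime objects.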

Performance

Phase breakdown (19-member helm-all virtual, 14 MB total):

| Phase | Time | % of merge time |
|---|---|---|
| YAML parse (eliminated) | 6314 ms | 60% |
| URL rewrite + dedup | 33 ms | 0.3% |
| YAML dump | 4124 ms | 39% |

| Scenario | Before (CSafeLoader only, #34) | After |
|---|---|---|
| Cold rebuild (upstream fetch) | ~21s | ~26s (+5s for msgpack build, one-time) |
| **Warm rebuild (S3 hit, virtual expired)** | **~9.6s** | **~5.9s (38% faster)** |
| Virtual cache hit | ~0.03s | ~0.03s |

Log line confirms msgpack hits: msgpack=19/19
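
For readers who want to check the percentages, the short script below recomputes them from the timings reported above. It is only arithmetic on the published figures; the variable names are assumptions, and no new measurements are introduced.

```python
# Phase timings in milliseconds, copied from the phase breakdown.
phase_ms = {
    "YAML parse": 6314,
    "URL rewrite + dedup": 33,
    "YAML dump": 4124,
}
total_ms = sum(phase_ms.values())

# Share of the summed phase time spent in each phase, in percent.
shares = {name: 100 * ms / total_ms for name, ms in phase_ms.items()}

# Warm rebuild wall-clock times in seconds, copied from the scenario table.
warm_before_s = 9.6
warm_after_s = 5.9
warm_reduction_pct = 100 * (warm_before_s - warm_after_s) / warm_before_s

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"warm rebuild time reduction: {warm_reduction_pct:.1f}%")
```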

Test plan

  • 297 tests pass
  • TestEntriesToMsgpackSafe: datetime/date serialization, empty input, round-trip
  • TestMergeHelmIndexesWithParsed: pre-parsed path produces identical output to raw-bytes path
  • TestGetMemberIndexMsgpack: msgpack hit, cold-build, broken msgpack fallback, upstream failure
  • Docker warm-rebuild measured at 5.9s vs 9.6s baseline
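
The equivalence checked by `TestMergeHelmIndexesWithParsed` could look roughly like the sketch below. It is an assumption-heavy illustration, not the project's test: `merge_indexes` is a hypothetical stand-in for `_merge_helm_indexes()`, the merge rule is simplified to concatenating version lists, and JSON replaces YAML so the example needs only the standard library.

```python
import json
from typing import Optional


def merge_indexes(
    raw_indexes: list[bytes],
    parsed: Optional[list[Optional[dict]]] = None,
) -> dict:
    """Merge member indexes, preferring pre-parsed entries when present.

    Hypothetical stand-in: a None slot in `parsed` means "no cached parse
    for this member", so that member's raw bytes are parsed instead.
    """
    if parsed is None:
        parsed = [None] * len(raw_indexes)
    merged: dict[str, list] = {}
    for raw, pre in zip(raw_indexes, parsed):
        # JSON stands in for the YAML parse used by the real handler.
        entries = pre if pre is not None else json.loads(raw)
        for chart, versions in entries.items():
            merged.setdefault(chart, []).extend(versions)
    return merged


def test_parsed_path_matches_raw_path() -> None:
    raw = [b'{"nginx": ["1.0.0"]}', b'{"nginx": ["1.1.0"], "redis": ["7.0.0"]}']
    # Second member has no cached parse, so it exercises the fallback.
    parsed = [json.loads(raw[0]), None]
    assert merge_indexes(raw, parsed) == merge_indexes(raw)


test_parsed_path_matches_raw_path()
```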
unkinben added 1 commit 2026-05-02 17:04:44 +10:00
feat: cache parsed member indexes as msgpack to skip YAML re-parse on rebuild
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
1e0f4dc840
Warm rebuilds of virtual repos (member caches valid, virtual TTL expired)
previously re-parsed all member index.yaml files on every rebuild. With 19
Helm members totalling 14 MB, YAML parsing was 60% of merge time (~6.3s of
~9.6s). Each member's YAML is now parsed once, and the parsed result is also
stored in S3 as msgpack alongside the raw index. Subsequent rebuilds load the
compact msgpack and skip YAML parsing entirely.

Before: warm rebuild ~9.6s (CSafeLoader baseline)
After:  warm rebuild ~5.9s (38% faster, merge=4.7s down from ~9.6s)
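
The load-or-rebuild flow described in this commit message, including the fallback when the cached file is broken, can be sketched as follows. Everything here is illustrative: `load_member_entries`, the key layout, the member name `bitnami`, and the in-memory dict standing in for S3 are assumptions, and JSON is swapped in for both msgpack and YAML so the example runs with the standard library only.

```python
import json
from typing import Callable


def load_member_entries(
    store: dict[str, bytes],
    member: str,
    decode: Callable[[bytes], dict],
    encode: Callable[[dict], bytes],
    parse_raw: Callable[[bytes], dict],
) -> tuple[dict, bool]:
    """Return (entries, cache_hit) for one member.

    Hypothetical helper: `store` stands in for the S3 bucket.
    """
    packed_key = f"{member}/index.msgpack"
    raw_key = f"{member}/index.yaml"

    packed = store.get(packed_key)
    if packed is not None:
        try:
            return decode(packed), True
        except ValueError:
            # Broken cache file: fall through and rebuild from the raw index.
            pass

    entries = parse_raw(store[raw_key])
    store[packed_key] = encode(entries)  # write (or repair) the cache
    return entries, False


def json_decode(data: bytes) -> dict:
    return json.loads(data)


def json_encode(entries: dict) -> bytes:
    return json.dumps(entries).encode("utf-8")


store = {"bitnami/index.yaml": b'{"nginx": ["1.0.0"]}'}
args = (store, "bitnami", json_decode, json_encode, json_decode)

first = load_member_entries(*args)   # no cache yet: parse and store
second = load_member_entries(*args)  # cache present: decode only

store["bitnami/index.msgpack"] = b"{broken"
third = load_member_entries(*args)   # corrupt cache: rebuild from raw
```

Passing the codec functions in as arguments keeps the flow testable without S3 or msgpack; in a real handler the `except` clause would need to match whatever exceptions the chosen decoder raises.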
unkinben merged commit 8a7f26b193 into master 2026-05-02 17:15:31 +10:00
unkinben deleted branch benvin/issue-36-msgpack-member-cache 2026-05-02 17:15:31 +10:00

Reference: unkin/artifactapi#40