Remove SPEC, ARCHITECTURE, TODO from tracking

Add to .gitignore — these are working documents, not part of the shipped codebase.
2026-05-04 22:16:55 +10:00
parent 2309e9f43a
commit d2e9e4df59
3 changed files with 0 additions and 169 deletions
@@ -1,112 +0,0 @@
-# StreamStack Architecture
-
-## Services
-
-| Service | Replicas | Backing stores | Responsibility |
-|---------|----------|----------------|----------------|
-| **auth** | 2 | Postgres, NATS KV | User accounts, JWT issue/refresh/revoke |
-| **catalogue** | 2 | Postgres, NATS pub | Media metadata CRUD, stream token requests |
-| **streaming** | 2 | NATS KV, S3 | Token issuance, byte-range video delivery |
-| **ingest** | 2 | S3, (catalogue HTTP) | Upload video, extract metadata/thumbnail, register in catalogue |
-| **nginx** | 1 | — | Reverse proxy + React SPA |
-
-## Infrastructure
-
-| Component | Purpose |
-|-----------|---------|
-| **Postgres** | Persistent store for user accounts (auth) and media metadata (catalogue) |
-| **NATS JetStream KV** | Short-lived stream tokens (1h TTL); revoked-token list for JWT blacklisting |
-| **S3 / MinIO** | Binary storage — `media/` bucket for video files, `thumbnails/` bucket for JPEG thumbnails |
-
---
-
-## Request flows
-
-### Login
-```
-Browser → nginx → auth
-  auth reads Postgres (verify credentials)
-  auth writes nothing to NATS
-  auth returns access_token (JWT, RS256, 30min) + refresh_token (7 days)
-```
-
-### Browse catalogue
-```
-Browser → nginx → catalogue
-  catalogue reads Postgres (published media items)
-  returns list of metadata (title, duration, thumbnail_s3_key, etc.)
-  no NATS, no S3
-```
-
-### Request a stream token
-```
-Browser → nginx → catalogue  POST /catalogue/{id}/stream-token
-  catalogue reads Postgres → gets s3_key + size_bytes for the item
-  catalogue → streaming  POST /stream/token  {media_id, s3_key, size_bytes}
-    streaming verifies JWT (public key, local)
-    streaming writes NATS KV: token → "media_id|user_id|timestamp|s3_key|size_bytes"
-    streaming returns {stream_url: "/api/v1/stream/<token>"}
-  catalogue returns stream_url to browser
-```
-Token TTL: 1 hour. After that, NATS discards it automatically.
-
-### Play video (each range request)
-```
-Browser → nginx → streaming  GET /stream/<token>  Range: bytes=X-Y
-  streaming reads NATS KV (resolve token → s3_key + size_bytes)
-  streaming → S3  GET object with byte range  (aiobotocore, fully async)
-  streams bytes back to browser
-  no Postgres, no catalogue HTTP call
-```
-The browser sends many range requests for a single video. Each one costs only a NATS lookup + S3 range-get.
-
-### Ingest a video (admin only)
-```
-curl/frontend → nginx → ingest  POST /ingest/upload  (multipart)
-  ingest verifies JWT (admin role required)
-  ingest → S3  upload file → media/{uuid}.ext
-  ingest → S3  head_object → size_bytes
-  ingest runs PyAV (in threadpool):
-    - reads S3 via range-gets → extracts duration, codec, width, height, fps
-    - decodes first video frame → JPEG → S3 thumbnails/{uuid}.jpg
-  ingest → catalogue  POST /catalogue/  {s3_key, size_bytes, metadata...}
-    catalogue writes Postgres
-    catalogue publishes NATS: catalogue.events.media.published
-  returns catalogue item JSON
-```
-
---
-
-## JWT flow
-
-Auth uses **RS256** (asymmetric). The private key signs tokens; all other services hold only the public key and verify locally — no auth HTTP call on every request.
-
-Revoked tokens are stored as keys in a NATS KV bucket (`revoked-tokens`). Streaming checks this bucket on token issue, not on every range request.
-
---
-
-## Data ownership
-
-```
-Postgres       auth        users, hashed passwords, roles
-               catalogue   media items, all metadata fields
-
-NATS KV        streaming   stream tokens (s3_key + size_bytes embedded)
-               auth        revoked JWT list
-
-S3             ingest      video files  →  media/
-               ingest      thumbnails   →  thumbnails/
-               (read)      streaming    reads media/ for range delivery
-               (read)      ingest/PyAV  reads media/ for metadata extraction
-```
-
---
-
-## Inter-service HTTP calls
-
-| Caller | Callee | When |
-|--------|--------|------|
-| catalogue | streaming | Stream token request — passes s3_key + size_bytes |
-| ingest | catalogue | After upload — registers the media item |
-
-All other cross-service communication is either direct DB access (own service only) or NATS pub/sub. Services do **not** query each other's databases.
@@ -1,41 +0,0 @@
-welcome to stream stack.
-
-this project is to build a media streaming service comprised of a number of microservices and multiple frontends (desktop, mobile, admin). the aim is that every component is highly available and load balanced. state is shared between all processes through NATS, and persistent data is stored in pgsql or s3 (depending on the data). the backends should all be built using fastapi. each backend service should be able to run independently (should it be one pypi package that we enable features for different modules, or a different pypi package for each system?)
-
-the frontend services should be in a fast and responsive language that will consume the fastapi services (react maybe?). there should be a "router" service that the frontend talks to, which proxies connections to the appropriate backend, or should we put different services on different dns addresses?
-
-question: can we stream media from s3? will that enable skipping forward/backwards?
-
-ensure there are unit tests for all files (in tests/)
-add a makefile that runs the unit tests (make test) using uvx (so we don't need to install any requirements permanently)
-add makefile test for linting with ruff
-add Dockerfile to run the streamstack (with booleans to enable different microservices)
-    - this should use git.unkin.net/unkin/almalinux9-base:latest to build, then the uv container (dhi.io/uv:0.11) to run
-add docker-compose for e2e testing of the stack (with makefile targets to start/stop)
-
-
-required projects:
- https://github.com/pyav-org/pyav
- https://github.com/fastapi/fastapi
- https://github.com/nats-io/nats.py
-
-phase 1:
- build a backend microservice that can read media files with ffmpeg (pyav) from s3 and stream them. the url to stream the media should not include the name of the media. the url should be openable in mpv for testing.
-
-phase 2:
- build a microservice that presents the media catalogue. this will be used by the frontend later to list media available.
-
-phase 3:
- build auth microservice. it should be a jwt provider. when a user authenticates, they have a jwt kept somewhere that is passed to each microservice for each request. each microservice should then verify the jwt against the auth microservice.
-
-phase 4:
- import microservice, for importing video into s3, adding to catalogue, finding metadata (thumbnail, actors, etc)
-
-phase 5:
-  simple react frontend (this is just for testing. no auth. just show catalogue and when you click on an item, play that video)
- the frontend should be its own container, so that it can be run in a DMZ
-
-
-additional requirements:
-    keep track of where a user is up to in a given video, so that when they replay it, it starts from a few seconds before where they stopped.
-    when streaming video, send bursts of video to the user so that it caches on the client side
@@ -1,16 +0,0 @@
-# TODO
-
- Transcode MKV uploads to MP4 during ingest — browsers (Firefox/Chrome) cannot natively play MKV containers, so Jellyfish-style uploads fail to load in the video player.
- IMDB metadata microservice — subscribe to `catalogue.events.media.published` (durable consumer `"imdb-fetcher"`), look up title/year against IMDB API, patch catalogue with enriched metadata (rating, genre, plot, cast).
- Subtitle fetcher microservice — subscribe to `catalogue.events.media.published` (durable consumer `"subtitle-fetcher"`), fetch subtitles (e.g. OpenSubtitles API), store as `.vtt` in S3, update catalogue with subtitle_s3_key. Frontend `<video>` supports `<track>` elements for native subtitle display.
-
-## TV show metadata identification
-
-For a file like `Clarkson's.Farm.S01E01.Tractoring.WEBRip-1080p.mp4`, metadata can be identified via:
-
- **Filename parsing** — extract show name, season, episode number, and episode title from the filename using a regex (e.g. `S(\d+)E(\d+)` pattern). The ingest service or a dedicated parser microservice could do this automatically at upload time, pre-filling `show_name`, `season`, `episode`, `episode_title` fields so the user doesn't have to type them.
- **TheTVDB API** — given `show_name` + `season` + `episode`, look up the canonical title, air date, plot, guest cast, network, and a high-quality episode thumbnail. Free API key available. Subscribe to `catalogue.events.media.published` as a durable consumer `"tvdb-fetcher"`.
- **TMDB (The Movie Database)** — also covers TV series (`/tv/{series_id}/season/{n}/episode/{n}`). Has episode stills, show banners, cast photos. Free API key.
- **IMDb / Cinemagoer** — Python library (`cinemagoer`, formerly IMDbPY) that scrapes IMDb data without an API key. Slower but no key required. IMDb series ID can be cross-referenced from TheTVDB.
- **Video container metadata** — MKV/MP4 files sometimes embed title, show name, season/episode in container tags (readable via PyAV `container.metadata`). Worth checking before hitting external APIs — already have the file open during ingest.
- **Suggested flow**: parse filename → check container tags → query TheTVDB with (show_name, season, episode) → fall back to TMDB → patch catalogue via service JWT.
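
The filename-parsing step of the suggested flow could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `parse_episode_filename`, the separator handling, and the short list of release tags are all hypothetical and not taken from the removed files; only the `S(\d+)E(\d+)` idea and the field names `show_name`, `season`, `episode`, `episode_title` come from the TODO.

```python
import re
from typing import Optional, TypedDict


class EpisodeInfo(TypedDict):
    show_name: str
    season: int
    episode: int
    episode_title: str


# Assumption: anything before SxxEyy is the show name, anything after is
# the episode title followed by release tags.
_EPISODE = re.compile(
    r"^(?P<show>.+?)[. _-]+S(?P<season>\d+)E(?P<episode>\d+)"
    r"(?:[. _-]+(?P<rest>.*))?$",
    re.IGNORECASE,
)

# Hypothetical, deliberately short list of tokens that end the episode title.
_RELEASE_TAG = re.compile(r"^(WEBRip|WEB-DL|BluRay|HDTV|\d{3,4}p)", re.IGNORECASE)


def parse_episode_filename(filename: str) -> Optional[EpisodeInfo]:
    """Return show/season/episode fields, or None if no SxxEyy marker is found."""
    # Drop the file extension, if any.
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    match = _EPISODE.match(stem)
    if match is None:
        return None

    # Collect title words until the first release tag (e.g. "WEBRip-1080p").
    title_words = []
    for word in re.split(r"[. _]+", match.group("rest") or ""):
        if not word or _RELEASE_TAG.match(word):
            break
        title_words.append(word)

    return {
        "show_name": re.sub(r"[. _]+", " ", match.group("show")).strip(),
        "season": int(match.group("season")),
        "episode": int(match.group("episode")),
        "episode_title": " ".join(title_words),
    }


info = parse_episode_filename("Clarkson's.Farm.S01E01.Tractoring.WEBRip-1080p.mp4")
print(info)
```

A filename without an `SxxEyy` marker returns `None`, which would let the caller move on to the container-tag check instead of guessing.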