commit 2309e9f43a9790442c9b9265ec6aaa5d8a20eb07
Author: Ben Vincent
Date:   Mon May 4 22:16:39 2026 +1000

    Initial commit — StreamStack v1

    Five-service streaming platform: auth, catalogue, streaming, ingest,
    thumbnailer. Includes React frontend served by nginx, NATS JetStream
    event bus, aiobotocore async S3, PyAV video metadata + thumbnail
    extraction, service-to-service JWT auth, and a full unit + e2e test
    suite.

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..15fad2c
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,14 @@
+.env
+.venv/
+__pycache__/
+*.pyc
+*.pyo
+.pytest_cache/
+.coverage
+htmlcov/
+dist/
+*.egg-info/
+.ruff_cache/
+keys/
+*.pem
+testdata/
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..4ad46fd
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,112 @@
+# StreamStack Architecture
+
+## Services
+
+| Service | Replicas | Backing stores | Responsibility |
+|---------|----------|----------------|----------------|
+| **auth** | 2 | Postgres, NATS KV | User accounts, JWT issue/refresh/revoke |
+| **catalogue** | 2 | Postgres, NATS pub | Media metadata CRUD, stream token requests |
+| **streaming** | 2 | NATS KV, S3 | Token issuance, byte-range video delivery |
+| **ingest** | 2 | S3, (catalogue HTTP) | Upload video, extract metadata/thumbnail, register in catalogue |
+| **nginx** | 1 | — | Reverse proxy + React SPA |
+
+## Infrastructure
+
+| Component | Purpose |
+|-----------|---------|
+| **Postgres** | Persistent store for user accounts (auth) and media metadata (catalogue) |
+| **NATS JetStream KV** | Short-lived stream tokens (1h TTL); revoked-token list for JWT blacklisting |
+| **S3 / MinIO** | Binary storage — `media/` bucket for video files, `thumbnails/` bucket for JPEG thumbnails |
+
+---
+
+## Request flows
+
+### Login
+```
+Browser → nginx → auth
+  auth reads Postgres (verify credentials)
+  auth writes nothing to NATS
+  auth returns access_token (JWT, RS256, 30min) + refresh_token (7 days)
+```
+
+### Browse catalogue
+```
+Browser → nginx → catalogue
+  catalogue reads Postgres (published media items)
+  returns list of metadata (title, duration, thumbnail_s3_key, etc.)
+  no NATS, no S3
+```
+
+### Request a stream token
+```
+Browser → nginx → catalogue POST /catalogue/{id}/stream-token
+  catalogue reads Postgres → gets s3_key + size_bytes for the item
+  catalogue → streaming POST /stream/token {media_id, s3_key, size_bytes}
+  streaming verifies JWT (public key, local)
+  streaming writes NATS KV: token → "media_id|user_id|timestamp|s3_key|size_bytes"
+  streaming returns {stream_url: "/api/v1/stream/"}
+  catalogue returns stream_url to browser
+```
+Token TTL: 1 hour. After that, NATS discards it automatically.
+
+### Play video (each range request)
+```
+Browser → nginx → streaming GET /stream/ Range: bytes=X-Y
+  streaming reads NATS KV (resolve token → s3_key + size_bytes)
+  streaming → S3 GET object with byte range (aiobotocore, fully async)
+  streams bytes back to browser
+  no Postgres, no catalogue HTTP call
+```
+The browser sends many range requests for a single video. Each one costs only a NATS lookup + S3 range-get.
+
+### Ingest a video (admin only)
+```
+curl/frontend → nginx → ingest POST /ingest/upload (multipart)
+  ingest verifies JWT (admin role required)
+  ingest → S3 upload file → media/{uuid}.ext
+  ingest → S3 head_object → size_bytes
+  ingest runs PyAV (in threadpool):
+    - reads S3 via range-gets → extracts duration, codec, width, height, fps
+    - decodes first video frame → JPEG → S3 thumbnails/{uuid}.jpg
+  ingest → catalogue POST /catalogue/ {s3_key, size_bytes, metadata...}
+  catalogue writes Postgres
+  catalogue publishes NATS: catalogue.events.media.published
+  returns catalogue item JSON
+```
+
+---
+
+## JWT flow
+
+Auth uses **RS256** (asymmetric). The private key signs tokens; all other services hold only the public key and verify locally — no auth HTTP call on every request.
+
+Revoked tokens are stored as keys in a NATS KV bucket (`revoked-tokens`). Streaming checks this bucket on token issue, not on every range request.
+
+---
+
+## Data ownership
+
+```
+Postgres    auth        users, hashed passwords, roles
+            catalogue   media items, all metadata fields
+
+NATS KV     streaming   stream tokens (s3_key + size_bytes embedded)
+            auth        revoked JWT list
+
+S3          ingest      video files → media/
+            ingest      thumbnails → thumbnails/
+            (read)      streaming reads media/ for range delivery
+            (read)      ingest/PyAV reads media/ for metadata extraction
+```
+
+---
+
+## Inter-service HTTP calls
+
+| Caller | Callee | When |
+|--------|--------|------|
+| catalogue | streaming | Stream token request — passes s3_key + size_bytes |
+| ingest | catalogue | After upload — registers the media item |
+
+All other cross-service communication is either direct DB access (own service only) or NATS pub/sub. Services do **not** query each other's databases.
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..9eea24d
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,31 @@
+# syntax=docker/dockerfile:1.4
+ARG SERVICE=streaming
+
+# ── Stage 1: Build ────────────────────────────────────────────────────────────
+FROM git.unkin.net/unkin/almalinux9-base:20260308 AS builder
+
+RUN dnf install -y jellyfin-ffmpeg-bin && dnf clean all
+
+WORKDIR /app
+COPY pyproject.toml uv.lock* ./
+COPY src/ src/
+RUN uv sync --frozen --no-dev --python 3.12
+
+# ── Stage 2: Runtime ──────────────────────────────────────────────────────────
+FROM git.unkin.net/unkin/almalinux9-base:20260308 AS runtime
+
+RUN dnf install -y jellyfin-ffmpeg-bin && dnf clean all
+
+WORKDIR /app
+COPY --from=builder /app/.venv /app/.venv
+COPY --from=builder /app/src /app/src
+
+ARG SERVICE=streaming
+ENV SERVICE=${SERVICE} \
+    PATH="/app/.venv/bin:$PATH" \
+    PYTHONPATH="/app/src"
+
+EXPOSE 8000
+
+COPY --chmod=755 scripts/entrypoint.sh /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]
diff --git a/Dockerfile.nginx b/Dockerfile.nginx
new file mode 100644
index 0000000..770e699
--- /dev/null
+++ b/Dockerfile.nginx
@@ -0,0 +1,21 @@
+# syntax=docker/dockerfile:1.4
+
+# Stage 1: build the React frontend
+FROM node:22-alpine AS frontend-build
+
+WORKDIR /app
+COPY frontend/package*.json ./
+RUN npm install
+COPY frontend/ .
+
+ARG VITE_GUEST_EMAIL=guest@streamstack.local
+ARG VITE_GUEST_PASSWORD=streamstack-guest
+RUN VITE_GUEST_EMAIL=$VITE_GUEST_EMAIL \
+    VITE_GUEST_PASSWORD=$VITE_GUEST_PASSWORD \
+    npm run build
+
+# Stage 2: nginx serves the frontend and proxies API calls
+FROM nginx:1.26-alpine
+
+COPY --from=frontend-build /app/dist /usr/share/nginx/html
+COPY nginx/conf.d/streamstack.conf /etc/nginx/conf.d/default.conf
diff --git a/Dockerfile.test b/Dockerfile.test
new file mode 100644
index 0000000..fe85663
--- /dev/null
+++ b/Dockerfile.test
@@ -0,0 +1,14 @@
+FROM git.unkin.net/unkin/almalinux9-base:20260308
+
+RUN dnf install -y \
+    gcc gcc-c++ make ffmpeg-devel libpq-devel && dnf clean all
+
+WORKDIR /app
+COPY pyproject.toml uv.lock* ./
+COPY src/ src/
+COPY tests/ tests/
+COPY testdata/ testdata/
+
+RUN uv sync --frozen --extra dev --python 3.12
+
+ENV PYTHONPATH="/app/src" PATH="/app/.venv/bin:$PATH"
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..92cca15
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,30 @@
+.PHONY: test lint lint-fix up down up-test down-test migrate migration
+
+test:
+	uv run --extra dev pytest tests/ -v --cov=src/streamstack --cov-report=term-missing
+
+lint:
+	uv run --extra lint ruff check src/ tests/
+	uv run --extra lint ruff format --check src/ tests/
+
+lint-fix:
+	uv run --extra lint ruff check --fix src/ tests/
+	uv run --extra lint ruff format src/ tests/
+
+up:
+	docker compose up -d --build
+
+down:
+	docker compose down -v
+
+up-test:
+	docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
+
+down-test:
+	docker compose -f docker-compose.test.yml down -v
+
+migrate:
+	uv run alembic upgrade head
+
+migration:
+	uv run alembic revision --autogenerate -m "$(MSG)"
diff --git a/SPEC.md b/SPEC.md
new file mode 100644
index 0000000..b0c2d77
--- /dev/null
+++ b/SPEC.md
@@ -0,0 +1,41 @@
+welcome to stream stack.
+
+this project is to build a media streaming service comprised of a number of microservices and multiple frontends (desktop, mobile, admin). the aim is that every component is highly available and load balanced. state is shared between all processes through NATS, and persistent data is stored in pgsql or s3 (depending on the data). the backends should all be built using fastapi. each backend service should be able to run independently (should it be one pypi package that we enable features for different modules, or a different pypi package for each system?)
+
+the frontend services should be in a fast and responsive language that will consume the fastapi services (react maybe?). there should be a "router" service that the frontend talks to, which proxies connections to the appropriate backend, or should we put different services on different dns addresses?
+
+question: can we stream media from s3? will that enable skipping forward/backwards?
+
+ensure there are unit tests for all files (in tests/)
+add a makefile that tests the unit tests (make test) using uvx (so we don't need to install any requirements permanently)
+add makefile test for linting with ruff
+add Dockerfile to run the streamstack (with booleans to enable different microservices)
+  - this should use git.unkin.net/unkin/almalinux9-base:latest to build, then the uv container (dhi.io/uv:0.11) to run
+add docker-compose for e2e testing of the stack (with makefile targets to start/stop)
+
+
+required projects:
+- https://github.com/pyav-org/pyav
+- https://github.com/fastapi/fastapi
+- https://github.com/nats-io/nats.py
+
+phase 1:
+- build a backend microservice that can read media files with ffmpeg (pyav) from s3 and stream them. the url to stream the media should not include the name of the media. the url should be openable in mpv for testing.
+
+phase 2:
+- build a microservice that presents the media catalogue. this will be used by the frontend later to list media available.
+
+phase 3:
+- build auth microservice. it should be a jwt provider. when a user authenticates, they have a jwt kept somewhere that is passed to each microservice for each request. each microservice should then verify the jwt against the auth microservice.
+
+phase 4:
+- import microservice, for importing video into s3, adding to catalogue, finding metadata (thumbnail, actors, etc)
+
+phase 5:
+- simple react frontend (this is just for testing. no auth. just show catalogue and when you click on an item, play that video)
+- the frontend should be its own container, so that it can be run in a DMZ
+
+
+additional requirements:
+  keep track of where a user is up to in a given video, so that when they replay it, it starts from a few seconds before where they stopped.
+  when streaming video, send bursts of video to the user so that it caches on the client side
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..38edb0f
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,16 @@
+# TODO
+
+- Transcode MKV uploads to MP4 during ingest — browsers (Firefox/Chrome) cannot natively play MKV containers, so Jellyfish-style uploads fail to load in the video player.
+- IMDB metadata microservice — subscribe to `catalogue.events.media.published` (durable consumer `"imdb-fetcher"`), look up title/year against IMDB API, patch catalogue with enriched metadata (rating, genre, plot, cast).
+- Subtitle fetcher microservice — subscribe to `catalogue.events.media.published` (durable consumer `"subtitle-fetcher"`), fetch subtitles (e.g. OpenSubtitles API), store as `.vtt` in S3, update catalogue with subtitle_s3_key. Frontend `