docs: authentication & authorization system design (epic #79) #95

Open
unkinben wants to merge 1 commit from benvin/auth-design-doc into master
# Authentication & Authorization — Design
Status: **proposed** (tracking issue #79)
Today ArtifactAPI has no authentication: every proxy and management request is
served unconditionally. This document describes an auth/authz system that adds
identity and path-scoped authorization **without changing behaviour until an
operator turns enforcement on** — the default policy is fully open.
## Goals
- Identify callers as one of two principal kinds: **service accounts** and **users**.
- Authorize each request against a **path + capability** ACL model.
- Let **Vault/OpenBao** mint short-lived tokens so the Terraform provider can get
just-in-time credentials to make config changes.
- Ship **default-open**: an unconfigured deployment behaves exactly as today.
## Non-goals (initial phase)
- Per-object encryption, signing, or content trust.
- Rate limiting / quotas (separate concern).
- Multi-tenancy beyond what path ACLs express.
## Principals
| Kind | Authenticates with | Created by | Lifetime |
|---|---|---|---|
| Service account — static token | `Authorization: Bearer <token>` | admin via management API | until revoked |
| Service account — dynamic token | `Authorization: Bearer <token>` | Vault secrets engine → mint endpoint | lease TTL (auto-revoked) |
| User | UI session cookie (OIDC/LDAP login) | external IdP, first-seen on login | session TTL |
A **service account** is a named identity holding a set of ACL grants. It may
have any number of associated tokens (static, or dynamic ones minted by Vault).
A **user** is an identity resolved from an external IdP; group membership from
the IdP maps to ACL grants.
## Tokens
- Format: `aapi_<base62(32 random bytes)>`. The `aapi_` prefix makes tokens
greppable and lets us reject obviously-malformed values cheaply.
- Storage: only the **SHA-256 of the token** is stored, never the plaintext.
Lookup hashes the presented token and matches by hash.
- Each token row carries: id, principal (service account) ref, sha256, optional
label, `expires_at` (null = non-expiring), `created_at`, `last_used_at`.
- Revocation: delete the row (static), or let the Vault lease revocation call
the revoke endpoint (dynamic).
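A minimal sketch of the token format and hash-only storage described above. The base62 alphabet ordering and the helper names (`new_token`, `token_digest`, `looks_like_token`) are assumptions made for illustration; the design fixes only the `aapi_` prefix, the 32 random bytes, and SHA-256 storage.

```python
import hashlib
import secrets
import string

# Assumed alphabet ordering; the design only says "base62".
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
PREFIX = "aapi_"


def base62_encode(raw: bytes) -> str:
    """Encode bytes as base62 by treating them as one big-endian integer."""
    n = int.from_bytes(raw, "big")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def new_token() -> str:
    """Return a plaintext token; it is shown once and never stored."""
    return PREFIX + base62_encode(secrets.token_bytes(32))


def token_digest(token: str) -> str:
    """Hex SHA-256 of the token, the only value that is persisted."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def looks_like_token(value: str) -> bool:
    """Cheap pre-check that rejects malformed values before any lookup."""
    body = value[len(PREFIX):]
    return value.startswith(PREFIX) and body != "" and all(c in BASE62 for c in body)
```

On each request the server would run `looks_like_token` first, then look up `token_digest(presented)` against the stored hash column.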
## ACL model
A grant is `(path_pattern, capability)`. A principal is allowed an action iff at
least one of its grants matches the request's resource path and capability.
### Resource paths
```
remote/<remote-name>/<path-in-remote> # proxy + local repo objects
virtual/<virtual-name>/<path> # virtual repo reads
admin/remotes/<remote-name> # manage a remote definition
admin/virtuals/<virtual-name> # manage a virtual definition
admin/principals/<name> # manage service accounts / tokens
```
Patterns support a trailing `*` wildcard and `<segment>/*` prefixes, e.g.
`remote/dockerhub/*`, `remote/*`, `admin/*`. Matching is longest-prefix by
segment; an exact match always wins over a wildcard.
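The matching rule can be sketched as follows. This is an illustration under stated assumptions, not the final matcher: it assumes `remote/*` covers paths below `remote` but not `remote` itself, and it represents a grant as a `(path_pattern, capability)` tuple. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Grant = Tuple[str, str]  # (path_pattern, capability)


def pattern_matches(pattern: str, path: str) -> bool:
    """Segment-wise match: exact, or a prefix followed by a trailing '*'."""
    pat = pattern.split("/")
    seg = path.split("/")
    if pat[-1] == "*":
        prefix = pat[:-1]
        # Assumption: 'remote/*' covers paths below 'remote', not 'remote' itself.
        return len(seg) > len(prefix) and seg[:len(prefix)] == prefix
    return pat == seg


def is_allowed(grants: Iterable[Grant], path: str, capability: str) -> bool:
    """Allow iff at least one grant matches both the path and the capability."""
    return any(
        cap == capability and pattern_matches(pat, path)
        for pat, cap in grants
    )
```

Because grants only ever allow, this sketch returns on any match. The longest-prefix ordering would matter only if the matcher also had to report which grant was selected, for example in audit logs.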
### Capabilities
| Capability | Meaning for `remote/...` | Meaning for `admin/...` |
|---|---|---|
| `read` | GET/HEAD an artifact | GET a definition |
| `create` | first upload of a new local file | create a new definition |
| `write` | overwrite / re-publish | update an existing definition |
| `delete` | remove an object | delete a definition |
The HTTP layer maps each route to `(resource path, capability)`:
| Route | Resource | Capability |
|---|---|---|
| `GET /api/v1/remote/{r}/*`, `/v2/{r}/*` | `remote/{r}/{path}` | `read` |
| `GET /api/v1/virtual/{v}/*` | `virtual/{v}/{path}` | `read` |
| `PUT /api/v2/remotes/{r}/files/*` (new file) | `remote/{r}/{path}` | `create` |
| `PUT ...` (existing file) | `remote/{r}/{path}` | `write` |
| `DELETE /api/v2/remotes/{r}/files/*` | `remote/{r}/{path}` | `delete` |
| `POST /api/v2/remotes` | `admin/remotes/{name}` | `create` |
| `PUT /api/v2/remotes/{r}` | `admin/remotes/{r}` | `write` |
| `DELETE /api/v2/remotes/{r}` | `admin/remotes/{r}` | `delete` |
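The route table above could be expressed as data plus a small resolver. The sketch below is one possible shape, assuming regex-based routing; the `resolve` function, the `file_exists` callback (used to pick `create` vs `write` on `PUT`), and the `body_name` parameter (the remote name taken from the `POST` body) are hypothetical names.

```python
import re

# (method, path regex, resource template, capability); a capability of None
# means it depends on whether the target file already exists.
ROUTES = [
    ("GET", re.compile(r"^/api/v1/remote/(?P<r>[^/]+)/(?P<path>.+)$"),
     "remote/{r}/{path}", "read"),
    ("GET", re.compile(r"^/v2/(?P<r>[^/]+)/(?P<path>.+)$"),
     "remote/{r}/{path}", "read"),
    ("GET", re.compile(r"^/api/v1/virtual/(?P<r>[^/]+)/(?P<path>.+)$"),
     "virtual/{r}/{path}", "read"),
    ("PUT", re.compile(r"^/api/v2/remotes/(?P<r>[^/]+)/files/(?P<path>.+)$"),
     "remote/{r}/{path}", None),
    ("DELETE", re.compile(r"^/api/v2/remotes/(?P<r>[^/]+)/files/(?P<path>.+)$"),
     "remote/{r}/{path}", "delete"),
    ("PUT", re.compile(r"^/api/v2/remotes/(?P<r>[^/]+)$"),
     "admin/remotes/{r}", "write"),
    ("DELETE", re.compile(r"^/api/v2/remotes/(?P<r>[^/]+)$"),
     "admin/remotes/{r}", "delete"),
]


def resolve(method, path, file_exists=lambda resource: False, body_name=None):
    """Return (resource, capability) for a request, or None if unmapped."""
    if method == "POST" and path == "/api/v2/remotes":
        # The new remote's name comes from the request body, not the URL.
        return (f"admin/remotes/{body_name}", "create") if body_name else None
    for route_method, regex, template, capability in ROUTES:
        match = regex.match(path)
        if route_method != method or match is None:
            continue
        resource = template.format(**match.groupdict())
        if capability is None:
            capability = "write" if file_exists(resource) else "create"
        return resource, capability
    return None
```

For example, `resolve("PUT", "/api/v2/remotes/local/files/a/b.txt")` yields `("remote/local/a/b.txt", "create")` when the file does not exist yet.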
## Enforcement middleware & default-open
A single middleware runs before the proxy/management handlers:
1. Resolve the principal from the request (bearer token → service account, or
session cookie → user). No credential → the **anonymous** principal.
2. Compute `(resource, capability)` for the route.
3. If **enforcement is disabled** (default), allow. Otherwise, evaluate the
principal's grants (including the anonymous principal's grants) and allow iff
a grant matches; else 401 (no/invalid credential) or 403 (authenticated but
unauthorized).
Enforcement is controlled by a single setting, `AUTH_ENFORCE` (default `false`).
While `false`, the middleware still *resolves* the principal (so `last_used_at`
and audit logging work) but never denies — making rollout observable before it
is enforced. The **anonymous** principal is seeded with `*` → all capabilities,
so even flipping `AUTH_ENFORCE=true` with no other config keeps the deployment
open until an admin tightens the anonymous grants.
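The decision logic of the middleware can be sketched as a pure function. This is an illustration only: `Principal`, `decide`, and the `enforce` flag (standing in for `AUTH_ENFORCE`) are hypothetical names, and the sketch assumes that an authenticated principal also inherits the anonymous principal's grants.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ALL_CAPS = ("read", "create", "write", "delete")


@dataclass
class Principal:
    name: str
    grants: List[Tuple[str, str]] = field(default_factory=list)


def matches(pattern: str, path: str) -> bool:
    """Exact segment match, or a prefix ending in a trailing '*'."""
    pat, seg = pattern.split("/"), path.split("/")
    if pat[-1] == "*":
        return len(seg) >= len(pat) and seg[: len(pat) - 1] == pat[:-1]
    return pat == seg


def decide(principal: Optional[Principal], anonymous: Principal,
           resource: str, capability: str, enforce: bool = False) -> int:
    """Return an HTTP status: 200 to continue, 401 or 403 to reject."""
    if not enforce:
        return 200  # observe-only: the principal is resolved but never denied
    authenticated = principal is not None
    # Assumption: anonymous grants apply to every caller.
    grants = list(anonymous.grants)
    if authenticated:
        grants += principal.grants
    if any(cap == capability and matches(pat, resource) for pat, cap in grants):
        return 200
    return 403 if authenticated else 401
```

With the seeded anonymous grants (`*` for every capability), `decide` returns 200 for all requests even when `enforce` is true, which matches the default-open behaviour described above.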
## Vault integration
### Mint endpoint (artifactapi side)
`POST /api/v2/auth/tokens:mint` is restricted to trusted callers (in practice,
the Vault secrets engine). It creates a dynamic token bound to a named service
account with a caller-supplied TTL and returns the plaintext exactly once.
`DELETE /api/v2/auth/tokens/{id}` revokes it.
Trust between Vault and artifactapi rests on a dedicated **bootstrap service
account** whose static token is stored in the Vault engine's `config` path.
Callers of the mint endpoint must hold the `write` capability on
`admin/principals/*`. (mTLS is a future hardening option.)
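The mint and revoke lifecycle might look like the in-memory sketch below. `TokenStore` and its methods are hypothetical stand-ins for the `auth_tokens` table. The token body uses hex rather than base62 to keep the example short, the id format is invented, and checking `expires_at` at lookup time is an assumption.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple


class TokenStore:
    """In-memory stand-in for the auth_tokens table."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}  # token id -> row

    def mint(self, service_account: str, ttl_seconds: int,
             now: Optional[float] = None) -> Tuple[str, str]:
        """Create a dynamic token; the plaintext is returned only here."""
        now = time.time() if now is None else now
        plaintext = "aapi_" + secrets.token_hex(32)  # simplified body encoding
        token_id = secrets.token_hex(8)  # hypothetical id format
        self._rows[token_id] = {
            "principal": service_account,
            "sha256": hashlib.sha256(plaintext.encode("utf-8")).hexdigest(),
            "expires_at": now + ttl_seconds,
        }
        return token_id, plaintext

    def revoke(self, token_id: str) -> bool:
        """Delete the row; returns False if the id was already gone."""
        return self._rows.pop(token_id, None) is not None

    def resolve(self, presented: str, now: Optional[float] = None) -> Optional[str]:
        """Map a presented token to its service account, honouring expiry."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
        for row in self._rows.values():
            if row["sha256"] == digest and now < row["expires_at"]:
                return row["principal"]
        return None
```

A Vault lease would call `mint` when credentials are requested and `revoke` with the stored id when the lease ends; the expiry check covers the case where a revoke call never arrives.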
### `vault-plugin-secrets-artifactapi` (new repo)
Mirrors [`vault-plugin-secrets-litellm`](https://git.unkin.net/unkin/vault-plugin-secrets-litellm):
HashiCorp `vault/sdk`, OpenBao-compatible single binary. Paths:
- `config` — artifactapi base URL + bootstrap token.
- `roles/<name>` — target service account + default/max TTL.
- `creds/<name>` — mint a dynamic token (calls the mint endpoint); the Vault
lease's revoke calls the revoke endpoint.
E2e (`make e2e`) spins up Postgres, MinIO, Redis, artifactapi, Vault, and OpenBao
in Docker and exercises the full lease lifecycle against both engines. On the
Fedora host, all bind mounts need the `:z` suffix so that SELinux relabels them.
## User login (OIDC/LDAP) & UI
- `GET /api/v2/auth/login` starts an OIDC auth-code flow (or LDAP bind form);
`GET /api/v2/auth/callback` establishes a signed session cookie.
- IdP groups map to service-account-style grants via configurable group→grant
rules. Existing infra: `terraform-authentik`, `terraform-ldap`.
- The React UI gains a login state and sends the session cookie; management
screens hide actions the principal lacks.
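The group→grant rules could be as simple as a lookup table. The sketch below is hypothetical: the group names, the `GROUP_RULES` table, and `grants_for_user` are invented for illustration; only the `user:<sub>` principal naming comes from the data model.

```python
from typing import Dict, Iterable, List, Set, Tuple

Grant = Tuple[str, str]  # (path_pattern, capability)

# Hypothetical rule table: IdP group name -> grants.
GROUP_RULES: Dict[str, List[Grant]] = {
    "artifact-readers": [("remote/*", "read"), ("virtual/*", "read")],
    "artifact-admins": [("admin/*", "read"), ("admin/*", "write")],
}


def grants_for_user(sub: str, groups: Iterable[str],
                    rules: Dict[str, List[Grant]] = GROUP_RULES) -> Tuple[str, Set[Grant]]:
    """Return the principal name ('user:<sub>') and the union of mapped grants."""
    grants: Set[Grant] = set()
    for group in groups:
        # Groups without a rule contribute nothing.
        grants.update(rules.get(group, []))
    return f"user:{sub}", grants
```

Computing the union at login keeps evaluation identical for users and service accounts: both end up as a principal name plus a set of grants.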
## Terraform provider
`terraform-provider-artifactapi` gains a `token` attribute (and
`ARTIFACTAPI_TOKEN` env var) sent as `Authorization: Bearer`. In CI the token is
sourced from the Vault engine above, so config changes use short-lived creds.
## Data model (new tables, additive migration)
```sql
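-- Column types are illustrative assumptions; the design fixes only names and keys.
CREATE TABLE IF NOT EXISTS service_accounts (
  name TEXT PRIMARY KEY, description TEXT, disabled BOOLEAN, created_at TIMESTAMPTZ);
CREATE TABLE IF NOT EXISTS auth_tokens (
  id BIGSERIAL PRIMARY KEY,
  principal TEXT REFERENCES service_accounts(name) ON DELETE CASCADE,
  token_sha256 TEXT UNIQUE, label TEXT, expires_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ, last_used_at TIMESTAMPTZ);
CREATE TABLE IF NOT EXISTS acl_grants (
  id BIGSERIAL PRIMARY KEY, principal TEXT, path_pattern TEXT, capability TEXT,
  UNIQUE (principal, path_pattern, capability));
-- principal = a service account name, the reserved 'anonymous', or 'user:<sub>'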
```
All tables are created with `CREATE TABLE IF NOT EXISTS` alongside the existing
inline migrations; adding them changes no current behaviour.
## Rollout / phased delivery
Each phase is a separate PR; the system stays open until enforcement is
deliberately enabled in phase 6.
1. **Data model + resolution** — tables, token hashing, principal resolution
middleware in **observe-only** mode (never denies). Seed anonymous `*`.
2. **ACL evaluation** — grant matching + `(resource, capability)` route mapping,
still gated by `AUTH_ENFORCE=false`.
3. **Management API** — CRUD for service accounts, tokens, grants.
4. **Vault mint/revoke endpoints** + bootstrap trust.
5. **`vault-plugin-secrets-artifactapi`** (new repo), plus the `terraform-vault`
role and policies and the `argocd-apps` deployment.
6. **OIDC/LDAP user login + UI**, Terraform provider `token`, and the switch to
enable enforcement in an environment.
## Cross-repo dependencies
- `terraform-vault` — mount the secrets engine, define `roles/*`, ACL policies,
and the K8s auth role the Terraform CI uses.
- `argocd-apps` — deploy the plugin sidecar/init and any ServiceAccount.
- `terraform-provider-artifactapi`: the new `token` attribute.
- `terraform-authentik` / `terraform-ldap` — IdP client + group mappings.