Compare commits

1 Commits

Author SHA1 Message Date
unkinben b698d1bdc0 perf: memoise regex compilation in the classifier
ci/woodpecker/pr/test Pipeline was successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
Classify runs on every proxied request and previously recompiled each
remote's pattern lists on every call. Cache compiled regexes keyed by pattern
text so each distinct pattern compiles once and is reused thereafter.

Refs #73
2026-07-02 00:33:46 +10:00
2 changed files with 21 additions and 186 deletions
-185
@@ -1,185 +0,0 @@
# Authentication & Authorization — Design
Status: **proposed** (tracking issue #79)
Today ArtifactAPI has no authentication: every proxy and management request is
served unconditionally. This document describes an auth/authz system that adds
identity and path-scoped authorization **without changing behaviour until an
operator turns enforcement on** — the default policy is fully open.
## Goals
- Identify callers as one of two principal kinds: **service accounts** and **users**.
- Authorize each request against a **path + capability** ACL model.
- Let **Vault/OpenBao** mint short-lived tokens so the Terraform provider can get
just-in-time credentials to make config changes.
- Ship **default-open**: an unconfigured deployment behaves exactly as today.
## Non-goals (initial phase)
- Per-object encryption, signing, or content trust.
- Rate limiting / quotas (separate concern).
- Multi-tenancy beyond what path ACLs express.
## Principals
| Kind | Authenticates with | Created by | Lifetime |
|---|---|---|---|
| Service account — static token | `Authorization: Bearer <token>` | admin via management API | until revoked |
| Service account — dynamic token | `Authorization: Bearer <token>` | Vault secrets engine → mint endpoint | lease TTL (auto-revoked) |
| User | UI session cookie (OIDC/LDAP login) | external IdP, first-seen on login | session TTL |
A **service account** is a named identity holding a set of ACL grants. It may
have any number of associated tokens (static, or dynamic ones minted by Vault).
A **user** is an identity resolved from an external IdP; group membership from
the IdP maps to ACL grants.
## Tokens
- Format: `aapi_<base62(32 random bytes)>`. The `aapi_` prefix makes tokens
greppable and lets us reject obviously-malformed values cheaply.
- Storage: only the **SHA-256 of the token** is stored, never the plaintext.
Lookup hashes the presented token and matches by hash.
- Each token row carries: id, principal (service account) ref, sha256, optional
label, `expires_at` (null = non-expiring), `created_at`, `last_used_at`.
- Revocation: delete the row (static) or Vault lease revoke → mint endpoint
revoke (dynamic).
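The token rules above can be sketched in Go as follows. This is a minimal illustration, not the project's implementation: the function names, the in-memory `store` map (hash to principal name) and the `ci-deployer` account are all hypothetical. It also assumes `big.Int.Text(62)` is an acceptable base62 encoding, which drops leading zero bytes and so yields tokens of slightly varying length.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/big"
	"strings"
)

const tokenPrefix = "aapi_"

// newToken returns a plaintext token: the prefix followed by the base62
// encoding of 32 random bytes.
func newToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	// Text(62) uses the digits 0-9, a-z, A-Z.
	return tokenPrefix + new(big.Int).SetBytes(buf).Text(62), nil
}

// hashToken returns the hex SHA-256 digest that would be stored instead of
// the plaintext.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// lookup rejects values without the prefix cheaply, then matches by hash.
// store is a hypothetical stand-in for the token table.
func lookup(store map[string]string, presented string) (string, bool) {
	if !strings.HasPrefix(presented, tokenPrefix) {
		return "", false
	}
	principal, ok := store[hashToken(presented)]
	return principal, ok
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	store := map[string]string{hashToken(tok): "ci-deployer"}
	principal, ok := lookup(store, tok)
	fmt.Println(principal, ok)
}
```

Expiry (`expires_at`) and `last_used_at` updates are left out to keep the sketch short.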
## ACL model
A grant is `(path_pattern, capability)`. A principal is allowed an action iff at
least one of its grants matches the request's resource path and capability.
### Resource paths
```
remote/<remote-name>/<path-in-remote> # proxy + local repo objects
virtual/<virtual-name>/<path> # virtual repo reads
admin/remotes/<remote-name> # manage a remote definition
admin/virtuals/<virtual-name> # manage a virtual definition
admin/principals/<name> # manage service accounts / tokens
```
Patterns support a trailing `*` wildcard and `<segment>/*` prefixes, e.g.
`remote/dockerhub/*`, `remote/*`, `admin/*`. Matching is longest-prefix by
segment; an exact match always wins over a wildcard.
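A minimal Go sketch of the pattern matching described above, under stated assumptions: the `grant` type and function names are hypothetical, a bare `*` matches every path, and a `<prefix>/*` pattern matches only paths below that prefix (not the prefix itself). Because grants only ever allow, this sketch checks whether any grant matches and does not implement precedence between an exact match and a wildcard.

```go
package main

import (
	"fmt"
	"strings"
)

// grant is a hypothetical in-memory form of (path_pattern, capability).
type grant struct {
	Pattern    string
	Capability string
}

// matchPattern reports whether path is covered by pattern.
func matchPattern(pattern, path string) bool {
	if pattern == "*" {
		return true
	}
	if strings.HasSuffix(pattern, "/*") {
		// Keep the trailing slash so the prefix ends on a segment boundary:
		// "remote/dockerhub/*" must not match "remote/dockerhub-mirror/x".
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == path
}

// allowed reports whether at least one grant matches both path and capability.
func allowed(grants []grant, path, capability string) bool {
	for _, g := range grants {
		if g.Capability == capability && matchPattern(g.Pattern, path) {
			return true
		}
	}
	return false
}

func main() {
	grants := []grant{
		{Pattern: "remote/dockerhub/*", Capability: "read"},
		{Pattern: "admin/remotes/pypi", Capability: "write"},
	}
	fmt.Println(allowed(grants, "remote/dockerhub/library/alpine", "read"))
	fmt.Println(allowed(grants, "remote/dockerhub/library/alpine", "delete"))
	fmt.Println(allowed(grants, "admin/remotes/pypi", "write"))
}
```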
### Capabilities
| Capability | Meaning for `remote/...` | Meaning for `admin/...` |
|---|---|---|
| `read` | GET/HEAD an artifact | GET a definition |
| `create` | first upload of a new local file | create a new definition |
| `write` | overwrite / re-publish | update an existing definition |
| `delete` | remove an object | delete a definition |
The HTTP layer maps each route to `(resource path, capability)`:
| Route | Resource | Capability |
|---|---|---|
| `GET /api/v1/remote/{r}/*`, `/v2/{r}/*` | `remote/{r}/{path}` | `read` |
| `GET /api/v1/virtual/{v}/*` | `virtual/{v}/{path}` | `read` |
| `PUT /api/v2/remotes/{r}/files/*` (new file) | `remote/{r}/{path}` | `create` |
| `PUT ...` (existing file) | `remote/{r}/{path}` | `write` |
| `DELETE /api/v2/remotes/{r}/files/*` | `remote/{r}/{path}` | `delete` |
| `POST /api/v2/remotes` | `admin/remotes/{name}` | `create` |
| `PUT /api/v2/remotes/{r}` | `admin/remotes/{r}` | `write` |
| `DELETE /api/v2/remotes/{r}` | `admin/remotes/{r}` | `delete` |
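The route table can be read as a pure function from request to `(resource, capability)`. The following Go sketch assumes plain string matching on the URL path; `mapRoute` and its `fileExists` parameter are hypothetical names, and `POST /api/v2/remotes` is omitted because the remote name comes from the request body rather than the URL.

```go
package main

import (
	"fmt"
	"strings"
)

// mapRoute returns the ACL resource path and capability for a request, or
// ok=false when this sketch does not cover the route. fileExists is a
// hypothetical input that tells PUT whether the target file already exists.
func mapRoute(method, urlPath string, fileExists bool) (resource, capability string, ok bool) {
	switch {
	case method == "GET" && strings.HasPrefix(urlPath, "/api/v1/remote/"):
		return "remote/" + strings.TrimPrefix(urlPath, "/api/v1/remote/"), "read", true
	case method == "GET" && strings.HasPrefix(urlPath, "/api/v1/virtual/"):
		return "virtual/" + strings.TrimPrefix(urlPath, "/api/v1/virtual/"), "read", true
	case method == "GET" && strings.HasPrefix(urlPath, "/v2/"):
		return "remote/" + strings.TrimPrefix(urlPath, "/v2/"), "read", true
	case strings.HasPrefix(urlPath, "/api/v2/remotes/"):
		name, tail, hasTail := strings.Cut(strings.TrimPrefix(urlPath, "/api/v2/remotes/"), "/")
		if name == "" {
			return "", "", false
		}
		if !hasTail {
			// The request targets the remote definition itself.
			switch method {
			case "PUT":
				return "admin/remotes/" + name, "write", true
			case "DELETE":
				return "admin/remotes/" + name, "delete", true
			}
			return "", "", false
		}
		if !strings.HasPrefix(tail, "files/") {
			return "", "", false
		}
		res := "remote/" + name + "/" + strings.TrimPrefix(tail, "files/")
		switch method {
		case "PUT":
			if fileExists {
				return res, "write", true
			}
			return res, "create", true
		case "DELETE":
			return res, "delete", true
		}
	}
	return "", "", false
}

func main() {
	res, capability, ok := mapRoute("PUT", "/api/v2/remotes/local/files/charts/app.tgz", false)
	fmt.Println(res, capability, ok)
}
```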
## Enforcement middleware & default-open
A single middleware runs before the proxy/management handlers:
1. Resolve the principal from the request (bearer token → service account, or
session cookie → user). No credential → the **anonymous** principal.
2. Compute `(resource, capability)` for the route.
3. If **enforcement is disabled** (default), allow. Otherwise, evaluate the
principal's grants (including the anonymous principal's grants) and allow iff
a grant matches; else 401 (no/invalid credential) or 403 (authenticated but
unauthorized).
Enforcement is controlled by a single setting, `AUTH_ENFORCE` (default `false`).
While `false`, the middleware still *resolves* the principal (so `last_used_at`
and audit logging work) but never denies — making rollout observable before it
is enforced. The **anonymous** principal is seeded with `*` → all capabilities,
so even flipping `AUTH_ENFORCE=true` with no other config keeps the deployment
open until an admin tightens the anonymous grants.
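The three middleware steps and the default-open behaviour might look like the Go sketch below. Everything here is illustrative: `authConfig`, its `Resolve` and `Allowed` hooks, `demoConfig`, and the `aapi_reader` / `aapi_other` tokens are hypothetical, and session cookies are omitted. `statusFor` is only a helper that runs one request through the middleware with `net/http/httptest`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// authConfig bundles the enforcement switch with hypothetical lookup hooks.
type authConfig struct {
	Enforce bool
	Resolve func(token string) (principal string, ok bool)
	Allowed func(principal, method, path string) bool
}

func authMiddleware(cfg authConfig, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Step 1: resolve the principal; no credential means anonymous.
		principal, valid := "anonymous", true
		if h := r.Header.Get("Authorization"); h != "" {
			principal, valid = cfg.Resolve(strings.TrimPrefix(h, "Bearer "))
		}
		switch {
		case !cfg.Enforce: // observe-only: resolve, but never deny
			next.ServeHTTP(w, r)
		case !valid:
			http.Error(w, "invalid credential", http.StatusUnauthorized)
		case cfg.Allowed(principal, r.Method, r.URL.Path):
			next.ServeHTTP(w, r)
		case principal == "anonymous":
			http.Error(w, "credential required", http.StatusUnauthorized)
		default:
			http.Error(w, "forbidden", http.StatusForbidden)
		}
	})
}

// demoConfig knows two tokens; only the "reader" principal is allowed.
func demoConfig(enforce bool) authConfig {
	return authConfig{
		Enforce: enforce,
		Resolve: func(t string) (string, bool) {
			switch t {
			case "aapi_reader":
				return "reader", true
			case "aapi_other":
				return "other", true
			}
			return "", false
		},
		Allowed: func(p, _, _ string) bool { return p == "reader" },
	}
}

// statusFor sends one GET through the middleware and returns the status code.
func statusFor(cfg authConfig, token string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest("GET", "/api/v1/remote/pypi/simple/", nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	authMiddleware(cfg, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	for _, tok := range []string{"", "garbage", "aapi_other", "aapi_reader"} {
		fmt.Printf("token=%q observe=%d enforce=%d\n",
			tok, statusFor(demoConfig(false), tok), statusFor(demoConfig(true), tok))
	}
}
```

In this sketch the anonymous principal has no grants, so enforcement returns 401 for it; seeding anonymous with `*` as described above would correspond to `Allowed` returning true for `"anonymous"`.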
## Vault integration
### Mint endpoint (artifactapi side)
`POST /api/v2/auth/tokens:mint` — restricted to callers Vault trusts. It creates
a dynamic token bound to a named service account with a caller-supplied TTL, and
returns the plaintext once. `DELETE /api/v2/auth/tokens/{id}` revokes it.
Trust between Vault and artifactapi: a dedicated **bootstrap service account**
whose static token is stored in Vault's engine `config`. The mint endpoint
requires `admin/principals/*: write`. (mTLS is a future hardening option.)
### `vault-plugin-secrets-artifactapi` (new repo)
Mirrors [`vault-plugin-secrets-litellm`](https://git.unkin.net/unkin/vault-plugin-secrets-litellm):
HashiCorp `vault/sdk`, OpenBao-compatible single binary. Paths:
- `config` — artifactapi base URL + bootstrap token.
- `roles/<name>` — target service account + default/max TTL.
- `creds/<name>` — mint a dynamic token (calls the mint endpoint); the Vault
lease's revoke calls the revoke endpoint.
E2e (`make e2e`) spins up Postgres + MinIO + Redis + artifactapi + Vault + OpenBao
in Docker and exercises the full lease lifecycle against both engines. On the
Fedora host all bind mounts need `:z` (SELinux).
## User login (OIDC/LDAP) & UI
- `GET /api/v2/auth/login` starts an OIDC auth-code flow (or LDAP bind form);
`GET /api/v2/auth/callback` establishes a signed session cookie.
- IdP groups map to service-account-style grants via configurable group→grant
rules. Existing infra: `terraform-authentik`, `terraform-ldap`.
- The React UI gains a login state and sends the session cookie; management
screens hide actions the principal lacks.
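One way to picture the group→grant rules is a lookup table keyed by IdP group, with a user's effective grants being the deduplicated union over their groups. The Go sketch below assumes exactly that; the `grant` type, `grantsForGroups`, and the group names are hypothetical, and the rule format is not specified by this document.

```go
package main

import (
	"fmt"
	"sort"
)

// grant is a hypothetical in-memory form of (path_pattern, capability).
type grant struct {
	PathPattern string
	Capability  string
}

// grantsForGroups returns the deduplicated union of the grants configured for
// the given groups, sorted for stable output. Unknown groups contribute nothing.
func grantsForGroups(rules map[string][]grant, groups []string) []grant {
	seen := make(map[grant]bool)
	var out []grant
	for _, group := range groups {
		for _, g := range rules[group] {
			if !seen[g] {
				seen[g] = true
				out = append(out, g)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].PathPattern != out[j].PathPattern {
			return out[i].PathPattern < out[j].PathPattern
		}
		return out[i].Capability < out[j].Capability
	})
	return out
}

func main() {
	rules := map[string][]grant{
		"artifact-admins": {{"admin/*", "write"}, {"remote/*", "read"}},
		"developers":      {{"remote/*", "read"}},
	}
	fmt.Println(grantsForGroups(rules, []string{"developers", "artifact-admins"}))
}
```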
## Terraform provider
`terraform-provider-artifactapi` gains a `token` attribute (and
`ARTIFACTAPI_TOKEN` env var) sent as `Authorization: Bearer`. In CI the token is
sourced from the Vault engine above, so config changes use short-lived creds.
## Data model (new tables, additive migration)
```sql
service_accounts(name PK, description, disabled, created_at)
auth_tokens(id PK, principal TEXT REFERENCES service_accounts(name) ON DELETE CASCADE,
token_sha256 TEXT UNIQUE, label, expires_at, created_at, last_used_at)
acl_grants(id PK, principal TEXT, path_pattern TEXT, capability TEXT,
UNIQUE(principal, path_pattern, capability))
-- principal = a service account name, the reserved 'anonymous', or 'user:<sub>'
```
All tables are created with `CREATE TABLE IF NOT EXISTS` alongside the existing
inline migrations; adding them changes no current behaviour.
## Rollout / phased delivery
Each phase is a separate PR; the system stays open until phase 6 is deliberately
enabled.
1. **Data model + resolution** — tables, token hashing, principal resolution
middleware in **observe-only** mode (never denies). Seed anonymous `*`.
2. **ACL evaluation** — grant matching + `(resource, capability)` route mapping,
still gated by `AUTH_ENFORCE=false`.
3. **Management API** — CRUD for service accounts, tokens, grants.
4. **Vault mint/revoke endpoints** + bootstrap trust.
5. **`vault-plugin-secrets-artifactapi`** (new repo) + `terraform-vault` role,
policies; `argocd-apps` deploy.
6. **OIDC/LDAP user login + UI**, Terraform provider `token`, and the switch to
enable enforcement in an environment.
## Cross-repo dependencies
- `terraform-vault` — mount the secrets engine, define `roles/*`, ACL policies,
and the K8s auth role the Terraform CI uses.
- `argocd-apps` — deploy the plugin sidecar/init and any ServiceAccount.
- `terraform-provider-artifactapi`: `token` attribute.
- `terraform-authentik` / `terraform-ldap` — IdP client + group mappings.
+21 -1
@@ -2,6 +2,7 @@ package proxy

 import (
 	"regexp"
+	"sync"

 	"git.unkin.net/unkin/artifactapi/internal/provider"
 	"git.unkin.net/unkin/artifactapi/pkg/models"
@@ -60,10 +61,29 @@ func (c *Classifier) Classify(remote models.Remote, path string) Classification
 	return ClassImmutable
 }

+// patternCache memoises regex compilation. Classify runs on every proxied
+// request and previously recompiled each remote's pattern lists every time;
+// keying by the pattern string lets each distinct pattern compile once and
+// then be reused, with no invalidation needed (the pattern text is the key).
+// A pattern that fails to compile is cached as a typed nil so we don't retry.
+var patternCache sync.Map // map[string]*regexp.Regexp
+
+func compileCached(pattern string) *regexp.Regexp {
+	if v, ok := patternCache.Load(pattern); ok {
+		return v.(*regexp.Regexp)
+	}
+	re, err := regexp.Compile(pattern)
+	if err != nil {
+		re = nil
+	}
+	patternCache.Store(pattern, re)
+	return re
+}
+
 func compilePatterns(patterns []string) []*regexp.Regexp {
 	compiled := make([]*regexp.Regexp, 0, len(patterns))
 	for _, p := range patterns {
-		if re, err := regexp.Compile(p); err == nil {
+		if re := compileCached(p); re != nil {
 			compiled = append(compiled, re)
 		}
 	}
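To see the memoisation in isolation, here is a self-contained Go sketch of the same idea. It is a variant, not the committed code: it uses `sync.Map.LoadOrStore` instead of `Store`, so that concurrent first-time callers all end up with the same `*regexp.Regexp` instance. The example patterns are made up.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// patternCache maps pattern text to *regexp.Regexp. Invalid patterns are
// stored as a typed nil so they are not recompiled on every call.
var patternCache sync.Map

func compileCached(pattern string) *regexp.Regexp {
	if v, ok := patternCache.Load(pattern); ok {
		return v.(*regexp.Regexp)
	}
	// regexp.Compile returns a nil *Regexp alongside a non-nil error.
	re, _ := regexp.Compile(pattern)
	// LoadOrStore keeps whichever value was stored first.
	v, _ := patternCache.LoadOrStore(pattern, re)
	return v.(*regexp.Regexp)
}

// compilePatterns skips patterns that failed to compile.
func compilePatterns(patterns []string) []*regexp.Regexp {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		if re := compileCached(p); re != nil {
			compiled = append(compiled, re)
		}
	}
	return compiled
}

func main() {
	first := compileCached(`^v\d+\.\d+\.\d+$`)
	second := compileCached(`^v\d+\.\d+\.\d+$`)
	fmt.Println("same instance:", first == second)
	fmt.Println("valid patterns:", len(compilePatterns([]string{`\.tgz$`, `(`})))
}
```

The cache is never evicted, which is reasonable only while the set of distinct patterns stays small and comes from configuration rather than from request input.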