# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Veylant IA** — A B2B SaaS platform acting as an intelligent proxy/gateway for enterprise AI consumption. Core value proposition: prevent Shadow AI, enforce PII anonymization, ensure GDPR/EU AI Act compliance, and control costs across all LLM usage in an organization.

Full product requirements are in `docs/AI_Governance_Hub_PRD.md` and the 6-month execution plan (13 sprints, 164 tasks) is in `docs/AI_Governance_Hub_Plan_Realisation.md`. Architecture Decision Records live in `docs/adr/`.

## Architecture

**Go module**: `github.com/veylant/ia-gateway` · **Go version**: 1.24

**Modular monolith** (not microservices), with two distinct runtimes:

```
API Gateway (Traefik)
    │
Go Proxy [cmd/proxy] — chi router, zap logger, viper config
    ├── internal/auth/            Local JWT auth (HS256) — LocalJWTVerifier + LoginHandler (POST /v1/auth/login)
    ├── internal/middleware/      Auth (JWT verification), RateLimit, RequestID, SecurityHeaders
    ├── internal/router/          RBAC enforcement + provider dispatch + fallback chain
    ├── internal/routing/         Rules engine (PostgreSQL JSONB, in-memory cache, priority ASC)
    ├── internal/pii/             gRPC client to PII sidecar + /v1/pii/analyze HTTP handler
    ├── internal/auditlog/        ClickHouse append-only logger (async batch writer)
    ├── internal/compliance/      GDPR Art.30 registry + AI Act classification + PDF reports
    ├── internal/admin/           Admin REST API (/v1/admin/*) — routing rules, users, providers
    ├── internal/billing/         Token cost tracking (per provider pricing)
    ├── internal/circuitbreaker/  Failure-count breaker (threshold=5, open_ttl=60s)
    ├── internal/ratelimit/       Token-bucket limiter (per-tenant + per-user, DB overrides)
    ├── internal/flags/           Feature flags (PostgreSQL + in-memory fallback)
    ├── internal/crypto/          AES-256-GCM encryptor for prompt storage
    ├── internal/metrics/         Prometheus middleware + metrics registration
    ├── internal/provider/        Adapter interface + OpenAI/Anthropic/Azure/Mistral/Ollama impls
    ├── internal/proxy/           Core request handler (PII → upstream → audit → response)
    ├── internal/apierror/        OpenAI-format error helpers (WriteError, WriteErrorWithRequestID)
    ├── internal/health/          /healthz, /docs, /playground, /playground/analyze handlers
    └── internal/config/          Viper-based config loader (VEYLANT_* env var overrides)
    │ gRPC (<2ms) to localhost:50051
PII Detection Service [services/pii] — FastAPI + grpc.aio
    ├── HTTP health: :8091/healthz
    ├── Layer 1: Regex (IBAN, email, phone, SSN, credit cards)
    ├── Layer 2: Presidio + spaCy NER (names, addresses, orgs)
    └── Layer 3: LLM validation (V1.1, ambiguous cases)
    │
LLM Provider Adapters (OpenAI, Anthropic, Azure, Mistral, Ollama)
```

**Data layer:**
- PostgreSQL 16 — config, users, policies, processing registry (Row-Level Security for multi-tenancy; app role: `veylant_app`)
- ClickHouse — analytics and immutable audit logs
- Redis 7 — sessions, rate limiting, PII pseudonymization mappings (AES-256-GCM + TTL)
- Prometheus — metrics scraper on :9090; Grafana — dashboards on :3001 (admin/admin)
- HashiCorp Vault — secrets and API key rotation (90-day cycle)

**Frontend:** React 18 + TypeScript + Vite, shadcn/ui, recharts. Routes protected via local JWT (stored in localStorage, auto-logout on expiry); `web/src/auth/` manages the auth flow. API clients live in `web/src/api/`.

**Documentation site** (`http://localhost:3000/docs`): public, no auth required. Root: `web/src/pages/docs/` — sections: getting-started, installation, api-reference (8 endpoints), guides (6), deployment (3), security (2), changelog. Layout components: `DocLayout.tsx` (sidebar + content + TOC), `DocSidebar.tsx` (with search), `DocBreadcrumbs.tsx`, `DocPagination.tsx`. Shared components: `components/CodeBlock.tsx`, `Callout.tsx`, `ApiEndpoint.tsx`, `ParamTable.tsx`, `TableOfContents.tsx`. Nav structure: `web/src/pages/docs/nav.ts`. Uses `@tailwindcss/typography` (added as devDependency) for prose rendering.

## Repository Structure

```
cmd/proxy/              # Go main entry point — wires all modules, starts HTTP server
internal/               # All Go modules (see Architecture above for full list)
gen/                    # Generated Go gRPC stubs (buf generate → never edit manually)
services/pii/           # Python FastAPI + gRPC PII detection service
  gen/pii/v1/           # Generated Python proto stubs (run `make proto` first)
  tests/                # pytest unit tests (test_regex.py, test_pipeline.py, test_pseudo.py)
proto/pii/v1/           # gRPC .proto definitions
migrations/             # golang-migrate SQL files (up/down pairs)
  clickhouse/           # ClickHouse DDL applied at startup via ApplyDDL()
web/                    # React frontend (Vite, src/pages, src/components, src/api)
  src/pages/docs/       # Public documentation site (no auth); nav.ts defines sidebar structure
test/                   # Integration tests (test/integration/, //go:build integration) + k6 load tests (test/k6/)
deploy/                 # Helm, Kubernetes manifests, Terraform (EKS), Prometheus/Grafana, alertmanager
  clickhouse/           # ClickHouse config overrides for Docker (e.g. listen-ipv4.xml — forces IPv4)
docker-compose.yml      # Full local dev stack (9 services)
config.yaml             # Local dev config (overridden by VEYLANT_* env vars)
```

## Build & Development Commands

Use `make` as the primary interface. The proxy runs on **:8090**, PII HTTP on **:8091**, PII gRPC on **:50051**.

```bash
make dev                # Start full stack (proxy + PostgreSQL + ClickHouse + Redis + Keycloak + PII)
make dev-down           # Stop and remove all containers and volumes
make dev-logs           # Tail logs from all services
make build              # go build → bin/proxy
make test               # go test -race ./...
make test-cover         # Tests with HTML coverage report (coverage.html)
make test-integration   # Integration tests with testcontainers (requires Docker)
make lint               # golangci-lint + black --check + ruff check
make fmt                # gofmt + black
make proto              # buf generate — regenerates gen/ and services/pii/gen/
make proto-lint         # buf lint
make migrate-up         # Apply pending DB migrations
make migrate-down       # Roll back last migration
make migrate-status     # Show current migration version
make check              # Full pre-commit: build + vet + lint + test
make health             # curl localhost:8090/healthz
make docs               # Open http://localhost:8090/docs in browser (proxy must be running)
make helm-dry-run       # Render Helm templates without deploying
make helm-deploy        # Deploy to staging (requires IMAGE_TAG + KUBECONFIG env vars)
make load-test          # k6 load test (SCENARIO=smoke|load|stress|soak, default: smoke)
make deploy-blue        # Blue/green: deploy IMAGE_TAG to blue slot (requires kubectl + Istio)
make deploy-green       # Blue/green: deploy IMAGE_TAG to green slot
make deploy-rollback    # Roll back traffic to ACTIVE_SLOT (e.g. make deploy-rollback ACTIVE_SLOT=blue)
```

**Frontend dev server** (Vite, runs on :3000):
```bash
cd web && npm install && npm run dev   # dev server with HMR
cd web && npm run build                # tsc + vite build → web/dist/
cd web && npm run lint                 # ESLint (max-warnings: 0)
```

**Vite dev proxy:** In dev mode, all `/v1/*` requests from the frontend are proxied to `localhost:8090` (the Go proxy). No CORS issues during development.

**Run a single Go test:**
```bash
go test -run TestName ./internal/module/
```

**Run a single Python test:**
```bash
pytest services/pii/tests/test_file.py::test_function
```

**Proto prerequisite:** Run `make proto` before starting the PII service if `gen/` or `services/pii/gen/` is missing — the service will start but reject all gRPC requests otherwise.

**Config override:** Any config key can be overridden via env var with the `VEYLANT_` prefix and `.` → `_` replacement. Example: `VEYLANT_SERVER_PORT=9090` overrides `server.port`.
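
The key-to-variable mapping described above can be sketched in a few lines of Go. The helper name `envName` is hypothetical. With Viper this mapping is commonly obtained through `SetEnvPrefix`, `SetEnvKeyReplacer` and `AutomaticEnv`; whether `internal/config/` wires it exactly that way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a dotted config key to the environment variable that
// overrides it: VEYLANT_ prefix, dots replaced by underscores, upper case.
func envName(key string) string {
	return "VEYLANT_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

func main() {
	fmt.Println(envName("server.port"))     // VEYLANT_SERVER_PORT
	fmt.Println(envName("auth.jwt_secret")) // VEYLANT_AUTH_JWT_SECRET
}
```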

**Auth config:** `auth.jwt_secret` (env: `VEYLANT_AUTH_JWT_SECRET`) and `auth.jwt_ttl_hours`. Login endpoint: `POST /v1/auth/login` (public). Dev credentials: `admin@veylant.dev` / `admin123`. Tokens are HS256-signed JWTs; users stored in `users` table with bcrypt password hashes (migration 000010).

**Provider configs:** LLM provider API keys are stored encrypted (AES-256-GCM) in the `provider_configs` table (migration 000011). CRUD via `GET|POST /v1/admin/providers`, `PUT|DELETE|POST-test /v1/admin/providers/{id}`. Adapters hot-reload on save/update without proxy restart (`router.UpdateAdapter()` / `RemoveAdapter()`).
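
As a rough illustration of AES-256-GCM encryption at rest, the sketch below uses only the Go standard library. The function names `seal` and `open`, and the choice to prepend the nonce to the ciphertext, are assumptions made for the example; the encryptor in `internal/crypto/` may use a different layout and key source.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst appends the ciphertext right after it.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the data was tampered with.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // in practice loaded from config or Vault
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	secret := []byte("provider-api-key")
	sealed, err := seal(key, secret)
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Println(bytes.Equal(plain, secret), err) // true <nil>
}
```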

**Tools required:** `buf` (`brew install buf`), `golang-migrate` (`brew install golang-migrate`), `golangci-lint`, Python 3.12, `black`, `ruff`.

**Tenant onboarding** (after `make dev`):
```bash
deploy/onboarding/onboard-tenant.sh   # creates admin, seeds 4 routing templates, configures rate limits
deploy/onboarding/import-users.sh     # bulk import from CSV (email, first_name, last_name, department, role)
```

## Development Mode Graceful Degradation

When `server.env=development`, the proxy degrades gracefully instead of crashing:
- **PostgreSQL unreachable** → routing engine and feature flags disabled; flag store uses in-memory fallback
- **ClickHouse unreachable** → audit logging disabled
- **PII service unreachable** → PII disabled if `pii.fail_open=true` (default)

In production (`server.env=production`), any of the above causes a fatal startup error.

## Key Technical Constraints

**Latency budget**: The entire PII pipeline (regex + NER + pseudonymization) must complete in **<50ms**. The PII gRPC call has a configurable timeout (`pii.timeout_ms`, default 100ms).

**Streaming (SSE)**: The proxy must flush SSE chunks without buffering. PII anonymization applies to the **request** before it's sent upstream — not to the streamed response. This is the most technically complex piece of the MVP.

**Multi-tenancy**: Logical isolation via PostgreSQL Row-Level Security. The app connects as role `veylant_app` and sets `app.tenant_id` per session. Superuser bypasses RLS (dev only).

**Immutable audit logs**: ClickHouse is append-only — no DELETE operations. Retention via TTL policies only. ClickHouse DDL is applied idempotently at startup from `migrations/clickhouse/`.

**Proxy Docker image**: Uses `distroless/static` — no shell, no `wget`. `CMD-SHELL` health checks in docker-compose cannot work for the proxy container; dependents use `condition: service_started` instead.

**Routing rule evaluation**: Rules are sorted ascending by `priority` (lower = evaluated first). All conditions within a rule are AND-joined. An empty `Conditions` slice is a catch-all. First match wins. Supported condition fields: `user.role`, `user.department`, `request.sensitivity`, `request.model`, `request.use_case`, `request.token_estimate`. Operators: `eq`, `neq`, `in`, `nin`, `gte`, `lte`, `contains`, `matches`.
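
The evaluation order can be illustrated with a small in-memory sketch. The `Rule` and `Condition` types, the string-only attribute map, and the `Provider` field are hypothetical simplifications; the `matches` operator is left out, and the real engine in `internal/routing/` loads rules from PostgreSQL JSONB.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Condition and Rule are simplified, hypothetical shapes.
type Condition struct {
	Field    string
	Operator string   // eq, neq, in, nin, gte, lte, contains
	Values   []string // one value, or several for in/nin
}

type Rule struct {
	Name       string
	Priority   int
	Conditions []Condition
	Provider   string
}

func (c Condition) match(attrs map[string]string) bool {
	got, ok := attrs[c.Field]
	if !ok || len(c.Values) == 0 {
		return false
	}
	want := c.Values[0]
	switch c.Operator {
	case "eq":
		return got == want
	case "neq":
		return got != want
	case "in", "nin":
		found := false
		for _, v := range c.Values {
			if v == got {
				found = true
				break
			}
		}
		return found == (c.Operator == "in")
	case "gte", "lte":
		g, err1 := strconv.Atoi(got)
		w, err2 := strconv.Atoi(want)
		if err1 != nil || err2 != nil {
			return false
		}
		if c.Operator == "gte" {
			return g >= w
		}
		return g <= w
	case "contains":
		return strings.Contains(got, want)
	}
	return false // unknown operator
}

// Evaluate returns the first rule, in ascending priority order, whose
// conditions all match. A rule without conditions is a catch-all.
func Evaluate(rules []Rule, attrs map[string]string) (Rule, bool) {
	sorted := append([]Rule(nil), rules...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	for _, r := range sorted {
		all := true
		for _, c := range r.Conditions {
			if !c.match(attrs) {
				all = false
				break
			}
		}
		if all {
			return r, true
		}
	}
	return Rule{}, false
}

func main() {
	rules := []Rule{
		{Name: "default", Priority: 100, Provider: "openai"},
		{Name: "sensitive-local", Priority: 10, Provider: "ollama",
			Conditions: []Condition{{Field: "request.sensitivity", Operator: "eq", Values: []string{"high"}}}},
	}
	r, _ := Evaluate(rules, map[string]string{"request.sensitivity": "high"})
	fmt.Println(r.Name, r.Provider) // sensitive-local ollama
	r, _ = Evaluate(rules, map[string]string{"request.sensitivity": "low"})
	fmt.Println(r.Name, r.Provider) // default openai
}
```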

## Conventions

**Go import ordering** (`goimports` with `local-prefixes: github.com/veylant/ia-gateway`): three groups — stdlib · external · `github.com/veylant/ia-gateway/internal/...`. `gen/` is excluded from all linters (generated code).

**Commits**: Conventional Commits (`feat:`, `fix:`, `chore:`) — used for automated changelog generation.

**API versioning**: `/v1/` prefix, OpenAI-compatible format (`/v1/chat/completions`) so existing OpenAI SDK clients work without modification.

**LLM Provider Adapters**: Each provider implements `provider.Adapter` (`Send()`, `Stream()`, `Validate()`, `HealthCheck()`). Add new providers by implementing this interface in `internal/provider/<name>/`.

**Error handling**: Go modules use typed errors with `errors.Wrap`. The proxy always returns errors in OpenAI JSON format (`type`, `message`, `code`).

**Feature flags**: PostgreSQL table (`feature_flags`) + in-memory fallback when DB is unavailable. No external service.

**OpenAPI docs**: Generated from swaggo annotations — never write API docs by hand.

**Testing split**: 70% unit (`testing` + `testify` / `pytest`) · 20% integration (`testcontainers` for PG/ClickHouse/Redis, lives in `test/integration/`, requires `//go:build integration` tag) · 10% E2E (Playwright for UI). Tests are written in parallel with each module, not deferred.

**CI coverage thresholds**: Go internal packages must maintain ≥80% coverage; Python PII service ≥75%. NER tests (`test_ner.py`) are excluded from CI because `fr_core_news_lg` (~600MB) is only available in the Docker build.

## Custom Semgrep Rules (`.semgrep.yml`)

These are enforced in CI and represent project-specific guardrails:
- **`context.Background()` in HTTP handlers** → use `r.Context()` to propagate tenant context and cancellation.
- **SQL string concatenation** (`db.QueryContext(ctx, query+var)` or `fmt.Sprintf`) → use parameterized queries (`$1, $2, ...`).
- **Sensitive fields in logs** (`zap.String("password"|"api_key"|"token"|"secret"|"Authorization"|"email"|"prompt", ...)`) → use redaction helpers.
- **Hardcoded API keys** (string literals starting with `sk-`) → load from env or Vault.
- **`json.NewDecoder(r.Body).Decode()`** without `http.MaxBytesReader` → wrap body first.
- **Python `eval()`/`exec()`** on variables → never evaluate user-supplied data.

## Security Patterns

- Zero Trust network, mTLS between services, TLS 1.3 externally
- All sensitive fields encrypted at application level (AES-256-GCM)
- API keys stored as SHA-256 hashes only; prefix kept for display (e.g. `sk-vyl_ab12cd34`)
- RBAC roles: `admin`, `manager`, `user`, `auditor` — per-model and per-department permissions. `admin`/`manager` have unrestricted model access; `user` is limited to `rbac.user_allowed_models`; `auditor` cannot call `/v1/chat/completions` by default.
- Audit-of-the-audit: all accesses to audit logs are themselves logged
- CI pipeline (`.github/workflows/ci.yml`): Go build/test/lint, Python format/lint/test, Semgrep SAST, Trivy container scan (CRITICAL/HIGH blocking), gitleaks, OWASP ZAP DAST (non-blocking, main only), k6 smoke test + blue/green Helm staging deploy (main only)
- Release pipeline (`.github/workflows/release.yml`, on `v*` tag): multi-arch Docker image (amd64/arm64) → GHCR, Helm chart → GHCR OCI, GitHub Release with notes extracted from CHANGELOG.md
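
The "hash only, keep a display prefix" pattern for API keys from the list above can be sketched as follows. The `storedKey` type, the helper names and the 15-character prefix length (taken from the `sk-vyl_ab12cd34` example) are assumptions for illustration; the key in `main` is generated at run time so that no key literal is hardcoded.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

const prefixLen = 15 // length of "sk-vyl_ab12cd34" (assumption)

// storedKey is what would be persisted: never the key itself.
type storedKey struct {
	Prefix string // safe to show in the UI
	Hash   string // hex-encoded SHA-256 of the full key
}

func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

func newStoredKey(key string) storedKey {
	prefix := key
	if len(prefix) > prefixLen {
		prefix = prefix[:prefixLen]
	}
	return storedKey{Prefix: prefix, Hash: hashKey(key)}
}

// verify compares hashes in constant time.
func verify(stored storedKey, presented string) bool {
	return subtle.ConstantTimeCompare([]byte(stored.Hash), []byte(hashKey(presented))) == 1
}

func main() {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		panic(err)
	}
	key := "sk-vyl_" + hex.EncodeToString(raw) // shown to the user once

	stored := newStoredKey(key)
	fmt.Println(len(stored.Prefix), len(stored.Hash)) // 15 64
	fmt.Println(verify(stored, key))                  // true
	fmt.Println(verify(stored, key+"x"))              // false
}
```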

## MVP Scope (V1)

In scope: AI proxy, PII anonymization + pseudonymization, intelligent routing engine, audit logs, RBAC, React dashboard, GDPR Article 30 registry, AI Act risk classification, provider configuration wizard, integrated playground (prompt test with PII visualization).

Out of scope (V2+): ML anomaly detection, Shadow AI discovery, physical multi-tenant isolation, native SDKs, SIEM integrations.