# Veylant IA — Load Tests (k6)

Performance tests for the Veylant proxy using [k6](https://k6.io).

## Prerequisites

```bash
brew install k6  # macOS
# or: https://k6.io/docs/getting-started/installation/
```

The proxy must be running: `make dev` (or point `VEYLANT_URL` at staging).

## Scripts

| Script | Description |
|--------|-------------|
| `k6-load-test.js` | Multi-scenario script (smoke / load / stress / soak) — **use this** |
| `load_test.js` | Sprint 10 single-scenario script (1 000 VUs, 8 min) — kept for reference |

## Running tests

### Via Makefile (recommended)

```bash
make load-test                  # smoke scenario (CI default)
make load-test SCENARIO=load    # 50 VUs, 5 min steady state
make load-test SCENARIO=stress  # 0→200 VUs, find breaking point
make load-test SCENARIO=soak    # 20 VUs, 30 min (detect memory leaks)
```

### Via k6 directly

```bash
# Basic (load scenario, local proxy)
k6 run \
  --env VEYLANT_URL=http://localhost:8090 \
  --env VEYLANT_TOKEN=dev-token \
  --env SCENARIO=load \
  test/k6/k6-load-test.js

# Against staging
k6 run \
  --env VEYLANT_URL=https://api-staging.veylant.ai \
  --env VEYLANT_TOKEN=$STAGING_JWT \
  --env SCENARIO=stress \
  test/k6/k6-load-test.js

# With k6 Cloud output (requires K6_CLOUD_TOKEN)
k6 run --out cloud test/k6/k6-load-test.js
```

## Scenarios

| Scenario | VUs | Duration | Purpose |
|----------|-----|----------|---------|
| `smoke` | 1 | 1 min | Sanity check — runs in CI on every push |
| `load` | 0→50→0 | 7 min | Steady state: validates SLAs under normal load |
| `stress` | 0→200 | 7 min | Find the breaking point (target: > 200 VUs) |
| `soak` | 20 | 30 min | Detect memory leaks / slow GC under sustained load |

## Thresholds (SLAs)

| Metric | Target |
|--------|--------|
| `http_req_duration p(99)` | < 500 ms |
| `http_req_duration p(95)` | < 200 ms |
| `http_req_failed` | < 1% |
| `veylant_chat_latency_ms p(99)` | < 500 ms |
| `veylant_error_rate` | < 1% |

## Interpreting results

- **`http_req_duration`** — end-to-end latency including
upstream LLM. For Ollama models on local hardware this includes model inference time.
- **`veylant_error_rate`** — tracks application-level errors (non-200 responses or a missing `choices` array).
- **`veylant_chat_errors`** / **`veylant_health_errors`** — absolute error counts per endpoint type.

A passing run looks like:

```
✓ http_req_duration.............: avg=42ms p(95)=112ms p(99)=287ms
✓ http_req_failed...............: 0.00%
✓ veylant_error_rate............: 0.00%
```

## CI integration

The `smoke` scenario runs automatically in the GitHub Actions `load-test` job (see `.github/workflows/ci.yml`). The job uses a mock Ollama that returns static responses, ensuring deterministic latency.
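The `--env` flags shown under "Via k6 directly" surface inside a k6 script on the global `__ENV` object. A minimal sketch of how the script might resolve them; only the variable names come from the commands above, while the helper name and defaults are assumptions:

```javascript
// Hypothetical helper: resolve test configuration from an env map.
// In the real k6 script this would be called as resolveConfig(__ENV);
// the fallback values below are assumptions, not taken from the script.
function resolveConfig(env) {
  return {
    baseUrl: env.VEYLANT_URL || 'http://localhost:8090',
    token: env.VEYLANT_TOKEN || 'dev-token',
    scenario: env.SCENARIO || 'smoke', // CI default per the Makefile target
  };
}
```

Keeping all environment reads in one place makes it obvious which knobs a run accepts and what happens when a flag is omitted.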
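The scenarios table maps naturally onto k6's `scenarios` option and its `ramping-vus` / `constant-vus` executors. A sketch of how three of the ramps might be declared; the executor names and option keys are real k6 syntax, but the exact stage timings are assumptions inferred from the VU and duration columns:

```javascript
// Sketch of k6 scenario definitions mirroring the Scenarios table.
// In the real script these would live inside the exported `options` object.
const scenarios = {
  load: {
    executor: 'ramping-vus',
    startVUs: 0,
    stages: [
      { duration: '1m', target: 50 }, // ramp up
      { duration: '5m', target: 50 }, // steady state
      { duration: '1m', target: 0 },  // ramp down
    ],
  },
  stress: {
    executor: 'ramping-vus',
    startVUs: 0,
    stages: [{ duration: '7m', target: 200 }], // climb until something breaks
  },
  soak: {
    executor: 'constant-vus',
    vus: 20,
    duration: '30m', // long enough to surface leaks and GC pressure
  },
};
```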
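The SLA table translates directly into k6's `thresholds` option. A sketch of that mapping; the threshold expressions use real k6 syntax and the custom metric names come from the table, but the object shown here is illustrative, not copied from the script:

```javascript
// Sketch of the SLA table as a k6 `thresholds` object; in the real
// script this would sit inside the exported `options`.
const thresholds = {
  http_req_duration: ['p(99)<500', 'p(95)<200'], // milliseconds
  http_req_failed: ['rate<0.01'],                // < 1% transport-level failures
  veylant_chat_latency_ms: ['p(99)<500'],
  veylant_error_rate: ['rate<0.01'],             // < 1% application-level errors
};
```

When any threshold is breached, k6 exits non-zero, which is what lets the CI job fail the build on an SLA violation.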
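"Interpreting results" describes `veylant_error_rate` as flagging non-200 responses or a missing `choices` array. A sketch of how such a check might look; the function name is hypothetical, not taken from the script:

```javascript
// Hypothetical classifier behind veylant_error_rate: a chat response is an
// application-level error if the status is non-200 or the body lacks a
// `choices` array (per the description above).
function isChatError(status, body) {
  if (status !== 200) return true;
  try {
    const parsed = JSON.parse(body);
    return !Array.isArray(parsed.choices);
  } catch (e) {
    return true; // an unparseable body also counts as an error
  }
}
```

In a k6 script the boolean result would typically feed a `Rate` custom metric via its `add()` method.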