# Veylant IA — Load Tests (k6)

Performance tests for the Veylant proxy using k6.
## Prerequisites

```sh
brew install k6   # macOS
# or: https://k6.io/docs/getting-started/installation/
```

The proxy must be running: `make dev` (or point `VEYLANT_URL` at staging).
## Scripts

| Script | Description |
|---|---|
| `k6-load-test.js` | Multi-scenario script (smoke / load / stress / soak) — use this |
| `load_test.js` | Sprint 10 single-scenario script (1 000 VUs, 8 min) — kept for reference |
## Running tests

### Via Makefile (recommended)

```sh
make load-test                  # smoke scenario (CI default)
make load-test SCENARIO=load    # 50 VUs, 5 min steady state
make load-test SCENARIO=stress  # 0→200 VUs, find breaking point
make load-test SCENARIO=soak    # 20 VUs, 30 min (detect memory leaks)
```
### Via k6 directly

```sh
# Basic (load scenario, local proxy)
k6 run \
  --env VEYLANT_URL=http://localhost:8090 \
  --env VEYLANT_TOKEN=dev-token \
  --env SCENARIO=load \
  test/k6/k6-load-test.js

# Against staging
k6 run \
  --env VEYLANT_URL=https://api-staging.veylant.ai \
  --env VEYLANT_TOKEN=$STAGING_JWT \
  --env SCENARIO=stress \
  test/k6/k6-load-test.js

# With k6 Cloud output (requires K6_CLOUD_TOKEN)
k6 run --out cloud test/k6/k6-load-test.js
```
## Scenarios

| Scenario | VUs | Duration | Purpose |
|---|---|---|---|
| `smoke` | 1 | 1 min | Sanity check — runs in CI on every push |
| `load` | 0→50→0 | 7 min | Steady state: validates SLAs under normal load |
| `stress` | 0→200 | 7 min | Find the breaking point (target: > 200 VUs) |
| `soak` | 20 | 30 min | Detect memory leaks / slow GC under sustained load |
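The table above could map onto k6 scenario definitions roughly like this. This is a sketch only: the executor names, stage shapes, and `selectScenario` helper are illustrative assumptions, not the actual contents of `k6-load-test.js`.

```javascript
// Hypothetical mapping of the scenario table to k6 executor configs.
const scenarios = {
  smoke:  { executor: 'constant-vus', vus: 1, duration: '1m' },
  load:   { executor: 'ramping-vus', startVUs: 0, stages: [
              { duration: '1m', target: 50 },  // ramp up
              { duration: '5m', target: 50 },  // steady state
              { duration: '1m', target: 0 },   // ramp down
            ] },
  stress: { executor: 'ramping-vus', startVUs: 0, stages: [
              { duration: '7m', target: 200 }, // keep climbing to 200 VUs
            ] },
  soak:   { executor: 'constant-vus', vus: 20, duration: '30m' },
};

// Pick exactly one scenario by name, defaulting to smoke (as CI does).
function selectScenario(name) {
  const key = name || 'smoke';
  if (!scenarios[key]) throw new Error('unknown scenario: ' + key);
  return { [key]: scenarios[key] };
}
```

In a real k6 script this would be wired up via the env var the Makefile passes, e.g. `export const options = { scenarios: selectScenario(__ENV.SCENARIO) };`.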
## Thresholds (SLAs)

| Metric | Target |
|---|---|
| `http_req_duration` p(99) | < 500 ms |
| `http_req_duration` p(95) | < 200 ms |
| `http_req_failed` | < 1% |
| `veylant_chat_latency_ms` p(99) | < 500 ms |
| `veylant_error_rate` | < 1% |
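The SLA table translates naturally into k6's `options.thresholds` syntax. A plausible sketch follows; the actual threshold strings in `k6-load-test.js` may differ:

```javascript
// Hypothetical k6 thresholds object matching the SLA table above.
// Rates in k6 are expressed as fractions, so "< 1%" becomes 'rate<0.01'.
const thresholds = {
  http_req_duration:       ['p(95)<200', 'p(99)<500'],
  http_req_failed:         ['rate<0.01'],
  veylant_chat_latency_ms: ['p(99)<500'],
  veylant_error_rate:      ['rate<0.01'],
};
```

A run that breaches any of these strings fails the test, which is what makes the thresholds usable as a CI gate.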
## Interpreting results

- `http_req_duration` — end-to-end latency including the upstream LLM. For Ollama models on local hardware this includes model inference time.
- `veylant_error_rate` — tracks application-level errors (non-200 responses or a missing `choices` array).
- `veylant_chat_errors` / `veylant_health_errors` — absolute error counts per endpoint type.
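The error classification behind `veylant_error_rate` can be sketched as a small predicate, reconstructed from the description above (assumed logic, not the script's actual code):

```javascript
// Hypothetical check: a response counts as an application-level error
// if the status is not 200, the body is not JSON, or the JSON body
// lacks a `choices` array.
function isAppError(status, body) {
  if (status !== 200) return true;
  try {
    const parsed = JSON.parse(body);
    return !Array.isArray(parsed.choices);
  } catch (e) {
    return true; // unparsable body counts as an error
  }
}
```

In a k6 script the boolean result would typically be fed into a `Rate` metric on every request, which is what produces a percentage in the summary.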
A passing run looks like:

```
✓ http_req_duration.............: avg=42ms p(95)=112ms p(99)=287ms
✓ http_req_failed...............: 0.00%
✓ veylant_error_rate............: 0.00%
```
## CI integration

The `smoke` scenario runs automatically in the GitHub Actions `load-test` job (see `.github/workflows/ci.yml`). The job uses a mock Ollama that returns static responses to ensure deterministic latency.