
# Veylant IA — Load Tests (k6)

Performance tests for the Veylant proxy using k6.

## Prerequisites

```shell
brew install k6          # macOS
# or: https://k6.io/docs/getting-started/installation/
```

The proxy must be running: `make dev` (or point `VEYLANT_URL` at staging).

## Scripts

| Script | Description |
| --- | --- |
| `k6-load-test.js` | Multi-scenario script (smoke / load / stress / soak) — use this |
| `load_test.js` | Sprint 10 single-scenario script (1 000 VUs, 8 min) — kept for reference |

## Running tests

```shell
make load-test                       # smoke scenario (CI default)
make load-test SCENARIO=load         # 50 VUs, 5 min steady state
make load-test SCENARIO=stress       # 0→200 VUs, find breaking point
make load-test SCENARIO=soak         # 20 VUs, 30 min (detect memory leaks)
```

### Via k6 directly

```shell
# Basic (load scenario, local proxy)
k6 run \
  --env VEYLANT_URL=http://localhost:8090 \
  --env VEYLANT_TOKEN=dev-token \
  --env SCENARIO=load \
  test/k6/k6-load-test.js

# Against staging
k6 run \
  --env VEYLANT_URL=https://api-staging.veylant.ai \
  --env VEYLANT_TOKEN=$STAGING_JWT \
  --env SCENARIO=stress \
  test/k6/k6-load-test.js

# With k6 Cloud output (requires K6_CLOUD_TOKEN)
k6 run --out cloud test/k6/k6-load-test.js
```

## Scenarios

| Scenario | VUs | Duration | Purpose |
| --- | --- | --- | --- |
| smoke | 1 | 1 min | Sanity check — runs in CI on every push |
| load | 0→50→0 | 7 min | Steady state: validates SLAs under normal load |
| stress | 0→200 | 7 min | Find the breaking point (target: > 200 VUs) |
| soak | 20 | 30 min | Detect memory leaks / slow GC under sustained load |
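A multi-scenario k6 script typically picks one entry from its scenario map based on the `SCENARIO` environment variable (k6 exposes it as `__ENV.SCENARIO`). Below is a minimal sketch of how that selection could look; the executor choices and stage shapes are assumptions matching the table above, not code taken from `k6-load-test.js`:

```javascript
// Hypothetical scenario map mirroring the table above (not the repo's actual code).
const scenarios = {
  smoke:  { executor: 'constant-vus', vus: 1, duration: '1m' },
  load:   { executor: 'ramping-vus', stages: [
              { duration: '1m', target: 50 },   // ramp up
              { duration: '5m', target: 50 },   // steady state
              { duration: '1m', target: 0 } ]}, // ramp down
  stress: { executor: 'ramping-vus', stages: [
              { duration: '7m', target: 200 } ]}, // keep climbing to find the break
  soak:   { executor: 'constant-vus', vus: 20, duration: '30m' },
};

// Pick exactly one scenario from the env, defaulting to smoke (the CI default).
function pickScenario(env) {
  const name = env.SCENARIO || 'smoke';
  if (!scenarios[name]) throw new Error(`unknown scenario: ${name}`);
  return { [name]: scenarios[name] };
}

// In a real k6 script this would feed the options object:
//   export const options = { scenarios: pickScenario(__ENV) };
```

Selecting one scenario at load time (rather than defining all four with `exec` functions) keeps the summary output focused on the run you asked for.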

## Thresholds (SLAs)

| Metric | Target |
| --- | --- |
| `http_req_duration` | p(99) < 500 ms |
| `http_req_duration` | p(95) < 200 ms |
| `http_req_failed` | < 1% |
| `veylant_chat_latency_ms` | p(99) < 500 ms |
| `veylant_error_rate` | < 1% |
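The SLA table maps directly onto k6's `thresholds` option. A hedged sketch of what that configuration could look like, assuming the custom metric names in the table are registered as-is in the script:

```javascript
// Hypothetical thresholds object built from the SLA table above.
// Each entry is a k6 threshold expression: percentile bounds for
// Trend metrics, rate bounds for Rate metrics.
const thresholds = {
  http_req_duration:       ['p(95)<200', 'p(99)<500'],
  http_req_failed:         ['rate<0.01'],
  veylant_chat_latency_ms: ['p(99)<500'],
  veylant_error_rate:      ['rate<0.01'],
};

// In a real k6 script:
//   export const options = { thresholds };
```

When any threshold is crossed, k6 exits non-zero, which is what lets the CI job fail the build on an SLA breach.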

## Interpreting results

- `http_req_duration` — end-to-end latency, including the upstream LLM. For Ollama models on local hardware this includes model inference time.
- `veylant_error_rate` — tracks application-level errors (non-200 responses or a missing `choices` array).
- `veylant_chat_errors` / `veylant_health_errors` — absolute error counts per endpoint type.
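The error definition behind `veylant_error_rate` (non-200 status, or a body without a `choices` array) can be expressed as a small predicate. This is an illustrative sketch with a hypothetical function name, not the repo's actual code:

```javascript
// Hypothetical predicate: does a chat response count as an
// application-level error for veylant_error_rate?
function isChatError(status, body) {
  if (status !== 200) return true; // any non-200 is an error
  try {
    const parsed = JSON.parse(body);
    return !Array.isArray(parsed.choices); // missing/invalid choices array
  } catch (_) {
    return true; // unparseable body counts as an error too
  }
}

// In a k6 script this would feed a Rate metric after each request:
//   errorRate.add(isChatError(res.status, res.body));
```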

A passing run looks like:

```text
✓ http_req_duration.............: avg=42ms p(95)=112ms p(99)=287ms
✓ http_req_failed...............: 0.00%
✓ veylant_error_rate............: 0.00%
```

## CI integration

The smoke scenario runs automatically in the GitHub Actions `load-test` job (see `.github/workflows/ci.yml`). The job uses a mock Ollama server that returns static responses, so latency stays deterministic across runs.