# Veylant IA Proxy — Developer Integration Guide
Get up and running in under 30 minutes. The proxy is fully compatible with the OpenAI API — change one URL and your existing code works.
## Prerequisites

- Your Veylant IA proxy URL (e.g. `https://api.veylant.ai`, or `http://localhost:8090` for local dev)
- A JWT token issued by your organisation's Keycloak instance
## 1. Change the base URL

### Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-jwt-token",  # pass your JWT as the API key
    base_url="https://api.veylant.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise the Q3 report."}],
)
print(response.choices[0].message.content)
```
### curl

```shell
curl -X POST https://api.veylant.ai/v1/chat/completions \
  -H "Authorization: Bearer $VEYLANT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
### Node.js (openai SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.VEYLANT_TOKEN,
  baseURL: 'https://api.veylant.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```
## 2. Authentication

Every request to `/v1/*` must include a Bearer JWT in the `Authorization` header:

```
Authorization: Bearer <your-jwt-token>
```

Tokens are issued by your organisation's Keycloak instance. Contact your admin to obtain one.

The token must contain the following claims:

- `tenant_id` — your organisation's identifier
- `user_id` — your user identifier
- `roles` — at least one of `admin`, `manager`, `user`, `auditor`
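If a request is rejected with a 401 or 403, it can help to check which claims your token actually carries. A JWT payload is base64url-encoded (not encrypted), so you can inspect it locally with the standard library alone. This sketch decodes the payload without verifying the signature, so use it only for debugging:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    # Restore the base64url padding that JWT encoding strips
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))

# Build a toy token (header.payload.signature) just to demonstrate decoding
claims = {"tenant_id": "acme", "user_id": "u42", "roles": ["user"]}
encoded = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"eyJhbGciOiJSUzI1NiJ9.{encoded}.signature"

print(jwt_claims(token)["tenant_id"])  # acme
```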
## 3. Streaming

Streaming works identically to the OpenAI API — pass `stream=True` (`stream: true` in Node):

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

The proxy forwards SSE chunks from the upstream provider without buffering.
## 4. PII Anonymization (automatic)

PII anonymization is automatic and transparent. Before your prompt reaches the upstream provider:

- Named entities (names, emails, phone numbers, IBANs, etc.) are detected
- Entities are replaced with pseudonyms (e.g. `Jean Dupont` becomes `[PERSON_1]`)
- The upstream response is de-pseudonymized before being returned to you

You receive the original names back in the response — the upstream provider never sees them.
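The detection pipeline itself is internal to the proxy, but the round trip is easy to picture. This toy sketch uses a hard-coded entity list (the real detector finds entities automatically) purely to illustrate the substitute-then-restore flow:

```python
def pseudonymize(text: str, entities: list[str]) -> tuple[str, dict]:
    """Replace each known entity with a [PERSON_n] placeholder."""
    mapping = {}
    for i, name in enumerate(entities, start=1):
        placeholder = f"[PERSON_{i}]"
        mapping[placeholder] = name
        text = text.replace(name, placeholder)
    return text, mapping

def depseudonymize(text: str, mapping: dict) -> str:
    """Restore the original entities in the upstream response."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text

masked, mapping = pseudonymize("Email Jean Dupont about Q3.", ["Jean Dupont"])
print(masked)  # Email [PERSON_1] about Q3.

reply = "I drafted a note to [PERSON_1]."      # what the upstream might return
print(depseudonymize(reply, mapping))          # I drafted a note to Jean Dupont.
```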
To disable PII anonymization for your tenant, ask your admin to run:

```
PUT /v1/admin/flags/pii_enabled
{"enabled": false}
```
## 5. Supported Models

The proxy routes to different providers based on the model-name prefix:

| Model prefix | Provider |
|---|---|
| `gpt-*`, `o1-*`, `o3-*` | OpenAI |
| `claude-*` | Anthropic |
| `mistral-*`, `mixtral-*` | Mistral |
| `llama*`, `phi*`, `qwen*` | Ollama (self-hosted) |

Your admin may have configured custom routing rules that override this behaviour.
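The default rules above amount to a first-match wildcard lookup, which `fnmatch` expresses directly. This is an illustrative client-side sketch only; the proxy's actual router and any custom tenant rules live server-side:

```python
from fnmatch import fnmatch

# Default routing table from the section above; first match wins.
ROUTES = [
    (("gpt-*", "o1-*", "o3-*"), "openai"),
    (("claude-*",), "anthropic"),
    (("mistral-*", "mixtral-*"), "mistral"),
    (("llama*", "phi*", "qwen*"), "ollama"),
]

def route(model: str) -> str:
    """Return the provider a model name would route to by default."""
    for patterns, provider in ROUTES:
        if any(fnmatch(model, p) for p in patterns):
            return provider
    raise ValueError(f"no route for model {model!r}")

print(route("gpt-4o"))      # openai
print(route("qwen2.5:7b"))  # ollama
```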
## 6. Error Codes

All errors follow the OpenAI error format:

```json
{
  "error": {
    "type": "authentication_error",
    "message": "missing or invalid token",
    "code": null
  }
}
```

| HTTP status | Error type | Cause |
|---|---|---|
| 400 | `invalid_request_error` | Malformed JSON or missing required fields |
| 401 | `authentication_error` | Missing or expired JWT |
| 403 | `permission_error` | Model not allowed for your role (RBAC) |
| 429 | `rate_limit_error` | Too many requests — wait and retry |
| 502 | `upstream_error` | The upstream LLM provider returned an error |
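In practice only some of these statuses are worth retrying. A minimal client-side dispatch over the error envelope might look like this; the handling strategy per status is a suggestion, not proxy-mandated behaviour:

```python
def classify(status: int, body: dict) -> str:
    """Map a proxy error response to a coarse handling strategy."""
    err = body.get("error", {})
    if status == 429:
        return "retry"            # back off, honouring Retry-After
    if status == 502 and err.get("type") == "upstream_error":
        return "retry"            # transient provider failure
    if status in (401, 403):
        return "reauthenticate"   # refresh the JWT or check your RBAC role
    return "fail"                 # 400 etc.: fix the request itself

body = {"error": {"type": "rate_limit_error", "message": "slow down", "code": None}}
print(classify(429, body))  # retry
```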
## 7. Rate Limits

Limits are configured per tenant. The default is 6 000 requests/minute with a burst of 1 000. Your admin can adjust this via `PUT /v1/admin/rate-limits/{tenant_id}`.

When you hit the limit you receive:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 1
```
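A client can honour `Retry-After` with a small retry wrapper. This sketch is HTTP-library-agnostic: it assumes your transport layer raises a hypothetical `RateLimited` exception carrying the header value on a 429 response:

```python
import time

class RateLimited(Exception):
    """Raised by the caller's HTTP layer on a 429 response."""
    def __init__(self, retry_after: float):
        self.retry_after = retry_after

def with_retries(call, max_attempts: int = 5):
    """Invoke call(), sleeping for Retry-After seconds after each 429."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(exc.retry_after)

# Simulated flow: fail twice with 429, then succeed.
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited(retry_after=0.01)
    return "ok"

print(with_retries(fake_request))  # ok
```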
## 8. Health Check

Verify the proxy is reachable without authentication:

```shell
curl https://api.veylant.ai/healthz
# {"status":"ok"}
```
## 9. API Reference

Full interactive documentation is available at:

https://api.veylant.ai/docs

Or download the raw OpenAPI 3.1 spec:

```shell
curl https://api.veylant.ai/docs/openapi.yaml -o openapi.yaml
```