# Veylant IA Proxy — Developer Integration Guide

Get up and running in under 30 minutes. The proxy is fully compatible with the OpenAI API — change one URL and your existing code works.

## Prerequisites

- Your Veylant IA proxy URL (e.g. `https://api.veylant.ai` or `http://localhost:8090` for local dev)
- A JWT token issued by your organisation's Keycloak instance

## 1. Change the base URL

### Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-jwt-token",  # pass your JWT as the API key
    base_url="https://api.veylant.ai/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise the Q3 report."}],
)
print(response.choices[0].message.content)
```

### curl

```bash
curl -X POST https://api.veylant.ai/v1/chat/completions \
  -H "Authorization: Bearer $VEYLANT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

### Node.js (openai SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.VEYLANT_TOKEN,
  baseURL: 'https://api.veylant.ai/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```

## 2. Authentication

Every request to `/v1/*` must include a `Bearer` JWT in the `Authorization` header:

```
Authorization: Bearer <your-jwt-token>
```

Tokens are issued by your organisation's Keycloak instance. Contact your admin to obtain one.

The token must contain the following claims:

- `tenant_id` — your organisation's identifier
- `user_id` — your user identifier
- `roles` — at least one of `admin`, `manager`, `user`, `auditor`

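
A token's claims can be sanity-checked client-side before you start sending requests. A minimal sketch using only the Python standard library — it decodes the payload without verifying the signature (verification remains the proxy's job), and the toy token below is an illustration, not a real Keycloak token:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    Handy for checking that tenant_id, user_id and roles are present;
    never use this in place of real verification.
    """
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Build a toy (unsigned) token just to exercise the helper:
header = base64.urlsafe_b64encode(json.dumps({"alg": "none"}).encode()).rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"tenant_id": "acme", "user_id": "u42", "roles": ["user"]}).encode()
).rstrip(b"=").decode()

claims = jwt_claims(f"{header}.{payload}.")
missing = {"tenant_id", "user_id", "roles"} - claims.keys()  # empty set when all required claims exist
```

If `missing` is non-empty, ask your admin to check the Keycloak client's mapper configuration before debugging anything else.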

## 3. Streaming

Streaming works identically to the OpenAI API — set `stream: true` (`stream=True` in the Python SDK):

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

The proxy forwards SSE chunks from the upstream provider without buffering.


## 4. PII Anonymization (automatic)

PII anonymization is automatic and transparent. Before your prompt reaches the upstream provider:

1. Named entities (names, emails, phone numbers, IBANs, etc.) are detected
2. Each entity is replaced with a pseudonym (e.g. `Jean Dupont` becomes `[PERSON_1]`)
3. The upstream response is de-pseudonymized before being returned to you

You receive the original names back in the response — the upstream provider never sees them.

To disable PII anonymization for your tenant, ask your admin to run:

```
PUT /v1/admin/flags/pii_enabled {"enabled": false}
```

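
The round trip above can be pictured with a toy sketch. This is illustrative only: the proxy's real detector recognises entities itself rather than taking a fixed name list, and only the `[PERSON_1]`-style placeholder convention is taken from the behaviour described above:

```python
def pseudonymize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known name with a [PERSON_n] placeholder (toy detector)."""
    mapping: dict[str, str] = {}
    for i, name in enumerate(names, start=1):
        placeholder = f"[PERSON_{i}]"
        mapping[placeholder] = name
        text = text.replace(name, placeholder)
    return text, mapping

def depseudonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original names in the upstream response."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text

# Outbound: the upstream only ever sees the placeholder.
masked, mapping = pseudonymize("Send the report to Jean Dupont.", ["Jean Dupont"])
# Inbound: the caller gets the original name back.
restored = depseudonymize(masked, mapping)
```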
## 5. Supported Models

The proxy routes to different providers based on model prefix:

| Model prefix | Provider |
|---|---|
| `gpt-*`, `o1-*`, `o3-*` | OpenAI |
| `claude-*` | Anthropic |
| `mistral-*`, `mixtral-*` | Mistral |
| `llama*`, `phi*`, `qwen*` | Ollama (self-hosted) |

Your admin may have configured custom routing rules that override this behaviour.

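
The table above amounts to a first-match prefix lookup. A hypothetical sketch — `route_model` is not part of the proxy's public API, and custom admin rules would take precedence over this default table:

```python
# Default routing table, taken from the table above.
ROUTES = [
    (("gpt-", "o1-", "o3-"), "openai"),
    (("claude-",), "anthropic"),
    (("mistral-", "mixtral-"), "mistral"),
    (("llama", "phi", "qwen"), "ollama"),
]

def route_model(model: str) -> str:
    """Pick the upstream provider for a model name by prefix (first match wins)."""
    for prefixes, provider in ROUTES:
        if model.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return provider
    raise ValueError(f"no route for model {model!r}")
```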
## 6. Error Codes

All errors follow the OpenAI error format:

```json
{
  "error": {
    "type": "authentication_error",
    "message": "missing or invalid token",
    "code": null
  }
}
```

| HTTP status | Error type | Cause |
|---|---|---|
| `400` | `invalid_request_error` | Malformed JSON or missing required fields |
| `401` | `authentication_error` | Missing or expired JWT |
| `403` | `permission_error` | Model not allowed for your role (RBAC) |
| `429` | `rate_limit_error` | Too many requests — wait and retry |
| `502` | `upstream_error` | The upstream LLM provider returned an error |

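
Client code can fold the table above into a single exception type. A sketch — `ProxyError` and `raise_for_error` are hypothetical helpers assuming the error envelope shown above, with `retryable` reflecting the `429` and `502` rows:

```python
class ProxyError(Exception):
    """One exception for every non-2xx proxy response."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.retryable = status in (429, 502)  # transient per the table above

def raise_for_error(status: int, body: dict) -> None:
    """Raise ProxyError for an error response; no-op on success."""
    if 200 <= status < 300:
        return
    err = body.get("error", {})
    raise ProxyError(status, err.get("type", "unknown"), err.get("message", ""))
```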
## 7. Rate Limits

Limits are configured per tenant. The default is 6,000 requests/minute with a burst of 1,000. Your admin can adjust this via `PUT /v1/admin/rate-limits/{tenant_id}`.

When you hit the limit you receive:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 1
```

## 8. Health Check

Verify the proxy is reachable without authentication:

```bash
curl https://api.veylant.ai/healthz
# {"status":"ok"}
```

## 9. API Reference

Full interactive documentation is available at:

```
https://api.veylant.ai/docs
```

Or download the raw OpenAPI 3.1 spec:

```bash
curl https://api.veylant.ai/docs/openapi.yaml -o openapi.yaml
```