jarvis/CLAUDE.md
2026-04-13 22:01:33 +02:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
JARVIS is an Electron-based desktop voice assistant with a particle animation UI. It listens for the wake word "Jarvis", captures a voice command, sends it to the Claude CLI for processing (with Bash tool access), and speaks the response aloud using macOS TTS. No build step, no TypeScript, no bundler — vanilla JavaScript + Electron.
## Running the App
```bash
npm install # first time only
npm start # launches Electron
```
The app requires:
- macOS (uses the `say` command for TTS)
- Claude CLI installed and accessible in PATH (`claude` command)
- `whisper-cpp` installed (`brew install whisper-cpp`) — provides `whisper-server`
- A GGML model at `~/whisper-models/ggml-base.bin` (override with the `JARVIS_WHISPER_MODEL` env var). Download: `mkdir -p ~/whisper-models && curl -L -o ~/whisper-models/ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin`
## Architecture
The app has three layers:
**`main.js` — Electron main process**
- Starts a local HTTP server on port 52736 to serve the renderer. This is required because `getUserMedia` needs a secure context, which a `file://` page does not provide.
- Spawns `whisper-server` on port 52737 at startup (loads the GGML model once; killed on `before-quit`)
- Handles IPC: `askClaude`, `speak`, `stopSpeak`, `showContextMenu`, `whisperUrl`
- Calls the Claude CLI via `execFile` with `--model opus --allowed-tools Bash --dangerously-skip-permissions`, `cwd` set to `$HOME`
- Maintains conversation history (max 20 messages) in memory
- Detects French vs. English in responses to pick the `say` voice (Thomas vs. Alex)
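The history cap and voice selection described above could look roughly like the sketch below. The function names and the French-detection heuristic are assumptions made for illustration; only the 20-message cap and the Thomas/Alex voices come from this file.

```javascript
// Sketch only: names and the language heuristic are assumptions, not the repo's code.
const MAX_HISTORY = 20; // cap stated in this document

// Append a message and drop the oldest entries beyond the cap.
function pushHistory(history, role, content) {
  const next = [...history, { role, content }];
  return next.slice(-MAX_HISTORY);
}

// Hypothetical heuristic: count common French function words and accented letters.
const FRENCH_HINTS = /\b(le|la|les|des|est|une|pour|avec|vous|je|pas|que)\b|[éèêàùçôî]/gi;

// Pick the macOS `say` voice: Thomas for French-looking text, Alex otherwise.
function pickVoice(text) {
  const hits = (text.match(FRENCH_HINTS) || []).length;
  const words = text.split(/\s+/).filter(Boolean).length || 1;
  return hits / words > 0.15 ? 'Thomas' : 'Alex';
}

// Arguments for `say`; `-v` selects the voice.
function sayArgs(text) {
  return ['-v', pickVoice(text), text];
}
```

A word-ratio heuristic like this is cheap but will misfire on short or mixed-language replies, so treat the threshold as a starting point.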
**`preload.js` — context bridge**
- Exposes `window.jarvis.{askClaude, speak, stopSpeak, showContextMenu, whisperUrl}` with context isolation.
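As a rough illustration of the bridge, the sketch below builds the exposed object from an injected `ipcRenderer`, which keeps it testable outside Electron. The channel names and the `invoke`/`send` split are assumptions; check `main.js` for the handlers actually registered.

```javascript
// Sketch only: the IPC channel names below are assumptions.
// The API is built from an injected ipcRenderer so it can be exercised without Electron.
function buildJarvisApi(ipcRenderer) {
  return {
    askClaude: (text) => ipcRenderer.invoke('askClaude', text),
    speak: (text) => ipcRenderer.invoke('speak', text),
    stopSpeak: () => ipcRenderer.invoke('stopSpeak'),
    showContextMenu: () => ipcRenderer.send('showContextMenu'),
    whisperUrl: () => ipcRenderer.invoke('whisperUrl'),
  };
}

// In a real preload script (not executed here):
// const { contextBridge, ipcRenderer } = require('electron');
// contextBridge.exposeInMainWorld('jarvis', buildJarvisApi(ipcRenderer));
```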
**`renderer.js` — UI + voice pipeline**
- `JarvisVisualizer`: Canvas-based particle animation. States: idle (cyan), listening (green), thinking (amber), speaking (light blue).
- `AudioPipeline`: `getUserMedia` → `AudioContext({sampleRate: 16000})` → `ScriptProcessorNode`, which delivers Float32 frames.
- `encodeWAV` / `transcribe`: encodes Float32 PCM to 16-bit WAV Blob and POSTs to `http://127.0.0.1:52737/inference` (multipart `file`, `response_format=text`).
- `JarvisController`: state machine (`idle`/`listening`/`thinking`/`speaking`) driving the full pipeline.
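For reference, a minimal sketch of the Float32-to-WAV step follows. It assumes mono audio at 16 kHz and returns an `ArrayBuffer`; the repo's `encodeWAV` may differ in signature and details.

```javascript
// Sketch only: assumes mono audio at 16 kHz, matching the AudioContext above.
function encodeWAV(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale to signed 16-bit little-endian.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the renderer the result can be wrapped as `new Blob([encodeWAV(frames)], { type: 'audio/wav' })` before upload.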
**Note**: We do NOT use `webkitSpeechRecognition` — it's broken in Electron (missing Google API key → `network` error). All STT goes through local whisper-server.
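A request to the local whisper-server might be built as in the sketch below. The helper names, the upload filename, and the default base URL are assumptions; the `/inference` path and the `file` and `response_format=text` fields are taken from the description above.

```javascript
// Sketch only: helper names and the 'audio.wav' filename are assumptions.
function buildInferenceForm(wavBytes) {
  const form = new FormData();
  form.append('file', new Blob([wavBytes], { type: 'audio/wav' }), 'audio.wav');
  form.append('response_format', 'text');
  return form;
}

// POST the WAV to whisper-server and return the plain-text transcription.
async function transcribe(wavBytes, baseUrl = 'http://127.0.0.1:52737') {
  const res = await fetch(`${baseUrl}/inference`, {
    method: 'POST',
    body: buildInferenceForm(wavBytes),
  });
  if (!res.ok) throw new Error(`whisper-server returned ${res.status}`);
  return (await res.text()).trim();
}
```

Leaving the `Content-Type` header unset lets `fetch` generate the multipart boundary itself.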
## Voice Pipeline Flow
1. **Idle**: every 1.2s, take the last 2.2s from a rolling 3s ring buffer; if its RMS is above the noise floor, POST it to whisper-server. If the transcription matches `/\bjarvis\b/i`, enter listening and keep any text after the wake word as the inline command.
2. **Listening**: accumulate Float32 frames into `cmdChunks`. Per-frame RMS drives a VAD: after speech onset, 1.5s of silence (or 12s max) → finalize.
3. **Finalize**: concatenate the chunks, encode them as WAV, transcribe once, combine the text with the inline command, send it to the Claude CLI, and speak the reply with `say`.
4. Audio capture is gated off during `thinking`/`speaking` so JARVIS never hears its own voice. Ring buffer is cleared on entry to listening and on return to idle.
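The wake-word check and the silence-based cutoff from the steps above can be sketched as follows. The RMS floor of 0.01 and all names are assumptions; the regex and the 1.5s and 12s timings mirror the list.

```javascript
// Sketch only: thresholds and names are assumptions; timings mirror the list above.
function rms(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return frame.length ? Math.sqrt(sum / frame.length) : 0;
}

// Returns null when the wake word is absent, else the text after it (possibly '').
function extractInlineCommand(transcript) {
  const m = /\bjarvis\b/i.exec(transcript);
  if (!m) return null;
  return transcript.slice(m.index + m[0].length).replace(/^[\s,.!?:;-]+/, '').trim();
}

// Frame-driven VAD: finalize after 1.5 s of silence following speech, or after 12 s total.
class SilenceVad {
  constructor({ floor = 0.01, silenceMs = 1500, maxMs = 12000 } = {}) {
    Object.assign(this, { floor, silenceMs, maxMs });
    this.elapsedMs = 0;
    this.silentMs = 0;
    this.speechStarted = false;
  }

  // Feed one frame and its duration; returns true when the command should be finalized.
  push(frame, frameMs) {
    this.elapsedMs += frameMs;
    if (rms(frame) >= this.floor) {
      this.speechStarted = true;
      this.silentMs = 0;
    } else if (this.speechStarted) {
      this.silentMs += frameMs;
    }
    return this.silentMs >= this.silenceMs || this.elapsedMs >= this.maxMs;
  }
}
```

Counting silence only after speech onset keeps a slow start from ending the command early, while the 12s ceiling bounds the wait if the user never speaks.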
## Key Constants (in `main.js`)
- Claude model alias: `opus`
- CLI timeout: 120s, output buffer: 2MB
- Conversation history cap: 20 messages
- Local HTTP server port: 52736
- Whisper server port: 52737
- Whisper model path: `$JARVIS_WHISPER_MODEL` or `~/whisper-models/ggml-base.bin`
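One way to keep these values together is a frozen config object plus a small resolver for the model path, as sketched below. The property and function names are assumptions, and reading 2MB as 2 × 1024 × 1024 bytes is a guess.

```javascript
// Sketch only: names are assumptions; values come from the list above.
const os = require('node:os');
const path = require('node:path');

const CONFIG = Object.freeze({
  claudeModel: 'opus',
  cliTimeoutMs: 120_000,
  cliMaxBuffer: 2 * 1024 * 1024, // assumes "2MB" means 2 MiB
  historyCap: 20,
  httpPort: 52736,
  whisperPort: 52737,
});

// Prefer the env override, else fall back to ~/whisper-models/ggml-base.bin.
function resolveWhisperModel(env = process.env, home = os.homedir()) {
  return env.JARVIS_WHISPER_MODEL || path.join(home, 'whisper-models', 'ggml-base.bin');
}
```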
## Making Changes
- **System prompt / personality**: Edit `buildPrompt()` in `main.js`
- **Claude model or CLI flags**: Edit the `execFile` call in `askClaude()` in `main.js`
- **Wake word or silence timeout**: Edit `_startWakeLoop()` / `_listenContinuous()` in `renderer.js`
- **Visual states or animation**: Edit `JarvisVisualizer` in `renderer.js`
- Stop and rerun `npm start` after any change to see the effect; there is no hot reload
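To make the first two items concrete, here is a hedged sketch of how a prompt builder and the `execFile` arguments might be separated. The persona text, the prompt layout, and passing the prompt with `-p` (print mode) are assumptions; the other flags, `cwd`, timeout, and buffer size are the ones listed in this file.

```javascript
// Sketch only: function bodies, the persona text, and the use of -p are assumptions.
const os = require('node:os');

function buildPrompt(history, userText) {
  // Hypothetical persona; the real one lives in main.js.
  const persona = 'You are JARVIS, a concise voice assistant. Reply in one or two spoken sentences.';
  const transcript = history.map((m) => `${m.role}: ${m.content}`).join('\n');
  return [persona, transcript, `user: ${userText}`].filter(Boolean).join('\n\n');
}

function buildClaudeInvocation(prompt, home = os.homedir()) {
  return {
    file: 'claude',
    args: ['-p', prompt, '--model', 'opus', '--allowed-tools', 'Bash', '--dangerously-skip-permissions'],
    options: { cwd: home, timeout: 120_000, maxBuffer: 2 * 1024 * 1024 },
  };
}

// Usage (not executed here):
// const { execFile } = require('node:child_process');
// const { file, args, options } = buildClaudeInvocation(buildPrompt([], 'What time is it?'));
// execFile(file, args, options, (err, stdout) => { /* speak stdout */ });
```

Keeping the argument list in one function means a model or flag change touches a single place.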