CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Overview
JARVIS is an Electron-based desktop voice assistant with a particle animation UI. It listens for the wake word "Jarvis", captures a voice command, sends it to the Claude CLI for processing (with Bash tool access), and speaks the response aloud using macOS TTS. No build step, no TypeScript, no bundler — vanilla JavaScript + Electron.
Running the App
npm install # first time only
npm start # launches Electron
The app requires:
- macOS (uses the say command for TTS)
- Claude CLI installed and accessible in PATH (claude command)
- whisper-cpp installed (brew install whisper-cpp), which provides whisper-server
- A GGML model at ~/whisper-models/ggml-base.bin (override with the JARVIS_WHISPER_MODEL env var). Download: curl -L -o ~/whisper-models/ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
Architecture
The app has three layers:
main.js — Electron main process
- Starts a local HTTP server on port 52736 (serves the renderer; required for getUserMedia to work in a secure context, not file://)
- Spawns whisper-server on port 52737 at startup (loads the GGML model once; killed on before-quit)
- Handles IPC: askClaude, speak, stopSpeak, showContextMenu, whisperUrl
- Calls the Claude CLI via execFile with --model opus --allowed-tools Bash --dangerously-skip-permissions, cwd set to $HOME
- Maintains conversation history (max 20 messages) in memory
- Detects French vs. English in responses to pick the say voice (Thomas vs. Alex)
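The history cap and CLI options listed above could be sketched roughly as follows. This is a minimal illustration, not the code in main.js: the names pushHistory, buildClaudeInvocation, and the constants are hypothetical, the -p flag for passing the prompt is an assumption, and 2MB is taken to mean 2 * 1024 * 1024 bytes.

```javascript
const os = require('node:os');

// Values taken from this document; the constant names are hypothetical.
const MAX_HISTORY = 20;
const CLI_TIMEOUT_MS = 120 * 1000;
const CLI_MAX_BUFFER = 2 * 1024 * 1024; // assumes "2MB" means 2 MiB

// Append a message and drop the oldest entries once the cap is exceeded.
function pushHistory(history, role, content) {
  history.push({ role, content });
  if (history.length > MAX_HISTORY) {
    history.splice(0, history.length - MAX_HISTORY);
  }
  return history;
}

// Build the args and options one might hand to
// child_process.execFile('claude', args, options, callback).
// Assumption: the prompt is passed with -p; the other flags come from this document.
function buildClaudeInvocation(prompt) {
  return {
    args: [
      '-p', prompt,
      '--model', 'opus',
      '--allowed-tools', 'Bash',
      '--dangerously-skip-permissions',
    ],
    options: {
      cwd: os.homedir(),
      timeout: CLI_TIMEOUT_MS,
      maxBuffer: CLI_MAX_BUFFER,
    },
  };
}
```

Keeping the cap inside one helper means every caller that records a message trims the history the same way.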
preload.js — context bridge
- Exposes window.jarvis.{askClaude, speak, stopSpeak, showContextMenu, whisperUrl} with context isolation.
renderer.js — UI + voice pipeline
- JarvisVisualizer: Canvas-based particle animation. States: idle (cyan), listening (green), thinking (amber), speaking (light blue).
- AudioPipeline: getUserMedia → AudioContext({sampleRate: 16000}) → ScriptProcessorNode delivers Float32 frames.
- encodeWAV / transcribe: encodes Float32 PCM to 16-bit WAV Blob and POSTs to http://127.0.0.1:52737/inference (multipart file, response_format=text).
- JarvisController: state machine (idle/listening/thinking/speaking) driving the full pipeline.
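As an illustration of the Float32-to-WAV step, here is a minimal sketch of a 16-bit mono PCM encoder. It is not necessarily how encodeWAV in renderer.js is written: it returns an ArrayBuffer rather than a Blob so it can also run outside the browser, and the default sample rate of 16000 is taken from the AudioContext setting above.

```javascript
// Encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
// Returns an ArrayBuffer; in the renderer it could be wrapped as
// new Blob([buffer], { type: 'audio/wav' }) before being POSTed.
function encodeWAV(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

All multi-byte fields are little-endian, as the RIFF/WAVE layout requires.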
Note: We do NOT use webkitSpeechRecognition — it's broken in Electron (missing Google API key → network error). All STT goes through local whisper-server.
Voice Pipeline Flow
- Idle: every 1.2s, take the last 2.2s from a rolling 3s ring buffer; if RMS above floor, POST to whisper-server. If transcription matches /\bjarvis\b/i → enter listening with the trailing text as inline command.
- Listening: accumulate Float32 frames into cmdChunks. Per-frame RMS drives a VAD: after speech onset, 1.5s of silence (or 12s max) → finalize.
- Finalize: concat chunks, encode WAV, transcribe once, combine with inline → Claude CLI → say.
- Audio capture is gated off during thinking/speaking so JARVIS never hears its own voice. Ring buffer is cleared on entry to listening and on return to idle.
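The wake-word match and the silence-based endpointing described above can be sketched as below. This is a hedged illustration rather than the code in renderer.js: extractInlineCommand, rms, and createVad are hypothetical names, the RMS floor of 0.01 is a made-up placeholder, and the 12s limit is assumed to count from entry into listening.

```javascript
const WAKE_RE = /\bjarvis\b/i;

// Return the text that follows the wake word, or null if it is absent.
function extractInlineCommand(transcript) {
  const m = WAKE_RE.exec(transcript);
  if (!m) return null;
  return transcript
    .slice(m.index + m[0].length)
    .replace(/^[\s,.!?:;-]+/, '') // drop punctuation right after the wake word
    .trim();
}

// Root-mean-square level of one Float32 frame.
function rms(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return frame.length ? Math.sqrt(sum / frame.length) : 0;
}

// Endpointing: finalize after silenceMs of silence following speech onset,
// or once maxMs has elapsed. rmsFloor is a hypothetical threshold.
function createVad({ rmsFloor = 0.01, silenceMs = 1500, maxMs = 12000 } = {}) {
  let elapsedMs = 0;
  let silentMs = 0;
  let speechStarted = false;
  // Call once per frame; returns true when the command should be finalized.
  return function onFrame(frame, frameMs) {
    elapsedMs += frameMs;
    if (rms(frame) >= rmsFloor) {
      speechStarted = true;
      silentMs = 0;
    } else if (speechStarted) {
      silentMs += frameMs;
    }
    return (speechStarted && silentMs >= silenceMs) || elapsedMs >= maxMs;
  };
}
```

Silence before speech onset does not count toward the 1.5s window, so a slow start is only cut off by the overall time limit.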
Key Constants (in main.js)
- Claude model alias: opus
- CLI timeout: 120s, output buffer: 2MB
- Conversation history cap: 20 items
- Local HTTP server port: 52736
- Whisper server port: 52737
- Whisper model path: $JARVIS_WHISPER_MODEL or ~/whisper-models/ggml-base.bin
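For reference, the model-path fallback could be resolved along these lines. This is a small sketch under assumptions: resolveWhisperModel and the PORTS object are hypothetical names, and ~ is taken to mean the directory returned by os.homedir().

```javascript
const os = require('node:os');
const path = require('node:path');

// Port numbers as listed in this document; the object name is hypothetical.
const PORTS = { http: 52736, whisper: 52737 };

// Prefer the JARVIS_WHISPER_MODEL override, otherwise fall back to the
// default model under the user's home directory.
function resolveWhisperModel(env = process.env) {
  return (
    env.JARVIS_WHISPER_MODEL ||
    path.join(os.homedir(), 'whisper-models', 'ggml-base.bin')
  );
}
```

Passing env as a parameter keeps the function easy to exercise without touching the real process environment.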
Making Changes
- System prompt / personality: Edit buildPrompt() in main.js
- Claude model or CLI flags: Edit the execFile call in askClaude() in main.js
- Wake word or silence timeout: Edit _startWakeLoop() / _listenContinuous() in renderer.js
- Visual states or animation: Edit JarvisVisualizer in renderer.js
- Restart npm start after any change to see the effect (no hot reload)