CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

JARVIS is an Electron-based desktop voice assistant with a particle animation UI. It listens for the wake word "Jarvis", captures a voice command, sends it to the Claude CLI for processing (with Bash tool access), and speaks the response aloud using macOS TTS. No build step, no TypeScript, no bundler — vanilla JavaScript + Electron.

Running the App

npm install    # first time only
npm start      # launches Electron

The app requires:

macOS (uses the say command for TTS)
Claude CLI installed and accessible in PATH (claude command)
whisper-cpp installed (brew install whisper-cpp) — provides whisper-server
A GGML model at ~/whisper-models/ggml-base.bin (override with JARVIS_WHISPER_MODEL env var). Download: mkdir -p ~/whisper-models && curl -L -o ~/whisper-models/ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

Architecture

The app has three layers:

main.js — Electron main process

Starts a local HTTP server on port 52736 (serves the renderer — required for getUserMedia to work in a secure context, not file://)
Spawns whisper-server on port 52737 at startup (loads the GGML model once; killed on before-quit)
Handles IPC: askClaude, speak, stopSpeak, showContextMenu, whisperUrl
Calls the Claude CLI via execFile with --model opus --allowed-tools Bash --dangerously-skip-permissions, cwd set to $HOME
Maintains conversation history (max 20 messages) in memory
Detects French vs. English in responses to pick the say voice (Thomas vs. Alex)
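
The history cap and voice selection described above could look roughly like the sketch below. The helper names (pushHistory, pickVoice) and the French-detection heuristic are assumptions made for illustration; only the 20-message cap and the Thomas/Alex voices come from this document, and the real logic in main.js may differ.

```javascript
// Hypothetical helpers: names and heuristic are illustrative, not taken from main.js.
const MAX_HISTORY = 20; // conversation history cap stated in this document

// Append a message and drop the oldest entries once the cap is exceeded.
function pushHistory(history, role, content) {
  history.push({ role, content });
  while (history.length > MAX_HISTORY) history.shift();
  return history;
}

// Assumed heuristic: count common French function words and accented letters.
const FRENCH_HINTS =
  /\b(le|la|les|des|est|une|je|vous|pour|avec|dans|bonjour|merci)\b|[éèêàçù]/gi;

// Return the macOS `say` voice: Thomas for French, Alex for English.
function pickVoice(text) {
  const words = text.split(/\s+/).filter(Boolean).length || 1;
  const hits = (text.match(FRENCH_HINTS) || []).length;
  return hits / words > 0.2 ? "Thomas" : "Alex";
}
```

The 0.2 ratio is an arbitrary threshold chosen for the sketch; short or mixed-language responses may need a different cutoff.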

preload.js — context bridge

Exposes window.jarvis.{askClaude, speak, stopSpeak, showContextMenu, whisperUrl} with context isolation.
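
A minimal sketch of how such a bridge might be assembled. It assumes each exposed method forwards to an IPC channel of the same name via ipcRenderer.invoke; the buildJarvisApi helper is hypothetical, and the actual preload.js may use different channel names or ipcRenderer.send for some calls.

```javascript
// Method names listed in this document; mapping them 1:1 to channels is an assumption.
const CHANNELS = ["askClaude", "speak", "stopSpeak", "showContextMenu", "whisperUrl"];

// Build the object to expose as window.jarvis. Each method forwards its
// arguments to the main process and returns the resulting promise.
function buildJarvisApi(ipcRenderer) {
  const api = {};
  for (const channel of CHANNELS) {
    api[channel] = (...args) => ipcRenderer.invoke(channel, ...args);
  }
  return api;
}

// In preload.js (needs Electron, so shown as a comment here):
//   const { contextBridge, ipcRenderer } = require("electron");
//   contextBridge.exposeInMainWorld("jarvis", buildJarvisApi(ipcRenderer));
```

Exposing only these wrappers, rather than ipcRenderer itself, keeps the renderer limited to the five listed operations.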

renderer.js — UI + voice pipeline

JarvisVisualizer: Canvas-based particle animation. States: idle (cyan), listening (green), thinking (amber), speaking (light blue).
AudioPipeline: getUserMedia → AudioContext({sampleRate: 16000}) → ScriptProcessorNode delivers Float32 frames.
encodeWAV / transcribe: encodes Float32 PCM to 16-bit WAV Blob and POSTs to http://127.0.0.1:52737/inference (multipart file, response_format=text).
JarvisController: state machine (idle/listening/thinking/speaking) driving the full pipeline.
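
For reference, the encodeWAV / transcribe step can be sketched as below. This is an illustrative version, not the code in renderer.js: it assumes mono audio at 16 kHz, returns an ArrayBuffer that the caller wraps in a Blob, and uses the multipart field names file and response_format described above.

```javascript
// Encode Float32 PCM samples (range -1..1) as a 16-bit mono WAV file.
function encodeWAV(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = PCM
  view.setUint16(22, 1, true); // channels (mono is an assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// POST the WAV to whisper-server and return the plain-text transcription.
async function transcribe(samples, url = "http://127.0.0.1:52737/inference") {
  const form = new FormData();
  form.append("file", new Blob([encodeWAV(samples)], { type: "audio/wav" }), "audio.wav");
  form.append("response_format", "text");
  const res = await fetch(url, { method: "POST", body: form });
  if (!res.ok) throw new Error(`whisper-server returned ${res.status}`);
  return (await res.text()).trim();
}
```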

Note: We do NOT use webkitSpeechRecognition — it's broken in Electron (missing Google API key → network error). All STT goes through local whisper-server.

Voice Pipeline Flow

Idle: every 1.2s, take the last 2.2s from a rolling 3s ring buffer; if RMS above floor, POST to whisper-server. If transcription matches /\bjarvis\b/i → enter listening with the trailing text as inline command.
Listening: accumulate Float32 frames into cmdChunks. Per-frame RMS drives a VAD: after speech onset, 1.5s of silence (or 12s max) → finalize.
Finalize: concat chunks, encode WAV, transcribe once, combine with inline → Claude CLI → say.
Audio capture is gated off during thinking/speaking so JARVIS never hears its own voice. Ring buffer is cleared on entry to listening and on return to idle.
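
The wake-word match and the silence-based VAD above can be sketched as follows. The function names, the RMS threshold of 0.01, and the way frame durations are passed in are assumptions for illustration; only the regex, the 1.5s silence window, and the 12s cap come from this document.

```javascript
// Root-mean-square level of one Float32 audio frame.
function rms(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / (frame.length || 1));
}

const WAKE_RE = /\bjarvis\b/i;

// Return the text after the wake word (possibly ""), or null if it is absent.
function parseWake(transcript) {
  const m = WAKE_RE.exec(transcript);
  if (!m) return null;
  return transcript
    .slice(m.index + m[0].length)
    .replace(/^[\s,.!?:;-]+/, "") // drop punctuation right after "Jarvis"
    .trim();
}

// Create a VAD that reports true once the command should be finalized:
// either silenceMs of quiet after speech onset, or maxMs elapsed in total.
function createVad({ threshold = 0.01, silenceMs = 1500, maxMs = 12000 } = {}) {
  let elapsed = 0;
  let silence = 0;
  let speechStarted = false;
  return function feed(frame, frameMs) {
    elapsed += frameMs;
    if (rms(frame) >= threshold) {
      speechStarted = true;
      silence = 0; // any speech resets the silence timer
    } else if (speechStarted) {
      silence += frameMs;
    }
    return (speechStarted && silence >= silenceMs) || elapsed >= maxMs;
  };
}
```

Silence before speech onset does not count toward the 1.5s window, so a slow start only ends the capture through the 12s cap.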

Key Constants (in `main.js`)

Claude model alias: opus
CLI timeout: 120s, output buffer: 2MB
Conversation history cap: 20 messages
Local HTTP server port: 52736
Whisper server port: 52737
Whisper model path: $JARVIS_WHISPER_MODEL or ~/whisper-models/ggml-base.bin
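
As an illustration, these constants might be grouped and used as in the sketch below. The CONFIG object and both helper names are hypothetical, and passing the prompt with -p (the CLI's non-interactive print mode) is an assumption, since this document does not say how main.js hands the prompt to the CLI.

```javascript
// Values taken from this document; the object shape itself is an assumption.
const CONFIG = {
  claudeModel: "opus",
  cliTimeoutMs: 120 * 1000,
  cliMaxBuffer: 2 * 1024 * 1024,
  historyCap: 20,
  httpPort: 52736,
  whisperPort: 52737,
};

// Resolve the GGML model path, honoring the JARVIS_WHISPER_MODEL override.
function whisperModelPath(env = process.env) {
  return env.JARVIS_WHISPER_MODEL || `${env.HOME}/whisper-models/ggml-base.bin`;
}

// Build the arguments for child_process.execFile. The "-p" flag is assumed.
function claudeInvocation(prompt, env = process.env) {
  return {
    file: "claude",
    args: [
      "-p", prompt,
      "--model", CONFIG.claudeModel,
      "--allowed-tools", "Bash",
      "--dangerously-skip-permissions",
    ],
    options: {
      cwd: env.HOME,
      timeout: CONFIG.cliTimeoutMs,
      maxBuffer: CONFIG.cliMaxBuffer,
    },
  };
}

// Possible use in main.js:
//   const { execFile } = require("child_process");
//   const { file, args, options } = claudeInvocation("What time is it?");
//   execFile(file, args, options, (err, stdout) => { /* handle reply */ });
```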

Making Changes

System prompt / personality: Edit buildPrompt() in main.js
Claude model or CLI flags: Edit the execFile call in askClaude() in main.js
Wake word or silence timeout: Edit _startWakeLoop() / _listenContinuous() in renderer.js
Visual states or animation: Edit JarvisVisualizer in renderer.js
After any change, quit the app and rerun npm start to see the effect (there is no hot reload)
