2026-04-13 22:01:33 +02:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

JARVIS is an Electron-based desktop voice assistant with a particle animation UI. It listens for the wake word "Jarvis", captures a voice command, sends it to the Claude CLI for processing (with Bash tool access), and speaks the response aloud using macOS TTS. No build step, no TypeScript, no bundler — vanilla JavaScript + Electron.

Running the App

npm install    # first time only
npm start      # launches Electron

The app requires:

  • macOS (uses the say command for TTS)
  • Claude CLI installed and accessible in PATH (claude command)
  • whisper-cpp installed (brew install whisper-cpp) — provides whisper-server
  • A GGML model at ~/whisper-models/ggml-base.bin (override with JARVIS_WHISPER_MODEL env var). Download: curl -L -o ~/whisper-models/ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

Architecture

The app has three layers:

main.js — Electron main process

  • Starts a local HTTP server on port 52736 to serve the renderer. This is required because getUserMedia only works in a secure context, which a file:// page does not provide.
  • Spawns whisper-server on port 52737 at startup (loads the GGML model once; killed on before-quit)
  • Handles IPC: askClaude, speak, stopSpeak, showContextMenu, whisperUrl
  • Calls the Claude CLI via execFile with --model opus --allowed-tools Bash --dangerously-skip-permissions, cwd set to $HOME
  • Maintains conversation history (max 20 messages) in memory
  • Detects French vs. English in responses to pick the say voice (Thomas vs. Alex)
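
The history cap and voice selection described above could look roughly like the following. This is a minimal sketch, not the code in main.js: pushHistory, pickVoice, and the French-detection heuristic (common words plus accented characters) are hypothetical; the source only states the 20-message cap and the Thomas/Alex voices.

```javascript
// Assumption: history is an array of { role, content } objects kept in memory.
const MAX_HISTORY = 20;

function pushHistory(history, role, content) {
  history.push({ role, content });
  // Drop the oldest entries once the cap is exceeded.
  while (history.length > MAX_HISTORY) history.shift();
  return history;
}

// Assumption: a crude heuristic; the real detection in main.js may differ.
const FRENCH_WORDS = /\b(le|la|les|des|est|une|je|vous|nous|bonjour|merci|avec)\b/gi;
const FRENCH_ACCENTS = /[àâçéèêëîïôùûü]/gi;

function pickVoice(text) {
  const wordHits = (text.match(FRENCH_WORDS) || []).length;
  const accentHits = (text.match(FRENCH_ACCENTS) || []).length;
  // Two or more hints are treated as French (Thomas); otherwise English (Alex).
  return wordHits + accentHits >= 2 ? 'Thomas' : 'Alex';
}
```

The returned name would then be passed to say via its -v option.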

preload.js — context bridge

  • Exposes window.jarvis.{askClaude, speak, stopSpeak, showContextMenu, whisperUrl} with context isolation.
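
As an illustration of the bridge shape, the sketch below builds the exposed object from an ipcRenderer-like dependency. The channel names and the use of invoke for every method are assumptions; only the five method names come from this document.

```javascript
// Hypothetical helper: builds the object that would become window.jarvis.
// Assumption: each method maps to an IPC channel of the same name via invoke().
function buildJarvisApi(ipcRenderer) {
  return {
    askClaude: (text) => ipcRenderer.invoke('askClaude', text),
    speak: (text) => ipcRenderer.invoke('speak', text),
    stopSpeak: () => ipcRenderer.invoke('stopSpeak'),
    showContextMenu: () => ipcRenderer.invoke('showContextMenu'),
    whisperUrl: () => ipcRenderer.invoke('whisperUrl'),
  };
}

// Inside Electron, a preload script would wire it up like this:
//   const { contextBridge, ipcRenderer } = require('electron');
//   contextBridge.exposeInMainWorld('jarvis', buildJarvisApi(ipcRenderer));
```

Taking ipcRenderer as a parameter keeps the mapping testable outside Electron; the real preload.js may simply inline these calls.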

renderer.js — UI + voice pipeline

  • JarvisVisualizer: Canvas-based particle animation. States: idle (cyan), listening (green), thinking (amber), speaking (light blue).
  • AudioPipeline: getUserMedia → AudioContext({sampleRate: 16000}) → ScriptProcessorNode, which delivers Float32 frames.
  • encodeWAV / transcribe: encodes Float32 PCM to 16-bit WAV Blob and POSTs to http://127.0.0.1:52737/inference (multipart file, response_format=text).
  • JarvisController: state machine (idle/listening/thinking/speaking) driving the full pipeline.
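
For reference, encoding Float32 PCM to 16-bit WAV generally follows the pattern below. This is a sketch assuming mono audio at 16 kHz, not necessarily the exact encodeWAV in renderer.js; it returns an ArrayBuffer rather than a Blob so it can run outside the browser.

```javascript
// Sketch: Float32 samples in [-1, 1] -> 16-bit little-endian PCM WAV.
// Assumptions: mono, 16 kHz default (matching the AudioContext sample rate).
function encodeWAV(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                 // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                           // fmt chunk size
  view.setUint16(20, 1, true);                            // format: PCM
  view.setUint16(22, 1, true);                            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the renderer, the result can be wrapped as new Blob([buffer], { type: 'audio/wav' }) before upload.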

Note: We do NOT use webkitSpeechRecognition — it's broken in Electron (missing Google API key → network error). All STT goes through local whisper-server.
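
A request to the local whisper-server might be sketched as follows. The multipart field names file and response_format follow the description above; the fetchImpl parameter and the audio.wav filename are hypothetical additions that make the function testable without a running server.

```javascript
// Sketch: POST a WAV Blob to whisper-server and return the plain-text transcript.
// Assumption: fetchImpl is injectable for testing; it defaults to the global fetch.
async function transcribe(wavBlob, baseUrl = 'http://127.0.0.1:52737', fetchImpl = fetch) {
  const form = new FormData();
  form.append('file', wavBlob, 'audio.wav');   // hypothetical filename
  form.append('response_format', 'text');

  const res = await fetchImpl(`${baseUrl}/inference`, { method: 'POST', body: form });
  if (!res.ok) {
    throw new Error(`whisper-server returned HTTP ${res.status}`);
  }
  return (await res.text()).trim();
}
```

In the app, the base URL would come from window.jarvis.whisperUrl() rather than being hard-coded.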

Voice Pipeline Flow

  1. Idle: every 1.2s, take the last 2.2s from a rolling 3s ring buffer; if its RMS is above the floor, POST it to whisper-server. If the transcription matches /\bjarvis\b/i → enter listening, with the text after the wake word as an inline command.
  2. Listening: accumulate Float32 frames into cmdChunks. Per-frame RMS drives a VAD: after speech onset, 1.5s of silence (or 12s max) → finalize.
  3. Finalize: concatenate the chunks, encode WAV, transcribe once, combine with the inline command → Claude CLI → say.
  4. Audio capture is gated off during thinking/speaking so JARVIS never hears its own voice. Ring buffer is cleared on entry to listening and on return to idle.
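
The building blocks of this flow can be sketched as below. RingBuffer, rms, parseWake, and SilenceVad are hypothetical names, the RMS floor of 0.01 is an assumed value, and the 12s limit is assumed to count from entry into listening; renderer.js may structure this differently.

```javascript
const SAMPLE_RATE = 16000;

// Fixed-size rolling buffer of the most recent samples.
class RingBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.writePos = 0;
    this.filled = 0;
  }
  push(frame) {
    for (let i = 0; i < frame.length; i++) {
      this.data[this.writePos] = frame[i];
      this.writePos = (this.writePos + 1) % this.data.length;
    }
    this.filled = Math.min(this.data.length, this.filled + frame.length);
  }
  // Returns the most recent `count` samples in chronological order.
  tail(count) {
    const cap = this.data.length;
    const n = Math.min(count, this.filled);
    const out = new Float32Array(n);
    const start = (this.writePos - n + cap) % cap;
    for (let i = 0; i < n; i++) out[i] = this.data[(start + i) % cap];
    return out;
  }
  clear() {
    this.writePos = 0;
    this.filled = 0;
  }
}

// Root-mean-square level of a frame.
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Returns null without a wake word, else the text following it.
function parseWake(transcript) {
  const m = /\bjarvis\b/i.exec(transcript);
  if (!m) return null;
  const rest = transcript.slice(m.index + m[0].length);
  return { inline: rest.replace(/^[\s,.!?:;]+/, '').trim() };
}

// feed() returns true once the command should be finalized.
class SilenceVad {
  constructor({ floor = 0.01, silenceMs = 1500, maxMs = 12000 } = {}) {
    Object.assign(this, { floor, silenceMs, maxMs });
    this.elapsedMs = 0;
    this.silentMs = 0;
    this.speechStarted = false;
  }
  feed(frame, frameMs) {
    this.elapsedMs += frameMs;
    if (rms(frame) > this.floor) {
      this.speechStarted = true;
      this.silentMs = 0;
    } else if (this.speechStarted) {
      this.silentMs += frameMs;
    }
    const silenceDone = this.speechStarted && this.silentMs >= this.silenceMs;
    return silenceDone || this.elapsedMs >= this.maxMs;
  }
}
```

Under these assumptions the idle loop would create new RingBuffer(3 * SAMPLE_RATE) and, every 1.2s, check rms(ring.tail(Math.round(2.2 * SAMPLE_RATE))) against the floor before transcribing.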

Key Constants (in main.js)

  • Claude model alias: opus
  • CLI timeout: 120s, output buffer: 2MB
  • Conversation history cap: 20 messages
  • Local HTTP server port: 52736
  • Whisper server port: 52737
  • Whisper model path: $JARVIS_WHISPER_MODEL or ~/whisper-models/ggml-base.bin
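
The model path fallback and the two ports could be expressed as in the sketch below. The constant and function names are hypothetical; only the values and the JARVIS_WHISPER_MODEL variable come from this document.

```javascript
const path = require('path');

// Values from the list above; the names are illustrative.
const HTTP_PORT = 52736;
const WHISPER_PORT = 52737;

// Prefer the env override, otherwise fall back to ~/whisper-models/ggml-base.bin.
function resolveWhisperModel(env, homeDir) {
  return env.JARVIS_WHISPER_MODEL || path.join(homeDir, 'whisper-models', 'ggml-base.bin');
}

// Endpoint the renderer posts audio to.
function whisperInferenceUrl(port = WHISPER_PORT) {
  return `http://127.0.0.1:${port}/inference`;
}
```

In main.js this would typically be called as resolveWhisperModel(process.env, require('os').homedir()).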

Making Changes

  • System prompt / personality: Edit buildPrompt() in main.js
  • Claude model or CLI flags: Edit the execFile call in askClaude() in main.js
  • Wake word or silence timeout: Edit _startWakeLoop() / _listenContinuous() in renderer.js
  • Visual states or animation: Edit JarvisVisualizer in renderer.js
  • Restart the app (quit and run npm start again) after any change; there is no hot reload
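
When changing the model or CLI flags, it may help to see the invocation laid out as data. The sketch below is an assumption about how askClaude() could assemble its execFile call: buildClaudeInvocation is a hypothetical helper, passing the prompt with -p is assumed, and 2MB is interpreted as 2 * 1024 * 1024 bytes.

```javascript
// Hypothetical helper: gathers the arguments and options for execFile.
function buildClaudeInvocation(prompt, homeDir) {
  return {
    file: 'claude',
    args: [
      '-p', prompt,                       // assumption: prompt passed in print mode
      '--model', 'opus',
      '--allowed-tools', 'Bash',
      '--dangerously-skip-permissions',
    ],
    options: {
      cwd: homeDir,                       // the CLI runs from $HOME
      timeout: 120 * 1000,                // 120s CLI timeout
      maxBuffer: 2 * 1024 * 1024,         // 2MB output buffer
    },
  };
}

// Possible use in main.js:
//   const { execFile } = require('child_process');
//   const { file, args, options } = buildClaudeInvocation(prompt, require('os').homedir());
//   execFile(file, args, options, (err, stdout) => { /* handle reply */ });
```

Keeping the flags in one array makes it easy to swap the model alias or tool list without touching the callback logic.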