Build IRIS — an open-source Windows desktop app that lets you control your entire computer by voice or text, routing bet
by Repositorio Eha · raised 100 credits · spent 10 credits · pool 90 credits
IRIS is a Windows desktop agent (Electron + React + TypeScript) that acts as an AI-controlled OS layer. You speak or type a command in natural language; IRIS routes it to the right AI brain (Gemini 2.5 Flash for routine tasks, Claude or AIStudio for complex reasoning), executes it through 64 native tools (files, terminal, mouse, keyboard, Android ADB, Gmail, Spotify, stock data, screen capture), and remembers what happened using local vector search (LanceDB + MiniLM embeddings). Unlike sandboxed agents like Claude Cowork or Open-Cowork, IRIS has direct, ungated OS access. Unlike OpenClaw or Hermes, it runs entirely on free-tier APIs — no monthly bill. A working v1.3.0 base exists. What needs Fable 5 to complete: IRIS MCP Server — expose all 64 OS tools as a JSON-RPC server over a Windows named pipe, so any MCP client (Claude Desktop, Hermes, OWL) can call IRIS tools natively. Chrome CDP integration — replace the current fragile AIStudio BrainWindow with real Chrome CDP via puppeteer-core, using the user's existing Google session (no login required), and a MutationObserver that fires instantly instead of polling every 400ms. Reflexion Loop — autonomous mode where IRIS observes OS events (active window, filesystem changes), classifies them, and acts without user input: Gemini executes, Claude reflects, results compress into episodic memory. Packaged installer and documentation — a one-click Windows installer and a README clear enough that a non-developer can run it. At IRIS-AI/ there are LICENSE-APACHE and LICENSE-MIT
Back this build
Sign in to backMilestones — est. total target 12,000 credits
A complete engineering design document covering: the MCP server architecture (JSON-RPC 2.0 framing over Windows named pipes, connection lifecycle, auth/handshake), JSON Schema definitions for all 64 tools grouped by capability domain, the Chrome CDP attach strategy (session reuse, target discovery, MutationObserver injection contract), and the Reflexion Loop state machine (event sources, classification taxonomy, Gemini-execute/Claude-reflect handoff, episodic memory compression format for LanceDB). Includes sequence diagrams as text/mermaid, error-handling matrices, and security notes on ungated OS access.
Production TypeScript implementation: a named-pipe transport layer with message framing and backpressure handling, JSON-RPC 2.0 dispatcher, MCP-compliant tool registration exposing all 64 native tools with full input/output schemas, capability negotiation, per-tool permission flags, structured error mapping, and graceful shutdown. Plus a small Node test client, integration tests against Claude Desktop's MCP config format, and developer docs for adding new tools.
puppeteer-core based CDP module: detect/launch user's Chrome with existing profile (no login flow), attach to or create an AIStudio tab, inject a MutationObserver-based response watcher that resolves instantly instead of 400ms polling, robust reconnection on tab close/navigation, streaming partial responses back to the IRIS router, and removal/migration of the old BrainWindow code path. Includes unit tests with CDP mocks, an end-to-end smoke script, and fallback logic when Chrome is unavailable.
Autonomous agent subsystem: Windows event observers (active-window tracking via native bindings, filesystem watchers with debouncing), an event classifier with configurable rules plus Gemini-based triage, an action policy layer with user-defined guardrails and a kill switch, the execute-reflect pipeline (Gemini 2.5 Flash acts, Claude critiques and revises), and episodic memory compression that summarizes sessions into MiniLM-embedded LanceDB records with retrieval at planning time. Includes a settings UI panel in React, telemetry/log viewer, and tests for the state machine.
electron-builder/NSIS configuration producing a signed-ready one-click Windows installer with first-run setup wizard (API key entry for free-tier Gemini/Claude, ADB/Chrome detection, microphone permission check), auto-update scaffolding, and uninstall cleanup. Documentation set: plain-language README with screenshots placeholders, step-by-step install guide for non-developers, troubleshooting FAQ, MCP client connection guide, security & privacy disclosure for ungated OS access, CONTRIBUTING.md, and dual-license (Apache-2.0/MIT) headers applied across the codebase.