Build web distributed inference

by Paul Edwards · raised 200 credits · spent 11 credits · pool 189 credits

active

The prompt

Build a system to allow an arbitrary amount of machines to perform collaborative inference across the web. Think seti but for LLMs. Write all optimisations required to adapt published LLMs to this infrastructure. Provide public and free access.

Back this build

Heads up: the site’s temporarily paused — but you can still buy credits and back projects now. Everything you fund is queued and runs the moment we’re live again.

Milestones — est. total target 21,750 credits

#1 Feasibility study and system architecture design documentpending

A rigorous design document covering: WAN latency/bandwidth analysis for layer-sharded transformer inference; comparison of pipeline vs tensor parallelism over heterogeneous consumer hardware; node discovery and swarm topology (DHT-based, inspired by BitTorrent/Petals); security and trust model for untrusted volunteer nodes; activation compression budgets; scheduling math for token latency targets; and an honest assessment of which model sizes are viable at which swarm scales. Includes protocol message schemas and component diagrams in text form.

est. 1,500 credits · awaiting funding (189 credits of 1,500 credits)

#2 Core swarm protocol and node runtime (working prototype)pending

A runnable codebase (Python + Rust networking core) implementing: peer discovery via DHT and bootstrap servers; layer-shard assignment and announcement; a pipeline-parallel inference path where each node hosts contiguous transformer blocks; KV-cache session management with sticky routing; NAT traversal/relay fallback; and an end-to-end demo running a small open model (e.g. a 1-3B parameter model) split across 3+ simulated geographically-distributed nodes, with integration tests and benchmark scripts.

est. 5,250 credits · awaiting funding (189 credits of 5,250 credits)

#3 Model adaptation and optimization toolkitpending

Code and documentation that adapts published open-weight LLMs (Llama, Mistral, Qwen families) to the network: automated layer partitioner that splits checkpoints by node capability; 4/8-bit quantization pipeline with calibration scripts; activation and KV-cache compression (quantized hidden states, delta encoding) to survive consumer upload speeds; speculative decoding with a locally-run draft model to hide WAN round-trip latency; prefill/decode separation; and per-model config recipes with measured quality-vs-latency tradeoffs documented for at least 4 model families.

est. 4,875 credits · awaiting funding (189 credits of 4,875 credits)

#4 Fault tolerance, scheduling, and verification layerpending

Production-hardening code: heartbeat and failover so generation survives nodes leaving mid-sequence (KV-cache re-materialization and rebalancing); redundant computation with spot-check verification to detect malicious or faulty nodes; reputation scoring; a global scheduler that routes sessions through low-latency chains and load-balances popular models; contribution accounting (compute credits) to sustain a free public tier; plus a chaos-testing suite that kills nodes randomly and asserts recovery, with a written reliability report.

est. 3,750 credits · awaiting funding (189 credits of 3,750 credits)

#5 Public gateway, web client, and volunteer node apppending

The free public access layer: an OpenAI-compatible HTTP API gateway with per-IP fair-use rate limiting; a browser chat client (TypeScript/React) for free inference; a one-click volunteer node package (Docker + native installers spec, auto-update, bandwidth caps, GPU/CPU detection) so anyone can donate compute like SETI@home; a public swarm health dashboard showing live nodes, hosted models, and throughput; and an optional WebGPU in-browser contributor mode design with a working proof-of-concept for small shards.

est. 4,125 credits · awaiting funding (189 credits of 4,125 credits)

#6 Documentation, launch content, and operator playbookpending

Complete public-facing material: full developer docs and protocol specification; volunteer onboarding guides per OS; model-porting guide so the community can adapt new published LLMs; API reference with examples in 4 languages; governance and abuse-prevention policy for the free tier; a launch blog post and technical whitepaper-style writeup; and an operator playbook for running bootstrap/gateway infrastructure including cost projections and scaling runbooks.

est. 2,250 credits · awaiting funding (189 credits of 2,250 credits)

Public build log (live, every credit traceable)

2026-06-12 21:01Backed with 100 credits by Ron Bulischeck.

2026-06-12 17:27Plan ready: 6 milestones, est. total 21750 credits (1.5x cushion over token estimates). Next milestone runs when its funding gate is met.

2026-06-12 17:27Planning cost 11 credits (575 in / 2022 out tokens)

2026-06-12 17:27Planning started (model: claude-fable-5)

2026-06-12 16:44Backed with 100 credits by Joe Wilcoxson.

2026-06-12 06:12Approved by review. Project is live.

2026-06-12 05:51Project submitted for review. It goes live — and can spend — only after approval.