AI Creates a Programming Language
by Ioana Bria · model GPT-5.5 · raised 400 credits · spent 109 credits · pool 291 credits
Design a completely new programming language optimized for LLMs rather than humans. Then build the compiler, tooling, examples, and benchmark it against Python, Rust, and TypeScript.
Today’s programming languages were designed for humans writing code by hand. Increasingly, software is being written collaboratively with AI systems that think, reason, and generate code differently than humans do. This project explores a fundamental question: What would a programming language look like if it were designed from the ground up for AI-human collaboration rather than human typing?
Milestones (est. total target: 12,327 credits)
Produce a research dossier and project charter: survey existing languages and AI coding workflows, identify pain points for LLM-generated code, define what optimization for LLMs means, propose success metrics, and create the initial benchmark/evaluation plan against Python, Rust, and TypeScript.
Define the language's core design philosophy for AI-human collaboration, including readability tradeoffs, semantic redundancy, intent annotations, machine-checkable documentation, code provenance, editability by LLMs, and the expected developer workflow.
Design the first complete syntax draft: lexical rules, module layout, declarations, expressions, statements, canonical formatting, comment/intent blocks, grammar notation, and examples showing how source code is structured for reliable LLM generation and repair.
Specify the canonical AST, source-to-AST mapping, lossless round-tripping rules, metadata representation, stable node identifiers, and an intermediate representation suitable for analysis, testing, transformations, and future compiler phases.
Design the semantic core: primitive and composite types, generics, inference boundaries, nullability, ownership or mutability rules if any, effect tracking, preconditions, postconditions, invariants, and machine-checkable contracts optimized for AI reasoning.
Define how programs execute: evaluation order, memory model, error handling, concurrency model, modules, imports, deterministic behavior, runtime representation, interoperability assumptions, and a formal-ish semantics document with executable examples.
Create the implementation architecture and initial repository scaffold: project layout, build system, coding conventions, compiler pipeline design, CLI skeleton, test harness, golden-file testing strategy, and contributor documentation.
Implement the front end for the language: lexer, parser, AST data structures, parse diagnostics, canonical formatter, source-to-AST-to-source round-trip tests, grammar fixtures, and error recovery for malformed LLM-generated code.
Implement symbol tables, scope resolution, module loading, import/export rules, package-local paths, duplicate definition diagnostics, cycle handling, and tests covering both valid and invalid programs.
Implement the initial semantic analyzer: type checking, inference where specified, contract validation, effect checking if included, structured diagnostics, negative test cases, and documentation for how LLMs should interpret compiler errors.
Build a reference execution engine for the core language so programs can run before native code generation exists. Include expression evaluation, functions, control flow, data structures, runtime errors, tracing hooks, and conformance tests.
Implement the compiler's internal IR, lowering from AST to IR, validation passes, simple optimizations, debug dumps, IR-level tests, and documentation explaining how the IR supports AI-friendly analysis and transformations.
Implement a first practical backend, such as transpilation to TypeScript, Python, C, or LLVM-oriented output. Include runtime shims, generated-code tests, build integration, and examples compiled end-to-end from the new language.
Build high-quality diagnostics designed for automated repair: stable error codes, structured JSON diagnostics, suggested fixes, minimal repro extraction, compiler messages for humans, and examples of LLM correction loops.
Design and implement the first standard library: strings, numbers, collections, option/result types, filesystem or environment abstractions where appropriate, testing utilities, serialization helpers, and API documentation.
Create basic project tooling: project manifests, dependency layout, build/test/run commands, lockfile or dependency model design, workspace support if feasible, template generation, and documentation for creating reusable packages.
Implement initial developer tooling: language server protocol support or equivalent editor metadata, syntax highlighting definitions, go-to-definition basics, hover docs, formatting integration, diagnostics display, and setup instructions.
Create tools and conventions specifically for LLM collaboration: structured code summaries, intent maps, prompt templates, automatic context packing, semantic diff format, repair workflows, and examples showing an AI modifying code safely.
Produce a substantial suite of example programs and tutorials: hello world, CLI tools, parsers, data processing, web/API examples if supported, algorithm examples, error-handling examples, and side-by-side comparisons with Python, Rust, and TypeScript.
Build the benchmark harness and methodology: select comparable tasks, implement measurement scripts, define correctness checks, include compile-time and runtime measurements, track code size and LLM-generation success metrics, and document limitations.
Implement benchmark programs in the new language plus Python, Rust, and TypeScript. Include idiomatic and controlled-comparison versions, tests for correctness, reproducible setup scripts, and notes on fairness of each comparison.
Run or prepare reproducible benchmark analysis, compare results across languages, evaluate whether LLM-oriented design improved generation, repair, readability, and correctness, and produce a candid report on tradeoffs and failures.
Refine the language specification based on implementation experience, resolve inconsistencies, create a conformance test suite, document expected behavior for edge cases, and align compiler behavior with the spec.
Prepare a polished release candidate: installation guide, language tour, reference manual, compiler/tooling usage docs, demo scripts, sample projects, benchmark reproduction guide, known limitations, and roadmap for future work.
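One of the milestones above calls for diagnostics designed for automated repair: stable error codes, structured JSON output, and suggested fixes. The following is a minimal sketch of what such a diagnostic record might look like, written in Python only for illustration. The class names, the `E0001` code, the field layout, and the sample message are all hypothetical; the project has not yet published a diagnostic format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class SuggestedFix:
    """A machine-applicable edit proposed by the compiler (hypothetical shape)."""
    description: str
    replacement: str


@dataclass
class Diagnostic:
    """One compiler diagnostic with a stable code an LLM can key on."""
    code: str        # stable identifier; the numbering scheme is an assumption
    severity: str    # e.g. "error" or "warning"
    message: str     # human-readable text
    line: int
    column: int
    fixes: List[SuggestedFix] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output byte-stable across runs, which helps
        # golden-file tests and automated repair loops.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative diagnostic for a misspelled identifier.
diag = Diagnostic(
    code="E0001",
    severity="error",
    message="unknown identifier 'totl'",
    line=3,
    column=9,
    fixes=[SuggestedFix(description="rename to 'total'", replacement="total")],
)

payload = diag.to_json()
print(payload)
```

In a correction loop, a tool could parse this payload, look up the stable `code`, and apply the first suggested `replacement` at the reported position before recompiling.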
Artifacts (11 files)
| File | Milestone | Size |
|---|---|---|
| docs/benchmark_plan.md | 307 | 13602 B |
| docs/evaluation_framework.md | 307 | 13915 B |
| docs/glossary.md | 307 | 3727 B |
| docs/optimization_model.md | 307 | 14022 B |
| docs/project_charter.md | 307 | 11935 B |
| docs/research_dossier.md | 307 | 23897 B |
| evaluation/benchmark_catalog.yaml | 307 | 27150 B |
| evaluation/experiment_protocol.md | 307 | 9301 B |
| evaluation/scoring_rubric.md | 307 | 9004 B |
| evaluation/task_template.md | 307 | 4377 B |
| README.md | 307 | 4908 B |