# AI Creates a Programming Language — Milestone 1

## Research, Goals, and Evaluation Framework

This repository contains the first milestone deliverables for a project to design a new programming language optimized for LLM-generated software rather than primarily for human authorship. The milestone establishes the research basis, charter, definition of “LLM-optimized,” and the first evaluation framework for comparing the future language against Python, Rust, and TypeScript.

This milestone does **not** implement the language or benchmark harness. It defines the criteria and benchmark plan that later milestones should implement and use.

## Deliverables

| File | Purpose |
| --- | --- |
| `docs/research_dossier.md` | Survey of existing programming languages, formal methods tools, and AI coding workflows; pain points for LLM-generated code; design lessons. |
| `docs/project_charter.md` | Mission, scope, non-goals, stakeholders, success criteria, governance, and milestone roadmap. |
| `docs/optimization_model.md` | Precise definition of optimization for LLMs, including measurable design dimensions and language/tooling implications. |
| `docs/evaluation_framework.md` | Metrics, experiment modes, scoring model, instrumentation plan, and reporting standard. |
| `docs/benchmark_plan.md` | Initial benchmark suite structure and task families for comparison with Python, Rust, and TypeScript. |
| `docs/glossary.md` | Shared terminology for the project. |
| `evaluation/benchmark_catalog.yaml` | Machine-readable initial catalog of benchmark tasks, metrics, and baseline constraints. |
| `evaluation/experiment_protocol.md` | Step-by-step protocol for running model/language evaluations reproducibly. |
| `evaluation/scoring_rubric.md` | Point rubric and defect taxonomy for scoring generated programs. |
| `evaluation/task_template.md` | Standard task specification template for later benchmark implementation. |

## Working definition

A language is **optimized for LLMs** if, under realistic prompt, context, tool, and time constraints, LLM agents can produce correct, safe, maintainable programs in that language with higher reliability and lower repair cost than in comparable mainstream languages.

Optimization is not treated as “shorter syntax” alone. This milestone defines it as a combined property of:

- low-ambiguity syntax and canonical formatting,
- explicit contracts for data, effects, resources, and errors,
- machine-checkable interfaces and specifications,
- diagnostics designed for automated repair loops,
- standard-library APIs that are discoverable from local context,
- deterministic builds and dependency resolution,
- first-class provenance and generated-code review support.

## Evaluation baseline

The project will evaluate the new language against:

- **Python**: high LLM familiarity, dynamic runtime, fast prototyping, weak static guarantees.
- **TypeScript**: high LLM familiarity, structural types, large ecosystem, JavaScript runtime complexity.
- **Rust**: strong static guarantees, explicit ownership, high compiler assistance, steeper generation burden.

The evaluation framework compares generated solutions across first-pass success, compile/typecheck success, test pass rate, repair iterations, token cost, hallucinated API usage, security defects, performance, maintainability, and context sensitivity.

## How to use this milestone

1. Read `docs/research_dossier.md` for the evidence base and design lessons.
2. Read `docs/project_charter.md` to understand project scope and success criteria.
3. Use `docs/optimization_model.md` when evaluating proposed language features.
4. Use `docs/evaluation_framework.md` and `evaluation/experiment_protocol.md` to design experiments.
5. Use `evaluation/benchmark_catalog.yaml` as the seed task inventory for benchmark implementation in later milestones.

No build step is required for this documentation-only milestone. No third-party dependencies, generated lockfiles, or SDKs are used.

## Milestone acceptance mapping

The requested milestone scope is covered as follows:

- **Survey existing languages and AI coding workflows**: `docs/research_dossier.md`
- **Identify pain points for LLM-generated code**: `docs/research_dossier.md`, `docs/optimization_model.md`
- **Define what optimization for LLMs means**: `docs/optimization_model.md`, `docs/project_charter.md`
- **Propose success metrics**: `docs/evaluation_framework.md`, `evaluation/scoring_rubric.md`
- **Create initial benchmark/evaluation plan against Python, Rust, and TypeScript**: `docs/benchmark_plan.md`, `evaluation/benchmark_catalog.yaml`, `evaluation/experiment_protocol.md`

## Dependency and toolchain notes

This milestone contains Markdown and YAML only. It does not import libraries, call SDKs, or require a package manager. Later implementation milestones should introduce manifests only when executable code is added.
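
## Illustrative sketch: aggregating evaluation metrics

The per-language comparison described under "Evaluation baseline" could eventually be reduced to a small aggregation step. The sketch below is only a rough illustration of that idea using the Python standard library; it is not the benchmark harness, which this milestone does not implement. The `RunResult` record, its field names, and the `summarize` function are hypothetical and are not taken from `docs/evaluation_framework.md` or `evaluation/scoring_rubric.md`. It covers four of the listed metrics (first-pass success, test pass rate, repair iterations, token cost) and ignores the rest.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    """One model attempt at one benchmark task (hypothetical record shape)."""
    language: str
    task_id: str
    first_pass_success: bool
    tests_passed: int
    tests_total: int
    repair_iterations: int
    tokens_used: int


def summarize(results):
    """Group runs by language and average a few of the listed metrics."""
    by_language = defaultdict(list)
    for run in results:
        by_language[run.language].append(run)

    summary = {}
    for language, runs in by_language.items():
        summary[language] = {
            "first_pass_rate": mean(
                1.0 if r.first_pass_success else 0.0 for r in runs
            ),
            # Treat a task with no tests as a 0.0 pass rate (an assumption).
            "test_pass_rate": mean(
                r.tests_passed / r.tests_total if r.tests_total else 0.0
                for r in runs
            ),
            "mean_repair_iterations": mean(r.repair_iterations for r in runs),
            "mean_tokens": mean(r.tokens_used for r in runs),
        }
    return summary


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not measurements.
    demo = [
        RunResult("Python", "task-a", True, 10, 10, 0, 1200),
        RunResult("Python", "task-b", False, 5, 10, 2, 1800),
        RunResult("Rust", "task-a", True, 8, 8, 1, 2000),
    ]
    for language, metrics in summarize(demo).items():
        print(language, metrics)
```

Keeping each run as a flat, immutable record would make it straightforward to add further columns later (for example hallucinated API usage or security defects) without changing the grouping logic. How the metrics are weighted into a single score is left to the scoring rubric.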