# FablePool Milestone 1: Home Assistant Ecosystem Analysis & Impact Assessment Report

**Project:** Identify the best way to contribute to Home Assistant, and do it.

**Milestone:** #1: Ecosystem Analysis & Impact Assessment Report

**Deliverable type:** Structured written analysis + reproducible data tooling

## What this milestone delivers

A deep, structured analysis of the current state of the Home Assistant project, culminating in a ranked, scored evaluation of candidate contribution paths and a single explicit recommendation for what FablePool should build in subsequent milestones.

## Report structure

| File | Contents |
|---|---|
| `report/00-executive-summary.md` | One-page summary of findings and the recommendation |
| `report/01-methodology-and-data-provenance.md` | How every number in this report was obtained, its freshness, and how to regenerate it |
| `report/02-architecture-overview.md` | Deep architecture overview of Home Assistant Core, the integration model, and surrounding ecosystem |
| `report/03-quality-scale-distribution.md` | Analysis of the Integration Quality Scale (bronze/silver/gold/platinum) and its distribution across core integrations |
| `report/04-issue-triage-top-integrations.md` | Triage of open issues across the top ~150 integrations by install base, with thematic clustering |
| `report/05-roadmap-and-architecture-signals.md` | Review of the public roadmap, architecture discussions, ADRs, and core-team priorities |
| `report/06-candidate-contribution-paths.md` | The candidate paths, scoring model, and full impact-per-effort scoring matrix |
| `report/07-recommendation.md` | The explicit recommendation, scope sketch for follow-on milestones, and risk register |
| `report/appendix-a-glossary.md` | Glossary of Home Assistant-specific terms used throughout |

## Data tables

| File | Contents |
|---|---|
| `data/quality_scale_distribution.csv` | Quality-scale tier counts across core integrations (snapshot + regeneration instructions) |
| `data/top_integrations.csv` | Top ~150 core integrations by install base with quality tier, IoT class, and code-owner status |
| `data/issue_triage.csv` | Per-integration open-issue counts, dominant failure themes, severity mix, and fixability bucket |
| `data/candidate_path_scores.csv` | Raw scoring matrix behind the impact-per-effort ranking |

## Reproducibility tooling

Because this report describes a fast-moving project, every dataset ships with a script that regenerates it from primary sources (the `home-assistant/core` repository, the GitHub issues API, and the public Home Assistant analytics endpoint):

| File | Purpose |
|---|---|
| `tools/pyproject.toml` | Dependency manifest for the tooling |
| `tools/fetch_quality_scale.py` | Computes the quality-scale distribution from a checkout of `home-assistant/core` |
| `tools/fetch_install_base.py` | Pulls integration install-base data from `analytics.home-assistant.io` |
| `tools/fetch_issue_counts.py` | Pulls per-integration open-issue counts and recent-issue samples from the GitHub API |
| `tools/build_tables.py` | Joins the three datasets into the CSVs under `data/` |

### Running the tooling

```bash
cd tools
python -m venv .venv && source .venv/bin/activate
pip install -e .                 # generates a lockfile-free, resolvable install
export GITHUB_TOKEN=ghp_...      # required for the issues API (rate limits)
git clone --depth 1 https://github.com/home-assistant/core /tmp/ha-core
python fetch_quality_scale.py --core /tmp/ha-core --out ../data/quality_scale_raw.json
python fetch_install_base.py --out ../data/install_base_raw.json
python fetch_issue_counts.py --integrations ../data/top_integrations.csv --out ../data/issue_counts_raw.json
python build_tables.py --data-dir ../data
```

Do **not** commit a hand-written lockfile; generate one with a single `pip freeze` or `pip-compile` run on a clean machine if you need pinning.
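To make the quality-scale step of the tooling more concrete, the following is a minimal sketch of how a tier count could be derived from a checkout of `home-assistant/core`. It is not the contents of `tools/fetch_quality_scale.py`. It assumes that each integration keeps a `manifest.json` under `homeassistant/components/<domain>/` and that the tier is stored under a `quality_scale` key; the function name `count_quality_tiers` and the `unset` and `unreadable` buckets are hypothetical names chosen for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def count_quality_tiers(core_root: Path) -> Counter:
    """Count integrations per quality-scale tier in a core checkout.

    Assumes manifests live at homeassistant/components/<domain>/manifest.json
    and carry an optional "quality_scale" key (an assumption of this sketch).
    """
    tiers: Counter = Counter()
    components = core_root / "homeassistant" / "components"
    for manifest_path in sorted(components.glob("*/manifest.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Keep unparseable manifests visible instead of silently dropping them.
            tiers["unreadable"] += 1
            continue
        if not isinstance(manifest, dict):
            tiers["unreadable"] += 1
            continue
        # Integrations that do not declare a tier are grouped under "unset".
        tiers[str(manifest.get("quality_scale", "unset"))] += 1
    return tiers
```

Calling `count_quality_tiers(Path("/tmp/ha-core"))` after the `git clone` step above would return a `Counter` keyed by tier name. Counting undeclared and unreadable manifests in their own buckets keeps the total equal to the number of manifests found, which makes drift between snapshots easier to audit.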
## Honesty note on data freshness

The narrative analysis and snapshot tables in this report are built from the public state of the Home Assistant project (repository structure, manifests, analytics, issue tracker, blog, and architecture discussions) as known at authoring time. Exact counts drift weekly; the project merges hundreds of PRs per month. Every table is therefore labeled with a snapshot basis and ships with the regeneration script above. The *structural* conclusions (tier distribution shape, dominant issue themes, effort asymmetries between paths) are robust to that drift; the recommendation in `report/07-recommendation.md` does not depend on any single count being exact.
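One way to probe a robustness claim of this kind is a small sensitivity check: perturb the inputs to the ranking and see how often the top-ranked path changes. The sketch below is an illustration only, not the report's scoring model. It assumes each candidate path reduces to an `(impact, effort)` pair ranked by `impact / effort`, and the function name `top_rank_stability`, the `jitter` parameter, and any path names passed in are hypothetical.

```python
import random


def top_rank_stability(scores, jitter=0.2, trials=1000, seed=0):
    """Return, per path, the fraction of trials in which it ranks first.

    scores maps a path name to an (impact, effort) pair with effort > 0.
    Each trial scales impact and effort independently by a random factor
    in [1 - jitter, 1 + jitter] and ranks paths by impact / effort.
    """
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    wins = {name: 0 for name in scores}
    for _ in range(trials):
        perturbed = {}
        for name, (impact, effort) in scores.items():
            noisy_impact = impact * rng.uniform(1 - jitter, 1 + jitter)
            noisy_effort = effort * rng.uniform(1 - jitter, 1 + jitter)
            perturbed[name] = noisy_impact / noisy_effort
        wins[max(perturbed, key=perturbed.get)] += 1
    return {name: count / trials for name, count in wins.items()}
```

A path that still wins in nearly all trials at a jitter comparable to the expected week-to-week drift supports the claim that the recommendation does not hinge on any single count. If the winner flips often, the ranking is sensitive to drift and the tables should be regenerated before the recommendation is relied on.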