# FablePool Milestone 1 — Home Assistant Ecosystem Analysis & Impact Assessment Report

**Project:** Identify the best way to contribute to Home Assistant — and do it.
**Milestone:** #1 — Ecosystem Analysis & Impact Assessment Report
**Deliverable type:** Structured written analysis + reproducible data tooling

## What this milestone delivers

A deep, structured analysis of the current state of the Home Assistant project, culminating
in a ranked, scored evaluation of candidate contribution paths and a single explicit
recommendation for what FablePool should build in subsequent milestones.

## Report structure

| File | Contents |
|---|---|
| `report/00-executive-summary.md` | One-page summary of findings and the recommendation |
| `report/01-methodology-and-data-provenance.md` | How every number in this report was obtained, its freshness, and how to regenerate it |
| `report/02-architecture-overview.md` | Deep architecture overview of Home Assistant Core, the integration model, and surrounding ecosystem |
| `report/03-quality-scale-distribution.md` | Analysis of the Integration Quality Scale (bronze/silver/gold/platinum) and its distribution across core integrations |
| `report/04-issue-triage-top-integrations.md` | Triage of open issues across the top ~150 integrations by install base, with thematic clustering |
| `report/05-roadmap-and-architecture-signals.md` | Review of the public roadmap, architecture discussions, ADRs, and core-team priorities |
| `report/06-candidate-contribution-paths.md` | The candidate paths, scoring model, and full impact-per-effort scoring matrix |
| `report/07-recommendation.md` | The explicit recommendation, scope sketch for follow-on milestones, and risk register |
| `report/appendix-a-glossary.md` | Glossary of Home Assistant-specific terms used throughout |

## Data tables

| File | Contents |
|---|---|
| `data/quality_scale_distribution.csv` | Quality-scale tier counts across core integrations (snapshot + regeneration instructions) |
| `data/top_integrations.csv` | Top ~150 core integrations by install base with quality tier, IoT class, and code-owner status |
| `data/issue_triage.csv` | Per-integration open-issue counts, dominant failure themes, severity mix, and fixability bucket |
| `data/candidate_path_scores.csv` | Raw scoring matrix behind the impact-per-effort ranking |

## Reproducibility tooling

Because this report describes a fast-moving project, every dataset ships with a script that
regenerates it from primary sources (the `home-assistant/core` repository, the GitHub issues
API, and the public Home Assistant analytics endpoint):

| File | Purpose |
|---|---|
| `tools/pyproject.toml` | Dependency manifest for the tooling |
| `tools/fetch_quality_scale.py` | Computes the quality-scale distribution from a checkout of `home-assistant/core` |
| `tools/fetch_install_base.py` | Pulls integration install-base data from `analytics.home-assistant.io` |
| `tools/fetch_issue_counts.py` | Pulls per-integration open-issue counts and recent-issue samples from the GitHub API |
| `tools/build_tables.py` | Joins the three datasets into the CSVs under `data/` |
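
As a rough illustration of the first step in the table above, the sketch below tallies quality-scale tiers from a checkout. It is not the contents of `tools/fetch_quality_scale.py`. It assumes each integration lives under `homeassistant/components/<domain>/` with a `manifest.json` that may carry a `quality_scale` key. The function name `count_quality_tiers` and the `"unscored"` bucket are hypothetical names chosen for this example.

```python
import json
from collections import Counter
from pathlib import Path


def count_quality_tiers(core_root: str) -> Counter:
    """Tally the `quality_scale` value declared in each integration manifest.

    Assumption: manifests sit at homeassistant/components/<domain>/manifest.json.
    """
    components = Path(core_root) / "homeassistant" / "components"
    tiers: Counter = Counter()
    for manifest_path in sorted(components.glob("*/manifest.json")):
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        # Integrations that declare no tier go into an explicit bucket
        # ("unscored" is a label invented for this sketch) so they stay visible.
        tiers[manifest.get("quality_scale", "unscored")] += 1
    return tiers
```

Calling `count_quality_tiers("/tmp/ha-core")` on the shallow clone used in the run instructions would return one count per declared tier. Keeping undeclared integrations in their own bucket, rather than dropping them, preserves the full shape of the distribution.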

### Running the tooling

```bash
cd tools
python -m venv .venv && source .venv/bin/activate
pip install -e .            # editable install; dependencies resolve from pyproject.toml, no lockfile
export GITHUB_TOKEN=ghp_... # required for the issues API (rate limits)
git clone --depth 1 https://github.com/home-assistant/core /tmp/ha-core

python fetch_quality_scale.py --core /tmp/ha-core --out ../data/quality_scale_raw.json
python fetch_install_base.py --out ../data/install_base_raw.json
python fetch_issue_counts.py --integrations ../data/top_integrations.csv --out ../data/issue_counts_raw.json
python build_tables.py --data-dir ../data
```
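
The final `build_tables.py` step joins the three raw datasets. A minimal sketch of such a join follows, assuming each raw JSON file has already been loaded into a plain dict keyed by integration domain. That layout is an assumption, not something the tooling documents here, and the function name `join_by_domain`, the column names, and the `"unscored"` default are hypothetical.

```python
def join_by_domain(quality: dict, installs: dict, issues: dict) -> list[dict]:
    """Left-join quality tier and open-issue count onto the install-base data.

    Assumption: all three inputs are dicts keyed by integration domain.
    """
    rows = []
    # Drive the join from the install base so ranking by installs is preserved.
    ranked = sorted(installs.items(), key=lambda item: item[1], reverse=True)
    for domain, install_count in ranked:
        rows.append(
            {
                "domain": domain,
                "installs": install_count,
                # "unscored" is a placeholder label invented for this sketch.
                "quality_scale": quality.get(domain, "unscored"),
                # None marks "no issue data fetched", which differs from zero issues.
                "open_issues": issues.get(domain),
            }
        )
    return rows
```

Driving the join from the install-base data means an integration missing from the other two sources still appears in the output with an explicit gap, instead of silently disappearing from the table.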

Do **not** commit a hand-written lockfile. If you need pinned versions, generate the lockfile
with a single `pip freeze` or `pip-compile` run in a clean environment.

## Honesty note on data freshness

The narrative analysis and snapshot tables in this report are built from the public state of
the Home Assistant project (repository structure, manifests, analytics, issue tracker, blog,
and architecture discussions) as known at authoring time. Exact counts drift weekly — the
project merges hundreds of PRs per month. Every table is therefore labeled with a snapshot
basis and ships with the regeneration script above. The *structural* conclusions (tier
distribution shape, dominant issue themes, effort asymmetries between paths) are robust to
that drift; the recommendation in `report/07-recommendation.md` does not depend on any single
count being exact.