# Report 03 — Integration Quality-Scale Distribution

> Part of Milestone 1: Ecosystem Analysis & Impact Assessment.
> Data snapshot: 2025-Q2 (see Report 01 for collection window and provenance).
> Reproducible via `tools/fetch_quality_scale.py` and `tools/build_report_tables.py`.
> Underlying data: `data/quality_scale_distribution.csv`, `data/top_integrations.csv`.

---

## 3.1 The quality scale, before and after the 2024 revamp

Home Assistant has carried an integration quality scale for years, but until late
2024 it was a loosely-enforced label (`quality_scale` in `manifest.json`) with
prose-level tier definitions. The **Integration Quality Scale (IQS) revamp**
(announced alongside the project roadmap work in 2024, documented at
`developers.home-assistant.io/docs/core/integration-quality-scale/`) changed it
into a **rules-based system**:

- Each tier — **Bronze, Silver, Gold, Platinum** — is defined by an explicit
  checklist of rules (e.g. `config-flow`, `entity-unique-id`, `reauthentication-flow`,
  `diagnostics`, `strict-typing`).
- Each integration that opts in carries a `quality_scale.yaml` in its source
  directory recording per-rule status: `done`, `todo`, or `exempt` (with
  justification).
- `hassfest` validates the claimed tier against the rule statuses in CI.
- Integrations that have not yet been (re-)assessed under the new rules carry
  the **`legacy`** designation — they are not necessarily low quality, but they
  carry no verified guarantees.
- Two further non-tier designations exist: **`internal`** (core plumbing such as `sun`,
  `mobile_app`, `bluetooth`) and **`virtual`** (brand aliases that point at
  another integration, e.g. dozens of Tuya-brand stubs).
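The Python sketch below flattens the `rules` section of a `quality_scale.yaml` into rule-to-status pairs, which makes the per-rule bookkeeping concrete. It makes these assumptions:

- The file has already been parsed, for example with PyYAML's `yaml.safe_load`.
- Each rule maps either to a bare status string or to a mapping with `status` and `comment` keys.

The function name `normalize_rules` is hypothetical, and this is not `hassfest`'s code.

```python
from typing import Any

VALID_STATUSES = {"done", "todo", "exempt"}


def normalize_rules(doc: dict[str, Any]) -> dict[str, str]:
    """Flatten the `rules` mapping of an already-parsed quality_scale.yaml.

    Assumed layout: each rule maps either to a bare status string
    ("done" / "todo" / "exempt") or to a mapping with a `status` key and an
    optional `comment` carrying the justification.
    """
    statuses: dict[str, str] = {}
    for rule, value in doc.get("rules", {}).items():
        if isinstance(value, str):
            status, comment = value, None
        else:
            status, comment = value.get("status"), value.get("comment")
        if status not in VALID_STATUSES:
            raise ValueError(f"{rule}: unknown status {status!r}")
        # Exemptions are supposed to come with a justification; this sketch
        # treats an exemption without a comment as an error.
        if status == "exempt" and not comment:
            raise ValueError(f"{rule}: exemption lacks a justification comment")
        statuses[rule] = status
    return statuses


example = {
    "rules": {
        "config-flow": "done",
        "diagnostics": "todo",
        "action-setup": {"status": "exempt", "comment": "No custom actions."},
    }
}
print(normalize_rules(example))
# {'config-flow': 'done', 'diagnostics': 'todo', 'action-setup': 'exempt'}
```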

The practical meaning of each tier, condensed from the rule lists:

| Tier | What a user actually gets | Representative rules |
|---|---|---|
| **Bronze** | UI setup works and is tested; entities are stable across restarts | `config-flow`, full config-flow test coverage, `entity-unique-id`, `runtime-data`, `has-entity-name`, docs for setup |
| **Silver** | Survives bad days: auth expiry, devices going offline, clean reloads | `reauthentication-flow`, `entity-unavailable`, `log-when-unavailable`, `config-entry-unloading`, `parallel-updates`, `action-exceptions`, named code owner |
| **Gold** | Feels first-class: devices, translations, diagnostics, self-healing | `devices`, `diagnostics`, `entity-translations`, `icon-translations`, `entity-device-class`, `repair-issues`, `dynamic-devices`, `discovery`, `reconfiguration-flow` |
| **Platinum** | Technically exemplary | `strict-typing`, `async-dependency`, `inject-websession` (shared HTTP session) |
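The tiers are cumulative. Checking a claimed tier against rule statuses, as `hassfest` does in CI, therefore amounts to finding the highest tier whose own rules, and all lower tiers' rules, are `done` or `exempt`.

The sketch below shows that logic. Its rule sets are hypothetical and heavily abbreviated, taken only from the representative rules in the table above; the real checklists are longer. The helper names are illustrative.

```python
from typing import Optional

TIERS = ["bronze", "silver", "gold", "platinum"]

# Hypothetical, abbreviated rule sets: only the representative rules from
# the tier table. The real per-tier checklists contain more rules.
RULES_BY_TIER = {
    "bronze": {"config-flow", "entity-unique-id", "runtime-data", "has-entity-name"},
    "silver": {
        "reauthentication-flow",
        "entity-unavailable",
        "log-when-unavailable",
        "config-entry-unloading",
        "parallel-updates",
        "action-exceptions",
    },
    "gold": {
        "devices",
        "diagnostics",
        "entity-translations",
        "icon-translations",
        "entity-device-class",
        "repair-issues",
        "dynamic-devices",
        "discovery",
        "reconfiguration-flow",
    },
    "platinum": {"strict-typing", "async-dependency", "inject-websession"},
}
SATISFIED = {"done", "exempt"}


def highest_tier(statuses: dict[str, str]) -> Optional[str]:
    """Highest tier whose rules, and every lower tier's rules, are done or exempt."""
    achieved: Optional[str] = None
    for tier in TIERS:
        if all(statuses.get(rule) in SATISFIED for rule in RULES_BY_TIER[tier]):
            achieved = tier
        else:
            break  # tiers are cumulative: a gap here caps the result
    return achieved


def claim_is_valid(claimed: str, statuses: dict[str, str]) -> bool:
    """True if the claimed tier is at or below the highest satisfied tier."""
    best = highest_tier(statuses)
    return best is not None and TIERS.index(best) >= TIERS.index(claimed)
```

With only Bronze and Silver rules marked `done`, `highest_tier` returns `"silver"`, so a `gold` claim would be rejected.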

**Why this matters for contribution planning:** the rules give an external
contributor an *objective, reviewable work plan per integration*. Tier uplift
PRs are unusually mergeable because the acceptance criteria are written down by
the core team itself, and `hassfest` mechanically verifies most of them.

---

## 3.2 Distribution across all core integrations

From the snapshot of `homeassistant/components/*/manifest.json` plus
`quality_scale.yaml` files (script: `tools/fetch_quality_scale.py`):

| Designation | Count | Share of all 2,879 |
|---|---:|---:|
| Virtual (brand alias) | 287 | 10.0% |
| Internal | 58 | 2.0% |
| **Legacy (unassessed)** | **2,101** | **73.0%** |
| Bronze | 214 | 7.4% |
| Silver | 119 | 4.1% |
| Gold | 73 | 2.5% |
| Platinum | 27 | 0.9% |
| **Total** | **2,879** | 100% |
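The sketch below shows one plausible way a script like `tools/fetch_quality_scale.py` could reduce each manifest to a row of this table. The classification rules are assumptions, not read from the actual script:

- Virtual brand aliases are recognised by `"integration_type": "virtual"`.
- A `quality_scale` of `internal` or a tier name is taken at face value.
- Anything else, including a missing key, is counted as legacy.

```python
from collections import Counter
from typing import Any, Iterable

TIERS = ("bronze", "silver", "gold", "platinum")


def designation(manifest: dict[str, Any]) -> str:
    """Classify one parsed manifest.json into a designation (assumed rules)."""
    if manifest.get("integration_type") == "virtual":
        return "virtual"
    scale = manifest.get("quality_scale")
    if scale == "internal" or scale in TIERS:
        return scale
    # Missing key, "legacy", or anything unrecognised.
    return "legacy"


def tally(manifests: Iterable[dict[str, Any]]) -> Counter:
    """Count designations across many parsed manifests."""
    return Counter(designation(m) for m in manifests)


sample = [
    {"domain": "sun", "quality_scale": "internal"},
    {"domain": "wled", "quality_scale": "platinum"},
    {"domain": "cast"},  # no quality_scale key in this sample -> legacy
    {"domain": "example_brand", "integration_type": "virtual"},  # hypothetical alias
]
print(tally(sample))
```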

Headline: **only 433 of 2,534 scoreable integrations (17.1%) have any verified
tier at all**, where "scoreable" means all 2,879 minus the 287 virtual aliases and
58 internal integrations; only 100 (3.9%) are Gold or Platinum. Eighteen months
into the IQS revamp, the long tail has barely been touched. That is expected,
since tier assessment is opt-in and driven almost entirely by individual code owners.
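The headline figures follow from the table by excluding virtual and internal entries from the denominator. The arithmetic can be checked in a few lines; the counts below are copied from the table.

```python
COUNTS = {
    "virtual": 287,
    "internal": 58,
    "legacy": 2101,
    "bronze": 214,
    "silver": 119,
    "gold": 73,
    "platinum": 27,
}
TIERS = ("bronze", "silver", "gold", "platinum")


def headline(counts: dict[str, int]) -> dict[str, float]:
    total = sum(counts.values())
    # "Scoreable" = everything that could carry a tier:
    # drop virtual brand aliases and internal plumbing.
    scoreable = total - counts["virtual"] - counts["internal"]
    tiered = sum(counts[t] for t in TIERS)
    gold_plus = counts["gold"] + counts["platinum"]
    return {
        "total": total,
        "scoreable": scoreable,
        "tiered": tiered,
        "tiered_pct": round(100 * tiered / scoreable, 1),
        "gold_plus": gold_plus,
        "gold_plus_pct": round(100 * gold_plus / scoreable, 1),
    }


print(headline(COUNTS))
# {'total': 2879, 'scoreable': 2534, 'tiered': 433, 'tiered_pct': 17.1, 'gold_plus': 100, 'gold_plus_pct': 3.9}
```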

### Velocity check

Comparing manifests between the first and last releases of the snapshot window,
roughly **14–18 integrations per month** gain or raise a tier. These gains are
heavily skewed toward new integrations (which, since 2025, must enter at Bronze)
rather than uplift of existing popular ones, so only a small share of that rate
reaches the top 150. At that organic rate, the existing top-150 would take
**years** to reach Silver coverage without directed effort.
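The "years" conclusion rests on how little of that velocity reaches popular integrations. The back-of-the-envelope sketch below makes that dependence explicit. Its `share_on_backlog` parameter is hypothetical; this report does not measure it.

```python
def months_to_clear(backlog: int, monthly_rate: float, share_on_backlog: float) -> float:
    """Months for organic promotions to clear a backlog of integrations.

    backlog: integrations still needing a promotion (e.g. top-150 below Silver).
    monthly_rate: tier gains or raises per month across all integrations.
    share_on_backlog: HYPOTHETICAL fraction of those promotions landing on the
        backlog rather than on new integrations (not measured here).
    Simplification: one promotion clears one integration, although a Legacy
    integration really needs two steps (Bronze, then Silver).
    """
    effective_rate = monthly_rate * share_on_backlog
    if effective_rate <= 0:
        return float("inf")
    return backlog / effective_rate
```

Take the 93 top-150 integrations below Silver (§3.3), a mid-range rate of 16 per month, and a hypothetical 10% share. `months_to_clear(93, 16, 0.10)` then gives 93 / 1.6 ≈ 58 months. Even if every promotion landed on the backlog, clearing it would take about 6 months.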

---

## 3.3 Distribution within the top 150 by install base

This is the slice users actually live in. Joining the analytics install-base
ranking (Report 01, `data/top_integrations.csv`) with tier data:

| Designation | Count in top 150 | Share |
|---|---:|---:|
| Internal | 3 | 2.0% |
| **Legacy** | **62** | **41.3%** |
| Bronze | 31 | 20.7% |
| Silver | 27 | 18.0% |
| Gold | 19 | 12.7% |
| Platinum | 8 | 5.3% |
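The join itself is simple. The minimal sketch below makes these assumptions:

- The ranking has already been read from `data/top_integrations.csv` into a list of domains ordered by install base. The file's column layout is not described in this report.
- Designations come from a domain-to-designation mapping built from the manifests.

`top_n_distribution` is a hypothetical helper. Calling it with `n=150` corresponds to the table above, and `n=25, exclude_internal=True` corresponds to the top-25 breakdown that follows.

```python
from collections import Counter


def top_n_distribution(
    ranking: list[str],
    designations: dict[str, str],
    n: int,
    exclude_internal: bool = False,
) -> dict[str, tuple[int, float]]:
    """Designation counts and percentage shares among the n most-installed integrations.

    ranking: integration domains, most installed first.
    designations: domain -> designation ("legacy", "bronze", ..., "internal").
    Domains missing from `designations` are treated as legacy (an assumption).
    """
    top = [designations.get(domain, "legacy") for domain in ranking[:n]]
    if exclude_internal:
        top = [d for d in top if d != "internal"]
    counts = Counter(top)
    return {d: (c, round(100 * c / len(top), 1)) for d, c in counts.most_common()}
```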

Among the 25 most-installed integrations, 2 are internal; the remaining 23 break down as follows:

| Tier | Count (of 23) | Examples |
|---|---:|---|
| Platinum | 2 | `esphome`, `wled` |
| Gold | 4 | `shelly`, `hue`, `zwave_js`, `tplink` |
| Silver | 4 | `mqtt`, `zha`, `unifi`, `fritz` |
| Bronze | 7 | `tuya`, `sonos`, `samsungtv`, `homekit_controller`, … |
| Legacy | 6 | `cast`, `upnp`, `dlna_dmr`, `google_translate`, `homekit`, `ipp` |

Two findings worth underlining:

1. **The popular tier gap is real but tractable.** 93 of the top 150
   (Legacy + Bronze) lack the Silver guarantees — reauth flows, unavailability
   handling, clean unload — that most directly map to "my integration broke and
   I don't know why" forum threads. That is a *bounded* population.
2. **Legacy ≠ obscure.** Six of the 25 most-installed integrations are
   unassessed, including `cast` (≈176k reporting installs) and `homekit`
   (≈94k). Several of these are old, structurally sound code that mostly needs
   *assessment plus targeted gap-filling*, not rewrites.

---

## 3.4 Rule-level gap analysis (what actually blocks promotion)

To estimate where the effort lies, we audited a stratified sample of **40
integrations** from the top-150's Legacy/Bronze/Silver population against the
rule checklists (manual review of source + `quality_scale.yaml` where present),
then extrapolated to the 120 top-150 integrations below Gold (62 Legacy, 31 Bronze,
27 Silver). Estimated counts of integrations blocked by each rule family:

| Rule (family) | Tier it blocks | Est. # affected (of 120) | Typical effort to fix |
|---|---|---:|---|
| `repair-issues` not used | Gold | ~112 | 0.5–2 days |
| `reconfiguration-flow` missing | Gold | ~98 | 0.5–1 day |
| `strict-typing` failing | Platinum | ~94 | 1–10 days (size-dependent) |
| `entity-translations` / `icon-translations` incomplete | Gold | ~81 | 0.5–2 days |
| `diagnostics` missing | Gold | ~74 | 0.5–1 day |
| `parallel-updates` undeclared | Silver | ~69 | < 0.5 day |
| `reauthentication-flow` missing (cloud/auth integrations) | Silver | ~41 | 1–3 days |
| `entity-unavailable` / `log-when-unavailable` incorrect | Silver | ~57 | 0.5–2 days |
| `config-entry-unloading` broken or untested | Silver | ~33 | 0.5–2 days |
| Config-flow test coverage below 100% | Bronze | ~46 | 1–3 days |
| No config flow at all (YAML-only setup) | Bronze | ~9 | 3–15 days |

Reading: the **Silver-blocking rules are cheap** (mostly < 2 days each, often
mechanical), while **Gold is dominated by translation/diagnostics/repairs
plumbing** that follows well-established patterns, and **Platinum is gated
almost entirely by `strict-typing`**, whose cost scales with integration size
(`tuya` and `zha` would be multi-week; `pi_hole` would be a day).
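The report does not spell out the estimator behind the extrapolation from 40 audited integrations to 120. One standard choice is the stratified expansion estimator: scale each stratum's sample failure rate by the stratum's size, then sum across strata.

In the minimal sketch below, the stratum sizes (62 Legacy, 31 Bronze, 27 Silver) come from §3.3. The per-stratum sample sizes and failure counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population: int  # integrations of this designation in the top 150
    sampled: int     # how many of them were audited
    failing: int     # audited integrations failing the rule in question


def extrapolate(strata: list[Stratum]) -> float:
    """Stratified expansion estimate of how many integrations fail a rule.

    Each stratum's sample failure rate is scaled up to the stratum's size,
    and the per-stratum estimates are summed.
    """
    total = 0.0
    for s in strata:
        if not (0 < s.sampled <= s.population and 0 <= s.failing <= s.sampled):
            raise ValueError(f"inconsistent counts for stratum {s.name}")
        total += s.population * s.failing / s.sampled
    return total


strata = [
    Stratum("legacy", population=62, sampled=20, failing=10),  # hypothetical sample
    Stratum("bronze", population=31, sampled=10, failing=5),   # hypothetical sample
    Stratum("silver", population=27, sampled=10, failing=0),   # hypothetical sample
]
print(extrapolate(strata))  # 46.5
```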

---

## 3.5 Effort model for tier uplift

Combining the rule audit with observed PR history for recent uplift work
(e.g. the Gold pushes on `enphase_envoy`, `reolink`, `lamarzocco`), we estimate
the median effort per integration for each step, including tests and review cycles:

| Transition | Median effort | P90 effort | Notes |
|---|---:|---:|---|
| Legacy → Bronze (has config flow) | 2 days | 5 days | Mostly test coverage + `quality_scale.yaml` assessment |
| Legacy → Bronze (YAML-only) | 8 days | 15+ days | Config-flow authoring; needs code-owner buy-in |
| Bronze → Silver | 2 days | 4 days | Reauth flow is the only commonly expensive item |
| Silver → Gold | 5 days | 9 days | Translations, diagnostics, repairs, reconfigure |
| Gold → Platinum | 4 days | 12+ days | `strict-typing`; sometimes requires dependency-library typing work upstream |
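A multi-step uplift can be roughly costed by adding the per-step medians from this table. The sketch below does that with hypothetical helper names.

A sum of medians is not the median of the total, and the P90 column is ignored. Treat the result as a planning heuristic under those simplifications, not a forecast.

```python
ORDER = ["legacy", "bronze", "silver", "gold", "platinum"]

# Median days per single-step transition, copied from the effort table.
MEDIAN_STEP_DAYS = {
    ("legacy", "bronze"): 2,  # integration already has a config flow
    ("bronze", "silver"): 2,
    ("silver", "gold"): 5,
    ("gold", "platinum"): 4,
}
YAML_ONLY_LEGACY_TO_BRONZE = 8  # Legacy -> Bronze when a config flow must be written


def path_days(start: str, target: str, yaml_only: bool = False) -> int:
    """Rough effort for an uplift path as the sum of per-step median days."""
    i, j = ORDER.index(start), ORDER.index(target)
    if j < i:
        raise ValueError("target tier is below the starting tier")
    days = 0
    for a, b in zip(ORDER[i:j], ORDER[i + 1 : j + 1]):
        if (a, b) == ("legacy", "bronze") and yaml_only:
            days += YAML_ONLY_LEGACY_TO_BRONZE
        else:
            days += MEDIAN_STEP_DAYS[(a, b)]
    return days
```

Summing the medians this way gives different totals per path:

- 4 days from Legacy (with a config flow) to Silver.
- 13 days from Legacy (with a config flow) to Platinum.
- 19 days to Platinum for a YAML-only integration.

That per-integration asymmetry is what the implication paragraph below builds on.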

**Implication for path scoring (Report 06):** the cost of "polish the top 100 to
Platinum" is dominated by Platinum's typing tail and by a handful of giant
integrations. By contrast, "lift the top ~100 non-internal integrations to
**Silver**, and the best 30–40 of those to **Gold**" delivers most of the
user-visible reliability benefit at roughly **one quarter of the effort** and
avoids the highest-friction reviews.

---

## 3.6 Findings

- **F3.1** — 73% of all core integrations and 41% of the top 150 are unassessed
  (`legacy`); verified quality is the exception, not the rule.
- **F3.2** — The gap between "what users run" and "what is verified" is
  concentrated in a bounded set: **93 top-150 integrations below Silver**.
- **F3.3** — Silver-tier rules are the highest reliability-per-hour work in the
  entire codebase: small, mechanical, objectively checkable, and aligned with
  the most common user-reported failure modes (see Report 04 §4.4).
- **F3.4** — Platinum-for-its-own-sake is poor value on large integrations;
  `strict-typing` on `tuya`/`zha`-class codebases costs weeks and fixes few
  user-facing problems.
- **F3.5** — Because every rule has a documented definition and `hassfest`
  enforcement, uplift PRs have an unusually high merge probability for an
  outside contributor — the acceptance criteria are pre-agreed.