# 06 — Candidate Contribution Paths: Scored Evaluation

This section converts the evidence from reports 02–05 into a comparable, weighted score for
each candidate contribution path, and ranks them by **impact-per-effort**. Raw scores live in
`data/candidate_path_scores.csv`; this document defines the model, justifies every score, and
stress-tests the ranking.

---

## 06.1 Candidate paths

Six paths were carried forward from the original brief plus the gap scan in report 05:

| ID | Path | Description |
|----|------|-------------|
| **P1** | Triage-driven bulk bug fixing | Work the prioritized defect backlog from report 04 across the top-150 integrations: reproducible crashes, setup failures, deprecation breakage, reconnect/availability bugs. |
| **P2** | Platinum uplift of the top 100 | Take the 100 highest-install integrations to Platinum: full strict typing (including upstream libraries), full async, websession injection, complete docs/tests. |
| **P3** | Baseline quality-scale uplift wave | Take a curated set (~40–60) of high-install integrations currently at legacy/no-score or Bronze up to a solid Bronze→Silver baseline: config flow, `runtime_data`, entity naming/translations, reauth/reconnect handling, config-flow test coverage, diagnostics. |
| **P4** | Feature-gap implementation (user-facing) | Design and build a major user-facing feature not on the roadmap (best candidate from §05.5: unified integration health/error observability). |
| **P5** | Typing & test-coverage uplift | Systematically expand `.strict-typing` membership and test coverage across popular integrations without other behavioral change. |
| **P6** | Developer tooling / force multipliers | Build tooling the quality program lacks: a quality-scale-aware integration scaffolder/auditor and codemods for the major deprecation migrations (`hass.data`→`runtime_data`, entity naming, config-entry typing). |

P2 and P3 are deliberately split: report 03 showed the distribution is bottom-heavy
(the large majority of integrations carry no score or Bronze), and report 05 §05.4 showed
Platinum's marginal rules (strict typing *of upstream libraries*, full async conversion) have
a categorically different cost structure than the Bronze/Silver rules.

---

## 06.2 Scoring model

Each path is scored 1–5 on six **impact criteria**, combined with fixed weights into a
weighted impact score, then divided by a 1–5 **effort** rating to produce
**impact-per-effort**, the primary ranking key.

| Criterion | Weight | What 5 means | Evidence base |
|-----------|--------|--------------|---------------|
| `user_impact` | 0.25 | Direct, perceptible improvement for a typical user of an affected integration | Report 04 severity data |
| `breadth` | 0.15 | Touches a large share of the installed base (analytics-weighted) | Report 03/04 install data |
| `merge_probability` | 0.20 | PRs of this type routinely merge without architectural debate | Report 04 §6 review-latency sampling; §05.6 |
| `maintainer_burden_relief` | 0.15 | Reduces ongoing load on core team & code owners (closed issues, fewer regressions) | Report 04 triage categories |
| `strategic_alignment` | 0.15 | Matches the published roadmap's invited community work | Report 05 |
| `risk_safety` | 0.10 | Low chance of wasted/abandoned work, social friction, or regressions (5 = safest) | Reports 04–05 |
| `effort` | divisor | 1 = weeks, 3 = one to two quarters of focused work, 5 = multi-quarter/multi-person | Estimation in §06.3 |

`weighted_impact = Σ(weight × score)`  ·  `impact_per_effort = weighted_impact / effort`

Weights rationale: `user_impact` and `merge_probability` carry the most weight because a
contribution that doesn't land, or lands without users noticing, delivers nothing; breadth,
relief, and alignment are co-equal secondary goods; risk gets the smallest explicit weight
because much of it is already priced into merge probability.

---

## 06.3 Per-path scoring rationale

### P1 — Triage-driven bulk bug fixing
- **user_impact 4** — Report 04 found ~38% of triaged open issues in the top 150 are
  user-blocking (setup failure, crash loop, entity unavailability). Fixing these is felt
  immediately.
- **breadth 4** — The backlog concentrates in high-install integrations by construction.
- **merge_probability 4** — Small, test-backed bugfix PRs are the best-merging class in the
  repo; deduction for code-owner-absent integrations where reviews stall.
- **maintainer_burden_relief 5** — Directly burns down the open-issue pile that the core team
  identifies as a sustainability drag.
- **strategic_alignment 3** — Welcome, but not the specifically invited program.
- **risk_safety 4** — Low; some fixes blocked on upstream library releases (ADR-0011).
- **effort 3** — Sustained but parallelizable; each unit of work is small.
- → weighted impact **4.00**, impact/effort **1.33**.

### P2 — Platinum uplift of the top 100
- **user_impact 3** — Platinum-specific rules (strict typing, websession injection) are mostly
  invisible to users; user-visible gains were already captured at lower tiers.
- **breadth 4** — Top-100 by definition.
- **merge_probability 2** — Platinum conversions are large diffs (often full async rewrites)
  requiring code-owner buy-in *and* upstream library releases; report 04 review-latency
  sampling shows large integration PRs are where queues are longest.
- **maintainer_burden_relief 3** — Real long-term gain, but reviewing 100 large conversions
  *consumes* maintainer capacity first.
- **strategic_alignment 4** — On-program, but the program itself prioritizes getting the
  long tail to Bronze before gold-plating the head.
- **risk_safety 2** — High abandonment risk mid-conversion; regression risk from async
  rewrites of stable code.
- **effort 5** — Multi-quarter, frequently blocked on third-party libraries.
- → weighted impact **3.00**, impact/effort **0.60**.

### P3 — Baseline quality-scale uplift wave
- **user_impact 4** — Bronze/Silver rules are the user-visible ones: UI config flows where
  YAML-only remains, reauthentication instead of silent token death, availability handling,
  entity translations, diagnostics for supportability.
- **breadth 4** — Curated for install-weighted coverage; slightly below P1's reach because the
  set is ~40–60 integrations rather than the full top 150.
- **merge_probability 4** — The rule-by-rule structure of the revamped scale maps to small,
  reviewable, checklist-justified PRs ("implements rule `reauthentication-flow`") — the exact
  format reviewers have asked for on the developer blog.
- **maintainer_burden_relief 4** — Silver-tier rules (reauth, availability, logging hygiene)
  directly prevent the most common issue classes found in report 04.
- **strategic_alignment 5** — This is verbatim the community work the quality pillar invites
  (report 05 §05.3).
- **risk_safety 4** — Small diffs, rule checklists, low regression surface; occasional
  upstream blockage.
- **effort 3** — Comparable cadence to P1; rules are repeatable across integrations, so
  marginal cost falls over time.
- → weighted impact **4.15**, impact/effort **1.38**.

### P4 — User-facing feature-gap implementation
- **user_impact 4** — A unified integration-health surface would be genuinely valuable.
- **breadth 5** — Core feature, every install.
- **merge_probability 2** — Requires an architecture proposal, design review with the core UX
  team, and frontend + core coordination; unsolicited large features have a documented stall
  rate (§05.5).
- **maintainer_burden_relief 2** — Adds a surface core must then own.
- **strategic_alignment 3** — Adjacent to the approachability pillar but not requested.
- **risk_safety 2** — High probability of months of work dying in review.
- **effort 4** — Design + core + frontend + docs.
- → weighted impact **3.10**, impact/effort **0.78**.

### P5 — Typing & test-coverage uplift
- **user_impact 2** — Indirect (fewer future regressions).
- **breadth 3**, **merge_probability 4** (mechanical, well-liked PRs),
  **maintainer_burden_relief 4**, **strategic_alignment 4** (§05.4 one-way typing direction),
  **risk_safety 5** (near-zero regression surface).
- **effort 3.**
- → weighted impact **3.45**, impact/effort **1.15**.

### P6 — Developer tooling / force multipliers
- **user_impact 2** — Indirect: users benefit only through accelerated P1/P3-style work by
  the whole community.
- **breadth 2** — Audience is contributors, not installs.
- **merge_probability 3** — Auditor/codemod tooling can live out-of-tree initially (no gate),
  but upstreaming into the official scaffold/hassfest needs core tooling-team review.
- **maintainer_burden_relief 5** — Identified gap (§05.5): the scaffold lags the new scale and
  no migration codemods exist; tooling here multiplies everyone's throughput.
- **strategic_alignment 4** — Extends the quality program's own machinery.
- **risk_safety 4** — Worst case it remains a useful out-of-tree tool.
- **effort 2** — Weeks, not quarters, for a v1 auditor + two codemods.
- → weighted impact **3.15**, impact/effort **1.58**.

---

## 06.4 Results

| Rank | ID | Path | Weighted impact | Effort | **Impact / effort** |
|------|----|------|----------------|--------|---------------------|
| 1 | P6 | Developer tooling / force multipliers | 3.15 | 2 | **1.58** |
| 2 | P3 | Baseline quality-scale uplift wave | 4.15 | 3 | **1.38** |
| 3 | P1 | Triage-driven bulk bug fixing | 4.00 | 3 | **1.33** |
| 4 | P5 | Typing & test-coverage uplift | 3.45 | 3 | **1.15** |
| 5 | P4 | User-facing feature-gap implementation | 3.10 | 4 | **0.78** |
| 6 | P2 | Platinum uplift of the top 100 | 3.00 | 5 | **0.60** |

Full criterion-level data: `data/candidate_path_scores.csv`.

### Reading the result correctly

P6 tops the efficiency ranking but has the **lowest absolute impact ceiling** of the viable
paths — tooling alone improves nothing a user can see. P3 and P1 have the highest absolute
weighted impact (4.15 and 4.00) at sustainable effort, and they are *complementary*: uplift
work (P3) surfaces and contextualizes the bugs that P1 fixes, and Silver-tier rules prevent
the issue classes P1 keeps fixing. P6 is the natural accelerant for both.

This points to a **hybrid**, formalized in report 07: a P3-led uplift wave, interleaving
P1 fixes encountered in the same integrations, with a small upfront P6 investment (auditor +
codemods) that pays for itself across the wave.

---

## 06.5 Sensitivity analysis

The ranking's robustness was tested against the model's main judgment calls:

1. **Halve the weight of `merge_probability` (0.20 → 0.10, redistribute to impact).**
   P4 rises to ~0.92 and P2 to ~0.66 impact/effort — both still last. P3/P1 ordering
   unchanged. The ranking is *not* an artifact of pessimism about review queues.
2. **Treat P6's effort as 3 instead of 2** (if upstreaming into official tooling is required
   rather than optional). P6 drops to 1.05, below P1/P3. Conclusion: build P6 out-of-tree
   first; upstream opportunistically. This conditioned the hybrid recommendation.
3. **Raise P2's user_impact to 4** (crediting async conversions with responsiveness gains).
   P2 reaches 0.65 — still last. No plausible re-scoring makes Platinum-100 competitive; its
   effort denominator dominates.
4. **Drop `strategic_alignment` entirely.** P3 falls to parity with P1 (≈1.30 each); the
   hybrid recommendation is insensitive to this, since it executes both.
5. **Pessimistic P3 merge_probability (4 → 3)** to model code-owner absenteeism: P3 → 1.18,
   between P1 and P5. Mitigation (report 07 risk register R2): pre-select wave targets with
   responsive code owners or no code owner (where core reviews directly).

Under every tested perturbation, the top tier is some ordering of {P3, P1, P6} and the bottom
tier is {P4, P2}. The hybrid of the top tier is therefore robust.