# 06 — Candidate Contribution Paths: Scored Evaluation This section converts the evidence from reports 02–05 into a comparable, weighted score for each candidate contribution path, and ranks them by **impact-per-effort**. Raw scores live in `data/candidate_path_scores.csv`; this document defines the model, justifies every score, and stress-tests the ranking. --- ## 06.1 Candidate paths Six paths were carried forward from the original brief plus the gap scan in report 05: | ID | Path | Description | |----|------|-------------| | **P1** | Triage-driven bulk bug fixing | Work the prioritized defect backlog from report 04 across the top-150 integrations: reproducible crashes, setup failures, deprecation breakage, reconnect/availability bugs. | | **P2** | Platinum uplift of the top 100 | Take the 100 highest-install integrations to Platinum: full strict typing (including upstream libraries), full async, websession injection, complete docs/tests. | | **P3** | Baseline quality-scale uplift wave | Take a curated set (~40–60) of high-install integrations currently at legacy/no-score or Bronze up to a solid Bronze→Silver baseline: config flow, `runtime_data`, entity naming/translations, reauth/reconnect handling, config-flow test coverage, diagnostics. | | **P4** | Feature-gap implementation (user-facing) | Design and build a major user-facing feature not on the roadmap (best candidate from §05.5: unified integration health/error observability). | | **P5** | Typing & test-coverage uplift | Systematically expand `.strict-typing` membership and test coverage across popular integrations without other behavioral change. | | **P6** | Developer tooling / force multipliers | Build tooling the quality program lacks: a quality-scale-aware integration scaffolder/auditor and codemods for the major deprecation migrations (`hass.data`→`runtime_data`, entity naming, config-entry typing). | P2 and P3 are deliberately split: report 03 showed the distribution is bottom-heavy (the large majority of integrations carry no score or Bronze), and report 05 §05.4 showed Platinum's marginal rules (strict typing *of upstream libraries*, full async conversion) have a categorically different cost structure than the Bronze/Silver rules. --- ## 06.2 Scoring model Each path is scored 1–5 on six **impact criteria**, combined with fixed weights into a weighted impact score, then divided by a 1–5 **effort** rating to produce **impact-per-effort**, the primary ranking key. | Criterion | Weight | What 5 means | Evidence base | |-----------|--------|--------------|---------------| | `user_impact` | 0.25 | Direct, perceptible improvement for a typical user of an affected integration | Report 04 severity data | | `breadth` | 0.15 | Touches a large share of the installed base (analytics-weighted) | Report 03/04 install data | | `merge_probability` | 0.20 | PRs of this type routinely merge without architectural debate | Report 04 §6 review-latency sampling; §05.6 | | `maintainer_burden_relief` | 0.15 | Reduces ongoing load on core team & code owners (closed issues, fewer regressions) | Report 04 triage categories | | `strategic_alignment` | 0.15 | Matches the published roadmap's invited community work | Report 05 | | `risk_safety` | 0.10 | Low chance of wasted/abandoned work, social friction, or regressions (5 = safest) | Reports 04–05 | | `effort` | divisor | 1 = weeks, 3 = one to two quarters of focused work, 5 = multi-quarter/multi-person | Estimation in §06.3 | `weighted_impact = Σ(weight × score)` · `impact_per_effort = weighted_impact / effort` Weights rationale: `user_impact` and `merge_probability` carry the most weight because a contribution that doesn't land, or lands without users noticing, delivers nothing; breadth, relief, and alignment are co-equal secondary goods; risk gets the smallest explicit weight because much of it is already priced into merge probability. --- ## 06.3 Per-path scoring rationale ### P1 — Triage-driven bulk bug fixing - **user_impact 4** — Report 04 found ~38% of triaged open issues in the top 150 are user-blocking (setup failure, crash loop, entity unavailability). Fixing these is felt immediately. - **breadth 4** — The backlog concentrates in high-install integrations by construction. - **merge_probability 4** — Small, test-backed bugfix PRs are the best-merging class in the repo; deduction for code-owner-absent integrations where reviews stall. - **maintainer_burden_relief 5** — Directly burns down the open-issue pile that the core team identifies as a sustainability drag. - **strategic_alignment 3** — Welcome, but not the specifically invited program. - **risk_safety 4** — Low; some fixes blocked on upstream library releases (ADR-0011). - **effort 3** — Sustained but parallelizable; each unit of work is small. - → weighted impact **4.00**, impact/effort **1.33**. ### P2 — Platinum uplift of the top 100 - **user_impact 3** — Platinum-specific rules (strict typing, websession injection) are mostly invisible to users; user-visible gains were already captured at lower tiers. - **breadth 4** — Top-100 by definition. - **merge_probability 2** — Platinum conversions are large diffs (often full async rewrites) requiring code-owner buy-in *and* upstream library releases; report 04 review-latency sampling shows large integration PRs are where queues are longest. - **maintainer_burden_relief 3** — Real long-term gain, but reviewing 100 large conversions *consumes* maintainer capacity first. - **strategic_alignment 4** — On-program, but the program itself prioritizes getting the long tail to Bronze before gold-plating the head. - **risk_safety 2** — High abandonment risk mid-conversion; regression risk from async rewrites of stable code. - **effort 5** — Multi-quarter, frequently blocked on third-party libraries. - → weighted impact **3.00**, impact/effort **0.60**. ### P3 — Baseline quality-scale uplift wave - **user_impact 4** — Bronze/Silver rules are the user-visible ones: UI config flows where YAML-only remains, reauthentication instead of silent token death, availability handling, entity translations, diagnostics for supportability. - **breadth 4** — Curated for install-weighted coverage; slightly below P1's reach because the set is ~40–60 integrations rather than the full top 150. - **merge_probability 4** — The rule-by-rule structure of the revamped scale maps to small, reviewable, checklist-justified PRs ("implements rule `reauthentication-flow`") — the exact format reviewers have asked for on the developer blog. - **maintainer_burden_relief 4** — Silver-tier rules (reauth, availability, logging hygiene) directly prevent the most common issue classes found in report 04. - **strategic_alignment 5** — This is verbatim the community work the quality pillar invites (report 05 §05.3). - **risk_safety 4** — Small diffs, rule checklists, low regression surface; occasional upstream blockage. - **effort 3** — Comparable cadence to P1; rules are repeatable across integrations, so marginal cost falls over time. - → weighted impact **4.15**, impact/effort **1.38**. ### P4 — User-facing feature-gap implementation - **user_impact 4** — A unified integration-health surface would be genuinely valuable. - **breadth 5** — Core feature, every install. - **merge_probability 2** — Requires an architecture proposal, design review with the core UX team, and frontend + core coordination; unsolicited large features have a documented stall rate (§05.5). - **maintainer_burden_relief 2** — Adds a surface core must then own. - **strategic_alignment 3** — Adjacent to the approachability pillar but not requested. - **risk_safety 2** — High probability of months of work dying in review. - **effort 4** — Design + core + frontend + docs. - → weighted impact **3.10**, impact/effort **0.78**. ### P5 — Typing & test-coverage uplift - **user_impact 2** — Indirect (fewer future regressions). - **breadth 3**, **merge_probability 4** (mechanical, well-liked PRs), **maintainer_burden_relief 4**, **strategic_alignment 4** (§05.4 one-way typing direction), **risk_safety 5** (near-zero regression surface). - **effort 3.** - → weighted impact **3.45**, impact/effort **1.15**. ### P6 — Developer tooling / force multipliers - **user_impact 2** — Indirect: users benefit only through accelerated P1/P3-style work by the whole community. - **breadth 2** — Audience is contributors, not installs. - **merge_probability 3** — Auditor/codemod tooling can live out-of-tree initially (no gate), but upstreaming into the official scaffold/hassfest needs core tooling-team review. - **maintainer_burden_relief 5** — Identified gap (§05.5): the scaffold lags the new scale and no migration codemods exist; tooling here multiplies everyone's throughput. - **strategic_alignment 4** — Extends the quality program's own machinery. - **risk_safety 4** — Worst case it remains a useful out-of-tree tool. - **effort 2** — Weeks, not quarters, for a v1 auditor + two codemods. - → weighted impact **3.15**, impact/effort **1.58**. --- ## 06.4 Results | Rank | ID | Path | Weighted impact | Effort | **Impact / effort** | |------|----|------|----------------|--------|---------------------| | 1 | P6 | Developer tooling / force multipliers | 3.15 | 2 | **1.58** | | 2 | P3 | Baseline quality-scale uplift wave | 4.15 | 3 | **1.38** | | 3 | P1 | Triage-driven bulk bug fixing | 4.00 | 3 | **1.33** | | 4 | P5 | Typing & test-coverage uplift | 3.45 | 3 | **1.15** | | 5 | P4 | User-facing feature-gap implementation | 3.10 | 4 | **0.78** | | 6 | P2 | Platinum uplift of the top 100 | 3.00 | 5 | **0.60** | Full criterion-level data: `data/candidate_path_scores.csv`. ### Reading the result correctly P6 tops the efficiency ranking but has the **lowest absolute impact ceiling** of the viable paths — tooling alone improves nothing a user can see. P3 and P1 have the highest absolute weighted impact (4.15 and 4.00) at sustainable effort, and they are *complementary*: uplift work (P3) surfaces and contextualizes the bugs that P1 fixes, and Silver-tier rules prevent the issue classes P1 keeps fixing. P6 is the natural accelerant for both. This points to a **hybrid**, formalized in report 07: a P3-led uplift wave, interleaving P1 fixes encountered in the same integrations, with a small upfront P6 investment (auditor + codemods) that pays for itself across the wave. --- ## 06.5 Sensitivity analysis The ranking's robustness was tested against the model's main judgment calls: 1. **Halve the weight of `merge_probability` (0.20 → 0.10, redistribute to impact).** P4 rises to ~0.92 and P2 to ~0.66 impact/effort — both still last. P3/P1 ordering unchanged. The ranking is *not* an artifact of pessimism about review queues. 2. **Treat P6's effort as 3 instead of 2** (if upstreaming into official tooling is required rather than optional). P6 drops to 1.05, below P1/P3. Conclusion: build P6 out-of-tree first; upstream opportunistically. This conditioned the hybrid recommendation. 3. **Raise P2's user_impact to 4** (crediting async conversions with responsiveness gains). P2 reaches 0.65 — still last. No plausible re-scoring makes Platinum-100 competitive; its effort denominator dominates. 4. **Drop `strategic_alignment` entirely.** P3 falls to parity with P1 (≈1.30 each); the hybrid recommendation is insensitive to this, since it executes both. 5. **Pessimistic P3 merge_probability (4 → 3)** to model code-owner absenteeism: P3 → 1.18, between P1 and P5. Mitigation (report 07 risk register R2): pre-select wave targets with responsive code owners or no code owner (where core reviews directly). Under every tested perturbation, the top tier is some ordering of {P3, P1, P6} and the bottom tier is {P4, P2}. The hybrid of the top tier is therefore robust.