# Quality-Scale Audit: `onvif`

> Audit snapshot per rubric; re-verify against `dev` before execution.

## 1. Integration profile

| Attribute | Value |
|---|---|
| Domain | `onvif` |
| IoT class | `local_push` (PullPoint/webhook events) + `local_polling` fallback |
| Primary library | `onvif-zeep-async` (async SOAP via `zeep`/`httpx`) |
| Declared `quality_scale` | none declared |
| Install base | Top-50; the default path for generic IP cameras (Hikvision/Dahua/Reolink-generic, Amcrest, Tapo cams, NVRs) |
| Issue volume | **Very high per install**: event subscription failures (PullPoint renew loops, webhook reachability), missing camera profiles, time-sync auth failures (WS-UsernameToken nonce drift), setup timeouts against slow NVRs, spammy logs |
| CODEOWNERS | Yes; active owner with deep event-subsystem knowledge. All event work must be co-designed. |

**Why selected:** highest issue-to-install ratio in the candidate set. The device population is the most heterogeneous (hundreds of non-compliant ONVIF stacks), so the leverage is in *diagnosability* (diagnostics, log hygiene, clear errors) and *resilience* (reauth, graceful event degradation), not feature work.

## 2. Rule-by-rule audit

### Bronze

| Rule | Status | Evidence / Gap |
|---|---|---|
| `config-flow` | ✅ | UI flow with manual + discovered (WS-Discovery via `wsdiscovery`) paths; DHCP triggers exist for known camera MAC prefixes. |
| `test-before-configure` | ✅ | Device info + profile fetch before entry creation; detects time drift and surfaces it. |
| `unique-config-entry` | ✅ | MAC or EndpointReference-based unique ID with documented fallback order. |
| `config-flow-test-coverage` | 🟡 | Flow tests exist; discovered-device dedupe and time-drift error branches need coverage verification. |
| `entity-unique-id` | ✅ | Per-profile/per-event unique IDs. |
| `has-entity-name` | 🟡 | Event-derived binary sensors/sensors generate names from ONVIF topic strings. Verify `has_entity_name` and sensible fallback naming; topic-derived names are often ugly (`tns1:RuleEngine/CellMotionDetector/Motion`). Naming polish is user-visible but must not alter unique IDs. |
| `runtime-data` | 🟡 | Verify `entry.runtime_data` holds the `ONVIFDevice` wrapper. |
| `appropriate-polling` | ✅ | Push-first; polling only for snapshot/PTZ status where needed. |
| `entity-event-setup` | ✅ | Event entities subscribe in `async_added_to_hass` via the device event manager. |
| `action-setup` | 🟡 | PTZ service: verify the registration pattern and that validation errors are translated. |
| `common-modules` | ✅ | `device.py`, `event.py`, `parsers.py` structure is established (names differ from convention, acceptable). |
| `dependency-transparency` | ✅ | PyPI, public source. |
| `docs-*` (Bronze set) | 🟡 | Docs explain events/webhooks; need a troubleshooting matrix (see OV-6) and a check of the removal instructions. |
| `brands` | ✅ | Present. |

### Silver

| Rule | Status | Evidence / Gap |
|---|---|---|
| `config-entry-unloading` | ✅ | Unload cancels renew tasks and unsubscribes; verify no lingering `httpx` clients (reported leak class). |
| `reauthentication-flow` | ✅/🟡 | Reauth flow exists for credential rejection. **Gap:** auth failures caused by device/HA clock drift surface as a generic setup failure rather than an actionable error. Detect drift (`Fault` subcode `NotAuthorized` + time check) and message it distinctly. |
| `action-exceptions` | 🟡 | PTZ and reboot/snapshot failures are partially log-only. |
| `entity-unavailable` | 🟡 | When PullPoint dies and the webhook is unreachable, event entities silently freeze. The rule requires entities to be marked unavailable (or `restored` semantics) when the event stream is down. **Key gap** (OV-2). |
| `log-when-unavailable` | ❌ | **Key gap.** Renew-failure loops produce log spam (a recurring complaint). Implement log-once on stream loss, a single INFO on recovery, and DEBUG for retries. |
| `parallel-updates` | 🟡 | Declare explicitly (`0` for push platforms; `1` for PTZ/button actions, since many cameras choke on concurrent SOAP). |
| `integration-owner` | ✅ | Present. |
| `test-coverage` | 🟡 | Config flow and parsers are covered; the event-manager lifecycle (renew failure → fallback → recovery) is under-covered, and this is exactly where bugs live. |
| `docs-configuration-parameters` | 🟡 | Webhook-vs-PullPoint behavior and network requirements need parameter-level docs. |

### Gold (selected)

| Rule | Status | Notes |
|---|---|---|
| `diagnostics` | ✅/🟡 | `diagnostics.py` exists; **extend** with event-subsystem state (active transport, last renew result, parsed topic list, webhook reachability). This is the single highest-leverage triage improvement for this integration (OV-1). Verify redaction (credentials, host, serial). |
| `discovery` | ✅ | WS-Discovery + DHCP. |
| `dynamic-devices` | ➖ | Single device per entry. |
| `repair-issues` | ❌ | Out of scope, except that a repair issue for "webhook unreachable, degraded to PullPoint" is cheap inside OV-2. Optional, owner's call. |
| `strict-typing` | ❌ | Out of scope (Platinum); keep new code strict-clean. |

## 3. Gap summary and proposed work items

| ID | Work item | Rule(s) | Size | Risk |
|---|---|---|---|---|
| OV-1 | Extend diagnostics with event-subsystem state + redaction review | `diagnostics` | S | Low |
| OV-2 | Event-stream availability: mark event entities unavailable on stream loss; optional repair issue on webhook→PullPoint degradation | `entity-unavailable` | M | **High**: core event manager; co-design with owner mandatory |
| OV-3 | Log hygiene: log-once unavailable/restore, demote retry loops to DEBUG | `log-when-unavailable` | S | Low |
| OV-4 | Distinct, translated error for time-drift auth failures (setup + reauth description) | `reauthentication-flow`, `exception-translations` (partial) | S | Medium |
| OV-5 | Event-manager lifecycle tests (renew failure, transport fallback, recovery) to ≥95% on `event.py` | `test-coverage` | M (tests only) | Medium: needs careful async fixtures |
| OV-6 | Docs: troubleshooting matrix (symptom → cause → fix), parameter docs for webhook/PullPoint | `docs-*` | S–M (docs repo) | Low |
| OV-7 | `PARALLEL_UPDATES` declarations, translated action exceptions (PTZ etc.) | `parallel-updates`, `action-exceptions` | S | Low |
| OV-8 | Add `quality_scale.yaml`; declare Bronze, then Silver after OV-2…OV-5, OV-7 | meta | XS | Low |

**Target outcome:** unscored → **Silver**, with materially better triage (OV-1, OV-6) expected to reduce inbound issue volume, which is the primary KPI for this integration per the impact report (milestone #1, §4.3).

## 4. Maintainer-coordination notes

- **Do not touch the event manager without prior agreement.** Open a design issue for OV-2 with a state diagram (subscribed → renewing → degraded → lost → recovered) and let the owner shape it; the owner has hardware that CI cannot replicate.
- Hardware diversity means regressions surface post-release; schedule OV-2 early in a release cycle and stay available through the beta (engagement plan §7).