# Architecture Overview: Home Assistant in 2025

This section describes the system an effective contributor must understand. It is organized
from the kernel outward, because the central architectural fact of Home Assistant is a
**stable, strictly-governed core surrounded by a vast, variable-quality integration layer**
— and that asymmetry is the foundation of this report's recommendation.

## 1. The shape of the project

Home Assistant is not one repository; it is a constellation governed since 2024 by the
**Open Home Foundation** (a Swiss non-profit holding the IP), with commercial development
funded primarily by **Nabu Casa** (Home Assistant Cloud subscriptions):

| Repository / project | Role | Language |
|---|---|---|
| `home-assistant/core` | The hub: event loop, state machine, entity model, automation engine, **and all ~2,800 bundled integrations** | Python (3.13-era, fully async) |
| `home-assistant/frontend` | The web UI (and the UI embedded in mobile apps) | TypeScript / Lit web components |
| `home-assistant/supervisor` | Manages the containerized deployment: add-ons, OS updates, backups | Python |
| `home-assistant/operating-system` | Home Assistant OS — Buildroot-based appliance OS | Buildroot/shell |
| `home-assistant/architecture` | ADRs and architecture discussions | Markdown |
| Mobile apps (`iOS`, `android`) | Companion apps: notifications, sensors, location | Swift / Kotlin |
| Protocol servers | `python-matter-server`, `zwave-js-server` (Node), `zigpy` stack, ESPHome, Wyoming (voice) | Mixed |
| `home-assistant.io` / `developers.home-assistant` | User and developer documentation | Markdown |

Deployment comes in four supported flavors — **Home Assistant OS** (the appliance, dominant
among reporting installs), **Supervised**, **Container**, and **Core** (bare venv). Add-ons
(separately containerized apps managed by the Supervisor) exist only on OS/Supervised.
This matters to contributors mainly in one way: **integrations cannot assume the host
environment**. No shelling out, no system packages, and pure-Python (or pre-built wheel)
dependencies only; the constraint is upheld in review and backstopped by `hassfest`
requirements validation and the wheels build.

## 2. The core kernel

Everything in `home-assistant/core` outside `homeassistant/components/` is a compact,
high-quality kernel (~50k lines) that has been stable in shape for years:

### 2.1 Event bus, state machine, service registry

The `HomeAssistant` object (`hass`) owns a single asyncio event loop and three primitives:

- **Event bus** (`hass.bus`): typed events (`state_changed`, `call_service`,
  `homeassistant_started`, …). Everything observable in the system is an event.
- **State machine** (`hass.states`): the current state of every entity —
  `entity_id → State(state: str, attributes: dict, last_changed, last_updated, context)`.
  State writes fire `state_changed` events; the state machine is the *consequence* of
  entity updates, never the source of truth for devices.
- **Service registry** (`hass.services`): named, schema-validated actions
  (`light.turn_on`, …) — renamed "actions" in user-facing UI in 2024, still services in
  code. Service calls carry a `Context` (user, parent action) enabling the trace/attribution
  system.

Execution discipline is strict and enforced in review: the event loop must never block.
Synchronous library calls must be pushed through the executor
(`hass.async_add_executor_job`); a blocking-call detector in core actively logs/raises on
known blocking operations (file I/O, `time.sleep`, blocking HTTP) inside the loop. This
single rule is the root cause of the Platinum tier's hardest requirement (§5).
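The rule can be demonstrated with plain `asyncio`. This sketch assumes a stand-in `blocking_fetch()` in place of a real synchronous library. It runs a heartbeat task next to a 200 ms blocking call twice: once offloaded with `loop.run_in_executor` (the role core's `hass.async_add_executor_job` plays) and once called inline.

```python
import asyncio
import time


def blocking_fetch() -> str:
    # Stand-in for a synchronous library call, e.g. a requests-style HTTP client.
    time.sleep(0.2)
    return "payload"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Another task sharing the loop; it records a timestamp about every 20 ms.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    if offload:
        # Run the sync call on the default thread pool; the loop stays free.
        result = await asyncio.get_running_loop().run_in_executor(None, blocking_fetch)
    else:
        # Calling it inline freezes the whole loop for 200 ms.
        result = blocking_fetch()
    assert result == "payload"
    stop.set()
    await hb
    return len(ticks)


print(asyncio.run(main(offload=True)))   # several ticks recorded during the fetch
print(asyncio.run(main(offload=False)))  # 0: the heartbeat never got a turn
```

Inline, the heartbeat records nothing until the call returns. In a real install, every integration, automation, and frontend connection stalls the same way.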

### 2.2 The entity model and registries

- **Entities** are Python objects subclassing per-domain bases (`SensorEntity`,
  `LightEntity`, `ClimateEntity`, … ~50 domains). The base classes define the *contract*
  (supported features bitmasks, device classes, state classes, units) and core handles
  state-machine writes, so an integration mostly fills in properties.
- **Entity registry / device registry / area registry**, plus **floors, labels, and
  categories** (added 2024.4): persistent metadata that survives restarts and lets users
  rename/organize without integrations caring. Devices group entities and carry identifiers
  (so multiple integrations can attach to one physical device), manufacturer/model/sw
  version, and `via_device` topology.
- **Unique IDs** are the load-bearing concept: an entity with a stable `unique_id` gets
  registry persistence, user customization, and survives re-setup. Lack of unique IDs is a
  classic legacy-integration defect and a Bronze-tier rule.

### 2.3 Config entries and flows

Modern integrations are configured through **config entries** — persisted setup records
created by **config flows** (UI wizards defined in the integration's `config_flow.py`).
The flow framework also provides:

- **Discovery-initiated flows** (zeroconf/mDNS, SSDP, DHCP-sniffing, Bluetooth
  advertisements, USB enumeration, MQTT discovery, `hassio` add-on discovery) — declared in
  the manifest so core can wake the right integration when hardware appears.
- **Reauth flows** (cloud token expired → actionable repair issue → guided re-login instead
  of a silently dead integration) and **reconfigure flows** (change host/IP without
  delete-and-re-add, generalized in 2024).
- **Options flows** for post-setup settings, and **subentries** (2025-era) for
  per-device/per-resource children of one entry.

YAML configuration survives for the automation/script/template layer and a shrinking set of
infrastructure integrations; **new device integrations must be config-entry based** (ADR-0010).
The decade-long YAML→UI migration is essentially complete policy-wise, but *partially*
complete code-wise — another quality-variance source in the long tail.

### 2.4 Update logic: `DataUpdateCoordinator`

The blessed pattern for polling integrations: a coordinator (typically one per device or
account) fetches on an interval (or receives push callbacks), entities subscribe to it, and
core supplies centralized error handling (`UpdateFailed` → entities marked unavailable →
automatic recovery on the next successful refresh), debounced refreshes, and a single shared
request per cycle however many entities consume the data. Half of the "integration dies until
restart" bug class in report 04 traces to integrations that predate or sidestep this
pattern and hand-roll update loops with broken error recovery.
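A stdlib-only model of the coordinator contract illustrates recovery on the next success and the single fetch per refresh. It assumes simplified, hypothetical classes (`ToyCoordinator`, `ToyEntity`); only the names `UpdateFailed` and `last_update_success` echo core's `update_coordinator` helpers.

```python
import asyncio
from typing import Any, Awaitable, Callable


class UpdateFailed(Exception):
    """Toy counterpart of the exception HA's coordinator catches."""


class ToyCoordinator:
    """One fetch per refresh feeds every subscribed entity; failures flip availability."""

    def __init__(self, fetch: Callable[[], Awaitable[dict[str, Any]]]) -> None:
        self._fetch = fetch
        self.data: dict[str, Any] | None = None
        self.last_update_success = True
        self._listeners: list[Callable[[], None]] = []

    def add_listener(self, callback: Callable[[], None]) -> None:
        self._listeners.append(callback)

    async def refresh(self) -> None:
        try:
            self.data = await self._fetch()
            self.last_update_success = True   # recovery is automatic on the next success
        except UpdateFailed:
            self.last_update_success = False  # keep stale data; entities go unavailable
        for callback in self._listeners:
            callback()


class ToyEntity:
    def __init__(self, coordinator: ToyCoordinator, key: str) -> None:
        self.coordinator = coordinator
        self.key = key
        self.written: list[tuple[bool, Any]] = []
        coordinator.add_listener(self.write_state)

    @property
    def available(self) -> bool:
        return self.coordinator.last_update_success

    @property
    def native_value(self) -> Any:
        data = self.coordinator.data
        return None if data is None else data.get(self.key)

    def write_state(self) -> None:
        # In HA this would be a state-machine write ("unavailable" when not available).
        self.written.append((self.available, self.native_value))


async def demo() -> ToyEntity:
    replies = iter([{"temp": 21.5, "hum": 40}, UpdateFailed("timeout"), {"temp": 22.0, "hum": 41}])

    async def fetch() -> dict[str, Any]:
        reply = next(replies)  # exactly one "request" per refresh, however many entities
        if isinstance(reply, Exception):
            raise reply
        return reply

    coordinator = ToyCoordinator(fetch)
    temperature = ToyEntity(coordinator, "temp")
    ToyEntity(coordinator, "hum")
    for _ in range(3):
        await coordinator.refresh()
    return temperature


print(asyncio.run(demo()).written)  # [(True, 21.5), (False, 21.5), (True, 22.0)]
```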

### 2.5 Automation engine

Triggers (state, time, event, device, template, …) → conditions → actions, executed by a
script engine with run modes (single/restart/queued/parallel), `wait`/`repeat`/`choose`
control flow, **traces** (step-by-step execution recording for debugging), and
**blueprints** (parameterized shareable automations). Jinja2 templating is pervasive.
This subsystem is core-team-owned, actively developed (the roadmap's "automation
usability" track), and **not** a good outsider target.
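Run modes are easiest to grasp by triggering the same script three times in quick succession. The sketch below uses a hypothetical `ToyScript`, not the real script engine, which also handles `max` run limits, queue overflow, and logging. Each run appends start and end markers to a log.

```python
import asyncio
from typing import Awaitable, Callable

Action = Callable[[str], Awaitable[None]]
MODES = ("single", "restart", "queued", "parallel")


class ToyScript:
    """Hypothetical model of the four run modes; not HA's script engine."""

    def __init__(self, mode: str, action: Action) -> None:
        if mode not in MODES:
            raise ValueError(mode)
        self.mode = mode
        self.action = action
        self.runs: list[asyncio.Task] = []
        self.ignored = 0
        self._queue_lock = asyncio.Lock()  # only "queued" uses it

    async def _run(self, label: str) -> None:
        if self.mode == "queued":
            async with self._queue_lock:  # one run at a time, in trigger order
                await self.action(label)
        else:
            await self.action(label)

    def trigger(self, label: str) -> None:
        active = [t for t in self.runs if not t.done()]
        if self.mode == "single" and active:
            self.ignored += 1  # a run is in progress: drop the new trigger
            return
        if self.mode == "restart":
            for task in active:
                task.cancel()  # abandon the current run and start over
        self.runs.append(asyncio.create_task(self._run(label)))

    async def drain(self) -> None:
        await asyncio.gather(*self.runs, return_exceptions=True)


async def demo(mode: str) -> tuple[list[str], int]:
    log: list[str] = []

    async def action(label: str) -> None:
        log.append(f"start {label}")
        await asyncio.sleep(0.05)
        log.append(f"end {label}")

    script = ToyScript(mode, action)
    for label in ("a", "b", "c"):
        script.trigger(label)
        await asyncio.sleep(0)  # let the new run begin before the next trigger
    await script.drain()
    return log, script.ignored


for mode in MODES:
    print(mode, asyncio.run(demo(mode)))
```

With these timings, `single` completes only run `a` and ignores two triggers, `restart` cancels `a` and `b` and completes only `c`, `queued` runs `a`, `b`, `c` back to back, and `parallel` starts all three before any finishes.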

### 2.6 Recorder, statistics, energy

The **recorder** persists events/states to SQLAlchemy-backed storage (SQLite default;
MariaDB/PostgreSQL supported) with a heavily optimized schema (state attributes
deduplicated/compressed; major schema overhauls landed 2022–2023 cutting DB size several-fold).
**Long-term statistics** (hourly aggregates kept indefinitely, alongside 5-minute short-term
statistics purged on the recorder's normal schedule) power history charts and the **energy
dashboard**. The contributor-relevant consequence: sensors must declare correct
`device_class`, `state_class`, and units or statistics silently misbehave — one of the most
common *user-visible* defects in the long tail (report 04 theme "entity correctness"), and
exactly what quality-scale review catches.

### 2.7 Auth, API surface, frontend

Native auth (users, refresh/access tokens, MFA), a WebSocket API (primary frontend
transport), REST API, and server-sent events. The frontend is a separate
TypeScript/Lit codebase consumed by core as a built Python package
(`home-assistant-frontend`); dashboards are user-configurable ("Lovelace"), with a new
sections/drag-drop layout system rolled out across 2024–2025. Frontend contribution is a
distinct skill set and review pipeline — relevant to the feature-gap path's cost in report 06.
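To give a feel for the WebSocket API, this sketch builds the client side of a session without any networking. It sends the `auth` reply to the server's `auth_required` greeting, then commands carrying increasing `id`s that the server echoes in `result` frames. Message shapes follow the published WebSocket API; the `WsMessageBuilder` class, the `match_result` helper, and the token value are hypothetical.

```python
import itertools
import json
from typing import Any


class WsMessageBuilder:
    """Builds client frames for the WebSocket API; the transport is deliberately left out."""

    def __init__(self, access_token: str) -> None:
        self._token = access_token
        self._ids = itertools.count(1)  # each command needs a fresh, increasing id

    def auth(self) -> str:
        # Sent in reply to the server's {"type": "auth_required"} greeting.
        return json.dumps({"type": "auth", "access_token": self._token})

    def command(self, msg_type: str, **fields: Any) -> tuple[int, str]:
        msg_id = next(self._ids)
        return msg_id, json.dumps({"id": msg_id, "type": msg_type, **fields})


def match_result(frame: str, pending: dict[int, str]) -> tuple[str, bool]:
    """Pair a server {"type": "result"} frame with the command that caused it."""
    message = json.loads(frame)
    return pending.pop(message["id"]), message["success"]


builder = WsMessageBuilder("LONG_LIVED_TOKEN")  # placeholder, not a real token
pending: dict[int, str] = {}
sub_id, sub = builder.command("subscribe_events", event_type="state_changed")
pending[sub_id] = "subscribe_events"
call_id, call = builder.command(
    "call_service",  # still "call_service" on the wire, though the UI says "action"
    domain="light",
    service="turn_on",
    service_data={"brightness_pct": 50},
    target={"entity_id": "light.kitchen"},
)
pending[call_id] = "call_service"
reply = '{"id": 2, "type": "result", "success": true, "result": null}'  # plausible reply
print(match_result(reply, pending))  # ('call_service', True)
```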

## 3. The integration layer — where the variance is

`homeassistant/components/` contains ~2,800 directories, each one integration ("domain"),
each with a `manifest.json` declaring: domain, name, dependencies (other integrations),
`requirements` (PyPI packages, **exact-pinned**), discovery hooks, `iot_class`
(`local_push`, `local_polling`, `cloud_push`, `cloud_polling`, `assumed_state`,
`calculated`), `codeowners`, `integration_type`, and `quality_scale` (a field that predates
the rule-based tier system introduced in October 2024).
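The following shows a manifest for a hypothetical `acme` integration, expressed as a Python dict, together with two checks in the spirit of `hassfest`. The field values are invented and `check_manifest()` is a toy; hassfest validates the full schema and much more.

```python
import re

# Hypothetical manifest.json for an invented "acme" integration, as a Python dict.
MANIFEST = {
    "domain": "acme",
    "name": "Acme Hub",
    "codeowners": ["@example-maintainer"],         # placeholder GitHub handle
    "config_flow": True,
    "dependencies": [],
    "documentation": "https://www.home-assistant.io/integrations/acme",  # placeholder
    "integration_type": "hub",
    "iot_class": "local_push",
    "requirements": ["acme-hub-client==1.4.2"],    # invented PyPI package
    "zeroconf": ["_acme._tcp.local."],             # invented service type
    "quality_scale": "bronze",
}

IOT_CLASSES = {
    "local_push", "local_polling", "cloud_push",
    "cloud_polling", "assumed_state", "calculated",
}
# name, optional [extras], "==", then a version with no range or wildcard characters
EXACT_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=<>~!*,\s]+$")


def check_manifest(manifest: dict) -> list[str]:
    """Two toy checks in the spirit of hassfest; the real validator covers far more."""
    errors = []
    if manifest.get("iot_class") not in IOT_CLASSES:
        errors.append(f"unknown iot_class: {manifest.get('iot_class')!r}")
    for requirement in manifest.get("requirements", []):
        if not EXACT_PIN.match(requirement):
            errors.append(f"requirement is not exact-pinned: {requirement!r}")
    return errors


print(check_manifest(MANIFEST))  # []
print(check_manifest({**MANIFEST, "requirements": ["acme-hub-client>=1.4"]}))
```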

Key structural properties of this layer:

1. **Monorepo with per-integration ownership.** All bundled integrations live in core and
   ride core's CI, and each may list **code owners**: community maintainers who are
   auto-requested on PRs and issues. A large fraction of integrations have no code owner or
   an inactive one. Core team members review everything that merges, but they don't *drive*
   per-integration work.
2. **The third-party-library rule (ADR-0011-era policy):** integrations may not implement
   protocol logic inline; device/API communication must live in a published PyPI library.
   Consequence: an integration's ceiling is often set by a library *outside* the
   contributor's direct control — the single biggest effort risk for Platinum uplift
   (report 06 §4).
3. **Exact-pinned requirements** (a single version per package, enforced repo-wide): upstream API drift
   requires a library release *plus* a core bump PR. "Pinned lib is broken against the
   vendor's current API" is a top-five issue theme in report 04.
4. **Custom components** (`custom_components/`, distributed via HACS) form a parallel
   ecosystem outside core quality control. Migration of popular custom components into core
   is a recurring, partially-realized ambition; out of scope here but noted in report 05.
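The repo-wide uniqueness of pins in item 3 turns a version bump into a cross-integration event. The following is a minimal sketch of that invariant. It uses hypothetical manifests and a simplified name normalization (lowercasing only) to find packages pinned at more than one version; core's actual tooling is not reproduced here.

```python
from collections import defaultdict


def find_pin_conflicts(manifests: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Return {package: {version: [domains]}} for packages pinned at more than one version."""
    pins: defaultdict[str, defaultdict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for domain, requirements in manifests.items():
        for requirement in requirements:
            name, _, version = requirement.partition("==")
            # Simplified normalization: lowercase only (PEP 503 also folds "_" and ".").
            pins[name.lower()][version].append(domain)
    return {name: dict(versions) for name, versions in pins.items() if len(versions) > 1}


# Hypothetical manifests: two integrations share one vendor library.
manifests = {
    "acme": ["acme-hub-client==1.4.2"],
    "acme_cloud": ["acme-hub-client==1.3.0"],  # lagging pin: both must move together
    "other": ["somelib==2.0.0"],
}
conflicts = find_pin_conflicts(manifests)
print(conflicts)  # {'acme-hub-client': {'1.4.2': ['acme'], '1.3.0': ['acme_cloud']}}
```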

## 4. Quality machinery already in place

This is the scaffolding any contribution program inherits for free — and it is excellent:

- **`hassfest`**: static manifest/translations/services validation, runs in CI and as a
  pre-commit-style check; also validates `quality_scale.yaml` rule files against the
  declared tier.
- **Test harness**: core ships its own fixtures (`tests/common.py`, `MockConfigEntry`,
  time-travel helpers, snapshot testing via `syrupy`), the same helpers that
  `pytest-homeassistant-custom-component` repackages for out-of-tree code; per-integration
  coverage is tracked and **full coverage of `config_flow.py` is mandatory for new
  integrations**; the `.coveragerc`/coverage roster marks legacy integrations exempted from
  coverage gates, a literal machine-readable list of the debt.
- **Strict typing roster** (`.strict-typing`): integrations opt in to full mypy strict
  mode module by module (usually one whole integration per entry); Platinum requires it.
- **Repairs platform**: integrations raise actionable, user-visible issues ("your token
  expired, click to fix") instead of log spam.
- **Diagnostics platform**: one click exports redacted integration state for bug reports —
  a Gold-tier rule, and its absence measurably degrades issue-triage quality in the data
  of report 04.
- **Release train**: monthly minor releases (`2025.x`), beta week, patch releases; deprecation
  policy of (typically) six months for breaking changes, announced in release notes.

## 5. Architectural consequences for this report

Five conclusions that the rest of the report builds on:

1. **The kernel is not the opportunity.** It is actively stewarded by the paid team,
   architecturally conservative (ADR-gated), and high-friction for outsiders. Marginal
   outsider hours there compete with the people best positioned in the world to do that work.
2. **The integration layer is the opportunity.** It is huge, popularity-concentrated
   (report 04: install base is extremely top-heavy), quality-variable, and the project has
   *just finished building* a rules-and-tooling framework (the Quality Scale) whose explicit
   purpose is to let contributors raise integration quality in a reviewable, checklisted way.
3. **Effort is predictable for Bronze→Silver, unpredictable for the climb to Platinum.** Silver-tier
   rules (reauth flow, unavailability handling, coverage, parallel updates) are local to the
   integration. Platinum rules (fully async dependency, strict typing through the
   dependency's type surface) depend on third-party libraries — sometimes unmaintained ones —
   making "polish the top 100 to Platinum" an unbounded-cost program (scored in report 06).
4. **Bug themes map onto tier rules.** The dominant open-issue themes are the precise
   failure modes Silver/Gold rules exist to prevent, so uplift work *subsumes* the
   highest-value bug fixing rather than competing with it.
5. **Feature-gap work pays a structural toll.** Anything user-facing and cross-cutting
   needs an architecture discussion, likely frontend work in a separate repo/skill-set, and
   scarce core-team review bandwidth — all friction multipliers an integration-layer
   program avoids.

These are assessed quantitatively in reports 03–06.