# 3. Log Structure, Validation, Ordering, and Merge Semantics ## 3.1 The log A **log** is the set of all operations whose `log` member names a given identity. It is: - **Append-only:** operations are immutable and never removed. All change is expressed as new operations (corrections, refutations, revocations). - **Multi-writer:** each authorized author key writes its own hash chain (`seq`/`prev`) inside the log. - **A Merkle DAG:** `prev` and `deps` references are hashes, so the whole structure is tamper-evident, and any subset of heads commits to its full causal history. Definitions (using notation from `01-…` §1.7): - **Chain(log, author):** the sequence of that author's operations ordered by `seq`. - **Heads:** operations that no operation in the local store references via `prev` or `deps`. The head set is the log's synchronization frontier. - **Closed set:** a set of operations containing all `prev`/`deps` ancestors of each of its members. Derived state (§6–8) is only defined over closed sets. ## 3.2 Validation pipeline A verifier processes a received operation in **stages, in this order**, and reports the error code of the **first failing stage**. (The conformance vectors test this ordering.) | Stage | Check | Failure code | |---|---|---| | 1. Encoding | Bytes are valid UTF-8 and valid JSON; nesting depth ≤ 16. | `ERR_ENCODING` | | 2. Canonical | Bytes are exactly OMP-CJ canonical form (`01-…` §1.1): sorted members, no whitespace, minimal escapes, canonical integers. | `ERR_CANONICAL` | | 3. Version | `v` is present, a string, and equals `"omp/0.2"`. | `ERR_VERSION` | | 4. Schema | Envelope and body conform to `02-operations.md` and the JSON Schemas: required members present, no unknown members, types/grammars/enum values correct. | `ERR_SCHEMA` | | 5. Limits | Size and cardinality limits (`01-…` §1.6). | `ERR_LIMIT` | | 6. Range | Numeric domain checks not already structural: `confidence_ppm` 0…1000000, `max_depth` 0…8, `seq`/`lc` ≥ 1. | `ERR_RANGE` | | 7. Signature | Ed25519 verification of `sig` over the preimage against `author`. | `ERR_SIG` | | 8. Chain | `seq`=1 ⇔ `prev`=null; if `prev` non-null and known: `prev` is by the same `author` in the same `log` with `seq` exactly one less. Duplicate `(log, author, seq)` with a *different* op_id is **equivocation** (§4). | `ERR_CHAIN` | | 9. Clock | `lc` strictly greater than `lc` of `prev` (if known) and of every known `deps` member. | `ERR_CLOCK` | | 10. References | Every `deps`/body reference that is known locally has the required `type` and the same `log`; intra-body rules (e.g. `inline_b64` matches `content_hash`; `subject` = `log`; `target` ≠ `replacement`). | `ERR_REF` | | 11. Authorization | The author had the required capability at this operation's causal position (`04-…` §4). | `ERR_AUTHZ` | **Deferral, not rejection:** if a check in stages 8–11 needs a referenced operation the verifier does not have, the operation is parked with status `DEFER_MISSING_DEP` and re-validated when the missing ancestor arrives. Deferred operations MUST NOT contribute to derived state. A verifier MAY bound its deferral buffer and drop deferred operations under resource pressure (they can be re-synced); it MUST NOT mark them invalid. Stages 1–7 are **context-free**: their verdict for given bytes never changes. A verifier MAY cache them. Stages 8–11 are evaluated against the closed set and are **monotone**: once an operation is accepted into a closed set, adding more operations never revokes its acceptance (later revocations change *derived state*, not validity — see `04-…` §4.6). ## 3.3 Non-canonical bytes are invalid Operations are exchanged as canonical bytes (`01-…` §1.1). A replicator MUST NOT "helpfully" re-canonicalize and re-hash a non-canonical operation: the signature and ID are over canonical bytes, so a non-canonical encoding either breaks the signature or indicates a non-conforming producer. Reject with `ERR_CANONICAL`. ## 3.4 Equivocation If two distinct valid operations share `(log, author, seq)`, the author's device has forked its chain (malice, restored backup, or cloned key). Normative handling: 1. Both operations and both branches **remain in the log** (append-only; evidence of the fork must itself be auditable and syncable). 2. The replicator MUST surface an **equivocation event** (`WARN_EQUIVOCATION`) naming the author key and seq; the architecture expects the UI to demand a key rotation (revoke the author's grant, authorize a fresh key). 3. Derived state remains deterministic: both branches' operations participate in the total order of §5 like any other operations. No branch is discarded. 4. Operations on either branch still satisfy stage-8 validation (each has a well-formed `prev` link); equivocation is a property of the *pair*, detected at merge time. ## 3.5 Total order Merge semantics that need a "last writer" use the following total order over a closed set. For operations `A`, `B`: **A < B** iff 1. `A.lc < B.lc`, or 2. `A.lc == B.lc` and `op_id(A) < op_id(B)` (byte-wise comparison of the lowercase-hex ID strings). Because `lc` strictly increases along every `prev`/`deps` edge, this order is a **linear extension of causality**: if `A ≺ B` then `A < B`. It is also independent of arrival order and identical on every node — the keystone of convergence. **Convergence theorem (informative):** derived state (§6–8) is defined as a fold over the closed set in total order, where each step is a pure function. Hence two replicators holding the same closed set compute identical state; and merging is set union followed by recomputation (or incremental equivalents). ## 3.6 Claim state machine Every accepted `claim-assert` has a **status** in: ``` active | superseded | refuted | tainted ``` computed over the closed set as follows (definitions are order-independent except where the total order is explicitly invoked): 1. **superseded:** some accepted `correction` targets the claim. - Its **current replacement** is the `replacement` of the *greatest* such correction in total order (or none, if that correction has no replacement). Concurrent corrections thus resolve deterministically: latest-in-total-order wins; the others remain in the log as history. 2. **refuted:** some accepted `refutation` targets the claim. Refuted takes precedence over superseded if both apply. 3. **tainted:** the claim is not superseded/refuted itself, but its derivation inputs are bad. Define the **bad set** as the smallest set containing: - every refuted or superseded claim, - every refuted evidence operation, - every claim having a `derived_from` element whose target is in the bad set, - every `inference-call` having an `inputs` element in the bad set, - every claim with a `derived_from` element of kind `inference` whose target inference-call is in the bad set. A claim in the bad set only by propagation (not directly superseded/refuted) has status **tainted**. 4. **active:** otherwise. Consequences (normative): - Taint **cascades transitively and automatically**: correcting one claim invalidates the whole downstream derivation subgraph with no further operations required. - Taint is **recoverable**: the derivation engine re-derives by issuing fresh `claim-assert` operations with clean `derived_from` sets; new claims have independent status. (How re-derivation decides what to recompute is Milestone 4; the *invalidations* are fully determined here.) - Only **active** claims are eligible for read-scope projection to grantees and SHOULD be the only claims fed to assistants by default; superseded claims MAY be shown as history with their replacement chain. - Evidence has a two-valued status: `active` or `refuted`. ## 3.7 Derivation graph The derivation graph is the directed graph whose nodes are operations and whose edges are `derived_from[].op` (claims) and `inputs[]` (inference-calls). Normative properties: 1. The graph MUST be acyclic. Cycles are impossible to *create* honestly (edges point to already-hashed operations) and impossible cryptographically without a hash collision; a verifier MAY assume acyclicity. 2. Edges never dangle in a closed set: an edge to an unknown operation defers the source operation (§3.2). 3. The "why" of any claim is the transitive closure of its derivation edges, ending at evidence operations, user statements (`derived_from` empty, `method:"user-statement"`), and inference-calls (which carry model identity and request hashes). ## 3.8 Grant state machine Every accepted `permission-grant` has status in `active | revoked | expired`: - **revoked:** some accepted `revocation` targets it (permanent). - **expired:** not revoked, but `expires_at` is set and the evaluation-time policy of `04-…` §4.4 deems it past. Expiry affects *use* of the grant, never the validity of already-accepted operations. - **active:** otherwise. Authorization checking against grants — including the causal rule that makes revocation mechanically effective across nodes — is specified in `04-keys-and-capabilities.md` §4. ## 3.9 Merging two logs Given local closed set `S` and remote operations `R` (e.g. from a sync session): 1. Validate each `r ∈ R` through the pipeline (§3.2) against `S ∪ R` (iterating deferrals until a fixed point — each pass either accepts/rejects an operation or leaves the deferred set unchanged, so this terminates). 2. The new closed set is `S ∪ accepted(R)`. Rejected operations MUST NOT be stored as log members; they MAY be quarantined for diagnostics. 3. Recompute (or incrementally update) derived state per §3.5–3.8. 4. Report new heads, equivocation events, and claims whose status changed (the UI layer shows the user "what changed and why"). Because acceptance is monotone and state is a pure function of the closed set, merge is **commutative, associative, and idempotent** over operation sets — logs may sync in any order, any number of times, through any topology of user-owned nodes. ## 3.10 What is *not* ordered OMP deliberately assigns no semantics to: - The relative wall-clock times of operations (advisory only). - The relative order of operations from different authors **except** where the total order is explicitly invoked (current-replacement selection) or causality is explicitly required (authorization, §4 of `04-…`). - Arrival order at any node.