# 1. Encoding, Identifiers, Hashing, and Signatures

## 1.1 Canonical JSON profile (OMP-CJ)

Every operation is serialized as **canonical JSON** and exchanged **as those exact bytes**. There is no "re-canonicalize on receipt": a verifier MUST check that the received bytes are already in canonical form and reject with `ERR_CANONICAL` if not (`03-log-and-merge.md` §3).

OMP-CJ is a strict subset of JSON (RFC 8259). For the value domain OMP permits, OMP-CJ output is byte-identical to RFC 8785 (JCS); implementers MAY use a JCS library, but the rules below are complete on their own.

### 1.1.1 General

1. Encoding MUST be UTF-8 without BOM.
2. There MUST be no whitespace outside of string literals.
3. The top level of an operation MUST be a JSON object.

### 1.1.2 Objects

1. Member names MUST be unique within an object.
2. Members MUST be ordered by lexicographic comparison of their names as sequences of Unicode code points (for the ASCII names this protocol permits, this equals byte order of the UTF-8 encoding, and equals RFC 8785 ordering).
3. Member names defined by this protocol match `^[a-z0-9_]+$`. Extension member names MUST match `^x_[a-z0-9_]+$` (`05-versioning-errors-extensibility.md` §3). Any other member name is `ERR_SCHEMA`.

### 1.1.3 Strings

Strings are emitted with the **minimal escaping** rules of RFC 8785:

1. `"` is escaped as `\"`; `\` is escaped as `\\`.
2. Control characters U+0000–U+001F use the short forms `\b` (U+0008), `\t` (U+0009), `\n` (U+000A), `\f` (U+000C), `\r` (U+000D); all other control characters use `\u00XX` with **lowercase** hexadecimal digits.
3. All other characters are emitted literally as UTF-8. No other escapes are permitted (`\/`, `\uXXXX` for printable characters, uppercase hex, and surrogate escapes are all non-canonical).
4. Strings MUST be valid Unicode: unpaired surrogates MUST NOT appear.

### 1.1.4 Numbers

1. The only permitted numbers are **integers** in the inclusive range **−9007199254740991 … 9007199254740991** (±(2⁵³−1)).
2. Canonical form: optional leading `-` (never for zero), then decimal digits with no leading zeros. `-0`, `1.0`, `1e3`, `01` are all non-canonical (`ERR_CANONICAL`).
3. Fractional or exponent-form numbers anywhere in an operation are invalid. Quantities that are conceptually fractional are scaled integers (e.g. `confidence_ppm`, parts-per-million, 0…1000000).

### 1.1.5 Literals

`true`, `false`, `null` in lowercase only.

## 1.2 Identifiers

All identifiers are printable ASCII strings with an algorithm prefix, so future suites can be added non-breakingly.

| Identifier | Grammar | Meaning |
|---|---|---|
| Operation ID | `sha256:` + 64 lowercase hex | Hash of the operation preimage (§1.4). |
| Content hash | `sha256:` + 64 lowercase hex | Hash of raw evidence bytes or opaque payloads. |
| Identity ID | `omp:id:ed25519:` + 64 lowercase hex | A human identity; the hex is the 32-byte Ed25519 **root public key** (RFC 8032 encoding). |
| Key ID | `omp:key:ed25519:` + 64 lowercase hex | An operational (device/agent) Ed25519 public key. |
| Signature | `ed25519:` + 128 lowercase hex | A 64-byte Ed25519 signature. |

Rules:

1. Hex MUST be lowercase. Uppercase or mixed case is `ERR_SCHEMA`.
2. An identity's root key, acting as an author, is written in key form (`omp:key:ed25519:`). The mapping between the two forms is the identity relation used throughout (`04-keys-and-capabilities.md` §1).
3. Verifiers MUST reject unknown algorithm prefixes in fields where this version requires `sha256`/`ed25519` (`ERR_SCHEMA`); they are reserved for future versions.

## 1.3 Timestamps

Wall-clock timestamps are RFC 3339 strings restricted to:

```
YYYY-MM-DDTHH:MM:SSZ
YYYY-MM-DDTHH:MM:SS.mmmZ
```

i.e. UTC only (`Z`, uppercase), `T` uppercase, optional exactly-3-digit milliseconds. Regex: `^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{3})?Z$`, and the value MUST be a real calendar instant (no month 13, leap seconds not permitted: seconds 00–59).

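
The two-stage check described above (syntactic regex, then calendar validity) might be sketched as follows. This is an illustrative sketch only, not a reference implementation: the names `is_valid_timestamp` and `_days_in_month` are hypothetical, and accepting year `0000` as a proleptic Gregorian year is an assumption the text above does not settle.

```python
import re

# Regex copied verbatim from §1.3.
TS_RE = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{3})?Z$"
)


def _days_in_month(year: int, month: int) -> int:
    """Days in a month under the (proleptic) Gregorian calendar."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_timestamp(ts: str) -> bool:
    """Return True if ts has the restricted form and names a real instant."""
    # fullmatch() avoids the case where `$` tolerates a trailing newline.
    if TS_RE.fullmatch(ts) is None:
        return False
    year, month, day = int(ts[0:4]), int(ts[5:7]), int(ts[8:10])
    hour, minute, second = int(ts[11:13]), int(ts[14:16]), int(ts[17:19])
    if not 1 <= month <= 12:
        return False
    if not 1 <= day <= _days_in_month(year, month):
        return False
    # Seconds are limited to 00-59: leap seconds are not permitted.
    return hour <= 23 and minute <= 59 and second <= 59
```

The check is purely about form: it deliberately does not compare the value against the local clock.
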

Wall-clock timestamps are **advisory** (for humans and heuristics). Ordering and merge semantics never depend on them; they depend on the Lamport clock and operation IDs (`03-log-and-merge.md` §5). A verifier MUST NOT reject an operation because its timestamp is "in the future" or "too old".

## 1.4 Operation preimage, ID, and signature

Let `E` be the operation object **without its `sig` member**, and `canon(E)` its OMP-CJ serialization. Define the **domain-separation prefix**:

```
DS = utf8("omp/0.2:op\n")   # 11 bytes: 6f 6d 70 2f 30 2e 32 3a 6f 70 0a
```

and the **preimage**:

```
P = DS || canon(E)
```

Then:

- **Operation ID:** `op_id = "sha256:" + hex(sha256(P))`
- **Signature:** `sig = "ed25519:" + hex(Ed25519-Sign(author_private_key, P))`

The wire form of the operation is `canon(E')` where `E'` is `E` plus the `sig` member. Because the position of `sig` among the other members is fixed by the ordering rule (§1.1.2), the wire bytes are fully determined.

Verifier obligations:

1. Recompute `canon(E)` by *removing* the `sig` member from the received (already canonical) bytes. Implementations typically splice out the `"sig":"…"` member textually or re-serialize the parsed object; both MUST yield identical bytes if the input was canonical.
2. Verify the signature over `P` against the `author` public key (`ERR_SIG` on failure).
3. Compute `op_id` from `P`. The ID is *derived*, never transmitted as a field of the operation itself; other operations reference it.

The domain prefix ensures bytes signed as an OMP operation can never collide with bytes signed in another OMP context (future contexts will use distinct prefixes, e.g. sync session transcripts).

## 1.5 Binary payloads

The only inline binary field in this version is `inline_b64` on `evidence-ingest`. It uses **base64url without padding** (RFC 4648 §5, no `=`), alphabet `A–Z a–z 0–9 - _`. Decoded size limits are in §1.6. Padding characters or the standard alphabet (`+`, `/`) are `ERR_SCHEMA`.

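
As an illustration of the `inline_b64` rules, a decoder could look roughly like the sketch below. The function name `decode_inline_b64` and the exception classes `SchemaError` and `LimitError` are hypothetical stand-ins for however an implementation reports `ERR_SCHEMA` and `ERR_LIMIT`; the 4096-byte bound is the `inline_b64` limit from §1.6. Treating a length of 1 modulo 4 as a schema error is an assumption, made because no byte sequence encodes to such a string.

```python
import base64
import re

# base64url alphabet only (RFC 4648 §5); '=' padding and '+', '/' never match.
B64URL_RE = re.compile(r"[A-Za-z0-9_-]*")
MAX_INLINE_BYTES = 4096  # decoded-size limit for inline_b64 (§1.6)


class SchemaError(ValueError):
    """Hypothetical carrier for ERR_SCHEMA."""


class LimitError(ValueError):
    """Hypothetical carrier for ERR_LIMIT."""


def decode_inline_b64(value: str) -> bytes:
    """Decode an unpadded base64url string, enforcing alphabet and size."""
    if B64URL_RE.fullmatch(value) is None:
        raise SchemaError("ERR_SCHEMA: padding or non-base64url character")
    if len(value) % 4 == 1:
        # Assumption: an impossible encoded length is reported as ERR_SCHEMA.
        raise SchemaError("ERR_SCHEMA: impossible base64url length")
    # The standard-library decoder expects padding, so restore it locally.
    padded = value + "=" * (-len(value) % 4)
    data = base64.urlsafe_b64decode(padded)
    if len(data) > MAX_INLINE_BYTES:
        raise LimitError("ERR_LIMIT: inline_b64 exceeds 4096 decoded bytes")
    return data
```

For example, `decode_inline_b64("aGVsbG8")` yields `b"hello"`, while the padded form `"aGVsbG8="` is rejected.
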
## 1.6 Size and cardinality limits

A verifier MUST enforce these limits (`ERR_LIMIT` unless another code is named):

| Limit | Value |
|---|---|
| Canonical bytes of one operation (`canon(E')`) | ≤ 65536 |
| `deps` array length | ≤ 32 |
| `derived_from` array length | ≤ 64 |
| `inputs` array length (inference-call) | ≤ 256 |
| `caps` array length | ≤ 4 (set of distinct known caps) |
| `scope.predicates` length | ≤ 64 |
| `inline_b64` decoded length | ≤ 4096 bytes |
| `predicate` string length | ≤ 128 |
| `summary`, `purpose` length | ≤ 512 |
| `reason`, `note` length | ≤ 2048 |
| `object` member of claim-assert, canonical bytes | ≤ 8192 |
| Nesting depth of any operation (top level = 1) | ≤ 16 |
| `seq`, `lc` | ≥ 1 and ≤ 2⁵³−1 |

String lengths are counted in Unicode code points.

## 1.7 Notation used in the rest of this spec

- `ancestors(O)`: the set of operations reachable from operation `O` by following `prev` and `deps` references transitively (not including `O` itself).
- `O₁ ≺ O₂`: `O₁ ∈ ancestors(O₂)` ("causally precedes").
- `O₁ ∥ O₂`: neither precedes the other ("concurrent").
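
To make the three relations concrete, here is a small sketch over an in-memory graph. The representation is an assumption made for illustration: `refs` is a hypothetical mapping from an operation ID to the IDs it references through `prev` and `deps`, and the function names `precedes` and `concurrent` are not defined by this spec.

```python
from typing import Dict, Iterable, Set

# Hypothetical in-memory view of a log:
# operation ID -> IDs it references via `prev` and `deps`.
Refs = Dict[str, Iterable[str]]


def ancestors(op_id: str, refs: Refs) -> Set[str]:
    """All operations reachable from op_id, excluding op_id itself."""
    seen: Set[str] = set()
    stack = list(refs.get(op_id, ()))
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(refs.get(cur, ()))
    # Hash-derived IDs make cycles infeasible; this is only a guard.
    seen.discard(op_id)
    return seen


def precedes(a: str, b: str, refs: Refs) -> bool:
    """a causally precedes b: a is in ancestors(b)."""
    return a in ancestors(b, refs)


def concurrent(a: str, b: str, refs: Refs) -> bool:
    """Neither operation precedes the other."""
    return not precedes(a, b, refs) and not precedes(b, a, refs)
```

With `refs = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}`, operation `a` precedes `d`, and `b` and `c` are concurrent because neither is reachable from the other.
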