# FPM 0.2 — Worked Byte-Level Examples Status: Normative examples. Where this document and sections 01–05 disagree, sections 01–05 win; where this document and the generated test vectors disagree, the **vectors** win (they are produced mechanically by `tools/conformance`). This document walks through canonicalization, signing, hashing, and chaining at the byte level. Every value that requires cryptography (public keys, signatures, digests) is produced deterministically by the conformance generator from the fixed seeds in §3 and recorded in `spec/02-wire-format/vectors/`. Values written here as `{PLACEHOLDER}` are substitution points; the generator emits the fully substituted byte strings, and an implementation MUST reproduce them exactly. --- ## 1. Notation - `hex(...)` — lowercase hexadecimal. - `b64u(...)` — base64url per RFC 4648 §5, **no padding**. - `sha256(...)` — SHA-256 over raw bytes. - `canon(x)` — canonical serialization of JSON value `x` per §2. - Byte strings are shown either as JSON-looking text (which is exactly the UTF-8 bytes, no trailing newline) or as space-separated hex. --- ## 2. Canonicalization, byte by byte FPM canonical serialization is RFC 8785 (JCS) restricted to the **canonical JSON subset** of section 01: the only numbers are exact integers in `[-(2^53 - 1), 2^53 - 1]`. This restriction removes JCS's hardest part (ECMAScript float formatting) so a complete serializer is ~60 lines in any language. Rules: 1. **UTF-8**, no BOM, no insignificant whitespace anywhere. 2. **Objects**: members sorted by property name, compared as sequences of UTF-16 code units (for ASCII names this equals bytewise order). Duplicate names are **invalid input** — a verifier MUST reject, not last-wins. 3. **Arrays**: element order preserved. 4. **Strings**: shortest-form escaping — - `"` → `\"` and `\` → `\\` - control characters U+0000–U+001F: `\b` `\t` `\n` `\f` `\r` for 0x08 0x09 0x0A 0x0C 0x0D; all others as `\u00XX` with **lowercase** hex - every other character is emitted literally as its UTF-8 bytes (no `\uXXXX` escapes for printable or non-ASCII characters) 5. **Integers**: minimal decimal, optional leading `-`, no `+`, no leading zeros, never `-0` (serialize as `0`). The lexical forms `1.0`, `1e2`, `NaN`, `Infinity` are invalid on the wire. 6. **Literals**: `true`, `false`, `null` exactly. ### 2.1 A fully concrete corner case Input value (member order as received, deliberately unsorted): ```json { "b": 1, "a": 2, "\n": 3, "é": 4 } ``` Sort keys by UTF-16 code units: `"\n"` (U+000A) < `"a"` (U+0061) < `"b"` (U+0062) < `"é"` (U+00E9). Canonical text: `{"\n":3,"a":2,"b":1,"é":4}` Canonical bytes (27 bytes): ``` 7b 22 5c 6e 22 3a 33 2c 22 61 22 3a 32 2c 22 62 22 3a 31 2c 22 c3 a9 22 3a 34 7d ``` Note `\n` is escaped (2 bytes `5c 6e` inside the quotes) while `é` is raw UTF-8 (`c3 a9`), per rule 4. This object appears as test vector `valid/canon-corner-case` with its digest, so you can check your serializer without any cryptography. ### 2.2 Sorting trap `"sig"` sorts **between** `"seq"` and `"ts"` (`e` < `i` < `t`s first letter), so adding the signature to an envelope changes the byte positions of nothing before `"seq"` and shifts everything after it. Never sign-then-splice by string concatenation; always re-serialize. --- ## 3. Deterministic test identities All vectors use Ed25519 keys whose 32-byte private seeds are the **ASCII bytes** of these exactly-32-character strings (so the seeds themselves need no tooling to reproduce): | Identity | Role | Seed (ASCII, 32 bytes) | |---|---|---| | `alice-device` | Alice's phone (primary log) | `fpm-test-seed-alice-device-00001` | | `alice-laptop` | Alice's second node | `fpm-test-seed-alice-laptop-00002` | | `carol-clinic` | Delegated third-party node | `fpm-test-seed-carol-clinic-00003` | Seed hex for `alice-device` (sanity check for your decoder): ``` 66 70 6d 2d 74 65 73 74 2d 73 65 65 64 2d 61 6c 69 63 65 2d 64 65 76 69 63 65 2d 30 30 30 30 31 ``` The corresponding public keys and key ids are emitted by the generator into `vectors/keys.json` as: ```json { "alice-device": { "seed_ascii": "...", "public_key_b64u": "...", "key_id": "ed25519:..." }, ... } ``` Throughout this document, `{A}` = `alice-device`'s 43-character public-key b64u, `{L}` = `alice-laptop`'s, `{C}` = `carol-clinic`'s. All vector timestamps are fixed constants in June 2025 (listed per vector); nothing in the suite depends on the clock. --- ## 4. Worked example A — genesis `evidence-ingest` Alice's phone ingests a one-line plaintext note, `Buy oat milk\n` (13 bytes: `42 75 79 20 6f 61 74 20 6d 69 6c 6b 0a`). **Step 1 — content addressing.** `content_size` = 13. `content_inline` = `b64u(bytes)` = `QnV5IG9hdCBtaWxrCg` (18 chars, no padding). `content_hash` = `sha256:` + `hex(sha256(bytes))` — recorded in `vectors/valid/evidence-ingest-genesis.json` as `derived.content_hash`. **Step 2 — pre-signature envelope.** Build the envelope with every field except `sig`. This is the genesis op of `alice-device`'s log, so `seq` is `0` and `prev` is `null`. **Step 3 — signing input.** `canon(envelope_without_sig)` produces exactly this byte string (one line, no trailing newline; `{A}` and `{H}` substituted): ``` {"author":"ed25519:{A}","body":{"captured_at":"2025-06-01T11:59:30.000Z","content_hash":"sha256:{H}","content_inline":"QnV5IG9hdCBtaWxrCg","content_size":13,"labels":["notes"],"media_type":"text/plain","source":{"adapter":"notes.plaintext","origin":"file:///home/alice/notes/groceries.txt"}},"prev":null,"protocol":"fpm/0.2","seq":0,"ts":"2025-06-01T12:00:00.000Z","type":"evidence-ingest"} ``` Observe the canonical member order at each level — envelope: `author, body, prev, protocol, seq, ts, type`; body: `captured_at, content_hash, content_inline, content_size, labels, media_type, source`; source: `adapter, origin`. **Step 4 — sign.** `sig = b64u(Ed25519-sign(alice_device_seed, signing_input))` — 86 characters. Ed25519 is deterministic, so every conforming implementation produces the identical signature. **Step 5 — final envelope.** Insert `sig`; canonical order becomes `author, body, prev, protocol, seq, sig, ts, type`. **Step 6 — op_id.** `op_id = "sha256:" + hex(sha256(canon(signed_envelope)))`. Note the hash covers the envelope **with** the signature, so the hash chain binds signatures, not just content. The vector file records: `input.json` (the signed envelope), `input.canonical_b64u` (exact canonical bytes), `derived.signing_input_sha256`, `derived.sig`, `derived.op_id`. Your implementation must match all of them byte-for-byte. --- ## 5. Worked example B — chained `claim-assert` with basis and method At `seq` 1, the phone derives a claim from the note (and, in the richer vectors, from calendar evidence). Signing input template: ``` {"author":"ed25519:{A}","body":{"basis":["{OPID_EVIDENCE}"],"confidence_bp":7000,"method":{"kind":"rule","name":"shopping_list_extractor","version":"1.0.0"},"object":{"item":"oat milk"},"predicate":"diet.shopping_item","subject":"self"},"prev":"{OPID_EVIDENCE}","protocol":"fpm/0.2","seq":1,"ts":"2025-06-01T12:00:01.000Z","type":"claim-assert"} ``` Three things to verify here beyond the signature: 1. **Chain**: `prev` equals the op_id computed in example A, and `seq` is exactly `prev.seq + 1`. 2. **Basis resolution**: every entry in `basis` resolves to an `evidence-ingest` or `claim-assert` already in the merged set. (Here `basis[0]` happens to equal `prev`; in general they are unrelated.) 3. **Subset lexing**: `confidence_bp` is the integer token `7000` — a verifier that received `7000.0` or `7e3` MUST reject before schema validation (error `ERR_NOT_CANONICAL`). Vector: `valid/claim-assert-chained`. --- ## 6. Worked example C — `correction` and the staleness cascade At `seq` 2 Alice corrects the claim (she wants soy, not oat). Body template: ``` {"object":{"item":"soy milk"},"reason":"I switched brands","target":"{OPID_CLAIM}"} ``` Effect (spec section 03, restated): the target claim's served value becomes `{"item":"soy milk"}` at confidence 10000; any claim whose `basis` includes `{OPID_CLAIM}` is **STALE** and MUST NOT be served until a fresh `claim-assert` re-derives it. The vector suite includes a three-op derivation chain plus correction (`valid/correction-cascade`) and the verifier's `liveness` report must mark the downstream claim `stale`. `refutation` (vector `valid/refutation-kills-claim`) is the same shape with no replacement object; the target becomes **DEAD** and is additionally barred from future `basis` lists (a later claim citing a refuted op is rejected: `invalid/basis-cites-refuted`). --- ## 7. Worked example D — grant, delegation, revocation `valid/grant-clinic` (authored by `alice-device`, grantee `carol-clinic`): ``` {"delegable":false,"grantee":"ed25519:{C}","note":"share sleep schedule with clinic","scope":{"include_provenance":false,"min_confidence_bp":5000,"predicates":["sleep.*"],"subjects":["self"]}} ``` Serve-time evaluation (no new operations needed): a claim is servable to `{C}` iff its predicate matches `sleep.*`, subject ∈ `{self}`, confidence ≥ 5000, the claim is LIVE, and no `revocation` targeting this grant (or any ancestor, for delegated grants) has been merged. Vector `valid/revocation-of-grant` follows with a revocation at the next `seq`; the verifier's capability report must show the grant `revoked` and an empty servable set. The delegation vectors (`valid/grant-delegated`, `invalid/delegation-escalates-scope`) exercise the intersection rule: a child grant whose scope is not a subset of its parent's effective scope is rejected with `ERR_CAP_ESCALATION`. --- ## 8. Verifying a vector, end to end Given a vector file, a conforming verifier performs, in order: 1. **Byte check**: decode `input.canonical_b64u`; parse with a canonical-subset lexer (rejecting floats, duplicate keys, non-shortest escapes); re-serialize; bytes must round-trip identically (`ERR_NOT_CANONICAL`). 2. **Schema**: validate against `envelope.schema.json` (which dispatches the body) (`ERR_SCHEMA`). 3. **Version gate**: protocol `fpm/0.2` → continue; well-formed but unsupported version → result `defer` (store and forward, do not interpret); malformed → already rejected at step 2. 4. **Signature**: rebuild the signing input by deleting `sig` and re-canonicalizing; verify Ed25519 under the `author` key (`ERR_BAD_SIG`). 5. **op_id**: recompute and compare to `derived.op_id` where present (`ERR_BAD_OPID` in vector context). 6. **Chain**: per-log `seq`/`prev` continuity, genesis rule, no forks (two distinct ops with the same author+seq ⇒ `ERR_LOG_FORK`, see section 03 for the equivocation-handling rule). 7. **Referential & semantic checks**: basis/target/parent resolution, type constraints of targets, inline-content consistency, capability subset rules. 8. Compare the verifier's outcome to the vector's `expect` block. ``` $ fpmconf verify spec/02-wire-format/vectors/valid/claim-assert-chained.json PASS valid/claim-assert-chained (accept) ``` --- ## 9. The five mistakes every new implementation makes 1. Hashing the **pre-signature** bytes for op_id (it's the signed envelope). 2. Emitting `\u00e9`-style escapes for non-ASCII (canonical form is raw UTF-8). 3. Accepting `7000.0` because the JSON library silently coerces it (lex before you parse). 4. Treating `ts` as ordering authority during merge (it is informational only). 5. Forgetting that revoking a parent grant kills every delegated grant under it, even ones the revoker never saw.