# Inspecting and Auditing the Operation Log

The operation log is the node's single source of truth. This document explains the anatomy of an operation as stored and exported, how signing and hash chaining work, how to inspect the log from the CLI and from Python, and how to audit a piece of evidence all the way back to raw source bytes.

It assumes the milestone-2 wire-format specification; this is the operational companion to that spec.

## 1. Anatomy of an operation

Every entry in the log is one **operation**: an immutable, signed, canonical JSON document. Conceptually it has four parts:

```
┌─ header ─────────────────────────────────────────────┐
│ wire-format version                                  │
│ type            e.g. "evidence"                      │
│ author          node public key (the signer)         │
│ seq             0,1,2,… per author, no gaps          │
│ prev            hash of this author's previous op    │
│                 (null for seq 0)                     │
│ timestamp       RFC 3339 UTC, time of creation       │
├─ payload ────────────────────────────────────────────┤
│ type-specific body; for evidence: kind, external_id, │
│ and the parsed source content                        │
├─ provenance ─────────────────────────────────────────┤
│ adapter, adapter_version, source locator,            │
│ source_content_hash, imported_at                     │
├─ signature ──────────────────────────────────────────┤
│ Ed25519 over the canonical bytes of everything above │
└──────────────────────────────────────────────────────┘
```

The **operation ID** is the SHA-256 hash of the operation's canonical bytes (including the signature). IDs are therefore content addresses: two parties holding the same ID hold byte-identical operations.

**Canonical bytes** are produced by `src/pmp/canonical.py` per the milestone-2 spec: UTF-8 JSON with sorted keys, no insignificant whitespace, and strict value conventions (which is why adapter bodies must be canonical-JSON-safe; see `adapter-authoring.md` §2.2). Canonicalization is what makes hashing and signing well-defined across implementations.

## 2. The three guarantees and what breaks them

`pmp verify` (and `pmp.oplog`'s verification API) checks three independent properties over the whole log:

1. **Integrity**: the recomputed canonical hash of each stored operation equals its stored operation ID.
   *Breaks if:* any byte of a stored operation was altered.
2. **Authenticity**: the Ed25519 signature verifies against the author's public key.
   *Breaks if:* an operation was forged or re-signed by a key other than the claimed author.
3. **Chain continuity**: per author, `seq` runs 0,1,2,… without gaps and each operation's `prev` equals the hash of the author's previous operation.
   *Breaks if:* operations were deleted, reordered, or inserted retroactively.

Together these make the log **append-only in the cryptographic sense**: a verifier holding the author's public key can detect any tampering with history. Note what is *not* guaranteed: a node can always be destroyed wholesale, and a key-holder can always append new (signed) operations. Those cases are addressed by the threat model and by later milestones (sync replicates history across devices; corrections supersede rather than erase).

The tamper tests in `tests/test_oplog.py` demonstrate each failure mode concretely: they corrupt a stored payload byte, swap a signature, and remove a middle operation, and assert that verification names the offender.

## 3. Inspecting from the CLI

```bash
pmp log     --node-dir ./avery-node          # one line per op, append order
pmp show    <op-id> --node-dir ./avery-node  # full pretty-printed op
pmp verify  --node-dir ./avery-node          # full integrity/authenticity/chain check
pmp export  --node-dir ./avery-node          # canonical JSONL to stdout
pmp info    --node-dir ./avery-node          # node id, op count, versions
```

Typical audit session:

```bash
$ pmp log --node-dir ./avery-node | head
0  9f3ac1…  evidence  calendar.ics  calendar.event  "Physio appointment"
1  41be77…  evidence  calendar.ics  calendar.event  "Standup (weekly)"
…
$ pmp show 9f3ac1 --node-dir ./avery-node
{ … full operation: header, payload, provenance, signature … }
$ pmp verify --node-dir ./avery-node
OK: <N> operations, 1 author, chain intact, all signatures valid
```

Because `pmp export` emits canonical JSON Lines, the standard Unix toolbox works on it. With [`jq`](https://jqlang.github.io/jq/) (optional, not a dependency):

```bash
# All evidence kinds and their counts
pmp export --node-dir ./avery-node | jq -r .payload.kind | sort | uniq -c

# Every operation derived from one source file
pmp export --node-dir ./avery-node \
  | jq -c 'select(.provenance.source | contains("avery-personal.ics"))'

# Recompute an operation id externally (conceptually):
#   sha256 over the exported line's exact bytes == that operation's id,
#   because export emits canonical bytes one op per line.
```

## 4. Inspecting from Python

The library API mirrors the CLI.
A read-only audit script:

```python
from pmp.node import Node  # see src/pmp/node.py for the constructor signature

node = Node.open("./avery-node")  # loads keys (public), opens the log

for op in node.log:  # append order
    print(op.seq, op.op_id[:8], op.type, op.payload.get("kind"))

node.log.verify()  # raises (from pmp.errors) on any
                   # integrity/authenticity/chain failure
```

Consult the docstrings in `src/pmp/oplog.py` and `src/pmp/node.py` for the exact method names and signatures; they are the authoritative reference, and `tests/test_oplog.py` / `tests/test_node_import.py` show every call pattern in use.

Going below the node API: `oplog.db` is plain SQLite, and you may read it with any SQLite browser **for inspection only**. Never write to it directly: hand-written rows will fail verification (that is the point), and the schema is an implementation detail of this reference node, not part of the protocol. The protocol-level artifact is the exported canonical JSONL.

## 5. Auditing evidence back to source bytes

The provenance block makes every evidence operation independently checkable against the original source:

1. Run `pmp show <op-id>` and read `provenance.source` and `provenance.source_content_hash`.
2. Locate the original file named by the source locator.
3. Compute its SHA-256 (`sha256sum <file>`).
4. Compare with `source_content_hash`.

Match ⇒ this evidence was parsed from exactly those bytes by `provenance.adapter` at `provenance.adapter_version`. Mismatch ⇒ the source has changed since import. A mismatch is itself a finding: re-import to capture the new state as new evidence; the old evidence remains true testimony about the old bytes.

This is the audit primitive later milestones build on: a derived claim will reference the evidence operations it came from, evidence references source bytes, and so "what do you know about me **and why**" bottoms out in hashes a user can verify themselves.

## 6. Portability and interop

The export stream is the interchange surface.
To verify Avery's log in a foreign implementation you need exactly three things, all public:

1. the milestone-2 wire-format specification (canonicalization, hashing, operation schema, signature scheme: Ed25519 over canonical bytes),
2. the exported JSONL,
3. the node's public key (`keys/node.pub`, also embedded as the `author` field).

Nothing about SQLite, file layout, or this codebase is required. That property is deliberate and is covered by tests: the suite round-trips operations through canonical serialization and re-verifies them from the serialized form alone.

## 7. Operational hygiene

* Run `pmp verify` after any restore-from-backup and before trusting a node you did not just create.
* Back up the **whole node directory**; an untampered backup is always a valid prefix of the log and will verify cleanly.
* Guard `keys/node.key`. Loss means this author can sign nothing further (history remains verifiable via `node.pub`); theft means an attacker can *append* as you, but they still cannot rewrite history without detection.
* Treat exported JSONL as sensitive: it is your evidence, in the clear. Capability-scoped sharing without revealing evidence is milestone 6, not this one.
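
As a rough illustration of what a foreign implementation could check from the export stream alone (sections 2 and 6), the sketch below recomputes each operation ID as the SHA-256 of the exported line and checks per-author chain continuity. It is not the node's own verifier and does not check signatures. The function name `check_export` is hypothetical, and two details are assumptions to confirm against the milestone-2 spec: that `author`, `seq`, and `prev` are top-level keys of the exported JSON, and that IDs and `prev` values are lowercase hex digests.

```python
import hashlib
import json


def check_export(lines):
    """Check per-author chain continuity over exported operations.

    `lines` is an iterable of bytes objects, one canonical operation each.
    Returns a list of (line_number, problem) tuples; an empty list means
    no continuity problems were found. Signatures are NOT verified here.

    Assumptions (not confirmed by the source document): `author`, `seq`
    and `prev` are top-level keys, and ids are lowercase hex SHA-256.
    """
    problems = []
    last = {}  # author -> (seq, op_id) of that author's most recent op
    for n, raw in enumerate(lines, start=1):
        raw = raw.rstrip(b"\n")
        if not raw:
            continue  # tolerate blank lines
        # The id is the hash of the exact exported bytes of the line.
        op_id = hashlib.sha256(raw).hexdigest()
        op = json.loads(raw)
        author = op["author"]
        if author in last:
            want_seq, want_prev = last[author][0] + 1, last[author][1]
        else:
            want_seq, want_prev = 0, None  # first op: seq 0, prev null
        if op["seq"] != want_seq:
            problems.append((n, f"seq {op['seq']}, expected {want_seq}"))
        if op["prev"] != want_prev:
            problems.append((n, "prev does not match previous op id"))
        last[author] = (op["seq"], op_id)
    return problems
```

One way to use it, under the same assumptions, is to pipe `pmp export` into a small script that calls `check_export(sys.stdin.buffer)` and prints the returned list. Reading bytes rather than text matters, because the hash is defined over the exact exported bytes.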