# Inspecting and Auditing the Operation Log

The operation log is the node's single source of truth. This document explains
the anatomy of an operation as stored and exported, how signing and hash
chaining work, how to inspect the log from the CLI and from Python, and how to
audit a piece of evidence all the way back to raw source bytes.

It assumes familiarity with the milestone-2 wire-format specification and
serves as the operational companion to that spec.

## 1. Anatomy of an operation

Every entry in the log is one **operation**: an immutable, signed, canonical
JSON document. Conceptually it has four parts:

```
┌─ header ──────────────────────────────────────────────┐
│ wire-format version                                   │
│ type            e.g. "evidence"                       │
│ author          node public key (the signer)          │
│ seq             0,1,2,… per author, no gaps           │
│ prev            hash of this author's previous op     │
│                 (null for seq 0)                      │
│ timestamp       RFC 3339 UTC, time of creation        │
├─ payload ─────────────────────────────────────────────┤
│ type-specific body; for evidence: kind, external_id,  │
│ and the parsed source content                         │
├─ provenance ──────────────────────────────────────────┤
│ adapter, adapter_version, source locator,             │
│ source_content_hash, imported_at                      │
├─ signature ───────────────────────────────────────────┤
│ Ed25519 over the canonical bytes of everything above  │
└───────────────────────────────────────────────────────┘
```

The **operation ID** is the SHA-256 hash of the operation's canonical bytes
(including the signature). IDs are therefore content addresses: two parties
holding the same ID hold byte-identical operations.

**Canonical bytes** are produced by `src/pmp/canonical.py` per the milestone-2
spec: UTF-8 JSON with sorted keys, no insignificant whitespace, and strict
value conventions (which is why adapter bodies must be canonical-JSON-safe —
see `adapter-authoring.md` §2.2). Canonicalization is what makes hashing and
signing well-defined across implementations.
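
As a rough illustration of why canonicalization makes IDs well-defined, the
sketch below approximates canonical bytes with the standard library's
`json.dumps` (sorted keys, compact separators, UTF-8) and hashes them with
SHA-256. The helper names `canonical_bytes` and `operation_id` and the sample
fields are hypothetical. The authoritative rules, including the strict value
conventions, live in `src/pmp/canonical.py` and the milestone-2 spec, and this
sketch does not reproduce them.

```python
import hashlib
import json


def canonical_bytes(doc: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, no extra whitespace, UTF-8.

    Assumption: this only mimics the general shape of the milestone-2 rules;
    the spec's strict value conventions are not enforced here.
    """
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


def operation_id(doc: dict) -> str:
    """Content address: hex SHA-256 of the (approximated) canonical bytes."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


# Two dicts with the same content but different key insertion order.
a = {"type": "evidence", "seq": 0, "prev": None}
b = {"prev": None, "seq": 0, "type": "evidence"}

same_id = operation_id(a) == operation_id(b)                 # order is irrelevant
changed_id = operation_id(a) != operation_id({**a, "seq": 1})  # content is not
```

Key order does not affect the ID because the bytes are sorted before hashing,
while any change to a value produces a different ID.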

## 2. The three guarantees and what breaks them

`pmp verify` (and `pmp.oplog`'s verification API) checks three independent
properties over the whole log:

1. **Integrity** — recomputed canonical hash of each stored operation equals
   its stored operation ID. *Breaks if:* any byte of a stored operation was
   altered.
2. **Authenticity** — the Ed25519 signature verifies against the author's
   public key. *Breaks if:* an operation was forged or re-signed by a key other
   than the claimed author.
3. **Chain continuity** — per author, `seq` runs 0,1,2,… without gaps and each
   operation's `prev` equals the hash of the author's previous operation.
   *Breaks if:* operations were deleted, reordered, or inserted retroactively.

Together these make the log **append-only in the cryptographic sense**: a
verifier holding the author's public key can detect any modification of
recorded operations and any gap, reordering, or retroactive insertion in an
author's chain. Note what is *not* guaranteed: a node can always be destroyed
wholesale, and a key-holder can always append new (signed) operations. Those
are addressed by the threat model and by later milestones (sync replicates
history across devices; corrections supersede rather than erase). Dropping
operations from the end is also not detectable from the log alone, because any
prefix of a valid log verifies cleanly.

The tamper tests in `tests/test_oplog.py` demonstrate each failure mode
concretely: they corrupt a stored payload byte, swap a signature, and remove a
middle operation, and assert that verification names the offender.
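
The chain-continuity rule can be sketched in a few lines. This is a minimal
illustration, not the `pmp.oplog` implementation: it assumes each operation is
a plain dict with hypothetical keys `id`, `author`, `seq`, and `prev`, that
`prev` holds the previous operation's ID, and that operations arrive in append
order. Integrity and signature checks are deliberately left out.

```python
from typing import Optional


def first_chain_break(ops: list) -> Optional[str]:
    """Return the id of the first op that breaks per-author continuity, else None.

    Assumption: each op is a dict with hypothetical keys "id", "author",
    "seq", and "prev", listed in append order.
    """
    last = {}  # author -> that author's most recent op seen so far
    for op in ops:
        previous = last.get(op["author"])
        expected_seq = 0 if previous is None else previous["seq"] + 1
        expected_prev = None if previous is None else previous["id"]
        if op["seq"] != expected_seq or op["prev"] != expected_prev:
            return op["id"]
        last[op["author"]] = op
    return None


# Toy three-op chain for one author; ids are placeholders, not real hashes.
chain = [
    {"id": "aa", "author": "k1", "seq": 0, "prev": None},
    {"id": "bb", "author": "k1", "seq": 1, "prev": "aa"},
    {"id": "cc", "author": "k1", "seq": 2, "prev": "bb"},
]

intact = first_chain_break(chain)                            # continuous chain
gap = first_chain_break([chain[0], chain[2]])                # middle op removed
swapped = first_chain_break([chain[1], chain[0], chain[2]])  # ops reordered
truncated = first_chain_break(chain[:2])                     # tail dropped
```

In this sketch the removed and reordered cases each name an offending
operation, while the truncated chain passes, since a shorter prefix is still
continuous.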

## 3. Inspecting from the CLI

```bash
pmp log    --node-dir ./avery-node            # one line per op, append order
pmp show   <op-id-prefix> --node-dir ./avery-node   # full pretty-printed op
pmp verify --node-dir ./avery-node            # full integrity/authenticity/chain check
pmp export --node-dir ./avery-node            # canonical JSONL to stdout
pmp info   --node-dir ./avery-node            # node id, op count, versions
```

Typical audit session:

```bash
$ pmp log --node-dir ./avery-node | head
0  9f3ac1…  evidence  calendar.ics       calendar.event  "Physio appointment"
1  41be77…  evidence  calendar.ics       calendar.event  "Standup (weekly)"
…

$ pmp show 9f3ac1 --node-dir ./avery-node
{ … full operation: header, payload, provenance, signature … }

$ pmp verify --node-dir ./avery-node
OK: <N> operations, 1 author, chain intact, all signatures valid
```

Because `pmp export` emits canonical JSON Lines, the standard Unix toolbox
works on it. With [`jq`](https://jqlang.github.io/jq/) (optional, not a
dependency):

```bash
# All evidence kinds and their counts
pmp export --node-dir ./avery-node | jq -r .payload.kind | sort | uniq -c

# Every operation derived from one source file
pmp export --node-dir ./avery-node \
  | jq -c 'select(.provenance.source | contains("avery-personal.ics"))'

# Recompute an operation id externally (conceptually):
# sha256 over the exported line's exact bytes (without the trailing newline)
# == that operation's id, because export emits canonical bytes one op per line.
```
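
The same checks can be done without `jq`. The sketch below hashes each
exported line and counts payload kinds. It assumes that each line is one
operation's canonical bytes followed by a newline, and that IDs are lowercase
hex SHA-256 digests. The function name `summarize_export` and the sample lines
are made up for illustration.

```python
import hashlib
import json
from collections import Counter
from typing import Iterable


def summarize_export(lines: Iterable[bytes]):
    """Hash each exported line and count payload kinds.

    Assumptions: each line is one operation's canonical bytes followed by a
    newline, and the id is the hex SHA-256 of the line without that newline.
    """
    ids = []
    kinds = Counter()
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if not line:
            continue  # tolerate a trailing blank line
        ids.append(hashlib.sha256(line).hexdigest())
        op = json.loads(line)
        kinds[op.get("payload", {}).get("kind")] += 1
    return ids, kinds


# Made-up two-line export standing in for `pmp export` output.
sample = [
    b'{"payload":{"kind":"calendar.event"},"seq":0}\n',
    b'{"payload":{"kind":"calendar.event"},"seq":1}\n',
]
ids, kinds = summarize_export(sample)
```

To run it against a real node, pipe `pmp export` into a script that calls
`summarize_export(sys.stdin.buffer)`, then compare the computed IDs with those
printed by `pmp log`.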

## 4. Inspecting from Python

The library API mirrors the CLI. A read-only audit script:

```python
from pmp.node import Node  # see src/pmp/node.py for the constructor signature

node = Node.open("./avery-node")          # loads keys (public), opens the log

for op in node.log:                       # append order
    print(op.seq, op.op_id[:8], op.type, op.payload.get("kind"))

node.log.verify()                         # raises (from pmp.errors) on any
                                          # integrity/authenticity/chain failure
```

Consult the docstrings in `src/pmp/oplog.py` and `src/pmp/node.py` for the
exact method names and signatures — they are the authoritative reference, and
`tests/test_oplog.py` / `tests/test_node_import.py` show every call pattern in
use.

Going below the node API: `oplog.db` is plain SQLite, and you may read it with
any SQLite browser **for inspection only**. Never write to it directly —
hand-written rows will fail verification (that is the point), and the schema is
an implementation detail of this reference node, not part of the protocol. The
protocol-level artifact is the exported canonical JSONL.

## 5. Auditing evidence back to source bytes

The provenance block makes every evidence operation independently checkable
against the original source:

1. Run `pmp show <op-id>` and read `provenance.source` and
   `provenance.source_content_hash`.
2. Locate the original file named by the source locator.
3. Compute its SHA-256 (`sha256sum <file>`).
4. Compare with `source_content_hash`. Match ⇒ this evidence was parsed from
   exactly those bytes by `provenance.adapter` at `provenance.adapter_version`.
   Mismatch ⇒ the source has changed since import (which is itself a finding:
   re-import to capture the new state as new evidence; the old evidence remains
   true testimony about the old bytes).
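
Steps 3 and 4 can also be scripted. The following is a minimal sketch using
only the standard library. It assumes `source_content_hash` is a bare
hexadecimal SHA-256 digest (if the node stores a prefix or a different
encoding, normalize it first), and the helper names `sha256_file` and
`source_matches` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large sources stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_matches(path: Path, recorded_hash: str) -> bool:
    """Compare a file's current hash with a recorded source_content_hash.

    Assumption: the recorded value is a bare hex digest.
    """
    return sha256_file(path) == recorded_hash.strip().lower()


# Demonstration on a throwaway file instead of a real imported source.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "example.ics"
    source.write_bytes(b"BEGIN:VCALENDAR\nEND:VCALENDAR\n")
    recorded = sha256_file(source)  # stands in for the value stored in the op
    unchanged = source_matches(source, recorded)

    # Editing the file afterwards makes the comparison fail.
    source.write_bytes(b"BEGIN:VCALENDAR\nX-EDITED:1\nEND:VCALENDAR\n")
    changed = not source_matches(source, recorded)
```

A failed comparison corresponds to the mismatch case in step 4: the source
has changed since import, and the recorded evidence still describes the old
bytes.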

This is the audit primitive later milestones build on: a derived claim will
reference the evidence operations it came from, evidence references source
bytes, and so "what do you know about me **and why**" bottoms out in hashes a
user can verify themselves.

## 6. Portability and interop

The export stream is the interchange surface. To verify Avery's log in a
foreign implementation you need exactly three things, none of which involves
secret key material:

1. the milestone-2 wire-format specification (canonicalization, hashing,
   operation schema, signature scheme: Ed25519 over canonical bytes),
2. the exported JSONL,
3. the node's public key (`keys/node.pub`, also embedded as the `author` field).

Nothing about SQLite, file layout, or this codebase is required. That property
is deliberate and is covered by tests: the suite round-trips operations through
canonical serialization and re-verifies them from the serialized form alone.

## 7. Operational hygiene

* Run `pmp verify` after any restore-from-backup and before trusting a node you
  did not just create.
* Back up the **whole node directory**; an untampered backup is always a valid
  prefix of the log and will verify cleanly.
* Guard `keys/node.key`. Loss means this author can sign nothing further
  (history remains verifiable via `node.pub`); theft means an attacker can
  *append* as you — they still cannot rewrite history without detection.
* Treat exported JSONL as sensitive: it is your evidence, in the clear.
  Capability-scoped sharing without revealing evidence is milestone 6, not this
  one.