# Live-Ops Runbooks

These runbooks describe how the operating team should run Fan Passport during the World Cup and respond to common live incidents. They assume the roles described in `docs/growth-live-ops/README.md`.

## Severity levels

| Severity | Definition | Examples | Response target |
| --- | --- | --- | --- |
| SEV0 | Critical user trust, legal, safety, or platform-wide failure. | Prizes awarded on a wrong final result, child safety incident, mass notification misfire, app unavailable on final day. | Open command room immediately; updates every 15 minutes. |
| SEV1 | Major feature or campaign failure affecting many users. | Daily challenge unavailable, prediction lock broken, impossible leaderboard scores, sponsor reward over-redemption. | Response within 15 minutes; updates every 30 minutes. |
| SEV2 | Material issue with workaround or limited segment impact. | One locale has bad copy, one sticker pack delayed, one fixture result delayed. | Response within 60 minutes. |
| SEV3 | Minor issue that can be corrected in normal workflow. | Typo, low-volume support confusion, cosmetic rendering issue. | Resolve in daily cycle. |
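
The response targets in the table can be encoded so that paging or reminder tooling reads them from one place. The following is a minimal sketch, not an existing Fan Passport module: `ResponseTarget`, `RESPONSE_TARGETS`, and `response_target` are hypothetical names, and modeling "immediate" as 0 minutes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponseTarget:
    # Minutes until first response; None where the table gives no minute-level target.
    first_response_minutes: Optional[int]
    # Minutes between status updates; None where the table does not specify a cadence.
    update_interval_minutes: Optional[int]


# Values transcribed from the severity table. "Immediate" is modeled as 0 minutes,
# which is an assumption of this sketch.
RESPONSE_TARGETS = {
    "SEV0": ResponseTarget(first_response_minutes=0, update_interval_minutes=15),
    "SEV1": ResponseTarget(first_response_minutes=15, update_interval_minutes=30),
    "SEV2": ResponseTarget(first_response_minutes=60, update_interval_minutes=None),
    "SEV3": ResponseTarget(first_response_minutes=None, update_interval_minutes=None),
}


def response_target(severity: str) -> ResponseTarget:
    """Look up the response target for a severity label such as "SEV1"."""
    key = severity.strip().upper()
    if key not in RESPONSE_TARGETS:
        raise ValueError(f"unknown severity: {severity!r}")
    return RESPONSE_TARGETS[key]
```

For example, `response_target("SEV1").update_interval_minutes` gives the 30-minute update cadence from the table.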

## Runbook: Daily live-ops standup

**When:** every tournament day before publishing primary daily content.

**Attendees:** Live Ops Lead, Content Lead, Football Data Editor, Growth/CRM Manager, Community Lead, Analytics Lead, Engineering On-call, Support Lead, Partnerships Manager when sponsor content is active.

**Inputs:**

- Previous daily close note.
- Today's row from `data/live-ops/daily_challenge_schedule_2026.csv`.
- Fixture feed and kickoff windows.
- Content QA status.
- Notification plan.
- Reward inventory.
- Moderation queue and high-risk fixtures.
- Analytics dashboard.

**Agenda:**

1. Confirm today's tournament phase and fan promise.
2. Confirm fixtures, kickoff locks, stadium IDs, and followed-team personalization.
3. Confirm primary challenge, secondary challenge, trivia, collection drop, and prediction prompt.
4. Confirm sponsor placement and reward inventory.
5. Review notification sends and frequency risk.
6. Review open incidents, support trends, and moderation risks.
7. Confirm analytics events for the primary path.
8. Assign watch coverage for match windows.
9. Decide any same-day content changes.
10. Publish the internal daily brief.

**Exit criteria:**

- Daily content is ready to publish.
- No unresolved SEV0 or SEV1 incident blocks publishing.
- Owners understand match windows and escalation paths.
- Support has correct public messaging for today's rules and rewards.

## Runbook: Matchday readiness

**When:** before each fixture block, normally 30-60 minutes before first kickoff.

**Owner:** Football Data Editor, supported by Engineering On-call.

**Steps:**

1. Confirm fixture IDs, teams, stadium, kickoff time, and match state.
2. Confirm prediction prompts are visible and show correct lock time.
3. Confirm predictions lock before kickoff.
4. Confirm check-in window opens only during configured period.
5. Confirm relevant followed-team notifications were sent or suppressed according to policy.
6. Confirm sponsor placements do not conflict with regulated categories or territory restrictions.
7. Confirm fallback challenge path for users without the relevant team.
8. Confirm moderation watch for high-risk fixtures.
9. Confirm support macro for prediction lock and check-in questions.
10. Record readiness in live-ops channel.

**Failure response:**

- If predictions cannot lock reliably, disable score-awarding predictions for the fixture and convert to non-scoring fan poll before kickoff.
- If fixture data is uncertain, delay match-specific notifications and use generic daily content.
- If check-in is unavailable, extend a fair make-good window after the match and communicate clearly.
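
The lock checks in steps 2 and 3 can be partly automated. The sketch below is illustrative only: `lock_problems` is a hypothetical helper, and it assumes that a lock time equal to kickoff is acceptable and that all timestamps are timezone-aware.

```python
from datetime import datetime
from typing import List


def lock_problems(lock_at: datetime, kickoff_at: datetime, now: datetime) -> List[str]:
    """Return readiness problems for one fixture's prediction lock; empty means ready."""
    problems = []
    for name, value in (("lock_at", lock_at), ("kickoff_at", kickoff_at), ("now", now)):
        if value.tzinfo is None or value.utcoffset() is None:
            problems.append(f"{name} is not timezone-aware")
    if problems:
        return problems  # Comparing naive and aware datetimes would raise TypeError.
    # Assumption: locking exactly at kickoff is allowed; locking later is not.
    if lock_at > kickoff_at:
        problems.append("lock time is after kickoff")
    if kickoff_at <= now:
        problems.append("kickoff has already passed")
    return problems
```

Rejecting naive datetimes up front guards against the wrong-timezone lock display listed under the prediction lock symptoms.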

## Runbook: Post-match result resolution

**When:** after full time or after extra time and penalties where applicable.

**Owner:** Football Data Editor.

**Steps:**

1. Wait for authoritative result verification.
2. Confirm competition outcome, not just regulation score, for knockout predictions.
3. Trigger prediction resolution.
4. Verify sample users across correct, incorrect, voided, and no-prediction states.
5. Trigger achievements and badge unlocks.
6. Publish match memory card if approved.
7. Trigger result notification to involved users.
8. Check leaderboard deltas for impossible scores.
9. Log any delayed or manual corrections.
10. Update daily close metrics.

**If result feed is delayed:**

- Mark prediction state as pending.
- Do not award points from unofficial data.
- Publish a neutral in-app note if delay exceeds 60 minutes.
- Resolve manually only with two-source verification and Live Ops Lead approval.
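
The delayed-feed rules above can be expressed as a small gate. This is a sketch under stated assumptions: `ResultReport`, `can_resolve_manually`, and `should_publish_delay_note` are hypothetical names, the `outcome` encoding is invented for illustration, and the sketch assumes that any disagreement between sources blocks manual resolution.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResultReport:
    source: str   # hypothetical source label, e.g. "official_feed"
    outcome: str  # competition outcome, e.g. "HOME_ADVANCES" (hypothetical encoding)


def can_resolve_manually(reports: Sequence[ResultReport], lead_approved: bool) -> bool:
    """True only with Live Ops Lead approval and two agreeing, distinct sources."""
    if not lead_approved:
        return False
    outcomes = {r.outcome for r in reports}
    sources = {r.source for r in reports}
    # Assumption: any disagreement between sources blocks manual resolution.
    return len(outcomes) == 1 and len(sources) >= 2


def should_publish_delay_note(delay_minutes: float) -> bool:
    """The runbook asks for a neutral in-app note once the delay exceeds 60 minutes."""
    return delay_minutes > 60
```

Two reports from the same source do not count as two-source verification in this sketch, because sources are compared by label.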

## Runbook: Major upset or viral moment

**When:** a lower-rated team defeats or eliminates a favorite, a dramatic comeback occurs, or a moment becomes widely discussed.

**Owner:** Content Lead.

**Steps:**

1. Confirm result or event with Football Data Editor.
2. Decide whether moment is celebratory, neutral, sensitive, or unsafe.
3. Select approved reactive content type:
   - Giant killing badge.
   - Upset memory card.
   - Trivia bonus.
   - Share card.
   - Squad prompt.
4. Check sponsor conflicts and legal restrictions.
5. Publish within the latency target defined in the content refresh process.
6. Notify only users who opted into relevant challenges or followed teams unless moment is tournament-defining.
7. Monitor reports and social response.
8. Measure share rate, challenge completion, and notification opt-out.

**Do not publish:**

- Content that mocks a country, player, or fan base.
- Content based on unverified rumors.
- Content around serious injury without careful review.
- Sponsor jokes attached to controversial incidents.

## Runbook: Data feed outage

**Severity:** SEV1 if predictions, scores, standings, or fixture states are affected during live matches.

**Owner:** Engineering On-call.

**Steps:**

1. Detect outage through monitoring, failed jobs, or operator report.
2. Confirm scope: fixtures, scores, standings, teams, stadiums, or all data.
3. Pause automated result resolution if score data is affected.
4. Pause match-specific notifications if kickoff or score state is uncertain.
5. Switch affected content to pending or non-scoring mode.
6. Use manual verification only for critical result resolution and only with Live Ops Lead approval.
7. Communicate internally every 30 minutes until stable.
8. After restoration, reconcile missed events, duplicate events, and user rewards.
9. Publish user-facing correction if any scoring, reward, or prediction state was visibly wrong.
10. Record cause and prevention in incident closeout.
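
Step 8 (reconciling missed and duplicate events) might look like the following. It is a minimal sketch, assuming each feed event carries a stable ID and that the list of expected IDs is available from another source. `reconcile_events` and its arguments are hypothetical, not part of the real feed integration.

```python
from typing import Dict, Iterable, List, Tuple


def reconcile_events(
    expected_ids: Iterable[str],
    received: Iterable[Tuple[str, dict]],
) -> Tuple[Dict[str, dict], List[str], List[str]]:
    """Split post-outage events into unique payloads, duplicate IDs, and missed IDs."""
    unique: Dict[str, dict] = {}
    duplicates: List[str] = []
    for event_id, payload in received:
        if event_id in unique:
            duplicates.append(event_id)  # Keep the first copy; flag the repeat.
        else:
            unique[event_id] = payload
    missed = [event_id for event_id in expected_ids if event_id not in unique]
    return unique, duplicates, missed
```

Duplicate IDs point to rewards or points that may have been granted twice. Missed IDs point to users who may be owed a resolution.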

## Runbook: Prediction lock issue

**Severity:** SEV1 for active fixtures, SEV0 if prizes or final-day predictions are affected.

**Owner:** Engineering On-call and Football Data Editor.

**Symptoms:**

- Users can submit after kickoff.
- Users cannot submit before lock.
- Lock time displayed in wrong timezone.
- Edited prediction bypasses lock.
- Prediction result uses wrong fixture outcome.

**Steps:**

1. Freeze affected prediction prompt.
2. Export impacted prediction IDs and timestamps.
3. Determine correct lock time and affected users.
4. Void unfair submissions after lock.
5. Restore valid submissions rejected before lock where logs allow.
6. If fairness cannot be restored, convert the fixture prediction to non-scoring and issue a make-good digital item to affected users.
7. Update leaderboard and reward eligibility.
8. Send correction notice to impacted users.
9. Add regression test for lock display and server validation.
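
Steps 2 to 4 amount to partitioning exported submissions by the correct lock time. The sketch below assumes server-side receive timestamps are trustworthy and timezone-aware, and that a submission received exactly at lock time counts as late. `Submission` and `split_by_lock` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Submission:
    prediction_id: str
    user_id: str
    received_at: datetime  # server receive time, timezone-aware


def split_by_lock(
    submissions: Iterable[Submission], lock_at: datetime
) -> Tuple[List[Submission], List[Submission]]:
    """Return (valid, to_void): submissions received before lock vs at or after it."""
    valid: List[Submission] = []
    to_void: List[Submission] = []
    for s in submissions:
        # Assumption: a submission received exactly at lock time is treated as late.
        (valid if s.received_at < lock_at else to_void).append(s)
    return valid, to_void
```

Comparing aware datetimes means a submission stamped in a local offset is judged by its actual instant, not by its wall-clock digits.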

## Runbook: Notification misfire

**Severity:** SEV1 for large unwanted send; SEV0 if harmful, regulated, or private information is included.

**Owner:** Growth/CRM Manager.

**Steps:**

1. Stop the campaign or activate notification kill switch.
2. Capture notification ID, trigger ID, audience, channel, copy, send count, and time.
3. Identify whether the issue is wrong audience, wrong time, wrong copy, duplicate send, broken deep link, or harmful content.
4. Notify Live Ops Lead, Support Lead, Engineering On-call, and Legal if needed.
5. Suppress follow-up sends to affected users until reviewed.
6. If deep link is broken, redirect route or replace destination.
7. If user action is required, send a clear correction through the least intrusive channel.
8. Monitor opt-outs, support tickets, and social complaints.
9. Record prevention: approval rule, audience check, dry run, or tooling change.
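
The capture in step 2, the classification in step 3, and the suppression in step 5 can be sketched as plain data structures. Everything here is illustrative: `MisfireType`, `MisfireRecord`, and `filter_follow_ups` are hypothetical names and do not describe the actual CRM tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, List, Set, Tuple


class MisfireType(Enum):
    # The six issue types listed in step 3.
    WRONG_AUDIENCE = "wrong_audience"
    WRONG_TIME = "wrong_time"
    WRONG_COPY = "wrong_copy"
    DUPLICATE_SEND = "duplicate_send"
    BROKEN_DEEP_LINK = "broken_deep_link"
    HARMFUL_CONTENT = "harmful_content"


@dataclass(frozen=True)
class MisfireRecord:
    # The fields captured in step 2, plus the classification from step 3.
    notification_id: str
    trigger_id: str
    audience: str
    channel: str
    copy: str
    send_count: int
    sent_at: datetime
    misfire_type: MisfireType


def filter_follow_ups(
    queued: Iterable[Tuple[str, str]], affected_user_ids: Set[str]
) -> List[Tuple[str, str]]:
    """Drop queued (user_id, notification_id) sends addressed to affected users."""
    return [(uid, nid) for uid, nid in queued if uid not in affected_user_ids]
```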

## Runbook: Sponsor reward over-redemption

**Severity:** SEV1; SEV0 if the reward is high-value or legal terms are breached.

**Owner:** Partnerships Manager.

**Steps:**

1. Pause reward redemption for the campaign.
2. Preserve reward claim logs.
3. Compare issued rewards, redeemed rewards, inventory, fraud holds, and sponsor-side records.
4. Identify whether over-redemption came from fraud, concurrency, inventory sync, terms confusion, or manual configuration.
5. Prioritize honoring legitimate user expectations where feasible.
6. If a reward cannot be fulfilled, have Legal and Support approve the remedy wording.
7. Reopen campaign only after inventory and eligibility are corrected.
8. Provide sponsor with transparent aggregate report.
9. Add inventory threshold alert or redemption idempotency fix.
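
The prevention items in step 9 (a threshold alert and redemption idempotency) can be illustrated with an in-memory sketch. `RewardInventory` is a hypothetical class. A production version would need an atomic store shared across servers, which this single-process example does not provide.

```python
from typing import Dict


class RewardInventory:
    """In-memory sketch of idempotent redemption against a fixed inventory."""

    def __init__(self, total: int) -> None:
        if total < 0:
            raise ValueError("total must be non-negative")
        self._total = total
        self._claims: Dict[str, str] = {}  # idempotency key -> user ID

    def redeem(self, idempotency_key: str, user_id: str) -> str:
        if idempotency_key in self._claims:
            return "duplicate"  # Retry of an earlier claim; consumes no inventory.
        if len(self._claims) >= self._total:
            return "sold_out"
        self._claims[idempotency_key] = user_id
        return "redeemed"

    def remaining(self) -> int:
        return self._total - len(self._claims)

    def needs_alert(self, threshold: int) -> bool:
        """True when remaining inventory is at or below the alert threshold."""
        return self.remaining() <= threshold
```

Because a repeated key returns `"duplicate"` without consuming inventory, client retries and double taps cannot push redemptions past the configured total.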

## Runbook: Leaderboard abuse spike

**Severity:** SEV1 if top ranks or rewards are affected.

**Owner:** Community Lead with Analytics Lead.

**Steps:**

1. Freeze public payout or winner announcement.
2. Identify suspicious users, squads, events, and score sources.
3. Compare scores against possible maximums.
4. Check duplicate accounts, referral clusters, device signals, prediction locks, and challenge completion velocity.
5. Hold suspicious scores while allowing honest users to continue.
6. Remove invalid points only after evidence is sufficient.
7. Recalculate affected leaderboard.
8. Communicate general integrity action if ranks visibly change.
9. Record fraud pattern and update detection rules.
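
Step 3 (comparing scores against possible maximums) is straightforward to script. The following is a sketch, assuming points are tracked per score source and that a maximum is known for each source. `impossible_scores` and the source names used with it are hypothetical.

```python
from typing import Dict, List, Tuple


def impossible_scores(
    points_by_user: Dict[str, Dict[str, int]],
    max_by_source: Dict[str, int],
) -> List[Tuple[str, str, int]]:
    """Return (user_id, source, points) where points exceed the source's maximum."""
    flagged = []
    for user_id, by_source in points_by_user.items():
        for source, points in by_source.items():
            cap = max_by_source.get(source)
            # Assumption: a source with no known maximum is also flagged for review.
            if cap is None or points > cap:
                flagged.append((user_id, source, points))
    return sorted(flagged)
```

Flagged entries are candidates for the hold in step 5, not proof of abuse. Step 6 still requires sufficient evidence before points are removed.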

## Runbook: Harmful user-generated content surge

**Severity:** SEV1 or SEV0 depending on content severity.

**Owner:** Community Lead.

**Steps:**

1. Identify affected surface: profiles, squads, comments, share captions, leaderboard.
2. Increase moderation staffing or reorder the queue so the highest-severity reports are handled first.
3. Temporarily disable user-edited public captions if necessary.
4. Remove violating content and take enforcement action against users responsible for severe violations.
5. Preserve evidence for threats, child safety, or legal cases.
6. Coordinate support messaging.
7. Monitor repeat evasion and new account creation.
8. Re-enable surfaces gradually after report rate stabilizes.

## Runbook: Content error

**Severity:** SEV2 for isolated factual or copy errors; SEV1 if the error affects scoring, sponsor content, or legal copy, or causes mass confusion.

**Owner:** Content Lead.

**Steps:**

1. Mark content as Needs Fix.
2. Determine impact: viewed only, acted on, scored, rewarded, shared, or sponsored.
3. Correct factual error with Football Data Editor.
4. Correct legal or sponsor copy with Legal/Partnerships if needed.
5. Publish new version.
6. If users were affected, provide make-good or correction notice.
7. Record error in daily close and update QA checklist.

## Runbook: Final day command room

**When:** final day from four hours before kickoff through recap launch.

**Owner:** Live Ops Lead.

**Participants:** Engineering On-call, Football Data Editor, Content Lead, Growth/CRM Manager, Community Lead, Support Lead, Analytics Lead, Partnerships Manager, Legal/Compliance Reviewer.

**Operating rules:**

- No non-critical code or content changes.
- All push sends require Live Ops Lead approval.
- Prediction lock and final result resolution receive double verification.
- Sponsor reward inventory is checked hourly.
- Leaderboard payout waits for fraud review.
- Recap generation is monitored as a primary system.
- Support response times are prioritized for account access, rewards, prediction scoring, and recap issues.

**Timeline:**

1. Morning: publish Final Day legacy stamp and final prediction prompt.
2. Four hours before kickoff: verify fixture, lock, reward, and notification state.
3. One hour before kickoff: send final prediction reminder only to eligible users who have not yet submitted a prediction.
4. Kickoff: lock predictions and confirm no late writes.
5. During match: monitor system health, moderation, and support.
6. Full time: wait for authoritative result.
7. Result verified: resolve predictions, achievements, champion collection, and final badge.
8. Recap generation: release in batches and monitor errors.
9. Post-final: send recap notification after user recap is ready.
10. Closeout: publish internal final-day report and begin post-final transition calendar.
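
The audience rule for the reminder one hour before kickoff is a set difference. A minimal sketch, assuming both user lists are available as plain IDs and that eligibility (opt-in, territory, and so on) has already been applied upstream. `final_reminder_audience` is a hypothetical helper.

```python
from typing import Iterable, List


def final_reminder_audience(
    eligible_user_ids: Iterable[str],
    users_with_prediction: Iterable[str],
) -> List[str]:
    """Eligible users with no final prediction yet, de-duplicated and sorted."""
    already_predicted = set(users_with_prediction)
    return sorted(u for u in set(eligible_user_ids) if u not in already_predicted)
```

De-duplicating the eligible list keeps each user to at most one entry, which reduces the risk of the duplicate send covered in the notification misfire runbook.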

## Incident closeout format

Every SEV0, SEV1, and repeated SEV2 incident should close with:

- Incident ID.
- Severity.
- Start and end time.
- Detection method.
- User impact.
- Revenue or sponsor impact.
- Safety or legal impact.
- Root cause.
- Immediate fix.
- User remedy.
- Prevention.
- Owners for follow-up actions.
- Date follow-up was verified.

Closeouts should be shared internally within 48 hours for SEV1 and within 24 hours for SEV0.
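
The closeout format and sharing deadlines can be checked mechanically. The sketch below is illustrative: the field keys, `missing_closeout_fields`, and `share_deadline` are hypothetical. It also assumes the 24-hour and 48-hour windows are measured from the incident end time, which the text above does not specify.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical keys mirroring the closeout list; "Start and end time" is split in two.
CLOSEOUT_FIELDS = (
    "incident_id", "severity", "start_time", "end_time", "detection_method",
    "user_impact", "revenue_or_sponsor_impact", "safety_or_legal_impact",
    "root_cause", "immediate_fix", "user_remedy", "prevention",
    "follow_up_owners", "follow_up_verified_date",
)

# Sharing windows from the runbook, in hours.
SHARE_WITHIN_HOURS = {"SEV0": 24, "SEV1": 48}


def missing_closeout_fields(closeout: Dict[str, object]) -> List[str]:
    """List closeout fields that are absent, None, or an empty string."""
    return [f for f in CLOSEOUT_FIELDS if closeout.get(f) in (None, "")]


def share_deadline(severity: str, end_time: datetime) -> Optional[datetime]:
    """Deadline for sharing the closeout; None when no window is defined."""
    # Assumption: the window starts at the incident end time.
    hours = SHARE_WITHIN_HOURS.get(severity.strip().upper())
    return None if hours is None else end_time + timedelta(hours=hours)
```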