3 Pass · 14 Partial · 8 Fail

100% Audit Findings — 2026-05-15

A Sonnet subagent walked all 25 outcome classes through the live canvas at /network?active-view=anything-engine. No code was modified; 46 screenshots were captured. Headline findings: the 60s dispatch watchdog I shipped earlier today is NOT firing in practice, find_investors (the flagship class) returns 0 cards, and 4 of 8 lightweight classes misroute to find_investors. Below is the truth.

Verdict counts
3 PASS · 14 PARTIAL · 8 FAIL

PASS: find_warm_intros (4 real cards with connection paths), find_co_investors (6+ real investor cards), META (24-workflow tile picker).
FAIL: find_investors, find_cofounder, find_collaborators, plan_outcome, research_company, travel, summarize_meeting, purchase_real_estate, find_partners.

Master table — 25 outcome classes
Class | Classifier | Cards | Verdict | Notes
find_investors | CORRECT 95% | 0 (8+ min) | FAIL | Spinner stopped, 0 cards. 60s watchdog did NOT trigger.
find_deal_flow | CORRECT 92% | 0 (honest) | PARTIAL | Empty-state correct for thin dataset.
find_acquisition | CORRECT 88% | 0 (honest) | PARTIAL | Honest empty-state.
find_journalists | CORRECT 90% | 0 (honest) | PARTIAL | ep 8496 threshold relaxed; dataset still thin.
find_warm_intros | CORRECT 87% | 4 real | PASS | Best in audit. Real connection paths.
find_event_attendees | CORRECT 83% | ~6 (WRONG TYPE) | PARTIAL | Returned VC funds (a16z, Sequoia) instead of people. Entity-type bug.
find_advisors | CORRECT 91% | 0 (honest) | PARTIAL | ep 8493 applied; still 0 results (dataset gap).
find_co_investors | CORRECT 85% | 6+ real | PASS | Real names, correct stage alignment.
summarize_meeting | WRONG → find_investors | 0 / loop | FAIL | Lightweight class misroute. Retry → infinite "Say that one more way?" loop.
find_partners | CORRECT 89% | 0 (silent) | FAIL | 4-turn interview completed, then the canvas went silent. Critical wire-up failure.
find_talent | CORRECT 93% | 0 (honest) | PARTIAL | Interview UX correct. Dataset gap.
find_customers | CORRECT 90% | 1 real | PARTIAL | ep 8495 fix worked (no leakage). Only 1 result.
plan_outcome | WRONG → find_investors | 0 | FAIL | Lightweight class misroute.
research_company | WRONG → find_investors | 0 | FAIL | Misroute even after the Wave 28 disambiguation rule was deployed; rule needs sharpening.
travel | WRONG → find_investors | 0 | FAIL | Briefly flashed travel (78%), then settled on find_investors. Also corrupted the prior thread.
purchase_real_estate | CORRECT 81% | 0 / loop | FAIL | Trapped in the clarification loop 3x. No max-depth limit.
find_cofounder | CORRECT 88% | 0 (114s) | FAIL | Dispatch hung 114s. Watchdog did NOT trigger.
find_collaborators | CORRECT 86% | 0 (67s) | FAIL | Dispatch froze. Watchdog absent.
find_speakers | CORRECT 84% | not tested | PARTIAL | Routing + first interview question working.
find_job | CORRECT 87% | not tested | PARTIAL | First question working.
get_advice | CORRECT 82% | not tested | PARTIAL | First question working.
make_purchase | CORRECT 85% | not tested | PARTIAL | First question working.
research_topic | CORRECT 80% | not tested | PARTIAL | First question working.
talent_agent_requests | CORRECT | not tested | PARTIAL | First question + options rendered correctly.
META | n/a | 24 tiles | PASS | Fast, correct, excellent onboarding UX.
Pattern analysis — 5 systemic bugs
1. Dispatch watchdog NOT triggering — 60s never fires
Affected: find_investors (8+ min), find_cofounder (114s), find_collaborators (67s), research_person (78s) — and potentially every untested dispatching class.
CLAUDE.md documents a "60s dispatch watchdog" as recently shipped (commit df46fd3b). In this audit, the watchdog never recovered a frozen dispatch. The canvas shows a scanning spinner that never resolves, and the user has no recovery path: no error card, no empty-state, no timeout message. The only escape is a hard refresh, which loses the conversation.
Fix: Verify that dispatchWatchdogRef is actually armed when a dispatch starts AND that clearDispatchWatchdog is reachable from the timeout handler. Likely cause: a stale useCallback closure, or the watchdog being cleared somewhere unintended.
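A minimal TypeScript sketch of the arm/clear/fire contract the watchdog needs. Assumptions: the names DispatchWatchdog, start, complete and the Scheduler interface are hypothetical and do not mirror the real dispatchWatchdogRef/clearDispatchWatchdog code; the timer is injectable so a test can fire it without waiting 60s. The active dispatch id lives in a ref-style box ({ current }) that is read when the timer fires, which is the usual way to sidestep the stale-closure problem suspected above.

```typescript
// Hypothetical timer abstraction so tests can fire timeouts deterministically.
type Handle = unknown;

interface Scheduler {
  setTimer(fn: () => void, ms: number): Handle;
  clearTimer(handle: Handle): void;
}

const realScheduler: Scheduler = {
  setTimer: (fn, ms) => setTimeout(fn, ms),
  clearTimer: (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
};

// Same shape as React's useRef result: read at fire time, never captured stale.
interface Ref<T> {
  current: T;
}

class DispatchWatchdog {
  private handle: Handle | null = null;
  private readonly activeDispatch: Ref<string | null> = { current: null };
  private readonly timeoutMs: number;
  private readonly onTimeout: (dispatchId: string) => void;
  private readonly scheduler: Scheduler;

  constructor(
    timeoutMs: number,
    onTimeout: (dispatchId: string) => void,
    scheduler: Scheduler = realScheduler,
  ) {
    this.timeoutMs = timeoutMs;
    this.onTimeout = onTimeout;
    this.scheduler = scheduler;
  }

  // Call when a dispatch starts. Re-arming cancels any earlier timer.
  start(dispatchId: string): void {
    this.clear();
    this.activeDispatch.current = dispatchId;
    this.handle = this.scheduler.setTimer(() => {
      this.handle = null;
      // Fire only if the dispatch that armed this timer is still the active one.
      if (this.activeDispatch.current === dispatchId) {
        this.activeDispatch.current = null;
        this.onTimeout(dispatchId);
      }
    }, this.timeoutMs);
  }

  // Call on the first terminal event: cards, empty-state, or error.
  complete(dispatchId: string): void {
    if (this.activeDispatch.current === dispatchId) {
      this.activeDispatch.current = null;
      this.clear();
    }
  }

  isArmed(): boolean {
    return this.handle !== null;
  }

  private clear(): void {
    if (this.handle !== null) {
      this.scheduler.clearTimer(this.handle);
      this.handle = null;
    }
  }
}
```

In a React canvas, one instance would sit in a useRef: start() when the dispatch begins, complete() on the first terminal event, and onTimeout renders a timeout error card so the user never has to hard refresh.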
2. Lightweight classes misroute to find_investors (50% failure rate)
Affected: plan_outcome, research_company, travel, summarize_meeting (4 of 8 lightweight).
Despite Wave 22 enumerating all 24 classes verbatim and today's deploy adding the "Research X" disambiguation rule, the classifier still defaults to find_investors for lightweight queries. Hypothesis: the LLM overfits to find_investors because of its prominence in the system prompt and training-data ordering, and the lightweight class descriptions lack distinguishing signal.
Fix: Add hard negative examples to classify.md ("DO NOT route to find_investors if user says plan/summarize/research/trip"). Add few-shot examples per lightweight class. Consider re-ordering the prompt to put lightweight definitions earlier.
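A sketch of the prompt-assembly side of this fix, assuming classify.md can be generated from structured class specs. The ClassSpec shape, buildClassifyPrompt, and every description and example utterance below are hypothetical placeholders, not real audit queries. The sketch puts lightweight definitions first, pairs each few-shot example with a hard negative, and appends the heavy-class section last.

```typescript
// Hypothetical spec for one lightweight class; real definitions live in classify.md.
interface ClassSpec {
  name: string;
  description: string;
  examples: string[]; // utterances that SHOULD route to this class
}

// Placeholder wording for the 4 misrouted lightweight classes.
const LIGHTWEIGHT: ClassSpec[] = [
  { name: "plan_outcome", description: "Plan steps toward a goal.", examples: ["Help me plan our Q3 launch"] },
  { name: "research_company", description: "Research a named company.", examples: ["Research Stripe for me"] },
  { name: "travel", description: "Trips, flights, and itineraries.", examples: ["Book a trip to Lisbon next month"] },
  { name: "summarize_meeting", description: "Summarize a past meeting.", examples: ["Summarize my call with Dana"] },
];

function buildClassifyPrompt(lightweight: ClassSpec[], heavySection: string): string {
  const lines: string[] = ["## Lightweight classes (evaluate these first)"];
  for (const spec of lightweight) {
    lines.push(`- ${spec.name}: ${spec.description}`);
    for (const ex of spec.examples) {
      // Positive few-shot plus an explicit hard negative for the same utterance.
      lines.push(`  "${ex}" -> ${spec.name}`);
      lines.push(`  "${ex}" -> NOT find_investors`);
    }
  }
  lines.push("");
  lines.push("DO NOT route to find_investors if the user says plan/summarize/research/trip.");
  lines.push("");
  // Heavy classes (including find_investors) go last to reduce their prominence.
  lines.push(heavySection);
  return lines.join("\n");
}
```

Generating the prompt this way keeps the ordering and the negatives enforced in one place, so a later wave that adds classes cannot silently push lightweight definitions back below find_investors.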
3. "Say that one more way?" infinite clarification loop
Affected: purchase_real_estate (3x), summarize_meeting.
Classifier returns low confidence → canvas asks for clarification → rephrasing doesn't increase confidence → loops forever. No maximum depth check exists.
Fix: Add max clarification depth (2 attempts). After 2 failures, force-classify to best guess OR show the META tile picker so user manually selects.
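A sketch of the max-depth rule as a pure decision function. MAX_CLARIFICATIONS = 2 comes from the fix above; the confidence threshold parameter and the names nextAction, ClassifierResult, and ClarifyAction are hypothetical. Keeping the rule pure means the loop bound can be unit-tested without driving the canvas.

```typescript
interface ClassifierResult {
  classId: string;
  confidence: number; // 0..1
}

type ClarifyAction =
  | { kind: "dispatch"; classId: string }
  | { kind: "clarify"; attempt: number }
  | { kind: "force_best_guess"; classId: string }
  | { kind: "show_meta_picker" };

// From the fix: stop asking after 2 clarification attempts.
const MAX_CLARIFICATIONS = 2;

function nextAction(
  result: ClassifierResult,
  clarificationsSoFar: number,
  threshold: number, // hypothetical confidence cutoff
  fallback: "force_best_guess" | "show_meta_picker",
): ClarifyAction {
  if (result.confidence >= threshold) {
    return { kind: "dispatch", classId: result.classId };
  }
  if (clarificationsSoFar < MAX_CLARIFICATIONS) {
    return { kind: "clarify", attempt: clarificationsSoFar + 1 };
  }
  // Depth exhausted: never ask "Say that one more way?" a third time.
  return fallback === "force_best_guess"
    ? { kind: "force_best_guess", classId: result.classId }
    : { kind: "show_meta_picker" };
}
```

The caller stores clarificationsSoFar per thread and resets it on dispatch; either fallback ends the loop, and the META picker has the advantage of being already proven in this audit.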
4. find_event_attendees returns wrong entity type (VC funds, not people)
Affected: find_event_attendees only.
The query asked for people attending AI Summit; the results were a16z and Sequoia entities. Either backend ep 8563 is querying the wrong entity index, or the vector embedding places "attendees" semantically close to investor-fund tokens.
Fix: Audit the entity_type filter in the ep 8563 Cypher query. It should return Person nodes with attendance edges, not Company/Fund nodes.
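The real fix belongs in the ep 8563 query, but a client-side guard keeps a wrong-type response from rendering as if it were correct. A sketch, assuming each card carries an entityType string; the ResultCard shape, keepEntityType, and the "Person"/"Fund" values are hypothetical, since this report does not show the ep 8563 payload.

```typescript
// Hypothetical card shape; the real ep 8563 response format is not shown here.
interface ResultCard {
  id: string;
  name: string;
  entityType: string;
}

interface GuardResult {
  kept: ResultCard[];
  dropped: ResultCard[];
}

// Split cards into the expected entity type and everything else.
function keepEntityType(cards: ResultCard[], expected: string): GuardResult {
  const kept: ResultCard[] = [];
  const dropped: ResultCard[] = [];
  for (const card of cards) {
    (card.entityType === expected ? kept : dropped).push(card);
  }
  return { kept, dropped };
}
```

Log dropped.length whenever it is non-zero; a silent filter would disguise this backend bug as one more thin-dataset empty-state.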
5. find_partners interview completes, canvas goes silent
Affected: find_partners only.
The 4-turn interview completed correctly and dispatch was triggered, then nothing rendered. This is the most frustrating UX failure: it rewards user engagement (4 full answers) with complete silence.
Fix: Curl ep 8494 directly (it returned 3 cards earlier today, so the backend works). Trace eventsToSse in the canvas: find_partners events may be dropped if their shape differs from what it expects (direct-named events vs the `tpl` wrapper).
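A sketch of tolerant event parsing, assuming the two shapes are { event, data } (direct-named) and { tpl: { event, data } } (wrapped). Both shapes and the names normalizeEvent/toSseFrames are hypothetical; check them against what ep 8494 actually returns. Unrecognized events are counted instead of vanishing, which is the suspected failure mode in find_partners. Frames follow the standard SSE "event:/data:" line format.

```typescript
// Hypothetical normalized event; real field names may differ.
interface CanvasEvent {
  event: string;
  data: unknown;
}

function isCanvasEvent(x: unknown): x is CanvasEvent {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as { event?: unknown }).event === "string" &&
    "data" in x
  );
}

// Accept both the direct-named shape and the `tpl`-wrapped shape.
function normalizeEvent(raw: unknown): CanvasEvent | null {
  if (isCanvasEvent(raw)) return raw;
  if (typeof raw === "object" && raw !== null && "tpl" in raw) {
    const inner = (raw as { tpl: unknown }).tpl;
    if (isCanvasEvent(inner)) return inner;
  }
  return null;
}

// Format events as SSE frames and count anything that could not be parsed.
function toSseFrames(rawEvents: unknown[]): { frames: string[]; unrecognized: number } {
  const frames: string[] = [];
  let unrecognized = 0;
  for (const raw of rawEvents) {
    const ev = normalizeEvent(raw);
    if (ev === null) {
      unrecognized++;
      continue;
    }
    frames.push(`event: ${ev.event}\ndata: ${JSON.stringify(ev.data ?? null)}\n\n`);
  }
  return { frames, unrecognized };
}
```

A non-zero unrecognized count after a find_partners dispatch would confirm the shape-mismatch hypothesis without further tracing.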
P0 / P1 / P2 priority list
P0-1: Dispatch watchdog NOT triggering; indefinite freeze. Affected: find_investors, find_cofounder, find_collaborators (+ potentially 6 more).
P0-2: find_investors dispatch completes but renders 0 cards. Demo flagship class; returned 18 cards earlier today.
P0-3: "Say that one more way?" infinite loop. Affected: purchase_real_estate, summarize_meeting.
P0-4: Lightweight classes misroute to find_investors. Affected: plan_outcome, research_company, travel, summarize_meeting (4 of 8).
P1-1: find_partners interview completes but the canvas stays silent. Affected: find_partners.
P1-2: find_event_attendees returns VC funds, not people. Affected: find_event_attendees.
P2-1: Thin dataset (graph data gap, Mark territory). Affected: find_deal_flow, find_acquisition, find_journalists, find_advisors, find_customers.
What's actually working
  1. Classifier accuracy on heavy classes: 15/15 routed correctly. All 4 misroutes are in lightweight classes.
  2. Interview UX: MCQ-driven turns and multi-turn flows feel natural. This is the strongest part of the product.
  3. Honest empty-states: 5 classes correctly show "0 matches + suggestion" rather than silent failure or fake data.
  4. find_warm_intros + find_co_investors: Both return real, plausible results with connection data. Investor-adjacent classes are strongest performers.
  5. META tile picker: Fast, correct, good onboarding UX. Shows all 24 workflows with descriptions.
  6. Mark levers dev panel: Working — CLASS, CONFIDENCE, ROUTING REASON visible in real-time.
Audit notes

No code was modified during the audit. 46 screenshots were captured at /tmp/dogfood-100/. The subagent walked the canvas via the live agent-browser daemon (already authed). Total run time: ~43 minutes across 25 classes. 5 of the 14 PARTIAL verdicts are "classifier + first interview question working, but full dispatch not tested": the subagent stopped at the first question to avoid repeatedly hitting the dispatch-freeze pattern. Re-test those 5 after P0-1 (watchdog) is fixed.