Anything Engine — 100% Audit Findings (2026-05-15)

Verdict counts

Verdict	Count
PASS	3
PARTIAL	13
FAIL	9

PASS: find_warm_intros (4 real cards w/ connection paths), find_co_investors (6+ real investor cards), META (24 workflow tile picker). FAIL: find_investors, find_cofounder, find_collaborators, plan_outcome, research_company, travel, summarize_meeting, purchase_real_estate, find_partners.

Master table — 25 outcome classes

Class	Classifier	Cards	Verdict	Notes
find_investors	CORRECT 95%	0 (8+ min)	FAIL	Spinner stopped, 0 cards. 60s watchdog did NOT trigger.
find_deal_flow	CORRECT 92%	0 honest	PARTIAL	Empty-state correct for thin dataset.
find_acquisition	CORRECT 88%	0 honest	PARTIAL	Honest empty-state.
find_journalists	CORRECT 90%	0 honest	PARTIAL	ep 8496 threshold relaxed; dataset still thin.
find_warm_intros	CORRECT 87%	4 real	PASS	Best in audit. Real connection paths.
find_event_attendees	CORRECT 83%	~6 WRONG TYPE	PARTIAL	Returned VC funds (a16z, Sequoia) instead of people. Entity-type bug.
find_advisors	CORRECT 91%	0 honest	PARTIAL	ep 8493 applied; dataset still 0.
find_co_investors	CORRECT 85%	6+ real	PASS	Real names, correct stage alignment.
summarize_meeting	WRONG → find_investors	0 / loop	FAIL	Lightweight class misroute. Retry → infinite "Say that one more way?" loop.
find_partners	CORRECT 89%	0 silence	FAIL	4-turn interview completed. Canvas went silent. Critical wire-up failure.
find_talent	CORRECT 93%	0 honest	PARTIAL	Interview UX correct. Dataset gap.
find_customers	CORRECT 90%	1 real	PARTIAL	ep 8495 fix worked (no leakage). Only 1 result.
plan_outcome	WRONG → find_investors	0	FAIL	Lightweight class misroute.
research_company	WRONG → find_investors	0	FAIL	Misroute (after Wave 28 disambiguation rule was deployed; rule needs sharpening).
travel	WRONG → find_investors	0	FAIL	Brief travel·78% flash then settled to find_investors. Also corrupted prior thread.
purchase_real_estate	CORRECT 81%	0 / loop	FAIL	Trapped in clarification loop 3x. No max-depth.
find_cofounder	CORRECT 88%	0 (114s)	FAIL	Dispatch hung 114s. Watchdog did NOT trigger.
find_collaborators	CORRECT 86%	0 (67s)	FAIL	Dispatch freeze. Watchdog absent.
find_speakers	CORRECT 84%	not tested	PARTIAL	Routing + first interview question working.
find_job	CORRECT 87%	not tested	PARTIAL	First question working.
get_advice	CORRECT 82%	not tested	PARTIAL	First question working.
make_purchase	CORRECT 85%	not tested	PARTIAL	First question working.
research_topic	CORRECT 80%	not tested	PARTIAL	First question working.
talent_agent_requests	CORRECT	not tested	PARTIAL	First question + options rendered correctly.
META	n/a	24 tiles	PASS	Fast, correct, excellent onboarding UX.

Pattern analysis — 5 systemic bugs

1. Dispatch watchdog NOT triggering — 60s never fires

Affected: find_investors (8+ min), find_cofounder (114s), find_collaborators (67s), research_person (78s) — and potentially every untested dispatching class.

CLAUDE.md documents a "60s dispatch watchdog" as recently shipped (commit df46fd3b). In this audit, no dispatch that froze was ever recovered by the watchdog. The canvas renders a scanning spinner that never resolves. User has no recovery path — no error card, no empty-state, no timeout message. Only escape is hard refresh, which loses the conversation.

Fix: Verify that dispatchWatchdogRef is actually armed when a dispatch starts AND that clearDispatchWatchdog is reachable from the timeout handler. Likely causes: a stale useCallback closure, or the watchdog being cleared somewhere unintended.
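The arm/clear contract this fix describes can be sketched outside React as a small timer wrapper. This is a minimal sketch under stated assumptions, not the canvas implementation: `createDispatchWatchdog`, `Schedule`, and the injectable scheduler are hypothetical names introduced here so the behavior can be exercised without real timers. Only the 60s timeout comes from the audit.

```typescript
// Hypothetical sketch: none of these names come from the canvas codebase.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by real timers; tests can inject a fake one.
const realSchedule: Schedule = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

function createDispatchWatchdog(
  timeoutMs: number,
  onTimeout: (dispatchId: string) => void,
  schedule: Schedule = realSchedule,
) {
  let cancel: Cancel | null = null;

  // Call only when the dispatch reaches a terminal state (cards, empty-state, error).
  const clear = (): void => {
    if (cancel) {
      cancel();
      cancel = null;
    }
  };

  // Call at dispatch start. Re-arming replaces any timer that is still pending.
  const start = (dispatchId: string): void => {
    clear();
    cancel = schedule(() => {
      cancel = null;
      onTimeout(dispatchId); // e.g. render an error card instead of an endless spinner
    }, timeoutMs);
  };

  const isArmed = (): boolean => cancel !== null;

  return { start, clear, isArmed };
}

// Assumed wiring: 60s timeout, handler surfaces a recoverable error state.
const watchdog = createDispatchWatchdog(60_000, (id) => {
  console.warn(`dispatch ${id} timed out`);
});
```

In a component, an object like this would typically live in a ref so that `start` and `clear` act on the same timer no matter which render's closure calls them. A useful check while debugging is to log `isArmed()` immediately after dispatch start and again on every code path that calls `clear`.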

2. Lightweight classes misroute to find_investors (50% failure rate)

Affected: plan_outcome, research_company, travel, summarize_meeting (4 of 8 lightweight).

Despite Wave 22 enumerating all 24 classes verbatim and today's deploy adding "Research X" disambiguation, classifier still defaults to find_investors for lightweight queries. Hypothesis: LLM overfit to find_investors due to its prominence in system prompt + training data ordering. Lightweight class descriptions lack distinguishing signal.

Fix: Add hard negative examples to classify.md ("DO NOT route to find_investors if user says plan/summarize/research/trip"). Add few-shot examples per lightweight class. Consider re-ordering the prompt to put lightweight definitions earlier.
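One way to keep the few-shot examples and hard negatives in sync is to generate that section of the prompt from a typed fixture. The sketch below is illustrative only: the utterances, the rendered line format, and the names `FewShot`, `LIGHTWEIGHT_EXAMPLES`, and `renderFewShotSection` are assumptions, not the contents of classify.md.

```typescript
// Illustrative only: utterances and output format are assumptions,
// not taken from classify.md.
type FewShot = { utterance: string; expected: string };

const LIGHTWEIGHT_EXAMPLES: FewShot[] = [
  { utterance: "Summarize my meeting notes from today", expected: "summarize_meeting" },
  { utterance: "Research Acme Corp for me", expected: "research_company" },
  { utterance: "Plan my trip to Lisbon next month", expected: "travel" },
  { utterance: "Help me plan how to reach my launch goal", expected: "plan_outcome" },
];

// Renders each example as a positive label plus an explicit hard negative
// against the class the classifier currently collapses to.
function renderFewShotSection(examples: FewShot[], trapClass: string): string {
  return examples
    .map((ex) => `User: "${ex.utterance}" -> ${ex.expected} (NOT ${trapClass})`)
    .join("\n");
}

const fewShotSection = renderFewShotSection(LIGHTWEIGHT_EXAMPLES, "find_investors");
```

The same fixture can double as a regression set: run each utterance through the classifier after a prompt change and fail the check if any still routes to find_investors.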

3. "Say that one more way?" infinite clarification loop

Affected: purchase_real_estate (3x), summarize_meeting.

Classifier returns low confidence → canvas asks for clarification → rephrasing doesn't increase confidence → loops forever. No maximum depth check exists.

Fix: Add max clarification depth (2 attempts). After 2 failures, force-classify to best guess OR show the META tile picker so user manually selects.
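The depth cap can be expressed as a small pure function that decides the next step from the classifier result and the number of clarifications already asked. This is a sketch under assumptions: the type names, the `nextStep` function, and the 0.75 confidence threshold are hypothetical. Only the limit of 2 attempts and the tile-picker fallback come from the fix above.

```typescript
// Hypothetical sketch: types, names, and threshold are assumptions.
type Classification = { outcomeClass: string; confidence: number };

type NextStep =
  | { kind: "dispatch"; outcomeClass: string }
  | { kind: "clarify"; attempt: number }
  | { kind: "show_tile_picker" };

const MAX_CLARIFICATIONS = 2; // from the proposed fix
const CONFIDENCE_THRESHOLD = 0.75; // assumed value; the audit does not state the real one

function nextStep(result: Classification, clarificationsSoFar: number): NextStep {
  // Confident enough: proceed with the classified outcome.
  if (result.confidence >= CONFIDENCE_THRESHOLD) {
    return { kind: "dispatch", outcomeClass: result.outcomeClass };
  }
  // Still under the cap: ask "Say that one more way?" again.
  if (clarificationsSoFar < MAX_CLARIFICATIONS) {
    return { kind: "clarify", attempt: clarificationsSoFar + 1 };
  }
  // Cap reached: stop looping and let the user pick a workflow manually.
  return { kind: "show_tile_picker" };
}
```

Swapping the final branch for a forced dispatch to the best-guess class would implement the other option named in the fix. The tile picker is used here because it never commits the user to a low-confidence route.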

4. find_event_attendees returns wrong entity type (VC funds, not people)

Affected: find_event_attendees only.

Query asked for people attending AI Summit. Returned a16z + Sequoia entities. Backend ep 8563 is querying the wrong entity index OR vector embedding maps "attendees" semantically close to investor-fund tokens.

Fix: Audit ep 8563 cypher entity_type filter. Should return Person nodes with attendance edges, not Company/Fund nodes.
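Until the query-side filter is confirmed, a result-side guard can catch the same bug before cards render. This complements the Cypher fix and does not replace it. The `Entity` shape, its `entityType` field, and `splitByExpectedType` are hypothetical. The real response schema of ep 8563 is not shown in this audit.

```typescript
// Hypothetical sketch: the entity shape is an assumption, not the ep 8563 schema.
type EntityType = "Person" | "Company" | "Fund";
type Entity = { id: string; entityType: EntityType; name: string };

// Splits results into those matching the expected type and those that do not,
// so mismatches can be logged instead of being rendered as cards.
function splitByExpectedType(results: Entity[], expected: EntityType) {
  const accepted = results.filter((e) => e.entityType === expected);
  const rejected = results.filter((e) => e.entityType !== expected);
  return { accepted, rejected };
}

// Example input mirroring the audit finding: funds returned for a people query.
const sampleResults: Entity[] = [
  { id: "1", entityType: "Fund", name: "a16z" },
  { id: "2", entityType: "Fund", name: "Sequoia" },
  { id: "3", entityType: "Person", name: "Example Attendee" }, // placeholder person
];

const attendeeSplit = splitByExpectedType(sampleResults, "Person");
```

A non-empty `rejected` list for find_event_attendees is a direct signal that the backend query is reading the wrong index or omitting the entity-type filter.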

5. find_partners interview completes, canvas goes silent

Affected: find_partners only.

4-turn interview completed correctly, dispatch was triggered, then nothing renders. Most frustrating UX failure — rewards user engagement (4 full answers) with complete silence.

Fix: Curl ep 8494 directly (returned 3 cards earlier today, so backend works). Trace eventsToSse in canvas — find_partners events may be dropped if shape differs from expected (direct-named events vs `tpl` wrapper).
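If the shape mismatch is confirmed, one option is to normalize both event shapes into a single form before they reach the renderer. The sketch below is hypothetical throughout: the audit does not show the real event payloads, so the `RawEvent` shape, the assumed layout of the `tpl` wrapper (`data.name` and `data.payload`), and `normalizeEvent` are assumptions for illustration, not the behavior of eventsToSse.

```typescript
// Hypothetical sketch: both event shapes are assumed, not taken from the canvas code.
type RawEvent = { event: string; data: unknown };
type NormalizedEvent = { name: string; payload: unknown };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts either a wrapped event ({ event: "tpl", data: { name, payload } })
// or a direct-named event ({ event: "<name>", data: <payload> }).
function normalizeEvent(raw: RawEvent): NormalizedEvent {
  const data = raw.data;
  if (raw.event === "tpl" && isRecord(data)) {
    const name = data["name"];
    if (typeof name === "string") {
      return { name, payload: data["payload"] };
    }
  }
  // Direct-named event: the event name itself identifies the card type.
  return { name: raw.event, payload: raw.data };
}
```

With a single normalized shape, any event name the renderer does not recognize can be logged rather than silently dropped, which would make a failure like this one visible in the dev panel.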

P0 / P1 priority list

ID	Issue	Affected / notes
P0-1	Dispatch watchdog NOT triggering: indefinite freeze	find_investors, find_cofounder, find_collaborators (+ potentially 6 more)
P0-2	find_investors dispatch completes but renders 0 cards	Demo flagship class; returned 18 cards earlier today
P0-3	"Say that one more way?" infinite loop	purchase_real_estate, summarize_meeting
P0-4	Lightweight class misrouting to find_investors	plan_outcome, research_company, travel, summarize_meeting (4 of 8)
P1-1	find_partners: interview completes but canvas silent	find_partners
P1-2	find_event_attendees returns VC funds, not people	find_event_attendees
P2-1	Dataset thin (graph data gap, Mark territory)	find_deal_flow, find_acquisition, find_journalists, find_advisors, find_customers

What's actually working

Classifier accuracy on heavy classes: 15/15 routed correctly. The 4 misroutes are concentrated in lightweight classes.
Interview UX: MCQ-driven turns, multi-turn flows feel natural. Strongest part of the product.
Honest empty-states: 5 classes correctly show "0 matches + suggestion" rather than silent failure or fake data.
find_warm_intros + find_co_investors: Both return real, plausible results with connection data. Investor-adjacent classes are strongest performers.
META tile picker: Fast, correct, good onboarding UX. Shows all 24 workflows with descriptions.
Mark levers dev panel: Working — CLASS, CONFIDENCE, ROUTING REASON visible in real-time.

Audit notes

No code modified during the audit. 46 screenshots captured at /tmp/dogfood-100/. Subagent walked the canvas via the live agent-browser daemon (already authed). Total run time: ~43 minutes across 25 classes. 6 of the 13 PARTIAL verdicts are "classifier + first interview question working but full dispatch not tested": the subagent capped those runs to avoid hitting the dispatch-freeze pattern repeatedly. Re-test those 6 after P0-1 (watchdog) is fixed.
