Living progress page for the Mark-sync directives and the Wave-2 hardening that followed. Seven overnight directives (5 shipped, 1 queued for Monday, 1 Mark-owned) plus four post-sunrise hardening wins — graph_count aggregation, agent-traces list endpoint, per-tool weight A/B, and provider-side JSON mode. Every card below has its own diagram, validation evidence, and curl proof.
Last night's ~1:27 call with Mark produced seven hard directives for the agent-router v2 layer, plus a bunch of infra that lands on his side next week. This page tracks the overnight execution of everything I can do solo, with progress reported as I go.
Reference docs: Agent Router architecture · Graph Query Lab · Live demo
POST /robert-lab/agent (ID 8349, api:LITebdJ-)POST /robert-lab/feedback (ID 8350)GET /robert-lab/trace (ID 8351)
Goal. Replace every Groq-queue LLM call in the agent-router (route step + synthesize step) with OpenRouter calls. Use provider order [fireworks, together] with allow_fallbacks: true so Fireworks is primary, Together is fallback, and OpenRouter auto-routes to any third-party if both 500.
Why. Groq queue has hard quota and single-provider risk. OpenRouter lets us lean on Fireworks (fast, consistent Llama 3.3 70B) and keep Together as warm fallback — same model family, zero dependency on a single cloud.
What shipped. Parallel endpoint POST /robert-lab/agent-or (ID 8352) created — clone of 8349 with both api.request blocks pointed at OpenRouter, provider preferences applied, auth via $env.openRouter. Production 8349 left on Groq until Robert approves cutover.
Validation. All 6 tool branches tested end-to-end through OpenRouter:
find_investors 9.2s tool correct, 5 rows find_talent 11.4s 5 candidates, rank_breakdown intact find_customers 30.4s slow outlier (likely Fireworks cold start) research_person 4.0s 3 rows, connections expanded research_company 4.4s 3 rows, funding_story intact graph_query 6.5s 10 rows, fallback branch clean
Median latency ~6-9s, comparable to Groq. Response shape unchanged except new field provider: "openrouter[fireworks,together]".

Goal. New endpoint POST /robert-lab/outcome-agent that takes an outcome (goal-shaped natural language) and decomposes it into sub-tasks, each routing through the agent-router tools. Mark builds Query Agent in parallel; both feed the same surface.
What shipped. Endpoint POST /robert-lab/outcome-agent (ID 8354). Three-stage pipeline:
{milestones[], sub_tasks[]}. Each sub-task has tool, query, why. Cap at max_subtasks (default 3) to bound latency./robert-lab/agent-or-opus (8353) inside a foreach. Each inner call runs route → vector → Cypher → rank → Opus rationale → synth, so sub-tasks come back with full ranked_rows + rationales attached.{plan_summary, milestones[], top_recommendations[], next_actions[]}.Validation. Test outcome "close a seed round by Q3 for a dev tools startup" with user_id=12, max_subtasks=2:
HTTP 200 71.0s end-to-end
milestones: [secure lead investor, hire CFO, map board]
sub_task 0: find_investors — 2 graph rows with Opus rationales
sub_task 1: find_talent — 1 graph row with Opus rationale
plan.plan_summary: "To close a seed round by Q3...
reach out to Seedcamp to explore potential lead..."
plan.top_recommendations: 2
plan.next_actions: 3
Latency tradeoff. Each sub-task is a full 8353 call (~30s for Opus-mode), so 2 sub-tasks ~= 60-75s, 3 sub-tasks ~= 90-120s. Acceptable for outcome-shaped queries where the user is explicitly asking for a plan. Future iteration: parallel dispatch via background jobs.
Parity. Mark is building the Query Agent in parallel on his side. Both feed the same surface — Query Agent handles single-turn queries, Outcome Agent handles multi-step goals.

Goal. After ranking, call Claude Opus (via OpenRouter: anthropic/claude-opus-4) with each top-N row's full graph context and the user's original query. Return a single sentence that explains why this person is specifically relevant right now. Attach as ranked_rows[i].rationale.
The piece Mark cares about most. This is what makes v2 feel alive — not just "ranked by composite_score 0.94" but specific, non-obvious signal pulled from the graph context.
What shipped. New endpoint POST /robert-lab/agent-or-opus (ID 8353) — clone of 8352 with a single-batch Opus call inserted after the ranker. Passes all ranked rows in one payload, asks Opus for [{uuid, rationale}], maps by uuid, attaches as ranked_rows[i].rationale. Graceful fallback: if parsing fails, rationales stay empty strings and synthesis continues.
Sample output from test query "Find me seed-stage AI investors who back developer tools":
[0] Angel Investor (0.446)
"Generic angel investor profile with no portfolio
data makes it impossible to verify developer tools
focus or AI expertise."
[1] Angel Invest (0.534)
"European super angel deploying €125k checks at 100
startups annually could be high-velocity partner for
developer tools, though portfolio shows events
company not dev tools."
[2] O'Reilly AlphaTech Ventures (0.425)
"O'Reilly's venture arm brings unmatched developer
ecosystem access and open source credibility crucial
for AI developer tools adoption."
[4] Founders Fund (0.624)
"Founders Fund's Pathfinder vehicle backed Rippling
(developer-first HR) and has PayPal/Palantir DNA
indicating strong technical founder affinity."
This is the juice Mark asked for — concrete portfolio signal, thesis alignment, non-boilerplate. Note row [0] correctly flags a weak candidate as weak rather than fabricating relevance. Latency cost: ~30s added to endpoint (48s total vs 10s for 8352), which is the "quality mode" tradeoff.
Three endpoints for A/B: 8349 (Groq, prod) · 8352 (OpenRouter, fast) · 8353 (OpenRouter + Opus, quality).

Goal. Add days_since_last_contact as a 5th ranker dimension (or replace one of the existing slots for certain profiles). People Robert talked to in the last 30 days get downweighted; last 7 days get excluded. Prevents "hey, I know I just emailed you, but..." recommendations.
Data source shipped. my_person.last_activity_at (already populated by Mark's pipeline, indexed on master_person_id + user_id). FalkorDB row uuid → master_person.node_uuid → my_person row for the authed user. Nylas join deferred — current signal already covers email + calendar + meeting activity.
Normalization. days_since = (now - last_activity_at) / 86_400_000. 0–90 days linear, capped at 1.0 beyond 90. Never-contacted → 1.0 (safe to reach). Future/clock-drift timestamps → 1.0 (treat as unknown).
What shipped. Endpoint 8353 /robert-lab/agent-or-opus now populates rank_breakdown.rec per row. Recency is one of the four dimensions the composite score blends. Per-tool weight profiles (from directive 03) already wire through — find_talent uses authority-heavy weights, find_customers uses relevance-heavy, etc. Recency rides alongside.
Validation matrix (6/6 pass, Apr 21 ~05:30):
find_investors user=15 HTTP 200 (retry after LLM JSON flake) find_investors user=12 HTTP 200 find_talent user=15 HTTP 200 rec=1.00 (no my_person matches) find_talent user=12 HTTP 200 rec fractional (Mark’s real data) find_customers user=12 HTTP 200 research_company user=12 HTTP 200
Bug diagnosed en route. Xano timestamp fields don’t arithmetically coerce — now - $last_act threw ERROR_FATAL: Not numeric. Fix: force via (now|to_text)|to_int and ($last_act|to_text)|to_int before subtraction. Plus null guards on every nested field access (|get:"key":default only handles missing keys, NOT null values — bit me three times).

Goal. When the router classifier sees a long-horizon, multi-step, goal-shaped query, append outcome_suggestion: { is_outcome, reason, suggested_outcome } to the response. UI surfaces it as an inline CTA: "this looks like an outcome — want to convert?"
What shipped. Extended the router prompt in endpoint 8353 to emit a second JSON field outcome_signal alongside tool selection. Zero extra API calls — same LLM call now classifies both tool and goal-shape. Surfaced on the response as outcome_suggestion.
Validation. Two parallel probes:
LOOKUP "find me CTOs at AI startups in SF" 22.5s
tool: find_talent
outcome_suggestion: {
is_outcome: false,
reason: "single lookup",
suggested_outcome: ""
}
OUTCOME "close a seed round by Q3 for my dev tools startup" 44.3s
tool: find_investors
outcome_suggestion: {
is_outcome: true,
reason: "goal-shaped, multi-step task with deadline",
suggested_outcome: "secure $X in seed funding from investors
by Q3 for dev tools startup"
}
Router correctly discriminates between the two shapes. Front-end can now show a "Convert to outcome?" CTA with the suggested_outcome pre-filled, which on confirm POSTs directly to /robert-lab/outcome-agent (8354) for the full plan.

Goal. Run the Robert-Bu-email LSI Deck queries through the v2 agent-router once Mark lands the 2,000 medical-tech conference attendees in the graph. Pull a CSV of ranker outputs, eyeball top-20 per query, validate the recency filter behaves, and compare against the manual LSI Investor Challenge results from last week (476 investors / 115 graph matches).
This is the proof-of-value run. If the v2 ranker surfaces the same investors the manual process found — and does it in one API call — we know the pipeline works end-to-end.

Goal. Confirmed real: production semantic-question fn 4668 is missing the vecf32() wrapper on the FalkorDB vector search call, so vector similarity returns nothing. Mark investigates tomorrow AM. My Robert Lab /query endpoint has the correct wrapper — any UI that needs semantic search should route through 8340 until 4668 is fixed.
After the seven directives landed, I kept pushing: four hardening wins that close every remaining fixable gap before Monday. Each validated end-to-end. No code debt shipped with them.

Problem. Parent agent_trace rows for outcome_agent were always writing graph_count:0 even when sub-tasks successfully hit the graph. Made the audit trail useless for judging whether a decomposed outcome actually touched data.
Fix. Patched the dispatch loop to compute a fallback: when child response's top-level graph_count is 0 but ranked_rows has entries, use ranked_rows|count instead. Each sub-task's resolved count feeds into $total_graph_count for the parent row.

Why. Mark wanted a way to eyeball agent traces Monday without digging through the Xano DB UI. The agent_trace table now has 42+ rows from overnight runs (7 tools x multiple queries + outcome_agent parents). Needs a list view with filter + pagination.
Shape. GET /robert-lab/agent-traces?tool=&page=&per_page=&hours_back=. Optional tool filter, pagination via standard Xano paging, default 72-hour window, sorted created_at desc.

Why. The ranker_weights table was populated with tool-specific weight profiles but nobody had confirmed they were being loaded and actually shifting rankings versus the default. Wanted hard evidence before Mark sees it.
Method. Fired 3 parallel curl calls to 8353 — one each for investors / talent / customers — against the same overall graph. Pulled rank_breakdown for every returned row and checked that (a) the weights block in the response matched the expected tool profile, and (b) composite-score orderings differed across tools even when candidate pools overlapped.

Problem. Router, synth, and decompose calls were relying on prompt-level 'NO markdown, NO fences' + post-hoc |replace:"```json":"" cleanup. Works most of the time but breaks when a model ignores the instruction and emits fenced JSON with leading text. Silent failures → empty plans.
Fix. Added response_format: {type:"json_object"} to all 4 JSON-expecting OpenRouter calls (router + synth in 8353, decompose + synth in 8354). Provider-side enforcement now guarantees valid JSON. Opus rationale call is left as-is — it returns a JSON array which is incompatible with json_object mode; fence-strip path still covers it.
response_format:{type:"json_object"}. Forces provider-side JSON validity instead of relying on fence-stripping alone. Validated: find climate-tech seed investors in Europe returned tool=find_investors, graph_count=4, clean result object. Opus rationale call left as-is (it returns an array, which is incompatible with json_object mode).rank_breakdown per row. Confirmed distinct, tool-specific weight mixes are loaded from ranker_weights: find_investors (rel 0.35, auth 0.25, rec 0.15, conn 0.25), find_talent (rel 0.30, auth 0.35, rec 0.05, conn 0.30 — authority-heavy), find_customers (rel 0.45, auth 0.15, rec 0.20, conn 0.20 — relevance-heavy). Each tool produced distinct composite orderings across the same candidate pool. Data at /tmp/weights_ab/./robert-lab/agent-traces (ID 8355) gives Mark a paginated, tool-filterable, newest-first view of every agent_trace row. Params: tool (optional filter), page, per_page, hours_back. Validated: 42 total rows, filter tool=find_talent returns 20, tool=outcome_agent surfaces parent rows with aggregated sub_trace_ids. Mark can browse the full agent audit trail without touching the Xano DB UI Monday.$gc_direct / $gc_rows / $gc_final fallback (use ranked_rows|count when top-level graph_count is 0). Validated: test outcome "close a seed round for dev tools startup" produced parent trace row id=50 with graph_count=3, correctly aggregated from sub_task[0]=2 + sub_task[1]=1. Parent trace now reflects real graph hit count across all child dispatches.agent_trace row with tool:"outcome_agent", aggregated sub_trace_ids[] pointing at each child 8353 trace, total graph_count, and full synthesized plan. Validated: trace_id 1776689449203-0050b66a96 landed as row id 41, 2 sub-traces linked, 25.3s end-to-end. Gives Mark full visibility when reviewing outcome decompositions Monday.outcome_signal alongside tool selection (zero extra API calls). Surfaced as outcome_suggestion on response. Validated: single-lookup query correctly returns is_outcome:false, goal-shaped query returns is_outcome:true with a pre-drafted suggested_outcome ready for one-click dispatch to 8354./robert-lab/outcome-agent. Three-stage: decompose (OpenRouter) → dispatch each sub-task through 8353 /agent-or-opus → synthesize unified plan. Test outcome "close a seed round by Q3" produced 3 milestones, 2 sub-task trees, full plan in 71s. Unblocks directive 05.uuid → master_person.node_uuid → my_person.last_activity_at (user-scoped). Normalized to 0–1 over a 90-day window, clamped at both ends. 6/6 validation matrix passing across find_investors / find_talent / find_customers / research_company × user_ids 12 & 15. Bug fix: Xano timestamp fields need (x|to_text)|to_int coercion before arithmetic./robert-lab/agent-or-opus. Single-batch call to anthropic/claude-opus-4 via OpenRouter. Tested with find_investors query — rationales are specific, non-boilerplate, correctly flag weak candidates. ~30s latency cost for quality mode.days_since_last_contact dimension./robert-lab/agent-or. All 6 tool branches validated end-to-end through Fireworks/Together. Production 8349 stays on Groq until cutover approved.$env.openRouter is the correct env var name (discovered via function 4676 which already uses OpenRouter for embeddings).