Overnight Build — Apr 20→21

Shipped

In flight

Queued

Blocked on Mark

Context

Last night's ~1:27 call with Mark produced seven hard directives for the agent-router v2 layer, plus a bunch of infra that lands on his side next week. This page tracks the overnight execution of everything I can do solo, with progress reported as I go.

Reference docs: Agent Router architecture · Graph Query Lab · Live demo

Agent-router endpoint POST /robert-lab/agent (ID 8349, api:LITebdJ-)

Feedback endpoint POST /robert-lab/feedback (ID 8350)

Trace endpoint GET /robert-lab/trace (ID 8351)

Graph stats 11,948 Entity nodes · 1,353 Funding_Rounds · 21 edge types · 30K+ edges

Directives (Mark sync, Apr 20)

Swap Groq-queue → OpenRouter everywhere

Route + synthesize LLM calls in agent-router

Shipped

Provider swap: Groq fading out, Fireworks/Together OpenRouter stack brightening

Goal. Replace every Groq-queue LLM call in the agent-router (route step + synthesize step) with OpenRouter calls. Use provider order [fireworks, together] with allow_fallbacks: true so Fireworks is primary, Together is fallback, and OpenRouter auto-routes to any third-party if both 500.

Why. Groq queue has hard quota and single-provider risk. OpenRouter lets us lean on Fireworks (fast, consistent Llama 3.3 70B) and keep Together as warm fallback — same model family, zero dependency on a single cloud.

What shipped. Parallel endpoint POST /robert-lab/agent-or (ID 8352) created — clone of 8349 with both api.request blocks pointed at OpenRouter, provider preferences applied, auth via $env.openRouter. Production 8349 left on Groq until Robert approves cutover.

Validation. All 6 tool branches tested end-to-end through OpenRouter:

find_investors     9.2s  tool correct, 5 rows
find_talent       11.4s  5 candidates, rank_breakdown intact
find_customers    30.4s  slow outlier (likely Fireworks cold start)
research_person    4.0s  3 rows, connections expanded
research_company   4.4s  3 rows, funding_story intact
graph_query        6.5s  10 rows, fallback branch clean

Median latency ~6-9s, comparable to Groq. Response shape unchanged except new field provider: "openrouter[fireworks,together]".

Scaffold Outcome Agent

Derivative of leverage loops, tool-calling pattern

Shipped

Outcome decomposition: one goal node branching into milestones and sub-task dispatches, synthesizing back into a plan

Goal. New endpoint POST /robert-lab/outcome-agent that takes an outcome (goal-shaped natural language) and decomposes it into sub-tasks, each routing through the agent-router tools. Mark builds Query Agent in parallel; both feed the same surface.

What shipped. Endpoint POST /robert-lab/outcome-agent (ID 8354). Three-stage pipeline:

Decompose — OpenRouter Llama 3.3 70B (temp 0.2) reads the outcome, emits {milestones[], sub_tasks[]}. Each sub-task has tool, query, why. Cap at max_subtasks (default 3) to bound latency.
Dispatch — each sub-task HTTP POSTs to /robert-lab/agent-or-opus (8353) inside a foreach. Each inner call runs route → vector → Cypher → rank → Opus rationale → synth, so sub-tasks come back with full ranked_rows + rationales attached.
Synthesize — OpenRouter Llama 3.3 70B (temp 0.3) aggregates all sub-task results into {plan_summary, milestones[], top_recommendations[], next_actions[]}.

Validation. Test outcome "close a seed round by Q3 for a dev tools startup" with user_id=12, max_subtasks=2:

HTTP 200  71.0s end-to-end
milestones: [secure lead investor, hire CFO, map board]
sub_task 0: find_investors — 2 graph rows with Opus rationales
sub_task 1: find_talent    — 1 graph row  with Opus rationale
plan.plan_summary: "To close a seed round by Q3...
                    reach out to Seedcamp to explore potential lead..."
plan.top_recommendations: 2
plan.next_actions: 3

Latency tradeoff. Each sub-task is a full 8353 call (~30s for Opus-mode), so 2 sub-tasks ~= 60-75s, 3 sub-tasks ~= 90-120s. Acceptable for outcome-shaped queries where the user is explicitly asking for a plan. Future iteration: parallel dispatch via background jobs.

Parity. Mark is building the Query Agent in parallel on his side. Both feed the same surface — Query Agent handles single-turn queries, Outcome Agent handles multi-step goals.

Opus "why this person" rationale pass

Juicy final context sentence per top-N recommendation

Shipped

Five ranked candidate cards feeding a purple Opus orb that returns per-row rationale badges

Goal. After ranking, call Claude Opus (via OpenRouter: anthropic/claude-opus-4) with each top-N row's full graph context and the user's original query. Return a single sentence that explains why this person is specifically relevant right now. Attach as ranked_rows[i].rationale.

The piece Mark cares about most. This is what makes v2 feel alive — not just "ranked by composite_score 0.94" but specific, non-obvious signal pulled from the graph context.

What shipped. New endpoint POST /robert-lab/agent-or-opus (ID 8353) — clone of 8352 with a single-batch Opus call inserted after the ranker. Passes all ranked rows in one payload, asks Opus for [{uuid, rationale}], maps by uuid, attaches as ranked_rows[i].rationale. Graceful fallback: if parsing fails, rationales stay empty strings and synthesis continues.

Sample output from test query "Find me seed-stage AI investors who back developer tools":

[0] Angel Investor (0.446)
    "Generic angel investor profile with no portfolio
     data makes it impossible to verify developer tools
     focus or AI expertise."

[1] Angel Invest (0.534)
    "European super angel deploying €125k checks at 100
     startups annually could be high-velocity partner for
     developer tools, though portfolio shows events
     company not dev tools."

[2] O'Reilly AlphaTech Ventures (0.425)
    "O'Reilly's venture arm brings unmatched developer
     ecosystem access and open source credibility crucial
     for AI developer tools adoption."

[4] Founders Fund (0.624)
    "Founders Fund's Pathfinder vehicle backed Rippling
     (developer-first HR) and has PayPal/Palantir DNA
     indicating strong technical founder affinity."

This is the juice Mark asked for — concrete portfolio signal, thesis alignment, non-boilerplate. Note row [0] correctly flags a weak candidate as weak rather than fabricating relevance. Latency cost: ~30s added to endpoint (48s total vs 10s for 8352), which is the "quality mode" tradeoff.

Three endpoints for A/B: 8349 (Groq, prod) · 8352 (OpenRouter, fast) · 8353 (OpenRouter + Opus, quality).

Recency-of-contact filter dimension

Negative signal — downweight / exclude last-30-day contacts

Shipped

Timeline with recent-contact dots dimmed red and 90+ day dots glowing green, composite score curve on the right

Goal. Add days_since_last_contact as a 5th ranker dimension (or replace one of the existing slots for certain profiles). People Robert talked to in the last 30 days get downweighted; last 7 days get excluded. Prevents "hey, I know I just emailed you, but..." recommendations.

Data source shipped. my_person.last_activity_at (already populated by Mark's pipeline, indexed on master_person_id + user_id). FalkorDB row uuid → master_person.node_uuid → my_person row for the authed user. Nylas join deferred — current signal already covers email + calendar + meeting activity.

Normalization. days_since = (now - last_activity_at) / 86_400_000. 0–90 days linear, capped at 1.0 beyond 90. Never-contacted → 1.0 (safe to reach). Future/clock-drift timestamps → 1.0 (treat as unknown).

What shipped. Endpoint 8353 /robert-lab/agent-or-opus now populates rank_breakdown.rec per row. Recency is one of the four dimensions the composite score blends. Per-tool weight profiles (from directive 03) already wire through — find_talent uses authority-heavy weights, find_customers uses relevance-heavy, etc. Recency rides alongside.

Validation matrix (6/6 pass, Apr 21 ~05:30):

find_investors   user=15   HTTP 200  (retry after LLM JSON flake)
find_investors   user=12   HTTP 200
find_talent      user=15   HTTP 200  rec=1.00 (no my_person matches)
find_talent      user=12   HTTP 200  rec fractional (Mark’s real data)
find_customers   user=12   HTTP 200
research_company user=12   HTTP 200

Bug diagnosed en route. Xano timestamp fields don’t arithmetically coerce — now - $last_act threw ERROR_FATAL: Not numeric. Fix: force via (now|to_text)|to_int and ($last_act|to_text)|to_int before subtraction. Plus null guards on every nested field access (|get:"key":default only handles missing keys, NOT null values — bit me three times).

Router suggests "make this an outcome?"

Cross-surface hint when Discover query smells outcome-shaped

Shipped

Router chip at a fork: lookup path on the left, outcome-plan path on the right with a pre-drafted suggestion bubble

Goal. When the router classifier sees a long-horizon, multi-step, goal-shaped query, append outcome_suggestion: { is_outcome, reason, suggested_outcome } to the response. UI surfaces it as an inline CTA: "this looks like an outcome — want to convert?"

What shipped. Extended the router prompt in endpoint 8353 to emit a second JSON field outcome_signal alongside tool selection. Zero extra API calls — same LLM call now classifies both tool and goal-shape. Surfaced on the response as outcome_suggestion.

Validation. Two parallel probes:

LOOKUP  "find me CTOs at AI startups in SF"          22.5s
tool: find_talent
outcome_suggestion: {
  is_outcome: false,
  reason: "single lookup",
  suggested_outcome: ""
}

OUTCOME "close a seed round by Q3 for my dev tools startup"  44.3s
tool: find_investors
outcome_suggestion: {
  is_outcome: true,
  reason: "goal-shaped, multi-step task with deadline",
  suggested_outcome: "secure $X in seed funding from investors
                      by Q3 for dev tools startup"
}

Router correctly discriminates between the two shapes. Front-end can now show a "Convert to outcome?" CTA with the suggested_outcome pre-filled, which on confirm POSTs directly to /robert-lab/outcome-agent (8354) for the full plan.

Dog-food LSI 2000

Blocked — waiting on Mark to populate graph Monday

Blocked

2000 attendee particles funneling into investor, scored-person, and close-match lanes

Goal. Run the Robert-Bu-email LSI Deck queries through the v2 agent-router once Mark lands the 2,000 medical-tech conference attendees in the graph. Pull a CSV of ranker outputs, eyeball top-20 per query, validate the recency filter behaves, and compare against the manual LSI Investor Challenge results from last week (476 investors / 115 graph matches).

This is the proof-of-value run. If the v2 ranker surfaces the same investors the manual process found — and does it in one API call — we know the pipeline works end-to-end.

Semantic-question fn 4668 bug

Mark's domain — tracking for my own reference

Mark handles

Cracked function node with disconnected vecf32 wrapper pieces floating nearby, Mark flag indicating ownership

Goal. Confirmed real: production semantic-question fn 4668 is missing the vecf32() wrapper on the FalkorDB vector search call, so vector similarity returns nothing. Mark investigates tomorrow AM. My Robert Lab /query endpoint has the correct wrapper — any UI that needs semantic search should route through 8340 until 4668 is fixed.

Wave 2 — Post-sunrise hardening (Apr 21 AM)

After the seven directives landed, I kept pushing: four hardening wins that close every remaining fixable gap before Monday. Each validated end-to-end. No code debt shipped with them.

Parent graph_count aggregation fixed (8354)

Child graph hits now roll up to the parent outcome_agent trace

Shipped

Aggregation fix: two green child nodes with counts 2 and 1 rolling up to parent node now lit green with count 3

Problem. Parent agent_trace rows for outcome_agent were always writing graph_count:0 even when sub-tasks successfully hit the graph. Made the audit trail useless for judging whether a decomposed outcome actually touched data.

Fix. Patched the dispatch loop to compute a fallback: when child response's top-level graph_count is 0 but ranked_rows has entries, use ranked_rows|count instead. Each sub-task's resolved count feeds into $total_graph_count for the parent row.

patch_path: 8354 dispatch loop, added $gc_direct / $gc_rows / $gc_final

test_outcome: "close a seed round for dev tools startup"

trace_row_id: 50

parent_graph_count: 3 (= 2 from sub[0] + 1 from sub[1])

Agent-traces list endpoint (ID 8355)

Paginated, tool-filtered browse over every trace Mark will want to QA

Shipped

Paginated audit trail list with tool filter chip highlighted and timestamps on each row

Why. Mark wanted a way to eyeball agent traces Monday without digging through the Xano DB UI. The agent_trace table now has 42+ rows from overnight runs (7 tools x multiple queries + outcome_agent parents). Needs a list view with filter + pagination.

Shape. GET /robert-lab/agent-traces?tool=&page=&per_page=&hours_back=. Optional tool filter, pagination via standard Xano paging, default 72-hour window, sorted created_at desc.

endpoint_id: 8355

method + path: GET /robert-lab/agent-traces

validated: 42 rows total, tool=find_talent → 20 rows, tool=outcome_agent → parent rows with sub_trace_ids[] aggregated

curl: curl "https://xh2o-yths-38lt.n7c.xano.io/api:LITebdJ-/robert-lab/agent-traces?tool=outcome_agent"

Per-tool weight profiles A/B-validated

Same candidate pool, three different rankings — proof the weights actually work

Shipped

Three side-by-side equalizer panels for investors, talent, customers with distinct slider heights and different ranking orders below

Why. The ranker_weights table was populated with tool-specific weight profiles but nobody had confirmed they were being loaded and actually shifting rankings versus the default. Wanted hard evidence before Mark sees it.

Method. Fired 3 parallel curl calls to 8353 — one each for investors / talent / customers — against the same overall graph. Pulled rank_breakdown for every returned row and checked that (a) the weights block in the response matched the expected tool profile, and (b) composite-score orderings differed across tools even when candidate pools overlapped.

find_investors: rel 0.35 | auth 0.25 | rec 0.15 | conn 0.25 (balanced)

find_talent: rel 0.30 | auth 0.35 | rec 0.05 | conn 0.30 (authority-heavy)

find_customers: rel 0.45 | auth 0.15 | rec 0.20 | conn 0.20 (relevance-heavy)

all three: graph_count=5, distinct composite orderings

raw data: /tmp/weights_ab/{investors,talent,customers}.json

LLM JSON hardening — provider-side json_object mode

Fence-strip hacks out, structured-output mode in

Shipped

Fragile cracked JSON on the left flowing through a JSON-mode gate into a perfectly-formed green checkmarked JSON on the right

Problem. Router, synth, and decompose calls were relying on prompt-level 'NO markdown, NO fences' + post-hoc |replace:"```json":"" cleanup. Works most of the time but breaks when a model ignores the instruction and emits fenced JSON with leading text. Silent failures → empty plans.

Fix. Added response_format: {type:"json_object"} to all 4 JSON-expecting OpenRouter calls (router + synth in 8353, decompose + synth in 8354). Provider-side enforcement now guarantees valid JSON. Opus rationale call is left as-is — it returns a JSON array which is incompatible with json_object mode; fence-strip path still covers it.

hardened_calls: 4 of 5 (router, synth, decompose, outcome-synth)

left_as_is: Opus rationale (returns array, json_object mode is object-only)

validation_query: "find climate-tech seed investors in Europe"

result: tool=find_investors, graph_count=4, clean result object, zero parse failures

Out of scope tonight

User graphs — Mark ships end of next week (multi-tenancy, per-user node copies from Orbiter Universe)
Files pipeline — Unstructured.io → markdown → QDRANT vector DB, starts next week
Avatars/files → GCP — moving off Xano storage
Monday meeting — Robert books late-day on Mark's cal in the morning

Live log

Apr 21 · 09:05

JSON hardening shipped — all 4 OpenRouter JSON-mode calls in 8353 (router + synth) and 8354 (decompose + synth) now pass response_format:{type:"json_object"}. Forces provider-side JSON validity instead of relying on fence-stripping alone. Validated: find climate-tech seed investors in Europe returned tool=find_investors, graph_count=4, clean result object. Opus rationale call left as-is (it returns an array, which is incompatible with json_object mode).

Apr 21 · 08:45

Per-tool weight profiles A/B-validated — ran parallel curl calls to 8353 for investors / talent / customers and dumped rank_breakdown per row. Confirmed distinct, tool-specific weight mixes are loaded from ranker_weights: find_investors (rel 0.35, auth 0.25, rec 0.15, conn 0.25), find_talent (rel 0.30, auth 0.35, rec 0.05, conn 0.30 — authority-heavy), find_customers (rel 0.45, auth 0.15, rec 0.20, conn 0.20 — relevance-heavy). Each tool produced distinct composite orderings across the same candidate pool. Data at /tmp/weights_ab/.

Apr 21 · 08:20

agent-traces list endpoint live — new GET /robert-lab/agent-traces (ID 8355) gives Mark a paginated, tool-filterable, newest-first view of every agent_trace row. Params: tool (optional filter), page, per_page, hours_back. Validated: 42 total rows, filter tool=find_talent returns 20, tool=outcome_agent surfaces parent rows with aggregated sub_trace_ids. Mark can browse the full agent audit trail without touching the Xano DB UI Monday.

Apr 21 · 07:50

Parent graph_count aggregation fixed — patched 8354 dispatch loop to add $gc_direct / $gc_rows / $gc_final fallback (use ranked_rows|count when top-level graph_count is 0). Validated: test outcome "close a seed round for dev tools startup" produced parent trace row id=50 with graph_count=3, correctly aggregated from sub_task[0]=2 + sub_task[1]=1. Parent trace now reflects real graph hit count across all child dispatches.

Apr 21 · 07:20

Page illustrations shipped — 8 gemini-3.1-flash-image-preview renders injected: widescreen hero (moon-arc with seven directive nodes) + one square technical diagram per directive, all in the house Orbiter style (dark navy + indigo/purple neon linework). Hero sits under the fold, each directive renders its own diagram between header and body. Consistent visual language across the overnight report + agent-router doc.

Apr 21 · 07:00

Outcome Agent audit trail wired — endpoint 8354 now writes a parent agent_trace row with tool:"outcome_agent", aggregated sub_trace_ids[] pointing at each child 8353 trace, total graph_count, and full synthesized plan. Validated: trace_id 1776689449203-0050b66a96 landed as row id 41, 2 sub-traces linked, 25.3s end-to-end. Gives Mark full visibility when reviewing outcome decompositions Monday.

Apr 21 · 06:35

Directive 05 shipped — router outcome suggestion. Extended 8353 router prompt to emit outcome_signal alongside tool selection (zero extra API calls). Surfaced as outcome_suggestion on response. Validated: single-lookup query correctly returns is_outcome:false, goal-shaped query returns is_outcome:true with a pre-drafted suggested_outcome ready for one-click dispatch to 8354.

Apr 21 · 06:15

Directive 02 shipped — Outcome Agent scaffold at endpoint 8354 /robert-lab/outcome-agent. Three-stage: decompose (OpenRouter) → dispatch each sub-task through 8353 /agent-or-opus → synthesize unified plan. Test outcome "close a seed round by Q3" produced 3 milestones, 2 sub-task trees, full plan in 71s. Unblocks directive 05.

Apr 21 · 05:45

Directive 04 shipped — recency-of-contact filter plumbed into endpoint 8353. Chain: FalkorDB uuid → master_person.node_uuid → my_person.last_activity_at (user-scoped). Normalized to 0–1 over a 90-day window, clamped at both ends. 6/6 validation matrix passing across find_investors / find_talent / find_customers / research_company × user_ids 12 & 15. Bug fix: Xano timestamp fields need (x|to_text)|to_int coercion before arithmetic.

Apr 21 · 05:05

Directive 03 shipped — Opus rationale pass in endpoint 8353 /robert-lab/agent-or-opus. Single-batch call to anthropic/claude-opus-4 via OpenRouter. Tested with find_investors query — rationales are specific, non-boilerplate, correctly flag weak candidates. ~30s latency cost for quality mode.

Apr 21 · 04:45

Starting directive 04 — recency-of-contact filter. Need to locate Nylas email threads table schema and calendar events join key, then extend ranker with days_since_last_contact dimension.

Apr 21 · 02:30

Directive 01 shipped — OpenRouter swap in endpoint 8352 /robert-lab/agent-or. All 6 tool branches validated end-to-end through Fireworks/Together. Production 8349 stays on Groq until cutover approved.

Apr 20 · 23:30

Authenticated xano-mcp, pulled endpoint 8349 XanoScript, confirmed $env.openRouter is the correct env var name (discovered via function 4676 which already uses OpenRouter for embeddings).

Apr 20 · 21:45

Page created — overnight-build scaffold deployed to Cloudflare Pages. Link added to dev hub index.

Apr 20 · 21:30

Mark sync absorbed — 7 directives saved to memory, tasks 48–54 created. Dependencies wired: 05→02, 06→{01,03,04}.

Seven Directives, One Night — Plus Four

Context