Sandbox Pipeline · Reference Implementation

Find Investors
E2E Pipeline

The reference implementation of the Anything Engine. Classify → Embed → Graph Query → Synthesize → Crayon card stream. Live on the sandbox today.

May 1, 2026 · orbiter-sandbox.vercel.app · 4 of 14 tools live
Status: E2E Working · 6 Vector Dims · SSE Streaming · T2 Zep mem_used
Live demo: orbiter-sandbox.vercel.app — Type any investor-search query and watch the full pipeline run. The classify, embed, FalkorDB query, and Groq synthesis steps all execute in sequence and stream contact cards back via SSE.
Pipeline — Step by Step
1. Classify Intent
The user types a natural-language query. The Next.js route POSTs to Xano /classify (endpoint 8400). Groq Llama 3.3 70B classifies the intent in <300ms and returns {class, confidence, reasoning}. If confidence < 0.75, the pipeline surfaces a confirmation card before routing.
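The confidence gate in this step can be sketched as a small routing function. This is a minimal sketch, assuming only the {class, confidence, reasoning} response shape described here; `routeClassification`, the `RoutingDecision` type, and the `dispatch`/`confirm` action names are hypothetical and not taken from the sandbox code.

```typescript
// Shape of the /classify response as described in this step.
interface ClassifyResult {
  class: string;
  confidence: number;
  reasoning: string;
}

// Hypothetical routing decision: dispatch directly, or ask the user to confirm.
type RoutingDecision =
  | { action: "dispatch"; toolClass: string }
  | { action: "confirm"; toolClass: string; reasoning: string };

// Threshold from the text: below 0.75, surface a confirmation card.
const CONFIDENCE_FLOOR = 0.75;

function routeClassification(result: ClassifyResult): RoutingDecision {
  if (result.confidence < CONFIDENCE_FLOOR) {
    return { action: "confirm", toolClass: result.class, reasoning: result.reasoning };
  }
  return { action: "dispatch", toolClass: result.class };
}

const confident = routeClassification({
  class: "find_investors",
  confidence: 0.96,
  reasoning: "Explicit fundraising context, stage+size mentioned.",
});
console.log(confident.action);
```

A result at exactly 0.75 dispatches in this sketch, because the text only gates on values strictly below the threshold.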
// POST /api:UgP1h6uR/classify
{ "query": "find seed VCs for AI infrastructure, $3M round" }

// Response
{ "class": "find_investors", "confidence": 0.96, "count": 1,
  "reasoning": "Explicit fundraising context, stage+size mentioned." }
2. Embed Query
The classified query plus any pitch context (deck text, company description) is embedded via OpenRouter text-embedding-3-small (1536 dimensions). The embedding vector is used for semantic similarity matching against investor profiles. OpenAI embeddings are unit-normalized, so cosine similarity = dot product — no normalization step needed.
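The claim that cosine similarity equals the dot product for unit-normalized vectors can be checked with a few lines. This is an illustrative sketch only: the 3-dim toy vectors stand in for real 1536-dim embeddings, and the helper names (`dot`, `normalize`, `cosine`) are hypothetical.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] ?? 0) * (b[i] ?? 0);
  }
  return sum;
}

// Scale a vector to unit length (L2 norm of 1).
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

// Full cosine similarity, including the division by both norms.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Toy vectors, normalized the way the text says OpenAI embeddings already are.
const queryVec = normalize([3, 4, 0]);
const thesisVec = normalize([1, 2, 2]);

// For unit vectors both norms are 1, so the two values agree.
const viaDot = dot(queryVec, thesisVec);
const viaCosine = cosine(queryVec, thesisVec);
console.log(Math.abs(viaDot - viaCosine) < 1e-12);
```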
3. FalkorDB Cypher Query (Interim)
Multi-hop Cypher query over the FalkorDB knowledge graph. Matches VC_Firm and Angel labels, traverses portfolio and co-investment edges, and scores each investor against the query vector. The score is a vector distance, so lower is closer: only matches with score < 0.85 are kept, which drops low-confidence results. The investor → company relationship is indirect: (Investor)-[:INVESTED_IN]->(Funding_Round)<-[:RAISED]-(Company). The second hop is required to get actual company names rather than funding round IDs.
// Cypher pattern (simplified)
MATCH (i:VC_Firm)-[:INVESTED_IN]->(fr:Funding_Round)<-[:RAISED]-(co:Company)
WHERE fr.stage IN ['Seed', 'Pre-Seed']
  AND any(tag IN co.tags WHERE tag IN ['AI', 'Infrastructure', 'ML'])
WITH i, co, fr,
  vecf32(i.thesis_embedding) <-> vecf32($query_vec) AS score
WHERE score < 0.85
RETURN i.name, i.partner, co.name, fr.amount, score
ORDER BY score ASC
LIMIT 20
vecf32() is required: FalkorDB production vector search fails silently without the vecf32() wrapper on both sides of the similarity operator. Always wrap embedding vectors.
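The tail of the Cypher pattern (WHERE score < 0.85, ORDER BY score ASC, LIMIT 20) treats the score as a distance, where lower is better. The same three steps can be sketched in memory to make that direction explicit. The `ScoredInvestor` type, the `keepCloseMatches` helper, and the sample rows are hypothetical illustrations, not part of the pipeline.

```typescript
interface ScoredInvestor {
  investor: string;
  score: number; // vector distance: lower means closer to the query
}

// Same three steps as the Cypher tail: filter, sort ascending, cap the list.
function keepCloseMatches(
  rows: ScoredInvestor[],
  maxDistance = 0.85,
  limit = 20,
): ScoredInvestor[] {
  return rows
    .filter((row) => row.score < maxDistance)
    .sort((x, y) => x.score - y.score)
    .slice(0, limit);
}

// Made-up rows for illustration only.
const sampleRows: ScoredInvestor[] = [
  { investor: "Firm A", score: 0.91 },
  { investor: "Firm B", score: 0.42 },
  { investor: "Firm C", score: 0.17 },
];

const kept = keepCloseMatches(sampleRows);
console.log(kept.map((row) => row.investor)); // Firm C first, then Firm B
```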
4. Synthesize with Groq + Opus 4
The ranked graph results are passed to Groq for per-person rationale synthesis. Each investor gets a WHY statement (justifies thesis fit, never asks for meetings), a drafted outreach subject line, and a confidence breakdown. Tone rule: "you're a journalist for a leading technical publication" — no AI-bro voice, no CTAs, no meeting closers.
// Per-investor synthesis output
{
  "master_person_id": 1847,
  "name": "Kai Nguyen",
  "firm": "Gradient Ventures",
  "fit_score": 0.91,
  "why": "Gradient led two AI-infra seed rounds in Q3 2024 (Layer and Synth.AI). Kai's public writing focuses on the infrastructure-application stack bottleneck — exactly the problem you're solving.",
  "draft_subject": "AI infra seed — Orbiter intro via [mutual]",
  "stage_match": true,
  "sector_match": true,
  "check_range": "$1M–$5M"
}
5. Stream via SSE → Crayon Cards
Results stream back to the Next.js BFF via Server-Sent Events. The Crayon SDK renders each investor as a contact card in real time as the stream arrives. Cards include name, firm, fit score, why statement, and a copy-ready outreach draft. No page reload — the card list populates progressively.
6. Zep Memory Update
After dispatch, the query, classification, and result entity IDs are written back to the user's Zep thread. On turn 2, thread.get_user_context provides prior context so vague follow-ups ("show me more like the last one") still classify and dispatch correctly. mem_used: true appears in the response when memory influenced the result.
FalkorDB Graph Structure (Interim)

The current sandbox uses FalkorDB as the interim graph database. The graph holds 11,948 Entity nodes, 1,353 Funding_Round nodes, 21 edge types, and 30K+ edges. The AlloyDB migration preserves the same logical graph model but adds ScaNN vector indexes for sub-10ms hybrid queries.

| Label | Count (approx) | Key Properties | Role in find_investors |
| --- | --- | --- | --- |
| VC_Firm | ~2,400 | name, thesis_embedding, check_min, check_max, stage | Primary match target |
| Angel | ~800 | name, thesis_embedding, sectors, stage_preference | Secondary match target |
| Funding_Round | 1,353 | amount, stage, date, company_id | Portfolio traversal hop |
| Company | ~5,200 | name, sector, tags, founded | 2nd-hop for portfolio company names |
| Person | ~8,000 | name, title, firm_id, bio_embedding | Partner-level contact for outreach |

Key Edge Types for Investor Queries

INVESTED_IN
Investor → Funding_Round. Primary portfolio traversal edge.
RAISED
Company → Funding_Round. Closes the 2-hop loop to reach Company from Investor.
CO_INVESTED
VC_Firm → VC_Firm via shared Funding_Round. Used for co-investor warm-path drafts.
AlloyDB ScaNN — 6 Vector Dimensions

The AlloyDB migration adds 6 ScaNN vector columns per investor, one per matching dimension listed below. A single SQL query combines hard filters (stage, check_size range) with semantic similarity across all 6 columns at once, so no multi-step pipeline is needed.

sector
Thesis alignment by industry vertical. AI, BioTech, ClimateTech, etc.
stage
Preferred investment stage. Pre-seed, Seed, Series A/B/C+.
check_size
Check range fit. Avoids surfacing a $50M+ fund for a $1M round.
geography
Preferred markets. US-only, emerging markets, global, region-specific.
signal
Recency signal. Recent investments, blog posts, and public statements.
founder_fit
Pattern matching on founder backgrounds the fund has backed before.
ScaNN advantage: Unlike the current 2-step pipeline (vector search, then filter and re-rank), AlloyDB ScaNN executes hard filters and semantic similarity in a single SQL call. This eliminates the rank degradation that happens when post-filtering removes top vector matches.
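The rank degradation from post-filtering can be reproduced on a toy dataset. This is a sketch under stated assumptions: the `Candidate` type, both helper functions, and all firms and numbers are made up for illustration, and the in-memory sort stands in for a real vector index.

```typescript
interface Candidate {
  firm: string;
  distance: number; // semantic distance to the query (lower is closer)
  checkMin: number;
  checkMax: number;
}

const M = 1e6;

// Made-up data: the three closest firms cannot write a $3M check.
const candidates: Candidate[] = [
  { firm: "Mega Fund I", distance: 0.1, checkMin: 50 * M, checkMax: 200 * M },
  { firm: "Mega Fund II", distance: 0.12, checkMin: 50 * M, checkMax: 100 * M },
  { firm: "Growth Co", distance: 0.15, checkMin: 20 * M, checkMax: 80 * M },
  { firm: "Seed Shop", distance: 0.3, checkMin: 1 * M, checkMax: 5 * M },
  { firm: "Angel Group", distance: 0.35, checkMin: 0.5 * M, checkMax: 3 * M },
];

const fitsAsk = (c: Candidate, askSize: number): boolean =>
  c.checkMin <= askSize && c.checkMax >= askSize;

const byDistance = (x: Candidate, y: Candidate): number => x.distance - y.distance;

// Two-step: take the top-k by similarity first, then apply the hard filter.
function searchThenFilter(rows: Candidate[], askSize: number, k: number): Candidate[] {
  return [...rows].sort(byDistance).slice(0, k).filter((c) => fitsAsk(c, askSize));
}

// Single pass: apply the hard filter first, then rank what is left.
function filterThenRank(rows: Candidate[], askSize: number, k: number): Candidate[] {
  return rows.filter((c) => fitsAsk(c, askSize)).sort(byDistance).slice(0, k);
}

const twoStep = searchThenFilter(candidates, 3 * M, 3);
const singlePass = filterThenRank(candidates, 3 * M, 3);
console.log(twoStep.length, singlePass.length);
```

With k = 3 the two-step path returns nothing, because every top vector match is filtered out afterwards, while the single-pass path still surfaces the two firms that fit the ask.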
-- AlloyDB ScaNN pattern (pending migration)
SELECT
  i.id, i.name, i.firm, i.partner_name,
  (i.sector_embedding <=> $sector_vec) * 0.3 +
  (i.stage_embedding <=> $stage_vec)  * 0.25 +
  (i.check_embedding <=> $check_vec)  * 0.2 +
  (i.geo_embedding   <=> $geo_vec)    * 0.15 +
  (i.signal_embedding <=> $signal_vec) * 0.05 +
  (i.founder_embedding <=> $founder_vec) * 0.05 AS composite_score
FROM investors i
WHERE i.check_min <= $ask AND i.check_max >= $ask
  AND i.stage @> ARRAY[$stage]
ORDER BY composite_score ASC
LIMIT 20
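The composite score in the SQL pattern is a weighted sum of six per-column distances. The arithmetic can be sketched as follows, using the weights from the query; the `Facet` names, the `compositeScore` helper, and the example distances are hypothetical.

```typescript
type Facet = "sector" | "stage" | "check" | "geo" | "signal" | "founder";

// Weights copied from the SQL pattern above.
const FACET_WEIGHTS: Record<Facet, number> = {
  sector: 0.3,
  stage: 0.25,
  check: 0.2,
  geo: 0.15,
  signal: 0.05,
  founder: 0.05,
};

const FACETS = Object.keys(FACET_WEIGHTS) as Facet[];

// Weighted sum of per-facet distances; a lower composite means a better fit.
function compositeScore(distances: Record<Facet, number>): number {
  return FACETS.reduce((total, facet) => total + FACET_WEIGHTS[facet] * distances[facet], 0);
}

// The weights sum to 1, so the composite is a weighted average of the inputs.
const weightSum = FACETS.reduce((total, facet) => total + FACET_WEIGHTS[facet], 0);

// Made-up distances for one investor.
const exampleDistances: Record<Facet, number> = {
  sector: 0.1,
  stage: 0.2,
  check: 0.3,
  geo: 0.4,
  signal: 0.5,
  founder: 0.6,
};

const exampleScore = compositeScore(exampleDistances);
console.log(weightSum.toFixed(2), exampleScore.toFixed(3));
```

Keeping the weights in one table like this is one way to make them tunable per query class, which the migration notes list as a goal.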
Zep Memory Layer
Why Zep

Chosen over Mem0 and Cognee for plug-and-play integration, SOC2 compliance, temporal memory (facts expire appropriately), and a Graphiti escape hatch for graph-structured memory if needed. Free tier handles current scale.

Turn 2 Behavior

On a vague follow-up query ("show me more like those"), mem_used flips to true and the response quality matches a full-context first query. Verified in sandbox testing — no degradation on turn 2.

Memory Flow

// Every dispatch ingest (Xano endpoint, post-synthesis)
{
  "thread_id": "thread_abc123",
  "user_id": 15,
  "facts": [
    { "type": "query", "value": "find seed VCs for AI infrastructure" },
    { "type": "classification", "value": "find_investors" },
    { "type": "entities", "value": ["Gradient Ventures", "Kai Nguyen", "Layer", "Synth.AI"] },
    { "type": "context", "value": { "stage": "Seed", "sector": "AI Infrastructure", "ask": 3000000 } }
  ]
}

// Turn 2 retrieval (pre-classify)
GET /zep/threads/{thread_id}/context
→ { "prior_class": "find_investors", "prior_entities": [...], "prior_context": {...} }
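One way to picture the turn-2 flow is a pre-classify step that attaches prior context when the query points back at earlier results. This is a loose sketch, not the sandbox logic: the `PriorContext` shape follows the retrieval response above, but `buildClassifyInput`, the `hint` field, and the back-reference regex are all assumptions made for illustration.

```typescript
// Prior-turn context, following the retrieval response shown above.
interface PriorContext {
  prior_class: string;
  prior_entities: string[];
  prior_context: Record<string, unknown>;
}

// Hypothetical input handed to the classifier.
interface ClassifyInput {
  query: string;
  hint?: PriorContext;
  mem_used: boolean;
}

// Assumed heuristic: phrases that point back at earlier results mark a vague follow-up.
const BACK_REFERENCE = /\b(those|that one|the last one|more like)\b/i;

function buildClassifyInput(query: string, prior: PriorContext | null): ClassifyInput {
  if (prior !== null && BACK_REFERENCE.test(query)) {
    return { query, hint: prior, mem_used: true };
  }
  return { query, mem_used: false };
}

const priorTurn: PriorContext = {
  prior_class: "find_investors",
  prior_entities: ["Gradient Ventures", "Kai Nguyen"],
  prior_context: { stage: "Seed", sector: "AI Infrastructure", ask: 3000000 },
};

const followUp = buildClassifyInput("show me more like those", priorTurn);
console.log(followUp.mem_used);
```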
CrayonChat SDK — Generative UI

The Crayon SDK renders server responses as rich interactive cards rather than plain text. Each investor result streams in as a structured card template. The frontend uses @crayonai/react-core with custom templates registered per card type.

Streaming Pattern

SSE stream from Xano → Next.js route handler → Crayon SDK. Each data: event carries a partial card payload. Cards render progressively as tokens arrive — no blank loading state.
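The relay step can be sketched as a buffer parser that splits incoming SSE text into complete events and holds back an unfinished tail. This is a minimal sketch under two assumptions: each event's data lines join into one complete JSON object (the text describes partial payloads, which would need extra merging), and the stream uses plain "\n" line endings. `parseSseBuffer` and `CardEvent` are hypothetical names.

```typescript
// Card payload shape, following the contact card schema in this document.
interface CardEvent {
  template_name: string;
  data: Record<string, unknown>;
}

// Splits buffered SSE text into complete events and returns any unfinished tail.
function parseSseBuffer(buffer: string): { events: CardEvent[]; rest: string } {
  const blocks = buffer.split("\n\n");
  // The last block has no terminating blank line yet, so keep it for later.
  const rest = blocks.pop() ?? "";
  const events: CardEvent[] = [];
  for (const block of blocks) {
    const payload = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (payload.length > 0) {
      events.push(JSON.parse(payload) as CardEvent);
    }
  }
  return { events, rest };
}

// One complete event followed by a truncated one, as a network chunk might arrive.
const chunk =
  'data: {"template_name":"contact_card","data":{"name":"Kai Nguyen"}}\n\n' +
  'data: {"template_name":"contact_';
const parsed = parseSseBuffer(chunk);
console.log(parsed.events.length, parsed.rest.length > 0);
```

The caller would prepend `rest` to the next chunk before parsing again, so an event split across chunks is only parsed once it is complete.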

Template Registry

Xano response includes template_name: "contact_card". The SDK maps this to the registered React component. The sandbox uses a subset of the full copilot template registry: contact_card, scanning_card, error_card.
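The template lookup can be pictured as a map from template_name to a renderer. In this sketch the renderers return plain strings so the example stays self-contained; the real registry maps to React components, and `templateRegistry`, `renderCard`, and the fallback message are hypothetical.

```typescript
type CardData = Record<string, unknown>;
type CardRenderer = (data: CardData) => string;

// Stand-in renderers for the three sandbox templates named in the text.
const templateRegistry = new Map<string, CardRenderer>([
  ["contact_card", (data) => `contact: ${String(data["name"])}`],
  ["scanning_card", () => "scanning..."],
  ["error_card", (data) => `error: ${String(data["message"])}`],
]);

// Unknown template names produce an error string instead of throwing (an assumption).
function renderCard(templateName: string, data: CardData): string {
  const renderer = templateRegistry.get(templateName);
  if (renderer === undefined) {
    return `error: unknown template "${templateName}"`;
  }
  return renderer(data);
}

const rendered = renderCard("contact_card", { name: "Kai Nguyen" });
console.log(rendered);
```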

Contact Card Schema

// SSE data payload per investor
{
  "template_name": "contact_card",
  "data": {
    "master_person_id": 1847,
    "name": "Kai Nguyen",
    "title": "General Partner",
    "firm": "Gradient Ventures",
    "avatar_url": "https://...",
    "fit_score": 0.91,
    "why": "Gradient led two AI-infra seed rounds...",
    "tags": ["AI Infrastructure", "Seed", "$1M–$5M"],
    "draft_subject": "AI infra seed — Orbiter intro via [mutual]",
    "draft_opening": "Hi Kai, [mutual] suggested I reach out..."
  }
}
File Upload — Pitch Deck Context

Users can upload pitch decks and company documents to enrich the investor-matching context. The text is extracted server-side and injected into the embedding and synthesis steps.

| Format | Max Size | Extraction Method | Status |
| --- | --- | --- | --- |
| .pdf | 25 MB | Server-side PDF parse | LIVE |
| .doc / .docx | 25 MB | Mammoth.js extraction | LIVE |
| .txt | 5 MB | Raw text | LIVE |
| .pptx | 25 MB | Slide text extraction | LIVE |
Endpoint 8414: POST /api:UgP1h6uR/file-upload — accepts multipart form data, returns extracted text and a pitch_context_id for reference in subsequent dispatch calls. Text is truncated to 8K tokens before embedding.
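A pre-upload check against the size limits in the table could look like the sketch below. The limits come from the table; everything else is assumed for illustration, including the `checkUpload` name, the result shape, and the choice to treat 1 MB as 1024 × 1024 bytes.

```typescript
// Assumption: binary megabytes. The document does not say which definition applies.
const MB = 1024 * 1024;

// Per-format size limits from the table above.
const MAX_UPLOAD_BYTES: Record<string, number> = {
  ".pdf": 25 * MB,
  ".doc": 25 * MB,
  ".docx": 25 * MB,
  ".txt": 5 * MB,
  ".pptx": 25 * MB,
};

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(filename: string, sizeBytes: number): UploadCheck {
  const dotIndex = filename.lastIndexOf(".");
  const ext = dotIndex >= 0 ? filename.slice(dotIndex).toLowerCase() : "";
  const limit = MAX_UPLOAD_BYTES[ext];
  if (limit === undefined) {
    return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > limit) {
    return { ok: false, reason: `file exceeds ${limit / MB} MB limit` };
  }
  return { ok: true };
}

console.log(checkUpload("deck.pdf", 10 * MB).ok, checkUpload("notes.txt", 6 * MB).ok);
```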
Mode Context Requirements

Each Anything Engine tool has a minimum context floor. When the floor is not met, the dispatcher surfaces a context-gap card asking the user to provide what's missing before running the query.

find_investors · Required
Pitch context: deck text OR company description OR stage + sector + ask size. Without at least one, match quality degrades significantly.
meeting_prep · Required
selectedEvent must be set: a specific calendar event must be selected before the prep pipeline will run. Surfaces event-picker card if missing.
leverage (copilot) · Required
selectedPerson must be set: a person must be selected from the network before the leverage loop tool activates.
find_talent · Recommended
Job description OR role title + required skills. Will attempt generic search without it but accuracy drops. JD text recommended.
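The context floors above can be sketched as a single check that runs before dispatch. `selectedEvent` and `selectedPerson` are named in the text; the other `DispatchContext` field names, the `contextGap` helper, and its return values are hypothetical.

```typescript
interface DispatchContext {
  deckText?: string;
  companyDescription?: string;
  stage?: string;
  sector?: string;
  ask?: number;
  selectedEvent?: string;
  selectedPerson?: string;
}

// Returns a label for what is missing, or null when the floor is met.
function contextGap(tool: string, ctx: DispatchContext): string | null {
  switch (tool) {
    case "find_investors": {
      // Any one of: deck text, company description, or stage + sector + ask.
      const hasTriple =
        ctx.stage !== undefined && ctx.sector !== undefined && ctx.ask !== undefined;
      const met = Boolean(ctx.deckText) || Boolean(ctx.companyDescription) || hasTriple;
      return met ? null : "pitch context";
    }
    case "meeting_prep":
      return ctx.selectedEvent ? null : "selectedEvent";
    case "leverage":
      return ctx.selectedPerson ? null : "selectedPerson";
    default:
      // find_talent and other tools: context is recommended, not required.
      return null;
  }
}

console.log(contextGap("find_investors", {}), contextGap("find_talent", {}));
```

A non-null result is where the dispatcher would surface the context-gap card instead of running the query.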
AlloyDB Migration — What Changes
Today (FalkorDB Interim)
  • Cypher multi-hop query for graph traversal
  • Single thesis embedding per investor
  • 2-step pipeline: vector search then filter
  • ~1,400ms average end-to-end latency
  • No hard filter + semantic in single call
After (AlloyDB ScaNN)
  • Single SQL call: filters + 6-dim semantic
  • 6 vector columns per investor (sector/stage/check/geo/signal/founder)
  • ScaNN index: <10ms vector scan at full table
  • Composite weighted score, tunable per query class
  • Jog builds delta sync from BigQuery (~2 weeks out)
Xano is database-agnostic: The /classify and /dispatch endpoints don't change during the migration. Only the internal Xano function that executes the data query is swapped. The UI, SSE streaming, and Crayon card schema are unaffected.
Full Tech Stack
| Layer | Technology | Role |
| --- | --- | --- |
| UI | Next.js 14 App Router | Frontend + thin BFF route handlers. Zero business logic. |
| Auth | WorkOS AuthKit | OAuth, session management, user identity. |
| Generative UI | CrayonChat SDK (@crayonai/react-core) | SSE streaming → contact card templates. |
| Orchestration | Xano (API Group 1270) | All pipeline logic: classify, embed, query, synthesize. |
| Classifier | Groq Llama 3.3 70B | Intent classification at <300ms, temp 0.1. |
| Embeddings | OpenRouter text-embedding-3-small | 1536-dim vectors for query and investor profiles. |
| Graph DB (interim) | FalkorDB | Cypher multi-hop for investor graph traversal. |
| Graph DB (target) | AlloyDB + ScaNN | 6-dim hybrid queries. Pending migration. |
| Synthesis | Groq + Claude Opus 4 | Per-investor rationale and outreach drafts. |
| Memory | Zep Cloud | Thread context, mem_used on turn 2. |
| Hosting | Vercel | Auto-deploys on push to main. roboulos-projects/orbiter-sandbox. |
What's Next
AlloyDB ScaNN

Migrate from FalkorDB to AlloyDB. Add 6-dim ScaNN indexes per investor. Jog builds BigQuery → AlloyDB delta sync. Target: same week as Mark's go-ahead.

Remaining 10 Tools

find_investors is the reference pattern. Already ported: find_talent, find_customers, research_person. Next: find_partners, find_advisors, find_co_investors, following the same Cypher → Groq → Crayon pattern.

Canonical Class Lock

14 class names must be locked across UI labels, classifier prompt, dispatcher, and Mintlify docs simultaneously. One source of truth, no drift between layers.
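One way to enforce a single source of truth is a shared constant plus a drift check that each layer runs against its own list. This is a sketch only: the list below is partial, containing just the class names that appear in this document rather than the full set of 14, and `findDrift` is a hypothetical helper.

```typescript
// Partial list: only class names mentioned in this document (the full set has 14).
const CANONICAL_CLASSES = [
  "find_investors",
  "find_talent",
  "find_customers",
  "research_person",
  "find_partners",
  "find_advisors",
  "find_co_investors",
] as const;

// Returns names a layer uses that are not canonical, plus canonical names it lacks.
function findDrift(layerNames: string[]): { unknown: string[]; missing: string[] } {
  const canonical = new Set<string>(CANONICAL_CLASSES);
  const layer = new Set<string>(layerNames);
  return {
    unknown: layerNames.filter((n) => !canonical.has(n)),
    missing: CANONICAL_CLASSES.filter((n) => !layer.has(n)),
  };
}

// Example: a layer that misspells one class and omits the rest.
const drift = findDrift(["find_investors", "findTalent"]);
console.log(drift.unknown, drift.missing.length);
```

Running the same check against the UI labels, classifier prompt, dispatcher, and docs would flag any layer whose names have diverged from the shared list.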

Port to Copilot

Once AlloyDB is live and 4+ tools are solid, the Anything Engine dispatch endpoint gets wired into the main Orbiter copilot. The sandbox validates the pattern before the port.