Find Investors — E2E Pipeline (Sandbox)

Status

E2E: Working
6 Vector Dims
SSE Streaming
T2: Zep mem_used

Live demo: orbiter-sandbox.vercel.app — Type any investor-search query and watch the full pipeline run. The classify, embed, FalkorDB query, and Groq synthesis steps all execute in sequence and stream contact cards back via SSE.

Pipeline — Step by Step

1. Classify Intent

The user types a natural-language query. The Next.js route POSTs it to the Xano /classify endpoint (8400), where Groq Llama 3.3 70B classifies the intent in under 300 ms. The response carries {class, confidence, reasoning}; the sample response below also includes a count field. If confidence < 0.75, surface a confirmation card before routing.

// POST /api:UgP1h6uR/classify
{ "query": "find seed VCs for AI infrastructure, $3M round" }

// Response
{ "class": "find_investors", "confidence": 0.96, "count": 1,
  "reasoning": "Explicit fundraising context, stage+size mentioned." }
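The confidence gate described in this step can be sketched as follows. This is a minimal illustration, not the Xano implementation: the `RouteDecision` type, the `routeClassification` function, and the `CONFIDENCE_FLOOR` constant are hypothetical names; only the 0.75 threshold and the response fields come from the text above.

```typescript
// Shape of the /classify response shown above (count omitted for brevity).
interface Classification {
  class: string;
  confidence: number;
  reasoning: string;
}

// Hypothetical result type: dispatch directly, or ask the user to confirm first.
type RouteDecision =
  | { action: "dispatch"; toolClass: string }
  | { action: "confirm"; toolClass: string; reasoning: string };

// Threshold from the step description: below 0.75, confirm before routing.
const CONFIDENCE_FLOOR = 0.75;

function routeClassification(c: Classification): RouteDecision {
  if (c.confidence < CONFIDENCE_FLOOR) {
    // Low confidence: surface a confirmation card carrying the reasoning.
    return { action: "confirm", toolClass: c.class, reasoning: c.reasoning };
  }
  return { action: "dispatch", toolClass: c.class };
}
```

With the sample response above (confidence 0.96), this sketch would dispatch straight to find_investors; a confidence of 0.60 would return the confirm branch instead.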

2. Embed Query

The classified query, plus any pitch context (deck text, company description), is embedded with text-embedding-3-small (1536 dimensions) via OpenRouter. The resulting vector is used for semantic similarity matching against investor profiles. OpenAI embeddings are normalized to unit length, so cosine similarity equals the dot product and no separate normalization step is needed.
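The unit-length property can be checked with a few lines of arithmetic. The sketch below uses small hand-made 3-dimensional vectors rather than real 1536-dimensional embeddings; the helper names (`dot`, `norm`, `cosine`, `normalize`) are illustrative.

```typescript
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: dot product divided by the product of the lengths.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

// Scale a non-zero vector to unit length.
function normalize(a: number[]): number[] {
  const n = norm(a);
  return a.map((x) => x / n);
}

const q = normalize([3, 4, 0]); // [0.6, 0.8, 0]
const p = normalize([4, 3, 0]); // [0.8, 0.6, 0]

// For unit vectors both norms are 1, so the division in cosine() is a no-op
// (up to floating-point rounding) and the plain dot product is enough.
const viaDot = dot(q, p);
const viaCosine = cosine(q, p);
```

Here both `viaDot` and `viaCosine` come out at 0.96 up to floating-point rounding, which is why the pipeline can skip the normalization step.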

3. FalkorDB Cypher Query (Interim)

A multi-hop Cypher query runs over the FalkorDB knowledge graph. It matches VC_Firm and Angel labels, traverses portfolio and co-investment edges, and scores each investor against the query vector. The score is a distance, so lower means closer: the filter score < 0.85 keeps the closer matches and drops low-confidence ones. The investor → company relationship is indirect: (Investor)-[:INVESTED_IN]->(Funding_Round)<-[:RAISED]-(Company). The second hop is required to get actual company names rather than funding round IDs.

// Cypher pattern (simplified)
MATCH (i:VC_Firm)-[:INVESTED_IN]->(fr:Funding_Round)<-[:RAISED]-(co:Company)
WHERE fr.stage IN ['Seed', 'Pre-Seed']
  AND any(tag IN co.tags WHERE tag IN ['AI', 'Infrastructure', 'ML'])
WITH i, co, fr,
  vecf32(i.thesis_embedding) <-> vecf32($query_vec) AS score
WHERE score < 0.85
RETURN i.name, i.partner, co.name, fr.amount, score
ORDER BY score ASC
LIMIT 20

vecf32() is required: FalkorDB production vector search fails silently without the vecf32() wrapper on both sides of the similarity operator. Always wrap embedding vectors.
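Because the query returns one row per investor and portfolio company pair, the same investor can appear on several rows, and LIMIT 20 caps rows rather than distinct investors. One way to collapse the rows before synthesis is sketched below. The `GraphRow` and `InvestorMatch` types and the `groupByInvestor` function are assumptions for illustration; the field names mirror the RETURN clause, and the sample amounts and scores are made up.

```typescript
// One row of the Cypher result: i.name, i.partner, co.name, fr.amount, score.
interface GraphRow {
  investor: string;
  partner: string;
  company: string;
  amount: number;
  score: number; // distance: lower is closer
}

interface InvestorMatch {
  investor: string;
  partner: string;
  score: number;
  portfolio: { company: string; amount: number }[];
}

// Collapse (investor, company) rows into one entry per investor,
// keeping the ascending-distance order of the original query.
function groupByInvestor(rows: GraphRow[]): InvestorMatch[] {
  const byName = new Map<string, InvestorMatch>();
  for (const r of rows) {
    let match = byName.get(r.investor);
    if (!match) {
      match = { investor: r.investor, partner: r.partner, score: r.score, portfolio: [] };
      byName.set(r.investor, match);
    }
    match.portfolio.push({ company: r.company, amount: r.amount });
  }
  return Array.from(byName.values()).sort((a, b) => a.score - b.score);
}

// Illustrative rows only; amounts and scores are invented for the example.
const sampleRows: GraphRow[] = [
  { investor: "Gradient Ventures", partner: "Kai Nguyen", company: "Layer", amount: 3000000, score: 0.21 },
  { investor: "Gradient Ventures", partner: "Kai Nguyen", company: "Synth.AI", amount: 2500000, score: 0.21 },
];
const grouped = groupByInvestor(sampleRows);
```

In this example the two rows collapse into a single Gradient Ventures entry with both portfolio companies attached.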

4. Synthesize with Groq + Opus 4

The ranked graph results are passed to Groq for per-person rationale synthesis. Each investor gets a WHY statement (justifies thesis fit, never asks for meetings), a drafted outreach subject line, and a confidence breakdown. Tone rule: "you're a journalist for a leading technical publication" — no AI-bro voice, no CTAs, no meeting closers.

// Per-investor synthesis output
{
  "master_person_id": 1847,
  "name": "Kai Nguyen",
  "firm": "Gradient Ventures",
  "fit_score": 0.91,
  "why": "Gradient led two AI-infra seed rounds in Q3 2024 (Layer and Synth.AI). Kai's public writing focuses on the infrastructure-application stack bottleneck — exactly the problem you're solving.",
  "draft_subject": "AI infra seed — Orbiter intro via [mutual]",
  "stage_match": true,
  "sector_match": true,
  "check_range": "$1M–$5M"
}

5. Stream via SSE → Crayon Cards

Results stream back to the Next.js BFF via Server-Sent Events. The Crayon SDK renders each investor as a contact card in real time as the stream arrives. Cards include name, firm, fit score, why statement, and a copy-ready outreach draft. No page reload — the card list populates progressively.

6. Zep Memory Update

After dispatch, the query, classification, and result entity IDs are written back to the user's Zep thread. On turn 2, thread.get_user_context provides prior context so vague follow-ups ("show me more like the last one") still classify and dispatch correctly. mem_used: true appears in the response when memory influenced the result.

FalkorDB Graph Structure (Interim)

The current sandbox uses FalkorDB as the interim graph database. It holds 11,948 Entity nodes, 1,353 Funding_Round nodes, 21 edge types, and more than 30K edges. The AlloyDB migration preserves the same logical graph model but adds ScaNN vector indexes for sub-10ms hybrid queries.

Label	Count (approx)	Key Properties	Role in find_investors
`VC_Firm`	~2,400	name, thesis_embedding, check_min, check_max, stage	Primary match target
`Angel`	~800	name, thesis_embedding, sectors, stage_preference	Secondary match target
`Funding_Round`	1,353	amount, stage, date, company_id	Portfolio traversal hop
`Company`	~5,200	name, sector, tags, founded	2nd-hop for portfolio company names
`Person`	~8,000	name, title, firm_id, bio_embedding	Partner-level contact for outreach

Key Edge Types for Investor Queries

INVESTED_IN

Investor → Funding_Round. Primary portfolio traversal edge.

RAISED

Company → Funding_Round. Closes the 2-hop loop to reach Company from Investor.

CO_INVESTED

VC_Firm → VC_Firm via shared Funding_Round. Used for co-investor warm-path drafts.

AlloyDB ScaNN — 6 Vector Dimensions

The AlloyDB migration adds 6 ScaNN-indexed vector columns per investor, one for each matching dimension listed below. A single SQL query combines hard filters (stage, check_size range) with semantic similarity across all 6 columns at once, so no multi-step pipeline is needed.

sector

Thesis alignment by industry vertical. AI, BioTech, ClimateTech, etc.

stage

Preferred investment stage. Pre-seed, Seed, Series A/B/C+.

check_size

Check range fit. Avoids surfacing a $50M+ fund for a $1M round.

geography

Preferred markets. US-only, emerging markets, global, region-specific.

signal

Recency signal. Recent investments, blog posts, and public statements.

founder_fit

Pattern matching on founder backgrounds the fund has backed before.

ScaNN advantage: The current pipeline runs the vector search first, then filters and re-ranks the results. AlloyDB ScaNN instead applies the hard filters and the semantic similarity ranking in a single SQL call. This eliminates the rank degradation that happens when post-filtering removes top vector matches.
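The rank-degradation effect is easy to reproduce in memory. The sketch below compares "search then filter" with "filter then search" on four made-up funds; it does not use AlloyDB or ScaNN, and the `Candidate` type, function names, and all numbers are illustrative assumptions.

```typescript
interface Candidate {
  name: string;
  distance: number; // lower is closer
  checkMin: number;
  checkMax: number;
}

const fits = (c: Candidate, ask: number): boolean => c.checkMin <= ask && c.checkMax >= ask;
const byDistance = (a: Candidate, b: Candidate): number => a.distance - b.distance;

// Current style: take the k nearest vectors, then apply the hard filter.
function searchThenFilter(all: Candidate[], k: number, ask: number): Candidate[] {
  return all.slice().sort(byDistance).slice(0, k).filter((c) => fits(c, ask));
}

// Single-call style: the hard filter and the ranking see the same candidate set.
function filterThenSearch(all: Candidate[], k: number, ask: number): Candidate[] {
  return all.filter((c) => fits(c, ask)).sort(byDistance).slice(0, k);
}

// Made-up data: the two closest funds write checks far above a $3M ask.
const funds: Candidate[] = [
  { name: "Fund A", distance: 0.1, checkMin: 50000000, checkMax: 200000000 },
  { name: "Fund B", distance: 0.15, checkMin: 20000000, checkMax: 80000000 },
  { name: "Fund C", distance: 0.3, checkMin: 1000000, checkMax: 5000000 },
  { name: "Fund D", distance: 0.35, checkMin: 500000, checkMax: 4000000 },
];

const postFiltered = searchThenFilter(funds, 2, 3000000); // empty: A and B fail the filter
const preFiltered = filterThenSearch(funds, 2, 3000000); // Fund C, Fund D
```

With k = 2, post-filtering returns nothing because both top vector matches fail the check-size filter, while filtering inside the same pass still returns the two funds that fit.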

-- AlloyDB ScaNN pattern (pending migration)
SELECT
  i.id, i.name, i.firm, i.partner_name,
  (i.sector_embedding <=> $sector_vec) * 0.3 +
  (i.stage_embedding <=> $stage_vec)  * 0.25 +
  (i.check_embedding <=> $check_vec)  * 0.2 +
  (i.geo_embedding   <=> $geo_vec)    * 0.15 +
  (i.signal_embedding <=> $signal_vec) * 0.05 +
  (i.founder_embedding <=> $founder_vec) * 0.05 AS composite_score
FROM investors i
WHERE i.check_min <= $ask AND i.check_max >= $ask
  AND i.stage @> ARRAY[$stage]
ORDER BY composite_score ASC
LIMIT 20
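The composite score in the query above is a weighted sum of six cosine distances, with weights that add up to 1.0. The same arithmetic can be sketched in TypeScript for testing weight changes offline. The `WEIGHTS` object and `compositeScore` function are illustrative names; the weight values are copied from the SQL, and the sample distances are invented.

```typescript
// Weights copied from the SQL pattern above, keyed by dimension name.
const WEIGHTS = {
  sector: 0.3,
  stage: 0.25,
  check_size: 0.2,
  geography: 0.15,
  signal: 0.05,
  founder_fit: 0.05,
} as const;

type Dimension = keyof typeof WEIGHTS;

// Weighted sum of per-dimension distances; lower is a better match,
// matching ORDER BY composite_score ASC.
function compositeScore(distances: Record<Dimension, number>): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    total += WEIGHTS[dim] * distances[dim];
  }
  return total;
}

// Invented distances for one investor:
// 0.03 + 0.05 + 0.06 + 0.06 + 0.025 + 0.03 = 0.255
const exampleScore = compositeScore({
  sector: 0.1,
  stage: 0.2,
  check_size: 0.3,
  geography: 0.4,
  signal: 0.5,
  founder_fit: 0.6,
});
```

Because the weights sum to 1.0, the composite stays on the same scale as a single cosine distance, which keeps thresholds comparable if the weights are later tuned per query class.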

Zep Memory Layer

Why Zep

Chosen over Mem0 and Cognee for plug-and-play integration, SOC2 compliance, temporal memory (facts expire appropriately), and a Graphiti escape hatch for graph-structured memory if needed. Free tier handles current scale.

Turn 2 Behavior

On a vague follow-up query ("show me more like those"), mem_used flips to true and the response quality matches a full-context first query. Verified in sandbox testing — no degradation on turn 2.

Memory Flow

// Every dispatch ingest (Xano endpoint, post-synthesis)
{
  "thread_id": "thread_abc123",
  "user_id": 15,
  "facts": [
    { "type": "query", "value": "find seed VCs for AI infrastructure" },
    { "type": "classification", "value": "find_investors" },
    { "type": "entities", "value": ["Gradient Ventures", "Kai Nguyen", "Layer", "Synth.AI"] },
    { "type": "context", "value": { "stage": "Seed", "sector": "AI Infrastructure", "ask": 3000000 } }
  ]
}

// Turn 2 retrieval (pre-classify)
GET /zep/threads/{thread_id}/context
→ { "prior_class": "find_investors", "prior_entities": [...], "prior_context": {...} }
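One possible way to use the retrieved context on turn 2 is to fill gaps in the current request from memory and report whether memory contributed. The sketch below is an assumption about how such a merge could work, not the actual Xano or Zep logic: `mergeWithMemory`, `SearchContext`, and the rule that sets `mem_used` are all hypothetical.

```typescript
// Shape of the turn-2 retrieval response shown above.
interface PriorContext {
  prior_class: string;
  prior_entities: string[];
  prior_context: { stage?: string; sector?: string; ask?: number };
}

// Fields the current turn may or may not specify.
interface SearchContext {
  stage?: string;
  sector?: string;
  ask?: number;
}

// Hypothetical merge: values from the current turn win; memory only fills gaps.
// mem_used is true when at least one field was taken from memory (assumed rule).
function mergeWithMemory(
  current: SearchContext,
  prior: PriorContext | null
): { context: SearchContext; mem_used: boolean } {
  if (prior === null) return { context: current, mem_used: false };
  const context: SearchContext = {
    stage: current.stage ?? prior.prior_context.stage,
    sector: current.sector ?? prior.prior_context.sector,
    ask: current.ask ?? prior.prior_context.ask,
  };
  const mem_used =
    context.stage !== current.stage ||
    context.sector !== current.sector ||
    context.ask !== current.ask;
  return { context, mem_used };
}
```

Under this rule, a vague follow-up with no stage, sector, or ask inherits all three from the thread and reports `mem_used: true`, while a fully specified query ignores memory.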

CrayonChat SDK — Generative UI

The Crayon SDK renders server responses as rich interactive cards rather than plain text. Each investor result streams in as a structured card template. The frontend uses @crayonai/react-core with custom templates registered per card type.

Streaming Pattern

SSE stream from Xano → Next.js route handler → Crayon SDK. Each data: event carries a partial card payload. Cards render progressively as tokens arrive — no blank loading state.
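Network chunks do not line up with SSE event boundaries, so the consumer has to buffer until a blank line closes an event. The sketch below shows that buffering for `data:` fields only. It is not the Crayon SDK's parser: `SseDataParser` is a hypothetical class, and it assumes LF or CRLF line endings and payloads with no raw carriage returns (true for JSON).

```typescript
// Minimal incremental parser for the data: field of an SSE stream.
class SseDataParser {
  private buffer = "";

  // Feed one decoded text chunk; returns the data payload of every
  // event completed by this chunk (possibly none).
  push(chunk: string): string[] {
    this.buffer += chunk.replace(/\r/g, ""); // treat CRLF as LF
    const payloads: string[] = [];
    let sep: number;
    // A blank line (two consecutive newlines) terminates an event.
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      const dataLines: string[] = [];
      for (const line of rawEvent.split("\n")) {
        if (line.startsWith("data:")) {
          // One optional space after the colon is not part of the value.
          const value = line.slice(5);
          dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
        }
      }
      // Multiple data: lines in one event are joined with newlines.
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
    }
    return payloads;
  }
}

// A card payload split across two chunks is only emitted once complete.
const sseParser = new SseDataParser();
const afterFirstChunk = sseParser.push('data: {"template_name":"contact');
const afterSecondChunk = sseParser.push('_card","data":{}}\n\n');
```

After the first chunk nothing is emitted; after the second, the parser returns one complete JSON string that can be passed to `JSON.parse` and handed to the card renderer.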

Template Registry

Xano response includes template_name: "contact_card". The SDK maps this to the registered React component. The sandbox uses a subset of the full copilot template registry: contact_card, scanning_card, error_card.
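The lookup from template_name to a renderer can be sketched with a plain map. This is a stand-in that returns strings instead of React components and does not use the real `@crayonai/react-core` API; `CardPayload`, `CardRenderer`, `templateRegistry`, and `renderCard` are hypothetical names, and falling back to error_card for unknown templates is an assumption.

```typescript
// Payload envelope: template_name selects the card, data fills it.
interface CardPayload {
  template_name: string;
  data: Record<string, unknown>;
}

// Stand-in for a React component: takes card data, returns display text.
type CardRenderer = (data: Record<string, unknown>) => string;

// The three sandbox templates named above; renderer bodies are illustrative.
const templateRegistry = new Map<string, CardRenderer>([
  ["contact_card", (d) => `${String(d["name"])} (${String(d["firm"])})`],
  ["scanning_card", () => "Scanning..."],
  ["error_card", (d) => `Error: ${String(d["message"] ?? "unknown")}`],
]);

function renderCard(payload: CardPayload): string {
  const renderer = templateRegistry.get(payload.template_name);
  if (renderer === undefined) {
    // Assumption: an unregistered template falls back to the error card.
    const message = `unknown template: ${payload.template_name}`;
    const fallback = templateRegistry.get("error_card");
    return fallback ? fallback({ message }) : message;
  }
  return renderer(payload.data);
}
```

Keeping the registry keyed by the exact template_name string means a renamed template on the Xano side shows up as a visible error card rather than a silent blank.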

Contact Card Schema

// SSE data payload per investor
{
  "template_name": "contact_card",
  "data": {
    "master_person_id": 1847,
    "name": "Kai Nguyen",
    "title": "General Partner",
    "firm": "Gradient Ventures",
    "avatar_url": "https://...",
    "fit_score": 0.91,
    "why": "Gradient led two AI-infra seed rounds...",
    "tags": ["AI Infrastructure", "Seed", "$1M–$5M"],
    "draft_subject": "AI infra seed — Orbiter intro via [mutual]",
    "draft_opening": "Hi Kai, [mutual] suggested I reach out..."
  }
}

File Upload — Pitch Deck Context

Users can upload pitch decks and company documents to enrich the investor-matching context. The text is extracted server-side and injected into the embedding and synthesis steps.

Format	Max Size	Extraction Method	Status
`.pdf`	25MB	Server-side PDF parse	LIVE
`.doc / .docx`	25MB	Mammoth.js extraction	LIVE
`.txt`	5MB	Raw text	LIVE
`.pptx`	25MB	Slide text extraction	LIVE

Endpoint 8414: POST /api:UgP1h6uR/file-upload — accepts multipart form data, returns extracted text and a pitch_context_id for reference in subsequent dispatch calls. Text is truncated to 8K tokens before embedding.
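A client-side pre-check against the limits in the table can reject bad uploads before the multipart request is sent. The sketch below is illustrative: `checkUpload` and `MAX_BYTES` are hypothetical names, and it assumes "MB" means 1024 × 1024 bytes, which the source does not specify.

```typescript
const MB = 1024 * 1024; // assumption: binary megabytes

// Per-extension size limits from the table above.
const MAX_BYTES = new Map<string, number>([
  [".pdf", 25 * MB],
  [".doc", 25 * MB],
  [".docx", 25 * MB],
  [".txt", 5 * MB],
  [".pptx", 25 * MB],
]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(filename: string, sizeBytes: number): UploadCheck {
  const dotIndex = filename.lastIndexOf(".");
  const ext = dotIndex === -1 ? "" : filename.slice(dotIndex).toLowerCase();
  const limit = MAX_BYTES.get(ext);
  if (limit === undefined) {
    return { ok: false, reason: `unsupported format: ${ext || "(no extension)"}` };
  }
  if (sizeBytes > limit) {
    return { ok: false, reason: `file exceeds the ${limit / MB}MB limit for ${ext}` };
  }
  return { ok: true };
}
```

For example, a 10MB `deck.pdf` passes, while a 6MB `notes.txt` is rejected because the `.txt` limit is 5MB.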

Mode Context Requirements

Each Anything Engine tool has a minimum context floor. When the floor is not met, the dispatcher surfaces a context-gap card asking the user to provide what's missing before running the query.

find_investors: Required

Pitch context: deck text OR company description OR stage + sector + ask size. Without at least one, match quality degrades significantly.

meeting_prep: Required

selectedEvent must be set — a specific calendar event must be selected before the prep pipeline will run. Surfaces event-picker card if missing.

leverage (copilot): Required

selectedPerson must be set — a person must be selected from the network before the leverage loop tool activates.

find_talent: Recommended

Job description OR role title + required skills. Without it the tool still attempts a generic search, but accuracy drops, so supplying the JD text is recommended.
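The context floors above can be expressed as a single check that runs before dispatch. This is a sketch under assumptions: `checkContextFloor`, the `ToolContext` field names other than selectedEvent and selectedPerson, and the card identifiers (`context_gap`, `event_picker`) are hypothetical, and tools with no documented floor are simply allowed through.

```typescript
interface ToolContext {
  deckText?: string;
  companyDescription?: string;
  stage?: string;
  sector?: string;
  askUsd?: number;
  selectedEvent?: string;
  selectedPerson?: string;
  jobDescription?: string;
  roleTitle?: string;
  requiredSkills?: string[];
}

type FloorCheck =
  | { ok: true; warning?: string }
  | { ok: false; card: string; missing: string };

function checkContextFloor(tool: string, ctx: ToolContext): FloorCheck {
  switch (tool) {
    case "find_investors": {
      // Deck text OR company description OR (stage + sector + ask size).
      const hasStructured = Boolean(ctx.stage && ctx.sector && ctx.askUsd !== undefined);
      if (ctx.deckText || ctx.companyDescription || hasStructured) return { ok: true };
      return { ok: false, card: "context_gap", missing: "pitch context" };
    }
    case "meeting_prep":
      return ctx.selectedEvent
        ? { ok: true }
        : { ok: false, card: "event_picker", missing: "selectedEvent" };
    case "leverage":
      return ctx.selectedPerson
        ? { ok: true }
        : { ok: false, card: "context_gap", missing: "selectedPerson" };
    case "find_talent": {
      // Recommended, not required: run anyway but flag the accuracy risk.
      const hasRole = Boolean(ctx.roleTitle && ctx.requiredSkills && ctx.requiredSkills.length > 0);
      return ctx.jobDescription || hasRole
        ? { ok: true }
        : { ok: true, warning: "no job description; running generic search" };
    }
    default:
      return { ok: true }; // assumption: no documented floor for other tools
  }
}
```

The distinction between Required and Recommended shows up in the return type: required floors block with a card, while find_talent proceeds and only attaches a warning.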

AlloyDB Migration — What Changes

Today (FalkorDB Interim)

Cypher multi-hop query for graph traversal
Single thesis embedding per investor
2-step pipeline: vector search then filter
~1,400ms average end-to-end latency
No hard filter + semantic in single call

After (AlloyDB ScaNN)

Single SQL call: filters + 6-dim semantic
6 vector columns per investor (sector/stage/check/geo/signal/founder)
ScaNN index: <10ms vector scan at full table
Composite weighted score, tunable per query class
Jog builds delta sync from BigQuery (~2 weeks out)

Xano is database-agnostic: The /classify and /dispatch endpoints don't change during the migration. Only the internal Xano function that executes the data query is swapped. The UI, SSE streaming, and Crayon card schema are unaffected.

Full Tech Stack

Layer	Technology	Role
UI	Next.js 14 App Router	Frontend + thin BFF route handlers. Zero business logic.
Auth	WorkOS AuthKit	OAuth, session management, user identity.
Generative UI	CrayonChat SDK (`@crayonai/react-core`)	SSE streaming → contact card templates.
Orchestration	Xano (API Group 1270)	All pipeline logic: classify, embed, query, synthesize.
Classifier	Groq Llama 3.3 70B	Intent classification at <300ms, temp 0.1.
Embeddings	OpenRouter text-embedding-3-small	1536-dim vectors for query and investor profiles.
Graph DB (interim)	FalkorDB	Cypher multi-hop for investor graph traversal.
Graph DB (target)	AlloyDB + ScaNN	6-dim hybrid queries. Pending migration.
Synthesis	Groq + Claude Opus 4	Per-investor rationale and outreach drafts.
Memory	Zep Cloud	Thread context, `mem_used` on turn 2.
Hosting	Vercel	Auto-deploys on push to main. `roboulos-projects/orbiter-sandbox`.

What's Next

AlloyDB ScaNN

Migrate from FalkorDB to AlloyDB. Add 6-dim ScaNN indexes per investor. Jog builds BigQuery → AlloyDB delta sync. Target: same week as Mark's go-ahead.

Remaining 10 Tools

find_investors is the reference pattern. Port to: find_talent (done), find_customers (done), research_person (done). Next: find_partners, find_advisors, find_co_investors following the same Cypher → Groq → Crayon pattern.

Canonical Class Lock

14 class names must be locked across UI labels, classifier prompt, dispatcher, and Mintlify docs simultaneously. One source of truth, no drift between layers.

Port to Copilot

Once AlloyDB is live and 4+ tools are solid, the Anything Engine dispatch endpoint gets wired into the main Orbiter copilot. The sandbox validates the pattern before the port.
