Meridian
Collaborative Data Analysis • Real-Time Collaboration & Reproducible Workflows
Role: Founder / Full-Stack Engineer
Duration: ~16 days (shipped Nov 17, 2025)
The Problem
Non-technical teams waste hours on repetitive data work: uploading spreadsheets, writing ad-hoc queries, waiting for results, and losing track of how insights were discovered. Existing tools aren't built for real-time collaboration or reproducible analysis workflows.
Challenge
Build a platform that combines fast analytical queries, streaming AI reasoning, and real-time multi-user collaboration while keeping heavy database binaries server-side (no WASM), preserving reproducibility, and enabling instant, live-updating charts and tables.
Solution
Meridian combines server-side DuckDB-node for vectorized OLAP analytics, Convex for real-time reactive subscriptions and presence, TanStack Start for clean RPC/server separation and streaming, and Cloudflare R2 for file storage. The platform streams agent reasoning and query results to the client so teams can collaborate live and replay any analysis to understand how insights were discovered.
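A minimal sketch of that server-side query path, assuming the classic `duckdb` Node bindings and TanStack Start's `createServerFn` builder API (builder method names may differ by version; `runAnalyticsQuery` is illustrative, not Meridian's actual code):

```ts
// Server-only analytics: native DuckDB runs inside a TanStack Start server
// function, so no database binaries or WASM ever reach the client bundle.
import { createServerFn } from '@tanstack/react-start'
import duckdb from 'duckdb'

// One in-process database per server instance.
const db = new duckdb.Database(':memory:')

function queryAll(sql: string): Promise<Record<string, unknown>[]> {
  return new Promise((resolve, reject) => {
    db.all(sql, (err, rows) => (err ? reject(err) : resolve(rows)))
  })
}

// Clients call this as an RPC and receive plain JSON rows back.
export const runAnalyticsQuery = createServerFn({ method: 'POST' })
  .validator((input: { sql: string }) => input)
  .handler(async ({ data }) => {
    const rows = await queryAll(data.sql) // vectorized OLAP execution, server-side only
    return { rows, rowCount: rows.length }
  })
```

The client never imports `duckdb`; it only invokes the generated RPC, which is what keeps native binaries out of the bundle.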
Process
- Discovery & Design (Days 1-2): Validated pain points with non-technical teams and benchmarked DuckDB-node vs DuckDB-WASM. Designed separation of concerns so native DuckDB runs on server functions and clients only call RPCs.
- Core Platform (Days 3-8): Implemented server-only DuckDB via TanStack Start server functions. Built Convex subscriptions for real-time updates, presence, and query history storage (see the subscription sketch after this list).
- Streaming AI Agents (Days 9-12): Built streaming agents (Gemini + Convex Agent) that break questions into steps and stream explanatory reasoning to the UI while results update live (streaming sketch below).
- Polish & Deployment (Days 13-16): Integrated Cloudflare R2 for CSV storage and Firecrawl for URL → CSV extraction, added monitoring (Sentry) and automated reviews (CodeRabbit), and deployed on Netlify with server function constraints in mind.
Challenges Overcome
- Avoiding DuckDB WASM at Scale: Kept DuckDB-node on the server and used TanStack Start's RPC separation so native binaries are never bundled into the client, avoiding the client-side performance and scaling issues experts had flagged with a WASM approach.
- Passing Large Data Over RPC: Stored large CSVs and DuckDB artifacts in Cloudflare R2 / MotherDuck and passed references over RPC, enabling fast server-side processing without heavy serialization over the wire (see the reference-passing sketch after this list).
- Real-Time Updates & Reproducibility: Used Convex subscriptions for instant chart/table updates and stored every analysis step and its reasoning so sessions are fully replayable and auditable (step-log sketch below).
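A sketch of that reference-passing pattern: the client sends only an R2 object key, the server presigns a short-lived URL, and DuckDB's httpfs extension reads the CSV directly over HTTPS. Presigner usage is standard AWS SDK; the bucket and function names are illustrative:

```ts
// summarizeCsvByKey.ts — no CSV bytes cross the RPC boundary, only a key.
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'
import duckdb from 'duckdb'

const db = new duckdb.Database(':memory:')
const r2 = new S3Client({ region: 'auto', endpoint: process.env.R2_ENDPOINT })

const run = (sql: string) =>
  new Promise<void>((resolve, reject) => db.run(sql, (err) => (err ? reject(err) : resolve())))
const all = (sql: string) =>
  new Promise<Record<string, unknown>[]>((resolve, reject) =>
    db.all(sql, (err, rows) => (err ? reject(err) : resolve(rows))),
  )

export async function summarizeCsvByKey(objectKey: string) {
  // Short-lived URL so DuckDB can fetch the object without R2 credentials in SQL.
  const url = await getSignedUrl(
    r2,
    new GetObjectCommand({ Bucket: 'meridian-uploads', Key: objectKey }),
    { expiresIn: 900 },
  )
  await run(`INSTALL httpfs`)
  await run(`LOAD httpfs`)
  return all(`SELECT COUNT(*) AS row_count FROM read_csv_auto('${url}')`)
}
```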
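And a sketch of the append-only step log behind replay and auditing; the `analysisSteps` table, its `by_session` index, and the field names are assumptions about the schema:

```ts
// convex/steps.ts — every reasoning chunk, SQL statement, and result reference
// is appended here, so a session can be replayed in order later.
import { mutation, query } from './_generated/server'
import { v } from 'convex/values'

export const record = mutation({
  args: {
    sessionId: v.id('sessions'),
    kind: v.union(v.literal('reasoning'), v.literal('sql'), v.literal('result')),
    payload: v.string(),
  },
  handler: async (ctx, args) => {
    await ctx.db.insert('analysisSteps', { ...args, createdAt: Date.now() })
  },
})

export const replay = query({
  args: { sessionId: v.id('sessions') },
  handler: async (ctx, args) => {
    // Ordered read-back drives the replay UI, step by step.
    return await ctx.db
      .query('analysisSteps')
      .withIndex('by_session', (q) => q.eq('sessionId', args.sessionId))
      .order('asc')
      .collect()
  },
})
```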
Impact & Results
- Analysis Turnaround: Manual spreadsheets and serial queries → instant insights with streaming steps, cutting common workflows from minutes to seconds
- Collaboration Latency: No synchronized live updates → millisecond-level chart updates with real-time multi-user sync via Convex
- Query Reproducibility: Lost ad-hoc analysis history → full replay of analysis and reasoning for transparent, auditable workflows
Key Achievements
- Built server-side DuckDB analytics pipeline enabling vectorized OLAP queries on millions of rows
- Implemented streaming AI agents that expose step-by-step reasoning for transparent insights
- Delivered live-updating charts and tables with Convex reactivity and TanStack Start streaming
- Designed robust server-client separation to avoid WASM pitfalls and enable production-scale DuckDB usage
Tech Stack
TanStack Start, Convex, DuckDB-node, Cloudflare R2, Firecrawl, Gemini (streaming agents), Netlify, Sentry, CodeRabbit, TanStack Query & Table