Sumit Gupta
Engineering leader building AI agents, eval systems, and the infrastructure to run them.
Hardware
Baymax
NZXT H1 Mini-ITX · RTX 3060 Ti · 20TB
Home server running a full Docker stack — media automation, AI agent infrastructure, eval pipelines, observability, and a multi-agent orchestration bus. Everything self-hosted.
MacBook Pro
M1 Max · 64GB RAM · Ollama
Local inference server running Gemma 4 models. Used for A/B evaluation against cloud models, to find where local models can replace API calls without quality loss.
What I'm Building
Multi-Agent System
Six specialized AI agents with distinct personas, each handling a domain — from media curation to email to productivity. Orchestrated through a custom event bus with durable workflows.
Eval Pipeline
Production eval system with Langfuse tracing, binary pass/fail scoring, and automated quality gates. Every agent change is measured before it ships.
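A rough sketch of what a binary pass/fail quality gate can look like. This is illustrative only, not the production pipeline: the `EvalResult` schema, the function names, and both thresholds are assumptions, and Langfuse tracing is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One eval case with a binary outcome (hypothetical schema)."""
    case_id: str
    passed: bool


def pass_rate(results):
    """Fraction of cases that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def quality_gate(baseline, candidate, min_pass_rate=0.9, max_regression=0.02):
    """Return (ok, reason) for a candidate run compared to a baseline run.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    cand = pass_rate(candidate)
    base = pass_rate(baseline)
    if cand < min_pass_rate:
        return False, f"pass rate {cand:.2f} below floor {min_pass_rate:.2f}"
    if base - cand > max_regression:
        return False, f"regressed {base - cand:.2f} vs baseline"
    return True, f"pass rate {cand:.2f}"
```

A CI step could call `quality_gate(baseline_results, candidate_results)` and block the release when the first element is `False`. Checking both an absolute floor and a regression limit catches a change that is still above the floor but worse than what is already shipped.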
Local Inference
Running structured experiments to find where local models can replace cloud API calls. Tracking cost, latency, and quality tradeoffs across a multi-GPU fleet.
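One way the cost, latency, and quality tracking could be summarized per backend, as a minimal sketch. The `Run` record, its field names, and the "local" / "cloud" labels are assumptions made for illustration, not the actual experiment schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Run:
    """One model call scored by the eval (hypothetical record)."""
    backend: str      # e.g. "local" or "cloud"; labels are assumptions
    latency_s: float  # wall-clock seconds for the call
    cost_usd: float   # API cost; 0.0 for local inference in this sketch
    passed: bool      # binary eval outcome


def summarize(runs):
    """Group runs by backend and report pass rate, median latency, and total cost."""
    by_backend = {}
    for run in runs:
        by_backend.setdefault(run.backend, []).append(run)

    summary = {}
    for backend, group in by_backend.items():
        summary[backend] = {
            "n": len(group),
            "pass_rate": sum(r.passed for r in group) / len(group),
            "median_latency_s": median(r.latency_s for r in group),
            "total_cost_usd": sum(r.cost_usd for r in group),
        }
    return summary
```

Comparing the "local" and "cloud" rows of the result shows the tradeoff directly: a local model is a candidate replacement when its pass rate matches the cloud model and its latency is acceptable for the task.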