Sumit Gupta

Engineering leader building AI agents, eval systems, and the infrastructure to run them.

Hardware

Baymax

NZXT H1 Mini-ITX · RTX 3060 Ti · 20TB

Home server running a full Docker stack — media automation, AI agent infrastructure, eval pipelines, observability, and a multi-agent orchestration bus. Everything self-hosted.

MacBook Pro

M1 Max · 64GB RAM · Ollama

Local inference server running Gemma 4 models. Used for A/B evaluation against cloud models — testing where local models can replace API calls without quality loss.

What I'm Building

Multi-Agent System

Six specialized AI agents with distinct personas, each handling a domain — from media curation to email to productivity. Orchestrated through a custom event bus with durable workflows.
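
As a rough illustration of the orchestration idea, here is a minimal in-process sketch of topic-based routing. It is not the actual bus: the `EventBus` class, the topic names, and the two agent functions are hypothetical, and durable workflows (persistence, retries) are deliberately left out.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], str]


class EventBus:
    """Toy publish/subscribe bus: each topic fans out to its subscribed agents."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> list[str]:
        # Call handlers in subscription order; unknown topics yield no results.
        return [handler(payload) for handler in self._handlers.get(topic, [])]


# Hypothetical agents, each owning one domain.
def media_agent(event: dict) -> str:
    return f"media: queued {event['title']}"


def email_agent(event: dict) -> str:
    return f"email: triaged message from {event['sender']}"


bus = EventBus()
bus.subscribe("media.request", media_agent)
bus.subscribe("email.received", email_agent)
```

Publishing `bus.publish("media.request", {"title": "Dune"})` would reach only the media agent, which is the point of routing by topic: each agent sees only the events in its own domain.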

Eval Pipeline

Production eval system with Langfuse tracing, binary pass/fail scoring, and automated quality gates. Every agent change is measured before it ships.
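
A quality gate over binary scores can be sketched in a few lines. This is an assumption-laden illustration rather than the production pipeline: `EvalResult`, `pass_rate`, `quality_gate`, and the 0.9 threshold are hypothetical names and values, and the Langfuse tracing side is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    passed: bool  # binary pass/fail score for one eval case


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("no eval results to score")
    return sum(r.passed for r in results) / len(results)


def quality_gate(results: list[EvalResult], threshold: float = 0.9) -> bool:
    """Return True if the change may ship (threshold is an assumed value)."""
    return pass_rate(results) >= threshold
```

In a setup like this, a CI step would run the eval suite, collect one `EvalResult` per case, and block the deploy when `quality_gate` returns `False`.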

Local Inference

Running structured experiments to replace cloud API calls with local models. Tracking cost, latency, and quality tradeoffs across a multi-GPU fleet.
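
One way to track those tradeoffs is to log each request as a record and summarize per backend. The sketch below assumes a hypothetical `Run` record and `summarize` helper; the field names and the choice of median latency are illustrative, not taken from the real experiments.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Run:
    backend: str      # e.g. "local" or "cloud" (labels are assumptions)
    latency_s: float  # wall-clock time for the request
    cost_usd: float   # API cost; 0.0 for local inference
    passed: bool      # binary quality score from the eval


def summarize(runs: list[Run], backend: str) -> dict[str, float]:
    """Aggregate cost, latency, and quality for one backend."""
    subset = [r for r in runs if r.backend == backend]
    if not subset:
        raise ValueError(f"no runs recorded for backend {backend!r}")
    return {
        "n": len(subset),
        "median_latency_s": median(r.latency_s for r in subset),
        "total_cost_usd": sum(r.cost_usd for r in subset),
        "pass_rate": mean(1.0 if r.passed else 0.0 for r in subset),
    }
```

Comparing `summarize(runs, "local")` against `summarize(runs, "cloud")` on the same prompts gives a side-by-side view of where a local model holds its pass rate at lower cost.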