Sumit Gupta

A work in progress: automatically generated documentation of my personal projects.

What I'm Exploring

Agent Orchestration

How do you coordinate multiple AI agents that each own a domain? Building a multi-agent system from scratch to learn about memory, tool use, handoffs, and durable workflows.

Measuring Agent Quality

Eval pipelines, tracing, binary pass/fail scoring — figuring out what "good" means for an agent and how to enforce it before changes ship.
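A minimal sketch of what a binary pass/fail gate could look like, in Python. Everything in it is hypothetical and for illustration only: `EvalCase`, `run_gate`, the `min_pass_rate` threshold, and the stub agent are assumed names, not the pipeline actually running here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalCase:
    """One eval case: a prompt plus a check that returns True or False."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # binary verdict, no partial credit


def run_gate(
    agent: Callable[[str], str],
    cases: List[EvalCase],
    min_pass_rate: float = 1.0,  # assumed default: every case must pass
) -> Tuple[bool, Dict[str, bool]]:
    """Run every case against the agent and decide whether a change may ship."""
    results = {case.name: bool(case.check(agent(case.prompt))) for case in cases}
    # An empty suite counts as a failure rather than a silent pass.
    pass_rate = sum(results.values()) / len(results) if results else 0.0
    return pass_rate >= min_pass_rate, results


# Stand-in for a real agent call (assumption: an agent maps a prompt to text).
def stub_agent(prompt: str) -> str:
    return prompt.upper()


CASES = [
    EvalCase("echoes_input", "hello", lambda out: "HELLO" in out),
    EvalCase("non_empty", "anything", lambda out: len(out) > 0),
]

ship_ok, per_case = run_gate(stub_agent, CASES)
```

Keeping each check strictly boolean makes the gate easy to wire into CI: the per-case dictionary says what broke, and the single boolean decides whether the change goes out.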

Fully Local AI

What if quality tokens weren't scarce? Running local models on consumer hardware to explore what an abundance mindset looks like when inference is free.

Home Automation

Voice-controlled smart home with local processing. The intersection of AI agents and physical space — where software meets the real world.

Hardware

Baymax

NZXT H1 Mini-ITX · RTX 3060 Ti 8GB · 20TB

Home server running a full Docker stack — media automation, AI agent infrastructure, eval pipelines, observability, and a multi-agent orchestration bus. Everything self-hosted.

Inference Server

RTX 5060 Ti 16GB · vLLM

Dedicated local inference box. 16GB VRAM for running medium-sized models (26B MoE, 31B dense) at production quality with constrained decoding. The goal: near-zero API costs for routine agent work.
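A rough back-of-the-envelope sketch for sizing models against VRAM, assuming weight memory is simply parameter count times bits per weight. The bit widths below are illustrative assumptions (no quantization level is stated above), and the helper name `weight_gib` is made up for this example. The estimate covers weights only; KV cache, activations, and runtime overhead come on top. For an MoE model the total parameter count is what has to be resident, assuming no expert offloading.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


# Model sizes mentioned on this page; bit widths are assumptions.
for name, params in [("26B MoE", 26), ("31B dense", 31), ("70B", 70)]:
    for bits in (4, 8, 16):
        print(f"{name:>10} @ {bits:>2}-bit: {weight_gib(params, bits):6.1f} GiB")
```

Under these assumptions a 31B model at 4 bits per weight needs about 14.4 GiB for weights alone, which is why context length and KV cache size are the tight constraint on a 16GB card.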

MacBook Pro

M1 Max · 64GB Unified · Ollama

Testing larger models that don't fit in GPU VRAM — 70B quantized, dense 31B+. Used for A/B evaluation against cloud models before promoting to the inference server.
