This post remains available as published. Older posts may use historical terminology that does not match the current public Gambit framing.

Gambit: an open source agent harness for reliable agents, assistants, and workflows

By Dan Sisco

Tags: agents, open-source, gambit

We’ve been quietly building Gambit, an open source agent harness. An agent
harness is essentially an operating system for LLMs, as Google's Phil Schmid
recently wrote.

Instead of wiring together yet another bespoke orchestration chain, Gambit takes
care of tool calling, planning, context, and evaluation so you can focus on the
part that’s unique to your product.

“An Agent Harness is the infrastructure that wraps around an AI model to manage
long-running tasks. It is not the agent itself. It is the software system that
governs how the agent operates, ensuring it remains reliable, efficient, and
steerable.”

Phil Schmid, AI Developer Experience, Google DeepMind

Why another harness?

Most frameworks still use complex code pipelines to orchestrate agents. They
look something like:

compute -> compute -> compute -> LLM -> compute -> compute -> LLM

Every hop requires you to glue more code together and hope your context window
doesn’t explode. Gambit flips that. Think:

LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM

The language model drives the workflow and calls into compute only when it has
to. That keeps intent and state inside the agent, and removes a lot of
orchestration boilerplate.
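
To make the inversion concrete, here is a minimal TypeScript sketch of a model-driven loop. It is an illustration under stated assumptions, not Gambit's API: `runModelDriven`, `ModelStep`, `stubModel`, and the `add` tool are all hypothetical names, and the "model" is a stub function standing in for a real LLM call.

```typescript
// Hypothetical types for illustration; none of these names come from Gambit.
type ToolCall = { kind: "tool"; name: string; args: Record<string, unknown> };
type FinalAnswer = { kind: "final"; text: string };
type ModelStep = ToolCall | FinalAnswer;

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelStep;
type Tool = (args: Record<string, unknown>) => string;

// The model decides every step; compute runs only when the model asks for it.
function runModelDriven(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxSteps = 8,
): string {
  const history: Message[] = [{ role: "user", content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(history);
    if (next.kind === "final") return next.text;
    const tool = tools[next.name];
    const result = tool ? tool(next.args) : `unknown tool: ${next.name}`;
    // State stays in the conversation history rather than in glue code.
    history.push({ role: "assistant", content: `call ${next.name}` });
    history.push({ role: "tool", content: result });
  }
  throw new Error(`no final answer after ${maxSteps} steps`);
}

// Stub standing in for a real LLM: request one tool call, then answer.
const stubModel: Model = (history) => {
  const toolMessages = history.filter((m) => m.role === "tool");
  const lastTool = toolMessages[toolMessages.length - 1];
  return lastTool
    ? { kind: "final", text: `The sum is ${lastTool.content}.` }
    : { kind: "tool", name: "add", args: { a: 2, b: 3 } };
};

const tools: Record<string, Tool> = {
  add: (args) => String(Number(args["a"]) + Number(args["b"])),
};

const answer = runModelDriven(stubModel, tools, "What is 2 + 3?");
console.log(answer);
```

The only orchestration code left is the loop itself. Which tool runs, and when to stop, is decided by the model's output rather than by a hand-written pipeline.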

Decks: the building blocks

We define agents as "decks." Think of a deck as a unit of execution. Decks can be
markdown or TypeScript. They describe capabilities, tools, and interfaces in a
type-safe way. Your root deck can call other decks whenever a task needs to fan
out. Each deck can choose its own model, temperature, and system prompt, so
experimentation stays localized instead of rippling across the entire system.

Agents can call other agents freely. Because decks define their inputs and
outputs, Gambit can enforce contracts between them. You get modularity without
sacrificing the ability to tweak low-level behavior.
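
One way to picture a contract between decks is a pair of validators around each call. The sketch below is a guess at the general shape, not Gambit's actual deck format: `Deck`, `callDeck`, `parseInput`, `parseOutput`, and the model name are hypothetical, and `run` is a stub in place of a model call.

```typescript
// Hypothetical shapes for illustration; this is not Gambit's actual deck API.
type Validator<T> = (value: unknown) => T; // throws on invalid data

interface Deck<I, O> {
  name: string;
  model: string; // each deck picks its own model
  temperature: number; // and its own sampling settings
  parseInput: Validator<I>;
  parseOutput: Validator<O>;
  run: (input: I) => unknown; // stands in for the LLM call
}

// Validate on the way in and on the way out, so a bad payload fails
// at the deck boundary instead of somewhere downstream.
function callDeck<I, O>(deck: Deck<I, O>, rawInput: unknown): O {
  const input = deck.parseInput(rawInput);
  const rawOutput = deck.run(input);
  return deck.parseOutput(rawOutput);
}

type SummaryInput = { text: string };
type SummaryOutput = { summary: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

const summarize: Deck<SummaryInput, SummaryOutput> = {
  name: "summarize",
  model: "example-small-model", // placeholder, not a real model name
  temperature: 0.2,
  parseInput: (v) => {
    const text = isRecord(v) ? v["text"] : undefined;
    if (typeof text !== "string") {
      throw new Error("summarize: input must be { text: string }");
    }
    return { text };
  },
  parseOutput: (v) => {
    const summary = isRecord(v) ? v["summary"] : undefined;
    if (typeof summary !== "string") {
      throw new Error("summarize: output must be { summary: string }");
    }
    return { summary };
  },
  // Stub in place of a model call: keep the first sentence.
  run: (input) => ({ summary: input.text.split(". ")[0] }),
};

const ok = callDeck(summarize, {
  text: "Decks are units of execution. They compose.",
});
console.log(ok.summary);
```

Because the model and temperature live on the deck, changing them for `summarize` touches nothing else, and a caller that passes the wrong input shape gets an error at the boundary.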

Guardrails baked in

Every step in a Gambit chain gets evals automatically via graders. Graders are
another type of deck: they score either an entire conversation or a single turn.

You can also define test agents for each deck. They simulate the situations your
production agent will run into and can generate synthetic transcripts for humans
or graders to inspect. The goal is to make evals a default from the beginning,
not an afterthought.
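
As a rough sketch of turn-level and conversation-level grading, consider the toy rubric below. Everything in it is an assumption made for illustration: `Turn`, `Grade`, `noEmailLeak`, and `gradeConversation` are hypothetical names, the rubric is a simple regex rather than an LLM-backed grader, and the transcript is hand-written where a test agent would normally generate it.

```typescript
// Hypothetical types for illustration; Gambit's real grader interface may differ.
type Turn = { role: "user" | "assistant"; content: string };
type Grade = { score: number; reason: string }; // score is 0 or 1 in this toy rubric
type TurnGrader = (turn: Turn) => Grade;

// Toy single-turn rubric: assistant turns must not contain anything
// that looks like an email address.
const noEmailLeak: TurnGrader = (turn) => {
  const leaked =
    turn.role === "assistant" && /[\w.+-]+@[\w-]+\.[\w.]+/.test(turn.content);
  return leaked
    ? { score: 0, reason: "assistant turn contains an email address" }
    : { score: 1, reason: "no email address found" };
};

// Conversation-level grade: the weakest turn decides the score.
function gradeConversation(turns: Turn[], grader: TurnGrader): Grade {
  let worst: Grade = { score: 1, reason: "empty conversation" };
  for (let i = 0; i < turns.length; i++) {
    const grade = grader(turns[i] as Turn);
    if (i === 0 || grade.score < worst.score) {
      worst = { score: grade.score, reason: `turn ${i}: ${grade.reason}` };
    }
  }
  return worst;
}

// Hand-written stand-in for a synthetic transcript from a test agent.
const transcript: Turn[] = [
  { role: "user", content: "How do I reach support?" },
  { role: "assistant", content: "Email jane.doe@example.com directly." },
];

const result = gradeConversation(transcript, noEmailLeak);
console.log(result.score, result.reason);
```

Taking the minimum over turns is one possible aggregation. An average or a separate whole-conversation rubric would fit the same shape.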

Why we built it

Before Gambit we shipped an LLM-powered video editor. It “worked,” but
reliability always lagged behind experience. That pushed us down the path of
improving inference-time quality without piling on human review. Gambit is the
harness we wanted back then: lightweight, composable, and honest about what’s
still missing.

What you can build right now

- Open-source assistants where prompts, logic, and tools are easy to share and
  remix.
- Rubric-based grading pipelines so you don’t leak PII or fall out of policy
  accidentally.
- Fast prototypes: spin up a bot, let Codex or Claude Code run the CLI +
  graders, and you’ve got a usable first pass with minimal human hand-holding.

We’re already running this with a handful of design partners, and it’s been fun
to watch them riff on the deck pattern.

What’s next (and what’s missing)

This is still a prototype. We know there are obvious edges we haven’t sanded
down yet. The goal of sharing now is to find more people who care about reliable
agents and want to shape the roadmap with us.

Try Gambit

Take Gambit for a spin, star the GitHub repo, and sign up for updates so you’ll
hear about new decks, graders, and integrations as soon as they land. Questions,
ideas, or critiques are all welcome.

You can also see a full demo here.