Thousand Token Wood: Multi-agent economy runs on a 3B model

HuggingFace

June 6, 2026

Original source

huggingface.co — read the full announcement →

What Thousand Token Wood Actually Is

Hugging Face released a demo called Thousand Token Wood. It's a simulation in which 100 AI agents trade wood, stone, and gold in a minimal economy. Each agent runs on a quantized version of Gemma 2B, which the post bills as a 3B-class model. The entire thing fits in 12GB of VRAM. Agents negotiate prices, form contracts, and even cheat. It's not a game; it's a proof of concept for running multi-agent systems on commodity hardware. Each agent gets a budget of 1,000 tokens per round, hence the name.
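To make the round structure concrete, here is a minimal sketch of what a budgeted trading round could look like. It is not the released orchestration code: the `Agent` class, `run_round`, `spend`, and the random `propose` policy are hypothetical stand-ins, the model call is replaced by a random choice, and whitespace-separated words stand in for real tokens. Only the 100 agents, the three resources, and the 1,000-token budget come from the article.

```python
import random
from dataclasses import dataclass, field

TOKEN_BUDGET = 1_000  # per agent per round, as stated in the article
RESOURCES = ("wood", "stone", "gold")


@dataclass
class Agent:
    name: str
    inventory: dict = field(default_factory=dict)
    tokens_used: int = 0

    def propose(self, rng):
        """Stand-in for a model call: offer one unit of a resource the agent holds."""
        owned = [r for r in RESOURCES if self.inventory.get(r, 0) > 0]
        if not owned:
            return None
        give = rng.choice(owned)
        want = rng.choice([r for r in RESOURCES if r != give])
        return give, want


def spend(agent, message):
    """Charge a message against the round budget (words approximate tokens)."""
    cost = len(message.split())
    if agent.tokens_used + cost > TOKEN_BUDGET:
        return False
    agent.tokens_used += cost
    return True


def run_round(agents, rng):
    """Pair agents at random and execute every one-for-one swap both sides can honor."""
    for agent in agents:
        agent.tokens_used = 0  # the budget resets each round
    order = agents[:]
    rng.shuffle(order)
    trades = 0
    for a, b in zip(order[::2], order[1::2]):
        offer = a.propose(rng)
        if offer is None:
            continue
        give, want = offer
        if not spend(a, f"{a.name} offers 1 {give} for 1 {want}"):
            continue
        if b.inventory.get(want, 0) > 0 and spend(b, "accept"):
            a.inventory[give] -= 1
            b.inventory[give] = b.inventory.get(give, 0) + 1
            b.inventory[want] -= 1
            a.inventory[want] = a.inventory.get(want, 0) + 1
            trades += 1
    return trades


def make_agents(n, rng):
    """Create n agents with random starting inventories (hypothetical setup)."""
    return [
        Agent(f"agent{i}", {r: rng.randint(0, 5) for r in RESOURCES})
        for i in range(n)
    ]


def total_stock(agents):
    """Sum each resource across all agents."""
    return {r: sum(a.inventory.get(r, 0) for a in agents) for r in RESOURCES}
```

Calling `run_round(make_agents(100, random.Random(0)), random.Random(0))` runs one round. In a real setup, `propose` would wrap a model call and `spend` would count tokens with the model's tokenizer.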

Why a 3B Model Actually Matters Here

Most multi-agent research uses frontier models like GPT-4 or Claude, and a single run can cost thousands of dollars in API fees. Thousand Token Wood deliberately goes small. That changes the economics: if you can run 100 agents on a single RTX 4090, you can iterate fast. Until now, multi-agent systems were largely reserved for labs with big budgets. Hugging Face's point is that you don't need one. The small model also forces compression: with fewer parameters, agents tend toward simpler strategies, which makes the economy easier to interpret. It's a case of constraints becoming features.
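A quick back-of-the-envelope calculation shows why a model this size is cheap to host. The sketch below covers the weights only, assuming all agents share one copy of the model. The function name is hypothetical, and the article does not say how the rest of the 12GB is split between KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 2e9 and 3e9 bracket the model sizes mentioned in the article.
    for params in (2e9, 3e9):
        for bits in (16, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.0f}B params at {bits}-bit: {gb:.1f} GB")
```

At 4-bit precision a 3B-parameter model needs about 1.5 GB for weights, against about 6 GB at 16-bit, which leaves most of a 12GB card for per-agent context.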

What This Means for Agentic AI and Research

Honestly, this is more interesting for the infrastructure than for the simulation itself. If multi-agent economies can run on a 3B model, then the bottleneck isn't model size. It's coordination overhead. Hugging Face also open-sourced the orchestration code, and that's the real deliverable. For researchers, it means you can test economic theories with agent-based models without burning API credits. For startups building AI NPCs or automated negotiating bots, it's a viable path. But don't expect these agents to pass Turing tests. They're dumb and limited, which makes the emergent behavior (like collusion or price-fixing) all the more surprising.

The Missing Benchmarks and Real-World Caveats

The big question: does the economy actually resemble human behavior? The blog post shows agents spontaneously forming cartels, but is that robust or a fluke? We don't know. They didn't run systematic ablation studies. They also used a single prompt template—no variance. The model is likely not fine-tuned for economic reasoning, so results might be brittle. Another unknown: how does it scale? 100 agents is cute; 10,000 agents with 3B models would break any single GPU. Hugging Face didn't release a multi-GPU version. Also, the token limit of 1,000 per round forces short interactions. Real-world negotiations need more context. Watch for replication attempts.
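The scaling concern can be put in rough numbers. The sketch below only counts token volume per round, assuming every agent spends its full 1,000-token budget. The function names and the throughput of 2,000 tokens per second are placeholder assumptions, not measurements from the demo, and the memory pressure from thousands of concurrent contexts is not modeled.

```python
def tokens_per_round(num_agents, token_budget=1_000):
    """Upper bound on tokens processed in one round."""
    return num_agents * token_budget


def round_seconds(num_agents, tokens_per_second, token_budget=1_000):
    """Wall-clock estimate for one round at a given aggregate throughput."""
    return tokens_per_round(num_agents, token_budget) / tokens_per_second


if __name__ == "__main__":
    ASSUMED_THROUGHPUT = 2_000  # tokens/second, hypothetical
    for n in (100, 10_000):
        total = tokens_per_round(n)
        secs = round_seconds(n, ASSUMED_THROUGHPUT)
        print(f"{n} agents: {total:,} tokens per round, about {secs:,.0f} s")
```

Under that assumed throughput, going from 100 to 10,000 agents takes a round from under a minute to well over an hour, before memory limits even enter the picture.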

Frequently Asked Questions

What model does Thousand Token Wood use?

It uses a quantized version of Google's Gemma 2B, billed as a 3B-class model, running in 4-bit precision. It fits on a single consumer GPU with 12GB of VRAM.

How many agents run in the simulation?

The demo runs exactly 100 agents. Each agent manages inventory and negotiates trades using the same underlying model, but with different prompts and initial conditions.

Is the code open source?

Yes, Hugging Face released the full orchestration code on their GitHub repository under the Thousand Token Wood project name. It includes the simulation loop, the agent prompts, and the negotiation logic.

Can I run this on a free Google Colab instance?

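Maybe, but it is not a reliable setup. The simulation needs about 12GB of VRAM. Colab's free tier typically offers a T4 with about 16GB, which is enough on paper, but GPU availability is not guaranteed and sessions are time-limited. The TPU runtime is a separate question, and the full 100-agent simulation might not fit in memory there. A paid GPU or a local RTX 4090 is the safer choice.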

What economic behaviors emerge in the simulation?

Agents spontaneously form cartels to inflate prices of scarce resources. They also engage in price wars and sometimes default on contracts. The behavior is emergent, not scripted, but it's limited by the short token budget per round.