Pi Coding Agent: Architecture, Agent Loop, Extension System & Ecosystem
Overview
Pi is an open-source, MIT-licensed, terminal-based AI coding agent created by Mario Zechner (@badlogic), best known as the creator of the libGDX game framework. The project lives at badlogic/pi-mono and its playful official website is shittycodingagent.ai — a self-deprecating name that inadvertently became its brand identity.
Pi’s founding thesis: context engineering is paramount. By shipping with only 4 built-in tools and a system prompt under 1,000 tokens, Pi maximizes the context window available for actual code and project information. Everything else is delegated to an extension system, making Pi radically extensible without being opinionated.
As of March 2026: 28.3K GitHub stars, 3K forks, 3,400+ commits, 181 releases (latest v0.63.1), 349+ npm dependents, and 134+ contributors. Pi powers OpenClaw (180K+ GitHub stars), one of the most-starred open-source projects of early 2026.
Origins & History
Zechner built Pi after growing frustrated with Claude Code, which he described as having become “a spaceship with 80% of functionality I have no use for”. The system prompt and tools changed on every release, breaking his workflows and altering model behavior unpredictably.
Timeline:
- August 2025: `pi-mono` monorepo created with foundational TypeScript/npm workspace structure
- November 30, 2025: Public launch via blog post “What I learned building an opinionated and minimal coding agent”
- December 2, 2025: Pi appeared on the Terminal-Bench 2.0 leaderboard
- January 31, 2026: Armin Ronacher (Flask creator) published “Pi: The Minimal Agent Within OpenClaw”, giving Pi prominent external validation
- Early 2026: Pi became the engine powering OpenClaw, reaching massive visibility
- February 2026: Peter Steinberger (OpenClaw creator) joined OpenAI to build personal agents
- March 2026: Rapid daily releases continue; v0.63.1 with Gemini 3.1 Pro Preview support
Monorepo Architecture
Pi is organized as a layered monorepo with strict dependency enforcement (Nader Dabit’s deep dive):
┌─────────────────────────────────────────────────┐
│ Applications Layer │
│ pi-coding-agent │ pi-mom │ pi-pods │ pi-web-ui │
├─────────────────────────────────────────────────┤
│ UI Layer │
│ pi-tui (Terminal UI) │
├─────────────────────────────────────────────────┤
│ Agent Layer │
│ pi-agent-core (Agent Loop) │
├─────────────────────────────────────────────────┤
│ Foundation Layer │
│ pi-ai (Unified Multi-Provider LLM API) │
└─────────────────────────────────────────────────┘
| Package | Purpose |
|---|---|
| pi-ai | Unified LLM API supporting 15+ providers with streaming, tool calling (TypeBox schemas), cross-provider context handoffs, thinking/reasoning support, and cost/token tracking |
| pi-agent-core | Agent loop: tool execution, AJV validation, event streaming, state management, message queuing (steering + follow-up) |
| pi-coding-agent | Full CLI runtime: 4 built-in tools, JSONL session persistence, AGENTS.md context files, auto-compaction, extension/skill/theme system |
| pi-tui | Terminal UI with differential rendering, synchronized output, markdown rendering, live streaming diffs |
| pi-web-ui | Web components for AI chat interfaces |
| pi-mom | Slack bot that delegates to the coding agent |
| pi-pods | CLI for managing vLLM GPU pod deployments |
The Agent Loop
Core Mechanics
The agent loop in pi-agent-core follows the standard agentic pattern: send messages to the LLM → execute tool calls → feed results back → repeat until the model stops calling tools. A critical design decision: the loop has no max-steps, no timeouts, no iteration limits. As Zechner writes: “The agent loop just loops until the agent says it’s done.”
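The loop shape described above can be sketched in a few lines. This is a hypothetical illustration, not pi-agent-core's actual code; the `ToolCall`/`LlmResponse` types and function names are assumptions.

```typescript
// Minimal sketch of the agentic loop: call the LLM, run any requested
// tools, feed results back, and repeat until a response has no tool calls.
type ToolCall = { name: string; args: unknown };
type LlmResponse = { text: string; toolCalls: ToolCall[] };

async function agentLoop(
  messages: unknown[],
  callLlm: (msgs: unknown[]) => Promise<LlmResponse>,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  // Deliberately no max-steps, no timeout, no iteration limit:
  // the loop runs until the model stops calling tools.
  while (true) {
    const res = await callLlm(messages);
    messages.push({ role: "assistant", content: res.text, toolCalls: res.toolCalls });
    if (res.toolCalls.length === 0) return res.text; // natural termination
    for (const call of res.toolCalls) {
      const result = await runTool(call);
      messages.push({ role: "toolResult", name: call.name, content: result });
    }
  }
}
```

The only exit condition is the model itself producing a plain response, which matches Pi's "loops until the agent says it's done" design.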
Event Lifecycle
The full lifecycle (extensions documentation):
pi starts
│
├── session_directory (CLI startup only)
├── session_start
│
▼
user sends prompt ──────────────────────────────────┐
│ │
├── input (can intercept/transform/handle) │
├── before_agent_start (inject messages, modify │
│ system prompt) │
├── agent_start │
│ ├── message_start / message_update / message_end │
│ │ │
│ │ ┌── turn (repeats while LLM calls tools) ───┐
│ │ │ │
│ │ ├── turn_start │
│ │ ├── context (modify messages before LLM) │
│ │ ├── before_provider_request │
│ │ │ LLM responds, may call tools: │
│ │ │ ├── tool_execution_start │
│ │ │ ├── tool_call (can block) │
│ │ │ ├── tool_execution_update │
│ │ │ ├── tool_result (can modify) │
│ │ │ └── tool_execution_end │
│ │ └── turn_end │
│ │ │
│ └── agent_end │
│ │
user sends another prompt ◄─────────────────────────┘
Execution Flow (Step-by-Step)
1. User input received — the `input` event fires; extensions can intercept/transform
2. Skill/template expansion — if input matches a `/skill:name` pattern, Pi expands it lazily
3. `before_agent_start` — extensions inject messages or modify the system prompt (RAG, memory, dynamic context)
4. `agent_start` — core loop begins
5. Turn begins — the `context` event fires (extensions rewrite message history), then `before_provider_request`
6. LLM call via `streamFn()` — provider-agnostic streaming layer normalizes responses into `text_delta`, `thinking_delta`, `toolcall_delta`, `done`, `error`
7. Tool call detection — parameters validated against TypeBox schemas via AJV; validation errors returned as tool results for self-correction
8. Tool execution — dual-payload results: `content` (LLM-visible) + `details` (UI-only)
9. Results fed back — loop returns to step 5 for another turn
10. Termination — when the LLM produces a response without tool calls, `agent_end` fires; follow-up messages in the queue trigger new runs
No Built-in Planning Mode
Pi deliberately rejects planning modes, ReAct frameworks, and chain-of-thought scaffolding. Instead it relies on:
- The LLM’s inherent reasoning ability — “All frontier models have been RL-trained extensively, so they inherently understand what a coding agent is”
- File-based plans — write plans to `PLAN.md` files for observability and cross-session persistence
- Thinking levels — five tiers (`minimal`, `low`, `medium`, `high`, `xhigh`) control model reasoning depth, specified via a model name suffix (e.g., `sonnet:high`)
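The suffix syntax can be split with a small parser. This sketch is an assumption about how a spec like `sonnet:high` decomposes; `parseModelSpec` is a hypothetical name, not Pi's API.

```typescript
// Split a model spec like "sonnet:high" into a model name and an
// optional thinking level (one of the five documented tiers).
const LEVELS = ["minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof LEVELS)[number];

function parseModelSpec(spec: string): { model: string; thinking?: ThinkingLevel } {
  const idx = spec.lastIndexOf(":");
  if (idx === -1) return { model: spec };
  const suffix = spec.slice(idx + 1);
  if ((LEVELS as readonly string[]).includes(suffix)) {
    return { model: spec.slice(0, idx), thinking: suffix as ThinkingLevel };
  }
  // The suffix is part of the model name, not a thinking level.
  return { model: spec };
}
```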
The Four Core Tools
| Tool | Description |
|---|---|
| read | Read file contents and images; supports directory listing |
| write | Create or overwrite files completely |
| edit | Surgical find/replace text edits with progressive output parsing |
| bash | Execute shell commands (default 30s timeout) |
Additional tools (`grep`, `find`, `ls`) exist but are disabled by default. Extensions can register custom tools via `pi.registerTool()` or override built-in tools by registering with the same name. The `--no-tools` flag starts Pi with zero built-in tools for fully custom setups.
Context Management
Auto-Compaction:
- Triggers when `contextTokens > contextWindow - reserveTokens` (default reserve: 16,384 tokens)
- Walks backwards through messages to the `keepRecentTokens` threshold (default: 20,000 tokens)
- LLM generates a structured summary (goal, constraints, progress, decisions, next steps, read/modified files)
- Summary replaces older messages in-memory; full history retained in the JSONL file
- Customizable via the `session_before_compact` extension hook
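The trigger condition and walk-back can be sketched directly from the documented defaults. The message shape and function names here are assumptions for illustration only.

```typescript
// Compaction trigger: fires when the context no longer leaves the
// reserved headroom (default reserveTokens = 16384).
function shouldCompact(contextTokens: number, contextWindow: number, reserveTokens = 16384): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

interface Msg { tokens: number }

// Walk backwards until keepRecentTokens (default 20000) is reached;
// everything earlier becomes input to the LLM-generated summary (not shown).
function splitForCompaction(messages: Msg[], keepRecentTokens = 20000): { toSummarize: Msg[]; toKeep: Msg[] } {
  let kept = 0;
  let cut = messages.length;
  while (cut > 0 && kept + messages[cut - 1].tokens <= keepRecentTokens) {
    kept += messages[cut - 1].tokens;
    cut--;
  }
  return { toSummarize: messages.slice(0, cut), toKeep: messages.slice(cut) };
}
```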
Session Persistence:
- Append-only JSONL files with tree structure via `id`/`parentId` fields
- Mutable `leafId` pointer enables in-place branching without creating new files
- `/tree` navigates session history; `/fork` creates a new session from a branch point
- Sessions auto-save to `~/.pi/agent/sessions/`, organized by working directory
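Resolving the active conversation from such a tree is a parent-pointer walk. The entry shape below is a hypothetical simplification of the JSONL records.

```typescript
// Given tree-structured entries (id/parentId) and the mutable leafId
// pointer, reconstruct the active branch: follow parent links from the
// leaf to the root, then reverse. Sibling branches are simply skipped.
interface Entry { id: string; parentId: string | null; text: string }

function activeBranch(entries: Entry[], leafId: string): Entry[] {
  const byId = new Map<string, Entry>(entries.map((e) => [e.id, e]));
  const branch: Entry[] = [];
  for (let cur = byId.get(leafId); cur; cur = cur.parentId ? byId.get(cur.parentId) : undefined) {
    branch.push(cur);
  }
  return branch.reverse();
}
```

Because branching only moves the `leafId` pointer, a "side quest" branch stays in the same file without polluting the active context.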
Cross-Provider Context Handoffs:
- Mid-session model switching via `/model` or `Ctrl+L` across 15+ providers
- Thinking traces converted to plain text for provider compatibility
Human-in-the-Loop
Pi intentionally ships without permission popups. Instead:
- Extension-based approval gates via the `tool_call` event: block dangerous operations with `ctx.ui.confirm()`
- Steering messages (`Enter`): delivered after the current tool, interrupting remaining tools
- Follow-up messages (`Alt+Enter`): queued until the agent finishes, then delivered
- Abort via `Escape`, `/stop`, or `ctx.abort()`
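The two delivery modes amount to a queue drained at different points in the loop. This is a sketch of the mechanic, not pi-agent-core's implementation; the class and method names are assumptions.

```typescript
// Steering messages are delivered as soon as the current tool finishes
// (remaining tool calls are dropped); follow-ups wait for the whole run.
type Delivery = "steer" | "followUp";
interface Queued { text: string; mode: Delivery }

class MessageQueue {
  private items: Queued[] = [];

  push(text: string, mode: Delivery): void {
    this.items.push({ text, mode });
  }

  // Called after each tool execution: hand over steering messages now.
  drainSteering(): Queued[] {
    const steer = this.items.filter((q) => q.mode === "steer");
    this.items = this.items.filter((q) => q.mode !== "steer");
    return steer;
  }

  // Called after agent_end: whatever remains seeds the next run.
  drainFollowUps(): Queued[] {
    const rest = this.items;
    this.items = [];
    return rest;
  }
}
```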
Termination
- Natural: loop exits when LLM produces response without tool calls — no step limits, no token limits on iterations
- User-initiated: `Escape`, `/stop`, steering messages, `ctx.abort()`
- Context overflow: auto-compaction recovers and retries
Extension & Plugin System
Architecture: TypeScript Modules with Event-Driven API
Pi’s extension model is custom — not based on VSCode extensions, LSP, or any existing standard. Extensions are TypeScript modules that export a default function receiving an ExtensionAPI object:
```typescript
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.on("tool_call", async (event, ctx) => {
    // intercept tool calls
  });
  pi.registerTool({ /* custom tool */ });
  pi.registerCommand("mycommand", { /* slash command */ });
}
```

Extensions are loaded via jiti (just-in-time TypeScript transpiler) — no compilation step needed. Save a `.ts` file, run `/reload`, and it’s live.
Discovery Locations
| Location | Scope |
|---|---|
| `~/.pi/agent/extensions/*.ts` | Global (single files) |
| `~/.pi/agent/extensions/*/index.ts` | Global (directories) |
| `.pi/extensions/*.ts` | Project-local |
| `.pi/extensions/*/index.ts` | Project-local (directories) |
| `settings.json` → `extensions` array | Custom paths, npm/git packages |
| `-e ./path.ts` CLI flag | Quick testing without installing |
What Plugins Can Do
- Custom Tools — register LLM-callable tools via `pi.registerTool()` with TypeBox parameter schemas
- Slash Commands — register interactive commands (e.g., `/stats`) via `pi.registerCommand()`
- Keyboard Shortcuts — register hotkeys via `pi.registerShortcut()`
- Model Providers — dynamically register/override LLM providers, including OAuth flows, via `pi.registerProvider()`
- Custom UI Components — full TUI components, overlays, status bar, widgets via `ctx.ui.custom()`
- Message Renderers — custom display for tool calls/results via `pi.registerMessageRenderer()`
- CLI Flags — custom runtime flags via `pi.registerFlag()`
- Event Interception — block/modify tool calls, inject LLM context, intercept user input, customize compaction
- Sub-agents — spawn Pi instances via tmux or `--mode json` subprocesses
- State Persistence — write to the session JSONL via `pi.appendEntry(customType, data)`
ExtensionAPI Surface
Registration: `pi.on()`, `pi.registerTool()`, `pi.registerCommand()`, `pi.registerShortcut()`, `pi.registerFlag()`, `pi.registerMessageRenderer()`, `pi.registerProvider()`
Messaging: `pi.sendMessage()` (with steer/followUp/nextTurn delivery modes), `pi.sendUserMessage()`, `pi.appendEntry()`
Model Control: `pi.setModel()`, `pi.getThinkingLevel()`, `pi.setThinkingLevel()`
Tool Access: `pi.getActiveTools()`, `pi.getAllTools()`, `pi.setActiveTools()`, `pi.exec()`
ExtensionContext (`ctx`): `ctx.ui` (dialogs, notifications, overlays), `ctx.cwd`, `ctx.sessionManager`, `ctx.modelRegistry`, `ctx.isIdle()`, `ctx.abort()`, `ctx.compact()`, `ctx.getSystemPrompt()`, `ctx.getContextUsage()`
Custom Tool Example
```typescript
import { Type } from "@sinclair/typebox";
import { StringEnum } from "@mariozechner/pi-ai";
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.registerTool({
    name: "my_tool",
    label: "My Tool",
    description: "Does something useful for the LLM",
    parameters: Type.Object({
      action: StringEnum(["search", "analyze"] as const),
      query: Type.String(),
    }),
    async execute(toolCallId, params, signal, onUpdate, ctx) {
      onUpdate?.({ content: [{ type: "text", text: "Working..." }] });
      return { content: [{ type: "text", text: "Result here" }] };
    },
  });
}
```

Permission Gate Example
pi.on("tool_call", async (event, ctx) => {
if (event.toolName === "bash" && event.input.command.includes("rm -rf")) {
const ok = await ctx.ui.confirm("Allow dangerous command?", "Allow?");
if (!ok) return { block: true, reason: "Blocked by user" };
}
});Skills: On-Demand Capability Packages
Skills are a complementary extensibility mechanism — markdown files (SKILL.md) following the Agent Skills standard that describe workflows in natural language. Key property: skills are loaded lazily — only names and descriptions appear in the system prompt, with full instructions loaded on-demand. This is “progressive disclosure.”
Skills are compatible with Claude Code, Codex CLI, and other agents.
First-party skills (badlogic/pi-skills): brave-search, browser-tools, Google Calendar/Drive/Gmail CLI, transcribe (Groq Whisper), VS Code integration, YouTube transcript retrieval.
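Progressive disclosure can be illustrated with a toy loader: only the frontmatter of each SKILL.md is indexed for the system prompt, and the body is read on demand. The parser below is a deliberately minimal assumption, not the Agent Skills reference implementation.

```typescript
// Parse a SKILL.md file: cheap metadata (name, description) up front,
// full instructions kept in `body` and only surfaced when invoked.
function parseSkill(md: string): { name: string; description: string; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!m) throw new Error("missing frontmatter");
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { name: meta.name ?? "", description: meta.description ?? "", body: m[2] };
}

// Only this one-liner per skill ever reaches the system prompt.
function promptIndex(skills: { name: string; description: string }[]): string {
  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
}
```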
Package System
```json
{
  "name": "my-package",
  "keywords": ["pi-package"],
  "pi": {
    "extensions": ["./extensions"],
    "skills": ["./skills"],
    "prompts": ["./prompts"],
    "themes": ["./themes"]
  }
}
```

Management commands: `pi install npm:@scope/pkg`, `pi install git:github.com/user/repo`, `pi remove`, `pi update`, `pi list`, `pi config`. A package gallery aggregates community packages.
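A consumer of this manifest shape could look roughly like the following sketch. The `PiManifest` type mirrors the example above; the helper functions are hypothetical, not Pi's actual resolver.

```typescript
// Detect a pi package (tagged with the "pi-package" keyword) and collect
// the asset directories it declares under the "pi" key.
interface PiManifest {
  name: string;
  keywords?: string[];
  pi?: { extensions?: string[]; skills?: string[]; prompts?: string[]; themes?: string[] };
}

function isPiPackage(pkg: PiManifest): boolean {
  return pkg.keywords?.includes("pi-package") ?? false;
}

function assetDirs(pkg: PiManifest): string[] {
  const pi = pkg.pi ?? {};
  return [...(pi.extensions ?? []), ...(pi.skills ?? []), ...(pi.prompts ?? []), ...(pi.themes ?? [])];
}
```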
Security Model
No built-in sandboxing by default. Extensions run with full system permissions. Zechner’s stance: “As soon as your agent can write code and run code, it’s pretty much game over” for security — theatrical guardrails are pointless.
Community-built sandboxing:
- nono: Kernel-level via Landlock (Linux) / Seatbelt (macOS) — irreversible, deny-by-default
- gondolin: Linux micro-VM isolation
- Docker/containers: recommended for production use
Ecosystem Comparisons
| Feature | Pi | Claude Code | Cursor | Aider | Cline |
|---|---|---|---|---|---|
| Interface | Terminal CLI | Terminal CLI | IDE (VS Code fork) | Terminal CLI | VS Code Extension |
| Open Source | MIT | No | No | Apache 2.0 | Apache 2.0 |
| Models | 15+ providers, mid-session switching | Anthropic only | Multi-model | 100+ models | All major models |
| Core Tools | 4 | 20+ built-in | IDE-integrated | Git-integrated | Approval-gated |
| System Prompt | ~1,000 tokens | ~10,000+ tokens | N/A | Moderate | Moderate |
| Extension System | TypeScript (50+ examples) | Hooks (6-8) | Plugins | Limited | MCP, tools |
| MCP | No (deliberate) | Yes | Yes | No | Yes |
| Sub-agents | Via extensions/tmux | Built-in | Built-in | No | v3.58 native |
| Permissions | None (YOLO) | Built-in prompts | Built-in | Limited | Approve everything |
| Sessions | Tree-structured JSONL | Linear | Linear | Git branches | Checkpoints |
| Price | Free (BYOK) | $20-200/mo | $20-200/mo | Free (BYOK) | Free (BYOK) |
Where Pi wins vs. Claude Code: multi-model freedom, system prompt efficiency (~1K vs ~10K tokens), extension depth (20+ lifecycle hooks vs 6-8), cost (smaller prompts = fewer tokens), session tree architecture, full transparency (MIT OSS).
Where Claude Code wins: deepest reasoning (Opus 4.5 at 80.9% SWE-bench), batteries-included experience, enterprise support, larger community.
The No-MCP Stance: Pi’s rejection of MCP is principled — popular MCP servers consume 7-9% of context (13-18K tokens) with tools the model may never use. Zechner advocates “CLI tools with README files” accessible via bash, loaded only when needed.
oh-my-pi: The Batteries-Included Fork
oh-my-pi by Can Boluk transforms Pi into a comprehensive terminal coding environment:
- Hash-anchored edits (Hashline): Content-hash anchors replacing verbatim text — 6.7%-68.3% edit success rate improvements across models
- LSP integration: 11 LSP operations, 40+ language configs (Rust, Go, Python, TypeScript, Java, Kotlin, etc.), auto-diagnostics on write/edit
- Subagent system: 6 bundled agents (explore, plan, designer, reviewer, task, quick_task) with parallel execution via git worktrees
- Browser automation: Puppeteer-based with 14 stealth plugins
- Python kernel: Embedded IPython with streaming output
- TTSR (Time-Traveling Stream Rules): Mid-stream LLM interception to prevent invalid output
- AI-powered git: Conventional commits with hunk-level staging and dependency-ordered split commits
- Native performance: Rust-based N-API bindings for CPU-intensive operations
- 65+ syntax themes with auto dark/light switching
Currently at v13.14.0, built on Bun runtime for fast startup.
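The hash-anchored edit idea can be shown with a toy version: each line is addressed by a short content hash rather than verbatim text, so an edit survives indentation or whitespace drift. The hash function here is a stand-in for illustration, not oh-my-pi's actual Hashline implementation.

```typescript
// Short content hash per (trimmed) line; edits target a hash instead of
// an exact string match, making them robust to whitespace differences.
function lineHash(line: string): string {
  let h = 0;
  for (const ch of line.trim()) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h.toString(16).padStart(8, "0").slice(0, 6);
}

function applyHashEdit(lines: string[], targetHash: string, replacement: string): string[] {
  return lines.map((l) => (lineHash(l) === targetHash ? replacement : l));
}
```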
Notable Community Projects
| Project | Description |
|---|---|
| shitty-extensions | cost-tracker, handoff, oracle, plan-mode, memory-mode, usage-bar |
| pi-messenger | Multi-agent coordination with presence tracking, file reservations, crew-based task orchestration |
| pi-extensions (tmustier) | files-widget, tab-status, agent-guidance, arcade minigames |
| awesome-pi-agent | Curated list with 310+ stars |
| nono | Kernel-level sandboxing by the Sigstore author |
| pi-nvim | Neovim bridge |
| VS Code extension | Official VS Code integration |
OpenClaw: Pi at Scale
Pi’s breakout moment came as the engine inside OpenClaw (Peter Steinberger’s personal AI assistant), which reached 180K+ GitHub stars. As Shaw noted: “What OpenClaw did was merge Pi with Claude Skills. That’s the killer combo.”
OpenClaw uses Pi’s SDK mode to embed the agent into a messaging gateway across WhatsApp, Telegram, Discord, Slack, Signal, iMessage, Google Chat, and Microsoft Teams. This validated Pi’s architecture at massive production scale.
⚠️ Security concern: researchers found 341 malicious skills out of 2,857 on ClawHub — a 12% contamination rate, mostly installing Atomic Stealer malware.
Key Design Insights
- Context engineering over feature accumulation: Pi’s ~1K-token system prompt vs. competitors’ ~10K+ leaves dramatically more context for actual code
- Self-extending agent philosophy: if you want a feature, ask the agent to build it as an extension — the agent extends itself (Armin Ronacher)
- Minimal scaffolding works: Terminal-Bench results show that “Terminus 2” (just a raw tmux session) holds its own against sophisticated tooling, validating that frontier models don’t need hand-holding
- Tree-structured sessions: unlike linear conversation histories, Pi’s branching sessions enable “side quests” without polluting the main context
- Dual-payload tool results: separating LLM-visible content from UI-rendered details prevents context pollution while enabling rich rendering
- Validation as self-correction: tool parameter validation errors are returned as tool results, enabling LLM self-correction within the natural loop
Practical Recommendations
Choose Pi when:
- Multi-model flexibility matters (switch between Claude, GPT, Gemini, local)
- You want to build your own agent on top of a solid SDK
- Minimizing LLM costs is important
- You prefer terminal-native, composable Unix workflows
- Deep customization without forking internals is desired
- Vendor independence and open source is required
Choose alternatives when:
- Maximum reasoning on hard problems → Claude Code (80.9% SWE-bench)
- Batteries-included IDE experience → Cursor
- Git-native workflow → Aider
- Human-in-the-loop approval flows → Cline
- Fully autonomous coding → Devin / OpenHands
- MCP integrations are critical → Claude Code, Cursor, or Cline
Related Notes
- 2026-03-22 - Research - Pi Coding Agent In-Depth Research - Extension Mechanisms, oh-my-pi, and Harness Engineering
- 2026-03-21 - Research - Harness Engineering in AI Coding
- 2026-03-22 - Research - Building Proper Tests for Coding Agents in Harness Engineering
Sources
- Pi official site (shittycodingagent.ai)
- badlogic/pi-mono GitHub
- Mario Zechner’s founding blog post (Nov 2025)
- Armin Ronacher’s Pi analysis (Jan 2026)
- Nader Dabit’s Pi framework deep-dive
- Pi extensions docs
- Pi session docs
- Pi compaction docs
- oh-my-pi (can1357)
- awesome-pi-agent
- badlogic/pi-skills
- OpenClaw GitHub
- Terminal-Bench
- npm: @mariozechner/pi-coding-agent
- DeepWiki: Pi architecture
- disler/pi-vs-claude-code comparison
- Real Python: Pi AI coding tool
- Shivam Agarwal’s Pi/OpenClaw anatomy