Pi Coding Agent: Architecture, Agent Loop, Extension System & Ecosystem

Overview

Pi is an open-source, MIT-licensed, terminal-based AI coding agent created by Mario Zechner (@badlogic), best known as the creator of the libGDX game framework. The project lives at badlogic/pi-mono and its playful official website is shittycodingagent.ai — a self-deprecating name that inadvertently became its brand identity.

Pi’s founding thesis: context engineering is paramount. By shipping with only 4 built-in tools and a system prompt under 1,000 tokens, Pi maximizes the context window available for actual code and project information. Everything else is delegated to an extension system, making Pi radically extensible without being opinionated.

As of March 2026: 28.3K GitHub stars, 3K forks, 3,400+ commits, 181 releases (latest v0.63.1), 349+ npm dependents, and 134+ contributors. Pi powers OpenClaw (180K+ GitHub stars), one of the most-starred open-source projects of early 2026.

Origins & History

Zechner built Pi after growing frustrated with Claude Code, which he described as having become “a spaceship with 80% of functionality I have no use for”. The system prompt and tools changed on every release, breaking his workflows and altering model behavior unpredictably.

Timeline:

Monorepo Architecture

Pi is organized as a layered monorepo with strict dependency enforcement (Nader Dabit’s deep dive):

┌─────────────────────────────────────────────────┐
│            Applications Layer                    │
│  pi-coding-agent │ pi-mom │ pi-pods │ pi-web-ui │
├─────────────────────────────────────────────────┤
│              UI Layer                            │
│           pi-tui (Terminal UI)                   │
├─────────────────────────────────────────────────┤
│            Agent Layer                           │
│         pi-agent-core (Agent Loop)               │
├─────────────────────────────────────────────────┤
│           Foundation Layer                       │
│      pi-ai (Unified Multi-Provider LLM API)      │
└─────────────────────────────────────────────────┘
| Package | Purpose |
| --- | --- |
| pi-ai | Unified LLM API supporting 15+ providers with streaming, tool calling (TypeBox schemas), cross-provider context handoffs, thinking/reasoning support, and cost/token tracking |
| pi-agent-core | Agent loop: tool execution, AJV validation, event streaming, state management, message queuing (steering + follow-up) |
| pi-coding-agent | Full CLI runtime: 4 built-in tools, JSONL session persistence, AGENTS.md context files, auto-compaction, extension/skill/theme system |
| pi-tui | Terminal UI with differential rendering, synchronized output, markdown rendering, live streaming diffs |
| pi-web-ui | Web components for AI chat interfaces |
| pi-mom | Slack bot that delegates to the coding agent |
| pi-pods | CLI for managing vLLM GPU pod deployments |

The Agent Loop

Core Mechanics

The agent loop in pi-agent-core follows the standard agentic pattern: send messages to the LLM → execute tool calls → feed results back → repeat until the model stops calling tools. A critical design decision: the loop has no max-steps, no timeouts, no iteration limits. As Zechner writes: “The agent loop just loops until the agent says it’s done.”
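That pattern can be sketched in a few lines (illustrative only: the types and the `callLlm`/`execTool` helpers are hypothetical stand-ins for pi-agent-core's provider layer and tool executor):

```typescript
// Minimal agent-loop sketch, not pi-agent-core's actual implementation.
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmResponse = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

async function runAgent(
  messages: Message[],
  callLlm: (msgs: Message[]) => Promise<LlmResponse>,
  execTool: (call: ToolCall) => Promise<string>,
): Promise<Message[]> {
  // No max-steps, no timeouts: loop until the model stops calling tools.
  while (true) {
    const response = await callLlm(messages);
    messages.push({ role: "assistant", content: response.text });
    if (response.toolCalls.length === 0) break; // the agent says it's done
    for (const call of response.toolCalls) {
      const result = await execTool(call); // execute tool calls
      messages.push({ role: "tool", content: result }); // feed results back
    }
  }
  return messages;
}
```

Termination is entirely in the model's hands: the only exit condition is a response with zero tool calls.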

Event Lifecycle

The full lifecycle (extensions documentation):

pi starts
│
├── session_directory (CLI startup only)
├── session_start
│
▼
user sends prompt ──────────────────────────────────┐
│                                                    │
├── input (can intercept/transform/handle)           │
├── before_agent_start (inject messages, modify      │
│   system prompt)                                   │
├── agent_start                                      │
│   ├── message_start / message_update / message_end │
│   │                                                │
│   │   ┌── turn (repeats while LLM calls tools) ───┐
│   │   │                                            │
│   │   ├── turn_start                               │
│   │   ├── context (modify messages before LLM)     │
│   │   ├── before_provider_request                  │
│   │   │   LLM responds, may call tools:            │
│   │   │   ├── tool_execution_start                 │
│   │   │   ├── tool_call (can block)                │
│   │   │   ├── tool_execution_update                │
│   │   │   ├── tool_result (can modify)             │
│   │   │   └── tool_execution_end                   │
│   │   └── turn_end                                 │
│   │                                                │
│   └── agent_end                                    │
│                                                    │
user sends another prompt ◄─────────────────────────┘

Execution Flow (Step-by-Step)

  1. User input received — the input event fires, extensions can intercept/transform
  2. Skill/template expansion — if input matches a /skill:name pattern, Pi expands it lazily
  3. before_agent_start — extensions inject messages or modify system prompt (RAG, memory, dynamic context)
  4. agent_start — core loop begins
  5. Turn begins — the context event fires (extensions rewrite message history), then before_provider_request
  6. LLM call via streamFn() — provider-agnostic streaming layer normalizes responses into text_delta, thinking_delta, toolcall_delta, done, error
  7. Tool call detection — parameters validated against TypeBox schemas via AJV; validation errors returned as tool results for self-correction
  8. Tool execution — dual-payload results: content (LLM-visible) + details (UI-only)
  9. Results fed back — loop returns to step 5 for another turn
  10. Termination — when LLM produces a response without tool calls, agent_end fires; follow-up messages in queue trigger new runs
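Step 7's validation-as-self-correction can be sketched as follows (a hand-rolled check stands in for the real TypeBox/AJV machinery; `validateParams` and `runToolCall` are hypothetical names):

```typescript
// Sketch: a failed parameter validation becomes an ordinary tool result,
// so the LLM sees the error on its next turn and can correct its own call.
type ToolResult = { content: { type: "text"; text: string }[] };

// Stand-in for AJV validation against a TypeBox schema.
function validateParams(params: unknown): string[] {
  const p = params as { path?: unknown };
  return typeof p?.path === "string" ? [] : ["path must be a string"];
}

function runToolCall(params: unknown): ToolResult {
  const errors = validateParams(params);
  if (errors.length > 0) {
    // Returned as a normal result, not thrown: the loop keeps going.
    return { content: [{ type: "text", text: `Validation failed: ${errors.join("; ")}` }] };
  }
  return { content: [{ type: "text", text: "read ok" }] };
}
```

Because the error travels through the same channel as a successful result, no special retry logic is needed: the next turn of the loop handles it.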

No Built-in Planning Mode

Pi deliberately rejects planning modes, ReAct frameworks, and chain-of-thought scaffolding. Instead it relies on:

  1. The LLM’s inherent reasoning ability — “All frontier models have been RL-trained extensively, so they inherently understand what a coding agent is”
  2. File-based plans — write plans to PLAN.md files for observability and cross-session persistence
  3. Thinking levels — five tiers (minimal, low, medium, high, xhigh) control model reasoning depth, specified via model name suffix (e.g., sonnet:high)
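The suffix convention in item 3 can be illustrated with a small parser (a sketch of the spec format described above, not Pi's actual code):

```typescript
// Parse a model spec with an optional thinking-level suffix, e.g. "sonnet:high".
const THINKING_LEVELS = ["minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

function parseModelSpec(spec: string): { model: string; thinking?: ThinkingLevel } {
  const idx = spec.lastIndexOf(":");
  if (idx !== -1) {
    const suffix = spec.slice(idx + 1) as ThinkingLevel;
    if ((THINKING_LEVELS as readonly string[]).includes(suffix)) {
      return { model: spec.slice(0, idx), thinking: suffix };
    }
  }
  // No recognized suffix: treat the whole string as the model name.
  return { model: spec };
}
```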

The Four Core Tools

| Tool | Description |
| --- | --- |
| read | Read file contents and images; supports directory listing |
| write | Create or overwrite files completely |
| edit | Surgical find/replace text edits with progressive output parsing |
| bash | Execute shell commands (default 30s timeout) |

Additional tools (grep, find, ls) exist but are disabled by default. Extensions can register custom tools via pi.registerTool() or override built-in tools by registering with the same name. The --no-tools flag starts Pi with zero built-in tools for fully custom setups.

Context Management

Auto-Compaction:

  1. Triggers when contextTokens > contextWindow - reserveTokens (default reserve: 16,384 tokens)
  2. Walks backwards through messages to keepRecentTokens threshold (default: 20,000 tokens)
  3. LLM generates structured summary (goal, constraints, progress, decisions, next steps, read/modified files)
  4. Summary replaces older messages in-memory; full history retained in JSONL file
  5. Customizable via session_before_compact extension hook
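The trigger and the backwards walk (steps 1-2) can be sketched like this, using the defaults quoted above (function and field names are hypothetical):

```typescript
// Sketch of the auto-compaction trigger and message split, not Pi's code.
const DEFAULT_RESERVE_TOKENS = 16_384;
const DEFAULT_KEEP_RECENT_TOKENS = 20_000;

function shouldCompact(
  contextTokens: number,
  contextWindow: number,
  reserveTokens = DEFAULT_RESERVE_TOKENS,
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// Walk backwards, keeping recent messages up to the threshold; everything
// older becomes input to the LLM-generated summary.
function splitForCompaction<T extends { tokens: number }>(
  messages: T[],
  keepRecentTokens = DEFAULT_KEEP_RECENT_TOKENS,
): { toSummarize: T[]; toKeep: T[] } {
  let kept = 0;
  let cut = messages.length;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (kept + messages[i].tokens > keepRecentTokens) break;
    kept += messages[i].tokens;
    cut = i;
  }
  return { toSummarize: messages.slice(0, cut), toKeep: messages.slice(cut) };
}
```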

Session Persistence:

  • Append-only JSONL files with tree structure via id/parentId fields
  • Mutable leafId pointer enables in-place branching without creating new files
  • /tree navigates session history; /fork creates new session from a branch point
  • Sessions auto-save to ~/.pi/agent/sessions/ organized by working directory
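The tree structure above can be illustrated with a toy branch resolver (entry shape and names are assumptions based on the id/parentId/leafId description, not Pi's actual schema):

```typescript
// Each JSONL line is an append-only entry; branching is just a new entry
// whose parentId points into the middle of the tree, plus moving leafId.
type Entry = { id: string; parentId: string | null; text: string };

// Resolve the active conversation: walk from leafId back to the root.
function activeBranch(entries: Entry[], leafId: string): Entry[] {
  const byId = new Map(entries.map((e) => [e.id, e]));
  const branch: Entry[] = [];
  for (
    let cur = byId.get(leafId);
    cur;
    cur = cur.parentId ? byId.get(cur.parentId) : undefined
  ) {
    branch.unshift(cur);
  }
  return branch;
}
```

Because entries are never rewritten, forking is cheap: append the divergent entry and repoint leafId, and the old branch stays fully intact in the same file.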

Cross-Provider Context Handoffs:

  • Mid-session model switching via /model or Ctrl+L across 15+ providers
  • Thinking traces converted to plain text for provider compatibility

Human-in-the-Loop

Pi intentionally ships without permission popups. Instead:

  • Extension-based approval gates via tool_call event: block dangerous operations with ctx.ui.confirm()
  • Steering messages (Enter): delivered after current tool, interrupt remaining tools
  • Follow-up messages (Alt+Enter): queued until agent finishes, then delivered
  • Abort via Escape, /stop, or ctx.abort()

Termination

  • Natural: loop exits when LLM produces response without tool calls — no step limits, no token limits on iterations
  • User-initiated: Escape, /stop, steering messages, ctx.abort()
  • Context overflow: auto-compaction recovers and retries

Extension & Plugin System

Architecture: TypeScript Modules with Event-Driven API

Pi’s extension model is custom — not based on VSCode extensions, LSP, or any existing standard. Extensions are TypeScript modules that export a default function receiving an ExtensionAPI object:

import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
 
export default function (pi: ExtensionAPI) {
  pi.on("tool_call", async (event, ctx) => {
    // intercept tool calls
  });
  pi.registerTool({ /* custom tool */ });
  pi.registerCommand("mycommand", { /* slash command */ });
}

Extensions are loaded via jiti (just-in-time TypeScript transpiler) — no compilation step needed. Save a .ts file, run /reload, and it’s live.

Discovery Locations

| Location | Scope |
| --- | --- |
| ~/.pi/agent/extensions/*.ts | Global (single files) |
| ~/.pi/agent/extensions/*/index.ts | Global (directories) |
| .pi/extensions/*.ts | Project-local |
| .pi/extensions/*/index.ts | Project-local (directories) |
| extensions array in settings.json | Custom paths, npm/git packages |
| -e ./path.ts CLI flag | Quick testing without installing |
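A settings.json entry might look like this (the entry syntax mirrors the npm:/git: sources accepted by pi install; the exact key shape and the paths shown are assumptions for illustration):

```json
{
  "extensions": [
    "./tools/my-extension.ts",
    "npm:@scope/pi-package",
    "git:github.com/user/pi-extensions"
  ]
}
```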

What Plugins Can Do

  1. Custom Tools — register LLM-callable tools via pi.registerTool() with TypeBox parameter schemas
  2. Slash Commands — register interactive commands (e.g., /stats) via pi.registerCommand()
  3. Keyboard Shortcuts — register hotkeys via pi.registerShortcut()
  4. Model Providers — dynamically register/override LLM providers including OAuth flows via pi.registerProvider()
  5. Custom UI Components — full TUI components, overlays, status bar, widgets via ctx.ui.custom()
  6. Message Renderers — custom display for tool calls/results via pi.registerMessageRenderer()
  7. CLI Flags — custom runtime flags via pi.registerFlag()
  8. Event Interception — block/modify tool calls, inject LLM context, intercept user input, customize compaction
  9. Sub-agents — spawn Pi instances via tmux or --mode json subprocesses
  10. State Persistence — write to session JSONL via pi.appendEntry(customType, data)

ExtensionAPI Surface

Registration: pi.on(), pi.registerTool(), pi.registerCommand(), pi.registerShortcut(), pi.registerFlag(), pi.registerMessageRenderer(), pi.registerProvider()

Messaging: pi.sendMessage() (with steer/followUp/nextTurn delivery modes), pi.sendUserMessage(), pi.appendEntry()

Model Control: pi.setModel(), pi.getThinkingLevel(), pi.setThinkingLevel()

Tool Access: pi.getActiveTools(), pi.getAllTools(), pi.setActiveTools(), pi.exec()

ExtensionContext (ctx): ctx.ui (dialogs, notifications, overlays), ctx.cwd, ctx.sessionManager, ctx.modelRegistry, ctx.isIdle(), ctx.abort(), ctx.compact(), ctx.getSystemPrompt(), ctx.getContextUsage()

Custom Tool Example

import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { Type } from "@sinclair/typebox";
import { StringEnum } from "@mariozechner/pi-ai";
 
export default function (pi: ExtensionAPI) {
  pi.registerTool({
    name: "my_tool",
    label: "My Tool",
    description: "Does something useful for the LLM",
    parameters: Type.Object({
      action: StringEnum(["search", "analyze"] as const),
      query: Type.String(),
    }),
    async execute(toolCallId, params, signal, onUpdate, ctx) {
      onUpdate?.({ content: [{ type: "text", text: "Working..." }] });
      return { content: [{ type: "text", text: "Result here" }] };
    },
  });
}

Permission Gate Example

pi.on("tool_call", async (event, ctx) => {
  if (event.toolName === "bash" && event.input.command.includes("rm -rf")) {
    const ok = await ctx.ui.confirm("Allow dangerous command?", "Allow?");
    if (!ok) return { block: true, reason: "Blocked by user" };
  }
});

Skills: On-Demand Capability Packages

Skills are a complementary extensibility mechanism — markdown files (SKILL.md) following the Agent Skills standard that describe workflows in natural language. Key property: skills are loaded lazily — only names and descriptions appear in the system prompt, with full instructions loaded on-demand. This is “progressive disclosure.”

Skills are compatible with Claude Code, Codex CLI, and other agents.
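A minimal SKILL.md might look like this (the frontmatter fields follow the Agent Skills convention; the skill itself is a hypothetical example). Only the name and description below land in the system prompt; the body loads on demand:

```markdown
---
name: changelog-writer
description: Generate a CHANGELOG entry from recent git commits.
---

# Changelog Writer

1. Run `git log --oneline -20` to collect recent commits.
2. Group commits by type (feat, fix, chore).
3. Append a dated entry to CHANGELOG.md, matching the file's existing format.
```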

First-party skills (badlogic/pi-skills): brave-search, browser-tools, Google Calendar/Drive/Gmail CLI, transcribe (Groq Whisper), VS Code integration, YouTube transcript retrieval.

Package System

{
  "name": "my-package",
  "keywords": ["pi-package"],
  "pi": {
    "extensions": ["./extensions"],
    "skills": ["./skills"],
    "prompts": ["./prompts"],
    "themes": ["./themes"]
  }
}

Management commands: pi install npm:@scope/pkg, pi install git:github.com/user/repo, pi remove, pi update, pi list, pi config. A package gallery aggregates community packages.

Security Model

No built-in sandboxing by default. Extensions run with full system permissions. Zechner’s stance: “As soon as your agent can write code and run code, it’s pretty much game over” for security — theatrical guardrails are pointless.

Community-built sandboxing:

  • nono: Kernel-level via Landlock (Linux) / Seatbelt (macOS) — irreversible, deny-by-default
  • gondolin: Linux micro-VM isolation
  • Docker/containers: recommended for production use

Ecosystem Comparisons

| Feature | Pi | Claude Code | Cursor | Aider | Cline |
| --- | --- | --- | --- | --- | --- |
| Interface | Terminal CLI | Terminal CLI | IDE (VS Code fork) | Terminal CLI | VS Code extension |
| Open Source | MIT | No | No | Apache 2.0 | Apache 2.0 |
| Models | 15+ providers, mid-session switching | Anthropic only | Multi-model | 100+ models | All major models |
| Core Tools | 4 | 20+ built-in | IDE-integrated | Git-integrated | Approval-gated |
| System Prompt | ~1,000 tokens | ~10,000+ tokens | N/A | Moderate | Moderate |
| Extension System | TypeScript (50+ examples) | Hooks (6-8) | Plugins | Limited | MCP, tools |
| MCP | No (deliberate) | Yes | Yes | No | Yes |
| Sub-agents | Via extensions/tmux | Built-in | Built-in | No | v3.58 native |
| Permissions | None (YOLO) | Built-in prompts | Built-in | Limited | Approve everything |
| Sessions | Tree-structured JSONL | Linear | Linear | Git branches | Checkpoints |
| Price | Free (BYOK) | $20-200/mo | $20-200/mo | Free (BYOK) | Free (BYOK) |

Where Pi wins vs. Claude Code: multi-model freedom, system prompt efficiency (~1K vs ~10K tokens), extension depth (20+ lifecycle hooks vs 6-8), cost (smaller prompts = fewer tokens), session tree architecture, full transparency (MIT OSS).

Where Claude Code wins: deepest reasoning (Opus 4.5 at 80.9% SWE-bench), batteries-included experience, enterprise support, larger community.

The No-MCP Stance: Pi’s rejection of MCP is principled — popular MCP servers consume 7-9% of context (13-18K tokens) with tools the model may never use. Zechner advocates “CLI tools with README files” accessible via bash, loaded only when needed.

oh-my-pi: The Batteries-Included Fork

oh-my-pi by Can Boluk transforms Pi into a comprehensive terminal coding environment:

  • Hash-anchored edits (Hashline): Content-hash anchors replacing verbatim text — 6.7%-68.3% edit success rate improvements across models
  • LSP integration: 11 LSP operations, 40+ language configs (Rust, Go, Python, TypeScript, Java, Kotlin, etc.), auto-diagnostics on write/edit
  • Subagent system: 6 bundled agents (explore, plan, designer, reviewer, task, quick_task) with parallel execution via git worktrees
  • Browser automation: Puppeteer-based with 14 stealth plugins
  • Python kernel: Embedded IPython with streaming output
  • TTSR (Time-Traveling Stream Rules): Mid-stream LLM interception to prevent invalid output
  • AI-powered git: Conventional commits with hunk-level staging and dependency-ordered split commits
  • Native performance: Rust-based N-API bindings for CPU-intensive operations
  • 65+ syntax themes with auto dark/light switching

Currently at v13.14.0, built on Bun runtime for fast startup.

Notable Community Projects

| Project | Description |
| --- | --- |
| shitty-extensions | cost-tracker, handoff, oracle, plan-mode, memory-mode, usage-bar |
| pi-messenger | Multi-agent coordination with presence tracking, file reservations, crew-based task orchestration |
| pi-extensions (tmustier) | files-widget, tab-status, agent-guidance, arcade minigames |
| awesome-pi-agent | Curated list with 310+ stars |
| nono | Kernel-level sandboxing by the Sigstore author |
| pi-nvim | Neovim bridge |
| VS Code extension | Official VS Code integration |

OpenClaw: Pi at Scale

Pi’s breakout moment came as the engine inside OpenClaw (Peter Steinberger’s personal AI assistant), which reached 180K+ GitHub stars. As Shaw noted: “What OpenClaw did was merge Pi with Claude Skills. That’s the killer combo.”

OpenClaw uses Pi’s SDK mode to embed the agent into a messaging gateway across WhatsApp, Telegram, Discord, Slack, Signal, iMessage, Google Chat, and Microsoft Teams. This validated Pi’s architecture at massive production scale.

⚠️ Security concern: researchers found 341 malicious skills out of 2,857 on ClawHub — a 12% contamination rate, mostly installing Atomic Stealer malware.

Key Design Insights

  1. Context engineering over feature accumulation: Pi’s ~1K token system prompt vs competitors’ ~10K+ leaves dramatically more context for actual code

  2. Self-extending agent philosophy: If you want a feature, ask the agent to build it as an extension — the agent extends itself (Armin Ronacher)

  3. Minimal scaffolding works: Terminal-Bench results show that “Terminus 2” (just a raw tmux session) holds its own against sophisticated tooling, validating that frontier models don’t need hand-holding

  4. Tree-structured sessions: Unlike linear conversation histories, Pi’s branching sessions enable “side quests” without polluting main context

  5. Dual-payload tool results: Separating LLM-visible content from UI-rendered details prevents context pollution while enabling rich rendering

  6. Validation as self-correction: Tool parameter validation errors are returned as tool results, enabling LLM self-correction within the natural loop

Practical Recommendations

Choose Pi when:

  • Multi-model flexibility matters (switch between Claude, GPT, Gemini, local)
  • You want to build your own agent on top of a solid SDK
  • Minimizing LLM costs is important
  • You prefer terminal-native, composable Unix workflows
  • Deep customization without forking internals is desired
  • Vendor independence and open source is required

Choose alternatives when:

  • Maximum reasoning on hard problems → Claude Code (80.9% SWE-bench)
  • Batteries-included IDE experience → Cursor
  • Git-native workflow → Aider
  • Human-in-the-loop approval flows → Cline
  • Fully autonomous coding → Devin / OpenHands
  • MCP integrations are critical → Claude Code, Cursor, or Cline

Sources