feat: claude code runner (#9)

banteg
2026-01-01 17:04:49 +04:00
committed by GitHub
parent 4885e3b878
commit 936ea5109b
30 changed files with 2259 additions and 382 deletions
@@ -0,0 +1,384 @@
Below is a concrete implementation spec for adding **Anthropic Claude Code (“claude” CLI / Agent SDK runtime)** as a first-class engine in Takopi (v0.2.0).
---
## Scope
### Goal
Add a new engine backend **`claude`** so Takopi can:
* Run Claude Code non-interactively via the **Agent SDK CLI** (`claude -p`). ([Claude Code][1])
* Stream progress in Telegram by parsing **`--output-format stream-json --verbose`** (newline-delimited JSON). Note: `--output-format` only works with `-p/--print`. ([Claude Code][1])
* Support resumable sessions via **`--resume <session_id>`** (Takopi emits a canonical resume line the user can reply with). ([Claude Code][1])
### Non-goals (v1)
* Interactive Q&A inside a single run (e.g., answering `AskUserQuestion` prompts mid-flight).
* Full “slash commands” integration (Claude Code docs note many slash commands are interactive-only). ([Claude Code][1])
* MCP prompt-handling for permissions (use allow rules instead).
---
## UX and behavior
### Engine selection
* Existing: `takopi codex`
* New: `takopi claude`
Takopi requires an explicit engine subcommand; `takopi` alone prints the engine
selection panel and exits.
### Resume UX (canonical line)
Takopi appends a **single backticked** resume line at the end of the message, like:
```text
`claude --resume 8b2d2b30-...`
```
Rationale:
* Claude Code supports resuming a specific conversation by session ID with `--resume`. ([Claude Code][1])
* The CLI reference also documents `--resume/-r` as the resume mechanism.
Takopi should parse either:
* `claude --resume <id>`
* `claude -r <id>` (short form from docs)
**Note:** Claude session IDs should be treated as **opaque strings**. Do not assume UUID format.
### Permissions / non-interactive runs
In `-p` mode, Claude Code can require tool approvals. Takopi cannot click/answer interactive prompts, so **users must preconfigure permissions** (via Claude Code settings or `--allowedTools`). Claude's settings system supports allow/deny tool rules. ([Claude Code][2])
**Safety note:** `-p/--print` skips the workspace trust dialog; only use this flag in trusted directories.
Takopi should document this clearly: if permissions aren't configured and Claude tries to use a gated tool, the run may block or fail.
---
## Config additions
Takopi config lives at either:
* `.takopi/takopi.toml` (project-local), or
* `~/.takopi/takopi.toml` (home). (Existing Takopi behavior.)
Add a new optional `[claude]` section.
Recommended v1 schema:
```toml
# .takopi/takopi.toml
engine = "claude"
[claude]
model = "claude-sonnet-4-5-20250929" # optional (Claude Code supports model override in settings too)
allowed_tools = "Bash,Read,Edit" # optional but strongly recommended for automation
dangerously_skip_permissions = false # optional (high risk; prefer sandbox use only)
use_api_billing = false # optional (keep ANTHROPIC_API_KEY for API billing)
```
Notes:
* `--allowedTools` exists specifically to auto-approve tools in programmatic runs. ([Claude Code][1])
* Claude Code tools (Bash/Edit/Write/WebSearch/etc.) and whether permission is required are documented. ([Claude Code][2])
* Takopi only reads `model`, `allowed_tools`, `dangerously_skip_permissions`, and `use_api_billing` from `[claude]`.
* By default Takopi strips `ANTHROPIC_API_KEY` from the subprocess environment so Claude uses subscription billing. Set `use_api_billing = true` to keep the key.
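
The billing rule in the last note can be sketched as a small helper. This is a minimal illustration, not Takopi's actual code: the function name `build_claude_env` and its `base_env` parameter are hypothetical.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def build_claude_env(
    use_api_billing: bool,
    base_env: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Return the environment to pass to the `claude` subprocess.

    Hypothetical helper: copies the parent environment and drops
    ANTHROPIC_API_KEY unless `use_api_billing` is enabled.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not use_api_billing:
        # Without the key, Claude falls back to subscription billing.
        env.pop("ANTHROPIC_API_KEY", None)
    return env
```

Copying the environment (rather than mutating `os.environ`) keeps the change local to the subprocess.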
---
## Code changes (by file)
### 1) `src/takopi/engines.py`
Add a new backend:
* Engine ID: `EngineId("claude")`
* `check_setup()` should:
* `shutil.which("claude")` must exist.
* Error message should include official install options and “run `claude` once to authenticate”.
* Install methods include install scripts, Homebrew, and npm. ([Claude Code][4])
* Agent SDK / CLI can use Claude Code authentication from running `claude`, or API key auth. ([Claude][5])
* `build_runner()` should parse `[claude]` config and instantiate `ClaudeRunner`.
* `startup_message()` e.g.:
* `takopi (claude) is ready\npwd: ...`
### 2) New file: `src/takopi/runners/claude.py`
Implement a new `Runner`:
#### Public API
* `engine: EngineId = "claude"`
* `format_resume(token) -> str`: returns `` `claude --resume {token}` ``
* `extract_resume(text) -> ResumeToken | None`: parse last match of `--resume/-r`
* `is_resume_line(line) -> bool`: matches the above patterns
* `run(prompt, resume)` async generator of `TakopiEvent`
#### Subprocess invocation
Use Agent SDK CLI non-interactively:
Core invocation:
* `claude -p --output-format stream-json --verbose` ([Claude Code][1])
* `--verbose` overrides config and is required for full stream-json output.
Resume:
* add `--resume <session_id>` if resuming. ([Claude Code][1])
Model:
* add `--model <name>` if configured. ([Claude Code][1])
Permissions:
* add `--allowedTools "<rules>"` if configured. ([Claude Code][1])
* add `--dangerously-skip-permissions` only if explicitly enabled (high risk; document clearly).
Prompt passing:
* Pass the prompt as the final positional argument after `--` (CLI expects `prompt` as an argument). This also protects prompts that begin with `-`. ([Claude Code][1])
Other flags:
* Claude exposes more CLI flags, but Takopi does not surface them in config.
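
The invocation rules above could be assembled roughly as follows. This is a sketch under the assumption that the runner builds a plain argv list; `build_claude_args` and its parameter names are hypothetical, while the flags are the ones listed above.

```python
from __future__ import annotations


def build_claude_args(
    prompt: str,
    *,
    resume: str | None = None,
    model: str | None = None,
    allowed_tools: str | None = None,
    dangerously_skip_permissions: bool = False,
) -> list[str]:
    """Build the argv for a non-interactive, streaming Claude run."""
    args = ["claude", "-p", "--output-format", "stream-json", "--verbose"]
    if resume:
        args += ["--resume", resume]
    if model:
        args += ["--model", model]
    if allowed_tools:
        args += ["--allowedTools", allowed_tools]
    if dangerously_skip_permissions:
        # High risk: only added when explicitly enabled in config.
        args.append("--dangerously-skip-permissions")
    # `--` ends option parsing, so a prompt starting with `-` stays a prompt.
    args += ["--", prompt]
    return args
```

For example, `build_claude_args("-v is a flag?", resume="abc")` ends with `["--resume", "abc", "--", "-v is a flag?"]`, so the prompt is never parsed as an option.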
#### Stream parsing
In stream-json mode, Claude emits newline-delimited JSON objects. ([Claude Code][1])
Per the official Agent SDK TypeScript reference, message types include:
* `system` with `subtype: 'init'` and fields like `session_id`, `cwd`, `tools`, `model`, `permissionMode`, `output_style`. ([Claude Code][3])
* `assistant` / `user` messages with Anthropic SDK message objects. ([Claude Code][3])
* final `result` message with:
* `subtype: 'success'` or error subtype(s),
* `is_error`,
* `result` (string on success),
* `usage`, `total_cost_usd`, `modelUsage`,
* `errors` list on failures,
* `permission_denials`. ([Claude Code][3])
Takopi should:
* Parse each line as JSON; on decode error emit a warning ActionEvent (like CodexRunner does) and continue.
* Prefer stdout for JSON; log stderr separately (do not merge).
* Treat unknown top-level fields (e.g., `parent_tool_use_id`) as optional metadata and ignore them unless needed.
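
A tolerant line parser for these rules might look like the sketch below. It yields tagged tuples instead of real Takopi events; the `("event", ...)` / `("warning", ...)` tagging and the name `iter_stream_events` are assumptions made for illustration.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Parse stdout lines as JSONL without aborting on bad lines.

    Yields ("event", obj) for JSON objects that carry a `type` field and
    ("warning", message) for anything else, then keeps going.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            yield ("warning", f"invalid JSON line: {exc.msg}")
            continue
        if not isinstance(obj, dict) or "type" not in obj:
            yield ("warning", "JSON line is not an object with a `type` field")
            continue
        # Unknown extra fields (e.g. parent_tool_use_id) pass through untouched.
        yield ("event", obj)
```

In a real runner the warning tuples would be turned into warning ActionEvents, and stderr would be read on a separate stream.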
#### Mapping to Takopi events
**StartedEvent**
* Emit upon first `system/init` message:
* `resume = ResumeToken(engine="claude", value=session_id)`
(treat `session_id` as opaque; do not validate as UUID)
* `title = model` (or user-specified config title; default `"claude"`)
* `meta` should include `cwd`, `tools`, `permissionMode`, `output_style` for debugging.
**Action events (progress)**
The core useful progress comes from tool usage.
Claude Code tools list is documented (Bash/Edit/Write/WebSearch/WebFetch/TodoWrite/Task/etc.). ([Claude Code][2])
Strategy:
* When you see an **assistant message** with a content block `type: "tool_use"`:
* Emit `ActionEvent(phase="started")` with:
* `action.id = tool_use.id`
* `action.kind` based on tool name (complete mapping):
* `Bash` → `command`
* `Edit`/`Write`/`NotebookEdit` → `file_change` (best-effort path extraction)
* `Read` → `tool`
* `Glob`/`Grep` → `tool`
* `WebSearch`/`WebFetch` → `web_search`
* `TodoWrite`/`TodoRead` → `note`
* `AskUserQuestion` → `note`
* `Task`/`Agent` → `tool`
* `KillShell` → `command`
* otherwise → `tool`
* `action.title`:
* Bash: use `input.command` if present
* Read/Write/Edit/NotebookEdit: use file path (best-effort; field may be `file_path` or `path`)
* Glob/Grep: use pattern
* WebSearch: use query
* WebFetch: use URL
* TodoWrite/TodoRead: short summary (e.g., “update todos”)
* AskUserQuestion: short summary (e.g., “ask user”)
* otherwise: tool name
* `detail` includes a compacted copy of input (or a safe summary).
* When you see a **user message** with a content block `type: "tool_result"`:
* Emit `ActionEvent(phase="completed")` for `tool_use_id`
* `ok = not is_error`
* `content` may be a string or an array of content blocks; normalize to a string for summaries
* `detail` includes a small summary (char count / first line / “(truncated)”)
This mirrors CodexRunner's “started → completed” item tracking and renders well in existing `TakopiProgressRenderer`.
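
The kind and title rules above can be expressed as a lookup table plus a small function. This is a sketch only: the input field names `pattern`, `query`, and `url` are assumptions (the spec only says to use the pattern, query, and URL), and `tool_kind` / `tool_title` are hypothetical names.

```python
from __future__ import annotations

from typing import Any

# Tool name -> action kind; anything not listed falls back to "tool".
KIND_BY_TOOL = {
    "Bash": "command",
    "KillShell": "command",
    "Edit": "file_change",
    "Write": "file_change",
    "NotebookEdit": "file_change",
    "WebSearch": "web_search",
    "WebFetch": "web_search",
    "TodoWrite": "note",
    "TodoRead": "note",
    "AskUserQuestion": "note",
}


def tool_kind(name: str) -> str:
    return KIND_BY_TOOL.get(name, "tool")


def tool_title(name: str, tool_input: dict[str, Any]) -> str:
    """Pick a short title, falling back to the tool name when a field is absent."""
    if name == "Bash":
        return str(tool_input.get("command") or name)
    if name in {"Read", "Write", "Edit", "NotebookEdit"}:
        # Best effort: the path field may be `file_path` or `path`.
        return str(tool_input.get("file_path") or tool_input.get("path") or name)
    if name in {"Glob", "Grep"}:
        return str(tool_input.get("pattern") or name)  # assumed field name
    if name == "WebSearch":
        return str(tool_input.get("query") or name)  # assumed field name
    if name == "WebFetch":
        return str(tool_input.get("url") or name)  # assumed field name
    if name in {"TodoWrite", "TodoRead"}:
        return "update todos"
    if name == "AskUserQuestion":
        return "ask user"
    return name
```

Keeping the table separate from the title logic makes it cheap to add tools as Claude Code's tool list evolves.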
**CompletedEvent**
* Emit on `result` message:
* `ok = (is_error == false)` (treat `is_error` as authoritative; `subtype` is informational)
* `answer = result` on success; on error, a concise message using `errors` and/or denials
* `usage` attach:
* `total_cost_usd`, `usage`, `modelUsage`, `duration_ms`, `duration_api_ms`, `num_turns` ([Claude Code][3])
* Always include `resume` (same session_id).
* Emit exactly one completed event per run. After emitting it, ignore any
trailing JSON lines (do not emit a second completion).
* We do not use an idle-timeout completion; completion is driven by Claude's
`result` event or process exit handling.
**Permission denials**
Because result includes `permission_denials`, optionally emit warning ActionEvent(s) *before* CompletedEvent (CompletedEvent must be final):
* kind: `warning`
* title: “permission denied: <tool_name>”
This preserves the “warnings before started/completed” ordering principle Takopi already tests for CodexRunner.
#### Session serialization / locks
Must match Takopi runner contract:
* Lock key: `claude:<session_id>` (string) in a `WeakValueDictionary` of `anyio.Lock`.
* When resuming:
* acquire lock before spawning subprocess.
* When starting a new session:
* you don't know session_id until `system/init`, so:
* spawn process,
* wait until the **first** `system/init`,
* acquire lock for that session id **before** yielding StartedEvent,
* then continue yielding.
This mirrors CodexRunner's correct behavior and ensures “new run + resume run” serialize once the session is known.
Assumption: Claude emits a single `system/init` per run. If multiple `init`
events arrive, ignore the subsequent ones (do not attempt to re-lock).
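
The lock registry described above could be sketched as follows. To stay dependency-free, this sketch uses `asyncio.Lock` as a stand-in for `anyio.Lock`; the class name `SessionLocks` and the `demo` coroutine are hypothetical.

```python
from __future__ import annotations

import asyncio
from weakref import WeakValueDictionary


class SessionLocks:
    """Per-session locks keyed by `claude:<session_id>`.

    Entries disappear once no run holds a reference to the lock, so the
    registry does not grow with the number of past sessions.
    """

    def __init__(self) -> None:
        self._locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()

    def lock_for(self, session_id: str) -> asyncio.Lock:
        key = f"claude:{session_id}"
        lock = self._locks.get(key)
        if lock is None:
            lock = asyncio.Lock()
            self._locks[key] = lock
        return lock


async def demo() -> list[str]:
    """Two runs on the same session id execute one after the other."""
    locks = SessionLocks()
    order: list[str] = []

    async def run(tag: str) -> None:
        async with locks.lock_for("s1"):
            order.append(f"{tag}:start")
            await asyncio.sleep(0)  # yield control while holding the lock
            order.append(f"{tag}:end")

    await asyncio.gather(run("a"), run("b"))
    return order
```

A resumed run would call `lock_for(session_id)` before spawning the subprocess; a new run would call it after the first `system/init` and before yielding StartedEvent.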
#### Cancellation / termination
Reuse the existing subprocess lifecycle pattern (like `CodexRunner.manage_subprocess`):
* Kill the process group on cancellation
* Drain stderr concurrently (log-only)
* Ensure locks release in `finally`
## Documentation updates
### README
Add a “Claude Code engine” section that covers:
* Installation (install script / brew / npm). ([Claude Code][4])
* Authentication:
* run `claude` once and follow prompts, or use API key auth (Agent SDK docs mention `ANTHROPIC_API_KEY`). ([Claude][5])
* Non-interactive permission caveat + how to configure:
* settings allow/deny rules,
* or `--allowedTools` / `[claude].allowed_tools`. ([Claude Code][2])
* Resume format: `` `claude --resume <id>` ``.
### `docs/developing.md`
Extend “Adding a Runner” with:
* “ClaudeRunner parses Agent SDK stream-json output”
* Mention key message types and the init/result messages.
---
## Test plan
Mirror the existing `CodexRunner` test patterns.
### New tests: `tests/test_claude_runner.py`
1. **Contract & locking**
* `test_run_serializes_same_session` (stub `_run` like Codex tests)
* `test_run_allows_parallel_new_sessions`
* `test_run_serializes_new_session_after_session_is_known`:
* Provide a fake `claude` executable in tmp_path that:
* prints system/init with session_id,
* then waits on a file gate,
* a second invocation with `--resume` writes a marker file and exits,
* assert the resume invocation doesn't run until gate opens.
2. **Resume parsing**
* `format_resume` returns `claude --resume <id>`
* `extract_resume` handles both `--resume` and `-r`
3. **Translation / event ordering**
* Fake `claude` outputs:
* system/init
* assistant tool_use (Bash)
* user tool_result
* result success with `result: "ok"`
* Assert Takopi yields:
* StartedEvent
* ActionEvent started
* ActionEvent completed
* CompletedEvent(ok=True, answer="ok")
4. **Failure modes**
* `result` subtype error with `errors: [...]`:
* CompletedEvent(ok=False)
* permission_denials exist:
* warning ActionEvent(s) emitted before CompletedEvent
5. **Cancellation**
* Stub `claude` that sleeps; ensure cancellation kills it (pattern already used for codex subprocess cancellation tests).
---
## Implementation checklist
* [ ] Add `ClaudeBackend` in `src/takopi/engines.py` and register in `ENGINES`.
* [ ] Add `src/takopi/runners/claude.py` implementing the `Runner` protocol.
* [ ] Add tests + stub executable fixtures.
* [ ] Update README and developing docs.
* [ ] Run full test suite.
---
If you want, I can also propose the exact **event-to-action mapping table** (tool → kind/title/detail rules) you should start with, based on Claude Code's documented tool list (Bash/Edit/Write/WebSearch/etc.). ([Claude Code][2])
[1]: https://code.claude.com/docs/en/headless "Run Claude Code programmatically - Claude Code Docs"
[2]: https://code.claude.com/docs/en/settings "Claude Code settings - Claude Code Docs"
[3]: https://code.claude.com/docs/en/sdk/sdk-typescript "Agent SDK reference - TypeScript - Claude Docs"
[4]: https://code.claude.com/docs/en/quickstart "Quickstart - Claude Code Docs"
[5]: https://platform.claude.com/docs/en/agent-sdk/quickstart "Quickstart - Claude Docs"
@@ -0,0 +1,108 @@
# Claude `stream-json` event cheatsheet
`claude -p --output-format stream-json --verbose` writes **one JSON object per line**
(JSONL) with a required `type` field. (`--output-format` only works with `-p`.)
This cheatsheet is derived from `humanlayer/claudecode-go/types.go` and
`client_test.go`.
## Top-level event lines
### `system` (init)
Fields:
- `type`: `"system"`
- `subtype`: `"init"`
- `session_id`
- `tools`: array of tool names
- `mcp_servers`: array of `{name, status}`
- `cwd`, `model`, `permissionMode`, `apiKeySource` (optional)
Example:
```json
{"type":"system","subtype":"init","session_id":"session_01","cwd":"/repo","model":"sonnet","permissionMode":"auto","apiKeySource":"env","tools":["Bash","Read","Write","WebSearch"],"mcp_servers":[{"name":"approvals","status":"connected"}]}
```
### `assistant` / `user`
Fields:
- `type`: `"assistant"` or `"user"`
- `session_id`
- `message` (see below)
Example (assistant text):
```json
{"type":"assistant","session_id":"session_01","message":{"id":"msg_1","type":"message","role":"assistant","content":[{"type":"text","text":"Planning next steps."}],"usage":{"input_tokens":120,"output_tokens":45}}}
```
Example (assistant tool use):
```json
{"type":"assistant","session_id":"session_01","message":{"id":"msg_2","type":"message","role":"assistant","content":[{"type":"tool_use","id":"toolu_1","name":"Bash","input":{"command":"ls -la"}}]}}
```
Example (user tool result, string content):
```json
{"type":"user","session_id":"session_01","message":{"id":"msg_3","type":"message","role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_1","content":"total 2\nREADME.md\nsrc\n"}]}}
```
Example (user tool result, array content):
```json
{"type":"user","session_id":"session_01","message":{"id":"msg_4","type":"message","role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_2","content":[{"type":"text","text":"Task completed"}]}]}}
```
Optional parent field (for nested tool usage):
```json
{"type":"assistant","parent_tool_use_id":"toolu_parent","session_id":"session_01", ...}
```
### `result`
Fields (success path):
- `type`: `"result"`
- `subtype`: `"success"` (or `"completion"`)
- `session_id`
- `total_cost_usd`, `is_error`, `duration_ms`, `duration_api_ms`, `num_turns`
- `result`: final answer string
- `usage`: usage object
- `modelUsage`: optional per-model usage
Example (success):
```json
{"type":"result","subtype":"success","session_id":"session_01","total_cost_usd":0.0123,"is_error":false,"duration_ms":12345,"duration_api_ms":12000,"num_turns":2,"result":"Done.","usage":{"input_tokens":150,"output_tokens":70,"service_tier":"standard","server_tool_use":{"web_search_requests":0}}}
```
Example (error + permission denials):
```json
{"type":"result","subtype":"error","session_id":"session_02","total_cost_usd":0.001,"is_error":true,"duration_ms":2000,"duration_api_ms":1800,"num_turns":1,"result":"","error":"Permission denied","permission_denials":[{"tool_name":"Bash","tool_use_id":"toolu_9","tool_input":{"command":"git fetch origin main"}}]}
```
## Message object (`message` field)
Fields:
- `id`, `type`, `role`
- `model` (optional)
- `content`: array of content blocks
- `usage` (assistant messages)
## Content block shapes (in `message.content[]`)
### Text
```json
{"type":"text","text":"Hello"}
```
### Tool use
```json
{"type":"tool_use","id":"toolu_1","name":"Bash","input":{"command":"ls -la"}}
```
### Tool result
String content:
```json
{"type":"tool_result","tool_use_id":"toolu_1","content":"ok"}
```
Array content (Task tool format):
```json
{"type":"tool_result","tool_use_id":"toolu_2","content":[{"type":"text","text":"Task done"}]}
```
@@ -0,0 +1,232 @@
# Claude Code -> Takopi event mapping (spec)
This document specifies how to add a Claude Code runner to Takopi by translating
Claude CLI `--output-format stream-json` JSONL events into Takopi events. It is
based on the reverse-engineered schema in `humanlayer/claudecode-go`:
- `claudecode-go/types.go` (StreamEvent, Message, Content, Result)
- `claudecode-go/client.go` (CLI flags, stream parsing)
- `claudecode-go/client_test.go` (schema validation + permission_denials)
The goal is to make a Claude runner feel identical to the Codex runner from the
bridge/renderer point of view while preserving Takopi invariants (stable action
ids, per-session serialization, single completed event).
---
## 1. Input stream contract (Claude CLI)
Claude Code CLI emits **one JSON object per line** (JSONL) when invoked with
`--output-format stream-json` (only valid with `-p/--print`).
Recommended invocation (matches claudecode-go):
```
claude -p --output-format stream-json --verbose -- <query>
```
Notes:
- `--verbose` is required for `stream-json` output (the CLI may otherwise drop events).
- `-p/--print` is required for `--output-format` and `--include-partial-messages`.
- `-- <query>` is required to safely pass prompts that start with `-`.
- Resuming uses `--resume <session_id>` and optional `--fork-session`.
- claudecode-go does **not** send the prompt over stdin; it passes the
prompt as the final positional argument after `--`.
---
## 2. Resume tokens and resume lines
- Engine id: `claude`
- Canonical resume line (embedded in chat):
```
`claude --resume <session_id>`
```
Runner must implement its own regex (cannot use `compile_resume_pattern` because
that only matches `<engine> resume <token>`). Suggested regex:
```
(?im)^\s*`?claude\s+(?:--resume|-r)\s+(?P<token>[^`\s]+)`?\s*$
```
**Note:** Claude session IDs should be treated as opaque strings.
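
As a rough illustration, the suggested regex can back all three resume helpers. The function names follow the runner API described in the implementation spec; returning a plain string instead of a `ResumeToken` is a simplification for this sketch.

```python
from __future__ import annotations

import re

# The suggested pattern: optional backticks, `--resume` or `-r`, opaque token.
RESUME_RE = re.compile(
    r"(?im)^\s*`?claude\s+(?:--resume|-r)\s+(?P<token>[^`\s]+)`?\s*$"
)


def format_resume(token: str) -> str:
    """Render the canonical backticked resume line."""
    return f"`claude --resume {token}`"


def extract_resume(text: str) -> str | None:
    """Return the token from the last resume line in `text`, if any."""
    token = None
    for match in RESUME_RE.finditer(text):
        token = match.group("token")  # later matches win
    return token


def is_resume_line(line: str) -> bool:
    return RESUME_RE.match(line) is not None
```

Taking the last match means that when a reply quotes several earlier messages, the most recent resume line is the one used.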
Resume rules:
- If a resume token is provided to `run()`, the runner MUST verify that any
`session_id` observed in the stream matches it.
- If the stream yields a different `session_id`, emit a fatal error and end the run.
---
## 3. Session lifecycle + serialization
Takopi requires **serialization per session id**:
- For new runs (`resume=None`), do **not** acquire a lock until a `session_id`
is observed (usually the first `system.init` event).
- Once the session id is known, acquire a lock for `claude:<session_id>` and hold
it until the run completes.
- For resumed runs, acquire the lock immediately on entry.
This matches the Codex runner behavior in `takopi/runners/codex.py`.
---
## 4. Event translation (Claude JSONL -> Takopi)
### 4.1 Top-level `system` events
Claude emits a system init event early in the stream:
```
{"type":"system","subtype":"init","session_id":"...", ...}
```
**Mapping:**
- Emit a Takopi `started` event as soon as `session_id` is known.
- Assume only one `system.init` per run; if more appear, ignore the subsequent
ones to avoid re-locking.
- Optional: emit a `note` action summarizing tools/MCP servers (debug-only).
### 4.2 `assistant` / `user` message events
Claude messages include a `message` object with a `content[]` array. Each content
block can represent text, tool usage, or tool results.
For each content block:
#### A) `type = "tool_use"`
**Mapping:** emit `action` with `phase="started"`.
- `action.id` = `content.id`
- `action.kind` = map from tool name (see section 5)
- `title`:
- if kind=`command`: use `input.command` if present
- else: tool name or derived label
- `detail` should include:
- `tool_name`, `tool_input`, `message_id`, `parent_tool_use_id` (if provided)
#### B) `type = "tool_result"`
**Mapping:** emit `action` with `phase="completed"`.
- `action.id` = `content.tool_use_id`
- `ok`:
- if `content.is_error` exists and is true -> `ok=False`
- else `ok=True`
- `detail` should include:
- `tool_use_id`, `content` (raw), `message_id`
The runner SHOULD keep a small in-memory map from `tool_use_id -> tool_name`
(learned from `tool_use`) so the completed action title can match the started
action title.
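
Since `tool_result` content may be a string or an array of blocks, a normalizer along these lines can feed summaries. This is a sketch; `normalize_tool_result` and `tool_result_ok` are hypothetical names, and joining text blocks with newlines is one possible choice.

```python
from __future__ import annotations

import json
from typing import Any


def normalize_tool_result(content: Any) -> str:
    """Flatten `tool_result.content` (string or list of blocks) to a string."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict) and isinstance(block.get("text"), str):
                parts.append(block["text"])
            else:
                # Unknown block shape: keep a compact JSON rendering.
                parts.append(json.dumps(block, ensure_ascii=False))
        return "\n".join(parts)
    return json.dumps(content, ensure_ascii=False)


def tool_result_ok(block: dict[str, Any]) -> bool:
    """ok is False only when `is_error` is present and true."""
    return block.get("is_error") is not True
```

The raw `content` can still be stored in `detail`; the normalized string is only meant for titles and short summaries.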
#### C) `type = "text"`
**Mapping:**
- Default: do **not** emit an action (avoid duplicate rendering).
- Store the latest assistant text as a fallback final answer if `result.result`
is empty or missing.
#### D) `type = "thinking"` or other unknown types
**Mapping:** optional `note` action (phase completed) with title derived from
content; otherwise ignore.
### 4.3 `result` events
The terminal event looks like:
```
{"type":"result","subtype":"success", ...}
```
**Mapping:** emit a single Takopi `completed` event:
- `ok = !event.is_error`
- `answer = event.result` (fallback to last assistant text if empty)
- `error = event.error` (if present)
- `resume = ResumeToken(engine="claude", value=event.session_id)`
- `usage = event.usage` (pass through)
- Emit exactly one `completed` event; ignore any trailing JSON lines afterward.
No idle-timeout completion is used.
#### Permission denials
`result.permission_denials` may contain tool calls that were blocked. Emit a
warning action for each denial *before* the final `completed` event:
- `action.kind = "warning"`
- `title = "permission denied: <tool_name>"`
- `detail = {tool_name, tool_use_id, tool_input}`
- `ok = False`, `level = "warning"`
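
Putting the `result` mapping and the denial warnings together, a translation step might look like this sketch. Plain dicts stand in for Takopi's event classes, and `translate_result` is a hypothetical name; only the field mapping is taken from the rules above.

```python
from __future__ import annotations

from typing import Any


def translate_result(
    event: dict[str, Any],
    last_assistant_text: str | None = None,
) -> list[dict[str, Any]]:
    """Turn a Claude `result` event into warning actions plus one completed event."""
    out: list[dict[str, Any]] = []
    for denial in event.get("permission_denials") or []:
        tool_name = denial.get("tool_name", "unknown")
        out.append({
            "type": "action",
            "kind": "warning",
            "title": f"permission denied: {tool_name}",
            "detail": {
                "tool_name": tool_name,
                "tool_use_id": denial.get("tool_use_id"),
                "tool_input": denial.get("tool_input"),
            },
            "ok": False,
            "level": "warning",
        })
    # The completed event always comes last.
    out.append({
        "type": "completed",
        "ok": not event.get("is_error", False),
        # Fall back to the last assistant text when `result` is empty or missing.
        "answer": event.get("result") or last_assistant_text or "",
        "error": event.get("error"),
        "resume": {"engine": "claude", "value": event.get("session_id")},
        "usage": event.get("usage"),
    })
    return out
```

The caller is still responsible for emitting this list only once per run and ignoring any JSON lines that arrive afterward.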
### 4.4 Error handling / malformed lines
- If a JSONL line is invalid JSON: emit a warning action and continue.
- If the subprocess exits non-zero or the stream ends without a `result` event:
emit `completed` with `ok=False` and `error` explaining the failure.
- Emit **exactly one** `completed` event per run.
---
## 5. Tool name -> ActionKind mapping heuristics
Claude tool names can evolve. The runner SHOULD map based on tool name and input
shape. Suggested rules:
| Tool name pattern | ActionKind | Title logic |
| --- | --- | --- |
| `Bash`, `Shell` | `command` | `input.command` |
| `Write`, `Edit`, `MultiEdit`, `NotebookEdit` | `file_change` | `input.path` |
| `Read` | `tool` | `Read <path>` |
| `WebSearch` | `web_search` | `input.query` |
| (default) | `tool` | tool name |
For `file_change`, emit `detail.changes = [{"path": <path>, "kind": "update"}]`.
If input indicates creation (ex: `create: true`), use `kind: "add"`.
If a tool name is unknown, map to `tool` and include the full input in `detail`.
---
## 6. Usage mapping
Takopi `completed.usage` should mirror the Claude `result.usage` object
without transformation. Optionally include `modelUsage` inside `usage` or
`detail` if downstream consumers want it (currently unused by renderers).
---
## 7. Implementation checklist (handoff)
Add a Claude runner without changing the Takopi domain model:
1. Create `takopi/runners/claude.py` implementing `Runner` and (custom)
resume parsing.
2. Update `takopi/engines.py`:
- add `claude` backend id
- `check_setup`: locate `claude` binary (PATH + common locations)
- `build_runner`: read `[claude]` config + construct runner
- `startup_message`: `"claude is ready\npwd: <cwd>"`
3. Add new docs (this file + `claude-stream-json-cheatsheet.md`).
4. Add fixtures in `tests/fixtures/` (see below).
5. Add unit tests mirroring `tests/test_codex_*` but for Claude translation
and resume parsing (recommended, not required for initial handoff).
---
## 8. Suggested Takopi config keys
A minimal TOML config for Claude:
```toml
[claude]
# model: opus | sonnet | haiku
model = "sonnet"
allowed_tools = ["Bash", "Read", "Write", "WebSearch"]
dangerously_skip_permissions = false
use_api_billing = false
```
Takopi only maps these keys to Claude CLI flags; other options should be configured in Claude Code settings.
When `use_api_billing` is false (default), Takopi strips `ANTHROPIC_API_KEY` from the Claude subprocess environment to prefer subscription billing.