645 lines
19 KiB
Markdown
645 lines
19 KiB
Markdown
# Adding a Runner
|
||
|
||
This guide explains how to add a **new engine runner** to Takopi.
|
||
|
||
A *runner* is the adapter between an engine-specific CLI (Codex, Claude Code, …) and Takopi’s
|
||
**normalized event model** (`StartedEvent`, `ActionEvent`, `CompletedEvent`).
|
||
|
||
If you are building an external plugin package, read `docs/plugins.md` first.
|
||
|
||
Takopi is designed so that adding a runner usually means **adding one new module** under
|
||
`src/takopi/runners/` plus a small **msgspec schema** module under `src/takopi/schemas/`—
|
||
no changes to the bridge, renderer, or CLI.
|
||
|
||
When writing code intended for plugins, prefer importing from `takopi.api`
|
||
instead of internal modules.
|
||
|
||
The walkthrough below uses an **imaginary engine** named **Acme** (`acme`) and intentionally mirrors
|
||
the patterns used in `runners/claude.py`.
|
||
|
||
---
|
||
|
||
## What “done” looks like
|
||
|
||
After you add a runner, you should be able to:
|
||
|
||
- Run `takopi acme` (CLI subcommand is auto-registered).
|
||
- Start a new session and get a resume line like `` `acme --resume <token>` ``.
|
||
- Reply to any bot message containing that resume line and continue the same session.
|
||
- See progress updates (optional) and always get a final completion event.
|
||
|
||
---
|
||
|
||
## Mental model
|
||
|
||
### 1) Takopi owns the domain model
|
||
|
||
Takopi’s core types live in `takopi.model`:
|
||
|
||
- `ResumeToken(engine, value)`
|
||
- `StartedEvent(engine, resume, title?, meta?)`
|
||
- `ActionEvent(engine, action, phase, ok?, message?, level?)`
|
||
- `CompletedEvent(engine, ok, answer, resume?, error?, usage?)`
|
||
|
||
Runners **must not** invent new event types. They translate engine output into these.
|
||
|
||
### 2) The runner contract (invariants)
|
||
|
||
A run must produce events with these invariants (see `tests/test_runner_contract.py`):
|
||
|
||
- Exactly **one** `StartedEvent`.
|
||
- Exactly **one** `CompletedEvent`.
|
||
- `CompletedEvent` is the **last** event.
|
||
- `CompletedEvent.resume == StartedEvent.resume` (same token).
|
||
|
||
Action events are optional (minimal runner mode):
|
||
|
||
- Minimum viable runner: `StartedEvent` → `CompletedEvent`.
|
||
- You may add `ActionEvent`s later (recommended for better progress UX).
|
||
|
||
### 3) Resume lines are runner-owned
|
||
|
||
Takopi deliberately treats the runner as the authority for:
|
||
|
||
- How a resume line looks in chat (`format_resume()`)
|
||
- How to parse a resume token out of text (`extract_resume()`)
|
||
- How to detect a resume line reliably (`is_resume_line()`)
|
||
|
||
This matters because Takopi’s Telegram truncation logic preserves resume lines.
|
||
|
||
---
|
||
|
||
## Step-by-step: add the imaginary `acme` runner
|
||
|
||
### Step 1 — Pick an engine id + resume command
|
||
|
||
Choose a stable engine id string. This string becomes:
|
||
|
||
- The config table name (`[acme]` in `takopi.toml`)
|
||
- The CLI subcommand (`takopi acme`)
|
||
- The `ResumeToken.engine`
|
||
|
||
Engine ids must match the plugin ID regex:
|
||
|
||
```
|
||
^[a-z0-9_]{1,32}$
|
||
```
|
||
|
||
For Acme we’ll use:
|
||
|
||
- Engine id: `"acme"`
|
||
- Canonical resume command embedded in chat: `` `acme --resume <token>` ``
|
||
|
||
#### Write a resume regex
|
||
|
||
Follow the pattern used by Claude/Codex: accept optional backticks, be case-insensitive,
|
||
match full line, and capture a group named `token`.
|
||
|
||
```py
|
||
_RESUME_RE = re.compile(
|
||
r"(?im)^\s*`?acme\s+--resume\s+(?P<token>[^`\s]+)`?\s*$"
|
||
)
|
||
```
|
||
|
||
Why this shape?
|
||
|
||
- `(?m)` lets `^`/`$` match per-line inside multi-line messages.
|
||
- Optional backticks (`\`?`) lets you match Telegram inline-code formatting.
|
||
- Capturing the **last** token in a message lets users paste multiple resume lines.
|
||
|
||
---
|
||
|
||
### Step 2 — Create `src/takopi/schemas/acme.py` + `src/takopi/runners/acme.py`
|
||
|
||
Create a new schema module and a runner module:
|
||
|
||
```
|
||
src/takopi/schemas/
|
||
codex.py
|
||
acme.py # ← new
|
||
|
||
src/takopi/runners/
|
||
codex.py
|
||
claude.py
|
||
mock.py
|
||
acme.py # ← new
|
||
```
|
||
|
||
Takopi discovers engines via **entrypoints**. Every engine backend must be exposed
|
||
as an entrypoint under `takopi.engine_backends`, and the entrypoint name must match
|
||
the backend id.
|
||
|
||
For in-repo engines, add an entrypoint in `pyproject.toml`:
|
||
|
||
```toml
|
||
[project.entry-points."takopi.engine_backends"]
|
||
acme = "takopi.runners.acme:BACKEND"
|
||
```
|
||
|
||
For external plugins, use your package’s `pyproject.toml` with the same group.
|
||
|
||
---
|
||
|
||
### Step 3 — Translate Acme JSONL into Takopi events
|
||
|
||
Most CLIs we integrate are JSONL-streaming processes.
|
||
|
||
Takopi provides `JsonlSubprocessRunner`, which:
|
||
|
||
- spawns the CLI
|
||
- drains stderr and logs it
|
||
- reads stdout line-by-line as JSONL bytes
|
||
- calls your `decode_jsonl(...)` and then `translate(...)` to convert each event into Takopi events
|
||
- guarantees “exactly one CompletedEvent” behavior
|
||
- provides safe fallbacks for rc != 0 or stream ending without a completion event
|
||
|
||
#### Define a state object
|
||
|
||
Copy the Claude pattern: create a small dataclass to hold streaming state.
|
||
|
||
Common things to track:
|
||
|
||
- `factory`: `EventFactory` instance for creating Takopi events and tracking resume
|
||
- `pending_actions`: map tool_use_id → `Action` so tool results can complete them
|
||
- `last_assistant_text`: fallback for final answer if the engine omits it
|
||
- `note_seq`: counter used by `JsonlSubprocessRunner.note_event(...)`
|
||
|
||
```py
|
||
from dataclasses import dataclass, field
|
||
|
||
from ..events import EventFactory
|
||
|
||
@dataclass
|
||
class AcmeStreamState:
|
||
factory: EventFactory = field(default_factory=lambda: EventFactory(ENGINE))
|
||
pending_actions: dict[str, Action] = field(default_factory=dict)
|
||
last_assistant_text: str | None = None
|
||
note_seq: int = 0
|
||
```
|
||
|
||
#### Define a msgspec schema (recommended path)
|
||
|
||
Codex now decodes JSONL with **msgspec**, and new runners should follow that pattern.
|
||
Create a small schema module under `src/takopi/schemas/` and expose a `decode_event(...)`
|
||
function. Only include the event shapes your CLI actually emits.
|
||
|
||
Minimal example:
|
||
|
||
```py
|
||
from __future__ import annotations
|
||
|
||
from typing import Any, Literal, TypeAlias
|
||
|
||
import msgspec
|
||
|
||
|
||
class SessionStart(msgspec.Struct, tag="session.start", kw_only=True):
|
||
session_id: str
|
||
model: str | None = None
|
||
|
||
|
||
class ToolUse(msgspec.Struct, tag="tool.use", kw_only=True):
|
||
id: str
|
||
name: str
|
||
input: dict[str, Any] | None = None
|
||
|
||
|
||
class ToolResult(msgspec.Struct, tag="tool.result", kw_only=True):
|
||
tool_use_id: str
|
||
content: Any
|
||
is_error: bool | None = None
|
||
|
||
|
||
class Final(msgspec.Struct, tag="final", kw_only=True):
|
||
session_id: str
|
||
ok: bool
|
||
answer: str | None = None
|
||
error: str | None = None
|
||
|
||
|
||
AcmeEvent: TypeAlias = SessionStart | ToolUse | ToolResult | Final
|
||
|
||
_DECODER = msgspec.json.Decoder(AcmeEvent)
|
||
|
||
|
||
def decode_event(data: bytes | str) -> AcmeEvent:
|
||
return _DECODER.decode(data)
|
||
```
|
||
|
||
#### Decide what Acme emits
|
||
|
||
For this guide, assume Acme outputs events like:
|
||
|
||
```json
|
||
{"type":"session.start","session_id":"acme_01","model":"acme-large"}
|
||
{"type":"tool.use","id":"toolu_1","name":"Bash","input":{"command":"ls"}}
|
||
{"type":"tool.result","tool_use_id":"toolu_1","content":"ok","is_error":false}
|
||
{"type":"final","session_id":"acme_01","ok":true,"answer":"Done."}
|
||
```
|
||
|
||
#### Map them to Takopi events
|
||
|
||
Use this mapping (mirrors Claude’s approach):
|
||
|
||
- `session.start` → `StartedEvent(engine="acme", resume=ResumeToken("acme", session_id))`
|
||
- `tool.use` → `ActionEvent(phase="started")` and stash action in `pending_actions`
|
||
- `tool.result` → `ActionEvent(phase="completed", ok=...)` and pop from `pending_actions`
|
||
- `final` → `CompletedEvent(ok, answer, resume)`
|
||
|
||
**Important:** emit exactly one `CompletedEvent`.
|
||
|
||
#### Make the translator a pure function
|
||
|
||
Claude keeps translation logic in a standalone function (`translate_claude_event(...)`).
|
||
This makes it easy to unit test without spawning a subprocess.
|
||
|
||
Do the same for Acme. Use pattern matching against msgspec shapes, and rely on the
|
||
`EventFactory` (as in Codex/Claude) to standardize event creation:
|
||
|
||
```py
|
||
def translate_acme_event(
|
||
event: acme_schema.AcmeEvent,
|
||
*,
|
||
title: str,
|
||
state: AcmeStreamState,
|
||
factory: EventFactory,
|
||
) -> list[TakopiEvent]:
|
||
match event:
|
||
case acme_schema.SessionStart(session_id=session_id, model=model):
|
||
if not session_id:
|
||
return []
|
||
event_title = str(model) if model else title
|
||
token = ResumeToken(engine=ENGINE, value=session_id)
|
||
return [factory.started(token, title=event_title)]
|
||
|
||
case acme_schema.ToolUse(id=tool_id, name=name, input=tool_input):
|
||
if not tool_id:
|
||
return []
|
||
tool_input = tool_input or {}
|
||
name = str(name or "tool")
|
||
|
||
# Keep titles short and friendly.
|
||
# (Claude uses takopi.utils.paths.relativize_command / relativize_path)
|
||
kind: ActionKind = "tool"
|
||
title = name
|
||
if name in {"Bash", "Shell"}:
|
||
kind = "command"
|
||
title = relativize_command(str(tool_input.get("command") or name))
|
||
|
||
action = Action(
|
||
id=tool_id,
|
||
kind=kind,
|
||
title=title,
|
||
detail={"name": name, "input": tool_input},
|
||
)
|
||
state.pending_actions[action.id] = action
|
||
return [
|
||
factory.action_started(
|
||
action_id=action.id,
|
||
kind=action.kind,
|
||
title=action.title,
|
||
detail=action.detail,
|
||
)
|
||
]
|
||
|
||
case acme_schema.ToolResult(
|
||
tool_use_id=tool_use_id, content=content, is_error=is_error
|
||
):
|
||
if not tool_use_id:
|
||
return []
|
||
action = state.pending_actions.pop(tool_use_id, None)
|
||
if action is None:
|
||
action = Action(
|
||
id=tool_use_id,
|
||
kind="tool",
|
||
title="tool result",
|
||
detail={},
|
||
)
|
||
|
||
result_text = (
|
||
""
|
||
if content is None
|
||
else (content if isinstance(content, str) else str(content))
|
||
)
|
||
detail = dict(action.detail)
|
||
detail.update(
|
||
{"result_preview": result_text, "is_error": bool(is_error)}
|
||
)
|
||
|
||
return [
|
||
factory.action_completed(
|
||
action_id=action.id,
|
||
kind=action.kind,
|
||
title=action.title,
|
||
ok=not bool(is_error),
|
||
detail=detail,
|
||
)
|
||
]
|
||
|
||
case acme_schema.Final(session_id=session_id, ok=ok, answer=answer, error=error):
|
||
answer = answer or ""
|
||
if ok and not answer and state.last_assistant_text:
|
||
answer = state.last_assistant_text
|
||
|
||
resume = (
|
||
ResumeToken(engine=ENGINE, value=session_id) if session_id else None
|
||
)
|
||
|
||
if ok:
|
||
return [factory.completed_ok(answer=answer, resume=resume)]
|
||
|
||
error_text = str(error) if error else "acme run failed"
|
||
return [
|
||
factory.completed_error(
|
||
error=error_text,
|
||
answer=answer,
|
||
resume=resume,
|
||
)
|
||
]
|
||
|
||
case _:
|
||
return []
|
||
```
|
||
|
||
This is intentionally close to Claude’s structure:
|
||
|
||
- Match on the msgspec event type
|
||
- Handle “init/session start” first
|
||
- Emit action-start and action-complete events
|
||
- Emit a final `CompletedEvent`
|
||
|
||
---
|
||
|
||
### Step 4 — Implement the `AcmeRunner` class
|
||
|
||
Most engines can implement a runner by combining:
|
||
|
||
- `ResumeTokenMixin` (resume parsing + resume-line detection)
|
||
- `JsonlSubprocessRunner` (process + JSONL streaming + completion semantics)
|
||
|
||
#### Why this combo?
|
||
|
||
It matches Claude/Codex:
|
||
|
||
- Runner owns resume format/regex.
|
||
- Base class owns locking and subprocess lifecycle.
|
||
- Translation stays in a pure function and is easily testable.
|
||
|
||
#### Minimal skeleton
|
||
|
||
```py
|
||
from __future__ import annotations
|
||
|
||
import logging
|
||
import re
|
||
from dataclasses import dataclass
|
||
from pathlib import Path
|
||
from typing import Any
|
||
|
||
from ..backends import EngineBackend, EngineConfig
|
||
from ..model import (
|
||
EngineId,
|
||
ResumeToken,
|
||
TakopiEvent,
|
||
)
|
||
|
||
from ..runner import JsonlSubprocessRunner, ResumeTokenMixin, Runner
|
||
from ..schemas import acme as acme_schema
|
||
|
||
logger = logging.getLogger(__name__)
|
||
|
||
ENGINE: EngineId = EngineId("acme")
|
||
_RESUME_RE = re.compile(
|
||
r"(?im)^\s*`?acme\s+--resume\s+(?P<token>[^`\s]+)`?\s*$"
|
||
)
|
||
|
||
|
||
@dataclass
|
||
class AcmeRunner(ResumeTokenMixin, JsonlSubprocessRunner):
|
||
engine: EngineId = ENGINE
|
||
resume_re: re.Pattern[str] = _RESUME_RE
|
||
|
||
acme_cmd: str = "acme"
|
||
model: str | None = None
|
||
allowed_tools: list[str] | None = None
|
||
session_title: str = "acme"
|
||
logger = logger
|
||
|
||
def format_resume(self, token: ResumeToken) -> str:
|
||
# Override because our canonical resume command is "acme --resume ...".
|
||
if token.engine != ENGINE:
|
||
raise RuntimeError(f"resume token is for engine {token.engine!r}")
|
||
return f"`acme --resume {token.value}`"
|
||
|
||
def command(self) -> str:
|
||
return self.acme_cmd
|
||
|
||
def build_args(
|
||
self,
|
||
prompt: str,
|
||
resume: ResumeToken | None,
|
||
*,
|
||
state: Any,
|
||
) -> list[str]:
|
||
_ = prompt, state
|
||
args = ["--output-format", "stream-json", "--verbose"]
|
||
if resume is not None:
|
||
args.extend(["--resume", resume.value])
|
||
if self.model is not None:
|
||
args.extend(["--model", str(self.model)])
|
||
if self.allowed_tools:
|
||
args.extend(["--allowed-tools", ",".join(self.allowed_tools)])
|
||
return args
|
||
|
||
def stdin_payload(
|
||
self,
|
||
prompt: str,
|
||
resume: ResumeToken | None,
|
||
*,
|
||
state: Any,
|
||
) -> bytes | None:
|
||
_ = resume, state
|
||
# Acme reads the prompt from stdin.
|
||
return prompt.encode()
|
||
|
||
def new_state(self, prompt: str, resume: ResumeToken | None) -> AcmeStreamState:
|
||
_ = prompt, resume
|
||
return AcmeStreamState()
|
||
|
||
def decode_jsonl(
|
||
self,
|
||
*,
|
||
raw: bytes,
|
||
line: bytes,
|
||
state: AcmeStreamState,
|
||
) -> acme_schema.AcmeEvent | None:
|
||
_ = raw, state
|
||
return acme_schema.decode_event(line)
|
||
|
||
def translate(
|
||
self,
|
||
data: acme_schema.AcmeEvent,
|
||
*,
|
||
state: AcmeStreamState,
|
||
resume: ResumeToken | None,
|
||
found_session: ResumeToken | None,
|
||
) -> list[TakopiEvent]:
|
||
_ = resume, found_session
|
||
return translate_acme_event(
|
||
data,
|
||
title=self.session_title,
|
||
state=state,
|
||
factory=state.factory,
|
||
)
|
||
```
|
||
|
||
Notes:
|
||
|
||
- `JsonlSubprocessRunner` already enforces the “exactly one completed event” rule.
|
||
- When `resume=None`, Takopi will acquire a per-session lock after it sees the first
|
||
`StartedEvent`. This is why emitting `StartedEvent` early is important.
|
||
|
||
#### Optional but recommended overrides (Claude-inspired)
|
||
|
||
Depending on how robust you want the integration, consider adding:
|
||
|
||
- `env(...)`: to strip or inject environment variables (Claude strips `ANTHROPIC_API_KEY`
|
||
unless configured to use API billing).
|
||
- `invalid_json_events(...)`: emit a helpful warning `ActionEvent` on malformed JSONL.
|
||
- `decode_error_events(...)`: log + drop `msgspec.DecodeError` if the engine emits garbage.
|
||
- `process_error_events(...)`: customize rc != 0 behavior.
|
||
- `stream_end_events(...)`: handle “process exited cleanly but never emitted a final event”.
|
||
|
||
Claude uses these to produce better failures instead of silent hangs.
|
||
|
||
---
|
||
|
||
### Step 5 — Add `build_runner(...)` and `BACKEND`
|
||
|
||
Takopi needs a way to build your runner from config.
|
||
|
||
Follow the pattern in `runners/claude.py`:
|
||
|
||
```py
|
||
def build_runner(config: EngineConfig, _config_path: Path) -> Runner:
|
||
acme_cmd = "acme"
|
||
|
||
model = config.get("model")
|
||
allowed_tools = config.get("allowed_tools")
|
||
|
||
title = str(model) if model is not None else "acme"
|
||
|
||
return AcmeRunner(
|
||
acme_cmd=acme_cmd,
|
||
model=model,
|
||
allowed_tools=allowed_tools,
|
||
session_title=title,
|
||
)
|
||
|
||
|
||
BACKEND = EngineBackend(
|
||
id="acme",
|
||
build_runner=build_runner,
|
||
install_cmd="npm install -g @acme/acme-cli",
|
||
)
|
||
```
|
||
|
||
That’s it for wiring.
|
||
|
||
Because engine backends are auto-discovered (`takopi.engines`), you do **not** need
|
||
to register the runner elsewhere.
|
||
|
||
If the binary name differs from the engine id, set:
|
||
|
||
- `EngineBackend(cli_cmd="acme-cli")`
|
||
|
||
so onboarding can find it on PATH.
|
||
|
||
---
|
||
|
||
### Step 6 — Add tests (copy Claude’s testing strategy)
|
||
|
||
A good runner PR usually contains 3 types of tests.
|
||
|
||
#### 1) Resume parsing tests
|
||
|
||
Copy `tests/test_claude_runner.py::test_claude_resume_format_and_extract`.
|
||
|
||
For Acme, assert:
|
||
|
||
- `format_resume(...)` outputs the canonical resume line.
|
||
- `extract_resume(...)` can parse it back out.
|
||
- It ignores other engines’ resume lines.
|
||
|
||
#### 2) Translation unit tests (fixtures)
|
||
|
||
Claude’s translation tests load JSONL fixtures and feed them into the pure translator.
|
||
|
||
Do the same:
|
||
|
||
- `tests/fixtures/acme_stream_success.jsonl`
|
||
- `tests/fixtures/acme_stream_error.jsonl`
|
||
|
||
Then assert:
|
||
|
||
- first event is `StartedEvent`
|
||
- action events are correct (ids, kinds, titles)
|
||
- the last event is a `CompletedEvent`
|
||
- completed.resume matches started.resume
|
||
|
||
If you use msgspec, also add a tiny schema sanity test (pattern from
|
||
`tests/test_codex_schema.py`) that decodes your fixture with
|
||
`takopi.schemas.<engine>.decode_event`.
|
||
|
||
#### 3) Lock/serialization tests (optional, but great)
|
||
|
||
Claude has async tests proving that:
|
||
|
||
- two runs with the same resume token serialize (`max_in_flight == 1`)
|
||
- a new session run locks correctly after it emits `StartedEvent`
|
||
|
||
If your runner uses `JsonlSubprocessRunner`, you get most of this for free, but having
|
||
one targeted test catches regressions.
|
||
|
||
---
|
||
|
||
## Common pitfalls (and how Claude avoided them)
|
||
|
||
- **StartedEvent arrives too late**
|
||
- If you wait until the end to emit `StartedEvent`, Takopi can’t acquire the per-session lock
|
||
early and another task might resume the same session concurrently.
|
||
- Emit `StartedEvent` immediately when you learn the session id.
|
||
|
||
- **Multiple completion events**
|
||
- Some CLIs emit multiple “final-ish” events. Decide which one becomes Takopi’s `CompletedEvent`.
|
||
- `JsonlSubprocessRunner` will stop reading after the first `CompletedEvent` it sees.
|
||
|
||
- **Missing completion event**
|
||
- Claude handles “stream ended without a result event” by emitting a synthetic `CompletedEvent`
|
||
in `stream_end_events(...)`.
|
||
|
||
- **Unhelpful error reporting**
|
||
- Include stderr tail in a warning action (Claude includes `stderr_tail` in `detail`).
|
||
|
||
- **Resume line gets truncated**
|
||
- Ensure `is_resume_line()` matches your `format_resume()` output. Takopi tries to preserve
|
||
resume lines during truncation.
|
||
|
||
- **Leaking secrets**
|
||
- If your engine can run in “subscription mode” without env keys, strip env vars like Claude
|
||
does with `ANTHROPIC_API_KEY`.
|
||
|
||
---
|
||
|
||
## Final checklist
|
||
|
||
Before you call the runner “done”:
|
||
|
||
- [ ] `takopi acme` appears automatically (module exports `BACKEND`).
|
||
- [ ] `format_resume()` matches `extract_resume()` + `is_resume_line()`.
|
||
- [ ] Translation emits exactly one `StartedEvent` and one `CompletedEvent`.
|
||
- [ ] `CompletedEvent.resume` matches `StartedEvent.resume`.
|
||
- [ ] rc != 0 produces a failure `CompletedEvent` (via `process_error_events`).
|
||
- [ ] “no final event” produces a failure `CompletedEvent` (via `stream_end_events`).
|
||
- [ ] Tests cover resume parsing + at least one translation fixture.
|