Tools Reference
All tools share the same interface: await tool.execute(ctx, params) -> ToolResult.
Tool priority rule: try ui first, then screenshot(marks=True), and use mouse with image dimensions only as a last resort.
| Tool | Description |
|---|---|
| ui | Accessibility-based UI interaction: click by name, type, read values |
| screenshot | Screen capture with optional Set-of-Marks overlay |
| mouse | Pixel-level mouse control with HiDPI coordinate translation |
| | Type text, press keys, send hotkeys |
| | Open, close, focus, and list applications |
| | Read and write clipboard text |
| | Extract text from any screen region |
| | Read the session audit log |
| | Record and replay desktop workflows |
ToolResult
All tools return:

    @dataclass
    class ToolResult:
        title: str                     # short human-readable label
        output: str                    # text returned to the LLM
        error: bool                    # True if the action failed
        attachments: list[Attachment]  # binary files (screenshots, etc.)
        metadata: dict                 # extra data for programmatic consumers
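As a rough illustration of how a caller might consume these fields, the sketch below checks `error` before using `output` and lists any attachments. The dataclasses are re-declared locally so the snippet runs on its own; the default values for `attachments` and `metadata` and the `summarize` helper are assumptions made for the example, not part of the documented API.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    filename: str
    content: bytes
    media_type: str


@dataclass
class ToolResult:
    title: str
    output: str
    error: bool
    # Defaults below are an assumption for convenience in this example.
    attachments: list[Attachment] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def summarize(result: ToolResult) -> str:
    """Hypothetical helper: build a one-line status string from a ToolResult."""
    status = "FAILED" if result.error else "ok"
    files = ", ".join(a.filename for a in result.attachments) or "none"
    return f"[{status}] {result.title}: {result.output} (attachments: {files})"


ok = ToolResult(
    title="screenshot",
    output="Captured 1 display",
    error=False,
    attachments=[Attachment("screen.png", b"\x89PNG", "image/png")],
)
failed = ToolResult(title="ui", output="No element named 'Save'", error=True)

print(summarize(ok))
print(summarize(failed))
```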
Attachment:

    @dataclass
    class Attachment:
        filename: str
        content: bytes
        media_type: str  # e.g. "image/png"

        def to_base64(self) -> str: ...
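The body of `to_base64` is elided above. A minimal sketch, assuming it returns the standard Base64 encoding of `content` as ASCII text, could look like the following; `to_data_uri` is a hypothetical helper added only to show one way the encoded string might be used.

```python
import base64
from dataclasses import dataclass


@dataclass
class Attachment:
    filename: str
    content: bytes
    media_type: str  # e.g. "image/png"

    def to_base64(self) -> str:
        # Assumed behavior: standard Base64 of the raw bytes, decoded to ASCII text.
        return base64.b64encode(self.content).decode("ascii")


def to_data_uri(att: Attachment) -> str:
    """Hypothetical helper: wrap an attachment as a data: URI."""
    return f"data:{att.media_type};base64,{att.to_base64()}"


att = Attachment("hello.txt", b"hello", "text/plain")
print(att.to_base64())
print(to_data_uri(att))
```

Keeping `content` as raw bytes and encoding only on demand avoids paying the roughly one-third size overhead of Base64 until a text-only consumer actually needs it.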