Tools Reference

All tools share the same interface: await tool.execute(ctx, params) -> ToolResult.
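
Because every tool exposes the same coroutine, a caller can dispatch to any of them without per-tool branching. The following is a minimal sketch of that idea, not the library's own code: the `Tool` protocol, `EchoTool`, and `run_tool` are hypothetical names introduced for illustration, `params` is assumed to be a plain dict, and `ToolResult` is a simplified stand-in for the class documented further down this page.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class ToolResult:
    # Simplified stand-in for the documented ToolResult (attachments omitted).
    title: str
    output: str
    error: bool = False
    metadata: dict = field(default_factory=dict)


class Tool(Protocol):
    # Hypothetical protocol describing the shared execute() signature.
    async def execute(self, ctx: Any, params: dict) -> ToolResult: ...


class EchoTool:
    # Hypothetical stub tool; it exists only to make the sketch runnable.
    async def execute(self, ctx: Any, params: dict) -> ToolResult:
        return ToolResult(title="echo", output=str(params.get("text", "")))


async def run_tool(tool: Tool, ctx: Any, params: dict) -> ToolResult:
    # The call shape is identical for every tool: await tool.execute(ctx, params).
    return await tool.execute(ctx, params)


def demo() -> ToolResult:
    # ctx is passed as None here because the stub tool never reads it.
    return asyncio.run(run_tool(EchoTool(), ctx=None, params={"text": "hello"}))
```

Calling `demo()` runs the stub through the same `await tool.execute(ctx, params)` path a real tool would use.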

Tool priority rule: ui → screenshot(marks=True) → mouse with image dimensions.
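
One way to read the priority rule is as a fallback order: try the accessibility-based ui tool first, fall back to a marked screenshot, and use pixel-level mouse control only as a last resort. The sketch below illustrates that reading under stated assumptions. `StepResult`, `first_success`, and the three stub steps are hypothetical and stand in for real tool calls; the page does not say the library implements the rule this way.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class StepResult:
    # Hypothetical minimal result: text output plus a failure flag.
    output: str
    error: bool


Step = Callable[[], Awaitable[StepResult]]


async def first_success(steps: list[tuple[str, Step]]) -> tuple[str, StepResult]:
    # Run steps in priority order and stop at the first one that does not fail.
    last = ("", StepResult(output="no steps given", error=True))
    for name, step in steps:
        result = await step()
        last = (name, result)
        if not result.error:
            break
    return last


async def demo_chain() -> tuple[str, StepResult]:
    # Stub steps: the ui attempt fails, so the marked-screenshot path is used.
    async def ui_click() -> StepResult:
        return StepResult(output="element not found", error=True)

    async def marks_click() -> StepResult:
        return StepResult(output="clicked mark 3", error=False)

    async def mouse_click() -> StepResult:
        return StepResult(output="clicked at pixel coordinates", error=False)

    return await first_success(
        [("ui", ui_click), ("screenshot", marks_click), ("mouse", mouse_click)]
    )
```

In this demo the mouse step is never reached, because the higher-priority screenshot step already succeeded.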

Tool        Description
----------  --------------------------------------------------------------------
ui          Accessibility-based UI interaction: click by name, type, read values
screenshot  Screen capture with optional Set-of-Marks overlay
mouse       Pixel-level mouse control with HiDPI coordinate translation
keyboard    Type text, press keys, send hotkeys
app         Open, close, focus, and list applications
clipboard   Read and write clipboard text
ocr         Extract text from any screen region
audit       Read the session audit log
learn       Record and replay desktop workflows


ToolResult

All tools return:

@dataclass
class ToolResult:
    title: str           # short human-readable label
    output: str          # text returned to the LLM
    error: bool          # True if the action failed
    attachments: list[Attachment]  # binary files (screenshots, etc.)
    metadata: dict       # extra data for programmatic consumers

Attachment:

@dataclass
class Attachment:
    filename: str
    content: bytes
    media_type: str      # e.g. "image/png"

    def to_base64(self) -> str: ...
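
The listing above elides the body of `to_base64`. As a hedged sketch of how a consumer might work with these two classes, the block below fills it in with standard Base64 encoding from Python's `base64` module and adds a small `summarize` helper. The `to_base64` body, the field defaults, and `summarize` are assumptions made for illustration; the actual implementation is not shown on this page.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Attachment:
    filename: str
    content: bytes
    media_type: str

    def to_base64(self) -> str:
        # Assumed implementation: Base64 of the raw bytes, returned as ASCII text.
        return base64.b64encode(self.content).decode("ascii")


@dataclass
class ToolResult:
    title: str
    output: str
    # The defaults below are assumptions; the reference listing shows none.
    error: bool = False
    attachments: list[Attachment] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def summarize(result: ToolResult) -> str:
    # Hypothetical helper: one-line summary that checks the error flag first.
    status = "FAILED" if result.error else "ok"
    names = ", ".join(a.filename for a in result.attachments) or "none"
    return f"[{status}] {result.title}: {result.output} (attachments: {names})"
```

A caller would typically check `error` before trusting `output`, then encode each attachment with `to_base64()` when a text-only transport is needed.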

Start with the primary interaction tool: ui.