MCP Integration

MCP Integration#

MCP (Model Context Protocol) is how AI coding tools like Claude Code, Cursor, and Continue connect to external capabilities. opendesk ships as an MCP server — you install it once, register it with your client, and from then on any conversation can use screenshot, mouse, keyboard, ui, app, clipboard, and ocr without any extra setup.


How it works#

Your AI client (Claude Code / Cursor / Continue)
       |
       | MCP over stdio
       v
  opendesk-mcp  <-- the server this package installs
       |
       +-- screenshot tool  (captures screen, returns PNG to the LLM)
       +-- ui tool          (clicks elements by name via AX tree)
       +-- mouse tool       (pixel-level mouse control with Retina scaling)
       +-- keyboard tool    (types text, presses keys, hotkeys)
       +-- app tool         (opens, closes, focuses apps)
       +-- clipboard tool   (read/write clipboard)
       +-- ocr tool         (extracts text from any screen region)

The LLM decides when to call a tool, calls it, gets the result (including PNG screenshots), and continues. You never write tool-calling code yourself.


What the LLM sees#

Once registered, the LLM sees these tools in every conversation:

Tool

When the LLM uses it

screenshot

“Let me see what’s on the screen”

ui

“Click the Save button” (finds it via AX tree, no coordinates)

mouse

Last resort for canvas areas with no accessible elements

keyboard

“Type this text into the field”

app

“Open Terminal” / “List running apps”

clipboard

“Copy this to clipboard” / “What did I copy?”

ocr

“Read the text in that region”

The LLM follows the priority rule automatically because it’s in each tool’s description: ui first, then screenshot(marks=True) to see numbered elements, then mouse as last resort.


Ask Claude#

Once connected, just describe what you want in plain English:

“Take a screenshot and tell me what’s on my screen”

“Click the Save button in TextEdit”

“Open Spotify”

“Type hello world into the active window”

“What text is in that dialog box?”

“Read the text on screen”

“Show me the audit log”

“Replay everything from this session”

Claude decides which tool to call, calls it, and responds with the result — you never write any code.


Setup#

  1. Install the package

  2. Register with your client: Claude Code · Claude Desktop · Cursor · Continue