sandraschi/pywinauto-mcp
GitHub: sandraschi/pywinauto-mcp
Stars: 18 | Forks: 6
# PyWinAuto MCP
**Let an AI assistant control real Windows apps** through a single MCP server that wraps window, UI, mouse, keyboard, screenshots, OCR, and optional face checks behind a small set of **portmanteau** tools (many operations, few entry points so models stay focused).
**Stack:** v0.4.2 FastMCP 3.2.0+ Python 3.12+ Windows 10/11
**Web dashboard (optional):** This repo ships **`web_sota`** — a local browser UI (Vite; default **http://localhost:10788**) that talks to the same backend as the REST API (**http://127.0.0.1:10789**). Use it for a **tools hub**, safety/help pages, **local LLM chat** (Ollama or LM Studio), **camera** selection, biometrics, and overview — run **`web_sota/start.ps1`**. You do **not** need the webapp for normal MCP **stdio** use in an IDE; it is an extra operator surface.
**Important:** This is **not** a browser sandbox. It runs in **your** desktop session and can move the real cursor, type into real windows, and drive the same UI you see. Read **[docs/SAFETY.md](docs/SAFETY.md)** before you wire it into an IDE. For **why hooks and full desktop control look “malware-adjacent”** and how this repo gates them (research, forensics, legitimate automation), see **[Dual-use tooling](docs/SAFETY.md#dual-use-tooling-research-forensics-and-guardrails)** in that doc. For throwaway desktops (Windows Sandbox, VMs, mapped folders), use **[virtualization-mcp](https://github.com/sandraschi/virtualization-mcp)** alongside this project. Fleet notes: `mcp-central-docs/patterns/PYWINAUTO_MCP_SAFETY.md`. Optional **face** features are off until you opt in see **SAFETY 5** and `PYWINAUTO_MCP_ENABLE_FACE`. Optional **global keylogger** is off until you opt in see **SAFETY 6** and `PYWINAUTO_MCP_ENABLE_KEYLOGGER`.
**Native Windows vs websites (HTML DOM):** **pywinauto-mcp** drives **desktop UI** (Win32 / UI Automation windows, dialogs, many native controls). It does **not** expose the **HTML DOM** inside a browser tab. For **website** automation and analysis (selectors, accessibility tree, network, console), use a **browser MCP**; those servers are usually built on **Playwright** (or Chromium-only stacks). The two are **orthogonal** combine them in your IDE when a workflow needs **both** a real browser page and a **native** app on the same machine.
### Complementary: [autohotkey-mcp](https://github.com/sandraschi/autohotkey-mcp) (sibling repo)
Fleet sibling **[`autohotkey-mcp`](https://github.com/sandraschi/autohotkey-mcp)** targets **AutoHotkey v2 scriptlets**: list/run/stop scripts from a depot, optional **ScriptletCOMBridge**, optional **LLM-assisted** generation of `.ahk` into a sandbox folder. This repo targets **PyWinAuto + portmanteau tools** (window/element tree, mouse/keyboard, visual, system). **Neither replaces the other.**
| | **pywinauto-mcp (here)** | **autohotkey-mcp** |
|--|--|--|
| **Core idea** | **Structured native UI automation** — UIA/window discovery, element ops, OCR/screenshots, one MCP tool surface. | **AHK script lifecycle** — run and manage scriptlets, peek source/metadata, optional generate AHK. |
| **Overlap** | Both assume a **high-trust Windows session** (read each repo’s **SAFETY**). | Same. |
| **Not duplicated** | No first-class **scriptlet depot** or **AHK generation** workflow. | No **UI Automation tree** or **portmanteau** window/element API like here. |
| **Typical use** | “Drive this app’s UI from the model” (clicks, fields, windows). | “Run or author **hotkey/macro** AHK I already organize in a folder.” |
| **Together** | Install **both** MCP servers when you want **agent-driven UI control** and **AHK utilities** in the same client; keep boundaries clear (native UI here, AHK scripts there). | — |
### Discovery (GitHub, Glama, MCP catalogs)
- **Safety:** [`docs/SAFETY.md`](docs/SAFETY.md) kill switch, rate limits, HITL (human-in-the-loop), dry-run.
- **Dual-use / research:** [Dual-use tooling](docs/SAFETY.md#dual-use-tooling-research-forensics-and-guardrails) — capability vs. intent, guardrails, vendor refusal context.
- **Isolation:** **Windows Sandbox** / VM via [`virtualization-mcp`](https://github.com/sandraschi/virtualization-mcp). Repo stars are **not** a safety guarantee.
### Product & docs
- **PRD / index:** [`docs/PRD.md`](docs/PRD.md) [`docs/README.md`](docs/README.md)
- **Web dashboard:** [`web_sota/`](web_sota/) `start.ps1` (ports **10788** / **10789**)
### Prompts, skills, MCPB
- **MCP prompts:** `desktop_automation_operator_protocol`, `desktop_automation_runbook` [`src/pywinauto_mcp/prompts.py`](src/pywinauto_mcp/prompts.py).
- **Cursor skill:** [`skills/desktop-automation-protocol/SKILL.md`](skills/desktop-automation-protocol/SKILL.md).
- **Foreground:** [`docs/OPERATOR_PROTOCOL.md`](docs/OPERATOR_PROTOCOL.md) keep the target app focused during automation.
- **MCPB:** [`mcpb/manifest.json`](mcpb/manifest.json) packages the server; prompts come from the running process unless you extend the pack.
### Planned / todo
- **Optional voice (STT / keyword / speaker-adjacent):** Not implemented. Would mirror face: **env + optional extra**, local-first, same HITL (human-in-the-loop) / safety docs not authentication.
## Examples
- **[examples/README.md](examples/README.md)** — **demos:** mouse dance, 9-Notepad grid, typewriter Notepad, plus older samples. Run all three in order: **`just demo`** (requires [just](https://github.com/casey/just)).
- **just paint-demo** — Industrialized MS Paint drawing (justfile/PowerShell orchestrated).
- [examples/notepad_basic.py](examples/notepad_basic.py) — simple window flow.
- [examples/calculator_advanced.py](examples/calculator_advanced.py) — element tree.
- [examples/system_monitoring.py](examples/system_monitoring.py) — processes / tray.
Latency depends on the host, target app, and backends (OCR, etc.); treat any old benchmark tables as obsolete.
## Tools (portmanteau)
Seven core interfaces plus **`get_desktop_state`**, **optional** `automation_face` when enabled, and **optional** `global_keylogger` when enabled (see **SAFETY**):
| Tool | Operations | Description |
|------|------------|-------------|
| `automation_windows` | 11 | Window management (list, find, maximize, etc.) |
| `automation_elements` | 14 | UI element interaction (click, hover, text, etc.) |
| `automation_mouse` | 9 | Mouse — HITL (human-in-the-loop) may apply |
| `automation_keyboard` | 4 | Keyboard — HITL (human-in-the-loop) may apply |
| `global_keylogger` | 5 | Session keyboard capture (opt-in: `PYWINAUTO_MCP_ENABLE_KEYLOGGER`; see SAFETY §6) |
| `automation_visual` | 4 | Screenshot, OCR, find image |
| `automation_face` | 5 | Face (opt-in: env + `face` extra) |
| `automation_system` | | status, **help**, wait, info, clipboard, processes, start_app |
| `get_desktop_state` | 1 | UI tree / discovery |
## Usage snippets
### `automation_windows`
automation_windows("find", title="Notepad", partial=True)
automation_windows("maximize", handle=12345)
### `automation_elements`
automation_elements("click", window_handle=12345, control_id="btnOK")
automation_elements("set_text", window_handle=12345, control_id="Edit1", text="Hello!")
### `automation_visual`
automation_visual("extract_text", image_path="screen.png")
automation_visual("find_image", template_path="button.png", threshold=0.8)
## Quick Start
git clone https://github.com/sandraschi/pywinauto-mcp
cd pywinauto-mcp
just
This opens an interactive dashboard showing all available commands. Run `just bootstrap` to install dependencies, then `just serve` or `just dev` to start.
### Manual Setup
If you don't have `just` installed:
## Installation
**Prerequisites:** [uv](https://docs.astral.sh/uv/) recommended Python 3.12+ Windows 10/11 Tesseract optional (visual/OCR).
**Run with uvx (published package):**
uvx pywinauto-mcp
**Claude Desktop (example):** replace `` with the directory where you cloned this repository (not a path from another machine).
"mcpServers": {
"pywinauto-mcp": {
"command": "uv",
"args": ["--directory", "", "run", "pywinauto-mcp"]
}
}
**Install from source:**
uv pip install -e .
**MCPB:**
mcpb install pywinauto-mcp
## Safety
**Canonical:** [`docs/SAFETY.md`](docs/SAFETY.md) two-server model with **`virtualization-mcp`**, HITL (human-in-the-loop), env vars, fleet docs.
**Dual-use / research framing:** [Dual-use tooling (research, forensics, guardrails)](docs/SAFETY.md#dual-use-tooling-research-forensics-and-guardrails) — why full desktop automation and hooks overlap offensive tooling in **capability**, how **intent and authorization** differ, and what this repo does to limit abuse.
Desktop automation is **not** browser-sandboxed. **Sampling** and long agent loops can multiply tool calls.
### Human-in-the-loop (HITL)
**HITL** (human-in-the-loop) means an operator must **approve** before mutating **mouse** / **keyboard** when the server is configured that way.
- **`approve_automation(duration_minutes=...)`** before mutating **mouse** / **keyboard**, or tools return `clarification_needed`.
- **`automation_mouse("position")`** and **`automation_mouse("telemetry")`** are read-only and skip approval.
## Operations Overview
### 🖱️ Mouse Control (`automation_mouse`)
Consolidates all mouse interactions: position tracking, clicking, dragging, and hovering.
**Pointer injection** uses the in-repo **`win32_mouse`** backend (**`SetCursorPos`** / **`mouse_event`**, DPI-aware) for **`automation_mouse`** and for coordinate-based clicks in **`automation_elements`** — not PyAutoGUI alone, so moves and clicks stay reliable on scaled displays. Responses may include **`input_backend: "win32_mouse"`**. **Failsafe** matches PyAutoGUI: moving the cursor to the **upper-left screen corner** aborts injected pointer ops unless **`PYWINAUTO_MCP_BYPASS_HITL=1`** (also disables that corner check for **`win32_mouse`**).
**Operations**: `position`, `click`, `double_click`, `right_click`, `move`, `move_relative`, `scroll`, `drag`, `hover`, `telemetry`
**Key Tool: Visual Telemetry HUD**
Launches a high-visibility, "Always-on-Top" overlay for real-time calibration:
- **Coordinate Tracking**: Live X/Y position updates.
- **Scrolling Input Buffer**: Displays the last 20 keyboard characters and click events (visual verification only, no persistence).
- **Industrial Safety**: High visibility ensures monitoring is transparent and operator-auditable.
# Example: Launch telemetry HUD for 30 seconds
mcp-invoke automation_mouse --operation "telemetry" --telemetry_duration 30
### Environment variables
| Env | Purpose |
|-----|---------|
| `PYWINAUTO_MCP_BYPASS_HITL=1` | Bypasses approval prompts and disables pointer failsafe (`pyautogui.FAILSAFE` and **`win32_mouse`** corner escape) for trust-controlled demos/CI. |
| `PYWINAUTO_MCP_KILL_SWITCH=1` | Blocks mutating mouse/keyboard (after HITL (human-in-the-loop) path). |
| `PYWINAUTO_MCP_MAX_ACTIONS_PER_MINUTE` | Default `120` rolling 60s cap for mutating actions. |
| `PYWINAUTO_MCP_DRY_RUN=1` | Count actions without sending input (`dry_run` in results). |
| `PYWINAUTO_MCP_ENABLE_FACE=1` | Allows registering **`automation_face`** (needs `face` extra). |
| `PYWINAUTO_MCP_ENABLE_KEYLOGGER=1` | Allows registering **`global_keylogger`** (session keyboard capture; see SAFETY §6). |
| `PYWINAUTO_LLM_BASE_URL` | Default OpenAI-compatible root for the **web_sota** local-LLM proxy (e.g. Ollama `http://127.0.0.1:11434/v1`, LM Studio `http://127.0.0.1:1234/v1`). The UI can override per session. |
| `PYWINAUTO_MCP_CAMERA_MAX_INDEX` | Max OpenCV index to probe for **`GET /api/v1/cameras/`** (default `10`, capped at 32). |
**`automation_safety(operation="status"|"reset_counters")`** counters and flags. **`automation_system("help")`** returns a structured overview (version, tools, safety keys, doc paths).
**Fleet:** `mcp-central-docs` `patterns/PYWINAUTO_MCP_SAFETY.md`.
## Testing (CI vs local)
See **[docs/TESTING.md](docs/TESTING.md)** environment-aware markers (`requires_hardware`, `destructive`, ) aligned with **mcp-central-docs** `standards/testing-environment-aware.md`. In CI, hardware-marked tests are skipped; run locally on Windows to exercise OpenCV / real window flows.
## Maintenance
- Documentation should match **implemented** tools and env vars.
- **`glama.json`:** update when releasing or when marketplace metadata changes.
- **Portable clones:** avoid committing machine-specific paths (e.g. a fixed drive + `Dev\repos\...`). Run **`just check-machine-paths`** (or `pwsh -File scripts/check-no-machine-paths.ps1`) on `src/` and key root files; use **`-FullRepo`** to scan the whole tree (legacy docs may still match until cleaned).
## 🛡️ Industrial Quality Stack
This project adheres to **SOTA 14.1** industrial standards for high-fidelity agentic orchestration:
- **Python (Core)**: [Ruff](https://astral.sh/ruff) for linting and formatting. Zero-tolerance for `print` statements in core handlers (`T201`).
- **Webapp (UI)**: [Biome](https://biomejs.dev/) for sub-millisecond linting. Strict `noConsoleLog` enforcement.
- **Protocol Compliance**: Hardened `stdout/stderr` isolation to ensure crash-resistant JSON-RPC communication.
- **Automation**: [Justfile](./justfile) recipes for all fleet operations (`just lint`, `just fix`, `just dev`).
- **Security**: Automated audits via `bandit` and `safety`.
## License
MIT Copyright (c) 2026 Sandra Schipal.
## Web dashboard (`web_sota`)
Optional Vite UI + local backend. Default ports from **`web_sota/start.ps1`** (or repo-root **`start.ps1`**): frontend **10788**, backend **10789**. The start scripts **wait for the Python API to be reachable** before starting Vite so the Vite proxy does not log connection refused errors during a cold **`uv run`**.
# From repo root (recommended)
.\start.ps1
# Or from web_sota
Set-Location web_sota
.\start.ps1
Open `http://localhost:10788` **Help** route documents safety, env vars, and tool overview (same themes as `automation_system("help")`).
**Local LLM chat** (`/chat`): proxies to **Ollama** or **LM Studio** (OpenAI-compatible `/v1` on localhost only). Pick a model, optional **personas**, **prompt refiner**, and **repo knowledge** pre-prompt (`src/pywinauto_mcp/llm_repo_context.py`) for questions like can I click and drag?. Start Ollama or LM Studio first; then **Refresh models**.