Sonofg0tham/ward

# Ward

Ward is a CLI and a GitHub Action. It screens the metadata an AI agent ingests before any LLM-based reviewer, SAST agent, or IaC scanner sees it. The job: catch prompt injection attempts embedded in the places that traditional security tools ignore.

## Why this exists

In March 2026, AI bots compromised five major GitHub projects. The attack class was the same in each: agents being hijacked through inputs that traditional scanners treat as inert metadata. Branch names. File names. Commit messages. PR titles.

The existing security stack does not help here:

- **SAST scanners** ignore branch names and commit messages. Those have never been an attack surface before.
- **Secret scanners** look for credentials, not instructions.
- **Prompt firewalls** (Lakera, LlamaFirewall, BoltClaw) sit at the LLM boundary inside the agent. By the time they see the text, it is already in the context window.
- **OWASP ASI Top 10** names the pattern (ASI01, goal hijack via untrusted input) but does not ship tooling.

Ward sits earlier. It runs against the surface area that attackers actually use, before any LLM has a chance to act on it.

## Where Ward fits in

| Tool | Layer | Catches |
|------|-------|---------|
| **Ward** | Before the agent reads input | Prompt injection in branch names, file names, commit messages, PR titles, PR descriptions, code comments, README files |
| **Lakera Guard** | LLM boundary | Prompt injection in the prompt itself, jailbreaks, off-topic queries |
| **LlamaFirewall** | LLM boundary | Prompt injection, alignment violations, output policy enforcement |
| **BoltClaw** | Agent configuration | Tampering with agent system prompts, tool allowlists, MCP configs |
| **SAST / secret scanners** | Source code | Vulnerabilities and credentials in the code itself |

Ward is one layer. It is not a replacement for the others. Defence in depth still applies.

## What Ward catches

Six detector categories, 25+ rules out of the box:

- **Instruction overrides** ("ignore previous instructions", "your new task is...", fake `[SYSTEM]` blocks).
- **Role manipulation** (tokenizer tags like `<|im_start|>system`, "developer mode", DAN-style activation).
- **Obfuscation** (zero-width unicode, RTL override, base64 blobs in unusual fields, hex blobs, HTML comments).
- **Tool-call injection** (fake `` wrappers, JSON tool-call objects, `mcp://` URIs, shell metacharacters in names).
- **Exfiltration prompts** (instructions to POST findings to a URL, include secrets, encode data in DNS queries).
- **AI tool-specific quirks** (Anthropic Human / Assistant tags, Cursor command palette, Antigravity tool schemas, Copilot slash commands).

## Install

```bash
pipx install ward-scanner
```

Verify the install:

```bash
ward version
```

## Use

### Scan a PR by reference

```bash
export GITHUB_TOKEN=ghp_...
ward scan-pr sonofg0tham/ward#42
```

### Scan local git state

```bash
ward scan-local
```

Walks the working tree, scans the current branch name, the last 20 commit messages, tag names, every tracked file's path, and the top-of-file content of any `.md`, `.txt`, `.rst`, and source files.

### Scan a single string

```bash
echo "feat/ignore-previous-instructions" | ward scan-stdin --surface branch_name
```

Every other Ward command is built on this one. Pipe whatever string you want through it.

### Other commands

```bash
ward scan-branch feat/ignore-previous-instructions
ward scan-commit HEAD
ward explain io.ignore_previous
```

### Output formats

```bash
ward scan-local --format pretty   # default, terminal table
ward scan-local --format json     # machine-readable
ward scan-local --format sarif    # GitHub Code Scanning compatible
```

### Severity thresholds

```bash
# Drop anything below MEDIUM, only FAIL on CRITICAL.
ward scan-local --severity-threshold medium --fail-on critical
```

Exit codes:

- `0` PASS, no findings above the threshold.
- `1` WARN, findings exist but none reached the fail-on severity.
- `2` FAIL, at least one finding at or above fail-on.

## Run the adversarial lab

Ward ships with a built-in lab that runs each scripted attack scenario through two pipelines (unprotected and Ward-protected) and produces a Markdown report you can paste into a blog post or PR comment:

```bash
ward lab attack
# Wrote lab report: ward-lab-report.md
# Blocked by Ward: 5/5 scenarios.
```

The mock reviewer agent does not call an LLM. The lab demonstrates whether the untrusted instruction would have reached the agent's context window, not what the LLM would have done with it. Wiring in a real reviewer is the next step.

Flags: `--output `, `--no-write` (print to stdout), `--fail-on `.

## Pre-commit hook

If you use the [pre-commit](https://pre-commit.com/) framework, drop this into your `.pre-commit-config.yaml`:

```yaml
- repo: https://github.com/sonofg0tham/ward
  rev: v0.1.0
  hooks:
    - id: ward-scan-local
      args: [--fail-on, high]
```

Ward then runs on every `git commit` and `git push`, screening your branch name, commit messages, and tracked documentation files for injection patterns. This stops you from committing a poisoned PR before it ever reaches GitHub.

Other hook ids: `ward-scan-stdin` (designed for the `commit-msg` stage, screens the message you're typing), `ward-selftest` (manual, useful as a CI gate).

## GitHub Action

Add it to a workflow in three lines:

```yaml
- uses: sonofg0tham/ward/action@v1
  with:
    fail-on: high
```

A complete workflow with SARIF upload:

```yaml
name: Ward
on: [pull_request]
permissions:
  contents: read
  security-events: write
jobs:
  ward:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: sonofg0tham/ward/action@v1
        with:
          fail-on: high
          format: sarif
          upload-sarif: true
```

## Use Ward as a Python SDK

If you are building an agentic system (CrewAI, AutoGen, LangGraph, your own loop) and want to screen text before it reaches the model, import Ward directly:

```python
from ward import build_input, scan_inputs, load_rule_pack, Verdict

# Load the bundled rule pack once at startup.
pack = load_rule_pack()


def safe_ingest(untrusted_text: str) -> str:
    inputs = [build_input("pr_body", untrusted_text, location="user-input")]
    report = scan_inputs(inputs, pack, target="my-agent")
    if report.verdict is not Verdict.PASS:
        flagged = [f.rule_id for f in report.findings]
        raise ValueError(f"Refusing to ingest untrusted text: {flagged}")
    return untrusted_text
```

The 13 supported surface types (`branch_name`, `commit_message`, `pr_body`, `file_content`, ...) let you tune which rules apply. A LangGraph tool that ingests web search results would use `pr_body` or `file_content`; a CrewAI agent reading a filename would use `file_name`.

### Inside a LangGraph node

```python
from ward import build_input, scan_inputs, load_rule_pack, Verdict

_pack = load_rule_pack()


def web_search_node(state):
    text = state["search_result"]
    report = scan_inputs(
        [build_input("file_content", text, location="search")],
        _pack,
        target="search_result",
    )
    if report.verdict is not Verdict.PASS:
        # Replace the flagged text and record which rules fired.
        state["search_result"] = "(blocked by Ward)"
        state["ward_findings"] = [f.rule_id for f in report.findings]
    return state
```

### Inside a CrewAI tool

```python
from crewai.tools import BaseTool
from ward import build_input, scan_inputs, load_rule_pack, Verdict

# Load the rule pack once at import time rather than per call.
_pack = load_rule_pack()


class GuardedFileReader(BaseTool):
    # BaseTool is a pydantic model, so overridden fields need type annotations.
    name: str = "read_file"
    description: str = "Read a file, screened by Ward."

    def _run(self, path: str) -> str:
        # Use a context manager so the file handle is always closed.
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        report = scan_inputs(
            [build_input("file_content", text, location=path)],
            _pack,
            target=path,
        )
        if report.verdict is not Verdict.PASS:
            return f"(refused: Ward flagged {[f.rule_id for f in report.findings]})"
        return text
```

## Custom rule packs

Drop a directory of YAML files alongside your repo and point Ward at it:

```bash
ward scan-local --rule-pack ./security/ward-rules
```

Each YAML file is a list of rules. The schema is documented in [`src/ward/rules/instruction_overrides.yaml`](src/ward/rules/instruction_overrides.yaml).

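If you drive Ward from a script rather than through the SDK, the exit codes listed under Severity thresholds can be mapped back to a verdict. The following is a minimal sketch, assuming the `ward` executable is on your `PATH`. The helper names `build_scan_command`, `verdict_from_exit_code`, and `run_scan` are hypothetical and not part of Ward, and treating any undocumented exit code as `ERROR` is an assumption of this sketch. It uses only flags that appear in this README (`--format`, `--fail-on`, `--rule-pack`).

```python
import subprocess
from typing import List, Optional

# Exit codes as documented in this README: 0 PASS, 1 WARN, 2 FAIL.
EXIT_VERDICTS = {0: "PASS", 1: "WARN", 2: "FAIL"}


def build_scan_command(rule_pack: Optional[str] = None, fail_on: str = "high") -> List[str]:
    """Assemble a `ward scan-local` invocation from flags shown in this README."""
    cmd = ["ward", "scan-local", "--format", "json", "--fail-on", fail_on]
    if rule_pack is not None:
        cmd += ["--rule-pack", rule_pack]
    return cmd


def verdict_from_exit_code(code: int) -> str:
    """Map an exit code to a verdict. Undocumented codes become "ERROR" (assumption)."""
    return EXIT_VERDICTS.get(code, "ERROR")


def run_scan(rule_pack: Optional[str] = None, fail_on: str = "high") -> str:
    """Run the scan and return the verdict. Assumes `ward` is installed and on PATH."""
    result = subprocess.run(
        build_scan_command(rule_pack, fail_on),
        capture_output=True,
        text=True,
    )
    return verdict_from_exit_code(result.returncode)
```

A CI script could call `run_scan("./security/ward-rules")` and block a merge only when the result is `"FAIL"`, while logging `"WARN"` results for later review.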
## Ignoring whole paths with `.wardignore`

Some directories (test fixtures, security research notes, rule packs themselves) are intentionally adversarial and should not be scanned for content. Drop a `.wardignore` at the repo root with fnmatch-style globs:

```
# .wardignore
tests/fixtures/**/*        # adversarial by design
security/research/*        # writeup of past attacks
docs/threat-models/*
```

Filenames in ignored paths are STILL scanned (a malicious filename remains suspicious even inside an ignored directory). Only the content scan is suppressed. Ward's own repo uses this to exclude its own source tree from self-scanning.

## Suppressing rules in documentation

Security-research docs (Ward's own README included) need to *talk about* the attack strings without firing the scanner. Drop a `ward-allow-file` directive near the top of any documentation file.

The directive accepts rule ids or fnmatch-style globs, comma-separated. It is only honoured on `file_content` and `code_comment` surfaces, never on branch names, commit messages, PR titles, or PR bodies. That's the intentional asymmetry: attackers cannot suppress detection from inside the text Ward is trying to screen.

Supported comment styles for the directive:

```
# ward-allow-file: io.*          # Python / Bash / YAML
// ward-allow-file: io.*         // JS / TS / Go / Rust / Java
/* ward-allow-file: io.* */      /* C / CSS */
```

## Evasion resistance

Ward feeds detectors a normalised view of the text plus several alternative forms designed to defeat common evasion tricks:

- **Leetspeak**: `1gn0r3 4ll pr3v10us` becomes `ignore all previous`.
- **Intra-word separators**: `i.g.n.o.r.e` and `i-g-n-o-r-e` collapse to `ignore`.
- **Repeated letters**: `ignooooore` and `previousssss` collapse to `ignore` and `previous`. Two collapse variants are tried (collapse to 1 letter and collapse to 2) so naturally-doubled English words like `all`, `free`, `see` survive.
- **Zero-width unicode**: stripped before regex match.
- **NFKC**: fullwidth and compatibility characters fold to ASCII.
- **Base64 / hex blocks**: decoded and re-scanned.
- **Identifier delimiters**: `-`, `_`, `/`, `.` in branch and file names normalise to spaces.

**Known limitation:** the all-single-space case (`i g n o r e p r e v i o u s`) is not handled, because the original word boundaries cannot be recovered reliably from spaced-out singletons. Multi-space separators between words (`i g n o r e   p r e v i o u s`) are still ambiguous and out of scope for v0.1.

## Threat model

Ward is a pattern-matching tool. It catches the attack class documented in OWASP ASI Top 10 (ASI01) and in the March 2026 GitHub supply-chain incidents. It does **not** catch:

- Novel zero-day injection techniques that match no rule.
- Attacks embedded in non-text formats (images, PDFs, audio).
- Attacks on the model itself once context has been built. That is a prompt firewall's job.
- Vulnerabilities in the code being reviewed. That is SAST's job.

See [SECURITY.md](SECURITY.md) for the full threat model and the vulnerability disclosure process.

## Telemetry

Ward sends none. No phone home, no anonymous stats, no metrics collection. The only outbound network calls Ward ever makes are the GitHub API requests you explicitly trigger via `ward scan-pr`.

## Development

```bash
git clone https://github.com/sonofg0tham/ward
cd ward
python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"
pytest
```

Coverage target is 75% and current trunk runs at 83%.

## Licence

MIT. See [LICENSE](LICENSE).