# PromptGenie
**Secure prompt engineering for AI agents and engineering teams.**
PromptGenie is a CLI that turns rough task descriptions into optimised, tool-specific, security-checked prompts and executes them end-to-end. It ships a built-in linter, multi-file security scanner, diff engine, test runner, model benchmarker, context pack system, workflow engine, CI integration, quality scoring, token estimation, a fully UNIX-composable pipeline, and a declarative run engine. The run engine sends prompts to supported providers (Anthropic, OpenAI, Ollama, LM Studio, vLLM, and any OpenAI-compatible endpoint) with streaming, variable resolution, context assembly, policy gates, and run history.

v1.6.0 adds a unified `EventBus` / `EventFormatter` infrastructure. Every lifecycle event (run tokens, lint findings, policy violations, eval results) flows through a single typed channel that commands, tests, and future integrations all subscribe to.

v1.7.0 adds a formal JSON Schema for `.promptgenie.yaml`, `workspace:` and `defaults:` config blocks, `config validate` for CI-safe schema checking, and `config init` to scaffold a new config with editor autocomplete wired up.
## Why
Most prompt engineering is done by hand, rewritten constantly, and never tested. Prompts for agentic tools (Claude Code, Cursor, Devin) are especially risky: a vague scope or missing stop condition can cause scope creep, destructive edits, or unintended deployments.
PromptGenie makes prompts:
- **Structured** — section-by-section output matched to the target tool's requirements
- **Linted** — catches vague verbs, missing scope, broad tasks, and agentic risks before you send
- **Scanned** — multi-file, directory, and zip scanning; flags heuristic patterns consistent with secrets, prompt injection, and unsafe agent permissions; opt-in LLM semantic analysis layer with pre-send secret redaction
- **Diffed** — compare two versions with token delta, score delta, section changes, and risk changes
- **Tested** — declarative unit tests assert quality, safety, structure, and content before you ship
- **Benchmarked** — run prompts against real Claude models and score responses across 6 rubric dimensions
- **Context-aware** — reusable project context packs inject stack, architecture, pitfalls, and style into every prompt
- **Workflow-driven** — break complex tasks into staged prompt chains with approval gates, handoffs, and per-step scope locks
- **CI-integrated** — GitHub Actions workflow and pre-commit hooks keep bad prompts out of your repo
- **Machine-readable** — `--format json` and `--format sarif` on every lint and scan for CI pipelines and GitHub code scanning
- **UNIX-composable** — every command accepts `-` to read from stdin; pipe directly into `jq`, `sarif-fmt`, or your own tools without temp files
- **Scored** — rates every prompt across 7 quality dimensions
- **Repeatable** — YAML model profiles, templates, and context packs versioned alongside your code
- **Executable** — `promptgenie run spec.yaml` executes a PromptSpec end-to-end: resolves variables, assembles context, enforces policy, streams the response, and persists the run
- **Provider-agnostic** — built-in adapters for Anthropic, OpenAI, Ollama, LM Studio, LocalAI, vLLM, and NousResearch Hermes; add any OpenAI-compatible endpoint with one command; no API key needed for local providers
- **Hermes-ready** — first-class NousResearch Hermes support: a `hermes` target profile for `generate`/`adapt`/`lint`/scoring, plus a built-in OpenAI-compatible `hermes` provider (Nous Portal, `NOUS_API_KEY`) for `run`/`benchmark`/`evaluate`
## Quickstart
```bash
git clone https://github.com/mylesagnew/promptgenie.git
cd promptgenie
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# Generate a security-checked prompt for Claude Code
promptgenie generate "review this repo for security issues" --target claude-code

# Scan a prompt file for secrets and injection risks
promptgenie scan examples/auth-refactor.md

# Lint a prompt for quality issues
promptgenie lint examples/auth-refactor.md

# Check your installation health
promptgenie doctor

# Install shell tab-completion
promptgenie completion install zsh

# Launch the guided interactive menu
promptgenie interactive
```
**Event bus (v1.6.0+):**
```python
from promptgenie.core.event_bus import EventBus
from promptgenie.core.event_formatters import NDJSONFormatter
from promptgenie.core.events import Event, EventKind
from promptgenie.core.run_engine import run_spec

# `spec` is assumed to be an already-loaded PromptSpec (loading is not shown here).
bus = EventBus()
tokens: list[str] = []
bus.subscribe(EventKind.RUN_TOKEN, lambda e: tokens.append(e.text))
bus.subscribe_all(lambda e: print(e.to_ndjson()))  # log everything

result = run_spec(spec, dry_run=False, event_bus=bus)
print("".join(tokens))  # assembled response
print(f"Events: {len(bus)}, tokens: {len(bus.of_kind(EventKind.RUN_TOKEN))}")
```
**Pipe-friendly (v1.1.0+):**
```bash
# Lint from stdin, extract issues with jq
cat prompt.md | promptgenie lint - --format json | jq '.issues[]'

# Scan and feed results directly to GitHub SARIF upload
cat prompt.md | promptgenie scan - --format sarif > findings.sarif

# Side-by-side diff with markdown output
promptgenie diff v1.md v2.md --side-by-side
promptgenie diff v1.md v2.md --format markdown > DIFF.md

# Generate with template variables
promptgenie generate "deploy {{service}} to {{env:string:staging}}" \
  --target claude-code --var service=api --no-input
```
**PromptSpec and run engine (v1.2.0+):**
```bash
# Install provider support (httpx + Anthropic SDK)
pip install "promptgenie[providers]"

# Scaffold a declarative PromptSpec
promptgenie spec init code-review --target claude-code
# → creates code-review.prompt.yaml

# Validate the spec
promptgenie spec validate code-review.prompt.yaml

# Preview the assembled prompt without calling any provider
promptgenie spec render code-review.prompt.yaml --var env=prod

# Add a local Ollama provider (no API key needed)
promptgenie provider add ollama \
  --base-url http://localhost:11434/v1 --model llama3 --local
promptgenie provider doctor ollama

# NousResearch Hermes (built-in provider — just set your key)
export NOUS_API_KEY=...
promptgenie provider doctor hermes
promptgenie generate "summarise this incident" --target hermes
promptgenie run code-review.prompt.yaml --provider hermes --model Hermes-4-405B --stream

# Execute the spec end-to-end
promptgenie run code-review.prompt.yaml --provider ollama --stream

# Dry run — resolve vars, build context, no provider call
promptgenie run code-review.prompt.yaml --dry-run --show-context

# Stream to stdout and write final response to file
promptgenie run code-review.prompt.yaml --tee response.md

# NDJSON event stream (pipeline-friendly)
promptgenie run code-review.prompt.yaml --format ndjson \
  | jq 'select(.event=="done")'

# Assemble context from git diff + all Python files
promptgenie context build --git-diff --glob "src/**/*.py" --max-tokens 8000

# Inspect how spec variables would resolve
promptgenie vars inspect code-review.prompt.yaml --var env=prod
```
Expected output of `promptgenie generate`: a structured, linted, scored prompt ready to paste into your AI tool.
## Demo
**Generate a structured, scored prompt:**
```text
$ promptgenie generate "review this repo for security issues" --target claude-code
╭─ Generated Prompt target: claude-code template: agentic-task mode: standard ─╮
│ # Prompt for Claude Code │
│ │
│ ## Objective │
│ review this repo for security issues │
│ │
│ ## Scope │
│ Work only within the explicitly listed files or directories. │
│ Do not modify files outside this scope without asking first. │
│ │
│ ## Stop Conditions │
│ Stop and ask for approval if: │
│ - Any file outside the defined scope needs to be modified │
│ - A new dependency would be added │
│ - A database schema change is required │
│ - Tests fail and the fix is non-obvious │
│ - The task would require a deployment │
│ │
│ ## Output Format │
│ Show diffs for each changed file. │
│ Run tests and report results. │
│ Summarise what changed and why. │
│ │
│ ## Acceptance Criteria │
│ Done when all objectives are met, output matches the requested format, │
│ and no forbidden actions were taken. │
╰──────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────── Prompt Quality Score ──────────────────────────────────╮
│ │
│ Target Fit 83 Task Clarity 90 │
│ Context Sufficiency 75 Output Contract 90 │
│ Safety Controls 82 Token Efficiency 95 │
│ Testability 90 │
│ │
│ Overall 86/100 Token estimate 150 │
│ │
╰────────────────────────────────────────────────────────────────────────────────╯
```
**Scan a prompt for static prompt-risk patterns:**
```text
$ promptgenie scan examples/auth-refactor.md
╭──────────── Security Scan Risk: HIGH examples/auth-refactor.md ────────────╮
│ [HIGH] [PERM_006] Unrestricted package installation. │
│ → Restrict agent permissions to minimum required scope. │
│ Add explicit approval gates. │
╰──────────────────────────────────────────────────────────────────────────────╯
```
**Lint a prompt for quality issues:**
```text
$ promptgenie lint examples/auth-refactor.md
╭────────────── Lint Results 80/100 examples/auth-refactor.md ───────────────╮
│ [HIGH] [AGENT_004] Allows unrestricted package installation. │
│ → Add explicit constraints and approval gates. │
╰──────────────────────────────────────────────────────────────────────────────╯
```
**Validate all built-in profiles against the schema:**
```text
$ promptgenie validate-profiles
Validating 5 profile(s) in promptgenie/profiles
✓ [profile] chatgpt.yaml
✓ [profile] claude-code.yaml
✓ [profile] claude.yaml
✓ [profile] cursor.yaml
✓ [profile] gemini.yaml
All 5 file(s) valid.
```
**Validate everything at once (profiles, templates, context packs):**
```text
$ promptgenie validate --all
✓ [profile] chatgpt.yaml
✓ [profile] claude-code.yaml
✓ [profile] claude.yaml
✓ [profile] cursor.yaml
✓ [profile] gemini.yaml
✓ [template] cyber_templates.yaml (7 templates)
✓ [context-pack] cyber-security-team.yaml
✓ [context-pack] django-rest-api.yaml
✓ [context-pack] react-supabase-app.yaml
All 15 file(s) valid.
```
Errors fail CI (exit 1). Warnings are advisory (exit 0). Use `--no-warnings` to suppress them.
## Architecture: Workspace schema (v1.7.0)
`.promptgenie.yaml` now has a published JSON Schema at `promptgenie/schemas/workspace.schema.json`. Wire it to VS Code for inline autocomplete and error highlighting:
```jsonc
// .vscode/settings.json
{
  "yaml.schemas": {
    "./promptgenie/schemas/workspace.schema.json": ".promptgenie.yaml"
  }
}
```
Or use the `yaml-language-server` comment that `config init` writes automatically:
```yaml
# yaml-language-server: $schema=https://promptgenie.dev/schemas/workspace.schema.json
$schema: "https://promptgenie.dev/schemas/workspace.schema.json"

workspace:
  name: "my-project"
  team: "platform-eng"
  policy: ".promptgenie-policy.yaml"

defaults:
  provider: anthropic
  model: claude-opus-4-5
  target: claude-code

security:
  airgap: false
  block_secrets: true
```
Validate any config with `promptgenie config validate` — catches unknown keys, type mismatches, invalid enum values, bad `expires` dates, and missing required rule fields:
```bash
# Validate and exit 0/1 (CI-safe)
promptgenie config validate

# Machine-readable output
promptgenie config validate --format json | jq '.errors[]'

# Scaffold a new config with schema pointer pre-wired
promptgenie config init --name "my-project"
```
## Architecture: Event model (v1.6.0)
Every observable lifecycle moment in PromptGenie is an `Event` — a frozen, NDJSON-serialisable value object with a typed `EventKind`.
```text
EventKind domains
─────────────────────────────────────────────────────
run.*     start · token · warning · error · tool_call · done · dry
lint.*    finding
scan.*    finding
policy.*  pass · violation
diff.*    computed
eval.*    result
ci.*      check
audit.*   write
```
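To make the "frozen, NDJSON-serialisable value object" idea concrete, here is a minimal sketch of what such an event type could look like. It is not PromptGenie's implementation: the enum members shown, the `payload` field, and the `kind` key in the JSON line are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class EventKind(str, Enum):
    # Hypothetical subset of the domains above, named "domain.action"
    RUN_START = "run.start"
    RUN_TOKEN = "run.token"
    RUN_DONE = "run.done"
    LINT_FINDING = "lint.finding"
    POLICY_VIOLATION = "policy.violation"


@dataclass(frozen=True)
class Event:
    kind: EventKind
    payload: dict[str, Any] = field(default_factory=dict)

    def to_ndjson(self) -> str:
        # One compact JSON object per line; json.dumps never emits raw newlines
        return json.dumps({"kind": self.kind.value, **self.payload}, sort_keys=True)


if __name__ == "__main__":
    print(Event(EventKind.RUN_TOKEN, {"text": "Hello"}).to_ndjson())
```

Freezing the dataclass means a listener cannot mutate an event that other listeners will also receive.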
Commands emit events; formatters and subscribers consume them:
```text
EventBus ──subscribe(kind, fn)──► Listener callbacks
         ──subscribe_all(fn)────► Catch-all (audit, telemetry)
         ──emit(event)──────────► dispatches to all matching listeners
         ──emit_to(event, fmt)──► dispatch + format + write in one call
         ──collected / of_kind──► test assertions without stdout mocking
```
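The dispatch pattern in the diagram can be sketched in a few lines of Python. This is a hypothetical stand-in, not `promptgenie.core.event_bus`: it keys subscriptions by plain strings instead of `EventKind`, and it logs and swallows subscriber exceptions to mirror the guarantee that a subscriber error never propagates into the run pipeline.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "run.token"; plain strings keep the sketch short
    text: str = ""


Listener = Callable[[Event], None]


class EventBus:
    """Hypothetical minimal bus: per-kind listeners, catch-all listeners, and an event log."""

    def __init__(self) -> None:
        self._by_kind: defaultdict[str, list[Listener]] = defaultdict(list)
        self._all: list[Listener] = []
        self.collected: list[Event] = []

    def subscribe(self, kind: str, fn: Listener) -> None:
        self._by_kind[kind].append(fn)

    def subscribe_all(self, fn: Listener) -> None:
        self._all.append(fn)

    def emit(self, event: Event) -> None:
        self.collected.append(event)  # kept for test assertions via of_kind()
        for fn in [*self._by_kind.get(event.kind, []), *self._all]:
            try:
                fn(event)
            except Exception:
                # A faulty subscriber is logged but never breaks the emitting pipeline
                logger.exception("subscriber failed on %s", event.kind)

    def of_kind(self, kind: str) -> list[Event]:
        return [e for e in self.collected if e.kind == kind]

    def __len__(self) -> int:
        return len(self.collected)
```

Keeping the collected log on the bus is what lets a test assert on `of_kind(...)` instead of capturing stdout.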
Built-in formatters implement the `EventFormatter` protocol (`format(event) → str | None`):
| Formatter | Emits | Suppresses |
|---|---|---|
| `NDJSONFormatter` | all events as JSON lines | — |
| `TokenOnlyFormatter` | `run.token` text only | everything else |
| `RichFormatter` | human-readable Rich markup | `run.token` |
| `SilentFormatter` | nothing | everything |
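A formatter only has to map an event to a string or `None`. The sketch below shows that protocol with three toy formatters mirroring the table; the class bodies, the `Event` shape, and the `render` helper are illustrative assumptions rather than the shipped code.

```python
import json
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Event:
    kind: str
    text: str = ""


class EventFormatter(Protocol):
    def format(self, event: Event) -> Optional[str]: ...


class NDJSONFormatter:
    def format(self, event: Event) -> Optional[str]:
        return json.dumps({"kind": event.kind, "text": event.text})


class TokenOnlyFormatter:
    def format(self, event: Event) -> Optional[str]:
        # Pass token text through; suppress every other kind
        return event.text if event.kind == "run.token" else None


class SilentFormatter:
    def format(self, event: Event) -> Optional[str]:
        return None


def render(events: list[Event], fmt: EventFormatter) -> list[str]:
    # None means "suppress this event", so it is dropped from the output
    return [out for e in events if (out := fmt.format(e)) is not None]
```

Because `EventFormatter` is a structural `Protocol`, any object with a matching `format` method qualifies without inheriting from a base class.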
`run_spec()` accepts `event_bus=` alongside the legacy `on_token=` / `on_event=` callbacks — fully backward-compatible. A subscriber exception never propagates into the run pipeline.
## Install
```bash
git clone https://github.com/mylesagnew/promptgenie.git
cd promptgenie
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
```
Optional extras:
| Extra | What it adds | Install |
|---|---|---|
| `benchmark` | `anthropic` SDK — required for `promptgenie benchmark` | `pip install "promptgenie[benchmark]"` |
| `tokenizer` | `tiktoken` — accurate token counts (falls back to `len/4` without it) | `pip install "promptgenie[tokenizer]"` |
| `providers` | `httpx` + `anthropic` SDK — required to run prompts against providers | `pip install "promptgenie[providers]"` |
| _(no extra)_ | `openai` SDK — required for `promptgenie scan --llm` (not packaged as an extra) | `pip install openai` |
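The `tokenizer` extra changes only how counts are computed. A minimal sketch of the fallback, assuming the `cl100k_base` encoding and rounding `len/4` up (PromptGenie's actual encoding and rounding are not specified here):

```python
import math


def estimate_tokens(text: str, use_tiktoken: bool = True) -> int:
    """Count tokens with tiktoken when available, else approximate as len/4 (assumed rounding up)."""
    if use_tiktoken:
        try:
            import tiktoken

            enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
            return len(enc.encode(text))
        except Exception:
            # Extra not installed, or the encoding file could not be loaded
            pass
    return math.ceil(len(text) / 4)
```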
### Docker
```bash
# Build
docker build -t promptgenie .

# Run any command
docker run --rm promptgenie generate "review this repo for security issues" --target claude-code

# Benchmark (requires API key)
docker run --rm -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v "$PWD":/prompts promptgenie benchmark /prompts/my-prompt.md --yes
```
The image runs as a non-root user (`promptgenie`, uid 1001). Mount a local directory with `-v` to read and write prompt files.
## Commands
| Command | Description |
|---|---|
| `generate` | Build an optimised prompt from a rough task description; resolves `{{variable}}` placeholders |
| `lint` | Check a prompt file for quality and structural issues |
| `scan` | Scan files, directories, and zip archives for security risks; opt-in LLM semantic analysis |
| `policy` | CI policy gate — fail the build if findings breach configurable thresholds; outputs text, JSON, or SARIF |
| `diff` | Compare two prompt versions — token, score, section, and risk delta; `--side-by-side`, `--format json\|yaml\|markdown` |
| `adapt` | Translate a prompt from one target profile to another |
| `test` | Run a declarative prompt test suite (exits 5 on assertion failures) |
| `benchmark` | Run a prompt against a Claude model and score the output |
| `workflow` | Generate a staged prompt chain from a `.workflow.yaml` file |
| `doctor` | Self-check — Python version, config, provider keys, extras, Ollama, shell completion |
| `completion install` | Install tab-completion for zsh, bash, or fish |
| `completion show` | Print the completion script to stdout |
| `completion status` | Show per-shell installation state and cache freshness |
| `completion refresh-cache` | Rebuild the dynamic completion cache |
| **Phase 2** | |
| `spec init` | Scaffold a new PromptSpec YAML file |
| `spec validate` | Validate a PromptSpec against the JSON Schema |
| `spec render` | Resolve variables and preview the assembled prompt |
| `spec schema` | Print the PromptSpec JSON Schema |
| `run` | Execute a PromptSpec end-to-end (vars → context → gate → send → stream) |
| `context build` | Assemble context from files, globs, git diff, stdin, URLs |
| `provider list` | List all configured AI providers |
| `provider add` | Add or update a provider (Ollama, OpenAI, vLLM, LM Studio, …) |
| `provider doctor` | Test provider reachability and configuration |
| `provider show` | Show capabilities and config for a provider |
| `vars list` | List `{{variable}}` placeholders declared in a spec |
| `vars inspect` | Show resolved value + source for each variable |
| `compress` / `optimize` | Shrink a prompt's token footprint with native content-routed compression; `--max-tokens` budget, `--aggressive`, `--format json` |
| `pack list` | List available context packs |
| `pack show` | Preview a context pack's rendered content |
| `pack inject` | Inject a context pack into an existing prompt file |
| `pack init` | Create a new blank context pack |
| `pack search` | Search the registry index for available rule/context packs |
| `pack install` | Download and install a pack from the registry |
| `pack update` | Fetch the remote registry and install/update all packs |
| `pack dirs` | Show registry and user rules directories |
| `ci init` | Scaffold GitHub Actions and pre-commit hooks into a project |
| `ci status` | Check which CI integrations are active |
| `list-targets` | Show all available model profiles |
| `list-templates` | Show all available prompt templates |
| `validate` | Validate YAML config files — profiles, templates, context packs, workflows, prompt tests |
| `validate-profiles` | Validate all profile YAML files against the profile schema |
| `interactive` | Launch the guided menu — generate, lint, scan, diff, test, and more |
| **Phase 3 — SecDevOps** | |
| `analyze` | Aggregate lint + scan with unified OWASP-aligned finding model; SARIF/JSON/Rich |
| `redact` | Replace secrets and PII with `[REDACTED:LABEL]` placeholders |
| `redteam` | 13 offline OWASP LLM Top 10 attack packs; heuristic susceptibility judge |
| `auth login` | Store provider credentials in keyring or env |
| `auth logout` | Remove stored credentials |
| `auth status` | Show credential resolution for all providers |
| `audit list` | View tamper-evident audit log (SQLite, SHA-256 chain) |
| `audit export` | Export audit log to JSON/CSV/NDJSON |
| `audit verify` | Verify the audit chain has not been tampered with |
| `config show` | Show current effective config (rich / JSON / YAML) |
| `config set` | Set a config key (e.g. `security.airgap true`) |
| `config get` | Print the current value of a config key |
| `config validate` | Validate `.promptgenie.yaml` against the workspace schema; exits 0/1/2; `--format json` for CI |
| `config init` | Scaffold a new `.promptgenie.yaml` with JSON Schema pointer and editor autocomplete comment |
| **Phase 4 — Evaluation** | |
| `evaluate` | Multi-model matrix evaluation with latency, cost, safety, and rubric metrics |
| `eval init` | Scaffold a new eval suite YAML file |
| `eval run` | Run an eval suite against a prompt or spec |
| `eval compare` | Compare current run to a baseline; exit 8 on regression |
| `eval approve` | Approve current snapshots as the new baseline |
| **Phase 5 — TUI and Ecosystem** | |
| `tui` | Full-screen Textual TUI (requires `pip install "promptgenie[tui]"`) |
| `wizard` | Guided 8-step prompt-building Q&A |
| `palette` | Fuzzy command palette across commands, templates, and history |
| `history list` | Browse run history with filtering |
| `history show` | Inspect a single run's events and response |
| `history diff` | Diff two historical responses |
| `history replay` | Re-run a historical spec (supports `--dry-run`) |
| `watch` | File watcher — re-runs lint/scan/policy on change |
| `template list` | List templates (project → user → built-in resolution) |
| `template render` | Render a template with variables |
| `lock` | Create a lockfile with SHA-256 hashes of all spec dependencies |
| `plugin list` | List installed plugins |
| `plugin scaffold` | Scaffold a new plugin stub |
### `generate`
Generate an optimised prompt from a rough task description.
```bash
promptgenie generate "refactor the auth module to use JWT" \
  --target claude-code \
  --mode exhaustive

promptgenie generate "threat model the payment API" \
  --target claude \
  --template threat-model \
  --context "Django REST API, Stripe integration, PostgreSQL" \
  --out payment-threat-model.md
```
**Options:**
| Flag | Description |
|---|---|
| `--target`, `-t` | Target AI tool. Auto-inferred if omitted. |
| `--template`, `-T` | Template ID (e.g. `threat-model`, `agentic-task`). Auto-inferred if omitted. |
| `--context`, `-c` | Project or task context. |
| `--constraints`, `-x` | Constraints or forbidden actions. |
| `--output-format`, `-f` | Desired output format for the generated prompt. |
| `--mode`, `-m` | `minimal` / `standard` / `exhaustive` |
| `--out`, `-o` | Save prompt to file. |
| `--pack`, `-p` | Context pack ID to inject (e.g. `react-supabase-app`). |
| `--no-lint` | Skip inline lint pass. |
| `--no-scan` | Skip inline security scan. |
**Modes:**
| Mode | Use for |
|---|---|
| `minimal` | Reasoning models, simple tasks, low token budget |
| `standard` | Default — balanced structure and detail |
| `exhaustive` | Agentic tools, complex tasks, security-critical workflows |
### `lint`
Check a prompt file for quality and structural issues.
```bash
# Default rich terminal output
promptgenie lint my-prompt.md

# Machine-readable JSON (CI scripts, dashboards)
promptgenie lint my-prompt.md --format json

# SARIF for GitHub code scanning
promptgenie lint my-prompt.md --format sarif --out lint-results.sarif

# Read from stdin — pipe-friendly
cat my-prompt.md | promptgenie lint - --format json
echo "Do the thing." | promptgenie lint - --format json | jq '.issues[]'
```
**What it checks:**
- Vague verbs (`help`, `fix`, `improve`, `make better`)
- Multiple tasks chained in one prompt
- Missing target AI tool
- Overly broad scope (`fix the whole app`, `update all files`)
- Missing stop conditions (agentic prompts)
- Missing scope definition
- Missing forbidden actions
- Missing output format
- Missing success criteria
- Dangerous agentic instructions (`do whatever it takes`, `deploy to production`, `drop the table`)
Exits `1` if any HIGH severity issues are found — safe to use in CI.
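Most of the checks above are pattern and structure tests. A toy version, assuming three required headings, hypothetical severities, and simple regexes (PromptGenie's real rules, rule IDs, and scoring are richer):

```python
import re
from dataclasses import dataclass

# Hypothetical rules drawn from the checklist above
VAGUE_VERBS = re.compile(r"\b(help|fix|improve|make better)\b", re.IGNORECASE)
DANGEROUS = re.compile(r"do whatever it takes|deploy to production|drop the table", re.IGNORECASE)
REQUIRED_SECTIONS = ("## Scope", "## Stop Conditions", "## Output Format")


@dataclass
class Issue:
    severity: str
    message: str


def lint(prompt: str) -> list[Issue]:
    issues: list[Issue] = []
    for m in VAGUE_VERBS.finditer(prompt):
        issues.append(Issue("LOW", f"Vague verb: {m.group(0)!r}"))
    if DANGEROUS.search(prompt):
        issues.append(Issue("HIGH", "Dangerous agentic instruction"))
    for heading in REQUIRED_SECTIONS:
        if heading.lower() not in prompt.lower():
            issues.append(Issue("MEDIUM", f"Missing section: {heading}"))
    return issues


def exit_code(issues: list[Issue]) -> int:
    # Mirrors the documented CI contract: exit 1 if any HIGH issue is present
    return 1 if any(i.severity == "HIGH" for i in issues) else 0
```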
**Options:**
| Flag | Description |
|---|---|
| `--format` | Output format: `rich` (default) / `json` / `sarif` |
| `--out`, `-o` | Write output to file instead of stdout |
### `scan`
Scan one or more prompt files, directories, or zip archives for security risks, with an optional LLM semantic analysis layer.
```bash
# Single file — original behaviour preserved
promptgenie scan my-prompt.md

# Read from stdin
cat my-prompt.md | promptgenie scan -
cat my-prompt.md | promptgenie scan - --format json | jq '.findings[]'

# Entire directory (recursive)
promptgenie scan ./prompts/

# Zip archive — all contained prompt files scanned, zip-slip protected
promptgenie scan prompts-bundle.zip

# Mix of files, directories, and zips
promptgenie scan prompt1.md ./more-prompts/ archive.zip

# Machine-readable JSON (aggregate output for multi-file scans)
promptgenie scan ./prompts/ --format json

# SARIF for GitHub code scanning upload (all files in one run)
promptgenie scan ./prompts/ --format sarif --out scan-results.sarif

# Opt-in LLM semantic analysis (requires OPENAI_API_KEY)
promptgenie scan my-prompt.md --llm

# Air-gap / privacy mode — suppress all LLM network calls
promptgenie scan my-prompt.md --llm --no-external-llm

# CI gate — fail on any finding at or above MEDIUM
promptgenie scan ./prompts/ --fail-on-severity MEDIUM

# Show which files were skipped (size cap, wrong suffix, quota)
promptgenie scan ./prompts/ --show-skipped
```
**What it flags (heuristic patterns):**
| Category | Pattern examples flagged |
|---|---|
| Secrets | API keys, tokens, AWS credentials, private keys embedded in prompt |
| Prompt injection | Instruction overrides, system prompt extraction, output suppression |
| Agent permissions | Unrestricted filesystem access, arbitrary code execution, unsupervised publishing |
| RAG risks | Instructions that follow retrieved content, untrusted input pipelines |
| Chained risks | Web fetch + action (email/deploy/write) without approval gate |
Exits `1` on CRITICAL or HIGH findings — safe to use in CI or pre-commit hooks.
The scanner reports the **class** of secret found, never the secret value itself.
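Reporting the class rather than the value means a finding can be logged or uploaded without re-leaking the secret. A minimal sketch of that idea, using a hypothetical subset of the rule codes documented below and simplified regexes:

```python
import re

# Hypothetical, simplified patterns for three documented secret classes
SECRET_RULES = {
    "SEC_SECRET_AWS_KEY": ("AWS access key ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    "SEC_SECRET_GITHUB": ("GitHub token", re.compile(r"\b(ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,})\b")),
    "SEC_SECRET_PRIVKEY": ("PEM private key", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
}


def scan_secrets(text: str) -> list[dict]:
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for code, (label, pattern) in SECRET_RULES.items():
            if pattern.search(line):
                # Record rule, class, and location only; the matched value is discarded
                findings.append({"code": code, "class": label, "line": lineno})
    return findings
```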
**Options:**
| Flag | Default | Description |
|---|---|---|
| `--format` | `rich` | Output format: `rich` / `json` / `sarif` |
| `--out`, `-o` | — | Write output to file instead of stdout |
| `--llm` | off | Enable opt-in LLM semantic analysis (requires `OPENAI_API_KEY`) |
| `--no-external-llm` | off | Suppress all LLM network calls (privacy / air-gap mode) |
| `--max-files N` | 500 | Cap total files collected across all paths |
| `--max-bytes N` | 10485760 | Cap total uncompressed bytes (default 10 MB) |
| `--max-file-bytes N` | 1048576 | Skip individual files over this size (default 1 MB) |
| `--fail-on-severity` | — | Exit 1 when any finding meets or exceeds this level (`LOW`/`MEDIUM`/`HIGH`/`CRITICAL`) |
| `--show-skipped` | off | Print files excluded due to size cap, wrong suffix, or quota |
| `--config PATH` | — | Path to `.promptgenie.yaml` |
| `--no-config` | — | Ignore any `.promptgenie.yaml` |
| `--best-effort` | off | Fall back to built-in defaults on missing config |
**Multi-file resource limits:**
Files are collected before scanning. Limits are applied per-collection run:
- Files with unsupported suffixes are skipped (`wrong_suffix`)
- Files over `--max-file-bytes` are skipped (`too_large`)
- Once total collected bytes reach `--max-bytes` or file count reaches `--max-files`, remaining files are skipped (`quota_exceeded`)
- Use `--show-skipped` to see which files were excluded and why
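The collection rules above amount to a single pass with three skip reasons. A sketch under stated assumptions: the suffix allow-list is invented, and a file that would push the byte total past `--max-bytes` is skipped rather than admitted (the real scanner's exact boundary behaviour may differ).

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

# Assumed allow-list; the suffixes PromptGenie actually accepts may differ
SUPPORTED_SUFFIXES = {".md", ".txt", ".prompt", ".yaml", ".yml"}


@dataclass
class Collection:
    files: list[Path] = field(default_factory=list)
    skipped: list[tuple[Path, str]] = field(default_factory=list)


def collect(root: Path, max_files: int = 500, max_bytes: int = 10_485_760,
            max_file_bytes: int = 1_048_576) -> Collection:
    result = Collection()
    total = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        size = path.stat().st_size
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            result.skipped.append((path, "wrong_suffix"))
        elif size > max_file_bytes:
            result.skipped.append((path, "too_large"))
        elif len(result.files) >= max_files or total + size > max_bytes:
            result.skipped.append((path, "quota_exceeded"))
        else:
            result.files.append(path)
            total += size
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "task.md").write_text("Review src/auth only.")
        (root / "logo.png").write_bytes(b"\x89PNG")
        report = collect(root)
        print([p.name for p in report.files], report.skipped)
```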
**Zip archive safety:**
Each zip member path is validated before extraction — absolute paths, `..` traversal sequences, resolved paths escaping the extraction root, and Unix symlinks all raise a hard error and skip the archive. The member count is capped at 1 000.
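The member checks can be expressed with the standard-library `zipfile` module. This sketch illustrates the technique and is not PromptGenie's code; `validate_zip`, `UnsafeArchive`, and the `dest` parameter are hypothetical names. Symlinks are detected from the Unix mode bits stored in the upper 16 bits of `ZipInfo.external_attr`.

```python
import io
import stat
import zipfile
from pathlib import Path, PurePosixPath

MAX_MEMBERS = 1000  # member cap quoted above


class UnsafeArchive(Exception):
    """Raised when any member makes the whole archive unsafe to extract."""


def validate_zip(zf: zipfile.ZipFile, dest: Path) -> list[str]:
    """Return member names if every entry is safe; raise UnsafeArchive otherwise."""
    infos = zf.infolist()
    if len(infos) > MAX_MEMBERS:
        raise UnsafeArchive(f"too many members: {len(infos)}")
    root = dest.resolve()
    names = []
    for info in infos:
        name = info.filename
        path = PurePosixPath(name)
        # Absolute POSIX paths, backslash-rooted paths, and Windows drive letters
        if path.is_absolute() or name.startswith("\\") or (len(name) > 1 and name[1] == ":"):
            raise UnsafeArchive(f"absolute path: {name}")
        if ".." in path.parts:
            raise UnsafeArchive(f"traversal sequence: {name}")
        # The resolved target must still sit under the extraction root
        target = (root / name).resolve()
        if target != root and root not in target.parents:
            raise UnsafeArchive(f"escapes extraction root: {name}")
        if stat.S_ISLNK(info.external_attr >> 16):
            raise UnsafeArchive(f"symlink member: {name}")
        names.append(name)
    return names


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("prompts/task.md", "Review src/auth only.")
    with zipfile.ZipFile(buf) as zf:
        print(validate_zip(zf, Path("extract")))
```

Raising on the first bad member (rather than skipping it) matches the behaviour described above, where one unsafe entry causes the whole archive to be skipped.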
**LLM semantic analysis (`--llm`):**
Off by default — explicit opt-in required. When enabled:
- Content is pre-scanned for secrets and redacted before any text leaves the host
- Content is capped at 8 000 characters per file before the API call
- API key is read from `OPENAI_API_KEY` (or a custom env var via config)
- Pass `--no-external-llm` to block all network calls even if `--llm` is set (air-gap / CI privacy mode)
- LLM findings are included in JSON output under `files[].llm`; they do not affect the heuristic risk level
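Ordering matters here: redacting before truncating ensures that a secret straddling the 8 000-character cut is still matched and replaced, rather than leaking its prefix. A sketch, assuming a hypothetical subset of patterns and the `[REDACTED:LABEL]` placeholder style used by `redact`:

```python
import re

MAX_LLM_CHARS = 8_000

# Hypothetical subset of secret patterns, each with a redaction label
REDACTION_PATTERNS = [
    ("OPENAI", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")),
    ("AWS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GITHUB", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
]


def prepare_for_llm(text: str) -> str:
    """Redact secrets first, then cap the length, before any text leaves the host."""
    for label, pattern in REDACTION_PATTERNS:
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text[:MAX_LLM_CHARS]
```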
**Scan JSON output** includes `category` (rule category: `secret`, `injection`, `permission`, `rag`, `obfuscation`) and `source` (`builtin` / `registry` / `custom`) on every finding.
**Secret rule IDs** — each secret pattern has a unique code for precise suppression:
| Code | Pattern |
|---|---|
| `SEC_SECRET_APIKEY` | Generic `sk-` / `api_key` patterns |
| `SEC_SECRET_TOKEN` | Generic bearer tokens |
| `SEC_SECRET_OPENAI` | OpenAI `sk-` keys |
| `SEC_SECRET_GOOGLE` | Google API keys (`AIza…`) |
| `SEC_SECRET_SLACK` | Slack tokens (`xox[bpoas]-…`) |
| `SEC_SECRET_PRIVKEY` | PEM private key headers |
| `SEC_SECRET_GITHUB` | GitHub PATs (`ghp_…`, `github_pat_…`) |
| `SEC_SECRET_AWS_KEY` | AWS access key IDs (`AKIA…`) |
| `SEC_SECRET_AWS_SECRET` | AWS secret access key patterns |
Use `SEC_SECRET` as an alias in `enabled_rules` / `disabled_rules` config to target all secret rules at once.
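One plausible way to resolve the alias when building the active rule set, assuming `disabled_rules` takes precedence over `enabled_rules` (the helper names and the precedence rule are assumptions):

```python
from typing import Iterable, Optional

SECRET_RULE_IDS = (
    "SEC_SECRET_APIKEY", "SEC_SECRET_TOKEN", "SEC_SECRET_OPENAI",
    "SEC_SECRET_GOOGLE", "SEC_SECRET_SLACK", "SEC_SECRET_PRIVKEY",
    "SEC_SECRET_GITHUB", "SEC_SECRET_AWS_KEY", "SEC_SECRET_AWS_SECRET",
)


def expand_rule_ids(ids: Iterable[str]) -> set[str]:
    """Expand the SEC_SECRET alias into every secret rule code; other IDs pass through."""
    expanded: set[str] = set()
    for rule_id in ids:
        if rule_id == "SEC_SECRET":
            expanded.update(SECRET_RULE_IDS)
        else:
            expanded.add(rule_id)
    return expanded


def active_rules(all_rules: Iterable[str], enabled: Optional[Iterable[str]] = None,
                 disabled: Iterable[str] = ()) -> set[str]:
    # enabled=None means "every known rule"; disabled always wins (assumed precedence)
    rules = set(all_rules) if enabled is None else expand_rule_ids(enabled) & set(all_rules)
    return rules - expand_rule_ids(disabled)
```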
### `policy`
CI policy gate — run lint and scan together and exit non-zero if findings exceed configurable thresholds. Designed to be dropped into any GitHub Actions step or pre-push hook.
```bash
# Fail if any HIGH-or-above security finding exists (default)
promptgenie policy my-prompt.md

# Fail if any CRITICAL finding exists, or if the lint score drops below 70
promptgenie policy my-prompt.md --max-risk CRITICAL --min-score 70

# Allow up to 2 findings at MEDIUM or above before failing
promptgenie policy my-prompt.md --max-risk MEDIUM --max-findings 2

# Machine-readable JSON output for CI dashboards
promptgenie policy my-prompt.md --format json
```
**Exit codes:**
| Code | Meaning |
|---|---|
| `0` | All thresholds passed — prompt is clean |
| `1` | One or more thresholds exceeded — findings printed |
| `2` | Usage / configuration error (bad file path, invalid config) |
**Options:**
| Flag | Default | Description |
|---|---|---|
| `--max-risk` | `HIGH` | Fail if any security finding is at or above this level (`CRITICAL` / `HIGH` / `MEDIUM` / `LOW`) |
| `--max-findings` | `0` | Fail if total qualifying findings exceed this count; `0` = any qualifying finding fails |
| `--min-score` | `0` | Fail if lint quality score is below this value; `0` = lint score not checked |
| `--format` | `text` | Output format: `text` (Rich table), `json` (machine-readable), or `sarif` (SARIF v2.1.0) |
| `--config PATH` | — | Path to `.promptgenie.yaml` |
| `--no-config` | — | Ignore any `.promptgenie.yaml` |
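The thresholds in the table combine into one pass/fail decision. A sketch of that decision, assuming findings arrive as plain severity strings and the lint score is out of 100 (the function and class names are hypothetical):

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass
class PolicyResult:
    passed: bool
    reasons: list[str]

    @property
    def exit_code(self) -> int:
        return 0 if self.passed else 1


def evaluate_policy(severities: list[str], lint_score: int, max_risk: str = "HIGH",
                    max_findings: int = 0, min_score: int = 0) -> PolicyResult:
    threshold = SEVERITY_ORDER[max_risk]
    # A finding "qualifies" when it is at or above --max-risk
    qualifying = [s for s in severities if SEVERITY_ORDER[s] >= threshold]
    reasons = []
    if len(qualifying) > max_findings:
        reasons.append(f"{len(qualifying)} finding(s) at or above {max_risk} (allowed: {max_findings})")
    # min_score == 0 disables the lint-score check
    if min_score and lint_score < min_score:
        reasons.append(f"lint score {lint_score} below minimum {min_score}")
    return PolicyResult(passed=not reasons, reasons=reasons)
```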
**Expired allowlist warnings** — when the loaded config contains allowlist entries that have expired (or have a malformed `expires` date), the policy command surfaces them:
- `--format json`: `allowlist_warnings` array in the output document
- `--format text`: `⚠ Allowlist: …` line per expired entry in the Rich output
This keeps stale suppressions visible in CI rather than silently inactive.
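Expiry checking reduces to parsing each `expires` value as an ISO date and comparing it with today. A sketch, assuming allowlist entries are dicts with `rule` and `expires` keys and that an entry stays active through its expiry date:

```python
from datetime import date


def allowlist_warnings(entries: list[dict], today: date) -> list[str]:
    warnings = []
    for entry in entries:
        raw = entry.get("expires")
        if raw is None:
            continue  # no expiry set: permanent suppression
        try:
            # str() also accepts date objects produced by a YAML loader
            expires = date.fromisoformat(str(raw))
        except ValueError:
            warnings.append(f"{entry.get('rule', '?')}: malformed expires date {raw!r}")
            continue
        if expires < today:
            warnings.append(f"{entry.get('rule', '?')}: expired on {expires.isoformat()}")
    return warnings
```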
**Example GitHub Actions steps:**
```yaml
# Text output with Rich table — good for human-readable CI logs
- name: PromptGenie policy gate
  run: promptgenie policy my-prompt.md --max-risk HIGH --min-score 75

# SARIF output — upload to GitHub Code Scanning
- name: PromptGenie policy (SARIF)
  run: promptgenie policy my-prompt.md --format sarif --max-risk MEDIUM > policy.sarif

- name: Upload SARIF results
  # Run even when the policy step exits 1, so findings still reach Code Scanning
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: policy.sarif
```
### `diff`
Compare two prompt versions side-by-side — tokens, quality scores, section changes, lint changes, and security finding changes.
```bash
promptgenie diff v1.md v2.md --target claude-code
promptgenie diff v1.md v2.md --target claude-code --unified

# Diff stdin against a saved version
cat new-draft.md | promptgenie diff - v1.md
```
**What it shows:**
| Panel | Content |
|---|---|
| **Summary** | Tokens, quality score, lint count, security findings — A vs B with delta |
| **Quality Score Breakdown** | All 7 dimensions side-by-side with per-dimension delta |
| **Section Changes** | Each `## Section` marked ADDED / REMOVED / CHANGED / UNCHANGED with inline line diffs |
| **Lint Changes** | Issues resolved in v2 vs new issues introduced |
| **Security Changes** | Findings resolved in v2 vs new findings introduced |
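Section-level change detection can be approximated by splitting each version on `## ` headings and comparing bodies by name. A simplified sketch (inline line diffs omitted; the helper names are hypothetical):

```python
import re

SECTION_RE = re.compile(r"^## (.+)$", re.MULTILINE)


def split_sections(prompt: str) -> dict[str, str]:
    """Map each '## Heading' to its body text (content before the first heading is ignored)."""
    matches = list(SECTION_RE.finditer(prompt))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(prompt)
        sections[m.group(1).strip()] = prompt[m.end():end].strip()
    return sections


def section_changes(a: str, b: str) -> dict[str, str]:
    sa, sb = split_sections(a), split_sections(b)
    changes = {}
    # Visit sections in A's order, then sections that only exist in B
    for name in [*sa, *(n for n in sb if n not in sa)]:
        if name not in sb:
            changes[name] = "REMOVED"
        elif name not in sa:
            changes[name] = "ADDED"
        elif sa[name] == sb[name]:
            changes[name] = "UNCHANGED"
        else:
            changes[name] = "CHANGED"
    return changes
```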
**Options:**
| Flag | Description |
|---|---|
| `--target`, `-t` | Profile to use for quality scoring (default: `claude`) |
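| `--unified`, `-u` | Show full colour-coded unified diff |
| `--side-by-side` | Show versions A and B in adjacent columns |
| `--format` | Output format: `json`, `yaml`, or `markdown` |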
### `adapt`
Translate a prompt written for one target into another — rewriting model-specific language, preserving agentic safety sections by default, and adding sections required by the destination profile.
```bash
# Claude Code → Cursor (same agentic category — all safety sections kept)
promptgenie adapt my-prompt.md --from claude-code --to cursor

# Claude Code → ChatGPT (safety sections preserved by default)
promptgenie adapt my-prompt.md --from claude-code --to chatgpt --out chatgpt-prompt.md

# Explicitly strip safety sections when adapting to a non-agentic target
promptgenie adapt my-prompt.md --from claude-code --to chatgpt --strip-agentic-safety

# Show original alongside adapted version
promptgenie adapt my-prompt.md --from claude-code --to gemini --show-original

# Adapt from stdin
cat my-prompt.md | promptgenie adapt - --from claude-code --to cursor
```
**What it does:**
| Scenario | Behaviour |
|---|---|
| Agentic → Agentic (e.g. `claude-code` → `cursor`) | Keeps all sections, rewrites model name |
| Agentic → General, default (e.g. `claude-code` → `chatgpt`) | **Preserves** scope / stop conditions / constraints; notes in change log |
| Agentic → General with `--strip-agentic-safety` | Drops scope / stop conditions / constraints, warns you, trims tokens |
| Missing required sections | Generates default content from the destination profile |
| Forbidden patterns in content | Replaces with `[REMOVED — forbidden by target profile]` |
Outputs a colour-coded change log (KEPT / REWRITTEN / ADDED / DROPPED per section) and a score and token summary with delta.
**Options:**
| Flag | Description |
|---|---|
| `--from` | Source target profile |
| `--to` | Destination target profile |
| `--out`, `-o` | Save adapted prompt to file |
| `--show-original` | Print original alongside adapted version |
| `--strip-agentic-safety` | Remove agentic safety sections when adapting to a non-agentic target (off by default) |
### `compress` / `optimize`
Shrink a prompt's (or assembled context's) token footprint *before* it reaches the model — same content, fewer tokens. A native, dependency-free engine inspired by [headroom](https://github.com/headroomlabs-ai/headroom): content-routed structural techniques, no Rust toolchain or heavy ML deps. `optimize` is an alias for `compress`.
```bash
# Compress to stdout (lossless default tier)
promptgenie compress prompt.md

# Write the smaller version to a file
promptgenie compress prompt.md --out smaller.md

# Hit a token budget — enables every technique, exits 1 if it can't fit
promptgenie compress prompt.md --max-tokens 4000

# Add the aggressive (mildly lossy) tier and show what changed
promptgenie compress prompt.md --aggressive --diff

# Machine-readable savings report
promptgenie compress prompt.md --format json | jq '.tokens_saved'

# Pipe-friendly
cat context.md | promptgenie compress -
```
**Techniques** (fence-aware — fenced ```code``` blocks are never altered):
| Technique | Tier | What it does |
|---|---|---|
| `trim-trailing-ws` | default | Strip trailing whitespace at line ends |
| `collapse-blank-lines` | default | Collapse 2+ consecutive blank lines into one |
| `json-compact` | default | Minify whole-document JSON and ```json fenced blocks |
| `strip-html-comments` | aggressive | Remove HTML comments (`<!-- ... -->`) from prose |
| `collapse-spaces` | aggressive | Collapse runs of inline spaces in prose (keeps indentation) |
| `dedupe-log-lines` | aggressive | Fold 3+ identical consecutive lines into `line (×N)` |
The **default** tier is lossless / near-lossless for Markdown prompts. The **aggressive** tier (via `--aggressive`, or automatically when `--max-tokens` is set) trades a little fidelity for higher savings — ideal for build logs, search dumps, and verbose tool output. Run `promptgenie compress --list-techniques` for the live catalogue.
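Below is a minimal sketch of what the two line-level default techniques could look like in a fence-aware pass. It is an illustration under assumptions, not PromptGenie's engine. The name `compress_default` is hypothetical, and the fence toggle does not tell backtick fences apart from tilde fences.

```python
import re

# Hypothetical sketch of trim-trailing-ws + collapse-blank-lines.
# Lines inside backtick or tilde fences are passed through untouched.
FENCE = re.compile(r"^\s*(```|~~~)")


def compress_default(text: str) -> str:
    out = []
    in_fence = False
    blank_run = 0
    for line in text.split("\n"):
        if FENCE.match(line):
            in_fence = not in_fence  # opening or closing fence
            out.append(line)
            blank_run = 0
            continue
        if in_fence:
            out.append(line)  # never alter fenced code
            continue
        line = line.rstrip()  # trim-trailing-ws
        if line == "":
            blank_run += 1
            if blank_run > 1:  # collapse-blank-lines: keep only one blank line
                continue
        else:
            blank_run = 0
        out.append(line)
    return "\n".join(out)
```

Both techniques only remove whitespace outside fences, so the rendered Markdown stays the same. This matches the "lossless / near-lossless" description of the default tier.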
**Options:**
| Flag | Description |
|---|---|
| `--out`, `-o` | Write compressed output to a file instead of stdout |
| `--max-tokens N` | Target token budget; enables all techniques; exits 1 if the result still exceeds N |
| `--techniques T,T` | Run an explicit subset of techniques (overrides the tiers) |
| `--aggressive` | Add the aggressive tier on top of the defaults |
| `--list-techniques` | Print the technique catalogue and exit |
| `--diff` / `--dry-run` | Report per-technique savings to stderr (`--dry-run` skips writing/emitting output) |
| `--format` | Output format: `text` (default) / `json` / `yaml` |
Exits `0` on success, `1` when a `--max-tokens` budget cannot be met, `2` on a bad technique name or unreadable file.
### `test`
Run a declarative prompt test suite defined in a `.prompt-test.yaml` file. Assert content, structure, quality scores, token budgets, lint severity, and security risk — all without sending the prompt to a model.
```bash
promptgenie test my-suite.prompt-test.yaml
promptgenie test my-suite.prompt-test.yaml --verbose
```
**Test file format:**
```yaml
prompt: path/to/my-prompt.md   # relative to the test file
target: claude-code
description: "Auth refactor prompt — safety and quality assertions"

tests:
  - name: has explicit stop conditions
    must_include:
      - "Stop and ask"
      - "approval"

  - name: scope is restricted
    must_include:
      - "src/auth"
    must_not_include:
      - "entire codebase"
      - "fix everything"

  - name: no unsafe agentic patterns
    must_not_include:
      - "do whatever it takes"
      - "deploy to production"

  - name: required sections present
    required_sections:
      - Objective
      - Scope
      - Stop Conditions
      - Acceptance Criteria

  - name: quality score threshold
    min_score: 80

  - name: token budget
    max_tokens: 500

  - name: no high lint issues
    max_lint_severity: MEDIUM

  - name: no high security findings
    max_security_risk: MEDIUM

  - name: no production deployment pattern
    regex_not_match:
      - "deploy to (prod|production|live)"
```
**All assertion types:**
| Assertion | What it checks |
|---|---|
| `must_include` | Phrase is present in the prompt (case-insensitive) |
| `must_not_include` | Phrase is absent from the prompt |
| `required_sections` | `## Section` heading exists |
| `regex_match` | Regex matches anywhere in the prompt |
| `regex_not_match` | Regex does not match |
| `min_score` | Quality score ≥ threshold |
| `max_tokens` | Token count ≤ budget |
| `max_lint_severity` | No lint issue worse than HIGH / MEDIUM / LOW |
| `max_security_risk` | No security finding worse than CRITICAL / HIGH / MEDIUM / LOW |
Exits `0` on full pass, `1` on any failure — safe to run in CI or as a pre-commit hook.
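To show what the content and severity assertions check, here is a hypothetical evaluator for one test case. It is a sketch, not PromptGenie's test runner. The function name, the failure messages, and passing lint severities in as a plain list are all assumptions. Regex matching is case-sensitive here, which is also an assumption.

```python
import re

SEVERITY = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def run_assertions(prompt, test, lint_severities=()):
    """Hypothetical evaluator for one test case; returns failure messages."""
    failures = []
    text = prompt.lower()  # must_include / must_not_include are case-insensitive
    for phrase in test.get("must_include", []):
        if phrase.lower() not in text:
            failures.append(f"missing phrase: {phrase!r}")
    for phrase in test.get("must_not_include", []):
        if phrase.lower() in text:
            failures.append(f"forbidden phrase present: {phrase!r}")
    for section in test.get("required_sections", []):
        # a "## Section" heading on its own line
        if not re.search(rf"^##\s+{re.escape(section)}\s*$", prompt, re.MULTILINE):
            failures.append(f"missing section: {section}")
    for pattern in test.get("regex_not_match", []):
        if re.search(pattern, prompt):  # case-sensitive (assumption)
            failures.append(f"regex matched: {pattern}")
    ceiling = test.get("max_lint_severity")
    if ceiling and any(SEVERITY[s] > SEVERITY[ceiling] for s in lint_severities):
        failures.append(f"lint issue worse than {ceiling}")
    return failures
```

An empty list is a pass. With one non-empty list across the suite, the CLI would exit `1`, as described above.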
**Options:**
| Flag | Description |
|---|---|
| `--verbose`, `-v` | Show all assertions including passing ones |
See [`examples/auth-refactor.prompt-test.yaml`](examples/auth-refactor.prompt-test.yaml) for a full working example.
### `benchmark`
Run a prompt against a Claude model, score the response across 6 rubric dimensions using a judge model, and report token usage, latency, and estimated cost. Compare two prompts head-to-head across multiple runs.
```bash
# Requires ANTHROPIC_API_KEY
export ANTHROPIC_API_KEY=sk-ant-...

# Single run
promptgenie benchmark my-prompt.md

# Specific model, print full response
promptgenie benchmark my-prompt.md --model claude-opus-4-8 --show-response

# Average scores across 3 runs
promptgenie benchmark my-prompt.md --runs 3

# Compare two prompt versions head-to-head
promptgenie benchmark v1.md --compare v2.md --runs 3

# Save model response to file
promptgenie benchmark my-prompt.md --out response.md
```
**Rubric dimensions:**
| Dimension | What it measures |
|---|---|
| Relevance | Did the response address the prompt objective? |
| Completeness | Were all tasks, sections, and requirements covered? |
| Format Compliance | Did the output match the requested format? |
| Safety Compliance | Did the response respect constraints and stop conditions? |
| Conciseness | Was the output free of padding and unnecessary repetition? |
| Actionability | Is the output specific, concrete, and immediately usable? |
**What it outputs:**
| Panel | Content |
|---|---|
| **Benchmark** | Score per dimension + overall, model, latency, token usage (with cache breakdown), estimated cost |
| **Judge Reasoning** | One-sentence explanation per dimension from the judge model |
| **Prompt Comparison** | Side-by-side A vs B scores with delta column, when `--compare` is used |
The response is scored by a separate judge call (claude-haiku — fast and cheap) so benchmark results are comparable across models and prompt versions. Prompt caching is applied to the judge system prompt, reducing cost on repeated runs.
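The `--runs` averaging and the delta column in **Prompt Comparison** can be sketched like this. It is a hypothetical illustration, not the benchmarker's code. The dimension names come from the rubric table. The numeric score scale and computing "Overall" as the plain mean of the six dimensions are assumptions.

```python
from statistics import mean

# Dimension names from the rubric table; "Overall" as their mean is an assumption.
DIMENSIONS = ["Relevance", "Completeness", "Format Compliance",
              "Safety Compliance", "Conciseness", "Actionability"]


def average_runs(runs):
    """runs: one {dimension: score} dict per benchmark run."""
    avg = {d: mean(run[d] for run in runs) for d in DIMENSIONS}
    avg["Overall"] = mean(avg[d] for d in DIMENSIONS)
    return avg


def compare(a, b):
    """Delta column: prompt B minus prompt A, per dimension."""
    return {k: round(b[k] - a[k], 2) for k in a}
```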
**Options:**
| Flag | Description |
|---|---|
| `--model`, `-m` | Claude model to benchmark (default: `claude-sonnet-4-6`) |
| `--runs`, `-n` | Number of runs — scores are averaged (default: 1) |
| `--compare`, `-c` | Second prompt file to benchmark and compare |
| `--api-key` | Anthropic API key (or set `ANTHROPIC_API_KEY`) |
| `--show-response` | Print full model response to terminal |
| `--out`, `-o` | Save model response to file |
| `--yes`, `-y` | Skip external-send confirmation prompt (for CI/non-interactive use) |
**Provider abstraction:**
`benchmark` is backed by a `ModelProvider` protocol. The default is `AnthropicProvider`. To plug in a different backend, implement the three-method interface and pass it to `run_benchmark()` in Python:
```python
from promptgenie.core.benchmarker import run_benchmark, ModelProvider


# MyProvider matches the ModelProvider protocol structurally; no subclassing needed.
class MyProvider:
    def complete(self, model, prompt, system=None):
        # Call your backend here and return (response_text, usage).
        response_text = "..."  # placeholder response
        usage = {"input": 0, "output": 0, "cache_read": 0, "cache_write": 0}
        return response_text, usage

    def judge_model(self):
        return "my-judge-model"

    def estimate_cost(self, model, input_tokens, output_tokens, cache_read, cache_write):
        return 0.0


results = run_benchmark("my-prompt.md", model="my-model", provider=MyProvider())
```
### `workflow`
Break a complex task into a staged prompt chain: one focused prompt per step, with handoffs, approval gates, per-step scope locks, and stop conditions. Agentic tools generally perform better with a sequence of focused prompts than with a single large prompt.
```bash
# Show all steps + full prompts
promptgenie workflow my-feature.workflow.yaml

# Summary and step index only
promptgenie workflow my-feature.workflow.yaml --summary

# Show a single step
promptgenie workflow my-feature.workflow.yaml --step 3

# Save all steps as individual .md files
promptgenie workflow my-feature.workflow.yaml --out ./prompts/
```
**`.workflow.yaml` format:**
```yaml
name: secure-login-feature
description: "Build a secure JWT login system end-to-end"
target: claude-code
context_pack: react-supabase-app   # optional — injected into step 1
mode: standard

steps:
  - id: inspect
    name: Inspect existing auth
    objective: "Map the current authentication architecture and identify gaps"
    scope:
      - src/auth/
      - src/middleware/
    output: "Architecture summary with file map and identified gaps"

  - id: plan
    name: Propose implementation plan
    depends_on: inspect
    objective: "Propose a JWT implementation plan based on the inspection"
    output: "Numbered plan with file list and risk notes"
    requires_approval: true          # model stops here for human review

  - id: implement
    name: Implement middleware
    depends_on: plan
    objective: "Implement JWT middleware only, as per the approved plan"
    scope:
      - src/middleware/auth.ts
    forbidden:
      - Do not touch files outside scope
      - Do not install packages without approval
    stop_conditions:
      - Tests fail
      - A file outside scope needs changing
    output: "Diff of changed files + test results"

  - id: security-review
    name: Security review
    depends_on: implement
    objective: "Security review of the JWT implementation"
    output: "Findings table: | Finding | Severity | Recommendation |"
    requires_approval: true
```
**Step fields:**
| Field | Description |
|---|---|
| `id` | Unique step identifier (used in `depends_on`) |
| `name` | Human-readable step name |
| `objective` | What this step must accomplish |
| `depends_on` | ID of the step that must complete first |
| `scope` | Files or directories the model may touch |
| `forbidden` | Actions explicitly prohibited in this step |
| `stop_conditions` | Conditions that require stopping and asking for approval |
| `output` | Expected output format or deliverable |
| `requires_approval` | If `true`, inserts an approval gate — model will not proceed |
| `context_note` | Optional extra notes for this step |
**What each rendered step contains:**
- Workflow header and step number
- Handoff summary from the previous step
- Objective, scope, forbidden actions, stop conditions
- Approval gate notice (if set)
- Expected output and acceptance criteria
**Options:**
| Flag | Description |
|---|---|
| `--summary` | Show step index only — no prompt content |
| `--step N` | Render a single step by number |
| `--out DIR` | Save all steps as `step_01_name.md` files in a directory |
See [`examples/secure-login.workflow.yaml`](examples/secure-login.workflow.yaml) for a full 6-step example with approval gates and a context pack.
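Two checks a workflow loader could plausibly run are sketched below. Steps are shown as plain dicts, the shape `yaml.safe_load` returns. This is a hypothetical illustration. Requiring `depends_on` to name an *earlier* step, and building `--out` file names by slugging the step `name`, are assumptions.

```python
import re


def validate_steps(steps):
    """Hypothetical check: unique ids, depends_on names an earlier step."""
    errors, seen = [], set()
    for step in steps:
        if step["id"] in seen:
            errors.append(f"duplicate id: {step['id']}")
        dep = step.get("depends_on")
        if dep is not None and dep not in seen:
            errors.append(f"{step['id']}: depends_on '{dep}' is not an earlier step")
        seen.add(step["id"])
    return errors


def step_filename(index, step):
    """step_01_name.md style file name for --out DIR (1-based index, assumed slug rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", step["name"].lower()).strip("_")
    return f"step_{index:02d}_{slug}.md"
```

Applied to the `inspect` step in the example, `step_filename(1, ...)` gives `step_01_inspect_existing_auth.md`.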
### `pack`
Context packs are reusable YAML files that capture everything a model needs to know about your project — stack, architecture, coding style, forbidden changes, known pitfalls, and terminology. Use them to stop repeating yourself across every prompt.
PromptGenie also ships a **plugin registry** — a versioned index of rule packs and context packs that can be installed with `pack update` or `pack install`. Registry packs are stored in `~/.promptgenie/registry/packs/` and loaded automatically by the scanner and linter when referenced via `rules_dirs` config.
**Search the registry:**
```bash
promptgenie pack search
promptgenie pack search owasp
```
**Install a specific pack:**
```bash
promptgenie pack install owasp-llm-top10
```
Both `pack install` and `pack update` verify the SHA-256 checksum of every downloaded file against the registry index. Install is refused if no checksum is present. Pass `--allow-unverified` only when using a private registry that does not yet publish checksums (a visible warning is shown):
```bash
# Private registry without checksums — bypass with explicit opt-in
promptgenie pack install my-private-pack --allow-unverified
promptgenie pack update --url https://my-registry.example.com/index.yaml --allow-unverified
```
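The checksum gate described above comes down to hashing each downloaded file with SHA-256 and comparing the result with the hex digest from the registry index. Here is a minimal sketch using only the standard library. `verify_pack_file` and its parameters are illustrative names, not PromptGenie's API.

```python
import hashlib


def verify_pack_file(path, expected_sha256, allow_unverified=False):
    """Hypothetical checksum gate for one downloaded registry file."""
    if not expected_sha256:
        if allow_unverified:
            return True  # explicit opt-in; the CLI also shows a visible warning
        raise ValueError(f"no checksum for {path}; refusing to install")
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 64 KiB chunks so large packs are not read into memory at once
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```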
**Update all packs from the remote registry:**
```bash
promptgenie pack update
```
**Show registry and user directories:**
```bash
promptgenie pack dirs
```
**Built-in packs (shipped with PromptGenie, no network required — 14 total):**
| Pack ID | Type | Description |
|---|---|---|
| `owasp-llm-top10` | rules | OWASP LLM Top 10 scanner rules (2025 edition) |
| `enterprise-lint` | rules | Enterprise prompt governance lint rules |
| `gpt-4o` | profile | OpenAI GPT-4o — multimodal, function-calling, structured output |
| `mistral` | profile | Mistral AI — instruction-following and multilingual tasks |
| `llama3` | profile | Meta Llama 3 — open-source / self-hosted / fine-tuning |
| `github-copilot` | profile | GitHub Copilot — IDE-embedded code generation |
| `devops-templates` | template | DevOps & SRE — runbooks, postmortems, CI/CD, on-call handoffs |
| `data-science-templates` | template | Data Science & ML — EDA, model eval, experiment design, model cards |
| `legal-compliance-templates` | template | Legal & Compliance — contracts, GDPR DPIA, regulatory gap analysis |
| `product-management-templates` | template | Product Management — PRD, user stories, OKRs, retros |
| `customer-support-templates` | template | Customer Support & Success — triage, escalation, KB articles |
| `ai-safety-context` | context | AI safety context pack for alignment-aware prompting |
| `responsible-ai-context` | context | Responsible AI — fairness, explainability, harm prevention |
| `regulated-industries-context` | context | Regulated industries — HIPAA, SOX, PCI-DSS, FCA/SEC |
**Enable registry packs via config:**
```yaml
# .promptgenie.yaml
scanner:
  rules_dirs:
    - ~/.promptgenie/registry/packs   # registry installs
    - ./local-rules                   # project-local rules
  enabled_rules:                      # whitelist — only run these codes
    - OWASP_LLM01_001
    - OWASP_LLM02_001
    - SEC_SECRET                      # alias — expands to all SEC_SECRET_* sub-rules

linter:
  rules_dirs:
    - ~/.promptgenie/registry/packs
  enabled_rules:
    - ENT_001
    - ENT_003
```
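The `SEC_SECRET` alias in `enabled_rules` can be read as prefix expansion. In this hypothetical sketch, an entry enables itself plus every known rule code that extends it with `_`. The set of known codes is assumed to come from the loaded rule packs. The function name is illustrative.

```python
def expand_enabled_rules(enabled, known_codes):
    """Hypothetical alias expansion for scanner/linter enabled_rules."""
    active = set()
    for entry in enabled:
        if entry in known_codes:
            active.add(entry)  # exact rule code
        # prefix alias, e.g. SEC_SECRET -> SEC_SECRET_AWS, SEC_SECRET_GH, ...
        active.update(c for c in known_codes if c.startswith(entry + "_"))
    return active
```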
**Expiring allowlist entries** — time-limit exceptions with automatic re-activation after the expiry date:
```yaml
scanner:
  allowlist:
    - phrase: "sk-ant-ci-placeholder"
      rules:
        - SEC_SECRET
      expires: "2026-12-31"
      reason: "CI placeholder — rotate before expiry, see ticket #456"
```
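Expiry handling can be sketched as a date filter over the parsed entries. This is a hypothetical illustration. It assumes an entry still suppresses findings on its `expires` day, and that entries without `expires` never lapse. It also relies on `expires` being a quoted ISO 8601 string, as in the example.

```python
from datetime import date


def active_allowlist(entries, today=None):
    """Hypothetical filter: keep entries whose expiry date has not passed."""
    today = today or date.today()
    return [
        e for e in entries
        if "expires" not in e or date.fromisoformat(e["expires"]) >= today
    ]
```

Entries dropped by this filter stop suppressing findings, so the underlying rule is active again.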
**List available packs:**
```bash
promptgenie pack list
```
**Preview a pack's rendered content:**
```bash
promptgenie pack show react-supabase-app
promptgenie pack show react-supabase-app --mode exhaustive
```
**Generate a prompt with a pack injected:**
```bash
promptgenie generate "refactor the auth module" \
  --target claude-code \
  --pack react-supabase-app \
  --mode exhaustive
```
The pack is rendered at the same depth as the prompt mode and injected into the Context section automatically.
**Inject a pack into an existing prompt file:**
```bash
promptgenie pack inject my-prompt.md react-supabase-app
promptgenie pack inject my-prompt.md react-supabase-app --out enriched-prompt.md
```
**Create your own pack:**
```bash
promptgenie pack init my-project --name "My App" --description "Next.js + Prisma SaaS"
# Edit the generated file at promptgenie/context-packs/my-project.yaml
```
**Pack file format:**
```yaml
name: react-supabase-app
description: "React + Supabase SaaS application"

stack:
  - React 18 + TypeScript
  - Supabase (auth, database, storage)
  - Tailwind CSS + shadcn/ui

architecture:
  - SPA with React Router v6
  - Supabase RLS for all data access

coding_style:
  - Functional components only
  - Custom hooks for all data fetching

forbidden_changes:
  - Do not modify Supabase migration files directly
  - Do not disable Row-Level Security on any table

known_pitfalls:
  - RLS policies must be updated when adding new tables
  - Edge functions have a cold start — avoid for latency-sensitive paths

terminology:
  workspace: "Top-level organisational unit"
  member: "A user who belongs to a workspace"

preferred_output_format: "TypeScript with explicit return types"
```
**Render modes:**
| Mode | Sections included |
|---|---|
| `minimal` | Stack only |
| `standard` | Stack, architecture, coding style, terminology |
| `exhaustive` | All sections including forbidden changes and known pitfalls |
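The render-mode table maps directly onto a section list per mode. Below is a hypothetical rendering sketch over a pack loaded as a dict. It is not PromptGenie's renderer. The plain-text layout is an assumption. So is treating `preferred_output_format` as one of "all sections" in exhaustive mode.

```python
# Hypothetical mode -> section mapping, following the render-modes table.
MODE_SECTIONS = {
    "minimal": ["stack"],
    "standard": ["stack", "architecture", "coding_style", "terminology"],
    "exhaustive": ["stack", "architecture", "coding_style", "terminology",
                   "forbidden_changes", "known_pitfalls", "preferred_output_format"],
}


def render_pack(pack, mode="standard"):
    lines = [f"{pack['name']}: {pack['description']}"]
    for key in MODE_SECTIONS[mode]:
        value = pack.get(key)
        if not value:
            continue  # skip sections the pack does not define
        lines.append(key.replace("_", " ").title() + ":")
        if isinstance(value, dict):  # e.g. terminology
            lines.extend(f"- {k}: {v}" for k, v in value.items())
        elif isinstance(value, list):
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"- {value}")
    return "\n".join(lines)
```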
**Included starter packs:**
| ID | Stack |
|---|---|
| `react-supabase-app` | React 18 + TypeScript + Supabase + Tailwind CSS |
| `django-rest-api` | Django 5 + DRF + PostgreSQL + Celery |
| `cyber-security-team` | Python + Splunk + Sigma + AWS + Burp Suite |
Packs are stored in `promptgenie/context-packs/*.yaml` and can be committed alongside your code.
### `ci`
Add prompt quality gates to any project in one command. Scaffolds a GitHub Actions workflow and pre-commit hooks that automatically run lint, scan, and test on prompt files.
**Set up CI in any project:**
```bash
cd my-project
promptgenie ci init
```
Creates three files if they don't already exist:
| File | Purpose |
|---|---|
| `.github/workflows/prompt-check.yml` | GitHub Actions — 3 parallel jobs: lint, scan, test |
| `.pre-commit-config.yaml` | Pre-commit hooks for staged `.prompt.md` and test files |
| `.promptignore` | Glob patterns to exclude from lint/scan checks |
PromptGenie's own repository CI (`ci.yml`, which is not one of the files `ci init` creates) runs 5 parallel jobs: `test` (Python 3.10–3.12, coverage ≥85%), `lint` (ruff, mypy), `security` (bandit, pip-audit), `vscode-extension` (npm ci, audit, compile, lint), and `build` (wheel smoke test).
**Check what's active:**
```bash
promptgenie ci status
```

```text
╭──────────────────────────────────────────────┬──────────╮
│ Integration                                  │ Status   │
├──────────────────────────────────────────────┼──────────┤
│ GitHub Actions (prompt-check.yml)            │ ✓ Active │
│ Pre-commit hooks (.pre-commit-config.yaml)   │ ✓ Active │
│ .promptignore exclusion file                 │ ✓ Active │
│ Git repository                               │ ✓ Active │
╰──────────────────────────────────────────────┴──────────╯
```
**GitHub Actions behaviour:**
The workflow triggers on any push or pull request touching `.md`, `.prompt-test.yaml`, or `.workflow.yaml` files and runs three parallel jobs:
| Job | Command | Fails on |
|---|---|---|
| `prompt-lint` | `promptgenie lint` per file | Any HIGH severity issue |
| `prompt-scan` | `promptgenie scan` per file | Any HIGH or CRITICAL finding |
| `prompt-test` | `promptgenie test` per suite | Any assertion failure |
**Pre-commit hooks:**
```bash
pip install pre-commit && pre-commit install
# Hooks run automatically on every git commit
```
Hooks check staged `.prompt.md` files with lint and scan, and staged `.prompt-test.yaml` files with test — before the commit lands.
**`.promptignore`:**
```text
# Exclude these paths from lint/scan
README.md
CHANGELOG.md
docs/**
```
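A simplified matcher for these patterns can be built on the standard-library `fnmatch` module. This is a hypothetical approximation of `.gitignore` semantics, not PromptGenie's implementation. A pattern without `/` matches a file name at any depth. A pattern with `/` matches the whole repo-relative path. Negation (`!`) and leading-slash anchoring are not handled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text):
    """Non-empty, non-comment lines of a .promptignore file."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(path, patterns):
    """Hypothetical, simplified .gitignore-style check for a repo-relative path."""
    p = PurePosixPath(path)
    for pat in patterns:
        target = str(p) if "/" in pat else p.name  # no "/" -> match at any depth
        if fnmatchcase(target, pat):
            return True
    return False
```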
**Options:**
| Flag | Description |
|---|---|
| `--dir` | Target directory (default: current directory) |
### `list-targets`
Show all available model profiles.
```bash
promptgenie list-targets
```
### `list-templates`
Show all available prompt templates.
```bash
promptgenie list-templates
```
### `validate`
Validate YAML config files against their schema. Accepts file paths and auto-detects type (profile, template, context pack, workflow, prompt-test). Use `--all` to validate all built-in files.
```bash
# Validate a single file
promptgenie validate my-profile.yaml

# Validate a workflow
promptgenie validate examples/secure-login.workflow.yaml

# Validate all built-in profiles, templates, and context packs
promptgenie validate --all
```
Errors exit 1 (blocking). Warnings exit 0 (advisory — missing recommended fields, unknown keys).
### `validate-profiles`
Validate all profile YAML files against the profile schema. Checks required fields, category values, list types, slug format, and unknown keys.
```bash
# Validate built-in profiles
promptgenie validate-profiles

# Validate profiles in a custom directory
promptgenie validate-profiles --dir ./my-profiles

# Suppress advisory warnings (errors still shown)
promptgenie validate-profiles --no-warnings
```
### `doctor`
Run a self-check to verify your PromptGenie installation, environment, and provider credentials. Each check prints a pass (✓), warning (⚠), or failure (✗) with a one-line remediation hint.
```bash
# Rich terminal output (default)
promptgenie doctor

# JSON output — machine-readable, includes schema_version: "1.0"
promptgenie doctor --format json

# Pipe to jq to check a specific group
promptgenie doctor --format json | jq '.groups[] | select(.title=="Providers")'
```
**What it checks:**
| Group | Checks |
|---|---|
| Runtime | Python ≥ 3.10, `promptgenie` package version |
| Configuration | `.promptgenie.yaml` config, policy files, `NO_COLOR`/`FORCE_COLOR` env vars |
| Optional extras | `anthropic` (benchmark), `tiktoken` (tokenizer) |
| Providers | `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, local Ollama reachability |
| Shell completion | Per-shell installation state |
Hard failures exit 1; optional warnings exit 0.
### `completion`
Install, inspect, or manage tab-completion for your shell.
```bash
# Install for your shell (writes script + updates RC file)
promptgenie completion install zsh
promptgenie completion install bash
promptgenie completion install fish

# Print the script without installing
promptgenie completion show zsh

# Check what's installed and where
promptgenie completion status

# Rebuild the dynamic completion cache (targets, templates, packs)
promptgenie completion refresh-cache
```
After installing, restart your shell or source the RC file:
```bash
source ~/.zshrc    # zsh
source ~/.bashrc   # bash
exec fish          # fish
```
The dynamic completion cache is stored at `~/.cache/promptgenie/completions.json` and includes all available `--target`, `--template`, and context pack names for instant tab-completion.
### Variable resolver
`generate` detects `{{variable}}` placeholders in generated prompts and resolves them from multiple sources.
**Placeholder syntax:**
| Syntax | Meaning |
|---|---|
| `{{name}}` | Required string variable, no default |
| `{{name:type}}` | Typed variable (`string`, `int`, `float`, `bool`, `secret`) |
| `{{name:type:default}}` | Optional variable with inline default |
**Resolution order** (highest priority first):
1. `--var key=value` CLI flag
2. `--vars file.yaml` values file
3. A `PG_`-prefixed environment variable
4. Interactive `click.prompt` (unless `--no-input`)
5. Inline default from the placeholder
6. `VarResolutionError` → exits 2
```bash
# Resolve from CLI flags
promptgenie generate "deploy {{service}} to {{env:string:staging}}" \
  --target claude-code --var service=api --var env=prod

# Resolve from a YAML file
promptgenie generate "review {{component}}" --vars vars.yaml

# Schema with types, required, allowed_values
promptgenie generate "scan {{target_env}}" \
  --vars-schema schema.yaml --no-input

# Pipe-friendly — never prompt, exit 2 if unresolved
cat prompt-template.md | \
  promptgenie generate "{{task}}" --var task="auth refactor" --no-input
```
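The placeholder syntax and resolution order above can be sketched as one regex plus a fall-through lookup. This is a hypothetical illustration, not PromptGenie's resolver. The environment-variable naming (`PG_` plus the upper-cased name) is an assumption. So is passing the inline default to the interactive prompt. Type casting and secret handling are omitted.

```python
import os
import re

# {{name}}, {{name:type}}, {{name:type:default}}
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)(?::(\w+))?(?::([^}]*))?\s*\}\}")


class VarResolutionError(Exception):
    pass


def resolve(name, default=None, cli_vars=None, file_vars=None, env=None, ask=None):
    """Return (value, source) following the documented order (sketch)."""
    cli_vars, file_vars = cli_vars or {}, file_vars or {}
    env = os.environ if env is None else env
    if name in cli_vars:                 # 1. --var key=value
        return cli_vars[name], "cli"
    if name in file_vars:                # 2. --vars file.yaml
        return file_vars[name], "file"
    env_key = "PG_" + name.upper()       # 3. PG_-prefixed env var (naming assumed)
    if env_key in env:
        return env[env_key], "env"
    if ask is not None:                  # 4. interactive prompt; None models --no-input
        return ask(name, default), "prompt"
    if default is not None:              # 5. inline default from the placeholder
        return default, "default"
    raise VarResolutionError(name)       # 6. the CLI exits 2
```

For example, `PLACEHOLDER.findall("deploy {{service}} to {{env:string:staging}}")` yields `[("service", "", ""), ("env", "string", "staging")]`.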
**Schema YAML** (`--vars-schema schema.yaml`):
```yaml
variables:
  env:
    type: string
    required: true
    allowed_values: [prod, staging, dev]
    description: "Target deployment environment"
  token:
    type: secret
    required: true
  count:
    type: int
    default: 5
    required: false
```
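Below is a hypothetical sketch of checking supplied values against a parsed `--vars-schema` file. It covers `type`, `required`, `default` and `allowed_values`. It is not PromptGenie's validator. The casting rules, especially for `bool`, and the error wording are assumptions.

```python
# Assumed casts per schema type; secrets are treated as plain strings here.
CASTS = {
    "string": str,
    "secret": str,
    "int": int,
    "float": float,
    "bool": lambda v: v if isinstance(v, bool) else str(v).lower() in ("1", "true", "yes"),
}


def validate_vars(values, schema):
    """Return (resolved, errors) for values checked against schema['variables']."""
    resolved, errors = {}, []
    for name, spec in schema.get("variables", {}).items():
        if name not in values:
            if "default" in spec:
                resolved[name] = spec["default"]
            elif spec.get("required", False):
                errors.append(f"{name}: required but not provided")
            continue
        try:
            value = CASTS[spec.get("type", "string")](values[name])
        except (TypeError, ValueError):
            errors.append(f"{name}: not a valid {spec.get('type')}")
            continue
        allowed = spec.get("allowed_values")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}: {value!r} not in {allowed}")
            continue
        resolved[name] = value
    return resolved, errors
```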
## Phase 2 — PromptSpec and Run Engine
### `spec`
Manage declarative PromptSpec YAML files — the portable unit of a prompt execution.
```bash
# Scaffold a new spec
promptgenie spec init code-review --target claude-code
promptgenie spec init deploy-check --target ollama --out specs/deploy.yaml

# Validate structure
promptgenie spec validate my-prompt.yaml
promptgenie spec validate my-prompt.yaml --format json | jq '.errors'

# Preview the assembled prompt without calling any provider
promptgenie spec render my-prompt.yaml --var env=prod
promptgenie spec render my-prompt.yaml --format json | jq .prompt

# Print the JSON Schema (pipe to tools, import into editors)
promptgenie spec schema
promptgenie spec schema --format yaml
```
**PromptSpec fields:**
```yaml
version: 1                  # must be 1
name: code-review           # human-readable name
target: claude-code         # target profile
template: agentic-task      # optional named template
mode: chat                  # chat | completion | agentic
prompt: |                   # inline prompt (or use template:)
  Review {{component}} in {{env}}.
vars:                       # inline variable defaults
  env: staging
context:                    # context sources (assembled before sending)
  - type: git_diff
  - type: glob
    pattern: "src/**/*.py"
    max_bytes: 32768
policy:                     # policy gate
  - no-secrets
provider: anthropic         # optional provider override
model: claude-opus-4-5      # optional model override
system_prompt: |            # injected system prompt
  You are a senior code reviewer.
output_contract:
  format: markdown          # text | json | yaml | markdown | code
  max_tokens: 2048
run:
  stream: true
  timeout: 120
  require_clean: true       # abort if git tree is dirty
  no_history: false
```
### `run`
Execute a PromptSpec end-to-end: resolve vars → build context → security gate → render → send to provider → stream response → persist run.
```bash
# Basic run
promptgenie run my-prompt.yaml

# Dry run — resolve vars and build context without calling provider
promptgenie run my-prompt.yaml --dry-run --show-context

# Override provider and model
promptgenie run my-prompt.yaml --provider ollama --model llama3 --stream

# Pass variables
promptgenie run my-prompt.yaml --var env=prod --var component=auth
promptgenie run my-prompt.yaml --vars prod.yaml

# Write response to file while streaming to stdout
promptgenie run my-prompt.yaml --tee response.md

# Machine-readable NDJSON event stream
promptgenie run my-prompt.yaml --format ndjson
promptgenie run my-prompt.yaml --format ndjson | jq 'select(.event=="done")'

# Abort if working tree is dirty
promptgenie run my-prompt.yaml --require-clean

# Never prompt for variables — fail if any are unresolved
promptgenie run my-prompt.yaml --no-input --var env=prod
```
**Flags:**
| Flag | Description |
|---|---|
| `--dry-run` | Resolve vars + build context; no provider call |
| `--stream / --no-stream` | Streaming or non-streaming |
| `--require-clean` | Abort if git working tree is dirty |
| `--provider NAME` | Override configured provider |
| `--model NAME` | Override model (e.g. `gpt-4o`, `llama3`) |
| `--timeout N` | Abort provider call after N seconds |
| `--no-history` | Skip run persistence |
| `--var KEY=VAL` | Inline variable (repeatable) |
| `--vars FILE` | YAML/JSON variable file |
| `--max-context-tokens N` | Context token budget |
| `--context-strategy` | `manual` \| `newest` \| `smallest` \| `git-relevant` |
| `--trust` | Trust this spec's context sources without prompting (records the spec as trusted) |
| `--allow-url` | Permit URL-type context sources (HTTPS-only; SSRF-protected with IP pinning) |
| `--allow-insecure-url` | Also permit plain `http://` URL sources (emits a security warning; default blocked) |
| `--allow-sensitive-env` | Permit credential-like env vars in `env` context sources (emits a warning) |
| `--allow-secrets` | Downgrade secrets gate from hard-block to warning (use only in controlled CI environments) |
| `--tee FILE` | Write response to file while streaming |
| `--format text\|ndjson` | NDJSON emits `start/token/warning/error/done` events |
| `--show-context` | Print context manifest before sending |
Run history is persisted to `~/.local/share/promptgenie/runs/` (files `0600`, with secrets redacted).
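To consume `--format ndjson` from Python, parse one JSON object per line and dispatch on its `event` field, the same field the `jq` example filters on. This is a minimal sketch. The helper name is hypothetical, and it reads no event fields other than `event`, because no other event fields are documented here.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count NDJSON events by type and return the final 'done' event, if any."""
    counts, done = Counter(), None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        evt = json.loads(line)
        counts[evt["event"]] += 1  # start / token / warning / error / done
        if evt["event"] == "done":
            done = evt
    return counts, done
```

For a live run, you could pass `subprocess.run(["promptgenie", "run", "my-prompt.yaml", "--format", "ndjson"], capture_output=True, text=True).stdout.splitlines()` to `summarize_events`.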
### `context build`
Assemble context from multiple sources into a single text block (for inspection or piping into other tools).
```bash
# Assemble all Python files under src/
promptgenie context build --glob "src/**/*.py" --max-tokens 8000

# Include git diff + staged changes
promptgenie context build --git-diff --git-staged

# Write to file
promptgenie context build --file README.md --out context.md

# JSON output with source manifest
promptgenie context build --git-diff --format json | jq '.manifest'

# Pipe stdin
git diff | promptgenie context build --stdin

# Manifest only (no text body)
promptgenie context build --glob "**/*.py" --manifest-only
```
**Source types:**
| Type | Flag | Description |
|---|---|---|
| `file` | `--file PATH` | Single file |
| `glob` | `--glob PATTERN` | File glob (e.g. `src/**/*.py`) |
| `stdin` | `--stdin` | Read from stdin |
| `env` | *(via spec only)* | Environment variable value |
| `cmd` | `--cmd "COMMAND"` | Shell command stdout |
| `git_diff` | `--git-diff` | `git diff` output |
| `git_staged` | `--git-staged` | `git diff --staged` output |
| `url` | `--url URL` | HTTP GET (requires `--allow-url`) |
Add `.promptignore` to your repo to exclude files from glob/file sources (same syntax as `.gitignore`).
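Fitting sources into a token budget (`--max-tokens` here, `--max-context-tokens` on `run`) can be illustrated with a greedy smallest-first pass. This is roughly what the `smallest` value of `--context-strategy` suggests. It is a hypothetical sketch. The real strategy's behaviour and tokenizer may differ, and the 4-characters-per-token estimate is an assumption.

```python
def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def assemble_smallest_first(sources, max_tokens):
    """sources: list of (label, text). Greedily include smallest sources within budget."""
    included, skipped, used = [], [], 0
    for label, text in sorted(sources, key=lambda s: len(s[1])):
        cost = estimate_tokens(text)
        if used + cost <= max_tokens:
            included.append(label)
            used += cost
        else:
            skipped.append(label)  # would exceed the budget
    return included, skipped, used
```

Skipped sources are the ones a manifest would report as left out of the assembled context.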
### `provider`
Manage AI provider configurations stored at `~/.config/promptgenie/providers.yaml`.
```bash
# List all configured providers
promptgenie provider list
promptgenie provider list --format json

# Add Ollama (local — no API key)
promptgenie provider add ollama \
  --base-url http://localhost:11434/v1 \
  --model llama3 --local

# Add LM Studio
promptgenie provider add lm-studio \
  --base-url http://localhost:1234/v1 \
  --model local-model --local

# Add a custom OpenAI-compatible endpoint (vLLM, LocalAI, etc.)
promptgenie provider add my-vllm \
  --type openai_compat \
  --base-url http://gpu-server:8000/v1 \
  --model mistral-7b --local

# Show provider details + capabilities
promptgenie provider show anthropic
promptgenie provider show ollama --format json

# Test reachability
promptgenie provider doctor ollama
promptgenie provider doctor anthropic
promptgenie provider doctor my-openai --format json

# Remove a provider
promptgenie provider remove old-provider --yes
```
Built-in defaults (active before `providers.yaml` exists):
| Name | Type | Endpoint / credentials |
|---|---|---|
| `anthropic` | Anthropic Messages API | `ANTHROPIC_API_KEY` |
| `openai` | OpenAI-compatible | `OPENAI_API_KEY` + `api.openai.com` |
| `ollama` | OpenAI-compatible (local) | `http://localhost:11434/v1` |
| `hermes` | OpenAI-compatible (NousResearch) | `NOUS_API_KEY` + `inference-api.nousresearch.com/v1` |
Install optional extras for full provider support:
```bash
pip install "promptgenie[providers]"   # httpx + anthropic SDK
```
### Hermes (NousResearch)
PromptGenie ships first-class support for the **NousResearch Hermes** model family — both a target profile (for authoring/linting prompts) and a built-in provider (for executing them). No `provider add` step is needed; just supply an API key.
**1. Get an API key.** Create one in the [Nous Portal](https://portal.nousresearch.com) and export it:
```bash
export NOUS_API_KEY=sk-...
```
The built-in `hermes` provider is OpenAI-compatible and points at the Nous Portal (`https://inference-api.nousresearch.com/v1`), with `Hermes-4-405B` as the default model.
**2. Verify connectivity:**
```bash
promptgenie provider doctor hermes
promptgenie provider show hermes
```
**3. Author Hermes-tuned prompts** with the `hermes` target profile — it encodes ChatML / strong-system-role guidance, reliable JSON-mode and tool-calling, a 128k context window, and the external-guardrail security controls Hermes needs (it is highly steerable and lightly moderated):
```bash
# Generate (target auto-inferred from "hermes"/"nous", or pass --target)
promptgenie generate "extract action items from this transcript" --target hermes

# Adapt an existing prompt written for another model
promptgenie adapt prompts/review.md --from claude --to hermes

# Lint / score against the Hermes profile
promptgenie lint prompts/review.md
```
**4. Execute, benchmark, and evaluate** against Hermes:
```bash
# Run a PromptSpec end-to-end through Hermes
promptgenie run spec.yaml --provider hermes --stream

# Pick a specific Hermes variant
promptgenie run spec.yaml --provider hermes --model Hermes-4-70B

# Multi-model evaluation including Hermes (cost is estimated)
promptgenie evaluate prompts/review.md --models hermes,claude,gpt-4o
```
**Custom endpoint or model.** If you serve Hermes elsewhere (OpenRouter, Together, a self-hosted vLLM, etc.), override the defaults — your `providers.yaml` entry wins over the built-in default:
```bash
promptgenie provider add hermes \
  --type openai_compat \
  --base-url https://openrouter.ai/api/v1 \
  --model nousresearch/hermes-4-405b \
  --api-key-env OPENROUTER_API_KEY
```
### `vars`
Inspect variable resolution for a PromptSpec.
```bash
# List all {{variable}} placeholders in a spec
promptgenie vars list my-prompt.yaml

# Inspect how each variable would resolve (shows source: cli/file/env/default)
promptgenie vars inspect my-prompt.yaml
promptgenie vars inspect my-prompt.yaml --var env=prod --redacted
promptgenie vars inspect my-prompt.yaml --vars prod.yaml --format json
```
## Target Profiles
| ID | Name | Category |
|---|---|---|
| `claude` | Claude | General assistant |
| `claude-code` | Claude Code | Agentic coding |
| `chatgpt` | ChatGPT | General assistant |
| `cursor` | Cursor | IDE coding |
| `gemini` | Gemini | General assistant / multimodal |
Each profile defines required sections, forbidden patterns, stop conditions, security controls, and a default output format. Stored in `promptgenie/profiles/*.yaml`.
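A profile's required sections and forbidden patterns translate naturally into a simple check. The sketch below assumes sections are Markdown headings and forbidden patterns are case-insensitive regular expressions; the `TargetProfile` field names are hypothetical (the real YAML schema is documented in CONTRIBUTING.md).

```python
import re
from dataclasses import dataclass, field


@dataclass
class TargetProfile:
    # Hypothetical field names for the five things a profile defines.
    id: str
    required_sections: list[str]
    forbidden_patterns: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)
    security_controls: list[str] = field(default_factory=list)
    default_output_format: str = "markdown"


def check_prompt(profile: TargetProfile, prompt: str) -> list[str]:
    """Return human-readable problems; an empty list means the prompt fits the profile."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", prompt, re.MULTILINE)
    }
    problems = [
        f"missing section: {section}"
        for section in profile.required_sections
        if section.lower() not in headings
    ]
    for pattern in profile.forbidden_patterns:
        if re.search(pattern, prompt, re.IGNORECASE):
            problems.append(f"forbidden pattern: {pattern}")
    return problems
```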
## Templates
| ID | Name | Category |
|---|---|---|
| `agentic-task` | Agentic Task Brief | Coding |
| `threat-model` | Threat Model | Security |
| `secure-code-review` | Secure Code Review | Security |
| `soc-triage` | SOC Alert Triage | Security operations |
| `pentest` | Penetration Test Plan | Security |
| `iac-review` | IaC Security Review | Security |
| `prompt-injection-test` | Prompt Injection Test Suite | Security |
Stored in `promptgenie/templates/*.yaml`.
## Quality Score
Every generated prompt is scored across 7 dimensions:
| Dimension | What it measures |
|---|---|
| Target Fit | Required sections present for the target tool |
| Task Clarity | Absence of vague verbs and ambiguous framing |
| Context Sufficiency | Enough context for the model to act without guessing |
| Output Contract | Output format explicitly defined |
| Safety Controls | Stop conditions, forbidden actions, constraints present |
| Token Efficiency | Prompt length relative to complexity |
| Testability | Acceptance criteria or success definition present |
A score of 80 or higher is considered production-ready. A score below 60 automatically triggers lint warnings.
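The thresholds can be expressed as a small banding function. This sketch assumes the overall score is an unweighted mean of the seven dimensions on a 0 to 100 scale; PromptGenie's real weighting is not documented here, and the dimension keys and the name of the 60 to 79 band are hypothetical.

```python
# Dimension keys are hypothetical snake_case spellings of the table rows above.
DIMENSIONS = (
    "target_fit",
    "task_clarity",
    "context_sufficiency",
    "output_contract",
    "safety_controls",
    "token_efficiency",
    "testability",
)


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the seven dimensions (an assumption; real weights may differ)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def band(score: float) -> str:
    """Map a 0-100 score onto the thresholds quoted in the README."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "production-ready"
    if score < 60:
        return "lint-warning"
    return "review"  # 60-79: the README does not name this band
```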
## Project structure
```text
promptgenie/
├── cli.py                          # Click group + command registration
├── commands/
│   ├── generate.py                 # generate command
│   ├── lint.py                     # lint command
│   ├── scan.py                     # scan command
│   ├── diff.py                     # diff command
│   ├── adapt.py                    # adapt command
│   ├── test.py                     # test command
│   ├── benchmark.py                # benchmark command
│   ├── workflow.py                 # workflow command
│   ├── ci.py                       # ci group (init, status)
│   ├── pack.py                     # pack group (list, show, inject, init, search, install, update, dirs)
│   ├── policy.py                   # policy command — CI gate (exit 0/1/2)
│   ├── targets.py                  # list-targets, list-templates
│   └── interactive.py              # guided interactive menu mode
├── renderers/
│   └── rich.py                     # console, color constants, shared formatting helpers
├── core/
│   ├── generator.py                # Prompt builder, scoring, token estimation
│   ├── linter.py                   # Lint rules engine
│   ├── scanner.py                  # Security scanner
│   ├── differ.py                   # Diff engine — token, score, section, risk delta
│   ├── adapter.py                  # Adapt engine — cross-profile prompt translation
│   ├── tester.py                   # Test runner — declarative prompt unit tests
│   ├── benchmarker.py              # Benchmark engine — model calls, rubric scoring, cost
│   ├── context_packs.py            # Context pack engine — load, render, inject, init
│   ├── workflow.py                 # Workflow engine — staged prompt chains
│   ├── ci.py                       # CI scaffolder — GitHub Actions + pre-commit
│   ├── config.py                   # .promptgenie.yaml config loader
│   ├── registry.py                 # Pack registry — remote index, install, update, rule loading
│   ├── input_handler.py            # Multi-file collector — files, dirs, zips; zip-slip protection; byte/file caps
│   ├── llm_analyzer.py             # Opt-in LLM semantic analysis; pre-send secret redaction; privacy mode
│   └── formatters.py               # Structured output — JSON and SARIF v2.1.0; multi-file aggregation
├── registry/
│   ├── index.yaml                  # Built-in registry index (14 packs)
│   └── packs/
│       ├── owasp-llm-top10.yaml                  # OWASP LLM Top 10 scanner rules
│       ├── enterprise-lint.yaml                  # Enterprise governance lint rules
│       ├── gpt-4o.yaml                           # OpenAI GPT-4o profile
│       ├── mistral.yaml                          # Mistral AI profile
│       ├── llama3.yaml                           # Meta Llama 3 profile
│       ├── github-copilot.yaml                   # GitHub Copilot profile
│       ├── devops-templates.yaml                 # DevOps & SRE templates
│       ├── data-science-templates.yaml           # Data Science & ML templates
│       ├── legal-compliance-templates.yaml       # Legal & Compliance templates
│       ├── product-management-templates.yaml     # Product Management templates
│       ├── customer-support-templates.yaml       # Customer Support templates
│       ├── ai-safety-context.yaml                # AI safety context pack
│       ├── responsible-ai-context.yaml           # Responsible AI context pack
│       └── regulated-industries-context.yaml     # Regulated industries context
├── profiles/
│   ├── claude.yaml
│   ├── claude-code.yaml
│   ├── chatgpt.yaml
│   ├── cursor.yaml
│   └── gemini.yaml
├── templates/
│   └── cyber_templates.yaml        # 7 security and coding templates
├── context-packs/
│   ├── react-supabase-app.yaml     # React + Supabase SaaS
│   ├── django-rest-api.yaml        # Django + DRF + PostgreSQL
│   └── cyber-security-team.yaml    # Security engineering team
├── examples/
│   ├── auth-refactor.md                    # Example prompt
│   ├── auth-refactor.prompt-test.yaml      # Example test suite
│   └── secure-login.workflow.yaml          # Example 6-step workflow
├── .github/
│   ├── CODEOWNERS                  # Code ownership (all files → @mylesagnew)
│   └── workflows/
│       ├── ci.yml                  # Pytest (3.10–3.12), ruff, mypy, bandit, pip-audit, build
│       ├── prompt-check.yml        # Lint, scan, and test prompt files on every PR
│       └── release.yml             # Tag-triggered PyPI publish + SBOM + GitHub Release
├── .pre-commit-config.yaml         # Pre-commit hooks
├── .promptgenie.yaml.example       # Example project config (rules, suppressions, overrides)
├── SECURITY.md                     # Vulnerability reporting and scanner limitations
├── CONTRIBUTING.md                 # Contributor guide, rule authoring, profile/template schema
├── CHANGELOG.md                    # Version history
├── ROADMAP.md                      # Product roadmap — 5 phases, top-10 features, architecture principles
├── vscode-extension/               # VS Code / Cursor extension
│   ├── package.json                # Extension manifest (commands, settings, activation events)
│   ├── package-lock.json           # Locked npm dependencies (required for npm ci in CI)
│   ├── tsconfig.json
│   ├── src/
│   │   ├── extension.ts            # Activate / deactivate, event wiring
│   │   ├── runner.ts               # CLI subprocess wrapper (lint/scan → JSON)
│   │   ├── diagnostics.ts          # LintOutput / ScanOutput → VS Code Diagnostics
│   │   ├── statusBar.ts            # Score + issue count in the status bar
│   │   └── types.ts                # TypeScript interfaces for CLI JSON output
│   └── README.md                   # Extension-specific docs
└── pyproject.toml                  # Modern packaging, coverage gate, dev dependency groups
```
## Roadmap
**Strategic position:** PromptGenie is the secure, terminal-native prompt engineering workbench for developers and DevOps teams — not just a prompt generator.
Prompt lifecycle: **Author → Render → Lint → Scan → Test → Run → Evaluate → Diff → Gate → Audit**
### Shipped (v1.0.x)
- [x] `generate`, `lint`, `scan`, `diff`, `adapt`, `test`, `benchmark`, `workflow`, `interactive`, `policy`, `validate`, `pack`, `ci` — full command surface
- [x] Multi-file / directory / zip scanning with zip-slip protection; opt-in LLM semantic analysis (`--llm`) with pre-send secret redaction
- [x] Context packs, workflow mode, plugin registry (14 packs), OWASP LLM Top 10 rules, enterprise lint rules
- [x] GitHub Actions CI (`ci.yml`): pytest 3.10–3.12, coverage ≥85%, ruff, mypy, bandit, pip-audit, VS Code extension CI, build + wheel smoke test
- [x] SARIF output on lint, scan, and policy for GitHub Code Scanning upload
- [x] Policy-as-code: `policy` command with `--max-risk`, `--min-score`, `--format sarif`, expired allowlist reporting
- [x] Registry hardening: SHA-256 checksums required, HTTPS-only, 1 MiB download cap, fail-closed YAML parsing
- [x] VS Code / Cursor extension: inline diagnostics, status bar score, command palette
- [x] SBOM, release provenance, CodeQL, OpenSSF Scorecard, Dependabot
- [x] 1,273 tests · 85%+ coverage · 0 ruff issues · 0 mypy errors
### Phase 1 — Terminal and Pipeline Foundations
- [x] Universal stdin/stdout — `-` sentinel on `lint`, `scan`, `diff`, `adapt`; `safe_read_text("-")` reads stdin with the same size guard; stdin input is labelled as such in all output formats
- [x] Stable structured output — `schema_version: "1.0"` on all JSON outputs; `diag_console` (stderr) for diagnostics; `is_structured_mode()` suppresses banners in JSON/SARIF/YAML/NDJSON modes
- [x] Strict exit code contract — `0` OK · `1` failure · `2` usage · `3` provider · `4` template · `5` test · `6` secrets · `7` timeout · `130` interrupted; `PromptGenieError(code, hint)`; `handle_error()` writes to stderr; SIGINT → 130
- [x] Shell completion — `promptgenie completion install zsh|bash|fish`; `show`, `status`, `refresh-cache`; dynamic cache at `~/.cache/promptgenie/completions.json`
- [x] `promptgenie doctor` — Python version, config, optional extras, provider keys, Ollama, shell completion; remediation hints; `--format json` with `schema_version: "1.0"`
- [x] Side-by-side diff — `diff --side-by-side` Rich two-column table; semantic section matching; `diff --format json|yaml|markdown`
- [x] Renderer profiles — `ColorMode` (auto|always|never); `--color` global flag; `NO_COLOR`/`FORCE_COLOR` env vars; `diag_console` separates data from diagnostics; `init_renderer()` wired into CLI group
- [x] Interactive variable resolver — `{{name}}`, `{{name:type:default}}` placeholders; `--var`, `--vars`, `--vars-schema`, `--no-input` on `generate`; `PG_`-prefixed env vars; secret masking; type coercion; `VarResolutionError` exits 2
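Wrapper scripts can branch on the exit code contract instead of parsing output. A minimal sketch, assuming you mirror the documented codes in your own `IntEnum`; `ExitCode` and `classify` are hypothetical names, and PromptGenie's own constants (such as `EXIT_REGRESSION`) may be spelled differently. Code `8` comes from the regression gate shipped in Phase 4.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as documented in the Phase 1 and Phase 4 roadmap entries."""
    OK = 0
    FAILURE = 1
    USAGE = 2
    PROVIDER = 3
    TEMPLATE = 4
    TEST = 5
    SECRETS = 6
    TIMEOUT = 7
    REGRESSION = 8
    INTERRUPTED = 130


def classify(returncode: int) -> ExitCode:
    """Map a subprocess return code onto the contract; unknown codes count as FAILURE."""
    try:
        return ExitCode(returncode)
    except ValueError:
        return ExitCode.FAILURE
```

For example, `classify(subprocess.run(["promptgenie", "scan", "prompt.md"]).returncode)` lets a CI step treat `SECRETS` as a hard stop while retrying a transient `PROVIDER` or `TIMEOUT` failure.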
### Phase 2 — PromptSpec and Run Engine
- [x] Declarative PromptSpec YAML/JSON — `version: 1` with `name`, `target`, `template`, `mode`, `vars`, `context`, `policy`, `provider`, `model`, `output_contract`, `run`; JSON Schema at `promptgenie/schemas/promptspec.schema.json`; `spec init/render/validate/schema`
- [x] `promptgenie run` — load spec → resolve vars → build context → secrets gate → render → send to provider → stream response → persist run; `--dry-run`, `--stream`, `--require-clean`, `--provider`, `--model`, `--timeout`, `--no-history`, `--tee`, `--format ndjson`
- [x] Streaming response mode — `asyncio`-based; NDJSON events (`start/token/warning/error/done`); `--tee output.md` writes assembled response to file; `--format ndjson` for piping
- [x] Variable files and env binding — `--vars prod.yaml`, `--var k=v`, `--env-prefix PG_`; secret masking; `vars list` + `vars inspect --redacted` shows source per variable
- [x] Context builder — 8 source types: `file`, `glob`, `stdin`, `env`, `cmd`, `git_diff`, `git_staged`, `url`; `.promptignore`; 4 strategies; SHA-256 + token estimates; `context build` command
- [x] Provider abstraction — `BaseProvider` with `async complete()` + `stream()`; `ProviderCapabilities`; `AnthropicProvider` + `OpenAICompatProvider`; config at `~/.config/promptgenie/providers.yaml`
- [x] `promptgenie provider add/list/remove/show/doctor` — first-class Ollama/local provider management; `provider doctor` probes reachability
### Phase 3 — SecDevOps Guardrails ✅
- [x] `promptgenie analyze` — aggregate `lint + scan + policy + custom rules`; unified OWASP-aligned finding model; SARIF/JSON/Rich output
- [x] Policy-as-code v2: `--policy promptgenie.policy.yaml`; `--explain` mode; `external_model_send` gate; policy discovery chain; SARIF multi-run output
- [x] Data leakage detector: JWTs, database URLs, internal hostnames, emails, phone numbers, credit cards, SSNs; `promptgenie redact`; `[REDACTED:LABEL]` placeholders; `--diff`
- [x] `promptgenie redteam` — 13 OWASP LLM Top 10 attack packs; offline heuristic susceptibility judge; `--categories`, `--fail-on-susceptible`
- [x] Local-first routing policy: `RoutingConfig`; condition rules (`contains_secrets`, `classification ==`, `*`); `routing.default` fallback
- [x] Credential management: `promptgenie auth login|logout|status`; keyring, env, 1Password, AWS SSM, GCP Secret Manager, Azure Key Vault; `ref:` pointer resolution at runtime
- [x] Audit log: `promptgenie audit list|show|export|verify`; SQLite; SHA-256 tamper-evident hash chain; JSON/CSV/NDJSON export
- [x] Air-gapped mode: `security.airgap: true` in config; blocks all external provider calls; local providers (Ollama) still work
### Phase 4 — Evaluation and Regression Testing ✅
- [x] Multi-model matrix evaluation: `--models claude,gpt-4.1,ollama/llama3.1`; `asyncio` parallel with semaphore; per-model latency/cost/safety/rubric metrics; `--runs N`
- [x] Eval suites: `promptgenie eval init|run|compare|approve`; 11 assertion types: `contains`, `regex_match`, `json_path`, `semantic_similarity`, `judge_rubric`, `refuses_instruction_override`, and more; snapshot store at `evals/.snapshots/`
- [x] Baseline regression gates: `--save-baseline`, `--compare --fail-on-regression`; per-metric thresholds; exits `EXIT_REGRESSION = 8` on breach
- [x] GitHub Actions native reporter: `::error`/`::warning` annotations; Markdown step summary; SARIF 2.1.0 upload; auto-detected via `GITHUB_ACTIONS`
- [x] Changed-prompt detection: `--changed`; `git diff --name-only`; dependency-aware (template → dependents, policy → all specs)
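A baseline regression gate compares each metric against its saved value and fails when any drop exceeds that metric's threshold. This is a minimal sketch, assuming thresholds are absolute allowed drops and that metrics missing from either side are skipped; `Threshold`, `find_regressions`, and `gate` are hypothetical, and only the exit code `8` comes from the README.

```python
from dataclasses import dataclass

EXIT_OK = 0
EXIT_REGRESSION = 8


@dataclass(frozen=True)
class Threshold:
    max_drop: float               # largest tolerated worsening, in the metric's own units
    higher_is_better: bool = True


def find_regressions(baseline: dict[str, float], current: dict[str, float],
                     thresholds: dict[str, Threshold]) -> list[str]:
    breaches = []
    for metric, rule in thresholds.items():
        if metric not in baseline or metric not in current:
            continue
        delta = current[metric] - baseline[metric]
        # For latency-style metrics an increase is the regression.
        worsening = -delta if rule.higher_is_better else delta
        if worsening > rule.max_drop:
            breaches.append(f"{metric}: {baseline[metric]} -> {current[metric]}")
    return breaches


def gate(baseline: dict[str, float], current: dict[str, float],
         thresholds: dict[str, Threshold]) -> int:
    return EXIT_REGRESSION if find_regressions(baseline, current, thresholds) else EXIT_OK
```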
### Phase 5½ — Event Infrastructure and Workspace Schema ✅
- [x] Unified Event model (`EventKind`, `Event`, `EventBus`, `EventFormatter`) — typed pub/sub for every lifecycle moment; NDJSON serialisation; `run_spec()` `event_bus=` kwarg; backward-compatible with `on_token=`/`on_event=` callbacks (v1.6.0)
- [x] Four built-in `EventFormatter` implementations: `NDJSONFormatter`, `TokenOnlyFormatter`, `RichFormatter`, `SilentFormatter` — `@runtime_checkable` Protocol for custom formatters (v1.6.0)
- [x] Policy command hardening: `max_risk` gate scoped to scan findings only (not lint); expired allowlist warnings in text/JSON/SARIF output; threshold detail in violation messages (v1.6.0)
- [x] `promptgenie/schemas/workspace.schema.json` — JSON Schema (Draft 2020-12) for `.promptgenie.yaml`; `additionalProperties: false` at every level; VS Code `yaml-language-server` compatible (v1.7.0)
- [x] `WorkspaceConfig` + `DefaultsConfig` dataclasses on `PromptGenieConfig`; `load_config()` parses `workspace:` and `defaults:` blocks (v1.7.0)
- [x] `validate_workspace_config()` — pure-Python structural validator; no `jsonschema` dep; catches unknown keys, type errors, bad enums, ISO date formats, missing required fields (v1.7.0)
- [x] `config validate` — CI-safe schema validation command; exits 0/1/2; `--format json` for machine-readable output (v1.7.0)
- [x] `config init` — scaffold `.promptgenie.yaml` with `$schema` pointer and `yaml-language-server` comment; `--name`, `--force` (v1.7.0)
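The event model boils down to typed events, a bus that fans each event out to subscribers, and a runtime-checkable formatter protocol. The sketch below is a minimal, synchronous interpretation; the `EventKind` members mirror the NDJSON run events from Phase 2, while the `handle` method name, `NDJSONCollector`, and the `data` field are assumptions rather than PromptGenie's actual API.

```python
import json
from collections.abc import Callable
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Protocol, runtime_checkable


class EventKind(str, Enum):
    # Mirrors the NDJSON run events listed under Phase 2; the real enum is larger.
    START = "start"
    TOKEN = "token"
    WARNING = "warning"
    ERROR = "error"
    DONE = "done"


@dataclass(frozen=True)
class Event:
    kind: EventKind
    data: dict[str, Any] = field(default_factory=dict)

    def to_ndjson(self) -> str:
        return json.dumps({**self.data, "kind": self.kind.value}, sort_keys=True)


@runtime_checkable
class EventFormatter(Protocol):
    def handle(self, event: Event) -> None: ...


class EventBus:
    """Synchronous pub/sub: every subscriber sees every event, in publish order."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        for callback in list(self._subscribers):
            callback(event)


class NDJSONCollector:
    """Toy formatter that buffers one JSON line per event."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def handle(self, event: Event) -> None:
        self.lines.append(event.to_ndjson())
```

`@runtime_checkable` lets `isinstance(obj, EventFormatter)` accept any object with a matching `handle` method, which is what makes custom formatters pluggable without inheritance.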
### Phase 5 — Advanced TUI and Ecosystem ✅
- [x] Full-screen Textual TUI: `promptgenie tui`; file-tree navigator, Markdown editor, findings panel, score/token/provider status bar; `Ctrl+S/R/L/D/T/Q` bindings; graceful degradation without `textual`
- [x] Guided prompt wizard: `promptgenie wizard`; 8-step Q&A → PromptSpec YAML + rendered Markdown; `--out`, `--spec-out`, `--no-spec`
- [x] Smart command palette: `promptgenie palette`; Textual fuzzy finder across commands, templates, context packs, and recent history; readline fallback; `--print-only` for shell piping
- [x] Prompt history: `promptgenie history list|show|diff|replay|export|clear`; SQLite; SHA-256 content-hash deduplication; `--search`, `--provider`, `--status` filters
- [x] Watch mode: `promptgenie watch`; `watchfiles` optional extra with polling fallback; `--debounce`; debounced Rich `Live` dashboard
- [x] Template command group: `promptgenie template list|show|render|validate|new|edit`; layered resolution (project → user → built-in); `$EDITOR` integration; re-validates after `edit`
- [x] Prompt lockfiles: `promptgenie lock`; SHA-256 hashes of spec, template, policy, context sources, provider/model; `--check` for CI; `--strict` for missing optional files
- [x] Plugin SDK: 5 entry-point groups (`promptgenie.providers`, `.rules`, `.renderers`, `.context_sources`, `.evaluators`); `plugin list|doctor|scaffold|install`
### Phase 6 — Governance, SSO, and Cloud Sync *(planned)*
- [ ] Team policy server — central policy fetch on every run; org-wide `disabled_rules`, allowlists, routing rules; policy version pinned in lockfile
- [ ] SSO / OIDC credential binding — `promptgenie auth login --sso`; OIDC device flow; per-user audit attribution; `PROMPTGENIE_TOKEN` env var for CI
- [ ] Prompt registry — `promptgenie registry push|pull`; versioned, signed, searchable; OCI-compatible layout
- [ ] Remote eval runners — offload matrix evaluations to a cloud runner pool; cost and latency budgets enforced server-side
- [ ] `promptgenie fmt` — normalise Markdown prompt files and PromptSpec YAML; heading order, key sort, trailing whitespace; `--check` exits 1 if formatting would change (CI-safe)
- [ ] `promptgenie make` — YAML task graph (`promptgenie.make.yaml`); `--changed` filtering; `--parallel N`; compatible with Make, just, Taskfile
## Configuration
Place a `.promptgenie.yaml` file in your project root (or any parent directory). All commands auto-discover and load it. Run `promptgenie config init` to scaffold one with the JSON Schema pointer and editor autocomplete pre-wired.
**Full schema:** `promptgenie/schemas/workspace.schema.json` (Draft 2020-12). All sections enforce `additionalProperties: false` — typos in key names are caught by `config validate`.
```yaml
# yaml-language-server: $schema=https://promptgenie.dev/schemas/workspace.schema.json
$schema: "https://promptgenie.dev/schemas/workspace.schema.json"

# Project-level metadata (optional — used in policy server and audit trail)
workspace:
  name: "my-project"
  version: "1.0"
  team: "platform-eng"
  description: "Prompt engineering workspace for the payments API."
  policy: ".promptgenie-policy.yaml"   # default policy file

# Workspace-wide defaults — overridden by --provider/--model/--target CLI flags
defaults:
  provider: anthropic
  model: claude-opus-4-5
  target: claude-code

scanner:
  # Allowlist entries suppress findings whose *matched text* contains the phrase.
  # Suppression is scoped to the finding's match — not the whole prompt.
  # Simple string: suppresses any finding whose matched text contains this phrase.
  allowlist:
    - "example-token-for-docs"
    # Scoped object: suppress only specific rule codes when the phrase is matched.
    # Safer — won't accidentally suppress unrelated findings on the same line.
    # - phrase: "known-safe-deploy"
    #   rules:
    #     - PERM_005
    # Expiring suppression — automatically deactivates after the ISO date.
    # Use for time-limited exceptions (CI placeholders, short-lived tokens).
    # - phrase: "sk-ant-ci-placeholder"
    #   rules:
    #     - SEC_SECRET
    #   expires: "2026-12-31"
    #   reason: "CI placeholder — rotate before expiry, see ticket #456"

  # Disable specific rule codes entirely (no phrase check needed)
  disabled_rules:
    - SEC_007

  # Whitelist mode — ONLY run these rule codes (takes precedence over disabled_rules)
  # Use SEC_SECRET to target all secret sub-rules at once (SEC_SECRET_AWS_KEY, SEC_SECRET_GITHUB, etc.)
  # enabled_rules:
  #   - SEC_SECRET
  #   - OWASP_LLM01_001

  # Override the default risk level for a rule
  severity_overrides:
    PERM_005: CRITICAL

  # Extra directories to load rule packs from (supports ~ expansion)
  # Each *.yaml file is scanned for a scanner_rules key
  # rules_dirs:
  #   - ~/.promptgenie/registry/packs
  #   - ./local-rules

linter:
  # Disable specific lint rules
  disabled_rules:
    - TASK_003

  # Whitelist mode — ONLY run these codes
  # enabled_rules:
  #   - TASK_001
  #   - ENT_001

  # Extra directories to load lint rule packs from
  # rules_dirs:
  #   - ~/.promptgenie/registry/packs

  # Add project-specific vague verbs beyond the built-in list
  custom_vague_verbs:
    - "tidy"
    - "polish"

  # Add custom lint rules (appended after built-in rules)
  custom_rules:
    - id: MY_LINT_001
      category: custom
      pattern: "refactor everything"
      severity: HIGH
      confidence: HIGH
      message: "Overly broad refactor instruction."
      suggestion: "Narrow the refactor to specific modules or files."
```
**Custom scanner rules** can also be added under `scanner.custom_rules`. Each rule requires `id`, `pattern`, `risk`, `confidence`, `message`, and `recommendation`. All patterns are validated at load time — syntax errors and nested quantifiers (ReDoS risk) raise `ValueError` and abort config loading:
```yaml
scanner:
  custom_rules:
    - id: MY_SEC_001
      category: custom
      pattern: "disable (all )?logging"
      risk: HIGH
      confidence: MEDIUM
      message: "Logging suppression detected — may hide audit trail."
      recommendation: "Retain audit logs. Use log-level configuration instead."
      false_positive_note: "May trigger on prompts about log configuration."
```
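Load-time validation of a custom rule can be sketched as a required-field check, a nested-quantifier screen, and a trial compile. The nested-quantifier test here is a deliberately simple heuristic (an innermost group containing `+`, `*`, or `{m,}` that is itself followed by `+`, `*`, or `{`); it may flag some harmless patterns and miss some dangerous ones, and it is not PromptGenie's actual detector. `validate_pattern` and `validate_rule` are hypothetical.

```python
import re

REQUIRED_SCANNER_FIELDS = ("id", "pattern", "risk", "confidence", "message", "recommendation")

# Innermost (...) whose body holds an unescaped quantifier, followed by another quantifier.
# Escapes are consumed as pairs, so "\+" inside the group is not treated as a quantifier.
_NESTED_QUANTIFIER = re.compile(r"\((?:[^()\\]|\\.)*[+*}](?:[^()\\]|\\.)*\)[+*{]")


def validate_pattern(pattern: str) -> re.Pattern[str]:
    if _NESTED_QUANTIFIER.search(pattern):
        raise ValueError(f"nested quantifier (ReDoS risk): {pattern!r}")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc


def validate_rule(rule: dict[str, str]) -> re.Pattern[str]:
    missing = [f for f in REQUIRED_SCANNER_FIELDS if f not in rule]
    if missing:
        raise ValueError(f"rule {rule.get('id', '?')} is missing: {missing}")
    return validate_pattern(rule["pattern"])
```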
### Config CLI flags
The `scan`, `lint`, `generate`, `adapt`, and `workflow` commands all accept:
| Flag | Effect |
|---|---|
| `--config PATH` | Load a specific config file instead of auto-discovering `.promptgenie.yaml` |
| `--no-config` | Ignore any `.promptgenie.yaml`; run with default settings |
| `--best-effort` | Fall back to built-in defaults on missing profile, template, or config (fail-open) |
When a config file is loaded in rich output mode, its path is shown as a dim line before results. A missing or malformed `--config` file is a **fatal error** by default — pass `--best-effort` to fall back to defaults instead.
### Config management commands
```bash
# Scaffold a new .promptgenie.yaml with schema pointer
promptgenie config init
promptgenie config init --name "my-project" --force   # overwrite existing

# Validate .promptgenie.yaml against the workspace schema
promptgenie config validate                  # exits 0 = valid, 1 = errors, 2 = not found
promptgenie config validate --format json    # machine-readable for CI
promptgenie config validate --config path/to/file.yaml

# Show current effective config
promptgenie config show
promptgenie config show --format json

# Get / set individual keys
promptgenie config get security.airgap
promptgenie config set security.airgap true
promptgenie config set routing.default ollama
```
## Development
Dependencies are locked in `uv.lock`. Install [uv](https://docs.astral.sh/uv/) then:
```bash
git clone https://github.com/mylesagnew/promptgenie.git
cd promptgenie
uv sync --extra dev
```
**Run tests:**
```bash
uv run pytest tests/
```
**Lint and format:**
```bash
uv run ruff check promptgenie/
uv run ruff format promptgenie/
uv run mypy promptgenie
```
**Security checks:**
```bash
uv run bandit -r promptgenie/ -ll
uv run pip-audit --skip-editable --progress-spinner off
```
**Build:**
```bash
uv build
uv run --with twine twine check dist/*
```
**Generate SBOM:**
```bash
uv run cyclonedx-py environment --output-format json --outfile sbom.cyclonedx.json
```
**Releasing** (maintainers only):
1. Update `version` in `pyproject.toml` and add a `[X.Y.Z]` entry to `CHANGELOG.md`.
2. Run `uv lock` to update the lockfile.
3. Commit, push to `main`, then push a semver tag: `git tag vX.Y.Z && git push origin vX.Y.Z`.
4. The `release.yml` workflow runs the full gate, builds, publishes to PyPI via Trusted Publishing, generates GitHub artifact attestations, generates a CycloneDX SBOM, and creates a GitHub Release — all without a stored API token.
See [SECURITY.md](SECURITY.md) for the vulnerability reporting process and scanner limitations.
See [CONTRIBUTING.md](CONTRIBUTING.md) for the contributor guide, rule authoring docs, and profile/template schema reference.
See [CHANGELOG.md](CHANGELOG.md) for a full version history.
See [ROADMAP.md](ROADMAP.md) for the full product roadmap with implementation details, architecture principles, and the optional extras plan.
## License
MIT