Sakeeb91/specstress
# SpecStress
AI can produce code faster than humans can review it. The bottleneck moves to the spec —
and a weak spec makes bad code look correct. SpecStress treats every candidate spec as
hostile until proven otherwise.
## What it does
SpecStress takes a problem (signature + intent), a candidate spec written as a
property-based test, and a library of adversarial implementations. It runs each
implementation against the spec under Hypothesis and produces:
- a **mutation score** — fraction of known-bad implementations the spec catches
- a **diagnosis** — `STRONG`, `UNDERCONSTRAINED`, `OVERCONSTRAINED`, or `AMBIGUOUS`
- a downloadable **Markdown report** with counterexamples
- optionally, **Qwen3-suggested missing properties** (via [Tinker](https://thinkingmachines.ai/tinker/)) to turn a weak spec into a strong one
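
The mutation score can be illustrated with a minimal sketch. It is not SpecStress's actual implementation: it uses the three `sort` mutants from the demo table, and it swaps Hypothesis for seeded random inputs from the standard library so the example has no dependencies. The names `MUTANTS`, `weak_spec`, `strong_spec`, and `mutation_score` are hypothetical.

```python
import random
from collections import Counter

# Known-bad implementations of sort(xs), taken from the `sort` demo row.
MUTANTS = {
    "returns_empty": lambda xs: [],
    "drops_duplicates": lambda xs: sorted(set(xs)),
    "all_zeros": lambda xs: [0] * len(xs),
}


def is_sorted(out):
    """True if every element is <= its successor."""
    return all(a <= b for a, b in zip(out, out[1:]))


def weak_spec(xs, out):
    # Only checks ordering, so it says nothing about which elements survive.
    return is_sorted(out)


def strong_spec(xs, out):
    # Ordering plus "output is a permutation of the input".
    return is_sorted(out) and Counter(out) == Counter(xs)


def mutation_score(spec, mutants, trials=200, seed=0):
    """Fraction of mutants for which some generated input violates the spec."""
    rng = random.Random(seed)
    # Stand-in for Hypothesis: short lists over a small range, so that
    # duplicates and non-zero values show up often.
    inputs = [
        [rng.randint(-3, 3) for _ in range(rng.randint(0, 8))]
        for _ in range(trials)
    ]
    caught = 0
    for impl in mutants.values():
        if any(not spec(xs, impl(list(xs))) for xs in inputs):
            caught += 1
    return caught / len(mutants)


if __name__ == "__main__":
    print("weak:", mutation_score(weak_spec, MUTANTS))
    print("strong:", mutation_score(strong_spec, MUTANTS))
```

In this sketch the weak spec catches none of the three mutants (score 0.0) and the stronger spec catches all three (score 1.0). How SpecStress maps a score to a diagnosis label is not covered here.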
## Demo
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```
To enable the **Suggest stronger spec** button, export a Tinker API key:
```bash
export TINKER_API_KEY=tml-...
```
On Streamlit Cloud, paste the key under **Settings → Secrets** instead (see
`.streamlit/secrets.toml.example`).
Three demos ship with the tool:
| Demo | Function | Why it's interesting |
| --- | --- | --- |
| `sort` | `sort(xs)` | Weak "is sorted" spec accepts `[]`, `sorted(set(xs))`, `[0]*len(xs)` |
| `withdraw` | `withdraw(balance, amount)` | Weak "balance ≥ 0" spec accepts no-op and abs-amount mutants |
| `sanitize` | `sanitize(html)` | Weak `"