Sakeeb91/specstress

# SpecStress

AI can produce code faster than humans can review it. The bottleneck moves to the spec — and a weak spec makes bad code look correct. SpecStress treats every candidate spec as hostile until proven otherwise.

## What it does

SpecStress takes a problem (signature + intent), a candidate spec written as a property-based test, and a library of adversarial implementations. It runs each implementation against the spec under Hypothesis and produces:

- a **mutation score** — fraction of known-bad implementations the spec catches
- a **diagnosis** — `STRONG`, `UNDERCONSTRAINED`, `OVERCONSTRAINED`, or `AMBIGUOUS`
- a downloadable **Markdown report** with counterexamples
- optionally, **Qwen3-suggested missing properties** (via [Tinker](https://thinkingmachines.ai/tinker/)) to turn a weak spec into a strong one

## Demo

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    streamlit run app.py

To enable the **Suggest stronger spec** button, export a Tinker API key:

    export TINKER_API_KEY=tml-...

On Streamlit Cloud, paste the key under **Settings → Secrets** instead (see `.streamlit/secrets.toml.example`).

Three demos ship with the tool:

| Demo | Function | Why it's interesting |
| --- | --- | --- |
| `sort` | `sort(xs)` | Weak "is sorted" spec accepts `[]`, `sorted(set(xs))`, `[0]*len(xs)` |
| `withdraw` | `withdraw(balance, amount)` | Weak "balance ≥ 0" spec accepts no-op and abs-amount mutants |
| `sanitize` | `sanitize(html)` | Weak `"