uppalapatipurnaraviteja99/threat-researcher
# ThreatResearcher
Multi-agent system that writes threat intel briefs on CVEs. Built with LangGraph + Tavily.
I built this because I wanted to learn LangGraph and the multi-agent pattern, and threat briefs were a good fit: they need decomposition (the Planner), web search (the Researcher), and fact-checking (the Critic).
## How it works
Three agents, one workflow:
1. Planner takes the question and splits it into 3-5 sub-questions
2. Researcher hits Tavily for each one and drafts a brief with inline citations
3. Critic fact-checks every claim against the sources. If anything is ungrounded, it sends the work back to the Researcher (max 2 retries)
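The control flow above can be sketched in plain Python. This is not the repo's LangGraph code, just a minimal illustration of the loop; `run_workflow`, `RunResult`, and the stub agents are hypothetical names, and the only detail taken from the README is the cap of 2 retries.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_RETRIES = 2  # from the README: the Critic can send work back at most twice


@dataclass
class RunResult:
    brief: str
    ungrounded: List[str] = field(default_factory=list)
    attempts: int = 1


def run_workflow(
    question: str,
    plan: Callable[[str], List[str]],
    research: Callable[[List[str], List[str]], str],
    critique: Callable[[str], List[str]],
) -> RunResult:
    """Plan once, then alternate research and critique until grounded or out of retries."""
    sub_questions = plan(question)
    feedback: List[str] = []
    # One first pass plus up to MAX_RETRIES retries.
    for attempt in range(1, MAX_RETRIES + 2):
        brief = research(sub_questions, feedback)
        feedback = critique(brief)  # list of ungrounded claims, empty if clean
        if not feedback:
            break
    return RunResult(brief=brief, ungrounded=feedback, attempts=attempt)


# Stub agents (hypothetical) so the sketch runs without API keys.
def plan(question: str) -> List[str]:
    return [f"{question} (part {i})" for i in range(1, 4)]


drafts: List[List[str]] = []


def research(sub_questions: List[str], feedback: List[str]) -> str:
    drafts.append(feedback)
    return f"draft {len(drafts)}"


def critique(brief: str) -> List[str]:
    # Pretend the first draft has a problem and the second one is clean.
    return [] if brief == "draft 2" else ["claim with no supporting source"]


result = run_workflow("Give me a brief on CVE-2024-3400", plan, research, critique)
```

In LangGraph the same loop would typically be a conditional edge from the Critic node back to the Researcher node, with the retry counter kept in the graph state.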
## Why the Critic got rewritten
First version of the Critic was a 2-line prompt that just asked "is this brief good? answer GOOD or BAD". Predictably, it said GOOD to everything. On a CVE-2024-3400 brief that had multiple hallucinated facts, it caught zero.
Rewrote it with a Pydantic schema that forces the LLM to list specific ungrounded claims one by one. On the same brief, it caught 8.
Lesson: free-text "looks good?" prompts are useless for fact-checking. You need structured output and you have to make the model do the work claim by claim.
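A rough sketch of what a claim-by-claim schema can look like. The class and field names here (`UngroundedClaim`, `CriticVerdict`, `claim`, `reason`) are assumptions for illustration, not the schema used in this repo, and the JSON string is a made-up placeholder rather than real Critic output.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class UngroundedClaim(BaseModel):
    # Hypothetical fields: force the model to name the claim and justify the flag.
    claim: str = Field(description="Sentence from the brief, quoted verbatim")
    reason: str = Field(description="Why none of the cited sources support it")


class CriticVerdict(BaseModel):
    ungrounded_claims: List[UngroundedClaim] = Field(default_factory=list)

    @property
    def grounded(self) -> bool:
        # The brief passes only when the list of flagged claims is empty.
        return not self.ungrounded_claims


# Placeholder LLM output; a real run would get this from the model.
raw = '{"ungrounded_claims": [{"claim": "placeholder claim", "reason": "not in any source"}]}'
verdict = CriticVerdict(**json.loads(raw))
```

The point of the shape is that there is no GOOD/BAD field to fall back on: the verdict is derived from the list, so the model has to enumerate claims to say anything at all. LangChain chat models generally offer `with_structured_output` as one way to bind a schema like this.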
## Provider swap
`PROVIDER` env var picks between OpenAI, Anthropic, and Ollama. Mostly tested with `gpt-4o-mini` because it's cheap. Anthropic and Ollama paths are wired but I haven't done a full quality comparison yet.
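A minimal sketch of how the env var could be read and validated. `resolve_provider` is a hypothetical helper, the lowercase provider strings are a guess at the accepted values, and defaulting to OpenAI when `PROVIDER` is unset is an assumption, not something the README states.

```python
import os
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("openai", "anthropic", "ollama")


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the normalized provider name, or raise on an unknown value."""
    env = os.environ if env is None else env
    # Assumption: fall back to OpenAI when PROVIDER is not set.
    provider = env.get("PROVIDER", "openai").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}"
        )
    return provider
```

Failing fast on a typo here is cheaper than finding out after the Planner has already spent tokens.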
## Run it
```bash
pip install -r requirements.txt
cp .env.example .env
# edit .env and add your keys
python run.py "Give me a brief on CVE-2024-3400"
```
You need an OpenAI key and a Tavily key (Tavily's free tier is fine).
## Tested on
- CVE-2024-3400 (Palo Alto PAN-OS)
- CVE-2023-34362 (MOVEit)
- CVE-2021-44228 (Log4Shell)
All three produced briefs. The Critic flagged ungrounded claims on every one of them, which is honestly the point.
## Stuff that's not great
- Tavily free tier is 1000 searches/month, and this burns through it fast if you test a lot
- Critic is strict to a fault: it sometimes flags claims that are technically in the sources but worded differently. Tradeoff I'm okay with for now
- Only English sources
- No caching, so every run hits Tavily fresh
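For the caching gap, one possible approach is a small on-disk cache keyed by a hash of the query. This is only a sketch of the idea and is not in the repo; `cached_search`, the `search` callable, and the cache directory layout are all hypothetical, and `search` stands in for whatever function actually calls Tavily.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def cached_search(
    query: str,
    search: Callable[[str], List[dict]],
    cache_dir: Path,
) -> List[dict]:
    """Return cached results for a query, calling `search` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Normalize so trivial whitespace/case differences share one cache entry.
    key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    results = search(query)
    path.write_text(json.dumps(results), encoding="utf-8")
    return results
```

There is no expiry here, so results for a fast-moving CVE would go stale; that is fine for repeated test runs but would need a TTL for real use.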
## What I'd do next
Add semantic similarity matching in the Critic so it doesn't flag paraphrased-but-grounded claims. Maybe try a smaller local model as the Critic specifically.