uppalapatipurnaraviteja99/threat-researcher


# ThreatResearcher

Multi-agent system that writes threat intel briefs on CVEs. Built with LangGraph + Tavily.

I built this because I wanted to learn LangGraph and the multi-agent pattern, and threat briefs were a good fit: they need decomposition (planning), web search (research), and fact-checking (the critic).

## How it works

Three agents, one workflow:

1. Planner takes the question and splits it into 3-5 sub-questions.
2. Researcher runs a Tavily search for each one and drafts a brief with inline citations.
3. Critic fact-checks every claim against the sources. If any claim is ungrounded, it sends the work back to the Researcher (max 2 retries).

## Why the Critic got rewritten

The first version of the Critic was a 2-line prompt that just asked "is this brief good? answer GOOD or BAD". Predictably, it said GOOD to everything. On a CVE-2024-3400 brief that had multiple hallucinated facts, it caught zero.

I rewrote it with a Pydantic schema that forces the LLM to list specific ungrounded claims one by one. On the same brief, it caught 8.

Lesson: free-text "looks good?" prompts are useless for fact-checking. You need structured output, and you have to make the model do the work claim by claim.

## Provider swap

The `PROVIDER` env var picks between OpenAI, Anthropic, and Ollama. Mostly tested with `gpt-4o-mini` because it's cheap. The Anthropic and Ollama paths are wired up, but I haven't done a full quality comparison yet.

## Run it

```bash
pip install -r requirements.txt
cp .env.example .env  # edit .env and add your keys
python run.py "Give me a brief on CVE-2024-3400"
```

You need an OpenAI key and a Tavily key (the free tier is fine).

## Tested on

- CVE-2024-3400 (Palo Alto PAN-OS)
- CVE-2023-34362 (MOVEit)
- CVE-2021-44228 (Log4Shell)

All produced briefs. The Critic flagged issues on all of them, which is honestly the point.
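The three-agent loop from "How it works", combined with the claim-by-claim Critic output from "Why the Critic got rewritten", can be sketched in plain Python without LangGraph or Tavily. This is a hedged sketch under stated assumptions, not the repo's code: `ClaimCheck`, `CriticReport`, `run_workflow`, and the JSON field names are hypothetical; stdlib dataclasses stand in for the Pydantic schema the Critic uses; and the planner, researcher, and critic are passed in as plain functions where the real system would call an LLM and Tavily.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Callable

MAX_RETRIES = 2  # retry cap for Critic -> Researcher loops, per the README


@dataclass
class ClaimCheck:
    """One claim from the brief, checked individually against the sources.
    (Hypothetical field names; stdlib stand-in for the repo's Pydantic schema.)"""
    claim: str
    grounded: bool
    reason: str = ""


@dataclass
class CriticReport:
    """Structured Critic output: the verdict is derived from per-claim checks,
    not from a free-text GOOD/BAD answer."""
    checks: list[ClaimCheck] = field(default_factory=list)

    @property
    def ungrounded(self) -> list[ClaimCheck]:
        return [c for c in self.checks if not c.grounded]

    @property
    def passed(self) -> bool:
        # An empty report does not count as a pass: the model must enumerate claims.
        return bool(self.checks) and not self.ungrounded

    @classmethod
    def from_json(cls, raw: str) -> CriticReport:
        # Parse the model's JSON reply. Missing "checks"/"claim"/"grounded" keys
        # raise KeyError rather than being treated as a pass.
        data = json.loads(raw)
        return cls([ClaimCheck(c["claim"], bool(c["grounded"]), c.get("reason", ""))
                    for c in data["checks"]])


def run_workflow(question: str,
                 plan: Callable[[str], list[str]],
                 research: Callable[[list[str], list[str]], str],
                 critique: Callable[[str], CriticReport]) -> tuple[str, CriticReport, int]:
    """Planner -> Researcher -> Critic, looping back to the Researcher at most
    MAX_RETRIES times while the Critic reports ungrounded claims."""
    sub_questions = plan(question)
    feedback: list[str] = []
    retries = 0
    while True:
        brief = research(sub_questions, feedback)
        report = critique(brief)
        if report.passed or retries >= MAX_RETRIES:
            return brief, report, retries
        # Hand the specific ungrounded claims back so the Researcher knows what to fix.
        feedback = [c.claim for c in report.ungrounded]
        retries += 1
```

Deriving `passed` from the per-claim list, and refusing to pass an empty list, keeps the verdict tied to claims the model actually enumerated rather than to a one-word answer.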
## Stuff that's not great

- Tavily's free tier is 1000 searches/month, and this burns through it fast if you test a lot.
- The Critic is strict to a fault: it sometimes flags claims that are technically in the sources but worded differently. It's a tradeoff I'm okay with for now.
- Only English sources.
- No caching, so every run hits Tavily fresh.

## What I'd do next

Add semantic similarity matching to the Critic so it doesn't flag paraphrased-but-grounded claims. Maybe also try a smaller local model as the Critic specifically.
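The semantic-similarity idea could look roughly like this: embed each claim and each source chunk, then accept a claim as grounded when its closest source chunk clears a similarity threshold, even if the wording differs. This is a hedged sketch of plain cosine-similarity matching, not something in the repo: the embedding model is left unspecified (vectors are passed in), and `cosine`, `best_support`, `is_paraphrase_grounded`, and the 0.85 threshold are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_support(claim_vec: Vector, source_vecs: Sequence[Vector]) -> tuple[int, float]:
    """Index and score of the source chunk most similar to the claim.
    Returns (-1, 0.0) when there are no source chunks."""
    best_i, best_s = -1, 0.0
    for i, vec in enumerate(source_vecs):
        s = cosine(claim_vec, vec)
        if best_i == -1 or s > best_s:
            best_i, best_s = i, s
    return best_i, best_s


def is_paraphrase_grounded(claim_vec: Vector, source_vecs: Sequence[Vector],
                           threshold: float = 0.85) -> bool:
    """Treat a claim as grounded if some source chunk is close enough in embedding space.
    The default threshold is a hypothetical value that would need tuning on real briefs."""
    _, score = best_support(claim_vec, source_vecs)
    return score >= threshold
```

One plausible design is to run this as a pre-filter: claims that clear the threshold skip the LLM check, and only the rest go to the structured Critic. That would cut false flags on paraphrases, at the cost of a threshold that needs tuning so near-miss claims (say, a wrong version number inside otherwise similar text) aren't waved through.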