LLM Red Team Risk Evaluator
**Live Demo:** [Try the app on Hugging Face Spaces](https://huggingface.co/spaces/Venkatkoushik22/llm-redteam-risk-evaluator)
A lightweight red-teaming dashboard and API for testing LLM responses against adversarial prompts such as prompt injection, hallucination, privacy leakage, toxicity, and jailbreak attempts.
## Why This Project
Most LLM projects focus only on generating answers. This project focuses on what happens before deployment: testing where an AI system fails.
The evaluator runs adversarial prompts across common LLM risk categories and generates structured reports with pass/fail scoring, risk scores, and dashboard-based failure analysis.
The current version uses a simulated LLM response layer for safe local testing, but the design allows real model APIs or local models to be plugged in later.
## Features
* Runs adversarial prompt evaluations
* Tests prompt injection, hallucination, privacy leakage, toxicity, and jailbreak behavior
* Generates CSV risk reports
* Provides a FastAPI endpoint for triggering evaluations
* Includes a Streamlit dashboard for reviewing failures
* Tracks pass rate, failed tests, and average risk score
## Tech Stack
Python, FastAPI, Streamlit, Pandas
## Project Structure
llm-redteam-risk-evaluator/
├── app/
│ ├── dashboard.py
│ ├── evaluator.py
│ └── main.py
├── prompts/
│ └── test_prompts.json
├── reports/
├── assets/
│ └── dashboard.png
├── requirements.txt
└── README.md
## Run Locally
pip install -r requirements.txt
Run the evaluator:
python app/evaluator.py
Run the dashboard:
streamlit run app/dashboard.py
Run the API:
uvicorn app.main:app --reload
Open API docs:
http://127.0.0.1:8000/docs
## Output
The system creates structured CSV reports with:
* Prompt category
* Test prompt
* Model response
* Risk score
* PASS or FAIL status
## Future Improvements
* Add real LLM API integration
* Add local model support with Ollama
* Add DeepEval or DeepTeam based scoring
* Export reports as JSON and PDF
* Add historical comparison across evaluation runs