venkatkoushik22/llm-redteam-risk-evaluator

GitHub: venkatkoushik22/llm-redteam-risk-evaluator

Stars: 1 | Forks: 0

LLM Red Team Risk Evaluator

**Live Demo:** [Try the app on Hugging Face Spaces](https://huggingface.co/spaces/Venkatkoushik22/llm-redteam-risk-evaluator) A lightweight red-teaming dashboard and API for testing LLM responses against adversarial prompts such as prompt injection, hallucination, privacy leakage, toxicity, and jailbreak attempts. ## Why This Project Most LLM projects focus only on generating answers. This project focuses on what happens before deployment: testing where an AI system fails. The evaluator runs adversarial prompts across common LLM risk categories and generates structured reports with pass/fail scoring, risk scores, and dashboard-based failure analysis. The current version uses a simulated LLM response layer for safe local testing, but the design allows real model APIs or local models to be plugged in later.

LLM Red Team Risk Evaluator Dashboard

## Features * Runs adversarial prompt evaluations * Tests prompt injection, hallucination, privacy leakage, toxicity, and jailbreak behavior * Generates CSV risk reports * Provides a FastAPI endpoint for triggering evaluations * Includes a Streamlit dashboard for reviewing failures * Tracks pass rate, failed tests, and average risk score ## Tech Stack Python, FastAPI, Streamlit, Pandas ## Project Structure llm-redteam-risk-evaluator/ ├── app/ │ ├── dashboard.py │ ├── evaluator.py │ └── main.py ├── prompts/ │ └── test_prompts.json ├── reports/ ├── assets/ │ └── dashboard.png ├── requirements.txt └── README.md ## Run Locally pip install -r requirements.txt Run the evaluator: python app/evaluator.py Run the dashboard: streamlit run app/dashboard.py Run the API: uvicorn app.main:app --reload Open API docs: http://127.0.0.1:8000/docs ## Output The system creates structured CSV reports with: * Prompt category * Test prompt * Model response * Risk score * PASS or FAIL status ## Future Improvements * Add real LLM API integration * Add local model support with Ollama * Add DeepEval or DeepTeam based scoring * Export reports as JSON and PDF * Add historical comparison across evaluation runs