Rohith-Ramamoorthy-2000/Threat_intelligence_chatbot
GitHub: Rohith-Ramamoorthy-2000/Threat_intelligence_chatbot
Stars: 0 | Forks: 0
# Threat_intelligence_chatbot
# Gen AI-Lab:3
# Team member:
Rohith, Vishruta, Bertilla
# Threat Intelligence Chatbot using RAG
## Overview
This project is an AI-powered Threat Intelligence Chatbot developed using Retrieval-Augmented Generation (RAG). The chatbot uses the MITRE ATT&CK Enterprise dataset as a cybersecurity knowledge base and provides intelligent responses to cybersecurity-related questions.
The system combines semantic search, vector databases, and Large Language Models (LLMs) to generate contextual threat intelligence responses.
## Features
* Retrieval-Augmented Generation (RAG) architecture
* MITRE ATT&CK Enterprise knowledge integration
* Semantic search using MiniLM embeddings
* FAISS vector database for fast retrieval
* FLAN-T5 for response generation
* Interactive Gradio chatbot interface
* Google Drive storage support for persistent vector database
* Lightweight and Colab-friendly implementation
## Technologies Used
| Technology | Purpose |
| --------------------- | --------------------------- |
| Python | Core development |
| Google Colab | Development environment |
| MITRE ATT&CK | Threat intelligence dataset |
| Sentence Transformers | Embedding generation |
| FAISS | Vector similarity search |
| FLAN-T5 | Language model |
| Gradio | Web interface |
| Pandas | Data processing |
## Project Architecture
User Query
↓
MiniLM Embedding
↓
FAISS Vector Search
↓
Top-K Relevant Chunks
↓
FLAN-T5 LLM
↓
Generated Response
↓
Gradio Interface
## Dataset
This project uses the Enterprise ATT&CK dataset from MITRE ATT&CK.
### Official Dataset Link
https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json
## Installation
### Clone Repository
git clone YOUR_GITHUB_REPOSITORY_LINK
cd YOUR_PROJECT_FOLDER
### Install Dependencies
pip install transformers
pip install sentence-transformers
pip install faiss-cpu
pip install gradio
pip install pandas
## Google Colab Setup
Mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
## Folder Structure
threat_chatbot/
│
├── data/
│ ├── enterprise-attack.json
│ └── mitre_data.csv
│
├── faiss_index/
│ ├── index.faiss
│ └── chunks.pkl
│
├── notebook/
│ └── threat_chatbot.ipynb
│
└── README.md
## Workflow
1. Download MITRE ATT&CK dataset
2. Convert JSON to CSV
3. Chunk the text data
4. Generate embeddings using MiniLM
5. Store embeddings in FAISS
6. Retrieve relevant chunks using semantic similarity
7. Generate contextual responses using FLAN-T5
8. Display results through Gradio UI
## Example Questions
* What is T1059?
* Explain privilege escalation techniques
* How do attackers use PowerShell?
* What techniques are used for persistence?
* Explain phishing-related attack techniques
* What is credential dumping?
* How can lateral movement be detected?
## Models Used
### Embedding Model
all-MiniLM-L6-v2
### LLM
google/flan-t5-base
## Vector Database
This project uses FAISS for efficient similarity search and retrieval.
## User Interface
The chatbot interface is built using Gradio.
Features:
* Interactive chat UI
* Large readable response area
* Easy deployment in Colab
## Future Improvements
* CVE integration
* Threat actor intelligence
* Real-time threat feeds
* LangChain integration
* Memory-enabled chatbot
* Multi-source RAG
* Deployment using Docker or Hugging Face Spaces
## Advantages
* Fast retrieval of cybersecurity knowledge
* Context-aware responses
* Reduced hallucination compared to standalone LLMs
* Lightweight and scalable
* Beginner-friendly implementation
## Conclusion
## References
* [MITRE ATT&CK](https://attack.mitre.org?utm_source=chatgpt.com)
* [FAISS Documentation](https://faiss.ai?utm_source=chatgpt.com)
* [Sentence Transformers](https://www.sbert.net?utm_source=chatgpt.com)
* [Hugging Face Transformers](https://huggingface.co/docs/transformers/index?utm_source=chatgpt.com)
* [Gradio Documentation](https://www.gradio.app?utm_source=chatgpt.com)