Gabi-comm/Network-Log-RAG-Chatbot
GitHub: Gabi-comm/Network-Log-RAG-Chatbot
Stars: 0 | Forks: 0
# Network-Log-RAG-Chatbot
An intelligent log analysis system that uses embeddings to index network logs into a vector database (ChromaDB), allowing a local Language Model chatbot (Ollama) to query, analyze, and detect security anomalies.
# Embedded Network Log Model with LLM Chatbot
An intelligent, privacy-first network log analysis system. This project implements a Retrieval-Augmented Generation (RAG) pipeline that embeds raw network security logs, stores them in a vector database, and pairs them with a local Large Language Model (LLM) chatbot for interactive, natural-language security analysis.
## Features
* **Log Embedding & Ingestion:** Uses Hugging Face embedding models to transform unstructured network logs into dense vector representations.
* **Vector Storage (ChromaDB):** Stores embedded log data locally for rapid, semantic similarity search.
* **Local LLM Integration (Ollama):** Queries the embedded logs using an open-source, local language model to ensure data privacy and security.
* **Interactive Chat Interface:** Allows security analysts to query complex network behavior, trace anomalies, and generate incident summaries using natural language.
## Tech Stack
* **Language:** Python 3.14+
* **Framework:** LangChain
* **Embeddings:** Hugging Face (`langchain-huggingface`)
* **Vector DB:** ChromaDB
* **Local LLM Engine:** Ollama
## Project Structure
The project is organized into modular scripts representing distinct phases of a Retrieval-Augmented Generation (RAG) pipeline:
* **`HR_Policies.txt`**: The raw text dataset containing simulated policy information and hidden PII.
* **`redact.py`**: Pre-processing script used to scan the raw text, detect sensitive information (like SSNs or emails), and clean it.
* **`embed.py`**: Loads the sanitized data, splits it into digestible text chunks, generates vector embeddings, and stores them in ChromaDB.
* **`retriever.py`**: Handles semantic search logic-taking a user query and fetching the most relevant document chunks from the vector database.
* **`chatbot.py`**: The final execution script that wraps the retriever and links it to Ollama, providing an interactive natural language chat interface.