GitHub: gR3nn/ai-log-anomaly-investigator
# AI Log Anomaly Investigator
`ai-log-anomaly-investigator` is a local-first SOC analysis demo built with Python, Streamlit, scikit-learn, and Hugging Face Transformers. It shows how rule-based detections, anomaly detection, and local DistilBERT inference can work together in a security workflow without any external LLM API dependency.
## Overview
The project analyzes synthetic security logs and surfaces:
- Rule-based detections for common SOC scenarios such as brute force, suspicious PowerShell, encoded commands, impossible travel, suspicious DNS, privilege escalation, port scanning, and possible data exfiltration
- Isolation Forest anomaly detection using user and host behavior features
- Local DistilBERT classification of individual events as benign or malicious
- MITRE ATT&CK mapping and deterministic SOC report generation
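To make the rule-based side concrete, here is a minimal sketch of what a brute force rule with MITRE ATT&CK context could look like. It is not the project's actual `src/rule_engine.py`: the field names (`timestamp`, `user`), the threshold, the window size, and the alert layout are all assumptions chosen for illustration. Only the `login_failed` event type and the `brute_force` scenario name come from this README, and T1110 is the ATT&CK technique ID for Brute Force.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical threshold and window; the project's real rule may use other values.
FAILED_LOGIN_THRESHOLD = 5
WINDOW = timedelta(minutes=5)


def detect_brute_force(events):
    """Return one alert per user whose failed logins reach the threshold within WINDOW."""
    recent = defaultdict(deque)  # user -> timestamps of recent failed logins
    alerted = set()
    alerts = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["event_type"] != "login_failed":
            continue
        user = event["user"]
        window = recent[user]
        window.append(event["timestamp"])
        # Drop failures that have fallen out of the sliding window.
        while event["timestamp"] - window[0] > WINDOW:
            window.popleft()
        if len(window) >= FAILED_LOGIN_THRESHOLD and user not in alerted:
            alerted.add(user)
            alerts.append({
                "rule": "brute_force",
                "user": user,
                "evidence": f"{len(window)} failed logins within {WINDOW}",
                "mitre_technique": "T1110",  # ATT&CK: Brute Force
            })
    return alerts


# Made-up sample events: six quick failures for alice, one failure for bob.
start = datetime(2024, 1, 1, 9, 0, 0)
sample_events = [
    {"timestamp": start + timedelta(seconds=20 * i), "user": "alice", "event_type": "login_failed"}
    for i in range(6)
] + [
    {"timestamp": start, "user": "bob", "event_type": "login_failed"},
    {"timestamp": start + timedelta(minutes=1), "user": "bob", "event_type": "login_success"},
]
alerts = detect_brute_force(sample_events)
print(alerts)
```

Keeping the evidence string and technique ID inside the alert is one way to make each detection explainable without consulting a model.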
## Tech Stack
- Python
- Streamlit
- pandas
- scikit-learn
- Hugging Face Transformers
- PyTorch
## Project Flow
1. `generate_synthetic_logs.py` creates a reproducible SOC dataset at `data/synthetic_security_logs.csv`
2. `src/rule_engine.py` generates explainable alerts with evidence and MITRE ATT&CK context
3. `src/anomaly_detector.py` builds behavior features and runs Isolation Forest
4. `src/transformer_classifier.py` loads a local fine-tuned DistilBERT model from `models/security-distilbert`
5. `src/report_generator.py` produces a local SOC investigation report
6. `app.py` presents the workflow in Streamlit
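Step 3 can be illustrated with a small, self-contained Isolation Forest example. This is a sketch under stated assumptions, not the contents of `src/anomaly_detector.py`: the three features (failed logins, distinct hosts, outbound megabytes), the generated values, and the `contamination` setting are hypothetical. The scikit-learn calls themselves (`IsolationForest`, `fit_predict`, `score_samples`) are standard.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical per-user behavior features:
# failed logins, distinct hosts contacted, outbound megabytes.
normal = np.column_stack([
    rng.poisson(1, size=60),
    rng.integers(1, 4, size=60),
    rng.normal(20, 5, size=60),
])
outlier = np.array([[40, 25, 900.0]])  # one deliberately extreme user
features = np.vstack([normal, outlier])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = model.fit_predict(features)    # -1 marks an anomaly, 1 marks normal
scores = model.score_samples(features)  # lower scores are more anomalous

most_anomalous = int(np.argmin(scores))
print(most_anomalous, labels[most_anomalous])
```

Fixing `random_state` keeps the results repeatable between runs, which fits a demo built around a reproducible dataset.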
## Quick Start
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python generate_synthetic_logs.py
    python train_transformer.py
    streamlit run app.py
## Main Commands
Generate the dataset:

    python generate_synthetic_logs.py

Train the local Transformer model:

    python train_transformer.py

Run the Streamlit app:

    streamlit run app.py

Run tests:

    pytest
## Dataset
The synthetic dataset includes:
- Event types: `login_success`, `login_failed`, `process_start`, `network_connection`, `dns_query`, `file_access`, `privilege_escalation`, `cloud_login`, `mfa_failure`, `suspicious_email`
- Labels: `benign`, `malicious`
- Attack scenarios: `brute_force`, `successful_login_after_failures`, `impossible_travel`, `suspicious_powershell`, `encoded_command`, `port_scan`, `data_exfiltration`, `suspicious_dns`, `privilege_escalation`, `malicious_email_link`, `benign_activity`
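As a rough illustration of how a reproducible labeled dataset of this shape might be produced, the sketch below uses a seeded random generator and writes CSV text. It is not the project's `generate_synthetic_logs.py`. The event types, labels, and scenario names are taken from the lists above, while the column names, the pairing of scenarios with event types, and the malicious rate are assumptions.

```python
import csv
import io
import random

# Event types and scenario names come from the lists above; pairing scenarios
# with event types and the column names below are assumptions.
BENIGN_EVENT_TYPES = ["login_success", "process_start", "dns_query", "file_access"]
MALICIOUS_SCENARIOS = {
    "brute_force": "login_failed",
    "suspicious_dns": "dns_query",
    "port_scan": "network_connection",
}
COLUMNS = ["event_id", "user", "event_type", "label", "attack_scenario"]


def generate_rows(n, seed=42, malicious_rate=0.2):
    """Generate n labeled rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    rows = []
    for event_id in range(n):
        user = f"user{rng.randint(1, 5)}"
        if rng.random() < malicious_rate:
            scenario = rng.choice(sorted(MALICIOUS_SCENARIOS))
            row = [event_id, user, MALICIOUS_SCENARIOS[scenario], "malicious", scenario]
        else:
            row = [event_id, user, rng.choice(BENIGN_EVENT_TYPES), "benign", "benign_activity"]
        rows.append(dict(zip(COLUMNS, row)))
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate_rows(100)
print(to_csv(rows[:3]))
```

Using a dedicated `random.Random(seed)` instance instead of the module-level functions means other code that touches the global random state cannot change the generated data.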
## Screenshots
### Dashboard And Controls

### Transformer Classifier
The Transformer Classifier section shows local DistilBERT inference over a selected security event.

### Anomaly Detection

### SOC Report Export

## Notes
- This is an educational SOC AI prototype, not a production detection system.
- The dataset is synthetic, so model behavior and alert quality are limited by generated examples.
- If the Transformer model has not been trained yet, the app still runs and shows training instructions instead of crashing.
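The fallback described in the last note could be approached along these lines. This is a hedged sketch, not the code in `app.py` or `src/transformer_classifier.py`: the function name `load_classifier`, the returned tuple, and the hint text are hypothetical. The check relies on the fact that a model saved with `save_pretrained` includes a `config.json`, and `pipeline("text-classification", ...)` is the standard Transformers entry point for sequence classification.

```python
from pathlib import Path

MODEL_DIR = Path("models/security-distilbert")  # model path named in the project flow
TRAINING_HINT = "Model not found. Run `python train_transformer.py` to train it first."


def load_classifier(model_dir=MODEL_DIR):
    """Return (classifier, message); classifier is None when no trained model exists."""
    model_dir = Path(model_dir)
    if not (model_dir / "config.json").is_file():
        return None, TRAINING_HINT
    # Imported lazily so the rest of the app works without loading Transformers.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model=str(model_dir),
        tokenizer=str(model_dir),
    )
    return classifier, "Model loaded."


# With a directory that does not exist, the caller gets a hint instead of an exception.
classifier, message = load_classifier("models/does-not-exist")
if classifier is None:
    print(message)
```

A UI layer can then branch on `classifier is None` and render the message, so a missing model never stops the other analysis sections from loading.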