gR3nn/ai-log-anomaly-investigator

GitHub: gR3nn/ai-log-anomaly-investigator

# AI Log Anomaly Investigator

`ai-log-anomaly-investigator` is a local-first SOC analysis demo built with Python, Streamlit, scikit-learn, and Hugging Face Transformers. It shows how rule-based detections, anomaly detection, and local DistilBERT inference can work together in a security workflow without any external LLM API dependency.

## Overview

The project analyzes synthetic security logs and surfaces:

- Rule-based detections for common SOC scenarios such as brute force, suspicious PowerShell, encoded commands, impossible travel, suspicious DNS, privilege escalation, port scanning, and possible data exfiltration
- Isolation Forest anomaly detection using user and host behavior features
- Local DistilBERT classification for benign vs malicious event inference
- MITRE ATT&CK mapping and deterministic SOC report generation

## Tech Stack

- Python
- Streamlit
- pandas
- scikit-learn
- Hugging Face Transformers
- PyTorch

## Project Flow

1. `generate_synthetic_logs.py` creates a reproducible SOC dataset at `data/synthetic_security_logs.csv`
2. `src/rule_engine.py` generates explainable alerts with evidence and MITRE ATT&CK context
3. `src/anomaly_detector.py` builds behavior features and runs Isolation Forest
4. `src/transformer_classifier.py` loads a local fine-tuned DistilBERT model from `models/security-distilbert`
5. `src/report_generator.py` produces a local SOC investigation report
6. `app.py` presents the workflow in Streamlit

## Quick Start

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python generate_synthetic_logs.py
    python train_transformer.py
    streamlit run app.py

## Main Commands

Generate the dataset:

    python generate_synthetic_logs.py

Train the local Transformer model:

    python train_transformer.py

Run the Streamlit app:

    streamlit run app.py

Run tests:

    pytest

## Dataset

The synthetic dataset includes:

- Event types: `login_success`, `login_failed`, `process_start`, `network_connection`, `dns_query`, `file_access`, `privilege_escalation`, `cloud_login`, `mfa_failure`, `suspicious_email`
- Labels: `benign`, `malicious`
- Attack scenarios: `brute_force`, `successful_login_after_failures`, `impossible_travel`, `suspicious_powershell`, `encoded_command`, `port_scan`, `data_exfiltration`, `suspicious_dns`, `privilege_escalation`, `malicious_email_link`, `benign_activity`

## Screenshots

### Dashboard And Controls

![Dashboard and Controls](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/b779396f41233205.png)

### Transformer Classifier

The Transformer Classifier section shows local DistilBERT inference over a selected security event.

![Transformer Classifier](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/73e3b2092d233205.png)

### Anomaly Detection

![Anomaly Detection](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/191bb94746233206.png)

### SOC Report Export

![SOC Report Export](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/5e35437b0d233206.png)

## Notes

- This is an educational SOC AI prototype, not a production detection system.
- The dataset is synthetic, so model behavior and alert quality are limited by generated examples.
- The app still runs if the Transformer model has not been trained yet. In that case it shows training instructions instead of crashing.
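
## Illustrative Rule Sketch

To make the idea of an explainable, rule-based brute force detection with MITRE ATT&CK context more concrete, here is a minimal sketch using only the standard library. It is not the code in `src/rule_engine.py`. The function name `detect_brute_force`, the field names `timestamp`, `user`, and `source_ip`, and the threshold of 5 failures inside a 5-minute window are all assumptions made for illustration; only the `login_failed` and `login_success` event types and the `brute_force` scenario name come from the dataset description above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_brute_force(events, threshold=5, window=timedelta(minutes=5)):
    """Return one alert per (user, source_ip) pair that reaches `threshold`
    failed logins inside a sliding `window`.

    Field names are hypothetical; the real CSV schema may differ.
    """
    recent = defaultdict(deque)  # (user, source_ip) -> recent failure times
    alerted = set()              # pairs that already produced an alert
    alerts = []

    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["event_type"] != "login_failed":
            continue

        key = (event["user"], event["source_ip"])
        times = recent[key]
        times.append(event["timestamp"])

        # Drop failures that have fallen out of the sliding window.
        while event["timestamp"] - times[0] > window:
            times.popleft()

        if len(times) >= threshold and key not in alerted:
            alerted.add(key)
            alerts.append({
                "rule": "brute_force",
                "user": key[0],
                "source_ip": key[1],
                "failed_attempts": len(times),  # evidence for the analyst
                "first_seen": times[0],
                "last_seen": times[-1],
                "mitre_technique": "T1110",     # ATT&CK: Brute Force
            })

    return alerts


# Hand-written sample events (not taken from the project's dataset).
start = datetime(2026, 1, 1, 9, 0, 0)
sample_events = [
    {"timestamp": start + timedelta(seconds=20 * i), "user": "alice",
     "source_ip": "203.0.113.7", "event_type": "login_failed"}
    for i in range(6)
] + [
    {"timestamp": start + timedelta(minutes=1), "user": "bob",
     "source_ip": "198.51.100.4", "event_type": "login_failed"},
    {"timestamp": start + timedelta(minutes=2), "user": "bob",
     "source_ip": "198.51.100.4", "event_type": "login_success"},
]

alerts = detect_brute_force(sample_events)
for alert in alerts:
    print(alert["rule"], alert["user"], alert["source_ip"],
          alert["failed_attempts"], alert["mitre_technique"])
```

With the sample events, only the `alice` / `203.0.113.7` pair crosses the threshold; the single failed login for `bob` does not trigger an alert. Keeping the evidence fields (`failed_attempts`, `first_seen`, `last_seen`) on the alert is one way to keep a rule explainable, since the analyst can see why it fired.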