praveenrajasneo/dns-tunneling-detection

GitHub: praveenrajasneo/dns-tunneling-detection

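A DNS tunneling detection system that fuses Random Forest and LSTM models, using a five-stage pipeline to efficiently identify covert data exfiltration and C2 communication in DNS traffic.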

Stars: 70 | Forks: 1

# 🔍 DNS Tunneling Attack Detection

![Python](https://img.shields.io/badge/Python-3.8+-blue?style=flat&logo=python) ![TensorFlow](https://img.shields.io/badge/TensorFlow-2.x-orange?style=flat&logo=tensorflow) ![Scikit-learn](https://img.shields.io/badge/Scikit--learn-ML-green?style=flat&logo=scikit-learn) ![Streamlit](https://img.shields.io/badge/Streamlit-Dashboard-red?style=flat&logo=streamlit)

## 📌 Overview

DNS tunneling is a network attack technique that abuses the DNS protocol to exfiltrate data or to maintain a covert C2 (command and control) channel. Traditional signature-based IDSs are not effective at detecting this kind of attack. This project implements a **5-stage detection pipeline** that combines statistical feature engineering, Random Forest and LSTM models, and weighted decision fusion, reaching **97.8% accuracy** with a false positive rate of only **1.4%**.

## 🏗️ System Architecture

```
PCAP / Live DNS Traffic
        ↓
Packet Parser (Scapy)
        ↓
CDN Whitelist Filter
        ↓
Feature Extractor (7 features)
        ↓
Rule Engine (Anomaly Scoring)
        ↓
┌──────────┐   ┌──────────┐
│  Random  │ + │   LSTM   │
│  Forest  │   │   Model  │
└──────────┘   └──────────┘
        ↓
Decision Fusion Layer
        ↓
Risk Classification: High / Medium / Low
        ↓
Streamlit Dashboard
```

## 🔬 The 7 Extracted Statistical Features

| Feature | Description |
|---|---|
| Query length | Length of the DNS query string |
| Entropy | Shannon entropy of the query |
| Subdomain depth | Number of subdomain levels |
| N-gram deviation | Deviation from normal N-gram patterns |
| Digit ratio | Proportion of digit characters in the query |
| Temporal behavior | Query frequency within a time window |
| Response size ratio | Ratio of response size to query size |

## 📊 Results

| Metric | Value |
|---|---|
| Accuracy | **97.8%** |
| Precision | 0.98 |
| Recall | 0.97 |
| False positive rate | **1.4%** |
| Dataset | CIC-Bell-DNS-2021 (120,000 samples) |

**Improvement over baseline models:**

- +3.2% over a standalone Random Forest
- +10% over an entropy-only approach

## 🛠️ Tech Stack

- **Python 3.8+**
- **Scapy** – packet parsing
- **Scikit-learn** – Random Forest classifier
- **TensorFlow / Keras** – LSTM sequence model
- **Streamlit** – real-time monitoring dashboard
- **tldextract** – domain name parsing

## 📁 Project Structure

```
dns-tunneling-detection/
├── src/
│   ├── parser.py              # Packet parsing with Scapy
│   ├── whitelist.py           # CDN whitelist filtering
│   ├── feature_extractor.py   # 7-feature extraction
│   ├── rule_engine.py         # Anomaly rule scoring
│   ├── classifier.py          # RF + LSTM classifiers
│   ├── fusion.py              # Decision fusion layer
│   └── dashboard.py           # Streamlit dashboard
├── models/
│   ├── random_forest.pkl      # Trained RF model
│   └── lstm_model.h5          # Trained LSTM model
├── data/                      # Place dataset here (not tracked)
├── outputs/                   # Results and logs
├── requirements.txt
├── .gitignore
└── README.md
```

## 🚀 Quick Start

### 1. Clone the repository

Replace `YOUR_USERNAME` with the account that hosts the repository.

```
git clone https://github.com/YOUR_USERNAME/dns-tunneling-detection.git
cd dns-tunneling-detection
```

### 2. Install dependencies

```
pip install -r requirements.txt
```

### 3. Add the dataset

Download the [CIC-Bell-DNS-2021](https://www.unb.ca/cic/datasets/dns.html) dataset and place the CSV files in the `data/` folder.

### 4. Run the detection pipeline

```
python src/classifier.py
```

### 5. Launch the dashboard

```
streamlit run src/dashboard.py
```

## 👥 Team

| Name | Role |
|---|---|
| Praveen Raj K | ML pipeline and feature engineering |
| Pranav V | Packet parsing and dashboard |

**Advisor:** Mrs. G. Smilarubavathy

**Institution:** St. Joseph's Institute of Technology, Chennai

## 📚 References

- CIC-Bell-DNS-2021 Dataset, University of New Brunswick
- Bilge et al., 2011: Passive DNS Analysis
- Aiello et al., 2015: Query Length & Frequency
- Sheridan et al., 2015: Entropy-based Detection
- Buczak & Guven, 2016: ML-based IDS
- Nadler et al., 2019: Low-rate DNS Tunneling

## 📄 License

This project is for academic research use only.
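
As an illustration of the feature table and the decision fusion layer described in this README, the sketch below computes four of the seven features (query length, Shannon entropy, subdomain depth, digit ratio) from a single query name and combines two model probabilities into a risk level. It is not the repository's code. The function names `lexical_features`, `fuse`, and `risk_level`, the equal fusion weights, and the 0.7 / 0.4 thresholds are hypothetical choices made for this example. The subdomain depth also assumes a two-label registered domain, whereas the project lists tldextract for domain parsing.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(qname: str) -> dict:
    """Compute four per-query features from a DNS query name."""
    name = qname.rstrip(".").lower()
    labels = name.split(".") if name else []
    # Assumption: the registered domain is the last two labels
    # (e.g. example.com); multi-label suffixes like co.uk need a
    # public-suffix-aware parser instead.
    subdomain_depth = max(len(labels) - 2, 0)
    digits = sum(ch.isdigit() for ch in name)
    return {
        "query_length": len(name),
        "entropy": shannon_entropy(name),
        "subdomain_depth": subdomain_depth,
        "digit_ratio": digits / len(name) if name else 0.0,
    }


def fuse(p_rf: float, p_lstm: float, w_rf: float = 0.5) -> float:
    """Weighted average of two tunneling probabilities (weight is hypothetical)."""
    return w_rf * p_rf + (1.0 - w_rf) * p_lstm


def risk_level(score: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a fused score to High / Medium / Low (thresholds are hypothetical)."""
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    features = lexical_features("a1b2c3d4e5f6.data.example.com.")
    print(features)
    print(risk_level(fuse(p_rf=0.9, p_lstm=0.8)))
```

Long, high-entropy, digit-heavy subdomains tend to score higher on these features than ordinary hostnames, which is why they are useful inputs to the classifiers. In practice the weights and thresholds would be tuned on validation data rather than fixed by hand.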

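Tags: Apex, DNS tunneling detection, Kubernetes, LSTM, PCAP analysis, Python, Scapy, Scikit-learn, Streamlit, TensorFlow, beacon analysis, intrusion detection system, command and control, threat intelligence, security data lake, developer tools, anomaly detection, data loss prevention, backdoor-free, machine learning, model fusion, deep learning, feature engineering, network security, network probing, access control, reverse engineering tools, random forest, privacy protection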