assia-ahmedabdi/BrakTooth-Attack-Classification

# 🛡️ BrakTooth Attack Type Classification

## 📖 Description

The BrakTooth family of vulnerabilities targets Bluetooth Classic devices by crashing, deadlocking, or in some cases executing arbitrary code. These attacks specifically exploit the **LMP and baseband layers** of the Bluetooth Classic stack.

This project builds a **multi-class classification model** to identify which type of BrakTooth attack is occurring, or whether traffic is normal, based solely on packet-level features. Early and accurate identification of attack types enables faster incident response, more targeted patching, and a deeper understanding of Bluetooth threat patterns.

The [notebook](https://www.kaggle.com/code/spredisbread/type-of-attack-detection) runs on **Kaggle**. The dataset contains **6,402 labeled Bluetooth packets** across **12 classes** (11 attack types + normal traffic), sourced from the ISOT BrakTooth Attack Dataset.

## 🗂️ Dataset

**Source:** [ISOT BrakTooth Attack Dataset](https://www.kaggle.com/datasets/detecting-braktooth-attacks) on Kaggle

| Feature | Description | Type |
|---|---|---|
| `Protocol` | Bluetooth protocol used (L2CAP, OBEX, SDP, RFCOMM, LMP, HCI...) | Categorical |
| `Info` | Additional packet information depending on the protocol | Categorical |
| `Length` | Packet length in bytes | Integer |
| `Delta` | Time difference from the previous packet (seconds) | Float |
| `Type` | Attack type label, the target variable (12 classes) | Categorical |

**Classes (11 attack types + `normal`):** `au_rand_flooding`, `duplicated_encapsulated_payload`, `duplicated_iocap`, `feature_response_flooding`, `invalid_feature_page_execution`, `invalid_setup_complete`, `invalid_timing_accuracy`, `lmp_auto_rate_overflow`, `lmp_overflow_dm1`, `truncated_lmp_accepted`, `truncated_sco_link_request`, `normal`

## 🛠️ Technologies

| Layer | Tools |
|---|---|
| **Language** | Python 3 |
| **Data Processing** | Pandas, NumPy |
| **Visualization** | Matplotlib, Seaborn |
| **Feature Engineering** | Scikit-learn (StandardScaler, MinMaxScaler, One-Hot Encoding) |
| **ML Models** | Scikit-learn, XGBoost, LightGBM |
| **Hyperparameter Tuning** | GridSearchCV |
| **Platform** | Kaggle Notebooks |

## ⚙️ Process

### 1. 🔍 Exploratory Data Analysis

Loading and inspecting the training set (6,402 samples, 5 columns: 4 input features plus the `Type` target). Identifying the class distribution (12 unique classes), detecting outliers in `Length` and `Delta`, and characterizing the 308 unique `Info` values.

### 2. 🧹 Data Preprocessing

- Outlier removal based on `Delta` range
- Simplification of the high-cardinality `Info` variable into broader categories (Sent, Rcvd, LMP, Configure, Connection, Disconnection...)
- Deskewing the `Delta` feature using quantile transformation
- Binning (`Delta` bucketing into `qtDelta`)
- One-hot encoding of categorical features (`Protocol`, `Info`)
- Standardization of numerical features

### 3. 🔧 Feature Engineering

Construction of 17 engineered features from the base numerical columns, including exponential transforms, power transforms, and interaction terms (`Delta*Length`, `qtDelta+Length`, `Length_on_Delta`, etc.), followed by feature selection to retain the 21 most informative.

### 4. 🤖 Model Selection

- Broad benchmarking with **LazyPredict** across 26 classifiers
- Top performers: `ExtraTreesClassifier` (78% accuracy), `RandomForestClassifier` (79%), `LGBMClassifier` (79%)
- Additional testing of **One-vs-Rest** strategies (KNN, Decision Tree, Random Forest, Bagging, SVM, SGD)
- Final model selection via **GridSearchCV** on the top 3 candidates

### 5. 🎯 Prediction

Final predictions generated on `X_test.csv` using the best-tuned model.

## 📊 Results

| Model | Accuracy | Balanced Accuracy | F1 Score |
|---|---|---|---|
| ExtraTreesClassifier | 0.78 | 0.37 | 0.78 |
| RandomForestClassifier | 0.79 | 0.36 | 0.78 |
| LGBMClassifier | 0.79 | 0.36 | 0.78 |
| XGBClassifier | 0.78 | 0.35 | 0.78 |
| KNeighborsClassifier | 0.74 | 0.28 | 0.72 |

## 🔭 Future Work

- **Class imbalance handling:** Apply SMOTE or class-weighted loss to improve recall on minority attack types
- **Deep learning:** Explore LSTM or transformer-based models for temporal packet sequence modeling
- **Real-time detection:** Deploy the model as a streaming classifier on live Bluetooth traffic

## 🏷️ Tags

`bluetooth` `braktooth` `cybersecurity` `attack-detection` `network-security` `intrusion-detection` `classification` `machine-learning` `scikit-learn` `xgboost` `lightgbm` `feature-engineering` `multiclass-classification` `imbalanced-data` `python` `kaggle` `isot-dataset` `lmp` `bluetooth-classic` `anomaly-detection`
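
## 🧪 Example: Accuracy vs. Balanced Accuracy

The Results table reports accuracy near 0.79 alongside balanced accuracy near 0.36. The sketch below shows, on a tiny made-up label set, how such a gap can arise when one class dominates. Balanced accuracy is the unweighted mean of per-class recall, the same quantity scikit-learn's `balanced_accuracy_score` computes by default. The labels and the "always predict the majority class" model are hypothetical illustrations, not data or models from this project; the helper names `accuracy` and `balanced_accuracy` are likewise introduced only for this example.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)      # number of true samples per class
    hits = defaultdict(int)        # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    recalls = [hits[label] / n for label, n in support.items()]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (NOT taken from the ISOT dataset): one dominant
# class and two rare classes, reusing class names from the README.
y_true = ["normal"] * 8 + ["lmp_overflow_dm1"] + ["duplicated_iocap"]

# Hypothetical classifier that always predicts the dominant class.
y_pred = ["normal"] * 10

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced_accuracy={bal_acc:.2f}")
# prints: accuracy=0.80, balanced_accuracy=0.33
```

In this toy case the classifier is right on 8 of 10 packets yet recovers none of the rare classes, so recall is 1.0 for one class and 0.0 for the other two. A similar pattern would be consistent with the gap in the Results table and with the class imbalance item under Future Work, although the per-class recalls of the actual models are not reported here.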