assia-ahmedabdi/BrakTooth-Attack-Classification
# 🛡️ BrakTooth Attack Type Classification
## 📖 Description
The BrakTooth family of vulnerabilities affects Bluetooth Classic devices: an attacker can crash or deadlock a target and, in some cases, execute arbitrary code on it. These attacks specifically exploit the **LMP and baseband layers** of the Bluetooth Classic stack.
This project builds a **multi-class classification model** to identify which type of BrakTooth attack is occurring — or whether traffic is normal — based solely on packet-level features. Early and accurate identification of attack types enables faster incident response, more targeted patching, and a deeper understanding of Bluetooth threat patterns.
The [notebook](https://www.kaggle.com/code/spredisbread/type-of-attack-detection) runs on **Kaggle**.
The dataset contains **6,402 labeled Bluetooth packets** across **12 classes** (11 attack types + normal traffic), sourced from the ISOT BrakTooth Attack Dataset.
## 🗂️ Dataset
**Source:** [ISOT BrakTooth Attack Dataset](https://www.kaggle.com/datasets/detecting-braktooth-attacks) on Kaggle
| Feature | Description | Type |
|---|---|---|
| `Protocol` | Bluetooth protocol used (L2CAP, OBEX, SDP, RFCOMM, LMP, HCI...) | Categorical |
| `Info` | Additional packet information depending on the protocol | Categorical |
| `Length` | Packet length in bytes | Integer |
| `Delta` | Time difference from the previous packet (seconds) | Float |
| `Type` | Attack type label — target variable (12 classes) | Categorical |
**Attack classes:**
`au_rand_flooding`, `duplicated_encapsulated_payload`, `duplicated_iocap`, `feature_response_flooding`, `invalid_feature_page_execution`, `invalid_setup_complete`, `invalid_timing_accuracy`, `lmp_auto_rate_overflow`, `lmp_overflow_dm1`, `truncated_lmp_accepted`, `truncated_sco_link_request`, `normal`
## 🛠️ Technologies
| Layer | Tools |
|---|---|
| **Language** | Python 3 |
| **Data Processing** | Pandas, NumPy |
| **Visualization** | Matplotlib, Seaborn |
| **Feature Engineering** | Scikit-learn (StandardScaler, MinMaxScaler, One-Hot Encoding) |
| **ML Models** | Scikit-learn, XGBoost, LightGBM |
| **Hyperparameter Tuning** | GridSearchCV |
| **Platform** | Kaggle Notebooks |
## ⚙️ Process
### 1. 🔍 Exploratory Data Analysis
Loading and inspecting the training set (6,402 samples, 5 columns: 4 input features plus the `Type` label). This step covers the class distribution across the 12 classes (11 attack types + normal), outlier detection on `Length` and `Delta`, and characterization of the 308 unique `Info` values.
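The checks in this step can be sketched with pandas. This is a minimal illustration on a tiny hand-made frame, not the notebook's code: the sample rows, the `iqr_outliers` helper, and the IQR rule itself are assumptions, since the README does not say which outlier criterion was used.

```python
import pandas as pd

# Tiny stand-in for the training set (the real one has 6,402 rows).
df = pd.DataFrame({
    "Protocol": ["LMP", "LMP", "L2CAP", "HCI", "LMP", "SDP"],
    "Info": ["LMP_au_rand", "LMP_au_rand", "Sent Connection Request",
             "Rcvd Disconnect Complete", "LMP_features_res", "Sent Service Search"],
    "Length": [24, 24, 17, 7, 30, 22],
    "Delta": [0.0004, 0.0003, 0.0120, 2.5000, 0.0009, 0.0150],
    "Type": ["au_rand_flooding", "au_rand_flooding", "normal",
             "normal", "feature_response_flooding", "normal"],
})

class_counts = df["Type"].value_counts()   # samples per class
info_cardinality = df["Info"].nunique()    # number of distinct Info strings


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical criterion)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


delta_outliers = iqr_outliers(df["Delta"])

print(class_counts)
print("unique Info values:", info_cardinality)
print("Delta outliers flagged:", int(delta_outliers.sum()))
```

On the real data, `class_counts` is the quickest way to see how unevenly the 12 classes are represented.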
### 2. 🧹 Data Preprocessing
- Outlier removal based on `Delta` range
- Simplification of the high-cardinality `Info` variable into broader categories (Sent, Rcvd, LMP, Configure, Connection, Disconnection...)
- Deskewing the `Delta` feature using quantile transformation
- Binning of `Delta` into buckets (stored as `qtDelta`)
- One-hot encoding of categorical features (`Protocol`, `Info`)
- Standardization of numerical features
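The preprocessing steps above can be sketched roughly as follows. Everything not named in the list is an assumption made for illustration: the keyword rules in `INFO_RULES`, the `delta_max` cutoff, the `Delta_qt` column name, the number of bins, and the use of `pd.qcut` for bucketing. The notebook's actual choices may differ.

```python
import pandas as pd
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Hypothetical keyword -> category rules, checked in order (case-insensitive).
# "disconnect" must come before "connect" because it contains that substring.
INFO_RULES = [
    ("disconnect", "Disconnection"),
    ("connect", "Connection"),
    ("configure", "Configure"),
    ("lmp", "LMP"),
    ("sent", "Sent"),
    ("rcvd", "Rcvd"),
]


def simplify_info(info: str) -> str:
    """Collapse a raw Info string into a broad category."""
    text = info.lower()
    for keyword, category in INFO_RULES:
        if keyword in text:
            return category
    return "Other"


def preprocess(df: pd.DataFrame, delta_max: float, n_bins: int = 4) -> pd.DataFrame:
    # 1. Outlier removal based on the Delta range (cutoff is an assumption).
    out = df[df["Delta"] <= delta_max].copy()
    # 2. Reduce the high-cardinality Info column.
    out["Info"] = out["Info"].map(simplify_info)
    # 3. Deskew Delta with a quantile transform.
    qt = QuantileTransformer(n_quantiles=min(len(out), 1000),
                             output_distribution="normal", random_state=0)
    out["Delta_qt"] = qt.fit_transform(out[["Delta"]]).ravel()
    # 4. Bucket Delta into quantile bins.
    out["qtDelta"] = pd.qcut(out["Delta"], q=n_bins, labels=False, duplicates="drop")
    # 5. One-hot encode the categorical columns.
    out = pd.get_dummies(out, columns=["Protocol", "Info"])
    # 6. Standardize the numerical columns.
    num_cols = ["Length", "Delta_qt"]
    scaled = StandardScaler().fit_transform(out[num_cols])
    for i, col in enumerate(num_cols):
        out[col] = scaled[:, i]
    return out


raw = pd.DataFrame({
    "Protocol": ["LMP", "LMP", "L2CAP", "HCI", "LMP", "SDP"],
    "Info": ["LMP_au_rand", "LMP_au_rand", "Sent Connection Request",
             "Rcvd Disconnect Complete", "LMP_features_res", "Sent Service Search"],
    "Length": [24, 24, 17, 7, 30, 22],
    "Delta": [0.0004, 0.0003, 0.0120, 2.5000, 0.0009, 0.0150],
})
clean = preprocess(raw, delta_max=1.0)
print(clean.columns.tolist())
```

For brevity the sketch fits the transformers on the whole frame. In a train/test setting they would normally be fitted on the training split only and then applied to `X_test.csv`.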
### 3. 🔧 Feature Engineering
Construction of 17 engineered features from the base numerical columns, including exponential transforms, power transforms, and interaction terms (`Delta*Length`, `qtDelta+Length`, `Length_on_Delta`, etc.). Feature selection then keeps the 21 most informative features from the resulting feature set.
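A few of these features can be sketched as below. Only the three interaction names come from this README. The reading of `Length_on_Delta` as a ratio, the `exp_neg_Delta` and `Length_pow2` columns, the `eps` guard, the synthetic data, and the use of `SelectKBest` with `f_classif` are all assumptions, since the selection method is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif


def add_engineered_features(df: pd.DataFrame, eps: float = 1e-9) -> pd.DataFrame:
    out = df.copy()
    # Interaction terms named in the README.
    out["Delta*Length"] = out["Delta"] * out["Length"]
    out["qtDelta+Length"] = out["qtDelta"] + out["Length"]
    # Assumed to be a ratio; eps avoids division by zero when Delta == 0.
    out["Length_on_Delta"] = out["Length"] / (out["Delta"] + eps)
    # Hypothetical examples of an exponential and a power transform.
    out["exp_neg_Delta"] = np.exp(-out["Delta"])
    out["Length_pow2"] = out["Length"] ** 2
    return out


# Synthetic stand-in data, for the demo only.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Length": rng.integers(5, 60, size=n).astype(float),
    "Delta": rng.exponential(0.01, size=n),
    "qtDelta": rng.integers(0, 4, size=n).astype(float),
})
labels = (demo["Length"] > 30).astype(int)  # synthetic target

features = add_engineered_features(demo)

# Keep the k highest-scoring features (k=21 in the project; 5 here).
selector = SelectKBest(score_func=f_classif, k=5).fit(features, labels)
selected = features.columns[selector.get_support()].tolist()
print(selected)
```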
### 4. 🤖 Model Selection
- Broad benchmarking with **LazyPredict** across 26 classifiers
- Top performers: `ExtraTreesClassifier` (78% accuracy), `RandomForestClassifier` (79%), `LGBMClassifier` (79%)
- Additional testing of **One-vs-Rest** strategies (KNN, Decision Tree, Random Forest, Bagging, SVM, SGD)
- Final model selection via **GridSearchCV** on the top 3 candidates
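The final tuning step could look roughly like this. It is a sketch on synthetic data, not the notebook's search: the parameter grids, the 3-fold stratified split, and the `balanced_accuracy` scoring are assumptions, and `LGBMClassifier` is left out only to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced 3-class problem standing in for the packet features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

# Hypothetical grids; the README does not list the values that were searched.
candidates = {
    "ExtraTrees": (ExtraTreesClassifier(random_state=0),
                   {"n_estimators": [50, 100], "max_depth": [None, 10]}),
    "RandomForest": (RandomForestClassifier(random_state=0),
                     {"n_estimators": [50, 100], "max_depth": [None, 10]}),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="balanced_accuracy", cv=cv)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

best_name = max(results, key=lambda k: results[k][0])
print(best_name, results[best_name])
```

Scoring on balanced accuracy rather than plain accuracy is a design choice for imbalanced classes. Switching `scoring` to `"accuracy"` would rank candidates by the metric quoted in the bullet list above.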
### 5. 🎯 Prediction
Final predictions generated on `X_test.csv` using the best-tuned model.
## 📊 Results
| Model | Accuracy | Balanced Accuracy | F1 Score |
|---|---|---|---|
| ExtraTreesClassifier | 0.78 | 0.37 | 0.78 |
| RandomForestClassifier | 0.79 | 0.36 | 0.78 |
| LGBMClassifier | 0.79 | 0.36 | 0.78 |
| XGBClassifier | 0.78 | 0.35 | 0.78 |
| KNeighborsClassifier | 0.74 | 0.28 | 0.72 |
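The gap between accuracy (about 0.78) and balanced accuracy (about 0.36) is what one would expect when a few large classes dominate the data. Balanced accuracy is the mean of per-class recall, so poorly detected minority classes pull it down even when most packets are labeled correctly. The toy example below uses made-up labels to show the effect. It is not derived from the project's predictions.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Made-up example: 90 normal packets and two rare attack classes.
y_true = ["normal"] * 90 + ["au_rand_flooding"] * 5 + ["duplicated_iocap"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["normal"] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
# prints "accuracy=0.90, balanced accuracy=0.33"
```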
## 🔭 Future Work
- **Class imbalance handling** — Apply SMOTE or class-weighted loss to improve recall on minority attack types
- **Deep learning** — Explore LSTM or transformer-based models for temporal packet sequence modeling
- **Real-time detection** — Deploy the model as a streaming classifier on live Bluetooth traffic
## 🏷️ Tags
`bluetooth` `braktooth` `cybersecurity` `attack-detection` `network-security` `intrusion-detection` `classification` `machine-learning` `scikit-learn` `xgboost` `lightgbm` `feature-engineering` `multiclass-classification` `imbalanced-data` `python` `kaggle` `isot-dataset` `lmp` `bluetooth-classic` `anomaly-detection`