moDgass/Alzheimer-s-Detection
GitHub: moDgass/Alzheimer-s-Detection
Stars: 0 | Forks: 0
# Alzheimer's Disease Detection using Machine Learning
## Project Overview
This project implements a machine learning model to detect and classify Alzheimer's disease stages using clinical data. The model leverages diagnostic information, cognitive scores, and biomarker data to provide accurate disease classification.
## Objective
The goal of this project is to:
- Build a predictive model for Alzheimer's disease classification
- Process and merge multi-source clinical datasets
- Standardize and encode features for machine learning
- Train a Random Forest Classifier for accurate diagnosis prediction
## Dataset
The project uses clinical data containing:
- **Diagnosis Target**: Actual diagnoses of patients
- **Cognitive Scores**: Cognitive assessment results
- **Biomarkers & Data**: Clinical measurements and imaging data (FDG-PET, etc.)
The data is merged on patient identifiers (RID) to create a comprehensive feature set.
## Disease Classification
The model predicts three categories:
- **CN**: Cognitively Normal
- **LMCI**: Late Mild Cognitive Impairment
- **AD**: Alzheimer's Disease
## Technologies & Libraries
- **Python 3.x**
- **pandas**: Data manipulation and merging
- **numpy**: Numerical computations
- **scikit-learn**: Machine learning algorithms
- StandardScaler: Feature scaling
- train_test_split: Data splitting
- RandomForestClassifier: Classification model
- LabelEncoder: Categorical encoding
- **matplotlib**: Data visualization
- **openpyxl**: Excel file handling
## Project Workflow
1. **Data Loading**: Load multiple sheets from Excel file
2. **Data Cleaning**: Remove missing values
3. **Data Integration**: Merge datasets on patient ID
4. **Feature Scaling**: Standardize numerical features
5. **Feature Engineering**: Encode categorical variables
6. **Model Training**: Train Random Forest Classifier
7. **Evaluation**: Assess model performance
## Key Features
- Multi-sheet Excel data integration
- Automatic handling of missing values
- Standardized feature scaling
- Categorical variable encoding
- Train-test split (80-20)
- Random Forest ensemble learning
## Usage
# Load the notebook and run cells sequentially
# Ensure the data file "CSI_7_MAL_2324_CW_resit_data.xlsx" is in the same directory
# The notebook will:
# 1. Load and merge clinical datasets
# 2. Preprocess the data
# 3. Train the model
# 4. Generate predictions
## Files
- `Alzheimer's Detection.ipynb` - Main Jupyter notebook with complete pipeline
- `CSI_7_MAL_2324_CW_resit_data.xlsx` - Clinical dataset (required)
## Model Details
- **Algorithm**: Random Forest Classifier
- **Train-Test Split**: 80-20
- **Feature Scaling**: StandardScaler
- **Encoding**: LabelEncoder for target variable
## Model Performance Metrics
The model's performance can be evaluated using:
- Accuracy
- Precision & Recall
- Confusion Matrix
- Classification Report
## Learning Outcomes
This project demonstrates:
- Data preprocessing and integration
- Feature engineering techniques
- Model training and validation
- Medical data analysis
- Machine learning best practices
## Future Enhancements
- Feature importance analysis
- Hyperparameter tuning
- Cross-validation implementation
- Additional classification algorithms comparison
- Model interpretability with SHAP values
- Web interface for predictions
## License
This project is part of my Machine Learning Projects collection.
## Author
## Mohamed Diaby Gassama
**Note**: This project requires the clinical dataset to run. Ensure all dependencies are installed before execution.