hardikdixit123/Fake-Job-Posting-Identifier
# Fake Job Posting Identifier
An AI/ML-powered fraud detection system designed to identify fraudulent online job postings using Natural Language Processing (NLP), feature engineering, and machine learning techniques. The system analyzes job descriptions, company details, email domains, and suspicious linguistic patterns to classify postings as real or fake.
## Features
- **Machine Learning Pipeline**
  - Uses TF-IDF vectorization and a Random Forest classifier for text-based fraud detection.
- **Advanced Feature Engineering**
  - Detects suspicious keywords, urgency phrases, personal email domains, and website inconsistencies.
- **Natural Language Processing**
  - Preprocesses text with NLTK: cleaning, tokenization, stopword removal, and lemmatization.
- **Interactive Web Application**
  - Provides a Streamlit-based UI for real-time fake job prediction and feature analysis.
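
The feature-engineering signals listed above can be illustrated with a small sketch. This is not the project's `src/features.py`; the function names, word lists, and feature keys below are hypothetical and chosen only to show the idea, using the standard library alone.

```python
from urllib.parse import urlparse

# Hypothetical word lists for illustration; the project's real lists are not shown here.
SUSPICIOUS_KEYWORDS = ("wire transfer", "no experience", "quick money", "registration fee")
URGENCY_PHRASES = ("apply now", "urgent", "limited slots", "immediately")
PERSONAL_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def count_phrases(text: str, phrases) -> int:
    """Count case-insensitive occurrences of any phrase in the text."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in phrases)


def domain_of(url: str) -> str:
    """Return the host of a URL without a leading 'www.', or '' if there is none."""
    if not url:
        return ""
    # urlparse only recognises the host when a scheme or '//' prefix is present.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def extract_fraud_features(description: str, contact_email: str = "", website: str = "") -> dict:
    """Build a small dictionary of hand-crafted fraud signals for one posting."""
    email_domain = contact_email.rsplit("@", 1)[-1].lower() if "@" in contact_email else ""
    site_domain = domain_of(website)
    # The email "matches" the site if it uses the same domain or a subdomain of it.
    matches_site = email_domain == site_domain or email_domain.endswith("." + site_domain)
    return {
        "suspicious_keyword_count": count_phrases(description, SUSPICIOUS_KEYWORDS),
        "urgency_phrase_count": count_phrases(description, URGENCY_PHRASES),
        "uses_personal_email": email_domain in PERSONAL_EMAIL_DOMAINS,
        "email_website_mismatch": bool(email_domain and site_domain and not matches_site),
    }


features = extract_fraud_features(
    "URGENT! Earn quick money, no experience needed. Apply now!",
    contact_email="hr.recruiter@gmail.com",
    website="https://www.acme.com",
)
print(features)
```

Numeric and boolean signals like these could be appended to the TF-IDF vector before training; how the project actually combines them is not described in this README.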
## Project Structure
    fake-job-posting-identifier/
    │
    ├── app.py                # Streamlit application
    ├── src/
    │   ├── preprocess.py     # NLP preprocessing functions
    │   ├── features.py       # Feature engineering logic
    │   └── train.py          # Model training pipeline
    │
    ├── models/               # Saved ML models
    ├── data/                 # EMSCAD dataset
    ├── screenshots/          # UI screenshots
    ├── requirements.txt
    └── README.md
## Screenshots
### Main Interface

### Legitimate Job Detection

### Fake Job Detection
## How to Run Locally
### 1. Clone the Repository
    git clone https://github.com/hardikdixit123/Fake-Job-Posting-Identifier.git
    cd Fake-Job-Posting-Identifier
### 2. Install Dependencies
    pip install -r requirements.txt
### 3. Run the Application
    streamlit run app.py
### 4. Open in Browser
Navigate to http://localhost:8501 (Streamlit's default port).
## Tech Stack
- Python 3
- Scikit-learn
- NLTK
- Pandas
- NumPy
- Streamlit
## Machine Learning Workflow
1. Data Cleaning & NLP Preprocessing
2. TF-IDF Feature Extraction
3. Custom Fraud Feature Engineering
4. Random Forest Model Training
5. Real-Time Prediction using Streamlit
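
As a rough illustration of steps 1, 2, and 4, the sketch below chains a text-cleaning function, TF-IDF, and a Random Forest in a scikit-learn pipeline. It is a minimal example under stated assumptions, not the project's `src/train.py`: the toy postings, the `clean_text` helper (a simple regex stand-in for the NLTK preprocessing), and the hyperparameters are all invented for illustration, and the custom fraud features from step 3 are left out.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Lowercase, drop everything except letters, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Toy postings invented for this example; 1 = fake, 0 = real.
texts = [
    "Earn quick money from home, no experience needed, apply now",
    "Urgent hiring, pay a registration fee to secure your slot",
    "Work from home and earn $500 per day, send bank details immediately",
    "Senior backend engineer to build payment services in Python",
    "Registered nurse for night shifts, valid state licence required",
    "Marketing analyst with three years of experience in SQL reporting",
]
labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(
    # A callable preprocessor replaces the vectorizer's built-in lowercasing step.
    TfidfVectorizer(preprocessor=clean_text, stop_words="english", ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42),
)
model.fit(texts, labels)

new_posting = "Urgent hiring! Earn money from home, pay a small registration fee."
# Column 1 of predict_proba corresponds to label 1 (fake), since classes are sorted.
fake_probability = model.predict_proba([new_posting])[0][1]
print(f"Estimated probability of fraud: {fake_probability:.2f}")
```

On the real dataset the labels are heavily imbalanced toward legitimate postings, which is why the sketch sets `class_weight="balanced"`; whether the project does the same is not stated here.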
## Future Improvements
- Deep Learning-based NLP models (BERT/LSTM)
- Browser extension for real-time scam detection
- Integration with online job portals
- Cloud deployment support
## Dataset
Dataset used: **EMSCAD (Employment Scam Aegean Dataset) Fake Job Postings Dataset**