AddittyaPS/Fake-Job-Posting-Identifier


# Fake Job Posting Identifier

An AI/ML-powered fraud detection system that identifies fraudulent online job postings using Natural Language Processing (NLP), feature engineering, and machine learning. The system analyzes job descriptions, company details, email domains, and suspicious linguistic patterns to classify each posting as real or fake.

## Features

- **Machine Learning Pipeline** - Combines TF-IDF vectorization with a Random Forest classifier for text-based fraud detection.
- **Advanced Feature Engineering** - Detects suspicious keywords, urgency phrases, personal email domains, and website inconsistencies.
- **Natural Language Processing** - Preprocesses text with NLTK: text cleaning, tokenization, stopword removal, and lemmatization.
- **Interactive Web Application** - A Streamlit-based UI for real-time fake job prediction and feature analysis.

## Project Structure

```
fake-job-posting-identifier/
│
├── app.py              # Streamlit application
├── src/
│   ├── preprocess.py   # NLP preprocessing functions
│   ├── features.py     # Feature engineering logic
│   └── train.py        # Model training pipeline
│
├── models/             # Saved ML models
├── data/               # EMSCAD dataset
├── screenshots/        # UI screenshots
├── requirements.txt
└── README.md
```

## How to Run Locally

### 1. Clone the Repository

```bash
git clone https://github.com/YOUR_USERNAME/fake-job-posting-identifier.git
cd fake-job-posting-identifier
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

### 3. Run the Application

```bash
streamlit run app.py
```

### 4. Open in Browser

Navigate to http://localhost:8501 (Streamlit's default port).

## Tech Stack

- Python 3
- scikit-learn
- NLTK
- Pandas
- NumPy
- Streamlit

## Machine Learning Workflow

1. Data Cleaning & NLP Preprocessing
2. TF-IDF Feature Extraction
3. Custom Fraud Feature Engineering
4. Random Forest Model Training
5. Real-Time Prediction with Streamlit

## Future Improvements

- Deep learning-based NLP models (BERT/LSTM)
- Browser extension for real-time scam detection
- Integration with online job portals
- Cloud deployment support

## Dataset

Dataset used: **EMSCAD Fake Job Postings Dataset** (Employment Scam Aegean Dataset)
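The Natural Language Processing step (text cleaning, tokenization, stopword removal) can be sketched with the Python standard library alone. This is an illustration, not the project's `src/preprocess.py`: the small `STOPWORDS` set and the whitespace `tokenize` helper are hypothetical stand-ins for NLTK's English stopword list and tokenizer, and lemmatization is left out so the block runs without downloading NLTK data.

```python
import re

# Hypothetical, deliberately tiny stopword list standing in for NLTK's
# English stopword corpus so this sketch has no external dependencies.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "our", "the", "to", "we",
    "with", "you", "your",
})

HTML_TAG_RE = re.compile(r"<[^>]+>")            # leftover HTML markup
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
EMAIL_RE = re.compile(r"\S+@\S+")               # e-mail addresses
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols


def clean_text(text: str) -> str:
    """Lowercase, then strip HTML tags, URLs, e-mails and non-letters."""
    text = text.lower()
    text = HTML_TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Whitespace split; a simple stand-in for nltk.word_tokenize."""
    return text.split()


def preprocess(text: str) -> str:
    """Clean, tokenize and drop stopwords and 1-letter tokens.

    Returns a space-joined string, which is the input format a TF-IDF
    vectorizer expects.
    """
    tokens = tokenize(clean_text(text))
    return " ".join(t for t in tokens if t not in STOPWORDS and len(t) > 1)
```

To add the lemmatization mentioned under Features, each token could be passed through `nltk.stem.WordNetLemmatizer().lemmatize(token)` after running `nltk.download("wordnet")` once.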
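The custom fraud features (suspicious keywords, urgency phrases, personal email domains, website inconsistencies) can be approximated with a few regular expressions and keyword lists. Everything here is an assumption for illustration: the keyword tuples, the `PERSONAL_EMAIL_DOMAINS` set, the `fraud_features` function, and its output keys are hypothetical and do not come from the repository's `src/features.py`.

```python
import re

# Hypothetical keyword lists; real lists would be tuned against EMSCAD.
SUSPICIOUS_KEYWORDS = ("wire transfer", "registration fee", "no experience",
                       "work from home", "earn $")
URGENCY_PHRASES = ("urgent", "immediately", "act now", "limited positions",
                   "apply today")
PERSONAL_EMAIL_DOMAINS = frozenset({"gmail.com", "yahoo.com", "hotmail.com",
                                    "outlook.com", "aol.com"})

# Captures the domain part of an e-mail address, e.g. "gmail.com".
EMAIL_DOMAIN_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
# Captures the host of a URL without a leading "www.", e.g. "acme.com".
URL_DOMAIN_RE = re.compile(r"https?://(?:www\.)?([\w-]+(?:\.[\w-]+)+)")


def _same_org(email_domain: str, website_domain: str) -> bool:
    """True if the e-mail domain is the website domain or a subdomain of it."""
    return (email_domain == website_domain
            or email_domain.endswith("." + website_domain))


def fraud_features(description: str, company_website: str = "") -> dict[str, int]:
    """Turn one posting into integer counts and 0/1 red-flag indicators."""
    text = description.lower()
    email_domains = {d.lower() for d in EMAIL_DOMAIN_RE.findall(description)}
    match = URL_DOMAIN_RE.search(company_website.lower())
    website_domain = match.group(1) if match else ""
    return {
        "suspicious_keyword_count": sum(text.count(k) for k in SUSPICIOUS_KEYWORDS),
        "urgency_phrase_count": sum(text.count(p) for p in URGENCY_PHRASES),
        "has_personal_email": int(bool(email_domains & PERSONAL_EMAIL_DOMAINS)),
        "missing_website": int(not website_domain),
        # Contact e-mails that do not belong to the listed website's domain.
        "email_website_mismatch": int(
            bool(website_domain) and bool(email_domains)
            and not any(_same_org(d, website_domain) for d in email_domains)
        ),
    }
```

Each posting becomes a small dict of counts and 0/1 flags, which can be stacked into a pandas DataFrame and given to the classifier alongside the TF-IDF columns.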
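TF-IDF feature extraction followed by Random Forest training, as listed in the Machine Learning Workflow, maps naturally onto a scikit-learn `Pipeline`. The helper name `build_text_pipeline` and every hyperparameter below (`ngram_range`, `min_df`, `max_features`, `n_estimators`, `class_weight`) are illustrative guesses, not the project's actual settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_text_pipeline(random_state: int = 42) -> Pipeline:
    """TF-IDF vectorizer feeding a Random Forest; all settings are illustrative."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(
            ngram_range=(1, 2),    # unigrams plus bigrams such as "registration fee"
            min_df=2,              # ignore terms that occur in only one posting
            max_features=50_000,   # cap vocabulary size to bound memory use
            sublinear_tf=True,     # use 1 + log(tf) to damp repeated words
        )),
        ("rf", RandomForestClassifier(
            n_estimators=200,
            class_weight="balanced",  # EMSCAD has far fewer fake postings than real ones
            random_state=random_state,
        )),
    ])
```

Keeping both steps in one `Pipeline` means the fitted vectorizer and classifier can be saved together, for example with `joblib.dump(pipeline, "models/fake_job_pipeline.joblib")` (a hypothetical filename), so prediction reuses exactly the vocabulary learned during training.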
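To feed both the TF-IDF vectors and the engineered fraud features into a single Random Forest, a `ColumnTransformer` can place them side by side. This is one possible design, not necessarily the repository's: the `text` column, the names in `NUMERIC_FEATURES`, and `build_combined_pipeline` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical engineered columns, e.g. produced by a feature-extraction step.
NUMERIC_FEATURES = ["suspicious_keyword_count", "urgency_phrase_count",
                    "has_personal_email", "missing_website"]


def build_combined_pipeline(random_state: int = 42) -> Pipeline:
    """TF-IDF on the text column plus raw fraud flags, fed to one forest."""
    features = ColumnTransformer([
        # A single column name (not a list) passes the 1-D text that
        # TfidfVectorizer expects.
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True), "text"),
        # Counts and 0/1 flags are appended unchanged.
        ("fraud_flags", "passthrough", NUMERIC_FEATURES),
    ])
    return Pipeline([
        ("features", features),
        ("rf", RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                      random_state=random_state)),
    ])


def example_frame() -> pd.DataFrame:
    """Tiny made-up DataFrame showing the column layout the pipeline expects."""
    return pd.DataFrame({
        "text": ["earn money from home pay fee", "senior python developer role"],
        "suspicious_keyword_count": [3, 0],
        "urgency_phrase_count": [2, 0],
        "has_personal_email": [1, 0],
        "missing_website": [1, 0],
    })
```

Passing the flags through unscaled is reasonable for a tree ensemble: Random Forest splits on per-feature thresholds, so it is insensitive to feature scale.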
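The real-time prediction step could look like the following `app.py` sketch. It assumes a fitted text pipeline that accepts a list of raw strings, saved with joblib at the hypothetical path `models/fake_job_pipeline.joblib`; `classify_posting`, `render_app`, and the 0.5 threshold are illustrative choices, not the repository's code. Streamlit and joblib are imported inside `render_app` so the classification helper can be used and tested without them.

```python
# Hypothetical path; the repository only says trained models live in models/.
MODEL_PATH = "models/fake_job_pipeline.joblib"


def classify_posting(model, text: str, threshold: float = 0.5) -> tuple[str, float]:
    """Return ("Fake" or "Real", fraud probability) for one posting.

    Assumes a fitted classifier with predict_proba; column 1 is the
    probability of label 1 (fraudulent) when model.classes_ is [0, 1].
    """
    fraud_probability = float(model.predict_proba([text])[0][1])
    label = "Fake" if fraud_probability >= threshold else "Real"
    return label, fraud_probability


def render_app() -> None:
    """Streamlit UI: paste a posting, click Analyze, see the verdict."""
    import joblib
    import streamlit as st

    st.title("Fake Job Posting Identifier")
    text = st.text_area("Paste the job posting text", height=250)
    if st.button("Analyze"):
        if not text.strip():
            st.warning("Please paste a job posting first.")
            return
        model = joblib.load(MODEL_PATH)  # reloaded on every click in this sketch
        label, probability = classify_posting(model, text)
        st.metric("Fraud probability", f"{probability:.1%}")
        if label == "Fake":
            st.error("This posting looks fraudulent.")
        else:
            st.success("This posting looks legitimate.")
```

Because Streamlit reruns the script top to bottom on each interaction, `app.py` would end with a module-level `render_app()` call; moving the model loading into a function decorated with `st.cache_resource` would avoid re-reading the file on every rerun. If the model was trained on preprocessed text, the same preprocessing should be applied before calling `classify_posting`.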