Yashwanth-raj555/AI-Powered-Indian-Cybercrime-Threat-Intelligence-Analytics-Platform

# AI-Powered Indian Cybercrime & Threat Intelligence Analytics Platform

## Cybersecurity Analytics

An end-to-end cybercrime and threat intelligence analytics project built using Python, SQL Server, Power BI, and Streamlit AI Assistant to analyze Indian cybercrime incidents, financial losses, high-risk attack types, city-level risks, and sector-wise vulnerabilities.

# Dashboard Preview

![AI-Powered India Cybercrime Analytics Dashboard](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/158aed6d36102108.png)

# Project Overview

The AI-Powered Indian Cybercrime & Threat Intelligence Analytics Platform transforms raw cybercrime incident data into meaningful threat intelligence.

This project analyzes 1,200 cybercrime cases across India and helps identify:

* Cybercrime trends
* Financial loss patterns
* High-risk attack types
* City-level cybercrime impact
* Sector-wise threat exposure
* High-value cyber attacks
* AI-powered cybersecurity insights

The project follows a complete data analytics workflow from raw data cleaning to SQL analysis, Power BI dashboarding, and AI assistant integration.

# Business Problem

How can Indian cybercrime data be analyzed effectively to understand cyber attack patterns, financial losses, high-risk threats, and sector-level vulnerabilities?

This project helps answer:

* Which cyber attack types cause the highest financial loss?
* Which cities are most affected by cybercrime?
* Which sectors face the highest cyber threat?
* Which years show the highest cybercrime loss?
* Which attacks are classified as high-risk?
* How can users ask AI-based questions about the cybercrime dataset?

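
Most of the questions above reduce to a grouped aggregation over the incident table. The sketch below is illustrative only and is not the project's actual query file: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, a hypothetical table name `cybercrime_cases`, column names taken from the Data Dictionary, and a few made-up rows that do not come from the project's dataset.

```python
import sqlite3

# Toy rows for illustration only; they are NOT taken from the project's dataset.
# Layout: (year, incident_type, city, category, amount_lost_inr)
ROWS = [
    (2021, "Phishing", "Delhi", "Financial", 50_000),
    (2022, "Ransomware", "Mumbai", "Corporate", 450_000),
    (2022, "Ransomware", "Delhi", "Corporate", 300_000),
    (2023, "Identity Theft", "Chennai", "Social Media", 120_000),
    (2023, "Phishing", "Delhi", "Financial", 200_000),
]

# Columns that may be used as a grouping key (guards the f-string below).
GROUPABLE = {"year", "incident_type", "city", "category"}


def total_loss_by(conn, column):
    """Return (group, cases, total_loss) rows, largest total loss first."""
    if column not in GROUPABLE:
        raise ValueError(f"cannot group by {column!r}")
    query = f"""
        SELECT {column}, COUNT(*) AS cases, SUM(amount_lost_inr) AS total_loss
        FROM cybercrime_cases          -- hypothetical table name
        GROUP BY {column}
        ORDER BY total_loss DESC
    """
    return conn.execute(query).fetchall()


# In-memory SQLite database standing in for the SQL Server instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cybercrime_cases ("
    "year INTEGER, incident_type TEXT, city TEXT, "
    "category TEXT, amount_lost_inr INTEGER)"
)
conn.executemany("INSERT INTO cybercrime_cases VALUES (?, ?, ?, ?, ?)", ROWS)

# One ranking per business question: attack type, city, sector, year.
for column in ("incident_type", "city", "category", "year"):
    top = total_loss_by(conn, column)[0]
    print(f"{column}: {top}")
```

The same `GROUP BY ... ORDER BY SUM(...) DESC` pattern should carry over to SQL Server, although the table name and any T-SQL specifics would need to match the real schema.
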
# Dataset Information

| Attribute      | Details                                          |
| -------------- | ------------------------------------------------ |
| Dataset Type   | Cybersecurity / Cybercrime Dataset               |
| Dataset Nature | Structured Tabular Dataset                       |
| Dataset Level  | Cybercrime Incident Level                        |
| Total Records  | 1,200 Cybercrime Cases                           |
| Main Target    | Cybercrime Trend, Risk & Financial Loss Analysis |

# Data Dictionary

| Column               | Description                                         |
| -------------------- | --------------------------------------------------- |
| year                 | Year in which the cybercrime case occurred          |
| day                  | Day of the month                                    |
| amount_lost_inr      | Financial loss amount in Indian Rupees              |
| incident_type        | Type of cyber attack such as ransomware or phishing |
| city                 | Indian city where the case was recorded             |
| category             | Affected sector/category                            |
| amount_lost_category | Low, medium, or high financial loss group           |
| is_high_value_attack | Binary indicator for high-risk/high-value attack    |
| day_range            | Early, mid, or late month segmentation              |
| case_date            | Generated date field for analysis                   |

# Tools & Technologies Used

| Tool             | Purpose                      |
| ---------------- | ---------------------------- |
| Python           | Data Cleaning & EDA          |
| Pandas           | Data Preprocessing           |
| NumPy            | Numerical Operations         |
| SQL Server       | Threat Intelligence Analysis |
| Power BI         | Dashboard Visualization      |
| Streamlit        | AI Assistant Interface       |
| Jupyter Notebook | Data Cleaning Workflow       |

# Project Workflow

    Raw Cybercrime CSV Dataset
            ↓
    Python Data Cleaning & Feature Engineering
            ↓
    SQL Server Threat Intelligence Analysis
            ↓
    Power BI Cybercrime Analytics Dashboard
            ↓
    Streamlit AI Assistant
            ↓
    Cybercrime Intelligence Platform

# Python Data Cleaning Pipeline

Python and Pandas were used to clean and prepare the cybercrime dataset.

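
As a rough illustration, the derived columns in the Data Dictionary could be produced along the following lines. This is a sketch under stated assumptions, not the notebook's actual code: the rupee thresholds for the loss bands, the top-quartile rule for `is_high_value_attack`, and the 1-10 / 11-20 / 21-31 day split are all hypothetical, and `case_date` is left out because the source does not say how the month is obtained. The sample rows are made up for the demo.

```python
import pandas as pd

# Hypothetical rupee cut-offs for the loss bands; the real ones are not documented.
LOSS_BINS = [float("-inf"), 100_000, 300_000, float("inf")]
LOSS_LABELS = ["Low", "Medium", "High"]


def cap_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (the usual IQR fence)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def add_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, drop duplicates, and add derived columns."""
    df = raw.copy()
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    df = df.drop_duplicates().reset_index(drop=True)
    df["amount_lost_inr"] = cap_outliers_iqr(df["amount_lost_inr"])

    # Loss band from fixed (assumed) thresholds.
    df["amount_lost_category"] = pd.cut(
        df["amount_lost_inr"], bins=LOSS_BINS, labels=LOSS_LABELS
    )
    # Assumption: "high value" means at or above the 75th percentile of loss.
    cutoff = df["amount_lost_inr"].quantile(0.75)
    df["is_high_value_attack"] = (df["amount_lost_inr"] >= cutoff).astype(int)
    # Assumption: days 1-10 are Early, 11-20 Mid, 21-31 Late.
    df["day_range"] = pd.cut(
        df["day"], bins=[0, 10, 20, 31], labels=["Early", "Mid", "Late"]
    )
    return df


# Made-up sample rows (not from the project's dataset); row 3 duplicates row 2.
sample = pd.DataFrame(
    {
        "Year": [2021, 2022, 2022, 2023, 2023],
        "Day": [3, 15, 15, 27, 9],
        "Amount Lost INR": [50_000, 450_000, 450_000, 120_000, 200_000],
        "Incident Type": ["Phishing", "Ransomware", "Ransomware",
                          "Identity Theft", "Phishing"],
        "City": ["Delhi", "Mumbai", "Mumbai", "Chennai", "Delhi"],
        "Category": ["Financial", "Corporate", "Corporate",
                     "Social Media", "Financial"],
    }
)
features = add_features(sample)
print(features[["amount_lost_inr", "amount_lost_category",
                "is_high_value_attack", "day_range"]])
```

Fixed bins via `pd.cut` keep the loss bands stable across reruns; a quantile-based split would move the band edges whenever the data changes.
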
### Cleaning Steps

* Imported cybercrime CSV dataset
* Checked dataset shape, columns, and data types
* Standardized column names
* Removed duplicate records
* Cleaned text columns
* Fixed spelling and category issues
* Converted data types
* Handled missing values
* Controlled outliers using IQR method
* Created new analytical features
* Exported final cleaned dataset

### Feature Engineering

New fields created:

* `amount_lost_category`
* `is_high_value_attack`
* `day_range`
* `case_date`

Final cleaned dataset:

* 1,200 rows
* 10 columns
* Zero missing values
* Zero duplicate records

# SQL Server Threat Intelligence Analysis

SQL Server was used to generate structured cybercrime intelligence from the cleaned dataset.

### SQL Analysis Areas

* Total cybercrime cases
* Executive summary of financial loss
* Yearly cybercrime trend
* Attack type risk analysis
* City-level cybercrime risk
* Sector-wise cyber threat analysis
* High-risk cyber attack detection
* Monthly cybercrime pattern
* Top financial loss incidents
* Year-over-year financial loss growth

### Key SQL Insights

* Total Cases: 1,200
* Total Financial Loss: ₹27.78 Cr
* High-Risk Cases: 300
* Highest Single Loss: ₹4,99,946
* Highest Loss Attack Type: Ransomware
* Top Risk City: Delhi
* High Impact Sectors: Social Media, Corporate, Financial

# Power BI Dashboard Features

The Power BI dashboard was designed with a cyber-intelligence theme.

### Dashboard KPIs

* Total Cases
* Total Financial Loss
* High Risk Cases
* Highest Single Loss

### Dashboard Visuals

* Yearly Cybercrime Trend & Financial Analysis
* Cyber Attack Types by Financial Loss
* Top 5 Cities by Financial Loss
* Sector-Wise Financial Loss
* High-Risk Cases by Attack Type
* Monthly Cybercrime Trend
* Cyber Threat Severity Analysis

# AI Assistant Explanation

This project includes an AI-powered assistant called **CyberInsight AI Assistant**.

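
One simple way an assistant like this could answer questions locally, without calling an external model, is keyword routing over the cleaned DataFrame. The sketch below is an assumption about how that might look, not the project's implementation: the `ROUTES` table, the `answer` function, and the sample rows are all hypothetical, and the Streamlit UI is omitted.

```python
import pandas as pd

# Made-up rows for the demo; they are NOT from the project's dataset.
cases = pd.DataFrame(
    {
        "year": [2021, 2022, 2022, 2023, 2023],
        "incident_type": ["Phishing", "Ransomware", "Ransomware",
                          "Identity Theft", "Phishing"],
        "city": ["Delhi", "Mumbai", "Delhi", "Chennai", "Delhi"],
        "category": ["Financial", "Corporate", "Corporate",
                     "Social Media", "Financial"],
        "amount_lost_inr": [50_000, 450_000, 300_000, 120_000, 200_000],
    }
)

# Hypothetical routing table: keyword in the question -> (column, reply template).
ROUTES = {
    "city": ("city", "Top risk city by total loss: {}"),
    "attack": ("incident_type", "Attack type with the highest total loss: {}"),
    "sector": ("category", "Sector with the highest total loss: {}"),
    "year": ("year", "Year with the highest total loss: {}"),
}

FALLBACK = "Try asking about a city, attack type, sector, or year."


def top_group(df: pd.DataFrame, column: str):
    """Return the value of `column` whose rows sum to the largest loss."""
    return df.groupby(column)["amount_lost_inr"].sum().idxmax()


def answer(question: str, df: pd.DataFrame) -> str:
    """Answer a question by matching the first known keyword it contains."""
    text = question.lower()
    for keyword, (column, template) in ROUTES.items():
        if keyword in text:
            return template.format(top_group(df, column))
    return FALLBACK


print(answer("Which city is most at risk?", cases))
print(answer("Which attack type costs the most?", cases))
```

In a Streamlit app, the string returned by `st.text_input` could be passed to `answer` and the reply shown with `st.write`; an LLM backend could later replace the keyword match while keeping the same function signature.
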
The AI assistant was built using Streamlit and acts as an interactive question-answering layer for the cybercrime analytics project.

## AI Assistant Features

* Understands cybercrime-related questions using natural language input
* Provides intelligent insights from cybercrime analytics data
* Identifies high-risk attack patterns and cyber threat trends
* Supports interactive cybersecurity intelligence analysis through AI
* Shows quick insights such as top risk city, top attack type, and highest loss year
* Allows users to ask cybercrime-related questions
* Helps users understand the dataset without manually reading every chart
* Can be extended using Gemini, Groq, OpenRouter, or local Pandas-based responses

## AI Assistant

![CyberInsight_AI_Assistant_Screenshot](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/63587fdc80102116.png)

# Key Findings

* Ransomware caused the highest financial damage
* Delhi, Mumbai, and Chennai were among the highest-risk cities
* Social Media and Corporate sectors showed major financial impact
* 2022 recorded the highest financial loss
* Early and mid-month periods showed higher cybercrime activity
* AI assistant improves the project by adding an interactive insight layer

# Repository Structure

    AI-Powered-Indian-Cybercrime-Analytics/
    │
    ├── dataset/
    │   └── cybersecurity_cases_india_combined.csv
    │
    ├── notebook/
    │   └── cybercrime_data_cleaning_pipeline.ipynb
    │
    ├── sql_queries/
    │   └── cybercrime_analytics_Insights.sql
    │
    ├── powerbi_dashboard/
    │   └── AI Cybercrime Threat Intelligence Dashboard.pbix
    │
    ├── report/
    │   └── Report_AI_Cybercrime_Project_Report.pdf
    │
    ├── presentation/
    │   └── AI-Powered-Indian-Cybercrime-and-Threat-Intelligence-Analytics-Platform.pdf
    │
    ├── screenshots/
    │   ├── Ai_powered_india_cybercrime_analytics_dashboard.png
    │   └── cyberinsight_ai_assistant.png
    │
    ├── README.md
    ├── LICENSE
    └── .gitignore

# Future Scope

* Real-time cybercrime data feeds
* Machine learning-based attack prediction
* Anomaly detection for high-loss cases
* Threat intelligence API integration
* Natural language SQL generation
* Online deployment of Streamlit AI Assistant
* Power BI web integration

# Disclaimer

This project is created for educational and portfolio purposes only. The dataset used in this project is for analytics, learning, and cybersecurity awareness demonstration.

# Author

## Miryala Yashwanth

* Python
* SQL Server
* Power BI
* Streamlit
* Data Analytics
* Cybersecurity Analytics