harshal0704/Oscanft

GitHub: harshal0704/Oscanft

Stars: 0 | Forks: 0

# 🔥 OSCANFT

### *Open Source Cyber Autonomous Network Forensics & Threat Intelligence*

[![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=for-the-badge&logo=fastapi)](https://fastapi.tiangolo.com)
[![Google Gemini](https://img.shields.io/badge/Google%20Gemini-8E75C2?style=for-the-badge&logo=google-gemini&logoColor=white)](https://deepmind.google/technologies/gemini/)
[![Bright Data](https://img.shields.io/badge/Bright%20Data-FF8C38?style=for-the-badge&logo=data-grip&logoColor=white)](https://brightdata.com)
[![Neon](https://img.shields.io/badge/Neon%20Database-00E599?style=for-the-badge&logo=postgresql&logoColor=black)](https://neon.tech)

**OSCANFT** is an autonomous, industrial-grade cyber threat intelligence and forensics platform designed for modern Security Operations Centers (SOCs). It deploys a collaborative **6-Agent Autonomous Swarm** powered by **Google Gemini 2.5** to scour the open web, dark web, social media, code repositories, data brokers, and regulatory bodies concurrently, delivering real-time threat posture assessments through a premium glassmorphism command center.

By leveraging **Bright Data's 10 MCP-powered data collection tools** for anti-bot web intelligence gathering, an **IOC Enrichment & MITRE ATT&CK Mapping Engine** with 19 threat classifications, and **Neon Serverless Postgres** for persistent storage, OSCANFT provides unified threat scoring, cross-source deduplication, and live WebSocket telemetry, all rendered in a stunning ember-gold-emerald themed dashboard.

## 💎 What Makes OSCANFT Different?

| Feature | Description |
|:---|:---|
| 🤖 **6-Agent Autonomous Swarm** | Parallel Gemini-driven sweepers running via `asyncio.gather` for 6x concurrent intelligence gathering |
| 🌐 **10 Bright Data MCP Tools** | SERP search, web scraping, social monitoring, dark web scanning, GitHub secrets, domain intel, news aggregation, data broker checks, batch scraping, AI discovery |
| 🛡️ **MITRE ATT&CK Mapping** | 19 finding types mapped to enterprise ATT&CK tactics and techniques, rendered as an interactive SOC heatmap |
| 🧠 **Automated IOC Enrichment** | Regex classifiers for IPs, URLs, domains, emails, CVEs, SHA256/MD5 hashes with MITRE correlation |
| 🔍 **Data Exposure Monitoring** | Dedicated agent scanning social media, dark web, data brokers, and GitHub for leaked PII and credentials |
| 💾 **Dual-Database Architecture** | Auto-provisions PostgreSQL on Neon serverless, with immediate SQLite fallback |
| 📺 **Real-Time SOC Console** | WebSocket-powered live telemetry with animated gauges, heatmaps, and sliding drawer interfaces |
| 🔔 **Slack Alert Integration** | AI-formatted Block Kit digests with severity-coded threat summaries |
| 📊 **Historical Trend Analysis** | SVG sparkline charts tracking risk score progression across scan history |
| 🎨 **Premium Ember-Gold Theme** | Vibrant warm color scheme with glassmorphism, particle animations, and micro-interactions |

## 🌐 Bright Data Integration: 10 MCP Data Sources

OSCANFT leverages **Bright Data's MCP (Model Context Protocol)** infrastructure to power its intelligence gathering across 10 specialized data collection tools:

| # | Tool | API Method | Description | Use Case |
|:--|:---|:---|:---|:---|
| 1 | `brightdata__search_engine` | SERP API | Structured Google/Bing searches returning organic results | General threat intelligence queries |
| 2 | `brightdata__scrape_as_markdown` | Web Unlocker | Scrapes any webpage bypassing bot protection, returns clean Markdown | Deep-dive analysis of specific threat pages |
| 3 | `brightdata__discover` | AI-Ranked Search | Intent-based web discovery returning contextually relevant pages | Smart threat discovery |
| 4 | `brightdata__scrape_batch` | Parallel Unlocker | Concurrent scraping of up to 10 URLs simultaneously | Mass evidence collection |
| 5 | `brightdata__social_monitor` | Social SERP | Monitors Twitter/X, Reddit, Telegram for brand mentions and threat chatter | Social media OSINT |
| 6 | `brightdata__dark_web_scanner` | Deep Web SERP | Scans dark web forums, marketplaces, and paste sites for data listings | Dark web threat intelligence |
| 7 | `brightdata__github_secrets_scanner` | Code SERP | Searches GitHub for exposed API keys, credentials, and config files | Code leak detection |
| 8 | `brightdata__domain_intelligence` | Domain SERP | WHOIS, DNS, SSL cert transparency, subdomain discovery, typosquatting | Infrastructure reconnaissance |
| 9 | `brightdata__news_aggregator` | News SERP | Aggregates CVE feeds, security advisories, and threat landscape reports | Threat awareness |
| 10 | `brightdata__data_broker_check` | People SERP | Scans data broker sites for exposed employee PII and corporate data | PII exposure monitoring |

### Connection Architecture

- **MCP SSE Endpoint**: `https://mcp.brightdata.com/sse?token={token}&pro=1`
- **SERP API**: `https://api.brightdata.com/serp/google`
- **Web Unlocker Proxy**: `brd.superproxy.io:22225` (residential/datacenter proxy rotation)
- **Fallback Mode**: Comprehensive mock data engine for offline/demo/sandbox environments

## 🤖 Autonomous Agent Swarm: 6 Specialized Units

| Agent | Codename | Scope | Key Data Sources |
|:---|:---|:---|:---|
| 🔴 **Threat Intel** | `threat_intel` | Credential leaks, code exposure, dark web archives, malware indicators | SERP, Web Scraper, Social Monitor |
| 📜 **Regulatory** | `regulatory` | CISA advisories, SEC rulings, GDPR bulletins, FTC enforcement actions | News Aggregator, SERP, Web Scraper |
| 🏢 **Third-Party Risk** | `third_party_risk` | Vendor breaches, CVEs, outages, supply chain vulnerabilities | News Aggregator, SERP, Batch Scraper |
| 🛡️ **Brand Monitor** | `brand_monitor` | Typosquatting domains, executive impersonation, phishing kits | Domain Intelligence, SERP, Web Scraper |
| 📋 **Compliance Parser** | `compliance_parser` | Policy gap analysis, framework alignment auditing (SOC2, GDPR, ISO27001) | SERP, Web Scraper, News Aggregator |
| 🔍 **Data Exposure** | `data_exposure` | Social media leaks, dark web listings, GitHub secrets, data broker PII | Social Monitor, Dark Web Scanner, GitHub Scanner, Data Broker Check |

All agents run **concurrently** via `asyncio.gather()`, each powered by **Google Gemini 2.5** with MCP tool calling for autonomous web intelligence gathering.

## 🏗️ Architecture & Data Flow

```mermaid
graph TD
    classDef client fill:#FF8C38,stroke:#CC6600,stroke-width:2px,color:#fff;
    classDef agent fill:#10B981,stroke:#059669,stroke-width:2px,color:#fff;
    classDef core fill:#F59E0B,stroke:#D97706,stroke-width:2px,color:#fff;
    classDef db fill:#00cc99,stroke:#009966,stroke-width:2px,color:#fff;
    classDef notify fill:#ff3366,stroke:#cc0033,stroke-width:2px,color:#fff;

    UI[Ember-Gold Glassmorphism Dashboard]:::client -->|Dispatch Trigger| API[FastAPI Orchestrator]:::core
    Cron[4-Hour Scheduler]:::core -->|Auto-Run| API
    API -->|Spin Up Swarm| Swarm[Agentic Swarm Cluster]:::core

    subgraph Swarm [6-Agent Autonomous Swarm]
        A1[Threat Intel Agent]:::agent
        A2[Regulatory Agent]:::agent
        A3[Third-Party Risk Agent]:::agent
        A4[Brand Monitor Agent]:::agent
        A5[Compliance Parser Agent]:::agent
        A6[Data Exposure Agent]:::agent
    end

    Swarm -->|10 MCP Tools| BD[Bright Data MCP Client]:::core
    BD -->|SERP Searches| Google((Google/Bing))
    BD -->|Web Unlocker| Sites((Paste Sites, Forums))
    BD -->|Social Monitor| Social((Twitter, Reddit, Telegram))
    BD -->|Dark Web Scanner| DarkWeb((Dark Web Markets))
    BD -->|GitHub Scanner| GitHub((GitHub Repos))
    BD -->|Domain Intel| DNS((WHOIS, DNS, SSL))
    BD -->|News Aggregator| News((CVE Feeds, Advisories))
    BD -->|Data Broker Check| Brokers((People Search, Brokers))

    Swarm -->|Raw Findings| IOC[IOC Enrichment Engine]:::core
    IOC -->|19 MITRE Mappings| Scorer[Gemini Risk Correlation Engine]:::core
    Scorer -->|Deduplicate & Score 0-100| Report[Risk Report & Roadmap]:::core
    Report -->|Store Schema| Neon[(Neon Postgres / SQLite)]:::db
    Report -->|Broadcast Progress| WS[WebSocket Manager]:::core
    WS -->|Real-Time Streams| UI
    Report -->|Slack Blocks Formatter| Slack[Slack Alert Service]:::notify
    Slack -->|Live Webhook| SlackChannel((Slack #oscanft-alerts))
```

## 🛡️ MITRE ATT&CK Coverage: 19 Threat Classifications

OSCANFT maps all findings against the **MITRE ATT&CK Enterprise** framework:

| Finding Type | Tactic | Technique | Agent Source |
|:---|:---|:---|:---|
| `credential_leak` | TA0006 Credential Access | T1552 Unsecured Credentials | Threat Intel |
| `code_leak` | TA0009 Collection | T1213 Data from Info Repositories | Threat Intel |
| `typosquat` | TA0001 Initial Access | T1566 Phishing | Brand Monitor |
| `dark_web_mention` | TA0043 Reconnaissance | T1593 Search Open Websites | Threat Intel |
| `infrastructure_exposure` | TA0043 Reconnaissance | T1595 Active Scanning | Brand Monitor |
| `compliance_gap` | TA0005 Defense Evasion | T1562 Impair Defenses | Compliance Parser |
| `vendor_breach` | TA0001 Initial Access | T1199 Trusted Relationship | Third-Party Risk |
| `regulatory_update` | TA0043 Reconnaissance | T1592 Gather Victim Identity | Regulatory |
| `exec_impersonation` | TA0001 Initial Access | T1566.002 Spearphishing Link | Brand Monitor |
| `phishing_kit` | TA0001 Initial Access | T1566.003 Spearphishing Attachment | Brand Monitor |
| `outage` | TA0040 Impact | T1499 Endpoint DoS | Third-Party Risk |
| `cve_vulnerability` | TA0001 Initial Access | T1190 Exploit Public-Facing App | Third-Party Risk |
| `policy_violation` | TA0005 Defense Evasion | T1562 Impair Defenses | Compliance Parser |
| `social_media_exposure` | TA0043 Reconnaissance | T1593.001 Social Media | Data Exposure |
| `dark_web_listing` | TA0043 Reconnaissance | T1597 Search Closed Sources | Data Exposure |
| `pii_exposure` | TA0009 Collection | T1530 Data from Cloud Storage | Data Exposure |
| `github_secret_leak` | TA0006 Credential Access | T1552.004 Private Keys | Data Exposure |
| `subdomain_takeover` | TA0001 Initial Access | T1584 Compromise Infrastructure | Brand Monitor |
| `news_cve_alert` | TA0001 Initial Access | T1190 Exploit Public-Facing App | Regulatory |

## 👁️ Cyber Command Center Dashboard

OSCANFT features a premium ember-gold themed SOC console:

- **Unified Threat Gauge**: Animated conic ring showing corporate threat index (0–100)
- **MITRE ATT&CK Heatmap Grid**: 7-tactic enterprise coverage with progressive heat levels
- **Tactical Action Roadmap**: Auto-prioritized *Immediate* vs *Defensive* remediation queues
- **6-Agent Status Board**: Live running/completed/failed state for each sweeper
- **Vector Sparkline Chart**: SVG trendlines mapping historical scan results
- **Slide-In Detail Drawer**: Full evidence blocks, MITRE tags, and JSON export
- **Live Console Log**: Terminal-style WebSocket progress with color-coded output
- **Particle Background**: Animated network mesh with ember-colored nodes and connections

## 📡 API Documentation

| Endpoint | Method | Description |
|:---|:---|:---|
| `/api/scan` | `POST` | Dispatches the 6-agent autonomous swarm concurrently |
| `/api/scans` | `GET` | Lists recent completed security evaluations |
| `/api/scans/{scan_id}` | `GET` | Full findings dataset for a specific scan |
| `/api/scans/{scan_id}/export` | `GET` | JSON or RFC-4180 CSV report export |
| `/api/findings` | `GET` | Active findings with optional `agent` and `severity` filters |
| `/api/risk-score` | `GET` | Current risk score, executive brief, and trend data |
| `/api/agents/status` | `GET` | Connection states and scopes for all 6 sweepers |
| `/api/health` | `GET` | System health, database type, and scan status |
| `/api/stats` | `GET` | Aggregated telemetry: severity distribution, agent counts |
| `/ws/scan-progress` | `WS` | Real-time WebSocket stream during background sweeps |

## ⚙️ Technology Stack

| Layer | Technology | Purpose |
|:---|:---|:---|
| **AI Engine** | Google Gemini 2.5 Flash | Autonomous agent reasoning, threat analysis, risk scoring |
| **Data Collection** | Bright Data MCP | 10 web intelligence tools with anti-bot capabilities |
| **Backend** | FastAPI + Uvicorn | Async REST API with WebSocket support |
| **Database** | Neon Serverless Postgres | Cloud-native storage with SQLite local fallback |
| **Frontend** | Vanilla HTML/CSS/JS | Zero-dependency glassmorphism SOC dashboard |
| **Notifications** | Slack Block Kit API | AI-formatted security digest alerts |
| **Scheduling** | Python `schedule` | 4-hour automated scan orchestration |
| **Containerization** | Docker + Compose | One-command deployment |

### Python Dependencies

```
google-genai>=1.14.0      # Google Gemini SDK
mcp>=1.2.0                # Model Context Protocol
fastapi>=0.115.0          # Async web framework
uvicorn>=0.34.0           # ASGI server
python-dotenv>=1.0.0      # Environment config
pydantic>=2.10.0          # Data validation
schedule>=1.2.2           # Cron scheduling
httpx>=0.28.0             # Async HTTP client
websockets>=14.0          # WebSocket support
psycopg2-binary>=2.9.9    # PostgreSQL driver
rich>=13.9.0              # Console formatting
aiofiles>=24.0            # Async file I/O
```

## ⚙️ Environment Configuration

Create a `.env` file in the root directory:

```env
# API Keys & Endpoints (Google GenAI API Key is required)
GEMINI_API_KEY=your_gemini_api_key_here

# Bright Data MCP Client (Leave empty for fallback mock mode)
BRIGHT_DATA_MCP_TOKEN=your_bright_data_mcp_token_here
BRIGHT_DATA_MCP_URL=https://mcp.brightdata.com/mcp?token=your_token_here

# Database URL (Optional: Leave empty for local SQLite fallback)
NEON_API_KEY=
DATABASE_URL=

# Notifications (Optional: Outputs Blocks markup to console if empty)
SLACK_BOT_TOKEN=
SLACK_CHANNEL=#oscanft-alerts
PORT=8000

# Target Profile (Customize for your organization)
TARGET_ORG=Acme SaaS
TARGET_DOMAINS=acme-saas.com,acme-security.net
TARGET_BRAND_TERMS=Acme,AcmeSaaS,AcmeCorp
TARGET_IP_RANGES=192.168.1.0/24,10.0.0.0/16
TARGET_VENDORS=[{"name": "Stripe", "criticality": "critical", "data_access": "payment_data"}, {"name": "Auth0", "criticality": "critical", "data_access": "identity"}, {"name": "AWS", "criticality": "high", "data_access": "infrastructure"}]
```

## 🚀 Quick Start

### Step 1: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 2: Set API Credentials

Generate an official Gemini API Key from Google AI Studio and place it in the `.env` file.

### Step 3: Launch OSCANFT

```bash
python main.py
```

### Step 4: Open Command Center

👉 **[http://localhost:8000](http://localhost:8000)**

## 🐳 Docker Deployment

### Docker Compose (Recommended)

```bash
docker-compose up -d --build
```

### Docker CLI

```bash
docker build -t oscanft .
docker run -d -p 8000:8000 --env-file .env -v oscanft-data:/app/data --name oscanft oscanft
```

## 🔍 Codebase Structure

```
oscanft/
├── main.py                  # Orchestrator (Cron scheduler + FastAPI boot)
├── config.py                # Organization and API parameters loader
├── requirements.txt         # Python dependencies
├── Dockerfile               # Container config
├── docker-compose.yml       # Compose with persistent volumes
├── .dockerignore            # Build exclusions
├── mcp_bridge/
│   ├── bright_data.py       # 10 Bright Data MCP tools with offline mock engines
│   ├── neon.py              # Neon PG client & SQLite DDL fallback
│   └── gemini_adapter.py    # Gemini/AIMLAPI bridge with tool calling
├── agents/
│   ├── base_agent.py        # Swarm skeleton with prompt interpolation & retries
│   ├── threat_intel.py      # Credential leak and code exposure monitor
│   ├── regulatory.py        # Legal, CISA, and SEC directive crawler
│   ├── third_party_risk.py  # Vendor breach and CVE analyzer
│   ├── brand_monitor.py     # Typosquat and phishing kit hunter
│   ├── compliance_parser.py # SOC2/GDPR alignment auditor
│   └── data_exposure.py     # Social/dark web/PII exposure monitor
├── engine/
│   ├── models.py            # Pydantic schemas (Finding, RiskReport, AgentRun)
│   ├── ioc_enrichment.py    # Regex classifiers & 19 MITRE ATT&CK mappings
│   ├── risk_scorer.py       # Gemini deduplicator and rating compiler
│   ├── report_generator.py  # JSON/CSV export formatters
│   └── slack_alerts.py      # Slack Block Kit generator
├── prompts/                 # 7 agent system prompts + MCP tools instructions
├── db/
│   ├── repository.py        # Data access layer
│   └── schema.sql           # PostgreSQL/SQLite DDL
└── dashboard/
    ├── index.html           # SOC command center layout
    ├── styles.css           # Ember-gold-emerald design system (1800+ lines)
    └── dashboard.js         # WebSocket controller & UI renderer
```

## 📄 License

MIT License. See [LICENSE](LICENSE) for details.
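
## 🧪 Sketch: IOC Classification and MITRE Lookup

The README describes regex classifiers for IPs, URLs, domains, emails, CVEs, and SHA256/MD5 hashes, correlated with MITRE ATT&CK mappings. The following is a minimal sketch of how such a step might look. It is not the contents of `engine/ioc_enrichment.py`: the names `IOC_PATTERNS`, `MITRE_MAP`, `classify_ioc`, and `enrich` are hypothetical, the patterns are deliberately simple, and only four of the 19 mappings from the coverage table are reproduced.

```python
import re

# Hypothetical patterns (assumption, not the project's own regexes).
# Order matters: the first pattern that matches wins.
IOC_PATTERNS = [
    ("cve", re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)),
    ("sha256", re.compile(r"^[0-9a-fA-F]{64}$")),
    ("md5", re.compile(r"^[0-9a-fA-F]{32}$")),
    ("ipv4", re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$")),
    ("url", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")),
    ("domain", re.compile(r"^(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$")),
]

# Subset of the finding-type table above: finding type -> (tactic, technique).
MITRE_MAP = {
    "credential_leak": ("TA0006 Credential Access", "T1552 Unsecured Credentials"),
    "typosquat": ("TA0001 Initial Access", "T1566 Phishing"),
    "cve_vulnerability": ("TA0001 Initial Access", "T1190 Exploit Public-Facing App"),
    "github_secret_leak": ("TA0006 Credential Access", "T1552.004 Private Keys"),
}


def classify_ioc(value: str) -> str:
    """Return the IOC type of a single indicator, or "unknown"."""
    candidate = value.strip()
    for ioc_type, pattern in IOC_PATTERNS:
        if pattern.match(candidate):
            return ioc_type
    return "unknown"


def enrich(finding_type: str, indicators: list) -> dict:
    """Attach MITRE tags and typed IOCs to a raw finding."""
    tactic, technique = MITRE_MAP.get(finding_type, (None, None))
    return {
        "finding_type": finding_type,
        "tactic": tactic,
        "technique": technique,
        "iocs": [{"value": v, "type": classify_ioc(v)} for v in indicators],
    }


if __name__ == "__main__":
    print(enrich("credential_leak", ["alice@acme-saas.com", "192.168.1.10"]))
```

Checking hashes and CVE identifiers before domains keeps the looser patterns from claiming stricter indicators; a production classifier would likely also extract indicators from free text rather than matching whole strings.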
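
## 🧪 Sketch: Concurrent Swarm Dispatch

The README states that all six agents run concurrently via `asyncio.gather()` and that the dashboard shows a running/completed/failed state per sweeper. The sketch below illustrates that pattern under stated assumptions: `run_agent` and `run_swarm` are hypothetical names, the agent body is a stand-in for the Gemini and MCP calls, and one agent is made to fail on purpose to show how `return_exceptions=True` keeps the other results.

```python
import asyncio

# Agent codenames as listed in the swarm table above.
AGENT_CODENAMES = [
    "threat_intel",
    "regulatory",
    "third_party_risk",
    "brand_monitor",
    "compliance_parser",
    "data_exposure",
]


async def run_agent(codename: str) -> list:
    """Stand-in for one sweeper (assumption: real agents call Gemini + MCP tools)."""
    await asyncio.sleep(0)  # yield to the event loop, simulating network I/O
    if codename == "regulatory":
        # Simulated failure to demonstrate per-agent error isolation.
        raise RuntimeError("simulated upstream timeout")
    return [{"agent": codename, "title": f"placeholder finding from {codename}"}]


async def run_swarm(codenames: list) -> tuple:
    """Run all agents concurrently and collect findings plus per-agent status."""
    results = await asyncio.gather(
        *(run_agent(c) for c in codenames),
        return_exceptions=True,  # one failing agent must not cancel the others
    )
    findings, status = [], {}
    for codename, result in zip(codenames, results):
        if isinstance(result, Exception):
            status[codename] = "failed"
        else:
            status[codename] = "completed"
            findings.extend(result)
    return findings, status


if __name__ == "__main__":
    findings, status = asyncio.run(run_swarm(AGENT_CODENAMES))
    print(status)
    print(f"{len(findings)} findings collected")
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping them back against the codename list is safe.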
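
## 🧪 Sketch: RFC-4180 CSV Export

The API table lists `/api/scans/{scan_id}/export` as producing JSON or RFC-4180 CSV. As a rough illustration of the CSV side, the sketch below serializes findings with Python's standard `csv` module. The function name `findings_to_csv` and the column names `agent`, `severity`, and `title` are assumptions for the example; the actual schema lives in `engine/models.py` and may differ.

```python
import csv
import io

# Hypothetical column set (assumption, not the project's Finding schema).
CSV_COLUMNS = ["agent", "severity", "title"]


def findings_to_csv(findings: list) -> str:
    """Serialize findings to CSV text with CRLF line endings and quote doubling."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CSV_COLUMNS,
        lineterminator="\r\n",   # RFC 4180 specifies CRLF record separators
        quoting=csv.QUOTE_MINIMAL,
        extrasaction="ignore",   # drop keys that are not exported columns
    )
    writer.writeheader()
    for finding in findings:
        writer.writerow(finding)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"agent": "threat_intel", "severity": "high",
         "title": 'Leak, "prod" keys', "internal_id": 7},
    ]
    print(findings_to_csv(sample))
```

Fields containing commas or double quotes are wrapped in quotes with embedded quotes doubled, which is the escaping RFC 4180 describes.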