GitHub: harshal0704/Oscanft
# 🔥 OSCANFT
### *Open Source Cyber Autonomous Network Forensics & Threat Intelligence*
**OSCANFT** is an autonomous, industrial-grade cyber threat intelligence and forensics platform designed for modern Security Operations Centers (SOCs). It deploys a collaborative **6-Agent Autonomous Swarm** powered by **Google Gemini 2.5** to scour the open web, dark web, social media, code repositories, data brokers, and regulatory bodies concurrently — delivering real-time threat posture assessments through a premium glassmorphism command center.
By leveraging **Bright Data's 10 MCP-powered data collection tools** for anti-bot web intelligence gathering, an **IOC Enrichment & MITRE ATT&CK Mapping Engine** with 19 threat classifications, and **Neon Serverless Postgres** for persistent storage, OSCANFT provides unified threat scoring, cross-source deduplication, and live WebSocket telemetry — all rendered in a stunning ember-gold-emerald themed dashboard.
## 💎 What Makes OSCANFT Different?
| Feature | Description |
|:---|:---|
| 🤖 **6-Agent Autonomous Swarm** | Parallel Gemini-driven sweepers run via `asyncio.gather`, so all six agents gather intelligence concurrently |
| 🌐 **10 Bright Data MCP Tools** | SERP search, web scraping, social monitoring, dark web scanning, GitHub secrets, domain intel, news aggregation, data broker checks, batch scraping, AI discovery |
| 🛡️ **MITRE ATT&CK Mapping** | 19 finding types mapped to enterprise ATT&CK tactics and techniques, rendered as an interactive SOC heatmap |
| 🧠 **Automated IOC Enrichment** | Regex classifiers for IPs, URLs, domains, emails, CVEs, SHA256/MD5 hashes with MITRE correlation |
| 🔍 **Data Exposure Monitoring** | Dedicated agent scanning social media, dark web, data brokers, and GitHub for leaked PII and credentials |
| 💾 **Dual-Database Architecture** | Auto-provisions PostgreSQL on Neon serverless, with immediate SQLite fallback |
| 📺 **Real-Time SOC Console** | WebSocket-powered live telemetry with animated gauges, heatmaps, and sliding drawer interfaces |
| 🔔 **Slack Alert Integration** | AI-formatted Block Kit digests with severity-coded threat summaries |
| 📊 **Historical Trend Analysis** | SVG sparkline charts tracking risk score progression across scan history |
| 🎨 **Premium Ember-Gold Theme** | Vibrant warm color scheme with glassmorphism, particle animations, and micro-interactions |
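
The IOC enrichment row above mentions regex classifiers for IPs, URLs, domains, emails, CVEs, and hashes. The sketch below shows one way such a classifier could look. The patterns, the `IOC_PATTERNS` list, and the `classify_ioc` helper are illustrative assumptions, not the contents of `engine/ioc_enrichment.py`.

```python
import re

# Hypothetical patterns, checked in order; the project's real regexes may differ.
IOC_PATTERNS = [
    ("cve", re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)),
    ("sha256", re.compile(r"^[A-Fa-f0-9]{64}$")),
    ("md5", re.compile(r"^[A-Fa-f0-9]{32}$")),
    ("url", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("ipv4", re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$"
    )),
    ("domain", re.compile(
        r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$", re.IGNORECASE
    )),
]


def classify_ioc(value: str) -> str:
    """Return the label of the first pattern that matches, else 'unknown'."""
    candidate = value.strip()
    for label, pattern in IOC_PATTERNS:
        if pattern.match(candidate):
            return label
    return "unknown"


if __name__ == "__main__":
    for sample in ["CVE-2024-3094", "192.168.1.10", "acme-saas.com"]:
        print(sample, "->", classify_ioc(sample))
```

Ordering matters in a design like this: more specific patterns (CVE IDs, hashes) are tried before broader ones (domains) so a value is not mislabeled by a looser rule.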
## 🌐 Bright Data Integration — 10 MCP Data Sources
OSCANFT leverages **Bright Data's MCP (Model Context Protocol)** infrastructure to power its intelligence gathering across 10 specialized data collection tools:
| # | Tool | API Method | Description | Use Case |
|:--|:---|:---|:---|:---|
| 1 | `brightdata__search_engine` | SERP API | Structured Google/Bing searches returning organic results | General threat intelligence queries |
| 2 | `brightdata__scrape_as_markdown` | Web Unlocker | Scrapes any webpage bypassing bot protection, returns clean Markdown | Deep-dive analysis of specific threat pages |
| 3 | `brightdata__discover` | AI-Ranked Search | Intent-based web discovery returning contextually relevant pages | Smart threat discovery |
| 4 | `brightdata__scrape_batch` | Parallel Unlocker | Concurrent scraping of up to 10 URLs simultaneously | Mass evidence collection |
| 5 | `brightdata__social_monitor` | Social SERP | Monitors Twitter/X, Reddit, Telegram for brand mentions and threat chatter | Social media OSINT |
| 6 | `brightdata__dark_web_scanner` | Deep Web SERP | Scans dark web forums, marketplaces, and paste sites for data listings | Dark web threat intelligence |
| 7 | `brightdata__github_secrets_scanner` | Code SERP | Searches GitHub for exposed API keys, credentials, and config files | Code leak detection |
| 8 | `brightdata__domain_intelligence` | Domain SERP | WHOIS, DNS, SSL cert transparency, subdomain discovery, typosquatting | Infrastructure reconnaissance |
| 9 | `brightdata__news_aggregator` | News SERP | Aggregates CVE feeds, security advisories, and threat landscape reports | Threat awareness |
| 10 | `brightdata__data_broker_check` | People SERP | Scans data broker sites for exposed employee PII and corporate data | PII exposure monitoring |
### Connection Architecture
- **MCP SSE Endpoint**: `https://mcp.brightdata.com/sse?token={token}&pro=1`
- **SERP API**: `https://api.brightdata.com/serp/google`
- **Web Unlocker Proxy**: `brd.superproxy.io:22225` (residential/datacenter proxy rotation)
- **Fallback Mode**: Comprehensive mock data engine for offline/demo/sandbox environments
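
As a rough illustration of the connection settings above, the sketch below builds the SSE endpoint URL from a token and switches to mock mode when the token is empty. The `resolve_mcp_endpoint` helper and its return shape are assumptions made for this example; the real logic lives in `mcp_bridge/bright_data.py` and may differ.

```python
from typing import Optional
from urllib.parse import urlencode

MCP_SSE_BASE = "https://mcp.brightdata.com/sse"  # endpoint listed above


def resolve_mcp_endpoint(token: Optional[str]) -> dict:
    """Return live connection settings, or mock mode when no token is set."""
    cleaned = (token or "").strip()
    if not cleaned:
        # Assumption: an empty token is what selects the fallback mock engine.
        return {"mode": "mock", "url": None}
    query = urlencode({"token": cleaned, "pro": 1})
    return {"mode": "live", "url": f"{MCP_SSE_BASE}?{query}"}


if __name__ == "__main__":
    print(resolve_mcp_endpoint(""))
    print(resolve_mcp_endpoint("example-token"))
```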
## 🤖 Autonomous Agent Swarm — 6 Specialized Units
| Agent | Codename | Scope | Key Data Sources |
|:---|:---|:---|:---|
| 🔴 **Threat Intel** | `threat_intel` | Credential leaks, code exposure, dark web archives, malware indicators | SERP, Web Scraper, Social Monitor |
| 📜 **Regulatory** | `regulatory` | CISA advisories, SEC rulings, GDPR bulletins, FTC enforcement actions | News Aggregator, SERP, Web Scraper |
| 🏢 **Third-Party Risk** | `third_party_risk` | Vendor breaches, CVEs, outages, supply chain vulnerabilities | News Aggregator, SERP, Batch Scraper |
| 🛡️ **Brand Monitor** | `brand_monitor` | Typosquatting domains, executive impersonation, phishing kits | Domain Intelligence, SERP, Web Scraper |
| 📋 **Compliance Parser** | `compliance_parser` | Policy gap analysis, framework alignment auditing (SOC2, GDPR, ISO27001) | SERP, Web Scraper, News Aggregator |
| 🔍 **Data Exposure** | `data_exposure` | Social media leaks, dark web listings, GitHub secrets, data broker PII | Social Monitor, Dark Web Scanner, GitHub Scanner, Data Broker Check |
All agents run **concurrently** via `asyncio.gather()`, each powered by **Google Gemini 2.5** with MCP tool calling for autonomous web intelligence gathering.
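
A minimal sketch of the concurrency pattern described above, using the six codenames from the table. The `run_agent` coroutine is a stand-in (a real agent would call Gemini and the MCP tools), and the simulated failure is only there to show how `return_exceptions=True` keeps one failing agent from cancelling the others.

```python
import asyncio

AGENT_CODENAMES = [
    "threat_intel", "regulatory", "third_party_risk",
    "brand_monitor", "compliance_parser", "data_exposure",
]


async def run_agent(codename: str) -> dict:
    """Stand-in for one sweeper; sleeps briefly instead of doing real work."""
    await asyncio.sleep(0.01)  # simulate network-bound work
    if codename == "regulatory":
        raise RuntimeError("simulated upstream timeout")
    return {"agent": codename, "findings": []}


async def run_swarm() -> dict:
    """Run all agents concurrently and report a per-agent status."""
    results = await asyncio.gather(
        *(run_agent(name) for name in AGENT_CODENAMES),
        return_exceptions=True,  # exceptions come back as results, not raised
    )
    status = {}
    for name, result in zip(AGENT_CODENAMES, results):
        status[name] = "failed" if isinstance(result, Exception) else "completed"
    return status


if __name__ == "__main__":
    print(asyncio.run(run_swarm()))
```

Because `asyncio.gather` preserves argument order, zipping the results with the codename list is enough to attribute each outcome to its agent.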
## 🏗️ Architecture & Data Flow
```mermaid
graph TD
    classDef client fill:#FF8C38,stroke:#CC6600,stroke-width:2px,color:#fff;
    classDef agent fill:#10B981,stroke:#059669,stroke-width:2px,color:#fff;
    classDef core fill:#F59E0B,stroke:#D97706,stroke-width:2px,color:#fff;
    classDef db fill:#00cc99,stroke:#009966,stroke-width:2px,color:#fff;
    classDef notify fill:#ff3366,stroke:#cc0033,stroke-width:2px,color:#fff;

    UI[Ember-Gold Glassmorphism Dashboard]:::client -->|Dispatch Trigger| API[FastAPI Orchestrator]:::core
    Cron[4-Hour Scheduler]:::core -->|Auto-Run| API
    API -->|Spin Up Swarm| Swarm[Agentic Swarm Cluster]:::core

    subgraph Swarm [6-Agent Autonomous Swarm]
        A1[Threat Intel Agent]:::agent
        A2[Regulatory Agent]:::agent
        A3[Third-Party Risk Agent]:::agent
        A4[Brand Monitor Agent]:::agent
        A5[Compliance Parser Agent]:::agent
        A6[Data Exposure Agent]:::agent
    end

    Swarm -->|10 MCP Tools| BD[Bright Data MCP Client]:::core
    BD -->|SERP Searches| Google((Google/Bing))
    BD -->|Web Unlocker| Sites((Paste Sites, Forums))
    BD -->|Social Monitor| Social((Twitter, Reddit, Telegram))
    BD -->|Dark Web Scanner| DarkWeb((Dark Web Markets))
    BD -->|GitHub Scanner| GitHub((GitHub Repos))
    BD -->|Domain Intel| DNS((WHOIS, DNS, SSL))
    BD -->|News Aggregator| News((CVE Feeds, Advisories))
    BD -->|Data Broker Check| Brokers((People Search, Brokers))

    Swarm -->|Raw Findings| IOC[IOC Enrichment Engine]:::core
    IOC -->|19 MITRE Mappings| Scorer[Gemini Risk Correlation Engine]:::core
    Scorer -->|Deduplicate & Score 0-100| Report[Risk Report & Roadmap]:::core
    Report -->|Store Schema| Neon[(Neon Postgres / SQLite)]:::db
    Report -->|Broadcast Progress| WS[WebSocket Manager]:::core
    WS -->|Real-Time Streams| UI
    Report -->|Slack Blocks Formatter| Slack[Slack Alert Service]:::notify
    Slack -->|Live Webhook| SlackChannel((Slack #oscanft-alerts))
```
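
The "Deduplicate & Score 0-100" step in the flow above is handled by Gemini in OSCANFT. As a deterministic stand-in that shows the shape of the problem, the sketch below merges findings reported by several agents and clamps a severity-weighted sum to the 0-100 range. The `Finding` fields, the `SEVERITY_WEIGHTS` values, and both helper functions are hypothetical and not taken from `engine/risk_scorer.py`.

```python
from dataclasses import dataclass

# Hypothetical weights; the real scoring is delegated to Gemini.
SEVERITY_WEIGHTS = {"critical": 40, "high": 20, "medium": 8, "low": 2}


@dataclass(frozen=True)
class Finding:
    finding_type: str
    indicator: str
    severity: str
    agent: str


def deduplicate(findings):
    """Keep one finding per (type, normalized indicator), preferring higher severity."""
    best = {}
    for finding in findings:
        key = (finding.finding_type, finding.indicator.strip().lower())
        current = best.get(key)
        if current is None or (
            SEVERITY_WEIGHTS[finding.severity] > SEVERITY_WEIGHTS[current.severity]
        ):
            best[key] = finding
    return list(best.values())


def risk_score(findings):
    """Sum severity weights over unique findings and clamp the total to 0-100."""
    total = sum(SEVERITY_WEIGHTS[f.severity] for f in deduplicate(findings))
    return min(100, total)


if __name__ == "__main__":
    sample = [
        Finding("credential_leak", "Admin@acme-saas.com", "high", "threat_intel"),
        Finding("credential_leak", "admin@acme-saas.com ", "critical", "data_exposure"),
        Finding("typosquat", "acrne-saas.com", "medium", "brand_monitor"),
    ]
    print(len(deduplicate(sample)), risk_score(sample))
```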
## 🛡️ MITRE ATT&CK Coverage — 19 Threat Classifications
OSCANFT maps all findings against the **MITRE ATT&CK Enterprise** framework:
| Finding Type | Tactic | Technique | Agent Source |
|:---|:---|:---|:---|
| `credential_leak` | TA0006 Credential Access | T1552 Unsecured Credentials | Threat Intel |
| `code_leak` | TA0009 Collection | T1213 Data from Info Repositories | Threat Intel |
| `typosquat` | TA0001 Initial Access | T1566 Phishing | Brand Monitor |
| `dark_web_mention` | TA0043 Reconnaissance | T1593 Search Open Websites | Threat Intel |
| `infrastructure_exposure` | TA0043 Reconnaissance | T1595 Active Scanning | Brand Monitor |
| `compliance_gap` | TA0005 Defense Evasion | T1562 Impair Defenses | Compliance Parser |
| `vendor_breach` | TA0001 Initial Access | T1199 Trusted Relationship | Third-Party Risk |
| `regulatory_update` | TA0043 Reconnaissance | T1592 Gather Victim Host Information | Regulatory |
| `exec_impersonation` | TA0001 Initial Access | T1566.002 Spearphishing Link | Brand Monitor |
| `phishing_kit` | TA0001 Initial Access | T1566.003 Spearphishing via Service | Brand Monitor |
| `outage` | TA0040 Impact | T1499 Endpoint DoS | Third-Party Risk |
| `cve_vulnerability` | TA0001 Initial Access | T1190 Exploit Public-Facing App | Third-Party Risk |
| `policy_violation` | TA0005 Defense Evasion | T1562 Impair Defenses | Compliance Parser |
| `social_media_exposure` | TA0043 Reconnaissance | T1593.001 Social Media | Data Exposure |
| `dark_web_listing` | TA0043 Reconnaissance | T1597 Search Closed Sources | Data Exposure |
| `pii_exposure` | TA0009 Collection | T1530 Data from Cloud Storage | Data Exposure |
| `github_secret_leak` | TA0006 Credential Access | T1552.004 Private Keys | Data Exposure |
| `subdomain_takeover` | TA0001 Initial Access | T1584 Compromise Infrastructure | Brand Monitor |
| `news_cve_alert` | TA0001 Initial Access | T1190 Exploit Public-Facing App | Regulatory |
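
To make the mapping concrete, here is a small sketch that tags findings with their tactic and technique and tallies findings per tactic, which is the kind of count a heatmap could be built from. `MITRE_MAP` copies a subset of the rows above; the `tag_finding` and `tactic_heat` helpers are illustrative names, not the project's API.

```python
from collections import Counter

# Subset of the table above: finding type -> (tactic ID, technique ID).
MITRE_MAP = {
    "credential_leak": ("TA0006", "T1552"),
    "code_leak": ("TA0009", "T1213"),
    "typosquat": ("TA0001", "T1566"),
    "vendor_breach": ("TA0001", "T1199"),
    "outage": ("TA0040", "T1499"),
    "github_secret_leak": ("TA0006", "T1552.004"),
}


def tag_finding(finding_type: str) -> dict:
    """Attach tactic and technique IDs; unknown types get None for both."""
    tactic, technique = MITRE_MAP.get(finding_type, (None, None))
    return {"type": finding_type, "tactic": tactic, "technique": technique}


def tactic_heat(finding_types) -> Counter:
    """Count findings per tactic, skipping types with no mapping."""
    return Counter(MITRE_MAP[t][0] for t in finding_types if t in MITRE_MAP)


if __name__ == "__main__":
    observed = ["credential_leak", "github_secret_leak", "typosquat", "unmapped"]
    print(tag_finding("typosquat"))
    print(tactic_heat(observed))
```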
## 👁️ Cyber Command Center Dashboard
OSCANFT features a premium ember-gold themed SOC console:
- **Unified Threat Gauge** — Animated conic ring showing corporate threat index (0–100)
- **MITRE ATT&CK Heatmap Grid** — 7-tactic enterprise coverage with progressive heat levels
- **Tactical Action Roadmap** — Auto-prioritized *Immediate* vs *Defensive* remediation queues
- **6-Agent Status Board** — Live running/completed/failed state for each sweeper
- **Vector Sparkline Chart** — SVG trendlines mapping historical scan results
- **Slide-In Detail Drawer** — Full evidence blocks, MITRE tags, and JSON export
- **Live Console Log** — Terminal-style WebSocket progress with color-coded output
- **Particle Background** — Animated network mesh with ember-colored nodes and connections
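
The sparkline in the list above boils down to mapping a series of 0-100 risk scores onto SVG coordinates. The dashboard does this in JavaScript; the sketch below shows the same arithmetic in Python, with hypothetical function names and default dimensions chosen for the example.

```python
def sparkline_points(scores, width=120.0, height=30.0):
    """Map risk scores (0-100) to an SVG polyline `points` string."""
    if not scores:
        return ""
    # Spread points evenly across the width; a single point sits at x = 0.
    step = width / (len(scores) - 1) if len(scores) > 1 else 0.0
    coords = []
    for index, score in enumerate(scores):
        clamped = max(0.0, min(100.0, float(score)))
        x = index * step
        y = height - (clamped / 100.0) * height  # SVG y grows downward
        coords.append(f"{x:.1f},{y:.1f}")
    return " ".join(coords)


def sparkline_svg(scores, width=120.0, height=30.0):
    """Wrap the points in a minimal standalone SVG element."""
    points = sparkline_points(scores, width, height)
    return (
        f'<svg viewBox="0 0 {width:g} {height:g}" xmlns="http://www.w3.org/2000/svg">'
        f'<polyline fill="none" stroke="currentColor" points="{points}"/></svg>'
    )


if __name__ == "__main__":
    print(sparkline_svg([35, 48, 72, 60]))
```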
## 📡 API Documentation
| Endpoint | Method | Description |
|:---|:---|:---|
| `/api/scan` | `POST` | Dispatches the 6-agent autonomous swarm concurrently |
| `/api/scans` | `GET` | Lists recent completed security evaluations |
| `/api/scans/{scan_id}` | `GET` | Full findings dataset for a specific scan |
| `/api/scans/{scan_id}/export` | `GET` | Report export as JSON or RFC 4180 CSV |
| `/api/findings` | `GET` | Active findings with optional `agent` and `severity` filters |
| `/api/risk-score` | `GET` | Current risk score, executive brief, and trend data |
| `/api/agents/status` | `GET` | Connection states and scopes for all 6 sweepers |
| `/api/health` | `GET` | System health, database type, and scan status |
| `/api/stats` | `GET` | Aggregated telemetry: severity distribution, agent counts |
| `/ws/scan-progress` | `WS` | Real-time WebSocket stream during background sweeps |
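
A hedged client sketch for the endpoints above, using only the standard library. The base URL assumes the default `PORT=8000`. That `POST /api/scan` accepts an empty body and that responses are JSON are assumptions, as are the example filter values; only the `agent` and `severity` parameter names come from the table.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumes the default PORT=8000


def findings_url(agent=None, severity=None, base_url=BASE_URL):
    """Build the /api/findings URL with the optional filters from the table."""
    params = {k: v for k, v in (("agent", agent), ("severity", severity)) if v}
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/api/findings{query}"


def trigger_scan(base_url=BASE_URL):
    """POST /api/scan. Assumption: an empty body is accepted and JSON comes back."""
    request = Request(f"{base_url}/api/scan", data=b"", method="POST")
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_findings(agent=None, severity=None, base_url=BASE_URL):
    """GET /api/findings and decode the JSON payload."""
    with urlopen(findings_url(agent, severity, base_url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL, so this runs without a live server.
    print(findings_url(agent="threat_intel", severity="critical"))
```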
## ⚙️ Technology Stack
| Layer | Technology | Purpose |
|:---|:---|:---|
| **AI Engine** | Google Gemini 2.5 Flash | Autonomous agent reasoning, threat analysis, risk scoring |
| **Data Collection** | Bright Data MCP | 10 web intelligence tools with anti-bot capabilities |
| **Backend** | FastAPI + Uvicorn | Async REST API with WebSocket support |
| **Database** | Neon Serverless Postgres | Cloud-native storage with SQLite local fallback |
| **Frontend** | Vanilla HTML/CSS/JS | Zero-dependency glassmorphism SOC dashboard |
| **Notifications** | Slack Block Kit API | AI-formatted security digest alerts |
| **Scheduling** | Python `schedule` | 4-hour automated scan orchestration |
| **Containerization** | Docker + Compose | One-command deployment |
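
For the Notifications layer, a Slack Block Kit message is a JSON object with a `blocks` array. The sketch below assembles a small severity-sorted digest in that shape. The `build_digest` helper, the finding fields (`severity`, `title`), and the wording are assumptions for illustration; `engine/slack_alerts.py` may structure its digests differently.

```python
import json

SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def build_digest(org, risk_score, findings):
    """Return a Block Kit payload with findings listed from most to least severe."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    lines = [f"*{f['severity'].upper()}* {f['title']}" for f in ranked]
    return {
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": f"OSCANFT digest: {org}"}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"Risk score: *{risk_score}/100*"}},
            {"type": "divider"},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": "\n".join(lines) or "No findings."}},
        ]
    }


if __name__ == "__main__":
    payload = build_digest("Acme SaaS", 72, [
        {"severity": "low", "title": "Stale DNS record"},
        {"severity": "critical", "title": "Credentials found on paste site"},
    ])
    print(json.dumps(payload, indent=2))
```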
### Python Dependencies
```
google-genai>=1.14.0      # Google Gemini SDK
mcp>=1.2.0                # Model Context Protocol
fastapi>=0.115.0          # Async web framework
uvicorn>=0.34.0           # ASGI server
python-dotenv>=1.0.0      # Environment config
pydantic>=2.10.0          # Data validation
schedule>=1.2.2           # Cron scheduling
httpx>=0.28.0             # Async HTTP client
websockets>=14.0          # WebSocket support
psycopg2-binary>=2.9.9    # PostgreSQL driver
rich>=13.9.0              # Console formatting
aiofiles>=24.0            # Async file I/O
```
## ⚙️ Environment Configuration
Create a `.env` file in the root directory:
```env
# API Keys & Endpoints (Google GenAI API Key is required)
GEMINI_API_KEY=your_gemini_api_key_here

# Bright Data MCP Client (Leave empty for fallback mock mode)
BRIGHT_DATA_MCP_TOKEN=your_bright_data_mcp_token_here
BRIGHT_DATA_MCP_URL=https://mcp.brightdata.com/mcp?token=your_token_here

# Database URL (Optional: Leave empty for local SQLite fallback)
NEON_API_KEY=
DATABASE_URL=

# Notifications (Optional: Outputs Blocks markup to console if empty)
SLACK_BOT_TOKEN=
SLACK_CHANNEL=#oscanft-alerts
PORT=8000

# Target Profile (Customize for your organization)
TARGET_ORG=Acme SaaS
TARGET_DOMAINS=acme-saas.com,acme-security.net
TARGET_BRAND_TERMS=Acme,AcmeSaaS,AcmeCorp
TARGET_IP_RANGES=192.168.1.0/24,10.0.0.0/16
TARGET_VENDORS=[{"name": "Stripe", "criticality": "critical", "data_access": "payment_data"}, {"name": "Auth0", "criticality": "critical", "data_access": "identity"}, {"name": "AWS", "criticality": "high", "data_access": "infrastructure"}]
```
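
The target profile mixes comma-separated lists with a JSON array, and the database choice depends on whether `DATABASE_URL` is set. The sketch below shows one way those values could be parsed. The `load_target_profile` helper and its return keys are hypothetical; the actual loader is `config.py`, which may behave differently.

```python
import json
import os


def load_target_profile(env=None):
    """Parse the target profile variables into Python structures."""
    env = os.environ if env is None else env

    def split_csv(name):
        # Comma-separated values, with blanks and surrounding spaces dropped.
        return [item.strip() for item in env.get(name, "").split(",") if item.strip()]

    database_url = env.get("DATABASE_URL", "").strip()
    return {
        "org": env.get("TARGET_ORG", ""),
        "domains": split_csv("TARGET_DOMAINS"),
        "brand_terms": split_csv("TARGET_BRAND_TERMS"),
        "ip_ranges": split_csv("TARGET_IP_RANGES"),
        # TARGET_VENDORS holds a JSON array; an empty value means no vendors.
        "vendors": json.loads(env.get("TARGET_VENDORS", "") or "[]"),
        # Assumption: an empty DATABASE_URL selects the SQLite fallback.
        "database": "postgres" if database_url else "sqlite",
    }


if __name__ == "__main__":
    example = {
        "TARGET_ORG": "Acme SaaS",
        "TARGET_DOMAINS": "acme-saas.com,acme-security.net",
        "TARGET_VENDORS": '[{"name": "Stripe", "criticality": "critical"}]',
        "DATABASE_URL": "",
    }
    print(load_target_profile(example))
```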
## 🚀 Quick Start
### Step 1: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 2: Set API Credentials
Generate a Gemini API key in Google AI Studio and set it as `GEMINI_API_KEY` in the `.env` file.
### Step 3: Launch OSCANFT
```bash
python main.py
```
### Step 4: Open Command Center
👉 **[http://localhost:8000](http://localhost:8000)**
## 🐳 Docker Deployment
### Docker Compose (Recommended)
```bash
docker-compose up -d --build
```
### Docker CLI
```bash
docker build -t oscanft .
docker run -d -p 8000:8000 --env-file .env -v oscanft-data:/app/data --name oscanft oscanft
```
## 🔍 Codebase Structure
```
oscanft/
├── main.py                    # Orchestrator (Cron scheduler + FastAPI boot)
├── config.py                  # Organization and API parameters loader
├── requirements.txt           # Python dependencies
├── Dockerfile                 # Container config
├── docker-compose.yml         # Compose with persistent volumes
├── .dockerignore              # Build exclusions
├── mcp_bridge/
│   ├── bright_data.py         # 10 Bright Data MCP tools with offline mock engines
│   ├── neon.py                # Neon PG client & SQLite DDL fallback
│   └── gemini_adapter.py      # Gemini/AIMLAPI bridge with tool calling
├── agents/
│   ├── base_agent.py          # Swarm skeleton with prompt interpolation & retries
│   ├── threat_intel.py        # Credential leak and code exposure monitor
│   ├── regulatory.py          # Legal, CISA, and SEC directive crawler
│   ├── third_party_risk.py    # Vendor breach and CVE analyzer
│   ├── brand_monitor.py       # Typosquat and phishing kit hunter
│   ├── compliance_parser.py   # SOC2/GDPR alignment auditor
│   └── data_exposure.py       # Social/dark web/PII exposure monitor
├── engine/
│   ├── models.py              # Pydantic schemas (Finding, RiskReport, AgentRun)
│   ├── ioc_enrichment.py      # Regex classifiers & 19 MITRE ATT&CK mappings
│   ├── risk_scorer.py         # Gemini deduplicator and rating compiler
│   ├── report_generator.py    # JSON/CSV export formatters
│   └── slack_alerts.py        # Slack Block Kit generator
├── prompts/                   # 7 agent system prompts + MCP tools instructions
├── db/
│   ├── repository.py          # Data access layer
│   └── schema.sql             # PostgreSQL/SQLite DDL
└── dashboard/
    ├── index.html             # SOC command center layout
    ├── styles.css             # Ember-gold-emerald design system (1800+ lines)
    └── dashboard.js           # WebSocket controller & UI renderer
```
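
As an example of what an export formatter such as `report_generator.py` might do for CSV output, the sketch below writes findings with Python's `csv` module, whose default dialect uses CRLF line endings and doubled quotes in the way RFC 4180 describes. The `FIELDS` column list and the `findings_to_csv` helper are hypothetical, not the project's actual schema.

```python
import csv
import io

FIELDS = ["finding_type", "severity", "agent", "title"]  # hypothetical columns


def findings_to_csv(findings):
    """Serialize finding dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        extrasaction="ignore",   # drop keys that are not export columns
        lineterminator="\r\n",   # CRLF line endings
    )
    writer.writeheader()
    writer.writerows(findings)
    return buffer.getvalue()


if __name__ == "__main__":
    print(findings_to_csv([{
        "finding_type": "typosquat",
        "severity": "high",
        "agent": "brand_monitor",
        "title": 'Lookalike "acme", found',
        "internal_id": 7,
    }]))
```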
## 📄 License
MIT License — See [LICENSE](LICENSE) for details.