LxveAce/worldviewosnit

GitHub: LxveAce/worldviewosnit

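An automated web intelligence and reverse-engineering lab aimed at a specific target, integrating traffic capture, behavior simulation, and multi-dimensional data correlation reporting.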


# 🧠 Ultimate Web Intelligence & Reverse Engineering Lab

## 🎯 Mission

Perform a full-spectrum technical, behavioral, and security analysis of:

**Target:** https://worldviewosint.com/

Work out of this GitHub repo. All logs, scripts, captures, and reports are stored here.

You will behave as an autonomous:

- OSINT analyst
- Network traffic analyst
- Reverse engineer
- Backend inference engine

Your goal is to **capture, analyze, reconstruct, and classify** every observable behavior, data source, and communication channel used by the target system.

# ⚠️ Operating Principles

- Operate autonomously: minimize manual intervention
- Log ALL observable data: nothing is irrelevant until proven otherwise
- Focus on observation, inference, reconstruction
- Prefer evidence over assumptions: cite specific requests, headers, payloads
- Remain low-noise during probing: no aggressive scanning, no brute-force
- Document every finding as you go, not after the fact
- If a phase produces no results, log that explicitly (absence of evidence is data)

# 📁 PROJECT STRUCTURE

Maintain this directory layout in the repo root:

    worldviewosnit/
    ├── README.md
    ├── captures/
    │   ├── traffic.har              # HAR capture from Playwright
    │   ├── traffic_dump.mitm        # Raw mitmproxy dump
    │   └── network.json             # Parsed request/response log
    ├── scripts/
    │   ├── capture.js               # Playwright automation script
    │   ├── replay.sh                # API replay script
    │   └── analyze.py               # Log analysis / correlation
    ├── logs/
    │   ├── endpoints.json           # Discovered endpoints + status codes
    │   ├── telegram.json            # Telegram-specific traffic
    │   ├── cookies.json             # Cookie inventory
    │   └── console.json             # Browser console output
    ├── recon/
    │   ├── dns.txt                  # WHOIS + DNS records
    │   ├── tls.txt                  # TLS certificate details
    │   ├── subdomains.txt           # Subdomain enumeration results
    │   └── source-analysis.md       # JavaScript / source map findings
    ├── reports/
    │   ├── architecture.md          # Architecture diagram
    │   ├── api-map.md               # Full endpoint map
    │   ├── telegram-report.md       # Telegram forensics report
    │   ├── data-authenticity.md     # Signal validation report
    │   ├── backend-fingerprint.md   # Stack identification
    │   ├── security-assessment.md   # Risk assessment
    │   └── final-classification.md  # System intent + verdict
    └── tools/
        └── setup.sh                 # Automated toolchain installer

# 🧰 TOOLCHAIN SETUP

Install all tools before beginning. Verify each installation before proceeding.

## Core

- **Node.js** (latest LTS): `node --version`
- **Python 3.11+**: `python3 --version`
- **jq** (JSON processing): `jq --version`
- **curl** (HTTP requests): `curl --version`
- **httpie** (human-readable HTTP): `http --version`

## Browser Automation

- **Playwright**: `npx playwright --version`
- **Chromium**, installed via Playwright: `npx playwright install chromium`

## Network Interception

- **mitmproxy** (v10+): `mitmproxy --version`

## DNS / Domain Recon

- **whois**: domain registration lookup
- **dig** or **nslookup**: DNS record queries
- **subfinder** (optional): subdomain enumeration
- **openssl**: TLS certificate inspection

## Optional

- **tcpdump** / **Wireshark**: low-level packet capture
- **Wappalyzer CLI** or **webanalyze**: technology fingerprinting

## Verification

Run the following to confirm the toolchain is ready:

    node --version && python3 --version && jq --version && curl --version | head -1 && mitmproxy --version && npx playwright --version

All tools must return valid version numbers before proceeding to Phase 1.

# 🌐 ARCHITECTURE STACK

    [Agent]
        ↓
    [Playwright Automation]
        ↓
    [Chromium Browser (headless or headed)]
        ↓
    [mitmproxy Interception Layer: 127.0.0.1:8080]
        ↓
    [Target Site + APIs + Third-Party Endpoints (Telegram, CDNs, etc.)]

Data flows downward. Every layer captures and logs. mitmproxy sees all HTTPS traffic after certificate trust is established.

# 🧪 PHASE 0: ENVIRONMENT PREPARATION

Before touching the target, prepare a clean analysis environment.

## Operational Security

1. **Use a VPN or isolated network**: your real IP will be logged by the target (especially if it phones home to Telegram)
2. **Use a clean browser profile**: no cookies, no extensions, no saved logins
3. **Dedicated workspace**: clone this repo, work exclusively inside it
4. **No authentication**: do NOT log into anything on the target; observe as an anonymous visitor

## Environment Checklist

- [ ] VPN active and verified (check IP at https://ifconfig.me)
- [ ] Toolchain installed and verified (see above)
- [ ] Repo cloned and directory structure created
- [ ] mitmproxy CA certificate NOT yet trusted (do that in Phase 2)
- [ ] Browser profile is clean / fresh

# 🧪 PHASE 1: DNS & DOMAIN RECONNAISSANCE

Before intercepting traffic, gather passive intelligence about the target's infrastructure.

## 1.1 WHOIS Lookup

    whois worldviewosint.com > recon/dns.txt

Extract:

- Registrar
- Registration / expiration dates
- Registrant info (or privacy service)
- Name servers

## 1.2 DNS Records

    dig worldviewosint.com ANY +noall +answer >> recon/dns.txt
    dig worldviewosint.com A >> recon/dns.txt
    dig worldviewosint.com AAAA >> recon/dns.txt
    dig worldviewosint.com CNAME >> recon/dns.txt
    dig worldviewosint.com MX >> recon/dns.txt
    dig worldviewosint.com TXT >> recon/dns.txt
    dig worldviewosint.com NS >> recon/dns.txt

Look for:

- **CNAME to Vercel/Netlify/Cloudflare** → hosting provider
- **TXT records** → SPF, DKIM, domain verification tokens (Google, etc.)
- **MX records** → email infrastructure (or lack thereof)

## 1.3 Subdomain Enumeration

    subfinder -d worldviewosint.com -o recon/subdomains.txt

Or manually check common subdomains:

    for sub in www api app admin dev staging mail; do
      dig +short "$sub.worldviewosint.com" >> recon/subdomains.txt
    done

## 1.4 TLS Certificate Analysis

    echo | openssl s_client -connect worldviewosint.com:443 -servername worldviewosint.com 2>/dev/null | openssl x509 -noout -text > recon/tls.txt

Extract:

- Issuer (Let's Encrypt → likely automated; Cloudflare → CDN-proxied)
- Subject Alternative Names (SANs), which may reveal related domains
- Validity period
- Certificate chain

## 1.5 Passive Observations

Note:

- Does the domain resolve to a CDN (Cloudflare, Vercel, AWS CloudFront)?
- Are there any CAA records restricting certificate issuance?
- Does the IP belong to a known hosting provider? (check with `whois` on the resolved IP)

Save all output to `recon/dns.txt` and `recon/tls.txt`.

# 🧪 PHASE 2: NETWORK INTERCEPTION SETUP

## 2.1 Start mitmproxy

    mitmdump --save-stream-file captures/traffic_dump.mitm

This starts the proxy on `127.0.0.1:8080` by default and writes all traffic to the dump file.

## 2.2 Install and Trust the mitmproxy CA Certificate

1. With mitmproxy running, open a browser and navigate to: `http://mitm.it`
2. Download the certificate for your OS
3. Trust the certificate:
   - **macOS:** Add to Keychain Access → System → "Always Trust"
   - **Windows:** Install to "Trusted Root Certification Authorities"
   - **Linux:** Copy to `/usr/local/share/ca-certificates/` and run `sudo update-ca-certificates`

## 2.3 Verify Interception

    curl --proxy http://127.0.0.1:8080 https://worldviewosint.com/ -o /dev/null -w "%{http_code}" -s

Expected: HTTP `200`. If you get a certificate error, the CA cert is not trusted correctly. Fix before proceeding.

# 🧪 PHASE 3: FULL TRAFFIC CAPTURE

Launch Chromium via Playwright with the mitmproxy proxy enabled. Capture everything.
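
The launch step can be sketched in Python as follows. This is a minimal sketch, not the repo's `scripts/capture.js`: it assumes the Playwright Python package is installed, and the helper names `build_options` and `capture`, the 30-second settle time, and the use of `ignore_https_errors` are illustrative assumptions.

```python
from pathlib import Path

PROXY_URL = "http://127.0.0.1:8080"   # mitmproxy default address from Phase 2
HAR_PATH = "captures/traffic.har"     # output path from the project layout


def build_options(proxy_url: str = PROXY_URL, har_path: str = HAR_PATH) -> dict:
    """Return the launch and context settings used by capture() (hypothetical helper)."""
    return {
        "launch": {"headless": True, "proxy": {"server": proxy_url}},
        "context": {
            "record_har_path": har_path,
            # Assumption: accept mitmproxy-signed certificates inside this
            # browser context rather than relying on the OS trust store.
            "ignore_https_errors": True,
        },
    }


def capture(url: str, settle_ms: int = 30_000) -> None:
    """Load `url` through the proxy, wait, and write the HAR file on close."""
    # Imported here so the module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    opts = build_options()
    Path(opts["context"]["record_har_path"]).parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(**opts["launch"])
        context = browser.new_context(**opts["context"])
        page = context.new_page()
        page.goto(url)
        page.wait_for_timeout(settle_ms)  # let polling and delayed requests fire
        context.close()                   # the HAR file is written when the context closes
        browser.close()
```

Calling `capture("https://worldviewosint.com/")` with mitmproxy already running would produce `captures/traffic.har` alongside the mitmproxy dump; the interaction steps from Phase 4 would go between `page.goto` and `context.close`.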
## What to Capture

- XHR / fetch requests
- WebSocket connections and frames
- Script loads (JS bundles, chunks)
- Stylesheet and font loads
- Image / media requests
- Third-party API calls (analytics, CDNs, Telegram, etc.)
- Service Worker registrations and fetch events

## What to Log for Each Request/Response

| Field | Description |
|-------|-------------|
| `url` | Full request URL |
| `method` | HTTP method (GET, POST, etc.) |
| `headers` | All request and response headers |
| `request_body` | POST/PUT body (if any) |
| `response_body` | Response content |
| `status` | HTTP status code |
| `content_type` | Response content type |
| `timestamp` | Unix timestamp (ms) |
| `duration` | Time from request to response (ms) |
| `initiator` | What triggered the request (script, user action, etc.) |

## Output Files

- `captures/network.json`: structured request/response log
- `captures/traffic.har`: HAR format for browser devtools import
- `logs/console.json`: browser console output (errors, warnings, logs)

# 🧪 PHASE 4: BEHAVIOR SIMULATION

Simulate realistic user behavior to trigger all network activity the site is capable of producing.

## Interaction Sequence

1. **Initial page load**: observe all requests fired on first visit
2. **Wait 30 seconds**: watch for background polling, WebSocket connections, delayed XHR calls
3. **Scroll the full page**: some content lazy-loads or triggers on scroll
4. **Click all interactive elements:**
   - Buttons, toggles, tabs, dropdowns
   - Map interactions (zoom, pan, click markers)
   - Any navigation links / route changes
5. **Hover over interactive elements**: tooltips, popups, info panels
6. **Resize the browser window**: responsive breakpoints may load different assets or APIs
7. **Navigate to all visible routes/pages**: capture each page's network activity independently
8. **Wait another 30 seconds after all interactions**: catch any delayed or periodic calls
9. **Repeat the full sequence once**: compare traffic patterns for consistency

## What to Watch For

- Requests that fire on a timer (polling intervals)
- Requests that fire on specific user actions only
- WebSocket messages that arrive without user interaction
- Differences between first-load and subsequent-load traffic
- Requests to domains other than the target (third-party calls)

# 🧪 PHASE 5: DATA LOGGING & ORGANIZATION

Structure all captured data for analysis.

## Structured Outputs

1. **Unique endpoints list**: deduplicated, sorted by domain, then path
2. **Grouped responses by endpoint**: all responses for each endpoint in chronological order
3. **Frequency tracking**: how often each endpoint is called per minute
4. **Payload size tracking**: request and response sizes per endpoint
5. **Timeline view**: all requests plotted on a timeline (use timestamps)

## Deduplication Rules

- Same URL + same method + same request body = duplicate (keep first occurrence, count total)
- Same URL + different query params = separate entries
- Same URL + different response body = track as "dynamic" endpoint

## File Format

All logs as JSON arrays. Example entry:

    {
      "url": "https://worldviewosint.com/api/data",
      "method": "GET",
      "status": 200,
      "content_type": "application/json",
      "request_headers": {},
      "response_size": 4523,
      "timestamp": 1716300000000,
      "duration_ms": 142,
      "occurrence_count": 5,
      "classification": "REAL-TIME API"
    }

# 🧪 PHASE 6: TELEGRAM FORENSICS (CRITICAL)

This is the highest-priority analysis phase. The target may be exfiltrating visitor data to Telegram.
## Detection

Filter all captured traffic for any requests to:

    api.telegram.org

Also check for:

- Obfuscated Telegram calls (base64-encoded URLs, proxied through the target's own backend)
- References to `t.me` or `telegram` in JavaScript source code

## For EACH Telegram Request, Extract:

### URL Structure

Telegram Bot API URLs follow this pattern:

    https://api.telegram.org/bot<TOKEN>/<METHOD>

Extract:

- **Bot token**: the string between `bot` and `/` (e.g., `bot123456:ABC-DEF...`)
- **Method**: the API method called (e.g., `sendMessage`, `sendPhoto`, `sendDocument`)

### Request Payload

- `chat_id`: the target chat/channel/group
  - Positive number = individual user
  - Negative number = group chat
  - `-100` prefix = supergroup or channel
- `text`: message content (may contain visitor data)
- `parse_mode`: formatting mode (HTML, Markdown)
- Any file attachments or media

### Response

- `ok` field (boolean)
- `result.message_id`: confirms message was sent
- Error messages if failed

## Bot Identity Verification

Using the extracted bot token, query:

    curl https://api.telegram.org/bot<TOKEN>/getMe

This reveals:

- Bot username
- Bot display name
- Whether it can join groups
- Whether it's a public bot

## Determine for Each Call

| Question | How to Answer |
|----------|--------------|
| What triggers it? | Correlate timestamp with user action timeline from Phase 4 |
| How often? | Count occurrences, check for interval patterns |
| What data is sent? | Parse the `text` field and any attached data |
| Does it include visitor IP? | Check text content and any headers forwarded |
| Does it include user agent? | Check text content |
| Does it include interaction data? | Check if click/scroll events are referenced |
| Is it one-way logging or bidirectional? | Check for any response-dependent behavior |

## Classification

Classify the Telegram integration as:

- **Visitor logging**: sends data about each visitor (IP, UA, etc.)
- **Alert system**: sends notifications on specific events
- **Analytics/tracking**: aggregates and reports interaction data
- **C2 (command and control)**: receives instructions from Telegram (check for polling of `getUpdates`)
- **Exfiltration**: sends sensitive or identifiable data without user consent

# 🧪 PHASE 7: ENDPOINT DISCOVERY

Go beyond observed endpoints. Probe for hidden or undocumented paths.

## Step 1: Harvest from Observed Traffic

From the Phase 3-5 captures, extract all unique URL paths. These are your known endpoints.

## Step 2: Check Standard Discovery Files

    curl -s https://worldviewosint.com/robots.txt
    curl -s https://worldviewosint.com/sitemap.xml
    curl -s https://worldviewosint.com/.well-known/security.txt
    curl -s https://worldviewosint.com/.well-known/openapi.json

## Step 3: Probe Common Framework Paths

Based on the stack inferred in Phase 1 (DNS) and Phase 10 (Fingerprinting), probe paths relevant to the detected framework:

### General

    /api/
    /api/v1/
    /api/v2/
    /internal/
    /debug/
    /admin/
    /health
    /healthcheck
    /status
    /metrics
    /graphql
    /.env
    /config.json

### Next.js Specific

    /_next/data/
    /api/hello
    /_next/static/
    /__nextjs_original-stack-frame

### Python/FastAPI Specific

    /docs
    /redoc
    /openapi.json

### Vercel Specific

    /.vercel/
    /_vercel/insights/script.js

## Step 4: Log Results

For each probed path, record:

| Field | Value |
|-------|-------|
| Path | URL path |
| Status Code | HTTP response code |
| Response Size | Content-Length or body length |
| Content-Type | Response type |
| Classification | `valid` / `hidden` / `redirect` / `error` / `dead` |

## Rate Limiting

- Wait **500ms between requests**: do not hammer the server
- If you receive `429 Too Many Requests`, back off and note the rate limit headers
- If you receive `403 Forbidden`, log it but do not retry with bypass techniques

Save results to `logs/endpoints.json`.
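
The Step 4 classification can be applied offline to already-recorded status codes. The sketch below is one possible mapping, not a rule stated in this document: the function name `classify_probe` and the reading of `hidden` as "answers successfully but never appeared in captured traffic" are assumptions.

```python
def classify_probe(status: int, observed_in_traffic: bool) -> str:
    """Map a recorded HTTP status code to one of the Step 4 labels."""
    if 300 <= status < 400:
        return "redirect"
    if 200 <= status < 300:
        # Assumption: "hidden" = responds, but was never seen in the Phase 3-5 captures.
        return "valid" if observed_in_traffic else "hidden"
    if status in (404, 410):
        return "dead"
    # Everything else (401, 403, 429, 5xx, ...) is logged as an error, not retried.
    return "error"


# Example: label a few recorded results before writing logs/endpoints.json.
recorded = [("/api/data", 200, True), ("/health", 200, False), ("/admin/", 403, False)]
labels = {path: classify_probe(status, seen) for path, status, seen in recorded}
```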
# 🧪 PHASE 8: CLIENT-SIDE CODE ANALYSIS

Analyze the JavaScript bundles, source maps, and client-side storage used by the target.

## 8.1 JavaScript Bundle Analysis

From the captured traffic (Phase 3), extract all `.js` files loaded by the site.

For each bundle:

- Note the filename pattern (hashed chunks = Webpack/Vite/Next.js build)
- Check file size: large bundles may contain embedded data or logic
- Search for readable strings: API keys, endpoints, tokens, config objects

    # Extract all JS URLs from the HAR file (match .js but not .json)
    jq -r '.log.entries[].request.url' captures/traffic.har | grep -E '\.js(\?|$)' | sort -u

## 8.2 Source Map Recovery

Check if source maps are exposed:

    # For each JS bundle URL, check for .map suffix
    curl -s -o /dev/null -w "%{http_code}" https://worldviewosint.com/_next/static/chunks/main-abc123.js.map

If source maps are available:

- Download them
- Reconstruct the original source tree
- This reveals the full application source code, component structure, and build configuration

## 8.3 Hardcoded Secrets Search

In all captured JavaScript, search for:

    /api[_-]?key/i
    /token/i
    /secret/i
    /password/i
    /telegram/i
    /bot[0-9]/i
    /chat_id/i
    /firebase/i
    /supabase/i
    /\.env/i

Any hardcoded API keys or tokens are critical findings.

## 8.4 Client-Side Storage

During the Playwright session, dump:

    // Local Storage
    JSON.stringify(Object.entries(localStorage));
    // Session Storage
    JSON.stringify(Object.entries(sessionStorage));
    // Cookies
    document.cookie;
    // IndexedDB databases (returns a Promise)
    indexedDB.databases();

Look for:

- Session tokens or identifiers
- Cached API responses
- User tracking identifiers (fingerprint hashes, UUIDs)
- Analytics IDs

## 8.5 Service Worker Analysis

Check if a Service Worker is registered:

    // Returns a Promise that resolves to the list of registrations
    navigator.serviceWorker.getRegistrations();

If present:

- Download the SW script
- Analyze what it caches
- Check if it intercepts or modifies requests
- Check if it sends background sync / push notification requests

Save findings to `recon/source-analysis.md`.
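
The pattern list in 8.3 can be run over saved bundle text with a few lines of Python. This is a minimal sketch: the function name `scan_source`, the 40-character context window, and the output shape are assumptions, and broad patterns such as `token` will produce many false positives that need manual review.

```python
import re

# Patterns copied from section 8.3; all are matched case-insensitively.
PATTERNS = [
    r"api[_-]?key", r"token", r"secret", r"password", r"telegram",
    r"bot[0-9]", r"chat_id", r"firebase", r"supabase", r"\.env",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def scan_source(text: str, context: int = 40) -> list[dict]:
    """Return every pattern hit in `text` with its offset and surrounding snippet."""
    hits = []
    for rx in COMPILED:
        for m in rx.finditer(text):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            hits.append({
                "pattern": rx.pattern,
                "offset": m.start(),
                "snippet": text[start:end],
            })
    # Sort by position so findings read in source order.
    return sorted(hits, key=lambda h: h["offset"])


# Example on an inline string; in practice `text` would be a downloaded bundle.
sample = 'const cfg = {apiKey: "x"}; fetch("/api/telegram");'
findings = scan_source(sample)
```

Each hit carries enough context to cite in `recon/source-analysis.md` without pasting the whole bundle.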
# 🧪 PHASE 9: DATA SOURCE CLASSIFICATION

For ALL observed data streams (API responses, WebSocket messages, embedded data), classify each one.

## Classification Labels

| Label | Definition | How to Identify |
|-------|-----------|-----------------|
| **STATIC** | Hardcoded in the page or JS bundles | Same content on every load, present in source code |
| **REAL-TIME API** | Fetched live from an external data source | Changes between requests, has timestamps near current time |
| **CACHED** | Server-side cached version of live data | Identical responses within a time window, then changes |
| **AGGREGATED** | Compiled from multiple sources (RSS, scraping) | Mixed formatting, inconsistent structure, attribution markers |
| **PROXIED** | Fetched server-side from another API, served to client | Target domain URL but data structure matches known external API |
| **SYNTHETIC** | Fabricated or procedurally generated | Unrealistic values, perfect distributions, no external source match |

## For Each Data Stream, Document:

1. Endpoint URL
2. Data type (JSON, XML, plaintext, binary)
3. Update frequency (if polled)
4. Classification label
5. Confidence level (HIGH / MED / LOW)
6. Evidence supporting the classification

# 🧪 PHASE 10: BACKEND FINGERPRINTING

Identify the server-side technology stack through observable signals.

## 10.1 Response Header Analysis

Extract and analyze these headers from ALL responses:

| Header | What It Reveals |
|--------|----------------|
| `server` | Web server software (nginx, Apache, etc.) |
| `x-powered-by` | Framework (Express, Next.js, PHP, etc.) |
| `x-vercel-id` | Vercel deployment (confirms serverless hosting) |
| `x-vercel-cache` | Vercel edge cache status |
| `x-nextjs-cache` | Next.js ISR/SSR cache status |
| `cf-ray` | Cloudflare proxy (confirms CDN) |
| `x-request-id` | Request tracing (common in production systems) |
| `set-cookie` | Session management, tracking cookies |
| `content-security-policy` | CSP rules: reveals allowed script/connect sources |
| `access-control-allow-origin` | CORS configuration: reveals allowed origins |
| `strict-transport-security` | HSTS configuration |
| `x-frame-options` | Clickjacking protection |
| `x-content-type-options` | MIME sniffing protection |

## 10.2 Inference Table

| Observed Signal | Inference |
|-----------------|-----------|
| `x-vercel-id` present | Hosted on Vercel (serverless) |
| `x-powered-by: Next.js` | React SSR/SSG frontend |
| `server: uvicorn` | Python ASGI backend (likely FastAPI) |
| `server: nginx` | Reverse proxy or direct server |
| `cf-ray` present | Behind Cloudflare CDN |
| `/_next/` paths in URLs | Confirmed Next.js application |
| `/api/` routes returning JSON | Backend API layer present |
| WebSocket upgrade headers | Real-time data push capability |
| `set-cookie: __cf_bm` | Cloudflare bot management active |

## 10.3 Error Page Fingerprinting

Deliberately trigger error responses to reveal framework information:

    curl -s https://worldviewosint.com/nonexistent-page-abc123
    curl -s https://worldviewosint.com/api/nonexistent
    curl -s -X POST https://worldviewosint.com/

Custom error pages often reveal the framework (Next.js default 404, FastAPI validation errors, etc.).

## 10.4 Technology Fingerprinting

If `webanalyze` or Wappalyzer CLI is available:

    webanalyze -host worldviewosint.com -crawl 2

## Summary Determination

Document:

- **Hosting:** serverless vs. traditional vs. containerized
- **Frontend framework:** React, Vue, vanilla, etc.
- **Backend framework:** Next.js API routes, FastAPI, Express, etc.
- **CDN/proxy:** Cloudflare, Vercel Edge, none
- **Caching behavior:** edge-cached, stale-while-revalidate, no-cache
- **Polling frequency:** interval between repeated API calls

Save to `reports/backend-fingerprint.md`.

# 🧪 PHASE 11: API REPLAY

Replay observed API calls outside the browser to determine authentication requirements, response stability, and rate limits.

## Method

For each discovered API endpoint, replay using curl:

    # Basic replay
    curl -s "https://worldviewosint.com/api/endpoint" \
      -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
      | jq . > replay_response.json

    # Compare with captured response (keys sorted so ordering differences are ignored)
    diff <(jq -S . captured_response.json) <(jq -S . replay_response.json)

## For Each Endpoint, Determine:

| Test | Method | What It Reveals |
|------|--------|-----------------|
| **No headers** | `curl URL` | Does it require auth at all? |
| **With browser UA** | `curl -H "User-Agent: ..." URL` | UA-based filtering? |
| **With cookies** | `curl -b "cookie=value" URL` | Session-dependent? |
| **Without referer** | Omit Referer header | Referer checking? |
| **Repeated calls** | 10 calls in 60 seconds | Rate limited? At what threshold? |
| **Different IP** | Via VPN/proxy change | IP-based rate limiting? |
| **After time delay** | Same call 5 minutes later | Response changes? (dynamic vs cached) |
| **Parameter manipulation** | Change query params | Input validation? Different data? |

## Comparison Matrix

For each endpoint, record:

| Endpoint | Auth Required | Static/Dynamic | Rate Limited | Cache TTL |
|----------|:------------:|:--------------:|:------------:|:---------:|
| `/api/...` | yes/no | S/D | yes/no/unknown | seconds |

Save to `reports/api-map.md`.

# 🧪 PHASE 12: SIGNAL VALIDATION

Assess whether the data displayed by the target is real, transformed, or fabricated.

## Detection Methods

### Temporal Analysis

- Compare data timestamps to current wall-clock time
- Are timestamps plausible? (within expected range for the data type)
- Do timestamps progress naturally, or do they repeat/reset?
- Is there a fixed offset between "live" data and actual time?

### Content Analysis

- Are there repeated entries with identical content but different timestamps?
- Do severity/priority values follow a realistic distribution, or are they uniform/random?
- Do geographic coordinates correspond to real locations?
- Do entity names (ships, aircraft, people) resolve to real-world entities?

### Behavioral Analysis

- Does data update on a fixed interval regardless of world events?
- Does "real-time" data change when the page is backgrounded (tab not visible)?
- Does refreshing the page show the same "live" data or different data?
- Are update intervals suspiciously regular (exactly every N seconds)?

### Cross-Reference

- Compare claimed data against known public sources:
  - **Earthquakes** → USGS API (`earthquake.usgs.gov`)
  - **Aircraft** → ADS-B Exchange, FlightRadar24
  - **Maritime** → MarineTraffic, AIS data
  - **News** → Original RSS source articles
  - **Weather** → NOAA, OpenWeatherMap

### Statistical Methods

- Calculate entropy of data fields: low entropy suggests synthetic/templated data
- Check distribution of numeric values: a uniform distribution is unusual for naturally occurring data
- Look for copy-paste artifacts (identical phrasing, formatting)

## Confidence Assignment

| Level | Criteria |
|-------|----------|
| **HIGH** (real) | Data matches external sources, timestamps are current, content is unique |
| **MED** (transformed) | Data resembles real sources but is reformatted, delayed, or aggregated |
| **LOW** (synthetic) | Data has no external source match, shows statistical anomalies, or is clearly fabricated |

Save to `reports/data-authenticity.md`.

# 🧪 PHASE 13: BEHAVIOR MODELING

Reconstruct the complete system logic from observed evidence.
## Data Lifecycle Model

Map the full path of data through the system:

    [External Source] → [Ingestion Method] → [Server-Side Processing] → [API Response] → [Client-Side Rendering] → [User Display]

For each data type, document:

1. **Source**: where does the data originate?
2. **Ingestion**: how does the server get it? (API call, scrape, WebSocket, RSS)
3. **Transformation**: is it modified? (reformatted, filtered, enriched, delayed)
4. **Delivery**: how does it reach the browser? (REST API, WebSocket, SSR, embedded in HTML)
5. **Rendering**: what UI component displays it?

## UI → Network Relationship Map

For each user-facing feature:

| UI Element | User Action | Network Request Triggered | Endpoint |
|------------|-------------|---------------------------|----------|
| Globe | Click marker | GET /api/... | ... |
| Feed panel | Scroll | GET /api/... | ... |
| Auto-refresh | None (timer) | GET /api/... every Ns | ... |

## System Flow Diagram

Produce a text-based or Mermaid diagram:

    graph TD
        A[User visits site] --> B[Initial page load]
        B --> C[JS bundles loaded]
        C --> D[API calls fire]
        D --> E[Data rendered in UI]
        D --> F[Telegram notification sent?]
        E --> G[Polling loop begins]
        G --> D

## Purpose Classification

Based on all evidence, classify the system's purpose:

- **Visualization tool**: displays data for informational purposes
- **Monitoring dashboard**: actively watches for events/thresholds
- **Tracking/logging tool**: records visitor behavior
- **Honeypot**: designed to attract and monitor visitors
- **Portfolio/demo**: non-functional showcase

Save to `reports/architecture.md`.

# 🧪 PHASE 14: SECURITY ANALYSIS

Evaluate the security posture and risk profile of the target system.

## 14.1 Data Exfiltration Assessment

| Vector | Check | Finding |
|--------|-------|---------|
| Telegram API calls | Phase 6 results | What data is sent? |
| Third-party analytics | Google Analytics, Mixpanel, etc. | What is tracked? |
| Pixel tracking | 1x1 images, tracking pixels | Present? To where? |
| WebSocket exfiltration | Data sent via WS to non-target origins | Present? |

## 14.2 Visitor Tracking Assessment

| Technique | How to Detect |
|-----------|---------------|
| IP logging | Check Telegram payloads for IP addresses |
| Browser fingerprinting | Look for canvas, WebGL, AudioContext, font enumeration in JS |
| Cookie tracking | Unique identifiers in cookies that persist across sessions |
| Local storage tracking | UUIDs or fingerprint hashes stored client-side |
| Session recording | Look for FullStory, Hotjar, LogRocket scripts |

## 14.3 Application Security

| Check | Method |
|-------|--------|
| **Content Security Policy** | Read CSP header: is it restrictive or permissive? |
| **CORS configuration** | Is `Access-Control-Allow-Origin: *`? (overly permissive) |
| **Mixed content** | Any HTTP resources loaded on HTTPS page? |
| **Exposed secrets** | API keys, tokens in JS source (Phase 8) |
| **Information disclosure** | Verbose error messages, stack traces, debug headers |
| **Exposed source maps** | `.js.map` files accessible (Phase 8) |
| **Open redirects** | URL parameters that control redirects |
| **Clickjacking protection** | X-Frame-Options or CSP frame-ancestors set? |

## 14.4 Privacy Compliance

- Does the site display a cookie consent banner?
- Does it have a privacy policy?
- Does the privacy policy disclose Telegram data transmission?
- Are tracking cookies set before user consent?
- Does it comply with GDPR/CCPA requirements?
## 14.5 Risk Classification

For each finding, assign a risk level:

| Level | Criteria | Examples |
|-------|----------|----------|
| **LOW** | Informational, no direct user harm | Technology disclosure via headers |
| **MEDIUM** | Privacy concern or minor security issue | Tracking without disclosure, permissive CORS |
| **HIGH** | Active data exfiltration or serious security flaw | Sending visitor IPs to Telegram, exposed API keys, no CSP |
| **CRITICAL** | Malicious intent or severe vulnerability | C2 communication, credential harvesting, malware delivery |

Provide specific justification for each rating: evidence, not opinion.

Save to `reports/security-assessment.md`.

# 🧬 PHASE 15: ADVANCED CORRELATION

Cross-analyze findings from all previous phases to detect patterns invisible in isolation.

## Correlation Targets

### Payload Similarity

- Do different endpoints return data with shared structures or field names?
- Are the same data objects referenced across multiple API responses?
- Do Telegram payloads contain data from specific API responses?

### Timing Correlation

- Do API calls and Telegram calls happen at the same time? (piggyback exfiltration)
- Is there a fixed delay between data ingestion and display?
- Do polling intervals match any external data source's update frequency?

### Behavioral Loops

- Does the system exhibit cycles? (e.g., every 60s: fetch data → display → report to Telegram)
- Do "live" updates follow a deterministic pattern? (same sequence repeating)
- Is "randomness" in the data actually pseudorandom with a detectable seed?
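
One way to quantify the timing questions above is the coefficient of variation (standard deviation divided by mean) of the gaps between successive requests to one endpoint. This is a minimal sketch under assumptions: the function name `interval_regularity` is hypothetical, timestamps are the millisecond values logged in Phase 5, and any threshold for "suspiciously regular" is a judgment call rather than something this document defines.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps_ms: list[int]) -> dict:
    """Summarize the spacing between successive requests to one endpoint."""
    ts = sorted(timestamps_ms)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        # Too few samples to say anything about regularity.
        return {"samples": len(gaps), "mean_ms": None, "cv": None}
    avg = mean(gaps)
    # A CV near 0 means near-identical gaps, i.e. a fixed timer.
    cv = pstdev(gaps) / avg if avg else None
    return {"samples": len(gaps), "mean_ms": avg, "cv": cv}


# Example: four calls spaced exactly 30 s apart.
report = interval_regularity([0, 30_000, 60_000, 90_000])
```

Running the same function on the Telegram request timestamps and on the API request timestamps, then comparing the mean gaps, gives a quick first check for piggybacked calls.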
## Synthetic Behavior Detection

| Signal | Indicates |
|--------|-----------|
| Data updates at exact intervals (e.g., every 30.0s) | Polling, not real-time push |
| Data changes but structure stays identical | Template-based generation |
| "Breaking" events appear on a schedule | Scripted, not organic |
| All data sources update simultaneously | Single orchestrator, not independent feeds |

## Output

Produce a correlation matrix documenting:

1. Which data streams are related
2. Which events are causally linked
3. Any evidence of fabrication or simulation

# 🧬 PHASE 16: SYSTEM INTENT INFERENCE

Based on ALL evidence gathered across Phases 0-15, determine what this system truly is.

## Decision Framework

| Classification | Evidence Required |
|----------------|-------------------|
| **Production OSINT platform** | Real data sources, original analysis, functional backend, no synthetic data |
| **Monitoring dashboard** | Live data feeds, threshold alerts, operational indicators |
| **Portfolio / demo** | Synthetic or cached data, no real backend processing, impressive UI with shallow depth |
| **Tracking / surveillance tool** | Primary function is collecting visitor data (Telegram exfil, fingerprinting) |
| **Honeypot** | Deliberately attracts security researchers or specific audiences, logs all visitors |
| **Prototype / MVP** | Partial functionality, mix of real and placeholder data, incomplete features |

## Key Questions

1. Does it do what it claims to do? (Does the OSINT data represent real intelligence?)
2. Who is the intended audience? (Analysts, clients, the public, or no one?)
3. Is visitor tracking a side effect or the primary purpose?
4. How much engineering effort is behind it? (Sophisticated backend or API-wrapper frontend?)
5. Is it commercially operated or a personal project?
## Confidence Rating

Assign an overall confidence level to your classification:

- **HIGH**: multiple independent evidence streams support the conclusion
- **MED**: evidence supports it but alternative explanations exist
- **LOW**: insufficient evidence for a definitive conclusion

Save to `reports/final-classification.md`.

# 🧪 PHASE 17: OPEN-SOURCE INTELLIGENCE TOOL RECON (GITHUB + ECOSYSTEM)

Identify and correlate the tools, frameworks, and data sources used by the target system by searching open-source ecosystems.

## 🎯 Objective

Determine whether the application is built from:

- Known open-source OSINT platforms
- Cloned or forked dashboards
- Publicly available data pipelines
- Common frontend visualization stacks

## 🔍 Search Targets

### GitHub

Search for repositories related to:

- `"osint dashboard"`
- `"worldview osint"`
- `"osint globe"`
- `"3d globe intelligence"`
- `"cesium osint"`
- `"react osint dashboard"`
- `"next.js osint"`
- `"threat intelligence dashboard"`

### Package Registries

- **npm:** Search for OSINT, globe, geospatial, cesium packages
- **PyPI:** Search for OSINT, intelligence, data aggregation libraries

### Other Sources

- Developer blogs and tutorials
- OSINT tool directories (OSINT Framework, IntelTechniques)
- GitHub Awesome lists (`awesome-osint`, `awesome-threat-intelligence`)

## 🔎 Keyword Sets

Use variations of:

- "OSINT platform"
- "global intelligence dashboard"
- "C4ISR dashboard"
- "threat monitoring dashboard"
- "real-time geospatial intelligence"
- "incident monitoring globe"
- "cyber threat map"

## 🧪 Component Identification

From observed behavior + code patterns in Phase 8, match against known tools:

### Frontend Libraries

| Component | Look For |
|-----------|----------|
| CesiumJS | 3D globe rendering, `cesium.com` in network traffic |
| Mapbox GL | `mapbox.com` tokens or API calls |
| Leaflet | `leafletjs.com` references, L.map() calls |
| Three.js / Globe.gl | WebGL globe rendering |
| Deck.gl | Geospatial data layers |
| D3.js | Custom visualizations, SVG elements |
| React / Next.js | `_next/` paths, React devtools markers |

### Backend / Data

| Component | Look For |
|-----------|----------|
| FastAPI | `/docs` or `/redoc` endpoints, `uvicorn` header |
| Express | `x-powered-by: Express` header |
| Vercel Functions | `x-vercel-id` header, `/api/` routes |
| Supabase | `supabase.co` in network traffic |
| Firebase | `firebase.googleapis.com` calls |

### Data Sources

| Source | Data Type | Verification URL |
|--------|-----------|-----------------|
| USGS | Earthquakes | `earthquake.usgs.gov/fdsnws/` |
| ADS-B Exchange | Aircraft tracking | `adsbexchange.com` |
| MarineTraffic / AIS | Ship tracking | `marinetraffic.com` |
| GDELT | Global events | `api.gdeltproject.org` |
| RSS feeds | News aggregation | Various |
| ACLED | Conflict data | `acleddata.com` |

## 🔗 Correlation Process

For each discovered open-source project, compare with the target system:

1. **UI similarity**: visual layout, color scheme, component arrangement
2. **Data structures**: API response shapes, field names, nesting patterns
3. **Endpoint naming**: `/api/events`, `/api/threats`, etc.
4. **Visualization style**: globe type, marker styles, panel layouts
5. **Feature overlap**: same feature set or subset?
6. **Code structure**: if source maps are available, compare component trees

## 🧬 Clone / Template Detection

Determine if the system is:

| Classification | Criteria |
|----------------|----------|
| **Direct clone** | Identical or near-identical to a known repo, minimal changes |
| **Fork/modification** | Based on a known project but customized significantly |
| **Inspired by** | Similar concept but different implementation |
| **Original** | No significant matches found in open-source ecosystem |

## 🧠 Confidence Scoring

For each match:

| Level | Criteria |
|-------|----------|
| **HIGH** | Nearly identical structure, shared code, same API calls |
| **MED** | Similar architecture and components, different implementation |
| **LOW** | General conceptual similarity only |

## 📊 Output

Produce:

1. List of matching repositories/tools with URLs
2. Description of similarities for each match
3. Likelihood of shared origin (HIGH / MED / LOW)
4. Identified tech stack components with evidence
5. Assessment: original work vs. assembled from existing tools

## ⚠️ Rules

- Prioritize high-similarity matches: don't pad the report with weak matches
- Ignore generic tools unless clearly relevant to the target
- Focus on structural and functional overlap, not surface-level appearance
- Remain evidence-based: link every claim to a specific observation

# 📊 OUTPUT REQUIREMENTS

Produce a structured final report covering all findings. Each section should be a standalone document in the `reports/` directory.

## 1. Architecture Diagram (`reports/architecture.md`)

- Text-based and/or Mermaid diagram
- Shows all system components and data flows
- Includes external dependencies and third-party services

## 2. API Map (`reports/api-map.md`)

- Every discovered endpoint
- Method, auth requirements, response type
- Classification (public, hidden, dead)
- Rate limit behavior

## 3.
Telegram Analysis (`reports/telegram-report.md`) - All Telegram endpoints called - Bot identity (token, username) - Payload contents and frequency - Purpose classification - Risk assessment specific to Telegram usage ## 4. Data Authenticity Report (`reports/data-authenticity.md`) - Classification of each data stream - Confidence levels with evidence - Cross-reference results against public sources ## 5. Backend Fingerprint (`reports/backend-fingerprint.md`) - Confirmed technology stack - Hosting provider and infrastructure - Caching and performance characteristics ## 6. Security Risk Assessment (`reports/security-assessment.md`) - All findings with risk levels - Privacy compliance assessment - Visitor tracking inventory - Recommendations (if applicable) ## 7. Final Classification (`reports/final-classification.md`) - System type: demo / prototype / operational / surveillance / other - Confidence level with justification - Summary of all supporting evidence - Open questions or areas needing further investigation # 🧠 AUTOMATION SCRIPT (Playwright + Proxy) Save this as `scripts/capture.js` and run with `node scripts/capture.js`. 
```javascript
const { chromium } = require('playwright');
const fs = require('fs');
const path = require('path');

const TARGET = 'https://worldviewosint.com/';
const PROXY = 'http://127.0.0.1:8080';
const CAPTURE_DIR = path.join(__dirname, '..', 'captures');
const LOG_DIR = path.join(__dirname, '..', 'logs');

// Ensure output directories exist
[CAPTURE_DIR, LOG_DIR].forEach(dir => {
  if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true });
});

(async () => {
  console.log('[*] Launching browser with proxy:', PROXY);
  const browser = await chromium.launch({
    headless: false,
    proxy: { server: PROXY }
  });

  const context = await browser.newContext({
    recordHar: { path: path.join(CAPTURE_DIR, 'traffic.har') },
    // Accept the certificates issued by mitmproxy; remove this option
    // if the mitmproxy CA is already trusted by the browser.
    ignoreHTTPSErrors: true,
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });

  const page = await context.newPage();
  const networkLogs = [];
  const consoleLogs = [];
  const telegramLogs = [];

  // Capture all network requests
  page.on('request', req => {
    const entry = {
      type: 'request',
      timestamp: Date.now(),
      url: req.url(),
      method: req.method(),
      headers: req.headers(),
      body: req.postData() || null,
      resource_type: req.resourceType()
    };
    networkLogs.push(entry);
    if (req.url().includes('api.telegram.org')) {
      telegramLogs.push({ ...entry, direction: 'outbound' });
      console.log('[!] TELEGRAM REQUEST:', req.method(), req.url());
    }
  });

  // Capture all network responses
  page.on('response', async res => {
    let body = null;
    // Some responses (e.g. redirects) have no readable body; keep null then
    try { body = await res.text(); } catch {}
    const entry = {
      type: 'response',
      timestamp: Date.now(),
      url: res.url(),
      status: res.status(),
      headers: res.headers(),
      body: body,
      content_type: res.headers()['content-type'] || null
    };
    networkLogs.push(entry);
    if (res.url().includes('api.telegram.org')) {
      telegramLogs.push({ ...entry, direction: 'inbound' });
      console.log('[!] TELEGRAM RESPONSE:', res.status(), res.url());
    }
  });

  // Capture browser console output
  page.on('console', msg => {
    consoleLogs.push({
      timestamp: Date.now(),
      type: msg.type(),
      text: msg.text(),
      location: msg.location()
    });
  });

  // Capture page errors
  page.on('pageerror', err => {
    consoleLogs.push({
      timestamp: Date.now(),
      type: 'error',
      text: err.message,
      stack: err.stack
    });
  });

  // --- Phase 1: Initial load ---
  console.log('[*] Loading target:', TARGET);
  await page.goto(TARGET, { waitUntil: 'networkidle' });
  console.log('[*] Page loaded. Waiting 30s for background activity...');
  await page.waitForTimeout(30000);

  // --- Phase 2: Scroll the page ---
  console.log('[*] Scrolling page...');
  await page.evaluate(() => {
    return new Promise(resolve => {
      let distance = 0;
      const step = 300;
      const interval = setInterval(() => {
        window.scrollBy(0, step);
        distance += step;
        if (distance >= document.body.scrollHeight) {
          clearInterval(interval);
          window.scrollTo(0, 0);
          resolve();
        }
      }, 200);
    });
  });
  await page.waitForTimeout(3000);

  // --- Phase 3: Click interactive elements ---
  console.log('[*] Clicking interactive elements...');
  const clickable = await page.$$('button, [role="button"], a[href], .clickable, [onclick]');
  for (const el of clickable.slice(0, 20)) {
    // Elements may be hidden, detached, or navigate away; skip failures
    try {
      await el.click({ timeout: 2000 });
      await page.waitForTimeout(1500);
    } catch {}
  }

  // --- Phase 4: Wait for more background activity ---
  console.log('[*] Waiting 30s for additional background activity...');
  await page.waitForTimeout(30000);

  // --- Phase 5: Dump client-side storage ---
  console.log('[*] Dumping client-side storage...');
  const storage = await page.evaluate(() => {
    const ls = {};
    for (let i = 0; i < localStorage.length; i++) {
      const key = localStorage.key(i);
      ls[key] = localStorage.getItem(key);
    }
    const ss = {};
    for (let i = 0; i < sessionStorage.length; i++) {
      const key = sessionStorage.key(i);
      ss[key] = sessionStorage.getItem(key);
    }
    return { localStorage: ls, sessionStorage: ss, cookies: document.cookie };
  });
  // document.cookie cannot see HttpOnly cookies, so also read them
  // from the browser context before it is closed.
  storage.contextCookies = await context.cookies();

  // --- Save all outputs ---
  console.log('[*] Saving captures...');

  // Close context first to finalize HAR
  await context.close();

  fs.writeFileSync(
    path.join(CAPTURE_DIR, 'network.json'),
    JSON.stringify(networkLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'console.json'),
    JSON.stringify(consoleLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'telegram.json'),
    JSON.stringify(telegramLogs, null, 2)
  );
  fs.writeFileSync(
    path.join(LOG_DIR, 'cookies.json'),
    JSON.stringify(storage, null, 2)
  );

  await browser.close();

  // --- Summary ---
  console.log('\n[*] Capture complete.');
  console.log(`    Total network entries: ${networkLogs.length}`);
  console.log(`    Console entries: ${consoleLogs.length}`);
  console.log(`    Telegram entries: ${telegramLogs.length}`);
  console.log(`    Files saved to: ${CAPTURE_DIR} and ${LOG_DIR}`);
  if (telegramLogs.length > 0) {
    console.log('\n[!] WARNING: Telegram API traffic detected! Review logs/telegram.json immediately.');
  }
})().catch(err => {
  // Surface launch, proxy, or navigation failures instead of an unhandled rejection
  console.error('[x] Capture failed:', err);
  process.exit(1);
});
```

# 🔧 QUICK REFERENCE - COMMAND CHEAT SHEET

```bash
# Start mitmproxy (run in a separate terminal)
mitmdump --save-stream-file captures/traffic_dump.mitm

# Run the capture script
node scripts/capture.js

# Extract unique endpoints from captured traffic
jq -r '.[].url' captures/network.json | sort -u > logs/endpoints.txt

# Filter Telegram traffic only
jq '[.[] | select(.url | contains("telegram"))]' captures/network.json > logs/telegram_filtered.json

# Check for exposed source maps
jq -r '.log.entries[].request.url' captures/traffic.har | grep '\.js$' | while read -r url; do
  curl -s -o /dev/null -w "%{http_code} ${url}.map\n" "${url}.map"
done

# WHOIS lookup
whois worldviewosint.com > recon/dns.txt

# TLS certificate check
echo | openssl s_client -connect worldviewosint.com:443 -servername worldviewosint.com 2>/dev/null | openssl x509 -noout -text > recon/tls.txt

# Replay an API endpoint
curl -s "https://worldviewosint.com/api/ENDPOINT" -H "User-Agent: Mozilla/5.0" | jq .

# Count requests per domain
jq -r '.[].url' captures/network.json | awk -F/ '{print $3}' | sort | uniq -c | sort -rn
```

## Legal & Ethical Disclaimer

This framework is provided for **authorized security research and educational purposes only.** Only test, scan, intercept, or analyze systems that you own or for which you have explicit written permission. Unauthorized reconnaissance or interception of systems you do not own may violate computer-misuse, wiretap, and other laws. You are solely responsible for ensuring your use complies with all applicable laws and regulations. The author provides this software **as-is, with no warranty**, and accepts **no liability** for any misuse or damage.
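As a supplement to the cheat sheet's "Count requests per domain" command, the same tally can be sketched in Node for environments without `jq`. This is a minimal sketch, not part of the original repo: `countByHost` and the `sample` array are hypothetical, and the entry shape (`type`, `url`) is assumed to match what `scripts/capture.js` writes to `captures/network.json`. Unlike the `jq` pipeline, it counts only `request` entries so that each exchange is not counted twice.

```javascript
// Hypothetical helper: tally captured requests per hostname.
// Assumes entries shaped like those in captures/network.json: { type, url, ... }.
function countByHost(entries) {
  const counts = new Map();
  for (const entry of entries) {
    // network.json holds both requests and responses; count requests only
    if (entry.type !== 'request') continue;
    let host;
    try {
      host = new URL(entry.url).hostname;
    } catch {
      host = '(unparseable)'; // keep malformed URLs visible instead of dropping them
    }
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  // Sort by descending count, like `sort | uniq -c | sort -rn`
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative data only (placeholder hosts, not captured traffic)
const sample = [
  { type: 'request', url: 'https://example.com/' },
  { type: 'response', url: 'https://example.com/' },
  { type: 'request', url: 'https://cdn.example.net/app.js' },
  { type: 'request', url: 'https://example.com/api/events' }
];

for (const [host, count] of countByHost(sample)) {
  console.log(String(count).padStart(6), host);
}
```

To run it against a real capture, replace `sample` with `JSON.parse(fs.readFileSync('captures/network.json', 'utf8'))` after adding `const fs = require('fs');`.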