anasahhm/Specter
# SPECTER: Wire API-Powered AI Threat Intelligence Platform
Real-time URL threat analysis. Wire API extracts technical signals, Google Generative AI recognizes behavioral patterns, deterministic scoring outputs risk (0-100). Async architecture handles 120s Wire API + 45s AI calls without blocking.
[Live Demo](https://specter-weld.vercel.app)
## Why I Built This
Static URL threat detection is broken:
- Blacklist-based tools miss 60%+ of new phishing campaigns
- SaaS solutions cost $500/month and have 5-minute latencies
- Open-source tools use only regex patterns (too many false positives)
I needed hybrid intelligence: raw technical signals (domain age, SSL validity, redirect chains) + AI pattern recognition (behavioral clustering, social engineering vectors). And it had to be fast.
## The Architecture Problem
**The Issue:** A Wire API call can take up to 120s and a Google Generative AI call up to 45s. If I block the request thread waiting for both, the user can sit staring at a loading spinner for up to 165 seconds.
**The Solution:** Async worker architecture.
POST /api/investigations/start → returns investigation ID immediately (300ms)
Background: Wire API (120s) → AI Analysis (45s) → Threat Scoring (10s)
Frontend polls GET /api/investigations/:id every 2s with 3-min graceful timeout
Status persisted: processing → completed
User sees results as soon as they are ready (8-15s typical, 180s max)
This pattern scales. Investigate 50 URLs and come back later. No blocked request threads, no WebSocket complexity.
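The start-then-poll flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes an in-memory `Map` in place of MongoDB, and `startInvestigation`, `getInvestigation`, and `runPipeline` are hypothetical names. The `failed` status is also an assumption, since only `processing` and `completed` are named here.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for the investigations collection (assumption: the real
// project persists status in MongoDB).
const investigations = new Map();

// Hypothetical pipeline placeholder for Wire API -> AI -> scoring.
async function runPipeline(url) {
  return { url, riskScore: 0 };
}

// Creates the record, kicks off the work, and returns the ID right away.
function startInvestigation(url, pipeline = runPipeline) {
  const id = randomUUID();
  investigations.set(id, { id, status: "processing", result: null });

  // Deliberately not awaited: the caller gets the ID while work continues.
  Promise.resolve()
    .then(() => pipeline(url))
    .then((result) => {
      investigations.set(id, { id, status: "completed", result });
    })
    .catch(() => {
      // Assumed extra status so a crashed job does not stay "processing".
      investigations.set(id, { id, status: "failed", result: null });
    });

  return id;
}

// What a GET /api/investigations/:id handler would look up.
function getInvestigation(id) {
  return investigations.get(id) ?? null;
}
```

An Express handler for `POST /api/investigations/start` would call `startInvestigation` and respond with the ID immediately, while the polling route would return whatever `getInvestigation` finds.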
## Tech Stack
**Frontend:** React 18 + Vite (3-4x faster than Webpack) + Tailwind + Framer Motion
**Backend:** Node.js/Express + MongoDB + Mongoose + Helmet.js + express-rate-limit
**Intelligence:** Wire API (technical metadata) + Google Generative AI (pattern analysis)
**Auth:** JWT (30-day expiry) + bcryptjs (salt rounds: 10) + input validation
**Deployment:** Vercel (frontend) + Render (backend) + MongoDB Atlas (database)
## Installation & Setup
### Prerequisites
Node.js v18+, npm/yarn, MongoDB (Atlas free tier works)
### Clone & Install
git clone https://github.com/anasahhm/specter.git
cd specter
# Backend
cd backend
npm install
cp .env.example .env # Add your API keys
# Frontend
cd ../frontend
npm install
cp .env.example .env
### Environment Variables
**Backend (.env):**
NODE_ENV=development
PORT=5000
MONGODB_URI=mongodb+srv://user:password@cluster.mongodb.net/specter
JWT_SECRET=your-super-secret-key-minimum-32-characters
WIRE_API_KEY=your-wire-api-key-here
GOOGLE_GENERATIVE_AI_KEY=your-key-here
FRONTEND_URL=http://localhost:5173
**Frontend (.env):**
VITE_API_URL=http://localhost:5000
VITE_APP_NAME=SPECTER
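A missing key usually shows up as a confusing runtime failure, so a startup check can help. The sketch below is one possible way to validate the backend variables listed above. `checkEnv` is a hypothetical helper, and the 32-character rule is taken from the placeholder value for `JWT_SECRET`.

```javascript
// Variable names are taken from the backend .env listing above.
const REQUIRED_VARS = [
  "MONGODB_URI",
  "JWT_SECRET",
  "WIRE_API_KEY",
  "GOOGLE_GENERATIVE_AI_KEY",
  "FRONTEND_URL",
];

// Returns a list of human-readable problems; an empty list means the
// configuration looks usable.
function checkEnv(env = process.env) {
  const problems = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name] || env[name].trim() === "") {
      problems.push(`${name} is missing`);
    }
  }
  // The sample value asks for a secret of at least 32 characters.
  if (env.JWT_SECRET && env.JWT_SECRET.length < 32) {
    problems.push("JWT_SECRET is shorter than 32 characters");
  }
  return problems;
}
```

Calling `checkEnv()` before the server starts listening and exiting when the list is non-empty keeps misconfiguration errors in one place.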
### Run Locally
**Terminal 1 (Backend):**
cd backend && npm run dev
# Should output:
# ╔══════════════════════════════════════════╗
# ║ SPECTER - SERVER STARTED ║
# ║ Port: 5000 | Database: Connected ║
# ║ Wire API: ✓ | Google AI: ✓ ║
# ╚══════════════════════════════════════════╝
**Terminal 2 (Frontend):**
cd frontend && npm run dev
# http://localhost:5173
**Terminal 3 (Test API):**
curl http://localhost:5000/api/health
# {"status":"operational","timestamp":"2024-05-31T12:00:00.000Z"}
## How It Works
### 3-Stage Pipeline
**Step 1: Wire API (120s timeout)**
- Domain metadata, SSL certificates, age, MX records
- Redirect chains, technology stack detection
- Embedded links, forms, scripts
- Output: Raw technical signals
**Step 2: AI Analysis (45s timeout)**
- Google Generative AI pattern recognition
- Behavioral clustering against known threats
- Phishing vector identification
- Confidence scoring and summary generation
- Fallback: Rule-based analysis if AI unavailable
**Step 3: Threat Scoring (10s timeout)**
- Risk score (0-100)
- Threat classification (Critical/High/Medium/Low/Safe)
- Scam probability, toxicity rating, confidence
- Output: Final verdict
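The per-stage timeouts can be enforced with a small `Promise.race` wrapper. This is a sketch under assumptions: `withTimeout` and `runStages` are hypothetical names, and the three stage functions are placeholders passed in by the caller rather than the project's real services.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs the three stages in order, each under the timeout listed above.
async function runStages(url, { fetchSignals, analyze, score }) {
  const signals = await withTimeout(fetchSignals(url), 120_000, "Wire API");
  const analysis = await withTimeout(analyze(signals), 45_000, "AI analysis");
  return withTimeout(
    Promise.resolve().then(() => score(signals, analysis)),
    10_000,
    "Threat scoring"
  );
}
```

Note that `Promise.race` only stops waiting. It does not cancel the underlying request, so a real client would also pass an `AbortSignal` or a client-level timeout.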
### Data Flow
1. User submits URL
2. POST /api/investigations/start
3. Backend returns investigationId (status: processing)
4. Frontend polls GET /api/investigations/:id every 2s
5. Background: Step 1 → Step 2 → Step 3
6. Status changes to completed
7. Frontend renders results
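Steps 4 to 7 amount to a polling loop with a deadline. Here is a minimal sketch, assuming a caller-supplied `fetchStatus` function that resolves to the investigation document. `pollInvestigation` is a hypothetical name, and the defaults mirror the 2s interval and 3-minute timeout described above.

```javascript
// Polls until the investigation leaves the "processing" state or the
// overall deadline passes.
async function pollInvestigation(
  fetchStatus,
  { intervalMs = 2000, timeoutMs = 180_000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const investigation = await fetchStatus();
    if (investigation.status !== "processing") return investigation;

    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Investigation still processing after ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the frontend, `fetchStatus` would typically wrap a `GET /api/investigations/:id` request with the bearer token attached.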
## Gotchas I Solved
### 1. **Slow External APIs Without Blocking**
- **Problem:** Wire API + AI can take up to 165s (120s + 45s). Blocking the request thread kills UX.
- **Solution:** Async workers + polling. POST returns instantly with ID, frontend polls every 2s.
- **Lesson:** For external APIs >10s, always use async + polling or WebSockets.
### 2. **Rate Limit Exploitation**
- **Problem:** Users hammer the API. Bots scrape URL intelligence.
- **Solution:** Dual-axis rate limiting:
  - Global: 100 requests/15min (catches distributed attacks)
  - Per-user: 5 investigations/min (prevents individual abuse)
  - Sliding window (not fixed buckets)
- **Lesson:** A single rate limit is insufficient. An attack from one user looks different from botnet traffic.
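To illustrate the sliding-window idea, here is a minimal in-memory sketch. It is not the project's middleware: `SlidingWindowLimiter` and `allowInvestigation` are hypothetical names, and the store is a plain `Map` that would not be shared across multiple server instances.

```javascript
// Sliding-window log limiter: keeps the timestamps of recent hits per key.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of timestamps in ms, oldest first
  }

  // Returns true and records the hit if `key` is under its limit at `now`.
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Limits from the list above; the "global" key name is an assumption.
const globalLimiter = new SlidingWindowLimiter(100, 15 * 60 * 1000);
const perUserLimiter = new SlidingWindowLimiter(5, 60 * 1000);

function allowInvestigation(userId, now = Date.now()) {
  // Per-user check runs first, so a blocked user does not consume global budget.
  return perUserLimiter.allow(userId, now) && globalLimiter.allow("global", now);
}
```

Unlike a fixed bucket, the window moves with each request, so a burst straddling a bucket boundary cannot double the effective limit.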
### 3. **External API Resilience**
- **Problem:** What if Wire API is down? What if Google AI returns an error?
- **Solution:** Graceful degradation:
  - Wire API failure → Use cached domain reputation data
  - Google AI timeout → Fall back to rule-based threat scoring
  - Both failures → Return partial results with explicit warnings
- **Lesson:** Single point of failure cascades. Build fallbacks at every layer.
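The fallback chain can be sketched as two guarded steps that collect warnings. All four dependencies below (`wireLookup`, `cachedReputation`, `aiAnalyze`, `ruleAnalyze`) are hypothetical placeholders supplied by the caller, and the warning strings are illustrative.

```javascript
// Tries the primary source at each layer and falls back on failure,
// recording a warning so the caller knows the result is partial.
async function analyzeWithFallbacks(
  url,
  { wireLookup, cachedReputation, aiAnalyze, ruleAnalyze }
) {
  const warnings = [];

  let signals;
  try {
    signals = await wireLookup(url);
  } catch {
    // Fixed message on purpose: upstream error details stay in server logs.
    warnings.push("Wire API unavailable; used cached domain reputation data");
    signals = await cachedReputation(url);
  }

  let analysis;
  try {
    analysis = await aiAnalyze(signals);
  } catch {
    warnings.push("AI analysis unavailable; used rule-based scoring");
    analysis = await ruleAnalyze(signals);
  }

  return { signals, analysis, partial: warnings.length > 0, warnings };
}
```

When both primaries fail, the result still comes back, flagged `partial` with two warnings, which matches the third bullet above.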
### 4. **JWT Token Expiry Handling**
- **Problem:** Access tokens eventually expire (after 30 days), and an expired token should not interrupt a user in the middle of a long investigation session.
- **Solution:** Token refresh pattern:
  - 30-day access tokens + refresh tokens
  - Frontend axios interceptor refreshes automatically
  - No sensitive data in error messages
- **Lesson:** Never leak token details in error responses.
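The refresh pattern boils down to "retry once after a 401". The sketch below shows that idea without axios, so it is not the interceptor itself: `requestWithRefresh` is a hypothetical helper, `request(token)` is assumed to resolve to an object with a `status` field, and `refresh()` is assumed to resolve to a new access token.

```javascript
// Performs a request and, on a 401, refreshes the token and retries once.
async function requestWithRefresh(request, refresh, token) {
  const first = await request(token);
  if (first.status !== 401) return first;

  // Exactly one retry: if the fresh token is also rejected, the 401 is
  // returned to the caller instead of looping.
  const newToken = await refresh();
  return request(newToken);
}
```

An axios response interceptor can apply the same logic by marking the retried request so that a second 401 is not refreshed again.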
### 5. **JavaScript-Heavy Sites**
- **Problem:** Wire API sees static HTML. Dynamic forms, obfuscated links, JS-rendered content are invisible.
- **Solution:** Hybrid approach:
  - Wire API for structural/technical analysis
  - Google AI for behavioral/pattern analysis
  - Triangulation catches what either misses
- **Lesson:** No single tool is complete. Combine strengths.
### 6. **MongoDB Connection Pooling**
- **Problem:** The default Mongoose connection pool (5 connections on Mongoose 5.x; later versions default to 100) was too small under concurrent load.
- **Solution:** Tuned pool settings in connection URI, added connection monitoring.
- **Lesson:** Database bottlenecks surface under load, not in dev.
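Pool size can be set directly in the connection string through the MongoDB driver's `maxPoolSize` option. The helper below is a hypothetical sketch, and the value 20 in the usage note is an arbitrary example rather than the setting SPECTER uses.

```javascript
// Appends the maxPoolSize connection-string option to a MongoDB URI.
// Assumes the URI already includes a database path (as in the sample
// MONGODB_URI) and does not already set maxPoolSize.
function withPoolSize(uri, maxPoolSize) {
  const separator = uri.includes("?") ? "&" : "?";
  return `${uri}${separator}maxPoolSize=${maxPoolSize}`;
}
```

For example, `withPoolSize(process.env.MONGODB_URI, 20)` yields a URI ending in `/specter?maxPoolSize=20` for the sample value. On Mongoose 5.x the equivalent option was named `poolSize`.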
## API Reference
### Authentication
POST /api/auth/register
{ email, password, displayName? }
POST /api/auth/login
{ email, password }
GET /api/auth/profile
Headers: Authorization: Bearer {token}
### Investigations
POST /api/investigations/start
{ targetType: "url", targetValue: "https://..." }
Returns: { investigationId, status: "processing" }
GET /api/investigations/:investigationId
Returns: Complete threat analysis
GET /api/investigations?page=1&limit=10
Returns: User's investigation history
PUT /api/investigations/:investigationId/bookmark
{ isBookmarked: boolean }
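Before an investigation is queued, the request body needs to be checked. The sketch below shows one way to validate the `POST /api/investigations/start` payload using the built-in `URL` class. `validateStartPayload` is a hypothetical name, and restricting targets to `http` and `https` is an assumption.

```javascript
// Validates the body of POST /api/investigations/start.
function validateStartPayload(body) {
  if (!body || body.targetType !== "url") {
    return { ok: false, error: 'targetType must be "url"' };
  }

  let parsed;
  try {
    // Throws for relative or malformed input such as "example.com".
    parsed = new URL(body.targetValue);
  } catch {
    return { ok: false, error: "targetValue is not a valid URL" };
  }

  // Assumption: only web URLs are investigated.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, error: "only http and https URLs are supported" };
  }
  return { ok: true, url: parsed.href };
}
```

Returning the normalized `href` gives later stages a consistent form of the URL to work with.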
### Reports & Analytics
GET /api/reports/:investigationId
GET /api/reports/:investigationId/export?format=pdf|json
GET /api/analytics/user-stats
GET /api/analytics/threat-distribution
## Threat Metrics
| Metric | Range | Meaning |
|--------|-------|---------|
| **Risk Score** | 0-100 | Overall threat severity |
| **Threat Level** | Critical/High/Medium/Low/Safe | Classification |
| **Phishing Detected** | Yes/No | Known phishing patterns |
| **Scam Probability** | 0-100% | Fraudulent intent likelihood |
| **Toxicity Score** | 0-100 | Content toxicity |
| **Confidence Score** | 0-100% | Analysis certainty |
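The relationship between the risk score and the threat level can be expressed as a lookup table. The cut-offs below are illustrative assumptions, not SPECTER's actual thresholds, and `threatLevel` is a hypothetical name.

```javascript
// Ordered from highest to lowest so the first match wins.
// Cut-offs are illustrative assumptions only.
const LEVELS = [
  { min: 80, level: "Critical" },
  { min: 60, level: "High" },
  { min: 40, level: "Medium" },
  { min: 20, level: "Low" },
  { min: 0, level: "Safe" },
];

// Maps a 0-100 risk score to one of the five threat levels.
function threatLevel(riskScore) {
  if (!Number.isFinite(riskScore) || riskScore < 0 || riskScore > 100) {
    throw new RangeError("riskScore must be between 0 and 100");
  }
  return LEVELS.find(({ min }) => riskScore >= min).level;
}
```

Keeping the mapping in a table rather than in the model's output is what makes the final classification deterministic for a given score.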
## Performance
| Metric | Target | Actual |
|--------|--------|--------|
| Page Load | <2s | 1.2s |
| Investigation Start (API) | <500ms | 300ms |
| Results Available | <30s | 8-15s |
| API Response Time | <1s | 200-400ms |
| Database Query | <100ms | 50-80ms |
## Security
**JWT Auth** - 30-day token expiry + refresh rotation
**Password Hashing** - bcryptjs (salt rounds: 10)
**Rate Limiting** - 100 req/15min global + 5/min per user
**Helmet.js** - CSP, X-Frame-Options, HSTS, etc.
**CORS** - Whitelist frontend origin only
**Input Validation** - Email format, password entropy, URL structure
**Error Handling** - No sensitive data leakage
**Environment Isolation** - Secrets in .env, never in code
## Deployment
**Frontend (Vercel):**
1. Push to GitHub
2. vercel.com/new → Import repo
3. Set `VITE_API_URL` env var
4. Deploy
**Backend (Render):**
1. render.com → Create Web Service
2. Connect GitHub repo
3. Set all env vars (MONGODB_URI, WIRE_API_KEY, etc.)
4. Deploy
**Database (MongoDB Atlas):**
1. cloud.mongodb.com → Create cluster (free tier)
2. Get connection string
3. Whitelist your IP
4. Set as MONGODB_URI
## Project Structure
specter/
├── frontend/
│ ├── src/
│ │ ├── pages/ # Route components
│ │ ├── components/ # Reusable UI components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── api/ # API client + interceptors
│ │ ├── context/ # Auth context
│ │ ├── utils/ # Helpers
│ │ └── styles/ # Global CSS
│ ├── vite.config.js
│ ├── tailwind.config.js
│ └── package.json
│
├── backend/
│ ├── src/
│ │ ├── routes/ # Express route handlers
│ │ ├── services/ # Business logic (Wire, AI, Scoring)
│ │ ├── models/ # Mongoose schemas
│ │ ├── config/ # Validation, constants
│ │ ├── scripts/ # Database seeders
│ │ ├── server.js # Express app setup
│ │ └── index.js # Entry point
│ └── package.json
│
└── docs/
├── ARCHITECTURE.md # System design
├── API.md # Endpoint reference
└── DEPLOYMENT.md # Production setup
## Testing
### Manual API Tests
# Register
curl -X POST http://localhost:5000/api/auth/register \
-H "Content-Type: application/json" \
-d '{"email":"test@example.com","password":"Test123!"}'
# Login
curl -X POST http://localhost:5000/api/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"test@example.com","password":"Test123!"}'
# Start investigation
curl -X POST http://localhost:5000/api/investigations/start \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"targetType":"url","targetValue":"https://example.com"}'
# Get results
curl http://localhost:5000/api/investigations/INVESTIGATION_ID \
-H "Authorization: Bearer YOUR_TOKEN"
### Browser Testing
1. Register account
2. Test URLs: `example.com` (safe), `malicious-url.com` (suspicious)
3. Verify threat scores, phishing detection, report generation
4. Check investigation history and bookmarks
## Troubleshooting
### Frontend can't reach backend
# Check backend is running
curl http://localhost:5000/api/health
# Check VITE_API_URL in frontend/.env matches backend
# Check FRONTEND_URL in backend/.env matches frontend origin (http://localhost:5173)
# Check browser console for CORS errors
### MongoDB connection failed
# Verify connection string
MONGODB_URI=mongodb+srv://user:password@cluster.mongodb.net/specter
# Check IP whitelist in MongoDB Atlas (0.0.0.0/0 is acceptable for development only; restrict it in production)
# Verify database user has correct credentials
### Wire API errors
- Check API key is valid and quota isn't exceeded
- Review Wire API docs for rate limits
- Enable debug logging: `DEBUG=* npm run dev`
### Investigation timeout (>180s)
- Default timeout is 180 seconds
- Check backend logs for step-specific errors
- Test with simple URL first (e.g., example.com)
## Stats
- **48 hour build** (hackathon sprint)
- **2100+ LOC** (1200 frontend, 900 backend)
- **12 API endpoints** (Auth, Investigations, Reports, Analytics)
- **3-stage pipeline** (Wire API → AI → Scoring)
- **8-15s typical latency** (up to 180s when stage timeouts are hit)
- **4 database collections** (users, investigations, reports, analytics)
- **18 React components** (modular, reusable)
- **3 backend services** (Wire client, AI analyzer, threat scorer)
## What I Learned
1. **Async beats blocking.** External APIs >10s? Don't wait. Async + polling scales better.
2. **Graceful degradation saves systems.** When Wire API fails, use cached data. When AI times out, use rules.
3. **Rate limiting is multidimensional.** Global limits catch botnets. Per-user limits catch individual abuse.
4. **Hybrid intelligence works.** One data source has blind spots. Wire API + AI catch what each misses.
5. **Security is layering.** JWT + bcryptjs + Helmet + CORS + input validation = defense in depth.
## License
MIT — see [LICENSE](./LICENSE)
## Made By Anas Ahmed
Questions? Open a [GitHub issue](https://github.com/anasahhm/specter/issues)
**Live:** https://specter-weld.vercel.app