Nevetan/Shield-Scan
# ShieldScan — AI-Powered Website Threat Detector
A Chrome extension + Python backend that analyses any website for phishing, scams, and malware indicators using multi-signal detection and Llama 3.3 70B via Groq.
## Architecture
```
Chrome Extension (JS)
  └── popup.html / popup.js
          │  extracts URL + page content
          ▼
FastAPI Backend (Python)  ←── localhost:8000
  ├── URL feature extraction (entropy, subdomains, TLD, impersonation)
  ├── Content analysis (keywords, urgency language, form fields)
  ├── WHOIS domain age lookup
  └── Llama 3.3 70B via Groq (free AI verdict + explanation)
```
## Setup
### 1. Get a free Groq API key
Go to **console.groq.com** → sign in with Google → **API Keys → Create API key**.
### 2. Backend
```bash
cd backend
pip install -r requirements.txt
```
Open `main.py` and paste your Groq key on line 22:
```python
client = Groq(api_key="YOUR_GROQ_API_KEY")
```
Start the server:
```bash
uvicorn main:app --reload --port 8000
```
Verify it's running:
```bash
curl http://localhost:8000/health
```
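If you prefer to check the server from Python (for example in a test script), a stdlib-only sketch of the same health check is below. It assumes `/health` answers with HTTP 200 when the backend is up; the response body is not documented here, so it is not inspected. `server_is_up` is a hypothetical helper name.

```python
import urllib.request


def server_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False
```

With `uvicorn` running, `server_is_up()` should return `True`; with the terminal closed it returns `False` instead of raising.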
### 3. Chrome Extension
1. Open Chrome → `chrome://extensions`
2. Enable **Developer mode** (top right toggle)
3. Click **Load unpacked**
4. Select the `extension/` folder
The ShieldScan icon will appear in your toolbar. Keep the terminal running in the background — the extension needs the server to work.
## Features Analysed
### URL-level signals
| Feature | Why it matters |
|---|---|
| Shannon entropy of domain | High entropy suggests a randomly generated name, which is suspicious |
| Subdomain count | Deep subdomains often used in phishing |
| Brand impersonation | e.g. `paypal.evil.com` |
| IP address instead of domain | Legitimate sites almost always use domain names |
| Suspicious TLD | `.tk`, `.xyz`, `.icu` etc. |
| Hyphens & digit ratio | `my-paypa1-secure.com` patterns |
| Hex encoding | URL obfuscation technique |
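The URL-level signals in the table can be computed with the standard library alone. The sketch below is illustrative, not ShieldScan's actual code: the TLD and brand lists are small hypothetical samples, the "registrable domain" is naively taken as the last two labels (so `.co.uk`-style suffixes are mishandled), and "hex encoding" is approximated as percent-encoded bytes such as `%70`.

```python
import ipaddress
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample lists; the project's real lists are not shown in this README
SUSPICIOUS_TLDS = {"tk", "xyz", "icu"}
KNOWN_BRANDS = {"paypal": "paypal.com", "google": "google.com"}


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract URL-level signals; expects a URL with a scheme, e.g. https://..."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    ip = is_ip(host)
    # Naive registrable domain: last two labels (ignores multi-part suffixes like .co.uk)
    registered = host if ip else ".".join(labels[-2:])
    name = labels[-2] if len(labels) >= 2 and not ip else host
    digits = sum(ch.isdigit() for ch in host)
    return {
        "entropy": round(shannon_entropy(name), 3),  # entropy of the second-level label
        "subdomain_count": 0 if ip else max(len(labels) - 2, 0),
        "is_ip": ip,
        "suspicious_tld": (not ip) and labels[-1] in SUSPICIOUS_TLDS,
        "hyphens": host.count("-"),
        "digit_ratio": round(digits / len(host), 3) if host else 0.0,
        "hex_encoded": bool(re.search(r"%[0-9a-fA-F]{2}", url)),
        # Brand name appears in the host, but the registrable domain is not the brand's own
        "impersonates": [b for b, real in KNOWN_BRANDS.items() if b in host and registered != real],
    }
```

For instance, `url_features("https://paypal.evil.com/login")` flags `"paypal"` under `impersonates` with one subdomain, matching the `paypal.evil.com` example in the table.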
### Content signals
| Feature | Why it matters |
|---|---|
| Suspicious keyword matching | 30+ phishing/scam phrases |
| Brand name presence | Combined with non-brand domain = suspicious |
| Urgency language | Pressure tactics are a scam hallmark |
| Password/card input fields | Credential harvesting detection |
| External link analysis | Phishing kits link to multiple external domains |
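A minimal sketch of the text-based content signals, assuming they run on the `page_text` and `links` fields that the extension sends to `POST /analyse`. The phrase and brand lists are hypothetical stand-ins for the project's own 30+ phrases, and form-field detection is left out because the request schema carries page text rather than HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical sample lists; ShieldScan's own phrase list is not reproduced here
PHISHING_PHRASES = ["verify your account", "confirm your password", "update your payment details"]
URGENCY_PHRASES = ["act now", "within 24 hours", "will be suspended", "immediately"]
BRANDS = ["paypal", "amazon", "microsoft"]


def find_phrases(text: str, phrases: list[str]) -> list[str]:
    """Phrases found in text, matched case-insensitively on word boundaries."""
    lowered = text.lower()
    return [p for p in phrases if re.search(r"\b" + re.escape(p) + r"\b", lowered)]


def external_domains(page_url: str, links: list[str]) -> list[str]:
    """Sorted hostnames linked from the page that differ from the page's own hostname."""
    own = (urlparse(page_url).hostname or "").lower()
    hosts = {(urlparse(link).hostname or "").lower() for link in links}
    hosts -= {"", own}  # drop relative links and same-host links
    return sorted(hosts)


def content_signals(page_url: str, page_text: str, links: list[str]) -> dict:
    host = (urlparse(page_url).hostname or "").lower()
    mentioned = [b for b in BRANDS if b in page_text.lower()]
    return {
        "phishing_phrases": find_phrases(page_text, PHISHING_PHRASES),
        "urgency_phrases": find_phrases(page_text, URGENCY_PHRASES),
        # A brand named in the text but absent from the hostname is a mismatch signal
        "brand_mismatch": [b for b in mentioned if b not in host],
        "external_domains": external_domains(page_url, links),
    }
```

Keeping each signal as a separate list (rather than one score) lets the backend turn every hit into its own human-readable entry in the response's `signals` array.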
### Domain signals
| Feature | Why it matters |
|---|---|
| WHOIS domain age | Newly registered domains = high risk |
| SSL certificate | No HTTPS = baseline red flag |
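Once a WHOIS lookup has produced a creation date (the lookup itself is not shown), the domain signals reduce to date arithmetic and a scheme check. In this sketch the 30-day cut-off, the "No HTTPS" severity and the function names are assumptions; the message wording mirrors the example response in the API section. Note that `has_ssl` only checks the URL scheme and does not validate the certificate.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse

NEW_DOMAIN_DAYS = 30  # hypothetical cut-off; ShieldScan's actual threshold is not documented


def _as_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC so they can be subtracted from aware ones."""
    return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt


def domain_age_days(creation_date: datetime, now: Optional[datetime] = None) -> int:
    """Whole days elapsed since the WHOIS creation date."""
    now = _as_utc(now or datetime.now(timezone.utc))
    return (now - _as_utc(creation_date)).days


def has_ssl(url: str) -> bool:
    """Scheme check only: the certificate chain is not validated here."""
    return urlparse(url).scheme.lower() == "https"


def domain_signals(url: str, creation_date: Optional[datetime],
                   now: Optional[datetime] = None) -> list[dict]:
    signals = []
    if creation_date is not None:  # WHOIS may not return a date at all
        age = domain_age_days(creation_date, now)
        if age < NEW_DOMAIN_DAYS:
            signals.append({"message": f"Domain registered only {age} days ago", "severity": "high"})
    if has_ssl(url):
        signals.append({"message": "HTTPS present", "severity": "low"})
    else:
        signals.append({"message": "No HTTPS", "severity": "high"})  # severity level assumed
    return signals
```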
## API
### `POST /analyse`
**Request:**
```json
{
  "url": "https://example.com",
  "page_text": "...",
  "page_title": "...",
  "links": ["https://..."]
}
```
**Response:**
```json
{
  "threat_level": "warning",
  "risk_score": 54,
  "summary": "This site was registered 12 days ago and contains urgency language asking users to verify their PayPal account, despite being hosted on a non-PayPal domain.",
  "signals": [
    {"message": "Domain registered only 12 days ago", "severity": "high"},
    {"message": "Contains 'verify your account' phishing language", "severity": "high"},
    {"message": "References PayPal but domain is unrelated", "severity": "high"},
    {"message": "HTTPS present", "severity": "low"}
  ],
  "has_ssl": true,
  "domain_age_days": 12
}
```
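A client consuming this response might parse it into typed objects and list the most severe signals first. The sketch below is an illustration, not part of ShieldScan: the class names are hypothetical, and the `"medium"` severity and a `null` `domain_age_days` are assumptions, since the example only shows `"high"`, `"low"` and an integer age.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Display order; "medium" is an assumed level (only "high" and "low" appear in the example)
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Signal:
    message: str
    severity: str


@dataclass
class AnalyseResult:
    threat_level: str
    risk_score: int
    summary: str
    signals: list[Signal]
    has_ssl: bool
    domain_age_days: Optional[int]  # None if no WHOIS date was available (assumed)

    @classmethod
    def from_json(cls, raw: str) -> "AnalyseResult":
        """Parse a POST /analyse response body."""
        data = json.loads(raw)
        return cls(
            threat_level=data["threat_level"],
            risk_score=int(data["risk_score"]),
            summary=data.get("summary", ""),
            signals=[Signal(s["message"], s["severity"]) for s in data.get("signals", [])],
            has_ssl=bool(data.get("has_ssl", False)),
            domain_age_days=data.get("domain_age_days"),
        )

    def sorted_signals(self) -> list[Signal]:
        """Most severe first; unknown severities last; original order kept within a level."""
        return sorted(self.signals, key=lambda s: SEVERITY_ORDER.get(s.severity, len(SEVERITY_ORDER)))
```

Because Python's `sorted` is stable, signals of equal severity keep the order in which the backend emitted them.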
## Author
Nevetan Uthayachandran