SagarBiswas-MultiHAT/Web_Vulnerability_Scanner-AI
GitHub: SagarBiswas-MultiHAT/Web_Vulnerability_Scanner-AI
# Learning-Grade AI Web Vulnerability Scanner
**After running `python app.py`**

**After running `mainScaner.py`**

**After running `cd Reports` and `python -m http.server 8080`**

**JSON Report Example**

**HTML Report Example**

**AI Help Center (screenshots 1 and 2)**
**Without anonymization before sending**

**With anonymization before sending**

# Table of contents
- [Key features](#key-features)
- [What this scanner does (and doesn't)](#what-this-scanner-does-and-doesnt)
- [Requirements](#requirements)
- [Files](#files)
- [Installation](#installation)
- [Usage & Command-line arguments](#usage--command-line-arguments)
- [Examples](#examples)
- [Output files & report format](#output-files--report-format)
- [Internals & design decisions](#internals--design-decisions)
- [Extending the scanner](#extending-the-scanner)
- [Safety & legal notes](#safety--legal-notes)
- [Contributing](#contributing)
- [Contact / Acknowledgements](#contact--acknowledgements)
# Key features
- Queue-based polite crawler (no recursive thread spawning).
- Per-host rate limiting (configurable `--delay`) to avoid hammering a server.
- `robots.txt` awareness (the scanner checks and respects rules where available).
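The README does not show how the scanner implements these two politeness features. As an illustration only, here is a minimal standard-library sketch, assuming the `robots.txt` body has already been downloaded. `HostRateLimiter` and `build_robots` are hypothetical names, and the `delay` argument is assumed to play the role of the `--delay` option.

```python
import threading
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class HostRateLimiter:
    """Hypothetical helper: enforce a minimum delay between requests to the same host."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._next_allowed: dict[str, float] = {}  # host -> earliest time for next request
        self._lock = threading.Lock()

    def wait(self, url: str) -> None:
        host = urlparse(url).netloc
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed.get(host, now))
            # Reserve this slot so concurrent workers for the same host line up behind it.
            self._next_allowed[host] = slot + self.delay
        # Sleep outside the lock so requests to other hosts are not blocked.
        time.sleep(max(0.0, slot - now))


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse already-downloaded robots.txt content into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    robots = build_robots("User-agent: *\nDisallow: /admin/\n")
    print(robots.can_fetch("LearningScanner", "https://example.com/admin/panel"))  # False
    print(robots.can_fetch("LearningScanner", "https://example.com/products"))     # True

    limiter = HostRateLimiter(delay=0.2)
    start = time.monotonic()
    for _ in range(3):
        limiter.wait("https://example.com/page")  # first call returns immediately
    print(f"3 requests to one host took about {time.monotonic() - start:.1f}s")
```

Each worker would call `limiter.wait(url)` and check `robots.can_fetch(...)` before every GET; reserving the next slot under the lock keeps the delay enforced even when several threads target the same host.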
A portfolio-ready, **learning-grade** web vulnerability scanner and lightweight **AI-assisted report viewer**. This project demonstrates a polite, non-destructive approach to crawling and finding common web security issues (security headers, insecure cookie flags, reflected XSS heuristics, and basic error-based SQL injection indicators). It ships with a small Flask-based AI proxy intended to power the in-report AI assistance (optional).
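To make the four check categories concrete, the sketch below shows one way each could be detected from an already-fetched response using only the standard library. It is illustrative, not the scanner's actual rules: the header list, the probe string, and the SQL error fragments are assumptions, and all function names are hypothetical.

```python
import re

# Assumed list of commonly recommended response headers (illustrative, not exhaustive).
RECOMMENDED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)

# Assumed sample of database error fragments that hint at error-based SQL injection.
SQL_ERROR_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"you have an error in your sql syntax",                # MySQL / MariaDB
        r"unclosed quotation mark after the character string",  # SQL Server
        r"syntax error at or near",                             # PostgreSQL
        r"unrecognized token",                                  # SQLite
        r"ORA-\d{5}",                                           # Oracle
    )
]

# Hypothetical marker; a real scanner would randomize it per request.
XSS_PROBE = "<xss-probe-7f3a>"


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return recommended headers absent from the response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]


def cookie_flag_issues(set_cookie: str, over_https: bool = True) -> list[str]:
    """Check one Set-Cookie header value for missing Secure/HttpOnly/SameSite."""
    parts = [p.strip() for p in set_cookie.split(";")]
    name = parts[0].split("=", 1)[0]
    attrs = {p.split("=", 1)[0].lower() for p in parts[1:] if p}
    issues = []
    if over_https and "secure" not in attrs:
        issues.append(f"{name}: missing Secure")
    if "httponly" not in attrs:
        issues.append(f"{name}: missing HttpOnly")
    if "samesite" not in attrs:
        issues.append(f"{name}: missing SameSite")
    return issues


def probe_reflected_unescaped(body: str, probe: str = XSS_PROBE) -> bool:
    """Heuristic: the probe came back verbatim, i.e. '<' and '>' were not HTML-encoded."""
    return probe in body


def sql_error_indicators(body: str) -> list[str]:
    """Return the patterns that matched, hinting that input reached a SQL query."""
    return [p.pattern for p in SQL_ERROR_PATTERNS if p.search(body)]


if __name__ == "__main__":
    print(missing_security_headers({"Content-Type": "text/html", "X-Frame-Options": "DENY"}))
    print(cookie_flag_issues("sessionid=abc123; Path=/; HttpOnly"))
    # Encoded reflection is not flagged:
    print(probe_reflected_unescaped("<p>You searched for &lt;xss-probe-7f3a&gt;</p>"))  # False
    print(sql_error_indicators("Warning: You have an error in your SQL syntax near ''1''"))
```

All four functions only inspect responses the crawler already fetched with GET, which fits the non-destructive approach described above. Matches are indicators for manual review, not confirmed vulnerabilities.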

### Quick explanation: queue-based crawler, polite crawler, and no recursive thread spawning (beginner-friendly)
#### **1.** Queue-based crawler

A **queue-based crawler** uses a **single shared work queue** to manage all URLs that need to be visited.

**How it works (conceptually):**

1. Start with the base URL and put it into the queue.
2. Worker threads repeatedly:
   - Take **one URL** from the queue
   - Fetch the page
   - Extract links
   - Add **new, allowed URLs** back into the same queue
3. Repeat until the queue is empty or the depth limit is reached.

**What this means:**

- Each worker processes **one URL at a time**
- There is **central control** over what gets scanned
- Crawl order, depth, and limits remain predictable

#### **2.** Polite crawler

A **polite crawler** is designed **not to stress or harm servers**. In this scanner, politeness includes:

- ⏱ Per-host rate limiting (`--delay`)
- 🤖 Respecting `robots.txt`
- 🌱 GET-only requests (non-destructive)
- 🚫 No brute-force or payload floods
- 🧵 Controlled number of threads

Instead of hammering a site, the scanner behaves more like a **careful human using a browser**.

#### **3.** No recursive thread spawning (critical design choice)

This explains what the crawler **deliberately does NOT do**.

**✘ Bad design (recursive thread spawning):**
```text
Thread A visits URL A
└── spawns Thread B for link B
    └── spawns Thread C for link C
        └── spawns Thread D for link D
```
A **thread** is a lightweight unit of execution that lets a program do work **in parallel**, meaning multiple tasks run at the same time instead of one after another.
**Problems with this approach:**
- Unbounded thread growth
- Loss of concurrency control
- Servers get overwhelmed
- Scanner runs out of memory or sockets
- Hard to enforce delays and crawl depth
### ✓ What this scanner does instead
```text
[ Queue ]
    ↓
Worker Thread Pool (fixed size)
    ↓
Fetch → Extract → Enqueue (back to Queue)
```
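The Queue → fixed worker pool → Fetch/Extract/Enqueue loop above can be sketched with `queue.Queue` and `threading`. To keep it runnable offline, fetching is replaced by a lookup in a hypothetical in-memory `SITE` dict; in a real crawler, `fetch_and_extract` would perform the rate-limited, `robots.txt`-checked GET and parse links from the HTML. All names here are hypothetical, not taken from the scanner's code.

```python
import queue
import threading

# Hypothetical in-memory "site": page path -> links found on that page.
SITE = {
    "/": ["/about", "/products", "/login"],
    "/about": ["/", "/team"],
    "/products": ["/products/1", "/products/2"],
    "/products/1": ["/products"],
    "/products/2": [],
    "/login": [],
    "/team": ["/about"],
}


def fetch_and_extract(url: str) -> list[str]:
    """Stand-in for 'fetch the page and extract links' (no network access)."""
    return SITE.get(url, [])


def crawl(start: str, max_depth: int = 2, num_workers: int = 4) -> list[str]:
    work: queue.Queue = queue.Queue()  # the single shared work queue of (url, depth)
    seen = {start}                     # URLs already enqueued, so none is queued twice
    seen_lock = threading.Lock()
    visited: list[str] = []

    work.put((start, 0))

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:  # sentinel: shut this worker down
                work.task_done()
                return
            url, depth = item
            try:
                links = fetch_and_extract(url)
                with seen_lock:
                    visited.append(url)
                    if depth < max_depth:
                        for link in links:
                            if link not in seen:
                                seen.add(link)
                                # Enqueue instead of spawning a new thread.
                                work.put((link, depth + 1))
            finally:
                work.task_done()

    # Fixed-size pool: the thread count never grows with the number of links.
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()

    work.join()  # returns once every enqueued URL has been processed
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return visited


if __name__ == "__main__":
    print(sorted(crawl("/", max_depth=1)))  # ['/', '/about', '/login', '/products']
```

Because children are enqueued before the parent's `task_done()` call, `work.join()` cannot return while any page is still being processed. The depth limit and the `seen` set are enforced in one central place, which is what makes crawl depth and limits predictable.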
#### Why is this design considered best practice?
| Aspect | Queue-based crawler | Recursive spawning |
| ---------------------- | ---------------------- | ------------------ |
| Thread control | ✅ Fixed & predictable | ❌ Unbounded |
| Rate limiting | ✅ Enforceable | ❌ Difficult |
| Server safety | ✅ Polite | ❌ Aggressive |
| Memory safety | ✅ Stable | ❌ Risky |
| Debugging | ✅ Easier | ❌ Chaotic |
| Legal / ethical safety | ✅ Much safer | ❌ Risky |
This is why **professional tools and search-engine crawlers** typically use queue-based designs.
**Result:** safer scans, predictable behavior, ethical crawling, and easier extensibility.