harsh-kumar274/DOCUMENT-VERIFICATION-AND-FRAUD-DETECTION-USING-AI-AND-COMPUTER-VISION
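A PAN card verification and fraud detection system built on AI and computer vision. It automatically detects image tampering and extracts the information needed for verification.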
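# AI-Powered PAN Card Verification and Fraud Detection
A full-stack system that verifies images of Indian PAN cards and detects tampering, using computer vision, OCR, and a rule-based fraud decision engine.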
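## Features
- **9-phase CV pipeline**: preprocessing, detection, correction, feature extraction and matching, tampering detection, OCR, validation, decision
- **Multi-strategy OCR**: tries 5 preprocessing variants per field and keeps the result with the highest confidence
- **Post-OCR correction**: fixes common character confusions (O↔0, I↔1, B↔8, etc.)
- **RANSAC perspective correction**: fixes skew, rotation, and perspective distortion
- **Tampering detection**: Error Level Analysis (ELA), texture checks, noise-pattern analysis
- **PAN validation**: regex format check, holder-type code check, structural validation
- **Fraud decision engine**: aggregates weighted scores into a final verdict
- **Glassmorphic web UI**: dark-themed dashboard that shows results in real time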
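Error Level Analysis, listed under tampering detection above, can be sketched with Pillow alone: re-save the image as JPEG at a known quality and measure how much each pixel changes. This is a minimal sketch, assuming a JPEG quality of 90; the quality the project actually uses in `tampering.py`, and any threshold it applies to flag an image, are not documented here, so `error_level_analysis` and `ela_visualization` are hypothetical helper names.

```python
import io

from PIL import Image, ImageChops, ImageStat


def error_level_analysis(image: Image.Image, quality: int = 90) -> tuple[Image.Image, float, int]:
    """Recompress as JPEG and return (difference map, mean error, max error).

    Regions edited after the last save often recompress differently from
    the rest of the image, so they can stand out in the difference map.
    The quality value of 90 is an assumption, not the project's setting.
    """
    original = image.convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")  # convert() forces the load
    diff = ImageChops.difference(original, recompressed)
    # getextrema() on an RGB image returns one (min, max) pair per band.
    max_error = max(band_max for _, band_max in diff.getextrema())
    # Average the per-band means into a single scalar error level.
    mean_error = sum(ImageStat.Stat(diff).mean) / 3
    return diff, mean_error, max_error


def ela_visualization(diff: Image.Image, max_error: int) -> Image.Image:
    """Stretch the difference map to the full 0-255 range so faint errors show."""
    scale = 255.0 / max(max_error, 1)
    return diff.point(lambda value: min(255, int(value * scale)))
```

ELA is most informative on images that were already JPEG-compressed. Turning the stretched map into a pass/fail signal still needs a threshold tuned on real samples, which this sketch deliberately leaves out.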
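## Tech Stack
| Technology | Purpose |
|------------|---------|
| Python 3.10+ | Core language |
| OpenCV 4.9 | Image processing, edge detection, contours, perspective transforms |
| Tesseract OCR | Text extraction from PAN card fields |
| FastAPI | REST API backend |
| NumPy | Array operations |
| Pillow | Image format conversion |
| Jinja2 | HTML template rendering |
| HTML/CSS/JS | Frontend dashboard |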
## Project Structure
```
├── backend/
│   ├── config.py          # Tesseract path auto-detection, system config
│   ├── preprocessing.py   # Phase 1: Image cleanup & quality metrics
│   ├── detection.py       # Phase 2: Document boundary detection
│   ├── correction.py      # Phase 3: RANSAC perspective correction
│   ├── features.py        # Phase 4-5: ORB feature extraction & matching
│   ├── tampering.py       # Phase 6: Forgery detection (ELA)
│   ├── ocr.py             # Phase 7: Multi-strategy Tesseract OCR
│   ├── validation.py      # Phase 8: PAN format validation
│   ├── decision.py        # Phase 9: Fraud decision engine
│   ├── cv_pipeline.py     # End-to-end pipeline orchestrator
│   ├── main.py            # FastAPI routes
│   └── utils.py           # Shared utilities
│
├── frontend/
│   ├── index.html         # Main UI
│   ├── style.css          # Dark glassmorphic theme
│   └── app.js             # Frontend logic & API calls
│
├── templates/
│   └── PAN_CARD_TEMPLATE.jpeg
│
├── requirements.txt
├── run.py                 # Entry point
└── SYSTEM_DESIGN.md       # Architecture docs
```
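`config.py` is described as auto-detecting the Tesseract path. One plausible approach, sketched here with the standard library only, checks an override environment variable first, then `PATH`, then a few common install locations. The `TESSERACT_CMD` variable name, the fallback paths, and `find_tesseract` itself are assumptions, not taken from the project.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical fallback locations; adjust for your machine.
COMMON_TESSERACT_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # UB Mannheim default on Windows
    "/opt/homebrew/bin/tesseract",                     # Homebrew on Apple Silicon
    "/usr/local/bin/tesseract",                        # Homebrew on Intel / manual builds
    "/usr/bin/tesseract",                              # apt-installed package
]


def find_tesseract(env_var: str = "TESSERACT_CMD") -> Optional[str]:
    """Return a path to the tesseract binary, or None if none is found."""
    # 1. An explicit override via environment variable wins (hypothetical name).
    override = os.environ.get(env_var)
    if override and Path(override).is_file():
        return override
    # 2. Otherwise look for the executable on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 3. Finally probe well-known install locations.
    for candidate in COMMON_TESSERACT_PATHS:
        if Path(candidate).is_file():
            return candidate
    return None
```

The returned path can then be assigned to `pytesseract.pytesseract.tesseract_cmd`, which is how pytesseract is pointed at a specific binary.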
## Pipeline
```
Upload Image
     │
     ▼
1. Preprocessing  ──►  Noise reduction, contrast, sharpening
     │
     ▼
2. Detection      ──►  Edge detection, contour analysis, boundary find
     │
     ▼
3. Correction     ──►  RANSAC homography, perspective warp to 856×540
     │
     ├──────────────────────┬──────────────────────┐
     ▼                      ▼                      ▼
     4-5. Features          6. Tampering           7. OCR Extraction
     Template match         ELA analysis           PAN, Name, DOB
     Layout check           Texture check          Father's Name
     │                      │                      │
     └──────────────────────┼──────────────────────┘
                            ▼
                    8. PAN Validation  ──►  Regex, format, structure checks
                            │
                            ▼
                    9. Fraud Decision  ──►  Score aggregation → Verdict
                            │
                            ▼
          ✅ Genuine / 🟡 Manual Review / ❌ Fraudulent
```
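Stages 7 and 8 meet at the PAN string. Because a PAN has a fixed layout (five letters, four digits, one letter), OCR confusions such as O↔0, I↔1, and B↔8 can be repaired by position before the regex and holder-type checks run. The sketch below assumes that approach; the confusion table, the surname-initial check, and the names `correct_pan` and `validate_pan` are illustrative and not taken from `ocr.py` or `validation.py`. The holder-type letters follow the commonly documented meaning of the fourth PAN character.

```python
import re
from typing import List, Optional

# PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Commonly documented meaning of the 4th PAN character (holder type).
HOLDER_TYPES = {
    "P": "Individual",
    "C": "Company",
    "H": "Hindu Undivided Family (HUF)",
    "F": "Firm",
    "A": "Association of Persons (AOP)",
    "T": "Trust",
    "B": "Body of Individuals (BOI)",
    "L": "Local Authority",
    "J": "Artificial Juridical Person",
    "G": "Government",
}

# Illustrative OCR confusion table: digit -> letter it is often mistaken for.
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B"}
LETTER_TO_DIGIT = {letter: digit for digit, letter in DIGIT_TO_LETTER.items()}


def correct_pan(raw: str) -> str:
    """Repair common OCR confusions using the fixed letter/digit positions."""
    text = re.sub(r"[^A-Z0-9]", "", raw.upper())
    if len(text) != 10:
        return text  # wrong length: leave it for validation to reject
    fixed = []
    for index, char in enumerate(text):
        if 5 <= index <= 8:  # characters 6-9 must be digits
            fixed.append(LETTER_TO_DIGIT.get(char, char))
        else:  # characters 1-5 and 10 must be letters
            fixed.append(DIGIT_TO_LETTER.get(char, char))
    return "".join(fixed)


def validate_pan(pan: str, surname: Optional[str] = None) -> List[str]:
    """Return the names of failed checks; an empty list means all passed."""
    if not PAN_PATTERN.fullmatch(pan):
        return ["format"]
    failures = []
    if pan[3] not in HOLDER_TYPES:
        failures.append("holder_type")
    # For individuals (holder type P), the 5th character is the surname initial.
    if surname and pan[3] == "P" and pan[4] != surname.strip().upper()[:1]:
        failures.append("surname_initial")
    return failures
```

For example, an OCR reading of `"GRNPP38O4H"` is corrected to `"GRNPP3804H"`, and `validate_pan("GRNPP3804H", "PRAJAPATI")` returns `[]`.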
## Installation
### 1. Install Tesseract OCR
**macOS:**
```
brew install tesseract
```
**Ubuntu/Debian:**
```
sudo apt install tesseract-ocr
```
**Windows:** Download the installer from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki) and add Tesseract to your PATH.
### 2. Install and Run
```
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the server
python run.py
```
Open **http://127.0.0.1:8000** in your browser.
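## Usage
1. Open the dashboard in your browser
2. Upload a PAN card image (drag and drop, or click to browse)
3. Watch the pipeline process each stage in real time
4. Review the extracted OCR data, confidence scores, and final verdict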
### API
```
curl -X POST http://127.0.0.1:8000/api/verify -F "file=@pan_card.jpg"
```
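The same request can be sent from Python using only the standard library. This sketch mirrors the `curl` call (a multipart upload in a form field named `file`); it assumes the endpoint returns JSON, since the response schema is not documented here, and `verify_pan_image` and `build_multipart` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def verify_pan_image(path: str, url: str = "http://127.0.0.1:8000/api/verify") -> dict:
    """POST an image to the verify endpoint and decode the (assumed JSON) reply."""
    file_path = Path(path)
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    body, header = build_multipart("file", file_path.name, file_path.read_bytes(), content_type)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `verify_pan_image("pan_card.jpg")` returns the decoded response body. Building the multipart body by hand avoids adding a dependency such as `requests`, which is not in `requirements.txt`.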
## Example Output
```
PAN: "GRNPP3804H" conf=86.0% ✅
Name: "SIDDHARTH PRAJAPATI" conf=91.5% ✅
FatherName: "INDERJEET PRAJAPATI" conf=65.3% ⚠️
DOB: "01/01/2004" conf=73.8% ⚠️
Validation: PASS (all structural checks passed)
Decision: NEEDS MANUAL REVIEW (39.2% fraud probability)
```
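The fraud decision engine is described as weighted score aggregation. A minimal sketch of that idea: each upstream stage reports a risk score in [0, 1], the scores are combined as a weighted average, and two thresholds split the result into the three verdicts. The stage names, weights, and thresholds below are hypothetical; the project's actual values are not documented, so this sketch will not reproduce the 39.2% figure above.

```python
from typing import Dict

# Hypothetical per-stage weights (they sum to 1.0); not the project's values.
WEIGHTS: Dict[str, float] = {
    "tampering": 0.35,           # e.g. normalised ELA / texture / noise anomaly
    "template_mismatch": 0.25,   # e.g. poor ORB match against the template
    "ocr_uncertainty": 0.20,     # e.g. 1 - mean OCR confidence
    "validation_failure": 0.20,  # e.g. 1.0 if any structural PAN check failed
}
GENUINE_BELOW = 0.30      # hypothetical threshold
FRAUD_AT_OR_ABOVE = 0.60  # hypothetical threshold


def fraud_probability(risk_scores: Dict[str, float]) -> float:
    """Weighted average of per-stage risk scores, each clamped to [0, 1].

    Stages missing from risk_scores count as zero risk.
    """
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(
        weight * min(max(risk_scores.get(stage, 0.0), 0.0), 1.0)
        for stage, weight in WEIGHTS.items()
    )
    return weighted_sum / total_weight


def verdict(probability: float) -> str:
    """Map a fraud probability onto the three verdicts shown in the UI."""
    if probability < GENUINE_BELOW:
        return "GENUINE"
    if probability < FRAUD_AT_OR_ABOVE:
        return "NEEDS MANUAL REVIEW"
    return "FRAUDULENT"
```

For instance, risk scores of 0.4 (tampering), 0.3 (template mismatch), and 0.5 (OCR uncertainty) give a probability of about 0.315, which these hypothetical thresholds route to manual review. Keeping a middle band rather than a single cut-off is what lets borderline cards go to a human instead of being auto-rejected.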
## Dependencies
```
fastapi>=0.100.0
uvicorn>=0.22.0
python-multipart>=0.0.6
opencv-python==4.9.0.80
numpy>=1.26.0,<2.0
pytesseract>=0.3.10
Pillow>=10.0.0
jinja2>=3.1.2
pydantic>=2.0
```
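## License
For educational and internal verification purposes only. When processing real PAN card images, comply with applicable data-privacy regulations, such as India's Digital Personal Data Protection (DPDP) Act.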