harsh-kumar274/DOCUMENT-VERIFICATION-AND-FRAUD-DETECTION-USING-AI-AND-COMPUTER-VISION

GitHub: harsh-kumar274/DOCUMENT-VERIFICATION-AND-FRAUD-DETECTION-USING-AI-AND-COMPUTER-VISION

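A PAN card verification and fraud detection system based on AI and computer vision. It automatically detects image tampering and extracts the information needed for verification.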

Stars: 0 | Forks: 2

# AI-Powered PAN Card Verification & Fraud Detection

A full-stack system that verifies Indian PAN card images and detects tampering using computer vision, OCR, and a rule-based fraud decision engine.

## Features

- **9-phase CV pipeline**: preprocessing, detection, correction, feature extraction, feature matching, tampering detection, OCR, validation, decision
- **Multi-strategy OCR**: tries 5 preprocessing variants per field and keeps the result with the highest confidence
- **OCR post-correction**: fixes common character confusions (O↔0, I↔1, B↔8, etc.)
- **RANSAC perspective correction**: fixes skew, rotation, and perspective distortion
- **Tampering detection**: error level analysis (ELA), texture checks, noise pattern analysis
- **PAN validation**: regex format check, holder type code, structural validation
- **Fraud decision engine**: weighted score aggregation that produces the final verdict
- **Glassmorphic web UI**: a dark-themed dashboard that shows results in real time

## Tech Stack

| Technology | Purpose |
|------------|---------|
| Python 3.10+ | Core language |
| OpenCV 4.9 | Image processing, edge detection, contours, perspective transforms |
| Tesseract OCR | Text extraction from PAN card fields |
| FastAPI | REST API backend |
| NumPy | Array operations |
| Pillow | Image format conversion |
| Jinja2 | HTML template rendering |
| HTML/CSS/JS | Frontend dashboard |

## Project Structure

```
├── backend/
│   ├── config.py          # Tesseract path auto-detection, system config
│   ├── preprocessing.py   # Phase 1: Image cleanup & quality metrics
│   ├── detection.py       # Phase 2: Document boundary detection
│   ├── correction.py      # Phase 3: RANSAC perspective correction
│   ├── features.py        # Phase 4-5: ORB feature extraction & matching
│   ├── tampering.py       # Phase 6: Forgery detection (ELA)
│   ├── ocr.py             # Phase 7: Multi-strategy Tesseract OCR
│   ├── validation.py      # Phase 8: PAN format validation
│   ├── decision.py        # Phase 9: Fraud decision engine
│   ├── cv_pipeline.py     # End-to-end pipeline orchestrator
│   ├── main.py            # FastAPI routes
│   └── utils.py           # Shared utilities
│
├── frontend/
│   ├── index.html         # Main UI
│   ├── style.css          # Dark glassmorphic theme
│   └── app.js             # Frontend logic & API calls
│
├── templates/
│   └── PAN_CARD_TEMPLATE.jpeg
│
├── requirements.txt
├── run.py                 # Entry point
└── SYSTEM_DESIGN.md       # Architecture docs
```

## Pipeline

```
Upload Image
     │
     ▼
1. Preprocessing ──► Noise reduction, contrast, sharpening
     │
     ▼
2. Detection ──► Edge detection, contour analysis, boundary find
     │
     ▼
3. Correction ──► RANSAC homography, perspective warp to 856×540
     │
     ├──────────────────┬────────────────────┐
     ▼                  ▼                    ▼
4-5. Features       6. Tampering        7. OCR Extraction
  Template match      ELA analysis        PAN, Name, DOB
  Layout check        Texture check       Father's Name
     │                  │                    │
     └──────────────────┼────────────────────┘
                        ▼
8. PAN Validation ──► Regex, format, structure checks
                        │
                        ▼
9. Fraud Decision ──► Score aggregation → Verdict
                        │
                        ▼
✅ Genuine / 🟡 Manual Review / ❌ Fraudulent
```

## Installation

### 1. Install Tesseract OCR

**macOS:**

```
brew install tesseract
```

**Ubuntu/Debian:**

```
sudo apt install tesseract-ocr
```

**Windows:** Download the installer from [UB Mannheim](https://github.com/UB-Mannheim/tesseract/wiki) and add Tesseract to your PATH.

### 2. Install and Run

```
# Create a virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start the server
python run.py
```

Open **http://127.0.0.1:8000** in your browser.

## Usage

1. Open the dashboard in your browser
2. Upload a PAN card image (drag and drop, or click)
3. Watch the pipeline process each phase in real time
4. Review the extracted OCR data, confidence scores, and final verdict

### API

```
curl -X POST http://127.0.0.1:8000/api/verify -F "file=@pan_card.jpg"
```

## Sample Output

```
PAN:        "GRNPP3804H"           conf=86.0%  ✅
Name:       "SIDDHARTH PRAJAPATI"  conf=91.5%  ✅
FatherName: "INDERJEET PRAJAPATI"  conf=65.3%  ⚠️
DOB:        "01/01/2004"           conf=73.8%  ⚠️

Validation: PASS (all structural checks passed)
Decision:   NEEDS MANUAL REVIEW (39.2% fraud probability)
```

## Dependencies

```
fastapi>=0.100.0
uvicorn>=0.22.0
python-multipart>=0.0.6
opencv-python==4.9.0.80
numpy>=1.26.0,<2.0
pytesseract>=0.3.10
Pillow>=10.0.0
jinja2>=3.1.2
pydantic>=2.0
```

## License

For educational and internal verification purposes only. When processing real PAN card images, comply with applicable data privacy regulations (DPDP Act).
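The PAN validation phase (regex format check, holder type code, structural validation) can be illustrated with a minimal sketch. This is not the repository's `validation.py`; the function name `validate_pan`, the result keys, and the rule that the surname is the last word of the holder name are assumptions made for illustration. The pattern and the holder type codes follow the standard PAN layout: five letters, four digits, one letter, with the fourth character encoding the holder type.

```python
import re

# Standard PAN layout: 5 letters, 4 digits, 1 trailing letter.
PAN_PATTERN = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")

# Holder type encoded by the 4th character of a PAN.
HOLDER_TYPES = {
    "P": "Individual",
    "C": "Company",
    "H": "Hindu Undivided Family",
    "F": "Firm",
    "A": "Association of Persons",
    "T": "Trust",
    "B": "Body of Individuals",
    "L": "Local Authority",
    "J": "Artificial Juridical Person",
    "G": "Government",
}


def validate_pan(pan, holder_name=None):
    """Hypothetical helper: run structural checks on an extracted PAN string."""
    pan = pan.strip().upper()
    result = {
        "pan": pan,
        "format_ok": False,
        "holder_type": None,
        "name_initial_ok": None,  # None means "not checked"
    }
    if not PAN_PATTERN.match(pan):
        return result
    result["format_ok"] = True
    result["holder_type"] = HOLDER_TYPES.get(pan[3])

    # For individuals, the 5th character is the first letter of the surname.
    # Assumption: the surname is the last word of the OCR'd name.
    name_parts = holder_name.upper().split() if holder_name else []
    if name_parts and pan[3] == "P":
        result["name_initial_ok"] = pan[4] == name_parts[-1][0]
    return result
```

With the values from the sample output, `validate_pan("GRNPP3804H", "SIDDHARTH PRAJAPATI")` passes the format check, resolves the holder type to an individual, and confirms the surname initial.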
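The OCR post-correction feature (fixing confusions such as O↔0, I↔1, B↔8) lends itself to a position-aware approach for the PAN field, because each position is known to hold either a letter or a digit. The sketch below is one possible way to do this, not the project's actual `ocr.py`; the function name `correct_pan_ocr` and the decision to leave strings of the wrong length untouched are assumptions.

```python
# Confusion pairs named in the feature list, mapped in both directions.
TO_DIGIT = {"O": "0", "I": "1", "B": "8"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def correct_pan_ocr(raw):
    """Hypothetical helper: repair letter/digit confusions in an OCR'd PAN."""
    # Drop whitespace and punctuation that OCR may have introduced.
    text = "".join(ch for ch in raw.upper() if ch.isalnum())
    if len(text) != 10:
        # Wrong length: leave the string for the validator to reject.
        return text

    fixed = []
    for index, ch in enumerate(text):
        if 5 <= index <= 8:
            # Positions 6 to 9 of a PAN must be digits.
            fixed.append(TO_DIGIT.get(ch, ch))
        else:
            # Positions 1 to 5 and 10 must be letters.
            fixed.append(TO_LETTER.get(ch, ch))
    return "".join(fixed)
```

Running the correction before validation means a misread such as `GRNPP38O4H` can still pass the format check once the `O` in a digit position is mapped back to `0`.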
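The fraud decision engine is described as weighted score aggregation that yields one of three verdicts. The README does not state the weights, the input signals, or the thresholds, so every number and name below (`WEIGHTS`, `fraud_decision`, the 30% and 70% cut-offs) is a hypothetical placeholder chosen only to show the shape of such an engine.

```python
# Hypothetical weights: each signal is a risk score in [0, 1].
WEIGHTS = {
    "tampering": 0.4,
    "template_mismatch": 0.2,
    "ocr_uncertainty": 0.2,
    "validation_failure": 0.2,
}


def fraud_decision(scores, weights=WEIGHTS, review_from=30.0, fraud_from=70.0):
    """Hypothetical helper: combine risk scores into a probability and verdict."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")

    # Weighted average of the clamped scores, normalised by the total weight.
    total_weight = sum(weights.values())
    risk = sum(
        weights[name] * min(max(scores[name], 0.0), 1.0) for name in weights
    ) / total_weight
    probability = round(100.0 * risk, 1)

    # Assumed thresholds: below review_from is genuine, at or above
    # fraud_from is fraudulent, anything in between goes to a human.
    if probability >= fraud_from:
        verdict = "FRAUDULENT"
    elif probability >= review_from:
        verdict = "NEEDS MANUAL REVIEW"
    else:
        verdict = "GENUINE"
    return probability, verdict
```

Keeping a middle band for manual review matches the three-way outcome in the pipeline diagram: borderline cards are escalated rather than forced into a binary decision.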
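Tags: AV bypass, FastAPI, HTML/CSS/JS, Jinja2, NumPy, OCR (optical character recognition), OpenCV, ORB feature matching, PAN card verification, Pillow, Python, RANSAC homography correction, REST API, Tesseract, Indian ID verification, image processing, image rectification, image tampering detection, multimodal security, data visualization, document verification, no backdoor, template matching, fraud detection, deep learning, feature extraction, glassmorphic UI, automated verification, form validation, computer vision, reverse engineering tools, fintech, anti-counterfeiting detection, preprocessing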