amey-16/Rag-inspector

GitHub: amey-16/Rag-inspector

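An AI engineering platform for debugging and optimizing RAG systems, offering end-to-end analysis and visualization from document ingestion to answer evaluation.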

Stars: 0 | Forks: 0

# RAG Inspector 🔍

RAG Inspector is a production-ready AI engineering tool that exposes the inner workings of a RAG system, from how documents are chunked and embedded to how answers are generated and validated. It is built for engineers who are not satisfied with "did it retrieve something?" and want to understand "why" and "how well".

```
B[PyPDFLoader]
B --> C[RecursiveCharacterTextSplitter\n500 / 1000 / 2000 tokens]
C --> D[HuggingFace Embeddings\nall-MiniLM-L6-v2]
D --> E[Qdrant Cloud\n3 Collections]
F[User Question] --> G[Query Embedding]
G --> H{Retrieval Strategy}
H -->|Similarity| E
H -->|MMR| E
E --> I[Top-K Chunks + Cosine Scores]
I --> J[Groq Llama 3.3 70B\nContext-only answer]
J --> K[Groq Llama 8B\nHallucination Check + RAGAS Eval]
K --> L[Suggestion Engine]
G --> M[UMAP Reduction\ndiskcache keyed by hash]
M --> N[Plotly Scatter Map]
```

## 🛠️ Tech Stack

**Backend**
- FastAPI · LangChain · Groq (Llama 3.3 70B + Llama 8B)
- Qdrant Cloud · Sentence Transformers · UMAP-learn · Scikit-learn
- Deployed on **Hugging Face Spaces** (16 GB RAM, 2 CPUs, free tier)

**Frontend**
- React · Vite · TailwindCSS
- Recharts · Plotly.js · Axios
- Deployed on **Vercel** (free tier)

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js 18+
- A free [Qdrant Cloud](https://cloud.qdrant.io) cluster
- A free [Groq](https://console.groq.com) API key

### Backend

```
cd backend
python -m venv .venv
# Windows: .venv\Scripts\activate | macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
# Copy the example file and fill in your credentials
cp .env.example .env
# Edit .env with your QDRANT_URL, QDRANT_API_KEY, GROQ_API_KEY
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```

API docs: http://localhost:7860/docs

### Frontend

```
cd frontend
npm install
cp .env.example .env
# .env already points to http://localhost:7860
npm run dev
# Open http://localhost:5173
```

### Tests

```
cd backend
pytest tests/ -v
```

## ☁️ Deployment

### Backend → Hugging Face Spaces (Docker)

1. Create a new HF Space that uses the **Docker** runtime
2. Push the contents of the `backend/` folder to the Space repository
3. Add these Secrets in the Space settings:
   - `QDRANT_URL`
   - `QDRANT_API_KEY`
   - `GROQ_API_KEY`
   - `CORS_ORIGINS=https://your-app.vercel.app`

### Frontend → Vercel

1. Push the `frontend/` folder to GitHub
2. Import it in Vercel
3. Set the environment variable `VITE_API_URL=https://your-hf-space.hf.space`
4. Deploy

## 📁 Project Structure

```
RAGINS/
├── backend/
│   ├── app/
│   │   ├── main.py                        # FastAPI app + CORS + health
│   │   ├── config.py                      # Pydantic settings
│   │   ├── models/schemas.py              # All Pydantic models
│   │   ├── routes/
│   │   │   ├── upload.py                  # POST /upload
│   │   │   ├── query.py                   # POST /query
│   │   │   ├── analysis.py                # POST /compare, POST /embeddings
│   │   │   └── experiment.py              # POST /experiment/chunk_size
│   │   └── services/
│   │       ├── pdf_loader.py              # PyPDFLoader + MD5 hash
│   │       ├── chunker.py                 # Multi-size chunking
│   │       ├── embedding_service.py       # HuggingFace singleton
│   │       ├── vector_store.py            # Qdrant + MMR implementation
│   │       ├── retriever.py               # Orchestration + latency
│   │       ├── relevance_scorer.py        # Cosine similarity scoring
│   │       ├── llm_factory.py             # Groq 70B + 8B singletons
│   │       ├── hallucination_detector.py  # LLM-as-judge
│   │       ├── evaluation.py              # RAGAS-style metrics
│   │       ├── umap_visualizer.py         # UMAP + diskcache
│   │       └── suggestion_engine.py       # Rule-based suggestions
│   ├── tests/test_retrieval.py            # Pytest suite
│   ├── Dockerfile                         # HF Spaces optimised
│   ├── requirements.txt
│   └── .env.example
│
└── frontend/
    ├── src/
    │   ├── api/api.js                     # Axios client
    │   ├── pages/Dashboard.jsx            # Main layout
    │   └── components/
    │       ├── BackendStatus.jsx          # Cold-start polling
    │       ├── UploadSection.jsx
    │       ├── QuerySection.jsx           # + Latency panel
    │       ├── RetrievedChunks.jsx        # + Recharts bars
    │       ├── RetrievalComparison.jsx
    │       ├── EvaluationMetrics.jsx      # SVG gauge rings
    │       ├── HallucinationPanel.jsx
    │       ├── EmbeddingMap.jsx           # Plotly UMAP
    │       ├── ChunkExperimentPanel.jsx
    │       └── SuggestionsPanel.jsx
    ├── tailwind.config.js
    ├── vercel.json
    └── .env.example
```

## ⚠️ Known Limitations

- **No multi-user isolation**: all users share the same Qdrant collections. Suitable for a portfolio showcase or a single-user demo.
- **No authentication**: add OAuth/JWT before putting this into production.
- **UMAP is CPU-intensive**: the first run takes about 15–20 seconds on free hardware; subsequent calls read from the disk cache.

## 📄 License

MIT
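The README lists MMR as one of the two retrieval strategies and places an MMR implementation in `vector_store.py`, but does not show it. The sketch below illustrates the general Maximal Marginal Relevance technique in pure Python; it is not the repository's code. The function names `cosine` and `mmr` and the parameter `lambda_mult` are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents picked by Maximal Marginal Relevance.

    lambda_mult = 1.0 ranks by relevance only; lower values penalise
    candidates that are similar to documents already selected.
    (Hypothetical signature, not taken from the repository.)
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))

    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already-selected doc.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


docs = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0]]
query = [1.0, 0.9]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # diversity-aware ranking
print(mmr(query, docs, k=2, lambda_mult=1.0))  # plain similarity ranking
```

With these toy vectors, the first two documents are near-duplicates, so the diversity-aware call skips one of them in favour of the third, while the relevance-only call returns both near-duplicates.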
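The README notes that the PDF loader computes an MD5 hash and that the UMAP reduction is cached on disk keyed by a hash, which is why only the first run is slow. A minimal sketch of that content-hash caching idea follows. It assumes an in-memory `dict` as a stand-in for the `diskcache` store, and the names `content_hash` and `HashKeyedCache` are hypothetical, not taken from the repository.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """MD5 hex digest of the raw bytes, used here only as a cache key."""
    return hashlib.md5(data).hexdigest()


class HashKeyedCache:
    """Caches an expensive result under the hash of its input bytes.

    A plain dict stands in for an on-disk cache (assumption for this sketch).
    """

    def __init__(self):
        self._store = {}
        self.misses = 0  # number of times the expensive function actually ran

    def get_or_compute(self, data: bytes, compute):
        key = content_hash(data)
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(data)
        return self._store[key]


cache = HashKeyedCache()
slow_reduce = lambda data: len(data)  # placeholder for a slow UMAP reduction

cache.get_or_compute(b"same document bytes", slow_reduce)
cache.get_or_compute(b"same document bytes", slow_reduce)
print(cache.misses)  # the expensive step ran once for identical content
```

Keying on the content rather than the filename means re-uploading the same PDF under a different name still hits the cache.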

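Tags: AI engineering, Clair, LLM, RAG, Unmanaged PE, visual analytics, vector retrieval, performance optimization, database takeover, detection bypass, retrieval-augmented generation, custom scripts, reverse-engineering tools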