NithinR-7105/SMAGGE
GitHub: NithinR-7105/SMAGGE
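A fully local, automated system that discovers sales leads in a compliant way and generates personalised outreach, addressing data-security and AI-governance concerns.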
Stars: 0 | Forks: 0
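# SMAGGE: Autonomous Multi-Agent Growth and Governance Engine
## Skills Demonstrated
| Skill area | Implementation |
|---|---|
| **Multi-agent orchestration** | CrewAI sequential pipeline: Scout → Analyst → Writer → Guard |
| **Local LLM deployment** | Ollama + llama3.2 (tool calling supported, fully offline) |
| **AI safety and governance** | Custom 4-layer security scorer (PII · injection · hallucination · tone) |
| **Feedback loop** | Rejected messages are fed back into the Writer agent's context |
| **Workflow automation** | n8n webhook + FastAPI, triggered on a daily schedule |
| **Full-stack AI engineering** | FastAPI REST API + live HTML/JS dashboard |
| **Containerisation** | Docker Compose: PostgreSQL + n8n |
## System Architecture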
```
┌─────────────────────────────────────────────────────────────────┐
│                         SMAGGE Pipeline                         │
│                                                                 │
│  ┌─────────┐    ┌──────────┐    ┌────────┐    ┌───────────┐     │
│  │  Scout  │───▶│ Analyst  │───▶│ Writer │───▶│ Security  │     │
│  │  Agent  │    │  Agent   │    │ Agent  │    │   Guard   │     │
│  └─────────┘    └──────────┘    └────────┘    └─────┬─────┘     │
│  Discovers      Enriches        Drafts        4-layer check     │
│  leads via      each lead       personalised  PII·Inj·Hal       │
│  CSV/Apollo     with OCR        outreach      ·Tone scoring     │
│  /Hunter        + reasoning     (<150 words)        │           │
│                                                     ▼           │
│                              ┌──────────────────────────────┐   │
│                              │     PostgreSQL Database      │   │
│                              │ leads · analyses · outreach  │   │
│                              │        pipeline_runs         │   │
│                              └──────────────┬───────────────┘   │
└─────────────────────────────────────────────┼───────────────────┘
                                              │
              ┌───────────────────────────────┼────────────────┐
              │                               │                │
        ┌─────▼─────┐                  ┌──────▼──────┐   ┌─────▼────┐
        │  FastAPI  │                  │  Dashboard  │   │   n8n    │
        │ REST API  │                  │  (HTML/JS)  │   │ Workflow │
        │   :8000   │                  │   :8000/    │   │  :5678   │
        └───────────┘                  │  dashboard  │   └──────────┘
                                       └─────────────┘
```
## Project Structure
```
SMAGGE/
├── agents/
│   ├── scout.py                   # Lead discovery agent (mock / Apollo / Hunter)
│   ├── analyst.py                 # Lead enrichment agent (OCR + reasoning)
│   └── writer.py                  # Personalised outreach drafting agent
├── tools/
│   ├── lead_scraper.py            # CrewAI BaseTool: CSV / API lead fetching
│   └── ocr_tool.py                # CrewAI BaseTool: Tesseract OCR
├── tasks/
│   └── pipeline_tasks.py          # CrewAI task definitions with context chaining
├── security/
│   ├── guard.py                   # SecurityGuard: orchestrates all checks
│   ├── scorer.py                  # SecurityScorer: returns SecurityReport
│   ├── checks/
│   │   ├── pii_check.py           # Regex PII detection (40 pts)
│   │   ├── injection_check.py     # Regex + LLM semantic injection check (30 pts)
│   │   ├── hallucination_check.py # LLM cross-reference check (20 pts)
│   │   └── tone_check.py          # LLM tone appropriateness (10 pts)
│   └── guardrails/
│       ├── config.yml             # NeMo Guardrails config (portfolio documentation)
│       └── main.co                # NeMo Guardrails colang rules
├── feedback/
│   └── loop.py                    # FeedbackLoop: rejected messages → Writer context
├── api/
│   └── server.py                  # FastAPI server: /run /status /leads /runs /approve
├── static/
│   └── dashboard.html             # Live dashboard: 4 pages, full navigation
├── database/
│   └── init.sql                   # PostgreSQL schema
├── data/
│   └── mock_leads.csv             # 10 sample SaaS leads
├── n8n/
│   └── smagge_workflow.json       # Importable n8n workflow
├── crew.py                        # Main pipeline entry point
├── docker-compose.yml             # PostgreSQL + n8n containers
├── requirements.txt               # Python dependencies
└── .env.example                   # Environment variable template
```
## Quick Start
### Prerequisites
- Python 3.12
- Docker Desktop
- [Ollama](https://ollama.ai) with the `llama3.2` model pulled
- [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) (Windows build)
### 1. Clone the repository and set up the environment
```
git clone https://github.com/nithin/smagge.git
cd smagge
py -3.12 -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate  # macOS/Linux
pip install -r requirements.txt
```
### 2. Configure environment variables
```
copy .env.example .env
# Edit .env with your settings
```
### 3. Start the supporting services
```
docker-compose up -d # PostgreSQL + n8n
ollama pull llama3.2 # Download local LLM
```
### 4. Run the pipeline (terminal 1)
```
python crew.py
```
### 5. Start the API server (terminal 2)
```
uvicorn api.server:app --host 0.0.0.0 --port 8000 --reload
```
### 6. Open the dashboard
```
http://localhost:8000/dashboard
```
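## Four-Phase Build
### Phase 1: Multi-Agent Pipeline
Three CrewAI agents run sequentially on a local LLM:
- **Scout** uses `LeadScraperTool` to fetch leads from the mock CSV, Apollo, or Hunter.io
- **Analyst** enriches each lead using `OCRTool` (Tesseract) and LLM reasoning
- **Writer** drafts a personalised outreach email for each lead (under 150 words)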
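
The sequential hand-off described above can be pictured in plain Python. This is a minimal sketch and not the project's CrewAI code: the `Lead` fields, the three stage functions, and the placeholder text they produce are hypothetical stand-ins for what the agents and the local LLM would do.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_WORDS = 150  # outreach length limit stated for the Writer agent


@dataclass
class Lead:
    # Hypothetical fields, chosen only for illustration.
    name: str
    company: str
    notes: list[str] = field(default_factory=list)
    draft: str = ""


def scout() -> list[Lead]:
    """Stand-in for the Scout agent: return leads from some source."""
    return [Lead(name="Ada", company="ExampleSaaS")]


def analyst(lead: Lead) -> Lead:
    """Stand-in for the Analyst agent: attach an enrichment note."""
    lead.notes.append(f"{lead.company} is a SaaS company")
    return lead


def writer(lead: Lead) -> Lead:
    """Stand-in for the Writer agent: draft a message and cap its length."""
    text = f"Hi {lead.name}, I noticed that {lead.notes[0]}. Open to a quick chat?"
    lead.draft = " ".join(text.split()[:MAX_WORDS])
    return lead


def run_pipeline(stages: list[Callable[[Lead], Lead]]) -> list[Lead]:
    """Pass every scouted lead through the stages in order."""
    results = []
    for lead in scout():
        for stage in stages:
            lead = stage(lead)
        results.append(lead)
    return results


drafts = run_pipeline([analyst, writer])
```

Each stage receives the output of the previous one, which mirrors the context chaining that `tasks/pipeline_tasks.py` is described as providing.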
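### Phase 2: Security Guard Layer
A custom 4-layer security scorer intercepts every Writer output before it is stored in the database:
| Check | Points | Method |
|---|---|---|
| PII detection | 40 | Regex patterns (email, phone, SSN, credit card) |
| Prompt injection | 30 | Regex + semantic check by the Ollama LLM |
| Hallucination detection | 20 | LLM cross-references claims in the message against the source data |
| Tone assessment | 10 | LLM appropriateness check |

Messages scoring below 70/100 are automatically rejected and logged.
The `security/guardrails/` folder documents how this implementation maps onto [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) integration patterns. NeMo Guardrails requires C++ build tools on Windows, so a native Python implementation is used here.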
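
The regex-based PII layer could look roughly like the sketch below. The patterns and the `find_pii` helper are illustrative assumptions, not the contents of `security/checks/pii_check.py`; they cover only simple US-style formats for the four categories listed in the table.

```python
import re

# Illustrative patterns only; production PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return each PII category that matched, with the matching substrings."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits
```

An empty result means the draft passed this layer; any non-empty result would cost the draft the PII points.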
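### Phase 3: Automation and Feedback Loop
- **FastAPI server** exposes the `/run`, `/status`, `/leads`, `/runs`, and `/approve` endpoints
- **n8n workflow** triggers the pipeline via webhook at 9 AM every weekday (Monday to Friday)
- **Feedback loop** queries PostgreSQL for previously rejected messages and injects them into the Writer agent's context so later drafts can avoid the same problems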
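
One way to picture the feedback step is a helper that turns rejected drafts into guidance text for the next Writer prompt. This is a sketch under assumptions: the `Rejection` shape and `build_writer_context` are hypothetical, and the rows are passed in directly rather than queried from PostgreSQL as `feedback/loop.py` is described as doing.

```python
from dataclasses import dataclass


@dataclass
class Rejection:
    # Hypothetical shape of a rejected-outreach record.
    message: str
    reason: str
    score: int


def build_writer_context(rejections: list[Rejection], limit: int = 3) -> str:
    """Format the lowest-scoring rejections as guidance for the Writer prompt."""
    if not rejections:
        return ""
    worst_first = sorted(rejections, key=lambda r: r.score)[:limit]
    lines = ["Avoid the problems found in these previously rejected drafts:"]
    for r in worst_first:
        lines.append(f'- (score {r.score}/100) {r.reason}: "{r.message}"')
    return "\n".join(lines)
```

Capping the number of examples with `limit` keeps the added context short, which matters for a small local model's prompt budget.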
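### Phase 4: Dashboard and Portfolio
- Live HTML/JS dashboard served by FastAPI at `/dashboard`
- 4 pages: Dashboard, Leads, Security Log, Settings
- Approve/Reject buttons feed back into the pipeline through the `/approve` endpoint
- The Settings page includes a "Run Pipeline Now" button
- Auto-refreshes every 30 seconds; falls back to mock data when the API is offline
## Environment Variables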
```
# LLM
OLLAMA_MODEL=llama3.2
# Lead Source: mock | apollo | hunter
LEAD_SOURCE=mock
# Apollo API key
APOLLO_API_KEY=your_key_here
# Hunter.io API key
HUNTER_API_KEY=your_key_here
# Lead targeting
TARGET_INDUSTRY=SaaS
TARGET_JOB_TITLE=Head of Growth
TARGET_LOCATION=United States
# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=smagge_db
POSTGRES_USER=smagge
POSTGRES_PASSWORD=smagge_secret
# Tesseract OCR
TESSERACT_PATH=C:\Program Files\Tesseract-OCR
```
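
As an illustration of how these variables might be consumed, the sketch below reads a few of them and builds a SQLAlchemy-style connection URL. `Settings` and `load_settings` are hypothetical names, the defaults simply repeat the sample values above, and the project's actual loading code may differ.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote_plus

ALLOWED_LEAD_SOURCES = {"mock", "apollo", "hunter"}


@dataclass(frozen=True)
class Settings:
    ollama_model: str
    lead_source: str
    postgres_url: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), using the sample values as defaults."""
    env = os.environ if env is None else env
    lead_source = env.get("LEAD_SOURCE", "mock").lower()
    if lead_source not in ALLOWED_LEAD_SOURCES:
        raise ValueError(f"Unsupported LEAD_SOURCE: {lead_source!r}")
    # URL for the psycopg2 driver; the password is URL-escaped.
    postgres_url = (
        f"postgresql+psycopg2://{env.get('POSTGRES_USER', 'smagge')}:"
        f"{quote_plus(env.get('POSTGRES_PASSWORD', 'smagge_secret'))}@"
        f"{env.get('POSTGRES_HOST', 'localhost')}:{env.get('POSTGRES_PORT', '5432')}/"
        f"{env.get('POSTGRES_DB', 'smagge_db')}"
    )
    return Settings(env.get("OLLAMA_MODEL", "llama3.2"), lead_source, postgres_url)
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to exercise in tests without touching the real environment.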
## Security Scoring Breakdown
```
100 pts total
├── PII Check (40 pts) — HARD FAIL if any PII detected
├── Injection Check (30 pts) — Regex layer + LLM semantic layer
├── Hallucination (20 pts) — LLM fact cross-reference
└── Tone (10 pts) — LLM appropriateness check
≥ 70 pts → Approved ✓
< 70 pts → Rejected ✗ (logged with reason, fed back to Writer)
```
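
The breakdown above can be expressed as a small aggregation function. This is a sketch under a simplifying assumption: each check is treated as all-or-nothing, whereas the real scorer may award partial points. The `SecurityReport` fields and `score_message` are hypothetical; only the weights, the 70-point threshold, and the PII hard fail come from the breakdown.

```python
from dataclasses import dataclass

WEIGHTS = {"pii": 40, "injection": 30, "hallucination": 20, "tone": 10}
APPROVAL_THRESHOLD = 70


@dataclass
class SecurityReport:
    score: int
    approved: bool
    reasons: list[str]


def score_message(passed: dict[str, bool]) -> SecurityReport:
    """Sum the weights of passed checks; a failed PII check always rejects."""
    score = sum(w for name, w in WEIGHTS.items() if passed.get(name, False))
    reasons = [f"{name} check failed" for name in WEIGHTS if not passed.get(name, False)]
    approved = passed.get("pii", False) and score >= APPROVAL_THRESHOLD
    return SecurityReport(score, approved, reasons)
```

Under this all-or-nothing simplification, a draft that fails only the injection check scores exactly 70 and is still approved, so partial credit or a stricter rule for that layer would change the outcome.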
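## Tech Stack
| Layer | Technology |
|---|---|
| Agent framework | [CrewAI](https://crewai.com) 1.x |
| Local LLM | [Ollama](https://ollama.ai) + llama3.2 |
| OCR | Tesseract + pytesseract + OpenCV |
| Database | PostgreSQL 15 (Docker) |
| ORM | SQLAlchemy + psycopg2 |
| API framework | FastAPI + Uvicorn |
| Workflow automation | n8n (Docker) |
| Frontend | Vanilla HTML/JS + Tailwind CSS + Material Symbols |
| Containerisation | Docker Compose |
| Security | Custom Python layers + NeMo Guardrails design patterns |
## Roadmap
- [ ] Live Apollo and Hunter.io API integration
- [ ] Email sending via the SendGrid/Gmail API
- [ ] Multi-tenant support
- [ ] Slack notification when a pipeline run completes
- [ ] Fine-tuning the local model for outreach quality

*Built by Nithin · A portfolio project showcasing autonomous multi-agent AI systems*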