Srinihalreddyr/vitatwin

GitHub: Srinihalreddyr/vitatwin

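VitaTwin is a mental health intelligence prototype that combines a local LLM, FAISS-based RAG, and a clinical rule engine. It analyzes multiple signals in longitudinal user data to detect burnout, anxiety, and depression early and to issue explainable risk alerts.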

Stars: 0 | Forks: 0

# 🧠 VitaTwin: Mental Health Intelligence Assistant

VitaTwin is a clinical intelligence prototype that combines a **local LLM (Ollama/llama3)**, a **FAISS RAG pipeline**, a **multi-signal clinical rule engine**, and a **real-time Streamlit dashboard** to shift mental health care from reactive treatment toward **early crisis detection**.

## 🚀 Quick Start

### Step 0: Navigate to the project folder

```
cd vitatwin
```

### Step 1: Install Ollama (for LLM support)

Download and install from **https://ollama.com/download**, then run:

```
ollama pull llama3
```

### Step 2: Install Python dependencies

```
pip install faiss-cpu numpy pandas flask streamlit plotly ollama
```

### Step 3: Generate the dataset and build the FAISS index

```
python main.py setup
```

### Step 4a: Launch the dashboard

```
python -m streamlit run ui/dashboard.py
```

### Step 4b: Or launch the REST API

```
python main.py api
```

### Step 5: Run the system tests

```
python main.py test
```

## 🏗️ Architecture

```
vitatwin/
├── data/
│   ├── generate_dataset.py   # Synthetic dataset generator (50 users, 14 days)
│   ├── users.json            # 50 user profiles
│   ├── users.csv             # Flat CSV export
│   ├── vitatwin.db           # SQLite database
│   ├── faiss.index           # FAISS vector index
│   └── faiss_meta.pkl        # TF-IDF vectorizer + document store
│
├── rag/
│   └── rag_pipeline.py       # FAISS + TF-IDF RAG engine
│
├── models/
│   ├── clinical_engine.py    # Rule-based early detection engine
│   └── assistant.py          # Ollama LLM assistant (with template fallback)
│
├── api/
│   └── server.py             # Flask REST API (6 endpoints)
│
├── ui/
│   └── dashboard.py          # Streamlit dashboard (4 pages)
│
└── main.py                   # Entry point (setup / api / test)
```

### System Flow

```
User Query
    │
    ▼
[RAG Pipeline]
TF-IDF Vectorizer → FAISS Index → Top-K User Profiles retrieved
    │
    ▼
[Clinical Rule Engine]
Multi-signal Rule Evaluation → Findings + Signals + Confidence
    │
    ▼
[Ollama LLM: llama3]
System prompt + structured patient context → Natural language response
    │
    ▼
[API / Dashboard]
JSON response with risk_score, explanation, suggested_actions, explainability block
```

## 📦 Part 1: Dataset

**50 synthetic user profiles**, each with 14 days of longitudinal data.

| Field | Description |
|-------|-------------|
| `user_id` | Unique identifier (VT001–VT050) |
| `mood_history` | Daily mood scores (1–10) with realistic trends |
| `stress_scores` | Daily stress scores (1–10) |
| `sleep_hours` | Daily sleep duration in hours |
| `energy_levels` | Daily energy scores (1–10) |
| `journal_entries` | Daily free-text journal entries |
| `social_indicators` | Social interactions, exercise, work hours, screen time |
| `aggregates` | 14-day averages for all metrics |

**Status labels** (distribution across the 50 users):

- `healthy` (8), `mild_stress` (8), `moderate_stress` (8)
- `severe_stress` (6), `burnout_risk` (7), `anxiety_trend` (6)
- `depression_indicators` (4), `resilient` (3)

Stored in: `data/users.json`, `data/users.csv`, `data/vitatwin.db`

## 🔍 Part 2: RAG Pipeline

**File:** `rag/rag_pipeline.py`

```
User Profiles
    │
    ▼
Document Builder → Clinically-engineered text chunks
    │
    ▼
TF-IDF Vectorizer (512 features, IDF-weighted, L2-normalized)
    │
    ▼
FAISS IndexFlatIP (inner-product = cosine similarity on unit vectors)
    │
    ▼
Top-K profiles → injected as context into Ollama LLM prompt
```

**Supported queries:**

- `"What mental health risks are visible?"` → retrieves high-risk profiles
- `"Has stress increased recently?"` → retrieves users whose stress is trending upward
- `"Summarize this user's emotional state"` → returns the full longitudinal context

## 🤖 Part 3: LLM Mental Health Assistant (Ollama)

**File:** `models/assistant.py`

The assistant uses **Ollama** (a local LLM, llama3 by default) together with the context retrieved through FAISS:

```
import ollama

# Excerpt: SYSTEM_PROMPT, user, result, and question come from the surrounding code.
# Patient context retrieved via RAG
context = _build_user_context(user, result)  # 14-day data + clinical findings

# Ollama generates the clinical response
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # clinical analyst persona
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
```

**Features:**

- Adaptive daily check-in questions generated by the LLM from the user's risk profile
- Supportive clinical narratives grounded in the actual 14-day data
- Cross-topic reasoning (e.g., "How does stress affect sleep?")
- Graceful fallback to a template engine if Ollama isn't running

**LLM status** is shown live in the dashboard sidebar and at the top of the AI Assistant page.

## ⚠️ Part 4: Early Detection Logic

**File:** `models/clinical_engine.py`

Four rule evaluators run in parallel:

### Burnout Detector

| Signal | Threshold | Weight |
|--------|-----------|--------|
| Average stress (14 days) | ≥ 7.0 | +15–25 points |
| Stress trend slope | > 0.15/day | +15 points |
| Sleep decline (7 days) | ≤ −20% | +20 points |
| Average energy | ≤ 4.0 | +10–20 points |
| Work hours | ≥ 10 hours/day | +10 points |
| Negative journal ratio | ≥ 50% | +10–20 points |

### Anxiety Detector

Signals: elevated stress, high stress volatility (σ), poor sleep, negative journals, social withdrawal

### Depression Detector

Signals: low mood (mean ≤ 4.0), declining mood trend, no exercise, isolation (≤ 1 social interaction per week), very low energy, persistent negative cognition

### Resilience Detector

Identifies users with consistently healthy profiles.

### Risk Levels

| Score | Level |
|-------|-------|
| 0–30 | Low |
| 31–55 | Moderate |
| 56–75 | High |
| 76–100 | Critical |

## 💡 Part 5: Explainability

Every finding carries full provenance:

```
Finding(
    category="burnout_risk",
    label="Potential Burnout Risk",
    risk_score=75.0,
    confidence=0.8,  # 80%: 4 of 5 signals fired
    signals=[
        Signal("avg_stress", value=9.6, threshold=7.0, severity="severe"),
        Signal("low_energy", value=3.7, threshold=4.0, severity="moderate"),
        Signal("overwork", value=13.0, threshold=10.0, severity="mild"),
        Signal("negative_journals", value=0.71, threshold=0.5, severity="severe"),
    ],
    triggered_data={
        "avg_stress_14d": 9.6,
        "stress_trend_slope": 0.039,
        "sleep_pct_change_7d": -18.2,
        "avg_energy_14d": 3.7,
        "work_hours_daily": 13.0,
        "negative_journal_ratio": 0.71,
    }
)
```

## 🌐 Part 6: Clinical Intelligence API

**File:** `api/server.py` · Launch: `python main.py api` → `http://localhost:5000`

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/mental/health` | Service health check |
| GET | `/mental/users` | List all 50 users |
| POST | `/mental/screen` | Full mental health screening |
| POST | `/mental/ask` | LLM-based clinical Q&A |
| POST | `/mental/risk-score` | Risk score + full explainability block |
| GET | `/mental/summary/` | Emotional summary + adaptive questions |
| GET | `/mental/search?q=...` | FAISS-based semantic user search |

## 📊 Part 7: Dashboard

**File:** `ui/dashboard.py` · Launch: `streamlit run ui/dashboard.py`

| Page | Content |
|------|----------|
| **Population Overview** | KPI cards, risk pie chart, status bar chart, stress vs. mood scatter plot, high-risk table |
| **Individual Profile** | 14-day trend charts, wellness radar chart, clinical findings + signals, journal entries, social context |
| **AI Assistant** | Ollama-based chat, quick questions, adaptive daily check-in |
| **Semantic Search** | FAISS search across all 50 profiles with similarity scores |

## 🔧 Tech Stack

| Component | Technology |
|-----------|-----------|
| LLM | Ollama (llama3 / mistral / gemma2, auto-detected) |
| Vector DB | FAISS IndexFlatIP |
| Embeddings | Custom TF-IDF (512 dimensions, L2-normalized) |
| Vector Memory | FAISS-based conversation memory (TF-IDF + cosine similarity) |
| Clinical Engine | Rule-based multi-signal detectors |
| API | Flask |
| Dashboard | Streamlit + Plotly |
| Storage | JSON + CSV + SQLite |

## ⭐ Bonus Feature: Vector Memory

**File:** `models/vector_memory.py`

VitaTwin implements a **FAISS-based vector memory**: every conversation turn (question + answer) is embedded with TF-IDF and stored in a FAISS index. For each new question, the system searches the memory for semantically similar past turns and injects the most relevant ones into the LLM prompt as additional context. This gives the assistant conversation memory across turns: if you ask about stress and then follow up, the assistant can draw on the earlier exchange and refer back to it naturally.

```
New question
    │
    ▼
VectorMemory.retrieve(): FAISS search over past turns
    │
    ▼
Top-K relevant past turns → injected into LLM context block
    │
    ▼
Ollama LLM: answers with awareness of conversation history
    │
    ▼
VectorMemory.store(): new turn added to memory index
```

**Live status** is shown in the dashboard sidebar and in the AI Assistant header:

```
🧠 Vector Memory · 4 turns · vocab 128
```

## 📋 Requirements

```
faiss-cpu>=1.7.4
numpy>=1.24.0
pandas>=2.0.0
flask>=3.0.0
streamlit>=1.32.0
plotly>=5.18.0
ollama>=0.2.0
```
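
## 🧮 Sketch: Burnout Scoring

The burnout signal table and the risk-level bands from the early detection logic (Part 4) can be sketched in a few lines of Python. This is a minimal illustration, not the code in `models/clinical_engine.py`: `FiredSignal`, `BURNOUT_RULES`, `evaluate_burnout`, and `risk_level` are hypothetical names, and each ranged weight (for example +15–25) is fixed at its upper bound because the README doesn't say how points scale within a range. The cap at 100 is likewise an assumption based on the 0–100 score bands.

```python
from dataclasses import dataclass


@dataclass
class FiredSignal:
    """Hypothetical record of one rule that crossed its threshold."""
    name: str
    value: float
    threshold: float
    points: float


# (metric key, signal name, comparison, threshold, points when fired).
# ASSUMPTION: ranged weights use the upper bound of the README's range.
BURNOUT_RULES = [
    ("avg_stress_14d", "avg_stress", ">=", 7.0, 25.0),
    ("stress_trend_slope", "stress_trend", ">", 0.15, 15.0),
    ("sleep_pct_change_7d", "sleep_decline", "<=", -20.0, 20.0),
    ("avg_energy_14d", "low_energy", "<=", 4.0, 20.0),
    ("work_hours_daily", "overwork", ">=", 10.0, 10.0),
    ("negative_journal_ratio", "negative_journals", ">=", 0.5, 20.0),
]

COMPARATORS = {
    ">=": lambda value, threshold: value >= threshold,
    ">": lambda value, threshold: value > threshold,
    "<=": lambda value, threshold: value <= threshold,
}


def evaluate_burnout(metrics: dict[str, float]) -> tuple[float, list[FiredSignal]]:
    """Return (risk score capped at 100, signals that fired)."""
    fired = [
        FiredSignal(name, metrics[key], threshold, points)
        for key, name, op, threshold, points in BURNOUT_RULES
        if COMPARATORS[op](metrics[key], threshold)
    ]
    score = min(100.0, sum(signal.points for signal in fired))
    return score, fired


def risk_level(score: float) -> str:
    """Map a 0-100 score onto the four bands from the risk-level table."""
    if score <= 30:
        return "low"
    if score <= 55:
        return "moderate"
    if score <= 75:
        return "high"
    return "critical"


# Aggregates taken from the explainability example's triggered_data.
metrics = {
    "avg_stress_14d": 9.6,
    "stress_trend_slope": 0.039,
    "sleep_pct_change_7d": -18.2,
    "avg_energy_14d": 3.7,
    "work_hours_daily": 13.0,
    "negative_journal_ratio": 0.71,
}
score, fired = evaluate_burnout(metrics)
print(score, risk_level(score), [signal.name for signal in fired])
# → 75.0 high ['avg_stress', 'low_energy', 'overwork', 'negative_journals']
```

Under these assumptions the example aggregates score 75.0 with the same four signals listed in the explainability example. The real engine may grade points by severity and arrive at its score differently.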
