nmdra/AutoMark

GitHub: nmdra/AutoMark

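A local, privacy-preserving automated grading system built on LangGraph and Ollama, designed to score and give feedback on student assignments in offline environments.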


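# AutoMark

**Multi-Agent System for Automated Student Assignment Grading**

A local, privacy-preserving automated grading system built with [LangGraph](https://github.com/langchain-ai/langgraph) and [Ollama](https://ollama.com). It uses a set of specialised agents to grade student submissions in `.txt` or `.pdf` format against a JSON rubric and produces a structured Markdown feedback report and marking sheet. The whole process runs on local hardware and calls no external APIs. A FastAPI REST wrapper exposes the grading pipeline as an HTTP service for integration with other tools or front ends.

## Table of Contents

- [Architecture](#architecture)
- [Agents](#agents)
- [Tools](#tools)
- [Project Structure](#project-structure)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
- [Usage](#usage)
- [REST API](#rest-api)
- [Docker Stack](#docker-stack)
- [Make Targets](#make-targets)
- [Testing](#testing)
- [Data Formats](#data-formats)
- [Output](#output)
- [Troubleshooting](#troubleshooting)

## Architecture

The pipeline is a directed LangGraph `StateGraph`. The entry point routes to the appropriate ingestion agent based on the file type, after which the submission passes through the analysis, historical comparison, and report generation stages:

```mermaid
flowchart LR
    detect["_detect"]
    ingest["ingestion (txt)"]
    pdf["pdf_ingestion"]
    analysis["analysis"]
    historical["historical"]
    report["report"]
    endnode["END"]

    detect --> ingest
    detect --> pdf
    ingest --> analysis
    pdf --> analysis
    analysis --> historical
    historical --> report
    report --> endnode
    analysis -. ingestion failed .-> report
```

| Step | Agent | Responsibility |
|---|---|---|
| 1 | **_detect** | Routes to the correct ingestion agent based on the file extension |
| 2a | **Ingestion** | Validates and reads `.txt` submissions; extracts `student_id` |
| 2b | **PDF Ingestion** | Converts `.pdf` to Markdown via `pymupdf4llm`; uses the LLM to extract `student_id` and `student_name` |
| 3 | **Analysis** | Scores each rubric criterion with the LLM; the total is computed deterministically |
| 4 | **Historical** | Saves results to SQLite; retrieves past reports; generates progression insights |
| 5 | **Report** | Generates the Markdown feedback report and marking sheet |

If ingestion fails (for example, because a file is missing), the pipeline short-circuits directly to `report`, which produces a minimal fallback report without scores.

## Agents

### Ingestion Agent (`agents/ingestion.py`)

Validates that both input file paths are non-empty, that the files exist and are not empty, and that they have the correct extensions. Reads the plain-text submission and parses the JSON rubric. Extracts `student_id` with a regex pattern (`Student ID: <value>`). Sets `ingestion_status` to `"success"` or `"failed"`.

### PDF Ingestion Agent (`agents/pdf_ingestion.py`)

Validates the `.pdf` submission path, calls `pymupdf4llm` to convert the PDF to Markdown, and always passes the full Markdown downstream for grading and reporting. By default it uses the light model (`gemma3:1b-it-q4_K_M`) to extract student details (`student_id`, `student_name`) from a compact metadata prompt. Sets `ingestion_status` to `"success"` or `"failed"`.

### Analysis Agent (`agents/analysis.py`)

Calls the analysis model (default: `phi4-mini:3.8b-q4_K_M`) to score each rubric criterion. The LLM output is structured with a Pydantic schema (`RubricScores`). Scores are clamped to the range `[0, max_score]`, and the total is computed deterministically by `calculate_total_score`; the LLM is never used for arithmetic.

### Historical Agent (`agents/historical.py`)

Saves the current grading result to the SQLite database, retrieves all of the student's past reports, and, when history exists, uses the light model to generate concise progression insights. It also writes a separate performance analysis report to disk.

### Report Agent (`agents/report.py`)

Calls the analysis model to generate a well-formatted Markdown feedback report and marking sheet. Falls back to a template-based report if the LLM is unavailable.

## Tools

| Module | Function | Description |
|---|---|---|
| `tools/file_ops.py` | `read_text_file` | Reads a UTF-8 text file; wraps `OSError` in `RuntimeError` |
| `tools/file_ops.py` | `read_json_file` | Reads and parses a JSON file; wraps `OSError`/`JSONDecodeError` errors |
| `tools/file_ops.py` | `validate_submission_files` | Checks file existence, size, and extension |
| `tools/file_writer.py` | `write_feedback_report` | Writes the feedback report to disk, creating parent directories if needed |
| `tools/file_writer.py` | `write_analysis_report` | Writes the performance analysis report to disk |
| `tools/score_calculator.py` | `calculate_total_score` | Sums the criterion scores, computes the percentage, and assigns a letter grade |
| `tools/logger.py` | `log_agent_action`, `log_model_call`, `timed_model_call` | Structured logging via `structlog` (JSON file + console) |
| `tools/db_manager.py` | `init_db` | Initialises the SQLite student results database |
| `tools/db_manager.py` | `save_report` | Saves a student's grading result |
| `tools/db_manager.py` | `get_past_reports` | Retrieves all of a student's past grading results |
| `tools/pdf_processor.py` | `convert_pdf_to_markdown` | Converts a PDF to Markdown text with `pymupdf4llm` |

**Grade thresholds:**

| Grade | Percentage |
|---|---|
| A | ≥ 90% |
| B | ≥ 75% |
| C | ≥ 60% |
| D | ≥ 50% |
| F | < 50% |

## Project Structure

```
.
├── data/
│   ├── rubric.json           # Rubric definition (criteria + max scores)
│   ├── submission.txt        # Sample plain-text student submission
│   └── students.db           # SQLite database (created by `make init-db`)
├── src/mas/
│   ├── agents/
│   │   ├── ingestion.py      # Plain-text ingestion + student ID extraction
│   │   ├── pdf_ingestion.py  # PDF → Markdown ingestion + LLM detail extraction
│   │   ├── analysis.py       # LLM-powered rubric scoring
│   │   ├── historical.py     # Persist results + progression insights
│   │   └── report.py         # Markdown feedback report + marking sheet
│   ├── tools/
│   │   ├── file_ops.py       # File reading and validation helpers
│   │   ├── file_writer.py    # Report and analysis file writers
│   │   ├── score_calculator.py
│   │   ├── logger.py
│   │   ├── db_manager.py     # SQLite persistence helpers
│   │   └── pdf_processor.py  # pymupdf4llm PDF-to-Markdown converter
│   ├── api.py                # FastAPI REST wrapper
│   ├── config.py             # Environment-based settings (python-dotenv)
│   ├── graph.py              # LangGraph pipeline definition
│   ├── llm.py                # Ollama LLM factory functions
│   └── state.py              # Shared AgentState TypedDict
├── tests/
│   ├── test_tools.py         # Unit tests for all tool modules (no LLM)
│   ├── test_ingestion.py     # Ingestion agent tests
│   ├── test_pdf_ingestion.py # PDF ingestion agent tests (mocked LLM)
│   ├── test_analysis.py      # Analysis agent tests (mocked LLM)
│   ├── test_historical.py    # Historical agent tests (mocked LLM + DB)
│   ├── test_report.py        # Report agent tests (mocked LLM)
│   ├── test_config.py        # Settings / config tests
│   └── test_llm_judge.py     # LLM-as-a-Judge tests via phi4-mini + requests
├── output/                   # Generated reports (created on first run)
├── Dockerfile.api            # Docker image for the FastAPI service
├── docker-compose.yml        # Full stack: Ollama + AutoMark API
├── .env.example              # Example environment variable file
├── Makefile
└── pyproject.toml
```

## Prerequisites

- **Python 3.13+** and [uv](https://docs.astral.sh/uv/) (package manager)
- **Docker** with Docker Compose (for the Ollama service and/or the full stack)
- At least **4 GB** of free RAM (for the `phi4-mini:3.8b-q4_K_M` model)
- Optional: an NVIDIA GPU for faster inference

## Installation

**1. Clone the repository**

```
git clone https://github.com/nmdra/AutoMark.git
cd AutoMark
```

**2. Install Python dependencies**

```
# Runtime + dev dependencies (includes pytest and requests)
uv pip install -e ".[dev]"
```

**3. Configure environment variables** (optional)

```
cp .env.example .env
# Edit .env to override the defaults (model names, paths, Ollama URL, etc.)
```

**4. Initialise the student database**

```
make init-db
```

**5. Start the Ollama service**

```
# CPU (default)
make start

# GPU (NVIDIA)
make start-gpu
```

**6. Pull the model**

```
make pull-model
```

This pre-pulls the default analysis model into the Ollama container. The first download is a few GB and only needs to happen once (the data persists in the `~/.ollama` volume).

## Configuration

Settings are loaded from environment variables (or from a `.env` file in the project root), and every value has a sensible default.

By default, AutoMark uses two local Ollama model roles:

- **Analysis model**: used for scoring and report generation (`AUTOMARK_ANALYSIS_MODEL_NAME`)
- **Light model**: used for PDF metadata extraction and historical insights (`AUTOMARK_LIGHT_MODEL_NAME`)

| Variable | Default | Description |
|---|---|---|
| `AUTOMARK_ANALYSIS_MODEL_NAME` | `phi4-mini:3.8b-q4_K_M` | Ollama model identifier used for rubric scoring and report generation |
| `AUTOMARK_MODEL_NAME` | `phi4-mini:3.8b-q4_K_M` | Deprecated legacy alias; used as a fallback when the analysis model is not set, and ignored once `AUTOMARK_ANALYSIS_MODEL_NAME` is set |
| `AUTOMARK_LIGHT_MODEL_NAME` | `gemma3:1b-it-q4_K_M` | Lightweight model for extraction and insight tasks |
| `AUTOMARK_OLLAMA_BASE_URL` | `http://localhost:11434` | Base URL of the Ollama HTTP API |
| `AUTOMARK_DB_PATH` | `data/students.db` | SQLite database path |
| `AUTOMARK_LOG_FILE` | `agent_trace.log` | Path of the JSON agent trace log |
| `AUTOMARK_OUTPUT_PATH` | `output/feedback_report.md` | Default feedback report path |
| `AUTOMARK_MARKING_SHEET_PATH` | `output/marking_sheet.md` | Default marking sheet path |
| `AUTOMARK_ANALYSIS_REPORT_PATH` | `output/analysis_report.md` | Default analysis report path |
| `AUTOMARK_DATA_BASE_DIR` | `<project root>/data` | Base directory for submission/rubric files (used for API path-traversal protection) |
| `AUTOMARK_NUM_CTX` | `4096` | Ollama context window size (tokens) |
| `AUTOMARK_NUM_PREDICT` | `512` | Maximum number of tokens generated per model response |
| `AUTOMARK_LLM_REPORT_ENABLED` | `true` | Set to `false` to skip LLM report generation and use the deterministic template instead |
| `AUTOMARK_SUBMISSION_MAX_CHARS` | `8000` | Maximum number of submission characters sent to the analysis model |
| `AUTOMARK_MIN_REPORTS_FOR_INSIGHTS` | `1` | Minimum number of past reports required before progression insights are generated |
| `AUTOMARK_PDF_REGEX_FAST_PATH_ENABLED` | `true` | Set to `false` to force model-based extraction instead of the regex fast path |
| `AUTOMARK_JOB_WORKER_CONCURRENCY` | `2` | Number of worker threads for asynchronous batch jobs |
| `AUTOMARK_JOB_QUEUE_MAX_SIZE` | `100` | Maximum number of batch jobs queued in memory |
| `AUTOMARK_JOB_MAX_RETRIES` | `1` | Default number of retries per batch item |
| `AUTOMARK_BATCH_MAX_ITEMS` | `100` | Maximum number of items accepted per batch request |
| `AUTOMARK_JOB_RETENTION_DAYS` | `30` | Suggested retention period for completed jobs |
| `AUTOMARK_EXPORT_MAX_BYTES` | `10485760` | Maximum allowed size of a CSV/JSON/PDF export, in bytes (10 MiB) |

To improve throughput for parallel report and insight generation, set `OLLAMA_NUM_PARALLEL` to `2` or higher on the Ollama server.

## Usage

### REST API

Start the development API server (it connects to Ollama at `localhost:11434`):

```
make api
```

Interactive API documentation is available at [http://localhost:8000/docs](http://localhost:8000/docs).

**Endpoints:**

| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Liveness check |
| `POST` | `/grade` | Run the grading pipeline |
| `POST` | `/grade/batch` | Submit an asynchronous batch grading job |
| `GET` | `/jobs` | List asynchronous jobs (supports status, limit, and offset) |
| `GET` | `/jobs/{job_id}` | Get the full job status with per-item results |
| `POST` | `/jobs/{job_id}/cancel` | Cancel a queued or running job |
| `POST` | `/jobs/{job_id}/exports/{format}` | Generate a CSV/JSON/PDF export for a completed job |
| `GET` | `/jobs/{job_id}/exports/{format}` | Download a previously generated export |
| `GET` | `/sessions/{session_id}/logs` | Retrieve the trace log entries for a session |

**Example grading request:**

```
curl -X POST http://localhost:8000/grade \
  -H "Content-Type: application/json" \
  -d '{"submission_path": "submission.txt", "rubric_path": "rubric.json"}'
```

Both paths are resolved relative to `AUTOMARK_DATA_BASE_DIR` (default: `data/`), and path traversal is blocked.

**Example batch request:**

```
curl -X POST http://localhost:8000/grade/batch \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"submission_path": "submission.txt", "rubric_path": "rubric.json", "correlation_id": "s1"},
      {"submission_path": "submission2.txt", "rubric_path": "rubric.json", "correlation_id": "s2"}
    ],
    "max_retries": 1
  }'
```

Then poll `GET /jobs/{job_id}` for status and progress, and use `POST /jobs/{job_id}/exports/{format}` (`csv`, `json`, or `pdf`) to build a downloadable artifact.

**Example `/grade` response (abridged):**

```
{
  "session_id": "abc123",
  "student_id": "IT21000001",
  "student_name": "",
  "total_score": 17.0,
  "percentage": 85.0,
  "grade": "B",
  "summary": "A well-structured submission...",
  "criteria": [...],
  "feedback_report": "## Feedback\n...",
  "output_filepath": "output/20240101_120000_..._feedback_report.md",
  "marking_sheet_path": "output/20240101_120000_..._marking_sheet.md"
}
```

### Docker Stack

Run the full stack (Ollama + AutoMark API) with Docker Compose:

```
make docker-up
```

This builds the API image and starts the Ollama and AutoMark API containers. The API is available at `http://localhost:8000`. To stop the stack:

```
make docker-down
```

## Make Targets

| Target | Description |
|---|---|
| `make start` | Start Ollama (CPU Docker profile) |
| `make start-gpu` | Start Ollama (NVIDIA GPU Docker profile) |
| `make stop` | Stop all Ollama containers |
| `make pull-model` | Pull the default analysis model into the Ollama container |
| `make init-db` | Initialise the SQLite student database at `data/students.db` |
| `make api` | Start the FastAPI development server on port 8000 |
| `make docker-up` | Build and start the full Docker stack (Ollama + API) |
| `make docker-down` | Stop the full Docker stack |
| `make test` | Run the full pytest suite |
| `make logs` | Follow the Ollama container logs |
| `make clean` | Remove containers, volumes, `__pycache__`, and generated output |

## Testing

```
make test
# or directly:
uv run pytest tests/ -v
```

The test suite has three layers:

### Tool Unit Tests (`test_tools.py`)

Fast, deterministic tests covering every tool module, with no LLM or network required. They cover success paths, error handling, edge cases, file I/O, grade thresholds, log formatting, and timestamp validity.

### Agent Integration Tests

Each agent has its own test file. LLM calls are mocked with `unittest.mock`, so these tests run instantly without Ollama:

| File | Agent under test |
|---|---|
| `test_ingestion.py` | Ingestion: file validation, student ID extraction, error handling |
| `test_pdf_ingestion.py` | PDF ingestion: PDF conversion, LLM extraction, fallback behaviour |
| `test_analysis.py` | Analysis: scoring, score clamping, LLM fallback, grade logic |
| `test_historical.py` | Historical: database persistence, past-report retrieval, progression insight generation |
| `test_report.py` | Report: file writing, LLM fallback, overwrite behaviour |
| `test_config.py` | Config: environment variable loading, defaults |

### LLM-as-a-Judge (`test_llm_judge.py`)

Uses the `requests` library to call `phi4-mini` directly (through the Ollama REST API endpoint `/api/generate`) and asks whether a given score allocation is fair. The model is prompted to answer with only **YES** or **NO**. There are three tests:

- High marks for a high-quality submission: expects `YES`
- Zero marks for the same submission: expects `NO`
- A response-format assertion (the answer must be exactly `YES` or `NO`)

These tests are skipped automatically when Ollama is not running or the model has not been pulled.

## Data Formats

### Submissions

Supported formats:

- **Plain text** (`.txt`): UTF-8 encoded. Optionally include `Student ID: <value>` on any line to have the ID extracted automatically.
- **PDF** (`.pdf`): converted to Markdown automatically. The LLM attempts to extract `student_id` and `student_name` from the cover page.

### Rubric (`data/rubric.json`)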
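```
{
  "module": "CTSE – IT4080",
  "assignment": "Cloud Technology Fundamentals",
  "total_marks": 20,
  "criteria": [
    {
      "id": "C1",
      "name": "Definition of Containerisation",
      "description": "Accurately defines containerisation and distinguishes it from virtual machines.",
      "common_mistakes": ["missing_answer", "out_of_context"],
      "max_score": 5
    }
  ]
}
```

Required fields: `total_marks` (integer) and `criteria` (array). Each criterion must include `id`, `name`, `description`, and `max_score`. Optional field: `common_mistakes` (array of strings), used to record expected mistakes such as `missing_answer` and `out_of_context`.

## Output

After a successful run, the following files are written to the `output/` directory. When the pipeline is invoked through the REST API, file names include a timestamp, the student's name, and the student ID to avoid collisions:

| File | Description |
|---|---|
| `*_feedback_report.md` | Overall summary, per-criterion scores with justifications, and improvement suggestions |
| `*_marking_sheet.md` | Compact marking sheet with per-criterion common-mistake tags and a summary of common mistakes |
| `*_analysis_report.md` | Historical performance analysis and progression insights |

An append-only structured JSON trace log (via `structlog`) is written to `agent_trace.log` in the project root on every run.

## Troubleshooting

**Ollama unreachable**

Make sure the container is running (`docker ps`) and that the port is bound: `curl http://localhost:11434/api/tags`

**Model not found**

Run `make pull-model`. The first pull needs about 2.5 GB of disk space.

**LLM returns malformed JSON (analysis agent)**

The agent catches the exception, falls back to a score of zero for every criterion, and adds an `error` field to the final state. Check `agent_trace.log` for details.

**GPU not detected**

Make sure the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) is installed, and start Ollama with `make start-gpu`.

**Database not initialised**

Run `make init-db` to create `data/students.db` before your first grading run.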
