# fuyufan-lab/xiaoyuan
A human–machine trust layer for large language models that introduces rules, AC tests, and evidence-driven gating, reducing hallucination and unifying cross-platform collaboration.
Xiaoyuan addresses usage hallucination, reasoning hallucination, and execution errors caused by lost context, so that human–machine trust is no longer a hard problem.
A deductive system plus Trellis, constraining Claude, Cursor, OpenCode, Codex, GLM, GPT, and OpenAI-compatible external models with rules, acceptance checks, and evidence.
What actually makes this deductive system more efficient is not that it can write code directly, but that it changes development from "the model executes on intuition" into:

define rules → derive structure → validate with data → find gaps → then write code

So when the requirement is right, it is very fast; when the requirement is wrong, the hierarchy is wrong, or the goal is not locked down, it will just as quickly "build the wrong thing completely."
Installation • Capability Comparison • Features • External Model Configuration • AC Tests
## Why Xiaoyuan?

| Capability | What it changes |
| --- | --- |
| **Evidence first** | Downgrades "the model says it is done" to "it counts as done only with rules, AC tests, and evidence." |
| **Anti-hallucination judge layer** | Primary and secondary LLMs may generate, challenge, and propose fixes, but the final judge is always a deterministic gate. |
| **Usage-hallucination governance** | Prevents pages, links, accounts, answers, and plans from being mistaken for genuinely deliverable state. |
| **Red/blue adversarial boundary** | Reserves five roles (generator, red team, blue team, critic, judge) so external models cannot certify themselves. |
| **Multi-platform entry** | Provides a consistent command/skill entry for Claude, Cursor, OpenCode, and Codex. |
| **Delivery order lock** | Proceeds by default as "data → rules → gaps → deliverables → code," so implementation cannot bypass acceptance logic. |
| **Intent-effect judgment** | Judges, before development, whether the requested build is the effect the user actually wants, and ranks primary, alternative, and fallback plans. |
| **Essence Root Gate** | The first-priority base capability: discover the essential need, depict the effect, and derive the best plan before implementation is allowed. |

## Quick Start

```bash
# 1. Clone
git clone https://github.com/fuyufan-lab/xiaoyuan.git
cd xiaoyuan

# 2. Install Xiaoyuan into your project
bash scripts/install.sh /path/to/project

# 3. Create a local model config template
bash scripts/configure-models.sh /path/to/project template

# 4. Verify after installation
cd /path/to/project
python3 .deductive/scripts/xiaoyuan.py configure
python3 .deductive/scripts/xiaoyuan.py platform-router
```

The generated `.env.xiaoyuan` is local-only and should not be committed.

Everyday use should happen through the host adapter, not by memorizing CLI commands. In Codex, Claude, Cursor, or OpenCode, use the native Xiaoyuan judge entry before action. The adapter calls the runtime kernel underneath while keeping the user in the current host workflow.

## Model Providers

Xiaoyuan keeps external LLMs behind a trust boundary. Configure providers through environment variables or the local `.env.xiaoyuan` file generated during installation.

```bash
# GLM: default generator/red-team provider
GLM_API_KEY=
GLM_BASE_URL=
GLM_TEXT_MODEL=glm-5-turbo

# GPT / OpenAI-compatible: reserved, disabled by default
GPT_API_KEY=
GPT_BASE_URL=
GPT_TEXT_MODEL=

# Any other OpenAI-compatible provider
EXTERNAL_LLM_API_KEY=
EXTERNAL_LLM_BASE_URL=
EXTERNAL_LLM_MODEL=
```

Apply provider values without passing secrets as positional CLI arguments:

```bash
GLM_API_KEY="..." \
GLM_BASE_URL="..." \
GLM_TEXT_MODEL="glm-5-turbo" \
bash scripts/configure-models.sh /path/to/project glm
```

No real API keys or fixed provider URLs are stored in this public repository.

## Use Cases

### Make AI coding work auditable

Use Xiaoyuan when an agent claims that a feature, fix, migration, integration, or deployment step is done. The answer does not become trusted until the rule is covered by AC tests and the gate writes evidence.

### Run red/blue model review without letting models judge themselves

Use GLM, GPT, or another OpenAI-compatible model to generate counterexamples, repair suggestions, and semantic risk reviews. Xiaoyuan keeps those outputs below the evidence layer; acceptance still belongs to deterministic gate checks.

### Keep one trust protocol across tools

Claude, Cursor, OpenCode, and Codex can expose different command surfaces, but Xiaoyuan keeps the core judgment path the same: rules, tests, state, and evidence.

### Reduce human-in-the-loop hallucination

Xiaoyuan is built for moments where people mistake a fluent response, a visible page, a generated file, or a public link for a verified system. It forces the question back to: what data exists, what rule applies, what gap remains, what was produced, and what code proves it?

### Judge whether the request is the real need

Use `judge-intent` before implementation when a request may describe a technical shortcut instead of the desired user effect. It now prints effect targets, a best-path user story, a bad-fit preview, confirmation questions, and acceptance signals. For example, if the real need is native ecommerce-platform operation with no perceived latency, Xiaoyuan should reject remote control as the primary plan and rank a local browser plugin or desktop assistant above KasmVNC/noVNC/WebRTC.

### Discover the essence before choosing a plan

Use `discover-essence` when the risk is deeper than a single bad implementation choice.
It is Xiaoyuan's root gate and must run before Trellis task execution, intent-effect judging, acceptance tests, code generation, or external-model red/blue debate. It is universal: extract information, infer essence, define constraints, derive structure, depict effect, reject pseudo-optimal plans, generate acceptance signals, then implement. In ecommerce operations, this should point to a native multi-platform operating workspace with plugins and evidence, not a loop of iframe, remote-control, or RPA-first half-solutions.

Use `preview-essence` to turn the essence into a readable structure map before implementation.

Use `audit-implementation` after implementation to scan files for visible pseudo-optimal terms or missing essence signals; this is a deterministic text-level check, not yet a full screenshot/DOM audit.

Use `thinking-protocol` when a request needs top-level reasoning quality. It borrows the transferable logic behind Thinking-Claude style prompts: decompose the goal, surface hidden assumptions, generate counter-cases, derive first-principles constraints, converge on the best path, and emit acceptance signals. Xiaoyuan does not treat long chain-of-thought as proof; it turns reasoning into a contract that users can judge, developers can implement, and tests can verify.

Use `platform-router` to let Xiaoyuan detect the current execution platform and select the native entrypoint for Codex, Claude, Cursor, or OpenCode. If the native entrypoint is missing, Xiaoyuan falls back to the universal Python CLI without skipping the same root gates, AC rules, or evidence requirements.

Use `architecture-layers` to inspect Xiaoyuan's core boundary. Core capabilities define universal reasoning and evidence contracts. Evidence sources only translate concrete tools, services, credentials, APIs, browsers, databases, or external systems into evidence packets. Use cases such as ecommerce operations consume those layers; they do not define the core.
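The `platform-router` fallback rule can be sketched roughly as below. This is a minimal illustration, not the repository's actual implementation: the host names and entrypoint paths in the mapping are assumptions, and only the fallback logic (native entry if present, otherwise the universal Python CLI, with the same gates either way) reflects the described behavior.

```python
from pathlib import Path

# Hypothetical mapping from host platform to its native Xiaoyuan entrypoint.
# The real adapter paths may differ; this only illustrates the fallback rule.
NATIVE_ENTRYPOINTS = {
    "codex": ".agents/xiaoyuan-judge",
    "claude": ".claude/commands/xiaoyuan-judge",
    "cursor": ".cursor/commands/xiaoyuan-judge",
    "opencode": ".opencode/commands/xiaoyuan-judge",
}

UNIVERSAL_CLI = ".deductive/scripts/xiaoyuan.py"

def route(platform: str, project_root: str = ".") -> str:
    """Return the entrypoint to use: native if present, else the universal CLI.

    Routing only selects the surface; the same root gates, AC rules, and
    evidence requirements apply on either path.
    """
    native = NATIVE_ENTRYPOINTS.get(platform.lower())
    root = Path(project_root)
    if native and (root / native).exists():
        return native
    return UNIVERSAL_CLI

# An unknown platform, or a missing native entry, falls back to the CLI.
print(route("unknown-host"))  # .deductive/scripts/xiaoyuan.py
```

The key design point is that the fallback never relaxes the checks; it only changes which surface invokes them.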
Use the host-native Xiaoyuan judge entry when you want Xiaoyuan to behave like an add-on judgment layer instead of a pile of separate commands. It combines judgment-boundary mapping, essence discovery, effect preview, best-path selection, pseudo-optimal rejection, policy memory, and meta-control into one action-before-build decision:

- Codex: `xiaoyuan-judge` skill
- Claude: `/xiaoyuan:judge`
- Cursor: `xiaoyuan-judge`
- OpenCode: `xiaoyuan/judge`

The CLI remains the backend and diagnostic surface for adapters, CI, and maintainers:

```bash
python3 .deductive/scripts/xiaoyuan.py judgment-boundary-map --request "your raw request" --proposal "planned action"
python3 .deductive/scripts/xiaoyuan.py run --request "your raw request" --proposal "planned action"
```

Xiaoyuan core is not the content system. The content system and ecommerce operations are first use cases that exposed the trust problem. See `docs/XIAOYUAN_CORE_BOUNDARY.md` for the separation between core, evidence sources/adapters, and use cases.

Use `meta-control-check` before implementation. It is a single protocol, not a list of lock commands. It checks five dimensions at once: Need, Level, Invariant, Reality, and Evidence. If a proposal treats an external system type, credential, UI, industry, or adapter as core, it returns `blocked`.

Use `effect-audit` after implementation to check whether the built surface still matches the essence. It scans implementation files for visible DOM and interaction signals: primary workspace, account switching, plugin boundaries, risk evidence, recovery paths, and pseudo-optimal main-path terms. This is static DOM/text-level auditing; the next frontier is screenshot, click-path, latency, and real task completion auditing.

Use `visual-interaction-audit` when Playwright is available.
It opens the page in a real browser, checks that screenshots are nonblank, verifies critical DOM visibility, clicks account switching, toggles plugin assistance, collapses navigation, measures basic latency, and rejects visible pseudo-optimal main-path terms.

Use `user-task-audit` to move from component checks to task checks. It uses a real browser to click account switching, open plugin assistance, enter a customer-service task, hand off to a human, close the drawer, and select a task card. A failed task path blocks any 99% completion claim.

Use `backend-network-audit` to verify that user tasks map to API contracts, unavailable APIs degrade to usable fallback data, and repeated interactions do not lose state. It still does not replace real backend state assertions, database checks, or production telemetry.

Use `state-transition-audit` as the generic core gate for state-change evidence. It requires an adapter to prove that a scenario can read initial state, perform a low-risk change, read back the change, produce evidence, and update metrics. `real-backend-state-audit` remains as a compatibility alias for the ecommerce service adapter.

Use `reality-e2e-audit` before any final 99% claim. It is not tied to ecommerce credentials or any one external system type. It requires an evidence packet with an external system, low-risk action, external response, state change, telemetry trace, and human or rule decision. `production-writeback-telemetry-audit` remains as a compatibility alias for external-system writeback and production telemetry evidence.

## Capability Comparison

These numbers are engineering estimates, not third-party benchmark results. Xiaoyuan measures how hard it is for hallucination to enter the "done, trusted, shippable" state; it does not claim to make LLMs stop hallucinating.
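As a rough illustration of the `reality-e2e-audit` contract, a deterministic gate over the six required evidence fields might look like the sketch below. The field names are assumptions inferred from the description above, not the repository's actual schema.

```python
# Sketch of a deterministic evidence-packet gate, assuming the six fields
# described for reality-e2e-audit. Field names are illustrative only.
REQUIRED_FIELDS = (
    "external_system",    # which real system was touched
    "low_risk_action",    # the reversible action that was performed
    "external_response",  # the raw response from the external system
    "state_change",       # observed before/after state difference
    "telemetry_trace",    # trace id or log pointer for the action
    "decision",           # human or rule decision on the result
)

def reality_e2e_gate(packet: dict) -> dict:
    """Return status 'ok' only when every required field is present and
    non-empty; otherwise block and list exactly what is missing."""
    missing = [f for f in REQUIRED_FIELDS if not packet.get(f)]
    if missing:
        return {"status": "blocked", "missing": missing}
    return {"status": "ok", "missing": []}

# A packet without a telemetry trace cannot support a final "99%" claim.
partial = {f: "present" for f in REQUIRED_FIELDS if f != "telemetry_trace"}
print(reality_e2e_gate(partial))  # {'status': 'blocked', 'missing': ['telemetry_trace']}
```

The point of the sketch is that the gate is a pure function of the evidence: no model output, however fluent, can flip `blocked` to `ok` without the packet changing.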
### Hallucination Risk Reduction

| Risk type | Without Xiaoyuan | With Xiaoyuan | Estimated reduction |
| --- | --- | --- | ---: |
| Verbal completion hallucination | The model saying "done" can be trusted as-is | Requires AC/state/evidence | 5-10x |
| Configuration-path hallucination | The model says "configured," but paths are missing | `configure` checks paths | 3-6x |
| Unverified-code hallucination | Code changed but acceptance never ran | Gate scans AC and writes state | 5-10x |
| Multi-platform context drift | Each tool interprets state on its own | Shared rule layer across tools | 2-4x |
| External-model self-certification | GLM/GPT acts as judge | External models cannot be final judge | 3-6x |
| User misjudging launch | A visible page or link is mistaken for a usable system | Back to data, rules, gaps, deliverables, code | 2-4x |
| Creative/semantic quality misjudgment | Model scores treated as market truth | Critic-only; requires feedback calibration | 1.5-3x |

### Compared With Similar Tool Categories

| Type | Representative capability | Explicit rules | Evidence loop | Multi-platform consistency | Red/blue boundary | External model config | Human-use hallucination control | Overall estimate |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Prompt-only rule files | `CLAUDE.md`, `AGENTS.md`, `.cursorrules` | 2/5 | 0/5 | 1/5 | 0/5 | 0/5 | 1/5 | 15-25% |
| Memory/workflow frameworks | Trellis-style spec/task/memory | 4/5 | 2/5 | 4/5 | 1/5 | 1/5 | 2/5 | 45-60% |
| Eval-only tools | Benchmark, prompt eval, scoring | 3/5 | 3/5 | 1/5 | 2/5 | 1/5 | 1/5 | 35-50% |
| CI-only gates | Tests, lint, pre-commit | 2/5 | 4/5 | 1/5 | 0/5 | 0/5 | 2/5 | 35-55% |
| Agent orchestration | Multi-agent delegation and parallel work | 2/5 | 1/5 | 3/5 | 2/5 | 1/5 | 1/5 | 30-45% |
| **Xiaoyuan current version** | Deductive rules + Trellis + provider boundary + AC | **4/5** | **4/5** | **4/5** | **3/5** | **3.5/5** | **4/5** | **70-85%** |
| Xiaoyuan target version | Add executors, health checks, CI secret scan, upgrader | 5/5 | 5/5 | 5/5 | 5/5 | 5/5 | 5/5 | 90-95% |

### Best-Fit Scenarios

| Scenario | Fit | Why |
| --- | ---: | --- |
| AI coding agent acceptance | 90% | Rules, AC, and evidence directly constrain the meaning of "done". |
| Multi-tool team collaboration | 85% | Platform entries differ, but the trust protocol stays the same. |
| LLM red/blue review | 70% | Roles and provider config are ready; automated executors are still next. |
| Pre-production release checks | 70% | Xiaoyuan can split acceptance logic; real environment probes stay project-specific. |
| Content quality governance | 60% | It can govern process and evidence, but not replace market feedback. |
| High-risk decision support | 75% | It blocks model self-certification, but still needs human approval and domain rules. |

The full analysis remains in [Capability Comparison](./docs/CAPABILITY_COMPARISON.md).

## How It Works

Xiaoyuan installs a deterministic trust layer into a target project:

```
.deductive/
├── acs/rules.json            # Rule registry
├── config.json               # Gate and platform config
├── hooks/                    # Deterministic hooks
├── model_providers.json      # GLM/GPT/OpenAI-compatible role config
├── principles/               # LLM trust principles
└── scripts/xiaoyuan.py       # status / configure / explain
.agents/                      # Codex skills
.claude/                      # Claude commands
.cursor/                      # Cursor commands
.opencode/                    # OpenCode commands and agents
tests/test_ac_*.py            # Acceptance coverage
```

At a high level, the workflow is:

1. Define the rule.
2. Bind the rule to an AC test.
3. Run the deterministic gate.
4. Write state and evidence.
5. Treat external model output as advisory unless evidence exists.
## Repository Layout

```
xiaoyuan/
├── adapters/                 # Platform entry points
├── deductive/                # Core Xiaoyuan trust layer
├── docs/DEPLOYMENT.md        # Install and provider setup guide
├── examples/model-providers.env.example
├── scripts/install.sh
├── scripts/configure-models.sh
└── tests/                    # Xiaoyuan AC tests
```

## Verification

Run the package-level checks:

```bash
python3 -m pytest \
  tests/test_ac_012_xiaoyuan_principles.py \
  tests/test_ac_013_model_provider_adversarial_config.py \
  tests/test_ac_014_xiaoyuan_deployment_docs.py \
  tests/test_ac_015_capability_comparison_doc.py \
  -q
```

Expected result: `15 passed`

## FAQ

**Is Xiaoyuan another LLM?**
No. Xiaoyuan is a trust layer around LLM usage. It can coordinate GLM, GPT, and other compatible models, but it does not let them become the final judge of correctness.

**How is this different from Trellis?**
Trellis gives AI work a cross-platform workflow and memory structure. Xiaoyuan adds a stricter deductive gate: rules, acceptance tests, state, and evidence decide whether work is trusted.

**Can Xiaoyuan eliminate hallucination?**
No system can eliminate all hallucination. Xiaoyuan reduces the chance that hallucination enters the "done, trusted, shippable" state by requiring deterministic evidence.

**Where do I put model URLs and API keys?**
Use local environment variables or the generated `.env.xiaoyuan` file. The public repository only keeps variable names and configuration structure.