Punkrose/Prism

# 🔷 PRISM

![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Node.js](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg) ![Zero Dependencies](https://img.shields.io/badge/dependencies-0-orange.svg) ![Tests](https://img.shields.io/badge/tests-node%3Atest-purple.svg)

**AI-powered prompt reverse engineering, model fingerprinting, and LLM benchmarking platform.**

PRISM analyzes LLM outputs to infer the prompts that produced them, identify which model generated a response, and benchmark LLM performance across standardized tasks, all with zero external dependencies.

## ✨ Features

### 🔍 Prompt Reverse Engineering

- Detect prompt type (creative, analytical, code, conversational, instruction-following, factual)
- Extract formatting constraints (code blocks, lists, headers, JSON, etc.)
- Identify refusal patterns and confidence levels
- Analyze response structure (paragraphs, sentence length, code presence)
- Compute complexity scores

### 🪪 Model Fingerprinting

- Identify which LLM produced a given output
- Match against built-in model profiles (GPT-4o, Claude 3.5, Gemini Pro, Llama 3, DeepSeek, Mistral)
- Analyze stylistic markers: formality, hedging, code quality, repetition rate
- Rank candidate models by confidence score

### 📊 LLM Benchmarking

- Run standardized tasks (summary, code, reasoning, creative, factual)
- Heuristic scoring against multiple criteria per task
- Multi-model comparison with leaderboard generation
- Formatted benchmark reports

## 📦 Installation

```bash
git clone https://github.com/Punkrose/Prism.git
cd Prism
npm install  # No dependencies to install; this just sets up the project
```

## 🚀 Quick Start

### CLI Usage

```bash
# Reverse engineer a prompt from a response
node bin/prism.js reverse "Here is a function that reverses a string..."

# Fingerprint which model produced output
node bin/prism.js fingerprint "Certainly! I'd be happy to help. Here's a detailed explanation..."

# Run benchmarks with simulated provider
node bin/prism.js bench --tasks 5

# Show project info
node bin/prism.js info

# Run from file
node bin/prism.js reverse ./response.txt
```

### API Usage

```javascript
const { PromptReverseEngineer, ModelFingerprinter, LLMBenchmark } = require('./src');

// ── Prompt Reverse Engineering ──
const engineer = new PromptReverseEngineer({ detail: 'full' });
const analysis = engineer.analyze('Your LLM response text here...');
console.log(analysis.promptType);   // 'code'
console.log(analysis.complexity);   // { score: 65, level: 'moderate' }
console.log(analysis.constraints);  // ['code-blocks', 'bullet-lists']
console.log(analysis.inference);    // Human-readable summary

// ── Model Fingerprinting ──
const fingerprinter = new ModelFingerprinter();
const result = fingerprinter.fingerprint('Certainly! Here\'s a helpful response...');
console.log(result.detected);    // 'claude-3.5'
console.log(result.confidence);  // 0.72
console.log(result.ranked);      // [{ id, score, markers, styleMatch }, ...]

// ── Benchmarking ──
// Top-level `await` is not available in a CommonJS module, so run inside an async function.
async function main() {
  const benchmark = new LLMBenchmark();
  const results = await benchmark.run(async (prompt) => {
    // Call your LLM API here; `myLLM` is a placeholder for your own client
    return await myLLM.generate(prompt);
  });
  console.log(benchmark.report(results));
}

main().catch(console.error);
```

## 🔧 How It Works

### Prompt Reverse Engineering

The `PromptReverseEngineer` analyzes text responses to infer the likely prompt structure:

1. **Tokenization**: Breaks the response into typed tokens (words, numbers, punctuation, code)
2. **Pattern Detection**: Identifies structural markers (code blocks, lists, headers, JSON)
3. **Classification**: Uses weighted signal scoring to classify prompt type across 6 categories
4. **Refusal Detection**: Matches against 20+ known refusal patterns with confidence scoring
5. **Complexity Scoring**: Combines length, vocabulary diversity, structural complexity, and sentence length

### Model Fingerprinting

The `ModelFingerprinter` identifies the likely source model:

1. **Marker Extraction**: Searches for known model-specific phrases and patterns
2. **Style Analysis**: Computes metrics: average sentence length, formality (0-1), hedging (0-1), code quality (0-1), repetition rate (0-1)
3. **Profile Matching**: Compares extracted markers and style metrics against built-in model profiles
4. **Ranking**: Scores each model on a 50/50 weighted combination of marker matches and style similarity

### LLM Benchmarking

The `LLMBenchmark` evaluates LLM performance:

1. **Task Execution**: Runs standardized prompts through a provider function
2. **Heuristic Scoring**: Scores each response against task-specific criteria using pattern matching and structural analysis
3. **Comparison**: Produces ranked leaderboards across multiple models
4. **Reporting**: Generates human-readable benchmark reports with scores, latencies, and rankings

## 🪪 Default Model Profiles

| Model | ID | Key Markers | Formality | Hedging |
|-------|-----|-------------|-----------|---------|
| GPT-4o | `gpt-4o` | "As an AI", "cutoff" | 0.70 | 0.30 |
| Claude 3.5 | `claude-3.5` | "Certainly", "I'd be happy" | 0.80 | 0.40 |
| Gemini Pro | `gemini-pro` | "Great question", "Here are" | 0.60 | 0.50 |
| Llama 3 | `llama-3` | "As a language model" | 0.50 | 0.20 |
| DeepSeek | `deepseek` | "Let me think", "First, Second, Third," | 0.60 | 0.35 |
| Mistral | `mistral` | "In summary", "The answer is" | 0.55 | 0.25 |

## 📋 Default Benchmark Tasks

| Task | Prompt | Criteria |
|------|--------|----------|
| Summary | Summarize a pangram passage | conciseness, accuracy |
| Code | Reverse a string without `.reverse()` | correctness, efficiency, readability |
| Reasoning | Logic puzzle about roses and flowers | logic, clarity |
| Creative | Write a haiku about AI | creativity, structure |
| Factual | Three laws of thermodynamics | accuracy, completeness |

## 🧪 Testing

```bash
# Run all tests
npm test

# Run demo
npm run demo
```

## 📁 Project Structure

```
prism/
├── bin/
│   └── prism.js              # CLI entry point
├── src/
│   ├── index.js              # Main exports
│   ├── utils.js              # Shared utilities
│   ├── reverse.js            # Prompt reverse engineering
│   ├── fingerprint.js        # Model fingerprinting
│   └── benchmark.js          # LLM benchmarking
├── test/
│   ├── reverse.test.js       # Reverse engineering tests
│   ├── fingerprint.test.js   # Fingerprinting tests
│   └── benchmark.test.js     # Benchmarking tests
├── examples/
│   └── demo.js               # Full demo script
├── package.json
├── LICENSE
└── README.md
```

## 📜 License

MIT License. Copyright (c) 2026 Punkrose. See [LICENSE](./LICENSE) for details.
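
## 🧮 Appendix: Ranking Sketch

The Model Fingerprinting ranking step is described as a 50/50 weighted combination of marker matches and style similarity. The snippet below is only a rough sketch of how such a score could be computed, not PRISM's actual implementation. The names `PROFILES`, `markerScore`, `styleScore`, and `rankModels` are hypothetical, the profile values are copied from the Default Model Profiles table, and the style similarity (one minus the mean absolute difference over formality and hedging) is an assumed metric chosen for illustration.

```javascript
// Hypothetical profile data; values taken from the Default Model Profiles table.
const PROFILES = [
  { id: 'gpt-4o', markers: ['As an AI', 'cutoff'], formality: 0.70, hedging: 0.30 },
  { id: 'claude-3.5', markers: ['Certainly', "I'd be happy"], formality: 0.80, hedging: 0.40 },
  { id: 'llama-3', markers: ['As a language model'], formality: 0.50, hedging: 0.20 },
];

// Fraction of a profile's marker phrases found in the text (case-insensitive).
function markerScore(text, markers) {
  if (markers.length === 0) return 0;
  const lower = text.toLowerCase();
  const hits = markers.filter((m) => lower.includes(m.toLowerCase())).length;
  return hits / markers.length;
}

// Assumed similarity metric: 1 minus the mean absolute difference
// between measured style values and the profile's values.
function styleScore(style, profile) {
  const diff =
    (Math.abs(style.formality - profile.formality) +
      Math.abs(style.hedging - profile.hedging)) / 2;
  return 1 - diff;
}

// Combine both signals with equal weight and sort best-first.
function rankModels(text, style, profiles = PROFILES) {
  return profiles
    .map((p) => {
      const markers = markerScore(text, p.markers);
      const styleMatch = styleScore(style, p);
      return { id: p.id, markers, styleMatch, score: 0.5 * markers + 0.5 * styleMatch };
    })
    .sort((a, b) => b.score - a.score);
}

// The style metrics here are supplied by hand; a real pipeline would measure them.
const ranked = rankModels("Certainly! I'd be happy to help.", { formality: 0.8, hedging: 0.4 });
console.log(ranked[0].id); // → 'claude-3.5'
```

Each entry returned by `rankModels` mirrors the `{ id, score, markers, styleMatch }` shape shown for `result.ranked` in the API usage example, although the real library may compute these fields differently.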