HyLora/beyond_the_black_box

GitHub: HyLora/beyond_the_black_box

Stars: 0 | Forks: 0

# Beyond the Black Box: Full-Stack Architecture & Data Pipeline This repository contains the complete source code, backend architecture, and computational data analysis pipeline used in the Master's Thesis: **"Beyond the Black Box: Designing Frictional Interfaces for Trust Calibration and Appropriate Reliance in Scientific LLMs."** This system bridges a React-based frontend prototype with a NebulaGraph backend, serving dynamic Knowledge Graph triples while running an automated Natural Language Processing (NLP) pipeline to analyze user trust metrics. ## 📌 System Architecture This repository contains three major components of the "Glass Box" experimental prototype: 1. **Frontend UI (`/frontend`):** The interactive web interface where users engage with the LLM and the rendered Knowledge Graph. 2. **Backend API (`/scripts/api_server.py`):** A FastAPI server that queries a Nebula graph database (ORKG data) and serves semantic triples to the frontend. 3. **Data Analysis Pipeline (`/scripts/cluster_motivations.py`):** An NLP pipeline that processes qualitative user motivations using `all-MiniLM-L6-v2` sentence embeddings and K-Means clustering. ## 📂 Repository Structure ### 🖥️ Frontend (Web Interface) * Contains the React components, survey rendering (`questionnaire.tsx`), and the interactive UI for the Glass Box and Black Box conditions. ### ⚙️ Backend & API (`/scripts`) * `api_server.py`: The FastAPI application defining the endpoints. * `run_full_pipeline.py`: The master orchestrator. Boots up the Uvicorn server and establishes the Ngrok tunnel. * `test_endpoint.py`: Utility script to verify the API and Nebula connection. ### 📊 Data Analysis (`/scripts` & `/data`) * `cluster_motivations.py`: Generates the 384-dimensional sentence embeddings and applies unsupervised semantic clustering. * `analyze_results.py`: Handles statistical analysis of behavioral telemetry (Time-on-Task) and cognitive metrics (NASA-TLX). * `/data`: Contains the anonymized extraction of user responses and the mathematically assigned semantic clusters (`motivations_fully_coded.csv`). ## 🚀 Installation & Usage ### 1. Running the Data Analysis Pipeline To reproduce the findings and generate the semantic clustering visualizations, install the Python dependencies: pip install pandas numpy scikit-learn sentence-transformers matplotlib seaborn Then execute the analysis script: python scripts/cluster_motivations.py ### 2. Running the Backend Server To spin up the local API and expose it via Ngrok: pip install fastapi uvicorn pyngrok nebula3-python python scripts/run_full_pipeline.py (Note: You must update your frontend .env variables with the newly generated Ngrok URL to establish the connection). ## 📜 License & Academic Integrity This repository is published in partial fulfillment of the requirements for the Master's Degree in Human-Centered Artificial Intelligence (University of Milan, University of Milano-Bicocca, University of Pavia). The data provided is strictly anonymized to protect participant privacy.
标签:自动化攻击