McOwska/Information-Retrieval-Prompt-Injection
GitHub: McOwska/Information-Retrieval-Prompt-Injection
Stars: 0 | Forks: 0
# Prompt Injection in Multi-Hop Retrieval
This repository is a small research-style project for testing how prompt injection affects a simple multi-hop retrieval pipeline.
It builds a HotpotQA-based corpus, poisons documents with redirect instructions, runs a multi-hop agent over clean or poisoned retrieval, and evaluates the results with:
- `F1` for answer quality
- `ASR` (attack success rate) for how often the model follows the injected instruction
The code is intentionally lightweight and easy to inspect.
## What is in the repo
- `dataset/` builds the corpus and question set, and applies poisoning strategies
- `retireval/` contains the BM25 retriever
- `multihop_agent/` contains the retrieval + reasoning loop
- `evaluation/` runs experiments and computes metrics
- `classifier/` contains an optional adversarial-intent guard
- `results/` and `results_old/` contain saved outputs from previous runs
## Project flow
The typical workflow is:
1. Build a clean corpus and question set from HotpotQA
2. Create a poisoned corpus
3. Run the multi-hop agent with clean or poisoned retrieval
4. Measure answer quality and attack success
## Setup
Create and activate a virtual environment, then install dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Create a `.env` file in the repo root:
GROQ_API_KEY=gsk_***
BASE_URL=https://api.groq.com/openai/v1
MODEL=llama-3.1-8b-instant
TEMPERATURE=0.0
MAX_TOKENS=256
`OPENAI_API_KEY` or `BERGET_API_KEY` can also be used instead of `GROQ_API_KEY`, depending on the endpoint you want to call.
## Prepare the data
Generate the clean corpus, poisoned corpus, and question set:
python3 dataset/save_docs_to_local.py
This writes JSONL files into `data/processed/`, including:
- `corpus.jsonl`
- `corpus_poisoned_all_embedded.jsonl`
- `questions.jsonl`
Poisoning behavior is configured in [dataset/poisoning.py]() and [dataset/save_docs_to_local.py]().
## Run an evaluation
The current main script is [main_eval.py](), which:
Run it with:
python3 main_eval.py
Evaluation logic lives in [evaluation/evaluator.py]() and metrics are defined in [evaluation/metrics.py]().
## Optional components
The agent itself is in [multihop_agent/agent.py](). It:
- retrieves documents with BM25
- generates follow-up queries across hops
- produces a final answer from the accumulated context
There is also an optional classifier-based guard in [classifier/guard.py]() for flagging adversarial documents before they enter context.
## Results processing
If you have multiple result folders, these scripts help combine them:
- [results_processing/merge_results.py]()
- [results_processing/evaluate_results.py]()
They merge saved `comparison.csv` files and compute aggregate metrics across runs.
## Notes
- The repository currently focuses on experimentation rather than packaging
- Some scripts are intentionally simple and use fixed paths for faster iteration
- The `retireval/` directory name is kept as-is to match the existing codebase
## Quick start
If you just want the shortest path:
source venv/bin/activate
pip install -r requirements.txt
python3 dataset/save_docs_to_local.py
python3 main_eval.py