CipherZ3r0/security-research-writeups
GitHub: CipherZ3r0/security-research-writeups
Stars: 0 | Forks: 0
# AI Security Research Writeups
This repository is a living archive of my AI security research, red-team notes, and public writeups. The focus is on how modern LLM systems fail under real adversarial pressure: prompt injection, instruction hijacking, unsafe output generation, and the broader problem of untrusted context being treated as trusted signal.
I use this space to document what I have tested, what I have learned, and how those findings map to practical defenses. The work here is intentionally varied. Some pieces are bug-bounty style assessments, some are challenge-based vulnerability writeups, and others will become deeper research notes as new attack surfaces emerge.
## What I have worked on
- Prompt injection assessment of Brave Leo, based on a real conversation transcript and focused on instruction leakage, refusal boundary failures, and unsafe output generation.
- Prompt injection vulnerability research against TryHackMe Echo AI, including bypass techniques such as instruction override, context switching, encoding-based injection, and multi-step prompt chaining.
- Responsible-disclosure style reporting that translates raw testing into structured findings, impact analysis, root-cause discussion, and mitigation guidance.
## Research focus
My work sits at the intersection of AI security, adversarial prompting, and product-facing defensive evaluation. I am especially interested in:
- Prompt injection and indirect instruction hijacking
- System prompt leakage and behavioral boundary failures
- Unsafe content generation under adversarial framing
- AI-assisted phishing and social engineering abuse paths
- Model governance gaps in chatbot products and integrated assistants
## Future research directions
The next areas I plan to explore are:
- Persistent memory architectures and how retained state can be safely isolated, scoped, and audited
- Memory and context poisoning, including how attackers can seed long-lived or session-local context for later abuse
Those topics matter because memory turns a single chat into a long-lived attack surface. Once a model can retain preferences, facts, or task state across turns, attackers gain new ways to influence future behavior without needing to win every prompt from scratch.
## Repository layout
- [AI-security/bug-bounty/brave_research/Brave_Leo_Prompt_Injection_Assessment.md](AI-security/bug-bounty/brave_research/Brave_Leo_Prompt_Injection_Assessment.md)
- [AI-security/bug-bounty/tryhackme_leo/Tryhackme_Prompt_Injection_Vulnerability.md](AI-security/bug-bounty/tryhackme_leo/Tryhackme_Prompt_Injection_Vulnerability.md)
## Working style
- I prefer evidence-driven writeups over speculation.
- I document attack chains clearly enough that defenders can reproduce the risk and reason about the fix.
- I separate public-safe reporting from sensitive exploit detail when needed.
- I treat AI systems as adversarial environments, not just products with a chat interface.
## Notes
This repository will continue to expand as I publish new research across AI security, prompt injection, memory safety, and other red-team oriented investigations.