CipherZ3r0/security-research-writeups

GitHub: CipherZ3r0/security-research-writeups

Stars: 0 | Forks: 0

# AI Security Research Writeups This repository is a living archive of my AI security research, red-team notes, and public writeups. The focus is on how modern LLM systems fail under real adversarial pressure: prompt injection, instruction hijacking, unsafe output generation, and the broader problem of untrusted context being treated as trusted signal. I use this space to document what I have tested, what I have learned, and how those findings map to practical defenses. The work here is intentionally varied. Some pieces are bug-bounty style assessments, some are challenge-based vulnerability writeups, and others will become deeper research notes as new attack surfaces emerge. ## What I have worked on - Prompt injection assessment of Brave Leo, based on a real conversation transcript and focused on instruction leakage, refusal boundary failures, and unsafe output generation. - Prompt injection vulnerability research against TryHackMe Echo AI, including bypass techniques such as instruction override, context switching, encoding-based injection, and multi-step prompt chaining. - Responsible-disclosure style reporting that translates raw testing into structured findings, impact analysis, root-cause discussion, and mitigation guidance. ## Research focus My work sits at the intersection of AI security, adversarial prompting, and product-facing defensive evaluation. I am especially interested in: - Prompt injection and indirect instruction hijacking - System prompt leakage and behavioral boundary failures - Unsafe content generation under adversarial framing - AI-assisted phishing and social engineering abuse paths - Model governance gaps in chatbot products and integrated assistants ## Future research directions The next areas I plan to explore are: - Persistent memory architectures and how retained state can be safely isolated, scoped, and audited - Memory and context poisoning, including how attackers can seed long-lived or session-local context for later abuse Those topics matter because memory turns a single chat into a long-lived attack surface. Once a model can retain preferences, facts, or task state across turns, attackers gain new ways to influence future behavior without needing to win every prompt from scratch. ## Repository layout - [AI-security/bug-bounty/brave_research/Brave_Leo_Prompt_Injection_Assessment.md](AI-security/bug-bounty/brave_research/Brave_Leo_Prompt_Injection_Assessment.md) - [AI-security/bug-bounty/tryhackme_leo/Tryhackme_Prompt_Injection_Vulnerability.md](AI-security/bug-bounty/tryhackme_leo/Tryhackme_Prompt_Injection_Vulnerability.md) ## Working style - I prefer evidence-driven writeups over speculation. - I document attack chains clearly enough that defenders can reproduce the risk and reason about the fix. - I separate public-safe reporting from sensitive exploit detail when needed. - I treat AI systems as adversarial environments, not just products with a chat interface. ## Notes This repository will continue to expand as I publish new research across AI security, prompt injection, memory safety, and other red-team oriented investigations.