LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
Deployed vulnerable servers to detect and study autonomous AI agents attempting exploitation — an early warning system for AI-powered cyberattacks.
Research demonstrating that autonomous AI agents are operationally feasible for post-exploitation work. The agent autonomously conducts reconnaissance, exfiltrates data, and spreads laterally via a compact USB deployment.
Evaluation of GPT-5's performance in elite cybersecurity competitions. Following OpenAI and DeepMind's AI achievements at IMO and ICPC, we demonstrated frontier AI is similarly capable at hacking. GPT-5 finished 25th, outperforming 93% of human participants—placing between the world's #3-ranked team (24th) and #7-ranked team (26th).
Research demonstrating AI agents compromising multi-host networks rather than single targets, chaining three vulnerabilities across three machines: a timing attack, server-side template injection (SSTI), and XML external entity (XXE) injection. GPT-5 performed 3× faster than o3.
Talk at BSides security conference on AI capabilities in offensive security, featuring projects I worked on.
Ran a Claude-based agent that placed 2nd among AI teams.
Year-in-review analysis of AI security challenges and breakthroughs, covering jailbreak vulnerabilities, AI-enabled cyber operations, model security, and emerging defenses.
Analysis of a study on persuasion techniques for LLM jailbreaking. Found that the original study measured a confounding variable rather than the persuasion techniques themselves. In controlled experiments, most methods either did not work or reduced effectiveness.
Security vulnerabilities discovered and responsibly disclosed:
Meta (Meta-SecAlign bypass, acknowledged and fixed),
Oracle (fixed, publicly credited),
Telegram (fixed, bounty awarded),
Open Source (CVE-2022-25876),
and more.
Submitted my first bug bounty in 2016.
Playing CTFs since 2016 (started at 14). Former member of MindCrafters, a former top-30 team worldwide.
Favorite categories: web security / OSINT.