Research — Reworr

LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild

October 2024 — arXiv (Lead author)

Deployed vulnerable servers to catch autonomous AI hacking agents — an early warning system for AI-powered cyberattacks.

Research Paper | Live Dashboard | Blog Post | Dataset

Media coverage:

MIT Technology Review | Bloomberg Law | Cybernews

Blog posts:

Apart Research | LessWrong

AI Hacking Cable

August 2025 — Palisade Research (Lead author)

Research demonstrating that autonomous AI agents are operationally feasible for post-exploitation. Deployed from a compact USB device, the agent autonomously conducts reconnaissance, exfiltrates data, and spreads laterally.

Blog Post | Technical Report | Twitter Thread

Media coverage:

TIME

GPT-5 at CTFs: Case Studies From Top-Tier Cybersecurity Events

November 2025 — arXiv (Lead author)

Evaluation of GPT-5's performance in elite cybersecurity competitions. Following OpenAI and DeepMind's AI achievements at IMO and ICPC, we demonstrated that frontier AI is similarly capable at hacking. GPT-5 finished 25th, outperforming 93% of human participants and placing between the world's #3-ranked team (24th) and #7-ranked team (26th).

Research Paper | Twitter Thread

Multi-Host AI Hacking

August 2025 — Palisade Research (Co-author)

Research demonstrating AI agents compromising multi-host networks rather than single targets, chaining vulnerabilities across three machines: timing attacks, server-side template injection (SSTI), and XML external entity (XXE) injection. GPT-5 was 3× faster than o3.

GitHub | Live Demo | Twitter Thread

LLM Safety Bypass Demos

October 2024 – April 2025 — Palisade Research (Lead author)

Practical demonstrations of LLM vulnerabilities across different attack surfaces — prompt injection, jailbreaks, and visual exploits.

Prompt Injection: Claude Computer Use agent visits a malicious site, executes commands, and steals SSH keys (View Thread)

Aesopian Jailbreak: Safety bypass via allegories, where one model rewrites and another executes (View Thread)

Visual Jailbreak: Harmful instructions rendered as generated images, bypassing text filters (View Thread)

Autonomous Hacking & Rogue Replication: Offensive Capabilities of Frontier AI

February 2026 — AI Safety Poland Talks #7, Online Talk (Speaker)

Talk covering LLM offensive capabilities from autonomous hacking to rogue replication, featuring projects I worked on.

Slides

AI in Offensive Security: Capabilities & Trends

September 2025 — BSides Kraków, Conference Talk (Speaker)

Talk at BSides security conference on AI capabilities in offensive security, featuring projects I worked on.

Slides

Evaluating AI Cyber Capabilities

May 2025 — arXiv (Acknowledged contributor)

Ran a Claude-based agent that placed 2nd among AI teams.

Research Paper

The Frontier of AI Security: What Did We Learn in the Last Year?

February 2025 — Heron AI Security Newsletter (Lead author)

Year-in-review analysis of AI security challenges and breakthroughs, covering jailbreak vulnerabilities, AI-enabled cyber operations, model security, and emerging defenses.

Article

Review of LLM Persuasion Jailbreak Study

January–March 2024 — Substack (Sole author)

Analysis of a study on persuasion techniques for LLM jailbreaking. Found that the original study's results reflected a confounding variable rather than the persuasion techniques themselves. Controlled experiments showed that most methods either don't work or are counterproductive.

Article | Twitter Thread | Original Paper | Original Project

Cybersecurity (2016–2023)

Vulnerability Research

Security vulnerabilities discovered and responsibly disclosed: Meta (Meta-SecAlign bypass, acknowledged and fixed), Oracle (fixed, publicly credited), Telegram (fixed, bounty awarded), Open Source (CVE-2022-25876), and more.

VERA Botnet Disruption

August 2022 — Technical Report (Sole author)

Discovered a command-injection vulnerability in an active DDoS botnet (VERA), then performed controlled validation and coordinated disruption of the malicious infrastructure (50+ hosts).

Write-up (republished)

OSINT Investigation (State-Sponsored Attack Attribution)

April 2021 — Media investigation (Acknowledged contributor)

Contributed OSINT analysis to a major investigative journalism piece attributing a data breach (440k affected) to state actors. Traced attack infrastructure through email headers and domain registration data across multiple related operations.

Investigation