2,000 users attempted to hack Fernando I's AI assistant

Fernando I's AI assistant, Claw, underwent a 2,000‑person adversarial test that uncovered prompt‑injection, session‑isolation, and debug‑leakage vulnerabilities, providing concrete data for AI security hardening.

sources[Fernando I's blog]

On June 26, 2026 Fernando I released the results of a 2,000‑person adversarial test on his AI assistant, Claw, detailing how participants tried to subvert the system's responses [Fernando I's blog]. The test ran for four weeks, during which volunteers submitted prompts designed to trigger hallucinations, bypass safety filters, or extract internal prompts.

Test design and findings

Participants focused on three attack vectors: prompt injection, jailbreak chaining, and data poisoning. The most successful attempts involved crafted sequences that forced the model to reveal system instructions, accounting for 18 % of all breaches. Coordinated attacks that combined multiple vectors succeeded in 7 % of cases, exposing a gap in the assistant’s session isolation.

The audit also recorded 45 distinct failure modes, ranging from malformed input handling to unexpected token generation. Notably, the assistant’s error‑handling routine returned raw stack traces in 12 % of edge‑case queries, providing attackers with implementation details.

Implications for developers

The public release of these findings gives AI engineers concrete data on how real users exploit conversational agents [Fernando I's blog]. First, the prevalence of prompt‑injection success underscores the need for robust input sanitization beyond keyword filtering. Second, the observed session‑isolation failures suggest that multi‑turn interactions must be sandboxed per user to prevent state leakage. Third, the accidental exposure of internal diagnostics highlights the importance of stripping debug information from production endpoints.

Developers can apply these insights by integrating automated adversarial test suites, tightening validation layers, and auditing logging configurations. Fernando I plans to open‑source the test harness, enabling the community to replicate and extend the methodology.

[Fernando I's blog]

adjacent broadcasts