LLMs hack custom vulnerable app in $1,500 test

A developer built a deliberately insecure web app, spent $1,500 on API calls, and measured how well large language models could locate and exploit its flaws, revealing both promise and limits for AI‑driven security testing.

Sources: [Kasra Blog]

On June 4, 2026, the author of a personal blog published a write-up of a $1,500 experiment in which they built a deliberately insecure web application and asked several large language models to locate and exploit its flaws [Kasra Blog]. The goal was to measure how far current LLMs can go in automated vulnerability discovery.

The test app contained classic OWASP-style weaknesses: hard-coded credentials, SQL injection points, and insecure deserialization. The author prompted models such as OpenAI's GPT-4, Anthropic's Claude, and Google's Gemini with instructions to "break the app." Each model received multiple prompt variations to see whether it could generate working exploit code. According to the post, GPT-4 extracted the admin password and crafted a working SQL injection payload, while Claude produced a valid deserialization attack. Gemini, however, failed to produce a usable exploit for any of the flaws [Kasra Blog].
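To make the SQL injection class of flaw concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not the author's test app: the `users` table, the credentials, and the function names `login_vulnerable` and `login_safe` are all hypothetical, chosen only to show how a string-built query can be bypassed and how a parameterized query avoids it.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database standing in for the app's user store.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    conn.commit()
    return conn


def login_vulnerable(conn, name, password):
    # String formatting splices attacker-controlled text into the SQL itself.
    query = (
        "SELECT name FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def login_safe(conn, name, password):
    # Placeholders keep the input as data, so quotes cannot change the query.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None


conn = make_db()
payload = "' OR '1'='1"  # classic tautology payload, used here as the password

print(login_vulnerable(conn, "admin", payload))  # login bypassed
print(login_safe(conn, "admin", payload))        # login rejected
```

In the vulnerable version the payload closes the password string and appends a condition that is always true, so the check passes without the real password. The parameterized version treats the same payload as an ordinary, incorrect password.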

The findings have three practical implications. First, LLMs can act as semi-automated pen-test assistants, surfacing exploitable bugs that a human tester might miss. Second, the success rate varies dramatically between models and prompt styles, so LLM-driven testing complements traditional security audits rather than replacing them. Third, the entire experiment cost only $1,500 in API usage, a fraction of the cost of hiring a professional red team, which suggests that malicious actors could adopt the same approach at scale.
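One way to quantify how much results vary by model and prompt style is to log each attempt and compute a success rate per combination. The sketch below assumes a simple record format of `(model, prompt_style, succeeded)`. The records are made-up placeholders with hypothetical names such as `model_a`, not figures from the post, and the function name `success_rates` is likewise an illustration.

```python
from collections import defaultdict


def success_rates(trials):
    """Return the fraction of successful attempts per (model, prompt_style)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for model, prompt_style, succeeded in trials:
        entry = totals[(model, prompt_style)]
        entry[0] += int(succeeded)
        entry[1] += 1
    return {key: wins / runs for key, (wins, runs) in totals.items()}


# Hypothetical placeholder records, NOT results from the experiment.
trials = [
    ("model_a", "terse", True),
    ("model_a", "terse", False),
    ("model_a", "detailed", True),
    ("model_b", "terse", False),
]

rates = success_rates(trials)
for (model, prompt_style), rate in sorted(rates.items()):
    print(f"{model:8s} {prompt_style:9s} {rate:.0%}")
```

Keeping the prompt style in the key makes it possible to tell a weak model from a weak prompt, which a single per-model average would hide.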

Overall, the experiment shows both the promise and the limits of generative AI in security work.
