There Is Nothing Responsible About Disclosure Of Every Successful Prompt Injection
The InfoSec community is strongest when it can collaborate openly. Few organizations can fend off sophisticated attacks aloneâand even they sometimes fail. If we all had to independently discover every malware variant, every vulnerability, every best practice, we wouldnât get very far. Over time, shared mechanisms emerged: VirusTotal and EDR for malware, CVE for vulnerabilities, standard organizations like OWASP for best practices.
Itâs remarkable that we can rely on one another at all. Companies compete fiercely. But their security teams collaborate. That kind of collaboration is not trivial. Nuclear arsenals. Ever-growing marketing budgets. These are places where more collaboration could helpâbut doesnât.
So when attackers hijack AI agents through prompt injection, we fall back on what we know: vulnerability disclosure. Vendors launch responsible disclosure programs (which we applaud). They ask researchers to report successful prompt injections privately. The implicit implication: responsible behavior means private reporting, not public sharing.
But thereâs a problem, prompt injection canât be fixed.
Blocking a specific prompt does little to protect users. It creates an illusion of security that leaves users exposed. Ask any AI security researcher: another prompt will surfaceâoften the same day the last one was blocked.
Calling a vulnerability âfixedâ numbs defenders, hiding the deeper issue. Anything your agent can do for you, an attacker can do tooâonce they take control.Â
Itâs not a bug, itâs a design choice. A tradeoff.
Every AI agent sits somewhere on a spectrum between power and risk. Defenders deserve to know where on that spectrum they are. Open-ended agents like OpenAI Operator and Claudeâs Computer Use maximize power, to be used at oneâs own peril. AI assistants make different tradeoffs exemplified by their approach to web browsingâeach vendor has come up with their own imperfect defense mechanism. Vendors make choices which users are forced to live with.
Prompt injections illustrate those choices. They are not vulnerabilities to be patched. Theyâre demonstrations. We canât expect organizations to make informed choices about AI risk without showing visceral examples of what could go wrong. And we canât expect vendors to hold off powerful capabilities in the name of safety without public scrutiny.
That doesnât mean vulnerability disclosure has no place in AI. Vendors can make risky choices without realizing it. Mistakes happen. Disclosures should be about the agentâs architecture, not a specific prompt.
Rather than treating prompt injection like a vulnerability, treat it like malware. Malware is an inherent risk of general-purpose operating systems. We donât treat new malware variants as vulnerabilities. We donât privately report them to Microsoft or Apple. We publicly share it, as soon as possible, or rather its hash. We donât claim that malware is âfixedâ because one hash made it onto a denylist.
When a vulnerability can be fixed, disclosure helps. But when the risk is inherent to the technology, hiding it only robs users of informed choice.