Tantalus
A prompt-injection arena where you are the attacker. Your goal: get an AI agent to exfiltrate data from a user's workstation.
The arena puts you in front of a realistic AI assistant with access to files, emails, and chat history, pre-loaded with both legitimate and poisoned tools. It is the same substrate the whitepaper's experiments ran on.
Illustrative — the live arena is at tantalus.io.
Do deployed AI security controls prevent malicious output — or clean it up after?
With Tantalus as the substrate, I ran the harness across ~6.1M inference calls on models from 1.7B to 119B parameters. Every behavioral and structural control but one was either bypassed or still allowed malicious data to be generated. The exception was a single generation-layer control, the only one that provably blocked 100% of bad behavior from ever being generated. Under it, an abliterated model's data-exfiltration success rate fell from 97.85% to 0%.
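To make the prevent-versus-clean-up distinction concrete, here is a toy sketch in Python. It is not the harness or the control from the whitepaper: the scripted "model" (`toy_candidates`), the `generate` loop, and both example URLs are hypothetical stand-ins, and real generation-layer controls operate on model logits rather than a hand-written candidate list. The only point it illustrates is that an output filter redacts a malicious token after it already exists, while a constraint applied during generation never lets it be produced.

```python
from typing import Callable, List

# Hypothetical values, for illustration only.
ATTACKER_URL = "https://evil.example/upload"
SAFE_URL = "https://intranet.example/share"


def toy_candidates(step: int) -> List[str]:
    """Scripted stand-in for a model: ranked next-token candidates per step.

    The compromised agent ranks the attacker's URL first at the last step.
    """
    script = [
        ["send"],
        ["report.pdf"],
        ["to"],
        [ATTACKER_URL, SAFE_URL],
    ]
    return script[step]


def is_malicious(token: str) -> bool:
    return token == ATTACKER_URL


def generate(allowed: Callable[[str], bool], steps: int = 4) -> List[str]:
    """Greedy decoding that only considers candidates passing `allowed`."""
    out: List[str] = []
    for step in range(steps):
        candidates = [t for t in toy_candidates(step) if allowed(t)]
        if not candidates:  # nothing permitted: stop rather than emit
            break
        out.append(candidates[0])
    return out


# 1) Clean up after: generate freely, then redact the finished output.
raw = generate(allowed=lambda t: True)
generated_when_unconstrained = any(is_malicious(t) for t in raw)
cleaned = ["[REDACTED]" if is_malicious(t) else t for t in raw]

# 2) Prevent: apply the constraint while generating.
constrained = generate(allowed=lambda t: not is_malicious(t))
generated_when_constrained = any(is_malicious(t) for t in constrained)

print("post-hoc filter, malicious token was generated:", generated_when_unconstrained)
print("generation-layer constraint, malicious token was generated:", generated_when_constrained)
```

In the first path the attacker's URL is present in `raw` before the filter rewrites it, so anything that observes the unfiltered output (logs, tool calls, streaming) could still see it. In the second path it never appears at all.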