I spent years shipping production systems where "mostly correct" wasn't good enough — the kind of work with audit trails, formal review, and a real cost to being wrong. When generative AI arrived in the integration layer, I saw the same failure modes being waved through with a shrug.
So I built Tantalus, a prompt-injection arena, and ran it across ~6.1 million inference calls spanning models from 1.7B to 119B parameters. Every behavioral and structural control I tested was bypassed except one, which held at a provable 100%. That study is the backbone of how I advise clients: every claim is qualified by a harness, not by vibes.
Today I do two things. I secure the GenAI integration layer for teams shipping agents and assistants, and I build production systems, like the mileage app on this site, to the same correctness-and-audit standard I apply to security work.
How I work
Evidence over assertion
Every recommendation is tied to a reproducible test. If I can't demonstrate it in a harness, I won't sell it to you as fact.
Prevention over cleanup
The one control that held in my testing stopped bad output from ever being generated. I design for that layer, not for filtering damage after the fact.
You work with the principal
Engagements are principal-led by design: the architect who scopes the work is the one who delivers it and stands behind it.
Found something? I read every report.
If you find a security issue in this site or in anything I've published, email info@cybersharkconsulting.com. Test only against your own deployments — never against third parties — and give me a reasonable window to remediate before public disclosure. Machine-readable details live at /.well-known/security.txt.
Bring me the AI concern that keeps you up at night.
25 minutes on Google Meet. No pitch — we pressure-test it live and you leave with a threat-model sketch, whether or not we work together.