Most security firms have never shipped a production agent. Most app developers have never written a threat model. We sit in the overlap.
These are the in-house systems we operate ourselves — the same architectures we audit for clients. Every one of them informs the consulting work, and every audit informs the next product.
The agent system we run our work on. Snapshot rollback, file-lock management, circuit breakers, a streaming hub, auto-verification. Lives across a CLI, a VS Code extension, and a desktop app. Forge taught us what we now write threat models about.
A 14B-parameter chat model fine-tuned for our work, served via REST API on a self-hosted inference stack we operate end-to-end. Same architecture as what's deployed at the companies we audit — which is exactly the point.
A Dungeons & Dragons assistant for players and Dungeon Masters who want a thinking partner at the table — not a dice-roll toy. Builds campaigns, designs encounters, writes monsters, generates NPCs with motives and voices, holds a thread of lore across a session. Runs on the Klaus inference stack. Pro tier coming.
A voice-driven language learning app built for real conversation practice, not flashcard drills. Multi-language (EN, FR, ES, DE, IT, PT, NL, PL, JA, KO, ZH, AR, HI, RU, TR), CEFR level settings, full voice in/out. No cloud AI, no per-minute fees, no data leaving home — every response generated on our self-hosted stack.
A focused tabletop assistant for 5e: spells, dice, characters, encounters. Lean and offline-capable. Distinct from Mirror (which is AI-first); Sir Squint's is the no-bloat reference companion. Launching as a paid app on Google Play.
A handful of smaller projects that aren't products but matter for the work — game-server architecture experiments, .NET MAUI cross-platform tooling, Unity samples. Ask if you're curious; happy to show specific code.
Most consultants doing AI security have never operated a self-hosted inference stack, never debugged a production agent in front of real users, never shipped a fine-tuned model to a paying audience. The systems above are the reason we know the failure modes before your team encounters them.
When we write a threat model for your agent, we have already debugged the same failure modes in ours. When we audit your inference deployment, we've already configured one. The labs aren't a portfolio in the marketing sense — they're the thing that makes the audits credible.
Schedule a discovery call →