LLM safety tooling still leans heavily on black box filtering. Watch the prompt, watch the output, block the obvious stuff. LLM guardrail bypasses keep showing up in the same place: prompts that look harmless to moderation classifiers, yet still result in harmful output.

At BSidesSF 2026, ObjectSecurity will present “Increasing the Analysis Surface of Large Language Models,” a session that shifts attention to model internals. The talk introduces research on model internal analysis as a new place to look for risk. Unsafe prompts do not just change what the model says, they change what the model pays attention to. Those attention patterns can be measured and used as signals for detection.

The session covers how model internals can be tied back to unsafe prompts, what those internal signals look like in practice, and where they fit alongside input and output checks. If you care about jailbreaks, prompt injection, or any case where boundary filters miss what is happening inside the model, this talk is for you.

Speakers:
Stephen Brennan, AI/ML Researcher, ObjectSecurity LLC
Ulrich Lang, Founder/CEO, ObjectSecurity

BSidesSF takes place at City View at Metreon in San Francisco. Attend the talk and bring your questions.