Exploring Information Security - Exploring Information Security

Exploring AI, APIs, and the Social Engineering of LLMs


Listen Later

Summary:

Timothy De Block is joined by Keith Hoodlet, Engineering Director at Trail of Bits, for a fascinating, in-depth look at AI red teaming and the security challenges posed by Large Language Models (LLMs). They discuss how prompt injection is effectively a new form of social engineering against machines, exploiting the training data's inherent human biases and logical flaws. Keith breaks down the mechanics of LLM inference, the rise of middleware for AI security, and cutting-edge attacks using everything from emojis and bad grammar to weaponized image scaling. The episode stresses that the fundamental solutions—logging, monitoring, and robust security design—are simply timeless principles being applied to a terrifyingly fast-moving frontier.

Key TakeawaysThe Prompt Injection Threat
  • Social Engineering the AI: Prompt injection works by exploiting the LLM's vast training data, which includes all of human history in digital format, including movies and fiction. Attackers use techniques that mirror social engineering to trick the model into doing something it's not supposed to, such as a customer service chatbot issuing an unauthorized refund.

  • Business Logic Flaws: Successful prompt injections are often tied to business logic flaws or a lack of proper checks and guardrails, similar to vulnerabilities seen in traditional applications and APIs.

  • Novel Attack Vectors: Attackers are finding creative ways to bypass guardrails:

    • Image Scaling: Trail of Bits discovered how to weaponize image scaling to hide prompt injections within images that appear benign to the user, but which pop out as visible text to the model when downscaled for inference.

    • Invisible Text: Attacks can use white text, zero-width characters (which don't show up when displayed or highlighted), or Unicode character smuggling in emails or prompts to covertly inject instructions.

    • Syntax & Emojis: Research has shown that bad grammar, run-on sentences, or even a simple sequence of emojis can successfully trigger prompt injections or jailbreaks.

Defense and Design
  • LLM Security is API Security: Since LLMs rely on APIs for their "tool access" and to perform actions (like sending an email or issuing a refund), security comes down to the same principles used for APIs: proper authorization, access control, and eliminating misconfiguration.

  • The Middleware Layer: Some companies are using middleware that sits between their application and the Frontier LLMs (like GPT or Claude) to handle system prompting, guard-railing, and filtering prompts, effectively acting as a Web Application Firewall (WAF) for LLM API calls.

  • Security Design Patterns: To defend against prompt injection, security design patterns are key:

    • Action-Selector Pattern: Instead of a text field, users click on pre-defined buttons that limit the model to a very specific set of safe actions.

    • Code-Then-Execute Pattern (CaMeL): The first LLM is used to write code (e.g., Pythonic code) based on the natural language prompt, and a second, quarantined LLM executes that safer code.

    • Map-Reduce Pattern: The prompt is broken into smaller chunks, processed, and then passed to another model, making it harder for a prompt injection to be maintained across the process.

  • Timeless Hygiene: The most critical defenses are logging, monitoring, and alerting. You must log prompts and outputs and monitor for abnormal behavior, such as a user suddenly querying a database thousands of times a minute or asking a chatbot to write Python code.

Resources & Links Mentioned
  • Trail of Bits Research:

    • Blog: blog.trailofbits.com

    • Company Site: trailofbits.com

    • Weaponizing image scaling against production AI systems

  • Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

  • Securing LLM Agents Paper: Design Patterns for Securing LLM Agents against Prompt Injections.

  • Camel Prompt Injection

  • Defending LLM applications against Unicode character smuggling

  • Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models

  • LLM Explanation: Three Blue One Brown (3Blue1Brown) has a great short video explaining how Large Language Models work.

  • Lakera Gandalf: Game for learning how to use prompt injection against AI

  • Keith Hoodlet's Personal Sites:

    • Website: securing.dev and thought.dev

Support the Podcast:

Enjoyed this episode? Leave us a review and share it with your network! Subscribe for more insightful discussions on information security and privacy.

Contact Information:

Leave a comment below or reach out via the contact form on the site, email timothy.deblock[@]exploresec[.]com, or reach out on LinkedIn.

Check out our services page and reach out if you see any services that fit your needs.

Social Media Links:

[RSS Feed] [iTunes] [LinkedIn][YouTube]

Subscribe

Sign up with your email address to receive news and updates.

Email Address
Sign Up

We respect your privacy.

Thank you!


...more
View all episodesView all episodes
Download on the App Store

Exploring Information Security - Exploring Information SecurityBy Timothy De Block

  • 4.7
  • 4.7
  • 4.7
  • 4.7
  • 4.7

4.7

43 ratings


More shows like Exploring Information Security - Exploring Information Security

View all
Security Now (Audio) by TWiT

Security Now (Audio)

2,010 Listeners

Defensive Security Podcast - Malware, Hacking, Cyber Security & Infosec by Jerry Bell and Andrew Kalat

Defensive Security Podcast - Malware, Hacking, Cyber Security & Infosec

370 Listeners

Risky Business by Patrick Gray

Risky Business

374 Listeners

SANS Internet Stormcenter Daily Cyber Security Podcast (Stormcast) by Johannes B. Ullrich

SANS Internet Stormcenter Daily Cyber Security Podcast (Stormcast)

653 Listeners

Syntax - Tasty Web Development Treats by Wes Bos & Scott Tolinski - Full Stack JavaScript Web Developers

Syntax - Tasty Web Development Treats

987 Listeners

The Diary Of A CEO with Steven Bartlett by DOAC

The Diary Of A CEO with Steven Bartlett

8,494 Listeners

Darknet Diaries by Jack Rhysider

Darknet Diaries

8,047 Listeners

Hacking Humans by N2K Networks

Hacking Humans

313 Listeners

CISO Series Podcast by David Spark, Mike Johnson, and Andy Ellis

CISO Series Podcast

189 Listeners

Behind the Money by Financial Times

Behind the Money

226 Listeners