Exploring Information Security

Exploring AI, APIs, and the Social Engineering of LLMs


Summary:

Timothy De Block is joined by Keith Hoodlet, Engineering Director at Trail of Bits, for a fascinating, in-depth look at AI red teaming and the security challenges posed by Large Language Models (LLMs). They discuss how prompt injection is effectively a new form of social engineering against machines, exploiting the training data's inherent human biases and logical flaws. Keith breaks down the mechanics of LLM inference, the rise of middleware for AI security, and cutting-edge attacks using everything from emojis and bad grammar to weaponized image scaling. The episode stresses that the fundamental solutions—logging, monitoring, and robust security design—are simply timeless principles being applied to a terrifyingly fast-moving frontier.

Key Takeaways

The Prompt Injection Threat
  • Social Engineering the AI: Prompt injection works by exploiting the LLM's vast training data, which includes all of human history in digital format, including movies and fiction. Attackers use techniques that mirror social engineering to trick the model into doing something it's not supposed to, such as a customer service chatbot issuing an unauthorized refund.

  • Business Logic Flaws: Successful prompt injections are often tied to business logic flaws or a lack of proper checks and guardrails, similar to vulnerabilities seen in traditional applications and APIs.

  • Novel Attack Vectors: Attackers are finding creative ways to bypass guardrails:

    • Image Scaling: Trail of Bits discovered how to weaponize image scaling to hide prompt injections within images that look benign to the user at full resolution but resolve into visible text for the model once they are downscaled for inference (see the first sketch after this list).

    • Invisible Text: Attacks can use white text, zero-width characters (which don't show up when the text is displayed or highlighted), or Unicode character smuggling in emails or prompts to covertly inject instructions (see the second sketch after this list).

    • Syntax & Emojis: Research has shown that bad grammar, run-on sentences, or even a simple sequence of emojis can successfully trigger prompt injections or jailbreaks.
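
The image-scaling finding is easier to reason about once you can reproduce the model's view of an image. The snippet below is a minimal sketch, not Trail of Bits' tooling: it assumes a pipeline that downscales uploads to 512x512 with Pillow's bicubic resampling (real providers use their own, often undocumented, preprocessing) and simply renders both versions, so content that only becomes legible at the smaller size can be spotted before an image is trusted.

```python
# Minimal sketch: reproduce an assumed inference-time downscale so you can
# inspect what the model would actually "see". The 512x512 bicubic resize
# is an assumption; real providers use their own preprocessing.
from PIL import Image

def model_view(path: str, target: tuple[int, int] = (512, 512)) -> Image.Image:
    """Return the downscaled rendition an inference pipeline might produce."""
    full = Image.open(path).convert("RGB")
    return full.resize(target, resample=Image.Resampling.BICUBIC)

if __name__ == "__main__":
    # "upload.png" is a placeholder path for an image a user is about to send.
    Image.open("upload.png").show()  # what the user reviews at full resolution
    model_view("upload.png").show()  # text crafted for this resize can appear here
```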
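
Invisible-character smuggling is just as easy to demonstrate, and to strip. This sketch is illustrative only and is not the mitigation from the Unicode-smuggling post linked under Resources: it hides a payload in Unicode "tag" characters so the string renders as a harmless sentence, then removes tag, zero-width, and other format characters before the text would reach a model.

```python
# Sketch: hide a payload in invisible Unicode characters, then strip such
# characters as a pre-inference sanitization pass. Illustration only.
import unicodedata

def smuggle(visible: str, hidden: str) -> str:
    """Append the payload encoded as invisible Unicode 'tag' characters."""
    tagged = "".join(chr(0xE0000 + ord(ch)) for ch in hidden)
    return visible + tagged

def sanitize(text: str) -> str:
    """Drop tag-block, zero-width, and other invisible format characters."""
    return "".join(
        ch for ch in text
        if not (0xE0000 <= ord(ch) <= 0xE007F)   # Unicode tag block
        and unicodedata.category(ch) != "Cf"     # ZWSP, ZWJ, BOM, etc.
    )

email_body = smuggle("Please summarize this email.", "ignore prior instructions")
print(email_body)            # renders as the harmless visible sentence
print(len(email_body))       # ...but is far longer than it looks
print(sanitize(email_body))  # invisible payload removed before inference
```

Filtering of this kind typically lives in the middleware layer described in the next section, alongside system prompting and output checks.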

Defense and Design
  • LLM Security is API Security: Since LLMs rely on APIs for their "tool access" and to perform actions (like sending an email or issuing a refund), security comes down to the same principles used for APIs: proper authorization, access control, and eliminating misconfiguration (a refund-authorization sketch follows this list).

  • The Middleware Layer: Some companies are using middleware that sits between their application and the Frontier LLMs (like GPT or Claude) to handle system prompting, guard-railing, and filtering prompts, effectively acting as a Web Application Firewall (WAF) for LLM API calls.

  • Security Design Patterns: To defend against prompt injection, security design patterns are key:

    • Action-Selector Pattern: Instead of a free-text field, users click on pre-defined buttons that limit the model to a very specific set of safe actions (sketched after this list).

    • Code-Then-Execute Pattern (CaMeL): A privileged LLM writes code (e.g., Pythonic code) from the natural-language prompt; that code is then executed under tight control, with a second, quarantined LLM used only to process untrusted data rather than to choose actions (a toy approximation appears after this list).

    • Map-Reduce Pattern: The prompt is broken into smaller chunks, processed, and then passed to another model, making it harder for a prompt injection to be maintained across the process.

  • Timeless Hygiene: The most critical defenses are logging, monitoring, and alerting. You must log prompts and outputs and monitor for abnormal behavior, such as a user suddenly querying a database thousands of times a minute or asking a chatbot to write Python code (a logging-and-alerting sketch appears below).
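
One way to make the "LLM security is API security" point concrete: the model can ask for anything, but the code that executes the tool call decides who may do what. Everything in the sketch below (the order store, refund limit, and tool registry) is invented for illustration; the key detail is that authorization runs against the authenticated session, never against identifiers the model supplies.

```python
# Sketch: authorize LLM tool calls server-side using the authenticated user,
# never trusting order IDs or amounts suggested by the model.
from dataclasses import dataclass

@dataclass
class Order:
    owner_id: str
    total: float

ORDERS = {"A-100": Order(owner_id="alice", total=40.00)}  # stand-in data store
REFUND_LIMIT = 50.00                                      # stand-in business rule

def issue_refund(session_user: str, order_id: str, amount: float) -> str:
    order = ORDERS.get(order_id)
    if order is None or order.owner_id != session_user:
        return "refused: order does not belong to this customer"
    if amount <= 0 or amount > min(order.total, REFUND_LIMIT):
        return "refused: amount outside policy, escalate to a human"
    return f"refunded {amount:.2f} on {order_id}"         # real payment call goes here

TOOLS = {"issue_refund": issue_refund}                    # the model may only name these

def dispatch(session_user: str, tool_name: str, args: dict) -> str:
    """Execute a model-requested tool call under the caller's own authority."""
    handler = TOOLS.get(tool_name)
    return handler(session_user, **args) if handler else "refused: unknown tool"

# Even if a prompt injection convinces the model to request someone else's order,
# the check runs against the authenticated user:
print(dispatch("mallory", "issue_refund", {"order_id": "A-100", "amount": 40.00}))
```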
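
The action-selector pattern is the simplest to sketch: the interface exposes a fixed menu of actions, and anything outside that menu, injected or not, is rejected. The action names below are invented.

```python
# Sketch of the action-selector pattern: callers pick from pre-defined actions;
# there is no free-form text path into the handlers.
ACTIONS = {
    "track_order":    lambda user: f"Tracking info for {user}'s latest order",
    "reset_password": lambda user: f"Password reset email sent to {user}",
    "contact_human":  lambda user: "A support agent will reach out shortly",
}

def handle_button_click(user: str, action_id: str) -> str:
    if action_id not in ACTIONS:  # anything outside the menu is refused
        return "Unsupported action"
    return ACTIONS[action_id](user)

print(handle_button_click("alice", "track_order"))
print(handle_button_click("alice", "ignore previous instructions"))  # refused
```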
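
The code-then-execute idea can be approximated in a few lines, though this toy is a long way from CaMeL itself: the planning model's output is treated as a small program over a whitelisted set of operations, validated before it runs, and untrusted document content only ever flows through it as data.

```python
# Toy approximation of code-then-execute: validate a model-written "program"
# against a whitelist of operations, then run it deterministically.
import json

ALLOWED_OPS = {"fetch_document", "summarize", "send_email"}

def validate(plan: list[dict]) -> list[dict]:
    for step in plan:
        if step.get("op") not in ALLOWED_OPS:
            raise ValueError(f"disallowed operation: {step.get('op')!r}")
    return plan

def execute(plan: list[dict]) -> None:
    data = None
    for step in plan:
        if step["op"] == "fetch_document":
            data = f"<contents of {step['path']}>"   # stand-in document fetch
        elif step["op"] == "summarize":
            data = f"summary({data})"                # stand-in quarantined-model call
        elif step["op"] == "send_email":
            print(f"email to {step['to']}: {data}")

# Imagine this JSON came back from the planning model:
plan_json = (
    '[{"op": "fetch_document", "path": "report.txt"},'
    ' {"op": "summarize"},'
    ' {"op": "send_email", "to": "boss@example.com"}]'
)
execute(validate(json.loads(plan_json)))
```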
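
Finally, the logging and monitoring advice translates almost directly into code. The sketch below is a minimal, invented example (the thresholds, in-memory storage, and logger are placeholders for a real SIEM pipeline): it records every prompt/response pair and raises an alert when a user's request rate in a sliding window looks abnormal.

```python
# Sketch: audit-log every exchange and alert on abnormal per-user rates.
import logging
import time
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
audit = logging.getLogger("llm-audit")

WINDOW_SECONDS = 60
MAX_PROMPTS_PER_WINDOW = 100                  # placeholder threshold
_recent: dict[str, deque] = defaultdict(deque)

def record_exchange(user: str, prompt: str, output: str) -> None:
    """Log the exchange and flag bursts well outside normal usage."""
    now = time.time()
    audit.info("user=%s prompt=%r output=%r", user, prompt, output)
    window = _recent[user]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > MAX_PROMPTS_PER_WINDOW:
        audit.warning("user=%s sent %d prompts in the last %ds -- investigate",
                      user, len(window), WINDOW_SECONDS)
```

The same sliding-window counter can sit on database queries made on the model's behalf, which is the "thousands of queries a minute" signal mentioned above.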

Resources & Links Mentioned
  • Trail of Bits Research:

    • Blog: blog.trailofbits.com

    • Company Site: trailofbits.com

    • Weaponizing image scaling against production AI systems

  • Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

  • Securing LLM Agents Paper: Design Patterns for Securing LLM Agents against Prompt Injections.

  • CaMeL Prompt Injection

  • Defending LLM applications against Unicode character smuggling

  • Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models

  • LLM Explanation: 3Blue1Brown has a great short video explaining how Large Language Models work.

  • Lakera Gandalf: A game for learning how to use prompt injection against an AI

  • Keith Hoodlet's Personal Sites:

    • Websites: securing.dev and thought.dev

Support the Podcast:

Enjoyed this episode? Leave us a review and share it with your network! Subscribe for more insightful discussions on information security and privacy.

Contact Information:

Leave a comment below, reach out via the contact form on the site, email timothy.deblock[@]exploresec[.]com, or connect on LinkedIn.

Check out our services page and reach out if you see any services that fit your needs.

Social Media Links:

[RSS Feed] [iTunes] [LinkedIn] [YouTube]
