

Get featured on the show by leaving us a Voice Mail: https://bit.ly/MIPVM
FULL SHOW NOTES
https://www.microsoftinnovationpodcast.com/672
We dive deep into the world of AI security with Joris De Gruyter, Senior Offensive Security Engineer on Microsoft's AI Red Team, who shares how the team tests and breaks AI systems to ensure they are safe and trustworthy.
TAKEAWAYS
• Microsoft requires all AI features to be thoroughly documented and approved by a central board
• The AI Red Team tests products adversarially and as regular users to identify vulnerabilities
• Red teaming originated in military exercises during the Cold War before being adapted for software security
• The team tests for jailbreaks, harmful content generation, data exfiltration, and bias
• Team members come from diverse backgrounds including PhDs in machine learning, traditional security, and military experience
• New AI modalities like audio, images, and video each present unique security challenges
• Mental health support is prioritized since team members regularly encounter disturbing content
• Working exclusively with failure modes creates a healthy skepticism about AI capabilities
• Hands-on experimentation is recommended for anyone wanting to develop AI skills
• Curating your own information sources rather than relying on algorithms helps discover new knowledge
Check out Microsoft Copilot and other AI tools to start experimenting and find practical ways they can help in your daily work.
Microsoft 365 Copilot Adoption is a Microsoft Press book for leaders and consultants. It shows how to identify high-value use cases, set guardrails, enable champions, and measure impact, so Copilot sticks. Practical frameworks, checklists, and metrics you can use this month. Get the book: https://bit.ly/CopilotAdoption
Support the show
If you want to get in touch with me, you can message me on LinkedIn.
Thanks for listening 🚀 - Mark Smith
