Software Engineering Daily

Unstructured Data and LLMs with Crag Wolfe and Matt Robinson



The majority of enterprise data exists in heterogeneous formats such as HTML, PDF, PNG, and PowerPoint. However, large language models do best when trained on clean, curated data. This presents a major data-cleaning challenge.

Unstructured is focused on extracting and transforming complex data to prepare it for vector databases and LLM frameworks.

Crag Wolfe is Head of Engineering and Matt Robinson is Head of Product at Unstructured. They join the podcast to talk about data cleaning in the LLM age.

Sean’s been an academic, startup founder, and Googler. He has published work on topics ranging from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.






