Weaviate Podcast

REFRAG with Xiaoqiang Lin - Weaviate Podcast #130!

Xiaoqiang Lin is a Ph.D. student at the National University of Singapore. During his time at Meta, Xiaoqiang led the research behind REFRAG: Rethinking RAG-based Decoding. Traditional RAG systems use vectors to retrieve relevant context with semantic search, but then throw those vectors away when passing the context to the LLM. REFRAG instead feeds the LLM these precomputed vectors, achieving massive gains in long-context processing and LLM inference speed! REFRAG makes Time-To-First-Token (TTFT) 31x faster and Time-To-Iterative-Token (TTIT) 3x faster, boosting overall LLM throughput by 7x while also handling much longer contexts!


There are so many interesting aspects to this, and I really loved diving into the details with Xiaoqiang! I hope you enjoy the podcast!

Weaviate Podcast by Weaviate

4 ratings