The Real Python Podcast

Leveraging Documents and Data to Create a Custom LLM Chatbot



How do you customize an LLM chatbot to address a collection of documents and data? What tools and techniques can you use to build embeddings into a vector database? This week on the show, Calvin Hendryx-Parker is back to discuss developing an AI-powered, Large Language Model-driven chat interface.

Calvin is the co-founder and CTO of Six Feet Up, a Python and AI consultancy. He shares a recent project for a family-owned seed company that wanted to build a tool for customers to access years of farm research. These documents were stored as brochure-style PDFs and spanned 50 years.

We discuss several of the tools used to augment an LLM. Calvin covers working with LangChain and vectorizing data with ChromaDB. We talk about the obstacles and limitations of capturing documentation.
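To make the idea of vectorizing documents and retrieving them for an LLM concrete, here is a minimal sketch using only the Python standard library. It is not the code from the episode: the word-count "embedding" is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database such as ChromaDB, and the function names (`embed`, `retrieve`, `build_prompt`) and sample snippets are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector.

    Assumption: a real system would call an embedding model here.
    """
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, embed(d)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved text in front of the question for the LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical snippets, invented for illustration only.
DOCS = [
    "Corn hybrids were planted in early May at the northern research plot.",
    "Soybean trials compared row spacing across three seasons.",
    "The conference keynote starts at nine in the morning.",
]

if __name__ == "__main__":
    print(build_prompt("When does the keynote start?", DOCS))
```

The prompt that comes out would then be sent to a language model. Retrieving first and generating second is the general shape of retrieval-augmented generation, which the episode covers in more depth.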

Calvin also shares a smaller project that you can try out yourself. It takes the information from a conference website and creates a chatbot using Django and Python prompt-toolkit.

This episode is sponsored by Mailtrap.

Course Spotlight: Command Line Interfaces in Python

Command line arguments are the key to converting your programs into useful and enticing tools that are ready to be used in the terminal of your operating system. In this course, you’ll learn their origins, standards, and basics, and how to implement them in your program.

Topics:

  • 00:00:00 – Introduction
  • 00:02:21 – Background on the project
  • 00:03:51 – Complexity of adding documents
  • 00:09:01 – Retrieval-augmented generation and providing links
  • 00:13:46 – Updating information and larger conversation context
  • 00:18:08 – Sponsor: Mailtrap
  • 00:18:43 – Working with context
  • 00:21:02 – Temperature adjustment
  • 00:22:07 – Rally Conference Chatbot Project
  • 00:26:20 – Vectorization using ChromaDB
  • 00:32:49 – Employing Python prompt-toolkit
  • 00:35:07 – Learning libraries on the fly
  • 00:37:38 – Video Course Spotlight
  • 00:39:00 – Problems with tables in documents
  • 00:42:30 – Everything looks like a chat box
  • 00:44:26 – Finding the right fit for a client and customer
  • 00:49:05 – What are questions you ask a new client now?
  • 00:51:54 – Air Canada anecdote
  • 00:56:20 – How do you stay up to date on these topics?
  • 01:01:03 – What are you excited about in the world of Python?
  • 01:03:22 – What do you want to learn next?
  • 01:04:58 – How can people follow your work online?
  • 01:05:31 – IndyPy
  • 01:07:13 – Thanks and goodbye

Show Links:

    • Transforming Agricultural Data with AI — Six Feet Up
    • Build ChatGPT-like Apps with AI — Six Feet Up
    • Innovate with AI: Build ChatGPT-like Apps - YouTube
    • What is retrieval-augmented generation? - IBM Research Blog
    • rally-llm-presentation - sixfeetup - GitHub
    • Python Prompt Toolkit 3.0 — Documentation
    • Chroma - the AI-native open-source embedding database
    • Embeddings and Vector Databases With ChromaDB – Real Python
    • LangChain
    • Build an LLM RAG Chatbot With LangChain – Real Python
    • Air Canada must pay after chatbot lies to grieving passenger - The Register
    • I’d Buy That for a Dollar: Chevy Dealership’s AI Chatbot Goes Rogue
    • Omnivore
    • TLDR AI - Get smarter about AI in 5 minutes
    • Tech Brew
    • Simon Willison’s Weblog
    • llm: Access large language models from the command-line - simonw - GitHub
    • PyCon US 2024
    • Syntorial: The Ultimate Synthesizer Tutorial
    • Blog — Six Feet Up
    • Calvin Hendryx-Parker - LinkedIn
    • Eclipse Insights: How AI is Transforming Solar Astronomy - YouTube

Level up your Python skills with our expert-led courses:

      • Sneaky REST APIs With Django Ninja
      • How to Work With a PDF in Python
      • Command Line Interfaces in Python

Support the podcast & join our community of Pythonistas

The Real Python Podcast, by Real Python
