The Real Python Podcast

Using NumPy and Linear Algebra for Faster Python Code



Are you still using loops and lists to process your data in Python? Have you heard of a Python library with optimized data structures and built-in operations that can speed up your data science code? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to share secrets for harnessing linear algebra and NumPy for your projects.

Jodie details how most people begin their data science journey using loops to iterate over values and apply operations sequentially. We talk about how loops are friendly for beginners, being clear to read and easy to debug, but unfortunately don’t scale well, especially with large amounts of data.
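As a rough illustration of the contrast described above, here is a minimal sketch that adds two sequences first with a plain Python loop and then with a single NumPy expression. The function names `add_with_loop` and `add_vectorized` and the sample data are hypothetical and chosen for this example; they are not taken from the episode.

```python
import numpy as np


def add_with_loop(xs, ys):
    """Add two equal-length sequences element by element with a Python loop."""
    result = []
    for x, y in zip(xs, ys):
        result.append(x + y)
    return result


def add_vectorized(xs, ys):
    """Add the same data as NumPy arrays; the loop runs inside NumPy's compiled code."""
    return np.asarray(xs) + np.asarray(ys)


# Hypothetical sample data for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]

loop_result = add_with_loop(xs, ys)
vector_result = add_vectorized(xs, ys)

print(loop_result)
print(vector_result)
```

Both versions produce the same values. On three elements the difference in speed is negligible; the vectorized form tends to pay off as the arrays grow.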

Jodie shares some of the basics of linear algebra and how to organize data into vectors. We talk about how the NumPy library leverages those concepts to improve data processing. We discuss how the library includes operations for vector and matrix addition and subtraction, and why these operations are more efficient than loops. We also cover how NumPy stores arrays in memory and when working with them is faster vs when it’s not.
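The following sketch shows, under simple assumptions, what matrix addition and subtraction look like in NumPy and how to inspect an array's memory layout. The arrays and variable names are made up for this example, and the choice of `float64` is an assumption made so the byte counts are predictable.

```python
import numpy as np

# Six 64-bit floats stored in one contiguous block of memory.
values = np.arange(6, dtype=np.float64)
print(values.itemsize, values.nbytes)  # bytes per element, total bytes

# Reshaping gives a 2x3 matrix over the same memory.
matrix = values.reshape(2, 3)
other = np.ones((2, 3), dtype=np.float64)

# Elementwise matrix addition and subtraction, with no explicit loop.
total = matrix + other
difference = matrix - other

# Strides: bytes to step to reach the next row and the next column.
print(matrix.strides, matrix.flags["C_CONTIGUOUS"])

# A column slice is a view whose elements are not adjacent in memory.
column = matrix[:, 0]
print(column.strides, column.flags["C_CONTIGUOUS"])
```

A row-major array keeps each row's elements side by side, so operations that walk along rows read memory sequentially. A column view like `column` skips over the other elements in each row, which is one case where access patterns can be less favorable.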

Course Spotlight: Data Cleaning With pandas and NumPy

In this video course, you’ll learn how to clean up messy data using pandas and NumPy. You’ll become equipped to deal with a range of problems, such as missing values, inconsistent formatting, malformed records, and nonsensical outliers.

Topics:

  • 00:00:00 – Introduction
  • 00:02:35 – Vectorize all the things! - PyCon UK 2022 Talk
  • 00:06:39 – Becoming familiar with linear algebra
  • 00:09:05 – Beginners start with loops
  • 00:11:25 – Starting with basic linear algebra
  • 00:12:25 – The basic unit of a vector
  • 00:18:06 – NumPy representing vectors in Python
  • 00:23:25 – Sponsor: InfluxDB
  • 00:24:13 – Block management
  • 00:25:54 – Replacing a loop with vector-based operations
  • 00:34:06 – NumPy broadcasting
  • 00:38:52 – Approximating nearest neighbors
  • 00:43:49 – Video Course Spotlight
  • 00:45:15 – Solving the problem
  • 00:46:44 – Getting rid of nested loops
  • 00:48:54 – A peek under the hood
  • 00:53:28 – How arrays vs lists are stored in memory
  • 01:00:24 – Considering a GPU
  • 01:03:37 – Real Python resources on the subject
  • 01:04:08 – Upcoming talks and conferences
  • 01:07:31 – Thanks and goodbye

Show Links:

    • Vectorize all the things! How basic linear algebra can speed up your data science code - YouTube
    • Introduction to Linear Algebra, 5th Edition
    • Linear Algebra - Mathematics - MIT OpenCourseWare
    • Linear Algebra and Learning from Data
    • Linear Algebra in Python: Matrix Inverses and Least Squares
    • NumPy: the absolute basics for beginners - NumPy Manual
    • Broadcasting — NumPy v1.24 Manual
    • spotify/annoy: Approximate Nearest Neighbors in C++/Python optimized
    • Look Ma, No For-Loops: Array Programming With NumPy – Real Python
    • NumPy Tutorial: Your First Steps Into Data Science in Python – Real Python
    • How to Iterate Over Rows in pandas, and Why You Shouldn’t – Real Python
    • RADAR: Thrive in the era of data - DataCamp
    • Vectorize all the things! Using linear algebra and NumPy to make your Python code lightning fast. - Python Web Conference 2023
    • Jodie Burchell - PyCon US 2023
    • Jodie Burchell’s Blog - Standard error
    • Jodie Burchell 🇦🇺🇩🇪 (@t_redactyl) - Twitter
    • Jodie Burchell 🇦🇺🇩🇪 (@[email protected]) - Fosstodon
    • JetBrains: Essential tools for software developers and teams

Level up your Python skills with our expert-led courses:

      • Data Cleaning With pandas and NumPy
      • Histogram Plotting in Python: NumPy, Matplotlib, Pandas & Seaborn
      • Using NumPy's np.arange() Effectively

Support the podcast & join our community of Pythonistas


The Real Python Podcast, by Real Python
