The Python Podcast.__init__

Speed Up Your Python Data Applications By Parallelizing Them With Bodo



Summary

The speed of Python is a subject of constant debate, but there is no denying that for compute-heavy work it is not the optimal tool. Rather than rewriting or re-architecting your data-oriented applications, you can rely on the compiler that the team at Bodo built to do the optimization for you. In this episode Ehsan Totoni explains how they are able to translate pure Python into massively parallel processes that are optimized for high-performance computing systems.

Announcements
  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host, as usual, is Tobias Macey, and today I’m interviewing Ehsan Totoni about Bodo, an inferential compiler for Python that automatically parallelizes your data-oriented projects.

Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you describe what Bodo is and the story behind it?
  • What are some of the use cases that it is being applied to?
  • What are the motivating factors for something like Dask or Ray as compared to Bodo?
  • What are the software patterns that contribute to slowdowns in data processing code?
    • What are some of the ways that the compiler is able to optimize those operations?
  • Can you describe how Bodo is implemented?
    • How does Bodo process the Python code for compiling to the optimized form?
    • What are the compilation techniques for understanding the semantics of the code being processed?
    • How do you manage packages that rely on C extensions?
    • What do you use as an intermediate representation for translating into the optimized output?
  • What is the workflow for applying Bodo to a Python project?
    • What debugging utilities does it provide for identifying any errors that occur due to the added parallelism?
  • What kind of support does Bodo have for optimizing machine learning projects (e.g. using PyTorch/TensorFlow/MXNet/etc.)?
  • When working with a workflow orchestrator such as Dagster or Airflow, what would the integration process look like for taking advantage of the optimized Bodo output?
  • What are the most interesting, innovative, or unexpected ways that you have seen Bodo used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Bodo?
  • When is Bodo the wrong choice?
  • What do you have planned for the future of Bodo?

Keep In Touch
  • LinkedIn
  • @EhsanTn on Twitter
  • ehsantn on GitHub

Picks
  • Tobias
    • Paracord Crafts
  • Ehsan
    • [

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Links
  • Bodo
    • Data Engineering Podcast Episode
  • University of Illinois Urbana-Champaign
  • HPC
  • MPI
  • Elastic Fabric Adapter
  • All-to-All Communication
  • Dask
    • Data Engineering Podcast Episode
  • Ray
    • Podcast Episode
  • Pandas Extension Arrays
    • Podcast Episode
  • GeoPandas
  • Numba
  • LLVM
  • scikit-learn
  • Horovod
  • Dagster
    • Podcast.__init__ Episode
    • Data Engineering Podcast Episode
  • Airflow
    • Podcast Episode
  • IPython Parallel
  • Parquet

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

The Python Podcast.__init__ by Tobias Macey
