Data Science at Home

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)


In this episode I speak about data transformation frameworks available for the data scientist who writes Python code.

The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes grow and the workload calls for distributed algorithms (following a map-reduce paradigm of computation), Pandas no longer performs as expected, and other frameworks come into play.
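As a rough illustration of the map-reduce paradigm mentioned above, here is a minimal sketch in plain Python that computes a mean over chunks of data. It is not taken from the episode and does not use Pandas, Dask, Ray, Modin or RAPIDS; the function names (`chunked`, `map_partial`, `reduce_partials`, `mapreduce_mean`) are hypothetical and chosen only for illustration. The chunks are processed sequentially here, whereas a distributed framework would typically spread them across cores or machines.

```python
from functools import reduce


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_partial(chunk):
    """Map step: summarise one chunk as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: merge two (sum, count) pairs into one."""
    return (a[0] + b[0], a[1] + b[1])


def mapreduce_mean(values, chunk_size):
    """Compute the mean by mapping over chunks, then reducing the partial results."""
    partials = map(map_partial, chunked(values, chunk_size))
    total, count = reduce(reduce_partials, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot compute the mean of an empty sequence")
    return total / count


if __name__ == "__main__":
    values = list(range(1, 11))  # 1, 2, ..., 10
    print(mapreduce_mean(values, chunk_size=3))
```

The point of the design is that each chunk only needs to produce a small summary, so the full dataset never has to be processed in one place; only the partial (sum, count) pairs are combined at the end.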

In this episode I explain the frameworks that are the closest equivalents to Pandas in big data contexts.

Don't forget to join our Discord channel to comment on previous episodes or propose new ones.

 

This episode is supported by Amethix Technologies

Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

 

References

  • Pandas - a fast, powerful, flexible and easy to use open source data analysis and manipulation tool - https://pandas.pydata.org/

  • Modin - Scale your pandas workflows by changing one line of code - https://github.com/modin-project/modin

  • Dask - advanced parallelism for analytics - https://dask.org/

  • Ray - a fast and simple framework for building and running distributed applications - https://github.com/ray-project/ray

  • RAPIDS - GPU data science - https://rapids.ai/

Data Science at Home, by Francesco Gadaleta
