Data Science at Home

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)



In this episode I speak about data transformation frameworks available for the data scientist who writes Python code.

The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes increase and distributed algorithms are in place (following a map-reduce paradigm of computation), Pandas no longer performs as expected, and other frameworks come into play.
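To make the map-reduce idea concrete, here is a minimal sketch in plain Python of a grouped mean computed partition by partition. It is not how Dask, Ray, Modin or RAPIDS are implemented; it only illustrates the general pattern of splitting the data, computing partial results independently (map), and merging them (reduce). The toy data and all function names (`split`, `map_partition`, `reduce_partials`, `grouped_mean`) are made up for this illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy records: (key, value) pairs standing in for rows of a large table.
ROWS = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0), ("a", 5.0)]


def split(rows, n_parts):
    """Split rows into n_parts partitions (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_partition(rows):
    """Map step: partial (sum, count) per key, computed on one partition only."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def reduce_partials(partials):
    """Reduce step: merge the partial sums and counts, then derive the mean per key."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (part_sum, part_count) in partial.items():
            total[key][0] += part_sum
            total[key][1] += part_count
    return {key: s / c for key, (s, c) in total.items()}


def grouped_mean(rows, n_parts=3):
    """Grouped mean via map-reduce; threads stand in for distributed workers."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_partition, split(rows, n_parts)))
    return reduce_partials(partials)


if __name__ == "__main__":
    print(grouped_mean(ROWS))
```

The mean is carried as a (sum, count) pair rather than averaged per partition, because an average of per-partition averages would be wrong when partitions hold different numbers of rows per key. No partition ever needs the whole table in memory, which is the property that matters once the data outgrows a single machine.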

In this episode I explain which frameworks are the closest equivalents to Pandas in big data contexts.

Don't forget to join our Discord channel to comment on previous episodes or propose new ones.

 

This episode is supported by Amethix Technologies

Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

 

References

  • Pandas - a fast, powerful, flexible and easy to use open source data analysis and manipulation tool - https://pandas.pydata.org/

  • Modin - Scale your pandas workflows by changing one line of code - https://github.com/modin-project/modin

  • Dask - advanced parallelism for analytics - https://dask.org/

  • Ray - a fast and simple framework for building and running distributed applications - https://github.com/ray-project/ray

  • RAPIDS - GPU data science - https://rapids.ai/


    Data Science at Home, by Francesco Gadaleta
