Data Science at Home

What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)



In this episode I speak about data transformation frameworks available for the data scientist who writes Python code.

The usual suspect is clearly Pandas, the most widely used library and the de facto standard. However, when data volumes grow and the computation has to be distributed (following a map-reduce paradigm), Pandas no longer performs as expected, and other frameworks come into play.
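To make the map-reduce idea concrete, here is a minimal sketch that is not taken from the episode. It computes a per-key mean by processing a DataFrame in chunks ("map") and merging the partial results ("reduce"), which is roughly the pattern that distributed DataFrame libraries automate. The column names `key` and `value`, the helper names and the chunk size are all illustrative assumptions.

```python
import pandas as pd


def partial_stats(chunk: pd.DataFrame) -> pd.DataFrame:
    # "Map" step: per-chunk sum and count for each key.
    return chunk.groupby("key")["value"].agg(["sum", "count"])


def combine(partials: list) -> pd.Series:
    # "Reduce" step: add up the partial sums and counts, then divide.
    total = pd.concat(partials).groupby(level=0).sum()
    return total["sum"] / total["count"]


# Toy data standing in for a dataset too large to process in one go.
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "b", "a", "c"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

chunk_size = 2  # hypothetical; real systems choose partition sizes themselves
chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

chunked_mean = combine([partial_stats(c) for c in chunks])
direct_mean = df.groupby("key")["value"].mean()
```

Both `chunked_mean` and `direct_mean` hold the same per-key averages. The difference is that each chunk could be handled by a separate worker, since a mean can be rebuilt from partial sums and counts.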

In this episode I explain which frameworks are the closest equivalents to Pandas in big data contexts.

Don't forget to join our Discord channel to comment on previous episodes or propose new ones.

 

This episode is supported by Amethix Technologies

Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence.

 

References

  • Pandas - a fast, powerful, flexible and easy to use open source data analysis and manipulation tool - https://pandas.pydata.org/

  • Modin - Scale your pandas workflows by changing one line of code - https://github.com/modin-project/modin

  • Dask - advanced parallelism for analytics - https://dask.org/

  • Ray - a fast and simple framework for building and running distributed applications - https://github.com/ray-project/ray

  • RAPIDS - GPU data science - https://rapids.ai/

Data Science at Home, by Francesco Gadaleta
