Data Engineering Podcast

Revolutionizing Python Notebooks with Marimo


Summary
In this episode of the Data Engineering Podcast, Akshay Agrawal from Marimo discusses the new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements for interactive computing. He covers the challenges of traditional Jupyter notebooks, such as hidden state and lack of interactivity, and how Marimo addresses them with reactive execution and a Python-native file format. Akshay also surveys the broader landscape of programmatic notebooks, comparing Marimo to tools like Jupyter, Streamlit, and Hex, and highlighting its approach of creating data apps directly from notebooks without separate app development. The conversation covers Marimo's technical architecture, its community-driven development, and future plans, including a commercial offering and enhanced AI integration, with an emphasis on Marimo's role in bridging the gap between data exploration and production-ready applications.
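
To make the idea of reactive execution concrete, here is a minimal, standard-library-only sketch of how a notebook could decide which cells to re-run after one cell changes. It is an illustration of the general technique (static analysis of variable definitions and uses, plus a topological sort over the resulting DAG) and is not Marimo's actual implementation; the function names `names_defined_and_used`, `build_graph`, and `cells_to_rerun`, and the example cell names, are hypothetical.

```python
import ast
from graphlib import TopologicalSorter


def names_defined_and_used(source):
    """Return (defined, used) sets of variable names for one cell's source.

    Toy analysis: every assigned name counts as "defined" (including
    comprehension targets), and every other referenced name counts as "used".
    """
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used - defined


def build_graph(cells):
    """Map each cell name to the set of cells whose variables it reads."""
    info = {name: names_defined_and_used(src) for name, src in cells.items()}
    owner = {}
    for name, (defined, _) in info.items():
        for var in defined:
            owner[var] = name
    return {
        name: {owner[var] for var in used if var in owner}
        for name, (_, used) in info.items()
    }


def cells_to_rerun(cells, changed):
    """Return the changed cell and everything downstream of it, in run order."""
    graph = build_graph(cells)
    # static_order() yields every cell after all of its dependencies.
    order = list(TopologicalSorter(graph).static_order())
    stale = {changed}
    for name in order:
        if graph[name] & stale:
            stale.add(name)
    return [name for name in order if name in stale]


# Hypothetical four-cell notebook, keyed by cell name.
cells = {
    "load": "rows = [1, 2, 3]",
    "scale": "factor = 10",
    "transform": "scaled = [r * factor for r in rows]",
    "report": "total = sum(scaled)",
}

print(cells_to_rerun(cells, "scale"))
```

In this sketch, editing the `scale` cell marks `transform` and `report` as stale because they depend on `factor` directly or transitively, while `load` is left alone. Deriving the run order from variable dependencies rather than from the order in which cells happened to be executed is what removes the hidden-state problem described above.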

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.
  • Your host is Tobias Macey and today I'm interviewing Akshay Agrawal about Marimo, a reusable and reproducible Python notebook environment
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Marimo is and the story behind it?
  • What are the core problems and use cases that you are focused on addressing with Marimo?
    • What are you explicitly not trying to solve for with Marimo?
  • Programmatic notebooks have been around for decades now. Jupyter was largely responsible for making them popular outside of academia. How have the applications of notebooks changed in recent years?
    • What are the limitations that have been most challenging to address in production contexts?
  • Jupyter has long had support for multi-language notebooks/notebook kernels. What is your opinion on the utility of that feature as a core concern of the notebook system?
  • Beyond notebooks, Streamlit and Hex have become quite popular for publishing the results of notebook-style analysis. How would you characterize the feature set of Marimo for those use cases?
  • For a typical data team that is working across data pipelines, business analytics, ML/AI engineering, etc. How do you see Marimo applied within and across those contexts?
  • One of the common difficulties with notebooks is that they are largely a single-player experience. They may connect into a shared compute cluster for scaling up execution (e.g. Ray, Dask, etc.). How does Marimo address the situation where a data platform team wants to offer notebooks as a service to reduce the friction to getting started with analyzing data in a warehouse/lakehouse context?
  • How are you seeing teams integrate Marimo with orchestrators (e.g. Dagster, Airflow, Prefect)?
  • What are some of the most interesting or complex engineering challenges that you have had to address while building and evolving Marimo?
  • What are the most interesting, innovative, or unexpected ways that you have seen Marimo used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Marimo?
  • When is Marimo the wrong choice?
  • What do you have planned for the future of Marimo?
Contact Info
  • LinkedIn
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
  • Marimo
  • Jupyter
  • IPython
  • Streamlit
    • Podcast.__init__ Episode
  • Vector Embeddings
  • Dimensionality Reduction
  • Kaggle
  • Pytest
  • PEP 723 script dependency metadata
  • MATLAB
  • VisiCalc
  • Mathematica
  • RMarkdown
  • RShiny
  • Elixir Livebook
  • Databricks Notebooks
  • Papermill
  • Pluto - Julia Notebook
  • Hex
  • Directed Acyclic Graph (DAG)
  • Sumble - Kaggle founder Anthony Goldbloom's startup
  • Ray
  • Dask
  • Jupytext
  • nbdev
  • DuckDB
    • Podcast Episode
  • Iceberg
  • Superset
  • jupyter-marimo-proxy
  • JupyterHub
  • Binder
  • Nix
  • AnyWidget
  • Jupyter Widgets
  • Matplotlib
  • Altair
  • Plotly
  • DataFusion
  • Polars
  • MotherDuck
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Data Engineering Podcast, by Tobias Macey