Kodsnack

Kodsnack 654 - German-style strings, with Matt Topol

Fredrik talks to Matt Topol about Arrow and how the Arrow ecosystem is evolving. Arrow is an open source, columnar, in-memory data format designed for efficient data processing and analytics. In practice, this means data can be passed between tools without being transformed, and ideally without even being copied.
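To make "passing data without transforming it" a little more concrete, here is a minimal pure-Python sketch of how the Arrow columnar format lays out a column of nullable strings: a validity bitmap, an offsets buffer and one contiguous data buffer. It is an illustration only, not the Arrow library API. The function names are hypothetical, and the sketch skips the buffer padding and alignment that real Arrow implementations use.

```python
from array import array
from typing import List, Optional


def build_string_column(values: List[Optional[str]]):
    """Build the three buffers of an Arrow-style variable-length string column."""
    # One validity bit per slot, least significant bit first; 1 means "not null".
    validity = bytearray((len(values) + 7) // 8)
    # n + 1 offsets; value i spans data[offsets[i]:offsets[i + 1]].
    # 'i' is a C int, 32 bits on mainstream platforms (Arrow's Utf8 uses int32 offsets).
    offsets = array("i", [0])
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            data += v.encode("utf-8")
        offsets.append(len(data))  # a null slot occupies zero bytes of data
    return bytes(validity), offsets, bytes(data)


def get_value(validity: bytes, offsets: array, data: bytes, i: int):
    """Return slot i as a zero-copy memoryview, or None if the slot is null."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    # Slicing a memoryview references the shared buffer instead of copying it.
    return memoryview(data)[offsets[i]:offsets[i + 1]]
```

For `["Arrow", None, "GPU"]` the offsets are `[0, 5, 5, 8]` and all string bytes sit back to back in a single buffer, so any consumer that understands the layout can read a value by slicing rather than by converting the whole column.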

What makes the ecosystem grow, and why is it very cool to have Arrow on the GPU? What is the connection between Arrow, machine learning, and Hugging face? Matt emphasizes the value of open standards: even when they work with or within more closed systems, they can help open things up and bring about more modular solutions, so that developers can focus on doing their core area really well.

This episode can be seen as a follow-up to episode 567, where Matt first joined to discuss everything Arrow.

Recorded during Øredev 2024.

Thank you Cloudnet for sponsoring our VPS!

Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at [email protected] if you want to write longer. We read everything we receive.

If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi.

Links
  • Matt
  • Matt's Øredev 2023 talks: State of the Apache Arrow ecosystem: How your project can leverage Arrow! and Leveraging Apache Arrow for ML workflows
  • Previous episodes with Matt
  • Øredev 2024
  • Matt's Øredev 2024 talks - on Arrow ADBC and Composable and modular data systems
  • ADBC - Arrow database connectivity
  • Arrow
  • Snowflake
  • Snowflake drivers for ADBC
  • Bigquery
  • The Bigquery driver
  • Microsoft Fabric
  • Duckdb
  • Postgres
  • SQLite
  • Arrow flight - RPC framework for services based on Arrow data
  • Arrow flight SQL
  • Microsoft Power BI
  • Velox
  • Apache datafusion
  • Query planning
  • Substrait - query IR
  • Polaris
  • Libcudf
  • Nvidia RAPIDS
  • Pytorch
  • Tensorflow
  • Arrow device interface
  • DLPack - in-memory tensor structure
  • Tensors
  • Nanoarrow
  • Voltron data - where Matt used to work. He's now at Columnar
  • Theseus GPU compute engine
  • The composable data management system manifesto
  • Support us on Ko-fi!
  • Matt's book - In-memory analytics with Apache Arrow
  • Spark
  • Spark connect
  • RPC
  • UDFs
  • Photon
  • Datafusion
  • Apache Cassandra
  • ODBC
  • JDBC
  • R - programming language for statistical computing
  • Hugging face
  • Ray
  • Stringview - "German-style strings"
  • Scaling up with R and Arrow - the book on using Arrow with R

Titles
  • It's gotten a lot bigger
  • The bones of it are in the repo
  • (Powered by ADBC)
  • Individual compute components
  • Feed it substrate
  • Where the ecosystem is going
  • Arrow on the GPU
  • The data stays on the GPU
  • A forced copy
  • Leverage that device interface
  • Without forcing the copy
  • Shy of that last mile
  • Turtles all the way down
  • The guy who said yes
  • German-style strings
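"German-style strings" refers to Arrow's string view layout (StringView, or Utf8View in the specification). The plain-Python sketch below assumes the layout as described in the Arrow columnar specification: every value is a fixed 16-byte little-endian view whose first 4 bytes hold the length. Strings of up to 12 bytes are stored inline in the remaining 12 bytes. Longer strings instead store a 4-byte prefix, a buffer index and an offset into a separate data buffer. The function names are hypothetical, and for simplicity the sketch only ever uses one data buffer.

```python
import struct
from typing import List

INLINE_LIMIT = 12  # strings up to 12 bytes fit entirely inside the 16-byte view


def make_view(s: bytes, buffers: List[bytearray]) -> bytes:
    """Encode one 16-byte view, appending long strings to buffers[0]."""
    if len(s) <= INLINE_LIMIT:
        # int32 length followed by the string itself, zero-padded to 12 bytes.
        return struct.pack("<i12s", len(s), s)
    buf_index = 0  # hypothetical simplification: a single data buffer
    offset = len(buffers[buf_index])
    buffers[buf_index] += s
    # int32 length, 4-byte prefix, int32 buffer index, int32 offset.
    return struct.pack("<i4sii", len(s), s[:4], buf_index, offset)


def read_view(view: bytes, buffers: List[bytearray]) -> bytes:
    """Decode a view, following the buffer index and offset only for long strings."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_LIMIT:
        return view[4:4 + length]
    _prefix, buf_index, offset = struct.unpack_from("<4sii", view, 4)
    return bytes(buffers[buf_index][offset:offset + length])


def definitely_differ(a: bytes, b: bytes) -> bool:
    """Compare length and 4-byte prefix (the first 8 bytes of each view).

    True means the strings are certainly different; False means they might
    be equal and a full comparison is still needed.
    """
    return a[:8] != b[:8]
```

The fixed-size views plus the inline prefix let many comparisons and filters be decided from the views alone, without chasing offsets into the data buffers. This is a large part of why the layout is attractive for query engines.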
Kodsnack by Kristoffer, Fredrik, Tobias