Regular Programming

About Data Pipelines


Lars dove into data pipelines, and emerged bearing arrows and wishing for a lot fewer copies.

What is there to think about regarding data pipelines, and what is interesting about them?

Which tools are out there, and why might you want to use them?

Why all this talk about making fewer copies of data?

What does Lars' current ideal pipeline look like, and where does Elixir fit in?

Links

  • Matt Topol
  • Apache Arrow
  • Large language models
  • Vector search
  • BigQuery
  • sed
  • AWK
  • jq
  • Replacing Hadoop with bash - "Command-line Tools can be 235x Faster than your Hadoop Cluster"
  • Hadoop
  • MapReduce
  • Unix pipes
  • Directed acyclic graph
  • tee - to "materialize in-between states"
  • Apache Beam
  • Apache Spark
  • Apache Flink
  • Apache Pulsar
  • Airbyte - shoves data between systems using connectors
  • Cronjob
  • Fivetran - Airbyte competitor
  • Apache Airflow
  • ETL - Extract, transform, load
  • Designing Data-Intensive Applications
  • Stream processing
  • Ephemerality
  • Data lake
  • Data warehouse
  • The People's Front of Judea
  • dbt - SQL-SQL batch-work-thingy
  • SQL with Jinja templates
  • Snowflake - data warehouse thing
  • Scala
  • Broadway
  • Oban - "robust job processing for Elixir"
  • Dashbit
  • pandas - Python data library
  • APL
  • Arrow Flight
  • gRPC
  • DataFusion - query execution engine
  • Polars - "DataFrames in Rust"
  • Explorer - built on top of Polars
  • Voltron Data
  • The Composable Codex
  • PyArrow - Arrow bindings for Python

Quotes

  • I've been reading a lot about data pipelines
  • What's so special about data pipelines?
  • There's a lot of special tooling
  • There's a lot of bad, bad tooling
  • Less than optimal tooling
  • Converging on something bigger
  • He got me eventually
  • All of your steps in one bucket
  • What tools do you associate with data?
  • I inherited a data pipeline
  • BashReduce
  • Iterate on the L and the T
  • The modern data stack
  • And then you demand more work
  • No unnecessary copies
  • Barely a copy
  • Reconnecting with my Python roots
Regular Programming, by Lars Wikman and Andreas Ekeroot

