Listen to The Right Track: https://www.heavybit.com/library/podcasts/the-right-track/ep-6-domain-expertise-with-laurie-voss-of-netlify/
Transcript
Stefania: I wanted to maybe shift a little bit in terms of how the industry is changing before we move on to how you have seen data cultures being built and data trust being undermined and all those things.
Can you talk a little bit about how you see the industry has changed in the past few years?
Laurie: Yeah. I wrote a blog post about this recently.
I think it's probably the thing that spurred you to invite me to this podcast in the first place.
Stefania: Correct.
Laurie: Which is that about nine months ago, I was introduced to DBT. DBT has been around for a while now, I think five or six years, but it was new to me nine months ago.
And it definitely seems to be exponentially gaining in momentum at the moment.
I hear more and more people are using it and see more and more stuff built on top of it.
And the analogy that I made in the blog post is that, as a web developer, it felt kind of like Rails in 2006.
Ruby on Rails very fundamentally changed how web development was done, because web development prior to that was everybody had sort of figured out some architecture for their website, and it worked okay. But it meant that every time you hired someone into a company, you had to teach them your architecture. And it would take them a couple of weeks, or if it was complicated, a couple of months, to figure out your architecture and become productive. And Ruby on Rails changed that.
Ruby on Rails was you hire someone and you say, "Well, it's a Rails app."
And on day one, they're productive.
They know how to change Rails apps.
They know how to configure them.
They know how to write the HTML and CSS and every other thing.
And taking the time to productivity for a new hire down from three months to one month, times a million developers, is a gigantic amount of productivity that you have unlocked.
The economic impact of that is huge. And DBT feels very similar.
It's not doing anything that we weren't doing before.
It's not doing anything that you couldn't do if you were rolling your own, but it is a standard and it works very well and it handles the edge cases and it's got all of the complexities accounted for.
So you can start with DBT and be pretty confident that you're not going to run into something that DBT can't do.
And it also means that you can hire people who already know DBT.
We've done it at Netlify. We've hired people with experience in DBT and they were productive on day one.
They were like, "Cool. I see that you've got this model. It's got a bug. I've committed a change. I've added some tests. We have fixed this data model."
What happens on day two? It's great.
The value of a framework is that the framework exists, more than any specific technical advantage of that framework.
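To make the "model plus tests" workflow concrete for readers outside the DBT world: DBT models are essentially SQL SELECT statements, and its schema tests (unique, not_null, and so on) compile down to queries that return the rows violating a constraint, passing when nothing comes back. The snippet below is a minimal Python-and-SQLite sketch of that idea, not DBT itself; the table and test names are made up for illustration.

```python
import sqlite3

# Illustrative stand-in for a warehouse: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],  # duplicate order_id, NULL customer_id
)

# DBT-style schema tests boil down to queries that return the *violating* rows;
# a test passes when the query returns nothing.
tests = {
    "unique_order_id": """
        SELECT order_id FROM orders
        GROUP BY order_id HAVING COUNT(*) > 1
    """,
    "not_null_customer_id": """
        SELECT * FROM orders WHERE customer_id IS NULL
    """,
}

for name, sql in tests.items():
    failures = conn.execute(sql).fetchall()
    status = "PASS" if not failures else f"FAIL ({len(failures)} bad rows)"
    print(f"{name}: {status}")
```

The value of the framework, as Laurie describes, is that these checks live in a standard place with a standard syntax, so a new hire can add or fix one on day one instead of learning a homegrown harness.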
Stefania: Yeah. I love that positioning of DBT.
Do you have any thoughts on why this has not happened in the data space before?
We have a lot of open source tools already built.
We had a huge rise in people using Spark and Hadoop and all those things for their data infrastructure a while ago, maybe 10 years ago, and that's still happening in some of the companies.
What are your thoughts on why this is happening now?
Laurie: I think it was inevitable.
I mean, the big data craze was 10 years ago.
I was recently reminded by somebody that I wrote a blog post literally 10 years ago, on July 15th, 2011.
I was like, statisticians are going to be the growth career for the next 10 years, because all I see is people collecting data blindly.
They're just creating data warehouses, pouring logs into them, and then doing the simplest analyses on them.
They're just like counting them up.
They're not doing anything more complicated than counting them up.
A lot of companies in 2010 made these huge investments and then were like, "What now?"
And they were like, "Well, we sort of figured we'd be able to do some kind of analysis, but we don't know how. This data is enormous. It's very difficult to do."
It was inevitable that people would be trying to solve this problem.
And lots of people rolled their own over and over.
Programmers are programmers, so when they find themselves rolling their own at their third job in a row, that's usually when they start writing a framework.
And that seems to be what DBT emerged from.
I think it's natural that it emerged now. I think this is how long it takes.
This is how much iteration the industry needed to land at this.
Stefania: Yeah. That's a good insight.
I maybe want to touch on then also another thing that a lot of people talk about.
And ultimately, I think what most companies want to strive for, although it remains to be defined what it literally means, is self-serve analytics.
What does that mean to you and how does that fit into the DBT world?
Laurie: I have what might be a controversial opinion about self-serve analytics, which is that I don't think it's really going to work.
There are a couple of problems that make self-serve analytics difficult.
What people are focusing on right now are like just the pure technical problems.
One of the problems with self-serve analytics is that it's just hard to do.
You have to have enormous amounts of data.
If people are going to be exploratory about the data, then the database needs to be extremely fast.
If queries take 10 minutes, then you can't do ad hoc data exploration.
Nobody but a data scientist is going to hang around for 10 minutes waiting for a query to finish.
Stefania: Finishing your query is the new "it's compiling."
Laurie: But even when you solve that problem, and I feel like a lot of companies now solve that problem, you run into the next problem, which is, what question do I ask?
What is the sensible way to ask?
And also, where is it?
Discovery is another thing.
If you've instrumented properly, you're going to have enormous numbers of data sources, even if you're using DBT.
And even if they're all neatly arrayed in very nicely named tables and the tables have documentation, you're going to have 100, 200, 300 tables, right?
You have all sorts of forms of data.
And unless somebody goes through every table by name and tries to figure out what's in that table and whether it answers their question, they're not going to find it.
The data team knows where the data is, but it's very hard to make that data automatically discoverable.
I don't think people have solved that problem.
Even if you solved that...
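The discoverability problem Laurie describes is partly what DBT's own documentation artifacts try to chip away at: after `dbt docs generate`, DBT writes a `target/manifest.json` describing every model and its documentation. The sketch below is a hypothetical helper, not part of DBT, that searches that file for models whose name or description mentions a keyword; the exact manifest layout varies across DBT versions, so treat the field names here as assumptions.

```python
import json
import sys

# Hypothetical helper, not part of DBT: search the artifact that `dbt docs generate`
# writes (target/manifest.json) for models matching a keyword. Field names are
# assumptions and may differ across DBT versions.
MANIFEST_PATH = "target/manifest.json"


def search_models(keyword: str) -> None:
    with open(MANIFEST_PATH) as f:
        manifest = json.load(f)

    keyword = keyword.lower()
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        haystack = f"{node.get('name', '')} {node.get('description', '')}".lower()
        if keyword in haystack:
            print(f"{node['name']}: {node.get('description') or '(no description)'}")


if __name__ == "__main__":
    # e.g. `python search_models.py orders` from the root of a DBT project
    search_models(sys.argv[1] if len(sys.argv) > 1 else "orders")
```

Even with tooling like this, the underlying point stands: keyword search over table names and descriptions is a long way from knowing which of 300 models actually answers your question, which is the knowledge that still lives with the data team.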