hi everyone Welcome to our event this event is brought to you by DataTalks.Club which is a community of people who love
data and we have weekly events and today's is one of such events and I guess we
are also a community of people who like to wake up early if you're from the states right Christopher or maybe not so
much because this is the time we usually have uh uh our events uh for our guests
and presenters from the states we usually do it in the evening of Berlin time but yes unfortunately it kind of
slipped my mind but anyways we have a lot of events you can check them in the
description like there's a link um I don't think there are a lot of them right now on that link but we will be
adding more and more I think we have like five or six uh interviews scheduled so um keep an eye on that do not forget
to subscribe to our YouTube channel this way you will get notified about all our future streams that will be as awesome
as the one today and of course very important do not forget to join our community where you can hang out with
other data enthusiasts during today's interview you can ask any question there's a pinned link in the live chat so click
on that link ask your question and we will be covering these questions during the interview now I will stop sharing my
screen and there's a message and Christopher it's from
you so we actually have this on YouTube but so they have not seen what you wrote
but there is a message to anyone who's watching this right now from Christopher saying hello everyone can I
call you Chris or okay I should I should look on YouTube then okay yeah but anyways you don't
need like you will need to focus on answering questions and
I'll be keeping an eye on all the questions so um
yeah if you're ready we can start I'm ready yeah and you prefer Christopher
not Chris right Chris is fine Chris is fine it's a bit shorter um
okay so this week we'll talk about DataOps again maybe it's a tradition that we talk about DataOps once per
year but we actually skipped one year because we haven't had
Chris for some time so today we have a very special guest Christopher Christopher is the co-founder CEO and
head chef or head cook at DataKitchen with 25 years of experience maybe this
is outdated because probably now you have more and maybe you stopped counting I
don't know but like with tons of years of experience in analytics and software engineering Christopher is known as the
co-author of the DataOps Cookbook and the DataOps Manifesto and it's not the
first time we have Christopher here on the podcast we interviewed him two years ago also about DataOps and this one
will be about DataOps so we'll catch up and see what actually changed in
these two years and yeah so welcome to the interview well thank you for having
me I'm happy to be here and talking all things related to DataOps and
why bother with DataOps and happy to talk about the company or what's changed
excited yeah so let's dive in so the questions for today's interview are prepared by Johanna Bayer as always
thanks Johanna for your help so before we start with our main topic for today
DataOps let's start with your background can you tell us about your career Journey so far and also for those who
have not listened to the previous podcast maybe you can talk
about yourself and also for those who did listen to the previous you can also maybe give a summary of what has changed
in the last two years so will do yeah so um my name is Chris so I guess I'm
a sort of an engineer so I spent about the first 15 years of my career in
software sort of working and building some AI systems some non-AI systems
at the US's NASA and MIT Lincoln Lab and then some startups and then
Microsoft and then about 2005 I got the data bug I think you know my
kids were small and I thought oh this data thing was easy and I'd be able to go home uh for dinner at 5 and life
would be fine um because I was a big you started your own company right and uh it didn't work out that way
and um what was interesting is for me the problem wasn't doing the
data like we had smart people who did data science and data engineering the act of creating things it was like the
systems around the data that were hard um it was really hard to not have
errors in production and I would sort of be driving to work and I had a BlackBerry at the time and I would not look at my
BlackBerry all morning I had this long drive to work and I'd sit in the parking lot and take a deep breath and
look at my BlackBerry and go uh oh are there going to be any problems today and if there weren't I'd walk in
very happy um and if there were I'd have to like brace myself um and you know and
then the second problem is the team I worked for we just couldn't go fast enough the customers were super
demanding they didn't care they always thought things should be faster and we were always behind and so um how
do you you know how do you live in that world where things are breaking left and right you're terrified of making errors
um and then second you just can't go fast enough um and it's pre-Hadoop era
right it's like before all this big data Tech yeah before this we were using
SQL Server um and we actually you know we had smart people so we
built an engine in SQL Server that made SQL Server a columnar
database so we built a columnar database inside of SQL Server um so
in order to make certain things fast and uh yeah it was really uh it's not
bad I mean the principles are the same right before Hadoop it's still a database there's still indexes there's
still queries um things like that at the time you would use OLAP
engines we didn't use those but you know those reports or models it's not that different um you know
we had a rack of servers instead of the cloud um so yeah and I think so what
I took from that was it's just hard to run a team of people to do data and analytics and it's not
really I took it from a manager perspective I started to read Deming and
think about the work that we do as a factory you know a factory that produces insight and not automobiles um
and so how do you run that factory so it produces things that are of good
quality and then second since I had come from software I've been very influenced
by the DevOps movement how you automate deployment how you run in an agile way how you
produce um how you change things quickly and how you innovate and so
those two things of like running you know a really good solid production line that has very low errors
um and then second changing that production line very very often they're kind of opposite right um and so
how do you as a manager how do you technically approach that and
then um 10 years ago when we started DataKitchen um we've always been a profitable company and so we started off
with some customers we started building some software and realized that we couldn't work any other way and that
the way we work wasn't understood by a lot of people so we had to write a book and a Manifesto to kind of share our
methods and then so yeah we've been in business now about a little over 10
years oh that's cool and uh
so let's talk about DataOps and you mentioned DevOps and how you were inspired by that and by the way like do
you remember roughly when DevOps as a thing started to appear like when did people start calling these principles
and like tools around them DevOps yeah so the Agile Manifesto well first of all I
mean I had a boss in 1990 at NASA who had this idea build a
little test a little learn a lot right that was his Mantra which
made a lot of sense um and so then the sort of Agile software Manifesto
came out which is very similar in 2001 and then um the sort of first real
DevOps was a guy at Twitter started to do automated deployment you know
push a button and that was like 2009-ish and so the first I think DevOps
Meetup was around then so it's been 15 years I guess 6 like I was
trying to so I started my career in 2010 so my first job was as a Java
developer and like I remember for some things like we would just SFTP to the
machine and then put the JAR archive there and then like keep our fingers crossed that it doesn't break uh like
it was not really the I wouldn't call it this way right you were deploying you
had a deploy process I put it yeah
right was that so that was documented too it was like put the jar on production cross your
fingers I think there was uh like a page on uh some internal wiki uh yeah that
describes like with passwords and like what you should do yeah that was and I think what's interesting is
why that changed right and and we laugh at it now but that was why didn't you
invest in automating deployment or a whole bunch of automated regression
tests right that would run because I think in software now that would be rare
that people wouldn't use CI/CD they wouldn't have some automated tests you know functional
regression tests that would be the exception whereas that was the norm at the beginning of your career and so that's
what's interesting and I think you know if we talk about what's changed in the last two three years I think it is
getting more standard there are um there's a lot more companies who are
talking DataOps or data observability um there's a lot more tools a lot more people are
using Git in data and analytics than ever before I think thanks to dbt um and
there's a lot of tools that are I think getting more code Centric right
they're not treating their configuration like a black box there's several
BI tools that tout the fact that they're uh you know they're Git Centric you know and
that they're testable and that they have APIs so things like that I think people maybe let's take a step back and just do a
quick summary of what DataOps is and then we can talk about like what changed in the last two years sure so I
guess it starts with a problem and that it's it sort of
admits some dark things about data and analytics and that we're not really successful and we're not really happy um
and if you look at the statistics on sort of projects and problems and even
the psychology like I think about a year or two ago we did a survey of
data Engineers 700 data engineers and 78% of them wanted their job to come with a therapist and 50% were thinking
of leaving the career altogether and so why is everyone sort of unhappy well I think what happens is
teams fall into one of two buckets they're sort of heroic teams who
are working night and day they're trying really hard for their customer um and then they get
burnt out and then they quit honestly and then the second team have wrapped
their projects up in so much process and proceduralism and steps that doing
anything is sort of so slow and boring that they again leave in frustration um
or live in cynicism and that like the only outcome is quit and
start uh woodworking yeah the only outcome really is quit and start woodworking
and um as a manager I always hated that right because when your team
is either full of heroes or proceduralism you always have people who have the whole system in their head
they're certainly key people and then when they leave they take all that knowledge with them and then that
creates a bottleneck and so both of which aren't and I think the
main idea of DataOps is there's a balance between fear and heroism
that you can live you don't you know you don't have to be fearful 95% of the time maybe one or two percent it's good to be
fearful and you don't have to be a hero again maybe one or two percent it's good to be a hero but there's a balance um
and in that balance you actually are much more prod