Serverless Chats

Episode #39: Big Data and Serverless with Lynn Langit


About Lynn Langit

Lynn Langit is a Cloud Architect who codes. She's a Cloud and Big Data Architect, AWS Community Hero, Google Cloud Developer Expert, and Microsoft Azure Insider. She has a wealth of cloud training courses on Lynda.com. Lynn is currently working on Cloud-based bioinformatics projects.

  • Twitter: @LynnLangit
  • Site: LynnLangit.com
  • Courses: https://www.linkedin.com/learning/instructors/lynn-langit
  • GCP for Bioinformatics: https://github.com/lynnlangit/gcp-for-bioinformatics


Mentioned Articles:

  • Genome Engineering Applications: Early Adopters of the Cloud by Jeff Barr
  • Scaling Custom Machine Learning on AWS
  • Scaling Custom Machine Learning on AWS — Part 2 EMR
  • Scaling Custom Machine Learning on AWS — Part 3 Kubernetes
  • Shopping with DNA
  • Learn | Build | Teach


Transcript

Jeremy: Hi everyone I'm Jeremy Daly and you're listening to Serverless Chats. This week, I'm chatting with Lynn Langit. Hi Lynn. Thanks for joining me.

Lynn: Hi. Thanks for inviting me.

Jeremy: So you refer to yourself as a coding cloud architect. You're also an author and an instructor. So why don't you tell the listeners a little bit about yourself and what you've been up to lately?

Lynn: Sure. I run my own consulting company. I've done so for eight years now and I work on various projects on the cloud. Most recently I've been doing most of my work on GCP because that's what my customers are interested in. But I've done production work on AWS and Azure. And I've actually done some POCs now on Alibaba Cloud. So one of the characteristics of me and my team is that we work on whichever clouds best serve our customers, which makes work really fun. In terms of the work that we do it really depends on what the customer needs because I have this ability to work in multi-cloud. Sometimes it's me working with C levels or senior technical people helping them to make technology choices, so based on their particular vertical. But at other times I'll hire a team of subcontractors for a particular project and we might build a POC. We might actually build all the way to MVP for a customer.

Lynn: And then occasionally I take projects where I build all the way out. The longest one I've had over the past few years is I did a project for 14 months where we went from design all the way out to product. And I worked every single day I was embedded with the developer team. So I do everything from design to coding to testing. It's a fun life.

Jeremy: It sounds like it. Well, so listen, I have been following you for a very long time and I'm a huge fan of the work that you've done. I've watched some of your videos on LinkedIn Learning and just been following along with some of this other stuff that you've done. And really like you said, a lot of what you have done has been around big data and recently you've been getting into, or you have gotten into, big data and serverless. And that's really what I'd love to talk to you about today because I just find big data to be absolutely fascinating and just the volume of data that we are collecting nowadays is absolutely insane. It's overwhelming.

Jeremy: And I don't know if traditional systems or if especially smaller teams working on some of these specialty products have the capability or the resources to keep up with the amount of data that's coming in based off of sort of some of these traditional methods to do that. So we can get into all of that. And I have a feeling this discussion will go all over the place, which is awesome. But maybe we could start just by sort of level setting the audience and just explaining what big data is or I think maybe what you mean by big data.

Lynn: I have a really simple explanation. I'll say the explanation and I'll tell you why. So the explanation is: data of a size that doesn't function effectively in your SQL Server or your Oracle Server or your data warehouse, so your traditional systems. And the reason I say this is because that is my professional background. I've been doing this for about 20 years now, and for the first five or so, maybe seven, I was working in those systems. I've actually written three books on SQL Server data warehousing. I worked for Microsoft as a developer evangelist from 2007 to 2011. And the consulting practice that I built initially was around optimization of relational database systems.

So I was literally working on systems and figuring out, oh, this could be optimized. Let's optimize it. Oh, whoops, we have too much data now, what do we do? So when I left Microsoft in 2011 to launch my consultancy, I left because I was so fascinated by what was coming beyond these systems. One impetus was the launch of Hadoop as an open source project. And literally when I left Microsoft, I went to New Jersey and I took a class with Hadoop developers, which really threw me in the deep end because I had come out of the Windows ecosystem. Of course the class was on Linux, in Java, all coding. And I learned a lot that week.

Jeremy: I can imagine, yeah. So that's maybe my question there. So big data is this volume of data, this immense amount of data that's coming in that I think as you put it, that sort of these traditional systems like a SQL Server or even an Oracle can't handle or at least can't handle at a scale that would make the processing easy. So you mentioned Hadoop and there's other things like Redshift is now a popular choice for sort of data warehousing. And then you've got Snowflake and Tableau and some of these other things I think that are ... products out there that are trying to find a way to analyze this data. But what is the problem with these traditional systems when it comes to this massive amount of data?

Lynn: Well, it goes to the CAP theorem, which is consistency, availability and partition tolerance. This is sort of the classic question of what the capabilities of a database are. And it's really kind of common sense. A distributed database can have two of the three but not all three. So, to simplify it, you can have the ability for transactions, which is relational databases, or you can have the ability to add partitions. Because if you think about it, when you're adding partitions, you're adding redundancy. It's a trade-off. Are you adding partitions for scalability? And when adding partitions makes a relational database too slow, then what do you do? What you then do is you partition the data across SQL and NoSQL systems.
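[Editor's note: the partitioning idea Lynn describes can be sketched as hash-based sharding. This is a minimal illustrative toy, not anything from the episode — the function name, the four-shard count, and the user IDs are all invented for the example.]

```python
# Toy sketch of hash-based partitioning (sharding): once one relational
# server can no longer hold the data, rows are split across multiple
# nodes by a partition key.
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key deterministically to one of num_partitions shards."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Route some user IDs to 4 shards.
shards = {i: [] for i in range(4)}
for user_id in ["alice", "bob", "carol", "dave", "erin"]:
    shards[partition_for(user_id, 4)].append(user_id)

# Each key lands on exactly one shard, so single-key lookups stay fast,
# but any transaction spanning shards now needs coordination across
# nodes -- which is exactly the CAP trade-off Lynn is pointing at.
```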

And again, I did a whole bunch of work back in 2011, 2012, 2013. I worked with MongoDB, I worked with Redis. And one of the sort of key books, I guess, would be Seven Databases in Seven Weeks. It's still a very valid book even t...


Serverless Chats, by Jeremy Daly & Rebecca Marshburn