Serverless Chats

Episode #8: Observability in Modern Applications with Ran Ribenzaft


Listen Later

About Ran Ribenzaft:

Ran is a passionate developer, with vast experience in network, infrastructure, and cyber-security. He's constantly chasing new technologies, with a current focus on Serverless. He is an open source contributor and currently the co-founder and CTO at Epsagon, a tool for monitoring serverless applications.

  • Twitter: @ranrib
  • Epsagon website: Epsagon.com
  • Epsagon blog: Epsagon.com/blog

Transcript:

Jeremy: Hi, everyone. I'm Jeremy Daly and you're listening to Serverless Chats. This week, I'm chatting with Ran Ribenzaft. Hi, Ran. Thanks for joining me.

Ran: Thank you very much for having me, Jeremy.

Jeremy: So you are the CTO at Epsagon. So why don't you tell the listeners a little bit about yourself and also what Epsagon is doing?

Ran: So I'm Ran, the co-founder and the CTO over at Epsagon, based in Israel, Tel Aviv, a fun place to be, very warm. In my previous roles, I've been doing mostly cybersecurity stuff. So mostly getting into kernels and things that I can't tell you, you probably heard some of them in the news. But let's keep it discreet. And in my recent role, I'm the CTO here at Epsagon, a start up where I'm one of the co-founders, and mainly focuses on monitoring and troubleshooting modern applications. But in general, the idea is to have a single platform where you get both the monitoring capabilities, which is "is my application working properly," "is it meeting the SLAs performance,"  and so on. And the troubleshooting part, where something bad happens, you need to scroll through the logs and correlate between them and do all this distributed tracing. That's Epsagon in a nutshell.

Jeremy: Alright, great. So I wanted to have you on because I want to talk about observability in modern applications, and by modern applications, we mean cloud native, or serverless, or distributed, or whatever sort of buzzword we might want to use to describe it. But as our applications grow beyond the traditional monoliths, being able to observe what is happening in your applications is a huge part of what we need to do when we are building these modern, interconnected systems. So maybe you could just give me a minute or so on what's the difference between your traditional monitoring and observability systems, and where we are now and what's changed ?

Ran: Definitely. So it starts with the change in our infrastructure in our way we code. So if we used to have this monolithic application running on our on-prem servers, so the things that you wanted to monitor are like: what is the network throughput, and what is the CPU usage, and hard disks and so on. And, you know, just making sure the application, the process itself is alive there. But shifting to more modern application, which I think in my mind the modern application is something that you don't mess with the infrastructure around it. You get most of the services out of the box working for you in a matter of configurations that you just can, you know, tick some boxes that I want this feature and so on. And you just built your own business logic through that where it can run — I don't care whether it's in your server, something like a function of the service or other thing. So this is modern application, and in this kind of modern application, there's a big difference in what you want to monitor. Honestly, we're doing monitoring to make sure our business works. Our application is our business. I want to make sure it works, so things like how much CPU is being consumed or network throughput, or all these kinds of metrics that just show me charts about infrastructure are getting irrelevant over time. Like, for example, if I used to have a chart of how much CPU usage my database is consuming, so now we don't really care if I'm going to a managed database - a fully managed one, not like a semi-managed - I don't really care about the CPU or anything else. I just want to make sure it works, and my application can speak to it at the right timing, at the right performance, and it gets the right results. So that's the first thing. The second thing is mostly about the nature of these kind of applications, which we broke them from being a big monolith, a big single monolith, to multiples of microservices, you can call it microservices, service, nanoservices, but the fact that there was one giant thing that broke into 10 or hundreds of resources, suddenly presents a different problem. A problem where you need to understand what is the interconnectivity between these resources, that you need to keep track of messages that [are] going from one service to another, and once something bad happened, you want to see the root cause analysis. This is like a repetitive thing that you can hear over and over. This root cause analysis, so the ability to jump from the error - the error can be like a performance issue or like exception in the code - all the way to the beginning. The beginning can be the user that clicks on a button on your business website that caused this chain of events. So these are the kinds of things that you want to see where, in traditional APMs, in traditional monitoring solutions, you don't have it. And in the future, once you'll find it more and more like that.

Jeremy: Yes, so you mentioned, you said logs. You said metrics. You said interconnectivity. You talked about a couple of different things, and I think it's probably important for listeners who maybe aren't 100% familiar with what observability is. There's this thing called the three pillars of observability, which are logs, metrics and traces. So maybe we can talk about that and you can sort of tell us why each one of those things is important.

Ran: Yeah, I'll start first with the metrics. So metrics are like the key component that you can ask questions about. Let's say how [many] events that I got per day, how [many] purchases, how [many] events of [this] kind that promote my business [are there] per day or per timeframe that you want to see. These kind of metrics are the base unit that you want to monitor. Now, when something bad happens in this metric, sometimes you need to see something bigger than just a number that will tell you, "Hey, we found out that the amount of transactions that you're seeing per day is lower than 100." So you want to see a trace or, in my opinion, a trace is more like a story. What happened — like tell me the exact event where this metric was below that 100 or the KPI that I've measured. I want to see what you talked with, which resources were being involved, how long each kind of these operations took, and why or how is it different from being a good trace. Now, when you wanna dive even deeper, so you need to get to your logs. Logs are like the ultimate developer utility to troubleshoot problems. I mean, regardless, what you'll see in metrics or in traces, logs are the core thing that developers put in their code in order to troubleshoot and debug their applications and often, you want to see correlated to one another. So, for example, I want to ask these questions: "How many purchases did I have on my website in the specific day?" Now we'll see there is a spike, or the opposite of a spike, some down in their registration. It's probably going to be because of a problem. So I want to see all the traces that correlate to these specific events, and I want to see all the logs that correlates to the traces that have found to this metric. So all three are connected to each other and all this observability, which is a nice buzzword, it's a just a translation of being able to monitor our business in production. That's for me, the things logs, m...

...more
View all episodesView all episodes
Download on the App Store

Serverless ChatsBy Jeremy Daly & Rebecca Marshburn

  • 5
  • 5
  • 5
  • 5
  • 5

5

29 ratings