
In this episode of the Hybrid Cloud Forecast series, Andre talks to Rama Akkiraju, IBM Fellow and CTO, AIOps. They delve into the principles of IT operations: if everything is done right, systems should be self-healing, self-monitoring, and self-managing. The job of IT management is to observe everything, detect problems when they occur (or predict them before they occur), and then diagnose and fix them. Rama contrasts reactive and proactive incident management. The first is about putting out fires after they have started, and the second is about predicting the fire before it starts, but the best approach is to prevent the problem altogether. Andre and Rama discuss how to use AI to detect anomalous behavior, and the dilemma SREs face when they examine data from many different sources, such as logs and metrics, to decide which signals are actionable and which to ignore. Rama also discusses the role of AI in IT management, such as grouping log anomalies, prioritizing them, and suggesting solutions based on past actions, with the option for a human in the loop to further fine-tune the models.
Art by Jake Volz.