Machine Learning with Coffee
By Gustavo Lujan
The podcast currently has 9 episodes available.
We introduce the concept of a perceptron as the basic component of a neural network. We talk about how important it is to understand the concept of backpropagation applied to a single neuron.
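To make the single-neuron idea concrete, here is a minimal Python sketch (not from the episode) of one sigmoid neuron trained with gradient descent on a toy OR dataset; the data, learning rate, and number of epochs are invented for illustration.

    # Minimal sketch: one sigmoid neuron trained with gradient descent.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy dataset (assumed for illustration): logical OR
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 1], dtype=float)

    w = np.zeros(2)   # weights
    b = 0.0           # bias
    lr = 0.5          # learning rate (arbitrary)

    for epoch in range(1000):
        p = sigmoid(X @ w + b)              # forward pass
        grad_z = p - y                      # dLoss/dz for log loss + sigmoid
        w -= lr * (X.T @ grad_z) / len(y)   # backpropagate to the weights
        b -= lr * grad_z.mean()             # ... and to the bias

    print(np.round(sigmoid(X @ w + b), 2))  # predictions approach [0, 1, 1, 1]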
We discuss Independent Component Analysis (ICA), one of the most popular and robust techniques to decompose mixed signals. ICA has important applications in audio processing, video, EEG, and in many datasets that present very high multicollinearity.
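As a rough illustration of the unmixing idea, the following sketch uses scikit-learn's FastICA on two synthetic signals; the signals and mixing matrix are invented and are not from the episode.

    # Sketch: recover two synthetic source signals from their mixtures.
    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                      # source 1: sine wave
    s2 = np.sign(np.sin(3 * t))             # source 2: square wave
    S = np.c_[s1, s2]

    A = np.array([[1.0, 0.5], [0.5, 2.0]])  # assumed mixing matrix
    X = S @ A.T                             # observed mixed signals

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)            # recovered sources (up to scale and order)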
We present three clustering algorithms that will help us detect anomalies: DBSCAN, Gaussian Mixture Models, and K-means. These three algorithms are simple but very popular and have stood the test of time. All of them have many variations that try to overcome some of the disadvantages of the original implementations.
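The sketch below, which is only illustrative and not from the episode, shows one common way to turn each of the three algorithms into an anomaly detector with scikit-learn; the data, parameters, and thresholds are arbitrary.

    # Sketch: three clustering-based ways to flag anomalies on invented 2-D data.
    import numpy as np
    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)),      # normal cluster
                   rng.normal(8, 1, (200, 2)),      # another normal cluster
                   rng.uniform(-10, 20, (10, 2))])  # scattered outliers

    # DBSCAN: points it cannot place in any dense cluster get label -1
    db_outliers = DBSCAN(eps=0.8, min_samples=5).fit_predict(X) == -1

    # Gaussian Mixture: flag points with unusually low log-likelihood
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    scores = gmm.score_samples(X)
    gmm_outliers = scores < np.percentile(scores, 2)

    # K-means: flag points far from their nearest centroid
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    dists = np.min(km.transform(X), axis=1)
    km_outliers = dists > np.percentile(dists, 98)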
Anomaly detection is not something recent; techniques have been around for decades. Control charts are graphs with solid mathematical and statistical foundations that monitor how a process changes over time. They implement control limits which automatically flag anomalies in a process in real time. Depending on the problem at hand, control charts might be a better alternative to more sophisticated machine learning approaches for anomaly detection.
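As a toy example of the idea, the following sketch builds a Shewhart-style 3-sigma chart on simulated data; real individuals charts usually estimate sigma from the average moving range, so treat this only as an illustration.

    # Sketch: flag points beyond 3-sigma control limits on simulated process data.
    import numpy as np

    rng = np.random.default_rng(1)
    process = rng.normal(loc=10.0, scale=0.5, size=100)
    process[60] = 13.0                        # inject a shift to be detected

    center = process.mean()
    sigma = process.std(ddof=1)               # simplification; often the moving range is used
    ucl, lcl = center + 3 * sigma, center - 3 * sigma   # control limits

    anomalies = np.where((process > ucl) | (process < lcl))[0]
    print(anomalies)                           # index 60 should be flagged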
AdaBoost is one of the classic machine learning algorithms. Just like Random Forest and XGBoost, AdaBoost belongs to the ensemble models; in other words, it aggregates the results of simpler classifiers to make robust predictions. The main difference of AdaBoost is that it is an adaptive algorithm, which means that it learns from the misclassified instances of previous models, assigning larger weights to those instances and focusing its attention on them in the next round.
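For a quick, hedged illustration (not code from the episode), the sketch below fits scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision stump, on a synthetic dataset with arbitrary hyperparameters.

    # Sketch: AdaBoost on decision stumps; each round re-weights the data so the
    # next stump concentrates on the instances earlier stumps got wrong.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))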
XGBoost is an open-source software library which has won several machine learning competitions on Kaggle. It is based on the principles of gradient boosting, which builds on the ideas of Leo Breiman, the creator of Random Forest; the theory behind gradient boosting was later formalized by Jerome H. Friedman. Gradient boosting combines weak learners, just as Random Forest does. XGBoost is an engineering implementation which includes a clever penalization of trees and a proportional shrinking of leaf nodes.
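The short sketch below, again only illustrative, trains an XGBClassifier from the xgboost library on synthetic data; the parameter values are arbitrary, with the learning rate playing the role of shrinkage and reg_lambda and gamma penalizing overly complex trees.

    # Sketch: gradient-boosted trees with XGBoost on an invented dataset.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = XGBClassifier(
        n_estimators=300,
        learning_rate=0.1,   # shrinkage applied to each new tree's contribution
        max_depth=4,
        reg_lambda=1.0,      # L2 penalty on leaf weights
        gamma=0.1,           # minimum loss reduction required to split a leaf
    )
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))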
We talk about how Data Science and Machine Learning can help us better understand COVID-19 challenges. In this episode, we go back to the Kaggle website, where different institutions, including the White House, have come together to try to analyze more than 45,000 published articles. The task is about answering 10 different questions which will help scientists around the world better understand this new virus and future pandemics.
In this episode I talk about my personal journey and how I became a Data Scientist. I start by talking about how I decided to go to college, how I chose my major, and how I chose my master's degree. I talk about my time studying for a PhD in Engineering and the most useful classes I took related to machine learning and data science. Finally, I briefly talk about my job experience as a Data Scientist.
In this, our first episode, we define the objective of the show and set expectations. The show is designed for anyone who is interested in the fascinating world of Machine Learning.