Categorically, artificial intelligence (AI) can appear to be an odd juxtaposition of order and disorder — we direct the AI with algorithms, yet the system produces new insights seemingly by magic. This two-part blog unpacks the mysteries of two very different AI techniques: supervised and unsupervised learning.

Supervised Learning: The Workhorse of AI

Most of the well-known applications of machine learning and computational AI involve supervised learning. The modeler amasses a vast set of existing data (e.g., financial transactions, internet photographs, or the texts of tweets) along with a base-level “ground truth” outcome that is already known, perhaps in retrospect or through expensive human investigation. Equipped with any number of computational algorithms, the scientist becomes the “supervisor” whose code trains the model to reproduce, in the lab, the known outcomes with a low probability of error. The models are then deployed to live a happy life scoring credit risk and fraud likelihood, finding pictures of Chihuahuas and muffins, or flagging insulting tweets. Technically, each model computes a probabilistically weighted predicted outcome that we believe to be like the outcomes from the training examples. The state of the art for supervised learning is now well established; you can choose from dozens of comprehensive predictive analytics and neural network packages. A short code sketch of this workflow appears below, alongside its unsupervised counterpart.

Unsupervised Learning: Inferences in the Absence of Outcomes

But what if there is no set of “true outcomes” known, or the ones at hand are restricted in quality or quantity? What can machine learning do for us then? This is the domain of the far trickier unsupervised learning, which draws inferences in the absence of outcomes. Good unsupervised learning requires more care, judgement and experience than supervised learning, because there is no clear, mathematically representable goal for the computer to blindly optimize without understanding the underlying domain.

The Challenge of Outlier Detection

A central task within unsupervised modeling is outlier detection: Which examples are most unlike most of their peers? Outlier detection and transaction fraud scoring provide an easy illustration: Which customers request money transfers with patterns substantially different from most of their peers? Which medical providers bill insurance for sets of claims most unlike their peers? Which transactions on an individual payment card are most different from a customer’s usual behaviors?

The solution pattern for these tasks is a problem- and domain-specific transformation of the raw data into a quantitative vector space of features — up to now, exactly in line with supervised predictive modeling. This is followed by a more generic mathematical construction to yield a numerical score of the “degree of outlier-ness,” in the absence of ground-truth training outcomes.
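To make the contrast concrete, here is a minimal sketch of the supervised workflow described at the top of this post, using scikit-learn on synthetic stand-in data; the features, labels, and choice of logistic regression are illustrative assumptions, not a prescription from any particular production system.

```python
# Hypothetical sketch of supervised learning: known "ground truth" labels drive the fit.
# The data and feature meanings are invented stand-ins for something like
# transaction records with confirmed fraud / not-fraud outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))                      # feature vectors (amount, hour, velocity, ...)
y = (X[:, 0] + 0.5 * X[:, 2]                        # known outcomes, e.g., confirmed fraud flags
     + rng.normal(scale=0.5, size=5000) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()                        # the "supervisor's" code optimizes against y
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]          # probabilistically weighted predicted outcomes
print("holdout AUC:", roc_auc_score(y_test, scores))
```

The key point is that the target y is known in advance, so the fit can be graded against it automatically on held-out data.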
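The unsupervised counterpart drops the ground-truth labels entirely: the analyst still engineers a feature vector space, but the score comes from a generic construction. The sketch below uses Isolation Forest purely as one illustrative choice among many outlier-scoring methods, again on assumed synthetic data.

```python
# Hypothetical sketch of unsupervised outlier scoring: no ground-truth labels anywhere.
# Isolation Forest is only one of many generic outlier-scoring constructions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 4))                      # domain-specific feature vectors, no outcomes
X[:25] += rng.normal(loc=6.0, size=(25, 4))         # a handful of rows unlike their peers

X_scaled = StandardScaler().fit_transform(X)        # scaling is now an explicit analyst decision
iso = IsolationForest(random_state=1).fit(X_scaled)

outlier_score = -iso.score_samples(X_scaled)        # higher value = greater "degree of outlier-ness"
print("most outlier-like rows:", np.argsort(outlier_score)[-5:])
```

Nothing in this second sketch tells the algorithm what "bad" looks like; whether the high-scoring rows are actually interesting is a judgement the analyst must make with domain knowledge.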
Because there are far fewer principles, and less didactic instruction and widely available software, compared to classic supervised modeling, there are even more analytic “gotchas” requiring deep analytic scientist experience and judgement.

Difficulties and considerations in outlier detection include:

The need to define a metric or distance. Many techniques require defining a “metric” or “distance” function between pairs of observations. One problem is that the individual components of this feature vector have qualitatively different meanings – how can one balance adding or subtracting apples and oranges, and kumquats and kangaroos? Often this is done ad hoc or, unfortunately, without deliberate intention, because the underlying algorithm implicitly assumes a metric. What should be done in the real-life scenario of a combination of quantitative and categorical features? Supervised modeling can often be blissfully ignorant of this problem, since the quantitative optimization with known targets tends to scale and transform each feature automatically, to the degree that it contributes predictive value. In an unsupervised context, an explicit metric will