


CHAPTER 1: What is AI?
You may never directly teach AI yourself, but as we discussed in the Introduction, you participate in the process just about every time you interact with the digital world. You may also be in an organization that is considering if and how to adopt AI tools. These days, it is highly likely that an eager executive will push to "do something with AI" in your organization. These words are music to the ears of vendors who spend big money marketing their products as "powered by AI" whether they are or not. You can add a lot of value by understanding how AI learns so you can ask hard questions and set realistic expectations in your life and for your organization. You can be a big part of the solution by understanding and helping to position potential AI tools in the context of specific problems and human work that's already happening. We'll get into this more in the next chapter but, for now, know that the few AI projects that succeed are the ones that focus hard on context and people up front. Successful AI projects answer the question, "Just because we can, should we?"
"Daddy! I can't get the !@#$% sand out of my shoes!" The tiny voice from the back seat of the car was my three-year-old daughter appropriately vocalizing profanity for the very first time. I was simultaneously horrified, proud, curious, and (let's be honest) amused. Her still-developing intelligence had for the first time understood the perfect context for profanity and nailed it. My wife and I had not specifically taught her to swear when she wasn't able to shake sand out of her little sneakers. She had (unfortunately) heard my wife and me swearing in other situations, none of which involved sand or shoes. She had gathered information from those specific cases and correctly applied it to an entirely new situation with which she had no prior experience.
When we encounter AI that can do something similar, we see intelligence in the machine. We marveled at the unveiling of ChatGPT because the underlying AI could take completely off-the-wall input it had never seen before and come back with a reasonable response in the appropriate context. My daughter had never been strapped into her car seat with a shoe full of unwanted sand, but her developing brain had been exposed to enough unrelated situations to figure out that this was a four-letter-word moment. This is what psychologists call "transduction," a form of reasoning where developing children learn from specific cases they experience and apply their new knowledge to general (new) cases they haven't experienced. Much of machine learning and AI, including the Transformer developed by Google, is conceived to solve general transduction problems, along with a related type of problem called "sequence modeling," discussed in the next paragraph. The Transformer, invented by researchers at Google in 2017 and developed into AI applications in many languages, could similarly encounter an English sentence it had never seen before (such as, "What do I say in German when I am very frustrated because I can't get the sand out of my shoe?") and come up with "Ich bekomme den @#$% Sand nicht aus meinem Schuh!"
Here's another example of the human brain at work. Consider this series of words: pine, sauce, crab. What's the next word in the series? If you quickly guess "pie," "Adam's," or "computer," you are using your instinctive powers of reasoning to subconsciously assess the relationship between the first three words and find something they have in common—in this case "apple"—to inform your choice of the next word. You can also puzzle this out through a more deliberate process of elimination using your analytical brain. This may be slower, but it can also lead to the correct answer more often. We're all wired for both instinctual insight and analytical thinking, though individually we often skew one way or the other (Kounios & Beeman, 2015). Your brain is built for instinctual insight, so the more language you're exposed to, the more likely it is that your brain quickly finds a relationship between the first three words in the sequence to use as context to come up with a fourth word. This type of cognition is part of something called "fluency," where pathways in your brain have been trained by repeated exposure to information. Your fluent pathways are strengthened when you subconsciously create a common associate like "apple" between remote associates like "pine," "sauce," and "crab," all words or concepts that don't share an obvious connection. When you make up a mnemonic, such as a silly limerick, to help you remember something, you're using the same underlying cognitive mechanism. We see intelligence when we encounter machines that can mimic sequential insight like this in a general way. Picking what comes next is the type of problem in both psychology and machine learning called "sequence modeling." These are very important problems for humans. Figuring out what happens next, or even the few possibilities that might happen next, is a big part of how we are successful as a species. 
We are especially impressed when the answer isn't something we would have come up with on our own. Just as in the example with my daughter, the key to intelligence is that the machine, the AI, performs well when it comes across something it hasn't ever encountered before. That general capability sets AI apart from other kinds of computer programs that work under tighter constraints.
Does this mean everything called AI around us is able to solve general problems? Nope. Software companies desperately want to take advantage of the excitement over AI by slapping the AI label on their products. But a computer system is not AI just because it follows rules to do useful work, no matter how slick the packaging. Rules are created by looking at a bunch of specific cases, then writing up the logic for what to do in those cases. Think back to the semi-automated sawmill example in the Introduction. That computer system was likely programmed based on an old, expert-authored manual of rules for how to saw a log into valuable lumber. What magic there is comes from the clever detection of the outline of the log in a digital photograph, which is itself based on geometric and mathematical rules for finding the edge of a simple, predictable shape. This is by far the best and most efficient way to solve that particular problem. It would be a waste of time and money to show an AI a bunch of logs and a bunch of lumber and teach it to come up with the right cut pattern. One goal of this book is for you to be able to ask questions and think critically about what does and doesn't deserve to be called AI, and even more important, to assess which kinds of problems are worth the effort and uncertainty that come with AI. Because teaching a machine takes a lot of work, and you usually don't know what you're going to get.
How Machines Learn
How do machines learn? Scientists work hard to use the human brain as a model for learning intelligence. After all, they don't have much else to go on! The starting point for artificial intelligence is informed—at least at a high level—by our understanding of the design of the brain and theories of how we learn.
Your brain is a giant mass of interconnected cells called neurons. But it's more than just a skull full of neuron spaghetti. Neurons are elongated cells that form the wiring of your brain. Each neuron cell listens for a signal from nearby neurons. When the signal gets strong enough, the cell activates and sends its own signal out to its neighboring neurons, propagating patterns of signals through the different parts of your brain. Take, for example, your eyes looking at a brightly lit square of paper, half white, half black. Nerve endings in your eye are excited by nearby light-sensitive cells that pop off a signal. That signal tells your neurons to transmit their own signal, but in a pattern that reflects the pattern of light and dark hitting the back of your eye. The pattern of signals travels down what is effectively a data cable from your eye to your brain.
The signals dump into your brain where the arrangement of neurons isn't just random, but is organized into neighborhoods, or specialized networks, where the neurons in the network are particularly good at specific kinds of signaling. For example, detecting a bright light. These networks are organized into layers that are good at specific kinds of thinking. You can think of the layers as a stack of pancakes, where the top pancake of networked neurons does the simplest task like measuring overall brightness at different grid coordinates. That layer hands the map of what's bright and what's not to the next layer, which detects edges in the image—the outline of the square and the boundary between the black and white sides. Your brain continues this general organization where each layer takes input, uses its network of neurons to process it to some degree, then hands off the result to the next layer (Gazzaniga, 2018). For example, when you look at your dog, your eyes send a bunch of electrical signals representing brightness, contrast, and color to the layer of your brain that is your visual cortex. Your visual cortex takes that input and turns it into signals that it hands off to other layers of the brain that do a specific job. There are layers to store and recall memories ("That's my dog, Lilo"), set off emotions ("I love my big baby girl, Lilo"), create speech ("Come here, big baby"), and move our hands (scratch, scratch, scratch). We've been trained by our repeated experiences of the world around us to recognize, feel love for, interact with, and pet our dog. As we grow and develop as children, we learn to recognize all sorts of animals, like kangaroos and deer, but unless we're living in a zoo, we don't moon over them and scratch their ears. But we can tell them apart from dogs!
Similarly, AI systems are designed to use pretend digital equivalents of neurons, networks, and layers to process information. So far, we've talked about language AI, but there is a whole world of visual AI as well. Take a learning task like figuring out if a picture contains (a) a dog or (b) no dog. A visual AI has a layer that takes in a collection of numbers representing the intensity, color, and position of all the dots (pixels) that together make up a digital photo. That input layer hands the raw data off to the next layer, which figures out what's bright and what's dark, then hands everything off to the next layer which figures out where there is something that humans would recognize as an edge, or line. The next layer figures out which lines are organized into simple shapes. The next layer determines which shapes are important and hands those off to the final layer, which makes a guess as to whether one of the shapes is a dog. Just as in our brains, each layer in the AI doesn't care what the other layers do; it's good at its one task. And just like in our brains, when you put all the layers together, you may get intelligence. Remember "deep learning" from the Introduction? Before 2015, machine learning was done with a single, flat neural network. "Deep" just means you have more than two layers besides the input and output layer. There's no magic number of layers in an AI "brain." You decide how many layers to start with, based on the type of AI and the kind of learning. When you teach a machine from scratch, the only layer you specifically set up is the first input layer. The layers after that aren't set up ahead of time to do anything specific. They all start out as generic collections of digital neurons. A new AI has to learn what to do layer by layer. All this adds up to what gets loosely called an "algorithm."
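For readers who like to see ideas in code, here is a minimal Python sketch of the layer-by-layer hand-off described above. Treat it as an illustration under made-up assumptions, not as how any real product is built: the four pixel values, the layer sizes, and the helper names (`make_layer`, `run_layer`, `forward`) are all invented for this example. The layers start out with random settings, so the final number means nothing until the network has been taught.

```python
import random

def relu(x):
    """A digital neuron 'fires' only when its combined input is positive."""
    return x if x > 0.0 else 0.0

def make_layer(n_inputs, n_neurons, rng):
    """An untrained layer: each neuron gets random weights and a zero bias."""
    return [([rng.uniform(-1.0, 1.0) for _ in range(n_inputs)], 0.0)
            for _ in range(n_neurons)]

def run_layer(layer, inputs):
    """Each neuron weighs every incoming signal, adds them up, and fires or not."""
    return [relu(sum(w * x for w, x in zip(weights, inputs)) + bias)
            for weights, bias in layer]

def forward(layers, inputs):
    """Hand the signal from one layer to the next, like the stack of pancakes."""
    signal = inputs
    for layer in layers:
        signal = run_layer(layer, signal)
    return signal

rng = random.Random(0)            # fixed seed so the sketch is repeatable
pixels = [0.0, 0.2, 0.9, 1.0]     # four made-up pixel brightness values
layers = [
    make_layer(4, 3, rng),        # takes the 4 pixels, passes on 3 signals
    make_layer(3, 3, rng),        # a middle ("hidden") layer
    make_layer(3, 1, rng),        # final layer: one number, the "dog" guess
]
output = forward(layers, pixels)
print(output)
```

Nothing in the code tells the middle layers to look for edges or shapes. In a real system, whatever each layer ends up specializing in emerges during training.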
Layers are as far as we're going to go in terms of AI's internal wiring. There are many wonderful books you can read to delve into the fascinating and beautiful construction of AI algorithms. Or you can ask your favorite AI to explain it to you, though I'd recommend a combination. But for practical purposes, the algorithm is just the starting point. The magic happens when you teach the algorithm to do something truly remarkable.
The General Systems Theory of psychology attempts to explain human behavior by looking at the three main variables of human psychology: biological (hungry), psychological (decide to seek food), and social/behavioral (somebody feed me). If you're a baby, you get the inputs your body needs (food) by controlling your outputs (crying in a tone that means hungry vs uncomfortable from a wet diaper). You output information (crying) to your environment by planning actions to get what you want. In a system, this is called "feed-forward." The actions you plan (time to cry) are based on a guess of the consequences of those actions (Dad feeds me). You run the plan (cry) and compare the actual consequences with what you thought they'd be (did I get fed or not?). This is called feedback. If you didn't get what you wanted (still hungry), you adjust the plan (cry louder), which is using feedback (McConnell, 1989). The teaching of machines, machine learning, is all about infant computer programs going through the cycle of planning actions and guessing consequences (feed-forward), doing the actions, then comparing the actual result to the guessed result (feedback), adjusting if necessary to repeat the cycle (using feedback).
We're going to dig into an example of how AI is taught by people and deployed into the real world. Before we get there, it's helpful to understand the general approach to machine learning along with some of the technical terms for key parts of the process and steps that are applied.
First, you need a topic or situation based in the real world. Artificial intelligence, like humans, needs to focus on one thing at a time while learning, so in our AI, we focus on a specific topic to provide loose boundaries. We call this topic the domain. The Google Brain team chose foreign languages as the domain when they were developing and testing their Transformer. In our example, the domain is "dogs." Within the general topic, we go further and articulate a particular problem to solve. This problem is called the task. Our task is "Decide if a picture has a dog in it, or not." Next is a definition of success. I can't overstate how critical it is to decide on and define the successful outcome we want ahead of time. Your measure of success is called the metric. Recall that Google's Transformer was first taught to pass a longstanding standardized test of English-to-German and English-to-French language translation. This was their metric, or measure of success. With AI, you're teaching a machine to approximate or augment a cognitive process that only a human can do, so you or your organization MUST understand baseline human performance and articulate ahead of time what success looks like for the AI. The AI doesn't have to "beat the human" like the chess-playing computers from the 1990s. It's enough to set a standard that the AI helps a human to accomplish faster. Our example metric is "Find more than 90% of the dog pictures." This metric is the critical educational outcome that guides how you teach the machine. The next step is to procure the equivalent of a textbook for AI training: enough relevant data for the lesson. How much data is enough? Enormous, truly huge volumes of data are required to successfully teach AI. You need to start with every scrap of data relevant to your problem that you can beg, borrow, or steal (not really). It will likely still not be "enough."
This is why the most successful AI research, and the most successful AI products, come from huge companies that spend billions and decades collecting our data. We call this the training data. Our example data set is six thousand family photos, some with the dog, some without. Your AI will study the data you give it, reading or looking at it over and over. The sum of what it learns during this process depends on the volume and quality of data you provide. The data has to be described and characterized by humans so you know the answers ahead of time, just like an instructor's answer key in a textbook. We call this labeling. For our example, three different veterinarians each looked at all six thousand pictures and labeled each "dog" or "no dog." The final step in preparing to teach is design of the empty, untrained brain of the AI. What kind of brain? How many layers? How do the layers talk with each other? This is called the model architecture. We choose Residual Network, since it's a well-tested architecture for image recognition. You can treat it as a black box, so we won't go into more detail.
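Here is a small Python sketch of how three sets of labels might be combined into one answer key. The example above doesn't say what happens when the veterinarians disagree, so the majority-vote rule here is my assumption, and the photo file names and the `majority_label` helper are invented for illustration.

```python
from collections import Counter

def majority_label(votes):
    """Return the label that most labelers chose for one photo."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical labels from three veterinarians for three family photos.
votes_per_photo = {
    "photo_0001.jpg": ["dog", "dog", "dog"],
    "photo_0002.jpg": ["no dog", "dog", "no dog"],
    "photo_0003.jpg": ["dog", "no dog", "dog"],
}

# The answer key the AI will be quizzed against.
answer_key = {photo: majority_label(votes)
              for photo, votes in votes_per_photo.items()}
print(answer_key)
```

With three labelers and only two possible labels, there is always a majority. With more labels or an even number of labelers, you would need a tie-breaking rule.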
Now you teach! You organize a repeated series of lessons and quizzes where the AI does the feed-forward part of learning. It uses its untrained brain to look at a randomly selected set of half of the dog pictures. This half of the total data set is called the training data. The AI does the task of predicting the right answer (dog or no dog) and then takes a quiz where you check its prediction against the human labels. After each quiz, you use a computer program to give the AI feedback on what it got right and wrong. It uses the feedback to adjust how its brain re-reads the data and comes up with answers (a mathematical process called gradient descent). You repeat for potentially hundreds of cycles of training so the different layers in the AI brain learn to do specific tasks, much as the layers in a baby's brain learn their job in the larger task of recognizing animals or getting someone to feed them. During the repeated cycles of training, the AI develops an equivalent to fluency from repetition and by learning hidden gems like how a common associate such as the combination of dark nose and round eye shape ties together remote associates like German shepherd, pug, and beagle. You stop the cycle of lessons and quizzes when the AI gets a good score a few times in a row, better than ninety percent correct on our metric, and it's clear it isn't learning anymore. This repeated good score is called convergence. If you want to impress someone when they are bragging about their AI, ask them, "How many training cycles before convergence?"
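The cycle of lessons, quizzes, and feedback can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real image model: each "photo" is boiled down to a single made-up number, the "brain" has just two adjustable settings (`w` and `b`), and I've chosen to define convergence as three passing quizzes in a row. The names `quiz` and `train` are invented for this example.

```python
import math

# Hypothetical training half: one made-up "dog-likeness" number per photo,
# plus the human label (1 = dog, 0 = no dog).
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

def predict_prob(w, b, x):
    """The model's guess, between 0 and 1, that a photo contains a dog."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def quiz(w, b, xs, ys):
    """Share of photos where the model's guess matches the human label."""
    correct = sum((predict_prob(w, b, x) > 0.5) == (y == 1)
                  for x, y in zip(xs, ys))
    return correct / len(xs)

def train(xs, ys, target=0.9, patience=3, learning_rate=0.1, max_cycles=500):
    w, b = 0.0, 0.0                      # the untrained brain
    streak = 0
    for cycle in range(1, max_cycles + 1):
        score = quiz(w, b, xs, ys)       # take the quiz
        streak = streak + 1 if score >= target else 0
        if streak >= patience:           # good score several times in a row
            return w, b, cycle, score
        # Feedback: how wrong was each guess, and in which direction?
        errors = [predict_prob(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = sum(errors) / len(xs)
        # Gradient descent: nudge the settings in the error-reducing direction.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, max_cycles, quiz(w, b, xs, ys)

w, b, cycles, final_score = train(train_x, train_y)
print(f"stopped after {cycles} cycles with a quiz score of {final_score:.0%}")
```

A real model has millions of settings rather than two, but the loop has the same shape: guess, get graded, adjust, repeat.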
Now for the final exam. The AI reads the other half of the data, which it has never seen before and which is called the testing data set, and does the task just once. Remember, the data is labeled by experts, so you know the answers to the test. If the AI passes the test and hits your predetermined metric of correctly identifying ninety percent of the dog pictures, it gets a good grade, and you celebrate! This final exam is the proof that AI can learn enough from a specific case where it has access to the answers (training) and then successfully generalize to a case it's never seen before where you know the answer but it does not (testing).
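Grading the final exam against our example metric, "Find more than 90% of the dog pictures," might look like the Python sketch below. Data scientists call this particular measure recall. The test labels and predictions are invented so the arithmetic is easy to follow, and the function name `share_of_dogs_found` is my own.

```python
def share_of_dogs_found(true_labels, predicted_labels):
    """Of all the photos that really contain a dog, what share did the AI find?"""
    dog_photos = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == "dog"]
    if not dog_photos:
        return 0.0
    found = sum(1 for _, p in dog_photos if p == "dog")
    return found / len(dog_photos)

# Hypothetical testing half: 20 photos with a dog, 10 without.
true_labels = ["dog"] * 20 + ["no dog"] * 10
# The AI misses one dog and mistakes one dog-free photo for a dog photo.
predicted_labels = ["dog"] * 19 + ["no dog"] + ["no dog"] * 9 + ["dog"]

score = share_of_dogs_found(true_labels, predicted_labels)
passed = score > 0.90
print(f"found {score:.0%} of the dog pictures, passed: {passed}")
```

Notice that the false alarm on a dog-free photo doesn't count against this metric. That is a reminder of why choosing the metric ahead of time matters: it decides which mistakes you care about.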
Much like a new graduate, your AI now has theoretical knowledge but hasn't been out in the real world where it really counts. The really hard part of this process is launching your newly trained AI out into the real world (deployed), but in a way that allows it to continue learning safely. The task may be low stakes like identifying birds from their songs and relatively easy to deploy or high stakes like pointing out bone fractures on x-rays and relatively hard to deploy. No matter, training is just the first step before figuring out how to get your AI from the classroom into the real world (deployed). You'd think after all that, you'd be done. Unfortunately, newly graduated AI is destined to fail unless it is deployed in a way that allows it to continue learning on the job because it is simply impossible for your training to include every possible scenario, or combination of data. Remember, the whole point of AI is that it can do good work when it encounters things it's never seen before. Much like a well-educated person, AI that keeps learning on the job can use its training, and now experience, to solve problems in a changing environment (continual learning).
Now we'll use a real example to review the terms domain, task, metric, data, labeling, training, gradient descent, convergence, testing, deploying, and continual learning.
I recently visited Iceland for the first time. On our way through the glacial areas of the Southwest we went on a hike from a barren, regularly flooded volcanic plain into an older, sheltered valley with plenty of trees. Songbirds suddenly appeared and chirped their hearts out as soon as we got into trees that were more than waist high. Iceland has plenty of trees, but they rarely grow more than five or six feet tall due to the heavy wind and wild swings in the amount of daylight, from twenty-four hours of light in the summer to twenty-four hours of dark in the winter. The rapid appearance of birds made me aware of the absence of birdsong everywhere else in Iceland, something I take for granted as background noise living in the U.S. mid-Atlantic region. So I got curious about birdsong and remembered hearing about an AI-powered birdwatching app called Merlin. Merlin is the result of a wonderful citizen-scientist collaboration at Cornell University. The coolest part of the app is an AI feature called "Sound ID" that can identify more than four hundred and fifty bird species in the U.S. and Canada alone from brief recordings you make of the world around you. The goal of the Merlin team was to capture the knowledge and expertise of a relatively few expert birdwatchers and share it with as many people as possible so they may also learn how to identify the birds around them. Think back to our historical precedents for AI: the invention of writing by the ancient Sumerians and the invention of the modern printing press by Johannes Gutenberg. Before writing and printing, a birdwatcher, or more likely a bird hunter, could teach at most a few other people to track birds by verbally describing what to listen for: "If you hear a repeated metallic chirp followed by a sort of up-and-down trilling, it's a bunting."
Writing, then printing, and by extension the internet captured that knowledge so that many more could benefit, and a few centuries later, aspiring birdwatchers could listen to audio recordings, and then go stand in the backyard and try to pick out individual birds from the cacophony of birdsong around them. Now, AI in the form of Merlin puts the expertise of some of the most accomplished bird experts in the world in your pocket. It walks you through each call you're hearing and helps you learn what bird it belongs to. Merlin is used by hundreds of thousands of people, many of whom, despite the birdwatching books on their shelves, were unlikely to learn to identify birds without it. So let's take a look at how Merlin came to be.
Birds, like humans, are lifelong vocal learners (as are dolphins and bats). As chicks, they learn from their parents to both vocalize and understand sounds, and they keep learning for the rest of their lives. Researchers discovered that birds use a form of cognitive language (their equivalent to words, grammar, and phrases) as more than communications signals. Birds will adjust and respond to changes in the order of chirps and warbles, which we anthropomorphize as grammar. In their own unique way, they will respond to minute changes in very high frequency parts of birdsong (Fishbein et al., 2019). Ornithologists, immersed in the study of birdsongs and bird language, curated and labeled sound recordings and made them available to the public on the internet. Artificial intelligence researchers love freely available data that has already been characterized or labeled by experts because they can use it to train AI. Even more, AI researchers love language data in any form because its intricacies help to drive new discoveries, often relevant to cognition, the goal of AI. So AI researchers at the University of California San Diego doing early work with pictures of birds from the internet were thrilled when ornithologists at Cornell contacted them and invited them to check out the huge and growing collection of birdsong recordings at Cornell's Macaulay Library, which at the time of this writing covers more than 1,300 species (Galchen, 2024).
The scientists worked together to choose a domain—the topic for the AI—which in this case was bird vocalizations. The task for the AI—the particular problem it needed to solve—was identifying the bird that made a particular sound. The metric, or how they'd know if they got it right, was precision—how often the AI thought it was right and it actually was. The data were the one million recordings of birds in the Macaulay Library, many contributed by amateur bird watchers for research like this. They also included recordings of sounds you might hear together with birdsong out in the world, like wind, cars honking, and dogs barking. The labels—the answer key—were details added to each bird recording by citizen scientists (amateur birdwatchers) and expert ornithologists. The labeled data were divided into two halves, with the first half dedicated to training and the second to testing. The researchers chose a deep learning model architecture for the untrained brain called a residual network, a model architecture known for its flexibility.
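To make the precision metric concrete, here is a short Python sketch. It asks: of all the recordings the AI called a bunting, what share really were buntings? The handful of recordings, the species other than the bunting, and the `precision` helper are invented for illustration and say nothing about how the Merlin team actually scored their model.

```python
def precision(true_species, predicted_species, target):
    """Of the recordings the AI labeled as `target`, the share that really were."""
    called_target = [t for t, p in zip(true_species, predicted_species) if p == target]
    if not called_target:
        return 0.0
    return sum(1 for t in called_target if t == target) / len(called_target)

# Hypothetical expert labels and AI guesses for six recordings.
true_species      = ["bunting", "bunting", "robin",   "wren", "bunting", "robin"]
predicted_species = ["bunting", "robin",   "bunting", "wren", "bunting", "robin"]

# The AI said "bunting" three times and was right twice.
bunting_precision = precision(true_species, predicted_species, "bunting")
print(f"bunting precision: {bunting_precision:.2f}")
```

A high-precision birdsong app rarely cries "bunting!" when there is no bunting, which is what you want if the goal is to teach beginners rather than confuse them.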
The AI was trained by being made to "listen" to each type of bird to classify its species hundreds and hundreds of times, sometimes with background noises thrown in. The Merlin AI doesn't actually "listen" to anything. Much of the information contained in an audio recording of a bird singing (the level or volume of sound at different frequencies over a period of time) can also be represented in visual form as something called a spectrogram, and this is what the AI learned to recognize. You see spectrograms in movies and on TV when producers want to show you "sound waves." Children of the 1960s saw a crude spectrogram on Lost In Space when The Robot spoke, kids of the 1980s saw KITT's red fluctuating speech lights on the dashboard in Knight Rider, and millennials watching Futurama saw Bender the robot's crass speech mirrored in the wiggly lines of his mouth. So Merlin represents sound as a spectrogram image when it learns.
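If you're curious how sound becomes a picture, the Python sketch below builds a tiny spectrogram from scratch. Everything in it is an assumption made for illustration: the "birdsong" is a synthetic two-note whistle, the sample rate and frame size are chosen so the arithmetic comes out clean, and the slow textbook method used here (a direct discrete Fourier transform) stands in for the much faster routines in real audio software. It is not a description of Merlin's actual processing.

```python
import cmath
import math

SAMPLE_RATE = 800   # samples per second (made up; real audio uses far more)
FRAME_SIZE = 64     # samples per time slice of the spectrogram

# A synthetic "song": a 100 Hz note followed by a 200 Hz note.
samples = [math.sin(2 * math.pi * (100 if n < 128 else 200) * n / SAMPLE_RATE)
           for n in range(256)]

def spectrogram(signal, frame_size):
    """Split the signal into time slices and measure the loudness of each pitch."""
    columns = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        loudness = []
        for k in range(frame_size // 2 + 1):      # one entry per frequency band
            total = sum(s * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, s in enumerate(frame))
            loudness.append(abs(total))
        columns.append(loudness)
    return columns

spec = spectrogram(samples, FRAME_SIZE)

# For each time slice, find the loudest frequency band and convert it to hertz.
peak_frequencies = [max(range(len(col)), key=col.__getitem__) * SAMPLE_RATE / FRAME_SIZE
                    for col in spec]
print(peak_frequencies)
```

Each column of `spec` is one time slice and each entry in a column is the loudness of one pitch, so the whole thing can be drawn as an image with time running left to right and pitch running bottom to top. That image is the kind of input an image-recognition model can learn from.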
Each training cycle was followed by a quiz to see if the AI correctly identified the bird from its song. After each quiz, the AI was given feedback on how well it did. As it was trained, the AI used something called a "gradient descent calculation" to adjust the layers of its brain to optimize its learning. "Gradient" means the direction in which adjusting a layer changes the errors, and how steeply. "Descent" means stepping the layer's settings downhill, in the direction that reduces errors; a separate setting called the learning rate controls how big each step is. The AI went through cycles of training until it converged on a final level of performance (meaning it had learned all it could and wasn't getting any better). The trained AI was then tested with the other half of the labeled data it had not seen before to measure its precision (the final exam). Good news for budding amateur birdwatchers: It passed! The Merlin AI fits our definition of AI because it is a computer system taught by humans to do something no single human is likely capable of: recognize the unique song of thousands of birds worldwide.
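Gradient descent is easier to grasp with a single adjustable setting than with a whole brain full of them. In the Python sketch below, the error curve, the starting point, and the step size (usually called the learning rate) are all made up for illustration. The gradient says which way is uphill and how steep the slope is, so the code steps the other way.

```python
def error(w):
    """A made-up error curve: the model is least wrong when w equals 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Slope of the error curve at w; positive means error rises as w rises."""
    return 2.0 * (w - 3.0)

w = 0.0                 # starting guess for the setting
learning_rate = 0.1     # how big a step to take each cycle
history = [error(w)]

for _ in range(50):
    w -= learning_rate * gradient(w)   # step downhill, against the gradient
    history.append(error(w))

print(f"setting after training: {w:.4f}, error: {history[-1]:.8f}")
```

The steps shrink as the slope flattens near the bottom of the curve, which is the one-setting version of a model converging.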
The Merlin AI team then worked with app developers to deploy the AI into the Merlin app and set up the Merlin AI to continue to collect data and adjust its performance—what we call "continual learning"—based on the feedback of its users. If you use Merlin and give it feedback, then you teach the (Merlin) machine!
A Word About Data
We tend to trust knowledge and expertise when we have a sense that nobody's hiding anything. Our human educational system is built on a trusted combination of transparency, credentialling, and standardized evaluation. When someone is a trained, credentialed middle school science teacher, we generally know what to expect within a real-life range of ability. When a university professor teaches statistics or history to graduate students, the curriculum is overseen by a standards committee, the syllabus is almost always public, and the textbook or reading material is broadly published and available. You'll note that both the Merlin Sound ID AI and Google Brain's Transformer were trained using publicly available, well understood data. Both went on to have an impact on our world. That's not a coincidence. The best performing and most impactful AI will always come from transparent information. Would you accept a human teacher in your kids' school who used secret-sauce teaching materials that only they had knowledge of? Would you hire an expert who graduated from a university that used its own confidential "proprietary" textbooks and refused to be accredited by a third party?
AI is good and getting better at capturing human knowledge and approximating cognition, or thinking. It's good at breaking down bottlenecks and barriers to the use of expert knowledge by more people. It's also only as capable as we make it, since it's derived from the data in our world and the standard of "capable" set directly by us or indirectly by participating in digital systems where our judgment is captured. We trust AI when we trust the data it learned from, and we trust AI is "right" based on our own judgment or the impartial judgment of experts we trust. But this trust is not a given. A big part of teaching AI is selecting good data, finding ways to identify and ignore bad data, and then representing the data in a way that preserves the information we care about. There are whole fields of study and professions focused on these topics. If you're curious, look up "ground truth data" and "representation learning" to learn more.
The Language Of AI: Demystifying Jargon
As we begin incorporating AI into our lives, it's important to understand key terminology and to recognize the perils of advertising and marketing, gaslighting, and hype. For example, ChatGPT is AI, but not all AI is ChatGPT, as much as OpenAI would love for you to believe this. What follows is a glossary of AI terms that get thrown around a lot, some of which you have already encountered in this book. I provide a commonsense, nontechnical explanation for each, and I encourage you to look up the terms that seem most important to you elsewhere—both for more detailed information and to understand how these technical concepts relate to each other. In fact, I strongly encourage you to learn as much about AI as you possibly can. You teach the machines.
But first a few words of advice. When you do a web search for some AI jargon, put the word "intuition" at the end. When I was in graduate school for computer science, I learned to ask professors to help me understand the intuition behind complex mathematical and computational concepts. This gave them room to separate hard facts and mathematical truth from the "gist," the overarching, big picture, human-relatable concept. As much as possible, start with well-referenced or primary sources before turning to AI, if you use AI at all. For scientific and mathematical explanations, I often start with Wikipedia, as it's been hand-curated by people and experts over many years to be an accurate and useable reference. Please donate to Wikipedia at donate.wikimedia.org because the non-profit organization behind this website works tirelessly to empower hundreds of thousands of citizen experts to curate and fact-check knowledge. In return, all this freely given expertise is scraped off the Wikipedia website and used to train proprietary AI by the biggest, most profitable companies in the world. For those concerned with possible bias in Wikipedia articles, the organization offers an essay, "Wikipedia: Guide to Addressing Bias." However, Wikipedia cautions that the essay itself should be read with healthy skepticism.
Reddit is another helpful source because it captures the interaction and reasoning of its human contributors, with an up- or down-vote that can promote accurate information and demote baloney. But be brave and try to read primary scientific papers even if you don't understand most of what's presented. Another thing I've learned is that you can pick up important points from computer science, math, and other scientific papers without understanding all the details. You may be surprised. Plenty of "experts" don't understand what's in a paper the first time they read it. You can also learn about the progress of AI over time by seeing what papers are "highly cited" or referenced by other papers.
The point of further reading is to develop a sense of the field of AI. Know what you don't know. You may never become an expert, but this kind of reading can help you develop an intuitive sense of what is real vs hype, "magic" vs sleight of hand, distraction vs threat. Given the stakes and what's to come for our society and economy, a good bullshit detector is priceless.
I hope the definitions that follow are a helpful start at cutting through often overwhelming jargon and powering up your BS detector. Some of these terms and concepts appear earlier in the book, but since many of these topics are fairly abstract and complex, reading a more detailed explanation, along with additional examples, can be helpful. This is by no means a complete glossary, and the explanations are my own, based on study, work, and research in the field. They are intended to be conceptually and intuitively helpful, not thorough technical documentation. Please use this brief glossary as a starting point, and build on what's here by doing your own further reading and research.
The definitions, rather than appearing in alphabetical order, are organized in such a way that the terms logically follow one another.
Definitions
The real definition of "algorithm" is a series of readily explainable mathematical instructions or formulas used to solve a problem. The equations and formulas of geometry are examples of actual algorithms. The circumference of a circle is two times its radius multiplied by the constant value of pi. C = 2πr. When it comes to AI, social media started with simple algorithms, initially based on your social network—people you connected with on the app. A lot has changed since then. In Meta's own words, "We began with manual feature engineering for small models and progressed to building hundreds of deep neural network models with trillions of parameters" (Meta, 2023). What exists now is possibly the most powerful, nonexplainable artificial intelligence directed at understanding and changing human behavior outside classified government surveillance. I refer to the artificial intelligence in social media as "my algorithm" when it shows me a video of a puppy, an advertisement for a powerful flashlight, and a political message that evokes an emotional response.
Social media companies are likely happy we call their AI systems "algorithms" because it's a less threatening and more marketable word. In the world of social media, artificial intelligence continually learns how to "engage" you, your parents, and your kids. "Engage" is a euphemism for "attract and hold your attention." The business model of every social media company is to "monetize engagement," in other words, to sell two things: advertisements and data about you. Your "algorithm" in reality is a personal artificial intelligence that knows how to hook you and keep you using social media for as long as possible. It continually learns what will attract your attention from behavioral data it collects directly and from data it receives from every other app, website, navigation system, payment service, and physical business to which you give your email, phone number, or tracking cookie. You teach your own social media machine. And all this happens without the social media company really understanding how its artificial intelligence hooks you. The company only cares that it does. In my own life, what social media companies call engagement, I experience as addiction. I can't open up Instagram without getting sucked in. Before I know it, my "screen time" is up over four hours per day, a level of exposure that researchers have linked to increased symptoms of anxiety and depression (Zablotsky et al., 2024). Using "algorithm" to describe my social media AI is like using "vape" to describe a highly optimized electronic nicotine delivery device wrapped in child-friendly, colorful packaging, sold by companies that don't care how nicotine interacts with our brain to reinforce dependence, only that it does. End rant.
"Wait, didn't you say there was AI in my car? But my car doesn't have eyes. How does it collect and use data?" Automotive AI, and just about every AI system that can respond to the physical world in real time (robots), uses sensors like cameras and accelerometers, along with computers on board that convert the images and brake force readings from unstructured to structured data that is then handed off to the AI. Automotive AI is initially taught using recordings made during millions of hours of cars driving around. That's part of what Google and others are doing when they send specially equipped camera cars to drive through your neighborhood. True story: It's gone now, but an early capture of my house by a Google Street View camera car showed my brother-in-law crouched between two parked cars acting like a monster about to pounce. He saw the camera car coming and hid between the cars. My brother-in-law created something called "noise" or an "outlier" in the data collected by Google, assuming he was the only random person pretending to be a monster that day. In addition to 360-degree cameras, these cars may also be equipped with radar and other sensors that capture distances, physical shapes, and motion, which can then be combined with the pictures to teach AI about the driving environment of our neighborhood.
The weight of a child is equal to ten times their age, plus or minus a few pounds.
The observations that make up the model are (a) that age is important to weight, (b) ten is the multiple that'll give you weight from age, and (c) we can expect a few pounds of error. Artificial intelligence models are learned from much more complex data (e.g., all the text ever published online) and contain many more observations, called "parameters"—sometimes into the tens of millions or billions—but a similar principle still holds. However, with both our fictional weight model and with artificial intelligence models, you have to remember that the model is only as useful as what it has "seen" before. The model learns parameters (observations) by example from the data it was given. To complicate matters further, AI parameters are never something understandable like "age." It's hard to think about, but the parameters (observations) in an artificial intelligence model are completely… artificial. Because of the complexity involved, an AI model is not explainable in human terms; we treat it as a black box, a system where the precise internal workings are not known.
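To make "parameters learned from data" concrete, here is a minimal sketch of the weight model in Python. The four children below are invented for illustration, and the fitting method (a least-squares line through zero) is my own choice, not anything a real AI system is limited to. The single learned number, `multiplier`, plays the role of observation (b) above.

```python
# Invented example data: ages in years and weights in pounds (not real measurements).
ages_years = [2, 4, 6, 8]
weights_lb = [21, 39, 62, 79]

# Learn the one parameter of the model "weight = multiplier * age"
# using a least-squares fit through zero.
multiplier = sum(a * w for a, w in zip(ages_years, weights_lb)) / sum(
    a * a for a in ages_years
)


def predict_weight_lb(age_years):
    """Guess a child's weight from age using the learned multiplier."""
    return multiplier * age_years


# The "plus or minus a few pounds": how far off the model is on each child it saw.
errors_lb = [w - predict_weight_lb(a) for a, w in zip(ages_years, weights_lb)]

print(f"learned multiplier: {multiplier:.2f}")
print(f"largest miss: {max(abs(e) for e in errors_lb):.1f} lb")
```

Notice that the model has only ever "seen" children aged two to eight. Ask it about a forty-year-old and it will confidently answer about four hundred pounds, which is the point about a model being only as useful as what it has seen before.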
Before you get started, you decide that if the machine can guess right more than three out of four times, it's a success. Then you train. You randomly select half of your data, give it to the machine, but hold back the answer of whether the fishing was good at each location. The machine guesses if the fishing was good, you tell it whether it got it right or wrong. It changes how it guesses and the cycle repeats. As it does a better job of guessing if the fish are biting, it creates its own secret parameters (observations) of what makes for good fishing. Eventually, it doesn't get any better at guessing and you record the parameters as your new fishing model. You've completed the training part of machine learning. Now you test the trained model by giving it the other half of the tide, time, and location data it's never seen before. Again, you know the right answer to whether the fishing is good. If the model (machine) guesses right at least three times out of four, you've successfully taught the machine. You've completed the testing part of machine learning. Importantly, somewhere in the layers of its neural network, the machine learned something you couldn't from all that data, some hidden factors nestled behind the data you do have. You'll never know what those factors are, but your new black box model does a pretty good job of guessing. Now you can use it to build a fish forecasting system that takes in water temperature, tide level, date, time, and location and tells a fisherman whether it's worth it to go out.
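The train-then-test cycle described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the two inputs (`tide` and `hour`), the hidden rule that decides whether the fishing was good, and the very simple learner (a perceptron, a swapped-in stand-in for a real neural network). What matters is the shape of the process: decide on success first, train on half, test once on the other half.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def make_trip():
    """Return one invented fishing record: ((tide, hour), was_fishing_good)."""
    while True:
        tide = rng.random()  # 0 = low tide, 1 = high tide
        hour = rng.random()  # 0 = dawn, 1 = dusk
        score = tide - hour  # hidden rule, made up for this sketch
        if abs(score) > 0.2:  # skip borderline trips to keep the toy learnable
            return (tide, hour), score > 0


records = [make_trip() for _ in range(200)]
rng.shuffle(records)
train, test = records[:100], records[100:]  # half to learn from, half held back

params = [0.0, 0.0, 0.0]  # the machine's own parameters: two weights and a bias


def guess(features):
    """The machine's guess: is the fishing good?"""
    return params[0] * features[0] + params[1] * features[1] + params[2] > 0


def accuracy(data):
    """Fraction of records where the guess matches the known answer."""
    return sum(guess(f) == answer for f, answer in data) / len(data)


# Training: guess, get told right or wrong, adjust, repeat.
for cycle in range(1000):
    mistakes = 0
    for features, answer in train:
        if guess(features) != answer:
            mistakes += 1
            direction = 1 if answer else -1
            params[0] += direction * features[0]
            params[1] += direction * features[1]
            params[2] += direction
    if mistakes == 0:  # it has stopped getting better on the training half
        break

# Testing: one pass over data the machine has never seen.
test_accuracy = accuracy(test)
success = test_accuracy >= 0.75  # the bar we set before training started
print(f"test accuracy: {test_accuracy:.2f}, success: {success}")
```

The numbers that end up in `params` are the "secret parameters." You can print them, but they won't read like a fishing tip.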
I like to eat ice
it responds with
cream
That's because, based on all the text used to train ChatGPT, the model pays attention to the words you typed and scores "cream" as the most likely word to come next. In other words, "I like to eat ice cream" is the most likely combination. Behind ChatGPT is a pre-trained large language model that in concept contains all of the words in the English language, together with the degree of likelihood that each word will be the next to come after the words before it. You can test this by prompting ChatGPT with the same words but re-arranged into a plain list of words without correct grammar. For example, when you type
what is the most likely word to come next in the sequence "eat like to ice"
ChatGPT responds with
cream
This is the core behavior of generative AI. It can get a lot more complicated, but the principle is the same. The "G" in ChatGPT stands for Generative. Note: Not all AI is generative. Another important type of AI is "bidirectional."
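If you want to see the next-word idea with your own eyes, here is a minimal sketch in Python. To be clear, this is not how ChatGPT works inside. It is a far simpler stand-in that just counts which word follows which in a tiny, invented set of sentences (the `corpus` below is made up). The principle of "pick the most likely next word based on what you've read" is the same.

```python
from collections import Counter, defaultdict

# A tiny invented "training set" standing in for all the text online.
corpus = [
    "i like to eat ice cream",
    "we like to eat ice cream",
    "i like to skate on ice",
    "they eat ice cream every day",
    "ice cubes melt fast",
]

# Count, for every word, which words were seen right after it.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1


def most_likely_next(word):
    """Return the word seen most often after `word`, or None if never seen."""
    counts = next_word_counts.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_likely_next("ice"))  # prints "cream"
```

In this toy data, "cream" follows "ice" three times and "cubes" only once, so "cream" wins. A real large language model looks at all the words before, not just the last one, which is why it can cope with "eat like to ice."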
what is the meaning of the word "dog" in the sentence "the car broke down and it got so hot the dog let off steam"
it responds with
In this context, "dog" is slang for something that is of poor quality or unreliable.
On the other hand, at the time of writing, ChatGPT replies to the same prompt with
the word "dog" most likely refers to an actual dog—as in the animal.
In this example, you can see that the bidirectional AI (Gemini) was better at picking up the semantics of the word "dog" based on the context of the sentence. This is because the generative AI (ChatGPT) was trained on language data that contained way more mentions of dog the animal than dog the slang word, and it made a prediction based on what it deemed most likely to come next (mentions of dog, the animal).
Exercises: Try It Out
Notice how the response changes when you give it more to pay attention to. Notice that when it doesn't have much to go on, it responds with a question. This is the interactive nature of chat-based AI. A friend and colleague described working with chat-based AI as like having an eager but inexperienced intern. A good AI intern wants to get it right, so asks a lot of questions to be sure it's heading in the right direction. This is deliberate, and a good characteristic. Contrast this with the certainty of Google's AI Overview response. Under what circumstances would you prefer one over the other?
References
Fishbein, Adam R., William J. Idsardi, Gregory F. Ball, & Robert J. Dooling, 2019. Sound Sequences in Birdsong: How Much Do Birds Really Care? Philosophical Transactions of the Royal Society B. The Royal Society Publishing. (Retrieved on April 19, 2025, from https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0044)
Galchen, Rivka, 2024. How Scientists Started to Decode Birdsong. The New Yorker, October 14.
Gazzaniga, Michael S., 2018. The Consciousness Instinct: Unraveling the Mystery of How the Brain Makes the Mind. Farrar, Straus and Giroux.
Kounios, John, & Mark Beeman. 2015. The Eureka Factor: Aha Moments, Creative Insights, and the Brain. Independently Published.
McConnell, James V., 1989. Understanding Human Behavior. (6th Ed.). Holt, Rinehart, and Winston.
Meta, 2023. New AI Advancements Drive Meta's Ads System Performance and Efficiency. Meta. (Retrieved on April 7, 2025, from https://ai.meta.com/blog/ai-ads-performance-efficiency-meta-lattice/)
West, Jack, Lea Thiemt, Shimaa Ahmed, et al., 2024. A Picture Is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok. University of Wisconsin‒Madison. ARXiv.org. (Retrieved on April 19, 2025, from https://arxiv.org/pdf/2403.19717)
Wikipedia, n.d. Wikipedia: Guide to Addressing Bias. (Retrieved on May 7, 2025, from Wikipedia:Guide to addressing bias - Wikipedia)
Zablotsky, Benjamin, Basilica Arockiaraj, Gelila Haile, & Amanda Ng, 2024. Daily Screen Time Among Teenagers: United States, July 2021‒December 2023. (Retrieved on April 7, 2025, from Centers for Disease Control and Prevention, National Center for Health Statistics. Products - Data Briefs - Number 513 -October 2024)
By Jeff Pennington and MJ Pennington
CHAPTER 1: What is AI?
You may never directly teach AI yourself, but as we discussed in the Introduction, you participate in the process just about every time you interact with the digital world. You may also be in an organization that is considering if and how to adopt AI tools. These days, it is highly likely that an eager executive will push to "do something with AI" in your organization. These words are music to the ears of vendors who spend big money marketing their products as "powered by AI" whether they are or not. You can add a lot of value by understanding how AI learns so you can ask hard questions and set realistic expectations in your life and for your organization. You can be a big part of the solution by understanding and helping to position potential AI tools in the context of specific problems and human work that's already happening. We'll get into this more in the next chapter but, for now, know that the few AI projects that succeed are the ones that focus hard on context and people up front. Successful AI projects answer the question, "Just because we can, should we?"
"Daddy! I can't get the !@#$% sand out of my shoes!" The tiny voice from the back seat of the car was my three-year-old daughter appropriately vocalizing profanity for the very first time. I was simultaneously horrified, proud, curious, and (let's be honest) amused. Her still-developing intelligence had for the first time understood the perfect context for profanity and nailed it. My wife and I had not specifically taught her to swear when she wasn't able to shake sand out of her little sneakers. She had (unfortunately) heard my wife and me swearing in other situations, none of which involved sand or shoes. She had gathered information from those specific cases and correctly applied it to an entirely new situation with which she had no prior experience.
When we encounter AI that can do something similar, we see intelligence in the machine. We marveled at the unveiling of ChatGPT because the underlying AI could take completely off-the-wall input it had never seen before and come back with a reasonable response in the appropriate context. My daughter had never been strapped into her car seat with a shoe full of unwanted sand, but her developing brain had been exposed to enough unrelated situations to figure out that this was a four-letter-word moment. This is what psychologists call "transduction," a form of reasoning where developing children learn from specific cases they experience and apply their new knowledge to general (new) cases they haven't experienced. Much of machine learning and AI, including the Transformer developed by Google, are conceived to solve general transduction problems, along with a related type of problem called "sequence modeling," discussed in the next paragraph. The Transformer, invented by researchers at Google in 2017 and developed into AI applications in many languages, could similarly encounter an English sentence it had never seen before—such as, "What do I say in German when I am very frustrated because I can't get the sand out of my shoe?"—and come up with "Ich bekomme den @#$% sand nicht aus meinem schuh!"
Here's another example of the human brain at work. Consider this series of words: pine, sauce, crab. What's the next word in the series? If you quickly guess "pie," "Adam's," or "computer," you are using your instinctive powers of reasoning to subconsciously assess the relationship between the first three words and find something they have in common—in this case "apple"—to inform your choice of the next word. You can also puzzle this out through a more deliberate process of elimination using your analytical brain. This may be slower, but it can also lead to the correct answer more often. We're all wired for both instinctual insight and analytical thinking, though individually we often skew one way or the other (Kounios & Beeman, 2015). Your brain is built for instinctual insight, so the more language you're exposed to, the more likely it is that your brain quickly finds a relationship between the first three words in the sequence to use as context to come up with a fourth word. This type of cognition is part of something called "fluency," where pathways in your brain have been trained by repeated exposure to information. Your fluent pathways are strengthened when you subconsciously create a common associate like "apple" between remote associates like "pine," "sauce," and "crab," all words or concepts that don't share an obvious connection. When you make up a mnemonic, such as a silly limerick, to help you remember something, you're using the same underlying cognitive mechanism. We see intelligence when we encounter machines that can mimic sequential insight like this in a general way. Picking what comes next is the type of problem in both psychology and machine learning called "sequence modeling." These are very important problems for humans. Figuring out what happens next, or even the few possibilities that might happen next, is a big part of how we are successful as a species. 
We are especially impressed when the answer isn't something we would have come up with on our own. Just as in the example with my daughter, the key to intelligence is that the machine, the AI, performs well when it comes across something it hasn't ever encountered before. That general capability sets AI apart from other kinds of computer programs that work under tighter constraints.
Does this mean everything called AI around us is able to solve general problems? Nope. Software companies desperately want to take advantage of the excitement over AI by slapping the AI label on their products. But a computer system is not AI just because it follows rules to do useful work, no matter how slick the packaging. Rules are created by looking at a bunch of specific cases, then writing up the logic for what to do in those cases. Think back to the semi-automated sawmill example in the Introduction. That computer system was likely programmed based on an old, expert-authored manual of rules for how to saw a log into valuable lumber. What magic there is comes from the clever detection of the outline of the log in a digital photograph, which is itself based on geometric and mathematical rules for finding the edge of a simple, predictable shape. This is by far the best and most efficient way to solve that particular problem. It would be a waste of time and money to show an AI a bunch of logs and a bunch of lumber and teach it to come up with the right cut pattern. One goal of this book is for you to be able to ask questions and think critically about what does and doesn't deserve to be called AI, and even more important, to assess which kinds of problems are worth the effort and uncertainty that come with AI. Because teaching a machine takes a lot of work, and you usually don't know what you're going to get.
How Machines Learn
How do machines learn? Scientists work hard to use the human brain as a model for learning intelligence. After all, they don't have much else to go on! The starting point for artificial intelligence is informed—at least at a high level—by our understanding of the design of the brain and theories of how we learn.
Your brain is a giant mass of interconnected cells called neurons. But it's more than just a skull full of neuron spaghetti. Neurons are elongated cells that form the wiring of your brain. Each neuron cell listens for a signal from nearby neurons. When the signal gets strong enough, the cell activates and sends its own signal out to its neighboring neurons, propagating patterns of signals through the different parts of your brain. Take, for example, your eyes looking at a brightly lit square of paper, half white, half black. Nerve endings in your eye are excited by nearby light-sensitive cells that pop off a signal. That signal tells your neurons to transmit their own signal, but in a pattern that reflects the pattern of light and dark hitting the back of your eye. The pattern of signals travels down what is effectively a data cable from your eye to your brain.
The signals dump into your brain where the arrangement of neurons isn't just random, but is organized into neighborhoods, or specialized networks, where the neurons in the network are particularly good at specific kinds of signaling. For example, detecting a bright light. These networks are organized into layers that are good at specific kinds of thinking. You can think of the layers as a stack of pancakes, where the top pancake of networked neurons does the simplest task like measuring overall brightness at different grid coordinates. That layer hands the map of what's bright and what's not to the next layer, which detects edges in the image—the outline of the square and the boundary between the black and white sides. Your brain continues this general organization where each layer takes input, uses its network of neurons to process it to some degree, then hands off the result to the next layer (Gazzaniga, 2018). For example, when you look at your dog, your eyes send a bunch of electrical signals representing brightness, contrast, and color to the layer of your brain that is your visual cortex. Your visual cortex takes that input and turns it into signals that it hands off to other layers of the brain that do a specific job. There are layers to store and recall memories ("That's my dog, Lilo"), set off emotions ("I love my big baby girl, Lilo"), create speech ("Come here, big baby"), and move our hands (scratch, scratch, scratch). We've been trained by our repeated experiences of the world around us to recognize, feel love for, interact with, and pet our dog. As we grow and develop as children, we learn to recognize all sorts of animals, like kangaroos and deer, but unless we're living in a zoo, we don't moon over them and scratch their ears. But we can tell them apart from dogs!
Similarly, AI systems are designed to use pretend digital equivalents of neurons, networks, and layers to process information. So far, we've talked about language AI, but there is a whole world of visual AI as well. Take a learning task like figuring out if a picture contains (a) a dog or (b) no dog. A visual AI has a layer that takes in a collection of numbers representing the intensity, color, and position of all the dots (pixels) that together make up a digital photo. That input layer hands the raw data off to the next layer, which figures out what's bright and what's dark, then hands everything off to the next layer which figures out where there is something that humans would recognize as an edge, or line. The next layer figures out which lines are organized into simple shapes. The next layer determines which shapes are important and hands those off to the final layer, which makes a guess as to whether one of the shapes is a dog. Just as in our brains, each layer in the AI doesn't care what the other layers do; it's good at its one task. And just like in our brains, when you put all the layers together, you may get intelligence. Remember "deep learning" from the Introduction? Before 2015, machine learning was done with a single, flat neural network. "Deep" just means you have more than two layers besides the input and output layer. There's no magic number of layers in an AI "brain." You decide how many layers to start with, based on the type of AI and the kind of learning. When you teach a machine from scratch, the only layer you specifically set up is the first input layer. The layers after that aren't set up ahead of time to do anything specific. They all start out as generic collections of digital neurons. A new AI has to learn what to do layer by layer. All this adds up to what gets loosely called an "algorithm."
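For readers who like to see the wiring, here is a minimal sketch in Python of a brand-new, untrained stack of layers. The layer sizes, the four-pixel "photo," and the names `make_layer` and `run_layer` are all invented for illustration. Each digital neuron adds up the signals from the layer before it and passes along its own signal between 0 and 1, and each layer hands its result to the next.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the untrained "brain" is repeatable


def make_layer(n_inputs, n_neurons):
    """A generic, untrained layer: random connection strengths, zero bias."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def run_layer(inputs, weights, biases):
    """Each neuron sums its incoming signals and squashes the total to 0..1."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        signal = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-signal)))
    return outputs


# Four input pixels, two hidden layers of three neurons, one output neuron.
layer_sizes = [4, 3, 3, 1]
layers = [make_layer(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

pixels = [0.0, 0.0, 1.0, 1.0]  # a four-pixel "photo": half dark, half bright

signal = pixels
for weights, biases in layers:
    signal = run_layer(signal, weights, biases)  # each layer hands off to the next

dog_score = signal[0]  # the final layer's guess, somewhere between 0 and 1
print(f"untrained dog score: {dog_score:.2f}")
```

Because nothing has been taught yet, the score that comes out is meaningless. Teaching is what turns these generic layers into ones that each do a useful job.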
Layers are as far as we're going to go in terms of AI's internal wiring. There are many wonderful books you can read to delve into the fascinating and beautiful construction of AI algorithms. Or you can ask your favorite AI to explain it to you, though I'd recommend a combination. But for practical purposes, the algorithm is just the starting point. The magic happens when you teach the algorithm to do something truly remarkable.
The General Systems Theory of psychology attempts to explain human behavior by looking at the three main variables of human psychology: biological (hungry), psychological (decide to seek food), and social/behavioral (somebody feed me). If you're a baby, you get the inputs your body needs (food) by controlling your outputs (crying in a tone that means hungry vs uncomfortable from a wet diaper). You output information (crying) to your environment by planning actions to get what you want. In a system, this is called "feed-forward." The actions you plan (time to cry) are based on a guess of the consequences of those actions (Dad feeds me). You run the plan (cry) and compare the actual consequences with what you thought they'd be (did I get fed or not?). This is called feedback. If you didn't get what you wanted (still hungry), you adjust the plan (cry louder), which is using feedback (McConnell, 1989). The teaching of machines, machine learning, is all about infant computer programs going through the cycle of planning actions and guessing consequences (feed-forward), doing the actions, then comparing the actual result to the guessed result (feedback), adjusting if necessary to repeat the cycle (using feedback).
We're going to dig into an example of how AI is taught by people and deployed into the real world. Before we get there, it's helpful to understand the general approach to machine learning along with some of the technical terms for key parts of the process and steps that are applied.
First, you need a topic or situation based in the real world. Artificial intelligence, like humans, needs to focus on one thing at a time while learning, so in our AI, we focus on a specific topic to provide loose boundaries. We call this topic the domain. The Google Brain team chose foreign languages as the domain when they were developing and testing their Transformer. In our example, the domain is "dogs." Within the general topic, we go further and articulate a particular problem to solve. This problem is called the task. Our task is "Decide if a picture has a dog in it, or not." Next is a definition of success. I can't overstate how critical it is to decide on and define the successful outcome we want ahead of time. Your measure of success is called the metric. Recall that Google's Transformer was first taught to pass a longstanding standardized test of English-to-German and English-to-French language translation. This was their metric, or measure of success. With AI, you're teaching a machine to approximate or augment a cognitive process that only a human can do, so you or your organization MUST understand baseline human performance and articulate ahead of time what success looks like for the AI. The AI doesn't have to "beat the human" like the chess-playing computers from the 1990s. It's enough to set a standard that the AI helps a human to accomplish faster. Our example metric is "Find more than 90% of the dog pictures." This metric is the critical educational outcome that guides how you teach the machine. The next step is to procure the equivalent of a textbook for AI training: enough relevant data for the lesson. How much data is enough? Enormous, truly huge volumes of data are required to successfully teach AI. You need to start with every scrap of data relevant to your problem that you can beg, borrow, or steal (not really). It will likely still not be "enough."
This is why the most successful AI research, and the most successful AI products, come from huge companies that have spent billions of dollars and decades collecting our data. We call this the data set. Our example data set is six thousand family photos, some with the dog, some without. Your AI will study the data you give it, reading or looking at it over and over. The sum of what it learns during this process depends on the volume and quality of data you provide. The data has to be described and characterized by humans so you know the answers ahead of time, just like an instructor's answer key in a textbook. We call this labeling. For our example, three different veterinarians each looked at all six thousand pictures and labeled each "dog" or "no dog." The final step in preparing to teach is design of the empty, untrained brain of the AI. What kind of brain? How many layers? How do the layers talk with each other? This is called the model architecture. We choose a Residual Network (ResNet), since it's a well-tested architecture for image recognition. You can treat it as a black box, so we won't go into more detail.
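With three veterinarians labeling every photo, you need some way to settle on one answer per picture for the answer key. The text doesn't say how disagreements were resolved, so the sketch below assumes a simple majority vote. The file names and votes are invented for illustration.

```python
from collections import Counter

# Invented labels: each photo gets one vote from each of three labelers.
votes_by_photo = {
    "photo_0001.jpg": ["dog", "dog", "dog"],
    "photo_0002.jpg": ["no dog", "dog", "no dog"],
    "photo_0003.jpg": ["dog", "no dog", "dog"],
    "photo_0004.jpg": ["no dog", "no dog", "no dog"],
}


def majority_label(votes):
    """Return the label chosen by most labelers (assumes an odd number of votes)."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


# The answer key used for training and testing.
answer_key = {photo: majority_label(votes) for photo, votes in votes_by_photo.items()}

# Photos where the experts disagreed are worth a second look by a human.
disputed = [photo for photo, votes in votes_by_photo.items() if len(set(votes)) > 1]

print(answer_key)
print("disputed:", disputed)
```

Using an odd number of labelers with only two possible labels means there is always a clear winner, which is one practical reason to ask three experts rather than two.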
Now you teach! You organize a repeated series of lessons and quizzes where the AI does the feed-forward part of learning. It uses its untrained brain to look at a randomly selected set of half of the dog pictures. This half of the total data set is called the training data. The AI does the task of predicting the right answer (dog or no dog) and then takes a quiz where you check its prediction against the human labels. After each quiz, you use a computer program to give the AI feedback on what it got right and wrong. It uses the feedback to adjust how its brain re-reads the data and comes up with answers (a mathematical process called gradient descent). You repeat for potentially hundreds of cycles of training so the different layers in the AI brain learn to do specific tasks, much as the layers in a baby's brain learn their job in the larger task of recognizing animals or getting someone to feed them. During the repeated cycles of training, the AI develops an equivalent to fluency from repetition and by learning hidden gems like how a common associate such as the combination of dark nose and round eye shape ties together remote associates like German shepherd, pug, and beagle. You stop the cycle of lessons and quizzes when the AI gets a good score a few times in a row, better than ninety percent correct on our metric, and it's clear it isn't learning anymore. This repeated good score is called convergence. If you want to impress someone when they are bragging about their AI, ask them, "How many training cycles before convergence?"
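Here is a minimal sketch in Python of that lesson-and-quiz cycle, including gradient descent and a convergence check. It is a toy under stated assumptions: instead of photos, each "picture" is boiled down to one invented number (`features`), the "brain" is a single digital neuron rather than a Residual Network, and convergence is simplified to "three good quizzes in a row."

```python
import math

# Invented data: one made-up "dog-likeness" number per photo, plus the human label.
features = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = dog, 0 = no dog

params = {"weight": 0.0, "bias": 0.0}  # the untrained "brain"
LEARNING_RATE = 0.5       # how big an adjustment to make after each quiz
TARGET_SCORE = 0.9        # our metric: better than ninety percent correct
GOOD_QUIZZES_NEEDED = 3   # simplified convergence rule (an assumption)
MAX_CYCLES = 100


def predict_probability(x):
    """Feed-forward: the model's confidence, from 0 to 1, that x is a dog."""
    z = params["weight"] * x + params["bias"]
    return 1.0 / (1.0 + math.exp(-z))


good_in_a_row = 0
cycles_run = 0
quiz_score = 0.0
for cycle in range(1, MAX_CYCLES + 1):
    cycles_run = cycle
    # Lesson and quiz: predict every training example, then grade the guesses.
    probabilities = [predict_probability(x) for x in features]
    guesses = [1 if p > 0.5 else 0 for p in probabilities]
    quiz_score = sum(g == y for g, y in zip(guesses, labels)) / len(labels)

    # Convergence: stop once the score has been good several times in a row.
    good_in_a_row = good_in_a_row + 1 if quiz_score > TARGET_SCORE else 0
    if good_in_a_row >= GOOD_QUIZZES_NEEDED:
        break

    # Feedback: gradient descent nudges each parameter to shrink the error.
    errors = [p - y for p, y in zip(probabilities, labels)]
    grad_weight = sum(e * x for e, x in zip(errors, features)) / len(features)
    grad_bias = sum(errors) / len(errors)
    params["weight"] -= LEARNING_RATE * grad_weight
    params["bias"] -= LEARNING_RATE * grad_bias

print(f"stopped after {cycles_run} cycles with quiz score {quiz_score:.2f}")
```

A real image model has millions of parameters instead of two and needs far more cycles, but the loop has the same three parts: predict, grade, adjust.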
Now for the final exam. The AI reads the other half of the data, which it has never seen before, and does the task just once. This half is called the testing data set. Remember, the data is labeled by experts, so you know the answers to the test. If the AI passes the test and hits your predetermined metric of correctly identifying ninety percent of the dog pictures, it gets a good grade, and you celebrate! This final exam is the proof that AI can learn enough from a specific case where it has access to the answers (training) and then successfully generalize to a case it's never seen before where you know the answer but it does not (testing).
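Grading the final exam against our example metric, "Find more than 90% of the dog pictures," can be sketched as follows. The thirty test labels and predictions below are invented for illustration, and `dog_recall` is a hypothetical helper name. Data scientists call this particular measure "recall."

```python
def dog_recall(true_labels, predicted_labels):
    """Fraction of the real dog pictures that the AI correctly called 'dog'."""
    dog_total = sum(1 for truth in true_labels if truth == "dog")
    if dog_total == 0:
        raise ValueError("the test set contains no dog pictures")
    dogs_found = sum(
        1
        for truth, guess in zip(true_labels, predicted_labels)
        if truth == "dog" and guess == "dog"
    )
    return dogs_found / dog_total


# Invented test set: 20 dog photos and 10 photos without a dog.
true_labels = ["dog"] * 20 + ["no dog"] * 10
# Invented predictions: the AI misses one dog and raises one false alarm.
predicted_labels = ["dog"] * 19 + ["no dog"] + ["no dog"] * 9 + ["dog"]

recall = dog_recall(true_labels, predicted_labels)
passed = recall > 0.90  # the metric we committed to before training

print(f"found {recall:.0%} of the dog pictures, passed: {passed}")
```

Notice that this metric ignores the false alarm, the one dogless photo the AI called "dog." Whether that matters depends on the job, which is one more reason to pick your metric carefully before you start teaching.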
Much like a new graduate, your AI now has theoretical knowledge but hasn't been out in the real world where it really counts. The really hard part of this process is launching your newly trained AI out into the real world (deploying it), but in a way that allows it to continue learning safely. The task may be low stakes and relatively easy to deploy, like identifying birds from their songs, or high stakes and relatively hard to deploy, like pointing out bone fractures on x-rays. Either way, training is just the first step before figuring out how to get your AI from the classroom into the real world. You'd think after all that, you'd be done. Unfortunately, newly graduated AI is destined to fail unless it is deployed in a way that allows it to continue learning on the job because it is simply impossible for your training to include every possible scenario, or combination of data. Remember, the whole point of AI is that it can do good work when it encounters things it's never seen before. Much like a well-educated person, AI that keeps learning on the job can use its training, and now experience, to solve problems in a changing environment (continual learning).
Now we'll use a real example to review the terms domain, task, metric, data, labeling, training, gradient descent, convergence, testing, deploying, and continual learning.
I recently visited Iceland for the first time. On our way through the glacial areas of the Southwest we went on a hike from a barren, regularly flooded volcanic plain into an older, sheltered valley with plenty of trees. Songbirds suddenly appeared and chirped their hearts out as soon as we got into trees that were more than waist high. Iceland has plenty of trees, but they rarely grow more than five or six feet tall due to the heavy wind and wild swings in the amount of daylight, from twenty-four hours of light in the summer to twenty-four hours of dark in the winter. The rapid appearance of birds made me aware of the absence of birdsong everywhere else in Iceland, something I take for granted as background noise living in the U.S. mid-Atlantic region. So I got curious about birdsong and remembered hearing about an AI-powered birdwatching app called Merlin. Merlin is the result of a wonderful citizen-scientist collaboration at Cornell University. The coolest part of the app is an AI feature called "Sound ID" that can identify more than four hundred and fifty bird species in the U.S. and Canada alone from brief recordings you make of the world around you. The goal of the Merlin team was to capture the knowledge and expertise of a relatively few expert birdwatchers and share it with as many people as possible so they may also learn how to identify the birds around them. Think back to our historical precedents for AI: the invention of writing by the ancient Sumerians and the invention of the modern printing press by Johannes Gutenberg. Before writing and printing, a birdwatcher, or more likely a bird hunter, could teach at most a few other people to track birds by verbally describing what to listen for: "If you hear a repeated metallic chirp followed by a sort of up-and-down trilling, it's a bunting."
Writing, then printing, and by extension the internet captured that knowledge so that many more could benefit, and a few centuries later, aspiring birdwatchers could listen to audio recordings, and then go stand in the backyard and try to pick out individual birds from the cacophony of birdsong around them. Now, AI in the form of Merlin puts the expertise of some of the most accomplished bird experts in the world in your pocket. It walks you through each call you're hearing and helps you learn what bird it belongs to. Merlin is used by hundreds of thousands of people, many of whom, despite the birdwatching books on their shelves, were unlikely to learn to identify birds without it. So let's take a look at how Merlin came to be.
Birds, like humans, are lifelong vocal learners (as are dolphins and bats). As chicks, they learn from their parents to both vocalize and understand sounds, and they keep learning for the rest of their lives. Researchers discovered that birds use a form of cognitive language (their equivalent to words, grammar, and phrases) as more than communications signals. Birds will adjust and respond to changes in the order of chirps and warbles, which we anthropomorphize as grammar. In their own unique way, they will respond to minute changes in very high frequency parts of birdsong (Fishbein et al., 2019). Ornithologists, immersed in the study of birdsongs and bird language, curated and labeled sound recordings and made them available to the public on the internet. Artificial intelligence researchers love freely available data that has already been characterized or labeled by experts because they can use it to train AI. Even more, AI researchers love language data in any form because its intricacies help to drive new discoveries, often relevant to cognition, the goal of AI. So AI researchers at the University of California San Diego doing early work with pictures of birds from the internet were thrilled when ornithologists at Cornell contacted them and invited them to check out the huge and growing collection of birdsong recordings at Cornell's Macaulay Library, which at the time of this writing represents more than 1,300 species (Galchen, 2024).
The scientists worked together to choose a domain—the topic for the AI—which in this case was bird vocalizations. The task for the AI—the particular problem it needed to solve—was identifying the bird that made a particular sound. The metric, or how they'd know if they got it right, was precision—how often the AI thought it was right and it actually was. The data were the one million recordings of birds in the Macaulay Library, many contributed by amateur bird watchers for research like this. They also included recordings of sounds you might hear together with birdsong out in the world, like wind, cars honking, and dogs barking. The labels—the answer key—were details added to each bird recording by citizen scientists (amateur birdwatchers) and expert ornithologists. The labeled data were divided into two halves, with the first half dedicated to training and the second to testing. The researchers chose a deep learning model architecture for the untrained brain called a residual network, a model architecture known for its flexibility.
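To make the precision metric concrete, here is a minimal sketch in Python. The six labels are invented for illustration, the `precision` helper is hypothetical, and nothing here reflects how the Merlin team actually scored their model. It only shows the arithmetic behind "how often the AI thought it was right and it actually was."

```python
def precision(predicted, actual, species):
    """Of all the times the model said `species`, how often was it right?"""
    claimed = [a for p, a in zip(predicted, actual) if p == species]
    if not claimed:
        return None  # the model never predicted this species
    return sum(1 for a in claimed if a == species) / len(claimed)

# Hypothetical expert labels and model guesses for six recordings.
actual    = ["bunting", "wren", "bunting", "robin", "wren", "bunting"]
predicted = ["bunting", "bunting", "bunting", "robin", "wren", "wren"]

bunting_precision = precision(predicted, actual, "bunting")
print(f"bunting precision: {bunting_precision:.2f}")  # 2 of 3 bunting guesses were right
```

Notice that precision only looks at the recordings the model claimed were buntings. The bunting it mislabeled as a wren does not count against bunting precision, which is one reason practitioners usually report more than one metric.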
The AI was trained by being made to "listen" to recordings of each type of bird and classify the species, hundreds and hundreds of times, sometimes with background noises thrown in. Of course, the Merlin AI doesn't actually "listen" to anything. Much of the information contained in an audio recording of a bird singing (the level or volume of sound at different frequencies over a period of time) can also be represented in visual form as something called a spectrogram, and this is what the AI learned to recognize. You see spectrograms in movies and on TV when producers want to show you "sound waves." Children of the 1960s saw a crude spectrogram on Lost In Space when The Robot spoke, kids of the 1980s saw KITT's red fluctuating speech lights on the dashboard in Knight Rider, and millennials watching Futurama saw Bender the robot's crass speech mirrored in the wiggly lines of his mouth. So Merlin represents sound as a spectrogram image when it learns.
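If you are curious what "sound as a picture" means in practice, the sketch below builds a tiny spectrogram from scratch in Python. Everything in it is an assumption made for illustration: the synthetic two-note "chirp," the sample rate, the slice size, and the slow textbook method (a naive discrete Fourier transform). Merlin's real audio pipeline is not described here and is certainly more sophisticated.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
FRAME_SIZE = 256    # samples per slice of time

def tone(frequency_hz, n_samples):
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

# A synthetic "chirp": a low note followed by a higher note.
signal = tone(1000, 512) + tone(2000, 512)

def frame_magnitudes(frame):
    """Loudness at each frequency bin for one slice of time (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(sample * cmath.exp(-2j * math.pi * k * i / n)
                for i, sample in enumerate(frame)))
        for k in range(n // 2)
    ]

# Rows are slices of time, columns are frequencies: a spectrogram as a grid.
spectrogram = [
    frame_magnitudes(signal[start:start + FRAME_SIZE])
    for start in range(0, len(signal), FRAME_SIZE)
]

bin_width_hz = SAMPLE_RATE / FRAME_SIZE
loudest_hz = [row.index(max(row)) * bin_width_hz for row in spectrogram]
print(loudest_hz)  # [1000.0, 1000.0, 2000.0, 2000.0]
```

Each row of the grid answers the question "how loud was each pitch during this slice of time?" Paint the grid with colors and you have an image, which is why an AI designed to recognize pictures can be taught to recognize birdsong.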
Each training cycle was followed by a quiz to see if the AI correctly identified the bird from its song. After each quiz, the AI was given feedback on how well it did. As it was trained, the AI used something called a "gradient descent calculation" to adjust the layers of its brain to optimize its learning. "Gradient" means the slope of the error: it tells the AI which direction of adjustment to a layer makes its errors grow or shrink, and how steeply. "Descent" means stepping downhill, adjusting each layer a little in the direction that reduces errors. How big a step to take is a separate setting, often called the learning rate. The AI went through cycles of training until it converged on a final level of performance (meaning it had learned all it could and wasn't getting any better). The trained AI was then tested with the other half of the labeled data it had not seen before to measure its precision (the final exam). Good news for budding amateur birdwatchers: It passed! The Merlin AI fits our definition of AI because it is a computer system taught by humans to do something no single human is likely capable of: recognize the unique song of thousands of birds worldwide.
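Gradient descent is easier to grasp with a single adjustable number than with a whole neural network. The sketch below is a deliberately tiny illustration in Python: the error curve, the starting guess, and the step size are all invented, and a real AI adjusts millions of numbers at once rather than one. The gradient is the slope of the error curve, and each cycle takes a small step downhill.

```python
def error(w):
    return (w - 3.0) ** 2   # made-up error curve with its lowest point at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)  # slope of the error curve at w

w = 0.0              # starting guess
learning_rate = 0.1  # step size: how far to move each cycle
steps = 0

# Keep stepping downhill until the slope is nearly flat (convergence).
while abs(gradient(w)) > 1e-6 and steps < 10_000:
    w -= learning_rate * gradient(w)  # move against the slope
    steps += 1

print(f"settled near w = {w:.4f} after {steps} steps")
```

The loop stops when the slope is almost zero, meaning further adjustment would barely reduce the error. That flat spot is the toy version of convergence.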
The Merlin AI team then worked with app developers to deploy the AI into the Merlin app and set up the Merlin AI to continue to collect data and adjust its performance—what we call "continual learning"—based on the feedback of its users. If you use Merlin and give it feedback, then you teach the (Merlin) machine!
A Word About Data
We tend to trust knowledge and expertise when we have a sense that nobody's hiding anything. Our human educational system is built on a trusted combination of transparency, credentialling, and standardized evaluation. When someone is a trained, credentialed middle school science teacher, we generally know what to expect within a real-life range of ability. When a university professor teaches statistics or history to graduate students, the curriculum is overseen by a standards committee, the syllabus is almost always public, and the textbook or reading material is broadly published and available. You'll note that both the Merlin Sound ID AI and Google Brain's Transformer were trained using publicly available, well understood data. Both went on to have an impact on our world. That's not a coincidence. The best performing and most impactful AI will always come from transparent information. Would you accept a human teacher in your kids' school who used secret-sauce teaching materials that only they had knowledge of? Would you hire an expert who graduated from a university that used its own confidential "proprietary" textbooks and refused to be accredited by a third party?
AI is good and getting better at capturing human knowledge and approximating cognition, or thinking. It's good at breaking down bottlenecks and barriers to the use of expert knowledge by more people. It's also only as capable as we make it, since it's derived from the data in our world and the standard of "capable" set directly by us or indirectly by participating in digital systems where our judgment is captured. We trust AI when we trust the data it learned from, and we trust AI is "right" based on our own judgment or the impartial judgment of experts we trust. But this trust is not a given. A big part of teaching AI is selecting good data, finding ways to identify and ignore bad data, and then representing the data in a way that preserves the information we care about. There are whole fields of study and professions focused on these topics. If you're curious, look up "ground truth data" and "representation learning" to learn more.
The Language Of AI: Demystifying Jargon
As we begin incorporating AI into our lives, it's important to understand key terminology and to recognize the perils of advertising and marketing, gaslighting, and hype. For example, ChatGPT is AI, but not all AI is ChatGPT, as much as OpenAI would love for you to believe this. What follows is a glossary of AI terms that get thrown around a lot, some of which you have already encountered in this book. I provide a commonsense, nontechnical explanation for each, and I encourage you to look up the terms that seem most important to you elsewhere—both for more detailed information and to understand how these technical concepts relate to each other. In fact, I strongly encourage you to learn as much about AI as you possibly can. You teach the machines.
But first a few words of advice. When you do a web search for some AI jargon, put the word "intuition" at the end. When I was in graduate school for computer science, I learned to ask professors to help me understand the intuition behind complex mathematical and computational concepts. This gave them room to separate hard facts and mathematical truth from the "gist," the overarching, big picture, human-relatable concept. As much as possible, start with well-referenced or primary sources before turning to AI, if you use AI at all. For scientific and mathematical explanations, I often start with Wikipedia, as it's been hand-curated by people and experts over many years to be an accurate and useable reference. Please donate to Wikipedia at donate.wikimedia.org because the non-profit organization behind this website works tirelessly to empower hundreds of thousands of citizen experts to curate and fact-check knowledge. In return, all this freely given expertise is scraped off the Wikipedia website and used to train proprietary AI by the biggest, most profitable companies in the world. For those concerned with possible bias in Wikipedia articles, the organization offers an essay, "Wikipedia: Guide to Addressing Bias." However, Wikipedia cautions that the essay itself should be read with healthy skepticism.
Reddit is another helpful source because it captures the interaction and reasoning of its human contributors, with an up- or down-vote that can promote accurate information and demote baloney. But be brave and try to read primary scientific papers even if you don't understand most of what's presented. Another thing I've learned is that you can pick up important points from computer science, math, and other scientific papers without understanding all the details. You may be surprised. Plenty of "experts" don't understand what's in a paper the first time they read it. You can also learn about the progress of AI over time by seeing what papers are "highly cited" or referenced by other papers.
The point of further reading is to develop a sense of the field of AI. Know what you don't know. You may never become an expert, but this kind of reading can help you develop an intuitive sense of what is real vs hype, "magic" vs sleight of hand, distraction vs threat. Given the stakes and what's to come for our society and economy, a good bullshit detector is priceless.
I hope the definitions that follow are a helpful start at cutting through often overwhelming jargon and powering up your BS detector. Some of these terms and concepts appear earlier in the book, but since many of these topics are fairly abstract and complex, reading a more detailed explanation, along with additional examples, can be helpful. This is by no means a complete glossary, and the explanations are my own, based on study, work, and research in the field. They are intended to be conceptually and intuitively helpful, not thorough technical documentation. Please use this brief glossary as a starting point, and build on what's here by doing your own further reading and research.
The definitions, rather than appearing in alphabetical order, are organized in such a way that the terms logically follow one another.
Definitions
The real definition of "algorithm" is a series of readily explainable mathematical instructions or formulas used to solve a problem. The equations and formulas of geometry are examples of actual algorithms. The circumference of a circle is two times its radius multiplied by the constant value of pi. C = 2πr. When it comes to AI, social media started with simple algorithms, initially based on your social network—people you connected with on the app. A lot has changed since then. In Meta's own words, "We began with manual feature engineering for small models and progressed to building hundreds of deep neural network models with trillions of parameters" (Meta, 2023). What exists now is possibly the most powerful, nonexplainable artificial intelligence directed at understanding and changing human behavior outside classified government surveillance. I refer to the artificial intelligence in social media as "my algorithm" when it shows me a video of a puppy, an advertisement for a powerful flashlight, and a political message that evokes an emotional response.
Social media companies are likely happy we call their AI systems "algorithms" because it's a less threatening and more marketable word. In the world of social media, artificial intelligence continually learns how to "engage" you, your parents, and your kids. "Engage" is a euphemism for "attract and hold your attention." The business model of every social media company is to "monetize engagement," in other words, to sell two things: advertisements and data about you. Your "algorithm" in reality is a personal artificial intelligence that knows how to hook you and keep you using social media for as long as possible. It continually learns what will attract your attention from behavioral data it collects directly, and also from data it receives from every other app, website, navigation system, payment service, and physical business to which you give your email, phone number, or tracking cookie. You teach your own social media machine. And all this happens without the social media company really understanding how their artificial intelligence hooks you. They only care that it does. In my own life, what social media companies call engagement, I experience as addiction. I can't open up Instagram without getting sucked in. Before I know it, my "screen time" is up over four hours per day, a level of exposure researchers have found linked to increased symptoms of anxiety and depression (Zablotsky et al., 2024). Using "algorithm" to describe my social media AI is like using "vape" to describe a highly optimized electronic nicotine delivery device wrapped in child-friendly, colorful packaging, sold by companies that don't care how nicotine interacts with our brain to reinforce dependence, only that it does. End rant.
"Wait, didn't you say there was AI in my car? But my car doesn't have eyes. How does it collect and use data?" Automotive AI, and just about every AI system that can respond to the physical world in real time (robots), uses sensors like cameras and accelerometers, along with computers on board that convert the images and brake force readings from unstructured to structured data that is then handed off to the AI. Automotive AI is initially taught using recordings made during millions of hours of cars driving around. That's part of what Google and others are doing when they send specially equipped camera cars to drive through your neighborhood. True story: It's gone now, but an early capture of my house by a Google Street View camera car showed my brother-in-law crouched between two parked cars acting like a monster about to pounce. He saw the camera car coming and hid between the cars. My brother-in-law created something called "noise" or an "outlier" in the data collected by Google, assuming he was the only random person pretending to be a monster that day. In addition to 360-degree cameras, these cars may also be equipped with radar and other sensors that capture distances, physical shapes, and motion, which can then be combined with the pictures to teach AI about the driving environment of our neighborhood.
The weight of a child is equal to ten times their age, plus or minus a few pounds.
The observations that make up the model are (a) that age is important to weight, (b) ten is the multiple that'll give you weight from age, and (c) we can expect a few pounds of error. Artificial intelligence models are learned from much more complex data (e.g., all the text ever published online) and contain many more observations, called "parameters"—sometimes into the tens of millions or billions—but a similar principle still holds. However, with both our fictional weight model and with artificial intelligence models, you have to remember that the model is only as useful as what it has "seen" before. The model learns parameters (observations) by example from the data it was given. To complicate matters further, AI parameters are never something understandable like "age." It's hard to think about, but the parameters (observations) in an artificial intelligence model are completely… artificial. Because of the complexity involved, an AI model is not explainable in human terms; we treat it as a black box, a system where the precise internal workings are not known.
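Here is one way the fictional weight model could be "learned" from examples rather than written down by hand. It is a minimal sketch in Python: the five ages and weights are invented for illustration, and the method (a least-squares fit through zero) is just one simple choice. The point is that the multiple and the typical error come out of the data.

```python
ages    = [2, 4, 6, 8, 10]      # years (invented for illustration)
weights = [22, 38, 61, 82, 97]  # pounds (invented for illustration)

# Least-squares fit of weight = multiple * age: the one learned parameter.
multiple = sum(a * w for a, w in zip(ages, weights)) / sum(a * a for a in ages)

# Typical size of the miss: the "plus or minus a few pounds" part of the model.
errors = [w - multiple * a for a, w in zip(ages, weights)]
typical_error = (sum(e * e for e in errors) / len(errors)) ** 0.5

def predict_weight(age):
    return multiple * age

print(f"multiple: {multiple:.2f}, typical error: {typical_error:.1f} pounds")
print(f"predicted weight at age 7: {predict_weight(7):.0f} pounds")
```

With these made-up numbers the learned multiple lands very close to ten. Notice the limit built into the data, though: the model has only "seen" ages two through ten, so asking it about a forty-year-old would produce a confident and badly wrong answer.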
Before you get started, you decide that if the machine can guess right more than three out of four times, it's a success. Then you train. You randomly select half of your data, give it to the machine, but hold back the answer of whether the fishing was good at each location. The machine guesses if the fishing was good, you tell it whether it got it right or wrong. It changes how it guesses and the cycle repeats. As it does a better job of guessing if the fish are biting, it creates its own secret parameters (observations) of what makes for good fishing. Eventually, it doesn't get any better at guessing and you record the parameters as your new fishing model. You've completed the training part of machine learning. Now you test the trained model by giving it the other half of the tide, time, and location data it's never seen before. Again, you know the right answer to whether the fishing is good. If the model (machine) guesses right at least three times out of four, you've successfully taught the machine. You've completed the testing part of machine learning. Importantly, somewhere in the layers of its neural network, the machine learned something you couldn't from all that data, some hidden factors nestled behind the data you do have. You'll never know what those factors are, but your new black box model does a pretty good job of guessing. Now you can use it to build a fish forecasting system that takes in water temperature, tide level, date, time, and location and tells a fisherman whether it's worth it to go out.
I like to eat ice
it responds with
cream
That's because, across all the text used to train ChatGPT, "cream" is the word most likely to follow "I like to eat ice." In other words, "I like to eat ice cream" is the most probable combination. Behind ChatGPT is a pre-trained large language model that in concept contains all of the words in the English language, together with the degree of likelihood that each word will be the next to come after the words before it. You can test this by prompting ChatGPT with the same words but re-arranged into a plain list of words without correct grammar. For example, when you type
what is the most likely word to come next in the sequence "eat like to ice"
ChatGPT responds with
cream
This is the core behavior of generative AI. It can get a lot more complicated, but the principle is the same. The "G" in ChatGPT stands for Generative. Note: Not all AI is generative. Another important type of AI is "bidirectional."
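The idea of predicting the most likely next word can be sketched with nothing more than counting. The Python below is a toy under stated assumptions: the four-sentence corpus is invented, the `most_likely_next` helper is hypothetical, and the method looks only at the single previous word. The model behind ChatGPT is vastly larger and considers the whole preceding passage, so treat this as intuition, not a description of how ChatGPT is built.

```python
from collections import Counter, defaultdict

# A tiny invented corpus; a real language model learns from vastly more text.
corpus = [
    "i like to eat ice cream",
    "we like to eat ice cream",
    "i put ice on my knee",
    "they eat ice cream every day",
]

# Count which word follows each word.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def most_likely_next(word):
    """Return the most frequent follower of `word` and its estimated probability."""
    counts = next_word_counts[word]
    if not counts:
        return None  # this word was never followed by anything in the corpus
    follower, count = counts.most_common(1)[0]
    return follower, count / sum(counts.values())

print(most_likely_next("ice"))  # ('cream', 0.75)
```

In this corpus "ice" is followed by "cream" three times out of four, so "cream" wins. Change the corpus and the prediction changes with it, which is the whole point: the output reflects whatever the training text contained.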
what is the meaning of the word "dog" in the sentence "the car broke down and it got so hot the dog let off steam"
it responds with
In this context, "dog" is slang for something that is of poor quality or unreliable.
On the other hand, at the time of writing ChatGPT replies to the same prompt with
the word "dog" most likely refers to an actual dog—as in the animal.
In this example, you can see that the bidirectional AI (Gemini) was better at picking up the semantics of the word "dog" based on the context of the sentence. This is because the generative AI (ChatGPT) was trained on language data that contained way more mentions of dog the animal than dog the slang word, and it made a prediction based on what it deemed most likely to come next (mentions of dog, the animal).
Exercises: Try It Out
Notice how the response changes when you give it more to pay attention to. Notice that when it doesn't have much to go on, it responds with a question. This is the interactive nature of chat-based AI. A friend and colleague described working with chat-based AI as like having an eager but inexperienced intern. A good AI intern wants to get it right, so it asks a lot of questions to be sure it's heading in the right direction. This is deliberate, and a good characteristic. Contrast this with the certainty of Google's AI Overview response. Under what circumstances would you prefer one over the other?
References
Fishbein, Adam R., William J. Idsardi, Gregory F. Ball, & Robert J. Dooling, 2019. Sound Sequences in Birdsong: How Much Do Birds Really Care? Philosophical Transactions of the Royal Society B. The Royal Society Publishing. (Retrieved on April 19, 2025, from https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0044)
Galchen, Rivka, 2024. How Scientists Started to Decode Birdsong. The New Yorker, October 14.
Gazzaniga, Michael S., 2018. The Consciousness Instinct: Unraveling the Mystery of How the Brain Makes the Mind. Farrar, Straus and Giroux.
Kounios, John, & Mark Beeman, 2015. The Eureka Factor: Aha Moments, Creative Insights, and the Brain. Independently Published.
McConnell, James V., 1989. Understanding Human Behavior. (6th Ed.). Holt, Rinehart, and Winston.
Meta, 2023. New AI Advancements Drive Meta's Ads System Performance and Efficiency. Meta. (Retrieved on April 7, 2025, from https://ai.meta.com/blog/ai-ads-performance-efficiency-meta-lattice/)
West, Jack, Lea Thiemt, Shimaa Ahmed, et al., 2024. A Picture Is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok. University of Wisconsin‒Madison. arXiv.org. (Retrieved on April 19, 2025, from https://arxiv.org/pdf/2403.19717)
Wikipedia, n.d. Wikipedia: Guide to Addressing Bias. (Retrieved on May 7, 2025, from Wikipedia)
Zablotsky, Benjamin, Basilica Arockiaraj, Gelila Haile, & Amanda Ng, 2024. Daily Screen Time Among Teenagers: United States, July 2021‒December 2023. Data Brief Number 513, October 2024. Centers for Disease Control and Prevention, National Center for Health Statistics. (Retrieved on April 7, 2025)