January 10, 2026

Audiobook: Chapter 1 What Is AI?

1 hour 18 minutes

CHAPTER 1: What is AI?

You may never directly teach AI yourself, but as we discussed in the Introduction, you participate in the process just about every time you interact with the digital world. You may also be in an organization that is considering if and how to adopt AI tools. These days, it is highly likely that an eager executive will push to "do something with AI" in your organization. These words are music to the ears of vendors who spend big money marketing their products as "powered by AI" whether they are or not. You can add a lot of value by understanding how AI learns so you can ask hard questions and set realistic expectations in your life and for your organization. You can be a big part of the solution by understanding and helping to position potential AI tools in the context of specific problems and human work that's already happening. We'll get into this more in the next chapter but, for now, know that the few AI projects that succeed are the ones that focus hard on context and people up front. Successful AI projects answer the question, "Just because we can, should we?"

"Daddy! I can't get the !@#$% sand out of my shoes!" The tiny voice from the back seat of the car was my three-year-old daughter appropriately vocalizing profanity for the very first time. I was simultaneously horrified, proud, curious, and (let's be honest) amused. Her still-developing intelligence had for the first time understood the perfect context for profanity and nailed it. My wife and I had not specifically taught her to swear when she wasn't able to shake sand out of her little sneakers. She had (unfortunately) heard my wife and me swearing in other situations, none of which involved sand or shoes. She had gathered information from those specific cases and correctly applied it to an entirely new situation with which she had no prior experience.

When we encounter AI that can do something similar, we see intelligence in the machine. We marveled at the unveiling of ChatGPT because the underlying AI could take completely off-the-wall input it had never seen before and come back with a reasonable response in the appropriate context. My daughter had never been strapped into her car seat with a shoe full of unwanted sand, but her developing brain had been exposed to enough unrelated situations to figure out that this was a four-letter-word moment. This is what psychologists call "transduction," a form of reasoning where developing children learn from specific cases they experience and apply their new knowledge to general (new) cases they haven't experienced. Much of machine learning and AI, including the Transformer developed by Google, are conceived to solve general transduction problems, along with a related type of problem called "sequence modeling," discussed in the next paragraph. The Transformer, invented by researchers at Google in 2017 and developed into AI applications in many languages, could similarly encounter an English sentence it had never seen before—such as, "What do I say in German when I am very frustrated because I can't get the sand out of my shoe?"—and come up with "Ich bekomme den @#$% Sand nicht aus meinem Schuh!"

Here's another example of the human brain at work. Consider this series of words: pine, sauce, crab. What's the next word in the series? If you quickly guess "pie," "Adam's," or "computer," you are using your instinctive powers of reasoning to subconsciously assess the relationship between the first three words and find something they have in common—in this case "apple"—to inform your choice of the next word. You can also puzzle this out through a more deliberate process of elimination using your analytical brain. This may be slower, but it can also lead to the correct answer more often. We're all wired for both instinctual insight and analytical thinking, though individually we often skew one way or the other (Kounios & Beeman, 2015). Your brain is built for instinctual insight, so the more language you're exposed to, the more likely it is that your brain quickly finds a relationship between the first three words in the sequence to use as context to come up with a fourth word. This type of cognition is part of something called "fluency," where pathways in your brain have been trained by repeated exposure to information. Your fluent pathways are strengthened when you subconsciously create a common associate like "apple" between remote associates like "pine," "sauce," and "crab," all words or concepts that don't share an obvious connection. When you make up a mnemonic, such as a silly limerick, to help you remember something, you're using the same underlying cognitive mechanism. We see intelligence when we encounter machines that can mimic sequential insight like this in a general way. Picking what comes next is the type of problem in both psychology and machine learning called "sequence modeling." These are very important problems for humans. Figuring out what happens next, or even the few possibilities that might happen next, is a big part of how we are successful as a species. 
We are especially impressed when the answer isn't something we would have come up with on our own. Just as in the example with my daughter, the key to intelligence is that the machine, the AI, performs well when it comes across something it hasn't ever encountered before. That general capability sets AI apart from other kinds of computer programs that work under tighter constraints.

Does this mean everything called AI around us is able to solve general problems? Nope. Software companies desperately want to take advantage of the excitement over AI by slapping the AI label on their products. But a computer system is not AI just because it follows rules to do useful work, no matter how slick the packaging. Rules are created by looking at a bunch of specific cases, then writing up the logic for what to do in those cases. Think back to the semi-automated sawmill example in the Introduction. That computer system was likely programmed based on an old, expert-authored manual of rules for how to saw a log into valuable lumber. What magic there is comes from the clever detection of the outline of the log in a digital photograph, which is itself based on geometric and mathematical rules for finding the edge of a simple, predictable shape. This is by far the best and most efficient way to solve that particular problem. It would be a waste of time and money to show an AI a bunch of logs and a bunch of lumber and teach it to come up with the right cut pattern. One goal of this book is for you to be able to ask questions and think critically about what does and doesn't deserve to be called AI, and even more important, to assess which kinds of problems are worth the effort and uncertainty that come with AI. Because teaching a machine takes a lot of work, and you usually don't know what you're going to get.

How Machines Learn

How do machines learn? Scientists work hard to use the human brain as a model for learning intelligence. After all, they don't have much else to go on! The starting point for artificial intelligence is informed—at least at a high level—by our understanding of the design of the brain and theories of how we learn.

Your brain is a giant mass of interconnected cells called neurons. But it's more than just a skull full of neuron spaghetti. Neurons are elongated cells that form the wiring of your brain. Each neuron cell listens for a signal from nearby neurons. When the signal gets strong enough, the cell activates and sends its own signal out to its neighboring neurons, propagating patterns of signals through the different parts of your brain. Take, for example, your eyes looking at a brightly lit square of paper, half white, half black. Nerve endings in your eye are excited by nearby light-sensitive cells that pop off a signal. That signal tells your neurons to transmit their own signal, but in a pattern that reflects the pattern of light and dark hitting the back of your eye. The pattern of signals travels down what is effectively a data cable from your eye to your brain.

The signals dump into your brain where the arrangement of neurons isn't just random, but is organized into neighborhoods, or specialized networks, where the neurons in the network are particularly good at specific kinds of signaling. For example, detecting a bright light. These networks are organized into layers that are good at specific kinds of thinking. You can think of the layers as a stack of pancakes, where the top pancake of networked neurons does the simplest task like measuring overall brightness at different grid coordinates. That layer hands the map of what's bright and what's not to the next layer, which detects edges in the image—the outline of the square and the boundary between the black and white sides. Your brain continues this general organization where each layer takes input, uses its network of neurons to process it to some degree, then hands off the result to the next layer (Gazzaniga, 2018). For example, when you look at your dog, your eyes send a bunch of electrical signals representing brightness, contrast, and color to the layer of your brain that is your visual cortex. Your visual cortex takes that input and turns it into signals that it hands off to other layers of the brain that do a specific job. There are layers to store and recall memories ("That's my dog, Lilo"), set off emotions ("I love my big baby girl, Lilo"), create speech ("Come here, big baby"), and move our hands (scratch, scratch, scratch). We've been trained by our repeated experiences of the world around us to recognize, feel love for, interact with, and pet our dog. As we grow and develop as children, we learn to recognize all sorts of animals, like kangaroos and deer, but unless we're living in a zoo, we don't moon over them and scratch their ears. But we can tell them apart from dogs!

Similarly, AI systems are designed to use pretend digital equivalents of neurons, networks, and layers to process information. So far, we've talked about language AI, but there is a whole world of visual AI as well. Take a learning task like figuring out if a picture contains (a) a dog or (b) no dog. A visual AI has a layer that takes in a collection of numbers representing the intensity, color, and position of all the dots (pixels) that together make up a digital photo. That input layer hands the raw data off to the next layer, which figures out what's bright and what's dark, then hands everything off to the next layer which figures out where there is something that humans would recognize as an edge, or line. The next layer figures out which lines are organized into simple shapes. The next layer determines which shapes are important and hands those off to the final layer, which makes a guess as to whether one of the shapes is a dog. Just as in our brains, each layer in the AI doesn't care what the other layers do; it's good at its one task. And just like in our brains, when you put all the layers together, you may get intelligence. Remember "deep learning" from the Introduction? Before 2015, machine learning was done with a single, flat neural network. "Deep" just means you have more than two layers besides the input and output layer. There's no magic number of layers in an AI "brain." You decide how many layers to start with, based on the type of AI and the kind of learning. When you teach a machine from scratch, the only layer you specifically set up is the first input layer. The layers after that aren't set up ahead of time to do anything specific. They all start out as generic collections of digital neurons. A new AI has to learn what to do layer by layer. All this adds up to what gets loosely called an "algorithm."

Layers are as far as we're going to go in terms of AI's internal wiring. There are many wonderful books you can read to delve into the fascinating and beautiful construction of AI algorithms. Or you can ask your favorite AI to explain it to you, though I'd recommend a combination. But for practical purposes, the algorithm is just the starting point. The magic happens when you teach the algorithm to do something truly remarkable.

The General Systems Theory of psychology attempts to explain human behavior by looking at the three main variables of human psychology: biological (hungry), psychological (decide to seek food), and social/behavioral (somebody feed me). If you're a baby, you get the inputs your body needs (food) by controlling your outputs (crying in a tone that means hungry vs uncomfortable from a wet diaper). You output information (crying) to your environment by planning actions to get what you want. In a system, this is called "feed-forward." The actions you plan (time to cry) are based on a guess of the consequences of those actions (Dad feeds me). You run the plan (cry) and compare the actual consequences with what you thought they'd be (did I get fed or not?). This is called feedback. If you didn't get what you wanted (still hungry), you adjust the plan (cry louder), which is using feedback (McConnell, 1989). The teaching of machines, machine learning, is all about infant computer programs going through the cycle of planning actions and guessing consequences (feed-forward), doing the actions, then comparing the actual result to the guessed result (feedback), adjusting if necessary to repeat the cycle (using feedback).

We're going to dig into an example of how AI is taught by people and deployed into the real world. Before we get there, it's helpful to understand the general approach to machine learning along with some of the technical terms for key parts of the process and steps that are applied.

First, you need a topic or situation based in the real world. Artificial intelligence, like humans, needs to focus on one thing at a time while learning, so in our AI, we focus on a specific topic to provide loose boundaries. We call this topic the domain. The Google Brain team chose foreign languages as the domain when they were developing and testing their Transformer. In our example, the domain is "dogs." Within the general topic, we go further and articulate a particular problem to solve. This problem is called the task. Our task is "Decide if a picture has a dog in it, or not." Next is a definition of success. I can't overstate how critical it is to decide on and define the successful outcome we want ahead of time. Your measure of success is called the metric. Recall that Google's Transformer was first taught to pass a longstanding standardized test of English-to-German and English-to-French language translation. This was their metric, or measure of success. With AI, you're teaching a machine to approximate or augment a cognitive process that only a human can do, so you or your organization MUST understand baseline human performance and articulate ahead of time what success looks like for the AI. The AI doesn't have to "beat the human" like the chess-playing computers from the 1990s. It's enough to set a standard that the AI helps a human to accomplish faster. Our example metric is "Find more than 90% of the dog pictures." This metric is the critical educational outcome that guides how you teach the machine. The next step is to procure the equivalent of a textbook for AI training: enough relevant data for the lesson. How much data is enough? Enormous, truly huge volumes of data are required to successfully teach AI. You need to start with every scrap of data relevant to your problem that you can beg, borrow, or steal (not really). It will likely still not be "enough."
This is why the most successful AI research, and the most successful AI products, come from huge companies that spend billions and decades collecting our data. We call this the training data. Our example data set is six thousand family photos, some with the dog, some without. Your AI will study the data you give it, reading or looking at it over and over. The sum of what it learns during this process depends on the volume and quality of data you provide. The data has to be described and characterized by humans so you know the answers ahead of time, just like an instructor's answer key in a textbook. We call this labeling. For our example, three different veterinarians each looked at all six thousand pictures and labeled each "dog" or "no dog." The final step in preparing to teach is design of the empty, untrained brain of the AI. What kind of brain? How many layers? How do the layers talk with each other? This is called the model architecture. We choose Residual Network, since it's a well-tested architecture for image recognition. You can treat it as a black box, so we won't go into more detail.

Now you teach! You organize a repeated series of lessons and quizzes where the AI does the feed-forward part of learning. It uses its untrained brain to look at a randomly selected set of half of the dog pictures. This half of the total data set is called the training data. The AI does the task of predicting the right answer (dog or no dog) and then takes a quiz where you check its prediction against the human labels. After each quiz, you use a computer program to give the AI feedback on what it got right and wrong. It uses the feedback to adjust how its brain re-reads the data and comes up with answers (a mathematical process called gradient descent). You repeat for potentially hundreds of cycles of training so the different layers in the AI brain learn to do specific tasks, much as the layers in a baby's brain learn their job in the larger task of recognizing animals or getting someone to feed them. During the repeated cycles of training, the AI develops an equivalent to fluency from repetition and by learning hidden gems like how a common associate such as the combination of dark nose and round eye shape ties together remote associates like German shepherd, pug, and beagle. You stop the cycle of lessons and quizzes when the AI gets a good score a few times in a row, better than ninety percent correct on our metric, and it's clear it isn't learning anymore. This repeated good score is called convergence. If you want to impress someone when they are bragging about their AI, ask them, "How many training cycles before convergence?"

Now for the final exam. The AI reads the other half of the data, which it has never seen before, called the testing data set, and does the task just once. Remember, the data is labeled by experts, so you know the answers to the test. If the AI passes the test and hits your predetermined metric of correctly identifying ninety percent of the dog pictures, it gets a good grade, and you celebrate! This final exam is the proof that AI can learn enough from a specific case where it has access to the answers (training) and then successfully generalize to a case it's never seen before where you know the answer but it does not (testing).
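For readers who like to see the idea in code, here is a minimal sketch of the lessons, quizzes, and final exam described above. It is not how a real image model is built. Each "photo" is boiled down to a single made-up dog-likeness score, the "brain" is a two-number model rather than a Residual Network, and the names (photos, predict, percent_correct, train) and settings (learning_rate, target, good_scores_needed) are all assumptions chosen for illustration. Convergence is simplified to "a good score three quizzes in a row."

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the labeled photos: each "photo" is reduced to one
# made-up "dog-likeness" score, paired with a human label (1 = dog, 0 = no dog).
photos = []
for _ in range(400):
    label = rng.randint(0, 1)
    score = rng.gauss(2.0 if label else -2.0, 1.0)
    photos.append((score, label))

# Split the labeled data into two halves: one to study, one for the final exam.
rng.shuffle(photos)
training_data, testing_data = photos[:200], photos[200:]


def predict(w, b, score):
    # Feed-forward: the model's guess at the probability of "dog".
    return 1.0 / (1.0 + math.exp(-(w * score + b)))


def percent_correct(w, b, data):
    # The quiz: compare each guess against the human label.
    hits = sum((predict(w, b, s) >= 0.5) == (y == 1) for s, y in data)
    return hits / len(data)


def train(data, learning_rate=0.5, target=0.9, good_scores_needed=3,
          max_cycles=500):
    w, b, streak = 0.0, 0.0, 0
    for cycle in range(1, max_cycles + 1):
        # Feedback: the average error signal, used as the gradient for w and b.
        grad_w = sum((predict(w, b, s) - y) * s for s, y in data) / len(data)
        grad_b = sum(predict(w, b, s) - y for s, y in data) / len(data)
        # Gradient descent: nudge w and b in the direction that reduces error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Simplified convergence check: a good score several quizzes in a row.
        streak = streak + 1 if percent_correct(w, b, data) >= target else 0
        if streak >= good_scores_needed:
            break
    return w, b, cycle


w, b, cycles = train(training_data)
final_exam = percent_correct(w, b, testing_data)  # run once, on unseen data
print(cycles, round(final_exam, 3))
```

The point to notice is the separation: the model only ever adjusts itself using training_data, and testing_data is touched exactly once, at the end.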

Much like a new graduate, your AI now has theoretical knowledge but hasn't been out in the real world where it really counts. The really hard part of this process is launching your newly trained AI out into the real world (deploying it) in a way that allows it to continue learning safely. The task may be low stakes and relatively easy to deploy, like identifying birds from their songs, or high stakes and relatively hard to deploy, like pointing out bone fractures on x-rays. Either way, training is just the first step before figuring out how to get your AI from the classroom into the real world. You'd think after all that, you'd be done. Unfortunately, newly graduated AI is destined to fail unless it is deployed in a way that allows it to continue learning on the job because it is simply impossible for your training to include every possible scenario, or combination of data. Remember, the whole point of AI is that it can do good work when it encounters things it's never seen before. Much like a well-educated person, AI that keeps learning on the job can use its training, and now experience, to solve problems in a changing environment (continual learning).

Now we'll use a real example to review the terms domain, task, metric, data, labeling, training, gradient descent, convergence, testing, deploying, and continual learning.

I recently visited Iceland for the first time. On our way through the glacial areas of the Southwest we went on a hike from a barren, regularly flooded volcanic plain into an older, sheltered valley with plenty of trees. Songbirds suddenly appeared and chirped their hearts out as soon as we got into trees that were more than waist high. Iceland has plenty of trees, but they rarely grow more than five or six feet tall due to the heavy wind and wild swings in the amount of daylight, from twenty-four hours of light in the summer to twenty-four hours of dark in the winter. The rapid appearance of birds made me aware of the absence of birdsong everywhere else in Iceland, something I take for granted as background noise living in the U.S. mid-Atlantic region. So I got curious about birdsong and remembered hearing about an AI-powered birdwatching app called Merlin. Merlin is the result of a wonderful citizen-scientist collaboration at Cornell University. The coolest part of the app is an AI feature called "Sound ID" that can identify more than four hundred and fifty bird species in the U.S. and Canada alone from brief recordings you make of the world around you. The goal of the Merlin team was to capture the knowledge and expertise of a relatively few expert birdwatchers and share it with as many people as possible so they may also learn how to identify the birds around them. Think back to our historical precedents for AI: the invention of writing by the ancient Sumerians and the invention of the modern printing press by Johannes Gutenberg. Before writing and printing, a birdwatcher, or more likely a bird hunter, could teach at most a few other people to track birds by verbally describing what to listen for: "If you hear a repeated metallic chirp followed by a sort of up-and-down trilling, it's a bunting."
Writing, then printing, and by extension the internet captured that knowledge so that many more could benefit, and a few centuries later, aspiring birdwatchers could listen to audio recordings, and then go stand in the backyard and try to pick out individual birds from the cacophony of birdsong around them. Now, AI in the form of Merlin puts the expertise of some of the most accomplished bird experts in the world in your pocket. It walks you through each call you're hearing and helps you learn what bird it belongs to. Merlin is used by hundreds of thousands of people, many of whom, despite the birdwatching books on their shelves, were unlikely to learn to identify birds without it. So let's take a look at how Merlin came to be.

Birds, like humans, are lifelong vocal learners (as are dolphins and bats). As chicks, they learn from their parents to both vocalize and understand sounds, and they keep learning for the rest of their lives. Researchers discovered that birds use a form of cognitive language—their equivalent to words, grammar, and phrases—as more than communications signals. Birds will adjust and respond to changes in the order of chirps and warbles, which we anthropomorphize as grammar. In their own unique way, they will respond to minute changes in very high frequency parts of birdsong (Fishbein et al., 2019). Ornithologists, immersed in the study of birdsongs and bird language, curated and labeled sound recordings and made them available to the public on the internet. Artificial intelligence researchers love freely available data that has already been characterized or labeled by experts because they can use it to train AI. Even more, AI researchers love language data in any form because its intricacies help to drive new discoveries, often relevant to cognition—the goal of AI. So AI researchers at the University of California San Diego doing early work with pictures of birds from the internet were thrilled when ornithologists at Cornell contacted them and invited them to check out the huge and growing collection of birdsong recordings at Cornell's Macaulay Library—at the time of this writing, more than 1,300 species (Galchen, 2024).

The scientists worked together to choose a domain—the topic for the AI—which in this case was bird vocalizations. The task for the AI—the particular problem it needed to solve—was identifying the bird that made a particular sound. The metric, or how they'd know if they got it right, was precision—how often the AI thought it was right and it actually was. The data were the one million recordings of birds in the Macaulay Library, many contributed by amateur bird watchers for research like this. They also included recordings of sounds you might hear together with birdsong out in the world, like wind, cars honking, and dogs barking. The labels—the answer key—were details added to each bird recording by citizen scientists (amateur birdwatchers) and expert ornithologists. The labeled data were divided into two halves, with the first half dedicated to training and the second to testing. The researchers chose a deep learning model architecture for the untrained brain called a residual network, a model architecture known for its flexibility.
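The precision metric described above is simple enough to sketch in a few lines. The example below is illustrative only: the function name precision, the species names, and the six made-up predictions and labels are assumptions, not anything taken from the Merlin project.

```python
def precision(predictions, labels, species):
    # Of all the times the AI said `species`, how often was it actually right?
    claimed = [(p, t) for p, t in zip(predictions, labels) if p == species]
    if not claimed:
        return None  # the AI never guessed this species
    correct = sum(1 for p, t in claimed if p == t)
    return correct / len(claimed)


# Hypothetical AI guesses and expert labels for six recordings.
predictions = ["bunting", "bunting", "robin", "bunting", "robin", "wren"]
labels = ["bunting", "robin", "robin", "bunting", "robin", "bunting"]

print(precision(predictions, labels, "bunting"))
print(precision(predictions, labels, "robin"))
```

Here the AI guessed "bunting" three times and was right twice, so its precision for buntings is two out of three. Note that the bunting it mislabeled as a wren does not count against bunting precision; precision only looks at the guesses the AI actually made for that species.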

The AI was trained by being made to "listen" to each type of bird to classify its species hundreds and hundreds of times, sometimes with background noises thrown in. The Merlin AI doesn't actually "listen" to anything. Much of the information contained in an audio recording of a bird singing (the level or volume of sound at different frequencies over a period of time) can also be represented in visual form as something called a spectrogram, and this is what the AI learned to recognize. You see spectrograms in movies and on TV when producers want to show you "sound waves." Children of the 1960s saw a crude spectrogram on Lost In Space when The Robot spoke, kids of the 1980s saw KITT's red fluctuating speech lights on the dashboard in Knight Rider, and millennials watching Futurama saw Bender the robot's crass speech mirrored in the wiggly lines of his mouth. So Merlin represents sound as a spectrogram image when it learns.
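To make the idea of a spectrogram concrete, here is a rough sketch that turns a made-up two-note "birdsong" into a grid of numbers: one row per slice of time, one column per frequency, with each number giving the loudness. The sample rate, slice size, and function names are assumptions for illustration, and real tools use faster math and overlapping slices; this is not Merlin's actual processing.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_SIZE = 64     # samples per slice of time (assumed)


def make_tone(freq_hz, n_samples):
    # A pure tone: a stand-in for one note of a birdsong.
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]


def frame_magnitudes(frame):
    # Discrete Fourier transform: loudness at each frequency in this slice.
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


def spectrogram(samples):
    # One row per time slice, one column per frequency.
    last_start = len(samples) - FRAME_SIZE
    return [frame_magnitudes(samples[i:i + FRAME_SIZE])
            for i in range(0, last_start + 1, FRAME_SIZE)]


# A made-up song: a low note followed by a high note.
song = make_tone(1000, 800) + make_tone(3000, 800)
image = spectrogram(song)

hz_per_column = SAMPLE_RATE / FRAME_SIZE
loudest = [row.index(max(row)) * hz_per_column for row in image]
print(loudest[0], loudest[-1])
```

Reading the loudest column in the first and last rows recovers the low note and then the high note, which is the kind of pattern an image-recognition model can learn to pick out.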

Each training cycle was followed by a quiz to see if the AI correctly identified the bird from its song. After each quiz, the AI was given feedback on how well it did. As it was trained, the AI used something called a "gradient descent calculation" to adjust the layers of its brain to optimize its learning. "Gradient" means the slope of the error: it tells the AI which direction of adjustment would make its mistakes worse. "Descent" means adjusting each layer a small step in the opposite, downhill direction to reduce errors; how big a step to take is a separate setting chosen by the people doing the teaching. The AI went through cycles of training until it converged on a final level of performance (meaning it had learned all it could and wasn't getting any better). The trained AI was then tested with the other half of the labeled data it had not seen before to measure its precision (the final exam). Good news for budding amateur birdwatchers: It passed! The Merlin AI fits our definition of AI because it is a computer system taught by humans to do something no single human is likely capable of—recognize the unique song of thousands of birds worldwide.
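Gradient descent is easier to picture with a single adjustable number instead of millions. In the sketch below, the error curve, the starting point, and the names error, gradient, and learning_rate are all assumptions made up for illustration. The slope says which way is uphill, and each step moves a little way in the opposite direction.

```python
def error(w):
    # Hypothetical error curve with its lowest point at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Slope of the error curve at w: positive means "uphill is to the right".
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    for _ in range(steps):
        # Step against the slope; learning_rate sets how big the step is.
        w = w - learning_rate * gradient(w)
    return w


start_w = 0.0
final_w = gradient_descent(start_w)
print(error(start_w), error(final_w))
```

After fifty small steps the adjustable number has settled very close to the bottom of the curve, where further steps barely change anything. That settling is a one-number picture of convergence.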

The Merlin AI team then worked with app developers to deploy the AI into the Merlin app and set up the Merlin AI to continue to collect data and adjust its performance—what we call "continual learning"—based on the feedback of its users. If you use Merlin and give it feedback, then you teach the (Merlin) machine!

A Word About Data

We tend to trust knowledge and expertise when we have a sense that nobody's hiding anything. Our human educational system is built on a trusted combination of transparency, credentialling, and standardized evaluation. When someone is a trained, credentialed middle school science teacher, we generally know what to expect within a real-life range of ability. When a university professor teaches statistics or history to graduate students, the curriculum is overseen by a standards committee, the syllabus is almost always public, and the textbook or reading material is broadly published and available. You'll note that both the Merlin Sound ID AI and Google Brain's Transformer were trained using publicly available, well understood data. Both went on to have an impact on our world. That's not a coincidence. The best performing and most impactful AI will always come from transparent information. Would you accept a human teacher in your kids' school who used secret-sauce teaching materials that only they had knowledge of? Would you hire an expert who graduated from a university that used its own confidential "proprietary" textbooks and refused to be accredited by a third party?

AI is good and getting better at capturing human knowledge and approximating cognition, or thinking. It's good at breaking down bottlenecks and barriers to the use of expert knowledge by more people. It's also only as capable as we make it, since it's derived from the data in our world and the standard of "capable" set directly by us or indirectly by participating in digital systems where our judgment is captured. We trust AI when we trust the data it learned from, and we trust AI is "right" based on our own judgment or the impartial judgment of experts we trust. But this trust is not a given. A big part of teaching AI is selecting good data, finding ways to identify and ignore bad data, and then representing the data in a way that preserves the information we care about. There are whole fields of study and professions focused on these topics. If you're curious, look up "ground truth data" and "representation learning" to learn more.

The Language Of AI: Demystifying Jargon

As we begin incorporating AI into our lives, it's important to understand key terminology and to recognize the perils of advertising and marketing, gaslighting, and hype. For example, ChatGPT is AI, but not all AI is ChatGPT, as much as OpenAI would love for you to believe this. What follows is a glossary of AI terms that get thrown around a lot, some of which you have already encountered in this book. I provide a commonsense, nontechnical explanation for each, and I encourage you to look up the terms that seem most important to you elsewhere—both for more detailed information and to understand how these technical concepts relate to each other. In fact, I strongly encourage you to learn as much about AI as you possibly can. You teach the machines.

But first a few words of advice. When you do a web search for some AI jargon, put the word "intuition" at the end. When I was in graduate school for computer science, I learned to ask professors to help me understand the intuition behind complex mathematical and computational concepts. This gave them room to separate hard facts and mathematical truth from the "gist," the overarching, big picture, human-relatable concept. As much as possible, start with well-referenced or primary sources before turning to AI, if you use AI at all. For scientific and mathematical explanations, I often start with Wikipedia, as it's been hand-curated by people and experts over many years to be an accurate and useable reference. Please donate to Wikipedia at donate.wikimedia.org because the non-profit organization behind this website works tirelessly to empower hundreds of thousands of citizen experts to curate and fact-check knowledge. In return, all this freely given expertise is scraped off the Wikipedia website and used to train proprietary AI by the biggest, most profitable companies in the world. For those concerned with possible bias in Wikipedia articles, the organization offers an essay, "Wikipedia: Guide to Addressing Bias." However, Wikipedia cautions that the essay itself should be read with healthy skepticism.

Reddit is another helpful source because it captures the interaction and reasoning of its human contributors, with an up- or down-vote that can promote accurate information and demote baloney. But be brave and try to read primary scientific papers even if you don't understand most of what's presented. Another thing I've learned is that you can pick up important points from computer science, math, and other scientific papers without understanding all the details. You may be surprised. Plenty of "experts" don't understand what's in a paper the first time they read it. You can also learn about the progress of AI over time by seeing what papers are "highly cited" or referenced by other papers.

The point of further reading is to develop a sense of the field of AI. Know what you don't know. You may never become an expert, but this kind of reading can help you develop an intuitive sense of what is real vs hype, "magic" vs sleight of hand, distraction vs threat. Given the stakes and what's to come for our society and economy, a good bullshit detector is priceless.

I hope the definitions that follow are a helpful start at cutting through often overwhelming jargon and powering up your BS detector. Some of these terms and concepts appear earlier in the book, but since many of these topics are fairly abstract and complex, reading a more detailed explanation, along with additional examples, can be helpful. This is by no means a complete glossary, and the explanations are my own, based on study, work, and research in the field. They are intended to be conceptually and intuitively helpful, not thorough technical documentation. Please use this brief glossary as a starting point, and build on what's here by doing your own further reading and research.

The definitions, rather than appearing in alphabetical order, are organized in such a way that the terms logically follow one another.

Definitions

Algorithm—At the time this book was written, "algorithm" was used as a euphemism for artificial intelligence that seems to know something about us or the physical world. Have you ever said or heard a friend say something like, "The algorithm knew I was thinking about buying a house!" followed by an uncomfortable laugh? We've been using "algorithm" instead of "artificial intelligence" for years because it's a more comfortable word to describe something that can get, well, creepy. But anytime you refer to the algorithms on Amazon or Instagram, you are in fact referring to AI.

The real definition of "algorithm" is a series of readily explainable mathematical instructions or formulas used to solve a problem. The equations and formulas of geometry are examples of actual algorithms. The circumference of a circle is two times its radius multiplied by the constant value of pi. C = 2πr. When it comes to AI, social media started with simple algorithms, initially based on your social network—people you connected with on the app. A lot has changed since then. In Meta's own words, "We began with manual feature engineering for small models and progressed to building hundreds of deep neural network models with trillions of parameters" (Meta, 2023). What exists now is possibly the most powerful, nonexplainable artificial intelligence directed at understanding and changing human behavior outside classified government surveillance. I refer to the artificial intelligence in social media as "my algorithm" when it shows me a video of a puppy, an advertisement for a powerful flashlight, and a political message that evokes an emotional response.

Social media companies are likely happy we call their AI systems "algorithms" because it's a less threatening and more marketable word. In the world of social media, artificial intelligence continually learns how to "engage" you, your parents, and your kids. "Engage" is a euphemism for "attract and hold your attention." The business model of every social media company is to "monetize engagement"—in other words, sell two things: advertisements and data about you. Your "algorithm" in reality is a personal artificial intelligence that knows how to hook you and keep you using social media for as long as possible. It continually learns what will attract your attention from behavioral data it collects directly, and also that it is receiving from every other app, website, navigation system, payment service, and physical business you give your email, phone number, or tracking cookie. You teach your own social media machine. And all this happens without the social media company really understanding how their artificial intelligence hooks you. They only care that it does. In my own life, what social media companies call engagement, I experience as addiction. I can't open up Instagram without getting sucked in. Before I know it, my "screen time" is up over four hours per day, a level of exposure researchers have found linked to increased symptoms of anxiety and depression (Zablotsky et al., 2024). Using "algorithm" to describe my social media AI is like using "vape" to describe a highly optimized electronic nicotine delivery device wrapped in child-friendly, colorful packaging, sold by companies that don't care how nicotine interacts with our brain to reinforce dependence, only that it does. End rant.

Data—When we're talking about AI, data are digital representations of the real or online world. Data are (or is; often data is treated as a collective noun) always a representative example of the real world, but almost never every possible representation. Since artificial intelligence learns by example, the more data you can give it, the more it can learn. Data can be structured, like a spreadsheet made up of columns and rows with numbers or text in each little box. Data can also be unstructured, like a digital photo, the electronic files used to hold the words of this book, or the squiggly line of your heartbeat on a monitor in the hospital. Humans can work with unstructured data as is; we can look at a picture and understand what we see or read this paragraph. Artificial intelligence, however, needs help. For the purposes of teaching artificial intelligence, all unstructured data must become structured. For example, the picture you took of a spring daffodil is transformed by a computer program into columns of numbers with one row for every tiny little part of the picture. The bright area of white in the middle of the picture will produce many rows that give the coordinates of the tiny little spots, called pixels, that make up the area. Each row will also contain numbers representing the color and intensity at that exact spot. AI reads the structured data and learns from it. A 4K camera takes pictures made up of more than eight million pixels, which means more than eight million rows!
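
For readers who like to see an idea in code, here is a minimal sketch of turning a picture into structured rows. The tiny two-by-three "picture," its color values, and the function name `picture_to_rows` are all made up for illustration; real image software uses its own formats, but the one-row-per-pixel idea is the same.

```python
# A tiny 2-row by 3-column "picture". Each pixel is a (red, green, blue)
# triple from 0 to 255. The values are invented for illustration.
picture = [
    [(255, 255, 255), (250, 220, 40), (30, 120, 30)],
    [(250, 220, 40), (255, 255, 255), (30, 120, 30)],
]


def picture_to_rows(pixels):
    """Flatten a grid of pixels into one structured row per pixel."""
    rows = []
    for y, line in enumerate(pixels):
        for x, (red, green, blue) in enumerate(line):
            rows.append({"x": x, "y": y, "red": red, "green": green, "blue": blue})
    return rows


rows = picture_to_rows(picture)
for row in rows:
    print(row)
```

Each printed row holds the coordinates of one pixel plus the numbers describing its color, which is the kind of table a machine can learn from.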

"Wait, didn't you say there was AI in my car? But my car doesn't have eyes. How does it collect and use data?" Automotive AI, and just about every AI system that can respond to the physical world in real time (robots), uses sensors like cameras and accelerometers, along with computers on board that convert the images and brake force readings from unstructured to structured data that is then handed off to the AI. Automotive AI is initially taught using recordings made during millions of hours of cars driving around. That's part of what Google and others are doing when they send specially equipped camera cars to drive through your neighborhood. True story: It's gone now, but an early capture of my house by a Google Street View camera car showed my brother-in-law crouched between two parked cars acting like a monster about to pounce. He saw the camera car coming and hid between the cars. My brother-in-law created something called "noise" or an "outlier" in the data collected by Google, assuming he was the only random person pretending to be a monster that day. In addition to 360-degree cameras, these cars may also be equipped with radar and other sensors that capture distances, physical shapes, and motion, which can then be combined with the pictures to teach AI about the driving environment of our neighborhood.

Model—Collection of observations created by a computer system as it learns something from data. The following example is intuitive and easy to follow but completely made up and not actually true. If you're a pediatrician, please forgive me! Let's pretend I collect structured data from a thousand children by recording their weight at each birthday up to age fifteen. I make a spreadsheet with two columns, one for age, one for weight. I put my own data into the spreadsheet: At one year old, I was 10 pounds. At two years old, 22 pounds. Fifteen years old, 149 pounds. Let's pretend that same trend holds for most of the other children in the data. I can use a computer running a math equation (called linear regression) to learn a simple model, again not real but for illustration only:

The weight of a child is equal to ten times their age, plus or minus a few pounds.

The observations that make up the model are (a) that age is important to weight, (b) ten is the multiple that'll give you weight from age, and (c) we can expect a few pounds of error. Artificial intelligence models are learned from much more complex data (e.g., all the text ever published online) and contain many more observations, called "parameters"—sometimes into the tens of millions or billions—but a similar principle still holds. However, with both our fictional weight model and with artificial intelligence models, you have to remember that the model is only as useful as what it has "seen" before. The model learns parameters (observations) by example from the data it was given. To complicate matters further, AI parameters are never something understandable like "age." It's hard to think about, but the parameters (observations) in an artificial intelligence model are completely… artificial. Because of the complexity involved, an AI model is not explainable in human terms; we treat it as a black box, a system where the precise internal workings are not known.
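
The made-up weight model can be sketched in a few lines of code. This is an illustration under stated assumptions, not real pediatric data: the ages and weights below are invented to roughly follow the "ten times age" trend, and `fit_line` is a hypothetical helper that applies the standard least-squares formulas for linear regression.

```python
# Invented (age, weight) pairs that roughly follow "weight = 10 x age".
ages = [1, 2, 3, 5, 8, 10, 12, 15]
weights = [10, 22, 29, 51, 79, 102, 118, 149]


def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, weights)

# Average size of the miss between the model's guess and the recorded weight.
errors = [abs(w - (slope * a + intercept)) for a, w in zip(ages, weights)]
typical_error = sum(errors) / len(errors)

print("multiplier learned from the data:", round(slope, 2))
print("starting offset:", round(intercept, 2))
print("typical error in pounds:", round(typical_error, 2))
```

The three printed numbers are the whole model: a multiplier close to ten, a small offset, and an error of a pound or two. An AI model works on the same principle but with millions or billions of such numbers, none of which map to anything as readable as "age."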

Foundation Model—A special kind of model used to represent a very broad area of our world or life. It could be as general and foundational as "the English language" or "pictures." Some of the first foundation models were indeed built from images. For example, a foundation model for images is a powerful tool to help artificial intelligence understand what makes up a line, basic shapes like squares and circles, light versus dark, and color. You can think of a foundation model as the part of your brain that processes the primary input from your eyes, ears, and other senses. You likely first experienced the power of a foundation model when your phone could unlock itself by looking at your face, then when you searched for a picture in the photo album on your phone. Foundation models based on language are the basis, in fact the foundation, of the powerful AI tools we started to use in the early 2020s. The next generation of foundation models is being built by the largest AI companies from multiple broad areas, for example, combining images and language.
Large Language Model—Foundation model built from human language. The machine has been taught the vocabulary, sentence structure, and style of an entire language. Additionally, the machine has been taught the likelihood of how words, sentences, and style occur together across an entire human language.
Machine Learning—The process of using a computer system to learn a model from data. Another (made-up) example: You need to forecast the global fish supply but have not been able to use regular math to figure out how water temperature, tide, location, and day of the year come together to determine how many fish will be caught. Your data contain the pounds of fish caught, water temperature, and height of the ocean tide at every minute of the day for a whole year at a thousand locations around the world. You decide to try machine learning: to teach a machine to guess whether fishing will be good.

Before you get started, you decide that if the machine can guess right more than three out of four times, it's a success. Then you train. You randomly select half of your data, give it to the machine, but hold back the answer of whether the fishing was good at each location. The machine guesses if the fishing was good, you tell it whether it got it right or wrong. It changes how it guesses and the cycle repeats. As it does a better job of guessing if the fish are biting, it creates its own secret parameters (observations) of what makes for good fishing. Eventually, it doesn't get any better at guessing and you record the parameters as your new fishing model. You've completed the training part of machine learning. Now you test the trained model by giving it the other half of the tide, time, and location data it's never seen before. Again, you know the right answer to whether the fishing is good. If the model (machine) guesses right at least three times out of four, you've successfully taught the machine. You've completed the testing part of machine learning. Importantly, somewhere in the layers of its neural network, the machine learned something you couldn't from all that data, some hidden factors nestled behind the data you do have. You'll never know what those factors are, but your new black box model does a pretty good job of guessing. Now you can use it to build a fish forecasting system that takes in water temperature, tide level, date, time, and location and tells a fisherman whether it's worth it to go out.
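
The train-then-test cycle described above can be sketched as follows. Everything here is an assumption made for illustration: the data are randomly generated, the rule for "good fishing" is invented, and the learner is a small logistic regression adjusted by gradient descent, swapped in as a simple stand-in for the neural network in the story.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch behaves the same on every run


def make_example():
    """One made-up observation: scaled temperature and tide, plus the answer."""
    temperature = random.uniform(0.0, 1.0)
    tide = random.uniform(0.0, 1.0)
    good_fishing = 1 if temperature + tide > 1.0 else 0  # invented rule
    return (temperature, tide), good_fishing


data = [make_example() for _ in range(400)]
random.shuffle(data)
train, test = data[:200], data[200:]  # half for training, half held back


def guess_probability(weights, bias, features):
    """The machine's current guess that fishing is good, from 0 to 1."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def train_model(examples, rounds=500, step=1.0):
    """Guess, compare with the right answers, adjust the parameters, repeat."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(examples)
    for _ in range(rounds):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, answer in examples:
            error = guess_probability(weights, bias, features) - answer
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - step * g / n for w, g in zip(weights, grad_w)]
        bias -= step * grad_b / n
    return weights, bias


def accuracy(weights, bias, examples):
    """Share of examples where the machine's guess matches the right answer."""
    correct = sum(
        (guess_probability(weights, bias, f) >= 0.5) == (a == 1)
        for f, a in examples
    )
    return correct / len(examples)


weights, bias = train_model(train)
test_accuracy = accuracy(weights, bias, test)
print("accuracy on data never seen before:", test_accuracy)
print("success" if test_accuracy >= 0.75 else "back to the drawing board")
```

The three-out-of-four benchmark is set before training starts, and the test half is never shown to the machine until the very end. In this toy version you can read the learned parameters because there are only three of them; in a real neural network there are far too many to interpret.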

Artificial Intelligence—Computer systems that learn models from data through the process of machine learning. These models can "understand" (describe and represent) something like language, pictures of animals, x-rays, or rainfall. These models can also do tasks like translate languages, identify cats in pictures, diagnose broken bones, or predict rainwater runoff patterns.
Supervised—When a human tells a machine whether it guessed right or wrong. The fishing example above is supervised machine learning, supervised artificial intelligence.
Unsupervised—When a machine learns something that may or may not be "right" in the eyes of a human. On its own, without supervision. An example of unsupervised machine learning, or unsupervised artificial intelligence, would be when a machine learns which words are important on the Wikipedia website based on something like how often a word shows up in all the articles. You didn't teach it. Without supervision, it'll probably learn "and" is an important word. Unsupervised machine learning is often a first step in developing AI systems. After a round of unsupervised learning to get a rough model, humans can supervise the next round, during which you'll teach the machine to ignore "and," "the," and "is" in favor of words that you deem actually important.
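Here is a minimal sketch of the word-counting idea, assuming a short made-up text in place of Wikipedia. The first count is the unsupervised step: nobody tells the program which words matter. The second count mimics the human supervision that follows, using a hand-written list of words to ignore (`stop_words` is a hypothetical name).

```python
from collections import Counter

# A short invented text standing in for a much larger collection of articles.
text = (
    "the cat and the dog and the bird sat in the sun and "
    "the cat chased the bird"
)
words = text.split()

# Unsupervised step: simply count how often every word shows up.
counts = Counter(words)
most_common_word, _ = counts.most_common(1)[0]
print("most frequent word:", most_common_word)

# Supervised follow-up: a human decides which frequent words to ignore.
stop_words = {"the", "and", "in", "is"}
filtered = Counter(w for w in words if w not in stop_words)
print("counts after a human steps in:", dict(filtered))
```

On its own, the count crowns a filler word as the most "important." Only after a person supplies the list of words to ignore do "cat" and "bird" rise to the top.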
Training—The part of machine learning when you tell a machine what it got right and wrong through multiple rounds of guessing. You know the answer (fishing is good or not), and the machine learns how to guess the answer based on the data you give it.
Testing—The part of machine learning where you show a freshly trained machine data it's never seen before and test whether it meets some benchmark you decided ahead of time.
Pre-training—Often used to describe the process of training a foundation model to "understand" fundamentals, e.g., of language or images, with the expectation that additional training and testing will be done so the model can be used for a specific task like translation from English to German or facial recognition.
Accuracy—How often a machine is correct compared to an established benchmark in the training and testing data. When you test a new model during the machine learning process, you're often testing its accuracy. Let's say you have your uncle over for dinner and a game of trivia. If your uncle answers "blue" and "round" when you ask him the color of the sky and the shape of the earth, he is accurate. If, after a few glasses of wine, your uncle answers "blue" and "flat" to the same questions, his accuracy is suffering. It's important to note that AI accuracy is not a real-world measure and is solely from an AI (not human) perspective. If your uncle was raised by flat-earthers and never learned that the world was round, his answer of "flat" is as accurate as the data he had from his parents. Similarly, if an AI is taught using flawed data, it can be accurate given the data but not empirically, objectively correct when out in the real world.
Precision—How often a machine gives you the same answer. In other words, consistency. When you're testing a new model during the machine learning process, you sometimes ask it the same question multiple times, an opportunity for the machine to give you the same answer (or not). If your uncle gives you a different answer every time you ask him if the world is round, he is not precise. If your uncle always replies "blue" when you ask him if penguins can fly, he is precise but not accurate. In chapter 4, you'll read about research I've done with Google's AI Overview. Spoiler alert: It has problems with both accuracy and precision.
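The difference between accuracy and precision, as the two terms are used here, can be sketched with the uncle's trivia answers. The function names and the answer lists are made up for illustration. Note that "precision" here means consistency across repeated questions; machine learning papers also use the same word for a different classification statistic, which this sketch does not compute.

```python
from collections import Counter


def accuracy(answers, benchmark):
    """Share of answers that match the benchmark answers."""
    matches = sum(a == b for a, b in zip(answers, benchmark))
    return matches / len(benchmark)


def consistency(repeated_answers):
    """Share of repeated answers that agree with the most common answer."""
    _, top_count = Counter(repeated_answers).most_common(1)[0]
    return top_count / len(repeated_answers)


benchmark = ["blue", "round"]      # color of the sky, shape of the earth
sober_uncle = ["blue", "round"]
tipsy_uncle = ["blue", "flat"]

# "Can penguins fly?" asked five times in a row.
stubborn_answers = ["blue"] * 5
wavering_answers = ["yes", "no", "no", "maybe", "yes"]

print("sober accuracy:", accuracy(sober_uncle, benchmark))
print("tipsy accuracy:", accuracy(tipsy_uncle, benchmark))
print("stubborn consistency:", consistency(stubborn_answers))
print("stubborn accuracy:", accuracy(stubborn_answers, ["no"] * 5))
print("wavering consistency:", consistency(wavering_answers))
```

The stubborn uncle scores perfectly on consistency and zero on accuracy, which is exactly the "precise but not accurate" case.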
Reliability—A subjective term for the overall performance of AI against human expectations. Highly reliable AI will behave the way we expect it to every time, day or night, out in our real world. The reality is that you often have to decide if the AI is "good enough" for the job you want it to do. Computer scientists and AI companies often want to keep the conversation focused on accuracy, the result of the experiments they control. People like you and me, who have to live with AI in the real world, care only about reliability. The higher the stakes—the more we have to rely on the AI to keep us safe or make us money—the more we expect it to be reliable. I sometimes give up real-world accuracy for precision in higher-stakes situations. For example, I consider the AI in my car to be highly reliable even though its automated lane keeping once tried to take an exit that wasn't there. A highway exit had been closed, but new lines had not yet been painted on the road. The old reflective paint lines had been scraped off so they wouldn't mislead drivers at night, leaving a smudged line-like pattern on the road leading off to the right where the exit used to be. My car suddenly tried to swerve right to follow the scraped smudges into the closed exit. Objectively, this was not what I expected and could be considered incorrect in human terms. At first, I was wary and thought about turning off the lane-keeping feature. But because the car's AI was consistent, precise, and did the same thing it would've done had actual painted lines been there, I saw it as predictable. And predictability is to me an important part of how I perceive reliability. Same decision every time. The AI saw a line-like pattern and followed it. Highly precise. My car could have ignored those remnants of lines, kept going, and arguably have been more objectively correct, but then I wouldn't trust it as much (and it might've put me into a guard rail). 
The Google AI Overview research you'll read about in chapter 4 reveals challenges with both objective truth and precision, but what made me most wary was that Google AI Overview gave me different answers to the same question. Not precise. Google AI Overview quickly became unreliable to me, so I tend to ignore it.
Transfer Learning—A type of machine learning where you start with a model that has been trained on one data set. You then continue to train the starting model using a different data set. If you've ever benefitted from knowing common Latin root words while studying a foreign language or learning vocabulary in English, you've done your own form of transfer learning. If you know that "aqua" is the Latin word for water, you can transfer that knowledge to learn the words "aquifer" in English, "aquifère" in French, and "agua" in Spanish. Important but not immediately commercially attractive applications of AI have been developed this way. After researchers released some of the first foundation models built from large collections of image data gathered from the internet and our smartphones, medical AI researchers used them as a starting point to teach AI to automatically detect bone fractures. They found that they could get much better results from the relatively small amount of x-ray data they had. The starting model "understood" fundamental characteristics of images like lines, shapes, intensity, and gradient. Those parts of the model transferred to the x-ray model training and didn't have to be learned from scratch from a relatively smaller number of x-rays. The model could "focus" on learning the most important task of identifying fractures in the images.
Generative, Generative AI—A type of AI that generates the next most likely response based on what you give it. You can think of generative as a fill-in-the-blank AI where the blank is at the end of the sentence. Generative AI responds with what is most likely to come next based on the starting point it is given. The starting point is called a "prompt" and can be whatever the AI is designed to accept. Words, a picture, a list of numbers, sound or video recordings can all be the starting point for generative AI to… generate what its underlying model has learned is the most likely next piece of information—the response. For example, generative AI can give you the next word in a sentence based on all the words that came before. When you log into ChatGPT and type

I like to eat ice

it responds with

cream

That's because, based on all the text used to train ChatGPT, "cream" is the word with the highest probability of coming next after "I like to eat ice." In other words, "I like to eat ice cream" is the most likely combination. Behind ChatGPT is a pre-trained large language model that in concept contains all of the words in the English language, together with the degree of likelihood that each word will be the next to come after the words before it. You can test this by prompting ChatGPT with the same words but re-arranged into a plain list of words without correct grammar. For example, when you type

what is the most likely word to come next in the sequence "eat like to ice"

ChatGPT responds with

cream

This is the core behavior of generative AI. It can get a lot more complicated, but the principle is the same. The "G" in ChatGPT stands for Generative. Note: Not all AI is generative. Another important type of AI is "bidirectional."
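
The "most likely next word" idea can be sketched with simple counting. This is a toy under loud assumptions: the training text is five invented sentences, and the program looks only at the single word before the blank, whereas a large language model weighs all the words that came before. The function name `most_likely_next` is hypothetical.

```python
from collections import Counter, defaultdict

# A tiny invented "training text". Real models learn from vastly more.
training_text = (
    "i like to eat ice cream . "
    "we like to eat ice cream . "
    "they like to eat ice cream . "
    "i like to skate on ice . "
    "i put ice cubes in my drink ."
)

# Count which word follows each word in the training text.
next_word_counts = defaultdict(Counter)
words = training_text.split()
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1


def most_likely_next(word):
    """Return the most frequent follower of `word` and its estimated likelihood."""
    followers = next_word_counts[word]
    best, count = followers.most_common(1)[0]
    return best, count / sum(followers.values())


prompt = "i like to eat ice"
guess, likelihood = most_likely_next(prompt.split()[-1])
print(guess, likelihood)
```

In this toy text, "cream" follows "ice" three times out of five, so it wins. The program will always produce some answer, whether or not that answer is true, which is worth remembering when you read about hallucination.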

Bidirectional AI—A type of AI that interprets each part of your prompt using the full context around it, looking at what comes after a word as well as what came before it. You can think of bidirectional as AI that is good at understanding things based on the full context of the prompt you give it. Google Gemini is an example of AI that uses a bidirectional model. At the time of writing, when you give Gemini the prompt

what is the meaning of the word "dog" in the sentence "the car broke down and it got so hot the dog let off steam"

it responds with

In this context, "dog" is slang for something that is of poor quality or unreliable.

On the other hand, at the time of writing ChatGPT replies to the same prompt with

the word "dog" most likely refers to an actual dog—as in the animal.

In this example, you can see that the bidirectional AI (Gemini) was better at picking up the semantics of the word "dog" based on the context of the sentence. This is because the generative AI (ChatGPT) was trained on language data that contained way more mentions of dog the animal than dog the slang word, and it made a prediction based on what it deemed most likely to come next (mentions of dog, the animal).

Prompt—The starting point that you give AI, typically when going back and forth in an interactive "conversation." In school terminology, it's like the essay prompt you respond to with a written document. The word "prompt" came into use with the advent of language-based AI such as ChatGPT. It's really just a new word to describe the input given to an AI tool with the expectation the tool will respond. We typically describe the commands we type into an AI like ChatGPT as our "prompt." If your phone unlocks when you show it your face, the image of your face is the "prompt" to the AI on the phone, which responds with "unlock" or "don't unlock." If you're using an AI search tool that searches the web based on a picture you give it, the picture is your "prompt."
Hallucinate—Generative AI will always generate a response. No matter what. When the response sounds reasonable but is actually complete nonsense, it is called AI hallucination. By its nature, generative AI will give you the most likely response based on all the example data it has learned from. Unfortunately, this response is not guaranteed to be correct.
Prompt Engineering—A (mostly) buzzword for the new-as-of-2022 process of interacting with an AI system to get the most value out of it. Have you done some trial-and-error interaction with an AI? Congratulations! You're a Prompt Engineer! Often people acting as prompt engineers are figuring out how to give an AI the right starting point so it will respond in a useful way and not hallucinate. AI systems are not human and have learned by example, so prompt engineers learn how to interact with each (prompt) in a way that "makes sense" given the data the particular AI has been exposed to and the limitations of the tool it is embedded in.
Bot—Slang for any AI that automates a task. "Bot" is often used to describe AI that automates simple tasks in a workflow. For example, a bot might be used to check for billing errors in an accounting system. Another example of a bot is your email spam filter.
Chatbot—Slang for a language-based AI tool that can automatically interact with you in a conversational style. Interaction with modern AI chatbots like OpenAI's ChatGPT is a deliberate feature. The more context the AI has to go on, the more you "chat" back and forth with it, the better it will perform at its given task. At the time of publication, chatbots primarily interact via text or sometimes audio and video. Future chatbots will interact via very realistic video.
Robot—Physical machine that performs a task in the real world. Robots may or may not also use AI for some of their systems. A manufacturing robot that welds car parts probably does not use AI, as it is welding the same parts over and over again. In this case, it is cheaper and better to program the robot to follow a set of predefined procedures. A robotic vacuum cleaner probably uses AI for obstacle detection and avoidance. My car is part AI-enabled robot because it will turn the steering wheel to follow lines painted on the road for its automatic lane keeping safety feature.
Computer Vision—A field of AI and technology that combines cameras with computer systems to observe and interpret the physical world. Computer vision has been around for a lot longer than today's AI, although machine learning and AI have revolutionized it. Handwriting recognition computer vision systems have read the handwritten and printed addresses on your mail for a long time.
Facial Recognition—A specialized type of artificial intelligence that learns to uniquely identify the faces of human individuals. If your phone unlocks based on an image of your face, it is using facial recognition AI. Facial recognition is part of surveillance, photography, and social media. Do you remember when social media apps automatically recognized and tagged the faces of your friends in pictures? After controversy, one of the biggest apps stopped collecting, processing, and storing faceprints on their servers in 2021. Instead, they and others moved the AI to your phone where it does facial recognition "locally," often to show you advertisements based on the faces of people in your pictures. Researchers have shown as recently as 2024 that these systems are biased in how they link faces to the concepts they use to serve advertisements, such as overly associating the Great Wall of China with Asian women, art paintings with White women, and nudity with White men (West et al., 2024).
Natural Language Processing—A specialized field of AI and technology focused on processing human language for lots of different purposes. Combining natural language processing with computer vision means AI that will interact with humans and the physical world both visually and using human language.
Interactive—A computer system that can respond to human input, typically over multiple back-and-forth cycles. A video game is interactive, as is a chatbot.
Fine-tuning—Teaching a general machine to be better at a more specific task. Or, put another way, the process of starting with a general trained and tested model but then teaching it to be better at something more nuanced. Much fine-tuning is done by starting with a foundation model, say an AI based on a large language model of the English language. The AI is capable of writing thank-you notes for you in a generic style. You could fine-tune the model to be better at responding to prompts with language in a more specific tone or style by doing additional training. You could give it a large collection of letters written in a formal Victorian style, ask it to write a thousand thank-you notes, but only accept the thank-you notes that continued in formal style after a salutation of "Dearest Auntie of Mine." Repeat this enough times, and the model will be better—fine-tuned—for generating formal thank-you notes.
Neural Network—A computer system that mimics the interconnected neurons of your brain. A neuron in your brain takes input in the form of either electrical or chemical signals, processes the input, and if the result of processing rises above a certain threshold, sends either an electrical or chemical signal to the next neuron. A computer neuron is a virtual, digital neuron. It's actually computer code that takes a number as input and processes it by multiplying the input number by a value specific to the virtual neuron (known as a "weight"). If the result rises above a certain threshold, the neuron will pass the number along to the next virtual neuron. Both biological and digital systems learn by creating connected pathways—networks—of many neurons that process information correctly. In machine learning, the cycles of trial-and-error training are used to adjust the weights, the values assigned to each neuron, until the network learns to process information correctly.
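A virtual neuron as described above fits in a few lines of code. This is a minimal sketch, not how any particular AI library does it: the weights and threshold are made-up numbers, and sending zero when the threshold isn't reached is an assumption added for illustration.

```python
def neuron(input_number, weight, threshold):
    """Multiply the input by the weight; pass the result on only above the threshold."""
    result = input_number * weight
    return result if result > threshold else 0.0


def tiny_network(input_number, weights, threshold=1.0):
    """Chain neurons so that each one feeds its output to the next."""
    signal = input_number
    for weight in weights:
        signal = neuron(signal, weight, threshold)
    return signal


print(neuron(2.0, weight=0.8, threshold=1.0))   # strong enough to pass along
print(neuron(1.0, weight=0.8, threshold=1.0))   # too weak, nothing passes
print(tiny_network(2.0, weights=[0.8, 1.5]))    # signal survives both neurons
print(tiny_network(2.0, weights=[0.8, 0.5]))    # second neuron blocks it
```

Training a network means nudging those weight values, cycle after cycle, until the signals that get through are the ones that lead to correct answers.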
Recurrent Neural Network—Recurrent neural networks are good at learning from sequential data like words in a sentence. In concept, they "read" a sentence one word at a time, repeating the same step over and over (recurring) while carrying a memory of the earlier words forward. Let's go back to the "I like to eat ice cream" example. Let's say you want to teach a recurrent neural network to take in a bunch of sentences and tell you which ones are generally positive and happy vs the ones that are negative and sad. Your recurrent neural network will start with the word "I." It "remembers" the word "I" when it looks at the next word "like." It remembers the short phrase "I like" when it looks at the next word "to." It remembers the short phrase "I like to" when it looks at the next word "eat," and so on until the network has the whole sentence in its memory. After receiving feedback over a few training cycles, it correctly determines that the whole sentence "I like to eat ice cream" reflects a positive sentiment. It then goes on to learn that "my car drives crazy well" also reflects a positive sentiment, even with the idiomatic use of "crazy." Your network is learning that certain one-after-the-other sequences of words indicate positive, happy sentiment. When you give the recurrent network the same words in a different order, "well my car drives crazy," and it guesses negative sentiment, you've taught it that the order of the words matters. A recurrent neural network can learn deeper semantic meaning by building up a representation of sentences one word after the other (recurrent), while carrying forward a memory of the words that came before. All this seems intuitive to us humans, but getting a computer to do this was a big deal back around the middle of the first decade of this century.
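Why word order matters to a recurrent network can be sketched with a running "memory" number. This is a toy with loud assumptions: the value attached to each word and the `memory_weight` are hand-picked rather than learned, and a real recurrent network carries many numbers in its memory, not one.

```python
import math

# Hand-picked "feeling" numbers for each word. A real network would learn these.
word_values = {"my": 0.1, "car": 0.2, "drives": 0.3, "crazy": -0.8, "well": 0.9}


def read_sentence(sentence, memory_weight=0.5):
    """Read one word at a time, blending each new word into a running memory."""
    memory = 0.0
    for word in sentence.split():
        memory = math.tanh(memory_weight * memory + word_values[word])
    return memory


original = read_sentence("my car drives crazy well")
shuffled = read_sentence("well my car drives crazy")

print("original order:", "positive" if original > 0 else "negative")
print("shuffled order:", "positive" if shuffled > 0 else "negative")
```

Both sentences contain exactly the same words, yet the final memory lands on opposite sides of zero because the same step was applied in a different order.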
Convolutional Neural Network—Convolutional neural networks are good at learning from pictures or other grid-like data to find patterns. In concept, they "see" the most important patterns contained in a picture or a grid of numbers. Convolutional neural networks learn from more complex information by breaking it down into layers of simpler information.
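As a rough illustration of "seeing" a pattern in a grid of numbers, the Python sketch below slides one small filter across a tiny made-up picture and lights up wherever dark meets bright. The grid and the filter are hand-written assumptions; a real convolutional network learns many such filters and stacks them in layers.

```python
# A tiny made-up grayscale picture: dark (0) on the left, bright (9) on the right.
grid = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A hand-written filter that responds where brightness jumps from left to right.
edge_filter = [[-1, 1]]


def slide_filter(picture, small_filter):
    """Slide the filter over every position of the picture and record
    how strongly each patch matches it (the sliding-window sum used in CNNs)."""
    filter_height, filter_width = len(small_filter), len(small_filter[0])
    output = []
    for r in range(len(picture) - filter_height + 1):
        row = []
        for c in range(len(picture[0]) - filter_width + 1):
            total = sum(
                small_filter[i][j] * picture[r + i][c + j]
                for i in range(filter_height)
                for j in range(filter_width)
            )
            row.append(total)
        output.append(row)
    return output


edges = slide_filter(grid, edge_filter)
for row in edges:
    print(row)  # [0, 9, 0] on each line: the edge sits in the middle
```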
Transformer—Transformers are neural networks good at learning from very large amounts of data. In concept, they learn which pieces of data are worth paying attention to, and how those pieces of data are important to each other. Because a Transformer can look at all the pieces at once rather than one after the other, as a recurrent neural network does, its training can be spread across many processors and scaled up to learn from many, many more examples.
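The "paying attention" idea can be sketched with a little arithmetic. In the Python below, each word is a short list of made-up numbers, and the attention weights say how much the word "ice" should attend to each word in the list. This follows the general scaled dot-product recipe (compare, scale, then turn the scores into shares that add up to one); the vectors are invented, and a real Transformer learns them and does much more.

```python
import math


def softmax(scores):
    """Turn raw scores into positive shares that add up to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Compare one word (the query) with every word (the keys) and
    return how much attention each one deserves."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical two-number representations of three words.
words = ["ice", "cream", "the"]
vectors = {"ice": [1.0, 0.0], "cream": [0.9, 0.1], "the": [0.0, 1.0]}

weights = attention_weights(vectors["ice"], [vectors[w] for w in words])
for word, weight in zip(words, weights):
    print(word, round(weight, 2))  # "the" receives the smallest share
```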
Deep Learning—Using neural networks built from more than two layers of neurons to approximate the multi-layered structure of the human brain. The first layer takes input, the next layer learns simple patterns, the layer after that more complex patterns, and so on.
Explainable—Artificial intelligence that can demonstrate why it gave a certain response. When AI gets something wrong or hallucinates, we want to know "why." Similarly, we want an explanation when AI is technically accurate given its training data but not objectively, empirically correct from a human standpoint. At this writing, explainability is a "holy grail" of AI. The systems are so complex, with neural networks so dense and interconnected, that from a practical standpoint they are black boxes. It's not currently possible to explain how an AI learned a particular neural network pathway from its training data. The latter point is important because AI are only as "smart" as the data they have been exposed to. Objectively wrong responses indicate a gap in training data that AI developers would like to know about so they can fill it. There is a whole field of research on explainability, and some newer AI are getting better at showing users a loose form of reasoning for their response. But that reasoning is not an explanation for exactly how an AI came up with a response based on specific training data. That capability is yet to be discovered.
Bias—When AI responds in a way that shows a preference in one direction that is not objectively correct. For example, an AI taught to predict credit risk that ends up giving lower scores to Black people even if they have the same credit history as White people is exhibiting bias. Bias in AI almost always happens because training data contains bias. In the credit score example, historical data used to train the AI contained reduced scores for Black people originally assigned by people or credit-rating formulas with financial prejudice.
Ethics—Simply put, expecting AI systems and the people who develop and use them to behave according to formal and informal ethical principles. These principles are entirely human and in the eye of the beholder. For example, I may think it is unethical to use AI to pick targets for military strikes, but you may think it is ethical because AI can be less biased or prone to error than humans.

Exercises: Try It Out

Download Merlin and use Sound ID to learn to identify one type of bird in your backyard.
Create a ChatGPT account, then try this series of prompts.

I like to eat…
On a cold day I like to eat…
On a hot day I like to eat…

Notice how the response changes when you give it more to pay attention to. Notice that when it doesn't have much to go on, it responds with a question. This is the interactive nature of chat-based AI. A friend and colleague described working with chat-based AI as like having an eager but inexperienced intern. A good AI intern wants to get it right, so asks a lot of questions to be sure it's heading in the right direction. This is deliberate, and a good characteristic. Contrast this with the certainty of Google's AI Overview response. Under what circumstances would you prefer one over the other?

Dinner party: Do this exercise with a group of people.

Hand out a piece of paper with the same three questions you used with ChatGPT on it.
Ask people to secretly fill in their answers on the piece of paper.
Collect and read everyone's answers, looking for similarities and differences. Then share your observations with your guests.
The stack of papers is a large language model.
You are now a generative pre-trained transformer.

References

Fishbein, Adam R., William J. Idsardi, Gregory F. Ball, & Robert J. Dooling, 2019. Sound Sequences in Birdsong: How Much Do Birds Really Care? Philosophical Transactions of the Royal Society B. The Royal Society Publishing. (Retrieved on April 19, 2025, from https://royalsocietypublishing.org/doi/10.1098/rstb.2019.0044)

Galchen, Rivka, 2024. How Scientists Started to Decode Birdsong. The New Yorker, October 14.

Gazzaniga, Michael S., 2018. The Consciousness Instinct: Unraveling the Mystery of How the Brain Makes the Mind. Farrar, Straus and Giroux.

Kounios, John, & Mark Beeman, 2015. The Eureka Factor: Aha Moments, Creative Insights, and the Brain. Independently Published.

McConnell, James V., 1989. Understanding Human Behavior. (6th Ed.). Holt, Rinehart, and Winston.

Meta, 2023. New AI Advancements Drive Meta's Ads System Performance and Efficiency. Meta. (Retrieved on April 7, 2025, from https://ai.meta.com/blog/ai-ads-performance-efficiency-meta-lattice/)

West, Jack, Lea Thiemt, Shimaa Ahmed, et al., 2024. A Picture Is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok. University of Wisconsin‒Madison. arXiv.org. (Retrieved on April 19, 2025, from https://arxiv.org/pdf/2403.19717)

Wikipedia, n.d. Wikipedia: Guide to Addressing Bias. (Retrieved on May 7, 2025, from Wikipedia)

Zablotsky, Benjamin, Basilica Arockiaraj, Gelila Haile, & Amanda Ng, 2024. Daily Screen Time Among Teenagers: United States, July 2021‒December 2023. Data Brief Number 513, October 2024. Centers for Disease Control and Prevention, National Center for Health Statistics. (Retrieved on April 7, 2025)


By Jeff Pennington and MJ Pennington


How Machines Learn

Now we'll use a real example to review the terms domain, task, metric, data, labeling, training, gradient descent, convergence, testing, deploying, and continual learning.

A Word About Data

The Language Of AI: Demystifying Jargon

The definitions, rather than appearing in alphabetical order, are organized in such a way that the terms logically follow one another.

Definitions

Algorithm—At the time this book was written, "algorithm" was used as a euphemism for artificial intelligence that seems to know something about us or the physical world. Have you ever said or heard a friend say something like, "The algorithm knew I was thinking about buying a house!" followed by an uncomfortable laugh? We've been using "algorithm" instead of "artificial intelligence" for years because it's a more comfortable word to describe something that can get, well, creepy. But anytime you refer to the algorithms on Amazon or Instagram, you are in fact referring to AI.

Data—When we're talking about AI, data are digital representations of the real or online world. Data are (or is; often data is treated as a collective noun) always a representative example of the real world, but almost never every possible representation. Since artificial intelligence learns by example, the more data you can give it, the more it can learn. Data can be structured, like a spreadsheet made up of columns and rows with numbers or text in each little box. Data can also be unstructured, like a digital photo, the electronic files used to hold the words of this book, or the squiggly line of your heartbeat on a monitor in the hospital. Humans can work with unstructured data as is; we can look at a picture and understand what we see or read this paragraph. Artificial intelligence, however, needs help. For the purposes of teaching artificial intelligence, all unstructured data must become structured. For example, the picture you took of a spring daffodil is transformed by a computer program into columns of numbers with one row for every tiny little part of the picture. The bright area of white in the middle of the picture will produce many rows that give the coordinates of tiny little spots called pixels that make up the area. Each row will contain numbers representing the color and intensity at the exact spot. AI reads the structured data and learns from it. A single 4K picture (3,840 by 2,160 pixels) produces more than eight million rows!
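To make the picture-to-rows idea concrete, here is a minimal Python sketch. The tiny "picture" and its color values are invented for illustration, and the row layout (position followed by red, green, and blue intensity) is just one plausible arrangement, not the format any particular AI system uses.

```python
# A made-up 2-by-3 "picture": each pixel is a (red, green, blue) intensity
# from 0 to 255. The values are invented for illustration only.
picture = [
    [(255, 255, 255), (250, 240, 90), (30, 120, 40)],
    [(252, 250, 248), (245, 235, 80), (25, 110, 35)],
]


def picture_to_rows(pixels):
    """Flatten a grid of pixels into structured rows:
    (row position, column position, red, green, blue)."""
    rows = []
    for y, line in enumerate(pixels):
        for x, (red, green, blue) in enumerate(line):
            rows.append((y, x, red, green, blue))
    return rows


rows = picture_to_rows(picture)
for row in rows:
    print(row)  # one structured row per pixel
```

Six pixels become six rows, each holding where the spot is and what color it is, which is the kind of table a machine can learn from.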

Model—Collection of observations created by a computer system as it learns something from data. The following example is intuitive and easy to follow but completely made up and not actually true. If you're a pediatrician, please forgive me! Let's pretend I collect structured data from a thousand children by recording their weight at each birthday up to age fifteen. I make a spreadsheet with two columns, one for age, one for weight. I put my own data into the spreadsheet: At one year old, I was 10 pounds. At two years old, 22 pounds. Fifteen years old, 149 pounds. Let's pretend that same trend holds for most of the other children in the data. I can use a computer running a math equation (called linear regression) to learn a simple model, again not real but for illustration only:

The weight of a child is equal to ten times their age, plus or minus a few pounds.
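For the curious, here is a minimal Python sketch of how a computer could learn that simple model. It uses only the three age and weight pairs mentioned above, which are made up, and a textbook least-squares line fit written out by hand; the function names are hypothetical.

```python
# The three (age, weight) pairs from the made-up example above.
ages = [1, 2, 15]
weights = [10, 22, 149]


def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, weights)
print(f"weight = {slope:.2f} * age + {intercept:.2f}")


def predict_weight(age):
    """Use the learned model to guess a weight for an age it has not seen."""
    return slope * age + intercept


print(round(predict_weight(10)))
```

The learned slope comes out close to ten pounds per year, with an intercept of about a pound, which is the "ten times their age, plus or minus a few pounds" model in numeric form.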

Foundation Model—A special kind of model used to represent a very broad area of our world or life. It could be as general and foundational as "the English language" or "pictures." Some of the first foundation models were indeed built from images. For example, a foundation model for images is a powerful tool to help artificial intelligence understand what makes up a line, basic shapes like squares and circles, light versus dark, and color. You can think of a foundation model as the part of your brain that processes the primary input from your eyes, ears, and other senses. You likely first experienced the power of a foundation model when your phone could unlock itself by looking at your face, then when you searched for a picture in the photo album on your phone. Foundation models based on language are the basis, in fact the foundation, of the powerful AI tools we started to use in the early 2020s. The next generation of foundation models is being built by the largest AI companies from multiple broad areas, for example, combining images and language.
Large Language Model—Foundation model built from human language. The machine has been taught the vocabulary, sentence structure, and style of an entire language. Additionally, the machine has been taught the likelihood of how words, sentences, and style occur together across an entire human language.
Machine Learning—The process of using a computer system to learn a model from data. Another (made-up) example: You need to forecast the global fish supply but have not been able to use regular math to figure out how water temperature, tide, location, and day of the year come together to determine how many fish will be caught. Your data contain the pounds of fish caught, water temperature, and height of the ocean tide at every minute of the day for a whole year at a thousand locations around the world. You decide to try machine learning: to teach a machine to guess whether fishing will be good.
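Here is a minimal Python sketch of what "teaching a machine to guess" can look like. Everything in it is an assumption for illustration: the six data points are invented, water temperature is written as degrees above or below a hypothetical 15 °C baseline, and the learning rule is the classic perceptron update, chosen because it is short, not because the fishing example calls for it.

```python
# Invented examples: (degrees above/below 15 °C, tide height in meters) -> 1 if fishing was good.
examples = [
    ((-5.0, 0.5), 0),
    ((-3.0, 1.0), 0),
    ((-4.0, 2.0), 0),
    ((3.0, 1.5), 1),
    ((5.0, 0.5), 1),
    ((4.0, 2.0), 1),
]


def guess(weights, bias, features):
    """Guess 1 (good fishing) if the weighted sum is above zero, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(examples, max_rounds=100):
    """Repeat rounds of guessing; after each wrong guess, nudge the weights."""
    weights = [0.0, 0.0]
    bias = 0.0
    for round_number in range(1, max_rounds + 1):
        mistakes = 0
        for features, answer in examples:
            error = answer - guess(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:
            break  # a full round with no wrong guesses: stop training
    return weights, bias, round_number


weights, bias, rounds_needed = train(examples)
print(weights, bias, rounds_needed)
print(guess(weights, bias, (6.0, 1.0)))   # a warm day the machine has never seen
print(guess(weights, bias, (-6.0, 1.0)))  # a cold day the machine has never seen
```

The human supplies the right answers, the machine guesses, and each wrong guess adjusts the weights a little. That loop is the heart of supervised training.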

Artificial Intelligence—Computer systems that learn models from data through the process of machine learning. These models can "understand" (describe and represent) something like language, pictures of animals, x-rays, or rainfall. These models can also do tasks like translate languages, identify cats in pictures, diagnose broken bones, or predict rainwater runoff patterns.
Supervised—When a human tells a machine whether it guessed right or wrong. The fishing example above is supervised machine learning, supervised artificial intelligence.
Unsupervised—When a machine learns something that may or may not be "right" in the eyes of a human. On its own, without supervision. An example of unsupervised machine learning, or unsupervised artificial intelligence, would be when a machine learns which words are important on the Wikipedia website based on something like how often a word shows up in all the articles. You didn't teach it. Without supervision, it'll probably learn "and" is an important word. Unsupervised machine learning is often a first step in developing AI systems. After a round of unsupervised learning to get a rough model, humans can supervise the next round, during which you'll teach the machine to ignore "and," "the," and "is" in favor of words that you deem actually important.
Training—The part of machine learning when you tell a machine what it got right and wrong through multiple rounds of guessing. You know the answer (fishing is good or not), and the machine learns how to guess the answer based on the data you give it.
Testing—The part of machine learning where you show a freshly trained machine data it's never seen before and test whether it meets some benchmark you decided ahead of time.
Pre-training—Often used to describe the process of training a foundation model to "understand" fundamentals, e.g., of language or images, with the expectation that additional training and testing will be done so the model can be used for a specific task like translation from English to German or facial recognition.
Accuracy—How often a machine is correct compared to an established benchmark in the training and testing data. When you test a new model during the machine learning process, you're often testing its accuracy. Let's say you have your uncle over for dinner and a game of trivia. If your uncle answers "blue" and "round" when you ask him the color of the sky and the shape of the earth, he is accurate. If, after a few glasses of wine, your uncle answers "blue" and "flat" to the same questions, his accuracy is suffering. It's important to note that AI accuracy is not a real-world measure and is solely from an AI (not human) perspective. If your uncle was raised by flat-earthers and never learned that the world was round, his answer of "flat" is as accurate as the data he had from his parents. Similarly, if an AI is taught using flawed data, it can be accurate given the data but not empirically, objectively correct when out in the real world.
Precision—How often a machine gives you the same answer. In other words, consistency. When you're testing a new model during the machine learning process, you sometimes ask it the same question multiple times, an opportunity for the machine to give you the same answer (or not). If your uncle gives you a different answer every time you ask him if the world is round, he is not precise. If your uncle always replies "blue" when you ask him if penguins can fly, he is precise but not accurate. In chapter 4, you'll read about research I've done with Google's AI Overview. Spoiler alert: It has problems with both accuracy and precision.
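The uncle examples can be turned into two small measurements. The Python sketch below is an illustration under stated assumptions: "precision" is used in this chapter's sense of consistency, the scoring functions are hypothetical, and the answers are the made-up trivia responses from the definitions above.

```python
from collections import Counter


def accuracy(answers, benchmark):
    """Share of answers that match the established benchmark."""
    matches = sum(1 for given, correct in zip(answers, benchmark) if given == correct)
    return matches / len(benchmark)


def consistency(repeated_answers):
    """Share of repeated answers that agree with the most common answer
    (precision, in the sense this chapter uses the word)."""
    most_common_count = Counter(repeated_answers).most_common(1)[0][1]
    return most_common_count / len(repeated_answers)


benchmark = ["blue", "round"]  # color of the sky, shape of the earth
print(accuracy(["blue", "round"], benchmark))  # 1.0: accurate
print(accuracy(["blue", "flat"], benchmark))   # 0.5: accuracy is suffering

# Asked five times whether penguins can fly, the uncle always says "blue".
penguin_answers = ["blue"] * 5
print(consistency(penguin_answers))            # 1.0: perfectly consistent
print(accuracy(penguin_answers, ["no"] * 5))   # 0.0: but never correct
```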
Reliability—A subjective term for the overall performance of AI against human expectations. Highly reliable AI will behave the way we expect it to every time, day or night, out in our real world. The reality is that you often have to decide if the AI is "good enough" for the job you want it to do. Computer scientists and AI companies often want to keep the conversation focused on accuracy, the result of the experiments they control. People like you and me, who have to live with AI in the real world, care only about reliability. The higher the stakes—the more we have to rely on the AI to keep us safe or make us money—the more we expect it to be reliable. I sometimes give up real-world accuracy for precision in higher-stakes situations. For example, I consider the AI in my car to be highly reliable even though its automated lane keeping once tried to take an exit that wasn't there. A highway exit had been closed, but new lines had not yet been painted on the road. The old reflective paint lines had been scraped off so they wouldn't mislead drivers at night, leaving a smudged line-like pattern on the road leading off to the right where the exit used to be. My car suddenly tried to swerve right to follow the scraped smudges into the closed exit. Objectively, this was not what I expected and could be considered incorrect in human terms. At first, I was wary and thought about turning off the lane-keeping feature. But because the car's AI was consistent, precise, and did the same thing it would've done had actual painted lines been there, I saw it as predictable. And predictability is to me an important part of how I perceive reliability. Same decision every time. The AI saw a line-like pattern and followed it. Highly precise. My car could have ignored those remnants of lines, kept going, and arguably have been more objectively correct, but then I wouldn't trust it as much (and it might've put me into a guard rail). 
The Google AI Overview research you'll read about in chapter 4 reveals challenges with both objective truth and precision, but what made me wary the most was that Google AI Overview gave me different answers to the same question. Not precise. Google AI Overview quickly became unreliable to me, so I tend to ignore it.
Transfer Learning—A type of machine learning where you start with a model that has been trained on one data set. You then continue to train the starting model using a different data set. If you've ever benefitted from knowing common Latin root words while studying a foreign language or learning vocabulary in English, you've done your own form of transfer learning. If you know that "aqua" is the Latin word for water, you can transfer that knowledge to learn the words "aquifer" in English, "aquifère" in French, and "agua" in Spanish. Important but not immediately commercially attractive applications of AI have been developed this way. After researchers released some of the first foundation models built from large collections of image data gathered from the internet and our smartphones, medical AI researchers used them as a starting point to teach AI to automatically detect bone fractures. They found that they could get much better results from the relatively small amount of x-ray data they had. The starting model "understood" fundamental characteristics of images like lines, shapes, intensity, and gradient. Those parts of the model transferred to the x-ray model training and didn't have to be learned from scratch from a relatively smaller number of x-rays. The model could "focus" on learning the most important task of identifying fractures in the images.
Generative, Generative AI—A type of AI that generates the next most likely response based on what you give it. You can think of generative as a fill-in-the-blank AI where the blank is at the end of the sentence. Generative AI responds with what is most likely to come next based on the starting point it is given. The starting point is called a "prompt" and can be whatever the AI is designed to accept. Words, a picture, a list of numbers, sound or video recordings can all be the starting point for generative AI to… generate what its underlying model has learned is the most likely next piece of information—the response. For example, generative AI can give you the next word in a sentence based on all the words that came before. When you log into ChatGPT and type

I like to eat ice

it responds with

cream

Even if you scramble the order and ask

what is the most likely word to come next in the sequence "eat like to ice"

ChatGPT responds with

cream
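The "most likely next word" idea can be sketched with nothing more than counting. The Python below is a toy, not how ChatGPT works: the four training sentences are invented, and the model only looks at the single word before the blank, whereas a real large language model weighs everything that came before.

```python
from collections import Counter, defaultdict

# A tiny made-up "training set" of example sentences.
sentences = [
    "i like to eat ice cream",
    "we eat ice cream on a hot day",
    "she put ice cubes in the glass",
    "ice cream is cold",
]

# Count which word follows which across all the examples.
next_word_counts = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for current_word, following_word in zip(words, words[1:]):
        next_word_counts[current_word][following_word] += 1


def most_likely_next(word):
    """Return the word that most often followed `word` in the examples,
    or None if the word never appeared with a follower."""
    counts = next_word_counts.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_likely_next("ice"))    # cream
print(most_likely_next("zebra"))  # None: no examples to learn from
```

In these examples "cream" followed "ice" three times and "cubes" only once, so "cream" wins.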

Bidirectional AI—A type of AI that interprets each part of a prompt using the context on both sides of it: what came before and what comes after. You can think of bidirectional as AI that is good at understanding things based on the full context of the prompt you give it. Google Gemini is an example of AI that uses a bidirectional model. At the time of writing, when you give Gemini the prompt

what is the meaning of the word "dog" in the sentence "the car broke down and it got so hot the dog let off steam"

it responds with

In this context, "dog" is slang for something that is of poor quality or unreliable.

On the other hand, at the time of writing ChatGPT replies to the same prompt with

the word "dog" most likely refers to an actual dog—as in the animal.

Prompt—The starting point that you give AI, typically when going back and forth in an interactive "conversation." In school terminology, it's like the essay prompt you respond to with a written document. The word "prompt" came into use with the advent of language-based AI such as ChatGPT. It's really just a new word to describe the input given to an AI tool with the expectation the tool will respond. We typically describe the commands we type into an AI like ChatGPT as our "prompt." If your phone unlocks by showing your face, the image of your face is the "prompt" to the AI on the phone, which responds with "unlock" or "don't unlock." If you're using an AI search tool that searches the web based on a picture you give it, the picture is your "prompt."
Hallucinate—Generative AI will always generate a response. No matter what. When the response sounds reasonable but is actually complete nonsense, it is called AI hallucination. By its nature, generative AI will give you the most likely response based on all the example data it has learned from. Unfortunately, this response is not guaranteed to be correct.
Prompt Engineering—A (mostly) buzzword for the new-as-of-2022 process of interacting with an AI system to get the most value out of it. Have you done some trial-and-error interaction with an AI? Congratulations! You're a Prompt Engineer! Often people acting as prompt engineers are figuring out how to give an AI the right starting point so it will respond in a useful way and not hallucinate. AI systems are not human and have learned by example, so prompt engineers learn how to interact with each (prompt) in a way that "makes sense" given the data the particular AI has been exposed to and the limitations of the tool it is embedded in.
Bot—Slang for any AI that automates a task. "Bot" is often used to describe AI that automates simple tasks in a workflow. For example, a bot might be used to check for billing errors in an accounting system. Another example of a bot is your email spam filter.
Chatbot—Slang for a language-based AI tool that can automatically interact with you in a conversational style. Interaction with modern AI chatbots like OpenAI's ChatGPT is a deliberate feature. The more context the AI has to go on, the more you "chat" back and forth with it, the better it will perform at its given task. At the time of publication, chatbots primarily interact via text or sometimes audio and video. Future chatbots will interact via very realistic video.
Robot—Physical machine that performs a task in the real world. Robots may or may not also use AI for some of their systems. A manufacturing robot that welds car parts probably does not use AI, as it is welding the same parts over and over again. In this case, it is cheaper and better to program the robot to follow a set of predefined procedures. A robotic vacuum cleaner probably uses AI for obstacle detection and avoidance. My car is part AI-enabled robot because it will turn the steering wheel to follow lines painted on the road for its automatic lane keeping safety feature.
Computer Vision—A field of AI and technology that combines cameras with computer systems to observe and interpret the physical world. Computer vision has been around for a lot longer than today's AI: handwriting recognition systems have read the handwritten and printed addresses on your mail for a long time. More recently, machine learning and AI have revolutionized the field.
Facial Recognition—A specialized type of artificial intelligence that learns to uniquely identify the faces of human individuals. If your phone unlocks based on an image of your face, it is using facial recognition AI. Facial recognition is part of surveillance, photography, and social media. Do you remember when social media apps automatically recognized and tagged the faces of your friends in pictures? After controversy, one of the biggest apps stopped collecting, processing, and storing faceprints on their servers in 2021. Instead, they and others moved the AI to your phone where it does facial recognition "locally," often to show you advertisements based on the faces of people in your pictures. Researchers have shown as recently as 2024 that these systems are biased in how they link faces to the concepts they use to serve advertisements, such as overly associating the Great Wall of China with Asian women, art paintings with White women, and nudity with White men (West et al., 2024).
Natural Language Processing—A specialized field of AI and technology focused on processing human language for lots of different purposes. Combining natural language processing with computer vision means AI that will interact with humans and the physical world both visually and using human language.
Interactive—A computer system that can respond to human input, typically over multiple back-and-forth cycles. A video game is interactive, as is a chatbot.
Fine-tuning—Teaching a general machine to be better at a more specific task. Or, put another way, the process of starting with a general trained and tested model but then teaching it to be better at something more nuanced. Much fine-tuning is done by starting with a foundation model, say an AI based on a large language model of the English language. The AI is capable of writing thank-you notes for you in a generic style. You could fine-tune the model to be better at responding to prompts with language in a more specific tone or style by doing additional training. You could give it a large collection of letters written in a formal Victorian style, ask it to write a thousand thank-you notes, but only accept the thank-you notes that continued in formal style after a salutation of "Dearest Auntie of Mine." Repeat this enough times, and the model will be better—fine-tuned—for generating formal thank-you notes.