If you’ve listened to the podcast for a while, you might have heard our ElevenLabs-powered AI co-host Charlie a few times. Text-to-speech has made amazing progress in the last 18 months, with OpenAI’s Advanced Voice Mode (aka “Her”) as a sneak peek of the future of AI interactions (see our “Building AGI in Real Time” recap). Yet, we had yet to see a real killer app for AI voice (not counting music).
Today’s guests, Raiza Martin and Usama Bin Shafqat, are the lead PM and AI engineer behind the NotebookLM feature flag that gave us the first viral AI voice experience, the “Deep Dive” podcast:
The idea behind the “Audio Overviews” feature is simple: take a bunch of documents, websites, YouTube videos, etc, and generate a podcast out of them. This was one of the first demos that people built with voice models + RAG + GPT models, but it was always a glorified speech-to-text. Raiza and Usama took a very different approach:
* Make it conversational: when you listen to a NotebookLM audio there are a ton of micro-interjections (Steven Johnson calls them disfluencies) like “Oh really?” or “Totally”, as well as pauses and “uh…”, like you would expect in a real conversation. These are not generated by the LLM in the transcript, but they are built into the the audio model. See ~28:00 in the pod for more details.
* Listeners love tension: if two people are always in agreement on everything, it’s not super interesting. They tuned the model to generate flowing conversations that mirror the tone and rhythm of human speech. They did not confirm this, but many suspect the 2 year old SoundStorm paper is related to this model.
* Generating new insights: because the hosts’ goal is not to summarize, but to entertain, it comes up with funny metaphors and comparisons that actually help expand on the content rather than just paraphrasing like most models do. We have had listeners make podcasts out of our podcasts, like this one.
This is different than your average SOTA-chasing, MMLU-driven model buildooor. Putting product and AI engineering in the same room, having them build evals together, and understanding what the goal is lets you get these unique results.
The 5 rules for AI PMs
We always focus on AI Engineers, but this episode had a ton of AI PM nuggets as well, which we wanted to collect as NotebookLM is one of the most successful products in the AI space:
1. Less is more: the first version of the product had 0 customization options. All you could do is give it source documents, and then press a button to generate. Most users don’t know what “temperature” or “top-k” are, so you’re often taking the magic away by adding more options in the UI. Since recording they added a few, like a system prompt, but those were features that users were “hacking in”, as Simon Willison highlighted in his blog post.
2. Use Real-Time Feedback: they built a community of 65,000 users on Discord that is constantly reporting issues and giving feedback; sometimes they noticed server downtime even before the Google internal monitoring did. Getting real time pings > aggregating user data when doing initial iterations.
3. Embrace Non-Determinism: AI outputs variability is a feature, not a bug. Rather than limiting the outputs from the get-go, build toggles that you can turn on/off with feature flags as the feedback starts to roll in.
4. Curate with Taste: if you try your product and it sucks, you don’t need more data to confirm it. Just scrap that and iterate again. This is even easier for a product like this; if you start listening to one of the podcasts and turn it off after 10 seconds, it’s never a good sign.
5. Stay Hands-On: It’s hard to build taste if you don’t experiment. Trying out all your competitors products as well as unrelated tools really helps you understand what users are seeing in market, and how to improve on it.
Chapters
00:00 Introductions01:39 From Project Tailwind to NotebookLM09:25 Learning from 65,000 Discord members12:15 How NotebookLM works18:00 Working with Steven Johnson23:00 How to prioritize features25:13 Structuring the data pipelines29:50 How to eval34:34 Steering the podcast outputs37:51 Defining speakers personalities39:04 How do you make audio engaging?45:47 Humor is AGI51:38 Designing for non-determinism53:35 API when?55:05 Multilingual support and dialect considerations57:50 Managing system prompts and feature requests01:00:58 Future of NotebookLM01:04:59 Podcasts for your codebase01:07:16 Plans for real-time chat01:08:27 Wrap up
Show Notes
* Notebook LM
* AI Test Kitchen
* Nicholas Carlini
* Steven Johnson
* Wealth of Nations
* Histories of Mysteries by Andrej Karpathy
* chicken.pdf Threads
* Area 120
* Raiza Martin
* Usama Bin Shafqat
Transcript
NotebookLM [00:00:00]: Hey everyone, we're here today as guests on Latent Space. It's great to be here, I'm a long time listener and fan, they've had some great guests on this show before. Yeah, what an honor to have us, the hosts of another podcast, join as guests. I mean a huge thank you to Swyx and Alessio for the invite, thanks for having us on the show. Yeah really, it seems like they brought us here to talk a little bit about our show, our podcast. Yeah, I mean we've had lots of listeners ourselves, listeners at Deep Dive. Oh yeah, we've made a ton of audio overviews since we launched and we're learning a lot. There's probably a lot we can share around what we're building next, huh? Yeah, we'll share a little bit at least. The short version is we'll keep learning and getting better for you. We're glad you're along for the ride. So yeah, keep listening. Keep listening and stay curious. We promise to keep diving deep and bringing you even better options in the future. Stay curious.
Alessio [00:00:52]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Residence at Decibel Partners. And I'm joined by my co-host, Swyx, founder of Smol.ai.
Swyx [00:01:01]: Hey, and today we're back in the studio with our special guest, Raiza Martin. And Raiza, I forgot to get your last name, Shafqat.
Raiza [00:01:10]: Yes.
Swyx [00:01:10]: Okay, welcome.
Raiza [00:01:12]: Hello, thank you for having us.
Swyx [00:01:14]: So AI podcasters meet human podcasters, always fun. Congrats on the success of Notebook LM. I mean, how does it feel?
Raiza [00:01:22]: It's been a lot of fun. A lot of it, honestly, was unexpected. But my favorite part is really listening to the audio overviews that people have been making.
Swyx [00:01:29]: Maybe we should do a little bit of intros and tell the story. You know, what is your path into the sort of Google AI org? Or maybe, actually, I don't even know what org you guys are in.
Raiza [00:01:39]: I can start. My name is Raisa. I lead the Notebook LM team inside of Google Labs. So specifically, that's the org that we're in. It's called Google Labs. It's only about two years old. And our whole mandate is really to build AI products. That's it. We work super closely with DeepMind. Our entire thing is just, like, try a bunch of things and see what's landing with users. And the background that I have is, really, I worked in payments before this, and I worked in ads right before, and then startups. I tell people, like, at every time that I changed orgs, I actually almost quit Google. Like, specifically, like, in between ads and payments, I was like, all right, I can't do this. Like, this is, like, super hard. I was like, it's not for me. I'm, like, a very zero-to-one person. But then I was like, okay, I'll try. I'll interview with other teams. And when I interviewed in payments, I was like, oh, these people are really cool. I don't know if I'm, like, a super good fit with this space, but I'll try it because the people are cool. And then I really enjoyed that, and then I worked on, like, zero-to-one features inside of payments, and I had a lot of fun. But then the time came again where I was like, oh, I don't know. It's like, it's time to leave. It's time to start my own thing. But then I interviewed inside of Google Labs, and I was like, oh, darn. Like, there's definitely, like—
Alessio [00:02:48]: They got you again.
Raiza [00:02:49]: They got me again. And so now I've been here for two years, and I'm happy that I stayed because especially with, you know, the recent success of Notebook LM, I'm like, dang, we did it. I actually got to do it. So that was really cool.
Usama [00:03:02]: Kind of similar, honestly. I was at a big team at Google. We do sort of the data center supply chain planning stuff. Google has, like, the largest sort of footprint. Obviously, there's a lot of management stuff to do there. But then there was this thing called Area 120 at Google, which does not exist anymore. But I sort of wanted to do, like, more zero-to-one building and landed a role there. We were trying to build, like, a creator commerce platform called Kaya. It launched briefly a couple years ago. But then Area 120 sort of transitioned and morphed into Labs. And, like, over the last few years, like, the focus just got a lot clearer. Like, we were trying to build new AI products and do it in the wild and sort of co-create and all of that. So, you know, we've just been trying a bunch of different things. And this one really landed, which has felt pretty phenomenal. Really, really landed.
Swyx [00:03:53]: Let's talk about the brief history of Notebook LM. You had a tweet, which is very helpful for doing research. May 2023, during Google I.O., you announced Project Tailwind.
Raiza [00:04:03]: Yeah.
Swyx [00:04:03]: So today is October 2024. So you joined October 2022?
Raiza [00:04:09]: Actually, I used to lead AI Test Kitchen. And this was actually, I think, not I.O. 2023. I.O. 2022 is when we launched AI Test Kitchen, or announced it. And I don't know if you remember it.
Swyx [00:04:23]: That's how you, like, had the basic prototype for Gemini.
Raiza [00:04:26]: Yes, yes, exactly. Lambda.
Swyx [00:04:28]: Gave beta access to people.
Raiza [00:04:29]: Yeah, yeah, yeah. And I remember, I was like, wow, this is crazy. We're going to launch an LLM into the wild. And that was the first project that I was working on at Google. But at the same time, my manager at the time, Josh, he was like, hey, I want you to really think about, like, what real products would we build that are not just demos of the technology? That was in October of 2022. I was sitting next to an engineer that was working on a project called Talk to Small Corpus. His name was Adam. And the idea of Talk to Small Corpus is basically using LLM to talk to your data. And at the time, I was like, wait, there's some, like, really practical things that you can build here. And just a little bit of background, like, I was an adult learner. Like, I went to college while I was working a full-time job. And the first thing I thought was, like, this would have really helped me with my studying, right? Like, if I could just, like, talk to a textbook, especially, like, when I was tired after work, that would have been huge. We took a lot of, like, the Talk to Small Corpus prototypes, and I showed it to a lot of, like, college students, particularly, like, adult learners. They were like, yes, like, I get it, right? Like, I didn't even have to explain it to them. And we just continued to iterate the prototype from there to the point where we actually got a slot as part of the I.O. demo in 23.
Swyx [00:05:42]: And Corpus, was it a textbook? Oh, my gosh.
Raiza [00:05:45]: Yeah. It's funny. Actually, when he explained the project to me, he was like, talk to Small Corpus. It was like, talk to a small corpse?
Swyx [00:05:51]: Yeah, nobody says Corpus.
Raiza [00:06:00]: It was like, a small corpse? This is not AI. Yeah, yeah. And it really was just, like, a way for us to describe the amount of data that we thought, like, it could be good for.
Swyx [00:06:02]: Yeah, but even then, you're still, like, doing rag stuff. Because, you know, the context length back then was probably, like, 2K, 4K.
Raiza [00:06:08]: Yeah, it was basically rag.
Raiza [00:06:09]: That was essentially what it was.
Raiza [00:06:10]: And I remember, I was like, we were building the prototypes. And at the same time, I think, like, the rest of the world was. Right? We were seeing all of these, like, chat with PDF stuff come up. And I was like, come on, we gotta go. Like, we have to, like, push this out into the world. I think if there was anything, I wish we would have launched sooner because I wanted to learn faster. But I think, like, we netted out pretty well.
Alessio [00:06:30]: Was the initial product just text-to-speech? Or were you also doing kind of, like, synthesizing of the content, refining it? Or were you just helping people read through it?
Raiza [00:06:40]: Before we did the I.O. announcement in 23, we'd already done a lot of studies. And one of the first things that I realized was the first thing anybody ever typed was, summarize the thing. Right?
Raiza [00:06:53]: Summarize the document.
Raiza [00:06:54]: And it was, like, half like a test and half just like, oh, I know the content. I want to see how well it does this. So it was part of the first thing that we launched. It was called Project Tailwind back then. It was just Q&A, so you could chat with the doc just through text, and it would automatically generate a summary as well. I'm not sure if we had it back then.
Raiza [00:07:12]: I think we did.
Raiza [00:07:12]: It would also generate the key topics in your document, and it could support up to, like, 10 documents. So it wasn't just, like, a single doc.
Alessio [00:07:20]: And then the I.O. demo went well, I guess. And then what was the discussion from there to where we are today? Is there any, maybe, intermediate step of the product that people missed between this was launch or?
Raiza [00:07:33]: It was interesting because every step of the way, I think we hit, like, some pretty critical milestones. So I think from the initial demo, I think there was so much excitement of, like, wow, what is this thing that Google is launching? And so we capitalized on that. We built the wait list. That's actually when we also launched the Discord server, which has been huge for us because for us in particular, one of the things that I really wanted to do was to be able to launch features and get feedback ASAP. Like, the moment somebody tries it, like, I want to hear what they think right now, and I want to ask follow-up questions. And the Discord has just been so great for that. But then we basically took the feedback from I.O., we continued to refine the product.
Raiza [00:08:12]: So we added more features.
Raiza [00:08:13]: We added sort of, like, the ability to save notes, write notes. We generate follow-up questions. So there's a bunch of stuff in the product that shows, like, a lot of that research. But it was really the rolling out of things. Like, we removed the wait list, so rolled out to all of the United States. We rolled out to over 200 countries and territories. We started supporting more languages, both in the UI and, like, the actual source stuff. We experienced, like, in terms of milestones, there was, like, an explosion of, like, users in Japan. This was super interesting in terms of just, like, unexpected. Like, people would write to us and they would be like, this is amazing. I have to read all of these rules in English, but I can chat in Japanese. It's like, oh, wow. That's true, right? Like, with LLMs, you kind of get this natural, it translates the content for you. And you can ask in your sort of preferred mode. And I think that's not just, like, a language thing, too. I think there's, like, I do this test with Wealth of Nations all the time because it's, like, a pretty complicated text to read. The Evan Smith classic.
Swyx [00:09:11]: It's, like, 400 pages or something.
Raiza [00:09:12]: Yeah. But I like this test because I'm, like, asking, like, Normie, you know, plain speak. And then it summarizes really well for me. It sort of adapts to my tone.
Swyx [00:09:22]: Very capitalist.
Raiza [00:09:25]: Very on brand.
Swyx [00:09:25]: I just checked in on a Notebook LM Discord. 65,000 people. Yeah.
Raiza [00:09:29]: Crazy.
Swyx [00:09:29]: Just, like, for one project within Google. It's not, like, it's not labs. It's just Notebook LM.
Raiza [00:09:35]: Just Notebook LM.
Swyx [00:09:36]: What do you learn from the community?
Raiza [00:09:39]: I think that the Discord is really great for hearing about a couple of things.
Raiza [00:09:43]: One, when things are going wrong. I think, honestly, like, our fastest way that we've been able to find out if, like, the servers are down or there's just an influx of people being, like, it says
Raiza [00:09:53]: system unable to answer.
Raiza [00:09:54]: Anybody else getting this?
Raiza [00:09:56]: And I'm, like, all right, let's go.
Raiza [00:09:58]: And it actually catches it a lot faster than, like, our own monitoring does.
Raiza [00:10:01]: It's, like, that's been really cool. So, thank you.
Swyx [00:10:03]: Canceled eat a dog.
Raiza [00:10:05]: So, thank you to everybody. Please keep reporting it. I think the second thing is really the use cases.
Raiza [00:10:10]: I think when we put it out there, I was, like, hey, I have a hunch of how people will use it, but, like, to actually hear about, you know, not just the context of, like, the use of Notebook LM, but, like, what is this person's life like? Why do they care about using this tool?
Raiza [00:10:23]: Especially people who actually have trouble using it, but they keep pushing.
Raiza [00:10:27]: Like, that's just so critical to understand what was so motivating, right?
Raiza [00:10:31]: Like, what was your problem that was, like, so worth solving? So, that's, like, a second thing.
Raiza [00:10:34]: The third thing is also just hearing sort of, like, when we have wins and when we don't have wins because there's actually a lot of functionality where I'm, like, hmm, I
Raiza [00:10:42]: don't know if that landed super well or if that was actually super critical.
Raiza [00:10:45]: As part of having this sort of small project, right, I want to be able to unlaunch things, too. So, it's not just about just, like, rolling things out and testing it and being, like, wow, now we have, like, 99 features. Like, hopefully we get to a place where it's, like, there's just a really strong core feature set and the things that aren't as great, we can just unlaunch.
Swyx [00:11:02]: What have you unlaunched? I have to ask.
Raiza [00:11:04]: I'm in the process of unlaunching some stuff, but, for example, we had this idea that you could highlight the text in your source passage and then you could transform it. And nobody was really using it and it was, like, a very complicated piece of our architecture and it's very hard to continue supporting it in the context of new features. So, we were, like, okay, let's do a 50-50 sunset of this thing and see if anybody complains.
Raiza [00:11:28]: And so far, nobody has.
Swyx [00:11:29]: Is there, like, a feature flagging paradigm inside of your architecture that lets you feature flag these things easily?
Raiza [00:11:36]: Yes, and actually...
Raiza [00:11:37]: What is it called?
Swyx [00:11:38]: Like, I love feature flagging.
Raiza [00:11:40]: You mean, like, in terms of just, like, being able to expose things to users?
Swyx [00:11:42]: Yeah, as a PM. Like, this is your number one tool, right?
Raiza [00:11:44]: Yeah, yeah.
Swyx [00:11:45]: Let's try this out. All right, if it works, roll it out. If it doesn't, roll it back, you know?
Raiza [00:11:49]: Yeah, I mean, we just run Mendel experiments for the most part. And, actually, I don't know if you saw it, but on Twitter, somebody was able to get around our flags and they enabled all the experiments.
Raiza [00:11:58]: They were, like, check out what the Notebook LM team is cooking.
Raiza [00:12:02]: I was, like, oh!
Raiza [00:12:03]: And I was at lunch with the rest of the team and I was, like, I was eating. I was, like, guys, guys, Magic Draft League!
Raiza [00:12:10]: They were, like, oh, no!
Raiza [00:12:12]: I was, like, okay, just finish eating and then let's go figure out what to do.
Raiza [00:12:15]: Yeah.
Alessio [00:12:15]: I think a post-mortem would be fun, but I don't think we need to do it on the podcast now. Can we just talk about what's behind the magic? So, I think everybody has questions, hypotheses about what models power it. I know you might not be able to share everything, but can you just get people very basic? How do you take the data and put it in the model? What text model you use? What's the text-to-speech kind of, like, jump between the two? Sure.
Raiza [00:12:42]: Yeah.
Raiza [00:12:42]: I was going to say, SRaiza, he manually does all the podcasts.
Raiza [00:12:46]: Oh, thank you.
Usama [00:12:46]: Really fast. You're very fast, yeah.
Raiza [00:12:48]: Both of the voices at once.
Usama [00:12:51]: Voice actor.
Raiza [00:12:52]: Good, good.
Usama [00:12:52]: Yeah, so, for a bit of background, we were building this thing sort of outside Notebook LM to begin with. Like, just the idea is, like, content transformation, right? Like, we can do different modalities. Like, everyone knows that. Everyone's been poking at it. But, like, how do you make it really useful? And, like, one of the ways we thought was, like, okay, like, you maybe, like, you know, people learn better when they're hearing things. But TTS exists, and you can, like, narrate whatever's on screen. But you want to absorb it the same way. So, like, that's where we sort of started out into the realm of, like, maybe we try, like, you know, two people are having a conversation kind of format. We didn't actually start out thinking this would live in Notebook, right? Like, Notebook was sort of, we built this demo out independently, tried out, like, a few different sort of sources. The main idea was, like, go from some sort of sources and transform it into a listenable, engaging audio format. And then through that process, we, like, unlocked a bunch more sort of learnings. Like, for example, in a sense, like, you're not prompting the model as much because, like, the information density is getting unrolled by the model prompting itself, in a sense. Because there's two speakers, and they're both technically, like, AI personas, right? That have different angles of looking at things. And, like, they'll have a discussion about it. And that sort of, we realized that's kind of what was making it riveting, in a sense. Like, you care about what comes next, even if you've read the material already. Because, like, people say they get new insights on their own journals or books or whatever. Like, anything that they've written themselves. So, yeah, from a modeling perspective, like, it's, like Reiza said earlier, like, we work with the DeepMind audio folks pretty closely. So, they're always cooking up new techniques to, like, get better, more human-like audio. And then Gemini 1.5 is really, really good at absorbing long context. So, we sort of, like, generally put those things together in a way that we could reliably produce the audio.
Raiza [00:14:52]: I would add, like, there's something really nuanced, I think, about sort of the evolution of, like, the utility of text-to-speech. Where, if it's just reading an actual text response, and I've done this several times. I do it all the time with, like, reading my text messages. Or, like, sometimes I'm trying to read, like, a really dense paper, but I'm trying to do actual work. I'll have it, like, read out the screen. There is something really robotic about it that is not engaging. And it's really hard to consume content in that way. And it's never been really effective. Like, particularly for me, where I'm, like, hey, it's actually just, like, it's fine for, like, short stuff. Like, texting, but even that, it's, like, not that great. So, I think the frontier of experimentation here was really thinking about there is a transform that needs to happen in between whatever.
Raiza [00:15:38]: Here's, like, my resume, right?
Raiza [00:15:39]: Or here's, like, a 100-page slide deck or something. There is a transform that needs to happen that is inherently editorial. And I think this is where, like, that two-person persona, right, dialogue model, they have takes on the material that you've presented. That's where it really sort of, like, brings the content to life in a way that's, like, not robotic. And I think that's, like, where the magic is, is, like, you don't actually know what's going to happen when you press generate.
Raiza [00:16:08]: You know, for better or for worse.
Raiza [00:16:09]: Like, to the extent that, like, people are, like, no, I actually want it to be more predictable now. Like, I want to be able to tell them. But I think that initial, like, wow was because you didn't know, right? When you upload your resume, what's it about to say about you? And I think I've seen enough of these where I'm, like, oh, it gave you good vibes, right? Like, you knew it was going to say, like, something really cool. As we start to shape this product, I think we want to try to preserve as much of that wow as much as we can. Because I do think, like, exposing, like, all the knobs and, like, the dials, like, we've been thinking about this a lot. It's like, hey, is that, like, the actual thing?
Raiza [00:16:43]: Is that the thing that people really want?
Alessio [00:16:45]: Have you found differences in having one model just generate the conversation and then using text-to-speech to kind of fake two people? Or, like, are you actually using two different kind of system prompts to, like, have a conversation step-by-step? I'm always curious, like, if persona system prompts make a big difference? Or, like, you just put in one prompt and then you just let it run?
Usama [00:17:05]: I guess, like, generally we use a lot of inference, as you can tell with, like, the spinning thing takes a while. So, yeah, there's definitely, like, a bunch of different things happening under the hood. We've tried both approaches and they have their, sort of, drawbacks and benefits. I think that that idea of, like, questioning, like, the two different personas, like, persists throughout, like, whatever approach we try. It's like, there's a bit of, like, imperfection in there. Like, we had to really lean into the fact that, like, to build something that's engaging, like, it needs to be somewhat human and it needs to be just not a chatbot. Like, that was sort of, like, what we need to diverge from. It's like, you know, most chatbots will just narrate the same kind of answer, like, given the same sources, for the most part, which is ridiculous. So, yeah, there's, like, experimentation there under the hood, like, with the model to, like, make sure that it's spitting out, like, different takes and different personas and different, sort of, prompting each other is, like, a good analogy, I guess.
Swyx [00:18:00]: Yeah, I think Steven Johnson, I think he's on your team. I don't know what his role is. He seems like chief dreamer, writer.
Raiza [00:18:08]: Yeah, I mean, I can comment on Steven. So, Steven joined, actually, in the very early days, I think before it was even a fully funded project. And I remember when he joined, I was like, Steven Johnson's going to be on my team? You know, and for folks who don't know him, Steven is a New York Times bestselling author of, like, 14 books. He has a PBS show. He's, like, incredibly smart, just, like, a true, sort of, celebrity by himself. And then he joined Google, and he was like, I want to come here, and I want to build the thing that I've always dreamed of, which is a tool to help me think. I was like, a what? Like, a tool to help you think? I was like, what do you need help with? Like, you seem to be doing great on your own. And, you know, he would describe this to me, and I would watch his flow. And aside from, like, providing a lot of inspiration, to be honest, like, when I watched Steven work, I was like, oh, nobody works like this, right? Like, this is what makes him special. Like, he is such a dedicated, like, researcher and journalist, and he's so thorough, he's so smart. And then I had this realization of, like, maybe Steven is the product. Maybe the work is to take Steven's expertise and bring it to, like, everyday people that could really benefit from this. Like, just watching him work, I was like, oh, I could definitely use, like, a mini-Steven, like, doing work for me. Like, that would make me a better PM. And then I thought very quickly about, like, the adjacent roles that could use sort of this, like, research and analysis tool. And so, aside from being, you know, chief dreamer, Steven also represents, like, a super workflow that I think all of us, like, if we had access to a tool like it, would just inherently, like, make us better.
Swyx [00:19:46]: Did you make him express his thoughts while he worked, or you just silently watched him, or how does this work?
Raiza [00:19:52]: Oh, now you're making me admit it. But yes, I did just silently watch him.
Swyx [00:19:57]: This is a part of the PM toolkit, right? They give user interviews and all that.
Raiza [00:20:00]: Yeah, I mean, I did interview him, but I noticed, like, if I interviewed him, it was different than if I just watched him. And I did the same thing with students all the time. Like, I followed a lot of students around. I watched them study. I would ask them, like, oh, how do you feel now, right?
Raiza [00:20:15]: Or why did you do that? Like, what made you do that, actually?
Raiza [00:20:18]: Or why are you upset about, like, this particular thing? Why are you cranky about this particular topic? And it was very similar, I think, for Steven, especially because he was describing, he was in the middle of writing a book. And he would describe, like, oh, you know, here's how I research things, and here's how I keep my notes. Oh, and here's how I do it. And it was really, he was doing this sort of, like, self-questioning, right? Like, now we talk about, like, chain of, you know, reasoning or thought, reflection.
Raiza [00:20:44]: And I was like, oh, he's the OG.
Raiza [00:20:46]: Like, I watched him do it in real time. I was like, that's, like, L-O-M right there. And to be able to bring sort of that expertise in a way that was, like, you know, maybe, like, costly inference-wise, but really have, like, that ability inside of a tool that was, like, for starters, free inside of NotebookLM, it was good to learn whether or not people really did find use out of it.
Swyx [00:21:05]: So did he just commit to using NotebookLM for everything, or did you just model his existing workflow?
Raiza [00:21:12]: Both, right?
Raiza [00:21:12]: Like, in the beginning, there was no product for him to use. And so he just kept describing the thing that he wanted. And then eventually, like, we started building the thing. And then I would start watching him use it. One of the things that I love about Steven is he uses the product in ways where it kind of does it, but doesn't quite. Like, he's always using it at, like, the absolute max limit of this thing. But the way that he describes it is so full of promise, where he's like, I can see it going here. And all I have to do is sort of, like, meet him there and sort of pressure test whether or not, you know, everyday people want it. And we just have to build it.
Swyx [00:21:47]: I would say OpenAI has a pretty similar person, Andrew Mason, I think his name is. It's very similar, like, just from the writing world and using it as a tool for thought to shape Chachabitty. I don't think that people who use AI tools to their limit are common. I'm looking at my NotebookLM now. I've got two sources. You have a little, like, source limit thing. And my bar is over here, you know, and it stretches across the whole thing. I'm like, did he fill it up?
Raiza [00:22:09]: Yes, and he has, like, a higher limit than others, I think. He fills it up.
Raiza [00:22:14]: Oh, yeah.
Raiza [00:22:14]: Like, I don't think Steven even has a limit, actually.
Swyx [00:22:17]: And he has Notes, Google Drive stuff, PDFs, MP3, whatever.
Raiza [00:22:22]: Yes, and one of my favorite demos, he just did this recently, is he has actually PDFs of, like, handwritten Marie Curie notes. I see.
Swyx [00:22:29]: So you're doing image recognition as well. Yeah, it does support it today.
Raiza [00:22:32]: So if you have a PDF that's purely images, it will recognize it.
Raiza [00:22:36]: But his demo is just, like, super powerful.
Raiza [00:22:37]: He's like, okay, here's Marie Curie's notes. And it's like, here's how I'm using it to analyze it. And I'm using it for, like, this thing that I'm writing.
Raiza [00:22:44]: And that's really compelling.
Raiza [00:22:45]: It's like the everyday person doesn't think of these applications. And I think even, like, when I listen to Steven's demo, I see the gap. I see how Steven got there, but I don't see how I could without him. And so there's a lot of work still for us to build of, like, hey, how do I bring that magic down to, like, zero work? Because I look at all the steps that he had to take in order to do it, and I'm like, okay, that's product work for us, right? Like, that's just onboarding.
Alessio [00:23:09]: And so from an engineering perspective, people come to you and it's like, hey, I need to use this handwritten notes from Marie Curie from hundreds of years ago. How do you think about adding support for, like, data sources and then maybe any fun stories and, like, supporting more esoteric types of inputs?
Raiza [00:23:25]: So I think about the product in three ways, right? So there's the sources, the source input. There's, like, the capabilities of, like, what you could do with those sources. And then there's the third space, which is how do you output it into the world? Like, how do you put it back out there? There's a lot of really basic sources that we don't support still, right? I think there's sort of, like, the handwritten notes stuff is one, but even basic things like DocX or, like, PowerPoint, right? Like, these are the things that people, everyday people are like, hey, my professor actually gave me everything in DocX. Can you support that? And then just, like, basic stuff, like images and PDFs combined with text. Like, there's just a really long roadmap for sources that I think we just have to work on.
Raiza [00:24:04]: So that's, like, a big piece of it.
Raiza [00:24:05]: On the output side, and I think this is, like, one of the most interesting things that we learned really early on, is, sure, there's, like, the Q&A analysis stuff, which is like, hey, when did this thing launch? Okay, you found it in the slide deck. Here's the answer. But most of the time, the reason why people ask those questions is because they're trying to make something new. And so when, actually, when some of those early features leaked, like, a lot of the features we're experimenting with are the output types. And so you can imagine that people care a lot about the resources that they're putting into NotebookLM because they're trying to create something new. So I think equally as important as, like, the source inputs are the outputs that we're helping people to create. And really, like, you know, shortly on the roadmap, we're thinking about how do we help people use NotebookLM to distribute knowledge? And that's, like, one of the most compelling use cases is, like, shared notebooks. It's, like, a way to share knowledge. How do we help people take sources and, like, one-click new documents out of it, right? And I think that's something that people think is, like, oh, yeah, of course, right? Like, one push a document. But what does it mean to do it right? Like, to do it in your style, in your brand, right?
Raiza [00:25:08]: To follow your guidelines, stuff like that.
Raiza [00:25:09]: So I think there's a lot of work, like, on both sides of that equation.
Raiza [00:25:13]: Interesting.
Swyx [00:25:13]: Any comments on the engineering side of things?
Usama [00:25:16]: So, yeah, like I said, I was mostly working on building the text to audio, which kind of lives as a separate engineering pipeline, almost, that we then put into NotebookLM. But I think there's probably tons of NotebookLM engineering war stories on dealing with sources. And so I don't work too closely with engineers directly. But I think a lot of it does come down to, like, Gemini's native understanding of images really well with the latest generation.
Raiza [00:25:39]: Yeah, I think on the engineering and modeling side, I think we are a really good example of a team that's put a product out there, and we're getting a lot of feedback from the users, and we return the data to the modeling team, right? To the extent that we say, hey, actually, you know what people are uploading, but we can't really support super well?
Raiza [00:25:56]: Text plus image, right?
Raiza [00:25:57]: Especially to the extent that, like, NotebookLM can handle up to 50 sources, 500,000 words each. Like, you're not going to be able to jam all of that into, like, the context window. So how do we do multimodal embeddings with that? There's really, like, a lot of things that we have to solve that are almost there, but not quite there yet.
Alessio [00:26:16]: On then turning it into audio, I think one of the best things is it has so many of the human... Does that happen in the text generation that then becomes audio? Or is that a part of, like, the audio model that transforms the text?
Usama [00:26:27]: It's a bit of both, I would say. The audio model is definitely trying to mimic, like, certain human intonations and, like, sort of natural, like, breathing and pauses and, like, laughter and things like that. But yeah, in generating, like, the text, we also have to sort of give signals on, like, where those things maybe would make sense.
Alessio [00:26:45]: And on the input side, instead of having a transcript versus having the audio, like, can you take some of the emotions out of it, too? If I'm giving, like, for example, when we did the recaps of our podcast, we can either give audio of the pod or we can give a diarized transcription of it. But, like, the transcription doesn't have some of the, you know, voice kind of, like, things.
Raiza [00:27:05]: Yeah, yeah.
Alessio [00:27:05]: Do you reconstruct that when people upload audio or how does that work?
Raiza [00:27:09]: So when you upload audio today, we just transcribe it. So it is quite lossy in the sense that, like, we don't transcribe, like, the emotion from that as a source. But when you do upload a text file and it has a lot of, like, that annotation, I think that there is some ability for it to be reused in, like, the audio output, right? But I think it will still contextualize it in the deep dive format. So I think that's something that's, like, particularly important is, like, hey, today we only have one format.
Raiza [00:27:37]: It's deep dive.
Raiza [00:27:38]: It's meant to be a pretty general overview and it is pretty peppy.
Raiza [00:27:42]: It's just very upbeat.
Raiza [00:27:43]: It's very enthusiastic, yeah.
Raiza [00:27:45]: Yeah, yeah.
Raiza [00:27:45]: Even if you had, like, a sad topic, I think they would find a way to be, like, silver lining, though.
Raiza [00:27:50]: Really?
Raiza [00:27:51]: Yeah.
Raiza [00:27:51]: We're having a good chat.
Raiza [00:27:54]: Yeah, that's awesome.
Swyx [00:27:54]: One of the ways, many, many, many ways that deep dive went viral is people saying, like, if you want to feel good about yourself, just drop in your LinkedIn. Any other, like, favorite use cases that you saw from people discovering things in social media?
Raiza [00:28:08]: I mean, there's so many funny ones and I love the funny ones.
Raiza [00:28:11]: I think because I'm always relieved when I watch them. I'm like, haha, that was funny and not scary. It's great.
Raiza [00:28:17]: There was another one that was interesting, which was a startup founder putting their landing page and being like, all right, let's test whether or not, like, the value prop is coming through. And I was like, wow, that's right.
Raiza [00:28:26]: That's smart.
Usama [00:28:27]: Yeah.
Raiza [00:28:28]: And then I saw a couple of other people following up on that, too.
Raiza [00:28:32]: Yeah.
Swyx [00:28:32]: I put my about page in there and, like, yeah, if there are things that I'm not comfortable with, I should remove it. You know, so that it can pick it up. Right.
Usama [00:28:39]: I think that the personal hype machine was, like, a pretty viral one. I think, like, people uploaded their dreams and, like, some people, like, keep sort of dream journals and it, like, would sort of comment on those and, like, it was therapeutic. I didn't see those.
Raiza [00:28:54]: Those are good. I hear from Googlers all the time, especially because we launched it internally first. And I think we launched it during the, you know, the Q3 sort of, like, check-in cycle. So all Googlers have to write notes about, like, hey, you know, what'd you do in Q3? And what Googlers were doing is they would write, you know, whatever they accomplished in Q3 and then they would create an audio overview. And these people they didn't know would just ping me and be like, wow, I feel really good, like, going into a meeting with my manager.
Raiza [00:29:25]: And I was like, good, good, good, good. You really did that, right?
Usama [00:29:29]: I think another cool one is just, like, any Wikipedia article. Yeah. Like, you drop it in and it's just, like, suddenly, like, the best sort of summary overview.
Raiza [00:29:38]: I think that's what Karpathy did, right? Like, he has now a Spotify channel called Histories of Mysteries, which is basically, like, he just took, like, interesting stuff from Wikipedia and made audio overviews out of it.
Swyx [00:29:50]: Yeah, he became a podcaster overnight.
Raiza [00:29:52]: Yeah.
Raiza [00:29:53]: I'm here for it. I fully support him.
Raiza [00:29:55]: I'm racking up the listens for him.
Swyx [00:29:58]: Honestly, it's useful even without the audio. You know, I feel like the audio does add an element to it, but I always want, you know, paired audio and text. And it's just amazing to see what people are organically discovering. I feel like it's because you laid the groundwork with NotebookLM and then you came in and added the sort of TTS portion and made it so good, so human, which is weird. Like, it's this engineering process of humans. Oh, one thing I wanted to ask. Do you have evals?
Raiza [00:30:23]: Yeah.
Swyx [00:30:23]: Yes.
Raiza [00:30:24]: What? Potatoes for chefs.
Swyx [00:30:27]: What is that? What do you mean, potatoes?
Raiza [00:30:29]: Oh, sorry.
Raiza [00:30:29]: Sorry. We were joking with this, like, a couple of weeks ago. We were doing, like, side-by-sides. But, like, Raiza sent me the file and it was literally called Potatoes for Chefs. And I was like, you know, my job is really serious, but you have to laugh a little bit. Like, the title of the file is, like, Potatoes for Chefs.
Swyx [00:30:47]: Is it like a training document for chefs?
Usama [00:30:50]: It's just a side-by-side for, like, two different kind of audio transcripts.
Swyx [00:30:54]: The question is really, like, as you iterate, the typical engineering advice is you establish some kind of test or benchmark. You're at, like, 30 percent. You want to get it up to 90, right?
Raiza [00:31:05]: Yeah.
Swyx [00:31:05]: What does that look like for making something sound human and interesting and voice?
Usama [00:31:11]: We have the sort of formal eval process as well. But I think, like, for this particular project, we maybe took a slightly different route to begin with. Like, there was a lot of just within the team listening sessions. A lot of, like, sort of, like... Dogfooding.
Raiza [00:31:23]: Yeah.
Usama [00:31:23]: Like, I think the bar that we tried to get to before even starting formal evals with raters and everything was much higher than I think other projects would. Like, because that's, as you said, like, the traditional advice, right? Like, get that ASAP. Like, what are you looking to improve on? Whatever benchmark it is. So there was a lot of just, like, critical listening. And I think a lot of making sure that those improvements actually could go into the model. And, like, we're happy with that human element of it. And then eventually we had to obviously distill those down into an eval set. But, like, still there's, like, the team is just, like, a very, very, like, avid user of the product at all stages.
Raiza [00:32:02]: I think you just have to be really opinionated.
Raiza [00:32:05]: I think that sometimes, if you are, your intuition is just sharper and you can move a lot faster on the product.
Raiza [00:32:12]: Because it's like, if you hold that bar high, right?
Raiza [00:32:15]: Like, if you think about, like, the iterative cycle, it's like, hey, we could take, like, six months to ship this thing. To get it to, like, mid where we were. Or we could just, like, listen to this and be like, yeah, that's not it, right? And I don't need a rater to tell me that. That's my preference, right? And collectively, like, if I have two other people listen to it, they'll probably agree. And it's just kind of this step of, like, just keep improving it to the point where you're like, okay, now I think this is really impressive. And then, like, do evals, right? And then validate that.
Swyx [00:32:43]: Was the sound model done and frozen before you started doing all this? Or are you also saying, hey, we need to improve the sound model as well? Both.
Usama [00:32:51]: Yeah, we were making improvements on the audio and just, like, generating the transcript as well. I think another weird thing here was, like, we needed to be entertaining. And that's much harder to quantify than some of the other benchmarks that you can make for, like, you know, Sweebench or get better at this math.
Swyx [00:33:10]: Do you just have people rate one to five or, you know, or just thumbs up and down?
Usama [00:33:14]: For the formal rater evals, we have sort of like a Likert scale and, like, a bunch of different dimensions there. But we had to sort of break down what makes it entertaining into, like, a bunch of different factors. But I think the team stage of that was more critical. It was like, we need to make sure that, like, what is making it fun and engaging? Like, we dialed that as far as it goes. And while we're making other changes that are necessary, like, obviously, they shouldn't make stuff up or, you know, be insensitive.
Raiza [00:33:41]: Hallucinations. Safety.
Swyx [00:33:42]: Other safety things.
Raiza [00:33:43]: Right.
Swyx [00:33:43]: Like a bunch of safety stuff.
Raiza [00:33:45]: Yeah, exactly.
Usama [00:33:45]: So, like, with all of that and, like, also just, you know, following sort of a coherent narrative and structure is really important. But, like, with all of this, we really had to make sure that that central tenet of being entertaining and engaging and something you actually want to listen to. It just doesn't go away, which takes, like, a lot of just active listening time because you're closest to the prompts, the model and everything.
Swyx [00:34:07]: I think sometimes the difficulty is because we're dealing with non-deterministic models, sometimes you just got a bad roll of the dice and it's always on the distribution that you could get something bad. Basically, how many do you, like, do ten runs at a time? And then how do you get rid of the non-determinism?
Raiza [00:34:23]: Right.
Usama [00:34:23]: Yeah, that's bad luck.
Raiza [00:34:25]: Yeah.
Swyx [00:34:25]: Yeah.
Usama [00:34:26]: I mean, there still will be, like, bad audio overviews. There's, like, a bunch of them that happens. Do you mean for, like, the raider? For raiders, right?
Swyx [00:34:34]: Like, what if that one person just got, like, a really bad rating? You actually had a great prompt, you actually had a great model, great weights, whatever. And you just, you had a bad output.
Usama [00:34:42]: Like, and that's okay, right?
Raiza [00:34:44]: I actually think, like, the way that these are constructed, if you think about, like, the different types of controls that the user has, right? Like, what can the user do today to affect it?
Usama [00:34:54]: We push a button.
Raiza [00:34:55]: You just push a button.
Swyx [00:34:56]: I have tried to prompt engineer by changing the title. Yeah, yeah, yeah.
Raiza [00:34:59]: Changing the title, people have found out.
Raiza [00:35:02]: Yeah.
Raiza [00:35:02]: The title of the notebook, people have found out. You can add show notes, right? You can get them to think, like, the show has changed. Someone changed the language of the output. Changing the language of the output. Like, those are less well-tested because we focused on, like, this one aspect. So it did change the way that we sort of think about quality as well, right? So it's like, quality is on the dimensions of entertainment, of course, like, consistency, groundedness. But in general, does it follow the structure of the deep dive? And I think when we talk about, like, non-determinism, it's like, well, as long as it follows, like, the structure of the deep dive, right? It sort of inherently meets all those other qualities. And so it makes it a little bit easier for us to ship something with confidence to the extent that it's like, I know it's going to make a deep dive. It's going to make a good deep dive. Whether or not the person likes it, I don't know. But as we expand to new formats, as we open up controls, I think that's where it gets really much harder. Even with the show notes, right? Like, people don't know what they're going to get when they do that. And we see that already where it's like, this is going to be a lot harder to validate in terms of quality, where now we'll get a greater distribution. Whereas I don't think we really got, like, varied distribution because of, like, that pre-process that Raiza was talking about. And also because of the way that we'd constrain, like, what were we measuring for? Literally, just like, is it a deep dive?
Swyx [00:36:18]: And you determine what a deep dive is. Yeah. Everything needs a PM. Yeah, I have, this is very similar to something I've been thinking about for AI products in general. There's always like a chief tastemaker. And for Notebook LM, it seems like it's a combination of you and Steven.
Raiza [00:36:31]: Well, okay.
Raiza [00:36:32]: I want to take a step back.
Swyx [00:36:33]: And Raiza, I mean, presumably for the voice stuff.
Raiza [00:36:35]: Raiza's like the head chef, right? Of, like, deep dive, I think. Potatoes.
Raiza [00:36:40]: Of potatoes.
Raiza [00:36:41]: And I say this because I think even though we are already a very opinionated team, and Steven, for sure, very opinionated, I think of the audio generations, like, Raiza was the most opinionated, right? And we all, like, would say, like, hey, I remember, like, one of the first ones he sent me.
Raiza [00:36:57]: I was like, oh, I feel like they should introduce themselves. I feel like they should say a title. But then, like, we would catch things, like, maybe they shouldn't say their names.
Raiza [00:37:04]: Yeah, they don't say their names.
Usama [00:37:05]: That was a Steven catch, like, not give them names.
Raiza [00:37:08]: So stuff like that is, like, we all injected, like, a little bit of just, like, hey, here's, like, my take on, like, how a podcast should be, right? And I think, like, if you're a person who, like, regularly listens to podcasts, there's probably some collective preference there that's generic enough that you can standardize into, like, the deep dive format. But, yeah, it's the new formats where I think, like, oh, that's the next test. Yeah.
Swyx [00:37:30]: I've tried to make a clone, by the way. Of course, everyone did. Yeah. Everyone in AI was like, oh, no, this is so easy. I'll just take a TTS model. Obviously, our models are not as good as yours, but I tried to inject a consistent character backstory, like, age, identity, where they work, where they went to school, what their hobbies are. Then it just, the models try to bring it in too much.
Raiza [00:37:49]: Yeah.
Swyx [00:37:49]: I don't know if you tried this.
Raiza [00:37:51]: Yeah.
Swyx [00:37:51]: So then I'm like, okay, like, how do I define a personality? But it doesn't keep coming up every single time. Yeah.
Raiza [00:37:58]: I mean, we have, like, a really, really good, like, character designer on our team.
Raiza [00:38:02]: What?
Swyx [00:38:03]: Like a D&D person?
Raiza [00:38:05]: Just to say, like, we, just like we had to be opinionated about the format, we had to be opinionated about who are those two people talking.
Raiza [00:38:11]: Okay.
Raiza [00:38:12]: Right.
Raiza [00:38:12]: And then to the extent that, like, you can design the format, you should be able to design the people as well.
Raiza [00:38:18]: Yeah.
Swyx [00:38:18]: I would love, like, a, you know, like when you play Baldur's Gate, like, you roll, you roll like 17 on Charisma and like, it's like what race they are. I don't know.
Raiza [00:38:27]: I recently, actually, I was just talking about character select screens.
Raiza [00:38:30]: Yeah. I was like, I love that, right.
Raiza [00:38:32]: And I was like, maybe there's something to be learned there because, like, people have fallen in love with the deep dive as a, as a format, as a technology, but also as just like those two personas.
Raiza [00:38:44]: Now, when you hear a deep dive and you've heard them, you're like, I know those two.
Raiza [00:38:48]: Right.
Raiza [00:38:48]: And people, it's so funny when I, when people are trying to find out their names, like, it's a, it's a worthy task.
Raiza [00:38:54]: It's a worthy goal.
Raiza [00:38:55]: I know what you're doing. But the next step here is to sort of introduce, like, is this like what people want?
Raiza [00:39:00]: People want to sort of edit the personas or do they just want more of them?
Swyx [00:39:04]: I'm sure you're getting a lot of opinions and they all, they all conflict with each other. Before we move on, I have to ask, because we're kind of on this topic. How do you make audio engaging? Because it's useful, not just for deep dive, but also for us as podcasters. What is, what does engaging mean? If you could break it down for us, that'd be great.
Usama [00:39:22]: I mean, I can try. Like, don't, don't claim to be an expert at all.
Swyx [00:39:26]: So I'll give you some, like variation in tone and speed. You know, there's this sort of writing advice where, you know, this sentence is five words. This sentence is three, that kind of advice where you, where you vary things, you have excitement, you have laughter, all that stuff. But I'd be curious how else you break down.
Usama [00:39:42]: So there's the basics, like obviously structure that can't be meandering, right? Like there needs to be sort of a, an ultimate goal that the voices are trying to get to, human or artificial. I think one thing we find often is if there's just too much agreement between people, like that's not fun to listen to. So there needs to be some sort of tension and build up, you know, withholding information. For example, like as you listen to a story unfold, like you're going to learn more and more about it. And audio that maybe becomes even more important because like you actually don't have the ability to just like skim to the end of something. You're driving or something like you're going to be hooked because like there's, and that's how like, that's how a lot of podcasts work. Like maybe not interviews necessarily, but a lot of true crime, a lot of entertainment in general. There's just like a gradual unrolling of information. And that also like sort of goes back to the content transformation aspect of it. Like maybe you are going from, let's say the Wikipedia article of like one of the History of Mysteries, maybe episodes. Like the Wikipedia article is going to state out the information very differently. It's like, here's what happened would probably be in the very first paragraph. And one approach we could have done is like maybe a person's just narrating that thing. And maybe that would work for like a certain audience. Or I guess that's how I would picture like a standard history lesson to unfold. But like, because we're trying to put it in this two-person dialogue format, like there, we inject like the fact that, you know, there's, you don't give everything at first. And then you set up like differing opinions of the same topic or the same, like maybe you seize on a topic and go deeper into it and then try to bring yourself back out of it and go back to the main narrative. So that's, that's mostly from like the setting up the script perspective. And then the audio, I was saying earlier, it's trying to be as close to just human speech as possible. I think was the, what we found success with so far.
Raiza [00:41:40]: Yeah. Like with interjections, right?
Raiza [00:41:41]: Like I think like when you listen to two people talk, there's a lot of like, yeah, yeah, right. And then there's like a lot of like that questioning, like, oh yeah, really?
Raiza [00:41:49]: What did you think?
Swyx [00:41:50]: I noticed that. That's great.
Raiza [00:41:52]: Totally.
Usama [00:41:54]: Exactly.
Swyx [00:41:55]: My question is, do you pull in speech experts to do this? Or did you just come up with it yourselves? You can be like, okay, talk to a whole bunch of fiction writers to, to make things engaging or comedy writers or whatever, stand up comedy, right? They have to make audio engaging, but audio as well. Like there's professional fields of studying where people do this for a living, but us as AI engineers are just making this up as we go.
Raiza [00:42:19]: I mean, it's a great idea, but you definitely didn't.
Raiza [00:42:22]: Yeah.
Swyx [00:42:24]: My guess is you didn't.
Raiza [00:42:25]: Yeah.
Swyx [00:42:26]: There's a, there's a certain field of authority that people have. They're like, oh, like you can't do this because you don't have any experience like making engaging audio. But that's what you literally did.
Raiza [00:42:35]: Right.
Usama [00:42:35]: I mean, I was literally chatting with someone at Google earlier today about how some people think that like you need a linguistics person in the room for like making a good chatbot. But that's not actually true because like this person went to school for linguistics. And according to him, he's an engineer now. According to him, like most of his classmates were not actually good at language. Like they knew how to analyze language and like sort of the mathematical patterns and rhythms and language. But that doesn't necessarily mean they were going to be eloquent at like while speaking or writing. So I think, yeah, a lot of we haven't invested in specialists in audio format yet, but maybe that would.
Raiza [00:43:13]: I think it's like super interesting because I think there is like a very human question of like what makes something interesting. And there's like a very deep question of like what is it, right? Like what is the quality that we are all looking for? Is it does somebody have to be funny? Does something have to be entertaining? Does something have to be straight to the point? And I think when you try to distill that, this is the interesting thing I think about our experiment, about this particular launch is first, we only launched one format. And so we sort of had to squeeze everything we believed about what an interesting thing is into one package. And as a result of it, I think we learned it's like, hey, interacting with a chatbot is sort of novel at first, but it's not interesting, right? It's like humans are what makes interacting with chatbots interesting.
Raiza [00:43:59]: It's like, ha ha ha, I'm going to try to trick it. It's like, that's interesting.
Raiza [00:44:02]: Spell strawberry, right?
Raiza [00:44:04]: This is like the fun that like people have with it. But like that's not the LLM being interesting.
Raiza [00:44:08]: That's you just like kind of giving it your own flavor. But it's like, what does it mean to sort of flip it on its head and say, no, you be interesting now, right? Like you give the chatbot the opportunity to do it. And this is not a chatbot per se. It is like just the audio. And it's like the texture, I think, that really brings it to life. And it's like the things that we've described here, which is like, okay, now I have to like lead you down a path of information about like this commercialization deck.
Raiza [00:44:36]: It's like, how do you do that?
Raiza [00:44:38]: To be able to successfully do it, I do think that you need experts. I think we'll engage with experts like down the road, but I think it will have to be in the context of, well, what's the next thing we're building, right? It's like, what am I trying to change here? What do I fundamentally believe needs to be improved? And I think there's still like a lot more studying that we have to do in terms of like, well, what are people actually using this for? And we're just in such early days. Like it hasn't even been a month. Two, three weeks.
Usama [00:45:05]: Three weeks.
Raiza [00:45:06]: Yeah, yeah.
Usama [00:45:07]: I think one other element to that is the fact that you're bringing your own sources to it. Like it's your stuff. Like, you know this somewhat well, or you care to know about this. So like that, I think, changed the equation on its head as well. It's like your sources and someone's telling you about it. So like you care about how that dynamic is, but you just care for it to be good enough to be entertaining. Because ultimately they're talking about your mortgage deed or whatever.
Swyx [00:45:33]: So it's interesting just from the topic itself. Even taking out all the agreements and the hiding of the slow reveal. I mean, there's a baseline, maybe.
Usama [00:45:42]: Like if it was like too drab. Like if someone was reading it off, like, you know, that's like the absolute worst.
Raiza [00:45:46]: But like...
Swyx [00:45:47]: Do you prompt for humor? That's a tough one, right?
Raiza [00:45:51]: I think it's more of a generic way to bring humor out if possible. I think humor is actually one of the hardest things. Yeah.
Raiza [00:46:00]: But I don't know if you saw...
Raiza [00:46:00]: That is AGI.
Swyx [00:46:01]: Humor is AGI.
Raiza [00:46:02]: Yeah, but did you see the chicken one?
Raiza [00:46:03]: No.
Raiza [00:46:04]: Okay. If you haven't heard it... We'll splice it in here.
Swyx [00:46:06]: Okay.
Raiza [00:46:07]: Yeah.
Raiza [00:46:07]: There is a video on Threads. I think it was by Martino Wong. And it's a PDF.
Raiza [00:46:16]: Welcome to your deep dive for today. Oh, yeah. Get ready for a fun one. Buckle up. Because we are diving into... Chicken, chicken, chicken. Chicken, chicken. You got that right. By Doug Zonker. Now. And yes, you heard that title correctly. Titles. Our listener today submitted this paper. Yeah, they're going to need our help. And I can totally see why. Absolutely. It's dense. It's baffling. It's a lot. And it's packed with more chicken than a KFC buffet. What? That's hilarious.
Raiza [00:46:48]: That's so funny. So it's like stuff like that, that's like truly delightful, truly surprising.
Raiza [00:46:53]: But it's like we didn't tell it to be funny.
Usama [00:46:55]: Humor is contextual also. Like super contextual is what we're realizing. So we're not prompting for humor, but we're prompting for maybe a lot of other things that are bringing out that humor.
Alessio [00:47:04]: I think the thing about ad-generated content, if we look at YouTube, like we do videos on YouTube and it's like, you know, a lot of people like screaming in the thumbnails to get clicks. There's like everybody, there's kind of like a meta of like what you need to do to get clicks. But I think in your product, there's no actual creator on the other side investing the time. So you can actually generate a type of content that is maybe not universally appealing, you know, at a much, yeah, exactly. I think that's the most interesting thing. It's like, well, is there a way for like, take Mr.
Raiza [00:47:36]: Beast, right?
Alessio [00:47:36]: It's like Mr. Beast optimizes videos to reach the biggest audience and like the most clicks. But what if every video could be kind of like regenerated to be closer to your taste, you know, when you watch it?
Raiza [00:47:48]: I think that's kind of the promise of AI that I think we are just like touching on, which is, I think every time I've gotten information from somebody, they have delivered it to me in their preferred method, right?
Raiza [00:47:59]: Like if somebody gives me a PDF, it's a PDF.
Raiza [00:48:01]: Somebody gives me a hundred slide deck, that is the format in which I'm going to read it. But I think we are now living in the era where transformations are really possible, which is, look, like I don't want to read your hundred slide deck, but I'll listen to a 16 minute audio overview on the drive home. And that, that I think is, is really novel. And that is, is paving the way in a way that like maybe we wanted, but didn't
Raiza [00:48:24]: expect.
Raiza [00:48:25]: Where I also think you're listening to a lot of content that normally wouldn't have had content made about it. Like I watched this TikTok where this woman uploaded her diary from 2004.
Raiza [00:48:36]: For sure, right?
Raiza [00:48:36]: Like nobody was going to make a podcast about a diary.
Raiza [00:48:39]: Like hopefully not. Like it seems kind of embarrassing. It's kind of creepy. Yeah, it's kind of creepy.
Raiza [00:48:43]: But she was, she was doing this like live listen of like, oh, like here's a podcast of my diary.
Raiza [00:48:48]: And it's like, it's entertaining right now to sort of all listen to it together. But like the connection is personal. It was like, it was her interacting with like her information in a totally
Raiza [00:48:57]: different way.
Raiza [00:48:58]: And I think that's where like, oh, that's a super interesting space, right? Where it's like, I'm creating content for myself in a way that suits the way that I want to, I want to consume it.
Usama [00:49:06]: Or people compare like retirement plan options. Like no one's going to give you that content. Like for your personal financial situation.
Raiza [00:49:14]: Yeah.
Usama [00:49:14]: And like, even when we started out the experiment, like a lot of the goal was to go for really obscure content and see how well we could transform that. So like if you look at the mountain view, like city council meeting notes, like you're never going to read it. But like if it was a three minute summary, like that would be interesting. I see.
Swyx [00:49:33]: You have one system, one prompt that just covers everything you threw at it.
Raiza [00:49:37]: Maybe.
Swyx [00:49:39]: I'm just, I'm just like, yeah, it's really interesting. You know what? I'm trying to figure out what you nailed compared to others. And I think that the way that you treat your, the AI is like a little bit different than a lot of the builders I talked to. So I don't know what it is. You said, I wish I had a transcript right in front of me, but it's something like people treat AI as like a tool for thought, but usually it's kind of doing their bidding and you know, what you're really doing is loading up these like two virtual agents. I don't, you've never said the word agents. I put that in your mouth, but two virtual humans or AIs and letting them from the, from their own opinion and letting them kind of just live and embody it a little bit. Is that accurate?
Raiza [00:50:17]: I think that that is as close to accurate as possible. I mean, in general, I try to be careful about saying like, oh, you know,
Raiza [00:50:24]: letting, you know, yeah, like these, these personas live.
Raiza [00:50:27]: But I think to your earlier question of like, what makes it interesting? That's what it takes to make it interesting.
Raiza [00:50:32]: Yeah.
Raiza [00:50:32]: Right. And I think to do it well is like a worthy challenge. I also think that it's interesting because they're interested, right? Like, is it interesting to compare?
Raiza [00:50:42]: Yeah.
Raiza [00:50:42]: Is it, is it interesting to have two retirement plans?
Raiza [00:50:46]: No, but to listen to these two talk about it.
Raiza [00:50:50]: Oh my gosh.
Raiza [00:50:50]: You'd think it was like the best thing ever invented, right? It's like, get this, deep dive into 401k through Chase versus, you know,
Raiza [00:50:59]: whatever.
Swyx [00:51:00]: They do do a lot of get this.
Raiza [00:51:02]: I know. I know.
Raiza [00:51:03]: I dream about it.
Raiza [00:51:06]: I'm sorry.
Swyx [00:51:08]: There's a, I have a few more questions on just like the engineering around this. And obviously some of this is just me creatively asking how this works. How do you make decisions between when to trust the AI overlord to decide for you? In other words, stick it, let's say products as it is today. You want to improve it in some way. Do you engineer it into the system? Like write code to make sure it happens or you just stick it in the prompt and hope that the LLM does it for you?
Raiza [00:51:38]: Do you know what I mean?
Raiza [00:51:39]: Do you mean specifically about audio or sort of in general?
Swyx [00:51:41]: In general, like designing AI products. I think this is like the one thing that people are struggling with. And there's, there's compound AI people and then there's big AI people. So compound AI people will be like Databricks, have lots of little models, chain them together to make an output. It's deterministic. You control every single piece and you know, you produce what you produce. The open AI people, totally the opposite. Like write one giant prompts and let the model figure it out.
Raiza [00:52:05]: Yeah.
Swyx [00:52:06]: And obviously the answer for most people is going to be a spectrum in between those two, like big model, small model. When do you decide that?
Raiza [00:52:11]: I think it depends on the task. It also depends on, well, it depends on the task, but ultimately depends on what is your desired outcome? Like what am I engineering for here? And I think there's like several potential outputs and there's sort of like general
Raiza [00:52:24]: categories.
Raiza [00:52:24]: Am I trying to delight somebody? Am I trying to just like meet whatever the person is trying to do? Am I trying to sort of simplify a workflow?
Raiza [00:52:31]: At what layer am I implementing this?
Raiza [00:52:32]: Am I trying to implement this as part of the stack to reduce like friction, you know, particularly for like engineers or something? Or am I trying to engineer it so that I deliver like a super high quality
Raiza [00:52:43]: thing?
Raiza [00:52:44]: I think that the question of like which of those two, I think you're right, it
Raiza [00:52:48]: is a spectrum.
Raiza [00:52:49]: But I think fundamentally it comes down to like it's a craft, like it's still a craft as much as it is a science. And I think the reality is like you have to have a really strong POV about like what you want to get out of it and to be able to make that decision. Because I think if you don't have that strong POV, like you're going to get lost in sort of the detail of like capability. And capability is sort of the last thing that matters because it's like, models will catch up, right? Like models will be able to do, you know, whatever in the next five years. It's going to be insane. So I think this is like a race to like value. And it's like really having a strong opinion about like, what does that look
Raiza [00:53:25]: like today?
Raiza [00:53:25]: And how far are you going to be able to push it? Sorry, I think maybe that was like very like philosophical.
Swyx [00:53:31]: We get there.
Usama [00:53:32]: And I think that hits a lot of the points it's going to make.
Alessio [00:53:35]: I tweeted today or I ex-posted, whatever, that we're going to interview you on what we should ask you. So we got a list of feature requests, mostly. It's funny. Nobody actually had any like specific questions about how the product was built. They just want to know when you're releasing some feature. So I know you cannot talk about all of these things, but I think maybe it would give people an idea of like where the product is going. So I think the most common question I think five people asked is like, are you going to build an API? And, you know, do you see this product as still be kind of like a full head product for like a login and do everything there? Or do you want it to be a piece of infrastructure that people build on?
Raiza [00:54:13]: I mean, I think why not both?
Raiza [00:54:16]: I think we work at a place where you could have both. I think that end user products, like products that touch the hands of users
Raiza [00:54:23]: have a lot of value.
Raiza [00:54:24]: For me personally, like we learn a lot about what people are trying to do and what's like actually useful and what people are ready for. And so we're going to keep investing in that. I think at the same time, right, there are a lot of developers that are interested in using the same technology to build their own thing. We're going to look into that, how soon that's going to be ready. I can't really comment, but these are the things that like, Hey, we heard it.
Raiza [00:54:47]: We're trying to figure it out.
Raiza [00:54:48]: And I think there's room for both.
Swyx [00:54:50]: Is there a world in which this becomes a default Gemini interface because it's technically different org?
Raiza [00:54:55]: It's such a good question.
Raiza [00:54:56]: And I think every, every time someone asks me, it's like, Hey, I just lead
Raiza [00:55:00]: Domogolem.
Raiza [00:55:02]: We'll ask the Gemini folks what they think.
Alessio [00:55:05]: Multilingual support. I know people kind of hack this a little bit together. Any ideas for full support, but also I'm mostly interested in dialects. In Italy, we have Italian obviously, but we have a lot of local dialects. Like if you go to Rome, people don't really speak Italian, they speak local
Raiza [00:55:20]: dialect.
Alessio [00:55:21]: Do you think there's a path to which these models, especially the speech can learn very like niche dialects? Like how much data do you need? Can people contribute? Like I'm curious, like if you see this as a possibility.
Raiza [00:55:35]: Totally.
Usama [00:55:35]: So I guess high level, like we're definitely working on adding more
Raiza [00:55:39]: languages.
Usama [00:55:39]: That's like top priority. We're going to start small, but like theoretically we should be able to cover like most languages pretty soon. What a ridiculous statement, by the way.
Swyx [00:55:48]: That's, that's crazy.
Usama [00:55:49]: Unlike the soon or the pretty soon part.
Swyx [00:55:52]: No, but like, you know, a few years ago, like a small team of like, I don't know, 10 people saying that we will support the top 100, 200 languages is like absurd, but you can do it. Yeah, you can do it.
Raiza [00:56:03]: And I think like the speech team, you know, we are a small team, but the speech team is another team and the modeling team, like these folks are just like absolutely brilliant at what they do. And I think like when we've talked to them and we've said, Hey, you know, how
Raiza [00:56:17]: about more languages? How about more voices? How about dialects?
Raiza [00:56:20]: Right?
Raiza [00:56:20]: This is something that like they are game to do. And like, that's, that's the roadmap for them.
Usama [00:56:25]: The speech team supports like a bunch of other efforts across Google, like Gemini Live, for example, is also the models built by the same like sort of deep mind speech team. But yeah, the thing about dialects is really interesting. Cause like, and some of our sort of earliest testing with trying out other languages, we actually noticed that sometimes it wouldn't stick to a certain dialect, especially for like, I think for French, we noticed that like when we presented it to like a native speaker, it would sometimes go from like a Canadian person speaking French versus like a French person speaking French or an American person speaking French, which is not what we wanted. So there's a lot more sort of speech quality work that we need to do there to make sure that it works reliably. And at least sort of like the, the standard dialect that we want, but that does show that there's potential to sort of do the thing that you're talking about of like fixing a dialect that you want, maybe contribute your own voice or like you pick from one of the options. There's, there's a lot more headroom there. Yeah.
Alessio [00:57:20]: Because we have movies, like we have old Roman movies that have like different languages, but there's not that many, you know? So I'm always like, well, I'm sure like the Italian is so strong in the model that like when you're trying to like pull that away from it, like you kind of need a lot, but right.
Usama [00:57:35]: That's, that's all sort of like wonderful deep mind speech team.
Swyx [00:57:39]: Well, anyway, if you need Italian, he's got you.
Swyx [00:57:44]: Specifically Singlish.
Raiza [00:57:45]: I got you.
Swyx [00:57:46]: Managing system prompts. People want a lot of that. I assume.
Raiza [00:57:50]: Yes.
Swyx [00:57:50]: Ish.
Raiza [00:57:51]: Definitely looking into it for just core notebook LM. Like everybody's wanted that forever. So we're working on that. I think for the audio itself, we're trying to figure out the best way to do it. So we'll launch something sooner rather than later. So we'll probably stage it. And I think like, you know, just to be fully transparent, we'll probably launch something that's more of a fast follow than like a fully baked feature first.
Raiza [00:58:15]: Just because like, I see so many people put in like the fake show notes.
Raiza [00:58:18]: It's like, Hey, I'll, I'll help you out.
Raiza [00:58:19]: We'll just put a text box. Yeah. Yeah.
Usama [00:58:21]: I think a lot of people are like, this is almost perfect, but like, I just need that extra 10, 20%. Yeah.
Swyx [00:58:26]: I noticed that you say no a lot, I think, or you try to ship one thing and that there's different about you than maybe other PMs or other teams that try to ship, but they're like, Oh, here are all the knobs.
Raiza [00:58:38]: I'm just.
Swyx [00:58:38]: Take all my knobs. Yeah.
Raiza [00:58:40]: Yeah.
Swyx [00:58:40]: Top P top cake. It doesn't matter. I'll just put it in the docs and you figure it out. Right. Whereas for you, it's you, you actually just, you make one product.
Raiza [00:58:49]: Yeah.
Swyx [00:58:49]: As opposed to like 10, you could possibly have done.
Raiza [00:58:51]: Yeah.
Swyx [00:58:51]: I don't know.
Raiza [00:58:52]: It's interesting. I think about this a lot.
Raiza [00:58:53]: I think it requires a lot of discipline because I thought about the knobs.
Raiza [00:58:57]: I was like, Oh, I saw on Twitter, you know, on X people want the knobs. It's like, great.
Raiza [00:59:02]: Start mocking it up, making the text boxes, designing like the little fiddles.
Raiza [00:59:06]: Right.
Raiza [00:59:07]: And then I looked at it and I was kind of sad. I was like, well, right. It's like, Oh, it's like, this is not cool.
Raiza [00:59:12]: This is not fun.
Raiza [00:59:13]: This is not magical. It is sort of exactly what you would expect knobs to be. Then, you know, it's like, Oh, I mean, how much can you, you know, design a knob?
Raiza [00:59:24]: I thought about it. I was like, but the thing that people really like was that there wasn't any.
Raiza [00:59:29]: That they just pushed a button and it was cool.
Raiza [00:59:32]: And so I was like, how do we bring more of that?
Raiza [00:59:34]: Right.
Raiza [00:59:34]: That still gives the user the optionality that they want. And so this is where like, you have to have a strong POV. I think you have to like really boil down. What did I learn in like the month since I've launched this thing that people really want? And I can give it to them while preserving like that, that delightful sort of fun experience. And I think that's actually really hard.
Raiza [00:59:54]: Like I'm not going to come up with that by myself.
Raiza [00:59:55]: And like, that's something that like our team thinks about every day. We all have different ideas. We're all experimenting with sort of how to get the most out of like the insight and also ship it quick. So, so we'll see.
Raiza [01:00:06]: We'll find out soon if people like it or not.
Usama [01:00:08]: I think the other interesting thing about like AI development now is that the knobs are not necessarily like speak going back to all the sort of like craft and like human taste and all of that that went into building it. Like the knobs are not as easy to add as simply like I'm going to add a parameter to this and it's going to make it happen. It's like you kind of have to redo the quality process for everything. Yeah, the prioritization is also different.
Raiza [01:00:36]: It goes back to sort of like, it's a lot easier to do an eval for like the deep dive format than if like, okay, now I'm going to let you inject like these random things, right?
Raiza [01:00:45]: Okay.
Raiza [01:00:45]: How am I going to measure quality?
Raiza [01:00:46]: Either?
Raiza [01:00:46]: I say, I don't care because like you just input whatever.
Raiza [01:00:50]: Or I say, actually wait, right?
Raiza [01:00:53]: Like I want to help you get the best output ever.
Raiza [01:00:55]: What's it going to take?
Usama [01:00:56]: The knob actually needs to work reliably.
Raiza [01:00:58]: Yeah. Yeah. Very important part.
Alessio [01:01:00]: Two more things we definitely want to talk about. I guess now people equivalent notebook LM to like a podcast generator, but I guess, you know, there's a whole product suite there.
Raiza [01:01:09]: Yeah.
Alessio [01:01:10]: How should people think about that? Like is this, and also like the future of the product as far as monetization too, you know, like, is it going to be the voice thing going to be a core to it? Is it just going to be one output modality? And like, you're still looking to build like a broader kind of like a interface with data and documents.
Raiza [01:01:27]: I mean, that's such a, that's such a good question that I think the answer it's I'm waiting to get more data. I think because we are still in the period where everyone's really excited about it, everyone's trying it. I think I'm getting a lot of sort of like positive feedback on the audio. We have some early signal that says it's a really good hook, but people stay for the other features.
Raiza [01:01:49]: So that's really good too.
Raiza [01:01:50]: I was making a joke yesterday.
Raiza [01:01:51]: I was like, it'd be really nice, you know, if it was just the audio, because then I could just like simplify the train.
Raiza [01:01:58]: Right.
Raiza [01:01:58]: I don't have to think about all this other functionality, but I think the reality is that the framework kind of like what we were talking about earlier that we had laid out, which is like you bring your own sources. There's something you do in the middle and then there's an output is that really extensible one. And it's a really interesting one. And I think like, particularly when we think about what a big business looks like, especially when we think about commercialization, audio is just one such modality. But the editor itself, like the space in which you're able to do these things is like, that's the business, right? Like maybe the audio by itself, not so much, but like in this big package, like, oh, I could see that. I could see that being like a really big business.
Raiza [01:02:37]: Yep.
Alessio [01:02:37]: Any thoughts on some of the alternative interact with data and documents thing, like cloud artifacts, like a JGBD canvas, you know, kind of how do you see, maybe we're notebook LM stars, but like Gemini starts, like you have so many amazing teams and products at Google. There's sometimes like, I'm sure you have to figure that out.
Raiza [01:02:56]: Yeah.
Raiza [01:02:56]: Well, I love artifacts.
Raiza [01:02:59]: I played a little bit with canvas. I got a little dizzy using it. I was like, oh, there's something.
Raiza [01:03:03]: Well, you know, I like the idea of it fundamentally, but something about the UX was like, oh, this is like more disorienting than like artifacts.
Raiza [01:03:11]: And I couldn't figure out what it was. And I didn't spend a lot of time thinking about it, but I love that, right?
Raiza [01:03:16]: Like the thing where you are like, I'm working with, you know, an LLM, an agent, a chap or whatever to create something new. And there's like the chat space.
Raiza [01:03:26]: There's like the output space. I love that. And the thing that I think I feel angsty about is like, we've been talking about this for like a year, right?
Raiza [01:03:35]: Like, of course, like I'm going to say that, but it's like, but like for a year now I've had these like mocks that I was just like, I want to push the button.
Raiza [01:03:42]: But we prioritize other things.
Raiza [01:03:43]: We were like, okay, what can we like really win at? And like we prioritize audio, for example, instead of that. But just like when people were like, oh, what is this magic draft thing? Oh, it's like a hundred percent, right?
Raiza [01:03:54]: It's like stuff like that that we want to try to build into notebook too.
Raiza [01:03:57]: And I'd made this comment on Twitter as well, where I was like, now I don't know, actually, right? I don't actually know if that is the right thing.
Raiza [01:04:05]: Like, are people really getting utility out of this? I mean, from the launches, it seems like people are really getting it.
Raiza [01:04:11]: But I think now if we were to ship it, I have to rev on it like one layer more, right? I have to deliver like a differentiating value compared to like artifacts or chemicals, which is hard.
Swyx [01:04:20]: Which is because you've, you demonstrated the ability to fast follow. So you don't have to innovate every single time. I know, I know.
Raiza [01:04:27]: I think for me, it's just like the bar is high to ship.
Raiza [01:04:30]: And when I say that, I think it's sort of like conceptually like the value that you deliver to the user. I mean, you'll, you'll see a notebook alarm. There are a lot of corners that like that I have personally cut where it's like our UX designer is always like, I can't believe you let us ship with like these ugly scroll bars. And I'm like, no, no one notices, I promise.
Raiza [01:04:47]: He's like, no, everyone.
Raiza [01:04:48]: It's a screenshot, this thing.
Raiza [01:04:50]: But I mean, kidding aside, I think that's true that it's like we do want to be able to fast follow.
Raiza [01:04:54]: But I think we want to make sure that things also land really well. So the utility has to be there.
Swyx [01:04:59]: Code in, especially on our podcast has a special place. Is code notebook LLM interesting to you? I haven't, I've never, I don't see like a connect my GitHub to this thing. Yeah, yeah.
Raiza [01:05:10]: I think code, code is a big one. Code is a big one. I think we have been really focused, especially when we had like a much smaller team, we were really focused on like, let's push like an end to end journey together. Let's prove that we can do that. Because then once you lay the groundwork of like sources, do something in the chat output, once you have that, you just scale it up from there. Right. And it's like, now it's just a matter of like scaling the inputs, scaling the outputs, scaling the capabilities of the chat. So I think we're going to get there. And now I also feel like I have a much better view of like where the investment is required. Whereas previously I was like, Hey, like let's flesh out the story first before we put more engineers on this thing, because that's just going to slow us down.
Usama [01:05:49]: For what it's worth, the model still understands code. So I've seen at least one or two people just like download their GitHub repo, put it in there and get like an audio overview of your code.
Raiza [01:06:00]: Yeah, yeah. I've never tried that.
Usama [01:06:01]: This is like, these are all the files are connected together because the model still understands code. Like even if you haven't like.
Raiza [01:06:07]: I think on sort of like the creepy side of things, I did watch a student like with her permission, of course, I watched her do her homework in Notebook LM.
Raiza [01:06:17]: And I didn't tell her like what kind of homework to bring, but she brought like her computer science homework.
Raiza [01:06:23]: And I was like, Oh, and she uploaded it. And she said, here's my homework, read it. And it was just the instructions. And Notebook LM was like, okay, I've read it. And the student was like, okay, here's my code so far.
Raiza [01:06:37]: And she copy pasted it from the editor.
Raiza [01:06:39]: And she was like, check my homework. And Notebook LM was like, well, number one is wrong.
Raiza [01:06:44]: And I thought that was really interesting because it didn't tell her what was wrong. It just said it's wrong.
Raiza [01:06:48]: And she was like, okay, don't tell me the answer, but like walk me through like how you think about this. And it was what was interesting for me was that she didn't ask for the answer.
Raiza [01:06:58]: And I asked her, I was like, oh, why did you do that? And she was like, well, I actually want to learn it. She's like, because I'm gonna have to take a quiz on this at some point. And I was like, oh, yeah, it's a really good point.
Raiza [01:07:05]: And it was interesting because, you know, Notebook LM, while the formatting wasn't perfect, like did say like, hey, have you thought about using, you know, maybe an integer instead of like this?
Raiza [01:07:14]: And so that was, that was really interesting.
Alessio [01:07:16]: Are you adding like real-time chat on the output? Like, you know, there's kind of like the deep dive show and then there's like the listeners call in and say, hey.
Raiza [01:07:26]: Yeah, we're actively, that's one of the things we're actively prioritizing. Actually, one of the interesting things is now we're like, why would anyone want to do that? Like, what are the actual, like kind of going back to sort of having a strong POV about the experience? It's like, what is better? Like, what is fundamentally better about doing that? That's not just like being able to Q&A or Notebook. How is that different from like a conversation? Is it just the fact that there was a show and you want to tweak the show? Is it because you want to participate? So I think there's a lot there that like we can continue to unpack. But yes, that's coming.
Swyx [01:07:58]: It's because I formed a parasocial relationship. Yeah, that just might be part of your life.
Raiza [01:08:03]: Get this.
Raiza [01:08:05]: Totally.
Swyx [01:08:07]: Yeah, but it is obviously because OpenAI has just launched a real-time chat. It's a very hot topic. I would say one of the toughest AI engineering disciplines out there because even their API doesn't do interruptions that well, to be honest. And, you know, yeah, so real-time chat is tough.
Raiza [01:08:25]: I love that thing.
Raiza [01:08:26]: I love it.
Swyx [01:08:27]: Okay, so we have a couple ways to end. Either call to action or laying out one principle of AI PMing or engineering that you really think about a lot. Is there anything that comes to mind?
Raiza [01:08:39]: I feel like that's a test.
Raiza [01:08:40]: Of course, I'm going to say go to notebooklm.google.com, try it out, join the Discord and tell us what you think.
Swyx [01:08:46]: Yeah, especially like you have a technical audience. What do you want from a technical engineering audience?
Raiza [01:08:52]: I mean, I think it's interesting because the technical and engineering audience typically will just say, hey, where's the API?
Raiza [01:08:58]: But, you know, I think we addressed it. But I think what I would really be interested to discover is, is this useful to you?
Raiza [01:09:05]: Why is it useful?
Raiza [01:09:05]: What did you do? Right? Is it useful tomorrow?
Raiza [01:09:08]: How about next week?
Raiza [01:09:08]: Just the most useful thing for me is if you do stop using it or if you do keep using it, tell me why.
Raiza [01:09:14]: Because I think contextualizing it within your life, your background, your motivations, is what really helps me build really cool things.
Swyx [01:09:22]: And then one piece of advice for AI PMs.
Raiza [01:09:24]: Okay, if I had to pick one, it's just always be building. Build things yourself. I think for PMs, it's such a critical skill. And just take time to pop your head up and see what else is new out there. On the weekends, I try to have a lot of discipline. I only use ChatGPT and Cloud on the weekend. I try to use the APIs. Occasionally, I'll try to build something on GCP over the weekend because I don't do that normally at work. But it's just the rigor of just trying to be a builder yourself. And even just testing, right? You could have an idea of how a product should work and maybe your engineers are building it. But it's like, what was your proof of concept? What gave you conviction that that was the right thing?
Raiza [01:10:06]: Call to action?
Usama [01:10:07]: I feel like consistently, the most magical moments out of AI building come about for me when I'm really, really, really just close to the edge of the model capability. And sometimes it's farther than you think it is. I think while building this product, some of the other experiments, there were phases where it was easy to think that you've approached it. But sometimes at that point, what you really need is to show your thing to someone and they'll come up with creative ways to improve it. We're all sort of learning, I think. So yeah, I feel like unless you're hitting that bound of this is what Gemini 1.5 can do, probably the magic moment is somewhere there, in that sort of limit.
Swyx [01:10:48]: So push the edge of the capability. Yeah, totally.
Alessio [01:10:51]: It's funny because we had a Nicola Scarlini from DeepMind on the pod and he was like, if the model is always successful, you're probably not trying hard enough to give it heart.
Raiza [01:11:00]: Right. Thanks.
Alessio [01:11:00]: So, yeah.
Swyx [01:11:03]: My problem is sometimes I'm not smart enough to judge. Yeah, right.
Raiza [01:11:08]: Well, I think I hear that a lot.
Raiza [01:11:11]: Like people are always like, I don't know how to use it.
Raiza [01:11:14]: And it's hard.
Raiza [01:11:15]: Like I remember the first time I used Google search. I was like, what do we type?
Raiza [01:11:18]: My dad was like, anything.
Raiza [01:11:19]: It's like anything.
Raiza [01:11:20]: I got nothing in my brain, dad. What do you mean?
Raiza [01:11:23]: And I think there is a lot of like for product builders is like, have a strong opinion about like, what is the user supposed to do?
Raiza [01:11:30]: Yeah. Help them do it.
Swyx [01:11:31]: Principle for AI engineers or like just one advice that you have others?
Usama [01:11:36]: I guess like in addition to pushing the bounds and to do that, that often means like you're not going to get it right in the first go. So like, don't be afraid to just like batch multiple models together. I guess that's I'm basically describing an agent, but more thinking time equals just better results consistently. And that holds true for probably every single time that I've tried to build something.
Swyx [01:12:01]: Well, at some point we will talk about the sort of longer inference paradigm. It seems like DeepMind is rumored to be coming out with something. You can't comment, of course.
Raiza [01:12:09]: Yeah.
Swyx [01:12:09]: Well, thank you so much. You know, you've created. I actually said, I think you saw this. I think that Notebook LLM was kind of like the ChatGPT moment for Google.
Raiza [01:12:18]: That was so crazy when I saw that.
Raiza [01:12:19]: I was like, what?
Raiza [01:12:20]: Like, ChatGPT was huge for me. And I think, you know, when you said it and other people have said it, I was like, is it?
Raiza [01:12:27]: Yeah. That's crazy.
Swyx [01:12:28]: People weren't like really cognizant of Notebook LLM before and audio overviews and Notebook LLM like unlocked the, you know, a use case for people in the way that I would go so far as to say cloud projects never did. And I don't know. You know, I think a lot of it is competent PMing and engineering, but also just, you know, it's interesting how a lot of these projects are always like low key research previews for you. It's like you're a separate org, but like, you know, you built products and UI innovation on top of also working with research to improve the model. That was a success that wasn't planned to be this whole big thing. You know, your TPUs were on fire, right?
Raiza [01:13:06]: Oh my gosh, that was so funny.
Raiza [01:13:08]: I didn't know people would like really catch on to the Elmo fire, but it was just like one of those things where I was like, you know, we had to ask for more TPUs.
Raiza [01:13:16]: Yeah, we many times.
Raiza [01:13:18]: And, you know, it was a little bit of a, of a subtweet of like, Hey, reminder, give us more TPUs on here.
Raiza [01:13:25]: It's weird.
Swyx [01:13:25]: I just think like when people try to make big launches, then they flop. And then like when they're not trying and they just, they're just trying to build a good thing, then, then they succeed. It's, it's this fundamentally really weird magic that I haven't really encapsulated yet, but you've, you've done it. Well, thank you.
Raiza [01:13:40]: Thank you.
Raiza [01:13:40]: And, you know, I think we'll just keep going in like the same way. We just keep trying, keep trying to make it better.
Raiza [01:13:45]: I hope so.
Swyx [01:13:46]: All right.
Raiza [01:13:47]: Cool.
Swyx [01:13:47]: Thank you.
Raiza [01:13:48]: Thank you. Thanks for having us. Thanks.
Get full access to Latent Space at www.latent.space/subscribe