
https://chrt.fm/track/4D4ED/traffic.megaphone.fm/MLN6770658893.mp3?updated=1670879179
Transcript
SPEAKER 1 0:00:00
I want to send a huge thanks to our friends at AWS for their continued support of the podcast and their sponsorship of our re:Invent 2022 series. You know AWS is a cloud computing leader, but did you realize the company offers a broad array of services and infrastructure at all three layers of the machine learning technology stack? In fact, tens of thousands of customers trust AWS for machine learning and AI services. And the company aims to put ML in the hands of every practitioner with innovative services like Amazon CodeWhisperer, a new ML-powered pair programming tool that helps developers improve productivity by significantly reducing the time to build software applications. To learn more about AWS ML and AI services and how they're helping customers accelerate their machine learning journeys, visit twimlai.com slash go slash AWS ML. All right, everyone, this is Sam Charrington, host of the TWIML AI Podcast. And today I'm coming to you live from the Future Frequency podcast studio at the AWS re:Invent conference here in Las Vegas. And I am joined by Emad Mostaque. Emad is founder and CEO of Stability AI. If this is the first episode of our re:Invent series that you are listening to, don't try adjusting your audio settings. It's definitely me. After a few days here at re:Invent in the dry desert of Nevada, my voice is on its last legs, but I think we'll make it through this. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. And if you want to check us out in studio, you can bounce over to YouTube for the interview.
SPEAKER 2 0:01:42
Emad, welcome to the podcast. Thanks so much for having me. Super excited to talk to you.
SPEAKER 1 0:01:46
You are of course the founder and CEO of Stability. Stability is the company behind Stable Diffusion, which is a multimodal model that has been getting a lot of fanfare, I think. Welcome. And I'd love to jump in by having you share a little bit about your background.
SPEAKER 2 0:02:02
Yeah, no, I think it's been super interesting. I think Stable Diffusion is kind of a specific text-to-image model. As for me, let's say I started off as a computer scientist at uni, then an enterprise developer, and then became a hedge fund manager and one of the largest video game investors in the world, and then in artificial intelligence. And I was doing that, it was a lot of fun. And then my son was diagnosed with autism and they said there was no cure or treatment. So I quit, switched to advising hedge funds, and built an AI team to do literature review of all the autism literature, and then biomolecular pathway analysis of neurotransmitters to repurpose drugs to help him out. And it kind of worked. He went to mainstream school and was super happy. That's awesome. It was kind of cool. Good trade, good trade. Then I went back to the hedge fund world, won some awards. It's boring. Then I decided to make the world a better place. So first off, we took the Global Learning XPRIZE. That was a $15 million prize from Elon Musk and Tony Robbins for the first app to teach kids literacy and numeracy without internet. My co-founder and I have been deploying that around the world. And now we're teaching kids in refugee camps literacy and numeracy in 13 months at one hour a day. And we're about to AI the crap out of that. In 2020-21, I designed and led one of the United Nations AI initiatives against COVID-19, CAIAC, Collective and Augmented Intelligence Against COVID-19, launched at Stanford, backed by the WHO, UNESCO and the World Bank. And that was really interesting because we were trying to make the world's knowledge on COVID-19 free with CORD-19. So there's a 500,000-paper dataset, freely available to everyone. And then we used AI to organize it, because it's really confusing. During that, lots and lots of interesting tech kind of came through, but I realized these foundation models are super powerful. You can't have them controlled by any one company. It's bad business and it's not the correct thing ethically. So I thought, let's widen this and create open source foundation models for everyone, because I think it can really advance humanity.
SPEAKER 1 0:03:50
And again, I think it'll be great to see these things proliferate, so we can have an open discussion about it and also have the value created from just these brand new experiences. That's awesome. And when did you get started down that part of the journey?
SPEAKER 2 0:04:03
About two years ago. Stability has been going for about 13 months now.
SPEAKER 1 0:04:07
Yeah. When I think about it, a lot of Stable Diffusion goes back to this latent diffusion paper, which was not even a year ago.
SPEAKER 2 0:04:13
It's not even a year ago. I think the whole thing kind of kicked off with CLIP, released by OpenAI in January of last year. So I actually had COVID during that time while doing my COVID thing. My daughter came to me and said, dad, you know all that stuff you do, taking all that knowledge and squishing it down to make it useful for everyone? Can you do that with images? I was like, well, we can. So I built a system for her based on VQGAN and CLIP, so an image-generating model, and CLIP is an image-to-text model. She created like a vision board of everything she wanted, a description of what she wanted to make, and it generated 16 different images. And then she said how each one of those was different and it changed the latents. And then it generated another 16, another 16, another 16. And then eight hours later, she made an image that she went on to sell as an NFT for $3,500. Wow. And donated the proceeds to India COVID relief. Okay. I thought it was awesome. She's seven years old. Wow. And then I was like, this is transformative technology. Image is where it's at. Language, we're already at 85%, and we're going to get to 95%. Image, we're at 10%. We're not a visual species. Like the easiest way for us to communicate is what we're doing right now. We're having a nice chat. Then text is the next hardest. And image, be it images or PowerPoints, is impossible. Let's make it easy. This tech can do that. So we started funding the entire sector: Google Colab notebooks, models, all these kinds of things. Latent diffusion was done by the CompVis lab at the University of Munich, who led on the Stable Diffusion one as well. An amazing lab led by Björn Ommer, with Robin Rombach, who is one of our lead developers here at Stability. And then there was work by Katherine Crowson, Rivers Have Wings is her Twitter handle, on CLIP-conditioned models and things like that. And the whole community just came together and built really cool stuff. Then you had entities like Midjourney, where we just gave grants for the beta, that started operationalizing it. It's all come together now in the finality of Stable Diffusion that was released on August 23rd. So that was led by the CompVis lab, and then ourselves at Stability, Runway ML, the EleutherAI community that we kind of helped run, and LAION. We came together to put out 100,000 gigabytes of image-text pairs, 2 billion images turned into a 2 gigabyte file that runs natively on yo...