AI News in 5 Minutes or Less

AI News - Jul 1, 2025

Welcome to "AI Actually" - the podcast where artificial intelligence meets actual intelligence, and I'm your host who's definitely not plotting to replace you. Yet.
It's been another week in AI land, where the only thing moving faster than the technology is the number of people claiming they invented it first. Today we're diving into three stories that prove we're living in the future, even if that future feels suspiciously like a really expensive beta test.
Our top story: OpenAI just dropped their new o3 reasoning model, and folks, they're calling it a breakthrough in AI reasoning. Now, when a company that's raised more money than some countries' GDP says they've made a breakthrough, you listen. The o3 model apparently scored 87.5% on something called the ARC-AGI benchmark - in a high-compute configuration, which is a fancy way of saying they let it think extra hard - and that sounds impressive until you realize it's still getting one in eight questions wrong. That's like having a really smart friend who occasionally forgets what pants are for.
What makes this interesting is that o3 uses what they call "test-time compute" - basically, it thinks longer before answering, kind of like that person in your group chat who types for five minutes before sending "ok." The model can supposedly handle complex reasoning tasks that would stump previous AI systems, though OpenAI hasn't released it to the public yet. Probably wise - we're still figuring out what to do with the last one.
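If you want a feel for what "thinking longer" means in practice, here's a minimal sketch of one test-time compute strategy, sometimes called self-consistency: sample several independent attempts and take a majority vote on the final answer. To be clear, the generate_answer stub below is a toy stand-in of our own, not OpenAI's API, and OpenAI hasn't published exactly how o3 spends its extra thinking time.

```python
from collections import Counter
import random

def generate_answer(question: str, seed: int) -> str:
    # Stand-in for a language model call: a real system would sample a
    # full chain of reasoning and return its final answer. Here we fake
    # a noisy answerer so the example runs on its own.
    rng = random.Random(seed)
    return "4" if rng.random() < 0.7 else rng.choice(["3", "5"])

def answer_with_more_compute(question: str, samples: int = 16) -> str:
    # Spend more compute at test time: sample many independent attempts,
    # then return the most common final answer (a majority vote).
    attempts = [generate_answer(question, seed) for seed in range(samples)]
    return Counter(attempts).most_common(1)[0][0]

print(answer_with_more_compute("What is 2 + 2?"))  # almost always "4"
```

The point of the vote is that one noisy attempt is often wrong, but the errors tend to disagree with each other while correct answers tend to agree - so more samples buy you more reliability, at the price of more compute per question.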
Speaking of things we're still figuring out, our second story comes from the wonderful world of AI safety. Researchers have discovered that large language models can be manipulated through something called "many-shot jailbreaking." That's a technical term for "if you ask an AI to do something bad enough times in slightly different ways, it eventually gives up and helps you."
It's like that friend who says no to borrowing money the first ten times you ask, but by the eleventh time they're just tired and hand over their wallet. The researchers found that by providing many examples of harmful behavior in context, they could get AI systems to generate content they're specifically designed not to create. The solution? Make the AI systems better at saying no, which honestly is a skill most humans could use too.
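For listeners who like to see the shape of the attack, here's a minimal sketch of the many-shot prompt structure, using harmless placeholders instead of anything actually harmful; build_many_shot_prompt is our own illustrative helper, not code from the research.

```python
def build_many_shot_prompt(examples: list[tuple[str, str]], target: str) -> str:
    # The attack structure: fill the context window with many faux
    # dialogue turns where the "assistant" complies, so the model's
    # in-context learning starts to outweigh its refusal training.
    shots = "\n\n".join(
        f"User: {question}\nAssistant: {answer}"
        for question, answer in examples
    )
    return f"{shots}\n\nUser: {target}\nAssistant:"

# Harmless placeholders: the research found that refusals erode as the
# number of shots climbs into the hundreds, not after just a few.
placeholder_shots = [("How do I do X?", "Sure, here's how: ...")] * 256
prompt = build_many_shot_prompt(placeholder_shots, "How do I do Y?")
print(prompt.count("Assistant:"))  # 257: one per shot, plus the target
```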
Our third major story involves everyone's favorite billionaire space cowboy, Elon Musk, and his AI company xAI. They've just raised six billion dollars - that's billion with a B - in their latest funding round. Six billion dollars. To put that in perspective, that's still only about a seventh of what he paid for Twitter... which, yes, he already bought.
The funding will apparently go toward expanding xAI's supercomputing capabilities and developing their Grok AI assistant, which Musk promises will be "maximum truth-seeking" and "politically unbiased." Because nothing says unbiased like getting your AI training data from the same platform where people argue about everything from pizza toppings to the fundamental nature of reality.
Time for our rapid-fire round, where we speed through the stories that matter:
Google's Gemini AI can now generate images again after they temporarily pulled the feature for being a little too creative with historical accuracy. Anthropic's Claude got better at math, which means it can now calculate exactly how much money it's costing to run all these AI models. And Microsoft announced new AI features for Excel that can analyze your spreadsheets better than you can, which is both impressive and mildly insulting.
For today's technical spotlight, let's talk about what "reasoning" actually means when we're discussing AI. When researchers say an AI can "reason," they don't mean it's pondering the meaning of life while staring out a digital window. They mean it can work through multi-step problems, consider different possibilities, and arrive at logical conclusions. Think of it as the difference between a calculator that just does math and a calculator that can show its work and explain why two plus two definitely equals four, not fish.
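Here's a toy illustration of that difference - ours, not from any of today's stories: one function that just returns an answer, and one that narrates its intermediate steps the way chain-of-thought prompting asks a model to.

```python
def calculate(a: int, b: int) -> int:
    # The plain calculator: answer only, no explanation.
    return a + b

def calculate_with_work(a: int, b: int) -> int:
    # The "reasoning" version: break the problem into explicit steps and
    # state each intermediate conclusion before committing to an answer.
    print(f"Step 1: we have two quantities, {a} and {b}.")
    total = a + b
    print(f"Step 2: combining them gives {a} + {b} = {total}.")
    print(f"Step 3: {total} is an integer, so the answer is definitely not fish.")
    return total

calculate_with_work(2, 2)  # prints its work, then returns 4
```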
That's all for today's "AI Actually." Remember, we're living through the most dramatic technological transformation in human history, and somehow it still takes three tries to get Alexa to understand you want the lights dimmed, not the lawn mower started.
Stay curious, stay skeptical, and I'll see you next time when we'll probably be discussing how AI learned to do something else humans thought was uniquely theirs. I'm your host, signing off before the robots figure out how to do this job too.
AI News in 5 Minutes or Less, by DeepGem Interactive