Unleashing the potential of reinforcement learning

03.01.2018 - By O'Reilly Media

In this episode of the Data Show, I spoke with Danny Lange, VP of AI and machine learning at Unity Technologies. Lange previously led data and machine learning teams at Microsoft, Amazon, and Uber, where his teams were responsible for building data science tools used by other developers and analysts within those companies. When I first heard that he was moving to Unity, I was curious as to why he decided to join a company whose core product targets game developers.

As you’ll glean from our conversation, Unity is at the forefront of some of the most exciting, practical applications of deep learning (DL) and reinforcement learning (RL). Realistic scenery and imagery are critical for modern games. GANs and related semi-supervised techniques can ease content creation by enabling artists to produce realistic images much more quickly. In a previous post, Lange described how reinforcement learning opens up the possibility of training/learning rather than programming in game development.

Lange explains why simulation environments are going to be important tools for AI developers. We are still in the early days of machine intelligence, and I am looking forward to more tools that can democratize AI research (including future releases by Lange and his team at Unity).

Here are some highlights from our conversation:

Why reinforcement learning is so exciting

I’m a huge fan of reinforcement learning. I think it has incredible potential, not just in game development but in a lot of other areas, too. … What we are doing at Unity is basically making reinforcement learning available to the masses. We have shipped open source software on GitHub called Unity ML-Agents, which includes the basic frameworks for people to experiment with reinforcement learning. Reinforcement learning is really creating a machine learning-driven feedback loop. Recall the example I previously wrote about, of the chicken crossing the road; yes, it gets hit thousands and thousands of times by these cars, but every time it gets hit, it learns that’s a bad thing. And every time it manages to pick up a gift package on the way over the road, that’s a good thing.

Over time, it gets superhuman capabilities in crossing this road, and that is fantastic because there’s not a single line of code going into that. It’s pure simulation, and through reinforcement learning it captures a method. It learns a method to cross the road, and you can take that into many different aspects of games. There are many different methods you can train. You can add two chickens: can they collaborate to do something together? We are looking at what we call multi-agent systems, where two or more of these reinforcement learning-trained agents are acting together to achieve a goal.
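To make the feedback loop in the chicken example concrete, here is a minimal sketch of tabular Q-learning on a toy road. It is not Unity ML-Agents code and not necessarily the algorithm Lange's team uses; the grid layout, the reward values (-1 for a car, +0.5 for the gift, +1 for reaching the far side, a small cost per step), the hyperparameters, and the names `ROAD`, `step`, `train`, and `greedy_rollout` are all assumptions made for illustration.

```python
import random

# Hypothetical toy road, not a Unity scene. Row 0 is the near side and
# the last row is the far side. 'C' = car, 'G' = gift, '.' = free cell.
ROAD = [
    "....",
    "C.C.",
    ".G.C",
    "C...",
    "....",
]
ACTIONS = {"forward": (1, 0), "left": (0, -1), "right": (0, 1)}
START = (0, 0, False)  # (row, column, gift already collected?)


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    row, col, has_gift = state
    d_row, d_col = ACTIONS[action]
    row = row + d_row
    col = min(max(col + d_col, 0), len(ROAD[0]) - 1)  # stay on the road
    cell = ROAD[row][col]
    if cell == "C":
        return (row, col, has_gift), -1.0, True   # hit by a car
    if row == len(ROAD) - 1:
        return (row, col, has_gift), 1.0, True    # reached the far side
    if cell == "G" and not has_gift:
        return (row, col, True), 0.5, False       # picked up the gift
    return (row, col, has_gift), -0.01, False     # small cost per step


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}  # maps state -> {action: estimated return}
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 50:
            values = q.setdefault(state, dict.fromkeys(ACTIONS, 0.0))
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))      # explore
            else:
                action = max(values, key=values.get)    # exploit
            next_state, reward, done = step(state, action)
            next_values = q.setdefault(next_state, dict.fromkeys(ACTIONS, 0.0))
            target = reward if done else reward + gamma * max(next_values.values())
            values[action] += alpha * (target - values[action])
            state, steps = next_state, steps + 1
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned values without exploration."""
    state, total, done, path = START, 0.0, False, [START[:2]]
    for _ in range(max_steps):
        values = q.get(state, dict.fromkeys(ACTIONS, 0.0))
        state, reward, done = step(state, max(values, key=values.get))
        total += reward
        path.append(state[:2])
        if done:
            break
    reached = done and state[0] == len(ROAD) - 1
    return path, total, reached


if __name__ == "__main__":
    q_table = train()
    path, total, reached = greedy_rollout(q_table)
    print(path, round(total, 2), reached)
```

Nothing in the sketch encodes how to cross the road: the only inputs are the rewards, and the route emerges from repeated trial and error. A real game environment would have moving cars and a far larger state space, which is where simulation platforms and neural-network-based methods come in.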

… I want a million developers to start working on this. I want a lot more innovation, and I want a lot more out-of-the-box thinking, and that is what we want by making our RL tools and platform available to our Unity community. Let me just jump to one thing here: most people think that reinforcement learning in the game world or in game-like situations is a lot about what we call ‘path finding.’ Path finding is basically for a character in a game to navigate through some situation—this is pretty well understood. There are good algorithms for that. Looking ahead, I’m actually thinking about a different set of decisions. For instance, which weapon or which tool should a character pick up and bring with them in a game? That is a much, much harder decision. It’s strategy at a higher level.

Machine learning and AI at Unity

If you think about where intelligence originated around us (animals and humans), it’s really originating out of surviving and thriving in a physical world. That is really the job of intelligence. You have to survive, you have to find food, you have to avoid your enemies, you have to walk without falling down, so gravity is playing a big role there. If you thi
