
Sign up to save your podcasts
Or


TLDR:
Andon Labs, evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate vending machines, and now we're testing if they can control robots at offices. There are two parts to this test:
We find that LLMs display very little practical intelligence in this embodied setting. We think evals are important for safe AI development. We will report concerning incidents in our periodic safety reports.
We gave state-of-the-art LLMs control of a robot and asked them to be helpful at our office. While it was a very fun experience, we can’t say [...]
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
By LessWrongTLDR:
Andon Labs, evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate vending machines, and now we're testing if they can control robots at offices. There are two parts to this test:
We find that LLMs display very little practical intelligence in this embodied setting. We think evals are important for safe AI development. We will report concerning incidents in our periodic safety reports.
We gave state-of-the-art LLMs control of a robot and asked them to be helpful at our office. While it was a very fun experience, we can’t say [...]
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

26,330 Listeners

2,453 Listeners

8,557 Listeners

4,182 Listeners

93 Listeners

1,601 Listeners

9,927 Listeners

95 Listeners

511 Listeners

5,512 Listeners

15,931 Listeners

545 Listeners

131 Listeners

94 Listeners

467 Listeners