Most interactions with AI happen through a text-driven interface. Users craft a prompt, and the AI delivers a response. That response might be text, an image, audio, or even some form of video. But there’s the problem: it is output, not input. AI scans images, video, and audio, but not as humans do. […]