June 27, 2021

283 – 🔊 Voice Content And Usability with Preston So

1 hour 1 minute

In this episode, we get to talk with Preston So, Senior Director of Product Strategy at Oracle. We talk to Preston about his new book VOICE CONTENT AND USABILITY. We discuss the concepts of building conversational designs that are ethical, accessible, and usable.

✨ Episode Sponsor

Auth0: https://auth0.com
Auth0 on YouTube: https://www.youtube.com/auth0
Auth0 on Twitch: https://www.twitch.tv/auth0
Auth0 Avocado Labs online meetup events: https://avocadolabs.dev/

🔗 Episode Links

Preston’s new book – Voice Content And Usability: https://abookapart.com/products/voice-content-and-usability
Publisher: https://abookapart.com/
Preston on Twitter: https://twitter.com/prestonso
Preston’s Website: https://preston.so/
Preston on LinkedIn: https://www.linkedin.com/in/prestonso/
Oracle: https://www.oracle.com/
Previous episode – 🪓 Headless CMS, Decoupling Drupal with Gatsby, & Conversational Design with Preston So https://www.thundernerds.io/2020/06/headless-cms-decoupling-drupal-w-gatsby-conversational-design-w-preston-so/
Ask GeorgiaGov: https://georgia.gov/chat
Google Cloud Dialogflow: https://cloud.google.com/dialogflow
Diglossia: https://en.wikipedia.org/wiki/Diglossia
Word by Word: The Secret Life of Dictionaries: https://www.amazon.com/Word-Secret-Life-Dictionaries/dp/110187094X
Conversations with Things: UX Design for Chat and Voice: https://www.amazon.com/Conversations-Things-Design-Chat-Voice/dp/1933820268/ref=sr_1_1
Invisible Man: https://www.amazon.com/Invisible-Man-Ralph-Ellison/dp/0679732764
Gatsby: The Definitive Guide: https://preston.so/books/gatsby/
Hosts:
- Frederick Weiss: https://twitter.com/FrederickWeiss
- Brian Hinton: https://twitter.com/mrbrianhinton

📜 Transcript

Brian Hinton: [00:00:00] I’m Brian Hinton.

Frederick Weiss: and I’m Frederick Philip von Weiss.

And thank you so much for consuming Thunder Nerds, a conversation with the people behind the technology who love what they do

[00:00:46] Brian Hinton: and do tech good.

[00:00:52] Frederick Weiss: Yeah, thanks, everybody, for watching the show. If you can, please go to the notification bell and subscribe.

Brian Hinton: We’d like to thank Auth0. Auth0 is this season’s sponsor. They make it easy for developers to build custom, secure, and standards-based

out their YouTube and Twitch under the username Auth0, with some great developer resources and streams, and last but not least is Avocado Labs.

[00:01:43] I love that name. It's an online destination, run by their developer advocates, that organizes some great meetups. Thank you, Auth0.

[00:01:52] Frederick Weiss: Yes. Thanks, Auth0! Let’s go ahead and welcome our guest.

[00:01:50] Frederick Weiss: Thanks so much, Brian. So with that being said, and without further ado, let's go ahead and get to our guest and welcome him back. We have the author of the new book VOICE CONTENT AND USABILITY, Senior Director of Product Strategy at Oracle, and speaker Preston So. Preston, welcome back to the show!

[00:02:17] Preston So: Hey Frederick. Hey Brian. Thanks so much for having me back on Thunder Nerds. Might I say, it’s a real pleasure to be back here one more time to talk about my new book. Thanks for having me.

[00:02:26] Frederick Weiss: I appreciate it. We started a little late, and you had an event that you were just doing. Do you mind telling us a little bit about that event?

[00:02:32] Preston So: That I will. First and foremost, my dear apologies to everyone who was waiting for this livestream. I had the misfortune of forgetting to send out a confirmation email, an email that actually said, let's say, "Hey, this event is happening today." So we started a bit late and we ended a bit late.

[00:02:51] It was the launch event for my new book, which is here, Voice Content and Usability. We had a great time discussing the implications of voice interfaces for those of us who work with the web, which I think is a lot of us in the Thunder Nerds audience, as well as the implications of voice for our society.

[00:03:13] And of course, there was the vaunted and traditional book cake, which is something that everyone at A Book Apart, my publisher, has to unveil as part of launching a new book. It was a very interesting process, but I'm very sorry to those who were waiting on this YouTube stream.

[00:03:33] Frederick Weiss: Oh, sorry. Did you say a book cake?

[00:03:35] Like, literally a cake?

[00:03:35] Preston So: Yeah. Book cake. Maybe I'm saying too much. Gotcha. Yeah. Not like "everything is cake"? Oh, it's all cake as well, yes, but it's a book cake, because you're basically supposed to have a cake that looks like your book and represents your book.

[00:03:55] Yeah. So it was a great launch event, and it was a real pleasure to share a little bit about the process I went through writing the book and some of the really exciting things I talked about. Love that.

[00:04:09] Frederick Weiss: And speaking of the book, we're going to be giving away three copies of the ebook today, courtesy of A Book Apart.

[00:04:16] Just chat with us, ask us your questions, maybe tell us you want a book, and we're going to randomly give away some books as the show progresses. Preston, first let me talk to you a little bit about you being with us last time, promoting your last book.

[00:04:35] Decoupling Drupal. Am I saying that correctly?

[00:04:38] Preston So: Yes. Decoupled Drupal in Practice.

[00:04:42] Frederick Weiss: How successful was that, and how did it prompt you to start writing a new book? You just wrote that book not too long ago, and all of a sudden you have another book. So I see a pattern: every year, a new book.

[00:04:54] Preston So: I wish I could come out with a new book every year, like, someone would say, R.L. Stine of Goosebumps or something like that. But this has been a really interesting process, because my books tend to be very focused on really technical aspects of the ways in which we work with our content and the ways in which we work on the web.

[00:05:19] The first book I wrote, back in 2018, was Decoupled Drupal in Practice. One question I get a lot, and I'm definitely happy to answer it for those on the call or in the audience, is: what is it like, as a technologist, to write a book? Especially for those who are developers or designers.

[00:05:38] So this book is actually my first book that is not a coding book, not a technical book. It doesn't have any code snippets in it. There are a couple of code-formatted sections that are really tiny, but it doesn't have any sort of tutorial on how to spin up a command-line interface or things like that.

[00:05:58] It's really focused on the user experience and design audience and the accessibility audience, which is very different from the audiences I'm used to writing for. What's interesting is that Decoupled Drupal in Practice is about the architectural underpinnings, the foundation of how you can deploy content.

[00:06:16] That's oriented towards things like JavaScript applications or other sorts of environments, like voice interfaces, and it really dives into the nitty-gritty. Voice Content and Usability, however, is really unlike that, because it focuses on how we as designers, as user experience professionals working on usability testing or usability research, can engage with this new field that is emerging around voice interface design, and specifically around things like voice content strategy and voice content design.

[00:06:49] But the other thing I will say is that I actually made the mistake, or had the privilege, or some would say the misfortune, of writing two books at the same time over the past year and a half. The other book I've got coming out this fall is Gatsby: The Definitive Guide, which is about Gatsby.js, the static site framework.

[00:07:08] So, right back in the other direction.

[00:07:10] Brian Hinton: So you're going to write...

[00:07:12] Frederick Weiss: Every other year.

[00:07:14] Brian Hinton: Three a year.

[00:07:15] Preston So: I was thinking more of a Fibonacci sequence, actually, Brian. I think I should write five, and then eight, and then 13. Yeah, they might get a little shorter, and they might be filled with some more memes.

[00:07:25] Brian Hinton: So why Voice Content and Usability? Why did you say, okay, now I really think I need to write this?

[00:07:34] Frederick Weiss: Yeah. Specifically, too, if I could append to that point, Brian: you said yourself that you moved away from a coding kind of thing. Why go that way, into accessibility?

[00:07:44] Preston So: I've always been really into web development, but my real core interest and passion has always been design and user experience. I started out as a web designer, and as a print designer. I also did computer programming back in those days and got into web development that way.

[00:08:02] But there was still an itch I had to scratch: this aspect of design and user experience that goes beyond the web. And I've always been interested not only in how we can serve the users who interact with the content we produce, or the experiences we create, in technology beyond the web.

[00:08:27] I was also really interested in how we can best serve users who already exist, users who are already within the demographics of the audiences we're trying to serve. I've always been interested in web accessibility first and foremost, as well as how accessibility really changes the ways we think about other user interfaces that might not have gotten as much attention in terms of how they can better serve disabled users and elderly people, who might have a little more trouble, for example, using a mouse or typing on a keyboard. Those two audiences, specifically the elderly and disabled communities around the US, were communities we aimed to serve with the first-ever voice interface for residents of the state of Georgia.

[00:09:15] I worked on Ask GeorgiaGov, which had the specific goal of serving residents of the state of Georgia who want to find out things like how to register to vote, how to get a small business loan, or how to renew their fishing license, without necessarily incurring the cognitive costs of either interacting with a screen-reader-driven website or interacting with, let's say, somebody in person at an agency office.

[00:09:45] One of the really interesting, and I think really unexpected, insights we found concerns the websites we build. Nowadays, because so many people use the web, because disabled folks use screen readers, and because so many people are now used to the paradigm of the web, we obviously think the website is the gospel of how people should consume content and how people do consume content.

[00:10:04] But one of the things that's been borne out by this project is that the kinds of things people would ask an Amazon Alexa sitting in their own home about the state of Georgia and the government capabilities available to them were completely different.

[00:10:26] In some cases, they were diametrically opposed to the sorts of queries people would search for on the georgia.gov website, which was the ultimate source of all of the information we used. That really illuminates a bit of the hidden bias we have towards the website as the primary conduit for information, when in some ways it should be considered just one facet of a wide variety of ways to access our content equitably.

[00:10:56] Frederick Weiss: So then what do we do? Are we expected to have multiple locations for our content? Like, specifically, I'm going to build content for voice, I'm going to build content for a website, and I'm going to build content that goes into an application.

[00:11:14] Or does it behoove us to write content that is uniform, maybe written in a specific way (and possibly you might answer in what way that might be), as one source of truth?

[00:11:31] Preston So: That's a really challenging question. And obviously I shouldn't go too far here without saying that some of those questions are answered in my book, Voice Content and Usability, from A Book Apart.

[00:11:42] Please don't give everything away, just a little bit. Could you read the whole book out loud, please? We'd be here all day. Yeah. So what I will say is that this is the perennial debate, right? One of the things we as designers struggle with, as we deal with this exploding menagerie of user experiences, is: what do we do with our data?

[00:12:10] What do we do with all of these things we've built that are very much oriented towards the audiences we've cultivated over time, namely our websites and mobile applications, which serve very visually rooted experiences and demographics used to those visual experiences? This is one of the things that was really problematic about some of the approaches characteristic of the early days of voice content.

[00:12:37] Let's say, when people were experimenting with voice interfaces or chatbots as a means to deliver a certain type of content, you would have a parallel version of the information already housed in your website. Those of us who are content designers or content strategists can really feel the pain of having one set of content over here, in one silo, destined for the website, and another set over there, destined for a voice interface.

[00:13:04] How do you keep those two things in sync? Now that we have regulations like GDPR and HIPAA, for example, that really make it obligatory that content stays current and up to date, how do we make sure all of this content stays up to date without having it in a single source of truth?

[00:13:24] Now, my book definitely doesn't prescribe going in one direction or the other, as in "you must do it this way" or "you must do it that way," because there are exceptions to everything and nothing is ever cut-and-dried. However, I generally err on the side of saying: look at what we did with the state of Georgia. Georgia.gov actually insisted that we use one single source of truth for content, an omnichannel or channel-agnostic source of truth, because ultimately a lot of us don't have the luxury of maintaining multiple versions of content destined for multiple conduits.

[00:14:00] So we ended up keeping it all in one source and maintaining it all in one place, with both the voice and web versions of the content pulling from the exact same repository. That ends up being more scalable in the long run, especially now that Georgia has built an additional written, textual chatbot that also pulls from the same content.
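As a rough illustration of the single-source-of-truth approach described above (not the actual Ask GeorgiaGov implementation, which the conversation doesn't detail), this sketch stores each piece of content once and renders it separately for the web, a pure voice interface, and a text chatbot. `ContentItem`, the three render functions, the `repository` dict, and the example.gov entry are all hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ContentItem:
    """One channel-agnostic piece of content, stored exactly once (hypothetical model)."""
    title: str
    summary: str
    url: str


def render_web(item: ContentItem) -> str:
    # The website gets markup and a clickable link.
    return (
        f"<article><h2>{escape(item.title)}</h2>"
        f"<p>{escape(item.summary)}</p>"
        f'<a href="{escape(item.url)}">Learn more</a></article>'
    )


def render_voice(item: ContentItem) -> str:
    # A pure voice interface has no screen, so drop markup and links
    # and return plain sentences suitable for text-to-speech.
    return f"{item.title}. {item.summary}"


def render_chat(item: ContentItem) -> str:
    # A text chatbot can show the link as plain text.
    return f"{item.title}: {item.summary} ({item.url})"


# Hypothetical placeholder content; all channels read from this one store.
repository = {
    "fishing-license": ContentItem(
        title="Renewing a fishing license",
        summary="You can renew your fishing license online.",
        url="https://example.gov/fishing-license",
    ),
}
```

Editing `summary` once updates the web, voice, and chat output together, which is the synchronization problem that parallel silos of content create.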

[00:14:25] Brian Hinton: I'm curious: over the course of your research and writing of this book, was there anything that shocked or surprised you, that you didn't immediately realize?

[00:14:33] Preston So: Yeah. It's a great question, Brian. There are too many to list, because one of the things that's really tough about voice interfaces is that, up until recently, it's been really challenging for those who are not computational linguists or machine learning engineers, or people deeply involved in these very low-level technologies, to get involved with voice.

[00:15:08] However, one thing I will share is that some really interesting foibles emerge in voice interface design when you start working with this technology, and they are very reminiscent of back in the day.

[00:15:27] Those of us listening to Thunder Nerds who have worked on the web for a while will recognize, for example, the things we used to deal with in the early or mid two-thousands, like quirks mode compatibility or some of the really odd browser hacks we had to do with CSS. There are weird things like that in voice interfaces.

[00:15:46] One example I'll share, and I'll keep it to just one, is from when we built Ask GeorgiaGov, the voice interface for residents of the state of Georgia. There was a situation where we had a retrospective. One of the things Georgia wanted was the ability to administer and manage all this content in one single place.

[00:16:08] And we had a parallel set of logs and reports that sat right next to the logs and reports for the website. So whenever somebody hit a 404 error on the website, they could compare and see how many times that piece of content also errored out for the voice interface on Alexa: were there situations where the search returned no results, or where it triggered 404 errors on the content management system we were using to serve both the website and the voice interface?

[00:16:39] So we had this retrospective about eight months after the launch of the interface, which was in 2017. We had a discussion about the logs, leafed through them, and asked: what are some of the errors we're seeing, and what can we do to either adjust the content or maybe even debug the interface itself?

[00:17:00] There was this one result that kept on coming up over and over again: one 404 error, basically a search that somebody conducted that returned no results, no content. It was the word "Lawson's," L-A-W-S-O-N-apostrophe-S, and it kept on popping up, about 16 times in the log, if I remember correctly.

[00:17:20] We thought: who is searching for this? Who wants to search for this proper noun, this brand name, this person named Lawson? Did they get this confused with a different kind of application on their Alexa? We sat there and scratched our heads for a few minutes.

[00:17:38] And one of the native Georgians in the room suddenly perked up and said, "You know what? I think it's somebody from Georgia with a Southern drawl who is trying to say the word 'license,' as in driver's license or nursing license or fishing license." And sure enough, that was exactly what happened.

[00:17:57] This is one of those situations where, hey, you can build the best-designed application, one that adheres to the latest and greatest standards and specifications like we did back in those days with CSS, and come within an inch of perfection with these custom voice interfaces we build.

[00:18:17] But ultimately, it's in the hands of companies like Amazon or Google whether they can actually understand the kaleidoscope of American English dialects we have in this country, which they really should be able to understand. And I think it's a really good sign that, yes, these voice assistants are really good, but they're not yet at the point where they can beat us at our own game of human conversation. Yeah.
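The retrospective workflow described above, comparing zero-result searches on a channel and then spotting a recurring misrecognition, can be sketched roughly as follows. The log format, the `failed_queries` and `normalize_query` helpers, and the `ALIASES` table are hypothetical; a hand-curated alias map is only a local workaround, since the speech recognition itself remains in the platform vendor's hands.

```python
from collections import Counter

# Hypothetical log records: (channel, query text, number of results found).
search_log = [
    ("web", "fishing license", 3),
    ("voice", "lawson's", 0),
    ("voice", "lawson's", 0),
    ("web", "small business loan", 2),
    ("voice", "register to vote", 1),
    ("voice", "lawson's", 0),
]


def failed_queries(log, channel):
    """Count queries on one channel that returned no content (the '404s')."""
    return Counter(query for ch, query, found in log if ch == channel and found == 0)


# Hypothetical aliases added after a retrospective identifies a recurring
# misrecognition of a dialect pronunciation.
ALIASES = {"lawson's": "license"}


def normalize_query(query):
    """Lowercase the query and map known misrecognitions to the intended term."""
    q = query.strip().lower()
    return ALIASES.get(q, q)
```

Running `failed_queries(search_log, "voice").most_common(1)` surfaces the recurring failure at the top of the list, which is the kind of signal that prompted the team's discussion.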

[00:18:40] Frederick Weiss: If you don't mind, really quick, Brian, this brings me to something Todd Libby wrote here, and he also appended to his question: were there challenging edge cases with respect to a11y that you ran into on the Georgia project?

[00:19:00] Preston So: Yeah. Great question, Todd. When it comes to the work we did on accessibility on Ask GeorgiaGov, in terms of edge cases, there were several challenges, right?

[00:19:20] One of those challenges is inherent to pure voice interfaces, which I and others define as voice interfaces that lack a screen. There's no visual component and no tactile or physical cues. Yeah, not a GUI. You're basically just interacting with somebody through the spoken word.

[00:19:37] This is not really an edge case, so I don't want to say that this answers the question, but one thing a lot of people forget, and that is really important to keep in mind when working with voice interfaces to extend the accessibility of your content or your web properties, is that pure voice interfaces lacking a visual or physical component are actually not accessible to certain disabled people, namely those who are deaf or deafblind.

[00:20:10] The notion a lot of people have today is that voice interfaces can solve a lot of cases for accessibility, but that's really not the case, because for so many of the demographics we need to serve in the disabled community, certain solutions only go part of the way there.

[00:20:34] Yup. That's exactly right. How do we also make sure we can serve content in a consumable way to refreshable braille displays, which is not necessarily the same thing as, let's say, a screen reader experience that's very rooted in the visual structure of a webpage?

[00:20:52] It's still very early days for multimodal accessibility, for really making sure that the user interfaces we have aren't stepping on the toes of other folks who access content in particular ways.

[00:21:12] The edge case I will share, however, is that a lot of people also assume these voice interfaces and voice assistants can be the ultimate solution for folks who are blind or have low vision. That's a tough sell in some ways, because one of the things that's really important to recognize about pure voice interfaces like Alexa is that they have a learning curve too.

[00:21:38] We know that screen readers like ChromeVox or JAWS require people to ascend a very steep learning curve to use them effectively and efficiently, and voice interfaces are very much the same way.

[00:22:01] One of the things we didn't necessarily expect during our usability testing was that a lot of the people who came in and went through our usability study had very little experience with Alexa devices. For those looking at voice interfaces as a compelling potential sidelong alternative to screen readers, that might be a little bit problematic. How voice interfaces efficiently guide users to their content, as the voice interface designer Chris Mari writes, is something to think about: there is still a learning curve.

[00:22:41] And how do you actually address that learning curve in a way that makes sense to the users you need to serve?

[00:22:47] Brian Hinton: Yeah, I'm curious. Here in Georgia, in my current role, we're working on a chatbot. One thing we've found most difficult is, I think it's called semantic parsing: converting that conversation into what logically makes sense.

[00:23:03] What are they asking? It's like the difference between someone saying "capital of Georgia," and that's all they say, or "what's the capital of Georgia," or "Georgia Capitol." Did you encounter anything weird in that sense, or any cases like that?

[00:23:19] Preston So: Yeah. I talk about this a lot in chapter three of my book, which is about writing the conversational dialogues that are really the latticework of the voice interfaces we produce.

[00:23:32] It's a really challenging thing, Brian, because a lot of these questions are rooted in the technology you're using. Some voice or conversational ecosystems are better equipped than others to deal with variations like the ones you just mentioned.

[00:23:49] But there is a lot of work being done to improve the situation. Back in 2016, when we worked on Ask GeorgiaGov, and in the grand scheme of voice interfaces and the history of conversation design five years ago is a long time (we might as well be talking about clay tablets and abacuses at this point), a lot of the utterances people would state had to go through a process called intent identification.

[00:24:17] Intent identification is where the interface pieces together a sense of what the user actually wants to achieve, which is much easier said than done, and it used to be very much a manually driven process.

[00:24:37] For example, let's say you're trying to identify a question like "What is the capital of Georgia?" It has to be phrased like a question, let's say. One of the things that's really challenging for a lot of people just getting started with voice interfaces is that some of these technologies obligate you to be very clear in defining how the user has to respond.

[00:24:56] And as we know as users, the ways we actually respond to these questions, and the ways we actually say these things, can be phrased completely differently from the ways we've coded the voice interfaces, conversational interfaces, or chatbots to consider.

[00:25:14] Whenever we have what's called an out-of-domain error, where the chatbot, conversational interface, or voice interface isn't able to understand what you're saying because the way you phrased it, even though it's perfectly logical, isn't accounted for in what the interface is able to understand through its programming, that's a very big problem.

[00:25:39] So I talk about intent identification and the problems that occur when you have these very dedicated slots or tokens, this teasing-out process you have to do with intent identification that relies on boilerplate templates users have to follow to say these things. But that's not how we speak.

[00:26:00] That's not natural, right? Nobody really wants to have to say things the same way over and over again to be understood by a voice interface, although there is usability research evidence suggesting that some users do prefer that. But some ecosystems now, like Dialogflow, for example, or some of the other major new conversational tools out there, are getting better at understanding all the different variations you could possibly have, intelligently parsing through them and saying, okay, this is the intent of what the user is trying to do.

[00:26:36] That holds even if the person said something very remote from, let's say, the normal or default way we would expect.
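To make the template-and-slot approach concrete, here is a minimal sketch of rigid, template-based intent identification with an out-of-domain fallback, using Brian's "capital of Georgia" example. The intent name, templates, and helper functions are hypothetical, and this deliberately mirrors the older manual style described above; tools such as Dialogflow use trained, more flexible matching that this code does not reproduce.

```python
import re

# Hypothetical intent definitions: each intent lists rigid utterance templates.
# "{state}" marks a slot whose value is captured from the user's words.
INTENT_TEMPLATES = {
    "get_capital": [
        "what is the capital of {state}",
        "what's the capital of {state}",
        "capital of {state}",
        "{state} capital",
    ],
}


def compile_template(template):
    """Turn one template into an anchored regex with a named 'state' group."""
    before, after = template.split("{state}")
    return re.compile(
        "^" + re.escape(before) + r"(?P<state>[a-z ]+?)" + re.escape(after) + "$"
    )


COMPILED = {
    intent: [compile_template(t) for t in templates]
    for intent, templates in INTENT_TEMPLATES.items()
}


def match_intent(utterance):
    """Return (intent, slot value), or ("out_of_domain", None) if no template fits."""
    text = utterance.lower().strip().rstrip("?.!")
    for intent, patterns in COMPILED.items():
        for pattern in patterns:
            match = pattern.match(text)
            if match:
                return intent, match.group("state")
    return "out_of_domain", None
```

A perfectly logical phrasing such as "Which city is Georgia's capital?" falls through to `out_of_domain`, simply because no template anticipated it.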

[00:26:45] Brian Hinton: Yeah. My favorite real-life scenario of my brain being the AI, trying to understand, is when somewhere (I can't remember where it was, the Midwest) they asked, "What Coke do you want?"

[00:26:56] And I said, "Coke." And they're like, "I'm sorry. Is that okay?" Yeah.

[00:27:04] Preston So: That's what they call it.

[00:27:07] Brian Hinton: I can't imagine dealing with that sort of scenario as an AI. Yeah. That's funny, too, because if you're trying to communicate something out through the bot or the voice technology, you've got to think about the context: the personification of this voice, or the overall brand. If I'm interacting with a hospital, I don't want the voice to sound all silly and goofy; I want it to sound like just a normal, regular voice. There are some kinds of situations where you might want something different, or even languages, for that matter. If I'm somebody in Italy and I'm looking for...

[00:27:48] Frederick Weiss: A lasagna recipe. I'm in Italy, I'm looking for a lasagna recipe, and it sends me to the Food Network and starts reading me an M roll recipe in English, and I don't understand English. There's all kinds of interesting facets to this.

[00:28:01] Preston So: Yeah, this really brings up a couple of interesting elements of how the conversation design or voice interface design landscape requires us to think very differently about some of the things we usually took for granted.

[00:28:17] One of those really is the building blocks of language. I'm very lucky in that working with voice interfaces over the past five or six years has allowed me to scratch an itch when it comes to my academic background, which is actually in linguistics. I have a degree in linguistics; not a lot of people know that.

[00:28:34] But the biggest issue I think a lot of us face is that we're moving in several directions at the same time. The first is that we're moving a lot of the ways we used to write user interface text, or content, from the written word into the spoken word, which is a very different realm from how we normally write UI text or content.

[00:28:56] Just one example to illustrate that: we don't really say the phrase "to whom it may concern" when we speak, and we also don't write the word "literally" as often as we say it in conversation. So a lot of these little nuances can easily be missed.

[00:29:17] There are two ways in which this can really be a problem. The first is that users have certain expectations that their voice interface reflects the kind of informal or colloquial conversation they might have with a friend. When it doesn't, and the voice interface comes out with a very stilted, uncanny-valley kind of utterance, it can really interrupt or dislodge the user from what is called habitability in a voice interface.

[00:29:48] This is talked about quite a bit in voice interface literature: the user has to feel like they're not going to want to tear their hair out (or what little hair they have) while having a conversation with a voice interface. So that's number one. But I think number two is really interesting, given that you alluded to some of the challenges around multilingualism and multilingual types of conversation.

[00:30:09] This really comes down to some of the elements of voice interface design that remain largely unexplored, and that are also very challenging, because so much of our conversational and voice interface technology has so far been rooted in the English-speaking world.

[00:30:30] And one of those issues is when we think about the ways in which we want

to serve multilingual audiences and international audiences on the web, we just have to provide translatable

strings, right? We just have to provide different versions of the different pieces of text that we have

or different pieces of content we have.

[00:30:48] But that is a very different kind of proposition when it comes to

some of these other languages. And I think one of the biggest issues that we have to focus on is the fact

that not all languages work like English, not all languages operate in the same kinds of systems and the

same kinds of assumptions that a lot of us have about English.

[00:31:08] And one of the things that is really interesting to me is that I'm

noticing more and more of this Anglophone privilege or Anglophone bias: a lot of the voice interfaces

that are coming out that are meant to be multilingual are direct translations of an English interface,

and that's a problem because fundamentally some languages simply do not work the same way as English.

[00:31:28] There's a phenomenon in linguistics called diglossia. And this is

something I talk about on my blog, preston.so. And this notion of diglossia is actually a phenomenon I

studied when I was in college, where the written form of a language is so vastly different from the

spoken form of a language that they might as well be considered two different dialects or two different

vernaculars.

[00:31:50] And in some cases, like Brazilian Portuguese, for example, you really have

to learn two different grammatical systems and two different lexicons and two different approaches to

the language in order to make yourself understood. Because if I went out on the street and I started

speaking in the way that I write, I wouldn't actually be necessarily understood.

[00:32:10] I'd be understood, because people would be able to understand me, but

it would be a very strange and off-putting conversation. What I find is very interesting with a lot of the

work that conversation designers are doing today is that there's a lot of focus on efficiency and

scalability, where we can build one single conversational agent or one single conversational interface that

manifests as a chatbot, as a Slack bot, as a WhatsApp bot, as a Facebook Messenger bot, and as an Alexa skill and

a Google Assistant action.

[00:32:36] But there's a big problem with that, because that assumes that the

same kind of conversation you would have with a chatbot is going to be the kind of conversation you have

with a voice interface. And one of the things that we see in linguistics and also in the kinds of

conversations that we have on a daily basis through email and texts and at the delegate.

[00:32:57] It isn't the case that our spoken conversations are word for word or

even letter for letter exactly the same as our written conversations. And for those who don't speak

English, for those who are operating in a realm where let's say that the language that

they're writing for is not English.

[00:33:16] A lot of those considerations and concerns become a lot more important

and essential when it comes to some of the design that we have to do. And I think this means that we have a

long way to go in the English-speaking world to understand how some of these conversational interfaces

really are rooted in our ways of speaking, in ways that might not be so appropriate for the rest of the

world that we need to serve.

[00:33:38] Brian Hinton: [00:33:38] Yeah, all of

this made me think of a book I recently read, Word by Word: The Secret Life of Dictionaries. And

it's a fantastic book, but it's also about slang. Like how you mentioned the different

versions of Portuguese, the slang is different. Like Mexican slang versus Spanish: Mexican slang

versus Spain

[00:34:00] Spanish slang, very different. And English slang is different too, like someone

said. And also how people will say things like "cool" versus "cool," completely different, and

how to interpret that. Yeah. Tone.

[00:34:15] Preston So: [00:34:15] Yeah. And I

think this really illustrates a couple of different things.

[00:34:18] You've got the subtext that doesn't exist in UI text or in

web content or in any of the written mediums that we have. And paralanguage sits in this realm of,

okay, how do you actually reflect back the fact that the user or the interface might be

speaking in a sarcastic tone, or in a more sincere tone, or in a very stilted tone?

[00:34:43] Like those three things can mean very different things, even though they

all use the same single sentence. But the other thing that's really interesting too, Brian, and I think

you raised a really good point there, which is it's not just the fact that we have all these

differences between languages and the ways that they operate.

[00:34:58] We also have very important differences. Like I mentioned earlier with

that Lawson's example around those of us who speak English. And one of the things that worries me

a lot about some of these voice interfaces is first of all, the fact that we hear fundamentally one

single dialect represented oftentimes in this realm of voice interfaces.

[00:35:19] And it's very similar in some ways to the ways in which newscasters

and weather forecasters used to be obligated by their organizations to speak using a Middle

American or General American dialect. It was unacceptable in certain past decades in the news media for

somebody to speak with a Southern accent or somebody to speak with a different dialect of American English

on the air.

[00:35:43] And that's something that's represented now in voice interfaces,

in a way that is both very limiting and very pernicious. Because as we know from interacting with so many different

people from so many different walks of life, we have examples of people who might be

bilingual, or who might be members of queer or trans communities who have to switch between different

modes of speech, or those who are bilingual descendants of immigrant communities who have to be able to

code-switch between English and Spanish. Why aren't those sorts of interesting toggles and those sorts

of interesting nuances

[00:36:17] represented in voice interfaces too? Because maybe the kind of

conversation that I want to have is the kind of conversation that I would have at home in New Delhi, where

I'm switching between English and Hindi mid-sentence, or I'm switching between English and what

I think mid-sentence. So these sorts of considerations are not only important for those who are users of

English outside of America, which I think is one example of the America-centric approach that we

often have with technology all over the place.

[00:36:46] But also the fact that we have very marginalized,

underrepresented, and oppressed groups of people in the United States who speak in certain

ways that are not reflected in how we want voice interfaces to speak as well. And I think two very

compelling examples of this are, first of all, the fact that the ways

in which people use AAVE, or African American Vernacular English, are very different from the sorts of

voice interfaces that we interact with.

[00:37:14] For example, why is it that we can't hear those sorts of

conversations represented in an Alexa device? It has something to do with the intrinsic bias that a lot of

us have for a more Middle American or General American approach to the conversations that we have, which is,

of course, fundamentally and foundationally a white American form of speech.

[00:37:33] And by the same token we know that those who identify as LGBTQ

have very different approaches to using certain language. There are certain code terms. There are

certain colloquialisms that are really not understood by audiences that are outside of that community. And

how do we make sure that voice interfaces can also represent those things?

[00:37:54] And this ties back to one of the things I talk about in the final

chapter of my book, which really is focused on the problems that surface, which we don't consider, when we

go willy-nilly into this realm of voice interfaces and serving people through conversation in ways that we

don't expect. And one of those examples is: think about why organizations today, think about why

it is that so many people want to get into voice interfaces and want to get into chatbots in the first

place.

[00:38:20] So many people are doing this because these airlines, hotels, large

companies, and corporations fundamentally want to be able to reduce the load on their customer service

frontline agents, or those who are call center staffers. But if you think about it, who are these call center

staffers? Who are these people who answer your call when you're calling them in the middle of the

night from the airport, screaming about your lost luggage or screaming about your canceled flight?

[00:38:44] It's somebody who might be in the Philippines, or somebody who might be

in India, or somebody who might be in the Global South, a person of color who is from a lower-middle or

middle-income country, who doesn't have the resources necessary to speak in a General American dialect in

the same way that you would expect somebody who's from your own community to speak.

[00:39:03] And this really illustrates a very big concern, I think, in voice

interfaces today, which is: when we begin to sterilize and flatten out all of these rich nuances that make

our conversations with all of these different people, from all of these different lived experiences, so

important to our worldview and to the ways in which we interact with the world.

[00:39:26] What does that do to our future as users? What does that do to our level

of trust in our user interfaces? What does that do to the credibility and authority of those user interfaces

and the information that they provide? Because let me be honest: when I think about the fact that a voice

interface might lead to a Filipino call center worker, or somebody in Mumbai who is in a call

center, losing their job,

[00:39:52] I'm not so sure that I want that replacement to be this uncanny valley

voice that is very stilted and mechanical and might not necessarily reflect the world that we live in today.

And I think this really ties into a lot of the issues that we face around misinformation and automated

racism and algorithmic oppression that we see around machine vision and so on and so forth. Voice

interfaces, voice technology, and conversational technology:

[00:40:18] these are also domains that are not exempt from the issues that we have in

society. Yeah,

[00:40:24] Frederick Weiss: [00:40:24] we start

losing the quality of humanity and what humanity is. But I know you were talking

a lot in chapter six about the future. Are there any brighter notes that you could

note? No?

[00:40:41] Frederick, there's not. Yeah, I don't want to go down the Matrix

road, but are there any cool new things that we could be looking forward to, or things

that we could start thinking about now that would be advantageous for us, so we could go, oh, you know what, let me

start thinking about this next year for my projects?

[00:41:01] Preston So: [00:41:01] Yeah,

absolutely. There's so much to think about. And obviously I wouldn't have written this book if

I thought it was going to be a dystopian nightmare in the next few years or next few decades, because

voice technology really does have a lot of illuminating and very interesting prospects,

and I think there are really important things to call out there.

[00:41:19] Not just the facts. And this is not something I mention very much in my

book, but I do mention it very briefly in my A List Apart article, "Usability Testing for Voice Content,"

which is that there are a lot of people who really appreciate voice interfaces for one unexpected reason.

And that is that, I think, as we all know, a lot of us, especially over the course of the last year and a

half.

[00:41:42] And I do want to make sure to hold space for those who are

still dealing with grief or suffering right now from the consequences of the coronavirus

pandemic, especially, of course, in India, and Australia currently going through a very severe lockdown,

and the third wave ongoing in Africa. Voice interfaces have been shown to stave off loneliness

for a lot of people.

[00:42:05] There is research that suggests that having a voice interface that is

there to have a conversation with is something that could be very beneficial for mental health. And in

the future, as these conversations become better and better, as voice interfaces get to the point where they

can do much better small talk than these really simplistic, let's say gimmicky, responses that

they often issue.

[00:42:28] I think we can really look forward to a lot of interesting, let's say

social benefits from voice interfaces. The other one, though, I think, is also the fact that there

is going to be more efficiency when it comes to content delivery and information delivery. There's a

futurist named Mark Curtis who refers to what's called the conversational singularity.

[00:42:47] And we know about the kind of tech or AI singularity, the conversational

singularity is along the same lines, which is this notion that as we move further and further into the

future, there's going to be a point in time where conversational interfaces will be indistinguishable

from other humans when it comes to the kind of conversation that we three are having right now.

[00:43:09] And one of the things that I think is important to call out, of course, as

well: okay, that's a great kind of future, but the conversational singularity is going to be

indistinguishable for whom, right? Whose conversations are going to be indistinguishable,

as I was just saying earlier? But I think one of the really interesting things about the

conversational singularity and some of the

[00:43:27] let's say conversation-centric approaches that are coming out

is that they wash away some of the weird distortions that we have today, some of these arbitrary lines in the sand

that we have, where you talk with a certain Alexa skill or a certain Google Assistant, and they can only

help you with this one certain task.

[00:43:43] They can only help you order a pizza, but they can't help you book a

flight. These sorts of interactions will soon become smoother, because you know what, maybe I do want

to, just like I would with a hotel concierge, actually have a conversation that moves

directly into ordering a pizza.

[00:43:58] With extra pineapple as it should be. And then directly into booking a

flight over to my favorite vacation destination. So a lot of these efficiencies are going

to become very important in the future. And I think what's going to happen in the next few decades is

we'll start to see ways in which, okay.

[00:44:16] Yeah. Some of these issues that we have with how conversational

interfaces work or reflect the world that we live in back at us are going to become better in terms of

the efficiency and ultimately the performance of user interfaces, in the same way that websites and mobile

applications have become much more efficient and much more able to get us over to the things that we want to

do.

[00:44:41] Frederick Weiss: [00:44:41] I

remember at a Google I/O, they had, what was the assistant that called to book a hair appointment for

somebody? And they were like, oh yeah, it's completely indistinguishable from a person. That's

wrong.

[00:44:53] Preston So: [00:44:53] I can totally

tell I'm saying,

[00:44:57] Frederick Weiss: [00:44:57] yeah, I,

yeah, I think you could tell, but they said if you're on a phone call, you have

things in the background, you're trying to get through things quickly, and you're like, yeah,

whatever.

[00:45:09] Yeah. It could work, I'm sure, one day. Like you said, it'll get to be like

that movie Her, with Joaquin Phoenix and Scarlett Johansson.

[00:45:17] Preston So: [00:45:17] Yeah.

Who among us hasn't accidentally answered an automated phone call that sounds

exactly like a conversation, one of those spam calls that we're all besieged by lately, and answered a

question because it sounded so real? Or, perish the thought, and this is going to be very

revealing,

[00:45:32] I think we've all done this: you accidentally answer somebody's

voicemail automated message saying, "Hey, it's Preston." "Oh, hey!" "Leave a message at the

tone." "Oh, wait. Okay."

[00:45:43] But yeah, I think it's a really exciting time, and I think

that one of the things that's important, and I think this book is very timely, right?

Because one of the things I will admit is that when this book was first being germinated as an

idea, I thought it might be a little early, because this project that we did for Georgia was very

early in its time.

[00:46:03] It's one of the first-ever content-driven, information-driven voice

interfaces. It's also one of the very few examples of state governments and local

governments doing this kind of work at the time, too. But now I think it's very timely, because

one of the things that we've seen over the course of the past year and a half is smart speakers and

smart home systems.

[00:46:24] Everyone's buying them, they're flying off the shelves and

increasingly here, as we re-enter the world or live with the virus as it continues to be a problem for

so many of us in the world, we're starting to get used to some of these other ways of interacting with content,

other ways of interacting with information, with use cases and applications that we need to actually go

through.

[00:46:48] And voice is just one of those. And I think we're going to see a

lot more investment and a lot more care from the user experience side, not just the developer side,

in this realm of, okay, we've done this for the web, and the web has served us really well for the

last few decades, but how do we actually make sure that some of these more multimodal approaches, as we

mentioned earlier on accessibility, or some of these more interesting voice and aural and

immersive approaches can be things that will be compelling for users and designers and practitioners in the

future as well?

[00:47:25] Frederick Weiss: [00:47:25] Makes

sense. What do you think, Brian? Or should we go to the lightning round? Yeah. Yeah.

[00:47:31] We're

[00:47:31] Brian Hinton: [00:47:31] getting

close to the end here. So we're

[00:47:33] Frederick Weiss: [00:47:33] flying

rats on. I've got my gloves on. Let's go ahead.

[00:47:37] Brian Hinton: [00:47:37] Yeah. So

we're each going to ask you a question, and you answer them one at a time. And I'll go first.

So would you rather be able to run at a hundred miles per hour or fly at 10?

[00:47:49] Preston So: [00:47:49] I have to

think about this one. Probably fly. And it's, yeah, it's because you can see more. Yeah.

That's fair.

[00:47:59] Frederick Weiss: [00:47:59] Preston.

What is your favorite thing about yourself?

[00:48:04] Preston So: [00:48:04] Oh my gosh.

Oh my gosh. These are some questions y'all really, I don't remember the last lightning round

being like this. I think my favorite aspect about myself is that I have learned a lot and I've had the

privilege of living in many different countries, which not everybody has the privilege to say.

[00:48:28] And that's given me a lot of good perspective. I'll say that.

Would you rather live where it snows all the time or where the temperature never falls below a

hundred

[00:48:39] degrees? Wow. This is like Snowpiercer versus thread 3d or something like

that. Definitely. So I'm somebody who needs... so right now I am in an air-conditioned room, even though

it's actually not that hot of a day here in New York City. I need the cold; I cannot deal with the

heat.

[00:48:57] And so, yeah, it's definitely snowing all the time. I could

probably be okay in Antarctica, actually, I would say. Okay,

[00:49:06] Frederick Weiss: [00:49:06] Preston,

what book are you currently reading, to learn from, that you're

[00:49:12] Preston So: [00:49:12] enjoying. All

right. I'm currently reading three different books. Not really making much progress

in any of them; it's like the Fibonacci sequence of reading books, increasing the number

every year.

[00:49:28] One book that I'm reading, which I will share, which is a very

esoteric book right now, is Bosnian, Croatian, Serbian: A Textbook, because I'm learning

Serbo-Croatian at the moment as a language. But I'm also reading two other books that are really

interesting. The first is Conversations with Things, which is a book written by Rebecca Evanhoe,

[00:49:49] and I forget the co-author's name. I have it right here. I should look at

it. As well as Margot Bloomstein's book Trustworthy, which is a book about how brands can be

more authentic in how they operate in terms of content strategy.

[00:50:09] Cool.

[00:50:09] Brian Hinton: [00:50:11] What

current fact about your life would most impress your five-year-old self?

[00:50:19] Preston So: [00:50:19] Oh my God.

Wow. My five-year-old self. Got it. I thought that was an easy question. You answered it last time. Did I

really? Oh my gosh. Let me think. The fact about myself, that people I think

the fact that my five-year-old self would most be impressed by is the fact that, oh my gosh

after the fact.

[00:50:48] Frederick Weiss: [00:50:48] I

remember last time you said moving to New York and working in New York was one of your childhood dreams,

[00:50:54] Preston So: [00:50:54] giving them

the answers. That's really funny, 'cause that's not what I would say to myself,

actually. That's really interesting. You know what I'll say is this, actually. I think

this is an interesting one because, just to get a little personal here, when I was... and a

lot of us dealt with this when we were younger. A lot of us, as children, as young toddlers or as

young

[00:51:14] kids, we deal with speech impediments or other issues with,

let's say, pronouncing words correctly, or doing those sorts of things. And I grew up

with a speech impediment, which also makes some of the voice technology stuff really poignant.

So what I would say is my five-year-old self would definitely be very proud of me for the fact that I

can basically go on stage in front of 3,000 people and not break a sweat, or have this live stream

with also 3,000 people.

[00:51:43] Of course, there are 3,000 people listening to this right now. And

not break a sweat either. Yeah. So, a personal note there.

[00:51:52] Frederick Weiss: [00:51:52] Nice.

What is the most interesting thing that you learned in the process of writing

[00:51:58] Preston So: [00:51:58] this book?

Most interesting thing that I learned in the process of writing this book... the most interesting thing I

learned in the process of writing this book is probably the

[00:52:10] unexpected applications of accessibility, and unexpected challenges around

accessibility that occur with voice interfaces, especially given the fact that I think a lot of us are

accessibility aficionados, or those who are really passionate about accessibility. I think

we often forget that

[00:52:31] not only are there so many different types of interfaces that

we need to consider, but the interface that has become the most important one today, which is

the screen reader for websites, is actually not necessarily the most optimal or pleasant experience. And

I already did have a sense of this because I do a lot of... and this is

one thing I think everybody should do: you should always take

[00:52:57] whatever sort of user interface you're building and use it from the

perspective of somebody who's using a screen reader or somebody who's using an assistive

interface, because it is very important to understand how people work from that perspective.

But one of the things... so I already knew that screen readers were really very tough, but I guess

one of the things that I didn't necessarily realize is just how much people actually really

don't like the screen reader sometimes, and really see it as an obstacle to getting to what they

need.

[00:53:28] That was a very long answer.

[00:53:31] Frederick Weiss: [00:53:31]

That's okay.

[00:53:35] Brian Hinton: [00:53:35] What book

has made you cry?

[00:53:41] Preston So: [00:53:41] What book has

made me cry? Gosh. Yeah, that's a really interesting question. Wow.

There have definitely been many books that have made me cry. I would

say the book that both made me cry and made the deepest impact on me is probably, oh my gosh.

I'm just trying to think about this now because yeah the, what I will say is the book

that has definitely made the biggest impact on me and made me cry.

[00:54:22] Both of those were probably Invisible Man, which is a book that I

recommend everybody read. It's one of those books that you read in high school or college

English class, but it's a very important book and something that I think everybody should

definitely read. Let me

[00:54:40] Frederick Weiss: [00:54:40] I'm

out of lightning round questions.

[00:54:43] Brian, do you have anything else on that? Oh, no, I think we're good.

Great. Let's get to our final topic here at the end, Preston. We like to ask our guests for parting

words of wisdom, any kind of things that you'd like to tell our audience at the end.

[00:55:01] Preston So: [00:55:01] Yeah.

Great question. I think my biggest parting words of advice for everyone, and this is not just

those who are in the design field or who are in the technology world.

[00:55:16] But I think one of the things that I would recommend for everyone who is

watching this or listening to this, or will watch or listen to this, is that it's really important

to really listen to, uplift, and amplify, and also hear and take into account in your own day-to-day work

and your own day-to-day life,

[00:55:41] the lived experiences of those who are completely unlike you. And by

completely unlike you, I mean all of those people who face multiple axes of marginalization or oppression,

or who face very deep obstacles in our world today, who might be disabled, might be women or femmes, might

be people who are queer or trans, might be people of color, who are Black or Indigenous.

[00:56:09] And I think one thing that is really important to me, and one thing

that's very important to the way I live my life, is to really deeply understand where everyone

is coming from in terms of their context and in terms of how they have come to be the person that they are

today. Because ultimately as practitioners of technology, as those who work on technology, the ultimate

reason we're doing this is to help everybody who is our audience succeed with what they're

doing.

[00:56:42] And there's no way to do that unless you really deeply understand and

take the time to learn about and comprehend what it is that your audience goes through in any field

that we work as people in this world that we live in.

[00:57:02] Frederick Weiss: [00:57:02] Very well

said. Thank you, Preston. Again, all your social links: we have @prestonso

[00:57:07] on Twitter, your website is preston.so, and prestonso on LinkedIn. And of course the

new book, Voice

Content and Usability, by Preston So. Get it

there today. Preston, again, thank you so much for being on the show. We really appreciate it.

[00:57:32] Brian Hinton: [00:57:32] No, thank

you for taking the time.

[00:57:35] Preston So: [00:57:35] Thank you both

so much. It was such a pleasure to be here on Thunder Nerds again, and I'd love to come back sometime.

Maybe I'll rehearse some lightning talk or lightning question responses for next time, but thanks

so much for having me. I appreciate it. Yep.

[00:57:47] Frederick Weiss: [00:57:47] Thank

you!

[00:57:48] Yeah, for the next book. We'll see you then. Thanks, everybody. Oh,

hold on, I got one last comment. Let's see. Todd says, "Thank you for all the

phenomenal conversation." Thank you so much, Todd. Thank you, everybody, for watching. Really appreciate

it. Take care, everyone.

If you have questions, or suggestions to modify the transcript, PLEASE let us know at

[email protected]


By Frederick Philip Von Weiss, and Brian Hinton
