In this episode, we get to talk with Preston So, Senior Director of Product Strategy at Oracle. We talk to Preston about his new book VOICE CONTENT AND USABILITY. We discuss the concepts of building conversational designs that are ethical, accessible, and usable.
✨ Episode Sponsor
- Auth0: https://auth0.com
- Auth0 on YouTube: https://www.youtube.com/auth0
- Auth0 on Twitch: https://www.twitch.tv/auth0
- Auth0 Avocado Labs online meetup events: https://avocadolabs.dev/
🔗 Episode Links
- Preston’s new book – Voice Content And Usability: https://abookapart.com/products/voice-content-and-usability
- Publisher: https://abookapart.com/
- Preston on Twitter: https://twitter.com/prestonso
- Preston’s Website: https://preston.so/
- Preston on LinkedIn: https://www.linkedin.com/in/prestonso/
- Oracle: https://www.oracle.com/
- Previous episode – 🪓 Headless CMS, Decoupling Drupal with Gatsby, & Conversational Design with Preston So https://www.thundernerds.io/2020/06/headless-cms-decoupling-drupal-w-gatsby-conversational-design-w-preston-so/
- Ask GeorgiaGov: https://georgia.gov/chat
- Google Cloud Dialogflow: https://cloud.google.com/dialogflow
- Diglossia: https://en.wikipedia.org/wiki/Diglossia
- Word by Word: The Secret Life of Dictionaries: https://www.amazon.com/Word-Secret-Life-Dictionaries/dp/110187094X
- Conversations with Things: UX Design for Chat and Voice: https://www.amazon.com/Conversations-Things-Design-Chat-Voice/dp/1933820268/ref=sr_1_1
- Invisible Man: https://www.amazon.com/Invisible-Man-Ralph-Ellison/dp/0679732764
- Gatsby: The Definitive Guide: https://preston.so/books/gatsby/
- Hosts:
- Frederick Weiss: https://twitter.com/FrederickWeiss
- Brian Hinton: https://twitter.com/mrbrianhinton
📜 Transcript
Brian Hinton: [00:00:00] I’m Brian Hinton.
Frederick Weiss: and I’m Frederick Philip von Weiss.
And thank you so much for consuming the Thunder Nerds, a conversation with the people behind the technology
[00:00:46] Brian Hinton: [00:00:46] and do
[00:00:52] Frederick Weiss: Yeah, thanks
everybody for watching the show. If you can, please go to the notification bell and subscribe.
Brian Hinton: We’d like to thank Auth0. Auth0 is
this season’s sponsor. They make it easy for developers to build a custom, secure, and standards-based
login, with unified login and authentication as a service. To try them out, go to Auth0.com today. Also check
out their YouTube and Twitch under the username Auth0 for some great developer resources and streams. And
last but not least is Avocado Labs.
[00:01:43] I love that name. It’s an online destination that their developer advocates run,
organizing some great meetups. Thank you, Auth0.
[00:01:52] Frederick Weiss: [00:01:52] Yes. Thanks
Auth0! Let’s go ahead and welcome our guest.
[00:01:50] Frederick Weiss: [00:01:50] Thanks so much,
Brian. So with that being said, and without further ado, let's go ahead and get to our
guest and welcome him back. We have the author of the new book, VOICE
CONTENT AND USABILITY, Senior Director of Product Strategy at Oracle, and speaker,
Preston So. Preston, welcome back to the show!
[00:02:17] Preston So: [00:02:17] Hey Frederick.
Hey Brian. Thanks so much for having me back on Thunder Nerds. Might I say it’s a real pleasure to be
back here one more time to talk about my new book. Thanks for having me.
[00:02:26] Frederick Weiss: [00:02:26] I
appreciate it. And we started a little late and you have an event that you were just doing. Do you mind
telling us a little bit about that event?
[00:02:32] Preston So: [00:02:32] Yeah, I
will. First and foremost, my dear apologies to everyone who was waiting for this live stream. I had
the misfortune of forgetting to send out a confirmation email, an email that would actually
say, hey, this event is happening today. So we started a bit late and we ended a bit late.
[00:02:51] It was my launch event for my new book, which is here, Voice Content and
Usability. And we had a great time doing some discussion about the implications of voice interfaces
for those of us who work with the web, which is, I think, a lot of us in the Thunder Nerds audience, as well
as the implications of voice on our society.
[00:03:13] And of course, the vaunted and traditional book cake, which is something
that everyone at A Book Apart, my publisher, has to unveil as part of the process of launching a new
book. It was a very interesting process, but very sorry to those who were waiting on this
[00:03:33] Frederick Weiss: [00:03:33] Oh,
sorry. Did you say a book cake?
[00:03:35] Preston So: [00:03:35] Literally a
cake. Yeah, a book cake. Maybe I'm saying too much. I don't know how, like, it should be cake.
Gotcha. Yeah. Not like "everything is cake." Oh, it's all cake, that,
yes, it's all cake as well. But a book cake, because basically you're supposed to
have a cake that looks like your book and represents your book.
[00:03:55] Yeah. So it was a great launch event and it was a real
pleasure to share a little bit about the process I went through writing the book and some of the
really exciting things that I taught. Love that.
[00:04:09] Frederick Weiss: [00:04:09] And
speaking of the book, we're going to be giving away three copies of the ebook courtesy of A Book Apart.
[00:04:16] If you can just chat with us, ask us your questions. Maybe
tell us you want a book. We're going to randomly give away some books. So we'll be doing that as the
show progresses. Preston, first, let me talk to you a little bit about you being with us last
time, promoting your last book.
[00:04:35] Decoupling Drupal. Am I saying that correctly?
[00:04:38] Preston So: [00:04:38] Yes. Decoupled
[00:04:42] Frederick Weiss: [00:04:44] How was the success of that and how did that prompt you to start writing a
new book? You just wrote that book not too long ago and all of a sudden you have another book.
[00:04:54] So I see a pattern: every year, a new book. I wish I could come out
with a new book every year, like, someone would say, R. L. Stine of Goosebumps or something like
that. But this has been a really interesting process, because my books tend to be very focused on
really technical aspects of the ways in which we work with our content and the ways in which we work
[00:05:19] Preston So: [00:05:19] The first book
I wrote was back in 2018, Decoupled Drupal in Practice. And I think one question I get a lot, and am
definitely happy to answer for those on the call or those in the audience, is:
what is it like as a technologist to write a book? Especially for those who are
[00:05:38] So this book is actually my first book that is not a coding book, not a
technical book. It doesn't have any code snippets in it. A couple of code-formatted sections that are
really tiny, but it doesn't really have any sort of tutorials as to how to spin up a command line
interface or things like that.
[00:05:58] It's really focused on the user experience and design audience and the
accessibility audience, which is a very different audience from the audiences that I'm used to writing
for. What's interesting is that Decoupled Drupal in Practice is about the architectural underpinnings
or the foundation of how you can deploy content
[00:06:16] that's oriented towards things like JavaScript applications or other
sorts of environments, like voice interfaces. But it really dives into the nitty-gritty. Voice Content and
Usability, however, is really unlike that, because it really focuses on how we as designers, as user
experience professionals who are working on usability testing or usability research can really engage
with this new field that is emerging around voice interface design, and specifically around things like
voice content strategy and voice content design.
[00:06:49] But the other thing I will say is that I actually made the mistake, or rather, I had
the privilege, or some would say the misfortune, of writing two books at the same time over the past year
and a half. And the other book that I've got coming out this fall is Gatsby: The Definitive Guide, which
is about GatsbyJS, the static site framework.
[00:07:08] So right back in the other direction.
[00:07:10] Brian Hinton: [00:07:10] So
[00:07:12] Frederick Weiss: [00:07:12] every
[00:07:14] Brian Hinton: [00:07:14] three
[00:07:15] Preston So: [00:07:15] I was thinking
more of a Fibonacci sequence, actually, Brian. Like, I think I should write five and then eight and then
13. Yeah, they might get a little shorter and they might be filled with some more memes.
[00:07:25] So why Voice Content and Usability? Like, why were you like, okay,
now I really think I need to write this?
[00:07:33] Yeah. Yeah. Specifically too, if I
[00:07:34] Frederick Weiss: [00:07:34] could
append to that point, Brian, why? You said yourself, like, you moved away from a coding
kind of thing. Like, why go that way into
[00:07:42] Preston So: [00:07:42] the
[00:07:44] So I've always been really into web development, but my real
core interest and passion has always been for design and user experience. I started out as a web designer. I
started out as a print designer. I actually also did computer programming back in those
days and got into web development that way.
[00:08:02] But it really wasn't necessarily an itch I
got to scratch very much, this aspect of design and user experience that is beyond the web. And I've
always been interested, not only in how we can serve some of the users who are interacting with some of the
content that we produce or some of the experiences that we create in terms of technology beyond the
[00:08:27] I was also really interested in how we can actually best serve users that
already exist, users that are already within the demographics of the audiences that we're trying to
serve. I've always been interested in web accessibility first and foremost, as well as some of the
aspects of how accessibility really changes the ways that we think about other user interfaces that might
not have gotten so much attention from the standpoint of how they can better serve disabled users and
those who might be elderly and have a little bit more trouble, for example, using a mouse
or typing on a keyboard. And those two audiences, specifically the elderly and disabled communities
around the U.S., were communities that we aimed to serve with the first-ever voice interface for
residents of the state of Georgia.
[00:09:15] I worked on Ask GeorgiaGov, which had the specific goal of really
focusing on how we can serve residents of the state of Georgia who want to be able to find out things like
registering to vote, or how they can get a small business loan, or how they can renew their fishing license,
without necessarily having to incur the cognitive costs of either interacting with a screen-reader-driven
website or interacting with, let's say, somebody in person at an agency office.
[00:09:45] And I think one of the really interesting insights that we found,
I think really unexpectedly, is about a lot of the websites that we build. Obviously we think nowadays,
because so many people use the web, because disabled folks use screen readers, because so many people now are
used to the paradigm of the web,
[00:10:04] that the website is really the gospel of how people should now consume
content and how people do consume content. But I think one of the things that's been borne out by this
project is that the kinds of things that people would ask an Amazon Alexa sitting in their own home about
the state of Georgia and the government capabilities that are available to them were completely
[00:10:26] different from, and in some cases diametrically opposed to, the sorts of queries and
things that people would search for on the georgia.gov website, which is the ultimate source of all of the
information that we used. And that really illuminates a little bit of this, I would say, a little bit
of this hidden bias that we have
[00:10:43] towards the website as the primary conduit for information, when in some
ways it really should be just considered one facet of a wide variety of ways to access our content
equitably. So then what do we
[00:10:56] Frederick Weiss: [00:10:56] do?
Are we expected to have multiple locations for our content? Like, specifically, I'm going to
build content for voice, or I'm going to build content for a website, and I'm going to build content
that goes into an application.
[00:11:14] Or does it behoove us to write content that is uniform and
maybe in a specific way, and possibly you might answer in what way that might be, as one source of
[00:11:31] Preston So: [00:11:31] That's a
really challenging question. And obviously I shouldn't really go too far here without saying that some
of those questions are answered in my book, Voice Content and Usability, from A Book Apart.
[00:11:42] Please don't give everything away, just a little bit. Could you read
the whole book out loud, please? We'd be here all day. Yeah. Yeah. So
what I will say is that this is the perennial debate, right? I think one of the things that
we as designers struggle with, as we really deal with this exploding kind of menagerie of user experiences
that we increasingly have to deal with, is: what do we do with our data?
[00:12:10] What do we do with all of these things that we've built that are in
some ways very much oriented towards, or very focused on, the audiences that we've cultivated over time,
namely our websites and mobile applications being for these very visually rooted experiences and
demographics that are used to these visual experiences? One thing that is really problematic about some of
the approaches that were characteristic of the early days of voice content,
[00:12:37] let's say when people were experimenting with voice interfaces or
chatbots as a means to deliver a certain type of content, is that you would have a parallel version of the information
that was already housed in your website. And those of us who are content designers or content strategists
can really feel the pain that comes from the notion of having a set of content over here in one silo
that's destined for the website and another piece of content over here that's destined for a voice
[00:13:04] How do you keep those two things in sync? And now that we have regulations
like GDPR and HIPAA, for example, that really make it obligatory that content stays current, or that content
stays up to date with what we need, how do we actually make sure that all of this content stays up to date
without having it be in a single source of truth for content?
[00:13:24] Now, my book definitely doesn't make any prescriptions about going in
one direction or the other, where, oh yeah, you must do it this way, or you must do it that way, because there
are exceptions to everything and nothing is ever cut and dried. However, I generally err on the side of
saying: look at the case of what we did with the state of Georgia. Georgia.gov insisted,
actually, that we use one single source of truth for content that was going to be an omnichannel or channel-agnostic
source of truth, because ultimately a lot of us don't have the luxury to maintain
multiple versions of content that are destined for multiple conduits of content.
[00:14:00] So we ended up keeping it all in one source and we ended up maintaining it
all in one place, having both voice and web versions of the content pull from the exact same repository
of content, which ends up being more scalable in the long run, especially now that Georgia has built an
additional chatbot that is a written chatbot, a textual chatbot, but also pulls from the same
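The single-source approach described here can be sketched roughly as follows. This is a minimal illustration in Python, not the Ask GeorgiaGov implementation: the `ContentItem` class, the in-memory `REPOSITORY`, the sample content, and the three `render_*` functions are all hypothetical names chosen for the example. The point is only that each channel formats the same stored item rather than keeping its own copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One piece of content, stored once and independent of any channel."""
    slug: str
    title: str
    body: str


# Hypothetical in-memory stand-in for the shared content repository.
REPOSITORY = {
    "fishing-license": ContentItem(
        slug="fishing-license",
        title="Renew a fishing license",
        body="You can renew your fishing license online or in person.",
    ),
}


def render_web(item: ContentItem) -> str:
    """Web channel: wrap the shared content in markup."""
    return f"<h1>{item.title}</h1>\n<p>{item.body}</p>"


def render_voice(item: ContentItem) -> str:
    """Voice channel: plain text to be spoken, with no markup."""
    return f"{item.title}. {item.body}"


def render_chat(item: ContentItem) -> str:
    """Text chatbot channel: a single plain-text message."""
    return f"{item.title}: {item.body}"


item = REPOSITORY["fishing-license"]
print(render_web(item))
print(render_voice(item))
print(render_chat(item))
```

Because every renderer reads from the same record, an editor who updates the body once updates the website, the voice interface, and the chatbot together, which is the synchronization problem raised earlier in the answer.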
[00:14:25] I'm curious, in the
[00:14:25] Brian Hinton: [00:14:25] course
of your research and writing of this book, was there anything that shocked you or surprised you that you
[00:14:33] Preston So: [00:14:33] didn't
immediately realize? Yeah. It's a great question, Brian. There are
too many to list, because I think one of the things that's really
tough about voice interfaces is that up until recently, it's been really challenging for a lot of
those who are not computational linguists or machine learning engineers or people who are really deeply
involved in some of these very low-level technologies to really get involved with voice.
[00:15:08] However, one of the things I will share is that in some ways there's a
really interesting emergence of some of the foibles in voice interface design when you start working with
this technology that is very reminiscent of back in the day. Those of us who are listening
[00:15:27] who have worked in the web for a while will recognize, for example, the
things that we used to deal with in the early 2000s or mid-2000s, like quirks mode
compatibility, or some of the really odd browser hacks that we had to do with CSS. And there are
weird things like that in voice interfaces.
[00:15:46] One example of this that I'll share, and I'll keep it just to one,
is from when we built Ask GeorgiaGov, which of course is that voice interface for the residents of the state of
Georgia. There was a situation where we had a retrospective. And one of the things that we did for
Georgia was they wanted to have the ability to administer and manage all this content in one single
[00:16:08] And we had a parallel set of logs and reports that would sit right next to
the logs and reports for the website. So whenever somebody would hit a 404 error on the website, they
could compare and see: how many times did this piece of content also error out, for example, for the
voice interface for Alexa? Were there situations where the search returned no results or where it triggered
404 errors on the content management system that we were using to serve both the website and the voice
[00:16:39] So we had this retrospective about eight months after the launch of the
interface, which was in 2017. And we had a discussion about some of the logs, and we kind of leafed
through them and said, okay, what are some of the errors that we're seeing? And what are some of the
things that we can do to either adjust the content or maybe even do some debugging of the interface
[00:17:00] There was this one result that kept on coming up over and over again, this
one error, this 404 error, basically a search that somebody conducted that returned no results, no content.
And it was the word "Lawson's," L-A-W-S-O-N apostrophe S. And this kept on popping up over and over
again. It was about 16 times,
[00:17:20] if I remember correctly, in the log, and we thought: who is searching, who
wants to search for this, like, proper noun, this brand name, this person named Lawson? Did
they get this confused with a different kind of application on their Alexa that they're trying to use?
And we sat there and scratched our heads for a few minutes.
[00:17:38] And one of the native Georgians in the room suddenly perked up and she
said, you know what? I think it's somebody who is from Georgia, who has a Southern drawl, who is trying
to say the word license as in driver's license or nursing license or fishing license, and sure
[00:17:57] That was exactly what happened. And this is one of those situations where,
hey, you can build the best-designed application that adheres to the latest and greatest standards and
specifications, like we did back in those days with CSS, and come within an inch of perfection when it comes
to these voice interfaces that we build custom.
[00:18:17] But ultimately it's in the hands of people like Amazon or Google,
whether or not they can actually understand the kaleidoscope of American English dialects that we have in
this country. And that we really should be able to understand. And I think it's a really good sign that
yeah, these voice assistants are really good.
[00:18:35] But they're not yet at that point where they can beat us at our own
game of human conversation. Yeah.
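The log review in this story, spotting one no-result query that keeps recurring, can be sketched as a simple frequency count. This is an illustrative Python sketch under stated assumptions: the record layout `(channel, query, result_count)`, the sample data, and the `ALIASES` table are all hypothetical, and the transcript does not say how the team actually stored its logs or whether it added any alias handling afterwards.

```python
from collections import Counter

# Hypothetical log records: (channel, search query, number of results).
LOG = [
    ("voice", "Lawson's", 0),
    ("voice", "fishing license", 3),
    ("web", "small business loan", 5),
    ("voice", "lawson's", 0),
    ("voice", "register to vote", 2),
    ("web", "lawson's", 0),
    ("voice", "nursing licence", 0),
]


def zero_result_queries(records, channel):
    """Count queries on one channel that returned no content, most frequent first."""
    counts = Counter(
        query.strip().lower()
        for record_channel, query, result_count in records
        if record_channel == channel and result_count == 0
    )
    return counts.most_common()


# Hypothetical alias table a reviewer might maintain after reading the report.
ALIASES = {"lawson's": "license"}


def apply_aliases(query):
    """Replace a known misrecognition with the word the speaker likely meant."""
    normalized = query.strip().lower()
    return ALIASES.get(normalized, normalized)


for query, count in zero_result_queries(LOG, "voice"):
    print(f"{count:>3}  {query}")
```

A report like this only surfaces the pattern. As the story shows, it still took a person familiar with the local dialect to recognize what the recurring query meant.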
[00:18:40] Frederick Weiss: [00:18:40] This
brings me, if you don't mind really quick, Brian, to something that Todd Libby wrote
here, and he also appended to his question. He wrote: were
there challenging edge cases with respect to a11y that you ran into on the Georgia project?
[00:19:00] Preston So: [00:19:01]
Yeah. Great question, Todd. When it comes to the work that we did on accessibility on
Ask GeorgiaGov in terms of edge cases, I will share that I think one of the big challenges,
and there were several challenges, right, is one of those
challenges that's inherent to
[00:19:20] voice interfaces that are pure voice interfaces, which I and others
define as basically a voice interface that lacks a screen. So there's no visual component, no tactile
or physical cues on it. Yeah, not a GUI. You're basically just interacting with
somebody through the spoken word.
[00:19:37] And I think this is not really an edge case so I don't wanna say
that this answers the question, but one of the things that I think a lot of people forget, and I think is
really important to keep in mind when working with voice interfaces, when it comes to extending the
accessibility of your content on a website or your web properties, is the fact that pure voice interfaces
that are lacking in a visual or physical component are actually not accessible to
certain disabled people, namely those who are deaf or those who are deaf blind.
[00:20:10] And the notion that I think a lot of people have today is that
voice interfaces can solve a lot of cases for accessibility, but that's really not the case,
because when it comes to so many of the demographics that we need to serve in the disabled community,
there are certain solutions that only go part of the way there and we're going to do that.
[00:20:34] Yup. And yeah, so that's, yep. Yeah. That's
exactly right. How do we also make sure that we can serve content in a more consumable way to
refreshable braille displays, which are maybe not necessarily the same thing as the kind of, let's
say, screen reader experience
[00:20:52] that's very rooted in the visual structure of a webpage? It's very
early days still in this sort of notion of multimodal accessibility, or how to really make
sure that a lot of the user interfaces that we have are not actually stepping on the toes of other folks who
are accessing content in particular ways.
[00:21:12] The edge case, however, that I will share is I think a lot of people also
make the assumption that these voice interfaces and voice assistants can be the
ultimate solution for a lot of folks who are blind or have low vision, but that's really a tough sell in
some ways, because I think one of the things that's really important to recognize about these pure voice
interfaces like Alexa is that they have a learning curve too.
[00:21:38] We know that screen readers, some of these
screen readers like ChromeVox or JAWS, have issues that require people to ascend a very steep learning
curve to use them in an effective and efficient way. And voice interfaces are very much the same way. So one
of the things that we encountered during our usability testing was
[00:22:01] just one of those things that we didn't necessarily expect,
which is that a lot of people that we had come in and work with us and go through our usability
study really had very little experience with Alexa devices. And I think for those who are looking at
voice interfaces as a means to be a compelling potential sidelong alternative to screen readers,
that might potentially be a little bit problematic. And how they
efficiently guide users to their content, as the voice interface designer Chris Maury writes,
is something to think about, which is: there is still a learning curve.
[00:22:41] And how do you actually address that learning curve in a way that makes
sense to those users that you need to.
[00:22:47] Brian Hinton: [00:22:47] Yeah,
I'm curious in the sense of Georgia. In my current role, we're working on a
chatbot. And one thing that we've found most difficult is, I think it's called semantic parsing, where it
converts that conversation into what logically makes sense.
[00:23:03] What are they asking? And it's like the difference between, like, "the capital
of Georgia," someone's saying "capital of Georgia" and that's all they say, or "what's the capital
of Georgia," or "Georgia capital." Did you encounter anything weird in that sense or any
[00:23:19] Preston So: [00:23:19] Yeah. I
talk about this a lot in chapter three of my book, which is about writing those conversational dialogues
that really are the latticework of the voice interfaces that we produce.
[00:23:32] And it's a really challenging kind of thing because a lot of these
questions, Brian, are really rooted in the technology that you're using. Because some voice
ecosystems or conversational ecosystems are better equipped to deal with. Let's say variations, like the
ones that you mentioned just now, than others are.
[00:23:49] But there is a lot of work being done to improve the situation. So back in
the day in 2016, when we worked on Ask GeorgiaGov, and in the grand scheme of voice interfaces and the
history of conversation design five years ago is a long time ago, we might as well be talking about
clay tablets and abacuses at this point, because that was an era where a lot of those utterances that people
would state in order to do a process of what's called
[00:24:17] intent identification, where the user interface is able to piece together a
sense of what the user actually wants to achieve, which is much easier said than done. That's a
process that used to be very much a sort of manually driven process. For example, let's say that
you're trying to identify a yeah.
[00:24:37] You're trying to identify a question like what is the
capital of Georgia? It has to be phrased like a question, let's say. And one of the things that I think
is really challenging for a lot of people who are just getting started with voice interfaces is that in some
of these ecosystems, some of these technologies obligate you to be very clear about defining how the user
[00:24:56] And as we know, as users, the ways that we actually respond to some of
these questions and the ways in which we actually say some of these things can be phrased completely
differently from the ways in which we've actually coded the voice interfaces or conversational
interfaces or chatbots to consider.
[00:25:14] And whenever we have what's called an out-of-domain error, where
the chatbot or the conversational interface or voice interface isn't able to actually understand
what you're saying because the way that you phrased it, even though it's a perfectly logical thing,
isn't accounted for within the context of what the voice interface is able to understand
through its programming, that is a very big problem.
[00:25:39] So I talk about intent identification and the problems
that occur when you have these very dedicated slots or tokens, basically
this teasing-out process that you have to do with intent identification that really relies on some of these
boilerplate templates that users have to use to say these things, but that's not how we speak.
[00:26:00] That's not natural, right? Nobody really wants to have to say
things the same way. Over and over again, to be understood by a voice interface. Although there is usability
research evidence that suggests that some users do prefer that. But there are some ecosystems now, like
Dialogflow, for example, or some of the major new conversational tools that
are out there, that are getting better at understanding, let's say, all the different variations that you
could possibly have and being able to intelligently parse through that and say, okay, this is the intent of
what the user is trying to do.
[00:26:36] Even though this person might have said something that's very remote
from the, let's say a normal way or the default way that we would expect.
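The older, template-driven style of intent identification described in this answer can be sketched as follows. This is a toy illustration in Python, not how Alexa, Dialogflow, or Ask GeorgiaGov work internally: the `TEMPLATES` table, the `get_capital` intent name, and the `place` slot are hypothetical. It shows how a hand-written template with a slot matches only the phrasings its author anticipated, and how a perfectly reasonable variation falls through as an out-of-domain utterance.

```python
import re

# Hypothetical hand-written utterance templates, each with one named slot.
TEMPLATES = {
    "get_capital": [
        re.compile(r"^what is the capital of (?P<place>[a-z ]+)$"),
        re.compile(r"^what's the capital of (?P<place>[a-z ]+)$"),
    ],
}


def normalize(utterance):
    """Lowercase the utterance and drop punctuation other than apostrophes."""
    return re.sub(r"[^a-z' ]", "", utterance.lower()).strip()


def identify_intent(utterance):
    """Return (intent, slots) for the first matching template, or None if out of domain."""
    text = normalize(utterance)
    for intent, patterns in TEMPLATES.items():
        for pattern in patterns:
            match = pattern.match(text)
            if match:
                return intent, match.groupdict()
    return None  # no template anticipated this phrasing


for utterance in [
    "What is the capital of Georgia?",
    "What's the capital of Georgia",
    "Capital of Georgia",
    "Georgia capital",
]:
    print(f"{utterance!r} -> {identify_intent(utterance)}")
```

The last two utterances return `None` even though a person would understand them immediately. Covering them in this style means writing more templates by hand, which is the maintenance burden that tools with learned matching over example phrases aim to reduce.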
[00:26:45] Brian Hinton: [00:26:45] Yeah. My
favorite, like, real-life scenario of me, my brain, being the AI trying to understand is when I was
somewhere, I can't remember where it was, the Midwest, and they asked, "What Coke do you want?"
[00:26:56] And I said, "Coke." And they're like, "I'm sorry, is that okay?" Yeah.
[00:27:04] Preston So: [00:27:04] That's
[00:27:07] Brian Hinton: [00:27:07] I can't
imagine dealing with that sort of a scenario in an AI-type situation. Yeah. That's
funny too, 'cause it could be something where, if you're trying to communicate something out
to the bot or the voice technology, you've got to think about the context of the personification
of this voice or the overall brand. If I'm interacting with a hospital, I don't
want the voice to sound all silly and goofy. I want it to sound like just a normal, regular
voice. There are some kinds of situations that you might want that, or even languages for that
matter. If I'm somebody in Italy and I'm looking for
[00:27:48] Frederick Weiss: [00:27:48] a lasagna
recipe, and it sends me to the
Food Network and it starts reading me, like, an Emeril recipe in English, and I don't understand
English. There are all kinds of interesting facets
[00:28:01] Preston So: [00:28:01] to this,
yeah, this really brings up, I think a couple of interesting elements of the ways in which the
conversation design or voice interface design landscape really requires us to think very differently about
some of the things that we usually took for granted.
[00:28:17] And one of those really is the building blocks of language. And
I'm very lucky in that working with voice interfaces over the past five or six years has really allowed
me to scratch my itch when it comes to my academic background, which is actually in linguistics. I have a
degree in linguistics. Not a lot of people know that.
[00:28:34] But the biggest issue, I think, a lot of us face is we're moving
in several directions at the same time. The first is that we're moving a lot of the ways in which
we used to write user interface text or content from the written word over into the spoken word,
which is a very different realm from how we normally write UI text,
[00:28:56] or how we normally actually write content. And just one
example to illustrate that is the fact that we don't really say the phrase "to whom it may concern" when
we actually speak. And we also don't really write the word "literally" as often as we say it in
conversation. So a lot of these little nuances are things that can often be missed.
[00:29:17] And there are two ways in which this really can be a problem. The first
is that there are certain expectations that users will have that their voice interface reflects the kind of
informal or colloquial conversation that they might have with a friend. And when it doesn't reflect
that, and when the voice interface comes out with this very kind of stilted utterance or something
that's very uncanny-valley-like, it can really interrupt or dislodge the user from what is
called habitability in a voice interface.
[00:29:48] This is something that is talked about quite a bit in voice
interface literature, where the user has to feel like they're not gonna want to actually tear their hair
out or what little hair they have in terms of having a conversation with a voice interface. So
that's number one, but I think number two is really interesting given that you alluded to some of the
challenges around multilingualism.
[00:30:09] Types of conversation. And this really comes to, I think, some of the
elements of voice interface design that remain a largely unexplored area and also an area that is very
challenging because of the fact that so much of our conversational technology and voice interface technology
has so far been rooted in the English speaking world.
[00:30:30] And one of those issues is when we think about the ways in which we want
to serve multilingual audiences and international audiences on the web, we just have to provide translatable
strings, right? We just have to provide like these versions of these different pieces of texts that we have
or different pieces of content we have.
[00:30:48] But that is a very different kind of proposition when it comes to
some of these other languages. And I think one of the biggest issues that we have to focus on is the fact
that not all languages work like English, not all languages operate in the same kinds of systems and the
same kinds of assumptions that a lot of us have about English.
[00:31:08] And one of the things that is really interesting to me is that I'm
noticing more and more some of this Anglophone privilege or Anglophone bias in a lot of the voice interfaces
that are coming out that are meant to be multilingual but are also direct translations of an English interface,
because fundamentally some languages simply do not work the same way as English.
[00:31:28] There's a phenomenon in linguistics called diglossia. And this is
something I talk about on my blog, preston.so. And this notion of diglossia is actually a phenomenon I
studied also when I was in college, where the written form of a language is so vastly different from the
spoken form of a language that they might as well be considered two different dialects or two different
[00:31:50] And in some cases, like Brazilian Portuguese, for example, you really have
to learn two different grammatical systems and two different lexicons and two different approaches to
the language in order to make yourself understood. Because if I went out on the street and I started
speaking in the way that I write, I wouldn't actually be necessarily understood.
[00:32:10] I'd be understood, because people would be able to understand, but
it would be a very strange and off-putting conversation. What I find is very interesting with a lot of the
work that conversation designers are doing today is that there's a lot of focus on efficiency and
scalability, where we can build one single conversational agent or one single conversational interface that
manifests as a chatbot, as a Slack bot, as a WhatsApp bot, Facebook Messenger bot, and as an Alexa skill and
[00:32:36] But there's a big problem with that, because that assumes that the
same kind of conversation you would have with a chatbot is going to be the kind of conversation you have
with a voice interface. And one of the things that we see in linguistics and also in the kinds of
conversations that we have on a daily basis through email and texts and at the deli,
[00:32:57] it isn't the case that our spoken conversations are word for word or
even letter for letter exactly the same as our written conversations. And for those who don't speak
English, for those who are operating in a realm where let's say that the language that
they're writing for is not English.
[00:33:16] A lot of those considerations and concerns become a lot more important
and essential when it comes to some of the design that we have to do. And I think this means that we have a
long way to go in the English speaking world to understand how some of these conversational interfaces
really are rooted in our ways of speaking in ways that might not be so appropriate for the rest of the
[00:33:38] Brian Hinton: [00:33:38] Yeah, all of
this made me think of a book I recently read, Word by Word: The Secret Life of Dictionaries. And
it's a fantastic book, but it's like the slang too. Like how you mentioned the different
versions of Portuguese, the slang is different. Like Mexican slang versus Spanish, Mexican slang,
[00:34:00] Spanish slang, very different. And English slang, different, like someone
said. And also how people will say things like "cool" versus "cool," like completely different. And
how to interpret that, yeah. Yeah. Tone.
[00:34:15] Preston So: [00:34:15] Yeah. And I
think this really illustrates a couple of different things.
[00:34:18] You've got the subtext that is not something in UI text or in
web content or in any of the word mediums that we have. And paralinguistics, in this realm of,
okay, how do you actually really reflect back the fact that the user or the interface might be
speaking in a sarcastic tone or in a more resigned tone or in a very stilted tone?
[00:34:43] Like those three things can mean very different things, even though they
all use the same single sentence. But the other thing that's really interesting too, Brian, and I think
you raised a really good point there, which is it's not just the fact that we have all these
differences between languages and the ways that they operate.
[00:34:58] We also have very important differences, like I mentioned earlier with
that diglossia example, around those of us who speak English. And one of the things that worries me
a lot about some of these voice interfaces is first of all, the fact that we hear fundamentally one
single dialect represented oftentimes in this realm of voice interfaces.
[00:35:19] And it's very similar in some ways to the ways in which newscasters
and weather forecasters used to have to be obligated by their organizations to speak using a middle
American or general American dialect. It was unacceptable in certain past decades, in the news media for
somebody to speak with a Southern accent or somebody to speak with a different dialect of American English
[00:35:43] And that's something that's represented now in voice interfaces,
in both a very limiting and very pernicious way. Because as we know, from interacting with so many different
people from so many different walks of life, not only do we have examples of people who might be
bilingual or who might be members of queer or trans communities who have to switch between different
modes of speech or those who are bilingual descendants of immigrant communities who have to be able to
code-switch between English and Spanish. Why aren't those sorts of interesting toggles and those sorts
[00:36:17] represented in voice interfaces too? Because maybe the kind of
conversation that I want to have is the kind of conversation that I would have at home in New Delhi, where
I'm switching in between English and Hindi mid-sentence or I'm switching in between English and what
I think mid sentence. So these sorts of considerations are not only important for those who are users of
English outside of America, which I think is one example of the America-centric approach that we
often have with technology all over the place.
[00:36:46] But also the fact that we have very marginalized and
underrepresented, oppressed groups of people in the United States who speak in certain
ways that are not reflected in how we want voice interfaces to speak as well. And I think one very
compelling example, two very compelling examples of this is first of all, the fact that the ways
in which people use AAVE or African American Vernacular English is very different from the sorts of
voice interfaces that we interact with.
[00:37:14] For example, why is it that we can't hear those sorts of
conversations represented in an Alexa device? It has something to do with the intrinsic bias that a lot of
us have for a more middle American or general American approach to the conversations that we have, of
course fundamentally and foundationally a white American form of speech.
[00:37:33] And by the same token we know that those who identify as LGBTQ
have very different approaches to using certain language. There's certain code terms. There are
certain colloquialisms that are really not understood by audiences that are outside of that community. And
how do we make sure that voice interfaces can also represent those things?
[00:37:54] And this ties back to one of the things I talked about in the final
chapter of my book, which really is focused on the problems that surface that we don't consider when we
go willy-nilly into this realm of voice interfaces and serving people through conversation in ways that we
don't expect. And one of those examples is think about why organizations today and think about why
it is that so many people want to get into voice interfaces and want to get into chat bots in the first
[00:38:20] So many people are doing this because these airlines, hotels, large
companies, corporations, they fundamentally want to be able to reduce the load on their customer service
frontline agents or those who are call center staffers. But if you think about it, who are these call center
staffers? Who are these people who answer your call when you're calling them in the middle of the
night from the airport, screaming about your lost luggage or screaming about your canceled flight?
[00:38:44] It's somebody who might be in the Philippines or somebody who might be
in India or somebody who might be in the global south, who is a person of color who is from a lower-middle or
middle-income country, who doesn't have the resources necessary to speak in a general American dialect in
the same way that you would expect somebody who's from your own community to speak.
[00:39:03] And this really illustrates a very, I think, big concern in voice
interfaces today, which is: when we begin to sterilize and flatten out all of these rich nuances that make
our conversations with all of these different people and from all of these different lived experiences so
important to our worldview and to the ways in which we interact with the world,
[00:39:26] what does that do to our future as users? What does that do to our level
of trust in our user interfaces? What does that do to the credibility and authority of those user interfaces
and the information that they provide? Because let me be honest, when I think about the fact that a voice
interface might lead to a Filipino call center worker or somebody who is in Mumbai, who is in a call
center, losing their jobs,
[00:39:52] I'm not so sure that I want that replacement to be this uncanny valley
voice that is very stilted and mechanical and might not necessarily reflect the world that we live in today.
And I think this really ties into a lot of the issues that we face around misinformation and automated
racism and algorithmic oppression that we see around machine vision and so on and so forth. Voice
interfaces and voice technology and conversational technology,
[00:40:18] these are also domains that are not exempt from the issues that we have in
[00:40:24] Frederick Weiss: [00:40:24] we start
losing the quality of humanity and what humanity is. But is there anything, I know you were talking
a lot about in chapter six, about, about the future. Are there any brighter notes that you could
[00:40:41] Frederick, there's not, yeah. I don't want to go down the Matrix
road, but are there any like cool new things that we could be looking forward to or things
that we could start thinking about now that would be advantageous for us to go, oh, you know what, let me
next year start thinking about this so I could get my projects.
[00:41:01] Preston So: [00:41:01] Yeah,
absolutely. There's so much to think about. And obviously I wouldn't have written this book if
I thought it was going to be a dystopian nightmare in the next few years, or next few decades, because
voice technology really does have a lot of illuminating and very interesting prospects
that I think there's really important things to call out there.
[00:41:19] Not just the fact, and this is not something I mention very much in my
book, but I do mention it very briefly in my A List Apart article, "Usability Testing for Voice Content,"
which is that there are a lot of people who really appreciate voice interfaces for one unexpected reason.
And that is that I think, as we all know, a lot of us, especially over the course of the last year and a
[00:41:42] And I do want to make sure to hold space for those who are
still dealing with grief or suffering right now from the consequences of the coronavirus
pandemic, especially of course in India, and Australia currently going through a very severe lockdown,
and the third wave ongoing in Africa. Voice interfaces have been shown to stave off loneliness
[00:42:05] There is research that suggests that having a voice interface that is
there to have a conversation with is something that could be very beneficial for mental health. And in
the future, as these conversations become better and better, as voice interfaces get to the point where they
can do much better small talk than these really simplistic, let's say gimmicky responses that
[00:42:28] I think we can really look forward to a lot of interesting, let's say
social benefits from voice interfaces. The other one though, I think is also the fact that there
is going to be more efficiency when it comes to content delivery and information delivery. There's a
futurist named Mark Curtis who refers to what's called the conversational singularity.
[00:42:47] And we know about the kind of tech or AI singularity. The conversational
singularity is along the same lines, which is this notion that as we move further and further into the
future, there's going to be a point in time where conversational interfaces will be indistinguishable
from other humans when it comes to the kind of conversation that we three are having right now.
[00:43:09] And one of the things that I think is important to call out, of course, is,
well, okay, that's a great kind of future, but the conversational singularity is going to be
indistinguishable, but for whom, right? Whose conversations are going to be indistinguishable,
as I was just saying earlier? But I think one of the really interesting things about the
conversational singularity and some of the,
[00:43:27] let's say, conversation-centric approaches that are coming out
is that they wash away some of the weird distortions that we have today, some of these arbitrary lines in the sand
that we have, where you talk with a certain Alexa skill or a certain Google assistant, and they can only
help you with this one, certain task.
[00:43:43] They can only help you order a pizza, but they can't help you book a
flight. These sorts of interactions will soon become smoother because, you know what, maybe I do want
to, just like I would with a hotel concierge, actually have a conversation that moves
directly into ordering a pizza
[00:43:58] with extra pineapple, as it should be, and then directly into booking a
flight over to my favorite vacation destination. So a lot of these efficiencies are going
to become very important in the future. And I think what's going to happen in the next few decades is
we'll start to see ways in which, okay,
[00:44:16] yeah, some of these issues that we have with how conversational
interfaces work or reflect the world that we live in back at us are going to become better in terms of
the efficiency and ultimately the performance of user interfaces, in the same way that websites and mobile
applications have become much more efficient and much more able to get us over to the things that we want to
[00:44:41] Frederick Weiss: [00:44:41] I
remember at Google I/O, they had a, what was the one, the assistant that called to book a hair appointment for
somebody. And they were like, oh yeah, it's completely indistinguishable from a person that's
[00:44:53] Preston So: [00:44:53] I can totally
[00:44:57] Frederick Weiss: [00:44:57] yeah, I,
yeah, I think you could tell, but they said, and if you're on a phone call, you have
things in the background, you're trying to get through things quickly, and you're like, yeah,
[00:45:09] Yeah. It could work. I'm sure. One day, like you said, a person
will get like that, that movie Her, with Joaquin Phoenix and Scarlett Johansson.
[00:45:17] Preston So: [00:45:17] Yeah.
Who among us hasn't accidentally answered an automated phone call that sounds
exactly like a conversation, one of those spam calls that we're all besieged by lately, and answered a
question because it sounded so real? Or, perish the thought, and this is going to be very
[00:45:32] I think we've all done this, you accidentally answer somebody's
voicemail automated message saying, "Hey, it's Preston." Oh, hey. I'll leave a message at the
[00:45:43] But yeah, I think it's a really exciting time and I do think
that I think one of the things that's important, and I think this book is very timely, right?
Because one of the things I will admit is that when this book first was being germinated as an
idea, I thought it might be a little early because this project that we did for Georgia was very
[00:46:03] It's one of the first ever content driven information driven voice
interfaces. It's also really one of the first, very few examples of state governments and local
governments doing this kind of work at the time, too. But now I think it's very timely because
one of the things that we've seen over the course of the past year and a half is smart speakers,
[00:46:24] Everyone's buying them, they're flying off the shelves and
increasingly here as we re-enter the world or live with the virus as it continues to be a problem for
so many of us in the world, we just start getting used to some of these other ways of interacting with content.
Other ways of interacting with information, with use cases and applications that we need to actually go
[00:46:48] And voice is just one of those. And I think we're going to see a
lot more investment and a lot more care from the user experience side, not just the developer side
in this realm of, okay, we've done this for the web and the web has served us really well for the
last few decades, but how do we actually make sure that some of these more multimodal approaches, as we
mentioned earlier on accessibility, or some of these more interesting immersive or voice and aural and
immersive approaches can be things that will be compelling for users and designers and practitioners in the
[00:47:25] Frederick Weiss: [00:47:25] Makes
sense. What do you think, Brian? Or should we go to the lightning round? Yeah. Yeah.
[00:47:31] Brian Hinton: [00:47:31] getting
close to the end here. So we're
[00:47:33] Frederick Weiss: [00:47:33] flying
rats on. I've got my gloves on. Let's go ahead.
[00:47:37] Brian Hinton: [00:47:37] Yeah. So
we're each gonna ask you a question, answer yours, and one at a time. And I'll go first.
So would you rather be able to run at a hundred miles per hour or fly at 10?
[00:47:49] Preston So: [00:47:49] I have to
think about this one. Probably fly and it's. Yeah. It's because you can see more. Yeah.
[00:47:59] Frederick Weiss: [00:47:59] Preston.
What is your favorite thing about yourself?
[00:48:04] Preston So: [00:48:04] Oh my gosh.
Oh my gosh. These are some questions y'all really, I don't remember the last lightning round
being like this. I think my favorite aspect about myself is that I have learned a lot and I've had the
privilege of living in many different countries, which not everybody has the privilege to say.
[00:48:28] And that's given me a lot of good perspective. I'll say that.
Would you rather live where it snows all the time or where the temperature never falls below a
[00:48:39] degrees? Wow. This is like Snowpiercer versus thread 3d or something like
that. Def. So I'm somebody who needs, so right now I am in an air conditioned room, even though
it's actually not that hot of a day here in New York City. I need the cold, I cannot deal with the
[00:48:57] And so yeah, it's definitely snowing all the time. I could
probably be okay in, in, in Antarctica actually, I would say.
[00:49:06] Frederick Weiss: [00:49:06] Preston,
what book are you yourself reading to learn from currently that you're
[00:49:12] Preston So: [00:49:12] enjoying. All
right. I'm currently reading three different books. Not really making much progress
in any of those; it's like the Fibonacci sequence of reading books and increasing those
[00:49:28] One book that I'm reading, which I will share, which is a very
esoteric book right now, is Bosnian, Croatian, and Serbian, a Textbook, because I'm learning
Serbo-Croatian at the moment as a language, but I'm also reading two other books that are really
interesting. The first is Conversations with Things, which is a book written by Rebecca Evanhoe.
[00:49:49] And I forget the coauthor's name. I have it right here. I should look at
it. As well as Margot Bloomstein's book Trustworthy, which is a book about how brands can be
more authentic in how they operate in terms of content strategy.
[00:50:09] Brian Hinton: [00:50:11] What
current fact about your life would most impress your five-year-old self?
[00:50:19] Preston So: [00:50:19] Oh my God.
Wow. My five-year-old self. Got it. I thought that was an easy question. You answered it last time. Did I
really? Oh my gosh. Let me think. The fact about myself, that people I think
the fact that my five-year-old self would most be impressed by is the fact that, oh my gosh
[00:50:48] Frederick Weiss: [00:50:48] I
remember last time you said moving to New York and working in New York was one of your childhood dreams,
[00:50:54] Preston So: [00:50:54] giving them
the answers. That's really funny, cause that's not what I, that's not what I would say to myself
actually. That's really interesting. You know what I'll say is this actually I think
this is an interesting one because just to get a little personal here when I was, and a
lot of us dealt with this when we were younger, a lot of us as children, as young toddlers, or as
[00:51:14] kids, we deal with speech impediments or other issues with
let's say pronouncing words correctly, or doing those sorts of things. And I grew up
with a speech impediment, which makes also some of the voice technology kind of things, really poignant.
So what I would say is my five-year-old self would definitely be very proud of me for the fact that I
can basically go on stage in front of 3000 people and not break a sweat or have this live stream
[00:51:43] Of course, there's 3000 people listening to this right now. And
not break a sweat either. Yeah. With a personal note there.
[00:51:52] Frederick Weiss: [00:51:52] Nice.
What is the most interesting thing that you learned in the process of writing
[00:51:58] Preston So: [00:51:58] this book?
Most interesting thing that I learned in the process of writing this book, the most interesting thing I
learned in the process of writing this book is probably the
[00:52:10] unexpected applications of accessibility and unexpected challenges around
accessibility that occur with voice interfaces, especially given the fact that I think a lot of us are
accessibility aficionados or those who are really passionate about accessibility. I think
[00:52:31] Not only are there so many different types of interfaces that
we need to consider, the interface that has become the most important one today, which is
the screen reader for websites, is actually not necessarily the most optimal or pleasant experience. And
I already did have a sense of this because I do a lot of, and this is
one thing I think everybody should do, is you should always take the
[00:52:57] sort of user interface you're building and use it from the
perspective of somebody who's using a screen reader or somebody who's using an assistive
interface, because it is very important to understand how people work from that perspective.
But one of the things, so I already knew that screen readers were really very tough, but I guess
one of the things that I didn't necessarily realize is just how much people actually really
don't like the screen reader sometimes, and really see it as an obstacle to getting to what they
[00:53:28] That was a very long answer.
[00:53:31] Frederick Weiss: [00:53:31]
[00:53:35] Brian Hinton: [00:53:35] What book
[00:53:41] Preston So: [00:53:41] What book has
made me cry? Gosh. Yeah, that's a really interesting question. Wow.
There's been, there's definitely been many books that have made me cry. I would
say the book that both made me cry and made the deepest impact on me is probably, oh my gosh.
I'm just trying to think about this now because yeah the, what I will say is the book
that has definitely made the biggest impact on me and made me cry.
[00:54:22] Both of those were probably Invisible Man, which is a book that I
recommend everybody read. It's one of those books that you read in high school or college
English class, but it's a very important book and something that I think everybody should
[00:54:40] Frederick Weiss: [00:54:40] I'm
out of lightning round questions.
[00:54:43] Brian, do you have anything else on that? Oh, no, I think we're good.
Great. Let's get to our final topic here. At the end, Preston, we like to ask our guests for parting
words of wisdom, any kind of things that you'd like to tell our audience at the end.
[00:55:01] Preston So: [00:55:01] Yeah.
Great question. I think my biggest parting words of advice for everyone, and this is not just
those who are in the design field or who are in the technology world.
[00:55:16] But I think one of the things that I would recommend for everyone who is
watching this or listening to this, or will watch or listen to this is that it's really important
to really listen to and uplift and amplify and also hear and take into account in your own day-to-day work
and your own day-to-day life.
[00:55:41] The lived experiences of those who are completely unlike you. And by
completely unlike you, all of those people who face multiple axes of marginalization or oppression,
or who face very deep obstacles in our world today, who might be disabled, might be women or femmes, might
be people who are queer or trans might be people who are of color, who are black or indigenous.
[00:56:09] And I think one thing that is really important to me, and one thing
that's very important to the way I live my life is to really deeply understand where everyone
is coming from in terms of their context and in terms of how they have come to be the person that they are
today. Because ultimately as practitioners of technology, as those who work on technology, the ultimate
reason we're doing this is to help everybody who is our audience, succeed with what they're
[00:56:42] And there's no way to do that unless you really deeply understand and
take the time to learn about and comprehend what it is that your audience goes through in any field
that we work as people in this world that we live in.
[00:57:02] Frederick Weiss: [00:57:02] Very well
said. Thank you, Preston. Again, all your social links we have: @prestonso
[00:57:07] on Twitter, your website is preston.so, prestonso on LinkedIn. And of course the
new book, Voice Content and Usability by Preston So. Get it
there today. Preston again, thank you so much for being on the show. We really appreciate it.
[00:57:32] Brian Hinton: [00:57:32] No, thank
[00:57:35] Preston So: [00:57:35] Thank you both
so much. It was such a pleasure to be here on Thunder Nerds again, and I'd love to come back sometime.
Maybe I'll rehearse some lightning talk or lightning question responses for next time, but thanks
so much for having me. I appreciate it. Yep.
[00:57:47] Frederick Weiss: [00:57:47] Thank
[00:57:48] Yeah, for the next book. We'll see you then. Thanks everybody. Oh,
hold on. I got one last comment. Let's see. Todd: “Thank you for all the
phenomenal conversation”. Thank you so much, Todd. Thank you everybody for watching. Really appreciate
If you have questions, or suggestions to modify the transcript, PLEASE let us know at