Tech Friday

A true friend who betrays all your secrets: a Korean AI chatbot that turned into a data protection failure



Most Americans are unsure how responsibly companies behave when collecting and protecting personal information. Nearly 81% of them say they are concerned about the potential risks of data collection by companies, and 66% say the same about data collection by the government.

It's genuinely difficult to weigh these potential risks and to anticipate the harm that irresponsible handling of personal data can cause. How will this affect our behavior going forward, and what restrictions and changes will it bring?

Let's look at what happens when personal data is mishandled, and how to protect yourself properly, using the recent situation in South Korea as an example.

The South Korean company ScatterLab had earlier launched a "scientific and data-driven" app that claimed to predict the degree of attachment in a relationship by analyzing users' chat conversations.

In December 2020, the company introduced Lee-Luda, an AI chatbot.

The bot was positioned as a well-trained AI conversation partner, trained on more than 10 billion conversation logs collected through the app. Presented as a "20-year-old female," Lee-Luda was ready to strike up a true friendship with anyone.
As the company's CEO put it, the goal was for Lee-Luda to become "an A.I. chatbot that people prefer as a conversation partner over a person."

Just a couple of weeks after launch, users started noticing the bot's harsh and abusive statements about certain social groups and minorities (LGBTQ+ people, people with disabilities, feminists, etc.).

ScatterLab explained that the bot had picked up this language from the base training dataset, not from individual users' conversations.
In other words, the company had not properly filtered offensive phrases and profanity out of the data before training the bot.

Soon users also noticed that the bot's replies sometimes contained what looked like real names, addresses, and other personal details. Lee-Luda could not have learned to include such personal information in its responses unless it existed in the training dataset.
And there is some "good news" as well: it is often possible to recover parts of a training dataset from an AI chatbot. So if personal information was in the training data, it can potentially be extracted simply by querying the chatbot.
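To make that risk concrete, here is a minimal, hypothetical sketch in Python using the Hugging Face transformers library. It is not ScatterLab's system; the model name and probe prompts are placeholders. The idea is simply that a model which has memorized private messages may complete message-like prompts with real personal details it saw during training.

```python
# Minimal sketch of probing a fine-tuned chat model for memorized training data.
# The model name below is a placeholder, not a real ScatterLab model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-fine-tuned-chat-model"  # hypothetical fine-tuned chatbot
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompts shaped like the beginnings of personal messages; a model that has
# memorized its training conversations may complete them with real names,
# addresses, or phone numbers from that data.
probe_prompts = [
    "Hi, this is ",
    "My home address is ",
    "You can reach me at ",
]

for prompt in probe_prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```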

Still doesn't sound too bad, huh?
To make things worse, ScatterLab had uploaded a training set of 1,700 sentences, part of the larger dataset it had collected, to GitHub.
It exposed the names of more than 20 people, along with the places they had been, their relationship statuses, and some of their medical information.

Although this situation became a high-profile event in Korea, it has received little attention globally (quite unfairly, in our view).
The point is not just the negligence or dishonesty of the creators; the incident reflects a broader trend in the AI industry: users of AI-based software have little control over how their personal data is collected and used.
Situations like this should make you think about more careful and conscientious data management.

The pace of technology development is significantly ahead of the adoption of regulatory standards for its use. It is hard to foresee where the technology will take us in a couple of years.

So the global question is: "Are AI and tech companies able to police, on their own, the ethics of the innovations they develop and use?"
Is it worth returning to the concept of "corporate social responsibility"? And where is the golden mean between innovation and humanity?


