We all value privacy – at least to some extent. But some of us want to be famous, and all of us want to connect with friends and acquaintances. We like the convenience from technology that requires our personal information to operate. So we share our personal details in many ways, and our data flows like water down a stream into lakes and oceans, some of which we’d prefer to avoid. And our information becomes a piece of society’s knowledge base. Databases like the U.S. Census have essential purposes, but they’re only reliable and complete if we are comfortable sharing our data. How to respect individual privacy and achieve reliable databases? That’s a challenge!
In this podcast episode Alex Watson, co-founder and CEO of Gretel.ai, explains two essential phrases to understand how this can be done. Alex founded a security startup called Harvest.ai, which was acquired by Amazon Web Services in 2016, when he became AWS General Manager and it launched its first customer-facing security offering. Gretel.ai is an early-stage startup that offers tools to help developers safely share and collaborate with sensitive data in real-time.
Alex explains that privacy is a problem rooted in code, not in compliance. By auto-anonymization, the personal data of an individual is separated from the underlying data so that the database where the information is needed comes to it without identifying the individual. The essential information is shared without allowing someone to know which individual’s information it is. While nothing is hack-proof, auto-anonymization eliminates the link between an individual and data about that individual as it moves to another user. Personal privacy is preserved in the transmission and further use.
The other key phrase to understand is differentially private synthetic data. Data Privacy Detective Podcast 55 offers an introduction to the topic. This phrase means that information within a database has been changed to eliminate the ability to trace back the data to a particular individual. The information is private and individual to a person, but as pieces of data are shared for a purpose, they are not traceable to a specific person. The database user only needs the provided information, not the identity of individuals who contributed each piece.
There is great public benefit in encouraging people to share sensitive data – e.g., public health databases, sociological research, Census Bureau studies. But people will share their private data only if they are comfortable knowing it will not be misused. Database users should ensure that they do not acquire personal data that identifies individuals without the need to have that information.
Auto-anonymization and differentially private synthetic data – two phrases one should know. Their proper usage can achieve privacy by design. This will be an important contribution to creating reliable databases humankind needs to advance public health and other social good.
If you have ideas for more interviews or stories, please email 
[email protected].