Data Science at Home

The dark side of AI: metadata and the death of privacy (Ep. 91)

12.23.2019 - By Francesco Gadaleta

Episode transcript

We always hear the word “metadata”, usually in a sentence that goes like this:

 

Your Honor, I swear, we were not collecting users’ data, just metadata.

 

Usually the guy saying this is Zuckerberg, but it could be anybody from Amazon or Google. “Just” metadata, so no problem. This is one of the biggest lies about the reality of data collection.

 

F: Ok, the first question is: what the hell is metadata?

 

Metadata is data about data. 

 

F: Ok… still not clear.

Imagine you make a phone call to your mum. How often do you call your mum, Francesco?

F: Every day of course! (coughing)

 

Good boy! Ok, so let’s talk about today’s phone call. Let’s call “data” the stuff that you and your mum actually said. What did you talk about? 

 

F: She was giving me the recipe for her famous lasagna. 

So your mum’s lasagna is the DATA. What is the metadata of this phone call? The lasagna has data of its own attached to it: the date and time when the conversation happened, the duration of the call, the unique hardware identifiers of your phone and your mum’s phone, the identifiers of the two SIM cards, the location of the cell towers that pinged the call, and the GPS coordinates of the phones themselves.
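
To make that list concrete, here is a minimal Python sketch of what one call's metadata record might look like. The class name `CallMetadata`, the field names, and every value are assumptions made up for illustration, not a real carrier schema. Notice that there is no field for what was actually said.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta
from typing import Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class CallMetadata:
    """Everything about a call except its content (hypothetical schema)."""
    started_at: datetime                        # date and time of the call
    duration: timedelta                         # how long the call lasted
    caller_device_id: str                       # hardware identifier of the caller's phone
    callee_device_id: str                       # hardware identifier of the other phone
    caller_sim_id: str                          # identifier of the caller's SIM card
    callee_sim_id: str                          # identifier of the other SIM card
    cell_tower_locations: Tuple[Coordinates, ...]  # towers that handled the call
    caller_gps: Coordinates                     # GPS position of the caller's phone
    callee_gps: Coordinates                     # GPS position of the other phone


# Made-up values, for illustration only.
lasagna_call = CallMetadata(
    started_at=datetime(2019, 12, 23, 18, 0),
    duration=timedelta(minutes=15),
    caller_device_id="device-A",
    callee_device_id="device-B",
    caller_sim_id="sim-A",
    callee_sim_id="sim-B",
    cell_tower_locations=((50.85, 4.35),),
    caller_gps=(50.86, 4.36),
    callee_gps=(45.46, 9.19),
)

# The recipe itself appears nowhere in the record.
field_names = [f.name for f in fields(CallMetadata)]
```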

 

F: Yeah, well, this lasagna comes with a lot of data :)

And this is assuming that this data is not linked to any other data like your Facebook account or your web browsing history. More on that later.

 

F: Whoa, whoa, whoa, ok. Let’s put a pin in that. Going back to the “basic” metadata that you describe. I think we understand the concept of data about data. I am sure you did your research and you would love to paint me a dystopian nightmare, as always. Tell us, why is this a big deal?

 

Metadata is a very big deal. In fact, metadata is far more “useful” than the actual data, where by “useful” I mean that it allows a third party to learn about you and your whole life. What I am saying is, the fact that you talk with your mum every day for 15 minutes is telling me more about you than the content of the actual conversations. In a way, the content does not matter. Only the metadata matters. 

 

F: Ok, can you explain this point a bit more? 

 

Imagine this scenario: you work in an office in Brussels, and you commute by car. Every day, you use the drive home to call your mum. So every day around 6pm, a cell tower along the path from your office to your home pings a call from your phone to your mum’s phone. Someone who is looking at your metadata knows exactly where you are while you call your mum. Every day you will talk about something different, and it doesn't really matter. Your location will come through loud and clear. A lot of additional information can be deduced from this too: for example, you are moving along a motorway, therefore you have a car. The metadata of a call to mum now becomes information on where you are at 6pm, and the way you travel.
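
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `usual_tower_by_hour`, the tower labels, and the timestamps are all invented, and real analysis would be far more elaborate. The point is that counting which tower shows up at which hour is enough to recover a routine, with no call content involved.

```python
from collections import Counter, defaultdict
from datetime import datetime


def usual_tower_by_hour(pings):
    """Map each hour of the day to the tower seen most often at that hour.

    `pings` is an iterable of (timestamp, tower_id) pairs, one per call.
    """
    towers_at_hour = defaultdict(Counter)
    for timestamp, tower_id in pings:
        towers_at_hour[timestamp.hour][tower_id] += 1
    return {
        hour: counts.most_common(1)[0][0]
        for hour, counts in towers_at_hour.items()
    }


# One working week of made-up calls to mum.
pings = [
    (datetime(2019, 12, 16, 18, 2), "motorway_tower"),
    (datetime(2019, 12, 17, 18, 5), "motorway_tower"),
    (datetime(2019, 12, 18, 17, 58), "office_tower"),
    (datetime(2019, 12, 19, 18, 1), "motorway_tower"),
    (datetime(2019, 12, 20, 18, 10), "city_centre_tower"),
]

routine = usual_tower_by_hour(pings)
```

With this toy data, `routine[18]` comes out as the motorway tower: at 6pm you are usually on the road.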

 

F: I see. So metadata about the phone call is, in fact, real data about me. 

 

Exactly. YOU are what is interesting, not your mum’s lasagna.

 

F: You say so because you haven’t tried my mum’s lasagna. But I totally get your point.

 

Now, imagine that one day, instead of going straight home, you decide to go somewhere else. Maybe you are secretly looking for another job. Your metadata is recording the fact that after work you visit the offices of a rival company. Maybe you are a journalist and you visit your anonymous source. Your metadata records wherever you go, and one of these places is your secret meeting with your source. Anyone’s metadata can be combined with yours. There will be someone who was with you at the time and place of your secret meeting. Anyone who comes in contact with you can be tagged and monitored. Now their anonymity has been reduced. 
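
Combining one person's metadata with everyone else's can be sketched as a simple co-location check. Again, this is a rough illustration and not how any particular company does it: the function `phones_nearby`, the ten-minute window, and all the phone and tower names are assumptions.

```python
from datetime import datetime, timedelta


def phones_nearby(target_id, pings, window=timedelta(minutes=10)):
    """Return IDs of phones seen at the same tower as `target_id` within `window`.

    `pings` is a list of (phone_id, timestamp, tower_id) tuples.
    """
    target_pings = [(t, tower) for phone, t, tower in pings if phone == target_id]
    nearby = set()
    for phone, t, tower in pings:
        if phone == target_id:
            continue
        for target_time, target_tower in target_pings:
            if tower == target_tower and abs(t - target_time) <= window:
                nearby.add(phone)
                break
    return nearby


# Made-up pings from four phones on the same day.
pings = [
    ("journalist", datetime(2019, 12, 20, 19, 0), "park_tower"),
    ("source", datetime(2019, 12, 20, 19, 4), "park_tower"),
    ("stranger", datetime(2019, 12, 20, 12, 0), "park_tower"),
    ("commuter", datetime(2019, 12, 20, 19, 0), "motorway_tower"),
]

contacts = phones_nearby("journalist", pings)
```

Here only the source was at the same tower at roughly the same time, so the source is the one who gets tagged.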

 

F: I get it. So, compared to the content of my c