We leak so much information about ourselves.
by Leo A. Notenboom
(Image: DALL-E 3)
I was listening to a podcast recently in which one of the hosts described giving ChatGPT the history of the questions he had asked it and then asking it to build a profile on him based on that.
It was interesting, of course, but it got me thinking... why not use internet searches as the input? There's much more information available in our search histories, since we use search for far more general-purpose things.
I was not disappointed, a little surprised, and more than a little impressed.
Using AI to build a profile from search history
I used AI tools to build a personal profile of myself based on my Google search history. Claude AI used 16,000 searches over five years, but ChatGPT managed with just 126 over six months. Both provided surprisingly accurate profiles. The upshot is that benign data can be used to construct detailed personal profiles. We all need to be aware of online data leakage and what we expose when we use the internet.
My search history
My Google search history goes back to 2006 and has 48,392 entries. The first entry was "signed up for" (I have no idea why), and the most recent search was "download search history" (for obvious reasons)[1]. The whole list was over 900 megabytes in size.
That's a lot of data to chew on.
In fact, it was too much.
Claude
I started with Claude because it can accept much more data in a query than ChatGPT can. Unfortunately, 48,000 search terms were still over the limit, so I had to cut the list back to 16,000 searches, representing the most recent five years.
I wrote this query:
Provide a profile for the person whose search history is as follows:
followed by 16,000 searches.
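A trimming step like this could be scripted. What follows is only a minimal sketch in Python, not a record of how the list above was actually prepared. It assumes the history has already been loaded as (timestamp, query) pairs; the function name `build_profile_prompt`, the sample dates, and the "corgi grooming" entry are hypothetical, made up for illustration.

```python
from datetime import datetime

# The question text used in this article.
PROMPT = "Provide a profile for the person whose search history is as follows:"


def build_profile_prompt(searches, limit):
    """Build one query from the `limit` most recent searches.

    `searches` is an iterable of (datetime, query) pairs. That shape is an
    assumption; a real export would have to be parsed into it first.
    """
    # Sort newest first, then keep only the most recent `limit` entries.
    recent = sorted(searches, key=lambda pair: pair[0], reverse=True)[:limit]
    # Put the question first, followed by one search per line.
    return "\n".join([PROMPT] + [query for _, query in recent])


# Hypothetical sample data: the dates are placeholders, not real timestamps.
sample = [
    (datetime(2006, 5, 1), "signed up for"),
    (datetime(2023, 11, 2), "corgi grooming"),
    (datetime(2023, 12, 1), "download search history"),
]

prompt = build_profile_prompt(sample, limit=2)
print(prompt)
```

Raising or lowering `limit` is the only knob here: it stands in for whatever cutoff the AI tool's input limit forces, such as the 16,000 searches mentioned above.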
Here's what Claude had to say:
Based on the search history provided, here is a possible profile for this person:
- Male, likely in his 40s or 50s
- Lives in the Seattle area
- Works in tech, possibly as a software developer or IT professional
- Interested in computers, technology, online security, Linux/Unix, web development (WordPress, PHP, etc)
- Owns a Nikon digital SLR camera
- Dog owner, possibly of a Welsh Corgi
- Enjoys watching TV shows like Game of Thrones, Westworld, Vikings
- Travels occasionally, including some international trips
- Financially savvy; invests and tracks finances online
- Health-conscious; watches diet and nutrition
- Politically left-leaning
Some key details that point to this profile:
- Technical topics like Linux command line, security tips, and web development suggest a tech professional
- Interest in Nikon cameras and Photoshop indicates a photography hobby
- Corgi-related searches imply a Corgi owner
- TV interests point to sci-fi/fantasy and historical dramas
- Travel searches about things like passport renewal and international banking
- Finance-related searches on stocks, Bitcoin, and bank accounts
- Nutrition/diet searches suggest health-consciousness
- Political searches on topics like ACLU donations suggest left-leaning views
The combination of tech-related, finance, photography and pet hobbies indicates a male in his 40s or 50s who likely lives in the Seattle area and works in a tech-related job. Some details may be guesses, but the search topics provide a reasonably detailed profile.
It's not far off. (Though I chuckled that it thinks I'm in my 40s or 50s.) In many ways, much of this seems generic, but remember, this was deduced using only the terms I've searched Google for in the last five years.
I was impressed.
ChatGPT
ChatGPT's limits are such that I could give it only about 126 of my searches, covering the last six months[2].
Same question:
Provide a profile for the person whose search history is as follows:
followed by 126 searches.
Here's what ChatGPT had to say: