Following the people and events that make up the research community at Duke

Category: Artificial Intelligence

New Blogger Shariar Vaez-Ghaemi: Arts and Artificial Intelligence


Hi! My name is Shariar. My friends usually pronounce that as Shaw-Ree-Awr, and my parents pronounce it as Share-Ee-Awr, but feel free to mentally process my name as “Sher-Rye-Eer,” “Shor-yor-ior-ior-ior-ior,” or whatever phonetic concoction your heart desires. I always tell people that there’s no right way to interpret language, especially if you’re an AI (which you might be).

Speaking of AI, I’m excited to study statistics and mathematics at Duke! This dream was born out of my high school research internship with New York Times bestselling author Jonah Berger, through which I immersed myself in the applications of machine learning to the social sciences. Since Dr. Berger and I completed our ML-guided study of the social psychology of communicative language, I’ve injected statistical learning techniques into my investigations of political science, finance, and even fantasy football.

Unwinding in the orchestra room after a performance

When I’m not cramped behind a Jupyter Notebook or re-reading a particularly long research abstract for the fourth time, I’m often pursuing a completely different interest: the creative arts. I’m an orchestral clarinetist and quasi-jazz pianist by training, but my proudest artistic endeavours have involved cinema. During high school, I wrote and directed three short films, including a post-apocalyptic dystopian comedy and a silent rendition of the epic poem “Epopeya de la Gitana.”

I often get asked whether there’s any bridge between machine learning and the creative arts*, to which the answer is yes! In fact, as part of my entry project for Duke-based developer team Apollo Endeavours, I created a statistical language model that writes original poetry. Wandering Mind, as I call the system, is just one example of the many ways that artificial intelligence can do what we once considered exclusively human tasks. The program isn’t quite as talented as Frost or Dickinson, but it’s much better at writing poetry than I am.

In a movie production (I’m the one wearing a Totoro onesie)

I look forward to presenting invigorating research topics to blog readers for the next year or more. Though machine learning is my scientific expertise, my investigations could transcend all boundaries of discipline, so you may see me passionately explaining biology experiments, environmental studies, or even macroeconomic forecasts. Go Blue Devils!

(* In truth, I almost never get asked this question by real people unless I say, “You know, there’s actually a connection between machine learning and arts.”)

By Shariar Vaez-Ghaemi, Class of 2025

‘Anonymous Has Viewed Your Profile’: All Networks Lead to Re-Identification


For half an hour this rainy Wednesday, October 6th, I logged on to a LinkedIn Live series webinar with Dr. Jiaming Xu from the Fuqua School of Business. I sat inside the bridge between Perkins and Bostock, my laptop connected to DukeBlue wifi. I had Instagram open on my phone and was tapping through friends’ stories while I waited for the broadcast to start. I had Google Docs open in another tab to take notes. 

The title of the webinar was “Can Anyone Truly Be Anonymous Online?” 

Xu spoke about “network privacy,” which is “the intersection of network analysis and data privacy.” When you make an account, connect to wifi, share your location, search something online, or otherwise hint at your personal information, you are creating a “user profile”: a network of personal data that hints at your identity. 

You are probably familiar with how social media companies track your decisions to curate a more engaging experience for you (i.e., the reason I scroll through TikTok for 5 minutes, then 30 minutes, then… Oh no! Two hours have gone by). Other companies track other kinds of data, and that data isn’t always just for algorithmic manipulation or creepy-accurate Amazon ads (e.g., “Hey! I was just thinking about buying cat litter. How did Mr. Bezos know?”). Your name, work history, date of birth, address, location, and other critical identifying factors can be collected even if you think your profile is scrubbed clean. In a rather on-the-nose anecdote for his LinkedIn audience on Wednesday, Xu explained that in April 2021, data from over 500 million LinkedIn user profiles was scraped and put up for sale online. Valuable, “sensitive, work-related data,” he noted, was made vulnerable.

Image courtesy of Flickr

So, what do you have to worry about? I tend not to worry about my personal information online; letting companies collect my data benefits me. I get targeted Google ads about things I’m interested in and cool filters on Snapchat. In a medical setting, Xu said, prediction algorithms may improve patients’ health in the long run. But even anonymized and sanitized data can be traced back to you. For further reading: in an essay published in July 2021, philosophers Evan Selinger and Judy Rhee elaborate on the dangers of “normalizing surveillance.”

The meat of Xu’s talk was how your data can be traced back to you. Xu gave three examples. 

The first was a study by researchers at the University of Texas at Austin who attempted to identify the users behind “anonymous” movie ratings released by Netflix (keep in mind this was 2007, so picture the red Netflix logo on the DVD box accordingly). To do this, they cross-referenced the ratings Netflix published with the public profiles of people signed up on IMDb, matching users who rated the same movies similarly on both platforms. You can read more about that specific study here. (For those unafraid of the full research paper, click here.)
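Curious what matching on a signature might look like? Below is a toy Python sketch, assuming each record is just a dictionary of movie titles and star ratings. It is a simplified illustration, not the researchers’ actual algorithm (their method was more careful, for example giving extra weight to obscure movies). The function names, the one-star tolerance, the margin, and the sample data are all made up for this example.

```python
from typing import Dict, Optional

# Hypothetical record format (an assumption for this sketch): movie title -> star rating (1 to 5).
Ratings = Dict[str, int]


def similarity(anon: Ratings, public: Ratings, tolerance: int = 1) -> int:
    """Count movies both records rated, where the ratings differ by at most `tolerance` stars."""
    shared = anon.keys() & public.keys()
    return sum(1 for movie in shared if abs(anon[movie] - public[movie]) <= tolerance)


def best_match(anon: Ratings, profiles: Dict[str, Ratings], margin: int = 2) -> Optional[str]:
    """Return the public profile most similar to `anon`, but only if it clearly beats the runner-up."""
    scores = sorted(((similarity(anon, r), name) for name, r in profiles.items()), reverse=True)
    if not scores or scores[0][0] == 0:
        return None  # Nothing in common with anyone.
    runner_up = scores[1][0] if len(scores) > 1 else 0
    if scores[0][0] - runner_up >= margin:
        return scores[0][1]
    return None  # Too close to call, so refuse to guess.


# Toy data (made up): public IMDb-style profiles and one "anonymous" Netflix-style record.
imdb = {
    "alice_imdb": {"Memento": 5, "Shrek": 2, "Heat": 4, "Clue": 3},
    "bob_imdb": {"Memento": 1, "Shrek": 5, "Amelie": 4},
}
anon_record = {"Memento": 5, "Shrek": 3, "Heat": 4, "Clue": 2}

print(best_match(anon_record, imdb))  # alice_imdb
```

The `margin` check is a design choice: the sketch only names a match when the top candidate clearly beats the runner-up, which cuts down on confident-but-wrong guesses.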

Let’s take a pause to learn a new vocab word! “Signatures.” In this example, the signature was users’ movie ratings. See if you can name the signature in the other two examples.

The second example came from the same researchers. To identify users in an anonymized Twitter network, they cross-referenced it with the network of Flickr users. If you know a guy who knows a guy who knows a guy who knows a guy, you and that group of people are likely to follow each other in the same chain on every social media platform you use (it may remind you of the theory that you are connected to every person on the planet by “six degrees of separation,” which, as it turns out, is also supported by social media data). The researchers were able to identify the correct users 30.8% of the time.

Time for another vocab break! The users who link these chains together, and whose identities the researchers already know on both platforms, are called “seeds.” They are the starting points from which the rest of the match is built. Speaking of which, did you identify the signature in this example?
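Here is a rough Python sketch of how seeds can be put to work, assuming each network is stored as a dictionary from a user to the set of people they are connected to. Starting from the seed pairs, it repeatedly matches an unknown user to whichever unclaimed account on the other platform shares the most already-matched friends. This is a simplified illustration in the spirit of the study, not the researchers’ exact algorithm. `propagate`, `min_support`, and the toy accounts are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # Assumed format: user -> set of connected users


def build(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected graph from a list of connected pairs."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def propagate(g1: Graph, g2: Graph, seeds: Dict[str, str], min_support: int = 2) -> Dict[str, str]:
    """Grow a g1 -> g2 mapping outward from known seed pairs (hypothetical helper)."""
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        used = set(mapping.values())
        for u in g1:
            if u in mapping:
                continue
            # Translate u's already-matched neighbours into their g2 names.
            mapped_nbrs = {mapping[n] for n in g1[u] if n in mapping}
            # Pick the unclaimed g2 account sharing the most of those neighbours.
            best, best_score = None, 0
            for v in g2:
                if v in used:
                    continue
                score = len(g2[v] & mapped_nbrs)
                if score > best_score:
                    best, best_score = v, score
            # Only accept a match backed by enough shared neighbours.
            if best is not None and best_score >= min_support:
                mapping[u] = best
                used.add(best)
                changed = True
    return mapping


# Toy data (made up): an anonymized "Twitter" graph and a named "Flickr" graph.
twitter = build([("t1", "t2"), ("t1", "t4"), ("t1", "t5"),
                 ("t2", "t4"), ("t3", "t4"), ("t4", "t5")])
flickr = build([("alice", "bob"), ("alice", "dave"), ("alice", "erin"),
                ("bob", "dave"), ("carol", "dave"), ("dave", "erin")])
seeds = {"t1": "alice", "t2": "bob", "t3": "carol"}  # Identities known in advance

result = propagate(twitter, flickr, seeds)
print(result["t4"], result["t5"])  # dave erin
```

Raising `min_support` makes the sketch more cautious: with `min_support=4`, neither unknown account in this toy example gets matched, and only the seeds remain.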

Image courtesy of Flickr

The third and final example was my personal favorite because it was the funkiest and most creative. Facebook user data, also “scrubbed clean” before being sold to third-party advertisers, was overlaid with LinkedIn user data to reveal connections that repeat across both networks. How did they match up those networks, you ask? First, the algorithm gave every user a popularity score based on how many Facebook friends they have, and another based on how many LinkedIn connections they have. Then, each user was assigned a list of integers: the popularity scores of their friends. Bet you weren’t expecting that.
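To make the “list of integers” idea concrete, here is a toy Python sketch. It assumes a user’s popularity score is simply their friend count, builds each user’s signature from their friends’ scores, and greedily pairs each Facebook user with the LinkedIn user whose signature is closest. This illustrates the general idea only, not the exact method Xu described. The helper names, the distance measure, and the made-up accounts are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # Assumed format: user -> set of friends/connections


def build(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected friendship graph from a list of pairs."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def signature(g: Graph, user: str) -> List[int]:
    """The user's friends' popularity scores (assumed here to be friend counts), largest first."""
    return sorted((len(g[friend]) for friend in g[user]), reverse=True)


def distance(sig_a: List[int], sig_b: List[int]) -> int:
    """Position-by-position difference, padding the shorter signature with zeros."""
    length = max(len(sig_a), len(sig_b))
    a = sig_a + [0] * (length - len(sig_a))
    b = sig_b + [0] * (length - len(sig_b))
    return sum(abs(x - y) for x, y in zip(a, b))


def match_users(g1: Graph, g2: Graph) -> Dict[str, str]:
    """Greedily pair each g1 user with the unused g2 user whose signature is closest."""
    sigs2 = {v: signature(g2, v) for v in g2}
    used: Set[str] = set()
    matches: Dict[str, str] = {}
    for u in g1:
        candidates = [v for v in g2 if v not in used]
        if not candidates:
            break
        sig_u = signature(g1, u)
        best = min(candidates, key=lambda v: distance(sig_u, sigs2[v]))
        matches[u] = best
        used.add(best)
    return matches


# Toy data (made up): a "Facebook" graph and a relabeled "LinkedIn" copy of the same friendships.
facebook = build([("ana", "ben"), ("ben", "cy"), ("cy", "dee"),
                  ("dee", "eli"), ("cy", "flo"), ("dee", "flo")])
linkedin = build([("L4", "L1"), ("L1", "L6"), ("L6", "L2"),
                  ("L2", "L5"), ("L6", "L3"), ("L2", "L3")])

print(match_users(facebook, linkedin))
# {'ana': 'L4', 'ben': 'L1', 'cy': 'L6', 'dee': 'L2', 'eli': 'L5', 'flo': 'L3'}
```

In this toy case the LinkedIn graph is a perfect relabeled copy, so every signature lines up exactly. Real Facebook and LinkedIn networks only partly overlap, so a practical version would need a more forgiving comparison, and greedy pairing like this can let one early mistake cascade.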

This method improves on the Twitter/Flickr approach: in addition to overlaying networks and chains of users, it does a better job of matching who is who. Because you are likely to know a guy who knows a guy who knows a guy, and also likely to know all of those guys down the line, following specific chains does not always pin down who is who. Unlike the seeds signature, the friends’ popularity signature was able to correctly re-identify users most of the time.

Sitting in the bridge Wednesday, I was connected to many networks that I wouldn’t think could be used to identify me through my limited public data. Now, I’m not so sure.

So, what’s the lesson here? At the very least, it was fun to learn about, even if the ultimate realization leaves us powerless against big data analytics. Your data has monetary value, and it is not as secure as you think, but it may be worth asking whether we even have the ability to protect our anonymity.
