Following the people and events that make up the research community at Duke

Students exploring the Innovation Co-Lab

Category: Artificial Intelligence

Student Researchers Share What They Know About AI and Health

The healthcare industry and academic medicine are excited about the potential for artificial intelligence — really clever computers — to make our care better and more efficient.

The students from Duke’s Health Data Science (HDS) and AI Health Data Science Fellowship who presented their work at the 2022 Duke AI Health Poster Showcase on Dec. 6 did an excellent job explaining their research findings to someone like me, who knows very little about artificial intelligence and how it works. Here’s what I learned:

Artificial intelligence is a way of training computer systems to complete complex tasks that ordinarily require human thinking, like visual categorization, language translation, and decision-making. Several different forms of artificial intelligence were presented that do healthcare-related things like sorting images of kidney cells, measuring the angles of a joint, or classifying brain injury in CT scans.

Talking to the researchers made it clear that this technology is mainly intended to supplement experts by saving them time or providing clinical decision support.

Meet Researcher Akhil Ambekar

Akhil standing next to his poster “Glomerular Segmentation and Classification Pipeline Using NEPTUNE Whole Slide Images”

Akhil Ambekar and team developed a pipeline to automate the classification of glomerulosclerosis, or scarring of the filtering part of the kidneys, using microscopic biopsy images. Conventionally, this kind of classification is done by a pathologist. It is time-consuming and limited in terms of accuracy and reproducibility of observations. This AI model was trained by providing it with many questions and corresponding answers so that it could learn how to correctly answer questions. A real pathologist oversaw this work, ensuring that the computer’s training was accurate.
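The "questions and answers" style of training described above is usually called supervised learning. As a rough sketch only, and not the team's actual pipeline, the toy classifier below learns from labeled examples by averaging each label's feature vectors, then labels a new example by its nearest average. The two numeric features, the label names, and every value are invented for illustration.

```python
from math import dist


def train(examples):
    """Average the feature vectors of each label (the 'answers')."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose average example is closest to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: (feature vector, label) pairs.
training_examples = [
    ((0.9, 0.8), "sclerosed"),
    ((0.8, 0.9), "sclerosed"),
    ((0.2, 0.1), "non-sclerosed"),
    ((0.1, 0.3), "non-sclerosed"),
]
centroids = train(training_examples)
prediction = predict(centroids, (0.85, 0.7))
```

A real image pipeline would learn from pixels with a far larger model, but the loop is the same: show labeled examples, then ask about an unlabeled one.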

Akhil’s findings suggest that this is a feasible approach for machine classification of glomerulosclerosis. I asked him how this research might be used in medicine and learned that a program like this could save expert pathologists a lot of time.

What was Akhil’s favorite part of this project? Engaging in research, experimenting with Python and running different models, trying to find what works best.

Meet Researcher Irene Tanner

Irene Tanner and her poster, “Developing a Deep Learning Pipeline to Measure the Hip-Knee-Ankle Angle in Full Leg Radiographs”

The research Irene Tanner and her team have done aims to develop a deep learning-based pipeline to calculate hip-knee-ankle angles from full leg x-rays. This work is currently in progress, but preliminary results suggest the model can precisely identify points needed to calculate the angles of hip to knee to ankle. In the future, this algorithm could be applied to predict outcomes like pain and physical function after a patient has a joint replacement surgery.
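Once a model has located the three landmark points, the angle itself is plain geometry. The sketch below assumes the hip-knee-ankle angle is measured at the knee, between the line to the hip and the line to the ankle, so that a perfectly straight leg reads 180 degrees. The function name and the pixel coordinates are hypothetical, and the team's own pipeline may define or compute the angle differently.

```python
import math


def hip_knee_ankle_angle(hip, knee, ankle):
    """Angle in degrees at the knee between the knee-to-hip and knee-to-ankle lines."""
    to_hip = (hip[0] - knee[0], hip[1] - knee[1])
    to_ankle = (ankle[0] - knee[0], ankle[1] - knee[1])
    dot = to_hip[0] * to_ankle[0] + to_hip[1] * to_ankle[1]
    norms = math.hypot(*to_hip) * math.hypot(*to_ankle)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cosine))


# Hypothetical (x, y) pixel coordinates of the three joint centers.
straight_leg = hip_knee_ankle_angle((100, 0), (100, 400), (100, 800))
bowed_leg = hip_knee_ankle_angle((100, 0), (120, 400), (100, 800))
```

In this made-up example, shifting the knee 20 pixels sideways drops the angle from 180 degrees to a little over 174.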

What was Irene’s favorite part of this project? Developing a relationship with her mentor, Dr. Maggie Horn, who she said provided endless support whenever help was needed.

Meet Researcher Brian Lerner

Brian Lerner and his poster, “Using Deep Learning to Classify Traumatic Brain Injury in CT Scans”

Brian Lerner and his team investigated the application of deep learning to standardize and sharpen diagnoses of traumatic brain injury (TBI) from Computerized Tomography (CT) scans of the brain. Preliminary findings suggest that the model used (simple slice) is likely not sufficient to capture the patterns in the data. However, future directions for this work might examine how the model could be improved. Through this project, Brian had the opportunity to shadow a neurosurgeon in the ER and speculated upon many possibilities for the use of this research in the field.

What was Brian’s favorite part of this project? Shadowing neurosurgeon Dr. Syed Adil at Duke Hospital and learning what the real-world needs for this science are.

Many congratulations to all who presented at this year’s AI Health Poster Showcase, including the many not featured in this article. A big thanks for helping me to learn about how AI Health research might be transformative in answering difficult problems in medicine and population health.

By Victoria Wilson, Class of 2023

2019 Duke Grad Founds Cryptocurrency Startup Fei Protocol

As cryptocurrency gains popularity, people continue to question, “How and where can these tokens be used?” A November 2021 study by Pew Research reported that 86% of Americans said they had heard about cryptocurrency and 16% said they personally had invested in, traded, or otherwise used it.

Despite this, there are still very few places where one can make purchases directly using crypto. This means that in order to use cryptocurrency, people must first convert it back to US dollars, which can cost a lot due to transaction fees. Additionally, the exchange rate between any given crypto token and USD changes by the second, resulting in a lack of price stability.

(If you are unfamiliar with cryptocurrency or transaction (gas) fees please refer to my prior article here.)

Duke alum Joey Santoro sensed this gap and saw an opportunity. Santoro graduated from Duke in 2019 with a major in Computer Science. There needed to be a volatility-free token with a stable valuation (i.e., matching the USD) to move between the worlds of crypto and fiat currency. This is also known as a stablecoin. While several were already in existence, Santoro wanted to create a more scalable and decentralized one.

Thus, in December of 2020, Joey founded the Fei Protocol. Fei is a stablecoin in Ethereum-native decentralized finance (DeFi). Stablecoins are a type of token that aids in maintaining a liquid market by pegging the token’s value to the USD. Fei is able to achieve this through various stability mechanisms. Stablecoins can be used for real-life transactions while still benefiting from instant processing and the security of cryptocurrency payments.

When asked why he chose to work in crypto as opposed to Machine Learning (ML) or Artificial Intelligence (AI) Joey explained that it came down to how much impact he could have.

“The barrier for making an avenue of innovation in crypto is so much lower than something like a machine learning. Higher risk, higher reward.”

Joey Santoro

Santoro did not come to Duke with the plan of founding a web3 DeFi protocol. In fact, when he matriculated he was actually pre-med and originally only took CS 101 because it was a pre-requisite for the Neuroscience major.

However, it did not take long for Joey to realize he wanted to work in the crypto space. In his second semester, he joined the Duke Blockchain Lab and ended up teaching a blockchain course in his junior and senior years.

Because decentralized finance is still so new, no one completely knows what they are doing, which creates considerable opportunities for innovation. Additionally, because the crypto space is decentralized, it is inherently collaborative and community-driven. 

“Being able to write code that’s immediately interoperable with dozens of financial protocols is the coolest thing ever,” Santoro said.

Joey argues anyone can become an expert in a particular area in crypto in a couple of months. He said economists and mechanism designers are increasingly moving into the crypto space. 

When the Fei Protocol launched in 2020, it was the height of a bull market for crypto and there was heavy demand for a decentralized stablecoin. While there were several other stablecoins in existence, USDC and Tether were the most popular, and they were both centralized, meaning they were owned by companies.

“What’s so important to me and why I do this is because I want people to be able to do whatever they want with their money.”

Joey Santoro

The demand for a decentralized stablecoin created excitement around Fei but also a highly compressed timescale. The Fei Protocol ended up having the largest token launch for an Ethereum DeFi protocol in history, raising $1.25B. However, when it launched, the peg broke due to issues with the incentive mechanism and bugs in the code.

Santoro recalled the surreal and challenging experience of watching the protocol he spent countless weeks working on fall apart before his eyes. However, his team and investors decided to stick it through and try to salvage what they had built. It took over a month just to fix everything that had gone wrong. In the meantime, people were threatening Santoro and his team. 

While the Fei Protocol faced challenges at launch, Joey and his team were able to adapt, learn from their mistakes, and come back stronger. They recently conducted a multi-billion-dollar merger with Rari Capital and launched Fei version 2 (V2).

This is also the first multi-billion-dollar merger in DeFi, and the decision to merge was voted on by members of the respective Decentralized Autonomous Organizations (DAOs). This is a huge milestone in the world of DeFi and sets a precedent for the potential of decentralized business operations.

Joey Santoro Presenting at the ETHDenver Convention

Moving forward, Joey explained, “I’m obsessed with simplicity now; I still move fast but more carefully.”

Post by Anna Gotskind, Class of 2022

Opening the Black Box: Duke Researchers Discuss Bias in AI

Artificial intelligence has not only inherited many of the strongest capabilities of the human brain, but it has also proven to use them more efficiently and effectively. Object recognition, map navigation, and speech translation are just a few of the many skills that modern AI programs have mastered, and the list will not stop growing anytime soon.

Unfortunately, AI has also magnified one of humanity’s least desirable traits: bias. In recent years, algorithms influenced by bias have often caused more problems than they sought to fix.

When Google’s image recognition AI was found to be classifying some Black people as gorillas in 2015, the only consolation for those affected was that AI is improving at a rapid pace, and thus, incidents of bias would hopefully begin to disappear. Six years later, when Facebook’s AI made virtually the exact same mistake by labeling a video of Black men as “primates,” both tech fanatics and casual observers could see a fundamental flaw in the industry.

Jacky Alciné’s tweet exposing Google’s racist AI algorithm enraged thousands in 2015.


On November 17th, 2021, two hundred Duke alumni living in all corners of the world – from Pittsburgh to Istanbul and everywhere in between – assembled virtually to learn about the future of algorithms, AI, and bias. The webinar, which was hosted by the Duke Alumni Association’s Forever Learning Institute, gave four esteemed Duke professors a chance to discuss their view of bias in the artificial intelligence world.

Dr. Stacy Tantum, Bell-Rhodes Associate Professor of the Practice of Electrical and Computer Engineering, was the first to mention the instances of racial bias in image classification systems. According to Tantum, early facial recognition did not work well for people of darker skin tones because the underlying training data – observations that inform the model’s learning process – did not have a broad representation of all skin tones. She further echoed the importance of model transparency, noting that if an engineer treats an AI as a “black box” – or a decision-making process that does not need to be explained – then they cannot reasonably assert that the AI is unbiased.

Stacy Tantum, who has introduced case studies on ethics to students in her Intro to Machine Learning Class, echoes the importance of teaching bias in AI classrooms.

While Tantum emphasized the importance of supervision of algorithm generation, Dr. David Hoffman – Steed Family Professor of the Practice of Cybersecurity Policy at the Sanford School of Public Policy – explained the integration of algorithm explainability and privacy. He pointed to the emergence of regulatory legislation in other countries that ensures restrictions, accountability, and supervision of personal data in cybersecurity applications. Said Hoffman, “If we can’t answer the privacy question, we can’t put appropriate controls and protections in place.”

To discuss the implications of blurry privacy regulations, Dr. Manju Puri – J.B. Fuqua Professor of Finance at the Fuqua School of Business – discussed how the big data feeding modern AI algorithms impacts each person’s digital footprint. Puri noted that data about a person’s phone usage patterns can be used by banks to decide whether that person should receive a loan. “People who call their mother every day tend to default less, and people who walk the same path every day tend to default less.” She contends that the biggest question is how to behave in a digital world where every action can be used against us.

Dr. Philip Napoli has observed behaviors in the digital world for several years as James R. Shepley Professor of Public Policy at the Sanford School, specifically focusing on self-reinforcing cycles of social media algorithms. He contends that Facebook’s algorithms, in particular, reward content that gets people angry, which motivates news organizations and political parties to post galvanizing content that will swoop through the feeds of millions. His work shows that AI algorithms can impact the behaviors not only of individuals but also of massive organizations.

At the end of the panel, there was one firm point of agreement between all speakers: AI is tremendously powerful. Hoffman even contended that there is a risk associated with not using artificial intelligence, which has proven to be a revolutionary tool in healthcare, finance, and security, among other fields. However, while proven to be immensely impactful, AI is not guaranteed to have a positive impact in all use cases – rather, as shown by failed image recognition platforms and racist healthcare algorithms that impacted millions of Black people, AI can be incredibly harmful.

Thus, while many in the AI community dream of a world where algorithms can be an unquestionable force for good, the underlying technology has a long way to go. What stands between the status quo and that idealistic future is not more data or more code, but less bias in data and code.

Post by Shariar Vaez-Ghaemi, Class of 2025


Decentralized Finance and the Power of Smart Contracts

When people use apps or services like Netflix, Instagram, Amazon, etc. they sign, or rather virtually accept, digital user agreements. Digital agreements have been around since the 1990s. These agreements are written and enforced by the institutions that create these services and products. However, in certain conditions, these systems fail and these digital or service-level agreements can be breached, causing people to feel robbed. 

A recent example of this is the Robinhood scandal that occurred in early 2021. Essentially, people came together and all wanted to buy the same stock. However, Robinhood ended up restricting buying, citing issues with volatile stock and regulatory agreements. As a result, they ended up paying $70 million in fines for system outages and misleading customers, and individual customers were left feeling robbed. This was partially the result of centralization and Robinhood having full control over the platform as well as enforcing the digital agreement.

Zak Ayesh Presenting on Chainlink and Decentralized Smart Contracts

Zak Ayesh, a developer advocate at Chainlink, recently came to Duke to talk about decentralized smart contracts that could solve many of the problems with current centralized digital agreements and traditional paper contracts as well.

What makes smart contracts unique is that they programmatically implement a series of if-then rules without the need for third-party human interaction. While they are currently used primarily on blockchains, the idea was actually first proposed by computer scientist Nick Szabo in 1994. Most smart contracts now run on blockchains because this allows them to remain decentralized and transparent. If you are unfamiliar with blockchain, refer to my previous article here.

Smart contracts are self-executing contracts with the terms of the agreement being directly written into computer code.

Zak Ayesh
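To make the "if-then rules" idea concrete, here is a minimal sketch of an escrow agreement written as ordinary Python. It is only an analogy: a real smart contract would be deployed on a blockchain rather than run on one person's computer, and the class, its method names, and its refund rule are all assumptions made up for this illustration.

```python
class EscrowContract:
    """Toy stand-in for a smart contract: fixed if-then rules, no human referee."""

    def __init__(self, buyer, seller, price):
        self.buyer, self.seller, self.price = buyer, seller, price
        self.deposited = 0
        self.delivered = False
        self.log = []  # record of every action, mimicking a public ledger

    def deposit(self, sender, amount):
        if sender != self.buyer or amount != self.price:
            raise ValueError("only the buyer may deposit the exact price")
        self.deposited = amount
        self.log.append(("deposit", sender, amount))

    def confirm_delivery(self):
        self.delivered = True
        self.log.append(("delivered",))

    def settle(self):
        """If paid and delivered, pay the seller; if only paid, refund the buyer."""
        if self.deposited == self.price and self.delivered:
            payout = (self.seller, self.deposited)
        elif self.deposited:
            payout = (self.buyer, self.deposited)
        else:
            return None  # nothing is held, so there is nothing to pay out
        self.deposited = 0
        self.log.append(("payout",) + payout)
        return payout


contract = EscrowContract("alice", "bob", 100)
contract.deposit("alice", 100)
contract.confirm_delivery()
result = contract.settle()
```

Neither party decides who gets paid. The rules in `settle` do, and every step lands in the log.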

There are several benefits to decentralized contracts. The first is transparency. Because every action on a blockchain is recorded and publicly available, the enforcement of smart contracts is unavoidably built in. Next is trust minimization and guaranteed execution. Smart contracts reduce counterparty risk, the probability that one party involved in a transaction or agreement might default on its contractual obligation, because neither party has control of the agreement’s execution or enforcement. Lastly, they are more efficient due to automation. Operating on blockchains allows for cheaper and more frictionless transactions than traditional alternatives. For instance, the complexities of cross-border remittances involving multiple jurisdictions and sets of legal compliances can be simplified through coded automation in smart contracts.

Dr. Campbell Harvey, the J. Paul Sticht Professor of International Business at Fuqua, has done considerable research on smart contracts as well, culminating in the publication of a book, DeFi and the Future of Finance, which was released in the fall of 2021.

In the book, Dr. Harvey explores the role smart contracts play in decentralized finance and how Ethereum and other smart contract platforms give rise to decentralized applications, or dApps. Smart contracts can only exist as long as the chain or platform they live on exists. However, because these platforms are decentralized, they remove the need for a third party to mediate the agreement. Harvey quickly realized how beneficial this could be in finance, where third-party companies, like banks, mediate agreements at a high price. Removing those intermediaries is the idea behind decentralized finance, or DeFi.

“Because it costs no more at an organization level to provide services to a customer with $100 or $100 million in assets, DeFi proponents believe that all meaningful financial infrastructure will be replaced by smart contracts which can provide more value to a larger group of users,” Harvey explains in the book.

Beyond improving efficiency, this also creates greater accessibility to financial services. Smart contracts provide a foundation for DeFi by eliminating the middleman through publicly traceable coded agreements. However, the transition will not be completely seamless and Harvey also investigates the risks associated with smart contracts and advancements that need to be made for them to be fully scalable.

Ultimately, there is a smart contract connectivity problem. Essentially, smart contracts are unable to connect with external systems, data feeds, application programming interfaces (APIs), existing payment systems, or any other off-chain resource on their own. This is something called the Oracle Problem which Chainlink is looking to solve.

Harvey explains that when a smart contract is facilitating an exchange between two tokens, it determines the price by comparing exchange rates with another similar contract on the same chain. The other smart contract is therefore acting as a price oracle, meaning it is providing external price information. However, there are many opportunities to exploit this, such as purchasing large amounts on one oracle exchange in order to alter the price and then purchasing even more on a different exchange in the opposite direction. This allows an attacker to capitalize on price movement by manipulating the information the oracle communicates to other smart contracts or exchanges.
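A small worked example shows why one large purchase can distort an oracle's price. The sketch below assumes the oracle exchange prices tokens with a constant-product rule (the two reserves always multiply to the same number), which some decentralized exchanges use. The article does not say which pricing rule the contracts in question follow, and the pool class and reserve sizes are invented.

```python
class ConstantProductPool:
    """Toy exchange pool whose reserves satisfy token_reserve * usd_reserve = k."""

    def __init__(self, token_reserve, usd_reserve):
        self.token_reserve = float(token_reserve)
        self.usd_reserve = float(usd_reserve)

    def spot_price(self):
        """Price the pool would report if used as an oracle (USD per token)."""
        return self.usd_reserve / self.token_reserve

    def buy_tokens(self, usd_in):
        """Swap USD for tokens while keeping the product of the reserves constant."""
        k = self.token_reserve * self.usd_reserve
        new_usd = self.usd_reserve + usd_in
        new_tokens = k / new_usd
        tokens_out = self.token_reserve - new_tokens
        self.token_reserve, self.usd_reserve = new_tokens, new_usd
        return tokens_out


# Hypothetical pool: 1,000 tokens against 1,000 USD, so 1 USD per token.
pool = ConstantProductPool(token_reserve=1_000, usd_reserve=1_000)
price_before = pool.spot_price()
tokens_bought = pool.buy_tokens(1_000)
price_after = pool.spot_price()
```

In this made-up pool, a single 1,000 USD purchase moves the reported price from 1 USD to 4 USD per token. Any contract that trusts that number in the same moment can be traded against at the distorted rate.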

That being said, smart contracts are being used heavily, and Pratt senior Manmit Singh has been developing them since his freshman year along with some of his peers in the Duke Blockchain Lab. One of his most exciting projects involved developing smart contracts for cryptocurrency-based energy trading on the Ethereum Virtual Machine allowing for a more seamless way to develop energy units.

One example of how this could be used outside of the crypto world is insurance. Currently, when people get into a car accident it takes months or even a year to evaluate the accident and release compensation. In the future, there could be sensors placed on cars connected to smart contracts that immediately evaluate the damage and pay out.

Decentralization allows us to avoid using intermediaries and simply connect people to people or people to information as opposed to first connecting people to institutions that can then connect them to something else. This also allows for fault tolerance: if one blockchain goes down, the entire system does not go down with it. Additionally, because there is no central source controlling the system, it is very difficult to gain control of, which provides attack resistance and collusion resistance. While risks like the oracle problem need to be further explored, the world and importance of DeFi, as well as smart contracts, is only growing.

And as Ayesh put it, “This is the future.”

Post by Anna Gotskind, Class of 2022

Back in Action: HackDuke’s 2021 “Code for Good”

If you walked across Duke’s Engineering Quad between 9AM on Saturday, October 23rd, and 5PM on Sunday, October 24th, the scene might’ve looked like that of any other day: students gathered in small groups, working diligently.

But then you’d see the giant banner and realize something special was afoot. These students were participating in HackDuke’s “Code for Good,” one of the most eminent social good hackathons in the country.

Participants have to “build something, not just an idea,” said Anita Li, co-director of HackDuke. Working in teams, students develop software, hardware, or quantum solutions to problems in one of four tracks: inequality, health, education, and energy and environment.

Participants can win “track prizes,” where $2,400 in total donations are made in winners’ names ($300 for first, $200 for second, $100 for third) to charities doing work in that track. There are other prizes too. Sponsors, including Capital One, Accenture, and Microsoft, give incentives: if participants incorporate their technology or use their database, they’re qualified to win that sponsor’s prize (gift cards, usually, or software worth hundreds of dollars).

This year, Duke’s department of Student Affairs sponsored the health track, in hopes that participants might come up with ideas that could help promote student wellness here at Duke. “It’s a great space for thinking about these issues,” Li said.

Li told me they had more than 1,000 registrations, though there’s always a little less turnout. HackDuke is open to all students and recent graduates, so that “you get to see these cool ideas from everywhere.”

Just under half of this year’s participants were from Duke, almost 10% hailed from UNC, and the rest were from other universities across the US and the world. Thirty percent of participants were women, a significant increase from the last HackDuke covered by the Research Blog, in 2014.

This year is “particularly interesting,” Li said, because of the hybrid model. Last year, everything was virtual. This year, about 300 (vaccinated) students attended in person, making HackDuke one of the few Major League Hacking events with an in-person component this year. With the hybrid model, talks, workshops, and demos are all livestreamed so that no one misses out.

Some social events also had online elements: you could Zoom into the Bob Ross painting session as well as the open mic, which Li said quickly turned into karaoke night. The spicy ramen challenge was “a little harder over Zoom.”

I came across Sydney Wang and Ray Lennon, along with teammate Jean Rabideau, as they were building a web app called JamJar for the Education Track contest. In the app, students give real-time feedback to teachers about how well they’re understanding the material. There are three categories: engagement (you can rank your engagement along a scale from “mentally I’m in outer space” to “locked in”), understanding (“where am I?” to “crystal clear”), and speed (“a glacial pace” to “TOO FAST!”). Student responses get compiled and graphed to show mean markers of understanding over time.
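The "compile and graph" step comes down to averaging responses per time slot. The sketch below is one guess at how that could look, not the team's code: it assumes each response is a (minute, score) pair on a 1-to-5 scale, and both the function name and the sample data are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_by_minute(responses):
    """Group (minute, score) pairs by minute and average each group."""
    buckets = defaultdict(list)
    for minute, score in responses:
        buckets[minute].append(score)
    return {minute: mean(scores) for minute, scores in sorted(buckets.items())}


# Hypothetical understanding scores: 1 = "where am I?", 5 = "crystal clear".
understanding = [(0, 5), (0, 4), (1, 3), (1, 2), (1, 1), (2, 4)]
trend = mean_by_minute(understanding)
```

Plotting `trend` against time would show a teacher the dip at minute 1, which is where the class got lost.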

Lennon said he’s participating because “this is the best way to learn: to be thrown in the fire and have to learn as you go.” Wang felt the same way. She’s new to coding, and feels like she’s learning a lot from Lennon.

Like Lennon and Wang, many participants see HackDuke as an opportunity to learn. There are technical workshops where participants can learn HTML and CSS. There are talks where speakers discuss working in the coding and social good sector. The CTO of change.org, Elaine Zhou, flew to Durham to speak to participants about her experience. So there’s a networking opportunity, too — participants can meet people like Zhou doing the work they want to do, and professors and company representatives who can help them on their journey to get there.

There were challenges. Staying hydrated was one: by Sunday morning, they’d gone through seven cases of water, 16 cases of soda, and three cases of Red Bull. “It takes a lot of liquids,” Li said. And then there’s sleep, or lack thereof. When Li was participating in her freshman year, she slept for about three hours. Many people pull all-nighters, but “nap sporadically everywhere,” Li said. “It’s like finals season, with everyone knocked out.” She saw a handful of guys sleeping on the floor in Fitzpatrick. She gave them bed pads.

Li’s love for HackDuke is contagious. She loves to see participants focusing on social good and drawing on their awareness of what’s happening in the world. “People are thinking about things that are intense; they’re really worrying about issues facing certain communities,” Li said.

At HackDuke, people really are coding for good.

Post by Zella Hanson

New Blogger Shariar Vaez-Ghaemi: Arts and Artificial Intelligence

Hi! My name is Shariar. My friends usually pronounce that as Shaw-Ree-Awr, and my parents pronounce it as Share-Ee-Awr, but feel free to mentally process my name as “Sher-Rye-Eer,” “Shor-yor-ior-ior-ior-ior,” or whatever phonetic concoction your heart desires. I always tell people that there’s no right way to interpret language, especially if you’re an AI (which you might be).

Speaking of AI, I’m excited to study statistics and mathematics at Duke! This dream was born out of my high school research internship with New York Times bestselling author Jonah Berger, through which I immersed myself in the applications of machine learning to the social sciences. Since Dr. Berger and I completed our ML-guided study of the social psychology of communicative language, I’ve injected statistical learning techniques into my investigations of political science, finance, and even fantasy football.

Unwinding in the orchestra room after a performance

When I’m not cramped behind a Jupyter Notebook or re-reading a particularly long research abstract for the fourth time, I’m often pursuing a completely different interest: the creative arts. I’m an orchestral clarinetist and quasi-jazz pianist by training, but my proudest artistic endeavours have involved cinema. During high school, I wrote and directed three short films, including a post-apocalyptic dystopian comedy and a silent rendition of the epic poem “Epopeya de la Gitana.”

I often get asked whether there’s any bridge between machine learning and the creative arts*, to which the answer is yes! In fact, as part of my entry project for Duke-based developer team Apollo Endeavours, I created a statistical language model that writes original poetry. Wandering Mind, as I call the system, is just one example of the many ways that artificial intelligence can do what we once considered exclusively human tasks. The program isn’t quite as talented as Frost or Dickinson, but it’s much better at writing poetry than I am.

In a movie production (I’m the one wearing a Totoro onesie)

I look forward to presenting invigorating research topics to blog readers for the next year or more. Though machine learning is my scientific expertise, my investigations could transcend all boundaries of discipline, so you may see me passionately explaining biology experiments, environmental studies, or even macroeconomic forecasts. Go Blue Devils!

(* In truth, I almost never get asked this question by real people unless I say, “You know, there’s actually a connection between machine learning and arts.”)

By Shariar Vaez-Ghaemi, Class of 2025

‘Anonymous Has Viewed Your Profile’: All Networks Lead to Re-Identification

For half an hour this rainy Wednesday, October 6th, I logged on to a LinkedIn Live series webinar with Dr. Jiaming Xu from the Fuqua School of Business. I sat inside the bridge between Perkins and Bostock, my laptop connected to DukeBlue wifi. I had Instagram open on my phone and was tapping through friends’ stories while I waited for the broadcast to start. I had Google Docs open in another tab to take notes. 

The title of the webinar was “Can Anyone Truly Be Anonymous Online?” 

Xu spoke about “network privacy,” which is “the intersection of network analysis and data privacy.” When you make an account, connect to wifi, share your location, search something online, or otherwise hint at your personal information, you are creating a “user profile”: a network of personal data that hints at your identity. 

You are probably familiar with how social media companies track your decisions to curate a more engaging experience for you (i.e. the reason I scroll through TikTok for 5 minutes, then 30 minutes, then… Oh no! Two hours have gone by). Other companies track other kinds of data— data that isn’t always just for algorithmic manipulation or creepy-accurate Amazon ads (i.e. “Hey! I was just thinking about buying cat litter. How did Mr. Bezos know?”). Your name, work history, date of birth, address, location, and other critical identifying factors can be collected even if you think your profile is scrubbed clean. In a rather on-the-nose anecdote to his LinkedIn audience on Wednesday, Xu explained that in April 2021, over 500 million user profiles on LinkedIn were hacked. Valuable, “sensitive, work-related data,” he noted, was made vulnerable. 


So, what do you have to worry about? I know I tend to not worry about my personal information online; letting companies collect my data benefits me. I can get targeted Google ads about things I’m interested in and cool filters on Snapchat. In a medical setting, Xu said, prediction algorithms may help patients’ health in the long run. But even anonymized and sanitized data can be traced back to you. For further reading: in an essay published in July 2021, philosophers Evan Selinger and Judy Rhee elaborate on the dangers of “normalizing surveillance.”

The meat of Xu’s talk was how your data can be traced back to you. Xu gave three examples. 

The first was a study conducted by researchers at the University of Texas at Austin attempting to identify users submitting “anonymous” reviews for movies on Netflix (keep in mind this was 2007, so picture the red Netflix logo on the DVD box accordingly). To achieve this, they cross-referenced the network of reviews published by Netflix with the network of individuals signed up on IMDb; they matched those who reviewed movies similarly on both platforms with their public profiles on IMDb. You can read more about that specific study here. (For those unafraid of the full research paper, click here).
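The cross-referencing idea can be sketched in a few lines. This is a deliberately simplified illustration and not the researchers' algorithm: it scores each public profile by how many movies it rated within one star of the anonymous profile and picks the highest scorer. The profile names, movie titles, ratings, and the one-star tolerance are all invented.

```python
def rating_overlap(anonymous_ratings, public_ratings, tolerance=1):
    """Count movies both profiles rated with scores within `tolerance` stars."""
    shared = anonymous_ratings.keys() & public_ratings.keys()
    return sum(
        abs(anonymous_ratings[movie] - public_ratings[movie]) <= tolerance
        for movie in shared
    )


def best_match(anonymous_ratings, public_profiles):
    """Return the name of the public profile whose ratings overlap the most."""
    return max(
        public_profiles,
        key=lambda name: rating_overlap(anonymous_ratings, public_profiles[name]),
    )


# Hypothetical data: one "anonymous" rating history and two public profiles.
anonymous = {"Movie A": 5, "Movie B": 1, "Movie C": 4}
public = {
    "pat": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "sam": {"Movie A": 2, "Movie D": 5},
}
match = best_match(anonymous, public)
```

Even this crude score singles out one profile, which hints at why a list of ratings works as a signature.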

Let’s take a pause to learn a new vocab word! “Signatures.” In this example, the signature was users’ movie ratings. See if you can name the signature in the other two examples.

The second example was a study by the same researchers. To identify users on Twitter who shared their data anonymously, it was simply a matter of cross-referencing the network of Twitter users with the network of Flickr users. If you know a guy who knows a guy who knows a guy who knows a guy, you and that group of people are likely to initiate that same chain of following each other on every social media platform you have (it may remind you of the theory that you are connected by “six degrees of separation” from every person on the planet, which, as it turns out, is also supported by social media data). The researchers were able to identify the correct users 30.8% of the time.

Time for another vocab break! Those users who connect groups of people who know a guy who knows a guy who knows a guy are called “seeds.” Speaking of which, did you identify the signature in this example?


The third and final example was my personal favorite because it was the funkiest and most creative. Facebook user data, also “scrubbed clean” before being sold to third-party advertisers, was overlaid with LinkedIn user data to reveal a network of connections that are repeated. How did they match up those networks, you ask? First, the algorithm assigned a computed score to every individual user based on how many Facebook friends they have and one for every user based on how many LinkedIn connections they have. Then, each user was assigned a list of integers based on their friends’ popularity scores. Bet you weren’t expecting that.

This method improves upon the Twitter/Flickr example: in addition to overlaying networks and chains of users, it better matches who is who. Since you are likely to know a guy who knows a guy who knows a guy, but you are also likely to know all of those guys down the line, following specific chains does not always accurately convey who is who. Unlike the seeds signature, the friends’ popularity signature was able to correctly re-identify users most of the time.
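Here is a minimal sketch of the friends' popularity signature, under generous assumptions: the two networks are identical apart from the usernames, and two users count as a match only if their signatures are exactly equal and nobody else shares that signature. The method Xu described has to cope with messier, partially overlapping networks, and the names and graphs below are invented.

```python
def degree_signatures(friends):
    """Map each user to the sorted list of their friends' friend counts."""
    degree = {user: len(contacts) for user, contacts in friends.items()}
    return {
        user: sorted(degree[contact] for contact in contacts)
        for user, contacts in friends.items()
    }


def match_users(network_a, network_b):
    """Pair each network_a user with the one network_b user sharing its signature."""
    by_signature = {}
    for user, signature in degree_signatures(network_b).items():
        by_signature.setdefault(tuple(signature), []).append(user)
    matches = {}
    for user, signature in degree_signatures(network_a).items():
        candidates = by_signature.get(tuple(signature), [])
        if len(candidates) == 1:  # accept only unambiguous matches
            matches[user] = candidates[0]
    return matches


# Hypothetical friend lists: named accounts versus "scrubbed" accounts.
named = {
    "ana": ["ben", "cy", "dee"],
    "ben": ["ana", "cy"],
    "cy": ["ana", "ben"],
    "dee": ["ana", "eli"],
    "eli": ["dee"],
}
scrubbed = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2"],
    "u4": ["u1", "u5"],
    "u5": ["u4"],
}
matches = match_users(named, scrubbed)
```

In this toy network, three of the five "anonymous" accounts are re-identified from structure alone. The other two (ben and cy) stay hidden only because their positions in the network are indistinguishable.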

Sitting in the bridge Wednesday, I was connected to many networks that I wouldn’t think could be used to identify me through my limited public data. Now, I’m not so sure.

So, what’s the lesson here? At the least, it was fun to learn about, even if the ultimate realization leaves us powerless against big data analytics. Your data has monetary value, and it is not as secure as you think. But it may be worth asking whether we even have the ability to protect our anonymity.
