data Archives - Research Blog

In the digital age, we are well-acquainted with “data,” a crouton-esque word tossed into conversations, ingrained in the morning rush like half-caf cappuccinos and spreadsheets. Conceptually, data feels benign, necessary, and totally absorbed into the zeitgeist of the 21st century (alongside Survivor, smartphones, and Bitcoin). Data conjures up the census; white-coat scientists and their clinical trials; suits and ties; NGO board meetings; pearled strings of binary code; bar graphs, pie charts, scatter plots, pictographs, endless Excel rows and columns, and more rows and more columns.

However, within the conversation of global health, researchers and laymen alike would more often than not describe data collection, use, and sharing as critical for resource mobilization, disease monitoring, surveillance, prevention, treatment, etc. (Look at measles eradication! Polio! Malaria! Line graphs A, B, and C!)

Thanks to the internet, extracting health data is also faster, easier, and more widespread than ever. We have grown increasingly concerned, and rightfully so, about data ownership and data sovereignty.

Who is privy to data? Who can possess it? Can you possess it? As you can see, the conversation quickly becomes convoluted, philosophical even.

Dr. Wendy Prudhomme O’Meara is an associate professor at Duke University Medical School in the Division of Infectious Diseases, visiting professor at Moi University, and the Co-Field Director of Research for AMPATH. Her research focuses on malaria.

Dr. Wendy Prudhomme O’Meara, moderator of the Data as a Commodity seminar on Sept. 29 and associate professor at Duke University Medical School in the Division of Infectious Diseases, discussed bioethical complexities of data creation and ownership within global health partnerships.

“We can see that activities—where data is being collected in one place, removed from the context, and value being extracted from it for personal or financial benefit — has very strong parallels to the kind of resource extraction and exploitation that characterized colonization,” she said in her introductory remarks.

Data, like other raw materials (e.g., coffee, sugar, tobacco), can be extracted, often disproportionately, from low- and middle-income countries (LMICs) at the expense of the local populations. This reinforces unequal power dynamics and harkens back to the tenets of colonialism and imperialism.

This observation is exemplified by panelist Thiago Hernandes Rocha’s research, which focuses on public policy evaluation and data mining. He acknowledges that global health research, in general, should prioritize the health improvements of the studied community rather than publications or grant funding. This may seem somewhat obvious to you; however, though academic competition often fosters nuances in the field, it also contributes to the commercialization of global health. Don’t be shy, everyone point your finger at Big Pharma!

Though Dr. Rocha’s data mining technique refers to “pattern-searching” and analysis of dense data sets, I find “mining” to be an apt analogy for the exploitative potential of data extraction and research partnerships between higher income countries and LMICs.

Dr. Thiago Hernandes Rocha joins the discussion via Zoom. He is an advisor on health data analysis for the Pan American Health Organization.

Consider the British diamond industry in Cape Colony, South Africa, and the parallels between past colonial mineral extraction and current global health data extraction. Imagine taking a pickaxe to the earth.

Now consider the environmental ramifications of mining, and who they disproportionately affect. Consider the lingering social and economic inequalities. Of course, data is not a mine of diamonds (as your Hay Day farm might suggest) nor is it ivory or rubber or timber. It’s less tangible (you can’t necessarily hold it or physically possess it) and, therefore, its extraction also feels less tangible, even though this process can have very concrete consequences.

Data as a power dynamic is a rather recent characterization in academic discourse. Researchers and companies alike have pushed the “open data” movement to increase data availability to all people for all uses. You can see how, in a utopian society, this would be fantastic. Think of the transparency! I’m sure you can also see how, in our non-utopian society, this can be exploited.

Dr. Bethany Hedt-Gauthier, a Harvard University biostatistician and seminar speaker, described herself as “pro ‘open data’ … in a world without power dynamics” — an amendment critical to understanding research as a commodity itself.

She justified her stance by referencing the systematic review of authorship in collaborative health research in Africa that she conducted in collaboration with others in the field. They found that even when sub-Saharan African populations were the main sites of study, when partnered with high-income, elite institutions (like Duke or Harvard), the African authors were significantly less likely to be first or senior authors despite the comparable number of academics on both sides of the partnership. To what can we attribute this discrepancy?

Dr. Bethany Hedt-Gauthier is a biostatistician in health systems research that focuses on the optimization of care and health outcomes in sub-Saharan Africa.

Dr. Hedt-Gauthier describes forms of capital that contribute to this issue, from cultural capital (i.e. credentials) to symbolic capital (i.e. legitimacy) to financial capital; however, she poses colonialism (and its continuity in socioeconomic and political power dynamics today) as the root of this incongruity from which the aforementioned forms of capital bud and flower like poisonous oleander. In recent years, institutions, including Duke, have increased efforts to “decolonize” global health to achieve greater equity, equal participation, and better health outcomes overall.

Dr. Hedt-Gauthier briefly chronicled some of her own research in Rwanda at the start of the COVID-19 pandemic. Within her research partnerships, she recollected slowing down, thoughtfully engaging in two-way dialogue, and posing questions like the following: “Who is involved [in the partnership]?” “Are all parties equally represented in paper authorship?” “If not, how can we share resources to ensure this?” “How can we assure that the people involved in the generation of data are also involved in the interpretation of its results?” “Who has access to data?” “What does co-authorship look like?”

Investing time and energy into multi-country databases, funding collaborative research infrastructures, removing barriers within academia, and training researchers are just some of the methods proposed by the speakers to facilitate equitable partnerships, data sharing and use, and continued global health decolonization.

Dr. Osondu Ogbuoji is an Assistant Research Professor at Duke Global Health Institute (DGHI) and Deputy Director at the Center for Policy Impact in Global Health at DGHI.

Dr. Osondu Ogbuoji, the final panelist, puts it best: “… We should ensure that the people in the room having the discussion about what values the data has should be as diverse as possible and ideally should have all the stakeholders. In our own research, sometimes we think we have an idea of what data to collect, but then we talk to the country partners and they have a totally different idea.”

Though the question of data ownership may feel lofty or intangible, though data legality is confusing, though you may feel yourself adrift in the debate of commodity and capital, the speakers have thrown you a buoy. Grab on, and understand that, generally:

It is necessary to engage with “data” in a communicative and critical manner; it is necessary to build research partnerships that are synergistic and reciprocal; and, finally, it is necessary to approach global health via these partnerships to advance the field towards greater equity.

Post by Alex Clifford, Class of 2024

Watch the recorded seminar here: https://www.youtube.com/watch?v=wRmFzif8a1c

Hamlet is Everywhere. To Cite, or Not to Cite?

By Vanessa Moss

On August 30, 2019

In Art, Computers/Technology, Students

Some stories are too good to forget. With almost formulaic accuracy, elements from classic narratives are constantly being reused and retained in our cultural consciousness, to the extent that a room of people who’ve never read Romeo and Juliet could probably still piece out its major plot points. But when stories are so pervasive, how can we tell what’s original and what’s Shakespeare with a facelift?

This summer, three Duke undergraduate students in the Data+ summer research program built a computer program to find reused stories.

“We’re looking for invisible adaptations, or appropriations, of stories where there are underlying themes or the messages remain the same,” explains Elise Xia, a sophomore in mechanical engineering. “The goal of our project was to create a model where we could take one of these original stories, get data from it, and find other stories in literature, film, TV that are adaptations.”

The Lion King, for example, is a well-known appropriation of Hamlet. The savannahs of Africa are a far cry from Denmark, and “Simba” bears no etymological resemblance to “Hamlet,” yet they’re fundamentally the same story: A power-hungry uncle kills the king and ousts the heir to the throne, only for an eventually cataclysmic return of the prince. In an alternate ending for the film, Disney directors even considered quoting Hamlet.

“The only difference is that there’s no incest in The Lion King,” jokes Mikaela Johnson, an English and religious studies major and member of the Invisible Adaptations team.

With Hamlet as their model text, the team used a Natural Language Processing system to turn words into data points and compare other movie scripts and novels to the original play.
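The post does not say which NLP system or similarity measure the team used. As a rough illustration of what "turning words into data points" can mean, the sketch below counts words in each text and compares the count vectors with cosine similarity. The function names and the one-line sample texts are made up for this example and are not taken from the project.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical one-line stand-ins, not real excerpts from any script.
hamlet = "the prince mourns the king his uncle took the throne"
candidate_a = "the uncle kills the king and the prince returns for the throne"
candidate_b = "two friends open a bakery by the sea"

score_a = cosine_similarity(word_counts(hamlet), word_counts(candidate_a))
score_b = cosine_similarity(word_counts(hamlet), word_counts(candidate_b))
print(f"candidate_a: {score_a:.2f}  candidate_b: {score_b:.2f}")
```

Here the first candidate scores higher because it shares words like "prince," "king," "uncle," and "throne" with the model text. A comparison this literal only catches shared vocabulary, which is the surface-level limitation the students ran into.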

But the students had to strike a balance between the more superficial yet comprehensive analysis of computers (comparing place names, character names, and direct quotes) and the deeper textual analysis that humans provide.

So, they developed another branch of analysis: After sifting through about 30,000 scholarly texts on Hamlet to identify major themes (monarchy, death, ghost, power, revenge, uncle, etc.), their computer program screened Wikipedia’s database for those key words to identify new adaptations. After comparing the titles found from both primary and secondary sources, they had their final list of Hamlet adaptations.
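The keyword-screening step can be pictured with a small sketch. It is an assumption-laden toy, not the team's program: the theme words come from the list above, while the summaries, the `screen` function, and the `min_hits` threshold are hypothetical choices made for illustration.

```python
import re

# Theme words named in the post; the team's real list was longer ("etc.").
THEMES = {"monarchy", "death", "ghost", "power", "revenge", "uncle"}


def theme_hits(summary, themes=THEMES):
    """Return the theme words that appear as whole words in a summary."""
    words = set(re.findall(r"[a-z]+", summary.lower()))
    return words & themes


def screen(summaries, min_hits=3):
    """Return titles with at least min_hits theme words, most hits first."""
    scored = [(title, theme_hits(text)) for title, text in summaries.items()]
    flagged = [(title, hits) for title, hits in scored if len(hits) >= min_hits]
    # Sort by hit count (descending), then by title for a stable order.
    flagged.sort(key=lambda pair: (-len(pair[1]), pair[0]))
    return [title for title, _ in flagged]


# Hypothetical plot summaries standing in for Wikipedia page text.
summaries = {
    "Story A": "A prince seeks revenge after his uncle seizes power; a ghost appears.",
    "Story B": "A baker wins a contest and opens a second shop.",
    "Story C": "A death in the family shifts power among the heirs.",
}

flagged = screen(summaries)
print(flagged)
```

With the default threshold only "Story A" is flagged, since it matches four theme words while "Story C" matches two. Exact word matching like this would miss plurals and synonyms ("ghosts," "vengeance"), so a real screen would likely need stemming or a broader vocabulary.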

“What we really tried to do was break down what a story is and how humans understand stories, and then try to translate that into a way a computer can do it,” says Nikhil Kaul, rising junior in computer science and philosophy. “And in a sense, it’s impossible.”

Finding the threshold between a unique story and derivative stories could have serious implications for copyright law and intellectual property in the future. But Grant Glass, UNC graduate student of English and comparative literature and the project manager of this study, believes that the real purpose of the research is to understand the context of each story.

“Appropriating without recognition removes the historical context of how that story was made,” Glass explains. Often, problematic facets of the story are too deeply ingrained to coat over with fresh literary paint: “All of the ugliness of text shouldn’t be capable of being whitewashed. They are compelling stories, but they’re problematic. We owe past baggage to be understood.”

Adaptations include small hat-tips to their original source: quoting the original or using character names. But appropriations of works do nothing to signal their source to their audience, which is why the Data+ team’s thematic analysis of Wikipedia pages was vital in getting a comprehensive list of previously unrecognized adaptations.

“A good adaptation would subvert expectations of the original text,” Glass says. Seth Rogen’s animated comedy, Sausage Party, one of the more surprising movie titles the students’ program found, does just that. “It’s a really vulgar, pretty funny movie,” Kaul explains. “It’s very existential and meta and has a lot of death at the end of it, much like Hamlet does. So, the program picked up on those similarities.”

Without this new program, the unexpected resemblance could’ve gone unnoticed by literary academia – and whether or not Seth Rogen intended to parallel a grocery store to the Danish royal court, it undoubtedly turns a reader’s expectation of Hamlet on its head.

Science in haiku: // Interdisciplinary // Student poetry

By Vanessa Moss

On August 9, 2019

In Data, Engineering, Mathematics, Statistics, Students

On Friday, August 2, ten weeks of research by Data+ and Code+ students wrapped up with a poster session in Gross Hall where they flaunted their newly created posters, websites and apps. But they weren’t expecting to flaunt their poetry skills, too!

Data+ is one of the Rhodes Information Initiative programs at Duke. This summer, 83 students tackled 27 projects addressing issues in health, public policy, environment and energy, history, culture, and more. The Duke Research Blog thought we ought to test these interdisciplinary students’ mettle with a challenge: Transforming research into haiku.

Which haiku is your
favorite? See all of their
finished work below!

Eric Zhang (group members Xiaoqiao Xing and Micalyn Struble not pictured) in “Neuroscience in the Courtroom”

Maria Henriquez and Jake Sumner on “Using Machine Learning to Predict Lower Extremity Musculoskeletal Injury Risk for Student Athletes”

Samantha Miezio, Ellis Ackerman, and Rodrigo Aruajo in “Durham Evictions: A snapshot of costs, locations, and impacts”

Nikhil Kaul, Elise Xia, and Mikaela Johnson on “Invisible Adaptations”

Karen Jin, Katherine Cottrell, and Vincent Wang in “Data-driven approaches to illuminate the responses of lakes to multiple stressors”


Kicking Off a Summer of Research With Data+

By Guest Post

On June 12, 2019

In Computers/Technology, Mathematics, Science Communication & Education, Statistics, Students, Visualization

If the May 28 kickoff meeting was any indication, it’s going to be a busy summer for the more than 80 students participating in Duke’s summer research program, Data+.

Offered through the Rhodes Information Initiative at Duke (iiD), Data+ is a 10-week summer program with a focus on data-driven research. Participants come from varied backgrounds in terms of majors and experience. Project themes range from health and public policy to energy and environment and interdisciplinary inquiry.

“It’s like a language immersion camp, but for data science,” said Ariel Dawn, Rhodes iiD Events & Communication Specialist. “The kids are going to have to learn some of those [programming] languages like Java or Python to have their projects completed.”

Dawn, who previously worked for the Office of the Vice Provost for Research, arrived during the program’s humble beginnings in 2015. Data+ began in 2014 as a small summer project in Duke’s math department funded by a grant from the National Science Foundation. The following year the program grew to 40 students, and it has grown every year since.

Today, the program also collaborates with the Code+ and CS+ summer programs, with more than 100 students participating. Sponsors have grown to include major corporations such as ExxonMobil, which will fund two Data+ projects on oil research within the Gulf of Mexico and the United Kingdom in 2019.

“It’s different than an internship, because an internship you’re kind of told what to do,” said Kathy Peterson, Rhodes iiD Business Manager. “This is where the students have to work through different things and make discoveries along the way.”

From late May to July, undergraduates work on a research project under the supervision of a graduate student or faculty advisor. This year, Data+ chose more than 80 eager students out of a pool of over 350 applicants. There are 27 projects being featured in the program.

Over the summer, students are given a crash course in data science and learn how to conduct their study and present their work in front of peers. Data+ prioritizes collaboration: students are split into teams and work in a communal environment.

“Data is collected on you every day in so many different ways, sometimes we can do a lot of interesting things with that,” Dawn said. “You can collect all this information that’s really granular and relates to you as an individual, but in a large group it shows trends and what the big picture is.”

Data+ students also delve into real world issues. Since 2013, Duke professor Jonathan Mattingly has led a student-run investigation on gerrymandering in political redistricting plans through Data+ and Bass Connections. Their analysis became part of a 205-page Supreme Court ruling.

The program has also made strides to connect with the Durham community. In collaboration with local company DataWorks NC, students will examine Durham’s eviction data to help identify policy changes that could help residents stay in their homes.

“It [Data+] gives students an edge when they go look for a job,” Dawn said. “We hear from so many students who’ve gotten jobs, and [at] some point during their interview employers said, ‘Please tell us about your Data+ experience.’”

From finding better sustainable energy to examining story adaptations within books and films, the projects cover many topics.

A project titled “Invisible Adaptations: From Hamlet to the Avengers” blends algorithms with storytelling. Led by UNC-Chapel Hill grad student Grant Glass, students will make comparisons between Shakespeare’s work and today’s “Avengers” franchise.

“It’s a much different vibe,” said computer science major Katherine Cottrell. “I feel during the school year there’s a lot of pressure and now we’re focusing on productivity which feels really good.”

Cottrell and her group are examining the responses of lakes to multiple stressors.

Data+ concludes with a final poster session on Friday, August 2, from 2 p.m. to 4 p.m. in the Gross Hall Energy Hub. Everyone in the Duke Community and beyond is invited to attend. Students will present their findings along with sister programs Code+ and the summer Computer Science Program.

Following the people and events that make up the research community at Duke

Tag: data

Is it Time to Decolonize Global Health Data?

Hamlet is Everywhere. To Cite, or Not to Cite?

Science in haiku: // Interdisciplinary // Student poetry


Kicking Off a Summer of Research With Data+
