Duke Research Blog

Following the people and events that make up the research community at Duke.


Hamlet is Everywhere. To Cite, or Not to Cite?

Some stories are too good to forget. With almost formulaic regularity, elements of classic narratives are reused and retained in our cultural consciousness, to the point that a room of people who’ve never read Romeo and Juliet could probably still piece together its major plot points. But when stories are so pervasive, how can we tell what’s original and what’s Shakespeare with a facelift?

This summer, three Duke undergraduate students in the Data+ summer research program built a computer program to find reused stories.

“We’re looking for invisible adaptations, or appropriations, of stories where there are underlying themes or the messages remain the same,” explains Elise Xia, a sophomore in mechanical engineering. “The goal of our project was to create a model where we could take one of these original stories, get data from it, and find other stories in literature, film, TV that are adaptations.”

The Lion King, for example, is a well-known appropriation of Hamlet. The savannahs of Africa are a far cry from Denmark, and “Simba” bears no etymological resemblance to “Hamlet,” yet they’re fundamentally the same story: a power-hungry uncle kills the king and ousts the heir to the throne, only for the prince to return, with cataclysmic results. Disney’s directors even considered quoting Hamlet in an alternate ending for the film.

“The only difference is that there’s no incest in The Lion King,” jokes Mikaela Johnson, an English and religious studies major and member of the Invisible Adaptations team.

With Hamlet as their model text, the team used natural language processing to turn words into data points and compare other movie scripts and novels with the original play.
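The post doesn’t say which natural language processing tools the team used. As a rough illustration only, the Python sketch below shows one common way to turn words into data points: each text becomes a bag-of-words count vector, and two texts are compared with cosine similarity. The function names and the one-line plot summaries are hypothetical stand-ins, not the team’s code or data.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count each word (a simple bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 means no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


# Hypothetical one-line plot summaries standing in for full scripts and novels.
hamlet = "the uncle kills the king and the prince returns to take revenge"
lion_king = "the uncle kills the king and the exiled prince returns to reclaim the throne"
unrelated = "a group of friends opens a bakery by the sea"

h = word_counts(hamlet)
print(round(cosine_similarity(h, word_counts(lion_king)), 2))
print(round(cosine_similarity(h, word_counts(unrelated)), 2))
```

Common words such as “the” inflate scores in a raw-count model like this one; practical systems usually remove stop words or weight terms with TF-IDF before comparing texts.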

But the students had to strike a balance between the broad but surface-level analysis that computers do well (comparing place names, character names, and direct quotes) and the deeper textual analysis that humans provide.

So they developed a second branch of analysis. After sifting through about 30,000 scholarly texts on Hamlet to identify its major themes (monarchy, death, ghost, power, revenge, uncle and others), their program screened Wikipedia’s database for those keywords to find new adaptations. After comparing the titles found in both primary and secondary sources, they had their final list of Hamlet adaptations.
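The keyword-screening step can be sketched in a few lines of Python. The theme list below comes from the post, but the page summaries, the function names and the three-theme threshold are hypothetical assumptions for illustration; the post doesn’t describe how the team actually scored Wikipedia pages.

```python
import re

# Themes named in the post; the real list was derived from ~30,000 scholarly texts.
HAMLET_THEMES = {"monarchy", "death", "ghost", "power", "revenge", "uncle"}


def theme_hits(text: str, themes: set[str]) -> set[str]:
    """Return the theme keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return themes & words


def candidate_adaptations(pages: dict[str, str], themes: set[str], min_hits: int = 3) -> list[str]:
    """Titles of pages mentioning at least min_hits distinct themes (hypothetical threshold)."""
    return sorted(
        title for title, text in pages.items()
        if len(theme_hits(text, themes)) >= min_hits
    )


# Hypothetical page summaries standing in for Wikipedia articles.
pages = {
    "The Lion King": (
        "A lion prince flees after his uncle seizes power through the death "
        "of the king, then returns for revenge."
    ),
    "Finding Nemo": "A clownfish father crosses the ocean to find his son.",
}

print(candidate_adaptations(pages, HAMLET_THEMES))
```

Exact word matching misses variants such as “ghosts” or “uncles,” so a fuller version would likely stem or lemmatize words before matching.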

“What we really tried to do was break down what a story is and how humans understand stories, and then try to translate that into a way a computer can do it,” says Nikhil Kaul, rising junior in computer science and philosophy. “And in a sense, it’s impossible.”

Finding the threshold between an original story and a derivative one could have serious implications for copyright law and intellectual property in the future. But Grant Glass, a UNC graduate student in English and comparative literature and the project manager of this study, believes that the real purpose of the research is to understand the context of each story.

“Appropriating without recognition removes the historical context of how that story was made,” Glass explains. Often, problematic facets of a story are too deeply ingrained to coat over with fresh literary paint: “All of the ugliness of text shouldn’t be capable of being whitewashed. They are compelling stories, but they’re problematic. We owe past baggage to be understood.”

Adaptations include small hat tips to their original source, such as quoting the original or reusing character names. Appropriations, however, do nothing to signal their source to the audience, which is why the Data+ team’s thematic analysis of Wikipedia pages was vital to building a comprehensive list of previously unrecognized adaptations.

“A good adaptation would subvert expectations of the original text,” Glass says. Seth Rogen’s animated comedy Sausage Party, one of the more surprising movie titles the students’ program found, does just that. “It’s a really vulgar, pretty funny movie,” Kaul explains. “It’s very existential and meta and has a lot of death at the end of it, much like Hamlet does. So, the program picked up on those similarities.”

Without this new program, the unexpected resemblance could have gone unnoticed by literary scholars. And whether or not Seth Rogen intended to parallel a grocery store with the Danish royal court, the film undoubtedly turns a reader’s expectations of Hamlet on their head.

By Vanessa Moss

Science in haiku: // Interdisciplinary // Student poetry

On Friday, August 2, ten weeks of research by Data+ and Code+ students wrapped up with a poster session in Gross Hall, where students flaunted their newly created posters, websites and apps. But they weren’t expecting to flaunt their poetry skills, too!

Data+ is one of the Rhodes Information Initiative programs at Duke. This summer, 83 students worked on 27 projects addressing issues in health, public policy, environment and energy, history, culture, and more. The Duke Research Blog decided to test these interdisciplinary students’ mettle with a challenge: transforming their research into haiku.

Which haiku is your
favorite? See all of their
finished work below!

Eric Zhang (group members Xiaoqiao Xing and Micalyn Struble not pictured) in “Neuroscience in the Courtroom”
Maria Henriquez and Jake Sumner on “Using Machine Learning to Predict Lower Extremity Musculoskeletal Injury Risk for Student Athletes”
Samantha Miezio, Ellis Ackerman, and Rodrigo Aruajo in “Durham Evictions: A snapshot of costs, locations, and impacts”
Nikhil Kaul, Elise Xia, and Mikaela Johnson on “Invisible Adaptations”
Karen Jin, Katherine Cottrell, and Vincent Wang in “Data-driven approaches to illuminate the responses of lakes to multiple stressors”.

By Vanessa Moss

Kicking Off a Summer of Research With Data+

If the May 28 kickoff meeting was any indication, it’s going to be a busy summer for the more than 80 students participating in Duke’s summer research program, Data+.

Offered through the Rhodes Information Initiative at Duke (iiD), Data+ is a 10-week summer program focused on data-driven research. Participants come from varied backgrounds in terms of majors and experience. Project themes include health, public policy, energy and environment, and interdisciplinary inquiry.

“It’s like a language immersion camp, but for data science,” said Ariel Dawn, Rhodes iiD Events & Communication Specialist. “The kids are going to have to learn some of those [programming] languages like Java or Python to have their projects completed.”

Dawn, who previously worked for the Office of the Vice Provost for Research, arrived in 2015, during the program’s humble beginnings. Data+ began in 2014 as a small summer project in Duke’s math department, funded by a grant from the National Science Foundation. The following year the program grew to 40 students, and it has grown every year since.

Today, the program also collaborates with the Code+ and CS+ summer programs, with more than 100 students participating. Sponsors have grown to include major corporations such as ExxonMobil, which will fund two Data+ projects in 2019 on oil research in the Gulf of Mexico and the United Kingdom.

“It’s different than an internship, because an internship you’re kind of told what to do,” said Kathy Peterson, Rhodes iiD Business Manager. “This is where the students have to work through different things and make discoveries along the way.”

From late May to early August, undergraduates work on research projects under the supervision of a graduate student or faculty advisor. This year, Data+ selected more than 80 students from a pool of over 350 applicants, and the program features 27 projects.

Over the summer, students get a crash course in data science, in how to conduct their studies and in how to present their work to peers. Data+ prioritizes collaboration: students are split into teams and work in a communal environment.

“Data is collected on you every day in so many different ways. Sometimes we can do a lot of interesting things with that,” Dawn said. “You can collect all this information that’s really granular and relates to you as an individual, but in a large group it shows trends and what the big picture is.”

Data+ students also delve into real world issues. Since 2013, Duke professor Jonathan Mattingly has led a student-run investigation on gerrymandering in political redistricting plans through Data+ and Bass Connections. Their analysis became part of a 205-page Supreme Court ruling.

The program has also made strides to connect with the Durham community. In collaboration with the local organization DataWorks NC, students will examine Durham’s eviction data to help identify policy changes that could help residents stay in their homes.

“It [Data+] gives students an edge when they go look for a job,” Dawn said. “We hear from so many students who’ve gotten jobs, and [at] some point during their interview employers said, ‘Please tell us about your Data+ experience.’”

From improving sustainable energy to examining story adaptations in books and films, the projects cover a wide range of topics.

A project titled “Invisible Adaptations: From Hamlet to the Avengers” blends algorithms with storytelling. Led by UNC-Chapel Hill graduate student Grant Glass, students will compare Shakespeare’s work with today’s “Avengers” franchise.

“It’s a much different vibe,” said computer science major Katherine Cottrell. “I feel during the school year there’s a lot of pressure and now we’re focusing on productivity which feels really good.”

Cottrell and her group are examining how lakes respond to multiple stressors.

Data+ concludes with a final poster session on Friday, August 2, from 2 p.m. to 4 p.m. in the Gross Hall Energy Hub. Everyone in the Duke community and beyond is invited to attend. Students will present their findings alongside the sister programs Code+ and the summer Computer Science Program.

Writing by Deja Finch (left)
Art by Maya O’Neal (right)
