Some stories are too good to forget. With almost formulaic accuracy, elements from classic narratives are constantly being reused and retained in our cultural consciousness, to the extent that a room of people who’ve never read Romeo and Juliet could probably still piece together its major plot points. But when stories are so pervasive, how can we tell what’s original and what’s Shakespeare with a facelift?

This summer, three Duke undergraduate students in the Data+ summer research program built a computer program to find reused stories.

“We’re looking for invisible adaptations, or appropriations, of stories where there are underlying themes or the messages remain the same,” explains Elise Xia, a sophomore in mechanical engineering. “The goal of our project was to create a model where we could take one of these original stories, get data from it, and find other stories in literature, film, TV that are adaptations.”

The Lion King, for example, is a well-known appropriation of Hamlet. The savannahs of Africa are a far cry from Denmark, and “Simba” bears no etymological resemblance to “Hamlet,” yet they’re fundamentally the same story: A power-hungry uncle kills the king and ousts the heir to the throne, only for the prince to eventually make a cataclysmic return. In an alternate ending for the film, Disney directors even considered quoting Hamlet.

“The only difference is that there’s no incest in The Lion King,” jokes Mikaela Johnson, an English and religious studies major and member of the Invisible Adaptations team.

With Hamlet as their model text, the team used a natural language processing system to turn words into data points and compare other movie scripts and novels to the original play.
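As a rough illustration of what turning words into data points can look like, here is a minimal sketch that counts the words in each text and compares the counts with cosine similarity. This is not the team's actual system, which the article does not describe in detail; the function names and the one-line sample texts are hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (a simple bag-of-words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors.

    Returns 0.0 when the texts share no words and values near 1.0
    when their word frequencies are nearly proportional.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical one-line stand-ins for a full play, script, or novel.
hamlet = "the ghost of the king tells the prince his uncle murdered him"
candidate = "the prince returns to face the uncle who murdered the king"
unrelated = "a recipe for lemon cake with butter and sugar"

sim_candidate = cosine_similarity(word_counts(hamlet), word_counts(candidate))
sim_unrelated = cosine_similarity(word_counts(hamlet), word_counts(unrelated))
```

In this toy case the candidate scores higher than the unrelated text simply because it shares vocabulary with the model text, which is exactly the kind of surface-level comparison a computer handles well.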

But the students had to strike a balance between the more superficial yet comprehensive analysis of computers (comparing place names, character names, and direct quotes) and the deeper textual analysis that humans provide.

So, they developed another branch of analysis: After sifting through about 30,000 scholarly texts on Hamlet to identify major themes (monarchy, death, ghost, power, revenge, uncle, etc.), their computer program screened Wikipedia’s database for those keywords to identify new adaptations. After comparing the titles found from both primary and secondary sources, they had their final list of Hamlet adaptations.
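The keyword screening step might be sketched roughly as follows. The theme words come from the list above, but everything else is an assumption made for illustration: the function names, the `min_hits` threshold, and the two sample plot summaries are hypothetical, and the article does not say how the team actually queried Wikipedia or scored matches.

```python
import re

# Theme words named in the article; the team's full list was longer ("etc.").
THEMES = {"monarchy", "death", "ghost", "power", "revenge", "uncle"}


def theme_hits(summary, themes=THEMES):
    """Return the set of theme words that appear in a plot summary."""
    words = set(re.findall(r"[a-z]+", summary.lower()))
    return themes & words


def screen(summaries, min_hits=3):
    """Keep titles whose summary mentions at least `min_hits` themes.

    `min_hits` is an assumed cutoff, not a value from the study.
    Results are ordered from most to fewest matching themes.
    """
    results = []
    for title, summary in summaries.items():
        hits = theme_hits(summary)
        if len(hits) >= min_hits:
            results.append((title, sorted(hits)))
    return sorted(results, key=lambda r: len(r[1]), reverse=True)


# Hypothetical summaries standing in for Wikipedia plot sections.
summaries = {
    "Story A": "A prince seeks revenge on the uncle who seized power "
               "after the king's death.",
    "Story B": "Two friends open a bakery in a small seaside town.",
}

candidates = screen(summaries)
```

A screen like this only flags candidates; deciding whether a flagged title is truly an adaptation still takes the deeper reading that the students describe.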

“What we really tried to do was break down what a story is and how humans understand stories, and then try to translate that into a way a computer can do it,” says Nikhil Kaul, a rising junior in computer science and philosophy. “And in a sense, it’s impossible.”

Finding the threshold between a unique story and derivative stories could have serious implications for copyright law and intellectual property in the future. But Grant Glass, a UNC graduate student in English and comparative literature and the project manager of this study, believes that the real purpose of the research is to understand the context of each story.

“Appropriating without recognition removes the historical context of how that story was made,” Glass explains. Often, problematic facets of the story are too deeply ingrained to coat over with fresh literary paint: “All of the ugliness of text shouldn’t be capable of being whitewashed. They are compelling stories, but they’re problematic. We owe past baggage to be understood.”

Adaptations include small hat-tips to their original source, such as quoting the original or using character names. But appropriations of works do nothing to signal their source to their audience, which is why the Data+ team’s thematic analysis of Wikipedia pages was vital in getting a comprehensive list of previously unrecognized adaptations.

“A good adaptation would subvert expectations of the original text,” Glass says. Seth Rogen’s animated comedy Sausage Party, one of the more surprising movie titles the students’ program found, does just that. “It’s a really vulgar, pretty funny movie,” Kaul explains. “It’s very existential and meta and has a lot of death at the end of it, much like Hamlet does. So, the program picked up on those similarities.”

Without this new program, the unexpected resemblance could’ve gone unnoticed by literary academia, and whether or not Seth Rogen intended to parallel a grocery store to the Danish royal court, the film undoubtedly turns a reader’s expectation of Hamlet on its head.

By Vanessa Moss