Anyone who has ever tried to formulate and answer their own research question knows that it means entering uncharted waters. This past weekend, the hundreds of students at Duke Datathon 2018 did just that, using only their computer science prowess and a splash of innovation.

Here’s how it worked: Credit Sesame, a free credit score estimator, provided the students with three data sets and gave them eight hours to interpret the data and create as much value for the company as they could. Along the way, Duke Undergraduate Machine Learning (DUML), the organization hosting the event, provided mentors and workshops to help the participants find direction and achieve their goals.

Datathon participants attempting to derive meaning from the Credit Sesame data

This year was the first such ‘Datathon’ event to take place at Duke. The event attracted big-name sponsors such as Google and Pinterest and was made possible by the DUML executive team, headed by co-presidents Rohith Kuditipudi and Shrey Gupta (to see a full list of event sponsors, click here).

DUML faculty advisor Dr. Rebecca Steorts said that even the planning of the event transcended disciplines: one of her undergraduate students, DUML co-president Shrey Gupta, used statistics to predict how many people would attend. “It’s all about finding computational ways of combining disciplines to solve the problem,” Steorts said, and her students have clearly taken this to heart.

The winning team (Jie Cai, Catie Grasse, Feroze Mohideen) presenting on how they can best gauge which customers are most “valuable” to Credit Sesame

After more than an hour of deliberations, the eight top teams were selected and five finalists were asked to present their findings to the judges. The winning team (Jie Cai, Catie Grasse, Feroze Mohideen) proposed a way to gauge which customers who create trial accounts are most likely to be profitable: a filtering program that predicts likely customer engagement from customer-supplied data and each customer's interaction with the free trial. Other top teams explored variations on the same question of how Credit Sesame might best build a profile of who its “valuable” customers are likely to be.

DUML hosts other events throughout the year to engage students, such as its MLBytes Speaker Series and ECE Seminar Series. To learn more about Duke Undergraduate Machine Learning, click here.

by Rebecca Williamson