Duke Research Blog

Following the people and events that make up the research community at Duke.

Category: Data

Digging Into Durham’s Eviction Problem

This is what 20 years of evictions looks like. It’s an animated heat map of Durham, the streets overlaid with undulating blobs of red and orange and yellow, like a grease stain.

Duke students in the summer research program Data+ have created a time-lapse map of the more than 200,000 evictions filed in Durham County since 2000.

Dark red areas represent eviction hotspots. These neighborhoods are where families cook their favorite meals, where children do their homework, where people celebrate holidays. They’re also where many people live one crisis away from losing their neighbors, or becoming homeless themselves.

Duke junior Samantha Miezio points to a single census tract along NC 55 where, in the wake of an apartment building sale, more than 100 households received eviction notices in a single month. It “just speaks to the severity of the issue,” Miezio said.

Miezio was part of a team that spent 10 weeks this summer mapping and analyzing evictions data from the Durham County Sheriff’s Office, thanks to an effort by DataWorks NC to compile such data and make it more accessible.

The findings are stark.

Every hour in Durham, at least one renter is threatened with losing their home. Between 2010 and 2017, about 1,000 eviction cases a month were filed against tenants. That’s roughly one filing each month for every 280 residents in Durham, where the per-capita eviction rate is among the highest in the state and double the national average.
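
As a quick sanity check on the “every hour” figure, the Python sketch below converts roughly 1,000 filings a month into an hourly rate. It assumes an average month of 365/12 days; the filing count is the approximate monthly figure quoted above.

```python
# Sanity check: do ~1,000 eviction filings a month mean at least one per hour?
FILINGS_PER_MONTH = 1_000          # approximate monthly average quoted for 2010-2017
HOURS_PER_MONTH = 365 / 12 * 24    # assumption: an average-length month, about 730 hours

filings_per_hour = FILINGS_PER_MONTH / HOURS_PER_MONTH
print(f"about {filings_per_hour:.2f} filings per hour")
```

Because the result is comfortably above one, the claim holds on average, even though real filings are not spread evenly around the clock.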

The data tell us that while Durham’s eviction crisis has actually improved from where it was a few years ago, stubborn hotspots persist, said team member Ellis Ackerman, a math major at North Carolina State University.

When the students looked at the data month by month, a few things stood out. For one, winter evictions are common. While some countries such as France and Austria ban winter evictions to keep from pushing people onto the street in the cold, in Durham, “January is the worst month by far,” said team member Rodrigo Araujo, a junior majoring in computer science. “In the winter months utility bills are higher; they’re struggling to pay for that.”

Rodrigo Araujo (Computer Science, 2021) talks about the Durham evictions project.

The team also investigated the relationship between evictions and rents from 2012 to 2014 to see how closely the two move together. Their initial results, using two years’ worth of rent data, showed that when rents went up, evictions weren’t far behind.

“Rents increased, and then two months later, evictions increased,” Miezio said.
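
One straightforward way to look for a delay like this is a lagged correlation: compare rents in each month with evictions some number of months later, and see which lag lines up best. The sketch below uses invented data built to respond with a two-month delay; it illustrates the technique and is not the team’s actual method or data.

```python
# Hypothetical sketch: at what lag (in months) do eviction counts track rents most closely?
# The series below are invented for illustration; they are NOT the team's Durham data.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def best_lag(rents, evictions, max_lag=6):
    """Correlate rent in month t with evictions in month t + lag for each lag,
    and return the lag with the highest correlation plus all the scores."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(rents[:len(rents) - lag], evictions[lag:])
    return max(scores, key=scores.get), scores

# Invented monthly median rents (dollars) over two years.
rents = [900, 905, 903, 915, 920, 912, 930, 940, 935, 950, 948, 960,
         955, 970, 980, 975, 990, 985, 1000, 1010, 1005, 1020, 1030, 1025]
# Invented eviction filings constructed to respond to rent two months later.
evictions = [1000, 1000] + [1000 + 2 * (r - 900) for r in rents[:-2]]

lag, scores = best_lag(rents, evictions)
print(f"strongest correlation at a lag of {lag} months (r = {scores[lag]:.3f})")
```

A strong correlation at a two-month lag is consistent with rents leading evictions, but correlation alone does not establish cause, and real data would be far noisier than this constructed example.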

But the impacts of rising rents weren’t felt evenly. Neighborhoods with more residents of color were significantly affected while renters in white neighborhoods were not. “This crisis is disproportionately affecting those who are already at a disadvantage from historical inequalities,” Miezio said.

A person can be evicted for a number of reasons, but most evictions happen because people fall behind on their rent. The standard guideline is that no more than 30% of your pretax monthly income should go toward housing, including keeping the lights on.

But in Durham, where 47% of households rent rather than own a home, only half of renters meet that goal. As of 2019, an estimated 28,917 households are living in rentals they can’t afford.

The reason is that incomes haven’t kept pace with rents, especially for low-wage workers such as waiters, cooks, or home health aides.

Durham’s median rent rose from $798 in 2010 to $925 in 2016, out of reach for many area families. A minimum-wage worker in Durham earning $7.25 an hour would need to work a staggering 112 hours a week (the equivalent of nearly three full-time jobs) to afford a modest two-bedroom unit at fair market rent in 2019, according to a report by the National Low Income Housing Coalition.
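
The 112-hour figure follows from the 30 percent guideline: rent must be at most 30 percent of gross income, and income is hours times wage. The sketch below assumes 52 paid weeks a year, a simplification rather than a description of the coalition’s exact method, and back-calculates the monthly rent that 112 hours at $7.25 implies (about $1,056, a derived number, not one taken from the report).

```python
# Sketch of the "hours needed" arithmetic under the 30% guideline.
# Assumption: 52 paid weeks per year; not necessarily the coalition's exact method.
AFFORDABLE_SHARE = 0.30   # at most 30% of gross income goes to housing
WEEKS_PER_YEAR = 52

def hours_needed(monthly_rent, hourly_wage):
    """Weekly hours at `hourly_wage` needed for `monthly_rent` to be 30% of gross income."""
    annual_income_needed = monthly_rent * 12 / AFFORDABLE_SHARE
    return annual_income_needed / (hourly_wage * WEEKS_PER_YEAR)

def implied_rent(weekly_hours, hourly_wage):
    """Monthly rent that counts as affordable for a given work week under the 30% rule."""
    return weekly_hours * hourly_wage * WEEKS_PER_YEAR * AFFORDABLE_SHARE / 12

rent = implied_rent(112, 7.25)
print(round(rent))                       # 1056
print(round(hours_needed(rent, 7.25)))   # 112
print(round(hours_needed(925, 7.25)))    # 98, using the 2016 median rent instead
```

Dividing 112 hours by a 40-hour week gives 2.8, which is where “nearly three full-time jobs” comes from.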

Spending a sizable chunk of your income on housing means having less left over for food, child care, transportation, savings, and other basic necessities. One unexpected expense or emergency (a sick child, a car repair, a cut in hours at work) can leave tenants struggling to make the rent.

“Evictions are traumatic life experiences for the tenants,” and can have ripple effects for years, Miezio said.

Tenants may have only a few days to pay what’s due or find a new place and move out. The sheriff may come with movers and pile a person’s belongings on the curb, or move them to a storage facility at the tenant’s expense.

A forced move can also mean children must change schools in the middle of the school year.

Benefits may go to the wrong address. Families are uprooted from their social support networks of friends and neighbors.

Not every case filed ends with the tenant actually getting forced out, “but those filings can still potentially inhibit their ability to find future housing,” Miezio said. Not to mention the cost and hassle of appearing in court and paying fines and court fees.

Multiple groups are working to help Durham residents avoid eviction and stay in their homes. In a partnership between Duke Law and Legal Aid of North Carolina, the Civil Justice Clinic’s 2-year-old Eviction Diversion Program provides free legal assistance to people who are facing eviction.

“The majority of people who have an eviction filed against them don’t have access to an attorney,” Miezio said.

In a cost-benefit analysis, the team’s models suggest that “with a pretty small increase in funding to reduce evictions, on the order of $100,000 to $150,000, Durham could be saving millions of dollars” in reduced costs for shelters, hospitals, mental health services, and other social services, Ackerman said.
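
The arithmetic behind a claim like this is a net-savings calculation. The sketch below is hypothetical: the program cost echoes the top of the quoted range, but the number of evictions prevented and the public cost avoided per eviction are placeholders, not outputs of the team’s models.

```python
# Hypothetical cost-benefit sketch. Only the program cost echoes the quoted range;
# the other inputs are placeholders, not figures from the team's models.

def net_savings(program_cost, evictions_prevented, avoided_cost_per_eviction):
    """Public costs avoided (shelter, hospital, social services) minus program spending."""
    return evictions_prevented * avoided_cost_per_eviction - program_cost

def break_even_evictions(program_cost, avoided_cost_per_eviction):
    """How many evictions the program must prevent to pay for itself."""
    return program_cost / avoided_cost_per_eviction

PROGRAM_COST = 150_000   # upper end of the $100,000-$150,000 range quoted above
AVOIDED_COST = 10_000    # placeholder: public cost avoided per prevented eviction
PREVENTED = 300          # placeholder: evictions the program prevents

print(net_savings(PROGRAM_COST, PREVENTED, AVOIDED_COST))   # 2850000
print(break_even_evictions(PROGRAM_COST, AVOIDED_COST))     # 15.0
```

The break-even count is the telling number: when each eviction carries large downstream public costs, a modest program only has to prevent a small share of cases to pay for itself.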

Ellis Ackerman, a senior math major from NC State University, talks about the Durham evictions research project.

Moving forward, they’re launching a website to share their findings. “I’ve learned HTML and CSS this summer,” said Miezio, who is pursuing an individualized degree program in urban studies. “That’s one of the things I love about Data+. I’m getting paid to learn.”

Miezio plans to continue the project this fall through an independent study course focused on policy solutions to evictions, such as universal right to counsel.

“Housing access and stability are important to Durham,” said Duke’s vice president for Durham affairs Stelfanie Williams. “Applied research projects such as this, reflecting a partnership between the university and community, are opportunities for students to ‘learn by doing’ and to collaborate with community leaders on problem-solving.”

Data+ 2019 is sponsored by Bass Connections, the Rhodes Information Initiative at Duke, the Social Science Research Institute, the Duke Energy Initiative, and the departments of Mathematics and Statistical Science.

Other Duke sponsors include DTECH, Science, Law, and Policy Lab, Duke Health, Duke University Libraries, Sanford School of Public Policy, Nicholas School of the Environment, Duke Global Health Institute, Development and Alumni Affairs, the Duke River Center, Representing Migrations Humanities Lab, Energy Initiative, Franklin Humanities Institute, Duke Forge, the K-Lab, Duke Clinical Research, Office for Information Technology and the Office of the Provost, as well as the departments of Electrical & Computer Engineering, Computer Science, Biomedical Engineering, Biostatistics & Bioinformatics and Biology.

Government funding comes from the National Science Foundation. Outside funding comes from Exxon Mobil, the International Institute for Sustainable Development (IISD), Global Financial Markets Center, and Tether Energy.

Writing by Robin Smith; Video by Wil Weldon

Science in haiku: // Interdisciplinary // Student poetry

On Friday, August 2, ten weeks of research by Data+ and Code+ students wrapped up with a poster session in Gross Hall where they flaunted their newly created posters, websites and apps. But they weren’t expecting to flaunt their poetry skills, too! 

Data+ is one of the Rhodes Information Initiative programs at Duke. This summer, 83 students tackled 27 projects addressing issues in health, public policy, environment and energy, history, culture, and more. The Duke Research Blog thought we ought to test these interdisciplinary students’ mettle with a challenge: transforming research into haiku.

Which haiku is your
favorite? See all of their
finished work below!

Eric Zhang (group members Xiaoqiao Xing and Micalyn Struble not pictured) in “Neuroscience in the Courtroom”
Maria Henriquez and Jake Sumner on “Using Machine Learning to Predict Lower Extremity Musculoskeletal Injury Risk for Student Athletes”
Samantha Miezio, Ellis Ackerman, and Rodrigo Araujo in “Durham Evictions: A snapshot of costs, locations, and impacts”
Nikhil Kaul, Elise Xia, and Mikaela Johnson on “Invisible Adaptations”
Karen Jin, Katherine Cottrell, and Vincent Wang in “Data-driven approaches to illuminate the responses of lakes to multiple stressors”

By Vanessa Moss

Overdiagnosis and the Future of Cancer Medicine

For many years, the standard strategy for fighting cancer has been to find it early with screening, while the person is still healthy, then hit it with a merciless treatment regimen to make it go away.

But not all tumors will become life-threatening cancers. Many, in fact, would never have caused problems during the patient’s lifetime and would have gone unnoticed had screening not found them. These cases belong to the category of overdiagnosis, one of the chief complaints against population-level screening programs.

Scientists are reconsidering how to treat tumors because the traditional hit-it-hard approach often makes the cancer seem to go away, only for a few surviving cells to let the entire tumor roar back later with resistance to previously effective medicine.

Dr. Marc Ryser, the professor who gave this meaty talk

In his May 23 talk to Duke Population Health, “Cancer Overdiagnosis: A Discourse on Population Health, Biologic Mechanism and Statistics,” Marc Ryser, an assistant professor in Duke’s Departments of Population Health Sciences and Mathematics, walked us through how parallel developments across different disciplines have been reshaping our cancer battle plan. He said the effort to understand the true prevalence of overdiagnosis is a focus of this shift.

Past to Future: the changing cancer battle plan
Credit: Marc Ryser, edit: Brian Du

Ryser started with the longstanding biological theory behind how tumors develop. Under the theory of clonal sweeps, a relatively linear progression of successive key mutations sweeps through the tumor, giving it increasing versatility until it is clinically diagnosed by a doctor as cancer.

Clonal sweeps model: each shade is a new clone that introduces a mutation. Credit: Sievers et al. 2016

With this as the underpinning model, the battle plan of screen early, treat hard (point A) makes sense because it would be better to break the chain of progression early rather than later when the disease is more developed and much more aggressive. So employing screening extensively across the population for the various types of cancer is the sure choice, right?

But the population-level data for many categories of cancer don’t support this view (point B). Excluding cervical and colorectal cancer, which have benefited greatly from screening interventions, the incidence of advanced breast cancer and other cancers has stayed at similar levels or even continued to increase during the years of screening interventions. This has raised the question of when screening is truly the best option.

Scientists are thinking now in terms of a “benefit-harm balance” when mass-screening public health interventions are carried out. Overdiagnosis would pile up on the harms side, because it introduces unnecessary procedures that are associated with adverse effects.

Thinking this way would be a major adjustment, and it has brought with it major confusion.

Paralleling this recent development on the population level, new biological understanding of how tumors develop has also introduced confusion. Scientists have discovered that tumors are more heterogeneous than the clonal sweeps model would make it appear. Within one tumor, there may be many different subpopulations of cancer cells, of varying characteristics and dangerousness, competing and coexisting.

Additional research has since suggested a more complex, evolutionary and ecologically based model known as the Big Bang-mutual evolution model. Instead of the “stepwise progression from normal to increasingly malignant cells with the acquisition of successive driver mutations, some cancers appear to evolve more like a Big Bang, where the malignant ability is already concentrated in the founder cell,” Ryser said.

As the first cell starts to replicate, its descendants evolve in parallel into different subpopulations expressing different characteristics. While more research has been published in favor of this model, some scientists remain skeptical.

Ryser’s research contributes to this ongoing discussion. In comparing the patterns by which mutations are present or absent in cancerous and benign tumors, he obtained results favoring the Big Bang-mutual evolution model. Rather than seeing a neat region of mutation within the tumor, which would align with the clonal sweeps model, he saw mutations dispersed throughout the tumor, like the spreading of newborn stars in the wake of the Big Bang.

How to think about mutations within a tumor
Credit: NASA

The more-complicated Big Bang-mutual evolution model justifies an increasingly nuanced approach to cancer treatment that has been developing in the past few years. Known as precision medicine (point C), its goal is to provide the best treatment available to a person based on their unique set of characteristics: genetics, lifestyle, and environment. As cancer medicine evolves with this new paradigm, when to screen will remain a key question, as will the benefit-harm balance.

There’s another problem, though: overdiagnosis is incredibly hard to quantify. In fact, by its nature it cannot be measured directly. Another area of Ryser’s research tackles this problem: he is working to model overdiagnosis accurately in order to estimate its extent and impact.

Going forward, his research goal is to understand how to bring together these different scales, from tumor biology to population-level data, to better understand overdiagnosis. Considering it in the context of the multiscale developments he described in his talk may be the key.

Post by Brian Du

Building a Mangrove Map

“Gap maps” are the latest technology when it comes to organizing data. Although they aren’t like traditional maps, they can help people navigate through dense resources of information and show scientists the unexplored areas of research.

A ‘gap map’ comparing conservation interventions and outcomes in tropical mangrove habitats around the world turns out to be a beautiful thing.

At Duke’s 2019 Master’s Projects Spring Symposium, Willa Brooks, Amy Manz, and Colyer Woolston presented the results of their year-long Master’s Project to create this map.

You’d never know by looking at the simple, polished grid of information that it took 29 Ph.D. students, master’s students and undergraduates nearly a full year to create it. As a member of the Bass Connections team that has been helping to support this research, I can testify that gap maps take a lot of time and effort — but they’re worth it.

Amy Manz, Willa Brooks, and Colyer Woolston present their evidence map (or gap map) at the 2019 Master’s Projects Spring Symposium

When designing a research question, it’s important to recognize what is already known, so that you can clearly visualize and target the gaps in the knowledge.

But sifting through thousands of papers on tropical mangroves to find the one study you are looking for can be incredibly overwhelming and time-intensive. This is the purpose of a gap map: to neatly organize existing research into a comprehensive grid, shining a light on the areas where research is lacking and highlighting patterns in areas where research exists.
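
At its core, a gap map is a cross-tabulation: each coded study adds to a cell defined by an intervention and an outcome, and empty cells mark the gaps. The Python sketch below uses invented category names and records, not the team’s coding scheme or data, to show the idea.

```python
from collections import Counter

# Hypothetical gap map: rows are conservation interventions, columns are outcomes,
# and each cell counts the studies linking the two. Categories and records are invented.
INTERVENTIONS = ["protected area", "replanting", "community management"]
OUTCOMES = ["mangrove cover", "fish abundance", "household income"]

# One (intervention, outcome) pair per finding extracted during full-text screening.
coded_studies = [
    ("protected area", "mangrove cover"),
    ("protected area", "fish abundance"),
    ("replanting", "mangrove cover"),
    ("replanting", "mangrove cover"),
    ("community management", "household income"),
]

def build_gap_map(records, rows, cols):
    """Count records in every row x column cell, including cells with no evidence."""
    counts = Counter(records)
    return {(r, c): counts[(r, c)] for r in rows for c in cols}

def find_gaps(grid):
    """Cells with zero studies are the research gaps the map is meant to expose."""
    return [cell for cell, n in grid.items() if n == 0]

grid = build_gap_map(coded_studies, INTERVENTIONS, OUTCOMES)
print("".ljust(22) + "".join(c.ljust(17) for c in OUTCOMES))
for r in INTERVENTIONS:
    print(r.ljust(22) + "".join(str(grid[(r, c)]).ljust(17) for c in OUTCOMES))
print(f"{len(find_gaps(grid))} of {len(grid)} cells are empty")
```

As the team notes below, a count grid like this shows where evidence exists but not how strong it is or which way the effects run.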

In partnership with World Wildlife Fund, Willa, Amy, and Colyer’s team has been working under the direction of Nicholas School of the Environment professors Lisa Campbell and Brian Silliman to screen the abstracts of over 10,000 articles, 779 of which ended up being singled out for a second round of full-text screening. In the first round, we were looking for very specific inclusion criteria, and in the second, we were extracting data from each study to identify the outcomes of conservation interventions in tropical mangrove, seagrass, and coral reef habitats around the world.

Coastal Mangroves (Photo from WikiCommons: US National Oceanic and Atmospheric Administration)

While the overall project looked at all three habitats, Willa, Amy, and Colyer’s Master’s Project focused specifically on mangroves, which are salt-tolerant shrubs that grow along the coast in tropical and subtropical regions. These shrubs provide a rich nursery habitat to a diverse group of birds and aquatic species, and promote the stability of coastlines by trapping sediment runoff in their roots. However, mangrove forests are in dramatic decline.

According to World Wildlife Fund, 35 percent of mangrove ecosystems in the world are already gone. Those that remain are facing intense pressure from threats like forest clearing, overharvesting, overfishing, pollution, climate change, and human destruction of coral reefs. Now more than ever, it is important to study the conservation of these habitats and to implement solutions that will save these coastal forests and all the life they support. The hope is that our gap map will help point future researchers towards these solutions, and aid in the fight to save the mangroves.

This year’s team built a gap map that successfully mapped linkages between interventions and outcomes, indicating which areas are lacking in research. However, the gap map is limited because it does not show the strength or nature of these relationships. Next year, another Bass Connections team will tackle this challenge of analyzing the results, and further explore the realm of tropical conservation research.

Post by Anne Littlewood, Trinity ’21

A How-To Guide for Climate-Proof Cities

Roughly 400 miles separate Memphis and New Orleans. Interstate 55 connects the two cities, snaking south parallel to the Mississippi River. The drive is dull. There are few cars. The trees are endless.

South of the Louisiana border, the land turns flat, low, and wet. The air grows warmer, and heavy with moisture. I-55 cuts through the center of Maurepas Swamp, a 100,000-plus acre tract of protected wetlands. Groves of gumball and oak are rare here—instead, thin swamps of bald cypress and tupelo trees surround the highway on either side. At night, only their skeletal silhouettes are visible. They rise from the low water, briefly illuminated by passing headlights. Even in the dark, the trees are unmistakably dead.

*  *  *

A healthy cypress swamp in Lake Martin, Louisiana (Source: U.S. Geological Survey)

Traditionally, Maurepas Swamp has served as a natural barrier against the flooding that threatens New Orleans each year. Native flora soaks up the rainfall, spreading it across a network of cypress roots and cattail. But centuries of logging and canal construction have drastically altered the swamp’s ecological composition. The Mississippi levee system compounded the issue, isolating the swamp from vital sources of fresh water and nutrients. Flooded with saltwater, much of the existing cypress withered and died. Young trees are now few and scattered.

Maurepas Swamp highlights the danger of even the most well-intentioned changes to the environment. This problem is hardly unique to the wetlands. “Many of the issues that we are experiencing today were seen as solutions in the past,” says Nancy Grimm, a professor of ecology at Arizona State University. “What we want to do now is to think about the future, so that the solutions of today don’t become the problems of tomorrow.”

Nancy Grimm addresses urban sustainability at the 2019 Henry J. Oosting Memorial Lecture in Ecology. (Source: Nicholas School of the Environment)

Grimm is the co-director of the UREx Sustainability Research Network. UREx aims to climate-proof urban municipalities without sacrificing environmental stability. To do so, UREx has partnered with several cities across the United States and Latin America. Each city hosts a workshop geared towards municipal decision makers, such as government officials, environmental NGOs, and more. Together, these participants design different “futures” addressing their cities’ most pressing concerns.

Phoenix, Arizona, is one of the nine initial cities partnering with UREx. One of the hottest cities in the United States, Phoenix is already plagued by extreme heat and drought. By 2060, Phoenix is projected to have 132 days above 100°F, a 44 percent increase from data collected in 2010.
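
The 44 percent figure also implies a baseline. Working backward, as in the sketch below, suggests roughly 92 days above 100°F in the 2010 data; this is derived from the two numbers above, not a figure reported in the talk.

```python
# Back-calculate the 2010 baseline implied by the projection (derived, not reported).
PROJECTED_DAYS_2060 = 132   # projected days above 100°F by 2060
INCREASE = 0.44             # "a 44 percent increase" over the 2010 data

baseline_2010 = PROJECTED_DAYS_2060 / (1 + INCREASE)
print(f"implied 2010 baseline: about {baseline_2010:.0f} days above 100°F")
```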

UREx doesn’t dwell too much on these statistics. “We’re bombarded constantly by dystopian narratives of tomorrow,” says Grimm, with a slight smile. “Instead, what we want to think about are ways we can envision a more positive future.”

The Phoenix workshop produced five distinct visions of what the city could look like in sixty years. Some scenarios are more ambitious than others—“The Right Kind of Green,” for example, imagines a vastly transformed city defined by urban gardens and lush vegetation. But each vision of Phoenix contains a common goal: a greener, cooler city that retains its soul. 

A visualization accompanies each scenario. In one, a family walks about a small orchard. The sky is blue, and the sun is out. But no one seems bothered by the heat. The oranges are vibrant; the trees thick, and full. It’s an idyllic future. But it’s one within grasp.  

Post by Jeremy Jacobs

Style Recommendations From Data Scientists

A combination of data science and psychology is behind the recommendations for products we get when shopping online.

At the intersection of social psychology, data science and fashion is Amy Winecoff.

Amy Winecoff uses her background in psychology and neuroscience to improve recommender systems for shopping.

After earning a Ph.D. in psychology and neuroscience here at Duke, Winecoff spent time teaching before moving over to industry.

Today, Winecoff works as a senior data scientist at True Fit, a company that provides retailers with tools that help them decide which products to suggest to their customers.

True Fit’s software relies on collecting data about how clothes fit people who have bought them. With this data on size and type of clothing, True Fit can make size recommendations for a specific consumer looking to buy a certain product.    

In addition to recommendations on size, True Fit is behind many sites’ recommendations of products similar to those you are browsing or have bought.

These recommender systems have been shown to work well for sites like Netflix, where you may have recently watched many different movies and shows that can inform recommendations. But Winecoff points out that the approach is harder to apply to something like pants, which people don’t tend to buy in bulk.

To overcome this barrier, True Fit has engineered its system, called the Discovery engine, to parse a single piece of clothing into fifty different traits. With this much information, making recommendations for similar styles can be easier.
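
Breaking each garment into many traits makes similarity something you can compute. The sketch below is a hypothetical illustration, not True Fit’s Discovery engine: it represents garments as small sets of invented trait labels (five rather than fifty) and ranks catalog items by Jaccard similarity, the share of traits two items have in common.

```python
# Hypothetical sketch: garments as sets of trait labels, ranked by Jaccard similarity.
# The traits and items are invented; this is not True Fit's actual model.

def jaccard(a, b):
    """Share of traits two garments have in common: 0 = none, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(target, catalog, k=2):
    """Return the names of the k catalog items whose traits best match `target`."""
    ranked = sorted(catalog, key=lambda name: jaccard(target, catalog[name]), reverse=True)
    return ranked[:k]

viewed = {"skirt", "midi", "navy", "pleated", "cotton"}   # item the shopper is browsing
catalog = {
    "A": {"skirt", "midi", "navy", "a-line", "cotton"},
    "B": {"skirt", "mini", "red", "pleated", "leather"},
    "C": {"pants", "cropped", "navy", "cotton"},
}
print(most_similar(viewed, catalog))   # ['A', 'C']
```

Note that Jaccard similarity treats every trait as equally important, which is exactly the kind of assumption a psychological view of similarity would question.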

However, Winecoff’s background in social psychology has led her to question how well these algorithms make predictions that are in line with human behavior. She argues that understanding how people form their preferences is an integral part of designing a system to make recommendations.

One way Winecoff tests how closely the predictions match human preferences is by running psychological studies that offer insight into how to fine-tune math-based recommendations.

With the general goal of learning how people judge similarity in clothes, Winecoff designed an online study in which subjects are shown a piece of clothing and told the garment is out of stock. They are then presented with two options and must pick one to replace the out-of-stock item. By varying one aspect in each of the two choices, like color, pattern, or skirt length, Winecoff and her colleagues can distinguish which traits are most salient to a person when judging similarity.
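
One simple way to read the results of such a forced-choice study is to tally how often an option is passed over when its changed attribute is on the table. The sketch below assumes each option differs from the out-of-stock item in exactly one attribute; the trial records are invented, and a real analysis would more likely fit a statistical choice model.

```python
from collections import defaultdict

# Hypothetical trials: (attribute changed in option 1, attribute changed in option 2,
# attribute changed in the option the participant chose). Records are invented.
trials = [
    ("color", "pattern", "pattern"),
    ("color", "length", "length"),
    ("pattern", "length", "length"),
    ("color", "pattern", "color"),
    ("pattern", "length", "pattern"),
]

def salience(trials):
    """Fraction of times an option was rejected when its changed attribute was offered.
    Higher values suggest people see changes to that attribute as less similar."""
    shown = defaultdict(int)
    rejected = defaultdict(int)
    for first, second, chosen in trials:
        for attr in (first, second):
            shown[attr] += 1
            if attr != chosen:
                rejected[attr] += 1
    return {attr: rejected[attr] / shown[attr] for attr in shown}

scores = salience(trials)
for attr, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr}: {score:.2f}")
```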

Winecoff’s work illustrates the power of combining algorithmic recommendations with insights from social psychology, and shows that science reaches into unexpected places, like your shopping choices.

Post by undergraduate blogger Sarah Haurin

Magazine Covers Hew to Stereotypes, But Also Surprise

Data + Women’s Spaces

Media plays a large role in the lives of most people. It’s everywhere. Even if you don’t actively purchase magazines, you are exposed to their covers in daily life: at newsstands, in grocery stores, in waiting rooms, online and more. Intrigued by the messages embedded in magazine covers, Nathan Liang (psychology, statistics), Sandra Luksic (philosophy, political science) and Alexis Malone (statistics) set out to understand how women are represented in media as part of a research project in the Data+ program.

Data+ is one of the many summer research opportunities at Duke. It’s a 10-week program focused on data science that allows undergraduate students to explore different research topics using data-driven approaches. Students work collaboratively in small interdisciplinary teams and develop skills to marshal, analyze, and visualize data.

The team’s project, titled Women’s Spaces, focused on a primary research question: Which messages are pervasive in women’s and men’s magazines, and how do these messages change over time, across magazines, and between different target audiences?

Together, the team analyzed 500+ magazine covers published between January 2010 and June 2018, from Cosmopolitan, Esquire, Essence, Good Housekeeping and Seventeen. They used image analysis, text analysis and sentiment analysis in order to understand how women are represented on the magazine covers.

For the image analysis, the team used Microsoft Azure Face Detect with Python to identify cover models. This software accounted for perceived emotions, age and race. They also noted the race/ethnicity and hair length of the cover models. Their research revealed that, excluding Essence, 85 percent of cover models were white and had below-average body sizes. One specific finding was that men showed a greater range of emotions, while women almost always appeared happy. Furthermore, there was less emotional variance among minorities, and virtually no Asian men appeared on the covers. However, the team noted a possible software bias: Microsoft Azure may have detected the emotions of minorities less accurately.

To conduct the text analysis, the team had to type the cover text by hand, because it was often layered on top of images, making it hard for software to detect. This was so time-consuming that it reduced the number of covers they were able to analyze. They then used a Term Frequency-Inverse Document Frequency (tf-idf) algorithm to determine both how often a term occurred on a cover and how important that term was relative to all the covers. Their results revealed several keywords associated with different magazines, including sex (Cosmopolitan); curvy, beauty, and business (Essence); cooking, cleaning, and kitchen (Good Housekeeping); cute (Seventeen); and cars, America, and barbecue (Esquire).
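
Tf-idf scores a term higher the more often it appears in one document and lower the more documents it appears in, so words shared across magazines fade while magazine-specific words stand out. The sketch below implements one common variant (term frequency times the natural log of N over document frequency) from scratch. The cover snippets are made up to echo a few of the keywords above and are not the team’s transcriptions; the team’s exact weighting scheme is not described.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, where
    weight = (term count / document length) * ln(number of docs / docs containing term)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

# Made-up cover snippets, one "document" per magazine.
covers = {
    "Cosmopolitan": "sex tips beauty secrets sex",
    "Esquire": "cars barbecue america style",
    "Good Housekeeping": "cooking cleaning kitchen beauty",
}
weights = tf_idf(list(covers.values()))
for name, w in zip(covers, weights):
    top = sorted(w, key=w.get, reverse=True)[:2]
    print(name, top)
```

Here “beauty,” which appears for two of the three magazines, scores lower than words unique to one magazine, and a word that appeared on every cover would score exactly zero.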

Tf-idf word cloud for all magazines

Lastly, they conducted a sentiment analysis, which computationally identifies the opinions expressed on the magazine covers to determine their attitude toward the topic being displayed. Although sentiment libraries exist, none were tailored to the magazine and advertising industry, so they were not usable for this research. As a result, the team created their own sentiment dictionary with categories like “positive,” “negative,” “sex,” “sell-words,” “appearance,” “home,” “professional,” “male” and “female.”
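
A custom sentiment dictionary of this kind amounts to a lookup table from words to categories, with each cover scored by counting matches. The sketch below assumes simple whole-word matching; the word lists are hypothetical stand-ins for a few of the team’s categories, not their actual dictionary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary using a few of the team's category names;
# the real word lists are not published here.
LEXICON = {
    "positive": {"love", "best", "confident", "happy"},
    "negative": {"worst", "fear", "mistakes"},
    "appearance": {"beauty", "hair", "body", "skin"},
    "sell-words": {"new", "free", "save", "buy"},
    "home": {"kitchen", "cooking", "cleaning"},
}

def categorize(text, lexicon=LEXICON):
    """Count how many words in `text` fall into each dictionary category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocab in lexicon.items():
            if word in vocab:
                counts[category] += 1
    return counts

line = "Love your body: 50 new beauty tips and the best free hair fixes"
scores = categorize(line)
print(dict(scores))
print("net sentiment:", scores["positive"] - scores["negative"])
```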

At the end of the summer, their main takeaway was that magazines tend to reinforce gender norms and stereotypes. The covers also confirmed some of the preconceptions the team had about magazines. However, they also discovered messages of empowerment, which, interestingly, were often connected to beauty and consumerism.

In a presentation, the team explained that one of the lessons they took away from the summer was that data science is not objective, but its biases are hard to spot. They noted that throughout the process they made sure to question their methods of analyzing data. It was particularly challenging to determine where biases were coming into play, whether in their questions, their data sources, or even their understanding of feminism. Because the project combined the humanities with data science, the team was academically diverse. Luksic stated in the presentation that she, especially, came in skeptical of the assumption that technology is “objective.”

Luksic added, “It’s one thing to know, on an abstract level, that data science is not objective. It is another thing entirely to try to do or practice data science in a way that minimizes your subjectivities. Ultimately, we hope for a data science that can incorporate subjectivity in a way that emphasizes differences, such as between black-centered feminism and anti-black feminism.”

The team’s discoveries feed into a larger discussion about women’s roles in media, how marketing shapes ideas of feminism and empowerment, and how that in turn affects women’s movements.

Luksic stated, “the versatility of data science allowed us to pursue multiple different paths with different conceptions of feminisms underlying them, which was exciting and empowering.”

By Anna Gotskind

Math on the Basketball Court

Boston Celtics data analyst David Sparks, Ph.D., really knew his audience on Thursday, November 8, when he gave a presentation centered on the two most important themes at Duke: basketball and academics. He gave the crowd hope that you don’t have to be a Marvin Bagley III to make a career out of basketball; in fact, you don’t have to be an athlete at all. You can be a mathematician.

David Sparks (photo from Duke Political Science)

Sparks loves basketball, and he spends every day watching games and practices for his job. What career fits this description, you might ask? After graduating from Duke in 2012 with a Ph.D. in Political Science, Sparks went to work for the Boston Celtics as the Director of Basketball Analytics. His job entails analyzing basketball data and building statistical models to help the team win.

The most important statistics when looking at basketball data are offensive and defensive efficiency, Sparks told the audience gathered for the “Data Dialogue” series hosted by the Information Initiative at Duke. Offensive efficiency is the number of points a team scores per possession, while defensive efficiency is the number of points it allows per opponent possession, a measure of how well the defense holds the other offense down. Both are driven by four factors: effective field goal percentage (field goal percentage adjusted to give extra credit for three-pointers), turnover rate, rebounding percentage, and free throw rate. By looking at these four factors on both offense and defense, Sparks can figure out which areas are lacking and share with the coach where there is room for improvement. “We all agree that we want to win, and the way you win is through efficiency,” Sparks said.
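
The four factors are usually computed from box-score totals with public formulas popularized by analyst Dean Oliver; the Celtics’ own models are not described in the talk, so treat the sketch below as a generic illustration. Effective field goal percentage gives a made three-pointer 1.5 times the weight of a made two, and the free-throw factor here is free throws made per field goal attempt, one common convention. The 0.44 coefficient in the possession estimate is a conventional approximation, and the box-score numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class TeamTotals:
    """One team's box-score totals for a game or season (invented numbers below)."""
    points: int
    fgm: int      # field goals made
    fga: int      # field goals attempted
    fg3m: int     # three-pointers made
    ftm: int      # free throws made
    fta: int      # free throws attempted
    orb: int      # offensive rebounds
    opp_drb: int  # opponent's defensive rebounds
    tov: int      # turnovers

def possessions(t):
    """Common possession estimate: FGA - ORB + TOV + 0.44 * FTA."""
    return t.fga - t.orb + t.tov + 0.44 * t.fta

def four_factors(t):
    return {
        "eFG%": (t.fgm + 0.5 * t.fg3m) / t.fga,          # a made three counts 1.5x
        "TOV%": t.tov / (t.fga + 0.44 * t.fta + t.tov),  # turnovers per play
        "ORB%": t.orb / (t.orb + t.opp_drb),             # share of own misses rebounded
        "FT rate": t.ftm / t.fga,                        # free throws made per shot attempt
    }

def efficiency(t):
    """Points per 100 possessions."""
    return 100 * t.points / possessions(t)

game = TeamTotals(points=110, fgm=40, fga=85, fg3m=12, ftm=18, fta=22,
                  orb=10, opp_drb=33, tov=13)
for name, value in four_factors(game).items():
    print(f"{name}: {value:.3f}")
print(f"offensive efficiency: {efficiency(game):.1f} points per 100 possessions")
```

Running the same functions on the opponent’s totals gives defensive efficiency and the defensive side of each factor.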

Since there is little room for improvement in the short windows between games during the regular season, much of Sparks’ job involves informing draft decisions and how the team should run practices during the preseason.

David Sparks wins over his audience by showing Duke basketball clips to illustrate a point. Sparks spoke as part of the “Data Dialogue” series hosted by the Information Initiative at Duke.

Data collection these days is done by computer software. Synergy Sports Technology, the dominant data provider in professional basketball, has installed cameras in all 29 NBA arenas. These cameras constantly watch and code plays during games, tracking the location of each player and the movement of the ball. The software can count the number of times the ball was touched and determine how long it was held each time, or recognize screens and calculate the height at which rebounds are grabbed. This has revolutionized basketball analytics: because plays are coded by computer and stored, data scientists like Sparks can go back later and look for new things.
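The touch counting described above can be illustrated with a small Python sketch. Synergy’s actual data format is not described here, so this assumes a hypothetical stream of timestamped samples recording which player, if any, has the ball; the `Sample` type and the `touches` and `touch_counts` functions are illustrative names, not part of any real API.

```python
from collections import Counter
from typing import Optional

# Hypothetical tracking sample: (elapsed seconds, id of player holding the ball, or None if loose)
Sample = tuple[float, Optional[str]]


def touches(samples: list[Sample]) -> list[tuple[str, float]]:
    """Return (player, seconds held) for each uninterrupted touch, in time order."""
    result: list[tuple[str, float]] = []
    holder: Optional[str] = None
    start = 0.0
    for t, player in samples:
        if player != holder:
            # The previous holder's touch ends when the ball changes hands or comes loose.
            if holder is not None:
                result.append((holder, t - start))
            holder, start = player, t
    # Close out a touch still in progress at the last sample.
    if holder is not None:
        result.append((holder, samples[-1][0] - start))
    return result


def touch_counts(samples: list[Sample]) -> Counter:
    """Number of touches per player."""
    return Counter(player for player, _ in touches(samples))


if __name__ == "__main__":
    stream = [(0.0, "A"), (0.5, "A"), (1.0, None), (1.5, "B"), (2.0, "B"), (2.5, "A"), (3.0, "A")]
    print(touches(stream))       # [('A', 1.0), ('B', 1.0), ('A', 0.5)]
    print(touch_counts(stream))  # Counter({'A': 2, 'B': 1})
```

In this sketch a touch ends whenever the holder changes, including when the ball is loose, so a player who fumbles and recovers the ball is credited with two touches; a real system would need a rule for cases like that.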

The room leaned in eagerly as Sparks finished his presentation, intrigued by a profession that is interdisciplinary at its core: an unlikely combination of sports and applied math. If math can explain basketball, maybe we can all find a way to connect our random passions to our professional lives.

Meet Dr. Sandra K. Johnson, Engineering “Hidden Figure”

When Dr. Sandra K. Johnson first tried her hand at electrical engineering during a summer institute in high school, she knew she was born to be an electrical engineer. Johnson, the first African-American woman to receive a Ph.D. in computer engineering in the United States, visited Duke to share her story as a “hidden figure” and to inspire not just black women but all students not to be discouraged by the obstacles they may face in pursuit of their passions.

Though she discussed her achievements, Johnson made it clear that it was the opposition she faced, more than her successes, that most motivated her to persevere in electrical engineering. While pursuing a master’s degree at Stanford, she met Dr. William Shockley, who in his free time was conducting research he believed would prove that African Americans were intellectually inferior to other races. Johnson had originally planned to finish her program with a master’s and then enter the workforce, but after hearing what Shockley was trying to prove, she decided to show him that she was capable of doing anything the non-black students in her program could do. She finished the program with a Ph.D. in electrical engineering. She went on to make the same declaration to anyone who doubted her: “before I leave this place, I will make a believer out of you.”

Dr. Johnson is the founder, CTO and CEO of Global Mobile Finance, Inc., a finance and tech startup based in Research Triangle Park, NC. Photo from BlackComputeHER.

While pursuing her own goals, Johnson also firmly believed in making the path easier for other black people pursuing advanced degrees. When asked what the current generation of students could do to help themselves, she said to find mentors and to mentor others. She shared an anecdote about sitting in a lab at Stanford, waiting to begin an experiment, when a man walked up and told her she was in the wrong place. After talking with him for several minutes and showing him that she knew more about the subject than he did, and that she was in the right place, she told him that the next time someone who looked like her walked into the lab, he should not be so sure of himself.

Johnson went on to become an IBM Fellow, an IEEE Fellow, and a member of the prestigious Academy of Electrical Engineers. At the end of her talk, she discussed what she believes is the best way to speed up change: having people of color as founders and CEOs of major corporations that have the power to increase minority representation in their workforces. This is what she intends to do with her own company, Global Mobile Finance, Inc. If her track record is any indication, her company may well become a major corporation in the years to come, opening more doors for black women and other minorities pursuing their passions.

Post by Victoria Priester

Cracking the Code on Credit Cards at Datathon 2018

Anyone who has ever tried to formulate and answer their own research question knows that it means entering uncharted waters. This past weekend, hundreds of students at Duke Datathon 2018 did just that, using only their computer science prowess and a splash of innovation.

Here’s how it worked: the students received three data sets from Credit Sesame, a free credit score estimator, and had eight hours to use their insight and computer science knowledge to interpret the data and create as much value for the company as they could. Along the way, Duke Undergraduate Machine Learning (DUML), the organization hosting the event, provided mentors and workshops to help participants find direction and achieve their goals.

Datathon participants attempting to derive meaning from the Credit Sesame data

This was the first ‘Datathon’ event to take place at Duke. It attracted big-name sponsors such as Google and Pinterest and was made possible by the DUML executive team, headed by co-presidents Rohith Kuditipudi and Shrey Gupta.

DUML faculty advisor Dr. Rebecca Steorts said that even the planning of the event transcended disciplines: Shrey Gupta, one of her undergraduate students and co-president of DUML, used statistics to predict how many people would attend. “It’s all about finding computational ways of combining disciplines to solve the problem,” Steorts said, and her students have clearly taken this to heart.

The winning team (Jie Cai, Catie Grasse, Feroze Mohideen) presenting on how they can best gauge which customers are most “valuable” to Credit Sesame

After more than an hour of deliberations, the top eight teams were selected, and five finalists were asked to present their findings to the judges. The winning team (Jie Cai, Catie Grasse, Feroze Mohideen) proposed a way to gauge which customers who create trial accounts are most likely to be profitable, using a filtering program that predicts likely customer engagement from customer-supplied data and each customer’s interaction with the free trial. Other top teams tackled similar questions, with different approaches to how Credit Sesame might build a profile of its likely “valuable” customers.

DUML hosts other events throughout the year to engage students, such as its MLBytes Speaker Series and ECE Seminar Series.

By Rebecca Williamson
