Duke Research Blog

Following the people and events that make up the research community at Duke.

Opportunities at the Intersection of Technology and Healthcare

What’d you do this Halloween?

I attended a talk on the intersection of technology and healthcare by Dr. Erich Huang, who is an assistant professor of Biostatistics & Bioinformatics and Assistant Dean for Biomedical Informatics. He’s also the new co-director of Duke Forge, a health data science research group.

This was not a conventional Halloween activity by any means, but I felt lucky to be exposed to this impactful research at IBM-Duke Day in Penn Pavilion, surrounded by views of Duke Forest in the fall.

Erich Huang, M.D., Ph.D., is the co-director of Duke Forge, our new health data effort.

Dr. Huang began his talk with a statistic: only six out of 53 landmark cancer biology research papers are reproducible. This fact was shocking (and maybe a little bit scary?), considering that these papers serve as the foundation for saving cancer patients’ lives. Dr. Huang said that it’s time to raise standards for cancer research.

What is his proposed solution? Using data provenance, which is essentially a historical record of data and its origins, when dealing with important biomedical data.

He mentioned Duke Data Service (DukeDS), which is an information technology service that features data provenance for scientific workflows. With DukeDS, researchers are able to share data with approved team members across campus or across the world.

Next, Dr. Huang demonstrated the power of data science in healthcare by describing an example patient. Mr. Smith is 63 years old with a history of heart attacks and diabetes. He has been having trouble sleeping and his feet have been red and puffy. Mr. Smith meets the criteria for heart failure and appropriate interventions, such as a heart pump and blood thinners.

A problem that many patients at risk of heart failure face is forgetting to take their blood thinners. Using smart pill bottles with automatic tracking from a company called Pillsy, we could track when Mr. Smith takes his medication and record this information on a blockchain, which stores blocks of information linked together so that each block points to an older version of that information. This type of technology might allow for the recalculation of dosage so that Mr. Smith could take the appropriate amount after a missed dose of a blood thinner.
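The linked-block idea can be sketched in a few lines of Python. This is only an illustration, not Pillsy’s or Duke’s actual system: the record fields (`patient`, `event`, `time`) and the helper names `make_block` and `chain_is_valid` are made up for the example.

```python
import hashlib
import json


def make_block(record, prev_hash):
    """Bundle a record with the hash of the block that came before it."""
    body = {"record": record, "prev_hash": prev_hash}
    # Hash a canonical (sorted-key) JSON encoding so the digest is repeatable.
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}


def chain_is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    for i, block in enumerate(chain):
        body = {"record": block["record"], "prev_hash": block["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical medication events for the example patient.
events = [
    {"patient": "Mr. Smith", "event": "dose taken", "time": "2017-10-30T08:00"},
    {"patient": "Mr. Smith", "event": "dose missed", "time": "2017-10-31T08:00"},
]

chain = []
prev = "0" * 64  # placeholder hash for the very first block
for event in events:
    block = make_block(event, prev)
    chain.append(block)
    prev = block["hash"]
```

Editing an earlier record changes the hash that block should have, so `chain_is_valid(chain)` would return `False` afterward. That is what makes the history traceable.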

These uses of data science, and specifically blockchain and data provenance, show great opportunity at the intersection of technology and healthcare. Having access to secure and traceable data can lead to research being more reproducible and therefore reliable.

At the end of his presentation, Dr. Huang suggested as much collaboration in research between IBM and Duke as possible, especially in his field. Seeing that the Research Triangle Park location of IBM is the largest IBM development site in the world and is conveniently located near one of the best research universities in the nation, his suggestion makes complete sense.

By Nina Cervantes

Who Gets Sick and Why?

During his presentation as part of the Chautauqua lecture series, Duke sociologist Dr. Tyson Brown explained his research exploring the ways racial inequalities affect a person’s health later in life. His project mainly looks at the Baby Boomer generation, Americans born between 1946 and 1964.

With incredible increases in life expectancy, from 47 years in 1900 to 79 today, elderly people are beginning to form a larger percentage of the population. However, among black people, the average life expectancy is three and a half years shorter.

“Many of you probably do not think that three and half years is a lot,” Brown said. “But imagine how much less time that is with your family and loved ones. In the end, I think all of us agree we want those extra three and a half years.”

Not only do black Americans live shorter lives on average, but they also tend to live sicker ones, with higher blood pressure, greater chances of stroke, and a higher probability of diabetes. In total, the number of deaths that would be prevented if African-American people had the same life expectancy as white people is 880,000 over a nine-year span. Now, the question Brown has challenged himself with is “Why does this discrepancy occur?”

Brown said he first concluded that health habits and behaviors do not create this life expectancy gap because white and black people have similar rates of smoking, drinking, and illegal drug use. He then decided to explore socioeconomic status. He discovered that as education increases, mortality decreases. And as income increases, self-rated health increases. He said that for every dollar a white person makes, a black person makes 59 cents.

This inequality in income points to the possible cause for the racial inequality in health, he said. Additionally, in terms of wealth instead of income, a black person has 6 cents compared to the white person’s dollar. Possibly even more concerning than this inconsistency is the fact that it has gotten worse, not better, over time. Before the 2006 recession, blacks had 10-12 cents of wealth for every white person’s dollar.

Brown believes that this financial stress forms one of many stressors in black lives including chronic stressors, everyday discrimination, traumatic events, and neighborhood disorder which affect their health.

Over time, these stressors create something called physiological dysregulation, otherwise known as wear and tear, through repeated activation of the stress response, he said. Recognition of the prevalence of these stressors in black lives has led to Brown’s next focus on the extent of the effect of stressors on health. For his data, he uses the Health and Retirement Study and self-rated health (proven to predict mortality better than physician evaluations). For his methods, he employs structural equation modeling. Racial inequalities in socioeconomic resources, stressors and biomarkers of physiological dysregulation collectively explain 87% of the health gap, with any number of causes capable of filling the remaining percentage.

Brown said his next steps include using longitudinal and macro-level data on structural inequality to understand how social inequalities “get under the skin” over a person’s lifetime. He suggests that the next steps for society, organizations, and the government to decrease this racial discrepancy rest in changing economic policy, increasing wages, guaranteeing work, and reducing residential segregation.

Post by Lydia Goff

Smoking Weed: the Good, Bad and Ugly

DURHAM, N.C. — Research suggests that the earlier someone is exposed to weed, the worse it is for them.

Very early on in our life, we develop basic motor and sensory functions. In adolescence, our teenage years, we start developing more complex functions — cognitive, social and emotional functions. These developments differ based on one’s experience growing up — their family, their school, their relationships — and are fundamental to our growth as healthy human beings.

This process has been shown to be impaired when marijuana is introduced, according to Dr. Diana Dow-Edwards of SUNY Downstate Medical Center.

Sure, a lot of people may think marijuana isn’t so bad…but think again. At an Oct. 11 seminar at Duke’s Center on Addiction & Behavior Change, Dow-Edwards enlightened those who attended with correlations between smoking the reefer and things like IQ, psychosis and memory.

Dow-Edwards is currently a professor of physiology and pharmacology and clearly knows her stuff. She was throwing complicated graphs and large studies at us, all backing up her primary claim: the “dose-response relationship.” Basically the more you smoke (“dose”), the more of a biological effect it will have on you (“response”).

Looking at people who started using pot after adolescence showed that occasionally smoking did not cause a big change in IQ, and frequently smoking affected IQ a little. However, among adults who had smoked during adolescence, use was correlated with a huge drop of around 7 IQ points for infrequent smokers and 10 points for frequent smokers. Here we see how both age and frequency play a role in weed’s effect on cognition. So if you are going to make the choice to light up, maybe wait until your executive functions mature around 24 years old.

Smoking weed earlier in life also showed a strong correlation with an earlier onset of psychosis, a very serious mental disorder in which you start to lose sense of reality. Definitely not good. I’m not trynna get diagnosed with psychosis any time soon!

One perhaps encouraging study for you smokers out there found that marijuana really had no effect on long-term memory. Non-smokers were better at verbal learning than heavy smokers…until after a three-week abstinence break, when the heavy smokers’ memories recovered to match the control group’s. So while smoking weed when you have a test coming up maybe isn’t the best idea, there’s not necessarily a need to fear in the long run.

(Hanson et al, 2010)

A similar study showed that signs of depression and anxiety also normalized after 28 days of not smoking. Don’t get too hyped though, because even after the abstinence period, there was still “persistent impulsivity and reduced reward responses,” as well as a drop in attention accuracy.

A common belief about weed is that it is not addicting, but it actually is. What happens is that after repetitively smoking, feeling high no longer equates to feeling better than normal, but rather being sober equates to feeling worse than normal. This can lead to irritability, reduced appetite, and sleeplessness. Up to 1/2 of teens who smoke pot daily become dependent, and in broader terms, 9 percent of people who just experiment become dependent.

In summary, “marijuana interferes with normal brain development and maturation.” While it’s not going to kill you, it does affect your cognitive functions. Plus, you are at a higher risk for mental disorders like psychosis and future dependence. So choose wisely, my friends.

By Will Sheehan

Students Bring Sixty Years of Data to Life on the Web

For fields like environmental science, collecting data is hard.

Fall colors by Mariel Carr

Fall colors in the Hubbard Brook Experimental Forest, in New Hampshire’s White Mountains.

Gathering results on a single project can mean months of painstaking measurements, observations and notes, likely in limited conditions, hopefully to be published in a highly specialized journal with a target audience made up mostly of just other specialists in the field.

That’s why, this past summer, Duke students Devri Adams, Camila Restrepo and Annie Lott set out with graduate students Richard Marinos and Matt Ross and Professor Emily Bernhardt to combine over six decades of data on the Hubbard Brook Experimental Forest into a workable, aesthetically pleasing visualization website. In doing so, they were really breaking new ground in the way the public can appreciate this truly massive store of information.

The site’s navigation shows users what kinds of data they might explore in beautiful fashion.

Spanning some 8,000 acres of New Hampshire’s sprawling White Mountain National Forest, Hubbard Brook has captured the thoughts and imaginations of generations of environmental researchers. Over 60 years of study and authorized experimentation in the region have brought us some of the longest continuous environmental data sets ever collected, tracking changes across a variety of factors for the second half of the 20th century.

Now, for the first time ever, this data has been brought together into a comprehensive, agile interface available to specialists and students alike. This website is developed with the user constantly in mind. At once in-depth and flexible, each visualization is designed so that a casual viewer can instantly grasp a variety of factors all at the same time—pH, water source, molecule size and more all made clearly evident from the structures of the graphs.

Additionally, this website’s axes can be as flexible as you need them to be; users can manipulate them to compare any two variables they want, allowing for easy study of all potential correlations.

All code used to build this website has been made entirely open source, and a large chunk of the site was developed with undergrads and high schoolers in mind. The team hopes to supplement textbook material with a series of five “data stories” exploring different studies done on the forest. The effects of acid rain, deforestation, dilutification, and calcium experimentation all come alive on the website’s interactive graphs, demonstrating the challenges and changes this forest has faced since studies on it first began.

The team hopes to have created a useful and user-friendly interface that’s easy for anyone to use. By bringing data out of the laboratory and onto the webpage, this project brings us one step further in the movement to make research accessible to and meaningful for the entire world.

Post by Daniel Egitto

Durham Traffic Data Reveal Clues to Safer Streets

Ghost bikes are a haunting sight. The white-painted bicycles, often decorated with flowers or photographs, mark the locations where cyclists have been hit and killed on the street.

A white-painted bike next to a street.

A Ghost Bike located in Chapel Hill, NC.

Four of these memorials currently line the streets of Durham, and the statistics on non-fatal crashes in the community are equally sobering. According to data gathered by the North Carolina Department of Transportation, Durham County averaged 23 bicycle and 116 pedestrian crashes per year between 2011 and 2015.

But a team of Duke researchers say these grim crash data may also reveal clues for how to make Durham’s streets safer for bikers, walkers, and drivers.

This summer, a team of Duke students partnered with Durham’s Department of Transportation to analyze and map pedestrian, bicycle and motor vehicle crash data as part of the 10-week Data+ summer research program.

In the Ghost Bikes project, the team created an interactive website that allows users to explore how different factors such as the time-of-day, weather conditions, and sociodemographics affect crash risk. Insights from the data also allowed the team to develop policy recommendations for improving the safety of Durham’s streets.

“Ideally this could help make things safer, help people stay out of hospitals and save lives,” said Lauren Fox, a Duke cultural anthropology major who graduated this spring, and a member of the Data+ Ghost Bikes team.

A map of Durham county with dots showing the locations of bicycle crashes

A heat map from the team’s interactive website shows areas with the highest density of bicycle crashes, overlaid with the locations of individual bicycle crashes.

The final analysis showed some surprising trends.

“For pedestrians the most common crash isn’t actually happening at intersections, it is happening at what is called mid-block crossings, which happen when someone is crossing in the middle of the road,” Fox said.

To mitigate the risks, the team’s Executive Summary includes recommendations to add crosswalks, median islands and bike lanes to roads with a high density of crashes.

They also found that males, who make up about two-thirds of bicycle commuters over the age of 16, are involved in 75% of bicycle crashes.

“We found that male cyclists over age 16 actually are hit at a statistically higher rate,” said Elizabeth Ratliff, a junior majoring in statistical science. “But we don’t know why. We don’t know if this is because males are riskier bikers, if it is because they are physically bigger objects to hit, or if it just happens to be a statistical coincidence of a very unlikely nature.”
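One way to check a claim like this is a one-sided binomial test, sketched below. The post does not say which test the team ran, so this is an assumed stand-in. The total crash count is also an assumption, built from the county average of 23 bicycle crashes a year over the five years 2011 to 2015, and the function name `binomial_upper_tail` is invented for the example.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of seeing k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed for illustration: 23 bicycle crashes a year over five years.
total_crashes = 23 * 5
# 75% of crashes involved males (rounded to a whole number of crashes).
male_crashes = round(0.75 * total_crashes)
# Males make up about two-thirds of bicycle commuters.
male_share_of_riders = 2 / 3

# Chance of this many male crashes if crashes simply mirrored ridership.
p_value = binomial_upper_tail(male_crashes, total_crashes, male_share_of_riders)
```

A small `p_value` would mean that a 75 percent share is unlikely to arise by chance alone when males are two-thirds of riders. As Ratliff notes, it says nothing about why.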

To build their website, the team integrated more than 20 sets of crash data from a wide variety of different sources, including city, county, regional and state reports, and in an array of formats, from maps to Excel spreadsheets.

“They had to fit together many different data sources that don’t necessarily speak to each other,” said faculty advisor Harris Solomon, an associate professor of cultural anthropology and global health at Duke. The Ghost Bikes project arose out of Solomon’s research on traffic accidents in India, supported by the National Science Foundation Cultural Anthropology Program.

In Solomon’s Spring 2017 anthropology and global health seminar, students explored the role of the ghost bikes as memorials in the Durham community. The Data+ team approached the same issues from a more quantitative angle, Solomon said.

“The bikes are a very concrete reminder that the data are about lives and deaths,” Solomon said. “By visiting the bikes, the team was able to think about the very human aspects of data work.”

“I was surprised to see how many stakeholders there are in biking,” Fox said. For example, she added, the simple act of adding a bike lane requires balancing the needs of bicyclists, nearby residents concerned with home values or parking spots, and buses or ambulances that require access to the road.

“I hadn’t seen policy work that closely in my classes, so it was interesting to see that there aren’t really simple solutions,” Fox said.

Video: https://www.youtube.com/watch?v=YHIRqhdb7YQ

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of Mathematics and Statistical Science and MEDx.

Other Duke sponsors include DTECH, Duke Health, Sanford School of Public Policy, Nicholas School of the Environment, Development and Alumni Affairs, Energy Initiative, Franklin Humanities Institute, Duke Institute for Brain Sciences, Office for Information Technology and the Office of the Provost, as well as the departments of Electrical & Computer Engineering, Computer Science, Biomedical Engineering, Biostatistics & Bioinformatics and Biology.

Government funding comes from the National Science Foundation. Outside funding comes from Accenture, Academic Analytics, Counter Tools and an anonymous donation.

Community partnerships, data and interesting problems come from the Durham Police Department, Durham Neighborhood Compass, Cary Institute of Ecosystem Studies, Duke Marine Lab, Center for Child and Family Policy, Northeast Ohio Medical University, TD Bank, Epsilon, Duke School of Nursing, University of Southern California, Durham Bicycle and Pedestrian Advisory Commission, Duke Surgery, MyHealth Teams, North Carolina Museum of Art and Scholars@Duke.

Writing by Kara Manke; video by Lauren Mueller and Summer Dunsmore

Pinpointing Where Durham’s Nicotine Addicts Get Their Fix

DURHAM, N.C. — It’s been five years since Durham expanded its smoking ban beyond bars and restaurants to include public parks, bus stops, even sidewalks.

While smoking in the state overall may be down, 19 percent of North Carolinians still light up, particularly the poor and those without a high school or college diploma.

Among North Carolina teens, consumption of electronic cigarettes in particular more than doubled between 2013 and 2015.

Now, new maps created by students in the Data+ summer research program show where nicotine addicts can get their fix.

Studies suggest that tobacco retailers are disproportionately located in low-income neighborhoods.

Living in a neighborhood with easy access to stores that sell tobacco makes it easier to start young and harder to quit.

The end result is that smoking, secondhand smoke exposure, and smoking-related diseases such as lung cancer, are concentrated among the most socially disadvantaged communities.

If you’re poor and lack a high school or college diploma, you’re more likely to live near a store that sells tobacco. Photo from Pixabay.

Where stores that sell tobacco are located matters for health, but for many states such data are hard to come by, said Duke statistics major James Wang.

Tobacco products bring in more than a third of in-store sales revenue at U.S. convenience stores — more than food, beverages, candy, snacks or beer. Despite big profits, more than a dozen states don’t require businesses to get a special license or permit to sell tobacco. North Carolina is one of them.

For these states, there is no convenient spreadsheet from the local licensing agency identifying all the businesses that sell tobacco, said Duke undergraduate Nikhil Pulimood. Previous attempts to collect such data in Virginia involved searching for tobacco retail stores by car.

“They had people physically drive across every single road in the state to collect the data. It took three years,” said team member and Duke undergraduate Felicia Chen.

Led by UNC PhD student in epidemiology Mike Dolan Fliss, the Duke team tried to come up with an easier way.

Instead of collecting data on the ground, they wrote an automated web-crawler program to extract the data from the Yellow Pages websites, using a technique called Web scraping.

By telling the software the type of business and location, they were able to create a database that included the names, addresses, phone numbers and other information for 266 potential tobacco retailers in Durham County and more than 15,500 statewide, including chains such as Family Fare, Circle K and others.
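The extraction step might look something like the sketch below. It parses a made-up snippet of HTML rather than fetching a live page, and the markup (a `listing` block with `name`, `address` and `phone` spans) is a hypothetical stand-in, not the Yellow Pages site’s real structure. The store names and phone numbers are invented too.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for one page of directory search results.
SAMPLE_PAGE = """
<div class="listing">
  <span class="name">Corner Mart</span>
  <span class="address">101 Main St, Durham, NC</span>
  <span class="phone">919-555-0101</span>
</div>
<div class="listing">
  <span class="name">Quick Stop</span>
  <span class="address">22 Oak Ave, Durham, NC</span>
  <span class="phone">919-555-0102</span>
</div>
"""

FIELDS = {"name", "address", "phone"}


class ListingParser(HTMLParser):
    """Collect name, address and phone from each listing block."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # field whose text we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "listing":
            self.listings.append({})  # start a new record
        elif tag == "span" and css_class in FIELDS:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the fields we care about.
        if self._field and self.listings:
            self.listings[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ListingParser()
parser.feed(SAMPLE_PAGE)
retailers = parser.listings  # one dict per store
```

A real crawler would also need to request each results page for a given business type and location and move through the pages, which this sketch leaves out.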

Map showing the locations of tobacco retail stores in Durham County, North Carolina.

When they compared their web-scraped data with a pre-existing dataset for Durham County, compiled by a nonprofit called Counter Tools, hundreds of previously hidden retailers emerged on the map.

To determine which stores actually sold tobacco, they fed a computer algorithm data from more than 19,000 businesses outside North Carolina so it could learn how to distinguish say, convenience stores from grocery stores. When the algorithm received store names from North Carolina, it predicted tobacco retailers correctly 85 percent of the time.

“For example we could predict that if a store has the word ‘7-Eleven’ in it, it probably sells tobacco,” Chen said.
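A toy version of learning from store names is sketched below. The post does not name the algorithm the team used, so the naive Bayes word-count classifier here is an assumed stand-in. The training names and labels are invented, as are the function names `train_name_model` and `sells_tobacco`.

```python
import math
from collections import Counter

# Hypothetical labeled store names (True = sells tobacco).
TRAINING = [
    ("7-Eleven #1042", True),
    ("Circle K", True),
    ("Main Street Tobacco & Vape", True),
    ("7-Eleven Food Store", True),
    ("Green Leaf Grocery", False),
    ("Fresh Market Grocery", False),
    ("Sunrise Bakery", False),
    ("Oak Street Florist", False),
]


def tokens(name):
    """Lowercase a store name and split it into words."""
    return name.lower().replace("&", " ").split()


def train_name_model(examples):
    """Count how often each word appears in each class."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for name, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens(name))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def sells_tobacco(name, model):
    """Return True if the name looks more like a tobacco retailer than not."""
    word_counts, class_counts, vocab = model
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Log prior plus log likelihood of each word, with add-one smoothing.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in tokens(name):
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab) + 1))
        scores[label] = score
    return scores[True] > scores[False]


model = train_name_model(TRAINING)
```

With this toy data, `sells_tobacco("7-Eleven Durham", model)` comes out `True` because “7-eleven” appears only among the tobacco sellers, which mirrors Chen’s example.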

As a final step, they crosschecked their results using a crowdsourcing service called Amazon Mechanical Turk, paying people a small fee to search for the stores online to verify that they exist and to call them to ask whether they actually sell tobacco.

Ultimately, the team hopes their methods will help map the more than 336,000 tobacco retailers nationwide.

“With a complete dataset for tobacco retailers around the nation, public health experts will be able to see where tobacco retailers are located relative to parks and schools, and how store density changes from one neighborhood to another,” Wang said.

The team presented their work at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. This project team was also supported by Counter Tools, a non-profit based in Carrboro, NC.

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Sizing Up Hollywood's Gender Gap

DURHAM, N.C. — A mere seven-plus decades after she first appeared in comic books in the early 1940s, Wonder Woman finally has her own movie.

In the two months since it premiered, the film has brought in more than $785 million worldwide, making it the highest grossing movie of the summer.

But if Hollywood has seen a number of recent hits with strong female leads, from “Wonder Woman” and “Atomic Blonde” to “Hidden Figures,” it doesn’t signal a change in how women are depicted on screen — at least not yet.

Those are the conclusions of three students who spent ten weeks this summer compiling and analyzing data on women’s roles in American film, through the Data+ summer research program.

The team relied on a measure called the Bechdel test, first depicted by the cartoonist Alison Bechdel in 1985.

Bechdel test

The “Bechdel test” asks whether a movie features at least two women who talk to each other about anything besides a man. Surprisingly, a lot of films fail. Art by Srravya [CC0], via Wikimedia Commons.

To pass the Bechdel test, a movie must satisfy three basic requirements: it must have at least two named women in it, they must talk to each other, and their conversation must be about something other than a man.

It’s a low bar. The female characters don’t have to have power, or purpose, or buck gender stereotypes.

Even a movie in which two women only speak to each other briefly in one scene, about nail polish — as was the case with “American Hustle” — gets a passing grade.

And yet more than 40 percent of all U.S. films fail.

The team used data from the bechdeltest.com website, a user-compiled database of over 7,000 movies where volunteers rate films based on the Bechdel criteria. The number of criteria a film passes adds up to its Bechdel score.
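The scoring rule can be written out as a short function. This sketch assumes the three criteria are checked in order, since each one only makes sense if the one before it is met. The function and argument names are invented for illustration.

```python
def bechdel_score(named_women, women_talk_to_each_other, talk_not_about_a_man):
    """Count how many of the three Bechdel criteria a film meets, in order."""
    if named_women < 2:
        return 0  # fails the first criterion
    if not women_talk_to_each_other:
        return 1  # two named women, but they never talk to each other
    if not talk_not_about_a_man:
        return 2  # they talk, but only about a man
    return 3      # meets all three criteria


def passes_bechdel(score):
    """A film passes only if it meets all three criteria."""
    return score == 3
```

A film with two named women who only ever discuss a man would score 2 and fail.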

“Spider Man,” “The Jungle Book,” “Star Trek Beyond” and “The Hobbit” all fail by at least one of the criteria.

Films are more likely to pass today than they were in the 1970s, according to a 2014 study by FiveThirtyEight, the data journalism site created by Nate Silver.

The authors of that study analyzed 1,794 movies released between 1970 and 2013. They found that the number of passing films rose steadily from 1970 to 1995 but then began to stall.

In the past two decades, the proportion of passing films hasn’t budged.

Since the mid-1990s, the proportion of films that pass the Bechdel test has flatlined at about 50 percent.

The Duke team was also able to obtain data from a 2016 study of the gender breakdown of movie dialogue in roughly 2,000 screenplays.

Men played two out of three top speaking roles in more than 80 percent of films, according to that study.

Using data from the screenplay study, the students plotted the relationship between a movie’s Bechdel score and the number of words spoken by female characters. Perhaps not surprisingly, films with higher Bechdel scores were also more likely to achieve gender parity in terms of speaking roles.

“The Bechdel test doesn’t really tell you if a film is feminist,” but it’s a good indicator of how much women speak, said team member Sammy Garland, a Duke sophomore majoring in statistics and Chinese.

Previous studies suggest that men do twice as much talking in most films — a proportion that has remained largely unchanged since 1995. The reason, researchers say, is not that male characters are more talkative individually, but that there are simply more male roles.

“To close the gap of speaking time, we just need more female characters,” said team member Selen Berkman, a sophomore majoring in math and computer science.

Achieving that, they say, ultimately comes down to who writes the script and chooses the cast.

The team did a network analysis of patterns of collaboration among 10,000 directors, writers and producers. Two people are joined whenever they worked together on the same movie. The 13 most influential and well-connected people in the American film industry were all men, whose films had average Bechdel scores ranging from 1.5 to 2.6 — meaning no top producer is regularly making films that pass the Bechdel test.
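The basic construction can be sketched as follows. The films, people and scores are invented. The post does not say how the team measured influence, so counting each person’s distinct collaborators (degree centrality) is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical films: title -> (people credited, Bechdel score from 0 to 3).
FILMS = {
    "Film A": (["Producer P", "Director D", "Writer W"], 3),
    "Film B": (["Producer P", "Director E"], 1),
    "Film C": (["Producer P", "Writer W", "Director F"], 2),
    "Film D": (["Director E", "Writer X"], 3),
}

collaborators = defaultdict(set)  # person -> set of people they worked with
scores = defaultdict(list)        # person -> Bechdel scores of their films

for people, score in FILMS.values():
    # Join every pair of people who share a film credit.
    for a, b in combinations(people, 2):
        collaborators[a].add(b)
        collaborators[b].add(a)
    for person in people:
        scores[person].append(score)

# Rank people by their number of distinct collaborators.
ranking = sorted(collaborators, key=lambda p: len(collaborators[p]), reverse=True)

most_connected = ranking[0]
average_score = sum(scores[most_connected]) / len(scores[most_connected])
```

An average below 3 for the best-connected person means at least some of that person’s films fail the test, which is the pattern the team reports for the top 13.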

“What this tells us is there is no one big influential producer who is moving the needle. We have no champion,” Garland said.

Men and women were equally represented in fewer than 10 percent of production crews.

But assembling a more gender-balanced production team in the early stages of a film can make a difference, research shows. Films with more women in top production roles have female characters who speak more too.

“To better represent women on screen you need more women behind the scenes,” Garland said.

Dollar for dollar, making an effort to close the Hollywood gender gap can mean better returns at the box office too. Films that pass the Bechdel test earn $2.68 for every dollar spent, compared with $2.45 for films that fail — a 23-cent better return on investment, according to FiveThirtyEight.

Other versions of the Bechdel test have been proposed to measure race and gender in film more broadly. The advantage of analyzing the Bechdel data is that thousands of films have already been scored, said English major and Data+ team member Aaron VanSteinberg.

“We tried to watch a movie a week, but we just didn’t have time to watch thousands of movies,” VanSteinberg said.

A new report on diversity in Hollywood from the University of Southern California suggests the same lack of progress is true for other groups as well. In nearly 900 top-grossing films from 2007 to 2016, disabled, Latino and LGBTQ characters were consistently underrepresented relative to their makeup in the U.S. population.

Berkman, Garland and VanSteinberg were among more than 70 students selected for the 2017 Data+ program, which included data-driven projects on photojournalism, art restoration, public policy and more.

They presented their work at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. 

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Mapping Electricity Access for a Sixth of the World's People

DURHAM, N.C. — Most Americans can charge their cell phones, raid the fridge or boot up their laptops at any time without a second thought.

Not so for the 1.2 billion people — roughly 16 percent of the world’s population — with no access to electricity.

Despite improvements over the past two decades, an estimated 780 million people will still be without power by 2030, especially in rural parts of sub-Saharan Africa, Asia and the Pacific.

To get power to these people, first officials need to locate them. But for much of the developing world, reliable, up-to-date data on electricity access is hard to come by.

Researchers say remote sensing can help.

For ten weeks from May through July, a team of Duke students in the Data+ summer research program worked on developing ways to assess electricity access automatically, using satellite imagery.

“Ground surveys take a lot of time, money and manpower,” said Data+ team member Ben Brigman. “As it is now, the only way to figure out if a village has electricity is to send someone out there to check. You can’t call them up or put out an online poll, because they won’t be able to answer.”

India at night

Satellite image of India at night. Large parts of the Indian countryside still aren’t connected to the grid, but remote sensing and machine learning could help pinpoint people living without power. Credits: NASA Earth Observatory images by Joshua Stevens, using Suomi NPP VIIRS data from Miguel Román, NASA’s Goddard Space Flight Center

Led by researchers in the Energy Data Analytics Lab and the Sustainable Energy Transitions Initiative, “the initial goal was to create a map of India, showing every village or town that does or does not have access to electricity,” said team member Trishul Nagenalli.

Electricity makes it possible to pump groundwater for crops, refrigerate food and medicines, and study or work after dark. But in parts of rural India, where Nagenalli’s parents grew up, many households use kerosene lamps to light homes at night, and wood or animal dung as cooking fuel.

Fires from overturned kerosene lamps are not uncommon, and indoor air pollution from cooking with solid fuels contributes to low birth weight, pneumonia and other health problems.

In 2005, the Indian government set out to provide electricity to all households within five years. Yet a quarter of India’s population still lives without power.

Ultimately, the goal is to create a machine learning algorithm — basically a set of instructions for a computer to follow — that can recognize power plants, irrigated fields and other indicators of electricity in satellite images, much like the algorithms that recognize your face on Facebook.

Rather than being programmed with specific instructions, machine learning algorithms “learn” from large amounts of data.

This summer the researchers focused on the unsung first step in the process: preparing the training data.

Phoenix power plant

Satellite image of a power plant in Phoenix, Arizona

Fellow Duke students Gouttham Chandrasekar, Shamikh Hossain and Boning Li were also part of the effort. First they compiled publicly available satellite images of U.S. power plants. Rather than painstakingly framing and labeling the plants in each photo themselves, they tapped the powers of the Internet to outsource the task and hired other people to annotate the images for them, using a crowdsourcing service called Amazon Mechanical Turk.

So far, they have collected more than 8,500 image annotations of different kinds of power plants, including oil, natural gas, hydroelectric and solar.

The team also compiled firsthand observations of the electrification rate for more than 36,000 villages in the Indian state of Bihar, which has one of the lowest electrification rates in the country. For each village, they also gathered satellite images showing light intensity at night, along with density of green land and other indicators of irrigated farms, as proxies for electricity consumption.

Using these data sets, the goal is to develop a computer algorithm which, through machine learning, teaches itself to detect similar features in unlabeled images, and distinguishes towns and villages that are connected to the grid from those that aren’t.
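
To make the idea of "learning from labeled examples" concrete, here is a minimal sketch in Python. It is not the team's algorithm: it assumes each village has already been reduced to two invented features (night-light intensity and a greenness index, both scaled from 0 to 1) and uses a simple nearest-centroid rule. All numbers and names are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: (night-light intensity, greenness index) per village,
# both scaled to 0..1. Label True means "electrified". All values are invented.
LABELED = [
    ((0.80, 0.70), True),
    ((0.90, 0.60), True),
    ((0.70, 0.80), True),
    ((0.10, 0.30), False),
    ((0.20, 0.20), False),
    ((0.15, 0.40), False),
]


def centroid(points):
    """Average each feature across a list of feature tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def train(labeled):
    """Learn one centroid per class from the labeled examples."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: sq_dist(model[label]))


model = train(LABELED)
print(predict(model, (0.75, 0.65)))  # a bright, green village
print(predict(model, (0.12, 0.25)))  # a dark, sparsely irrigated village
```

A real system would work on the images themselves and on far more examples, but the pattern is the same: labeled data go in, and a rule for unlabeled cases comes out.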

“We would like to develop our final algorithm to essentially go into a developing country and analyze whether or not a community there has access to electricity, and if so what kind,” Chandrasekar said.

Electrification map of Bihar, India

The proportion of households connected to the grid in more than 36,000 villages in Bihar, India

The project is far from finished. During the 2017-2018 school year, a Bass Connections team will continue to build on their work.

The summer team presented their research at the Data+ Final Symposium on July 28 in Gross Hall.

Data+ is sponsored by Bass Connections, the Information Initiative at Duke, the Social Science Research Institute, the departments of mathematics and statistical science and MEDx. This project team was also supported by the Duke University Energy Initiative.

Writing by Robin Smith; video by Lauren Mueller and Summer Dunsmore

Energy Program on Chopping Block, But New Data Suggest It Works

Duke research yields new data about energy efficiency program slated for elimination

Do energy efficiency “audits” really benefit companies over time? An interdisciplinary team of Duke researchers (economist Gale Boyd, statistician Jerome “Jerry” Reiter, and doctoral student Nicole Dalzell) has been tackling this question as it applies to a long-running Department of Energy (DOE) effort that is slated for elimination under President Trump’s proposed budget.

Evaluating a long-running energy efficiency effort

Since 1976, the DOE’s Industrial Assessment Centers (IAC) program has aimed to help small- and medium-sized manufacturers become more energy-efficient by providing free energy “audits” from universities across the country. (Currently, 28 universities take part, including North Carolina State University.)

Gale Boyd

Gale Boyd is a Duke economist.

The Duke researchers’ project, supported by an Energy Research Seed Fund grant, has yielded a statistically sound new technique for matching publicly available IAC data with confidential plant information collected in the U.S. Census of Manufactures (CMF).

The team has created a groundbreaking linked database that will be available in the Federal Statistical Research Data Center network for use by other researchers. Currently the database links IAC data from 2007 and confidential plant data from the 2012 CMF, but it can be expanded to include additional years.

The team’s analysis of these linked data indicates that companies participating in the DOE’s IAC program do become more efficient and improve in efficiency ranking over time when compared to peer companies in the same industry. Additional analysis could reveal the characteristics of companies that benefit most and the interventions that are most effective.
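
As a rough illustration of what "improving in efficiency ranking" among peers can mean, the sketch below computes a simple percentile rank from energy intensity (energy used per unit of output). This is an assumed, simplified measure with invented numbers, not the ranking method used in the Duke analysis.

```python
def efficiency_percentile(plant_intensity, peer_intensities):
    """Return the share of peers (0 to 100) that use MORE energy per unit
    of output than the plant. Higher means the plant ranks as more efficient."""
    if not peer_intensities:
        raise ValueError("need at least one peer plant")
    worse = sum(1 for p in peer_intensities if p > plant_intensity)
    return 100.0 * worse / len(peer_intensities)


# Hypothetical energy intensities for four peer plants in the same industry.
peers = [5.0, 6.0, 7.0, 8.0]

before = efficiency_percentile(7.5, peers)  # plant before an audit
after = efficiency_percentile(5.5, peers)   # same plant some years later

print(before, after)
```

Comparing the two values shows how one plant's standing among its peers shifted, which is the kind of question the linked database is meant to support.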

Applications for government, industry, utilities, researchers

These data could be used to inform the DOE’s IAC program, if the program is not eliminated.

But the data have other potential applications, too, says Boyd.

Individual companies that took part in the DOE program could discover the relative yields of their own energy efficiency measures: savings over time as well as how their efficiency ranking among peers has shifted.

Researchers, states, and utilities could use the data to identify manufacturing sectors and types of businesses that benefit most from information about energy efficiency measures, the specific measures connected with savings, and non-energy benefits of energy efficiency, e.g. on productivity.

Meanwhile, the probabilistic matching techniques developed as part of the project could help researchers in a range of fields—from public health to education—to build a better understanding of populations by linking data sets in statistically sound ways.
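
The article does not describe the team's matching technique in detail. As a general illustration of probabilistic record linkage, here is a sketch of the classical Fellegi-Sunter match weight, which scores a candidate pair of records by how much each field's agreement or disagreement shifts the odds of a true match. The field names and probabilities are hypothetical.

```python
import math

# Hypothetical per-field probabilities (invented for illustration):
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELDS = {
    "zip": (0.95, 0.05),
    "industry": (0.90, 0.20),
    "name": (0.85, 0.01),
}


def match_weight(record_a, record_b):
    """Sum log2 likelihood ratios over fields (Fellegi-Sunter weight).
    Agreement adds log2(m/u); disagreement adds log2((1-m)/(1-u))."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if record_a[field] == record_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


audit = {"zip": "27708", "industry": "3329", "name": "acme castings"}
plant_1 = {"zip": "27708", "industry": "3329", "name": "acme castings"}
plant_2 = {"zip": "27708", "industry": "3329", "name": "apex foundry"}
plant_3 = {"zip": "10001", "industry": "2011", "name": "apex foundry"}

for plant in (plant_1, plant_2, plant_3):
    print(round(match_weight(audit, plant), 2))
```

Pairs with high weights are treated as likely matches and pairs with low weights as likely non-matches. The uncertain middle ground is where careful statistical treatment of linkage error matters most.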

An interdisciplinary team leveraging Duke talent and resources

Boyd—a Duke economist who previously spent two decades doing applied policy evaluation at Argonne National Laboratory—has been using Census data to study energy efficiency and productivity for more than fifteen years. Boyd has co-appointments in Duke’s Social Science Research Institute and Department of Economics. He now directs the Triangle Research Data Center (TRDC), a partnership between the U.S. Census Bureau and Duke University in cooperation with the University of North Carolina and Research Triangle Institute.

The TRDC (located in Gross Hall for Interdisciplinary Innovation) is one of more than 30 locations in the country where researchers can access the confidential micro-data collected by the Federal Statistical System.

Jerry Reiter is a Duke statistician.

Jerry Reiter is a professor in Duke’s Department of Statistical Science, associate director of the Information Initiative at Duke (iiD), and a Duke alumnus (B.S. ’92). Reiter was dissertation supervisor for Nicole Dalzell, who completed her Ph.D. at Duke this spring and will be an assistant teaching professor in the Department of Mathematics and Statistics at Wake Forest University in the fall.

Boyd reports, “The opportunity to work in an interdisciplinary team with Jerry (one of the nation’s leading researchers on imputation and synthetic data) and Nicole (one of Duke’s bright new minds in this field) has opened my eyes a bit about how cavalier some researchers are with respect to uncertainty when we link datasets. Statisticians’ expertise in these areas can help the rest of us do better research, making it as sound and defensible as possible.”

What’s next for the project

The collaboration was made possible by the Duke University Energy Initiative’s Energy Research Seed Fund, which helps new interdisciplinary research teams obtain preliminary results that can help secure external funding. The grant was co-funded by the Pratt School of Engineering and the Information Initiative at Duke (iiD).

Given the potential uses of the team’s results by the private sector (particularly by electric utilities), other funding possibilities are likely to emerge.

Boyd, Reiter, and Dalzell have submitted an article to the journal Energy Policy and are discussing future research applications of these data with colleagues in the field of energy efficiency and policy. Their working paper is available as part of the Environmental and Energy Economics Working Paper Series organized by the Energy Initiative and the Nicholas Institute for Environmental Policy Solutions.

Energy Efficiency Graphic

For more information, contact Gale Boyd: gale.boyd@duke.edu.

Guest Post from Braden Welborn, Duke University Energy Initiative

Students Share Research Journeys at Bass Connections Showcase

From the highlands of north central Peru to high schools in North Carolina, student researchers in Duke’s Bass Connections program are gathering data in all sorts of unique places.

As the school year wound down, they packed into Duke’s Scharf Hall last week to hear one another’s stories.

Students and faculty gathered in Scharf Hall to learn about each other’s research at this year’s Bass Connections showcase. Photo by Jared Lazarus/Duke Photography.

The Bass Connections program brings together interdisciplinary teams of undergraduates, graduate students and professors to tackle big questions in research. This year’s showcase, which featured poster presentations and five “lightning talks,” was the first to include teams spanning all five of the program’s diverse themes: Brain and Society; Information, Society and Culture; Global Health; Education and Human Development; and Energy.

“The students wanted an opportunity to learn from one another about what they had been working on across all the different themes over the course of the year,” said Lori Bennear, associate professor of environmental economics and policy at the Nicholas School, during the opening remarks.

Students seized the chance, eagerly perusing peers’ posters and gathering for standing-room-only viewings of other teams’ talks.

The different investigations took students from rural areas of Peru, where teams interviewed local residents to better understand the transmission of deadly diseases like malaria and leishmaniasis, to the North Carolina Museum of Art, where mathematicians and engineers worked side-by-side with artists to restore paintings.

Machine learning algorithms created by the Energy Data Analytics Lab can pick out buildings from a satellite image and estimate their energy consumption. Image courtesy Hoël Wiesner.

Students in the Energy Data Analytics Lab didn’t have to look much farther than their smartphones for the data they needed to better understand energy use.

“Here you can see a satellite image, very similar to one you can find on Google Maps,” said Eric Peshkin, a junior mathematics major, as he showed an aerial photo of an urban area featuring buildings and a highway. “The question is how can this be useful to us as researchers?”

With the help of new machine-learning algorithms, images like these could soon give researchers oodles of valuable information about energy consumption, Peshkin said.

“For example, what if we could pick out buildings and estimate their energy usage on a per-building level?” said Hoël Wiesner, a second year master’s student at the Nicholas School. “There is not really a good data set for this out there because utilities that do have this information tend to keep it private for commercial reasons.”

The lab has had success developing algorithms that can estimate the size and location of solar panels from aerial photos. Peshkin and Wiesner described how they are now creating new algorithms that can first identify the size and locations of buildings in satellite imagery, and then estimate their energy usage. These tools could provide a quick and easy way to evaluate the total energy needs in any neighborhood, town or city in the U.S. or around the world.

“It’s not just that we can take one city, say Norfolk, Virginia, and estimate the buildings there. If you give us Reno, Tuscaloosa, Las Vegas, Phoenix (my hometown), you can absolutely get the per-building energy estimations,” Peshkin said. “And what that means is that policy makers will be more informed, NGOs will have the ability to best service their community, and more efficient, more accurate energy policy can be implemented.”

Some students’ research took them to the sidelines of local sports fields. Joost Op’t Eynde, a master’s student in biomedical engineering, described how he and his colleagues on a Brain and Society team are working with high school and youth football leagues to sort out what exactly happens to the brain during a high-impact sports game.

While a particularly nasty hit to the head might cause clear symptoms that can be diagnosed as a concussion, the accumulation of lesser impacts over the course of a game or season may also affect the brain. Op’t Eynde and his team are developing a set of tools to monitor both these impacts and their effects.

A standing-room-only crowd listened to a team present on their work “Tackling Concussions.” Photo by Jared Lazarus/Duke Photography.

“We talk about inputs and outputs: what happens, and what are the results,” Op’t Eynde said. “For the inputs, we want to actually see when somebody gets hit, how they get hit, what kinds of things they experience, and what is going on in the head. And the output is we want to look at a way to assess objectively.”

The tools include surveys to estimate how often a player is impacted, an in-ear accelerometer called the DASHR that measures the intensity of jostles to the head, and tests of players’ performance on eye-tracking tasks.

“Right now we are looking on the scale of a season, maybe two seasons,” Op’t Eynde said. “What we would like to do in the future is actually follow some of these students throughout their career and get the full data for four years or however long they are involved in the program, and find out more of the long-term effects of what they experience.”

Kara J. Manke, PhD

Post by Kara Manke
