
A New Algorithm for “In-Betweening” images applied to Covid, Aging and Continental Drift

On April 14, 2021

In Computers/Technology, Data, Faculty, Guest Post, Mathematics, Physics, Visualization

Collaborating with a colleague in Shanghai, we recently published an article that explains the mathematical concept of ‘in-betweening’ in images – calculating intermediate stages of changes in appearance from one image to the next.

Our equilibrium-driven deformation algorithm (EDDA) was used to demonstrate three difficult tasks of ‘in-betweening’ images: facial aging, coronavirus spread in the lungs, and continental drift.

Part I. Understanding Pneumonia Invasion and Retreat in COVID-19

The pandemic has affected the entire world and taken nearly 3 million lives to date. If a person is unlucky enough to contract the virus and develop COVID-19, one way to diagnose them is to carry out CT scans of their lungs to visualize the damage caused by pneumonia.

However, it is impossible to monitor the patient all the time using CT scans. Thus, the invading process is usually invisible for doctors and researchers.

To solve this difficulty, we developed a mathematical algorithm which relies on only two CT scans to simulate the pneumonia invasion process caused by COVID-19.
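The idea of deforming one image toward another can be illustrated with a toy example. The following is only a minimal sketch, not the published scheme: it assumes a one-dimensional row of strictly positive pixel intensities and a simple explicit discretization of a Fokker-Planck-type flow whose equilibrium is the second image. The function name `inbetween_frames`, its parameters, and the intensity values are all hypothetical.

```python
def inbetween_frames(start, target, steps=500, dt=0.1, save_every=100):
    """Return intermediate frames between two 1D intensity profiles.

    Illustrative only: a discrete Fokker-Planck-type flow
        d(rho_i)/dt = sum_j w_ij * (rho_j / pi_j - rho_i / pi_i)
    over neighboring cells, whose equilibrium is the rescaled target pi.
    Assumes every target intensity is strictly positive.
    """
    n = len(start)
    # Rescale the target so both profiles carry the same total intensity.
    scale = sum(start) / sum(target)
    pi = [t * scale for t in target]
    rho = list(start)
    frames = [list(rho)]
    for step in range(1, steps + 1):
        ratio = [rho[i] / pi[i] for i in range(n)]
        new = list(rho)
        for i in range(n - 1):
            # A symmetric weight between neighbors conserves total intensity.
            weight = 0.5 * (pi[i] + pi[i + 1])
            flux = dt * weight * (ratio[i + 1] - ratio[i])
            new[i] += flux
            new[i + 1] -= flux
        rho = new
        if step % save_every == 0:
            frames.append(list(rho))
    return frames


first_scan = [4.0, 2.0, 1.0, 1.0, 2.0]   # made-up intensities
second_scan = [1.0, 2.0, 4.0, 2.0, 1.0]  # made-up intensities
frames = inbetween_frames(first_scan, second_scan)
```

Each saved frame is an intermediate profile; the first equals the starting scan and the last has relaxed to the second scan. The small time step is chosen so this explicit update stays stable for the example values.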

We compared a series of CT scans of a Chinese patient taken at different times. This patient had severe pneumonia caused by COVID-19 but recovered after a successful treatment. Our simulation clearly revealed the pneumonia invasion process in the patient’s lungs and the fading away process after the treatment.

Our simulation results also identify several significant areas in which the patient’s lungs are more vulnerable to the virus and other areas in which the lungs have better response to the treatment. Those areas were perfectly consistent with the medical analysis based on this patient’s actual, real-time CT scan images. The consistency of our results indicates the value of the method.

The COVID-19 pneumonia invading (upper panel) and fading away (lower panel) process from the data-driven simulations. Red circles indicate four significant areas in which the patient’s lungs were more vulnerable to the pneumonia and blue circles indicate two significant areas in which the patient’s lungs had better response to the treatment. (Image credit: Gao et al., 2021)

We also applied this algorithm to simulate human facial changes over time, in which the aging processes for different parts of a woman’s face were automatically created by the algorithm with high resolution. (Image credit: Gao et al., 2021. Video)

Part II. Solving the Puzzle of Continental Drift

It has always been mysterious how the continents we know evolved and formed from the ancient single supercontinent, Pangaea. But then German polar researcher Alfred Wegener proposed the continental drift hypothesis in the early 20th century. Although many geologists argued about his hypothesis initially, more sound evidence such as continental structures, fossils and the magnetic polarity of rocks has supported Wegener’s proposition.

Our data-driven algorithm has been applied to simulate the possible evolution process of continents from the Pangaea period.

The underlying forces driving continental drift were determined by the equilibrium status of the continents on the current planet. In order to describe the edges that divide the land to create oceans, we proposed a delicate thresholding scheme.
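The thresholding scheme itself is not spelled out here. As a rough illustration of the general idea, the sketch below assumes a fixed cutoff that turns a simulated density into a land/ocean mask and then lists the land cells bordering ocean. The helper names `threshold_frame` and `edge_cells`, the cutoff, and the grid values are hypothetical.

```python
def threshold_frame(frame, cutoff):
    """Mark each cell as land (1) when its value reaches the cutoff, else ocean (0)."""
    return [[1 if value >= cutoff else 0 for value in row] for row in frame]


def edge_cells(mask):
    """Return (row, col) positions of land cells that touch at least one ocean cell."""
    rows, cols = len(mask), len(mask[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 0
                   for nr, nc in neighbors):
                edges.append((r, c))
    return edges


# Made-up density values standing in for one simulated frame.
density = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.8, 0.9, 0.1],
    [0.1, 0.9, 0.7, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]
mask = threshold_frame(density, cutoff=0.5)
coast = edge_cells(mask)
```

Applying the same cutoff to every intermediate frame would give a sequence of sharp outlines rather than blurred densities.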

The formation and deformation for different continents is clearly revealed in our simulation. For example, the ‘drift’ of the Antarctic continent from Africa can be seen happening. This exciting simulation presents a quick and obvious way for geologists to establish more possible lines of inquiry about how continents can drift from one status to another, just based on the initial and equilibrium continental status. Combined with other technological advances, this data-driven method may provide a path to solve Wegener’s puzzle of continental drift.

The theory of continental drift reconciled similar fossil plants and animals now found on widely separated continents. The southern part after Pangaea breaks (Gondwana) is shown here as evidence of Wegener’s theory. (Image credit: United States Geological Survey)

The continental drift process of the data-driven simulations. Black arrow indicates the formation of the Antarctic. (Image credit: Gao et al., 2021)

The study was supported by the Departments of Mathematics and Physics, Duke University.

CITATION: “Inbetweening auto-animation via Fokker-Planck dynamics and thresholding,” Yuan Gao, Guangzhen Jin & Jian-Guo Liu. Inverse Problems and Imaging, February, 2021, DOI: 10.3934/ipi.2021016. Online: http://www.aimsciences.org/article/doi/10.3934/ipi.2021016

Yuan Gao is the William W. Elliot Assistant Research Professor in the department of mathematics, Trinity College of Arts & Sciences.

Jian-Guo Liu is a Professor in the departments of mathematics and physics, Trinity College of Arts & Sciences.

Using Data Science for Early Detection of Autism

By Anna Gotskind

On March 31, 2021

In Behavior/Psychology, Computers/Technology, Data, Faculty, Medicine, Neuroscience

Autism Spectrum Disorder can be detected as early as six to twelve months old and the American Academy of Pediatrics recommends all children be screened between twelve and eighteen months of age.

But most diagnoses happen after the age of 4, and later detection makes it more difficult and expensive to treat.

One in 40 children is diagnosed with Autism Spectrum Disorder and Duke currently serves about 3,000 ASD patients per year. To improve care for patients with ASD, Duke researchers have been working to develop a data science approach to early detection.

Geraldine Dawson, the William Cleland Distinguished Professor in the Department of Psychiatry & Behavioral Sciences and Director of the Duke Center for Autism and Brain Development, and Dr. Matthew Engelhard, a Conners Fellow in Digital Health in Psychiatry & Behavioral Sciences, recently presented on the advances being made to improve ASD detection and better understand symptoms.

The earlier ASD is detected, the easier and less expensive it is to treat. Children with ASD face challenges in learning and social environments.

ASD differs widely from case to case, however. For most people, ASD makes it difficult to navigate the social world, and those with the diagnosis often struggle to understand facial expressions, maintain eye contact, and develop strong peer relations.

However, ASD also has many positive traits associated with it and autistic children often show unique skills and talents. Receiving a diagnosis is important for those with ASD so that they can receive learning accommodations and ensure that their environment helps promote growth.

Because early detection is so helpful, researchers began to ask:

“Can digital behavioral assessments improve our ability to screen for neurodevelopmental disorders and monitor treatment outcomes?”
Dr. Geraldine Dawson

The current approach for ASD detection is questionnaires given to parents. However, there are many issues in this method of detection such as literacy and language barriers as well as requiring caregivers to have some knowledge of child development. Recent studies have demonstrated that digital assessments could potentially address these challenges by allowing for direct observation of the child’s behavior as well as the ability to capture the dynamic nature of behavior, and collect more data surrounding autism.

“Our goal is to reduce disparities in access to screening and enable earlier detection of ASD by developing digital behavioral screening tools that are scalable, feasible, and more accurate than current paper-and-pencil questionnaires that are standard of care.”
Dr. Geraldine Dawson

Guillermo Sapiro, a James B. Duke Distinguished Professor of Electrical and Computer Engineering, and his team have developed an app to do just this.

On the app, videos are shown to the child on an iPad or iPhone that prompt the child’s reaction through various stimuli. These are the same games and stimuli typically used in ASD diagnostic evaluations in the clinic. As they watch and interact, the child’s behavior is measured with the iPhone/iPad’s selfie camera. Some behavioral symptoms can be detected as early as six months of age, such as not paying as much attention to people, reduced affective expression, early motor differences, and failure to orient to name.

In the proof-of-concept study, computers were programmed to detect a child’s response to hearing their name called. The child’s name was called out by the examiner three times while movies were shown. Toddlers with ASD demonstrated about a second of latency in their responses.

Another study used gaze monitoring on an iPhone. Nearly a thousand toddlers were presented with a split screen where a person was on one side of the screen and toys were on the other. Typical toddlers shifted their gaze between the person and toy, whereas the autistic toddlers focused more on the toys. Forty of the toddlers involved in the study received an ASD diagnosis. Using eye gaze, researchers were also able to look at how toddlers responded to speech sounds as well as to observe early motor differences because toddlers with ASD frequently show postural sway (a type of head movement).
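To make the split-screen measure concrete, here is a minimal sketch of how a gaze share could be computed. It is not the app's actual method: it assumes the person appears on the left half of the screen, that gaze is recorded as horizontal pixel positions, and that lost tracking is recorded as `None`. The function name `gaze_share` and the sample values are hypothetical.

```python
def gaze_share(gaze_x, screen_width):
    """Return the fraction of valid gaze samples on the left (person) half.

    Samples equal to None are treated as lost tracking and skipped.
    Returns None when no valid samples remain.
    """
    valid = [x for x in gaze_x if x is not None]
    if not valid:
        return None
    on_person = sum(1 for x in valid if x < screen_width / 2)
    return on_person / len(valid)


# Made-up horizontal gaze positions (pixels) on an 800-pixel-wide screen.
samples = [100, 120, None, 700, 650, 90, 720, 710]
share = gaze_share(samples, screen_width=800)
```

A lower share would mean more time spent looking at the toys than at the person; any cutoff for interpreting it would have to come from the study data.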

“The idea behind the app is to begin to combine all of these behaviors to develop a much more robust ASD algorithm. We do believe no one feature will allow us to detect ASD in developing children because there is so much variation.”
Dr. Geraldine Dawson

The app has multiple features and will allow ASD detection to be done in the home. Duke researchers are now one step away from launching an at-home study. Other benefits of this method include the ability to observe over time with parents collecting data once a month. In the future, this could be used in a treatment study to see if symptoms are improving.

Duke’s ASD researchers are also working to integrate information from the app with electronic health records (EHR) to see if information collected from routine medical care before age 1 can help with detection.

The SolarWinds Attack and the Future of Cybersecurity

By Anna Gotskind

On March 2, 2021

In Business/Economics, Computers/Technology, Data, Faculty, News Release, Policy, Science Communication & Education

Cybersecurity is the protection of computer systems and networks in order to prevent theft of or damage to their hardware, software, or electronic data. While cybersecurity has been around since the 1970s, its importance and relevance in mainstream media as well as politics is growing as an increased amount of information is stored electronically. In 1986, approximately 1% of the world’s information was stored in a digital format; by 2006, just twenty years later, this had increased to 94%.

Cyber hacking has also become more prominent with the advent of the Digital Revolution and the start of the Information Era, which began in the 1980s and grew rapidly in the early 2000s. It became an effective political form of attack to acquire confidential information from foreign countries.

In mid-December of 2020, it was revealed that several U.S. companies and even government agencies were victims of a cyberattack that began in September of 2019.

The Sanford School of Public Policy hosted leading cybersecurity reporter Sean Lyngaas to lead a discussion on the national security implications of the SolarWinds hack with Sanford Professor David Hoffman as well as Visiting Scholar and Journalist Bob Sullivan. Lyngaas graduated from Duke in 2007 and majored in Public Policy at the Sanford School.

Lyngaas did not have a direct route into cybersecurity journalism. After completing his master’s in International Relations at The Fletcher School of Law and Diplomacy at Tufts University, he moved to Washington, D.C., to pursue a career as a policy analyst. However, at night when he was not applying for jobs, he began pitching stories to trade journals. Despite not being a “super technical guy,” Lyngaas ended up becoming passionate about cybersecurity and reporting on the increasing amount of news surrounding the growing topic. Since 2012, Lyngaas has done extensive reporting on cybersecurity breaches and recently has published several detailed reports on the SolarWinds incident.

The SolarWinds attack is considered one of the most impactful cybersecurity events in history as a result of its intricacy and the number of government and private sector victims. Lyngaas explained that most people had not heard of SolarWinds until recently, but the company nevertheless provides software to a multitude of Fortune 500 companies and government agencies. One of the software products it sells is Orion, an IT performance monitoring platform that helps businesses manage and optimize their IT infrastructure. The hackers infiltrated Orion’s update software and over several months sent out malicious updates to 18,000 companies and government agencies. Among the victims of this espionage campaign were the U.S. Justice Department and Microsoft. As a result of the campaign, countless email accounts were infiltrated and hacked.

“A perfect example of someone robbing a bank by knocking out the security guard and putting on his outfit to have access.”
Bob Sullivan

Sullivan added that this hack is particularly concerning because the target was personal information, whereas previous large-scale hacks have been centered around breaching data. Additionally, SolarWinds’ core business is not cybersecurity; however, the company works with and provides software to many cybersecurity companies. The attack was revealed by FireEye, a cybersecurity company that announced it had been breached.

“FireEye got breached and they are the ones usually investigating the breaches”
Sean Lyngaas

This situation has prompted both those involved in the cybersecurity industry as well as the public to reconsider the scope of cyberhacking and what can be done to prevent it.

“Computer spying by nation states has been going on for decades but we talk about it more openly now,” Lyngaas stated.

Lyngaas added that the public is now expecting more transparency, especially if there are threats to their information. He feels we need to have better standards for companies involved in cybersecurity. SolarWinds arguably was not using cybersecurity best practices and had recently made price cuts, which may have contributed to its vulnerability. Hoffman explained that SolarWinds had been using an easy-to-guess password for its internal systems, which allowed hackers access to the software update as well as the ability to sign a digital signature.

“We are not going to prevent these breaches; we are not going to prevent the Russians from cyber espionage,” Lyngaas stated.

However, he believes that by using best practices we can uncover these breaches earlier and react in a timely manner to reduce damage. Additionally, he thinks there needs to be a shift in government spending in terms of the balance between cyber defense and offense. Historically, there has been a lack of transparency in government cyber spending; however, it is known that more has been spent on offense in the last several years.

Changes are starting to be made in the cybersecurity landscape that hopefully should aid in reducing attacks or at least the severity of their impacts. California recently created a law centered around publicizing breaches which will increase transparency. The panelists added that the increasing amount of news and information available to the public about cybersecurity is aiding efforts to understand and prevent it. President Biden was openly speaking about cybersecurity in relation to protecting the election from hackers and continues to consider it an urgent issue as it is crucial in order to protect confidential U.S. information.

As Lyngaas explained, it is practically impossible to completely prevent cyber attacks; however, through increasing transparency and using best practices, incidents like the SolarWinds hack will hopefully not have effects of the same scale again.

Post by Anna Gotskind

Increasing Access to Care with the Help of Big Data

By Cydney Livingston

On February 22, 2021

In Computers/Technology, Data, Global Health, Statistics

Artificial intelligence (AI) and data science have the potential to revolutionize global health. But what exactly is AI and what hurdles stand in the way of more widespread integration of big data in global health? Duke’s Global Health Institute (DGHI) hosted a Think Global webinar Wednesday, February 17th to dive into these questions and more.

The webinar’s panelists were Andy Tatem (Ph.D.), Joao Vissoci (Ph.D.), and Eric Laber (Ph.D.), moderated by DGHI’s Director of Research Design and Analysis Core, Liz Turner (Ph.D.). Tatem is a professor of spatial demography and epidemiology at the University of Southampton and director of WorldPop. Vissoci is an assistant professor of surgery and global health at Duke University. Laber is a professor of statistical science and bioinformatics at Duke.

Left to right: Andy Tatem, Joao Vissoci, Eric Laber

Tatem, Vissoci, and Laber all use data science to address issues in the global health realm. Tatem’s work largely utilizes geospatial data sets to help inform global health decisions like vaccine distribution within a certain geographic area. Vissoci, who works with the GEMINI Lab at Duke (Global Emergency Medicine Innovation and Implementation Research), tries to leverage secondary data from health systems in order to understand issues of access to and distribution of care, as well as care delivery. Laber is interested in improving decision-making processes in healthcare spaces, attempting to help health professionals synthesize very complex data via AI.

All of their work is vital to modern biomedicine and healthcare, but, Turner said, “AI means a lot of different things to a lot of different people.” Laber defined AI in healthcare simply as using data to make healthcare better. “From a data science perspective,” Vissoci said, “[it is] synthesizing data … an automated way to give us back information.” This returned info is digestible trends and understandings derived from very big, very complex data sets. Tatem stated that AI has already “revolutionized what we can do” and said it is “powerful if it is directed in the right way.”

We often get sucked into a science-fiction version of AI, Laber said, but in actuality it is not some dystopian future but a set of tools that maximizes what can be derived from data.

However, as Tatem stated, “[AI] is not a magic, press a button” scenario where you get automatic results. A huge part of work for researchers like Tatem, Vissoci, and Laber is the “harmonization” of working with data producers, understanding data quality, integrating data sets, cleaning data, and other “back-end” processes.

This comes with many caveats.

“Bias is a huge problem,” said Laber. Vissoci reinforced this, stating that the models built from AI and data science are going to represent what data sources they are able to access – bias included. “We need better work in getting better data,” Vissoci said.

Further, there must be more up-front listening to and communication with “end-users from the very start” of projects, Tatem outlined. By taking a step back and listening, tools created through AI and data science may be better met with actual uptake and less skepticism or distrust. Vissoci said that “direct engagement with the people on the ground” transforms data into meaningful information.

Better structures for navigating privacy issues must also be developed. “A major overhaul is still needed,” said Laber. This includes things like better consent processes for patients to understand how their data is being used, although Tatem said this becomes “very complex” when integrating data.

Nonetheless, the future looks promising and each panelist feels confident that the benefits will outweigh the difficulties that are yet to come in introducing big data to global health. One cool example Vissoci gave of an ongoing project deals with the impacts of environmental change, through deforestation in the Brazilian Amazon, on Indigenous populations. Through work with “heavy multidimensional data,” Vissoci and his team also have been able to optimize scarcely distributed Covid vaccine resources “to use in areas where they can have the most impact.”

Laber envisions a world with reduced or even no clinical trials if “randomization and experimentation” are integrated directly into healthcare systems. Tatem noted how he has seen extreme growth in the field in just the last 10 to 15 years, which seems only to be accelerating.

A lot of this work has to do with making better decisions about allocating resources, as Turner stated in the beginning of the panel. In an age of reassessment about equity and access, AI and data science could serve to bring both to the field of global health.

Post by Cydney Livingston

Student Team Quantifies Housing Discrimination in Durham

By Cydney Livingston

On February 9, 2021

In Business/Economics, Data, Field Research, Statistics, Students

Home values and race have an intimate connection in Durham, NC. From 1940 to 2020, if mean home values in Black-majority Census tracts had appreciated at rates equal to those in white Census tracts, the mean home value for homes in Black tracts would be $94,642 higher than it is.

That’s the disappointing, but perhaps not shocking, finding of a Duke Data+ team.

Because housing accounts for the biggest portion of wealth for families that fall outside of the top 10% of wealth in the U.S., this figure on home values represents a pervasive racial divide in wealth.
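The arithmetic behind a counterfactual figure of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the team's actual calculation: it applies the white-tract growth factor to the starting mean value in Black tracts and subtracts the actual ending value. The function name `counterfactual_gap` is hypothetical, and the dollar inputs are placeholders, not numbers from the study.

```python
def counterfactual_gap(black_start, black_end, white_start, white_end):
    """Return how much higher the Black-tract mean value would be
    had it grown by the same factor as the white-tract mean value."""
    white_growth = white_end / white_start          # growth factor over the period
    counterfactual_end = black_start * white_growth  # Black-tract value at that growth
    return counterfactual_end - black_end


# Placeholder values only; these are NOT figures from the Data+ analysis.
gap = counterfactual_gap(
    black_start=2_000, black_end=150_000,
    white_start=4_000, white_end=400_000,
)
```

With these placeholder inputs, the white-tract mean grows a hundredfold, so the counterfactual Black-tract value is 200,000 and the gap is 50,000.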

What started as a Data+ project in the summer of 2020 has expanded into an ongoing exploration of the connection between persistent wealth disparities across racial lines through housing. Omer Ali (Ph.D.), a postdoctoral associate with The Samuel DuBois Cook Center on Social Equity, is leading undergraduates Nicholas Datto and Pei Yi Zhuo in the continuation of their initial work. The trio presented an in-depth analysis of their work and methods Friday, February 5th during a Data Dialogue.

Left to right: Omer Ali, Nicholas Datto, Pei Yi Zhuo

The team used a multitude of data to conduct their analyses, including the 1940 Census, Durham County records, CoreLogic data for home sales and NC voter registrations. Aside from the nearly $100,000 difference in mean home values between Black census tracts (defined as >50% Black homeowners from 1940-2020) and white census tracts (defined as >50% white homeowners from 1940-2020), Ali, Datto, and Zhuo also found that over the last 10 years, home values have risen in Black neighborhoods as they have been losing Black residents. Within Census tracts, the team said that Black home-buyers in Durham occupy the least valuable homes.

Datto introduced the concept of redlining — systemic housing discrimination — and explained how this historic issue persists. From 1930-1940, the Home Owners’ Loan Corporation (HOLC) and Federal Housing Administration (FHA) designated certain neighborhoods unsuitable for mortgage lending. Neighborhoods were given a desirability grade from A to D, with D being the lowest.

In 1940, no neighborhoods with Black residents were designated as either A or B districts. That meant areas with non-white residents were considered more risky and thus less likely to receive FHA-guaranteed mortgages.

Datto explained that these historic classifications persist because the team found significant differences in the amount of accumulated home value over time by neighborhood rating. We are “seeing long-lasting effects of these redlined maps on homeowners in Durham,” said Datto, with even “significant differences between white [and non-white] homeowners, even in C and D neighborhoods.”

Zhuo explained the significance of tracking the changes of each Census tract – Black, white, or integrated – over the last 50 years. The “white-black disparity [in home value] has grown by 287%” in this time period, he said. Homes of comparable structural design and apparent worth are much less valuable for simply existing in Black neighborhoods and being owned by Black people. And the problem has only expanded.

Along with differences in home value, both Black and white neighborhoods have seen a decline in Black homeowners in the 21st century, pointing to a larger issue at hand. Though the work done so far merely documents these trends, rather than looking for correlation that may get at the underlying causes of the home-value disparity, the trends pair closely with other regions across the country being impacted by gentrification.

“Home values are going up in Black neighborhoods, but the number of Black people in those neighborhoods is going down,” said Datto.

Ali pointed out that there are evaluation practices that include evaluation of the neighborhood “as opposed to the structural properties of the home.” When a house is being evaluated, he said, a home of similar structure owned by white homeowners would never be chosen as a comparator for a Latinx- or Black-owned home. This perpetuates historical disparities, as “minority neighborhoods have been historically undervalued.” It is a compounding, systemic cycle.

The team hopes to export their methodology to a much larger scale. Thus far, this has presented some back-end issues with data and computer science; however, “there is nothing in the analysis itself that couldn’t be [applied to other geographical locations],” they said.

Large socioeconomic racial disparities prevail in the U.S., from gaps in unemployment to infant mortality to incarceration rates to life expectancy itself. Though it should come as no surprise that home values represent another area of inequity, work like that of Ali, Datto, and Zhuo needs more traction, support, and expansion.

Post by Cydney Livingston

Cybersecurity for Autonomous Systems

By Guest Post

On January 6, 2021

In Computers/Technology, Data, Engineering, Guest Post

Over the past decades, we have adopted computers into virtually every aspect of our lives, but in doing so, we’ve made ourselves vulnerable to malicious interference or hacking. I had the opportunity to talk about this with Miroslav Pajic, the Dickinson Family associate professor in Duke’s electrical and computer engineering department. He has worked on cybersecurity in self-driving cars, medical devices, and even US Air Force hardware.

Miroslav Pajic is an electrical engineer

Pajic primarily works in “assured autonomy,” computers that do most things by themselves with “high-level autonomy and low human control and oversight.” “You want to build systems with strong performance and safety guarantees every time, in all conditions,” Pajic said. Assured Autonomy ensures security in “contested environments” where malicious interference can be expected. The stakes of this work are incredibly high. The danger of attacks on military equipment goes without saying, but cybersecurity on a civilian level can be just as dangerous. “Imagine,” he told me, “that you have a smart city coordinating traffic and that… all of (the traffic controls), at the same time, start doing weird things. There can be a significant impact if all cars stop, but imagine if all of them start speeding up.”

Pajic and some of his students with an autonomous car.

Since Pajic works with Ph.D. students and postdocs, I wanted to ask him how COVID-19 has affected his work. As if on cue, his Wi-Fi cut out, and he dropped from our Zoom call. “This is a perfect example of how fun it is to work remotely,” he said when he returned. “Imagine that you’re debugging a fleet of drones… and that happens.”

In all seriousness, though, there are simulators created for working on cybersecurity and assured autonomy. CARLA, for one, is an open-source simulator of self-driving vehicles made by Intel. Even outside of a pandemic, these simulators are used extensively in the field. They’ve become very useful in returning accurate and cheap results without any actual risk, before graduating to real tests.

“If you’re going to fail,” Pajic says, “you want to fail quickly.”

Guest Post by Riley Richardson, Class of 2021, NC School of Science and Math

Quantifying the effects of structural racism on health

By Victoria Priester

On December 22, 2020

In Data, Faculty, Lecture, Statistics

America is getting both older and Blacker. The proportion of non-white older adults is increasing, and by 2050 the majority of elderly people will be racial minorities. In his Langford Lecture “Who gets sick and why? How racial inequality gets under the skin” on November 10, Professor Tyson H. Brown discussed the importance of studying older minorities when learning about human health. His current project aims to address gaps in research by quantifying effects of structural racism on health.

Health disparities result in unnecessary fatalities. Dr. Brown estimates that if we took away racial disparities in health, we could avoid 229 premature deaths per day. Health disparities also have substantial economic costs that add up to about 200 billion dollars annually. Dr. Brown explained that the effects of structural racism are so deadly because it is complex and not the same as the overt, intentional, interpersonal racism that most people think of. Thus, it is easier to ignore or to halt attempts to fix structural racism. Dr. Brown’s study posits that structural racism has five key tenets: it is multifaceted, interconnected, an institutionalized system, involves relational subordination and manifests in racial inequalities in life chances.

A motivator for Brown’s research was that less than 1% of studies of the effects of race on health have focused on structural racism, even though macro-level structural racism has deleterious effects on the health of Black people. When thinking about inequalities, the traditional mode of thinking is that the group that dominates (in this case, white people) receives all benefits and the subordinates (in Dr. Brown’s study, Black people) receive all of the negative effects of racism. In this mode of thinking, whites actively benefit from social inequality. However, Dr. Brown discussed another theory: that structural racism and its effects on health undermine the fabric of our entire society and have negative impacts on both whites and Blacks. It is possible for whites to be harmed by structural racism, but not to the same extent as Black people.

Dr. Brown identified states as “important institutional actors that affect population health.” As a part of his research, he made a state-level index of structural racism based on data from 2010. The index was composed of nine indicators of structural racism, which combine to make an overall index of structural racism in states. In mapping out structural racism across the domains, the results were not what most people might expect. According to Dr. Brown’s study, structural racism tends to be highest in the Midwest of the United States, rather than the South. These higher levels of structural racism were associated with worse self-rated health: one standard deviation increase in level of structural racism correlated with the equivalent of two standard deviation increases in age. In other words, a person who is affected by structural racism has similar self-rated health to people two age categories above them who do not experience negative effects of structural racism.
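How the nine indicators are combined is not described here. One common way to build a composite index, shown purely as an assumption, is to standardize each indicator across states and average the resulting z-scores. The function name `composite_index`, the indicator names, and all values below are hypothetical.

```python
from statistics import mean, pstdev


def composite_index(indicators):
    """Average per-indicator z-scores into one score per state.

    indicators: dict mapping indicator name -> dict of state -> value.
    Assumes every indicator covers the same states and is not constant
    (a constant indicator would have zero spread and cannot be standardized).
    """
    states = sorted(next(iter(indicators.values())))
    z_by_state = {state: [] for state in states}
    for values in indicators.values():
        column = [values[state] for state in states]
        mu = mean(column)
        sigma = pstdev(column)
        for state in states:
            z_by_state[state].append((values[state] - mu) / sigma)
    return {state: mean(z) for state, z in z_by_state.items()}


# Made-up indicators for three fictional states; not data from the study.
indicators = {
    "gap_a": {"A": 1.0, "B": 2.0, "C": 3.0},
    "gap_b": {"A": 10.0, "B": 20.0, "C": 60.0},
}
index = composite_index(indicators)
```

Because each indicator is standardized first, a one-unit change in the index corresponds to one standard deviation on the averaged scale, which is the kind of unit the comparison with age relies on.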

As the structural racism index increases, the Black-white difference in COVID-19 related deaths also increases. Overall, Dr. Brown found that structural racism is a key driver of inequalities in COVID-19 deaths between whites and Blacks. Looking forward, Dr. Brown is interested in learning more about how contemporary forms of racism contribute to inequality—such as searching racial slurs on Google and implicit bias, both of which are high in the southern United States.

After his discussion, colleagues raised questions about what can be done to eliminate negative effects of structural racism. Dr. Brown listed options such as rent protection, COVID-19 test sites in lower income communities and another stimulus bill. He also explained that the distribution of a COVID-19 vaccine needs to be done in an ethical manner and must not exclude less fortunate people who really need the vaccine. We also need better data collection in general: the more we know about the effects of structural racism, the better we will be able to adapt equity practices to mitigate harm to Black communities.

By Victoria Priester

Contact Tracing Is a Call for Ingenuity and Innovation

By Cydney Livingston

On December 17, 2020

In Computers/Technology, Data, Medicine, Statistics, Visualization

The sudden need for contact-tracing technologies to address the Covid-19 pandemic is inspiring some miraculous human ingenuity.

On Wednesday, December 16, Rodney Jenkins, Praudman Jain, and Kartik Nayak discussed Covid-19 contact tracing and the role of new technologies in a forum organized by the Duke Mobile App Gateway team.

Jenkins is the Health Director of Durham County’s Department of Public Health, Jain is CEO and founder of Vibrent Health, and Nayak is an Assistant Professor in Duke’s Computer Science department. The panel was hosted by Leatrice Martin (M.B.A.), Senior Program Coordinator for Duke’s Mobile App Gateway with Duke’s Clinical and Translational Science Institute.

Panelists, left to right: Rodney Jenkins (M.P.H.), Praudman Jain (M.S.), and Kartik Nayak (Ph.D.)

Contact tracing is critical to slowing the spread of Covid, and Jenkins says it’s not going away anytime soon. Jenkins, who only began his position with Durham County Public Health in January 2020, said Durham County’s contact tracing has been… interesting. As the virus approached Durham, “Durham County suffered a severe malware attack that really rendered platforms…useless.”

Eventually, though, the department developed its own method of tracing through trial and error. North Carolina’s Department of Health and Human Services (NC HHS), like many other health departments across the nation in March, was scrambling to adjust. NC HHS was not able to provide support for Durham’s contact tracing until July, when Jenkins identified a serious need for reinforcement due to disproportionate Covid cases amongst Latinx community members. In the meantime, Durham County received help from Duke’s Physician Assistant students and the Blue Cross Blue Shield Foundation. The department expanded its team from five to 95 individuals investigating and tracing Durham County’s positive cases.

Rodney Jenkins MPH is the health director of the Durham County Public Health Department.

Jenkins proclaimed contact tracing as “sacred to public health” and a necessary element to “boxing in” Covid-19 – along with widespread testing.

Durham’s tracing is conducted through a HIPAA-compliant, secure online portal. Data about individuals is loaded into the system and transmitted to the contact tracing team, which then calls close contacts to enable a quick quarantine response. The department had to “make a huge jump very quickly,” said Jenkins. It was this speedy development and integration of new technology that has helped Durham County Public Health better manage the pandemic.

Jain, along with colleague Rachele Peterson, spoke about his company, Vibrent Health. Vibrent, which was recently awarded a five-year grant from the National Institutes of Health’s All of Us Research Program, is focused on creating and dispersing digital and mobile platforms for public health.

Naturally, this includes a new focus on Covid. With renewed interest in and dependency on contact tracing, Jain says there is a need for different tools to help various stakeholders – from researchers to citizens to government. He believes technology can “become the underlying infrastructure for accelerating science.”

Vibrent identified problems that a national tracing model would need to address, including the labor intensity of manual processes, disparate tools, and lack of automation.

Peterson said that as we “are all painfully aware,” the U.S. was not prepared for Covid, resulting in no national tracing solution. She offered that the success of tracing has been mostly due to efforts of “local heroes” like Jenkins. Through their five-year award, Vibrent is developing a next-generation tracing solution that they hope will better target infectious spread, optimize response time, reduce labor burden in managing spread, and increase public trust.

On left, Jain provided background information on his company, Vibrent Health. On right, Peterson outlined the current state of contact tracing in the U.S. and identified needs for a national tracing system.

Along with an online digital interface, the company is partnering with Virginia Commonwealth University to work on a statistical modeling system. Peterson likened their idea to the Waze navigation app, which relies on users to add important, real-time data. They hope to offer a visualization tool to identify individuals in close contact with infected or high-risk persons and identify places or routes where users are at higher risk.

Nayak closed the panel by discussing his work on a project complementary to contact tracing, dubbed Poirot. Poirot will use aggregated private contact summary data. Because physical distancing is key to preventing Covid spread, Nayak said it is both important and difficult to measure physical interactions through contact events due to privacy concerns over sensitive data. Using Duke as the case study, Poirot will help decision makers answer questions about which buildings have the most contact events or which populations – faculty versus students – are at higher risk. The technology can also help individuals identify how many daily contacts they have or the safest time of day to visit a particular building.

On left, Poirot’s design. On right, examples of Poirot’s uses by decision makers and individuals.

Nayak said users will only be able to learn about their own contact events, as well as aggregate stats, while decision makers can only access aggregate statistics and have no ability to link data to individuals.

Users will log into a Duke server and then privately upload their data using a technology called blinded tokens. Contact events will be discovered with the help of continuously changing, random identifiers with data summation at intermittent intervals. Data processing will use multiparty computation and differential privacy to ensure information is delinked from individuals. The tool is expected for release in the spring.
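To make the differential privacy step more concrete, here is a minimal sketch of the Laplace mechanism applied to aggregate contact counts per building. It is not Poirot’s implementation, which the panel did not describe in detail. The function names, building names, and counts are hypothetical, and the sketch assumes that adding or removing one person changes any single count by at most `sensitivity`.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy copy of `counts` (Laplace mechanism).

    Smaller epsilon means more noise and stronger privacy. The
    sensitivity of 1.0 is an assumption, not a property of Poirot.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {name: value + laplace_noise(scale, rng)
            for name, value in counts.items()}

# Hypothetical aggregate contact events per building.
true_counts = {"Building A": 120, "Building B": 45}
noisy = private_counts(true_counts, epsilon=0.5, rng=random.Random(0))
```

A decision maker would see only values like those in `noisy`, which stay close enough to the true totals to rank buildings while masking any one person’s contribution.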

Screenshot of Duke’s Mobile App Gateway site.

Although we are just starting vaccination, the need for nationwide resources “will be ongoing,” Martin said.

We should continue to embrace contact tracing because widespread vaccination will take time, Jenkins said.

Jenkins, Jain, and Nayak are but a few who have stepped up to respond innovatively to Covid. It is increasingly apparent that we will continue to need individuals like them, as well as their technological tools, to ease the burden on an overworked and unprepared health system as the pandemic persists in America.

Post by Cydney Livingston

COVID-19, and the Costs of Big Data

By Jeremy Jacobs

On September 2, 2020

In Computers/Technology, Data, Faculty, Global Health, Medicine

TikTok’s illicit collection of user data recently drew fire from US officials. But TikTok’s base—largely young adults under 25—was unfazed. In viral videos posted in July and August, users expressed little concern about their digital privacy.

“If china wants to know how obsessed i am with hockey,” wrote one user, “then just let them its not a secret.” “#Takemydata,” captioned another, in a video racking up 6,000 likes and over 42,000 views.

As digital technologies become ever more pervasive – or even invasive – privacy should be a concern, a pair of experts said in a Duke Science & Society webinar earlier this month.

TikTok and digital marketing aside, data collection can have real, tangible benefits. Case in point: COVID-19. Researchers at Duke and elsewhere are using people’s fitness trackers and smart watches to try to understand and predict the pandemic’s spread by monitoring a variety of health metrics, producing real-time snapshots of heart rate, blood pressure, sleep quality, and more. Webinar speaker Jessilyn Dunn of Duke biomedical engineering and her team have tapped into this data for CovIdentify, a Duke-funded effort to predict COVID infections using data collected by smartphones and wearable devices.

Health data from smartphones and fitness trackers may help predict and identify disease.

For several years, Dunn’s lab has researched digital biomarkers of disease—that is, how health data collected by tech we carry every day can predict anything from heart disease to cognitive decline.

It’s a potential goldmine: One recent poll suggests that 40 million Americans own some kind of smartwatch or fitness tracker. And the wearables market is rapidly expanding—by 2022, it may be worth upwards of 25 billion dollars.

As coronavirus cases began to rise in the US, Dunn’s lab quickly pivoted to develop COVID-specific biomarkers. “We have these devices … that perform physiologic monitoring,” Dunn said. “This is a method of taking vitals continuously to try to monitor what’s going on with people.”

Say you’re a participant in Dr. Dunn’s study. You download the CovIdentify app, which analyzes health data collected by your phone or smartwatch. Short daily surveys then assess your exposure to COVID-19 and whether you’ve developed any symptoms. Dunn and her team hope to find a link, some specific change in vitals that corresponds to COVID-19 infection.

There are some challenges. CovIdentify must account for variability between devices: data collected from a Fitbit, for example, might differ dramatically from data collected by an Apple Watch. And because COVID-19 manifests in unique ways across populations, a truly universal biomarker may not exist.

However, panelist Marielle Gross—a bioethicist at the University of Pittsburgh—said projects like Dunn’s raise questions of digital privacy. Gross emphasized how easily our health data can be abused.

Left: Jessilyn Dunn, PhD, a professor at Duke University and CovIdentify Researcher
Right: Marielle Gross, MD, MBE, a bioethicist and professor at the University of Pittsburgh

“Digital specimen is the digital representation of the human body,” she said. “Disrespecting it disrespects the body it represents.”

Dr. Gross cited South Korea’s efforts to curb COVID-19 as a cautionary tale. As part of the government’s response, which quickly minimized cases early in the pandemic, exposed or infected South Koreans were expected to stay home and isolate, tracked using GPS-enabled devices.

But many South Koreans chose to leave their devices at home, rather than be tracked by their government. In response, the government required its citizens to carry their devices, 24/7. In a pandemic, desperate measures may be called for. But, Gross suggests, it isn’t hard to imagine a grimmer future—where the government requires all citizens to share their location, all the time.

Gross argues that we must fundamentally shift how we think about our personal data. “There’s this broad assumption that we have to give up privacy to reap the benefits of collective data,” Gross noted. “And that’s false.”

Most ‘digital natives’ aren’t naive. They’re well aware that internet companies collect, analyze, and sell their data, sometimes to malicious effect. But many view data collection as a necessary tradeoff for an intuitive and tailored web experience.

So where do we go from here? Dr. Gross points to new developments like zero knowledge proofs, which use complex algorithms to verify data without actually seeing it. This technique promises anonymity without compromising the value of collective data. And as computing power increases, it may also be possible to perform real-time analysis without ever transmitting or storing collected health data.
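As a rough illustration of verifying something without seeing it, here is a toy version of Schnorr’s identification protocol, a classic textbook zero-knowledge proof. It is not a system Dr. Gross described. The prover shows that it knows a secret number without ever sending it. The group parameters below are far too small for real use, and all names are chosen for illustration.

```python
import secrets

# Toy parameters: G has prime order Q modulo P. Real systems use
# groups hundreds of digits long; these values are for illustration only.
P, Q, G = 23, 11, 2

def commit():
    """Prover: pick a one-time random nonce and publish G^nonce mod P."""
    nonce = secrets.randbelow(Q)
    return nonce, pow(G, nonce, P)

def respond(secret, nonce, challenge):
    """Prover: blend the secret with the nonce so it stays hidden."""
    return (nonce + challenge * secret) % Q

def verify(public, commitment, challenge, response):
    """Verifier: check the proof using only public values."""
    return pow(G, response, P) == (commitment * pow(public, challenge, P)) % P

# One round of the protocol.
secret = 7                          # known only to the prover
public = pow(G, secret, P)          # shared with everyone
nonce, commitment = commit()
challenge = secrets.randbelow(Q)    # chosen by the verifier
response = respond(secret, nonce, challenge)
accepted = verify(public, commitment, challenge, response)
```

The verifier only ever handles `public`, `commitment`, `challenge`, and `response`. Because the nonce is random and used once, the response by itself reveals nothing about `secret`.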

And for future tech? In Dr. Gross’s opinion, ethical implications must be considered from day one. “Those sorts of considerations are not the kind of thing that you can tack on later. They have to be built into devices…at the ground floor.”

Post by Jeremy Jacobs

Who Makes Duke? Visualizing 50 Years of Enrollment Data

By Cydney Livingston

On August 10, 2020

In Data, Statistics, Visualization

Millions of data points. Ten weeks. Three Duke undergraduates. Two faculty facilitators. One project manager and one pretty cool data visualization website.

Meet 2020 Data+ team “On Being a Blue Devil: Visualizing the Makeup of Duke Students.”

Undergraduates Katherine Cottrell (’21), Michaela Kotarba (’22) and Alexander Burgin (’23) spent the last two and a half months looking at changes in Duke’s student body enrollment over the last 50 years. The cohort, working with project manager Anna Holleman, professor Don Taylor and university archivist Valerie Gillispie, used data from each of Duke’s colleges spanning back to 1970. Within the project, the students converted 30 years of paper records to machine-readable data, which was a hefty task. “On Being a Blue Devil” presented their final product during a Zoom-style showcase Friday, July 31: an interactive data-visualization website. The site is live now but is still being edited as errors are found and clarifications are added.

The cover page of the launched interactive application.

The team highlighted a few findings. Over the last 20 years, there has been a massive surge in Duke enrollment of students from North Carolina. Looking more closely, it is possible that grad enrollment drives this spike, because grad students tend to record North Carolina as their home state after the first year of their program. Within the Pratt School of Engineering, the number of female students is on an upward trend. A gap between male and female undergraduate engineering enrollment persists, but it is closing. A significant drop in grad school and international student enrollment in 2008 corresponds to the financial crisis of that year. The team believes there may be similar, interesting effects for 2020 enrollment due to COVID-19.

However, the majority of the presentation focused on the website and all of its handy features. The overall goal for the project was to create engaging visualizations that enable users to dive into and explore the historic data for themselves. Presentation attendees got a behind-the-scenes look at each of the site’s pages.

Breakdown of enrollment by region within different countries outside of the United States.

The “Domestic Map” allows website visitors to select the school, year, sex, semester, and state they wish to view. The “International Map” displays the same categories, with regional data replacing state distributions for international countries. Each query returns summary statistics on the number of students enrolled per state or region for the criteria selected.
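The kind of query behind such a map can be sketched as a filter followed by a per-state tally. This is only an illustration of the idea, not the team’s code. The record layout, field names, and numbers below are hypothetical.

```python
from collections import Counter

def enrollment_by_state(records, **criteria):
    """Total students per state among records matching every criterion."""
    counts = Counter()
    for rec in records:
        if all(rec.get(field) == value for field, value in criteria.items()):
            counts[rec["state"]] += rec["students"]
    return counts

# Made-up rows in an assumed layout; not real Duke enrollment data.
records = [
    {"school": "Pratt", "year": 1990, "sex": "F", "semester": "Fall",
     "state": "NC", "students": 12},
    {"school": "Pratt", "year": 1990, "sex": "F", "semester": "Fall",
     "state": "VA", "students": 5},
    {"school": "Pratt", "year": 1990, "sex": "M", "semester": "Fall",
     "state": "NC", "students": 40},
    {"school": "Trinity", "year": 1990, "sex": "F", "semester": "Fall",
     "state": "NC", "students": 90},
]

result = enrollment_by_state(records, school="Pratt", year=1990,
                             sex="F", semester="Fall")
```

Leaving a criterion out, such as `sex`, widens the query to all values of that field, which is roughly how a "show all" option in a dropdown would behave.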

A “Changes Over Time” tab clarifies data by keeping track of country and territory name changes, as well as changes in programs over the five decades of data. For example, Duke’s nursing program data is a bit complicated: one of its programs ended, then restarted a few years later; there are both undergraduate and graduate nursing schools; and over a decade’s worth of male nursing students are not accounted for in the data sets.

The “Enrollment by Sex” tab displays a breakdown of enrollment using the Duke-established binary of male and female categories. This data is visualized in pie charts but can also be viewed as line graphs to look at trends over time and compare trends between schools.

“History of Duke” offers an interactive timeline that contextualizes the origins of each of Duke’s schools and includes a short blurb on their histories. There are also timelines for the history of race and ethnicity at Duke, as well as Duke’s LGBTQ history. The team was given data on legal sex only, not on gender identity, which is why they sought to contextualize the data that they do have. If the project continues, Cottrell, Kotarba, and Burgin strongly suggest that gender identity data be made accessible and included on the site. Racial data is also a top priority for the group, but they simply did not have access to this resource during their summer project.

Timeline of Duke’s various schools since it was founded in the 1830s.

Of course, like most good websites, there is an “About” section. Here users can meet the incredible team who put this all together, look over frequently asked questions, and even dive deeper into the data with the chance to look at original documents used in the research.

From left to right: Project lead Don Taylor (Ph.D.), project lead Valerie Gillispie, and project manager Anna Holleman

Each of the three undergrads on the “On Being a Blue Devil” team gained valuable transferable skills – as is a goal of Duke’s Data+ program. But the tool they created is likely to go far beyond their quarantined summer. Their website is a unique product that makes data fun to play with and will drive a push for more data to be collected and included. Future researchers could add many more metrics, years, and data points to the tool, allowing it to grow considerably.

Many Duke faculty members are already vying for a chance to talk with the team about their work.