Following the people and events that make up the research community at Duke


The SolarWinds Attack and the Future of Cybersecurity


Cybersecurity is the protection of computer systems and networks to prevent theft of or damage to their hardware, software, or electronic data. While cybersecurity has been around since the 1970s, its prominence in mainstream media and politics is growing as ever more information is stored electronically. In 1986, approximately 1% of the world’s information was stored in digital format; by 2006, just twenty years later, that figure had risen to 94%.

Hacking has also become more prominent with the advent of the Digital Revolution and the start of the Information Age, which began in the 1980s and grew rapidly in the early 2000s. It became an effective form of political attack for acquiring confidential information from foreign countries.

In mid-December of 2020, it was revealed that several U.S. companies and even government agencies were victims of a cyberattack that began in September of 2019. 

The Sanford School of Public Policy hosted leading cybersecurity reporter Sean Lyngaas to lead a discussion on the national security implications of the SolarWinds hack with Sanford Professor David Hoffman and Visiting Scholar and Journalist Bob Sullivan. Lyngaas graduated from Duke in 2007, having majored in public policy at the Sanford School.

Lyngaas did not have a direct route into cybersecurity journalism. After completing his master’s in international relations at The Fletcher School of Law and Diplomacy at Tufts University, he moved to Washington, D.C. to pursue a career as a policy analyst. At night, when he was not applying for jobs, he began pitching stories to trade journals. Despite not being a “super technical guy,” Lyngaas became passionate about cybersecurity and reporting on the growing stream of news surrounding the topic. Since 2012 he has done extensive reporting on cybersecurity breaches, and he recently published several detailed reports on the SolarWinds incident.

Sean Lyngaas

The SolarWinds attack is considered one of the most impactful cybersecurity events in history because of its intricacy and the number of government and private-sector victims. Lyngaas explained that most people had not heard of SolarWinds until recently, but the company nevertheless provides software to a multitude of Fortune 500 companies and government agencies. One of its products is Orion, an IT performance monitoring platform that helps businesses manage and optimize their IT infrastructure. The hackers infiltrated Orion’s update mechanism and, over several months, sent malicious updates to 18,000 companies and government agencies. Among the victims of the espionage campaign were the U.S. Justice Department and Microsoft. Countless email accounts were infiltrated and hacked as a result.

“A perfect example of someone robbing a bank by knocking out the security guard and putting on his outfit to have access.” 

Bob Sullivan

Sullivan added that this hack is particularly concerning because the target was personal information, whereas previous large-scale hacks centered on breaching data. Additionally, SolarWinds’ core business is not cybersecurity, yet it works with and provides software to many cybersecurity companies. The attack was revealed by FireEye, a cybersecurity firm that announced it had been breached.

“FireEye got breached and they are the ones usually investigating the breaches”

Sean Lyngaas

This situation has prompted both those involved in the cybersecurity industry as well as the public to reconsider the scope of cyberhacking and what can be done to prevent it.

“Computer spying by nation states has been going on for decades but we talk about it more openly now,” Lyngaas stated.

Lyngaas added that the public now expects more transparency, especially when there are threats to their information, and he feels we need better standards for companies involved in cybersecurity. SolarWinds arguably was not following cybersecurity best practices and had recently made cost cuts, which may have contributed to its vulnerability. Hoffman explained that SolarWinds had been using an easy-to-guess password for its internal systems, which gave hackers access to the software update process as well as the ability to sign a digital signature.

“We are not going to prevent these breaches; we are not going to prevent the Russians from cyber espionage,” Lyngaas stated.

However, he believes that by using best practices we can uncover these breaches earlier and react in time to reduce the damage. He also thinks there needs to be a shift in the balance of government spending between cyber defense and offense. Historically there has been little transparency in government cyber spending, but it is known that more has been spent on offense in recent years.

Changes are beginning to be made in the cybersecurity landscape that should help reduce attacks, or at least the severity of their impacts. California recently passed a law requiring that breaches be made public, which will increase transparency. The panelists added that the growing amount of news and information available to the public about cybersecurity is aiding efforts to understand and prevent attacks. President Biden spoke openly about cybersecurity in relation to protecting the election from hackers and continues to treat it as an urgent issue, since it is crucial to protecting confidential U.S. information.

As Lyngaas explained, it is practically impossible to completely prevent cyberattacks. However, through greater transparency and the use of best practices, incidents like the SolarWinds hack will hopefully never again have effects on the same scale.

Post by Anna Gottskind

Increasing Access to Care with the Help of Big Data


Artificial intelligence (AI) and data science have the potential to revolutionize global health. But what exactly is AI and what hurdles stand in the way of more widespread integration of big data in global health? Duke’s Global Health Institute (DGHI) hosted a Think Global webinar Wednesday, February 17th to dive into these questions and more.  

The webinar’s panelists were Andy Tatem (Ph.D.), Joao Vissoci (Ph.D.), and Eric Laber (Ph.D.), moderated by DGHI’s Director of Research Design and Analysis Core, Liz Turner (Ph.D.). Tatem is a professor of spatial demography and epidemiology at the University of Southampton and director of WorldPop. Vissoci is an assistant professor of surgery and global health at Duke University. Laber is a professor of statistical science and bioinformatics at Duke.

Panel moderator Liz Turner

Tatem, Vissoci, and Laber all use data science to address issues in the global health realm. Tatem’s work largely utilizes geospatial data sets to help inform global health decisions like vaccine distribution within a certain geographic area. Vissoci, who works with the GEMINI Lab at Duke (Global Emergency Medicine Innovation and Implementation Research), tries to leverage secondary data from health systems in order to understand issues of access to and distribution of care, as well as care delivery. Laber is interested in improving decision-making processes in healthcare spaces, attempting to help health professionals synthesize very complex data via AI.

All of their work is vital to modern biomedicine and healthcare, but, Turner said, “AI means a lot of different things to a lot of different people.” Laber defined AI in healthcare simply as using data to make healthcare better. “From a data science perspective,” Vissoci said, “[it is] synthesizing data … an automated way to give us back information.” This returned info is digestible trends and understandings derived from very big, very complex data sets. Tatem stated that AI has already “revolutionized what we can do” and said it is “powerful if it is directed in the right way.”

A screenshot from worldpop.org

We often get sucked into a science-fiction version of AI, Laber said, but in actuality it is not some dystopian future but a set of tools that maximizes what can be derived from data.

However, as Tatem stated, “[AI] is not a magic, press a button” scenario where you get automatic results. A huge part of work for researchers like Tatem, Vissoci, and Laber is the “harmonization” of working with data producers, understanding data quality, integrating data sets, cleaning data, and other “back-end” processes.

This comes with many caveats.

“Bias is a huge problem,” said Laber. Vissoci reinforced this, stating that the models built from AI and data science are going to represent what data sources they are able to access – bias included. “We need better work in getting better data,” Vissoci said.

Further, there must be more up-front listening to and communication with “end-users from the very start” of projects, Tatem outlined. By taking a step back and listening, tools created through AI and data science may be better met with actual uptake and less skepticism or distrust. Vissoci said that “direct engagement with the people on the ground” transforms data into meaningful information.

Better structures for navigating privacy issues must also be developed. “A major overhaul is still needed,” said Laber. This includes better consent processes so patients can understand how their data is being used, although Tatem said this becomes “very complex” when integrating data.

Nonetheless, the future looks promising, and each panelist feels confident that the benefits will outweigh the difficulties of introducing big data to global health. One compelling example Vissoci gave of an ongoing project deals with the impacts of environmental change through deforestation in the Brazilian Amazon on Indigenous populations. Through work with “heavy multidimensional data,” Vissoci and his team have also been able to optimize scarce Covid vaccine resources “to use in areas where they can have the most impact.”

Laber envisions a world with reduced or even no clinical trials if “randomization and experimentation” are integrated directly into healthcare systems. Tatem noted how he has seen extreme growth in the field in just the last 10 to 15 years, which seems only to be accelerating.

A lot of this work has to do with making better decisions about allocating resources, as Turner stated in the beginning of the panel. In an age of reassessment about equity and access, AI and data science could serve to bring both to the field of global health.

Post by Cydney Livingston

Student Team Quantifies Housing Discrimination in Durham


Home values and race have an intimate connection in Durham, NC. From 1940 to 2020, if mean home values in Black-majority Census tracts had appreciated at rates equal to those in white Census tracts, the mean home value for homes in Black tracts would be $94,642 higher than it is.

That’s the disappointing, but perhaps not shocking, finding of a Duke Data+ team.

Because housing accounts for the biggest portion of wealth for U.S. families outside the top 10%, this figure on home values represents a pervasive racial divide in wealth.

What started as a Data+ project in the summer of 2020 has expanded into an ongoing exploration of how housing perpetuates wealth disparities across racial lines. Omer Ali (Ph.D.), a postdoctoral associate with The Samuel Dubois Cook Center on Social Equity, is leading undergraduates Nicholas Datto and Pei Yi Zhuo in the continuation of their initial work. The trio presented an in-depth analysis of their work and methods Friday, February 5th during a Data Dialogue.

The team used a multitude of data to conduct their analyses, including the 1940 Census, Durham County records, CoreLogic data on home sales, and NC voter registrations. Aside from the nearly $100,000 difference between the mean home values of Black Census tracts (defined as >50% Black homeowners from 1940-2020) and white Census tracts (defined as >50% white homeowners from 1940-2020), Ali, Datto, and Zhuo also found that over the last 10 years, home values have risen in Black neighborhoods even as those neighborhoods have been losing Black residents. Within Census tracts, the team said, Black home-buyers in Durham occupy the least valuable homes.
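To make the counterfactual behind the $94,642 figure concrete, the arithmetic can be sketched in a few lines. This is not the team’s code, and the dollar figures below are made up; only the logic – growing Black tracts’ 1940 mean value at the white-tract appreciation rate and comparing it to the actual 2020 value – follows the article.

```python
# Illustrative only: counterfactual appreciation gap. All dollar figures
# below are hypothetical, not the study's actual tract means.

def counterfactual_gap(black_1940, black_2020, white_1940, white_2020):
    """Difference between what Black-tract homes would be worth had they
    appreciated at the white-tract rate, and what they are actually worth."""
    white_growth = white_2020 / white_1940            # cumulative appreciation factor
    counterfactual_2020 = black_1940 * white_growth   # Black 1940 values, white growth
    return counterfactual_2020 - black_2020

# Made-up means: white tracts appreciated 100x, Black tracts only 75x.
gap = counterfactual_gap(black_1940=2_000, black_2020=150_000,
                         white_1940=3_000, white_2020=300_000)
print(gap)  # 50000.0 with these hypothetical numbers
```

The study’s actual figure comes from applying this comparison to real tract-level Census means over 1940-2020.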

Home Owners Loan Corporation data

Datto introduced the concept of redlining — systemic housing discrimination — and explained how this historic issue persists. From 1930-1940, the Home Owners’ Loan Corporation (HOLC) and Federal Housing Administration (FHA) designated certain neighborhoods unsuitable for mortgage lending. Neighborhoods were given a desirability grade from A to D, with D being the lowest.

In 1940, no neighborhoods with Black residents were designated as either A or B districts. That meant areas with non-white residents were considered more risky and thus less likely to receive FHA-guaranteed mortgages.

Datto explained that these historic classifications persist: the team found significant differences in the amount of home value accumulated over time by neighborhood rating. We are “seeing long-lasting effects of these redlined maps on homeowners in Durham,” said Datto, with even “significant differences between white [and non-white] homeowners, even in C and D neighborhoods.”

Zhuo explained the significance of tracking the changes in each Census tract – Black, white, or integrated – over the last 50 years. The “white-black disparity [in home value] has grown by 287%” in this period, he said. Homes of comparable structural design and apparent worth are much less valuable simply for existing in Black neighborhoods and being owned by Black people. And the problem has only grown.

Along with differences in home value, both Black and white neighborhoods have seen a decline in Black homeowners in the 21st Century, pointing to a larger issue at hand. Though the work done so far merely documents these trends, rather than looking for correlation that may get at the underlying causes of the home-value disparity, the trends pair closely with other regions across the country being impacted by gentrification.

“Home values are going up in Black neighborhoods, but the number of Black people in those neighborhoods is going down,” said Datto.

Ali pointed out that there are appraisal practices that include evaluation of the neighborhood “as opposed to the structural properties of the home.” When a house is being appraised, he said, a home of similar structure owned by white homeowners would never be chosen as a comparator for a Latinx- or Black-owned home. This perpetuates historical disparities: because “minority neighborhoods have been historically undervalued,” it becomes a compounding, systemic cycle.

The team hopes to export their methodology to a much larger scale. Thus far, this has presented some back-end data and computing issues, however “there is nothing in the analysis itself that couldn’t be [applied to other geographical locations],” they said.

Large socioeconomic racial disparities prevail in the U.S., from gaps in unemployment to infant mortality to incarceration rates to life expectancy itself. Though it should come as no surprise that home values represent another area of inequity, work like that of Ali, Datto, and Zhuo needs more traction, support, and expansion.

Post by Cydney Livingston

Cybersecurity for Autonomous Systems


Over the past decades, we have adopted computers into virtually every aspect of our lives, but in doing so, we’ve made ourselves vulnerable to malicious interference, or hacking. I had the opportunity to talk about this with Miroslav Pajic, the Dickinson Family Associate Professor in Duke’s electrical and computer engineering department. He has worked on cybersecurity in self-driving cars, medical devices, and even U.S. Air Force hardware.

Miroslav Pajic is an electrical engineer

Pajic primarily works in “assured autonomy,” computers that do most things by themselves with “high-level autonomy and low human control and oversight.” “You want to build systems with strong performance and safety guarantees every time, in all conditions,” Pajic said. Assured Autonomy ensures security in “contested environments” where malicious interference can be expected. The stakes of this work are incredibly high. The danger of attacks on military equipment goes without saying, but cybersecurity on a civilian level can be just as dangerous. “Imagine,” he told me, “that you have a smart city coordinating traffic and that… all of (the traffic controls), at the same time, start doing weird things. There can be a significant impact if all cars stop, but imagine if all of them start speeding up.”

Pajic and some of his students with an autonomous car.

Since Pajic works with Ph.D. students and postdocs, I wanted to ask him how COVID-19 has affected his work. As if on cue, his wifi cut out, and he dropped from our Zoom call. “This is a perfect example of how fun it is to work remotely,” he said when he returned. “Imagine that you’re debugging a fleet of drones… and that happens.”

In all seriousness, though, there are simulators created for working on cybersecurity and assured autonomy. CARLA, for one, is an open-source simulator of self-driving vehicles made by Intel. Even outside of a pandemic, these simulators are used extensively in the field. They’ve become very useful in returning accurate and cheap results without any actual risk, before graduating to real tests.

“If you’re going to fail,” Pajic says, “you want to fail quickly.”

Guest Post by Riley Richardson, Class of 2021, NC School of Science and Math

Quantifying the effects of structural racism on health

Photo from Scholars@Duke

America is getting both older and Blacker. The proportion of non-white older adults is increasing, and by 2050 the majority of elderly people will be racial minorities. In his Langford Lecture “Who gets sick and why? How racial inequality gets under the skin” on November 10, Professor Tyson H. Brown discussed the importance of studying older minorities when learning about human health. His current project aims to address gaps in research by quantifying effects of structural racism on health. 

Health disparities result in unnecessary fatalities. Dr. Brown estimates that if we eliminated racial disparities in health, we could avoid 229 premature deaths per day. Health disparities also carry substantial economic costs, adding up to about 200 billion dollars annually. Dr. Brown explained that the effects of structural racism are so deadly because it is complex and not the same as the overt, intentional, interpersonal racism most people think of; as a result, it is easier to ignore, and attempts to fix it are more easily dismissed. Dr. Brown’s study posits that structural racism has five key tenets: it is multifaceted, interconnected, and an institutionalized system; it involves relational subordination; and it manifests in racial inequalities in life chances.

A motivator for Brown’s research was that less than 1% of studies of the effects of race on health have focused on structural racism, even though macro-level structural racism has deleterious effects on the health of Black people. When thinking about inequalities, the traditional mode of thinking is that the dominant group (in this case, white people) receives all the benefits while the subordinate group (in Dr. Brown’s study, Black people) bears all of the negative effects of racism. In this mode of thinking, whites actively benefit from social inequality. However, Dr. Brown discussed another theory: that structural racism and its effects on health undermine the fabric of our entire society and have negative impacts on both whites and Blacks. It is possible for whites to be harmed by structural racism, though not to the same extent as Black people.

Dr. Brown identified states as “important institutional actors that affect population health.” As part of his research, he built a state-level index of structural racism based on data from 2010. The index is composed of nine indicators of structural racism, which combine into an overall measure for each state. When structural racism was mapped across these domains, the results were not what most people might expect: according to Dr. Brown’s study, structural racism tends to be highest in the Midwest, rather than the South. These higher levels of structural racism were associated with worse self-rated health: a one-standard-deviation increase in structural racism correlated with the equivalent of a two-standard-deviation increase in age. In other words, a person affected by structural racism has self-rated health similar to people two age categories above them who do not experience structural racism’s negative effects.
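The lecture did not specify exactly how the nine indicators are combined, but a common way to build such a composite index is to standardize each indicator across states into z-scores and average them. The sketch below assumes that method; the state names and indicator values are made up.

```python
# Hypothetical composite index: z-score each indicator across states,
# then average. Not Dr. Brown's actual method or data.
import statistics

def composite_index(states):
    """states: dict of state -> list of indicator values (same order per state).
    Returns state -> mean z-score across all indicators."""
    names = list(states)
    n_indicators = len(next(iter(states.values())))
    z = {s: [] for s in names}
    for i in range(n_indicators):
        column = [states[s][i] for s in names]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column) or 1.0  # avoid dividing by zero variance
        for s in names:
            z[s].append((states[s][i] - mean) / sd)
    return {s: statistics.mean(vals) for s, vals in z.items()}

# Three fictional states, two fictional indicators:
index = composite_index({"A": [1.0, 10.0], "B": [2.0, 20.0], "C": [3.0, 30.0]})
# "B" sits at the mean of both indicators, so its composite score is 0.
```

Standardizing first keeps indicators with large raw scales (say, dollar amounts) from swamping indicators measured as small rates.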

As the structural racism index increases, the Black-white difference in COVID-19 related deaths also increases. Overall, Dr. Brown found that structural racism is a key driver of inequalities in COVID-19 deaths between whites and Blacks. Looking forward, Dr. Brown is interested in learning more about how contemporary forms of racism contribute to inequality—such as searching racial slurs on Google and implicit bias, both of which are high in the southern United States. 

After his discussion, colleagues raised questions about what can be done to eliminate negative effects of structural racism. Dr. Brown listed options such as rent protection, COVID-19 test sites in lower income communities and another stimulus bill. He also explained that the distribution of a COVID-19 vaccine needs to be done in an ethical manner and not exclude those who are less fortunate who really need the vaccine. We also need better data collection in general—the more we know about the effects of structural racism, the better we will be able to adapt equity practices to mitigate harm on Black communities.

By Victoria Priester

Contact Tracing Is a Call for Ingenuity and Innovation

The sudden need for contact-tracing technologies to address the Covid-19 pandemic is inspiring some miraculous human ingenuity.

Wednesday, December 16th, Rodney Jenkins, Praudman Jain, and Kartik Nayak discussed Covid-19 contact tracing and the role of new technologies in a forum organized by the Duke Mobile App Gateway team.

Jenkins is the health director of Durham County’s Department of Public Health, Jain is CEO and founder of Vibrent Health, and Nayak is an assistant professor in Duke’s computer science department. The panel was hosted by Leatrice Martin (M.B.A.), senior program coordinator for Duke’s Mobile App Gateway with Duke’s Clinical and Translational Science Institute.

Contact tracing is critical to slowing the spread of Covid, and Jenkins says it’s not going away anytime soon. Jenkins, who only began his position with Durham County Public Health in January 2020, said Durham County’s contact tracing has been… interesting. As the virus approached Durham, “Durham County suffered a severe malware attack that really rendered platforms…useless.”

Eventually, though, the department developed its own method of tracing through trial and error. North Carolina’s Department of Health and Human Services (NC DHHS), like many other health departments across the nation in March, was scrambling to adjust. NC DHHS was not able to provide support for Durham’s contact tracing until July, when Jenkins identified a serious need for reinforcement due to disproportionate Covid cases among Latinx community members. In the meantime, Durham County received help from Duke’s Physician Assistant students and the Blue Cross Blue Shield Foundation, expanding its team of five to 95 individuals investigating and tracing the county’s positive cases.

Rodney Jenkins MPH is the health director of the Durham County Public Health Department.

Jenkins proclaimed contact tracing as “sacred to public health” and a necessary element to “boxing in” Covid-19 – along with widespread testing.

Durham’s tracing is conducted through a HIPAA-compliant, secure online portal. Data about individuals is loaded into the system and transmitted to the contact tracing team, which then calls close contacts to enable a quick quarantine response. The department had to “make a huge jump very quickly,” said Jenkins. It was this speedy development and integration of new technology that has helped Durham County Public Health better manage the pandemic.

Jain, along with colleague Rachele Peterson, spoke about his company, Vibrent Health. Vibrent, which was recently awarded a five-year grant from the National Institutes of Health’s All of Us Research Program, is focused on creating and dispersing digital and mobile platforms for public health.

Naturally, this includes a new focus on Covid. With renewed interest in and dependency on contact tracing, Jain says there is a need for different tools to help various stakeholders – from researchers to citizens to government.  He believes technology can “become the underlying infrastructure for accelerating science.”

Vibrent identified needs for a national tracing model, including the labor intensity of manual processes, disparate tools, and lack of automation.

Peterson said that as we “are all painfully aware,” the U.S. was not prepared for Covid, resulting in no national tracing solution. She offered that the success of tracing has been mostly due to efforts of “local heroes” like Jenkins. Through their five-year award, Vibrent is developing a next-generation tracing solution that they hope will better target infectious spread, optimize response time, reduce labor burden in managing spread, and increase public trust.

Along with an online digital interface, the company is partnering with Virginia Commonwealth University to work on a statistical modeling system. Peterson likened their idea to the Waze navigation app, which relies on users to add important, real-time data. They hope to offer a visualization tool to identify individuals in close contact with infected or high-risk persons and identify places or routes where users are at higher risk.

Nayak closed the panel by discussing his work on a project complementary to contact tracing, dubbed Poirot. Poirot will use aggregated private contact summary data. Because physical distancing is key to preventing Covid spread, Nayak said it is both important and difficult to measure physical interactions through contact events due to privacy concerns over sensitive data. Using Duke as the case study, Poirot will help decision makers answer questions about which buildings have the most contact events or which populations – faculty versus students – are at higher risk. The technology can also help individuals identify how many daily contacts they have or the safest time of day to visit a particular building.

Nayak said users will only be able to learn about their own contact events, as well as aggregate stats, while decision makers can only access aggregate statistics and have no ability to link data to individuals.

Users will log into a Duke server and then privately upload their data using a technology called blinded tokens. Contact events will be discovered with the help of continuously changing, random identifiers with data summation at intermittent intervals. Data processing will use multiparty computation and differential privacy to ensure information is delinked from individuals. The tool is expected for release in the spring.
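The article names differential privacy as one of the techniques that keeps Poirot’s aggregates delinked from individuals. As an illustration only – not Poirot’s implementation – the standard Laplace mechanism for releasing a noisy count looks like the sketch below; the epsilon value and contact counts are made up.

```python
# Illustrative Laplace mechanism for differential privacy; not Poirot's code.
import random

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise scaled to sensitivity/epsilon,
    masking any single person's contribution to the aggregate."""
    scale = sensitivity / epsilon
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# A building with 120 contact events, released under a privacy budget of 1.0:
noisy = dp_count(true_count=120, epsilon=1.0)
# `noisy` is typically within a few units of 120; smaller epsilon means more noise.
```

A decision maker comparing buildings sees only these noisy totals, so no single user’s presence or absence measurably changes the released statistics.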

Screenshot of Duke’s Mobile App Gateway site.

Although we are just starting vaccination, the need for nationwide resources “will be ongoing,” Martin said.

We should continue to embrace contact tracing because widespread vaccination will take time, Jenkins said.

Jenkins, Jain, and Nayak are but a few of those who have stepped up to respond innovatively to Covid. It is increasingly apparent that we will continue to need individuals like them, as well as their technological tools, to ease the burden on an overworked and unprepared health system as the pandemic persists in America.

Post by Cydney Livingston

COVID-19, and the Costs of Big Data

TikTok’s illicit collection of user data recently drew fire from US officials. But TikTok’s base—largely young adults under 25—was unfazed. In viral videos posted in July and August, users expressed little concern about their digital privacy. 

“If china wants to know how obsessed i am with hockey,” wrote one user, “then just let them its not a secret.” “#Takemydata,” captioned another, in a video racking up 6,000 likes and over 42,000 views. 

As digital technologies become ever more pervasive – or even invasive – privacy deserves growing concern, a pair of experts said in a Duke Science & Society webinar earlier this month.

TikTok and digital marketing aside, data collection can have real, tangible benefits. Case in point: COVID-19. Researchers at Duke and elsewhere are using peoples’ fitness trackers and smart watches to try to understand and predict the pandemic’s spread by monitoring a variety of health metrics, producing real-time snapshots of heart rate, blood pressure, sleep quality, and more. Webinar speaker Jessilyn Dunn of Duke biomedical engineering and her team have tapped into this data for CovIdentify, a Duke-funded effort to predict COVID infections using data collected by smartphones and wearable devices. 

Health data from smartphones and fitness trackers may help predict and identify disease.

For several years, Dunn’s lab has researched digital biomarkers of disease—that is, how health data collected by tech we carry every day can predict anything from heart disease to cognitive decline. 

It’s a potential goldmine: One recent poll suggests that 40 million Americans own some kind of smartwatch or fitness tracker. And the wearables market is rapidly expanding—by 2022, it may be worth upwards of 25 billion dollars.

As coronavirus cases began to rise in the US, Dunn’s lab quickly pivoted to develop COVID-specific biomarkers. “We have these devices … that perform physiologic monitoring,” Dunn said. “This is a method of taking vitals continuously to try to monitor what’s going on with people.”

Say you’re a participant in Dr. Dunn’s study. You download the CovIdentify app, which analyzes health data collected by your phone or smartwatch. Short daily surveys then assess your exposure to COVID-19 and whether you’ve developed any symptoms. Dunn and her team hope to find a link, some specific change in vitals that corresponds to COVID-19 infection.   

There are some challenges. CovIdentify must account for variability between devices—data collected from a Fitbit, for example, might differ dramatically from an Apple Watch. And because COVID-19 manifests in unique ways across populations, a truly universal biomarker may not exist. 

However, panelist Marielle Gross—a bioethicist at the University of Pittsburgh—said projects like Dunn’s raise questions of digital privacy. Gross emphasized how easily our health data can be abused. 

Left: Jessilyn Dunn, PhD, a professor at Duke University and CovIdentify Researcher
Right: Marielle Gross, MD, MBE, a bioethicist and professor at the University of Pittsburgh

“Digital specimen is the digital representation of the human body,” she said. “Disrespecting it disrespects the body it represents.”

Dr. Gross cited South Korea’s efforts to curb COVID-19 as a cautionary tale. As part of the government’s response, which quickly minimized cases early in the pandemic, exposed or infected South Koreans were expected to stay home and isolate, tracked using GPS-enabled devices.

But many South Koreans chose to leave their devices at home rather than be tracked by their government. In response, the government required its citizens to carry their devices 24/7. In a pandemic, desperate measures may be called for. But, Gross suggests, it isn’t hard to imagine a grimmer future: one where the government requires all citizens to share their location, all the time.

Gross argues that we must fundamentally shift how we think about our personal data. “There’s this broad assumption that we have to give up privacy to reap the benefits of collective data,” Gross noted. “And that’s false.”

Most ‘digital natives’ aren’t naive. They’re well aware that internet companies collect, analyze, and sell their data, sometimes to malicious effect. But many view data collection as a necessary tradeoff for an intuitive and tailored web experience.

So where do we go from here? Dr. Gross points to new developments like zero knowledge proofs, which use complex algorithms to verify data without actually seeing it. This technique promises anonymity without compromising the value of collective data. And as computing power increases, it may also be possible to perform real-time analysis without ever transmitting or storing collected health data.
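The core idea behind such proofs, convincing a verifier of a fact without revealing the underlying secret, can be gestured at with a toy Schnorr-style identification protocol. This sketch is for illustration only, not production cryptography; the parameters are far too small and the protocol is simplified:

```python
import secrets

# Toy interactive proof of knowledge: the prover shows she knows x
# such that y = g^x mod p, without ever revealing x itself.
p = 2**127 - 1            # a Mersenne prime; real systems use much larger groups
g = 3

x = secrets.randbelow(p - 1)   # prover's secret
y = pow(g, x, p)               # public value derived from the secret

r = secrets.randbelow(p - 1)   # prover picks a random nonce...
t = pow(g, r, p)               # ...and sends the commitment t
c = secrets.randbelow(p - 1)   # verifier replies with a random challenge
s = r + c * x                  # prover's response; useless without knowing r

# Verifier accepts iff g^s == t * y^c (mod p); x never crosses the wire.
print(pow(g, s, p) == (t * pow(y, c, p)) % p)  # True
```

The verifier learns that the prover knows x, but gains no practical way to recover it, which is the property that makes zero-knowledge techniques attractive for sensitive health data.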

And for future tech? In Dr. Gross’s opinion, ethical implications must be considered from day one. “Those sorts of considerations are not the kind of thing that you can tack on later. They have to be built into devices…at the ground floor.”

Post by Jeremy Jacobs

Who Makes Duke? Visualizing 50 Years of Enrollment Data

Millions of data points. Ten weeks. Three Duke undergraduates. Two faculty facilitators. One project manager and one pretty cool data visualization website.

Meet 2020 Data+ team “On Being a Blue Devil: Visualizing the Makeup of Duke Students.”

Undergraduates Katherine Cottrell (’21), Michaela Kotarba (’22) and Alexander Burgin (’23) spent the last two and a half months examining changes in Duke’s student body enrollment over the last 50 years. The cohort, working with project manager Anna Holleman, professor Don Taylor and university archivist Valerie Gillispie, used data from each of Duke’s colleges dating back to 1970. Along the way, the students took on the hefty task of converting 30 years of paper records into machine-readable data. “On Being a Blue Devil” presented its final product, an interactive data-visualization website, during a Zoom-style showcase Friday, July 31. The site is live now but is still being edited as errors are found and clarifications are added.

The cover page of the launched interactive application.

The team highlighted a few findings. Over the last 20 years, there has been a massive surge in Duke enrollment of students from North Carolina. Looking more closely, graduate enrollment may drive this spike, since grad students tend to record North Carolina as their home state after the first year of their program. Within the Pratt School of Engineering, the number of female students is trending upward, though a gap between male and female undergraduate engineering enrollment persists even as it closes. A significant drop in grad school and international student enrollment in 2008 corresponds to the financial crisis of that year. The team believes 2020 enrollment may show similarly interesting effects due to COVID-19.

However, the majority of the presentation focused on the website and all of its handy features. The overall goal for the project was to create engaging visualizations that enable users to dive into and explore the historic data for themselves. Presentation attendees got a behind-the-scenes look at each of the site’s pages.

Breakdown of enrollment by region within different countries outside of the United States.

The “Domestic Map” allows website visitors to select the school, year, sex, semester, and state they wish to view. The “International Map” displays the same categories, with regional data replacing state distributions for countries outside the United States. Each query returns summary statistics on the number of students enrolled per state or region for the criteria selected.
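The filter-then-summarize pattern behind such a query can be sketched in a few lines. The field names and numbers below are invented for illustration; the site’s actual schema may differ:

```python
from collections import defaultdict

# Hypothetical enrollment records, one row per (school, year, sex, semester, state).
records = [
    {"school": "Trinity", "year": 1990, "sex": "F", "semester": "Fall", "state": "NC", "students": 120},
    {"school": "Trinity", "year": 1990, "sex": "M", "semester": "Fall", "state": "NC", "students": 150},
    {"school": "Pratt",   "year": 1990, "sex": "F", "semester": "Fall", "state": "VA", "students": 30},
    {"school": "Pratt",   "year": 2000, "sex": "F", "semester": "Fall", "state": "NC", "students": 45},
]

def summary(records, **criteria):
    """Filter on any combination of fields, then total students per state."""
    totals = defaultdict(int)
    for row in records:
        if all(row[k] == v for k, v in criteria.items()):
            totals[row["state"]] += row["students"]
    return dict(totals)

print(summary(records, school="Trinity", year=1990))  # {'NC': 270}
```

Swapping "state" for a region column gives the International Map’s version of the same query.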

A “Changes Over Time” tab clarifies data by keeping track of country and territory name changes, as well as changes in programs over the five decades of data. For example, Duke’s nursing program data is a bit complicated: one of its programs ended and then restarted a few years later; there are both undergraduate and graduate nursing schools; and over a decade’s worth of male nursing students are not accounted for in the data sets.

The “Enrollment by Sex” tab displays breakdown of enrollment using the Duke-established binary of male and female categories. This data is visualized in pie charts but can also be viewed as line graphs to look at trends over time and compare trends between schools.

“History of Duke” offers an interactive timeline that contextualizes the origins of each of Duke’s schools and includes a short blurb on their histories. There are also timelines for the history of race and ethnicity at Duke, as well as Duke’s LGBTQ history. No data on gender identity, as opposed to legal sex, was made available to the team, which is why they sought to contextualize the data they do have. If the project continues, Cottrell, Kotarba, and Burgin strongly suggest that gender identity data be made accessible and included on the site. Racial data is also a top priority for the group, but they simply did not have access to this resource during their summer project.

Timeline of Duke’s various schools since its founding in the 1830s.

Of course, like most good websites, there is an “About” section. Here users can meet the incredible team who put this all together, look over frequently asked questions, and even dive deeper into the data with the chance to look at original documents used in the research.

Each of the three undergrads of the “On Being a Blue Devil” team gained valuable transferable skills, as is a goal of Duke’s Data+ program. But the tool they created is likely to go far beyond their quarantined summer. Their website is a unique product that makes data fun to play with and will drive a push for more data to be collected and included. Future researchers could add many more metrics, years, and data points, allowing the tool to keep growing.

Many Duke faculty members are already vying for a chance to talk with the team about their work.  

Artificial Intelligence Innovation in Taiwan

Taiwan is a small island off the coast of China, roughly one fourth the size of North Carolina. Despite its size, Taiwan has made significant waves in science and technology: in the 2019 Global Talent Competitiveness Index, Taiwan (listed as Chinese Taipei) ranked first in Asia and 15th globally.

However, despite being ahead of many countries in technological innovation, Taiwan was still looking for ways to further improve and support research within the country. So in 2017, the Taiwan Ministry of Science and Technology (MOST) initiated an AI innovation research program to promote the development of AI technologies and attract top AI professionals to work in Taiwan.

Tsung-Yi Ho, a professor in the Department of Computer Science at National Tsing Hua University in Hsinchu, Taiwan, came to Duke to present on the four AI centers that have been launched since then: the MOST Joint Research Center for AI Technology and All Vista Healthcare (AINTU), the AI for Intelligent Manufacturing Systems Research Center (AIMS), the Pervasive AI Research (PAIR) Labs, and the MOST AI Biomedical Research Center (AIBMRC), hosted at National Taiwan University, National Tsing Hua University, National Chiao Tung University, and National Cheng Kung University, respectively.

Within the four research centers, there are 79 research teams with more than 600 professors, experts, and researchers. The centers are focused on smart agriculture, smart factories, AI biomedical research, and AI manufacturing. 

The research centers run many different AI-focused programs. Tsung-Yi Ho first discussed the AI cloud service program. In the two years since the program launched, it has produced the Taiwania 2 supercomputer, which has a computing capacity of 9 quadrillion floating-point operations per second (9 petaflops). The supercomputer is ranked 20th in computing power and 10th in energy efficiency.

Next, Tsung-Yi Ho introduced the AI Semiconductor Moonshot Program. Its teams have been working on cognitive computing and AI chips; next-generation memory design; IoT systems and security for the intelligent edge; innovative sensing devices, circuits, and systems; emerging semiconductor processes, materials, and device technology; and component, circuit, and system design for unmanned vehicle systems and AR/VR applications.

One of the things Taiwan is known for is manufacturing. The research centers are also looking to incorporate AI into manufacturing through motion generation, production line, and process optimization.

Keeping up with the biggest technological trends, the MOST research centers are all doing work to develop human-robot interactions, autonomous drones, and embedded AI for self-driving cars.

Lastly, some of the research groups are focused on medical technological innovation including the advancement of brain image segmentation, homecare robots, and precision medicine.

Beyond this, the MOST has sponsored several programming, robotic and other contests to support tech growth and young innovators. 

Tsung-Yi Ho’s goal in presenting at Duke was to showcase the research highlights across the four centers and bring research opportunities to Duke attendees.

If interested, Duke students can reach out to Dina Khalilova to connect with Tsung-Yi Ho and get involved with the incredible AI innovation in Taiwan.

Post by Anna Gotskind

A Research Tour of Duke’s Largest Lab

“Lightning is like a dangerous animal that wants to go places. And you can’t stop it,” smiled Steve Cummer, Ph.D. as he gestured to the colorful image on the widescreen TV he’d set up outside his research trailer in an open field in Duke Forest.

Cummer, the William H. Younger Professor of electrical and computer engineering at Duke, is accustomed to lecturing in front of the students he teaches or his peers at conferences. But on this day, he was showing spectacular videos of lightning to curious members of the public who were given exclusive access to his research site on Eubanks Road in Chapel Hill, about 8 miles west of campus.

Steve Cummer shows a time-lapse video of lightning to the visitors on the annual Duke Forest Research Tour in the Blackwood Division of the Duke Forest.

More than two dozen members of the community had signed up for a tour of research projects in the Blackwood Division of Duke Forest (which recently expanded), a research-only area that is not normally open to the public. Cummer’s research site was the last stop of the afternoon research tour. The tour also covered native trees, moths and geological features of the Blackwood Division with biologist and ecologist Steve Hall, and air quality monitoring and remote sensing studies with John Walker and Dave Williams, from the U.S. Environmental Protection Agency.

The Hardwood Tower in the Blackwood Division is used for air quality monitoring and remote sensing studies. Researchers frequently climb the 138-foot-tall tower to sample the air above the tree canopy.

Cummer’s research on lightning and sprites (electrical discharges associated with lightning that occur above thunderstorm clouds) sparked a lively question and answer session about everything from hurricanes to how to survive if you’re caught in a lightning storm. (Contrary to popular belief, crouching where you are is probably not the safest solution, he said. A car is a great hiding spot as long as you don’t touch anything made of metal.)

Cummer kept his tone fun and casual, like a live science television host, perched on the steps of his research trailer, referring to some of the scientific equipment spread out across the field as “salad bowls,” “pizza pans” and “lunar landers,” given their odd shapes. But the research he talked about was serious. Lightning is big business because it can cause billions of dollars in damage and insurance claims every year.

An ash tree (Fraxinus spp.) being examined by one of the visitors on the Duke Forest Research tour. Blackwood Division ash trees are showing signs of the highly destructive emerald ash borer invasion.

Surprisingly little is known about lightning, not even how it first forms. “There are a shocking number of things,” he said, pausing to let his pun sink in, “that we really don’t understand about how lightning works. Starting with the very beginning, nobody knows exactly how it starts. Like, really the physics of that.” But Cummer loves his research and has made some advances in the field, like devising more precise sensor systems. “When you’re the first person to understand something and you haven’t written about it yet or told anyone about it… that’s the best feeling.”

The Duke Forest hosted 49 research projects last year, which, with less than half of the projects reporting, represented over a million dollars of investment in Duke Forest-based work.

“The Duke Forest is more than just a place to walk and to jog. It’s an outdoor classroom. It’s a living laboratory. It’s where faculty and teachers and students of all ages come to learn and explore,” explained Sara Childs, Duke Forest director.

The Duke Forest offers their research tour every year. Members of the public can sign up for the email newsletter to be notified about future events.

Post by Véronique Koch

