John Kahan has a poignant photo in his office at Microsoft, where he oversees customer data and analytics. The image shows Kahan, his wife and three daughters celebrating the birth of a boy, his reddish-blond hair hidden by a hat.
A few hours after the photo was taken, Kahan received a phone call he still has trouble recounting without choking up: his baby son, Aaron, had stopped breathing. A few days later he died with no explanation, a victim of Sudden Infant Death Syndrome.
Last year, with the 13th anniversary of Aaron’s death approaching, Kahan resolved to honor what would have been his only son’s bar mitzvah by climbing Mount Kilimanjaro to raise money and awareness for SIDS research.
When he returned from the climb, his team had a surprise for him — they’d been crunching numbers on infant deaths in the U.S. and using data-analysis algorithms to try to find new ways to reduce the number of babies lost to SIDS each year. To date, the data scientists have put in about 500 hours of their own time. Microsoft contributed free cloud hosting and software tools for their work.
Now, deploying analysis and data-visualization tools that can identify trends, the team has found promising leads in combating SIDS. The technology, normally used to generate Microsoft CEO Satya Nadella’s daily performance metrics dashboard or tell the Windows team how to best serve customers, in this case helped uncover various correlations; for example, lining up early prenatal care with a lower rate of deaths. The work also provides more information on such known SIDS risk factors as maternal tobacco use.
“Aaron died 13 years ago. In 13 years we have not really improved this,” said Kahan, who also lobbies Congress to preserve health-research funding and open up medical data sets for research. “Which basically means roughly 52,000 children in the U.S. have died and you have parents like us that sit there and go ‘I don’t know why.’ ”
Microsoft is partnering with a research team at Seattle Children’s Hospital, led by neuroscientist Nino Ramirez. With access to a lab where they can do things like test different factors on slices of mouse-brain tissue, Ramirez’s team is examining which avenues hold up when researched further. Promising work will be published in medical and data-science journals with a view to influencing clinical practice.
“The processing power in the cloud, the visualization capabilities and the ability to take data science algorithms at scale and be able to at lightning speed look at correlations — there’s no way in God’s green earth you could have done that 15 years ago, and even if you could, it would have been massive IBM mainframes all over the place and you’d be waiting for the output,” Kahan says.
Juan Miguel Lavista, a principal data scientist who works for Kahan, was the parent of a week-old baby girl when he walked into Kahan’s office in 2013 and asked about the picture of the baby on his desk. Lavista assumed it was one of Kahan’s daughters until Kahan told him about Aaron. Now Lavista is leading the SIDS project, along with people like Urszula Chajewska, who earlier in her career used machine learning to help ferret out malfunctioning equipment in Intel chip-fabrication plants.
Normally companies like Microsoft use these tools to optimize sales or track their businesses, but they are equally useful finding breakthroughs in health care. “The work we do at Microsoft is very different from the work on SIDS, but from a data-science perspective, it’s not any different,” Lavista says.
Each year in the U.S., six in 1,000 children die in their first year of life, Kahan says, and one of those six dies from unexplained causes.
SIDS is not one condition but rather a confluence of factors that make some babies more vulnerable during a critical period of development, Ramirez says. Finding the factors that combine in these cases, or identifying which babies are most at risk, can help doctors and parents alter risk factors and more closely monitor babies.
Conventional SIDS studies typically comprise only a few hundred cases. By contrast, the Microsoft team has the ability to mine massive data sets collected by the U.S. Centers for Disease Control and Prevention and look for correlations that would be hard to see across a smaller pool of families. It’s an approach that machine-learning and artificial-intelligence experts have already been applying to the treatment of cancer and other diseases.
The CDC database has 90 columns of information on every child born in the U.S. between 2004 and 2010—29 million records in all. It notes the medical care of the mother during pregnancy, race, education, income and other factors. When a baby dies, that information is captured, too. The Microsoft data scientists created an interactive web showing the relationship between every single variable related to babies and parents and the correlation to SIDS.
One discovery is that women who access prenatal care in their first trimester have a lower-than-average risk of giving birth to a baby that dies of SIDS. Starting prenatal care later than that increases the risk by 30 to 40 percent. The reason may not be the medical care alone, Chajewska says. Rather it may be that the doctor visit serves to persuade pregnant women to do things like quit smoking or take vitamins. But the data help policymakers more precisely weigh the cost of things like free prenatal care against the impact.
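A figure like "30 to 40 percent higher risk" is typically a relative risk: the death rate in one group divided by the rate in a comparison group. The sketch below shows that arithmetic in Python. It is only an illustration, not the Microsoft team's method; the function name and all counts are hypothetical and are not drawn from the CDC data.

```python
def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Ratio of the death rate in the exposed group to the rate in the comparison group."""
    rate_exposed = cases_exposed / total_exposed
    rate_unexposed = cases_unexposed / total_unexposed
    return rate_exposed / rate_unexposed


# Hypothetical counts, invented for illustration only (not CDC figures).
late_care_deaths, late_care_births = 135, 100_000    # prenatal care began after the first trimester
early_care_deaths, early_care_births = 100, 100_000  # prenatal care began in the first trimester

rr = relative_risk(late_care_deaths, late_care_births,
                   early_care_deaths, early_care_births)
print(f"Relative risk: {rr:.2f}")
```

With these made-up numbers the ratio works out to 1.35, which would be reported as a 35 percent higher risk. As Chajewska's caveat suggests, such a ratio describes a correlation and does not by itself show that the prenatal care caused the difference.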
“You can go to the politicians and say this is how many more kids are dying because of this. Now suddenly it’s real,” says Ramirez, who directs the hospital’s Center for Integrative Brain Research.
Ultimately, Ramirez wants to create an online work sheet on pregnant women that doctors can fill out to get a view into each patient’s risk factors for SIDS. Right now, the risks discussed with patients are much more general and based on things like race and age; what is lacking is a way to combine many disparate pieces of information into a richer view of probabilities.
“The (SIDS) field is not that big, and it started with pediatricians, but none of them have a background in data science,” Ramirez says. “They have their own databases and they try to bring things together but the professional look at the data doesn’t exist. It exists in genetics and cancer research and it has transformed the field. What we are starting here has transformed a little the SIDS field.”