Big Data Sets May Be the Key for Precision Health

Imagine it’s the year 2040. You open up your laptop and sign in to a virtual doctor’s office for your biannual telehealth physical. You step on a scale that transmits your weight and BMI directly to your doctor. Your fitness tracker tells your doctor about your physical activity and heart rate. Your doctor also has data from your medical record, including your genetic information and last year’s CT scan. Algorithms analyze all this information and alert your doctor that you are highly likely to develop heart disease in the next few months. Your doctor spends the next half hour helping you plan healthy changes to avoid getting sick.

This is one vision of precision health, a strategy that aims to radically personalize health care. It combines many kinds of information about a person (such as their genetics, medical history, and lifestyle) to help doctors diagnose and prevent disease.

Big data is critical to realizing the vision of precision health.

To realize this vision, however, we have a lot of work to do. To personalize medical treatment, we need to gather and analyze a ton of information. Why? Because before we use information from one individual to predict what sort of treatment they need, we need to analyze information from lots of different people to find the patterns that will help us make those predictions.

One of the best ways to find new patterns in information is by compiling massive sets of data, then examining them for trends. These large sets of information from lots of different sources are called “big data.” If we could gather and understand big data about individuals and their health, we could do a better job of treating them when they do get sick or help keep them from getting sick in the first place.

Doctors could use big data to diagnose diseases more precisely.

One way big data could help us personalize health care is through disease sub-typing. Rebecca Boyles is a bioinformatics scientist at RTI International, a nonprofit research organization headquartered in Research Triangle Park. She explains that “there are a whole lot of diseases that up until now have been characterized as kind of one diagnosis. That’s because of the phenotypes or symptomology of the disease. But if you can, from a data point of view, data mine that disease and tease apart what actually may be more complex than that, whether it be breast cancer or chronic obstructive pulmonary disorder or whatever it may be, it may actually be not one single disease despite the fact that it shares a common symptomology.”

Disease sub-typing is incredibly valuable for doctors and patients looking to manage a disease. It can also help researchers and pharma companies develop better treatments.
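To make the idea of "teasing apart" one diagnosis a little more concrete, here is a minimal sketch in Python. It is not the method Boyles or RTI uses. It simply assumes a handful of made-up patients who share a diagnosis, gives each one two hypothetical biomarker measurements, and uses a tiny version of a standard technique called k-means clustering to see whether they fall into distinct groups. The patient IDs, the measurements, and the choice of two groups are all assumptions for illustration.

```python
import math

# Made-up patients who all share one diagnosis.
# Each tuple holds two hypothetical biomarker measurements.
patients = {
    "p1": (1.0, 1.2),
    "p2": (0.8, 1.0),
    "p3": (1.1, 0.9),
    "p4": (4.0, 4.2),
    "p5": (4.2, 3.9),
    "p6": (3.8, 4.1),
}


def kmeans(points, k, iterations=20):
    """Tiny k-means (k >= 2): assign each point to its nearest centroid."""
    n = len(points)
    # Deterministic start: pick starting centroids spread across the input order.
    centroids = [points[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Step 1: label each point with the index of the closest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the average of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


subtype_labels = kmeans(list(patients.values()), k=2)
for patient_id, label in zip(patients, subtype_labels):
    print(patient_id, "-> candidate subtype", label)
```

In this toy data the patients split cleanly into two groups, which is the kind of signal that might prompt researchers to ask whether a single diagnosis hides more than one disease. Real studies work with far more patients and measurements, and any candidate subtype would need clinical validation.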

To get to these insights, we need data, and lots of it. But where does that data come from?

Where does healthcare data come from? Major sources are medical records, biomedical research, and Internet-of-things devices.

Healthcare data comes from a lot of different places. Three major sources are:

  • Medical records – these contain clinical data from specific patients, including diagnoses, prescription medicine lists, medical images, and results from lab tests
  • Biomedical research studies – in this case, patients consent to scientists collecting their data for research. The data is de-identified to make sure that patient privacy is protected
  • Internet-of-things devices – these are typically pieces of technology worn by patients that transmit data on specific activities. Examples include Fitbits and other activity trackers, heart-rate monitors, and continuous glucose monitors
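What does it mean for research data to be de-identified? As a rough illustration only, the Python sketch below removes fields that directly identify a person and swaps the patient ID for a scrambled stand-in. The field names, the sample record, and the `deidentify` helper are all hypothetical. Real de-identification is considerably more involved, since details like dates, zip codes, and rare conditions can also point back to an individual.

```python
import hashlib

# Hypothetical field names that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(record, salt):
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    # A salted hash gives the same pseudonym for the same patient every time,
    # so records can still be linked without exposing the original ID.
    digest = hashlib.sha256((salt + record["patient_id"]).encode("utf-8"))
    pseudo_id = digest.hexdigest()[:12]
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudo_id"] = pseudo_id
    return cleaned


# A made-up record for illustration.
record = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "diagnosis": "type 2 diabetes",
    "hba1c": 7.2,
}

# In practice the salt would be kept secret; this one is a placeholder.
research_record = deidentify(record, salt="study-secret")
print(research_record)
```

The clinical details a researcher needs (here, the diagnosis and a lab value) survive, while the name and contact details do not.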

There is tension when balancing data access and patient privacy.

Each type of healthcare data has pros and cons for use in research. At the center of these issues, however, is the need to balance data access with a patient’s right to privacy.

For example, biomedical research studies produce some of the most useful and accurate sets of data. However, it’s challenging to convince people to let researchers use their data for future studies, especially when those studies may not have been designed yet.

Boyles believes this is a crucial barrier to overcome. “[The] matter of getting people to be willing to participate in studies in a way that consent is applicable to data re-use; I feel very strongly about this,” she says. “It’s not to undermine an individual’s ability to determine their own consent because I think that’s paramount. But to make that computationally understandable. I feel like there’s been a lack of understanding about how much that has hindered the biological advancement of research.”

Unfortunately, some companies are not good stewards of patient data.

While researchers are held to strict ethical standards for data use, some companies ignore such constraints. This is particularly problematic when people don’t even realize someone else is using their data.

For example, in 2019 The Washington Post reported that the pregnancy tracking app Ovia shared sensitive medical information with employers. Later that year, Project Nightingale, a semi-secret collaboration between Google and healthcare provider Ascension, made headlines. We learned that Google had been given access to patient medical records without those patients knowing or consenting to the data sharing.

Stories like these understandably make people wary of giving anyone access to their data. They also highlight the need for updated regulations around data privacy that reflect modern priorities and new technology.

Huge potential for future insights, with North Carolina playing a pivotal role.

Despite the challenges, there is good reason to be optimistic about the future of big data in health care. With advances in technologies like medical imaging, new sources of high-quality data can give researchers new insights into how and why people get sick. Says Boyles, “I personally believe medical imaging is going to be the next genomics in terms of big data to challenge biomedical research. I think there’s a whole range that can be brought into play there, and where the innovations that some of my colleagues and myself at RTI work on is aligning those in disease and adverse outcome pathways so we can begin to understand where the genes have a role, where the disease and pathologies have a role, and where we may begin to see an improvement in human health outcome.”

With so much biomedical prowess in North Carolina, there’s a lot to keep track of. To learn more about trends driving innovation in NC, check out our explainers on next generation sequencing and biotechnology.