Researchers use AI to merge health data
- October 15, 2024
- Steve Rogerson

Researchers at the University of Illinois Chicago are using artificial intelligence (AI) to merge health data with input from nurses and therapists.
The interdisciplinary project will use AI to unify data from a broader range of health professions and create novel, holistic datasets that could transform healthcare, driving discoveries that improve patient outcomes and care.
The collaboration, which includes the University of Iowa, the University of Missouri and Loyola University, along with technical partners Microsoft and Tackle AI, received up to $10m from the federal Advanced Research Projects Agency for Health, or Arpa-H (arpa-h.gov). The award is the first Arpa-H funding received by UIC, which will serve as the contracting institution.
Researchers aim to combine structured data and free-text notes from nurses, physical and occupational therapists, speech and language pathologists, and physicians for more effective use in electronic health records. These notes often provide additional, valuable information about a patient’s progress, particularly as their care moves outside a hospital or clinic.
The project will focus on two complex patient populations: those who have experienced injuries related to a fall and infants transitioning from the neonatal intensive care unit (NICU) to home. Both populations rely on the care provided by a variety of health professionals.
“Healthcare is an interdisciplinary process, but existing data tools and infrastructure ignore most of the team,” said Andrew Boyd, professor of biomedical and health information sciences at UIC (www.uic.edu). “Other professions see patients more frequently and provide very high-fidelity data that get closer to the reality of the patient, instead of just the brief snapshots in time that you get from data documented by physicians.”
Researchers will apply computational methods to the novel datasets to create all-team care summaries and AI applications. They will also use the data to make scientific discoveries that will improve care and treatment for patients.
“Falls and NICU patients require all-team care while in the hospital and via outpatient clinics,” said Catherine Craven, biomedical informatician at the University of Missouri School of Medicine (medicine.missouri.edu). “But fragmented, siloed documentation impedes communication. By unifying these data, we can improve communication between healthcare providers, the patient and their care partners and generate novel scientific insights that improve patient outcomes.”
These advances could also be applied to other care domains in addition to falls and NICU transitions, said Karen Dunn Lopez, professor of nursing at the University of Iowa (uiowa.edu). “When you address complex difficult problems, the insights you gain and solutions you develop will likely be applicable to less complex problems,” Lopez said. “Our team’s work will help us understand how to guide patient-centred decision making about the synergy of care provided by a multidisciplinary team.”
Much of AI’s promise for healthcare lies in its potential to extract insights automatically from electronic health record data. An algorithm may suggest a diagnosis based on symptoms or lab results, or match patients with the treatment that will be most effective for their case.
More data can lead to better AI guidance. Research has shown that adding nurses’ observations to patient data can produce more accurate predictions, on measures such as the risk of dying in hospital, than physician notes and lab results alone.
The value of multidisciplinary data is particularly clear for managing adult fall injuries, a complex area of healthcare. Falls are difficult to prevent and can lead to multiple negative health outcomes in older adults.
The top predictor of fall risk is the number of previous falls, but patients may not tell their physicians about all their falls. Reports on falls from emergency-room visits or outpatient therapy sessions may be overlooked in the flood of information in a patient’s health record.
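As a rough illustration of why merging sources matters for this predictor, the minimal Python sketch below counts previous falls per patient from physician records alone and then from all sources together. The record layout, source names and patient identifiers are hypothetical and are not taken from the project, and treating same-day reports as a single fall is an assumption made only for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fall reports from different sources; not real project data.
REPORTS = [
    {"patient": "p1", "source": "physician", "fall_date": date(2024, 1, 5)},
    {"patient": "p1", "source": "emergency_room", "fall_date": date(2024, 3, 2)},
    {"patient": "p1", "source": "physical_therapy", "fall_date": date(2024, 3, 2)},
    {"patient": "p1", "source": "physical_therapy", "fall_date": date(2024, 6, 20)},
    {"patient": "p2", "source": "occupational_therapy", "fall_date": date(2024, 2, 11)},
]


def count_falls(reports, sources=None):
    """Count distinct fall dates per patient, optionally limited to some sources.

    Assumption: two reports with the same patient and date describe one fall.
    """
    dates_by_patient = defaultdict(set)
    for report in reports:
        if sources is None or report["source"] in sources:
            dates_by_patient[report["patient"]].add(report["fall_date"])
    return {patient: len(dates) for patient, dates in dates_by_patient.items()}


physician_only = count_falls(REPORTS, sources={"physician"})
all_sources = count_falls(REPORTS)

print("Physician records only:", physician_only)
print("All sources combined:  ", all_sources)
```

In this toy data the physician record shows one fall for patient p1, while the combined view shows three and also surfaces a second patient who does not appear in the physician data at all.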
Physical and occupational therapists also collect detailed information relevant to fall risk, such as strength and balance assessments. Because these reports are often subjective and text-based, they are hard to combine with physician notes or numerical data such as test results.
“Data are gold, but until they can be used, they are meaningless,” said Tanvi Bhatt, professor of physical therapy and rehabilitation sciences at UIC. “The text-based notes that we have are more narrative and descriptive, compared to lab measures. But if that text is lost, there is no continuum of care.”
Unifying these data with other sources could help clinicians identify the cause of a patient’s falls and match the patient with the most appropriate interventions to prevent future injuries. It could also help researchers design and test prediction models of fall risk and share those insights with patients in clear language.
Incorporating these data will also help involve the patient in healthcare decisions, according to Mary Khetani, professor of occupational therapy and rehabilitation sciences at UIC. The narrative notes taken by physical and occupational therapists often come directly from interviews with a patient and their family. Organising the data to share with patients and their caregivers can help them feel more informed and engaged as they navigate multiple healthcare services outside the hospital.
“We know that best practice is centring the expertise of the patient and family in decision making to drive the best outcomes and get their buy-in and adherence,” Khetani said. “But we can’t do that if we overload them with information.”
Computer scientists on the project will use and develop text-mining and language-processing tools to overcome the linguistic and technical hurdles that prevent the integration of data from other disciplines. The research will test whether large language models can be trained to help understand and connect text data across professions.
“Medical data are unique in many ways, one being that they tend to include jargon and other terms that don’t appear commonly in more popular online sources,” said Natalie Parde, associate professor of computer science at UIC. “Language processing tools tend not to work as well when applied to healthcare data. A central technical challenge in this grant is getting these tools and technologies to the point where we can use them reliably in a healthcare setting.”
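The jargon problem can be made concrete with a small Python sketch that expands clinical abbreviations in a free-text note before any further processing. This is a simple dictionary lookup, far cruder than the language models the project plans to test, and it is not the team's method. The glossary entries, the `expand_abbreviations` function and the sample note are all hypothetical.

```python
import re

# Hypothetical glossary; a real one would come from health-domain experts.
GLOSSARY = {
    "PT": "physical therapy",
    "OT": "occupational therapy",
    "SLP": "speech-language pathologist",
    "NICU": "neonatal intensive care unit",
}

# Match whole words only, trying longer abbreviations first.
# Matching is case-sensitive, so lowercase "pt" (often "patient") is left alone.
_ALTERNATIVES = "|".join(
    re.escape(abbr) for abbr in sorted(GLOSSARY, key=len, reverse=True)
)
_PATTERN = re.compile(r"\b(" + _ALTERNATIVES + r")\b")


def expand_abbreviations(note: str) -> str:
    """Replace each known abbreviation in the note with its long form."""
    return _PATTERN.sub(lambda match: GLOSSARY[match.group(1)], note)


note = "Infant discharged from NICU; OT and SLP follow-up weekly."
expanded = expand_abbreviations(note)
print(expanded)
```

A lookup like this breaks down quickly, since the same abbreviation can mean different things in different professions, which is part of why more capable language tools are needed.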
Once integrated, the data from nurses, rehabilitation therapists and other health professionals can help train more detailed models to predict health risks or treatment effectiveness. AI tools can also generate concise summaries of large amounts of text and data.
For example, a primary care provider may get a synopsis based on their patient’s weekly physical and speech therapy visits. Or the parents of a premature infant could receive a summary of the nursing and rehabilitative therapies delivered in the NICU, to help them transition to follow-up care in a clinic or a natural environment such as their home.
“It’s not just a question of translating into lay language, it’s really a question of understanding what’s important to present to the patient or their provider,” said Barbara Di Eugenio, professor of computer science at UIC.
Through hackathons and other activities using deidentified data, the team will also invite data scientists and software developers to create additional clinical and research applications. All tools developed by the project will be open source and built with input and feedback from health-domain experts.
Other UIC team members on the project include Samantha Bond of the College of Applied Health Sciences, Miiri Kotche from the College of Engineering and David Chestek of the College of Medicine.