AWS applies machine learning to healthcare data

  • December 14, 2020
  • Steve Rogerson

Amazon Web Services (AWS) has announced HealthLake, a Hipaa-eligible service to help healthcare and life sciences organisations make better use of data.

HealthLake aggregates an organisation’s complete data across various silos and disparate formats into a centralised AWS data lake and automatically normalises this information using machine learning.

The service identifies each piece of clinical information, tags and indexes events in a timeline view with standard labels so it can be easily searched, and structures all the data into the Fast Healthcare Interoperability Resources (FHIR) standard format for a complete view of the health of individual patients and entire populations.
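As a rough illustration of what a timeline view over FHIR-style data could look like, the sketch below sorts a handful of hand-written resources for one patient into chronological order. The resource types and date fields (`onsetDateTime`, `authoredOn`, `effectiveDateTime`) follow the FHIR R4 naming as far as we know, but the patients, values and helper functions are invented for this example and are not HealthLake's own API or internal method.

```python
# Illustrative FHIR-style resources; the patients and values are made up.
RESOURCES = [
    {"resourceType": "MedicationRequest", "subject": {"reference": "Patient/p1"},
     "medicationCodeableConcept": {"text": "Atorvastatin"}, "authoredOn": "2019-06-10"},
    {"resourceType": "Condition", "subject": {"reference": "Patient/p1"},
     "code": {"text": "High cholesterol"}, "onsetDateTime": "2019-03-02"},
    {"resourceType": "Observation", "subject": {"reference": "Patient/p1"},
     "code": {"text": "Blood pressure"}, "effectiveDateTime": "2020-01-15"},
    {"resourceType": "Condition", "subject": {"reference": "Patient/p2"},
     "code": {"text": "Asthma"}, "onsetDateTime": "2018-11-20"},
]

# Which field carries the event date for each resource type (assumed subset).
DATE_FIELD = {
    "Condition": "onsetDateTime",
    "MedicationRequest": "authoredOn",
    "Observation": "effectiveDateTime",
}


def event_date(resource):
    """Return the ISO date string that places this resource on the timeline."""
    return resource[DATE_FIELD[resource["resourceType"]]]


def label(resource):
    """Return the human-readable label of the coded concept."""
    concept = resource.get("code") or resource.get("medicationCodeableConcept")
    return concept["text"]


def timeline(resources, patient_ref):
    """List one patient's events as (date, type, label), oldest first."""
    own = [r for r in resources if r["subject"]["reference"] == patient_ref]
    # ISO dates in the same YYYY-MM-DD form sort correctly as plain strings.
    ordered = sorted(own, key=event_date)
    return [(event_date(r), r["resourceType"], label(r)) for r in ordered]


for entry in timeline(RESOURCES, "Patient/p1"):
    print(entry)
```

Because every resource points to its patient and carries a dated, labelled concept, the same records can be filtered per patient or pooled across a population.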

As a result, HealthLake makes it easier to query, perform analytics and run machine learning to derive meaningful value from the newly normalised data. Healthcare systems, pharmaceutical companies, clinical researchers, health insurers and others can use HealthLake to spot trends and anomalies in health data so they can make more precise predictions about matters such as the progression of disease, the efficacy of clinical trials and the accuracy of insurance premiums.

As machine learning becomes more mainstream, companies across every vertical are trying to apply it to their data to deliver meaningful business value. Healthcare is applying machine learning to improve operations and patient care, with AWS users such as 3M, Anthem, AstraZeneca, Bristol Myers Squibb, Cerner, Fred Hutchinson Cancer Research Center, GE Healthcare, Infor, Pfizer and Philips embracing the cloud and machine learning to get more value out of their data troves.

From family history and clinical observations to diagnoses and medications, healthcare organisations are creating huge volumes of patient information every day with the goal of getting a full view of a patient’s health and applying analytics and machine learning to improve care, analyse population health trends and improve operational efficiency. However, clinical data are complex and notorious for being siloed, incomplete, incompatible, and stored in on-premises systems spread across multiple locations.

Getting all this information aggregated and in the FHIR format is a start towards the goal of standardising structured data, but most data remain unstructured and still need to be tagged, indexed and structured in chronological order to make them understandable and queryable.

Some healthcare organisations build rule-based tools to automate the process of transforming unstructured data – such as medical histories, physician notes and medical imaging reports – and tagging clinical information such as diagnoses, medications and procedures, but these often fail because the data need to be normalised across disparate systems and because the tools can’t account for every possible variation in spelling, typos and grammatical errors.
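The brittleness of exact-match rules can be shown in a few lines. In the sketch below, a tiny hypothetical medication list is matched against words from a physician note: the exact rule misses a one-letter typo, while a simple string-similarity fallback from Python's standard `difflib` module recovers it. The similarity fallback is only a stand-in to show the gap; it is not the machine-learning approach HealthLake uses, and the drug list and cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical, deliberately tiny vocabulary of medication names.
KNOWN_MEDICATIONS = ["metformin", "lisinopril", "atorvastatin"]


def exact_tag(word):
    """Rule-based tagging: succeeds only on a perfect spelling match."""
    word = word.lower()
    return word if word in KNOWN_MEDICATIONS else None


def fuzzy_tag(word, cutoff=0.8):
    """Similarity-based tagging: tolerates small spelling variations."""
    matches = difflib.get_close_matches(
        word.lower(), KNOWN_MEDICATIONS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


note_words = ["Metformin", "metformn", "headache"]
for w in note_words:
    print(w, "exact:", exact_tag(w), "fuzzy:", fuzzy_tag(w))
```

Even this fallback only handles near-miss spellings; abbreviations, brand names and context-dependent terms would still slip through, which is the gap that models trained on medical terminology aim to close.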

Other organisations use general-purpose optical character recognition (OCR) software to process data sources, but these tools lack the medical expertise to be effective and so organisations resort to manual data entry by medical professionals, which adds expense to the digitisation process.

Even if organisations can aggregate and structure their data, they still need to build their own analytics and machine-learning applications to uncover relationships in the data, discover trends and make precise predictions. The cost and operational complexity of doing all this work are prohibitive to most organisations and, as a result, the vast majority end up missing out on the untapped potential to use their data to improve the health of patients and communities.

HealthLake offers medical providers, health insurers and pharmaceutical companies a service that brings together and makes sense of all their patient data, so healthcare organisations can make more precise predictions about the health of patients and populations. The Hipaa-eligible service lets organisations store, tag, index, standardise, query and apply machine learning to analyse data at petabyte scale in the cloud.

HealthLake lets organisations copy health data from on-premises systems to a secure data lake in the cloud and normalise every patient record across disparate formats automatically. Upon ingestion, it uses machine learning trained to understand medical terminology to identify and tag each piece of clinical information, index events into a timeline view, and enrich the data with standard labels such as medications, conditions, diagnoses and procedures so all this information can be easily searched.

For example, organisations can quickly and accurately find answers to questions such as: “How has the use of cholesterol-lowering medications helped our patients with high blood pressure last year?” To do this, users can create a list of patients by selecting “high cholesterol” from a standard list of medical conditions, “oral drugs” from a menu of treatments, and blood pressure values from the “blood pressure” structured field, and then they can further refine the list by choosing attributes such as time frame, gender and age.
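The cohort-building steps described above can be sketched as a plain filter over patient records. This is a minimal illustration under stated assumptions: the `PatientRecord` shape, the `build_cohort` function and the sample patients are all hypothetical and do not reflect HealthLake's actual query interface.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set, Tuple


@dataclass
class PatientRecord:
    """Hypothetical, simplified patient record used only for this sketch."""
    patient_id: str
    gender: str
    age: int
    conditions: Set[str]
    treatments: Set[str]
    # Each reading is (date taken, systolic, diastolic).
    blood_pressure: List[Tuple[date, int, int]]


def build_cohort(records, condition, treatment, start, end,
                 min_age=0, max_age=200, gender: Optional[str] = None):
    """Return (patient_id, readings) for patients matching every filter."""
    cohort = []
    for r in records:
        if condition not in r.conditions or treatment not in r.treatments:
            continue
        if not (min_age <= r.age <= max_age):
            continue
        if gender is not None and r.gender != gender:
            continue
        # Keep only blood pressure readings inside the chosen time frame.
        readings = [bp for bp in r.blood_pressure if start <= bp[0] <= end]
        if readings:
            cohort.append((r.patient_id, readings))
    return cohort


RECORDS = [
    PatientRecord("p1", "female", 61, {"high cholesterol"}, {"oral drugs"},
                  [(date(2019, 5, 1), 150, 95), (date(2020, 2, 1), 140, 90)]),
    PatientRecord("p2", "male", 48, {"high cholesterol"}, {"oral drugs"},
                  [(date(2018, 7, 1), 135, 85)]),
    PatientRecord("p3", "female", 70, {"asthma"}, {"inhaler"},
                  [(date(2019, 9, 1), 120, 80)]),
]

cohort_2019 = build_cohort(RECORDS, "high cholesterol", "oral drugs",
                           date(2019, 1, 1), date(2019, 12, 31))
print(cohort_2019)
```

A filter like this only works once conditions, treatments and measurements carry consistent labels, which is why the normalisation and tagging step comes first.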

Because HealthLake also automatically structures all of a healthcare organisation’s data into the industry-standard FHIR format, the information can be easily and securely shared between health systems and with third-party applications, enabling providers to collaborate more effectively and allowing patients unfettered access to their medical information.

“There has been an explosion of digitised health data in recent years with the advent of electronic medical records, but organisations are telling us that unlocking the value from this information using technology like machine learning is still challenging and riddled with barriers,” said Swami Sivasubramanian, AWS vice president. “With Amazon HealthLake, healthcare organisations can reduce the time it takes to transform health data in the cloud from weeks to minutes so it can be analysed securely, even at petabyte scale. This completely reinvents what’s possible with healthcare and brings us that much closer to everyone’s goal of providing patients with more personalised and predictive treatment for individuals and across entire populations.”

By aggregating, labelling, indexing and structuring all data, HealthLake makes it easier for users to query, analyse and use machine learning to make sense of their data. They can use other AWS analytics and machine-learning services with HealthLake, such as QuickSight for interactive dashboards and SageMaker for building, training and deploying custom machine-learning models.

For example, healthcare organisations can use Jupyter Notebook templates in SageMaker to run analysis for common tasks such as diagnosis predictions, hospital re-admittance probability and operating room use forecasts. Healthcare and life science organisations can use HealthLake to get a complete view of patient and population health, derive insights using analytics and machine learning, and discover previously obscured relationships and trends.