Integrative Molecular Diagnosis — A FHIR Integration Approach

Steve Munini
Helios Software
Jul 12, 2021 · 10 min read

What A Difference A Day Makes¹ is the story of Michelle, the 2-year-old daughter of Christina and Joe Lowry. The lump on Michelle’s neck was initially thought to be a bug bite, or a swollen lymph node resulting from an infection. When antibiotics didn’t work, a biopsy revealed that it was, in fact, a sarcoma.

The team at Children’s Hospital Los Angeles quickly ruled out radiation and surgery because of the location of the aggressive tumor, which had grown rapidly in a matter of days and was now blocking 50 percent of Michelle’s airway. The only path forward was some type of chemotherapy. The team knew the type of cancer Michelle had, and they knew they could use OncoKids®, a unique next-generation, sequence-based panel, to identify the genetic alterations of Michelle’s tumor.

Within 24 hours of taking a sample of the tumor, the team understood the gene mutation and turned to rapidly identifying a drug target. Michelle was given an oral drug called larotrectinib. She responded to the medication within 24 hours and was shortly able to go home with her family.

The Power of Integration

Michelle’s successful outcome is the result of the rapid integration of complex healthcare data. Organizing that data and presenting it coherently to clinicians is an orchestration challenge. Just think about the different departments within the hospital that needed to collaborate rapidly to save Michelle’s life:

  • Clinicians working directly with the patient
  • MRIs performed by the Radiology department
  • Molecular data from the Center for Personalized Medicine
  • Slide data originating in the Pathology department and from the Bio Repository department

It’s unrealistic to think that one individual would understand all of these specialties, their systems, and data, understand the patient’s cancer, and understand the latest treatment options available. A team approach must be employed.

In recent years, integrative reports have been produced to assist clinicians in their diagnostic work and treatment selection. However, more can be done. Clinicians are now asking:

Can we further accelerate our diagnostic and treatment selection pipelines, especially for cases like Michelle’s which are time-critical?

FHIR as an Architecture for Integrative Molecular Diagnosis

Clinical researchers are asking next-generation machine learning and analytic questions to help accelerate our understanding of the massive amounts of data that emanate from modern imaging, slide data, and genomic systems.

Questions like:

  • Can we predict the cancer type from imaging data?
  • Can we predict the gene of interest?
  • Can we predict the mutation of interest?
  • Can we predict outcome?
  • Can we determine the best therapeutic/pharmacological solution?
  • How rapidly and safely can we answer these questions?

Numerous sources of data must be rapidly accessed, integrated, and analyzed, often using different analysis techniques and technologies. Here is a partial list of data sources that may need to be accessed:

  • EMR patient and encounter information, plus prior and preliminary test results.
  • Pathology system slide data for tumor/tissue resection, and molecular reports for genetic testing including cytogenetics, clinical microarrays, and next-generation sequencing tests.
  • Electronic Slide Management system
  • MRI data
  • Molecular data including variant calling files and phenotype/clinical indications
  • Oncology database of pharmacogenomics targets
  • Oncology database of chemotherapy regimens

Assembling a complete, optimal treatment selection pipeline is well beyond the scope of this article. Instead, we explore how FHIR and the Helios FHIR Server may be used to provide an architecture for integrative molecular diagnosis.

https://blog.heliossoftware.com/fhir-architectural-patterns-ae828b13d40c

Clearly, the integration challenges in this problem set are abundant:

  • We need to work with many different kinds and types of healthcare data.
  • We need to rapidly integrate the data into a single, unified data model and data environment.
  • We need to rapidly apply advanced analytics to the problem at hand.

FHIR and the Helios FHIR Server’s implementation of the standard are ideally suited to solving this problem.

  • FHIR provides a standards-based domain model. Everyone on the analytics team can think and speak in terms of FHIR Resources.
  • The FHIR Standard has built-in extensibility, enabling us to easily describe nonstandard data elements.
  • Support for FHIR by EMRs is growing rapidly, and with new bulk data API support becoming available, we will have access to even more FHIR data.
  • The Helios FHIR Server is a FHIR-native healthcare analytics solution, which uses FHIR as its data model and exposes it for analytic use cases.
  • The Helios FHIR Server enables analytics professionals and developers to use the tools they prefer.

This article explores two areas of integrative molecular diagnosis:

  • Identifying Cancer from Slide Images
  • Rapidly Summarizing Complex Textual Reports with NLP

We use FHIR as an integrative technology to characterize and describe these complex healthcare data types as well as store our resulting computed analysis.

Identifying Cancer from Slide Images

Let’s dive into one part of the overall pipeline: identifying cancer from slide images. In the example below, we explore how a research team might identify cancer using machine learning and the Helios FHIR Server. It is important to note that this example is not a production system; it is a showcase of how a FHIR architecture may be implemented.

Below is a step-by-step guide using the Helios FHIR Server, Cassandra, Spark and Python. The overall program flow includes:

  • Load required libraries
  • Load train and test image data, and train the model using Databricks’ Machine Learning package.
  • Search for a patient with last name Chalmers, and navigate the FHIR Resource relationships to collect the patient’s recent Media FHIR Resources containing images to test.
  • Perform the image analysis.
  • Create a new RiskAssessment FHIR Resource to store the results of the image analysis including its prediction average.

Figure: FHIR Resources used in this example.

Image Analysis Code Walkthrough

The example code below is from the following Jupyter project file:

https://github.com/HeliosSoftware/pyspark-analytics/blob/master/cancer-prediction-full.ipynb

First, we load the train and test data into a dataframe.

Next, we prepare the data pipeline. We use the DeepImageFeaturizer, which relies on TensorFlow to turn an image file into a machine-understandable data structure.

Here, we train the model and then use it to transform the image data, obtaining positive and negative predictions.

Using the Prediction Model

Now that we have trained the model (p_model), we will use it in “production” to predict positive/negative image data for a patient with the last name of Chalmers.

Here, the Python code is performing a series of FHIR queries against the Helios FHIR Server to obtain the necessary image data.
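
As a rough illustration of this kind of query chain, here is a minimal sketch that uses only the Python standard library. The base URL, the helper names, and the choice of search parameters (`family` on Patient, `patient` on Media) are assumptions made for this example; the notebook itself may navigate through Encounter and DiagnosticReport in a different order.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; replace with the base URL of your own FHIR server.
FHIR_BASE = "http://localhost:8181/fhir"


def search_url(resource_type, **params):
    """Build a FHIR search URL such as [base]/Patient?family=Chalmers."""
    return f"{FHIR_BASE}/{resource_type}?{urlencode(params)}"


def fetch(url):
    """GET a FHIR resource or Bundle and parse the JSON response."""
    request = Request(url, headers={"Accept": "application/fhir+json"})
    with urlopen(request) as response:
        return json.load(response)


def resources_in(bundle):
    """Return the resources carried by a FHIR searchset Bundle."""
    return [entry["resource"] for entry in bundle.get("entry", [])]


def image_bytes(media):
    """Decode the inline base64 image of a Media resource, if it has one."""
    data = media.get("content", {}).get("data")
    return base64.b64decode(data) if data else None
```

Against a running server, the chain might then read: `patients = resources_in(fetch(search_url("Patient", family="Chalmers")))`, followed by `media = resources_in(fetch(search_url("Media", patient=patients[0]["id"])))`, and finally `image_bytes(m)` for each returned Media resource.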

Next, we use the previously trained model to perform the prediction.

Next, we obtain a prediction average as an expression of prediction confidence:
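
A minimal sketch of that averaging step, assuming each image yields a prediction of 1.0 (positive) or 0.0 (negative). The function name and the sample values are hypothetical and are not taken from the notebook.

```python
from statistics import mean


def prediction_average(predictions):
    """Mean of per-image predictions (1.0 = positive, 0.0 = negative)."""
    if not predictions:
        raise ValueError("no predictions to average")
    return mean(predictions)


# Hypothetical per-image outputs for one patient: three positive, one negative.
example_predictions = [1.0, 1.0, 0.0, 1.0]
average = prediction_average(example_predictions)  # 0.75
```

An average close to 1.0 means most of the patient's images were classified as positive; an average close to 0.0 means most were classified as negative.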

Finally, we create a new RiskAssessment FHIR Resource to store our prediction, linking the Patient, Encounter and DiagnosticReport using FHIR References (see “References Between Resources” in this article for more information on how references work in FHIR).
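
For readers who have not seen a RiskAssessment before, the sketch below assembles one as a plain Python dictionary. It assumes FHIR R4 element names, stores the prediction average as `probabilityDecimal`, and uses `basis` for the DiagnosticReport link; the function name, the ids, and the "Malignant" outcome text are hypothetical, and the notebook may populate the resource differently.

```python
import json


def build_risk_assessment(patient_id, encounter_id, report_id, probability):
    """Assemble a minimal RiskAssessment as a dict (FHIR R4 element names)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return {
        "resourceType": "RiskAssessment",
        "status": "final",
        # Each link is a FHIR Reference of the form "<ResourceType>/<id>".
        "subject": {"reference": f"Patient/{patient_id}"},
        "encounter": {"reference": f"Encounter/{encounter_id}"},
        "basis": [{"reference": f"DiagnosticReport/{report_id}"}],
        "prediction": [
            {
                "outcome": {"text": "Malignant"},  # hypothetical outcome label
                "probabilityDecimal": probability,
            }
        ],
    }


# Hypothetical ids; in practice these come from the earlier FHIR queries.
risk = build_risk_assessment("pat-1", "enc-1", "rep-1", 0.75)
payload = json.dumps(risk, indent=2)
```

The resulting `payload` is what would be sent in the body of a POST to `[base]/RiskAssessment` to create the resource on the server.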

How Does It Work?

In this example, image classification is a binary classification problem. The most complicated part is turning the images into a data structure on which the algorithm can run. This implementation uses the “InceptionV3” model in a “DeepImageFeaturizer” to take advantage of TensorFlow to turn the images into features, which are essentially arrays of numbers. Then, we use Spark’s LogisticRegression classifier to build a model from the features and labels of malignant and benign images. The resulting model (or function) can take the features of any new image and output a prediction of either malignant or benign.
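
To make the "features in, prediction out" idea concrete, here is a small pure-Python logistic regression trained with gradient descent. It is a stand-in for illustration only, not Spark's LogisticRegression or the notebook's code, and the two-number feature vectors are invented; real InceptionV3 features have many more dimensions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(model, x):
    """Return 1 (malignant) or 0 (benign) for one feature vector."""
    weights, bias = model
    probability = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if probability >= 0.5 else 0


# Invented toy features: label 1 = malignant, label 0 = benign.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
model = train_logistic_regression(train_x, train_y)
```

Once trained, `predict(model, features)` plays the role described above: it takes the features of a new image and returns malignant (1) or benign (0).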

An excellent article describing this approach is Making Image Classification Simple With Spark Deep Learning.

Rapidly Summarizing Complex Textual Reports with NLP

In an integrative molecular diagnostic approach, the analysis we did above on cancer image slides is just one data point that may help clinicians rapidly arrive at a diagnosis. Next, we explore using NLP (Natural Language Processing) to quickly synthesize and summarize a potentially complex report for faster reading.

First, we load the sparknlp package from John Snow Labs.

Next, we summarize the text using NLP for easier, more succinct reading. We use an extractive algorithm that identifies the most important sentences and produces an output of 50 words or less.
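
The general idea behind extractive summarization can be sketched in plain Python. The version below scores sentences by the frequency of their words and keeps the highest-scoring ones that fit a 50-word budget. It is a simplified stand-in and does not use the Spark NLP API; the stopword list and function names are assumptions made for this example.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "was", "were",
             "for", "with", "on", "that", "this", "it", "as", "by", "at", "be", "are"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of a sentence, without stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_words=50):
    """Keep the highest-scoring sentences that fit within max_words."""
    sentences = split_sentences(text)
    frequency = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Because the summary is built only from sentences that already appear in the report, the wording of the original text is never altered, which matters for clinical text.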

Here, we add the summary text to the DiagnosticReport and issue a PUT command to the Helios FHIR Server, which creates a new version (version 2) of the same DiagnosticReport.
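
A hedged sketch of such an update using the Python standard library follows. It assumes the summary is stored in the DiagnosticReport's `conclusion` element and that the server is reachable at a hypothetical base URL; the notebook may store the summary in a different element.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; replace with the base URL of your own FHIR server.
FHIR_BASE = "http://localhost:8181/fhir"


def build_update_request(report, summary):
    """Prepare a FHIR update (PUT [base]/DiagnosticReport/[id]) with the summary."""
    updated = dict(report)           # shallow copy; the caller's dict is not modified
    updated["conclusion"] = summary  # assumption: the summary goes in 'conclusion'
    url = f"{FHIR_BASE}/DiagnosticReport/{updated['id']}"
    body = json.dumps(updated).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/fhir+json"},
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. Because a FHIR update replaces the whole resource, the body carries the complete DiagnosticReport rather than only the changed field.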

Setup HOW TO

What follows is a step-by-step guide for running this example yourself. The steps below help you install the necessary prerequisites, install the Cassandra database, install the Helios FHIR Server, install Spark, and finally run the Jupyter notebook.

Prerequisites

You will need two different versions of Java available on your computer. These instructions assume a Mac environment.

  • Java 8 — for running Cassandra and Spark
$ brew tap AdoptOpenJDK/openjdk
$ brew install --cask adoptopenjdk8
  • Java 11 — for running the Helios FHIR Server
$ brew install --cask adoptopenjdk11

Add the following lines to your ~/.bash_profile so you can easily switch between the two versions of Java:

export JAVA_8_HOME=$(/usr/libexec/java_home -v1.8)
export JAVA_11_HOME=$(/usr/libexec/java_home -v11)
alias java8='export JAVA_HOME=$JAVA_8_HOME'
alias java11='export JAVA_HOME=$JAVA_11_HOME'

Refresh your environment:

$ source ~/.bash_profile

Always check your version of java with:

$ java -version
openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.5+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.5+10, mixed mode)

Above, you can see I’m running Java 11. After running the java8 alias, I’m running Java 8:

$ java8
$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
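
Since different steps in this guide need different Java versions, it can help to script the check. The sketch below parses the banner printed by `java -version`; the function names are hypothetical, and the parsing assumes the banner contains a quoted version string like the ones shown above.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    major, minor = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_211).
    if major == "1" and minor is not None:
        return int(minor)
    return int(major)


def current_java_major_version():
    """Run `java -version` and return the major version of the active JVM."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

Calling `current_java_major_version()` before starting Cassandra, Spark, or the Helios FHIR Server gives a quick way to confirm that the right alias is active.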

Install Cassandra

Follow these instructions to install Cassandra locally.

Install the Helios FHIR Server

Navigate to https://heliossoftware.com/download-enterprise-edition/ and download the latest version of the Helios FHIR Server Enterprise Edition.

Follow these instructions to install the Helios FHIR Server.

After you have logged in to the administrative user interface, enable the following FHIR Resources:

DiagnosticReport
Encounter
Media
Patient
Specimen

Clone the pyspark-analytics project

This is the GitHub project that contains the Jupyter project we will be using.

$ git clone git@github.com:HeliosSoftware/pyspark-analytics.git

This project also contains a folder of sample data in “/sample-data”

Import the Sample Data

Build and run the fhir-importer using Java 11.

$ java11
$ git clone git@github.com:HeliosSoftware/fhir-importer.git
$ cd fhir-importer
$ mvn clean install

The fhir-importer jar will reside in the /target folder.

Run the following command.

$ java -jar <path to fhir-importer>/fhir-importer-0.0.1-SNAPSHOT.jar -directory <path to pyspark-analytics project>/sample-data

Install Jupyter

Assuming you have python installed on your computer, simply execute the following commands:

$ sudo pip3 install jupyter
$ sudo pip3 install numpy

Install Image Classification Dependencies

pip install tensorflow
pip install nose
pip install pillow
pip install keras
pip install h5py
pip install py4j

Make sure to be running Python 3.7, not 3.8.

Install Spark

Download Spark from https://spark.apache.org/downloads.html

Select the 2.4.x Spark release and the “Prebuilt for Apache Hadoop 2.7 and later” package type.

Un-tar Spark to a convenient folder.

Add the following to your ~/.bash_profile

export SPARK_HOME="/Users/[your username]/[a directory]/spark-2.4.7-bin-hadoop2.7" 
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export PATH=$SPARK_HOME/bin:$PATH

Refresh your environment:

$ source ~/.bash_profile

Run pyspark!

Make sure you are running Java 8 before running pyspark. Some of the libraries used in this project require Java 8.

Run the following command in the same directory as your Jupyter notebook (i.e., where you cloned the https://github.com/HeliosSoftware/pyspark-analytics project):

$ pyspark --driver-memory 12g --packages databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11

Open the cancer-prediction-full.ipynb notebook file and run it! You will find detailed comments describing the logic in the notebook itself.

References

1. Children's Hospital of Los Angeles | What A Difference A Day Makes | Updated on November 2019
