Researchers developed a machine learning algorithm using only claims-based features that could identify cases of pulmonary arterial hypertension (PAH) in electronic medical records, and published their results in Respiratory Research.

They reported that their algorithm was able to identify PAH cases when deployed across an entire electronic medical record, and that the demographic and clinical features of the identified cases “were similar to known PAH patients from previously studied registries.”

The team, led by Evan L. Brittain, MD, MSCI, from the Division of Cardiovascular Medicine at Vanderbilt Pulmonary Circulation Center in Nashville, Tennessee, screened electronic medical records for possible cases of PAH using ICD-9/10 codes, Current Procedural Terminology (CPT) codes, or PAH medications.
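The screening step described above flags a record if any one of three criteria matches: a diagnosis code, a procedure code, or a PAH medication. A minimal Python sketch of that "any criterion" logic follows. The record layout, the function names, and the specific code and drug lists are all hypothetical illustrations, not the study's actual definitions.

```python
from dataclasses import dataclass, field

# Hypothetical, illustrative code lists; the study's actual lists are not reported here.
PAH_DIAGNOSIS_CODES = {"416.0", "I27.0"}  # example ICD-9 / ICD-10 codes
RHC_CPT_CODES = {"93451"}  # example CPT code for right heart catheterization
PAH_MEDICATIONS = {"ambrisentan", "bosentan", "macitentan", "riociguat",
                   "selexipag", "treprostinil", "epoprostenol", "iloprost"}


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR entries."""
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)
    procedure_codes: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)


def is_possible_pah(record: PatientRecord) -> bool:
    """Flag a record if ANY screening criterion matches (diagnosis, procedure, or drug)."""
    meds = {m.lower() for m in record.medications}  # normalize drug names for matching
    return bool(
        record.diagnosis_codes & PAH_DIAGNOSIS_CODES
        or record.procedure_codes & RHC_CPT_CODES
        or meds & PAH_MEDICATIONS
    )


def screen(records: list[PatientRecord]) -> list[PatientRecord]:
    """Return the subset of records flagged as possible PAH, for manual review."""
    return [r for r in records if is_possible_pah(r)]


if __name__ == "__main__":
    records = [
        PatientRecord("A", diagnosis_codes={"I27.0"}),
        PatientRecord("B", medications={"Ambrisentan"}),
        PatientRecord("C", diagnosis_codes={"I10"}),  # hypertension only, not flagged
    ]
    print([r.patient_id for r in screen(records)])  # ['A', 'B']
```

A screen like this is deliberately broad. In the study, flagged records were then manually reviewed, and it was the machine learning step, not the screen itself, that separated PAH from “not PAH.”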

They then manually reviewed a subset of the flagged records, classifying each as a case of PAH or “not PAH,” and used these labeled records to train and test the machine learning algorithm. Finally, they manually reviewed a second cohort of medical records and combined it with the first to refine the system, creating the so-called “final cohort.”

They once again divided the “final cohort” into training and testing sets, measuring the algorithm's performance characteristics on the test set. They then validated the algorithm in an independent electronic medical record cohort.

In the first cohort, the researchers identified 194 patients with PAH and 786 patients classified as “not PAH.” In the “final cohort,” the machine learning algorithm had a sensitivity of 0.88, a specificity of 0.93, a positive predictive value of 0.89, and a negative predictive value of 0.92.
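Sensitivity, specificity, and positive and negative predictive values all come from the same four counts on the test set: true positives, false positives, true negatives, and false negatives. The sketch below shows the standard formulas. The function name is hypothetical, and the counts in the usage example are made up for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard test characteristics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true PAH cases the algorithm flags
        "specificity": tn / (tn + fp),  # share of non-PAH cases correctly left unflagged
        "ppv": tp / (tp + fp),          # share of flagged cases that truly have PAH
        "npv": tn / (tn + fn),          # share of unflagged cases that truly lack PAH
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's test-set results.
    metrics = diagnostic_metrics(tp=88, fp=7, tn=93, fn=12)
    print({name: round(value, 3) for name, value in metrics.items()})
    # {'sensitivity': 0.88, 'specificity': 0.93, 'ppv': 0.926, 'npv': 0.886}
```

Sensitivity and specificity do not depend on how common PAH is in the test set, whereas the predictive values do, so positive and negative predictive values can shift when an algorithm is applied to a population with a different PAH prevalence.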

The algorithm's most important features were the persistence and strength of PAH medication use and the presence of a CPT code for right heart catheterization. When they applied the algorithm across the electronic medical record, the researchers identified 265 additional suspected cases of PAH with typical PAH features in terms of demographics, comorbidities, and hemodynamics.

“This algorithm performed with favorable testing characteristics,” they concluded.


Schuler KP, Hemnes AR, Annis J, et al. An algorithm to identify cases of pulmonary arterial hypertension from the electronic medical record. Respir Res. 2022;23(1):138. doi:10.1186/s12931-022-02055-0