Researchers across the country, led by Justin Reese (Berkeley Lab) and Peter Robinson (Jackson Lab), developed the software platform. It analyzes entries in electronic health records (EHRs) to find symptoms common in people diagnosed with long COVID, which helps define subtypes of the condition.
According to a Berkeley Lab news post, the research identified strong correlations between different long COVID subtypes and pre-existing conditions, including diabetes and hypertension.
Reese said the research may help improve the understanding of how and why individuals develop symptoms. It may also enable more effective treatments by helping clinicians develop tailored therapies for certain subsets of patients.
The team used a database with EHR information from 6,469 patients diagnosed with long COVID after confirmed COVID-19 infection. They used machine learning to cluster patients into groups, then characterized the groups by analyzing relationships between symptoms and both pre-existing diseases and demographic features.
“Basically, we found long COVID features in the EHR data for each long COVID patient, and then assessed patient-patient similarity using semantic similarity, which essentially allows ‘fuzzy matching’ between features – for example, ‘cough’ is not the same as ‘shortness of breath,’ but they are similar since they both involve lung problems,” Reese said. “We compare all symptoms for the pair of the patients in this way, and get a score of how similar the two long COVID patients are. We can then perform unsupervised machine learning on these scores to find different subtypes.”
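The approach Reese describes can be illustrated with a small Python sketch. It is not the team's implementation: the toy symptom hierarchy, the Jaccard-based term score, the best-match averaging, and the threshold-based grouping are all assumptions chosen for illustration, since the article does not name the ontology, similarity measure, or clustering algorithm the researchers used. The patients and their symptoms are invented.

```python
from itertools import combinations

# Hypothetical toy hierarchy: each symptom maps to itself plus its broader categories.
ANCESTORS = {
    "cough": {"cough", "respiratory", "symptom"},
    "shortness of breath": {"shortness of breath", "respiratory", "symptom"},
    "fatigue": {"fatigue", "constitutional", "symptom"},
    "malaise": {"malaise", "constitutional", "symptom"},
}

# Invented example patients, each described by a set of symptoms.
PATIENTS = {
    "A": {"cough"},
    "B": {"shortness of breath"},
    "C": {"fatigue"},
    "D": {"fatigue", "malaise"},
}


def term_similarity(a, b):
    """'Fuzzy' match between two symptoms: Jaccard overlap of their ancestor sets."""
    sa, sb = ANCESTORS[a], ANCESTORS[b]
    return len(sa & sb) / len(sa | sb)


def patient_similarity(p, q):
    """Symmetric best-match average over the two patients' symptom sets."""
    def best_match(src, dst):
        return sum(max(term_similarity(s, d) for d in dst) for s in src) / len(src)
    return (best_match(p, q) + best_match(q, p)) / 2


def cluster(patients, threshold):
    """Group patients linked by a similarity score at or above the threshold.

    This is a simple single-linkage grouping built on union-find, used here
    as a stand-in for whatever unsupervised method the researchers applied.
    """
    parent = {name: name for name in patients}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(patients, 2):
        if patient_similarity(patients[a], patients[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in patients:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    print(term_similarity("cough", "shortness of breath"))
    print(cluster(PATIENTS, threshold=0.4))
```

In this toy setup, "cough" and "shortness of breath" are different terms but share the "respiratory" category, so they receive a partial score rather than zero. With the threshold of 0.4 assumed here, the two respiratory patients fall into one group and the two fatigue-related patients into another.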