Publications
2023
2022
OBJECTIVE: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.
MATERIALS AND METHODS: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning.
RESULTS: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations.
CONCLUSIONS: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems.
METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems.
RESULTS: With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks, including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. For cross-institutional medical code mapping, the top 1 and top 5 accuracy were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top 1 and top 5 accuracy of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with accuracy the highest or near the highest across all tasks.
CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
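The code-mapping evaluation described above (ranking candidate codes by cosine similarity and scoring top 1 and top 5 accuracy) can be illustrated with a minimal sketch. This is not the MIKGI implementation: the embeddings, code names (`LOCAL_LAB_1`, `TARGET_A`, and so on), and gold mappings below are made-up toy values, and the helper functions are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_matches(query, candidates, k):
    """Return the k candidate codes whose embeddings are most similar to the query."""
    ranked = sorted(candidates, key=lambda code: cosine(query, candidates[code]), reverse=True)
    return ranked[:k]

def top_k_accuracy(source, target, gold, k):
    """Fraction of source codes whose gold target code appears among the top k matches."""
    hits = sum(1 for code, vec in source.items() if gold[code] in top_k_matches(vec, target, k))
    return hits / len(source)

# Toy embeddings (assumed values, for illustration only).
source = {"LOCAL_LAB_1": [0.9, 0.1, 0.0], "LOCAL_LAB_2": [0.1, 0.8, 0.3]}
target = {"TARGET_A": [1.0, 0.0, 0.0], "TARGET_B": [0.0, 1.0, 0.0], "TARGET_C": [0.2, 0.9, 0.4]}
gold = {"LOCAL_LAB_1": "TARGET_A", "LOCAL_LAB_2": "TARGET_B"}

top1 = top_k_accuracy(source, target, gold, 1)
top2 = top_k_accuracy(source, target, gold, 2)
```

In this toy example the second local code ranks a near neighbor above its gold match, so accuracy improves when more than one candidate is considered. The same effect explains why top 5 accuracy is never lower than top 1 accuracy.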
IMPORTANCE: The US health care system is experiencing a sharp increase in opioid-related adverse events and spending, and opioid overprescription may be a key factor in this crisis. Ambient opioid exposure within households is one of the known major dangers of overprescription.
OBJECTIVE: To quantify the association between the postsurgical initiation of prescription opioid use in opioid-naive patients and the subsequent prescription opioid misuse and chronic opioid use among opioid-naive family members.
DESIGN, SETTING, AND PARTICIPANTS: This cohort study was conducted using administrative data from the database of a US commercial insurance provider with more than 35 million covered individuals. Participants included pairs of patients who underwent surgery from January 1, 2008, to December 31, 2016, and their family members within the same household. Data were analyzed from January 1 to November 30, 2018.
EXPOSURES: Duration of opioid exposure and refills of opioid prescriptions received by patients after surgery.
MAIN OUTCOMES AND MEASURES: Risk of opioid misuse and chronic opioid use in family members were calculated using inverse probability weighted Cox proportional hazards regression models.
RESULTS: The final cohort included 843 531 pairs of patients and family members. Most pairs included female patients (445 456 [52.8%]) and male family members (442 992 [52.5%]), and a plurality of pairs included patients in the 45 to 54 years age group (249 369 [29.6%]) and family members in the 15 to 24 years age group (313 707 [37.2%]). A total of 3894 opioid misuse events (0.5%) and 7485 chronic opioid use events (0.9%) occurred in family members. In adjusted models, each additional opioid prescription refill for the patient was associated with a 19.2% (95% CI, 14.5%-24.0%) increase in hazard of opioid misuse in family members. The risk of opioid misuse appeared to increase only in households in which the patient obtained refills. Family members in households with any refill had a 32.9% (95% CI, 22.7%-43.8%) increased adjusted hazard of opioid misuse. When patients became chronic opioid users, the hazard ratio for opioid misuse among family members was 2.52 (95% CI, 1.68-3.80), and similar patterns were found for chronic opioid use.
CONCLUSIONS AND RELEVANCE: This cohort study found that opioid exposure was a household risk. Family members of a patient who received opioid prescription refills after surgery had an increased risk of opioid misuse and chronic opioid use.
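As a reading aid for the percentage increases in hazard reported above, the following sketch converts a percent increase into a hazard ratio and shows how a per-unit hazard ratio compounds if the covariate enters the Cox model linearly on the log-hazard scale. That linearity is an assumption made here for illustration, the function names are hypothetical, and the compounded values are arithmetic examples rather than results from the study.

```python
import math

def hazard_ratio_from_percent(percent_increase):
    """A 19.2% increase in hazard corresponds to a hazard ratio of 1.192."""
    return 1.0 + percent_increase / 100.0

def compounded_hazard_ratio(per_unit_hr, units):
    """Hazard ratio for `units` additional units, assuming log HR is linear in the covariate."""
    beta = math.log(per_unit_hr)   # Cox coefficient implied by the per-unit hazard ratio
    return math.exp(beta * units)  # equivalent to per_unit_hr ** units

per_refill_hr = hazard_ratio_from_percent(19.2)
two_refill_hr = compounded_hazard_ratio(per_refill_hr, 2)
```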
The risk profiles of post-acute sequelae of COVID-19 (PASC) have not been well characterized in multi-national settings with appropriate controls. We leveraged electronic health record (EHR) data from 277 international hospitals representing 414,602 patients with COVID-19, 2.3 million control patients without COVID-19 in the inpatient and outpatient settings, and over 221 million diagnosis codes to systematically identify new-onset conditions enriched among patients with COVID-19 during the post-acute period. Compared to inpatient controls, inpatient COVID-19 cases were at significant risk for angina pectoris (RR 1.30, 95% CI 1.09-1.55), heart failure (RR 1.22, 95% CI 1.10-1.35), cognitive dysfunctions (RR 1.18, 95% CI 1.07-1.31), and fatigue (RR 1.18, 95% CI 1.07-1.30). Relative to outpatient controls, outpatient COVID-19 cases were at risk for pulmonary embolism (RR 2.10, 95% CI 1.58-2.76), venous embolism (RR 1.34, 95% CI 1.17-1.54), atrial fibrillation (RR 1.30, 95% CI 1.13-1.50), type 2 diabetes (RR 1.26, 95% CI 1.16-1.36) and vitamin D deficiency (RR 1.19, 95% CI 1.09-1.30). Outpatient COVID-19 cases were also at risk for loss of smell and taste (RR 2.42, 95% CI 1.90-3.06), inflammatory neuropathy (RR 1.66, 95% CI 1.21-2.27), and cognitive dysfunction (RR 1.18, 95% CI 1.04-1.33). The incidence of post-acute cardiovascular and pulmonary conditions decreased across time among inpatient cases while the incidence of cardiovascular, digestive, and metabolic conditions increased among outpatient cases. Our study, based on a federated international network, systematically identified robust conditions associated with PASC compared to control groups, underscoring the multifaceted cardiovascular and neurological phenotype profiles of PASC.
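For readers less familiar with the relative risk (RR) estimates and confidence intervals quoted above, here is a minimal sketch of an unadjusted RR with a 95% interval computed on the log scale. It is the textbook two-group formula, not the federated, multi-site analysis used in the study, and the counts are invented for illustration.

```python
import math

def relative_risk(cases_exposed, n_exposed, cases_control, n_control, z=1.96):
    """Unadjusted relative risk with a Wald confidence interval on the log scale."""
    risk_exposed = cases_exposed / n_exposed
    risk_control = cases_control / n_control
    rr = risk_exposed / risk_control
    # Standard error of log(RR) for two independent binomial samples.
    se = math.sqrt(1 / cases_exposed - 1 / n_exposed + 1 / cases_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Toy counts (assumed): 30 of 1000 cases vs 20 of 1000 controls develop the condition.
rr, lower, upper = relative_risk(30, 1000, 20, 1000)
```

With these toy counts the point estimate is 1.5, but the interval spans 1, so the excess risk would not be statistically significant at the 5% level. The intervals reported above all exclude 1.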
OBJECTIVE: To assess changes in international mortality rates and laboratory recovery rates during hospitalisation for patients hospitalised with SARS-CoV-2 between the first wave (1 March to 30 June 2020) and the second wave (1 July 2020 to 31 January 2021) of the COVID-19 pandemic.
DESIGN, SETTING AND PARTICIPANTS: This is a retrospective cohort study of 83 178 hospitalised patients admitted between 7 days before or 14 days after PCR-confirmed SARS-CoV-2 infection within the Consortium for Clinical Characterization of COVID-19 by Electronic Health Record, an international multihealthcare system collaborative of 288 hospitals in the USA and Europe. The laboratory recovery rates and mortality rates over time were compared between the two waves of the pandemic.
PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome was all-cause mortality rate within 28 days after hospitalisation stratified by predicted low, medium and high mortality risk at baseline. The secondary outcome was the average rate of change in laboratory values during the first week of hospitalisation.
RESULTS: Baseline Charlson Comorbidity Index and laboratory values at admission were not significantly different between the first and second waves. The improvement in laboratory values over time was faster in the second wave compared with the first. The average C reactive protein rate of change was -4.72 mg/dL vs -4.14 mg/dL per day (p=0.05). The mortality rates within each risk category significantly decreased over time, with the most substantial decrease in the high-risk group (42.3% in March-April 2020 vs 30.8% in November 2020 to January 2021, p<0.001) and a moderate decrease in the intermediate-risk group (21.5% in March-April 2020 vs 14.3% in November 2020 to January 2021, p<0.001).
CONCLUSIONS: Admission profiles of patients hospitalised with SARS-CoV-2 infection did not differ greatly between the first and second waves of the pandemic, but there were notable differences in laboratory improvement rates during hospitalisation. Mortality risks among patients with similar risk profiles decreased over the course of the pandemic. The improvement in laboratory values and mortality risk was consistent across multiple countries.
BACKGROUND: Many U.S. institutions have adopted postsurgical opioid-prescribing guidelines to standardize prescribing practices, and yet there is inherent variability in patients' opioid consumption after surgery. The utility of these guidelines is limited by the fact that some patients' needs will inevitably exceed them, and yet there are no evidence-based tools to help providers identify these patients. In this study, we aimed to maximize the value of these guidelines by training machine learning models to distinguish patients whose needs will be met by these smaller recommended prescriptions from patients who may require an additional degree of personalization. Specifically, we sought to develop predictive models for determining whether a surgical patient's postdischarge opioid requirement will fall above or below common opioid prescribing guidelines.
METHODS: We conducted a retrospective cohort study of surgical patients at one institution from 2017 to 2018. Patients were called after discharge to collect opioid consumption data. Machine learning models were used to identify outlier opioid consumers (ie, exceeding our institutional prescribing guidelines) using diagnosis codes, medical history, in-hospital opioid use, and perioperative factors as predictors. External validation was performed on opioid consumption data collected at a second institution from 2020 to 2021, and sensitivity analysis was performed using a third institution's prescribing guidelines.
RESULTS: The development and external validation cohorts included 1,867 and 498 patients, respectively. Age, body mass index, tobacco use, preoperative opioid exposure, and in-hospital opioid consumption were the strongest predictors of postdischarge consumption. A lasso regression model exhibited an area under the receiver operating characteristic curve of 0.74 (95% confidence interval 0.67-0.81) in predicting postdischarge opioid consumption. External validation of a limited lasso model yielded an area under the receiver operating characteristic curve of 0.67 (0.60-0.74). Performance was preserved when evaluated on another institution's guidelines (area under the receiver operating characteristic curve 0.76 [0.72-0.80]).
CONCLUSION: Patient characteristics reliably predict postdischarge opioid consumption in relation to prescribing guidelines for both opioid-naive and exposed populations. This model may be used to help providers confidently follow prescribing guidelines for patients with typical opioid responsiveness and correctly pursue more personalized prescribing for others.
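The area under the receiver operating characteristic curve reported above can be understood as the probability that a randomly chosen outlier consumer receives a higher predicted score than a randomly chosen non-outlier. The sketch below computes it from that pairwise definition. The labels and scores are toy values, the function name is hypothetical, and this is not the study's modeling code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (assumed): 1 = consumption above the guideline, 0 = within the guideline.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.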
Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. EHR-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. From a retrospective EHR-based cohort in four US healthcare systems, a random sample of 1,123 SARS-CoV-2 PCR-positive patients hospitalized between 3/2020–8/2021 was manually chart-reviewed and classified as admitted-with-COVID-19 (incidental) vs. specifically admitted for COVID-19 (for-COVID-19). EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in 26%. The top site-specific feature sets had 79-99% specificity with 62-75% sensitivity, while the best performing across-site feature set had 71-94% specificity with 69-81% sensitivity. A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
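The sensitivity and specificity figures above compare a phenotype's classification against chart review. A minimal sketch of that calculation follows, assuming "positive" means a for-COVID-19 admission. The chart-review labels and phenotype predictions are toy values, and the function name is hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Sensitivity and specificity of boolean predictions against boolean reference labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (assumed): True = admitted for COVID-19, False = incidental positive test.
chart_review = [True, True, True, True, False, False, False, False, False, False]
phenotype = [True, True, True, False, False, False, False, False, True, False]
sensitivity, specificity = sensitivity_specificity(chart_review, phenotype)
```

Here the phenotype finds 3 of 4 true for-COVID-19 admissions and correctly excludes 5 of 6 incidental ones.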