Kohane, Isaac, Bruce Aronow, Paul Avillach, Brett Beaulieu-Jones, Riccardo Bellazzi, Robert Bradford, Gabriel Brat, et al. 2021. “What Every Reader Should Know About Studies Using Electronic Health Record Data but May Be Afraid to Ask”. J Med Internet Res 23 (3): e22219.
Coincident with the tsunami of COVID-19-related publications, there has been a surge of studies using real-world data, including those obtained from the electronic health record (EHR). Unfortunately, several of these high-profile publications were retracted because of concerns regarding the soundness and quality of the studies and the EHR data they purported to analyze. These retractions highlight that although a small community of EHR informatics experts can readily identify strengths and flaws in EHR-derived studies, many medical editorial teams and otherwise sophisticated medical readers lack the framework to fully critically appraise these studies. In addition, conventional statistical analyses cannot overcome the need for an understanding of the opportunities and limitations of EHR-derived studies. We distill here from the broader informatics literature six key considerations that are crucial for appraising studies utilizing EHR data: data completeness, data collection and handling (eg, transformation), data type (ie, codified, textual), robustness of methods against EHR variability (within and across institutions, countries, and time), transparency of data and analytic code, and the multidisciplinary approach. These considerations will inform researchers, clinicians, and other stakeholders as to the recommended best practices in reviewing manuscripts, grants, and other outputs from EHR-data derived studies, and thereby promote and foster rigor, quality, and reliability of this rapidly growing field.
Weber, Griffin, Chuan Hong, Nathan Palmer, Paul Avillach, Shawn Murphy, Alba Gutiérrez-Sacristán, Zongqi Xia, et al. 2021. “International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study”. MedRxiv.
OBJECTIVES: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and identify optimal timing of laboratory value collection to predict severity across hospitals and regions. DESIGN: Retrospective cohort study. SETTING: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. PARTICIPANTS: Patients hospitalized with COVID-19, admitted before or after PCR-confirmed result for SARS-CoV-2. PRIMARY AND SECONDARY OUTCOME MEASURES: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. RESULTS: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients (78.7%) were 50 years of age or older and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin showed association with disease severity. Significant differences of laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Sub-group analysis using age, D-dimer, CRP, and lymphocyte count as predictive of severity at admission showed similar discrimination to a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed.
On average, no difference in severity prediction was found between North American and European sites. CONCLUSIONS: Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.
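As a reading aid for the AUC figures quoted in this abstract, the following is a minimal sketch of how an AUC can be computed for a single laboratory value. It is not the 4CE analysis code; the function name and the sample CRP values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Probability that a randomly chosen positive case has a higher score
    than a randomly chosen negative case; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical admission CRP values (mg/L); not data from the study.
ever_severe = [120.0, 85.0, 60.0, 150.0]
never_severe = [30.0, 85.0, 20.0, 45.0]
example_auc = auc(ever_severe, never_severe)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two severity groups.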
Beaulieu-Jones, Brett, William Yuan, Gabriel Brat, Andrew Beam, Griffin Weber, Marshall Ruffin, and Isaac Kohane. 2021. “Machine learning for patient risk stratification: standing on, or looking over, the shoulders of clinicians?”. NPJ Digit Med 4 (1): 62.
Machine learning can help clinicians to make individualized patient predictions only if researchers demonstrate models that contribute novel insights, rather than learning the most likely next step in a set of actions a clinician will take. We trained deep learning models using only clinician-initiated, administrative data for 42.9 million admissions using three subsets of data: demographic data only, demographic data and information available at admission, and the previous data plus charges recorded during the first day of admission. Models trained on charges during the first day of admission achieve performance close to published full EMR-based benchmarks for inpatient outcomes: in-hospital mortality (0.89 AUC), prolonged length of stay (0.82 AUC), and 30-day readmission rate (0.71 AUC). Similar performance between models trained with only clinician-initiated data and those trained with full EMR data purporting to include information about patient state and physiology should raise concern in the deployment of these models. Furthermore, these models exhibited significant declines in performance when evaluated over only myocardial infarction (MI) patients relative to models trained over MI patients alone, highlighting the importance of physician diagnosis in the prognostic performance of these models. These results provide a benchmark for predictive accuracy trained only on prior clinical actions and indicate that models with similar performance may derive their signal by looking over clinicians' shoulders, using clinical behavior as the expression of preexisting intuition and suspicion to generate a prediction. For models to guide clinicians in individual decisions, performance exceeding these benchmarks is necessary.
Estiri, Hossein, Zachary Strasser, Gabriel Brat, Yevgeniy Semenov, The Consortium Characterization COVID-19 EHR (4CE), Chirag Patel, and Shawn Murphy. 2021. “Evolving Phenotypes of non-hospitalized Patients that Indicate Long Covid”. MedRxiv.
For some SARS-CoV-2 survivors, recovery from the acute phase of the infection has been grueling with lingering effects. Many of the symptoms characterized as the post-acute sequelae of COVID-19 (PASC) could have multiple causes or are similarly seen in non-COVID patients. Accurate identification of phenotypes will be important to guide future research and help the healthcare system focus its efforts and resources on adequately controlled age- and gender-specific sequelae of a COVID-19 infection. In this retrospective electronic health records (EHR) cohort study, we applied a computational framework for knowledge discovery from clinical data, MLHO, to identify phenotypes that positively associate with a past positive reverse transcription-polymerase chain reaction (RT-PCR) test for COVID-19. We evaluated the post-test phenotypes in two temporal windows at 3-6 and 6-9 months after the test and by age and gender. Data from longitudinal diagnosis records stored in EHRs from Mass General Brigham in the Boston metropolitan area was used for the analyses. Statistical analyses were performed on data from March 2020 to June 2021. Study participants included over 96 thousand patients who had tested positive or negative for COVID-19 and were not hospitalized. We identified 33 phenotypes among different age/gender cohorts or time windows that were positively associated with past SARS-CoV-2 infection. All identified phenotypes were newly recorded in patients’ medical records two months or longer after a COVID-19 RT-PCR test in non-hospitalized patients regardless of the test result. 
Among these phenotypes, a new diagnosis record for anosmia and dysgeusia (OR: 2.60, 95% CI [1.94 - 3.46]), alopecia (OR: 3.09, 95% CI [2.53 - 3.76]), chest pain (OR: 1.27, 95% CI [1.09 - 1.48]), chronic fatigue syndrome (OR 2.60, 95% CI [1.22-2.10]), shortness of breath (OR 1.41, 95% CI [1.22 - 1.64]), pneumonia (OR 1.66, 95% CI [1.28 - 2.16]), and type 2 diabetes mellitus (OR 1.41, 95% CI [1.22 - 1.64]) are some of the most significant indicators of a past COVID-19 infection. Additionally, more new phenotypes were found with increased confidence among the cohorts who were younger than 65. Our approach avoids a flood of false positive discoveries while offering a more robust probabilistic approach compared to the standard linear phenome-wide association study (PheWAS). The findings of this study confirm many of the post-COVID symptoms and suggest that a variety of new diagnoses, including new diabetes mellitus and neurological disorder diagnoses, are more common among those with a history of COVID-19 than those without the infection. 
Additionally, more than 63 percent of PASC phenotypes were observed in patients under 65 years of age, pointing out the importance of vaccination to minimize the risk of debilitating post-acute sequelae of COVID-19 among younger adults. Competing interest statement: The authors have declared no competing interest. Funding statement: This work was supported by the National Human Genome Research Institute grant 3U01HG008685-05S2 and the National Library of Medicine grant T15LM007092. Ethics: The use of clinical data in this study was approved by the MGB Human Research Committee with a waiver of informed consent. Data availability: Data contains PHI and therefore is not publicly available.
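For readers less familiar with the odds ratios (OR) and 95% confidence intervals reported in this abstract, here is a minimal sketch of the textbook calculation from a 2x2 table using the Wald interval on the log scale. This is an assumption made for illustration only: the study itself used the MLHO framework, not this crude calculation, and the counts below are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval.

    a: test-positive patients with the new diagnosis
    b: test-positive patients without it
    c: test-negative patients with the new diagnosis
    d: test-negative patients without it
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts; not data from the study.
example_or, example_lo, example_hi = odds_ratio_ci(20, 80, 10, 90)
```

Because the interval is symmetric on the log scale, a point estimate computed this way always lies between its own lower and upper bounds.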
Klann, Jeffrey, Hossein Estiri, Griffin Weber, Bertrand Moal, Paul Avillach, Chuan Hong, Amelia Tan, et al. 2021. “Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data”. Journal of the American Medical Informatics Association 28 (7): 1411-20.
The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
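The sensitivity and specificity figures in this abstract compare a phenotype flag against an observed outcome (ICU admission and/or death). A minimal sketch of that comparison follows; the function name and the eight-patient sample are hypothetical, and the pooling across sites described in the abstract is not reproduced.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Compare a phenotype flag (predicted) with the observed outcome (actual)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    sensitivity = tp / (tp + fn)  # share of true outcomes that were flagged
    specificity = tn / (tn + fp)  # share of non-outcomes that were not flagged
    return sensitivity, specificity


# Hypothetical patients; not data from the study.
flagged = [True, True, False, True, False, False, False, True]
icu_or_death = [True, True, True, False, False, False, False, True]
example_sens, example_spec = sensitivity_specificity(flagged, icu_or_death)
```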
Stensland, Kristian, Peter Chang, David Jiang, David Canes, Aaron Berkenwald, Adrian Waisman, Kortney Robinson, et al. 2021. “Reducing postoperative opioid pill prescribing via a quality improvement approach”. International Journal for Quality in Health Care 33 (3).
The opioid epidemic has been fueled by prescribing unnecessary quantities of opioid pills for postoperative use. While evidence mounts that postoperative opioids can be reduced or eliminated, implementing such changes within various institutions can be met with many barriers to adoption. To address excess opioid prescribing within our institutions, we applied a plan-do-study-act (PDSA)-like quality improvement strategy to assess local opioid prescribing and use, modify our institutional protocols, and assess the impacts of the change. We describe our approach, findings, and lessons learned from our quality improvement approach. We prospectively recorded home pain pill usage after robotic-assisted laparoscopic prostatectomy (RALP) and robotic-assisted partial nephrectomy (RAPN) at two academic institutions from July 2016 to July 2019. Patients prospectively recorded their home pain pill use on a take-home log. Other factors, including numeric pain rating scale on the day of discharge, were extracted from patient records. We analyzed our data and modified opioid prescription protocols to meet the reported use data of 80% of patients. We continued collecting data after the protocol change. We also used our prospectively collected data to assess the accuracy of a retrospective phone survey designed to measure postdischarge opioid use. Our primary outcomes were the proportion of patients taking zero opioid pills postdischarge, median pills taken after discharge and the number of excess pills prescribed but not taken. We compared these outcomes before and after protocol change. A total of 266 patients (193 RALP, 73 RAPN) were included.
Reducing the standard number of prescribed pills did not increase the percentage of patients taking zero pills postdischarge in either group (RALP: 47% vs. 41%; RAPN 48% vs. 34%). The patients in either group reporting postoperative Day 1 pain score of 0 or 1 were much more likely to use zero postdischarge opioid pills. Our reduction in prescribing protocol resulted in an estimated reduction in excess pills from 1555 excess pills in the prior protocol to just 155 excess pills in the new protocol. Our PDSA-like approach led to an acceptable protocol revision resulting in significant reductions in excess pills released into the community. Reducing the quantity of opioids prescribed postoperatively does not increase the percentage of patients taking zero pills postdischarge. To eliminate opioid use may require no-opioid pathways. Our approach can be used in implementing zero opioid discharge plans and can be applied to opioid reduction interventions at other institutions where barriers to reduced prescribing exist.
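The abstract describes sizing the prescription to meet the reported use of 80% of patients and counting excess pills prescribed but not taken. One plausible way to express that arithmetic is sketched below. The function names, the percentile rule, and the sample pill logs are assumptions for illustration; the abstract does not specify the exact rule the authors used.

```python
import math
from typing import Sequence


def pills_covering(usage: Sequence[int], fraction: float = 0.8) -> int:
    """Smallest pill count that meets the reported use of at least
    `fraction` of patients (assumed interpretation of the 80% rule)."""
    ordered = sorted(usage)
    patients_to_cover = math.ceil(fraction * len(ordered))
    return ordered[patients_to_cover - 1]


def excess_pills(usage: Sequence[int], prescribed: int) -> int:
    """Total pills prescribed but not taken across all patients."""
    return sum(max(prescribed - taken, 0) for taken in usage)


# Hypothetical take-home logs (pills used per patient); not data from the study.
reported_use = [0, 0, 0, 2, 3, 4, 5, 6, 10, 15]
new_standard = pills_covering(reported_use)               # covers 8 of 10 patients
excess_before = excess_pills(reported_use, prescribed=30)  # hypothetical prior standard
excess_after = excess_pills(reported_use, prescribed=new_standard)
```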


Zhang, Michael, Xiaotian Cheng, Daniel Copeland, Arjun Desai, Melody Guan, Gabriel Brat, and Serena Yeung. 2020. “Using Computer Vision to Automate Hand Detection and Tracking of Surgeon Movements in Videos of Open Surgery”. AMIA Annu Symp Proc 2020: 1373-82.
Open, or non-laparoscopic, surgery represents the vast majority of all operating room procedures, but few tools exist to objectively evaluate these techniques at scale. Current efforts involve human expert-based visual assessment. We leverage advances in computer vision to introduce an automated approach to video analysis of surgical execution. A state-of-the-art convolutional neural network architecture for object detection was used to detect operating hands in open surgery videos. Automated assessment was expanded by combining model predictions with a fast object tracker to enable surgeon-specific hand tracking. To train our model, we used publicly available videos of open surgery from YouTube and annotated these with spatial bounding boxes of operating hands. Our model's spatial detections of operating hands significantly outperform the detections achieved using pre-existing hand-detection datasets, and allow for insights into intra-operative movement patterns and economy of motion.
Brat, Gabriel, Griffin Weber, Nils Gehlenborg, Paul Avillach, Nathan Palmer, Luca Chiovato, James Cimino, et al. 2020. “International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium”. NPJ Digit Med 3: 109.
We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries. Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
Teja, Bijan, Dana Raub, Sabine Friedrich, Paul Rostin, Maria Patrocínio, Jeffrey Schneider, Changyu Shen, et al. 2020. “Incidence, Prediction, and Causes of Unplanned 30-Day Hospital Admission After Ambulatory Procedures”. Anesth Analg 131 (2): 497-507.
BACKGROUND: Unanticipated hospital admission is regarded as a measure of adverse perioperative patient care. However, previously published studies for risk prediction after ambulatory procedures are sparse compared to those examining readmission after inpatient surgery. We aimed to evaluate the incidence and reasons for unplanned admission after ambulatory surgery and develop a prediction tool for preoperative risk assessment. METHODS: This retrospective cohort study included adult patients undergoing ambulatory, noncardiac procedures under anesthesia care at 2 tertiary care centers in Massachusetts, United States, between 2007 and 2017 as well as all hospitals and ambulatory surgery centers in New York State, United States, in 2014. The primary outcome was unplanned hospital admission within 30 days after discharge. We created a prediction tool (the PREdicting admission after Outpatient Procedures [PREOP] score) using stepwise backward regression analysis to predict unplanned hospital admission, based on criteria used by the Centers for Medicare & Medicaid Services, within 30 days after surgery in the Massachusetts hospital network registry. Model predictors included patient demographics, comorbidities, and procedural factors. We validated the score externally in the New York state registry. Reasons for unplanned admission were assessed. RESULTS: A total of 170,983 patients were included in the Massachusetts hospital network registry and 1,232,788 in the New York state registry. Among those, the observed rate of unplanned admission was 2.0% (3504) and 1.7% (20,622), respectively. The prediction model showed good discrimination in the training set with C-statistic of 0.77 (95% confidence interval [CI], 0.77-0.78) and satisfactory discrimination in the validation set with C-statistic of 0.71 (95% CI, 0.70-0.71). 
The risk of unplanned admission varied widely from 0.4% (95% CI, 0.3-0.4) among patients whose calculated PREOP scores were in the first percentile to 21.3% (95% CI, 20.0-22.5) among patients whose scores were in the 99th percentile. Predictions were well calibrated with an overall ratio of observed-to-expected events of 99.97% (95% CI, 96.3-103.6) in the training and 92.6% (95% CI, 88.8-96.4) in the external validation set. Unplanned admissions were most often related to malignancy, nonsurgical site infections, and surgical complications. CONCLUSIONS: We present an instrument for prediction of unplanned 30-day admission after ambulatory procedures under anesthesia care validated in a statewide cohort comprising academic and nonacademic hospitals as well as ambulatory surgery centers. The instrument may be useful in identifying patients at high risk for 30-day unplanned hospital admission and may be used for benchmarking hospitals, ambulatory surgery centers, and practitioners.
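The calibration result in this abstract is reported as a ratio of observed-to-expected events. A minimal sketch of that ratio, assuming the expected count is the sum of each patient's predicted probability, is shown below. The function name and the five-patient sample are hypothetical and are not taken from the PREOP score or its registries.

```python
from typing import Sequence


def observed_to_expected(outcomes: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Ratio of observed events to the sum of predicted probabilities.

    A value near 1.0 (100%) means the model predicts about as many
    events overall as actually occurred.
    """
    observed = sum(outcomes)
    expected = sum(predicted_risk)
    return observed / expected


# Hypothetical patients; 1 = unplanned admission within 30 days.
admitted = [0, 0, 0, 1, 0]
risk = [0.10, 0.20, 0.05, 0.40, 0.25]
example_ratio = observed_to_expected(admitted, risk)
```

This overall ratio summarizes calibration in the large; it does not show whether predictions are well calibrated within individual risk strata.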