Publications

2014

Klann, Jeffrey G, Michael D Buck, Jeffrey Brown, Marc Hadley, Richard Elmore, Griffin M Weber, and Shawn N Murphy. 2014. “Query Health: Standards-Based, Cross-Platform Population Health Surveillance.” Journal of the American Medical Informatics Association : JAMIA 21 (4): 650-6. https://doi.org/10.1136/amiajnl-2014-002707.

OBJECTIVE: Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects.

MATERIALS AND METHODS: Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language.

RESULTS: We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed.

DISCUSSION: This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative.

CONCLUSIONS: Query Health was a test of the learning health system. It supplied a functional methodology and reference implementation for distributed population health queries, both of which have been validated at three sites.

Mandl, Kenneth D, Isaac S Kohane, Douglas McFadden, Griffin M Weber, Marc Natter, Joshua Mandel, Sebastian Schneeweiss, et al. 2014. “Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): Architecture.” Journal of the American Medical Informatics Association : JAMIA 21 (4): 615-20. https://doi.org/10.1136/amiajnl-2014-002727.

We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network. SCILHS leverages the $48 billion federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients, plugging universally into the point of care, generating evidence and discovery, and thereby enabling clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is development of innovative 'apps' to improve PCOR research methods and capacitate point of care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open source, free, modular components.

2013

Natter, Marc D, Justin Quan, David M Ortiz, Athos Bousvaros, Norman T Ilowite, Christi J Inman, Keith Marsolo, et al. 2013. “An I2b2-Based, Generalizable, Open Source, Self-Scaling Chronic Disease Registry.” Journal of the American Medical Informatics Association : JAMIA 20 (1): 172-9. https://doi.org/10.1136/amiajnl-2012-001042.

OBJECTIVE: Registries are a well-established mechanism for obtaining high quality, disease-specific data, but are often highly project-specific in their design, implementation, and policies for data use. In contrast to the conventional model of centralized data contribution, warehousing, and control, we design a self-scaling registry technology for collaborative data sharing, based upon the widely adopted Informatics for Integrating Biology & the Bedside (i2b2) data warehousing framework and the Shared Health Research Information Network (SHRINE) peer-to-peer networking software.

MATERIALS AND METHODS: Focusing our design around creation of a scalable solution for collaboration within multi-site disease registries, we leverage the i2b2 and SHRINE open source software to create a modular, ontology-based, federated infrastructure that provides research investigators full ownership and access to their contributed data while supporting permissioned yet robust data sharing. We accomplish these objectives via web services supporting peer-group overlays, group-aware data aggregation, and administrative functions.

RESULTS: The 56-site Childhood Arthritis & Rheumatology Research Alliance (CARRA) Registry and 3-site Harvard Inflammatory Bowel Diseases Longitudinal Data Repository now utilize i2b2 self-scaling registry technology (i2b2-SSR). This platform, extensible to federation of multiple projects within and between research networks, encompasses >6000 subjects at sites throughout the USA.

DISCUSSION: We utilize the i2b2-SSR platform to minimize technical barriers to collaboration while enabling fine-grained control over data sharing.

CONCLUSIONS: The implementation of i2b2-SSR for the multi-site, multi-stakeholder CARRA Registry has established a digital infrastructure for community-driven research data sharing in pediatric rheumatology in the USA. We envision i2b2-SSR as a scalable, reusable solution facilitating interdisciplinary research across diseases.

Weber, Griffin M. 2013. “How Many Patients Are ‘normal’? Only 1.55%.” AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science 2013: 79.

When conducting a clinical study, a "normal" control population is often desired. Identifying normal patients in a clinical data repository (CDR) can be challenging because healthy patients do not go to the hospital; and, because patients receive care from multiple hospitals, the absence of a diagnosis in one hospital's electronic health record does not mean a patient does not have the disease. We define a set of 10 simple heuristic filters to eliminate patients who would seemingly be poor candidates for a normal control (e.g., chronic conditions, rare diseases, no recent data). Surprisingly, out of 2,019,774 patients at two large academic hospitals, these filters excluded all but 31,352 (1.55%). This illustrates how difficult it can be to identify control cohorts, and it raises questions of what it truly means to be normal.

McMurry, Andrew J, Shawn N Murphy, Douglas MacFadden, Griffin Weber, William W Simons, John Orechia, Jonathan Bickel, et al. 2013. “SHRINE: Enabling Nationally Scalable Multi-Site Disease Studies.” PloS One 8 (3): e55811. https://doi.org/10.1371/journal.pone.0055811.

Results of medical research studies are often contradictory or cannot be reproduced. One reason is that there may not be enough patient subjects available for observation for a long enough time period. Another reason is that patient populations may vary considerably with respect to geographic and demographic boundaries thus limiting how broadly the results apply. Even when similar patient populations are pooled together from multiple locations, differences in medical treatment and record systems can limit which outcome measures can be commonly analyzed. In total, these differences in medical research settings can lead to differing conclusions or can even prevent some studies from starting. We thus sought to create a patient research system that could aggregate as many patient observations as possible from a large number of hospitals in a uniform way. We call this system the 'Shared Health Research Information Network', with the following properties: (1) reuse electronic health data from everyday clinical care for research purposes, (2) respect patient privacy and hospital autonomy, (3) aggregate patient populations across many hospitals to achieve statistically significant sample sizes that can be validated independently of a single research setting, (4) harmonize the observation facts recorded at each institution such that queries can be made across many hospitals in parallel, (5) scale to regional and national collaborations. The purpose of this report is to provide open source software for multi-site clinical studies and to report on early uses of this application. At this time SHRINE implementations have been used for multi-site studies of autism co-morbidity, juvenile idiopathic arthritis, peripartum cardiomyopathy, colorectal cancer, diabetes, and others. The wide range of study objectives and growing adoption suggest that SHRINE may be applicable beyond the research uses and participating hospitals named in this report.

Weber, Griffin M. 2013. “Identifying Translational Science Within the Triangle of Biomedicine.” Journal of Translational Medicine 11: 126. https://doi.org/10.1186/1479-5876-11-126.

BACKGROUND: The National Institutes of Health (NIH) Roadmap places special emphasis on "bench-to-bedside" research, or the "translation" of basic science research into practical clinical applications. The Clinical and Translational Science Awards (CTSA) Consortium is one example of the large investments being made to develop a national infrastructure to support translational science, which involves reducing regulatory burdens, launching new educational initiatives, and forming partnerships between academia and industry. However, while numerous definitions have been suggested for translational science, including the qualitative T1-T4 classification, a consensus has not yet been reached. This makes it challenging to track the impact of these major policy changes.

METHODS: In this study, we use a bibliometric approach to map PubMed articles onto a graph, called the Triangle of Biomedicine. The corners of the triangle represent research related to animals, cells and molecules, and humans; and, the position of a publication on the graph is based on its topics, as determined by its Medical Subject Headings (MeSH). We define translation as movement of a collection of articles, or the articles that cite those articles, towards the human corner.
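The mapping described in the METHODS paragraph above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the corner coordinates, the single-letter category codes (A for animals, C for cells and molecules, H for humans), and the rule of averaging the corners of an article's categories are all assumptions made for illustration, and the step that assigns MeSH headings to categories is omitted.

```python
import math

# Hypothetical corner coordinates for an equilateral triangle centered on the origin.
_S = math.sqrt(3) / 2
CORNERS = {"H": (1.0, 0.0), "A": (-0.5, _S), "C": (-0.5, -_S)}


def article_position(categories):
    """Place one article at the mean of the corners for its categories (assumed rule)."""
    cats = {c for c in categories if c in CORNERS}
    if not cats:
        raise ValueError("article has no A, C, or H category")
    xs = [CORNERS[c][0] for c in cats]
    ys = [CORNERS[c][1] for c in cats]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def collection_position(articles):
    """Average position of a collection of articles (each a set of category codes)."""
    points = [article_position(a) for a in articles]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def distance_to_human(position):
    """Euclidean distance from a position to the human corner."""
    return math.dist(position, CORNERS["H"])


# Translation, in this sketch, is a drop in distance to the human corner.
earlier = [{"A"}, {"C"}]
later = [{"H"}, {"A", "H"}]
moved_toward_humans = (
    distance_to_human(collection_position(later))
    < distance_to_human(collection_position(earlier))
)
```

Comparing `distance_to_human` for a collection of articles against the same measure for the articles that cite them gives one simple reading of "movement towards the human corner".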

RESULTS: The Triangle of Biomedicine provides a quantitative way of determining if an individual scientist, research organization, funding agency, or scientific field is producing results that are relevant to clinical medicine. We validate our technique using examples that have been previously described in the literature and by comparing it to prior methods of measuring translational science.

CONCLUSIONS: The Triangle of Biomedicine is a novel way to identify translational science and track changes over time. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. The Triangle of Biomedicine also provides a simple visual way of depicting this impact, which can be far more powerful than numbers alone.

Weber, Griffin M. 2013. “Federated Queries of Clinical Data Repositories: The Sum of the Parts Does Not Equal the Whole.” Journal of the American Medical Informatics Association : JAMIA 20 (e1): e155-61. https://doi.org/10.1136/amiajnl-2012-001299.

BACKGROUND AND OBJECTIVE: In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors.
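To make the discrepancy concrete, consider a query counting distinct patients who match at one or more hospitals. With only per-hospital aggregate counts and unknown overlap, the true number can be bounded but not pinned down. The sketch below shows those elementary bounds; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def federated_count_bounds(site_counts):
    """Bounds on distinct patients matching at one or more sites.

    Lower bound: every matching patient also appears at the largest site.
    Upper bound: no patient is shared between sites.
    """
    if not site_counts:
        return (0, 0)
    return (max(site_counts), sum(site_counts))


# Hypothetical aggregate counts returned by four hospitals for one query.
bounds = federated_count_bounds([120, 80, 45, 30])
```

Here the distinct-patient total could be anywhere from 120 to 275, which is the kind of uncertainty that knowledge of patient overlap can narrow.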

METHODS: We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run independently on each partition, and the combined results are presented to the user.
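The partitioning step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's procedure: the choice of SHA-256, the shared salt, the demographic fields (first name, last name, date of birth), and the normalization rule are all hypothetical.

```python
import hashlib
from collections import defaultdict


def demographic_hash(first, last, dob, salt="shared-secret"):
    """One-way hash of normalized demographics (fields and salt are assumptions)."""
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hashlib.sha256((salt + "|" + normalized).encode("utf-8")).hexdigest()


def partition_by_sites(site_hashes):
    """Group patient hashes by the exact set of sites where they appear.

    site_hashes maps a site name to the set of patient hashes at that site.
    Returns a dict mapping frozenset(site names) to the set of hashes.
    """
    sites_for_hash = defaultdict(set)
    for site, hashes in site_hashes.items():
        for h in hashes:
            sites_for_hash[h].add(site)
    partitions = defaultdict(set)
    for h, sites in sites_for_hash.items():
        partitions[frozenset(sites)].add(h)
    return dict(partitions)


# Hypothetical example: one patient seen at both hospitals, one seen at only one.
ann = demographic_hash("Ann", "Lee", "1980-01-02")
bob = demographic_hash("Bob", "Ray", "1975-06-30")
partitions = partition_by_sites({"Hospital A": {ann, bob}, "Hospital B": {ann}})
```

Because every patient in a partition has data at the same subset of hospitals, a query can be run per partition and the results combined with the overlap already known.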

RESULTS: Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query.

CONCLUSIONS: Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.

Weber, Griffin M, and Isaac S Kohane. 2013. “Extracting Physician Group Intelligence from Electronic Health Records to Support Evidence Based Medicine.” PloS One 8 (5): e64933. https://doi.org/10.1371/journal.pone.0064933.

Evidence-based medicine employs expert opinion and clinical data to inform clinical decision making. The objective of this study is to determine whether it is possible to complement these sources of evidence with information about physician "group intelligence" that exists in electronic health records. Specifically, we measured laboratory test "repeat intervals", defined as the amount of time it takes for a physician to repeat a test that was previously ordered for the same patient. Our assumption is that while the result of a test is a direct measure of one marker of a patient's health, the physician's decision to order the test is based on multiple factors including past experience, available treatment options, and information about the patient that might not be coded in the electronic health record. By examining repeat intervals in aggregate over large numbers of patients, we show that it is possible to 1) determine what laboratory test results physicians consider "normal", 2) identify subpopulations of patients that deviate from the norm, and 3) identify situations where laboratory tests are over-ordered. We used laboratory tests as just one example of how physician group intelligence can be used to support evidence based medicine in a way that is automated and continually updated.
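A minimal Python sketch of the repeat-interval measurement defined above follows. The record layout (patient identifier, test name, order time) and the function name are assumptions for illustration; the study's further step of relating intervals to prior test results is not shown.

```python
from collections import defaultdict
from datetime import datetime


def repeat_intervals(orders):
    """Days between consecutive orders of the same test for the same patient.

    orders is an iterable of (patient_id, test_name, ordered_at) tuples.
    Returns a dict mapping test_name to a list of intervals in days.
    """
    times_by_key = defaultdict(list)
    for patient_id, test_name, ordered_at in orders:
        times_by_key[(patient_id, test_name)].append(ordered_at)

    intervals = defaultdict(list)
    for (_patient_id, test_name), times in times_by_key.items():
        times.sort()
        for earlier, later in zip(times, times[1:]):
            intervals[test_name].append((later - earlier).total_seconds() / 86400.0)
    return dict(intervals)


# Hypothetical orders: one patient with three HbA1c tests, another with one.
orders = [
    ("p1", "HbA1c", datetime(2020, 1, 11)),
    ("p1", "HbA1c", datetime(2020, 1, 1)),
    ("p1", "HbA1c", datetime(2020, 1, 31)),
    ("p2", "HbA1c", datetime(2020, 2, 1)),
]
result = repeat_intervals(orders)
```

Aggregating these intervals over many patients, for example by the value of the preceding result, is what would let patterns in physician ordering behavior emerge.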

2012

Kohane, Isaac S, Andrew McMurry, Griffin Weber, Douglas MacFadden, Leonard Rappaport, Louis Kunkel, Jonathan Bickel, et al. 2012. “The Co-Morbidity Burden of Children and Young Adults With Autism Spectrum Disorders.” PloS One 7 (4): e33224. https://doi.org/10.1371/journal.pone.0033224.

OBJECTIVES: Use electronic health records to assess the comorbidity burden of Autism Spectrum Disorder (ASD) in children and young adults.

STUDY DESIGN: A retrospective prevalence study was performed using a distributed query system across three general hospitals and one pediatric hospital. Over 14,000 individuals under age 35 with ASD were characterized by their co-morbidities and conversely, the prevalence of ASD within these comorbidities was measured. The comorbidity prevalence of the younger (Age<18 years) and older (Age 18-34 years) individuals with ASD was compared.

RESULTS: 19.44% of ASD patients had epilepsy as compared to 2.19% in the overall hospital population (95% confidence interval for difference in percentages 13.58-14.69%). The corresponding figures for other comorbidities were: schizophrenia 2.43% vs. 0.24% (95% CI 1.89-2.39%), inflammatory bowel disease (IBD) 0.83% vs. 0.54% (95% CI 0.13-0.43%), bowel disorders (without IBD) 11.74% vs. 4.5% (95% CI 5.72-6.68%), CNS/cranial anomalies 12.45% vs. 1.19% (95% CI 9.41-10.38%), diabetes mellitus type I (DM1) 0.79% vs. 0.34% (95% CI 0.3-0.6%), muscular dystrophy 0.47% vs. 0.05% (95% CI 0.26-0.49%), and sleep disorders 1.12% vs. 0.14% (95% CI 0.79-1.14%). Autoimmune disorders (excluding DM1 and IBD) were not significantly different at 0.67% vs. 0.68% (95% CI -0.14-0.13%). Three of the studied comorbidities increased significantly when comparing ages 0-17 vs. 18-34 with p<0.001: schizophrenia (1.43% vs. 8.76%), diabetes mellitus type I (0.67% vs. 2.08%), and IBD (0.68% vs. 1.99%), whereas sleep disorders, bowel disorders (without IBD), and epilepsy did not change significantly.

CONCLUSIONS: The comorbidities of ASD encompass disease states that are significantly overrepresented in ASD with respect to even the patient populations of tertiary health centers. This burden of comorbidities goes well beyond those routinely managed in developmental medicine centers and requires broad multidisciplinary management that payors and providers will have to plan for.

Murphy, Shawn N, Anil Dubey, Peter J Embi, Paul A Harris, Brent G Richter, Fran Turisco, Griffin M Weber, James E Tcheng, and Diane Keogh. 2012. “Current State of Information Technologies for the Clinical Research Enterprise across Academic Medical Centers.” Clinical and Translational Science 5 (3): 281-4. https://doi.org/10.1111/j.1752-8062.2011.00387.x.

Information technology (IT) to support clinical research has steadily grown over the past 10 years. Many new applications at the enterprise level are available to assist with the numerous tasks necessary in performing clinical research. However, it is not clear how rapidly this technology is being adopted or whether it is making an impact upon how clinical research is being performed. The Clinical Research Forum's IT Roundtable performed a survey of 17 representative academic medical centers (AMCs) to understand the adoption rate and implementation strategies within this field. The results were compared with similar surveys from 4 and 6 years ago. We found the adoption rate for four prominent areas of IT-supported clinical research had increased remarkably, specifically regulatory compliance, electronic data capture for clinical trials, data repositories for secondary use of clinical data, and infrastructure for supporting collaboration. Adoption of other areas of clinical research IT was more irregular with wider differences between AMCs. These differences appeared to be partially due to a set of openly available applications that have emerged to occupy an important place in the landscape of clinical research enterprise-level support at AMCs.