Publications by Year: 2013


Natter, Marc D, Justin Quan, David M Ortiz, Athos Bousvaros, Norman T Ilowite, Christi J Inman, Keith Marsolo, et al. 2013. “An i2b2-Based, Generalizable, Open Source, Self-Scaling Chronic Disease Registry.” Journal of the American Medical Informatics Association: JAMIA 20 (1): 172-9. https://doi.org/10.1136/amiajnl-2012-001042.

OBJECTIVE: Registries are a well-established mechanism for obtaining high quality, disease-specific data, but are often highly project-specific in their design, implementation, and policies for data use. In contrast to the conventional model of centralized data contribution, warehousing, and control, we design a self-scaling registry technology for collaborative data sharing, based upon the widely adopted Integrating Biology & the Bedside (i2b2) data warehousing framework and the Shared Health Research Information Network (SHRINE) peer-to-peer networking software.

MATERIALS AND METHODS: Focusing our design around creation of a scalable solution for collaboration within multi-site disease registries, we leverage the i2b2 and SHRINE open source software to create a modular, ontology-based, federated infrastructure that provides research investigators full ownership and access to their contributed data while supporting permissioned yet robust data sharing. We accomplish these objectives via web services supporting peer-group overlays, group-aware data aggregation, and administrative functions.

RESULTS: The 56-site Childhood Arthritis & Rheumatology Research Alliance (CARRA) Registry and 3-site Harvard Inflammatory Bowel Diseases Longitudinal Data Repository now utilize i2b2 self-scaling registry technology (i2b2-SSR). This platform, extensible to federation of multiple projects within and between research networks, encompasses >6000 subjects at sites throughout the USA.

DISCUSSION: We utilize the i2b2-SSR platform to minimize technical barriers to collaboration while enabling fine-grained control over data sharing.

CONCLUSIONS: The implementation of i2b2-SSR for the multi-site, multi-stakeholder CARRA Registry has established a digital infrastructure for community-driven research data sharing in pediatric rheumatology in the USA. We envision i2b2-SSR as a scalable, reusable solution facilitating interdisciplinary research across diseases.

Weber, Griffin M. 2013. “How Many Patients Are ‘normal’? Only 1.55%.” AMIA Joint Summits on Translational Science Proceedings 2013: 79.

When conducting a clinical study, a "normal" control population is often desired. Identifying normal patients in a clinical data repository (CDR) can be challenging because healthy patients do not go to the hospital; and because patients receive care from multiple hospitals, the absence of a diagnosis in one hospital's electronic health record does not mean a patient does not have the disease. We define a set of 10 simple heuristic filters to eliminate patients who would seemingly be poor candidates for a normal control (e.g., chronic conditions, rare diseases, no recent data, etc.). Surprisingly, out of 2,019,774 patients at two large academic hospitals, these filters excluded all but 31,352 (1.55%). This illustrates how difficult it can be to identify control cohorts, and it raises questions of what it truly means to be normal.
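
The filters themselves are not reproduced in the abstract; the short Python sketch below only illustrates the general idea of heuristic exclusion filters of the kind described (chronic conditions, rare diseases, no recent data). The field names, diagnosis codes, and thresholds are hypothetical, not the paper's actual criteria.

from datetime import date, timedelta

# Hypothetical patient records: diagnosis codes plus date of last encounter.
patients = [
    {"id": 1, "dx": {"I10", "E11.9"}, "last_visit": date(2012, 6, 1)},   # chronic dx
    {"id": 2, "dx": {"J06.9"},        "last_visit": date(2012, 11, 3)},  # acute only
    {"id": 3, "dx": set(),            "last_visit": date(2008, 1, 15)},  # no recent data
]

CHRONIC_DX = {"I10", "E11.9", "N18.3"}    # illustrative chronic-condition codes
RARE_DX = {"E75.22"}                      # illustrative rare-disease codes
RECENT_CUTOFF = date(2012, 12, 31) - timedelta(days=365)

# Each filter returns True if the patient should be EXCLUDED from the control pool.
filters = [
    lambda p: bool(p["dx"] & CHRONIC_DX),      # has a chronic condition
    lambda p: bool(p["dx"] & RARE_DX),         # has a rare disease
    lambda p: p["last_visit"] < RECENT_CUTOFF, # no data in the past year
]

controls = [p for p in patients if not any(f(p) for f in filters)]
print([p["id"] for p in controls])  # -> [2]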

McMurry, Andrew J, Shawn N Murphy, Douglas MacFadden, Griffin Weber, William W Simons, John Orechia, Jonathan Bickel, et al. 2013. “SHRINE: Enabling Nationally Scalable Multi-Site Disease Studies.” PLoS ONE 8 (3): e55811. https://doi.org/10.1371/journal.pone.0055811.

Results of medical research studies are often contradictory or cannot be reproduced. One reason is that there may not be enough patient subjects available for observation for a long enough time period. Another reason is that patient populations may vary considerably with respect to geographic and demographic boundaries, thus limiting how broadly the results apply. Even when similar patient populations are pooled together from multiple locations, differences in medical treatment and record systems can limit which outcome measures can be commonly analyzed. In total, these differences in medical research settings can lead to differing conclusions or can even prevent some studies from starting. We thus sought to create a patient research system that could aggregate as many patient observations as possible from a large number of hospitals in a uniform way. We call this system the 'Shared Health Research Information Network', with the following properties: (1) reuse electronic health data from everyday clinical care for research purposes, (2) respect patient privacy and hospital autonomy, (3) aggregate patient populations across many hospitals to achieve statistically significant sample sizes that can be validated independently of a single research setting, (4) harmonize the observation facts recorded at each institution such that queries can be made across many hospitals in parallel, (5) scale to regional and national collaborations. The purpose of this report is to provide open source software for multi-site clinical studies and to report on early uses of this application. At this time SHRINE implementations have been used for multi-site studies of autism co-morbidity, juvenile idiopathic arthritis, peripartum cardiomyopathy, colorectal cancer, diabetes, and others. The wide range of study objectives and growing adoption suggest that SHRINE may be applicable beyond the research uses and participating hospitals named in this report.
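
SHRINE itself is an i2b2-based web-service stack, but the aggregation idea behind properties (3)-(5) can be sketched compactly: the same query is fanned out to every participating site, each site answers with only an aggregate count, and the counts are summed. The Python below is an illustrative simplification with made-up site data, not the SHRINE API.

from concurrent.futures import ThreadPoolExecutor

# Stand-ins for per-hospital repositories; in a SHRINE-style network each site
# evaluates the query locally and returns only an aggregate patient count.
SITE_DATA = {
    "hospital_a": [{"id": 1, "dx": {"K50.90"}}, {"id": 2, "dx": {"M08.00"}}],
    "hospital_b": [{"id": 7, "dx": {"K50.90"}}, {"id": 8, "dx": {"I42.8"}}],
}

def site_count(site: str, dx_code: str) -> int:
    """Evaluate the query at one site and return only the aggregate count."""
    return sum(1 for p in SITE_DATA[site] if dx_code in p["dx"])

def federated_count(dx_code: str) -> int:
    """Fan the query out to all sites in parallel and sum their counts."""
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda s: site_count(s, dx_code), SITE_DATA)
        return sum(counts)

print(federated_count("K50.90"))  # -> 2 (one match at each site)

As the federated-query paper below notes, this simple sum over-counts patients who receive care at more than one hospital.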

Weber, Griffin M. 2013. “Federated Queries of Clinical Data Repositories: The Sum of the Parts Does Not Equal the Whole.” Journal of the American Medical Informatics Association: JAMIA 20 (e1): e155-61. https://doi.org/10.1136/amiajnl-2012-001299.

BACKGROUND AND OBJECTIVE: In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors.

METHODS: We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user.
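
As a rough illustration of this linkage step, the Python sketch below hashes normalized demographics with SHA-256 and groups patients by the exact subset of sites holding their data. The demographic fields, normalization, and hash choice are assumptions made for illustration, not the production SHRINE implementation.

import hashlib
from collections import defaultdict

def demographic_hash(first: str, last: str, dob: str) -> str:
    """One-way hash of normalized demographics; no identifiers leave the site."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical per-site rosters of (first, last, dob); the same person appears
# at hospital_a and hospital_b.
SITE_PATIENTS = {
    "hospital_a": [("ann", "lee", "1970-01-02"), ("bo", "chen", "1985-03-04")],
    "hospital_b": [("ann", "lee", "1970-01-02")],
    "hospital_c": [("cara", "diaz", "1990-05-06")],
}

# One-time linkage: map each hash to the set of sites that hold that patient.
sites_by_hash = defaultdict(set)
for site, roster in SITE_PATIENTS.items():
    for person in roster:
        sites_by_hash[demographic_hash(*person)].add(site)

# Partition patients by the exact subset of hospitals where they have data;
# federated queries can then run per partition and be combined without
# double counting the same patient.
partitions = defaultdict(int)
for sites in sites_by_hash.values():
    partitions[frozenset(sites)] += 1

for subset, n in partitions.items():
    print(sorted(subset), n)
# -> ['hospital_a', 'hospital_b'] 1
#    ['hospital_a'] 1
#    ['hospital_c'] 1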

RESULTS: Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query.

CONCLUSIONS: Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.

Weber, Griffin M, and Isaac S Kohane. 2013. “Extracting Physician Group Intelligence from Electronic Health Records to Support Evidence Based Medicine.” PLoS ONE 8 (5): e64933. https://doi.org/10.1371/journal.pone.0064933.

Evidence-based medicine employs expert opinion and clinical data to inform clinical decision making. The objective of this study is to determine whether it is possible to complement these sources of evidence with information about physician "group intelligence" that exists in electronic health records. Specifically, we measured laboratory test "repeat intervals", defined as the amount of time it takes for a physician to repeat a test that was previously ordered for the same patient. Our assumption is that while the result of a test is a direct measure of one marker of a patient's health, the physician's decision to order the test is based on multiple factors including past experience, available treatment options, and information about the patient that might not be coded in the electronic health record. By examining repeat intervals in aggregate over large numbers of patients, we show that it is possible to 1) determine what laboratory test results physicians consider "normal", 2) identify subpopulations of patients that deviate from the norm, and 3) identify situations where laboratory tests are over-ordered. We used laboratory tests as just one example of how physician group intelligence can be used to support evidence-based medicine in a way that is automated and continually updated.
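
A minimal sketch of the repeat-interval computation, assuming a simple table of (patient, test, order date) records; the schema and example values are hypothetical, but the calculation (days between consecutive orders of the same test for the same patient, aggregated across patients) follows the definition above.

from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical lab-order records: (patient_id, test_name, order_date).
orders = [
    (1, "HbA1c", date(2012, 1, 10)),
    (1, "HbA1c", date(2012, 4, 15)),
    (2, "HbA1c", date(2012, 2, 1)),
    (2, "HbA1c", date(2012, 2, 20)),   # repeated unusually soon
    (2, "HbA1c", date(2012, 8, 25)),
]

# Group orders by (patient, test) and compute days between consecutive orders.
by_patient_test = defaultdict(list)
for pid, test, day in orders:
    by_patient_test[(pid, test)].append(day)

intervals = defaultdict(list)
for (pid, test), days in by_patient_test.items():
    days.sort()
    intervals[test].extend((b - a).days for a, b in zip(days, days[1:]))

# Aggregated over many patients, the distribution of repeat intervals reflects
# physicians' collective judgment about how often a test is worth repeating.
for test, gaps in intervals.items():
    print(test, "median repeat interval (days):", median(gaps))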

Weber, Griffin M. 2013. “Identifying Translational Science Within the Triangle of Biomedicine.” Journal of Translational Medicine 11: 126. https://doi.org/10.1186/1479-5876-11-126.

BACKGROUND: The National Institutes of Health (NIH) Roadmap places special emphasis on "bench-to-bedside" research, or the "translation" of basic science research into practical clinical applications. The Clinical and Translational Science Awards (CTSA) Consortium is one example of the large investments being made to develop a national infrastructure to support translational science, which involves reducing regulatory burdens, launching new educational initiatives, and forming partnerships between academia and industry. However, while numerous definitions have been suggested for translational science, including the qualitative T1-T4 classification, a consensus has not yet been reached. This makes it challenging to track the impact of these major policy changes.

METHODS: In this study, we use a bibliometric approach to map PubMed articles onto a graph, called the Triangle of Biomedicine. The corners of the triangle represent research related to animals, cells and molecules, and humans; and the position of a publication on the graph is based on its topics, as determined by its Medical Subject Headings (MeSH). We define translation as movement of a collection of articles, or the articles that cite those articles, towards the human corner.
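
One way to picture this mapping: treat the three corners as fixed coordinates and place each article at the average of the corner positions, weighted by how many of its MeSH headings fall into each category. The Python sketch below does exactly that with a toy MeSH-to-corner mapping; the coordinates and mapping are illustrative assumptions, not the paper's exact category definitions.

# A minimal sketch of placing one article on the Triangle of Biomedicine.
CORNERS = {
    "animal": (0.0, 0.0),
    "cell_molecule": (1.0, 0.0),
    "human": (0.5, 0.866),  # apex of an equilateral triangle
}

MESH_TO_CORNER = {                     # toy mapping of MeSH headings to corners
    "Mice": "animal",
    "Disease Models, Animal": "animal",
    "Cell Line": "cell_molecule",
    "Proteins": "cell_molecule",
    "Humans": "human",
    "Clinical Trials as Topic": "human",
}

def triangle_position(mesh_terms):
    """Average of corner coordinates, weighted by MeSH category counts."""
    counts = {corner: 0 for corner in CORNERS}
    for term in mesh_terms:
        corner = MESH_TO_CORNER.get(term)
        if corner:
            counts[corner] += 1
    total = sum(counts.values()) or 1
    x = sum(CORNERS[c][0] * n for c, n in counts.items()) / total
    y = sum(CORNERS[c][1] * n for c, n in counts.items()) / total
    return x, y

# An article indexed with both animal-model and human headings lands between corners.
print(triangle_position(["Mice", "Disease Models, Animal", "Humans"]))
# -> (0.166..., 0.288...)

Tracking how these positions drift towards the human corner over time, for a set of articles or their citing articles, is the paper's operational definition of translation.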

RESULTS: The Triangle of Biomedicine provides a quantitative way of determining if an individual scientist, research organization, funding agency, or scientific field is producing results that are relevant to clinical medicine. We validate our technique using examples that have been previously described in the literature and by comparing it to prior methods of measuring translational science.

CONCLUSIONS: The Triangle of Biomedicine is a novel way to identify translational science and track changes over time. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. The Triangle of Biomedicine also provides a simple visual way of depicting this impact, which can be far more powerful than numbers alone.