Publications

2014

Kahlon, Maninder, Leslie Yuan, John Daigre, Eric Meeks, Katie Nelson, Cynthia Piontkowski, Katja Reuter, et al. 2014. “The Use and Significance of a Research Networking System.” Journal of Medical Internet Research 16 (2): e46. https://doi.org/10.2196/jmir.3137.

BACKGROUND: Universities have begun deploying public Internet systems that allow for easy search of their experts, expertise, and intellectual networks. Deployed first in biomedical schools but now being implemented more broadly, the initial motivator of these research networking systems was to enable easier identification of collaborators and enable the development of teams for research.

OBJECTIVE: The intent of the study was to provide the first description of the usage of an institutional research "social networking" system or research networking system (RNS).

METHODS: Number of visits, visitor location and type, referral source, depth of visit, search terms, and click paths were derived from 2.5 years of Web analytics data. Feedback from a pop-up survey presented to users over 15 months was summarized.

RESULTS: RNSs automatically generate and display profiles and networks of researchers. Within 2.5 years, the RNS at the University of California, San Francisco (UCSF) achieved one-seventh of the monthly visit rate of the main longstanding university website, with an increasing trend. Visitors came from diverse locations beyond the institution. Close to 75% (74.78%, 208,304/278,570) came via a public search engine and 84.0% (210 out of a sample of 250) of these queried an individual's name that took them directly to the relevant profile page. In addition, 20.90% (214 of 1024) visits went beyond the page related to a person of interest to explore related researchers and topics through the novel and networked information provided by the tool. At the end of the period analyzed, more than 2000 visits per month traversed 5 or more links into related people and topics. One-third of visits came from returning visitors who were significantly more likely to continue to explore networked people and topics (P<.001). Responses to an online survey suggest a broad range of benefits of using the RNS in supporting the research and clinical mission.

CONCLUSIONS: Returning visitors in an ever-increasing pool of visitors to an RNS are among those that display behavior consistent with using the tool to identify new collaborators or research topics. Through direct user feedback we know that some visits do result in research-enhancing outcomes, although we cannot address the scale of impact. With the rapid pace of acquiring visitors searching for individual names, the RNS is evolving into a new kind of gateway for the university.

Mandl, Kenneth D, Isaac S Kohane, Douglas McFadden, Griffin M Weber, Marc Natter, Joshua Mandel, Sebastian Schneeweiss, et al. 2014. “Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): Architecture.” Journal of the American Medical Informatics Association: JAMIA 21 (4): 615-20. https://doi.org/10.1136/amiajnl-2014-002727.

We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network, which leverages the $48 billion federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients, plugging universally into the point of care, generating evidence and discovery, and thereby enabling clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is development of innovative 'apps' to improve PCOR research methods and capacitate point of care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open source, free, modular components.

Klann, Jeffrey G, Michael D Buck, Jeffrey Brown, Marc Hadley, Richard Elmore, Griffin M Weber, and Shawn N Murphy. 2014. “Query Health: Standards-Based, Cross-Platform Population Health Surveillance.” Journal of the American Medical Informatics Association: JAMIA 21 (4): 650-6. https://doi.org/10.1136/amiajnl-2014-002707.

OBJECTIVE: Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects.

MATERIALS AND METHODS: Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language.

RESULTS: We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed.

DISCUSSION: This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative.

CONCLUSIONS: Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites.

2013

Natter, Marc D, Justin Quan, David M Ortiz, Athos Bousvaros, Norman T Ilowite, Christi J Inman, Keith Marsolo, et al. 2013. “An I2b2-Based, Generalizable, Open Source, Self-Scaling Chronic Disease Registry.” Journal of the American Medical Informatics Association: JAMIA 20 (1): 172-9. https://doi.org/10.1136/amiajnl-2012-001042.

OBJECTIVE: Registries are a well-established mechanism for obtaining high quality, disease-specific data, but are often highly project-specific in their design, implementation, and policies for data use. In contrast to the conventional model of centralized data contribution, warehousing, and control, we design a self-scaling registry technology for collaborative data sharing, based upon the widely adopted Integrating Biology & the Bedside (i2b2) data warehousing framework and the Shared Health Research Information Network (SHRINE) peer-to-peer networking software.

MATERIALS AND METHODS: Focusing our design around creation of a scalable solution for collaboration within multi-site disease registries, we leverage the i2b2 and SHRINE open source software to create a modular, ontology-based, federated infrastructure that provides research investigators full ownership and access to their contributed data while supporting permissioned yet robust data sharing. We accomplish these objectives via web services supporting peer-group overlays, group-aware data aggregation, and administrative functions.

RESULTS: The 56-site Childhood Arthritis & Rheumatology Research Alliance (CARRA) Registry and 3-site Harvard Inflammatory Bowel Diseases Longitudinal Data Repository now utilize i2b2 self-scaling registry technology (i2b2-SSR). This platform, extensible to federation of multiple projects within and between research networks, encompasses >6000 subjects at sites throughout the USA.

DISCUSSION: We utilize the i2b2-SSR platform to minimize technical barriers to collaboration while enabling fine-grained control over data sharing.

CONCLUSIONS: The implementation of i2b2-SSR for the multi-site, multi-stakeholder CARRA Registry has established a digital infrastructure for community-driven research data sharing in pediatric rheumatology in the USA. We envision i2b2-SSR as a scalable, reusable solution facilitating interdisciplinary research across diseases.

Weber, Griffin M. 2013. “How Many Patients Are ‘normal’? Only 1.55%.” AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science 2013: 79.

When conducting a clinical study, a "normal" control population is often desired. Identifying normal patients in a clinical data repository (CDR) can be challenging because healthy patients do not go to the hospital; and, because patients receive care from multiple hospitals, the absence of a diagnosis in one hospital's electronic health record does not mean a patient does not have the disease. We define a set of 10 simple heuristic filters to eliminate patients who would seemingly be poor candidates for a normal control (e.g., chronic conditions, rare diseases, no recent data, etc.). Surprisingly, out of 2,019,774 patients at two large academic hospitals, these filters excluded all but 31,352 (1.55%). This illustrates how difficult it can be to identify control cohorts, and it raises questions of what it truly means to be normal.
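The filtering step can be illustrated with a minimal sketch. The three filters, the patient fields, and the two-year recency threshold below are hypothetical stand-ins, since the abstract does not enumerate the 10 filters; only the patient counts in the final calculation come from the abstract.

```python
def is_normal_candidate(patient, exclusion_filters):
    """A patient stays in the control pool only if no exclusion filter fires."""
    return not any(excludes(patient) for excludes in exclusion_filters)


# Hypothetical filters loosely modeled on the examples named in the abstract
# (chronic conditions, rare diseases, no recent data). Not the paper's actual rules.
exclusion_filters = [
    lambda p: p["has_chronic_condition"],
    lambda p: p["has_rare_disease"],
    lambda p: p["years_since_last_visit"] > 2,  # assumed threshold
]

# Toy records, invented for illustration only.
patients = [
    {"id": 1, "has_chronic_condition": True, "has_rare_disease": False, "years_since_last_visit": 0},
    {"id": 2, "has_chronic_condition": False, "has_rare_disease": False, "years_since_last_visit": 5},
    {"id": 3, "has_chronic_condition": False, "has_rare_disease": False, "years_since_last_visit": 1},
]
remaining = [p for p in patients if is_normal_candidate(p, exclusion_filters)]

# Reported counts from the abstract: 31,352 of 2,019,774 patients remain.
total_patients = 2_019_774
remaining_patients = 31_352
percent_remaining = 100 * remaining_patients / total_patients
print(f"{percent_remaining:.2f}% of patients pass every filter")
```

Because each filter is an independent predicate, adding or removing a rule changes only the list, which makes it easy to see how quickly successive exclusions shrink the pool.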

McMurry, Andrew J, Shawn N Murphy, Douglas MacFadden, Griffin Weber, William W Simons, John Orechia, Jonathan Bickel, et al. 2013. “SHRINE: Enabling Nationally Scalable Multi-Site Disease Studies.” PloS One 8 (3): e55811. https://doi.org/10.1371/journal.pone.0055811.

Results of medical research studies are often contradictory or cannot be reproduced. One reason is that there may not be enough patient subjects available for observation for a long enough time period. Another reason is that patient populations may vary considerably with respect to geographic and demographic boundaries thus limiting how broadly the results apply. Even when similar patient populations are pooled together from multiple locations, differences in medical treatment and record systems can limit which outcome measures can be commonly analyzed. In total, these differences in medical research settings can lead to differing conclusions or can even prevent some studies from starting. We thus sought to create a patient research system that could aggregate as many patient observations as possible from a large number of hospitals in a uniform way. We call this system the 'Shared Health Research Information Network', with the following properties: (1) reuse electronic health data from everyday clinical care for research purposes, (2) respect patient privacy and hospital autonomy, (3) aggregate patient populations across many hospitals to achieve statistically significant sample sizes that can be validated independently of a single research setting, (4) harmonize the observation facts recorded at each institution such that queries can be made across many hospitals in parallel, (5) scale to regional and national collaborations. The purpose of this report is to provide open source software for multi-site clinical studies and to report on early uses of this application. At this time SHRINE implementations have been used for multi-site studies of autism co-morbidity, juvenile idiopathic arthritis, peripartum cardiomyopathy, colorectal cancer, diabetes, and others. The wide range of study objectives and growing adoption suggest that SHRINE may be applicable beyond the research uses and participating hospitals named in this report.

Weber, Griffin M. 2013. “Identifying Translational Science Within the Triangle of Biomedicine.” Journal of Translational Medicine 11: 126. https://doi.org/10.1186/1479-5876-11-126.

BACKGROUND: The National Institutes of Health (NIH) Roadmap places special emphasis on "bench-to-bedside" research, or the "translation" of basic science research into practical clinical applications. The Clinical and Translational Science Awards (CTSA) Consortium is one example of the large investments being made to develop a national infrastructure to support translational science, which involves reducing regulatory burdens, launching new educational initiatives, and forming partnerships between academia and industry. However, while numerous definitions have been suggested for translational science, including the qualitative T1-T4 classification, a consensus has not yet been reached. This makes it challenging to track the impact of these major policy changes.

METHODS: In this study, we use a bibliometric approach to map PubMed articles onto a graph, called the Triangle of Biomedicine. The corners of the triangle represent research related to animals, cells and molecules, and humans; and, the position of a publication on the graph is based on its topics, as determined by its Medical Subject Headings (MeSH). We define translation as movement of a collection of articles, or the articles that cite those articles, towards the human corner.
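One way to picture the mapping is the sketch below. It assumes, purely for illustration, that the corners sit at the vertices of an equilateral triangle, that an article is placed at the average of the corners matching its topic categories, and that a collection is placed at the average of its articles. The coordinates and category labels are hypothetical, and the step that derives categories from MeSH terms is not shown.

```python
import math

# Assumed corner coordinates (an equilateral triangle with the human corner on top).
CORNERS = {
    "animal": (0.0, 0.0),
    "cell_molecule": (1.0, 0.0),
    "human": (0.5, math.sqrt(3) / 2),
}


def article_position(categories):
    """Average the corners for the categories an article's topics fall into."""
    points = [CORNERS[c] for c in categories]
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    return (x, y)


def collection_position(articles):
    """Average the positions of all articles that have at least one category."""
    positions = [article_position(cats) for cats in articles if cats]
    x = sum(p[0] for p in positions) / len(positions)
    y = sum(p[1] for p in positions) / len(positions)
    return (x, y)


# Toy collections: later work sits closer to the human corner.
early_work = [{"animal"}, {"cell_molecule"}, {"animal", "cell_molecule"}]
later_work = [{"human"}, {"animal", "human"}, {"human"}]

early = collection_position(early_work)
later = collection_position(later_work)
moved_toward_human = math.dist(later, CORNERS["human"]) < math.dist(early, CORNERS["human"])
print(early, later, moved_toward_human)
```

Under these assumptions, translation shows up as a shrinking distance between a collection's position and the human corner.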

RESULTS: The Triangle of Biomedicine provides a quantitative way of determining if an individual scientist, research organization, funding agency, or scientific field is producing results that are relevant to clinical medicine. We validate our technique using examples that have been previously described in the literature and by comparing it to prior methods of measuring translational science.

CONCLUSIONS: The Triangle of Biomedicine is a novel way to identify translational science and track changes over time. This is important to policy makers in evaluating the impact of the large investments being made to accelerate translation. The Triangle of Biomedicine also provides a simple visual way of depicting this impact, which can be far more powerful than numbers alone.

Weber, Griffin M. 2013. “Federated Queries of Clinical Data Repositories: The Sum of the Parts Does Not Equal the Whole.” Journal of the American Medical Informatics Association: JAMIA 20 (e1): e155-61. https://doi.org/10.1136/amiajnl-2012-001299.

BACKGROUND AND OBJECTIVE: In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors.

METHODS: We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user.
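The partitioning idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of demographic fields, the normalization, and the use of unsalted SHA-256 are all assumptions, and a real deployment would need a privacy-reviewed hashing scheme.

```python
import hashlib
from collections import defaultdict


def demographic_hash(name, birth_date, sex):
    """One-way hash of normalized demographics (field choice is an assumption)."""
    key = "|".join(part.strip().lower() for part in (name, birth_date, sex))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def partition_by_hospital_subset(hospital_patients):
    """Group patient hashes by the exact set of hospitals where they appear."""
    seen_at = defaultdict(set)
    for hospital, patients in hospital_patients.items():
        for patient in patients:
            seen_at[demographic_hash(*patient)].add(hospital)

    partitions = defaultdict(set)
    for patient_hash, hospitals in seen_at.items():
        partitions[frozenset(hospitals)].add(patient_hash)
    return dict(partitions)


# Toy data, invented for illustration: one patient has records at both hospitals.
hospital_patients = {
    "A": [("Ann Lee", "1980-01-02", "F"), ("Bob Ray", "1975-05-06", "M")],
    "B": [("ann lee ", "1980-01-02", "f"), ("Cy Doe", "1990-09-09", "M")],
}
partitions = partition_by_hospital_subset(hospital_patients)
sizes = {tuple(sorted(subset)): len(hashes) for subset, hashes in partitions.items()}
print(sizes)
```

Every patient in a partition has data at the same subset of hospitals, so a query run per partition returns counts whose overlap is known rather than estimated.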

RESULTS: Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query.

CONCLUSIONS: Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.

Weber, Griffin M, and Isaac S Kohane. 2013. “Extracting Physician Group Intelligence from Electronic Health Records to Support Evidence Based Medicine.” PloS One 8 (5): e64933. https://doi.org/10.1371/journal.pone.0064933.

Evidence-based medicine employs expert opinion and clinical data to inform clinical decision making. The objective of this study is to determine whether it is possible to complement these sources of evidence with information about physician "group intelligence" that exists in electronic health records. Specifically, we measured laboratory test "repeat intervals", defined as the amount of time it takes for a physician to repeat a test that was previously ordered for the same patient. Our assumption is that while the result of a test is a direct measure of one marker of a patient's health, the physician's decision to order the test is based on multiple factors including past experience, available treatment options, and information about the patient that might not be coded in the electronic health record. By examining repeat intervals in aggregate over large numbers of patients, we show that it is possible to 1) determine what laboratory test results physicians consider "normal", 2) identify subpopulations of patients that deviate from the norm, and 3) identify situations where laboratory tests are over-ordered. We used laboratory tests as just one example of how physician group intelligence can be used to support evidence based medicine in a way that is automated and continually updated.
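A repeat interval as defined above can be computed with a short sketch. The record layout (patient identifier, test name, order date) and the sample orders are assumptions made for illustration, and the sketch ignores which physician placed each order.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def repeat_intervals(orders):
    """Days between consecutive orders of the same test for the same patient."""
    dates_by_key = defaultdict(list)
    for patient_id, test_name, order_date in orders:
        dates_by_key[(patient_id, test_name)].append(order_date)

    intervals = defaultdict(list)
    for (_, test_name), dates in dates_by_key.items():
        dates.sort()
        for earlier, later in zip(dates, dates[1:]):
            intervals[test_name].append((later - earlier).days)
    return dict(intervals)


# Toy orders, invented for illustration only.
orders = [
    ("p1", "HbA1c", date(2012, 1, 1)),
    ("p1", "HbA1c", date(2012, 4, 1)),
    ("p1", "HbA1c", date(2012, 10, 1)),
    ("p2", "HbA1c", date(2012, 3, 1)),
    ("p2", "HbA1c", date(2012, 3, 31)),
]
intervals = repeat_intervals(orders)
median_interval = median(intervals["HbA1c"])
print(sorted(intervals["HbA1c"]), median_interval)
```

Aggregating these intervals by test, and further by the preceding result value, is what would allow the kind of population-level comparison the abstract describes.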

2012

Kohane, Isaac S, Andrew McMurry, Griffin Weber, Douglas MacFadden, Leonard Rappaport, Louis Kunkel, Jonathan Bickel, et al. 2012. “The Co-Morbidity Burden of Children and Young Adults With Autism Spectrum Disorders.” PloS One 7 (4): e33224. https://doi.org/10.1371/journal.pone.0033224.

OBJECTIVES: Use electronic health records to assess the comorbidity burden of Autism Spectrum Disorder (ASD) in children and young adults.

STUDY DESIGN: A retrospective prevalence study was performed using a distributed query system across three general hospitals and one pediatric hospital. Over 14,000 individuals under age 35 with ASD were characterized by their co-morbidities and conversely, the prevalence of ASD within these comorbidities was measured. The comorbidity prevalence of the younger (Age<18 years) and older (Age 18-34 years) individuals with ASD was compared.

RESULTS: 19.44% of ASD patients had epilepsy as compared to 2.19% in the overall hospital population (95% confidence interval for difference in percentages 13.58-14.69%), 2.43% of ASD with schizophrenia vs. 0.24% in the hospital population (95% CI 1.89-2.39%), inflammatory bowel disease (IBD) 0.83% vs. 0.54% (95% CI 0.13-0.43%), bowel disorders (without IBD) 11.74% vs. 4.5% (95% CI 5.72-6.68%), CNS/cranial anomalies 12.45% vs. 1.19% (95% CI 9.41-10.38%), diabetes mellitus type I (DM1) 0.79% vs. 0.34% (95% CI 0.3-0.6%), muscular dystrophy 0.47% vs 0.05% (95% CI 0.26-0.49%), sleep disorders 1.12% vs. 0.14% (95% CI 0.79-1.14%). Autoimmune disorders (excluding DM1 and IBD) were not significantly different at 0.67% vs. 0.68% (95% CI -0.14-0.13%). Three of the studied comorbidities increased significantly when comparing ages 0-17 vs 18-34 with p<0.001: Schizophrenia (1.43% vs. 8.76%), diabetes mellitus type I (0.67% vs. 2.08%), IBD (0.68% vs. 1.99%) whereas sleeping disorders, bowel disorders (without IBD) and epilepsy did not change significantly.

CONCLUSIONS: The comorbidities of ASD encompass disease states that are significantly overrepresented in ASD with respect to even the patient populations of tertiary health centers. This burden of comorbidities goes well beyond those routinely managed in developmental medicine centers and requires broad multidisciplinary management that payors and providers will have to plan for.