Information technology (IT) to support clinical research has steadily grown over the past 10 years. Many new applications at the enterprise level are available to assist with the numerous tasks necessary in performing clinical research. However, it is not clear how rapidly this technology is being adopted or whether it is making an impact upon how clinical research is being performed. The Clinical Research Forum's IT Roundtable performed a survey of 17 representative academic medical centers (AMCs) to understand the adoption rate and implementation strategies within this field. The results were compared with similar surveys from 4 and 6 years ago. We found the adoption rate for four prominent areas of IT-supported clinical research had increased remarkably, specifically regulatory compliance, electronic data capture for clinical trials, data repositories for secondary use of clinical data, and infrastructure for supporting collaboration. Adoption of other areas of clinical research IT was more irregular with wider differences between AMCs. These differences appeared to be partially due to a set of openly available applications that have emerged to occupy an important place in the landscape of clinical research enterprise-level support at AMCs.
Publications
2012
2011
Decision support systems have been used to promote the practice of evidence-based medicine. Computer-assisted diagnosis can serve as one element of evidence-based radiology. One area where such tools may provide benefit is analysis of vertebral compression fractures (VCFs), which can be a challenge in MRI interpretation. VCFs may be benign or malignant in etiology, and several MRI features may help to make this important distinction. We describe a web-based decision support system for discriminating benign from malignant VCFs as a prototype for a more general diagnostic decision support framework for radiologists. The system has three components: a feature checklist with an image gallery derived from proven reference cases, a prediction model, and a reporting mechanism. The website allows users to input the findings for a case to be interpreted using a structured feature checklist. The image gallery complements the checklist, for clarity and training purposes. The input from the checklist is then used to calculate the likelihood of malignancy by a logistic regression prediction model. Standardized report text is generated that summarizes pertinent positive and negative findings. This computer-assisted diagnosis system demonstrates the integration of three areas where diagnostic decision support can aid radiologists: first, in image interpretation, through feature checklists and illustrative image galleries; second, in feature-based prediction modeling; and third, in structured reporting. We present a diagnostic decision support tool that provides radiologists with evidence-based guidance for discriminating benign from malignant VCF. This model may be useful in other difficult-diagnosis situations and requires further clinical testing.
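The checklist-to-probability step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the feature names, coefficients, and intercept are hypothetical placeholders chosen for readability and are not taken from the published model.

```python
import math

# Hypothetical checklist features and weights, for illustration only.
# Positive weights push toward malignancy, negative weights toward benign.
COEFFICIENTS = {
    "convex_posterior_border": 1.2,
    "pedicle_involvement": 0.9,
    "epidural_mass": 1.5,
    "fluid_sign": -1.1,
}
INTERCEPT = -1.0  # hypothetical baseline log-odds


def malignancy_probability(findings):
    """Return P(malignant) from a dict mapping feature name to True/False."""
    z = INTERCEPT + sum(
        weight for name, weight in COEFFICIENTS.items() if findings.get(name, False)
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def report_text(findings):
    """Summarize positive and negative findings plus the model estimate."""
    positives = [n for n in COEFFICIENTS if findings.get(n, False)]
    negatives = [n for n in COEFFICIENTS if not findings.get(n, False)]
    return (
        "Positive findings: " + (", ".join(positives) or "none") + ". "
        "Negative findings: " + (", ".join(negatives) or "none") + ". "
        "Estimated likelihood of malignancy: "
        + format(malignancy_probability(findings), ".0%") + "."
    )


if __name__ == "__main__":
    case = {"pedicle_involvement": True, "epidural_mass": True}
    print(report_text(case))
```

A real deployment would estimate the coefficients from proven reference cases; the sketch only shows how a structured checklist can feed both the prediction and the report text.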
Research-networking tools use data-mining and social networking to enable expertise discovery, matchmaking and collaboration, which are important facets of team science and translational research. Several commercial and academic platforms have been built, and many institutions have deployed these products to help their investigators find local collaborators. Recent studies, though, have shown the growing importance of multiuniversity teams in science. Unfortunately, the lack of a standard data-exchange model and the resistance of universities to sharing information about their faculty have presented barriers to forming an institutionally supported national network. This case report describes an initiative that, in only 6 months, achieved interoperability among seven major research-networking products at 28 universities by taking an approach that focused on addressing institutional concerns and encouraging their participation. With this necessary groundwork in place, the second phase of this effort can begin, which will expand the network's functionality and focus on the end users.
2010
BioNumbers (http://www.bionumbers.hms.harvard.edu) is a database of key numbers in molecular and cell biology: the quantitative properties of biological systems of interest to computational, systems and molecular cell biologists. Contents of the database range from cell sizes to metabolite concentrations, from reaction rates to generation times, from genome sizes to the number of mitochondria in a cell. While always of importance to biologists, having numbers in hand is becoming increasingly critical for experimenting, modeling, and analyzing biological systems. BioNumbers was motivated by an appreciation of how long it can take to find even the simplest number in the vast biological literature. All numbers are taken directly from a literature source, and that reference is provided with the number. BioNumbers is designed to be highly searchable: it can be queried by keyword or browsed by menu. BioNumbers is a collaborative community platform where registered users can add content and make comments on existing data. All new entries and commentary are curated to maintain high quality. Here we describe the database characteristics and implementation, demonstrate its use, and discuss future directions for its development.
Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age: a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases ("data marts") can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL: http://www.i2b2.org/software.
2009
The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
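The federated pattern described above, in which one query is sent to every site and only aggregate counts come back, can be sketched as follows. This is an illustrative sketch, not the SHRINE or i2b2 API: the site names, the `make_site` helper, the patient records, and the predicate-based query are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def make_site(patients):
    """Return a hypothetical site endpoint that answers a query with a count only."""
    def count_matching(predicate):
        # Row-level data never leaves this function; only the count is returned.
        return sum(1 for patient in patients if predicate(patient))
    return count_matching


def federated_count(sites, predicate):
    """Send the same query to every site in parallel and aggregate the counts."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = {name: pool.submit(site, predicate) for name, site in sites.items()}
        per_site = {name: future.result() for name, future in futures.items()}
    return per_site, sum(per_site.values())


# Toy, made-up records standing in for three local repositories.
SITES = {
    "hospital_a": make_site([
        {"age": 54, "dx": "T2DM"},
        {"age": 47, "dx": "T2DM"},
        {"age": 63, "dx": "HTN"},
    ]),
    "hospital_b": make_site([
        {"age": 61, "dx": "T2DM"},
        {"age": 70, "dx": "T2DM"},
    ]),
    "hospital_c": make_site([
        {"age": 35, "dx": "asthma"},
    ]),
}

if __name__ == "__main__":
    counts, total = federated_count(
        SITES, lambda p: p["dx"] == "T2DM" and p["age"] >= 50
    )
    print(counts, total)
```

Keeping the count computation inside each site's function mirrors the design constraint in the abstract: each hospital retains control of its data, and the aggregator sees only the number of matching patients.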
2008
OBJECTIVE: To evaluate whether rosiglitazone maleate, an oral peroxisome proliferator-activated receptor gamma agonist and oral insulin sensitizing agent with potential antiangiogenic activity, delays onset of proliferative diabetic retinopathy (PDR).
METHODS: Longitudinal medical record review of all patients treated with rosiglitazone receiving both medical and ophthalmic care at the Joslin Diabetes Center from May 1, 2002, to May 31, 2003 (N = 124), and matched control patients not taking a glitazone drug (N = 158). The mean duration of follow-up was 2.8 years (range, 0.3-9.0 years).
RESULTS: Baseline characteristics and final hemoglobin A1c values (7.6% and 7.8%, respectively) were similar in the rosiglitazone and control groups (P = .10). In eyes with severe nonproliferative diabetic retinopathy at baseline (rosiglitazone group, 14 eyes; control group, 24 eyes), progression to PDR over 3 years occurred in 19.2% in the rosiglitazone group and 47.4% in the control group, representing a 59% relative risk reduction (Wilcoxon, P = .045; log-rank, P = .059). Fewer eyes in the rosiglitazone group experienced 3 or more lines of visual acuity loss (P = .03). The incidence of diabetic macular edema was similar in both groups.
CONCLUSIONS: Rosiglitazone may delay the onset of PDR, possibly because of its antiangiogenic activity. Future clinical investigations should consider analysis of this potential benefit along with ongoing evaluation of potential cardiac risk in studies where the risk-benefit profiles are deemed appropriate.
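The 59% relative risk reduction reported in the results follows directly from the two 3-year progression rates (19.2% and 47.4%). The short Python check below reproduces that arithmetic; the function name is ours, and the absolute risk reduction is an additional derived quantity that the abstract does not report.

```python
def relative_risk_reduction(risk_treated, risk_control):
    """RRR = 1 - (risk in treated group / risk in control group)."""
    return 1.0 - risk_treated / risk_control


risk_rosiglitazone = 0.192  # progression to PDR over 3 years, from the abstract
risk_control = 0.474        # progression to PDR over 3 years, from the abstract

rrr = relative_risk_reduction(risk_rosiglitazone, risk_control)
arr = risk_control - risk_rosiglitazone  # absolute risk reduction (derived here)

if __name__ == "__main__":
    print(f"Relative risk reduction: {rrr:.1%}")
    print(f"Absolute risk reduction: {arr:.1%}")
```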
2006
Phylogenetic tree reconstruction is a process in which the ancestral relationships among a group of organisms are inferred from their DNA sequences. For all but trivial sized data sets, finding the optimal tree is computationally intractable. Many heuristic algorithms exist, but the branch-swapping algorithm used in the software package PAUP* is the most popular. This method performs a stochastic search over the space of trees, using a branch-swapping operation to construct neighboring trees in the search space. This study introduces a new stochastic search algorithm that operates over an alternative representation of trees, namely as permutations of taxa giving the order in which they are processed during stepwise addition. Experiments on several data sets suggest that this algorithm for generating an initial tree, when followed by branch-swapping, can produce better trees for a given total amount of time.
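The idea of searching over taxon orderings rather than over trees can be illustrated with a generic stochastic local search. This sketch is not the algorithm from the study: the swap move, the acceptance rule, and especially `toy_length` (a crude stand-in that charges each taxon its smallest Hamming distance to any earlier taxon) are assumptions made for illustration. In practice the scoring callback would build a tree by stepwise addition in the given order and return its length.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_length(order, seqs):
    """Stand-in score: each taxon pays its minimum distance to an earlier taxon."""
    return sum(
        min(hamming(seqs[taxon], seqs[prev]) for prev in order[:i])
        for i, taxon in enumerate(order)
        if i > 0
    )


def search_orderings(taxa, score, iterations=1000, seed=0):
    """Hill-climb over permutations of taxa using random pairwise swaps."""
    rng = random.Random(seed)
    best = list(taxa)
    rng.shuffle(best)
    best_score = score(best)
    for _ in range(iterations):
        candidate = best[:]
        i, j = rng.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_score = score(candidate)
        if candidate_score <= best_score:  # accept ties to keep moving on plateaus
            best, best_score = candidate, candidate_score
    return best, best_score


# Made-up sequences for four hypothetical taxa.
SEQS = {
    "taxon_a": "ACGTACGT",
    "taxon_b": "ACGTACGA",
    "taxon_c": "TCGAACGA",
    "taxon_d": "TCGATCGA",
}

if __name__ == "__main__":
    order, length = search_orderings(list(SEQS), lambda o: toy_length(o, SEQS))
    print(order, length)
```

The resulting ordering would then seed a conventional branch-swapping phase, as the abstract describes.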
2004
BACKGROUND: Hormone therapy (HT) provides the most effective relief of menopausal symptoms. This therapy is associated with a decreased risk of osteoporosis and colorectal cancer but increased risks of cardiovascular disease (CVD), venous thrombosis, and breast cancer. Our objective was to identify which women should benefit from short-term HT by exploring the trade-off between symptom relief and risks of inducing disease.
METHODS: A Markov model simulates the effect of short-term (2 years) estrogen and progestin HT on life expectancy and quality-adjusted life expectancy (QALE) among 50-year-old menopausal women with intact uteri, using findings from the Women's Health Initiative. Quality-of-life (QOL) utility scores were derived from the literature. We assumed HT affected QOL only during perimenopause, when it reduced symptoms by 80%.
RESULTS: Among asymptomatic women, short-term HT was associated with net losses in life expectancy and QALE of 1 to 3 months, depending on CVD risk. Women with mild or severe menopausal symptoms gained 3 to 4 months or 7 to 8 months of QALE, respectively. Among women at low risk for CVD, HT extended QALE if menopausal symptoms lowered QOL by as little as 4%. Among women at elevated CVD risk, HT extended QALE only if symptoms lowered QOL by at least 12%.
CONCLUSIONS: Hormone therapy is associated with losses in survival but gains in QALE for women with menopausal symptoms. Women expected to benefit from short-term HT can be identified by the severity of their menopausal symptoms and CVD risk.
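The trade-off explored in this study can be illustrated with a toy three-state Markov cohort model (well, post-event, dead). Every numeric parameter below is a hypothetical placeholder rather than a value from the study, except the 2-year therapy duration and the 80% symptom reduction, which come from the abstract. The sketch only shows the mechanics: HT raises utility while symptoms last but slightly raises event risk while it is taken.

```python
# Hypothetical placeholders, not values from the study, unless noted.
P_EVENT = 0.010        # annual probability of a serious disease event
HT_EXTRA_RISK = 0.001  # added annual event probability while on HT
P_DEATH_WELL = 0.02    # annual mortality in the well state
P_DEATH_SICK = 0.10    # annual mortality after a disease event
UTILITY_SICK = 0.7     # quality-of-life weight after a disease event
SYMPTOM_YEARS = 2      # assumed duration of menopausal symptoms
HT_YEARS = 2           # short-term therapy (from the abstract)
SYMPTOM_RELIEF = 0.8   # HT reduces symptoms by 80% (from the abstract)


def qale(symptom_disutility, on_ht, horizon=50):
    """Quality-adjusted life expectancy in years for a cohort starting well."""
    well, sick, total = 1.0, 0.0, 0.0
    for year in range(horizon):
        treated = on_ht and year < HT_YEARS
        loss = symptom_disutility if year < SYMPTOM_YEARS else 0.0
        if treated:
            loss *= 1.0 - SYMPTOM_RELIEF
        # Accumulate quality-adjusted time for this cycle.
        total += well * (1.0 - loss) + sick * UTILITY_SICK
        # Move the cohort forward one year.
        p_event = P_EVENT + (HT_EXTRA_RISK if treated else 0.0)
        well, sick = (
            well * (1.0 - p_event - P_DEATH_WELL),
            sick * (1.0 - P_DEATH_SICK) + well * p_event,
        )
    return total


def gain_months(symptom_disutility):
    """QALE gained (months) by taking short-term HT versus not taking it."""
    return 12.0 * (qale(symptom_disutility, True) - qale(symptom_disutility, False))


if __name__ == "__main__":
    for disutility in (0.0, 0.04, 0.12):
        print(disutility, round(gain_months(disutility), 2))
```

Under these made-up parameters the sign of the gain depends on symptom severity, which is the qualitative pattern the abstract reports; the magnitudes should not be compared with the published results.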
Analysis of gene expression data obtained from microarrays presents a new set of challenges to machine learning modeling. In this domain, in which the number of variables far exceeds the number of cases, identifying relevant genes or groups of genes that are good markers for a particular classification is as important as achieving good classification performance. Although several machine learning algorithms have been proposed to address the latter, identification of gene markers has not been systematically pursued. In this article, we investigate several algorithms for selecting gene markers for classification. We test these algorithms using logistic regression, as this is a simple and efficient supervised learning algorithm. We demonstrate, using 10 different data sets, that a conditionally univariate algorithm constitutes a viable choice if a researcher is interested in quickly determining a set of gene expression levels that can serve as markers for disease. We show that the classification performance of logistic regression is not very different from that of more sophisticated algorithms that have been applied in previous studies, and that the gene selection in the logistic regression algorithm is reasonable in both cases. Furthermore, the algorithm is simple, its theoretical basis is well established, and our user-friendly implementation is now freely available on the internet, serving as a benchmarking tool for the development of new algorithms.
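A gene-marker selection step followed by logistic regression can be sketched with the standard library alone. The filter below is a plain univariate ranking by standardized difference in class means, swapped in for simplicity; it is not the conditionally univariate algorithm evaluated in the article. The data matrix, function names, learning rate, and step count are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def univariate_scores(X, y):
    """Score each gene by |difference in class means| over summed class spreads."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return scores


def select_genes(X, y, k):
    """Return the column indices of the k highest-scoring genes."""
    scores = univariate_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_logistic(X, y, genes, lr=0.1, steps=500):
    """Fit logistic regression on the standardized selected genes by gradient descent."""
    mu = [mean(row[j] for row in X) for j in genes]
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in genes]

    def features(row):
        return [(row[j] - m) / s for j, m, s in zip(genes, mu, sd)]

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    Z = [features(row) for row in X]
    w, b, n = [0.0] * len(genes), 0.0, len(X)
    for _ in range(steps):
        errors = [
            sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            for z, t in zip(Z, y)
        ]
        w = [wi - lr * sum(e * z[i] for e, z in zip(errors, Z)) / n
             for i, wi in enumerate(w)]
        b -= lr * sum(errors) / n

    def predict_proba(row):
        return sigmoid(b + sum(wi * zi for wi, zi in zip(w, features(row))))

    return predict_proba


# Made-up expression matrix: 8 samples x 3 genes; only gene 1 separates the classes.
X = [
    [0.10, 5.0, 1.0], [0.20, 4.8, 3.0], [0.00, 5.1, 2.0], [0.15, 4.9, 4.0],
    [0.10, 1.0, 1.5], [0.20, 1.2, 3.5], [0.05, 0.9, 2.5], [0.10, 1.1, 3.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    markers = select_genes(X, y, k=1)
    model = fit_logistic(X, y, markers)
    print(markers, [round(model(row), 2) for row in X])
```

With far more genes than samples, as in real microarray data, the selection step should be repeated inside each cross-validation fold so that performance estimates are not inflated by selecting markers on the test samples.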