Publications

2003

Xu, Yaohui, Nicole Stange-Thomann, Griffin Weber, Ronghai Bo, Sheila Dodge, Robert G David, Karen Foley, et al. 2003. “Pathogen Discovery from Human Tissue by Sequence-Based Computational Subtraction.” Genomics 81 (3): 329-35.

We have recently reported a new pathogen discovery approach, "computational subtraction". With this approach, non-human transcripts are detected by sequencing cDNA libraries from infected tissue and eliminating those transcripts that match the human genome. We show now that this method is experimentally feasible. We generated a cDNA library from a tissue sample of post-transplant lymphoproliferative disorder (PTLD). 27,840 independent cDNA sequences were filtered by computational subtraction against the known human sequence to identify 32 nonmatching transcripts. Of these, 22 (0.1%) were found to be amplifiable from both infected and noninfected samples and were inferred to be human DNA not yet contained in the available human genome sequence. The remaining 10 sequences could be amplified only from Epstein-Barr virus (EBV)-infected tissues. All 10 corresponded to the known EBV sequence. This proof-of-principle experiment demonstrates that computational subtraction can detect pathogenic microbes in primary human-diseased tissue.
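The subtraction step described in this abstract can be illustrated with a toy sketch. It assumes that exact k-mer matching against a short reference string stands in for matching against the human genome; the abstract does not say which matching tool or thresholds were used, so the names `kmers` and `subtract` and the parameters `k` and `min_match` are hypothetical. Reverse-complement matches are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract(reads, reference, k=8, min_match=0.5):
    """Keep reads whose k-mers mostly do NOT occur in the reference.

    Toy stand-in for computational subtraction: a read is 'subtracted'
    (discarded) when at least min_match of its k-mers are found in the
    reference. k and min_match are illustrative assumptions.
    """
    ref_kmers = kmers(reference, k)
    unmatched = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: cannot be classified
        fraction = len(read_kmers & ref_kmers) / len(read_kmers)
        if fraction < min_match:
            unmatched.append(read)
    return unmatched


# Hypothetical data: one read copied from the reference, one unrelated read.
reference = "ACGTACGTTAGCCGATAGGCTAACGT"
host_read = reference[5:19]
foreign_read = "TTTTTTTTGGGGGGGGCCCC"
candidates = subtract([host_read, foreign_read], reference)
```

In this sketch `candidates` holds only the unrelated read. As the abstract notes, nonmatching sequences still need experimental follow-up, because some turn out to be human DNA missing from the reference.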

2002

Weber, Griffin, Staal Vinterbo, and Lucila Ohno-Machado. 2002. “Building an Asynchronous Web-Based Tool for Machine Learning Classification.” Proceedings. AMIA Symposium, 869-73.

Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

Michaelson, James, Sameer Satija, Richard Moore, Griffin Weber, Elkan Halpern, Andrew Garland, Dhruv Puri, and Daniel B Kopans. 2002. “The Pattern of Breast Cancer Screening Utilization and Its Consequences.” Cancer 94 (1): 37-43.

BACKGROUND: The objective of this study was to describe the pattern of screening utilization and its consequences in terms of tumor size and time of tumor appearance of invasive breast carcinoma among a population of women who were examined at a large service screening/diagnostic program over the last decade.

METHODS: Utilization of mammography was assessed from a population of 59,899 women who received 196,891 mammograms at the Massachusetts General Hospital Breast Imaging Division from January 1, 1990 to March 1, 1999, among which 604 invasive breast tumors were found. Two hundred six invasive, clinically detected tumors also were seen during this period among women who had no record of a previous mammogram. Additional information was available on screening of women from March 1, 1999 to June 1, 2001.

RESULTS: Fifty percent of the women who used screening did not begin until the age of 50 years, although 25% of the invasive breast tumors were found in women age < 50 years. Relatively few of the women who used screening returned promptly for their annual examinations; by 1.5 years, only 50% had returned. Approximately 25% of the invasive breast tumors were found in women for whom there was no record of a previous screening mammogram, and these tumors were larger (median, 15 mm) than the screen-detected tumors (median, 10 mm). Approximately 30% of the 604 invasive breast tumors in the screening population were found on nonmammographic grounds, and they also were larger (median, 15 mm) than the screen-detected tumors (median, 10 mm). However, only 3% of these 604 tumors were found by nonmammographic criteria within 6 months of the previous negative examination, and only 12% were found within 1 year. By back calculating the likely size of each of these tumors at the time of the negative mammogram, it could be seen that most tumors probably emerged as larger, palpable masses not because they were missed at the previous negative mammogram (most were too small then to have been detected) but because too much time had been allowed to pass.

CONCLUSIONS: Far too many women did not comply with the American Cancer Society recommendation of prompt annual screening from the age of 40 years. Consequently, almost 50% of the invasive tumors emerged as larger and, thus, potentially more lethal, palpable masses.

Michaelson, James S, Melvin Silverstein, John Wyatt, Griffin Weber, Richard Moore, Elkan Halpern, Daniel B Kopans, and Kevin Hughes. 2002. “Predicting the Survival of Patients With Breast Carcinoma Using Tumor Size.” Cancer 95 (4): 713-23.

BACKGROUND: Tumor size has long been recognized as the strongest predictor of the outcome of patients with invasive breast carcinoma, although it has not been settled whether the correlation between tumor size and the chance of death is independent of the method of detection, nor is it clear how tumor size at the time of treatment may be translated into a specific expectation of survival. In this report, the authors provide such a method.

METHODS: A Kaplan-Meier survival analysis was carried out for a population of 1352 women with invasive breast carcinoma who were treated at the Van Nuys Breast Center between 1966 and 1990, and the data were analyzed together with survival data published by others.
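For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal sketch of the product-limit estimator. It is not the authors' code: the function name `kaplan_meier` and the follow-up data are hypothetical, and the example only shows how censored observations reduce the number at risk without counting as deaths.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up duration for each patient
    events: True if death was observed at that time, False if censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, observed in zip(times, events) if observed)
    exits = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data (years); False marks a censored patient.
times = [1, 2, 2, 3, 4]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

With these made-up inputs the estimate steps down at years 1, 2, and 3; the patient censored at year 2 shrinks the risk set for year 3 but does not lower the curve directly.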

RESULTS: The authors found that the survival of patients with invasive breast carcinoma was a direct function of tumor size, independent of the method of detection. The results showed that the correlation between tumor size and survival was well fit by a simple equation, with which survival predictions could be made from information on tumor size. For example, a comparison of three large populations studied over the last 5 decades revealed a marked improvement (approximately 35% absolute) in the survival of patients with invasive breast carcinoma diagnosed on clinical grounds that could be ascribed to a reduction in tumor size. However, the capacity of screening mammography to find smaller tumors remains the best way to reduce breast carcinoma deaths, with the potential for adding an additional approximately 20% absolute reduction in breast carcinoma deaths. The mathematical correlation between tumor size and survival is consistent with a biologic mechanism in which lethal distant metastasis occurs by discrete events of spread such that, for every invasive breast carcinoma cell in the primary tumor at the time of surgery, there is approximately a 1-in-1-billion chance that a lethal distant metastasis has formed.

CONCLUSIONS: The correlation between tumor size and lethality is well captured by a simple equation that is consistent with breast carcinoma death as the result of discrete events of cellular spread occurring with small but definable probabilities.

Weber, Griffin, Jay Shendure, David M Tanenbaum, George M Church, and Matthew Meyerson. 2002. “Identification of Foreign Gene Sequences by Transcript Filtering Against the Human Genome.” Nature Genetics 30 (2): 141-2.

We have developed a computational subtraction approach to detect microbial causes for putative infectious diseases by filtering a set of human tissue-derived sequences against the human genome. We demonstrate the potential of this method by identifying sequences from known pathogens in established expressed-sequence tag libraries.