Publications

2021

Sofer T, Kurniansyah N, Aguet F, Ardlie K, Durda P, Nickerson DA, et al. Benchmarking association analyses of continuous exposures with RNA-seq in observational studies. Briefings in bioinformatics. 2021;22(6).

Large datasets of hundreds to thousands of individuals with RNA-seq measurements in observational studies are becoming available. Many popular software packages for the analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding of transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression analysis (DESeq2, edgeR and limma), as well as linear regression-based analyses, for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering and generation of an empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster, with better control of false detections, than other methods, even when the resampling method is used to compute empirical P-values. We provide the proposed pipeline with fast algorithms in an R package, Olivia, and applied it to study the associations of measures of sleep disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
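The resampling idea behind empirical P-values can be illustrated for a single gene with a minimal permutation sketch. It is not the Olivia package's API: the helper names (`log_expression`, `slope_t_stat`, `empirical_p_value`), the log2(count + 1) transform, the covariate-free simple linear regression and the choice of permuting the exposure are all assumptions made for illustration. Filtering and multiple testing correction across genes are omitted.

```python
import math
import random


def log_expression(counts):
    """Hypothetical transform: log2(count + 1) for one gene's read counts."""
    return [math.log2(c + 1) for c in counts]


def slope_t_stat(x, y):
    """t-statistic for the slope of a simple linear regression of y on x.

    Assumes x is not constant and the fit is not perfect (residuals > 0).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    rss = sum((yi - my - beta * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


def empirical_p_value(exposure, expression, n_perm=999, seed=1):
    """Permutation P-value: (1 + #{|t_perm| >= |t_obs|}) / (n_perm + 1)."""
    rng = random.Random(seed)
    t_obs = abs(slope_t_stat(exposure, expression))
    shuffled = list(exposure)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the exposure-expression pairing
        if abs(slope_t_stat(shuffled, expression)) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


exposure = list(range(30))  # hypothetical continuous exposure values
counts = [round(10 * 2 ** (e / 5)) for e in exposure]  # hypothetical read counts
p_value = empirical_p_value(exposure, log_expression(counts), n_perm=199)
```

Adding 1 to both numerator and denominator keeps the empirical P-value strictly positive, so the smallest attainable value is 1 / (n_perm + 1).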

Sarnowski C, Cousminer DL, Franceschini N, Raffield LM, Jia G, Fernández-Rhodes L, et al. Large trans-ethnic meta-analysis identifies AKR1C4 as a novel gene associated with age at menarche. Human reproduction (Oxford, England). 2021;36(7):1999-2010.

STUDY QUESTION: Does the expansion of genome-wide association studies (GWAS) to a broader range of ancestries improve the ability to identify and generalise variants associated with age at menarche (AAM) in European populations to a wider range of world populations?

SUMMARY ANSWER: By including women with diverse and predominantly non-European ancestry in a large-scale meta-analysis of AAM with half of the women being of African ancestry, we identified a new locus associated with AAM in African-ancestry participants, and generalised loci from GWAS of European ancestry individuals.

WHAT IS KNOWN ALREADY: AAM is a highly polygenic puberty trait associated with various diseases later in life. Both AAM and diseases associated with puberty timing vary by race or ethnicity. The majority of GWAS of AAM have been performed in European ancestry women.

STUDY DESIGN, SIZE, DURATION: We analysed a total of 38 546 women who did not have predominantly European ancestry backgrounds: 25 149 women from seven studies from the ReproGen Consortium and 13 397 women from the UK Biobank. In addition, we used an independent sample of 5148 African-ancestry women from the Southern Community Cohort Study (SCCS) for replication.

PARTICIPANTS/MATERIALS, SETTING, METHODS: Each AAM GWAS was performed by study and ancestry or ethnic group using linear regression models adjusted for birth year and study-specific covariates. ReproGen and UK Biobank results were meta-analysed using an inverse variance-weighted average method. A trans-ethnic meta-analysis was also carried out to assess heterogeneity due to different ancestry.
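For readers unfamiliar with the method, a fixed-effect inverse variance-weighted average uses weights w_i = 1/SE_i², a pooled effect Σ w_i β_i / Σ w_i and a pooled SE of 1/√(Σ w_i). The sketch below is generic and is not the consortium's software; the function name `ivw_meta` is hypothetical, and Cochran's Q and I² are shown only as one common way to quantify between-study heterogeneity, not necessarily the statistic used in the trans-ethnic analysis.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse variance-weighted meta-analysis of one variant."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    # Cochran's Q and I^2 as a simple heterogeneity summary.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "i2": i2}


# Hypothetical per-study effects (years of AAM per allele) and standard errors.
result = ivw_meta([0.10, 0.30], [0.10, 0.10])
```

Studies with smaller standard errors dominate the pooled estimate, which is why large cohorts drive most meta-analysis signals.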

MAIN RESULTS AND THE ROLE OF CHANCE: We observed consistent direction and effect sizes between our meta-analysis and the largest GWAS conducted in European or Asian ancestry women. We validated four AAM loci (1p31, 6q16, 6q22 and 9q31) with common genetic variants at P < 5 × 10⁻⁷. We detected one new association (10p15) at P < 5 × 10⁻⁸ with a low-frequency genetic variant lying in AKR1C4, which was replicated in an independent sample. This gene belongs to a family of enzymes that regulate the metabolism of steroid hormones and have been implicated in the pathophysiology of uterine diseases. The genetic variant in the new locus is more frequent in African-ancestry participants, and has a very low frequency in Asian or European-ancestry individuals.

LARGE SCALE DATA: N/A.

LIMITATIONS, REASONS FOR CAUTION: Extreme AAM values (<9 years or >18 years) were excluded from the analysis. Women may not accurately recall their AAM because, in most studies, it was reported many years after menarche. Further studies in women with diverse and predominantly non-European ancestry are needed to confirm and extend these findings, but the availability of such replication samples is limited.

WIDER IMPLICATIONS OF THE FINDINGS: Expanding association studies to a broader range of ancestries or ethnicities may improve the identification of new genetic variants associated with complex diseases or traits and the generalisation of variants from European-ancestry studies to a wider range of world populations.

STUDY FUNDING/COMPETING INTEREST(S): Funding was provided by CHARGE Consortium grant R01HL105756-07: Gene Discovery For CVD and Aging Phenotypes and by the NIH grant U24AG051129 awarded by the National Institute on Aging (NIA). The authors have no conflict of interest to declare.

Chen J, Spracklen CN, Marenne G, Varshney A, Corbin LJ, Luan J, et al. The trans-ancestral genomic architecture of glycemic traits. Nature genetics. 2021;53(6):840-6.

Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10⁻⁸), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution.
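To make the notion of a 99% credible set concrete, the sketch below builds one under a single-causal-variant assumption using Wakefield's approximate Bayes factors, log ABF = ½·log(1 − r) + ½·r·z², with r = W/(V + W) and V = SE². It is a generic illustration, not the fine-mapping method used in the study; the prior effect variance W = 0.04 and the function names are assumptions.

```python
import math


def log_abf(z, se, prior_var=0.04):
    """Wakefield approximate Bayes factor (association vs null), log scale."""
    v = se ** 2
    r = prior_var / (v + prior_var)
    return 0.5 * math.log(1 - r) + 0.5 * r * z ** 2


def credible_set(zs, ses, coverage=0.99, prior_var=0.04):
    """Smallest set of variants whose posterior probabilities sum to coverage.

    Assumes exactly one causal variant in the region and equal priors.
    """
    labf = [log_abf(z, se, prior_var) for z, se in zip(zs, ses)]
    m = max(labf)
    weights = [math.exp(l - m) for l in labf]  # subtract max to avoid overflow
    total = sum(weights)
    pp = [w / total for w in weights]
    order = sorted(range(len(pp)), key=lambda i: -pp[i])
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += pp[i]
        if cum >= coverage:
            break
    return chosen, pp


# Hypothetical z-scores and standard errors for four variants in one locus.
variants, posteriors = credible_set([10.0, 3.0, 1.0, 0.5], [0.02] * 4)
```

When several variants are in strong linkage disequilibrium their z-scores are similar, posterior mass is spread across them, and the credible set grows; better resolution shows up as smaller sets.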

Sofer T, Zheng X, Laurie CA, Gogarten SM, Brody JA, Conomos MP, et al. Variant-specific inflation factors for assessing population stratification at the phenotypic variance level. Nature communications. 2021;12(1):3506.

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. If unaccounted for, variance stratification can lead to either decreased statistical power or increased false positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used to diagnose genetic association analyses of pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.

Keramati AR, Chen MH, Rodriguez BAT, Yanek LR, Bhan A, Gaynor BJ, et al. Genome sequencing unveils a regulatory landscape of platelet reactivity. Nature communications. 2021;12(1):3626.

Platelet aggregation at the site of atherosclerotic vascular injury is the underlying pathophysiology of myocardial infarction and stroke. To build upon prior GWAS, here we report on 16 loci identified through a whole genome sequencing (WGS) approach in 3,855 NHLBI Trans-Omics for Precision Medicine (TOPMed) participants deeply phenotyped for platelet aggregation. We identify the RGS18 locus, which encodes a myeloerythroid lineage-specific regulator of G-protein signaling that co-localizes with expression quantitative trait loci (eQTL) signatures for RGS18 expression in platelets. Gene-based approaches implicate the SVEP1 gene, a known contributor to coronary artery disease risk. Sentinel variants at RGS18 and PEAR1 are associated with thrombosis risk and increased gastrointestinal bleeding risk, respectively. Our WGS findings add to previously identified GWAS loci, provide insights regarding the mechanism(s) by which genetics may influence cardiovascular disease risk, and underscore the importance of rare variant and regulatory approaches to identifying loci contributing to complex phenotypes.

Li R, Rueschman M, Gottlieb DJ, Redline S, Sofer T. A composite sleep and pulmonary phenotype predicting hypertension. EBioMedicine. 2021;68:103433.

BACKGROUND: Multiple aspects of sleep and Sleep Disordered Breathing (SDB) have been linked to hypertension. However, the standard measure of SDB, the Apnoea Hypopnea Index (AHI), has not identified patients likely to experience large improvements in blood pressure with SDB treatment.

METHODS: To use machine learning to select sleep and pulmonary measures associated with hypertension development when considered jointly, we applied feature screening followed by Elastic Net penalized regression in association with incident hypertension using a wide array of polysomnography measures, and lung function, derived for the Sleep Heart Health Study (SHHS).

FINDINGS: At baseline, the n=860 SHHS individuals with complete data had a mean age of 61 years. Of these, 291 developed hypertension approximately 5 years later. A combination of pulmonary function and 18 sleep phenotypes predicted incident hypertension (OR=1.43, 95% confidence interval [1.14, 1.80] per 1 standard deviation (SD) of the phenotype), while the apnoea-hypopnea index (AHI) had low evidence of association with incident hypertension (OR=1.13, 95% confidence interval [0.97, 1.33] per 1 SD). In a generalization analysis in 923 individuals from the Multi-Ethnic Study of Atherosclerosis, aged 65 years on average, of whom 615 had hypertension, the new phenotype was cross-sectionally associated with hypertension (OR=1.26, 95% CI [1.10, 1.45]).

INTERPRETATION: A unique combination of sleep and pulmonary function measures better predicts hypertension compared to the AHI. The composite measure included indices capturing apnoea and hypopnea event durations, with shorter event lengths associated with increased risk of hypertension.

FUNDING: This research was supported by National Heart, Lung, and Blood Institute (NHLBI) contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, and N01-HC-95169 and by National Center for Advancing Translational Sciences grants UL1-TR-000040, UL1-TR-001079, and UL1-TR-001420. The MESA Sleep ancillary study was supported by NHLBI grant HL-56984. Pulmonary phenotyping in MESA was funded by NHLBI grants R01-HL077612 and R01-HL093081. This work was supported by NHLBI grant R35HL135818 to Susan Redline.

Justice AE, Young K, Gogarten SM, Sofer T, Graff M, Love SAM, et al. Genome-wide association study of body fat distribution traits in Hispanics/Latinos from the HCHS/SOL. Human molecular genetics. 2021;30(22):2190-204.

Central obesity is a leading health concern with a great burden carried by ethnic minority populations, especially Hispanics/Latinos. Genetic factors contribute to the obesity burden overall and to inter-population differences. We aimed to identify the loci associated with central adiposity, measured as waist-to-hip ratio (WHR), waist circumference (WC) and hip circumference (HIP) adjusted for body mass index (adjBMI), in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL); to determine whether associations differ by background group within HCHS/SOL; and to determine whether previously reported associations generalize to HCHS/SOL. Our analyses included 7472 women and 5200 men of mainland (Mexican, Central and South American) and Caribbean (Puerto Rican, Cuban and Dominican) background residing in the USA. We performed genome-wide association analyses stratified and combined across sexes using linear mixed-model regression. We identified 16 variants for waist-to-hip ratio adjusted for body mass index (WHRadjBMI), 22 for waist circumference adjusted for body mass index (WCadjBMI) and 28 for hip circumference adjusted for body mass index (HIPadjBMI) that reached suggestive significance (P < 1 × 10⁻⁶). Many loci exhibited differences in strength of association by ethnic background and sex. We brought a total of 66 variants forward for validation in cohorts (N = 34 161) with participants of Hispanic/Latino, African and European descent. We confirmed four novel loci (P < 0.05 and consistent direction of effect, and P < 5 × 10⁻⁸ after meta-analysis), including two for WHRadjBMI (rs13301996, rs79478137), one for WCadjBMI (rs3168072) and one for HIPadjBMI (rs28692724). We also generalized previously reported associations to HCHS/SOL (8 for WHRadjBMI, 10 for WCadjBMI and 12 for HIPadjBMI).
Our study highlights the importance of large-scale genomic studies in ancestrally diverse Hispanic/Latino populations for identifying and characterizing central obesity susceptibility that may be ancestry-specific.

STUDY OBJECTIVES: In an older African-American sample (n = 231) we tested associations of the household environment and in-bed behaviors with sleep duration, efficiency, and wakefulness after sleep onset (WASO).

METHODS: Older adult participants completed a household-level sleep environment questionnaire and a sleep questionnaire, and underwent 7-day wrist actigraphy for objective measures of sleep. Perceived household environment (self-reported) was evaluated using questions regarding safety, physical comfort, temperature, noise, and light disturbances. In-bed behaviors included watching television, listening to radio/music, use of computer/tablet/phone, playing video games, reading books, and eating. To estimate the combined effect of the components in each domain (perceived household environment and in-bed behaviors), we calculated and standardized a weighted score per sleep outcome (i.e., duration, efficiency, WASO), with a higher score indicating worse conditions. The weights were derived from the coefficients of each component estimated from linear regression models predicting each sleep outcome while adjusting for covariates.
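The score construction can be sketched as a coefficient-weighted sum of components, standardized to mean 0 and SD 1. Here the coefficients are taken as given inputs (in the study they came from covariate-adjusted linear regressions); the sign flip that makes higher scores mean worse conditions for outcomes where higher values are better, the use of the sample standard deviation, and the function name `weighted_score` are assumptions for illustration.

```python
import statistics


def weighted_score(components, coefs, outcome_higher_is_better=True):
    """Standardized weighted score; higher values mean worse conditions.

    components: one list of component values per participant
                (e.g. hypothetical noise and light disturbance indicators).
    coefs:      per-component coefficients from a regression predicting the
                sleep outcome (taken as given here).
    """
    # Assumption: flip the sign for outcomes where higher is better
    # (duration, efficiency) so that a higher score always means worse.
    sign = -1.0 if outcome_higher_is_better else 1.0
    raw = [sign * sum(c * x for c, x in zip(coefs, person))
           for person in components]
    mu = statistics.mean(raw)
    sd = statistics.stdev(raw)  # sample SD; assumes at least two distinct values
    return [(r - mu) / sd for r in raw]


# Hypothetical coefficients: the two disturbances shorten sleep by 10 and 5 minutes.
scores = weighted_score([[0, 0], [1, 0], [1, 1], [0, 1]], [-10.0, -5.0])
```

Because the score is standardized, a regression coefficient on it reads as the change in the sleep outcome per 1 SD increase in the score, which matches how the results are reported.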

RESULTS: A standard deviation increase in an adverse household environment score was associated with lower self-reported sleep duration (β = -13.9 min, 95% confidence interval: -26.1, -1.7) and actigraphy-based sleep efficiency (β = -0.7%, -1.4, 0.0). A standard deviation increase in the in-bed behaviors score was associated with lower actigraphy-based sleep duration (β = -9.7 min, -18.0, -1.3), lower sleep efficiency (β = -1.2%, -1.9, -0.6), and higher WASO (β = 5.3 min, 2.1, 8.6).

CONCLUSION: Intervening on the sleep environment, including healthy sleep practices, may improve sleep duration and continuity among African-Americans.

Bryan MS, Sofer T, Afshar M, Mossavar-Rahmani Y, Hosgood D, Punjabi NM, et al. Mendelian randomization analysis of arsenic metabolism and pulmonary function within the Hispanic Community Health Study/Study of Latinos. Scientific reports. 2021;11(1):13470.

Arsenic exposure has been linked to poor pulmonary function, and inefficient arsenic metabolizers may be at increased risk. Dietary rice has recently been identified as a possible substantial route of exposure to arsenic, and it remains unknown whether it can provide a sufficient level of exposure to affect pulmonary function in inefficient metabolizers. Among 12,609 HCHS/SOL participants, asthma diagnoses and spirometry-based measures of pulmonary function were assessed, and rice consumption was inferred from grain intake via a food frequency questionnaire. After stratifying by smoking history, the relationships between arsenic metabolism efficiency [percentages of inorganic arsenic (%iAs), monomethylarsonate (%MMA), and dimethylarsinate (%DMA) species in urine] and the measures of pulmonary function were estimated using a two-sample Mendelian randomization approach (genotype information from an Illumina HumanOmni2.5-8v1-1 array), focusing on participants with high inferred rice consumption. Among never-smoking high inferred consumers of rice (n = 1395), inefficient metabolism was associated with past asthma diagnosis and forced vital capacity below the lower limit of normal (LLN) (OR 1.40, p = 0.0212 and OR 1.42, p = 0.0072, respectively, for each percentage-point increase in %iAs; OR 1.26, p = 0.0240 and OR 1.24, p = 0.0193 for %MMA; OR 0.87, p = 0.0209 and OR 0.87, p = 0.0123 for the marker of efficient metabolism, %DMA). Among ever-smoking high inferred consumers of rice (n = 1127), inefficient metabolism was associated with peak expiratory flow below LLN (OR 1.54, p = 0.0108 per percentage-point increase in %iAs; OR 1.37, p = 0.0097 for %MMA; and OR 0.83, p = 0.0093 for %DMA). Less efficient arsenic metabolism was associated with indicators of pulmonary dysfunction among those with high inferred rice consumption, suggesting that reductions in dietary arsenic could improve respiratory health.
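Two-sample Mendelian randomization combines per-variant associations with the exposure and with the outcome. A minimal sketch of the Wald ratio and the standard fixed-effect inverse-variance-weighted (IVW) estimator follows. The abstract does not specify which estimator was used, so IVW and the function names are assumptions; for a binary outcome the outcome associations, and therefore the estimate, are on the log-odds scale, so OR = exp(estimate).

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant causal estimate and its first-order standard error."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def ivw_mr(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect IVW estimate combining per-variant Wald ratios."""
    num = sum(bx * by / sy ** 2
              for bx, by, sy in zip(betas_exposure, betas_outcome, ses_outcome))
    den = sum(bx ** 2 / sy ** 2
              for bx, sy in zip(betas_exposure, ses_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics for two variants.
est, se = ivw_mr([0.5, 1.0], [0.1, 0.2], [0.05, 0.05])
odds_ratio = math.exp(est)  # outcome associations assumed on the log-odds scale
```

The IVW estimate is a weighted average of the per-variant Wald ratios with weights β_exposure² / SE_outcome², so variants strongly associated with the exposure contribute most.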

Sofer T, Lee J, Kurniansyah N, Jain D, Laurie CA, Gogarten SM, et al. BinomiRare: A robust test for association of a rare genetic variant with a binary outcome for mixed models and any case-control proportion. HGG advances. 2021;2(3).

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing test of the association of a rare variant with a binary outcome in the presence of correlated data controls the type 1 error when there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, develop the Conway-Maxwell-Poisson (CMP) test, and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.
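The carrier-based framework can be sketched as follows: given null outcome probabilities for each carrier (for example, fitted from a mixed model that excludes the variant), the number of cases among carriers is, under the null and treating carriers as independent given those probabilities, a sum of independent Bernoulli variables, i.e. Poisson-binomial. An upper-tail probability then measures an excess of cases. This is a simplified, one-sided illustration, not the BinomiRare implementation; the function names are hypothetical, and the published test may use a different tail or P-value convention.

```python
def poisson_binomial_pmf(probs):
    """P(D = k), k = 0..len(probs), for a sum of independent Bernoulli(p_i)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this carrier is a control
            nxt[k + 1] += mass * p      # this carrier is a case
        pmf = nxt
    return pmf


def carrier_case_test(null_probs, outcomes):
    """One-sided P(D >= observed cases) among carriers under the null."""
    observed = sum(outcomes)
    pmf = poisson_binomial_pmf(null_probs)
    return observed, sum(pmf[observed:])


# Hypothetical carriers: null case probabilities and observed case status (1 = case).
observed, p_value = carrier_case_test([0.1, 0.2, 0.05, 0.3], [1, 1, 0, 1])
```

Because the reference distribution is exact rather than asymptotic, this kind of test remains usable when there are only a handful of carriers, which is the setting the abstract targets.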