Epidemiological sleep research strives to identify the interactions and causal mechanisms by which sleep affects human health, and to design intervention strategies for improving sleep throughout the lifespan. These goals can be advanced by further focusing on the environmental and genetic etiology of sleep disorders, and by developing risk stratification algorithms to identify people who are at risk of, or affected by, sleep disorders. These studies rely on comprehensive sleep-related data, which often contain complex multi-dimensional physiological and molecular measurements across multiple timepoints. Thus, sleep research is well-suited for the application of computational approaches that can handle high-dimensional data. Here, we survey recent advances in machine and deep learning together with the availability of large human cohort studies with sleep data that can jointly drive the next breakthroughs in the sleep-research field. We describe sleep-related data types and datasets, and present some of the tasks in the field that can be targets for algorithmic approaches, as well as the challenges and opportunities in pursuing them.
Publications
2021
Autosomal genetic analyses of blood lipids have yielded key insights for coronary heart disease (CHD). However, X chromosome genetic variation is understudied for blood lipids in large sample sizes. We now analyze genetic and blood lipid data in a high-coverage whole X chromosome sequencing study of 65,322 multi-ancestry participants and perform replication among 456,893 European participants. Common alleles on chromosome Xq23 are strongly associated with reduced total cholesterol, LDL cholesterol, and triglycerides (min P = 8.5 × 10⁻⁷²), with similar effects for males and females. Chromosome Xq23 lipid-lowering alleles are associated with reduced odds for CHD among 42,545 cases and 591,247 controls (P = 1.7 × 10⁻⁴), and reduced odds for diabetes mellitus type 2 among 54,095 cases and 573,885 controls (P = 1.4 × 10⁻⁵). Although we observe an association with increased BMI, waist-to-hip ratio adjusted for BMI is reduced, bioimpedance analyses indicate increased gluteofemoral fat, and abdominal MRI analyses indicate reduced visceral adiposity. Co-localization analyses strongly correlate increased CHRDL1 gene expression, particularly in adipose tissue, with reduced concentrations of blood lipids.
Long and short sleep duration are associated with elevated blood pressure (BP), possibly through effects on molecular pathways that influence neuroendocrine and vascular systems. To gain new insights into the genetic basis of sleep-related BP variation, we performed genome-wide gene by short or long sleep duration interaction analyses on four BP traits (systolic BP, diastolic BP, mean arterial pressure, and pulse pressure) across five ancestry groups in two stages, using a 2 degree of freedom (df) joint test followed by a 1df test of interaction effects. Primary multi-ancestry analysis in 62,969 individuals in stage 1 identified three novel gene by sleep interactions that were replicated in an additional 59,296 individuals in stage 2 (stage 1 + 2 P_joint < 5 × 10⁻⁸), including rs7955964 (FIGNL2/ANKRD33) that increases BP among long sleepers, and rs73493041 (SNORA26/C9orf170) and rs10406644 (KCTD15/LSM14A) that increase BP among short sleepers (P_int < 5 × 10⁻⁸). Secondary ancestry-specific analysis identified another novel gene by long sleep interaction at rs111887471 (TRPC3/KIAA1109) in individuals of African ancestry (P_int = 2 × 10⁻⁶). Combined stage 1 and 2 analyses additionally identified significant gene by long sleep interactions at 10 loci including MKLN1 and RGL3/ELAVL3 previously associated with BP, and significant gene by short sleep interactions at 10 loci including C2orf43 previously associated with BP (P_int < 10⁻³). The 2df test also identified novel loci for BP after modeling sleep; these loci have known functions in sleep-wake regulation and in the nervous and cardiometabolic systems. This study indicates that sleep and primary mechanisms regulating BP may interact to elevate BP level, suggesting novel insights into sleep-related BP regulation.
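The two-step testing strategy named in the abstract above (a 2df joint test followed by a 1df interaction test) can be illustrated with a minimal sketch. It assumes Wald tests computed from a fitted model's estimates of the genetic main effect and the gene-by-sleep interaction effect, together with their variances and covariance. The function names, argument names, and example numbers are hypothetical and are not taken from the study's software.

```python
import math


def joint_2df_test(beta_g, beta_gxe, var_g, var_gxe, cov):
    """Wald test of H0: beta_g = 0 and beta_gxe = 0 (2 degrees of freedom).

    Inputs are assumed to come from a fitted regression of BP on the variant,
    the sleep exposure, and their product term (illustrative only).
    """
    det = var_g * var_gxe - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^{-1} b written out for a 2x2 covariance matrix V.
    stat = (beta_g ** 2 * var_gxe
            - 2.0 * beta_g * beta_gxe * cov
            + beta_gxe ** 2 * var_g) / det
    # The chi-square survival function with 2 df has the closed form exp(-x/2).
    p_joint = math.exp(-stat / 2.0)
    return stat, p_joint


def interaction_1df_test(beta_gxe, var_gxe):
    """Wald test of H0: beta_gxe = 0 (1 degree of freedom, two-sided)."""
    z = beta_gxe / math.sqrt(var_gxe)
    # Two-sided normal tail probability expressed via the complementary error function.
    p_int = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_int


if __name__ == "__main__":
    # Made-up estimates for a single variant, for illustration only.
    stat, p_joint = joint_2df_test(0.40, 0.90, 0.01, 0.04, 0.002)
    z, p_int = interaction_1df_test(0.90, 0.04)
    print(f"2df joint: chi2={stat:.2f}, P={p_joint:.3g}")
    print(f"1df interaction: z={z:.2f}, P={p_int:.3g}")
```

The joint test can flag a variant through its main effect, its interaction effect, or both, which is why a separate 1df test is needed to attribute a signal to the interaction itself.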
BACKGROUND: The large airway epithelial barrier provides one of the first lines of defense against respiratory viruses, including SARS-CoV-2 that causes COVID-19. Substantial inter-individual variability in individual disease courses is hypothesized to be partially mediated by the differential regulation of the genes that interact with the SARS-CoV-2 virus or are involved in the subsequent host response. Here, we comprehensively investigated non-genetic and genetic factors influencing COVID-19-relevant bronchial epithelial gene expression.
METHODS: We analyzed RNA-sequencing data from bronchial epithelial brushings obtained from uninfected individuals. We related ACE2 gene expression to host and environmental factors in the SPIROMICS cohort of smokers with and without chronic obstructive pulmonary disease (COPD) and replicated these associations in two asthma cohorts, SARP and MAST. To identify airway biology beyond ACE2 binding that may contribute to increased susceptibility, we used gene set enrichment analyses to determine if gene expression changes indicative of a suppressed airway immune response observed early in SARS-CoV-2 infection are also observed in association with host factors. To identify host genetic variants affecting COVID-19 susceptibility in SPIROMICS, we performed expression quantitative trait locus (eQTL) mapping and investigated the phenotypic associations of the eQTL variants.
RESULTS: We found that ACE2 expression was higher in relation to active smoking, obesity, and hypertension, which are known risk factors for COVID-19 severity, while an association with interferon-related inflammation was driven by the truncated, non-binding ACE2 isoform. We discovered that expression patterns of a suppressed airway immune response to early SARS-CoV-2 infection, compared to other viruses, are similar to patterns associated with obesity, hypertension, and cardiovascular disease, which may thus contribute to a COVID-19-susceptible airway environment. eQTL mapping identified regulatory variants for genes implicated in COVID-19, some of which had pheWAS evidence for their potential role in respiratory infections.
CONCLUSIONS: These data provide evidence that clinically relevant variation in the expression of COVID-19-related genes is associated with host factors, environmental exposures, and likely host genetic variation.
BACKGROUND/OBJECTIVES: Neck circumference, an index of upper airway fat, has been suggested to be an important measure of body-fat distribution with unique associations with health outcomes such as obstructive sleep apnea and metabolic disease. This study aims to investigate the genetic basis of neck circumference.
METHODS: We conducted a multi-ethnic genome-wide association study of neck circumference, adjusted and unadjusted for BMI, in up to 15,090 European Ancestry (EA) and African American (AA) individuals. Because sexually dimorphic associations have been observed for anthropometric traits, we conducted both sex-combined and sex-specific analysis.
RESULTS: We identified rs227724 near the Noggin (NOG) gene as a possible quantitative locus for neck circumference in men (N = 8831, P = 1.74 × 10⁻⁹) but not in women (P = 0.08). The association was replicated in men (N = 1554, P = 0.045) in an independent dataset. This locus was previously reported to be associated with human height and with self-reported snoring. We also identified rs13087058 on chromosome 3 as a suggestive locus in sex-combined analysis (N = 15,090, P = 2.94 × 10⁻⁷; replication P = 0.049). This locus was also associated with electrocardiogram-assessed PR interval and is a cis-expression quantitative locus for the PDZ Domain-containing ring finger 2 (PDZRN3) gene. Both NOG and PDZRN3 interact with members of transforming growth factor-beta superfamily signaling proteins.
CONCLUSIONS: Our study suggests that neck circumference may have a unique genetic basis independent of BMI.
Large datasets of hundreds to thousands of individuals measuring RNA-seq in observational studies are becoming available. Many popular software packages for analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression (DESeq2, edgeR and limma) as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering and generation of an empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster with better control of false detections than other methods, even with the resampling method to compute empirical P-values. We provide the proposed pipeline with fast algorithms in an R package, Olivia, and applied it to study the associations of measures of sleep disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
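The general idea of an empirical P-value from a resampled null distribution can be sketched as follows. This is a generic permutation test on a single transcript, assuming a simple correlation statistic and no covariates. It is not the algorithm implemented in the Olivia R package, and all function names and data are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def empirical_p_value(exposure, expression, n_perm=999, seed=0):
    """Permutation-based empirical P-value for an exposure-transcript association.

    The exposure is shuffled to break any real association, producing a null
    distribution of the test statistic (illustrative assumption: |Pearson r|).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(exposure, expression))
    shuffled = list(exposure)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(shuffled, expression)) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator keeps the P-value strictly above 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated continuous exposure and a transformed expression value.
    exposure = [rng.gauss(0.0, 1.0) for _ in range(50)]
    expression = [0.5 * e + rng.gauss(0.0, 1.0) for e in exposure]
    print(empirical_p_value(exposure, expression))
```

With many transcripts, the same resampled null can be pooled across transcripts before multiple testing correction, which is one reason fast per-transcript statistics such as linear regression are attractive in this setting.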
STUDY QUESTION: Does the expansion of genome-wide association studies (GWAS) to a broader range of ancestries improve the ability to identify and generalise variants associated with age at menarche (AAM) in European populations to a wider range of world populations?
SUMMARY ANSWER: By including women with diverse and predominantly non-European ancestry in a large-scale meta-analysis of AAM with half of the women being of African ancestry, we identified a new locus associated with AAM in African-ancestry participants, and generalised loci from GWAS of European ancestry individuals.
WHAT IS KNOWN ALREADY: AAM is a highly polygenic puberty trait associated with various diseases later in life. Both AAM and diseases associated with puberty timing vary by race or ethnicity. The majority of GWAS of AAM have been performed in European ancestry women.
STUDY DESIGN, SIZE, DURATION: We analysed a total of 38 546 women who did not have predominantly European ancestry backgrounds: 25 149 women from seven studies from the ReproGen Consortium and 13 397 women from the UK Biobank. In addition, we used an independent sample of 5148 African-ancestry women from the Southern Community Cohort Study (SCCS) for replication.
PARTICIPANTS/MATERIALS, SETTING, METHODS: Each AAM GWAS was performed by study and ancestry or ethnic group using linear regression models adjusted for birth year and study-specific covariates. ReproGen and UK Biobank results were meta-analysed using an inverse variance-weighted average method. A trans-ethnic meta-analysis was also carried out to assess heterogeneity due to different ancestry.
MAIN RESULTS AND THE ROLE OF CHANCE: We observed consistent direction and effect sizes between our meta-analysis and the largest GWAS conducted in European or Asian ancestry women. We validated four AAM loci (1p31, 6q16, 6q22 and 9q31) with common genetic variants at P < 5 × 10⁻⁷. We detected one new association (10p15) at P < 5 × 10⁻⁸ with a low-frequency genetic variant lying in AKR1C4, which was replicated in an independent sample. This gene belongs to a family of enzymes that regulate the metabolism of steroid hormones and have been implicated in the pathophysiology of uterine diseases. The genetic variant in the new locus is more frequent in African-ancestry participants, and has a very low frequency in Asian or European-ancestry individuals.
LARGE SCALE DATA: N/A.
LIMITATIONS, REASONS FOR CAUTION: Extreme AAM (<9 years or >18 years) were excluded from analysis. Women may not fully recall their AAM as most of the studies were conducted many years later. Further studies in women with diverse and predominantly non-European ancestry are needed to confirm and extend these findings, but the availability of such replication samples is limited.
WIDER IMPLICATIONS OF THE FINDINGS: Expanding association studies to a broader range of ancestries or ethnicities may improve the identification of new genetic variants associated with complex diseases or traits and the generalisation of variants from European-ancestry studies to a wider range of world populations.
STUDY FUNDING/COMPETING INTEREST(S): Funding was provided by CHARGE Consortium grant R01HL105756-07: Gene Discovery For CVD and Aging Phenotypes and by the NIH grant U24AG051129 awarded by the National Institute on Aging (NIA). The authors have no conflict of interest to declare.
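The inverse variance-weighted averaging named in the methods of the abstract above can be sketched as a fixed-effect meta-analysis of per-study effect estimates for a single variant. This is a minimal illustration under that assumption. The function name and the example numbers are hypothetical and do not come from the study.

```python
import math


def ivw_meta_analysis(betas, standard_errors):
    """Fixed-effect inverse variance-weighted meta-analysis for one variant.

    Each study contributes its effect estimate weighted by 1 / SE^2, so more
    precise studies have more influence on the pooled estimate.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    # Two-sided normal P-value via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


if __name__ == "__main__":
    # Made-up per-study effects (years of AAM per allele) and standard errors.
    beta, se, p = ivw_meta_analysis([0.10, 0.14, 0.08], [0.04, 0.06, 0.05])
    print(f"pooled beta={beta:.3f}, SE={se:.3f}, P={p:.3g}")
```

A fixed-effect model assumes a shared true effect across studies; heterogeneity across ancestries, which the abstract assesses separately, would call for additional tests or a different model.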