All

2022

Hujoel ML, Sherman M, Barton A, Mukamel R, Sankaran V, Terao C, Loh PR. Influences of rare copy-number variation on human complex traits. Cell. 2022;185(22):4233–4248.e27.
The human genome contains hundreds of thousands of regions harboring copy-number variants (CNV). However, the phenotypic effects of most such polymorphisms are unknown because only larger CNVs have been ascertainable from SNP-array data generated by large biobanks. We developed a computational approach leveraging haplotype sharing in biobank cohorts to more sensitively detect CNVs. Applied to UK Biobank, this approach accounted for approximately half of all rare gene inactivation events produced by genomic structural variation. This CNV call set enabled a detailed analysis of associations between CNVs and 56 quantitative traits, identifying 269 independent associations (p < 5 × 10-8) likely to be causally driven by CNVs. Putative target genes were identifiable for nearly half of the loci, enabling insights into dosage sensitivity of these genes and uncovering several gene-trait relationships. These results demonstrate the ability of haplotype-informed analysis to provide insights into the genetic basis of human complex traits.
Yengo L, Vedantam S, Marouli E, Sidorenko J, Bartell E, Sakaue S, Graff M, Eliasen A, Jiang Y, Raghavan S, Miao J, Arias J, Graham S, Mukamel RE, . , Loh PR, Yang J, Esko T, Assimes T, Auton A, Abecasis G, Willer C, Locke A, Berndt S, Lettre G, Frayling T, Okada Y, Wood A, Visscher P, Hirschhorn J. A saturated map of common genetic variants associated with human height. Nature. 2022;
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes 1 . Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel 2 ) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
Gao T, Soldatov R, Sarkar H, Kurkiewicz A, Biederstedt E, Loh PR, Kharchenko P. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nat Biotechnol. 2022;
Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.
Genovese G, Mello C, Loh PR, Handsaker R, Kashin S, Whelan C, Bayer-Zwirello L, McCarroll S. Chromosomal phase improves aneuploidy detection in non-invasive prenatal testing at low fetal DNA fractions. Sci Rep. 2022;12(1):12025.
Non-invasive prenatal testing (NIPT) to detect fetal aneuploidy by sequencing the cell-free DNA (cfDNA) in maternal plasma is being broadly adopted. To detect fetal aneuploidies from maternal plasma, where fetal DNA is mixed with far-larger amounts of maternal DNA, NIPT requires a minimum fraction of the circulating cfDNA to be of placental origin, a level which is usually attained beginning at 10 weeks gestational age. We present an approach that leverages the arrangement of alleles along homologous chromosomes-also known as chromosomal phase-to make NIPT analyses more conclusive. We validate our approach with in silico simulations, then re-analyze data from a pregnant mother who, due to a fetal DNA fraction of 3.4%, received an inconclusive aneuploidy determination through NIPT. We find that the presence of a trisomy 18 fetus can be conclusively inferred from the patient's same molecular data when chromosomal phase is incorporated into the analysis. Key to the effectiveness of our approach is the ability of homologous chromosomes to act as natural controls for each other and the ability of chromosomal phase to integrate subtle quantitative signals across very many sequence variants. These results show that chromosomal phase increases the sensitivity of a common laboratory test, an idea that could also advance cfDNA analyses for cancer detection.
Hujoel ML, Loh PR, Neale B, Price A. Incorporating family history of disease improves polygenic risk scores in diverse populations. Cell Genom. 2022;2(7):100152.
Polygenic risk scores (PRSs) derived from genotype data and family history (FH) of disease provide valuable information for predicting disease risk, but PRSs perform poorly when applied to diverse populations. Here, we explore methods for combining both types of information (PRS-FH) in UK Biobank data. PRSs were trained using all British individuals (n = 409,000), and target samples consisted of unrelated non-British Europeans (n = 42,000), South Asians (n = 7,000), or Africans (n = 7,000). We evaluated PRS, FH, and PRS-FH using liability-scale R 2, primarily focusing on 3 well-powered diseases (type 2 diabetes, hypertension, and depression). PRS attained average prediction R 2s of 5.8%, 4.0%, and 0.53% in non-British Europeans, South Asians, and Africans, confirming poor cross-population transferability. In contrast, PRS-FH attained average prediction R 2s of 13%, 12%, and 10%, respectively, representing a large improvement in Europeans and an extremely large improvement in Africans. In conclusion, including family history improves the accuracy of polygenic risk scores, particularly in diverse populations.
Sherman MA, Yaari A, Priebe O, Dietlein F, Loh PR, Berger B. Genome-wide mapping of somatic mutation rates uncovers drivers of cancer. Nat Biotechnol. 2022;
Identification of cancer driver mutations that confer a proliferative advantage is central to understanding cancer; however, searches have often been limited to protein-coding sequences and specific non-coding elements (for example, promoters) because of the challenge of modeling the highly variable somatic mutation rates observed across tumor genomes. Here we present Dig, a method to search for driver elements and mutations anywhere in the genome. We use deep neural networks to map cancer-specific mutation rates genome-wide at kilobase-scale resolution. These estimates are then refined to search for evidence of driver mutations under positive selection throughout the genome by comparing observed to expected mutation counts. We mapped mutation rates for 37 cancer types and applied these maps to identify putative drivers within intronic cryptic splice regions, 5' untranslated regions and infrequently mutated genes. Our high-resolution mutation rate maps, available for web-based exploration, are a resource to enable driver discovery genome-wide.
Barton AR, Hujoel M, Mukamel R, Sherman M, Loh PR. A spectrum of recessiveness among Mendelian disease variants in UK Biobank. Am J Hum Genet. 2022;109(7):1298–1307.
Recent work has found increasing evidence of mitigated, incompletely penetrant phenotypes in heterozygous carriers of recessive Mendelian disease variants. We leveraged whole-exome imputation within the full UK Biobank cohort (n ∼ 500K) to extend such analyses to 3,475 rare variants curated from ClinVar and OMIM. Testing these variants for association with 58 quantitative traits yielded 102 significant associations involving variants previously implicated in 34 different diseases. Notable examples included a POR missense variant implicated in Antley-Bixler syndrome that associated with a 1.76 (SE 0.27) cm increase in height and an ABCA3 missense variant implicated in interstitial lung disease that associated with reduced FEV1/FVC ratio. Association analyses with 1,134 disease traits yielded five additional variant-disease associations. We also observed contrasting levels of recessiveness between two more-common, classical Mendelian diseases. Carriers of cystic fibrosis variants exhibited increased risk of several mitigated disease phenotypes, whereas carriers of spinal muscular atrophy alleles showed no evidence of altered phenotypes. Incomplete penetrance of cystic fibrosis carrier phenotypes did not appear to be mediated by common allelic variation on the functional haplotype. Our results show that many disease-associated recessive variants can produce mitigated phenotypes in heterozygous carriers and motivate further work exploring penetrance mechanisms.

2021

Márquez-Luna C, Gazal S, Loh PR, Kim S, Furlotte N, Auton A, 23andMe Research Team, Price A. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun. 2021;12(1):6052.
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R2 = 0.144; highest R2 = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R2 to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Zhao Y, Stankovic S, Koprulu M, Wheeler E, Day F, Lango Allen H, Kerrison N, Pietzner M, Loh PR, Wareham N, Langenberg C, Ong K, Perry J. GIGYF1 loss of function is associated with clonal mosaicism and adverse metabolic health. Nat Commun. 2021;12(1):4178.
Mosaic loss of chromosome Y (LOY) in leukocytes is the most common form of clonal mosaicism, caused by dysregulation in cell-cycle and DNA damage response pathways. Previous genetic studies have focussed on identifying common variants associated with LOY, which we now extend to rarer, protein-coding variation using exome sequences from 82,277 male UK Biobank participants. We find that loss of function of two genes-CHEK2 and GIGYF1-reach exome-wide significance. Rare alleles in GIGYF1 have not previously been implicated in any complex trait, but here loss-of-function carriers exhibit six-fold higher susceptibility to LOY (OR = 5.99 [3.04-11.81], p = 1.3 × 10-10). These same alleles are also associated with adverse metabolic health, including higher susceptibility to Type 2 Diabetes (OR = 6.10 [3.51-10.61], p = 1.8 × 10-12), 4 kg higher fat mass (p = 1.3 × 10-4), 2.32 nmol/L lower serum IGF1 levels (p = 1.5 × 10-4) and 4.5 kg lower handgrip strength (p = 4.7 × 10-7) consistent with proposed GIGYF1 enhancement of insulin and IGF-1 receptor signalling. These associations are mirrored by a common variant nearby associated with the expression of GIGYF1. Our observations highlight a potential direct connection between clonal mosaicism and metabolic health.