A question that has remained unresolved for decades is the relative importance of adaptive vs. non-adaptive evolutionary processes (like demography, purifying selection, mutation, and recombination rate) in shaping patterns of genomic variation in natural populations. Although their individual contributions are well understood, an existing challenge is how to disentangle all of these processes using population-genetic approaches, because they can affect patterns of variation in a similar manner. It is thus difficult to account for the joint contribution of non-adaptive forces in generating evolutionary stochasticity and to distinguish it from more deterministic forces of interest, such as positive selection.
Because the non-adaptive evolutionary processes act continually in populations, it is imperative that they be accounted for when devising an appropriate evolutionary null model for hypothesis testing. For instance, while most new mutations in functional regions are deleterious, the linked effects of purifying selection on nearby neutral alleles are rarely accounted for when performing demographic inference that assumes complete neutrality. This issue is exacerbated in species with small, compact genomes, where direct and linked effects of purifying selection are pervasive, confounding patterns generated by other evolutionary forces. I am interested in better understanding the effects of selection due to linkage and, in general, how non-adaptive evolutionary processes jointly shape patterns of genome-wide variation.
I have been working on characterizing the effects of fixation and segregation of deleterious mutations on linked neutral mutations and how these effects bias population-genetic inference that assumes strict neutrality, for instance, estimation of historical population size changes. This work has been under the supervision of Jeffrey Jensen at Arizona State University and Brian Charlesworth from the University of Edinburgh.
Here is a recent seminar summarizing my work at ASU.
Effect of background selection on demographic inference
We investigated how the shape of the deleterious distribution of fitness effects (DFE) and the density of selected sites affect inference of population history using commonly used inference methods.
We find that background selection (BGS) results in under-estimation of population size and false inference of recent growth if the true demographic model is that of constant population size or decline. If the population has truly experienced recent growth, then BGS does not lead to substantial mis-inference. The severity of mis-inference increases with the density of selected sites and with the proportion of strongly deleterious mutations.
We propose a potential solution within the ABC framework, where we instead use directly selected sites to perform demographic inference by treating the deleterious DFE as a nuisance parameter and show that inference is reasonably accurate. You can read it here:
All code and input/output files are available here.
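For readers unfamiliar with ABC, the general idea of rejection-based approximate Bayesian computation can be sketched as below. This is only a toy illustration and not the pipeline used in the study: the simulator, the prior, the summary statistic, and all function names (`abc_rejection`, `toy_simulate`, `toy_prior`) are hypothetical stand-ins. A real analysis would replace the toy simulator with population-genetic simulations and would typically use several summary statistics.

```python
import random


def abc_rejection(observed_stat, simulate, sample_prior, tolerance, n_draws, rng):
    """Basic ABC rejection sampling.

    Draw a parameter from the prior, simulate a summary statistic under it,
    and keep the parameter only if the simulated statistic falls within
    `tolerance` of the observed one. The kept draws approximate the posterior.
    """
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        simulated_stat = simulate(theta, rng)
        if abs(simulated_stat - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


def toy_simulate(theta, rng, n=50):
    """Hypothetical simulator: mean of n exponential draws with mean theta."""
    return sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n


def toy_prior(rng):
    """Hypothetical uniform prior on theta."""
    return rng.uniform(0.1, 10.0)


if __name__ == "__main__":
    rng = random.Random(42)
    posterior = abc_rejection(2.0, toy_simulate, toy_prior,
                              tolerance=0.1, n_draws=5000, rng=rng)
    print(len(posterior), "accepted draws")
    if posterior:
        print("approximate posterior mean:", sum(posterior) / len(posterior))
```

A nuisance parameter in this setting would simply be an extra quantity drawn from its own prior in each simulation and then ignored when summarizing the accepted draws.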
Joint inference of demography and purifying selection
Because most natural populations are constantly experiencing fluctuations in population size and purifying selection is constantly purging deleterious variation, a model that incorporates both of these processes is the appropriate starting point for a null model against which the role and importance of periodic processes like adaptation can be characterized. This has been a challenge in the field because it is difficult to distinguish the individual contributions of demography and selection, as both of these evolutionary processes leave similar imprints in population-genetic data. We have therefore developed a better solution than the one described above: a method for jointly inferring historical population size changes along with the distribution of fitness effects (DFE) of new deleterious mutations acting genome-wide, by using the decay of background selection around functional genomic elements. This method is based on an approximate Bayesian computation framework, which allows us to also incorporate variation in recombination and mutation rates across the genome. We make no assumptions about which sites in functional elements are neutral, thus making it possible to use our method to infer the DFE of regulatory elements. In addition, our method performs better than previous methods when there is selection on silent sites. Here is our published work:
We have also explored how the extent of linked effects of purifying selection (i.e., background selection) varies under different demographic non-equilibrium scenarios. Under demographic equilibrium, we derived analytical expressions for the reduction in diversity due to background selection when the deleterious DFE is discrete. I have made both a Mathematica notebook and a Python script to perform these calculations for any discrete DFE (or any discretized continuous DFE), provided here:
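To give a flavor of this kind of calculation, here is a minimal sketch that is not the script linked above. It uses the classic single-site background selection approximation (Hudson and Kaplan 1995; Nordborg et al. 1996), summed over selected sites and over the classes of a discrete DFE, which is an assumption on my part and not necessarily the expressions derived in our work. The function name `bgs_diversity_ratio` and all parameter values are hypothetical.

```python
import math


def bgs_diversity_ratio(u, dfe_classes, rec_fractions):
    """Expected diversity relative to neutrality, B = pi / pi0.

    u             : deleterious mutation rate per selected site per generation
    dfe_classes   : list of (fraction, t) pairs; `fraction` is the share of new
                    mutations in the class and `t` is the heterozygous
                    selection coefficient (t > 0); fractions should sum to 1
    rec_fractions : recombination fraction between the focal neutral site and
                    each selected site

    Uses B = exp(-sum over sites and classes of u * f * t / (t + r * (1 - t))^2),
    the classic approximation assuming independent effects across sites.
    """
    exponent = 0.0
    for r in rec_fractions:
        for fraction, t in dfe_classes:
            exponent += u * fraction * t / (t + r * (1.0 - t)) ** 2
    return math.exp(-exponent)


if __name__ == "__main__":
    # Hypothetical example: a 10 kb selected region starting 1 bp from the
    # focal neutral site, with a two-class DFE.
    dfe = [(0.5, 0.001), (0.5, 0.01)]
    rec_per_bp = 1e-8
    rec = [rec_per_bp * distance for distance in range(1, 10001)]
    print(bgs_diversity_ratio(1e-8, dfe, rec))
```

A continuous DFE can be handled with the same function by binning it into classes and passing the bin masses as the fractions.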
Linked effects of deleterious sweeps
Although the probability of fixation of deleterious mutations is much lower than that of advantageous mutations, it was shown by Maruyama and Kimura in 1974 that if a semi-dominant mutation with a selective disadvantage -s does go to fixation by chance, it takes the same time to reach fixation in the population as a beneficial semi-dominant mutation with a selective advantage +s. This interesting and counter-intuitive result implies that fixation of deleterious mutations by chance could result in selective sweep-like signatures that could appear as false positives in genome scans. On taking a closer look, we found that only mildly deleterious mutations have appreciable probabilities of fixation and that the reduction in diversity caused by their fixation is restricted to a small region close to the fixed site. Thus, fixations of deleterious mutations are unlikely to present as false positives when detecting sweeps, unless recombination rates are extremely low. However, mildly deleterious mutations can contribute to outliers in FST scans even in the presence of weakly positively selected alleles. You can read it here and all the code is provided here: https://github.com/paruljohri/Deleterious_Sweeps.
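The point that only mildly deleterious mutations fix at appreciable rates can be illustrated with Kimura's (1962) diffusion approximation for the fixation probability of a new semi-dominant mutation. The sketch below is not taken from the repository above; it assumes a Wright-Fisher population of N diploids with effective size equal to N, and defines s as the heterozygous selection coefficient (so the homozygous effect is 2s). The function name `fixation_probability` is hypothetical.

```python
import math


def fixation_probability(N, s):
    """Fixation probability of a new semi-dominant mutation (Kimura 1962).

    N : number of diploid individuals (assumed equal to the effective size)
    s : selection coefficient in heterozygotes; negative for deleterious

    Implements P = (1 - exp(-2s)) / (1 - exp(-4Ns)) for an initial
    frequency of 1 / (2N), rearranged to avoid overflow when s < 0.
    """
    if s == 0:
        return 1.0 / (2 * N)
    x = 4.0 * N * abs(s)
    if s > 0:
        return math.expm1(-2.0 * s) / math.expm1(-x)
    # Deleterious case: (exp(2|s|) - 1) / (exp(x) - 1), written with exp(-x)
    # so that a large x underflows to zero instead of overflowing.
    return math.expm1(2.0 * abs(s)) * math.exp(-x) / -math.expm1(-x)


if __name__ == "__main__":
    N = 10000
    neutral = fixation_probability(N, 0.0)
    for two_Ns in (-0.5, -1.0, -2.5, -5.0, -10.0):
        s = two_Ns / (2 * N)
        print(two_Ns, fixation_probability(N, s) / neutral)
```

The printed ratios are fixation probabilities relative to a neutral mutation; they fall off rapidly as selection against the mutation gets stronger relative to drift.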
Population genomics in unicellular eukaryotes
Unicellular eukaryotes represent organisms with a unique set of characteristics that allow us to sample entirely new parameter spaces in population genetics. Free-living unicellular organisms can have large effective population sizes, have been shown in many species to lack population structure despite being cosmopolitan, have varying levels of inbreeding, and usually have extremely compact genomes. In addition, despite the fact that unicellular eukaryotes comprise most of the phylogenetic diversity of eukaryotes and are members of all eukaryotic kingdoms, they are highly understudied. We performed nuclear and mitochondrial population genomics in five species of Paramecium and found that direct purifying selection is pervasive across their extremely compact nuclear genomes. Our study highlights the lack of neutral markers for demographic inference in such compact genomes. Here is the published work:
The Paramecium species offer a particularly interesting system in which to study the evolution of mitochondrial genomes, as Paramecium cells are more mitochondria-rich than mammalian and yeast cells and possibly undergo no bottleneck during cell division, suggesting a large population size. The mitochondria also appear to exist as independent structural units and do not undergo fusion, unlike the constant flux of organelle fusion and fission in metazoan mitochondrial populations, and are thought not to be exchanged between parental cells during conjugation. In concordance, using population-genomic data we found a complete lack of recombination in their mitochondria, and showed that mitochondrial genomes experience similar or stronger efficacy of purifying selection than the recombining nucleus. We also demonstrated how changes in mutation spectra between species may have contributed to changes in nucleotide composition of the mitochondrial genome across these lineages. Here is the published manuscript:
Evolution of whole-genome duplicates
Whole-genome duplications (WGDs) are prevalent in many eukaryotes and are considered to be a major source of new genes. Although most gene duplicates are lost post-WGD, some are retained for extended periods, and the evolutionary mechanisms responsible are still debated. In the case of a WGD, the question of loss and retention of duplicates is really determined by the fixation of mutations that abolish protein function (i.e., loss-of-function mutations or null alleles), which is mostly dependent on the population-genetic parameters involved. By examining such loss-of-function mutations segregating in populations, we identify WGD paralogs (or duplicates) that may be headed towards loss. We found the loss of a gene duplicate post-WGD to be a gradual process, involving the fixation of nonsynonymous mutations and a reduction in levels of expression, that occurs over a long period of evolutionary time. Concordantly, after inferring the distribution of fitness effects (DFE) of new base-substitution mutations at nonsynonymous sites and of frameshift-causing indels in pairs of WGD paralogs, we find that both types of mutations in the more highly expressed copy have a higher fitness cost than in their corresponding paralogs with lower expression. Using the inferred DFE allows us to more precisely estimate the expected time to fixation of null alleles in WGD paralogs, and we discuss the implications for evolutionary mechanisms responsible for long-term retention of paralogs post-WGD. This work is now published in Molecular Biology and Evolution and can be read here.