I am interested in how non-adaptive evolutionary processes, such as demographic change, purifying selection, and variation in mutation and recombination rates, shape patterns of genome-wide variation. Although great advances have been made in the theory and statistical methods for quantifying each of these processes individually, a statistical framework that jointly accounts for all of them is currently missing. Because most natural populations constantly experience fluctuations in population size, and purifying selection constantly purges deleterious variation, a model that incorporates both processes is the appropriate null model against which the role and importance of episodic processes like adaptation can be characterized. Building such a model has been a challenge in the field because demography and selection leave similar imprints in population-genetic data, making their individual contributions difficult to distinguish.
I have been working on characterizing the effects of the fixation and segregation of deleterious mutations on linked neutral variation, and how these effects bias population-genetic inference that assumes complete neutrality, such as the estimation of historical population size changes. This work has been supervised by Jeffrey Jensen at Arizona State University and Brian Charlesworth at the University of Edinburgh.
Here is a seminar summarizing my recent work at ASU.
Impact of background selection on demographic inference
We investigated how the shape of the distribution of fitness effects (DFE) of deleterious mutations and the density of selected sites affect the inference of population history by commonly used inference methods.
We find that background selection (BGS) leads to underestimation of population size and false inference of recent growth when the true demographic model is one of constant population size or decline. If the population has truly experienced recent growth, BGS does not lead to substantial mis-inference. Mis-inference becomes more severe as the density of selected sites increases and as the proportion of strongly deleterious mutations increases.
We propose a potential solution within the approximate Bayesian computation (ABC) framework, in which we instead use directly selected sites to perform demographic inference, treating the deleterious DFE as a nuisance parameter, and show that the resulting inference is reasonably accurate. You can read it here:
All code and input/output files are available here.
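The rejection step at the core of ABC can be illustrated with a small, generic sketch (not the code linked above). Parameters are drawn from the prior, summary statistics are simulated for each draw, and the draws whose statistics fall closest to the observed ones are kept as an approximate posterior sample. Treating a parameter as a nuisance then amounts to summarizing the accepted draws only for the parameter of interest, which marginalizes over the others. The toy simulator is a hypothetical stand-in for population-genetic simulations: `a` plays the role of a demographic parameter and `b` of a DFE parameter, and all function names are illustrative assumptions.

```python
import numpy as np


def rejection_abc(observed, simulate, sample_prior, n_sims=20000, accept_frac=0.01, rng=None):
    """Basic rejection ABC.

    Draws parameters from the prior, simulates summary statistics for each draw,
    and keeps the draws whose statistics are closest to the observed ones.
    """
    rng = np.random.default_rng() if rng is None else rng
    params = np.array([sample_prior(rng) for _ in range(n_sims)])
    stats = np.array([simulate(p, rng) for p in params])
    # Standardize each statistic so that no single statistic dominates the distance.
    scale = stats.std(axis=0)
    dist = np.sqrt((((stats - observed) / scale) ** 2).sum(axis=1))
    n_keep = max(1, int(round(accept_frac * n_sims)))
    return params[np.argsort(dist)[:n_keep]]


# Hypothetical toy model: "a" stands in for a demographic parameter of interest,
# "b" for a DFE parameter treated as a nuisance. Statistic 1 depends on both,
# statistic 2 only on b, so together they make "a" identifiable.
def toy_simulate(params, rng, n=50):
    a, b = params
    return np.array([rng.normal(a + b, 1.0, n).mean(), rng.normal(b, 1.0, n).mean()])


def toy_prior(rng):
    return np.array([rng.uniform(0.0, 10.0), rng.uniform(0.0, 5.0)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    observed = toy_simulate((3.0, 1.5), rng)
    accepted = rejection_abc(observed, toy_simulate, toy_prior, rng=rng)
    # Marginalize over the nuisance parameter: summarize only column 0 ("a").
    print("posterior mean of a:", accepted[:, 0].mean())
```

Standardizing each statistic by its spread across simulations is one common, simple choice of distance; regression adjustment of the accepted draws is a frequent refinement that this sketch omits.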
Joint inference of demography and purifying selection
A much better solution: we have developed a method for jointly inferring historical population size changes and the distribution of fitness effects (DFE) of new deleterious mutations acting genome-wide, using the decay of background selection around functional genomic elements. Because the method is based on an approximate Bayesian computation framework, it can also incorporate variation in recombination and mutation rates across the genome. We make no assumptions about which sites in functional elements are neutral, so the method can also be used to infer the DFE of regulatory elements. In addition, it performs much better than previous methods when there is selection on silent sites. Here is our published work:
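The exact summary statistics used by the method are not listed here. As an illustrative assumption, one natural way to capture the decay of background selection around a functional element is to compute per-site nucleotide diversity in bins of distance from it, as in the hypothetical helper below. It assumes a single element, treats every base in the region as callable, and excludes sites inside the element.

```python
import numpy as np


def pi_by_distance(region_length, element_start, element_end, snp_pos, derived_counts, n, bin_edges):
    """Per-site nucleotide diversity in bins of distance from one functional element.

    Assumes every position in [0, region_length) is callable, the element occupies
    [element_start, element_end), and positions inside the element are excluded.
    """
    def distance(pos):
        pos = np.asarray(pos)
        left = element_start - pos          # > 0 for sites upstream of the element
        right = pos - (element_end - 1)     # > 0 for sites downstream of the element
        return np.maximum(np.maximum(left, right), 0)

    # Number of callable flanking bases falling in each distance bin.
    d_all = distance(np.arange(region_length))
    n_sites, _ = np.histogram(d_all[d_all > 0], bins=bin_edges)

    # Unbiased per-site heterozygosity for each segregating site.
    p = np.asarray(derived_counts) / n
    het = 2.0 * p * (1.0 - p) * n / (n - 1)
    d_snp = distance(snp_pos)
    outside = d_snp > 0
    het_sum, _ = np.histogram(d_snp[outside], bins=bin_edges, weights=het[outside])

    # Bins with no callable sites come out as nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return het_sum / n_sites
```

Under background selection, diversity in such bins would be expected to rise with distance from the element; how fast it recovers is the kind of signal that carries information about the DFE.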
We have also explored how the extent of the linked effects of purifying selection (i.e., background selection) varies under different non-equilibrium demographic scenarios. Under demographic equilibrium, we derived analytical expressions for the reduction in diversity due to background selection when the deleterious DFE is discrete. I have made both a Mathematica notebook and a Python script available to perform these calculations for any discrete DFE (or any discretized continuous DFE), provided here:
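Separately from the notebook and script mentioned above, the classic approximation for background selection (Hudson and Kaplan 1995; Nordborg et al. 1996) can be sketched in a few lines. For a neutral site linked to deleterious sites, each with mutation rate u, heterozygous selection coefficient t = hs, and recombination fraction r to the neutral site, B = π/π₀ ≈ exp(−Σ u·t / (t + r(1 − t))²). The function below sums this over every site of a single functional element for a discrete DFE. It is a hedged sketch under stated assumptions (a linear recombination map capped at 0.5, no gene conversion, selection strong enough for the approximation to hold); the name `bgs_b` and its inputs are hypothetical, and it is not necessarily identical to the expressions derived in our work.

```python
import numpy as np


def bgs_b(y, l, u, r_bp, dfe):
    """Approximate B = pi / pi_0 at a neutral site y bp from the nearest end of an element.

    y     : distance (bp) from the neutral site to the nearest base of the element
    l     : element length (bp)
    u     : deleterious mutation rate per bp per generation within the element
    r_bp  : crossover rate per bp per generation (gene conversion ignored)
    dfe   : list of (t, f) pairs, where t = h*s is the heterozygous selection
            coefficient and f the fraction of new mutations in that class
    """
    x = np.arange(l)
    # Assumption: linear genetic map, capped at free recombination.
    r = np.minimum(r_bp * (y + x), 0.5)
    total = 0.0
    for t, f in dfe:
        if t <= 0:  # neutral class contributes no background selection
            continue
        total += np.sum(f * u * t / (t + r * (1.0 - t)) ** 2)
    return float(np.exp(-total))
```

For example, `bgs_b(0, 1, 0.01, 1e-8, [(0.1, 1.0)])` reduces to a single completely linked site with u/t = 0.1, so B = exp(−0.1). A continuous DFE can be handled by discretizing it into (t, f) classes before calling the function.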
Population genomics in unicellular eukaryotes
Unicellular eukaryotes are organisms with a unique set of characteristics that allow us to sample entirely new regions of parameter space in population genetics. Free-living unicellular organisms can have large effective population sizes, have been shown in many species to lack population structure despite being cosmopolitan, have varying levels of inbreeding, and usually have extremely compact genomes. In addition, although unicellular eukaryotes account for most eukaryotic phylogenetic diversity and are members of all eukaryotic kingdoms, they remain highly understudied. We performed nuclear and mitochondrial population genomics in five species of Paramecium. We found that Paramecium species exhibit some of the highest levels of genetic diversity, no worldwide population structure, and pervasive direct purifying selection across their compact nuclear genomes. Our study highlights the lack of neutral markers for demographic inference in such compact genomes. Here is the published work:
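Levels of diversity like these are commonly summarized with standard estimators such as nucleotide diversity (π) and Watterson's θ. As a minimal sketch (not the pipeline used in the study), both can be computed from an unfolded site frequency spectrum (SFS), together with Tajima's D, which contrasts them; the function name and input layout are assumptions.

```python
import numpy as np


def diversity_stats(sfs, n, L):
    """Per-site pi, per-site Watterson's theta, and Tajima's D from an unfolded SFS.

    sfs : sfs[i-1] = number of sites where the derived allele appears i times,
          for i = 1..n-1, in a sample of n sequences
    L   : total number of sites surveyed (monomorphic + polymorphic)
    """
    sfs = np.asarray(sfs, dtype=float)
    if len(sfs) != n - 1:
        raise ValueError("unfolded SFS must have n - 1 entries")
    i = np.arange(1, n)
    S = sfs.sum()  # number of segregating sites
    # Mean number of pairwise differences.
    pi = np.sum(i * (n - i) * sfs) / (n * (n - 1) / 2)
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i**2)
    theta_w = S / a1
    if S == 0:
        return pi / L, theta_w / L, float("nan")
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / np.sqrt(e1 * S + e2 * S * (S - 1))
    return pi / L, theta_w / L, d
```

An SFS proportional to 1/i (the neutral, constant-size expectation) gives π = θ_W and D = 0; an excess of rare variants makes D negative.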
The Paramecium species offer a particularly interesting system in which to study the evolution of mitochondrial genomes: Paramecium cells are richer in mitochondria than mammalian and yeast cells, and possibly experience no mitochondrial bottleneck during cell division, suggesting a large mitochondrial population size. Their mitochondria also appear to exist as independent structural units that do not undergo fusion, unlike the constant cycles of fusion and fission in metazoan mitochondrial populations, and are thought not to be exchanged between parental cells during conjugation. Consistent with this, using population-genomic data we found a complete lack of recombination in their mitochondria, and showed that mitochondrial genomes experience purifying selection of similar or greater efficacy than the recombining nucleus. We also demonstrated how changes in mutation spectra between species may have contributed to changes in the nucleotide composition of the mitochondrial genome across these lineages. Here is the published manuscript:
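To see how the mutation spectrum alone can shift nucleotide composition, consider a deliberately simplified two-rate model that ignores selection and all other forces: if an A/T site mutates to G/C at rate u and a G/C site mutates to A/T at rate v, GC content changes by u(1 − GC) − v·GC per generation and settles at GC* = u / (u + v). The sketch below (function names hypothetical) computes both the equilibrium and the deterministic approach to it; it is not the analysis performed in the study.

```python
def equilibrium_gc(u_at_to_gc, v_gc_to_at):
    """GC content at mutational equilibrium, solving u * (1 - GC) = v * GC.

    u_at_to_gc : rate at which an A/T site mutates to G/C
    v_gc_to_at : rate at which a G/C site mutates to A/T
    """
    return u_at_to_gc / (u_at_to_gc + v_gc_to_at)


def gc_trajectory(gc0, u_at_to_gc, v_gc_to_at, generations):
    """Deterministic change in GC content under mutation only."""
    gc = gc0
    for _ in range(generations):
        gc += u_at_to_gc * (1.0 - gc) - v_gc_to_at * gc
    return gc
```

For example, a GC→AT bias of v = 3u gives GC* = 0.25, so a lineage whose spectrum shifts toward a stronger AT bias is expected to move toward lower GC content over time.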
Selective forces on whole-genome duplicates
Whole-genome duplications (WGDs) are prevalent in many eukaryotes and are considered a major source of new genes. Although most gene duplicates are lost after a WGD, some are retained for extended periods, and the evolutionary mechanisms behind this retention are still debated. After a WGD, whether a duplicate is lost or retained is ultimately determined by the fixation of nonfunctionalizing mutations in the population, which depends mostly on population-genetic parameters. I am therefore using a population-genetic approach to characterize the selective forces acting on currently retained gene duplicates and to understand the mechanism of their loss.
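The dependence of fixation on population-genetic parameters can be made concrete with Kimura's diffusion approximation for the probability that a single new mutation fixes. A minimal sketch, assuming a diploid Wright-Fisher population with effective size N and additive selection (genotype fitnesses 1, 1 + s, 1 + 2s, with s < 0 for a deleterious nonfunctionalizing mutation): P_fix = (1 − e^(−2s)) / (1 − e^(−4Ns)). This is a textbook single-locus result, not the model used in this project, and the function name is hypothetical.

```python
import math


def fixation_probability(s, N):
    """Kimura's fixation probability for a new mutation at initial frequency 1/(2N).

    Assumes a diploid population of effective size N and additive selection with
    genotype fitnesses 1, 1 + s, 1 + 2s (s < 0: deleterious).
    """
    if s == 0:
        return 1.0 / (2 * N)  # neutral limit
    x = -4.0 * N * s
    if x > 700:
        # Strongly deleterious: avoid overflow in expm1; P ~ (e^{-2s} - 1) * e^{4Ns}.
        return math.expm1(-2.0 * s) * math.exp(4.0 * N * s)
    # expm1 keeps the numerator accurate when |s| is small.
    return math.expm1(-2.0 * s) / math.expm1(x)
```

For a mildly deleterious mutation (s < 0), the result falls rapidly as N grows, so the same nonfunctionalizing mutation is far more likely to fix, and the duplicate to be lost, in a small population than in a large one.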
Other papers related to this work in which I have been heavily involved are: