MAXIMUM‐LIKELIHOOD ESTIMATES OF SELECTION COEFFICIENTS FROM DNA SEQUENCE DATA

Abstract

Selection can have a significant effect on sequence evolution, and this will be reflected in the information contained within the phylogenetic relationships between species. Selection will reduce the frequency of any deleterious nucleotides, and this can be used to test for the presence of selection. The frequencies of different nucleotides can be predicted theoretically and compared to observed values. If a sample of sequences has an unusually low frequency of a particular nucleotide, then selection might be inferred to have acted upon these sequences. This inference is valid only if the sequences are not too closely related and if sufficient mutations have occurred during their evolution. Otherwise, the unusual pattern of nucleotides in the sequences may be caused by recent common ancestry. An algorithm is presented to obtain maximum-likelihood estimates of selection coefficients using the phylogenetic information contained within sequence data. A k-allele model is developed that uses the phylogeny to measure relative mutation rates and degrees of relatedness and to evaluate the likelihood in the presence of selection. The method is illustrated with examples from the NS2 genes of influenza viruses and the MHC genes of mice. It is shown that the maximum-likelihood estimates of mutation rates are very large for influenza viruses and that statistically significant selection acts to maintain a specific coding sequence. Overall, the MHC genes are also under significant selection to preserve the coding sequence, but at the antigen recognition site this selection is reversed to promote genetic variation. Maximum-likelihood estimates of these selection coefficients are provided.