abstract
- Some of the assumptions underlying estimates of DNA and protein sequence divergence are examined. A solution is obtained for the variance of these estimates that allows for different mutation rates and different population sizes in each species, and for an arbitrary structure in the initial population. These conditions are shown not to strongly affect estimates of divergence. In general, they make the variance of divergence smaller than a binomial variance, so the binomial variance usually assumed for these estimates is safely conservative. Variability in the mutation rate among sites is shown to have an effect as large as, or larger than, variability in the mutation rate among bases. Both kinds of variability cause the number of substitutions between two sequences to be underestimated. Protein and DNA sequences from several species are collected to estimate the variability in mutation rates among sites; when many homologous sequences are known, standard methods can be used to estimate this variability. The estimates show that this factor is important in considering the spectrum of spontaneous mutations and is strongly reflected in the divergence of sequences. Variability is smaller at the third codon position than at the first and second positions. This may be because of weaker selective constraints on this position, or because the third position has become saturated with mutations in the sequences examined.
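Two quantitative points in the abstract can be made concrete: the binomial variance p(1 − p)/n usually assumed for an observed proportion of differing sites, and the claim that rate variability among sites leads to underestimated substitution counts. The sketch below is illustrative only and is not the author's method. It assumes the Jukes-Cantor model (a standard textbook model swapped in here), and the two equally frequent rate classes (relative rates 0.2 and 1.8, mean 1) are hypothetical values chosen for demonstration. Because the expected proportion of differences is concave in the rate, averaging over heterogeneous sites yields a smaller proportion than uniform rates would, so the uniform-rate formula recovers fewer substitutions than actually occurred.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (gap-free alignment assumed)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def binomial_variance(p: float, n: int) -> float:
    """Binomial sampling variance of a proportion p observed over n sites."""
    return p * (1.0 - p) / n


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites under Jukes-Cantor after d substitutions/site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site, assuming equal rates at all sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_p_with_site_rates(d: float, rates: list[float]) -> float:
    """Average expected p when each site class evolves at its own relative rate (mean 1)."""
    return sum(jc_expected_p(d * r) for r in rates) / len(rates)


# Observed divergence and its binomial variance for a toy pair of sequences.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"
p_obs = p_distance(seq_a, seq_b)                 # 2 of 10 sites differ -> 0.2
var_obs = binomial_variance(p_obs, len(seq_a))   # about 0.016

# Hypothetical rate heterogeneity: half the sites at rate 0.2, half at 1.8.
true_d = 1.0
p_het = expected_p_with_site_rates(true_d, [0.2, 1.8])
d_est = jc_distance(p_het)                       # about 0.64, below the true value of 1.0

print(f"p = {p_obs:.3f}, binomial variance = {var_obs:.4f}")
print(f"true d = {true_d}, uniform-rate estimate = {d_est:.3f}")
```

With a single rate class the estimate returns the true value, so the shortfall comes entirely from the among-site variability, in the direction the abstract describes.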