Correction of Population Stratification in Large Multi-Ethnic Association Studies
BACKGROUND: The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment at numerous centers, possibly in different geographic regions. Unfortunately, such a recruitment strategy is likely to complicate the study design and to raise concerns about population stratification.

METHODOLOGY/PRINCIPAL FINDINGS: We analyzed 9,751 individuals representing three main ethnic groups (Europeans, Arabs and South Asians) who had been enrolled at 154 centers in 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification, and we present additional methods to identify and correct for stratification.

CONCLUSIONS/SIGNIFICANCE: Our results highlight the importance of carefully addressing population stratification and of "cleaning" the sample prior to analysis, both to obtain stronger signals of association and to avoid spurious results.
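The abstract does not name the specific methods used to identify and correct for stratification. As an illustration only, the sketch below shows one widely used generic approach, genomic control, which is an assumption here and not necessarily the authors' method. It computes a 1-df allelic chi-square per SNP, estimates an inflation factor from the median statistic, and rescales the statistics by that factor. The function names and the allele-count inputs are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """1-df chi-square for a 2x2 table of allele counts (cases/controls by allele A/B)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:
        return 0.0  # monomorphic SNP or empty group: no information
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic over the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each statistic by lambda; lambda is floored at 1 so nothing is inflated."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [s / lam for s in chi2_stats]
```

An inflation factor well above 1 across SNPs that are mostly expected to be null suggests residual stratification within a group that self-reported ethnicity alone did not remove. With candidate-gene panels rather than genome-wide markers, the estimate should be read cautiously, since true associations can also raise the median.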