What SNP Genotyping Errors Are Most Costly for Genetic Association Studies?

Sun Jung Kang, Derek Gordon, Stephen J. Finch

Research output: Contribution to journal › Article › peer-review

72 Scopus citations


Which genotype misclassification errors are most costly, in terms of increased sample size necessary (SSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association? We answer this question for single-nucleotide polymorphisms (SNPs), using the 2 × 3 χ² test of independence. Our strategy is to expand the noncentrality parameter of the asymptotic distribution of the χ² test under a specified alternative hypothesis to approximate SSN, using a linear Taylor series in the error parameters. We consider two scenarios: the first assumes Hardy-Weinberg equilibrium (HWE) for the true genotypes in both cases and controls, and the second assumes HWE only in controls. The Taylor series approximation has a relative error of less than 1% when each error rate is less than 2%. The most costly error is recording the more common homozygote as the less common homozygote, with indefinitely increasing cost coefficient as minor SNP allele frequencies approach 0 in both scenarios. The cost of misclassifying the more common homozygote to the heterozygote also becomes indefinitely large as the minor SNP allele frequency goes to 0 under both scenarios. For the violation of HWE modeled here, the cost of misclassifying a heterozygote to the less common homozygote becomes large, although bounded. Therefore, the use of SNPs with a small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Furthermore, the design of automated genotyping should minimize those errors whose cost coefficients can become indefinitely large.
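The idea in the abstract can be illustrated numerically. The sketch below is not the authors' derivation: it assumes equal numbers of cases and controls, HWE in both groups (the first scenario), and misclassification rates that are the same in cases and controls. It uses the standard noncentrality parameter of the Pearson χ² test for a 2 × 3 table, and it approximates each linear cost coefficient by a finite difference rather than an analytic Taylor expansion. All function names and the allele frequencies in the usage note are hypothetical.

```python
def hwe(q):
    """Genotype frequencies (AA, AB, BB) under HWE; q is the frequency of allele B."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def observed(true_freqs, err):
    """Observed genotype frequencies after misclassification.

    err maps (i, j) to P(record genotype j | true genotype i) for i != j.
    Assumption: the same error rates apply to cases and controls.
    """
    obs = list(true_freqs)
    for (i, j), rate in err.items():
        moved = true_freqs[i] * rate
        obs[i] -= moved
        obs[j] += moved
    return obs


def ncp_per_subject(cases, controls):
    """Noncentrality parameter of the 2 x 3 chi-square test, per case,
    assuming equal numbers of cases and controls."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(cases, controls))


def ssn_ratio(q_case, q_ctrl, err):
    """Factor by which the sample size must grow to keep the noncentrality
    parameter (and hence asymptotic power) unchanged when errors are present."""
    cases, controls = hwe(q_case), hwe(q_ctrl)
    clean = ncp_per_subject(cases, controls)
    noisy = ncp_per_subject(observed(cases, err), observed(controls, err))
    return clean / noisy


def cost_coefficient(q_case, q_ctrl, i, j, h=1e-6):
    """Finite-difference slope of the SSN ratio in the single error rate i -> j,
    a numerical stand-in for the linear Taylor coefficient."""
    return (ssn_ratio(q_case, q_ctrl, {(i, j): h}) - 1.0) / h
```

With genotypes indexed 0 (more common homozygote), 1 (heterozygote), and 2 (less common homozygote), calling `cost_coefficient(0.15, 0.10, 0, 2)` and `cost_coefficient(0.15, 0.10, 0, 1)` compares the two errors the abstract singles out. Repeating the calls with smaller minor allele frequencies shows how the coefficients behave as the frequency approaches 0.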

Original language: English (US)
Pages (from-to): 132-141
Number of pages: 10
Journal: Genetic Epidemiology
Issue number: 2
State: Published - Feb 2004
Externally published: Yes

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)


Keywords

  • Chi-square
  • Cost
  • Error detection
  • Genotype error
  • Linkage disequilibrium
  • Noncentrality parameter
  • Test of independence

