Coetzee Laboratory


Genome-wide association studies (GWAS) of complex phenotypes have become more powerful as the numbers of cases and controls have grown and as meta-analyses have been employed. Additionally, as next-generation sequencing has become more feasible and affordable, many more single nucleotide polymorphisms (SNPs) with low minor allele frequencies (MAFs) have been identified. As a result, association signals at any given locus have become increasingly complex, largely because each locus harbors many candidate risk SNPs that are correlated with one another through linkage disequilibrium (LD). Consequently, it is virtually impossible to assign functionality, let alone causality, to any single SNP at a risk locus. This dispiriting situation is made more daunting still by the unexpected finding that, for many complex diseases, more than 80 percent of risk SNPs lie in non-coding DNA. To address these issues, we and others have used chromatin biofeatures to infer the potential functionality of the original discovery SNPs revealed by GWAS (known in the field as “index SNPs”) and of their many surrogate SNPs, which are defined by their population-specific LD (r²) with the index SNP.
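To make the link between index and surrogate SNPs concrete, the Python sketch below computes r² between two biallelic SNPs from phased haplotypes, using the standard definitions D = p(AB) − p(A)p(B) and r² = D² / [p(A)(1 − p(A))p(B)(1 − p(B))]. It then lists the SNPs whose r² with an index SNP meets a cutoff. The function names, the SNP identifiers, the toy haplotypes, and the 0.5 threshold are all hypothetical and for illustration only. In practice, r² values are usually taken from a population-matched reference panel such as the 1000 Genomes Project rather than computed from a handful of haplotypes.

```python
from typing import Dict, List, Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes (alleles coded 0/1).

    Position i in both sequences must refer to the same chromosome copy.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic SNP has undefined r^2; this sketch treats it as 0.
        return 0.0
    return d * d / denom


def surrogate_snps(
    index_id: str, haplotypes: Dict[str, Sequence[int]], threshold: float = 0.5
) -> List[str]:
    """Hypothetical helper: SNPs whose r^2 with the index SNP is >= threshold."""
    index_hap = haplotypes[index_id]
    return sorted(
        snp
        for snp, hap in haplotypes.items()
        if snp != index_id and ld_r2(index_hap, hap) >= threshold
    )


# Toy phased haplotypes for eight chromosomes (illustrative, not real data).
toy = {
    "index": [1, 1, 1, 1, 0, 0, 0, 0],
    "linked": [1, 1, 1, 1, 0, 0, 0, 0],    # identical pattern: r^2 = 1.0
    "partial": [1, 1, 1, 0, 0, 0, 0, 0],   # partially correlated: r^2 = 0.6
    "unlinked": [1, 0, 1, 0, 1, 0, 1, 0],  # independent: r^2 = 0.0
}

if __name__ == "__main__":
    for snp in ("linked", "partial", "unlinked"):
        print(snp, round(ld_r2(toy["index"], toy[snp]), 3))
    print(surrogate_snps("index", toy))  # ['linked', 'partial']
```

Because the surrogate set depends on both the r² cutoff and the population whose haplotypes are used, the same index SNP can have different surrogates in different populations, which is why LD is described above as population-specific.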