whap

Haplotype-based association analysis package

Shaun Purcell, Massachusetts General Hospital, Boston, MA, USA
Pak Sham, Hong Kong University, Hong Kong
                                               

Warnings

  • Check the data thoroughly before running whap, e.g. using PEDSTATS or a similar tool.

  • Exclude rare haplotypes from the analysis to improve numerical stability.

  • WHAP is not suited to very large datasets, where large here means more than a few thousand SNPs on a thousand or more individuals. In particular, WHAP is not suited to whole genome association analysis, due to memory and speed issues (see PLINK for software that is).

  • For the example dataset (an omnibus test of a 7-SNP haplotype on 300 individuals for a quantitative trait), phasing and performing the association test takes around 0.8 seconds on a 3 GHz Linux workstation. Performing 100 permutations on the same dataset takes around 1 minute.

  • In addition to not handling datasets with many thousands of SNPs, WHAP should not be used to attempt to phase more than a dozen or so SNPs at a time. The precise number will depend on the LD structure of the region (i.e. how many common/rare haplotypes there are). As a rule of thumb, most analyses should probably look at 10 or fewer SNPs in any one window, resulting in 10 or fewer common haplotypes.

  • Run whap with the --repeat N option, where N is set to some value such as 50 or 100, to check the numerical stability of important results. This option will slow performance but increases the chance of convergence at a global minimum, especially for models with many parameters.

  • Quantitative traits and covariates should be on an approximately standard normal scale (i.e. a mean near 0 and a variance near 1). Traits whose means and/or variances differ by orders of magnitude are likely to show problems with numerical stability.

  • Fix the trait mean and variance or prevalence where appropriate

  • Permutation is necessary to obtain significance values for sliding window analyses.

  • Use the --cond, --prev and --model w options together to perform a TDT-like test.

  • Remember that including covariates when performing conditional (--cond) analyses can cause problems
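
The advice above on scaling quantitative traits and covariates can be sketched in a few lines of Python, run on the phenotype values before they are written to the input files. This is only an illustrative sketch and not part of WHAP: the function name standardize and the example values are hypothetical, and missing-value handling is left out.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to mean 0 and variance 1 (z-scores)."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        raise ValueError("cannot standardize a constant trait")
    return [(v - m) / s for v in values]


# Hypothetical raw trait values on a scale far from standard normal.
raw_trait = [152.0, 171.5, 160.2, 180.3, 166.0]
scaled_trait = standardize(raw_trait)
```

The same function can be applied to each covariate in turn, so that the trait and all covariates end up on a comparable scale.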

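One common reason permutation is needed for sliding window analyses is that neighbouring windows overlap, so their test statistics are correlated and the nominal p-values cannot simply be treated as independent tests. The sketch below shows a generic max-statistic permutation scheme in Python. It is an assumption about how such a correction can be done in general, not a description of WHAP's own procedure; the toy statistic (absolute covariance between the trait and the summed allele count in a window) is a stand-in for a real haplotype test, and all names and data are hypothetical.

```python
import random


def window_stat(trait, genotypes, start, width):
    """Toy stand-in statistic: absolute covariance between the trait and
    the summed allele count in the window. Not a haplotype test."""
    scores = [sum(g[start:start + width]) for g in genotypes]
    n = len(trait)
    mean_t = sum(trait) / n
    mean_s = sum(scores) / n
    return abs(sum((t - mean_t) * (s - mean_s)
                   for t, s in zip(trait, scores)) / n)


def max_stat_permutation(trait, genotypes, width, n_perm=200, seed=1):
    """Return one empirical p-value per window, corrected for having
    tested every window, using the maximum statistic over windows."""
    n_snps = len(genotypes[0])
    starts = range(n_snps - width + 1)
    observed = [window_stat(trait, genotypes, s, width) for s in starts]

    rng = random.Random(seed)
    shuffled = list(trait)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        # Shuffling the trait breaks any trait-genotype association
        # while leaving the LD structure of the genotypes intact.
        rng.shuffle(shuffled)
        best = max(window_stat(shuffled, genotypes, s, width) for s in starts)
        for i, obs in enumerate(observed):
            if best >= obs:
                exceed[i] += 1
    # Add 1 to numerator and denominator so p-values are never exactly 0.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical data: 40 individuals, 6 SNPs coded as 0/1/2 allele counts.
data_rng = random.Random(0)
genotypes = [[data_rng.randint(0, 2) for _ in range(6)] for _ in range(40)]
trait = [data_rng.gauss(0, 1) for _ in range(40)]
pvals = max_stat_permutation(trait, genotypes, width=3)
```

Comparing each observed window statistic with the distribution of the best window per permutation accounts for the number of windows tested and for the correlation between them.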


Created by Shaun Purcell; Last updated by Lori Thomas: March 2006