Check the data thoroughly before running whap, e.g. with PEDSTATS
or a similar tool.
Exclude rare haplotypes from the analysis to improve numerical stability.
WHAP is not suited to very large datasets, where "large" here
means more than a few thousand SNPs on a thousand or more individuals. In
particular, WHAP is not suited to whole-genome association analysis, due
to memory and speed issues (see PLINK for software
that is).
Running time for the example dataset (an omnibus test for a 7-SNP
haplotype on 300 individuals, for a quantitative trait) is around 0.8
seconds to phase and perform the association test (on a 3 GHz Linux
workstation). Performing 100 permutations on this same dataset takes
around 1 minute.
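As a rough planning aid, the permutation timing quoted above can be extrapolated to other permutation counts. The sketch below assumes that running time grows linearly with the number of permutations, which is an assumption and not a documented property of WHAP. The function name is hypothetical, and the only measured input is the figure given above (about 60 seconds for 100 permutations on the example dataset).

```python
def estimate_permutation_seconds(n_permutations, seconds_per_100=60.0):
    """Extrapolate wall-clock time for a permutation run.

    Assumes linear scaling in the number of permutations (an assumption,
    not a measured property). The default of 60 seconds per 100
    permutations is the figure quoted for the example dataset.
    """
    if n_permutations < 0:
        raise ValueError("n_permutations must be non-negative")
    return seconds_per_100 * n_permutations / 100.0


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        minutes = estimate_permutation_seconds(n) / 60.0
        print(f"{n} permutations: roughly {minutes:.0f} minute(s)")
```

Timings on other datasets or hardware will differ, so it is worth measuring a short run first and passing that figure as seconds_per_100.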
In addition to not handling datasets with many thousands of SNPs,
WHAP should not be used to attempt to phase more than a dozen or
so SNPs at a time. The precise number will depend on the LD structure of
the region (i.e. how many common/rare haplotypes there are). As a rule of
thumb, most analyses should probably look at 10 or fewer
SNPs in any one window, resulting in 10 or fewer common haplotypes.
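To illustrate the rule of thumb above, the following sketch splits an ordered list of SNP names into overlapping windows of at most 10 SNPs. It only enumerates windows and does not call WHAP; the function name, the SNP labels and the step parameter are hypothetical and chosen for illustration.

```python
def sliding_windows(snp_ids, window_size=10, step=1):
    """Return consecutive windows of at most `window_size` SNPs.

    `snp_ids` is assumed to be ordered by map position. With step > 1
    the final few SNPs may not be covered by any window.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be at least 1")
    snp_ids = list(snp_ids)
    if not snp_ids:
        return []
    if len(snp_ids) <= window_size:
        # The whole region already fits in a single window.
        return [snp_ids]
    last_start = len(snp_ids) - window_size
    return [snp_ids[i:i + window_size] for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    snps = [f"snp{i}" for i in range(1, 13)]  # 12 hypothetical SNP names
    for window in sliding_windows(snps, window_size=10):
        print(window[0], "to", window[-1], f"({len(window)} SNPs)")
```

A smaller window_size may be more appropriate in regions of weak LD, where the number of distinct haplotypes grows quickly with the number of SNPs.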
Run whap with the --repeat N option, where N is set to some
value such as 50 or 100, to check the numerical stability of important results. This option will
slow performance, but increase the chance of convergence at a global minimum, especially for
models with many parameters.
Quantitative traits and covariates should be on an approximately standard
normal scale (i.e. a mean near 0 and a variance near 1). Traits with
order-of-magnitude different means and/or variances are likely to show problems with
numerical stability.
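One way to bring a trait or covariate onto the scale described above is to standardize it before writing the input files. The sketch below is a minimal example using only the Python standard library; the function name and the example values are hypothetical, it assumes there are no missing values, and it is a preprocessing step done outside WHAP, not a WHAP feature.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and sample variance 1 (z-scores).

    Assumes all values are present and numeric; missing-value codes
    must be handled before calling this function.
    """
    xs = [float(v) for v in values]
    if len(xs) < 2:
        raise ValueError("need at least two values to standardize")
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("values are constant; cannot standardize")
    return [(x - mean) / sd for x in xs]


if __name__ == "__main__":
    heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical trait values
    z = standardize(heights_cm)
    print([round(v, 3) for v in z])
```

Standardizing rescales the trait linearly and leaves its shape unchanged, so a strongly skewed trait may also need a transformation before analysis.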
Fix the trait mean and variance, or the prevalence, where appropriate.
Permutation is necessary to obtain significance values for sliding window
analyses.
Use the --cond, --prev and --model w options together
to perform a TDT-like test.
Remember that including covariates when performing conditional (--cond)
analyses can cause problems.