whap

Haplotype-based association analysis package

Shaun Purcell, Massachusetts General Hospital, Boston, MA, USA
Pak Sham, Hong Kong University, Hong Kong
                                               

Conditional analyses tutorial

This page illustrates how to use whap to perform a series of conditional tests. The term "conditional" is (confusingly) used in two senses in whap. The first refers to tests based on a retrospective likelihood (because in that case we condition on trait values). In contrast, this tutorial considers tests of genetic variation that are conditional on other local genetic variation.

 
  1. Make a directory on the D: drive (e.g. work) using Windows or DOS:
     mkdir D:\work
     D:
     cd D:\work 
    
  2. Download this ZIP file and save it in the newly created work directory

  3. Mac users: download whap from here

  4. Unzip the downloaded ZIP file into the work directory

  5. Check the contents of the work directory. The vital files that we will use in the first instance are the following (ignore any other files in this directory for now):
    • whap.exe
    • snphap.exe
    • dataACGT.ped
    • dataACGT.dat
    • dataACGT.map
  6. You can use the more command to view text files, e.g.
       more dataACGT.ped
    
The data file contains 400 unrelated individuals (200 cases, 200 controls) genotyped for 5 SNPs. We will perform a haplotype-based association analysis using whap.
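If you are working in a POSIX shell (for example on a Mac), the following sketch is one quick way to sanity-check this description by summarising the .ped file. It assumes the standard linkage PED layout (six leading columns, the sixth being affection status coded 1 = unaffected, 2 = affected, followed by two allele columns per SNP); the function name ped_summary is our own and not part of whap.

```shell
#!/bin/sh
# Summarise a linkage-format PED file. Assumed layout (check against your own
# files): columns 1-6 are family ID, individual ID, father, mother, sex and
# affection status (1 = unaffected, 2 = affected); each SNP then adds two
# allele columns.
ped_summary() {   # ped_summary FILE
  awk '
    {
      n++                    # one line per individual
      status[$6]++           # tally affection-status codes
      snps = (NF - 6) / 2    # two allele columns per SNP
    }
    END {
      printf "individuals=%d cases=%d controls=%d snps=%d\n",
             n, status[2], status[1], snps
    }' "$1"
}

# Run on the tutorial data if it is present in the current directory.
if [ -f dataACGT.ped ]; then
  ped_summary dataACGT.ped
fi
```

If the coding assumption holds, the output should agree with the 400 individuals, 200 cases, 200 controls and 5 SNPs described above.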


 
PRACTICAL EXERCISE: FIRST PART, DETECTING THE EFFECT
 

Omnibus haplotype test (all five markers)
    whap --file dataACGT 
Single SNP test, e.g. first SNP
    whap --file dataACGT --alt 1
Single SNP test of the second SNP, with a permutation p-value (500 permutations; in practice you should use more)
    whap --file dataACGT --alt 2 --perm 500
All five single SNP tests, with permutation to obtain empirical p-values: the P_MAX p-value is the significance of the best single SNP result after correcting for multiple testing by permutation.
    whap --file dataACGT --alt 1 --window --perm 100
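The idea behind a P_MAX-style correction can be sketched as follows: in each permuted data set, take the largest single-SNP statistic across all SNPs, and count how often that maximum reaches the best observed statistic. This is a generic max-statistic sketch, not whap's own code. The input format (first line = observed statistics, then one line per permutation, larger = more significant) is hypothetical, and whap's exact formula may differ (for example by adding 1 to the numerator and denominator).

```shell
#!/bin/sh
# Max-statistic permutation p-value for the best SNP (illustrative only).
# Hypothetical input: line 1 holds the observed statistic for each SNP; each
# later line holds the statistics from one permuted replicate, same SNP order.
# Larger statistics are assumed to be more significant (e.g. chi-square).
pmax() {   # pmax [FILE]  (reads standard input if no file is given)
  awk '
    {
      m = $1                                   # largest statistic on this line
      for (i = 2; i <= NF; i++) if ($i > m) m = $i
    }
    NR == 1 { obs = m; next }                  # best observed statistic
    { reps++; if (m >= obs) hits++ }           # replicate max at least as extreme
    END {
      if (reps == 0) { print "NA"; exit }
      printf "%.4f\n", hits / reps
    }' "$@"
}

# Toy example: observed best = 9.0; 2 of the 4 replicate maxima are >= 9.0,
# so this prints 0.5000.
printf '4.0 9.0 2.0\n1.0 10.0 0.5\n3.0 2.0 1.0\n9.0 0.1 0.2\n0.2 0.3 0.4\n' | pmax
```

Taking the maximum over SNPs within each replicate is what makes the p-value account for having tested every SNP, rather than only the one that happened to look best.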
Omnibus test, excluding rare haplotypes: e.g. only consider haplotypes with frequency > 10%
    whap --file dataACGT --at 10
Haplotype-specific tests
    whap --file dataACGT --hs
Selecting subsets of SNPs for haplotype tests, e.g. haplotypes formed by SNPs 3 and 4. Note: separate the marker numbers with commas, without spaces.
    whap --file dataACGT --alt 3,4
Give haplotype frequencies in cases and controls: haplotypes based on all 5 SNPs
    whap --file dataACGT --cc-freqs
or based only on SNPs 3 and 4, for example:
    whap --file dataACGT --alt 3,4 --cc-freqs
Output estimated haplotype phases for each individual, e.g. for all five SNPs
    whap --file dataACGT --phase

e.g. for only 3rd and 4th SNP
    whap --file dataACGT --alt 3,4 --phase
Get confidence intervals on the estimates. Note that the first extra row is the CI for the intercept, which should be ignored. The remaining five rows ([2] to [6]) give the CIs for the haplotypes, excluding the first (most common) haplotype: this is the reference haplotype, so its effect is fixed to zero.
    whap --file dataACGT --ci 0.95
QUESTION: What do you conclude about this association? What SNPs / haplotypes are associated? Do they increase or decrease risk? What are the effect sizes?


EXTRA EXERCISE (for bonus points): use Haploview to analyse these data. NOTE: load the file data1234.ped into Haploview instead of dataACGT.ped -- the files contain the same data except the SNP coding is numeric (1,2,3,4) instead of text (A,C,G,T), as Haploview can only handle numeric allele codes.
SECOND PART: DISSECTING THE EFFECT


 
All the tests below are appropriate once a (large) omnibus haplotype association has been detected in a particular region. Often multiple SNPs and haplotypes all have significant p-values. The following analyses can help give a clearer picture of the association.


 
Does SNP X have any effect after controlling for everything else?
 

i.e. SNP X is dropped from the null model. For example, for SNP 1:
    whap --file dataACGT --alt 1,2,3,4,5 --null 2,3,4,5
For SNP 2
    whap --file dataACGT --alt 1,2,3,4,5 --null 1,3,4,5
etc. Perform this test for every SNP. A significant p-value indicates that the SNP still has an independent effect.
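Rather than typing each command, the drop-one-SNP tests can be generated with a short POSIX shell sketch. It only prints the whap commands, using the --file, --alt and --null options shown above; the helper names snp_list and drop_one_commands are our own, and the number of SNPs (5 for dataACGT) is passed in by hand.

```shell
#!/bin/sh
# snp_list N SKIP: print "1,2,...,N" with SKIP left out (SKIP=0 keeps all).
snp_list() {
  n=$1; skip=$2; out=""; i=1
  while [ "$i" -le "$n" ]; do
    if [ "$i" -ne "$skip" ]; then
      out="${out:+$out,}$i"     # append, adding a comma after the first entry
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# drop_one_commands FILESTEM NSNPS: for each SNP X, the alternate uses all
# SNPs and the null uses every SNP except X.
drop_one_commands() {
  all=$(snp_list "$2" 0)
  x=1
  while [ "$x" -le "$2" ]; do
    echo "whap --file $1 --alt $all --null $(snp_list "$2" "$x")"
    x=$((x + 1))
  done
}

drop_one_commands dataACGT 5
```

The first two printed lines match the SNP 1 and SNP 2 commands above. To run all five tests, pipe the output to a shell (drop_one_commands dataACGT 5 | sh), assuming the whap executable is on your PATH.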


 
NOTE: look at the haplotype estimates under the null compared with the alternate, and see how they relate to the test.


 
NOTE: this test is not possible for all SNPs -- you will get a df of 0 and LRT of 0 meaning that the test was not possible. Why would this be the case?


 
Does everything else still have an effect after controlling for SNP X?
 

Here we ask whether the omnibus test remains significant after controlling for one SNP at a time (i.e. the null model contains only that SNP). If the p-value remains significant, you conclude that this SNP cannot explain the entire omnibus result. If the p-value is no longer significant, this might suggest that the single SNP can explain the total association.
 

e.g. for SNP 1
    whap --file dataACGT --alt 1,2,3,4,5 --null 1
and the same for SNP 2
    whap --file dataACGT --alt 1,2,3,4,5 --null 2
etc.
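Likewise, a minimal sketch that prints the "controlling for SNP X" command for every SNP. The function name keep_one_commands is hypothetical; only the whap options shown above are used.

```shell
#!/bin/sh
# keep_one_commands FILESTEM NSNPS: the alternate uses all SNPs, the null
# model contains SNP X only, for X = 1..NSNPS.
keep_one_commands() {
  all=""; i=1
  while [ "$i" -le "$2" ]; do
    all="${all:+$all,}$i"       # build "1,2,...,NSNPS"
    i=$((i + 1))
  done
  x=1
  while [ "$x" -le "$2" ]; do
    echo "whap --file $1 --alt $all --null $x"
    x=$((x + 1))
  done
}

keep_one_commands dataACGT 5
```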


 
Does everything else still have an effect after controlling for haplotype H?


 
Finally, we can ask whether or not there is any evidence of association (i.e. omnibus test) after controlling for a single haplotype at a time.


 
This requires the --constrain option to manually specify the haplotype parameters under the alternate and null models.


 
For example, the omnibus test is manually specified as follows (i.e. under the alternate every haplotype has a unique effect; under the null all haplotypes have the same effect, i.e. no association)
    whap --file dataACGT --constrain 1,2,3,4,5,6/1,1,1,1,1,1
A haplotype-specific test, e.g. for the 2nd haplotype, would be specified as follows (i.e. under the alternate we estimate only 2 parameters, one for the second haplotype and one for all others; under the null, we specify no association, as for the omnibus test):
    whap --file dataACGT --constrain 1,2,1,1,1,1/1,1,1,1,1,1
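The same pattern extends to the other non-reference haplotypes. The sketch below builds the --constrain argument for a haplotype-specific test of haplotype H, following the example above (code 2 for haplotype H and 1 for everything else under the alternate; all 1 under the null). The function name hs_constrain is our own, and the count of six haplotypes is taken from this data set.

```shell
#!/bin/sh
# hs_constrain NHAP H: --constrain argument for a haplotype-specific test of
# haplotype H (H = 2..NHAP; haplotype 1 is the reference and always coded 1).
hs_constrain() {
  alt=""; null=""; i=1
  while [ "$i" -le "$1" ]; do
    if [ "$i" -eq "$2" ]; then c=2; else c=1; fi
    alt="${alt:+$alt,}$c"       # alternate: H on its own, all others together
    null="${null:+$null,}1"     # null: no association
    i=$((i + 1))
  done
  printf '%s/%s\n' "$alt" "$null"
}

# Print one command per non-reference haplotype (six haplotypes here).
h=2
while [ "$h" -le 6 ]; do
  echo "whap --file dataACGT --constrain $(hs_constrain 6 "$h")"
  h=$((h + 1))
done
```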
Some notes on the use of the constrain option:
  • the first haplotype must always have the fixed value 1 (i.e. it is the reference haplotype)
  • there should always be the same number of parameter codes as there are haplotypes (i.e. six in this case).
  • The numbers before the / symbol specify the codes for the alternate model; the numbers after the / are the codes under the null model. The null model must always be nested in the alternate.
The point of the --constrain option is that haplotypes with the same parameter code are estimated to have the same coefficient (only within alternate and null models though).
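Because a mistyped constraint is easy to miss, the rules above can be checked before running whap. This awk-based sketch encodes our reading of those rules, not whap's own validation: both lists must have one code per haplotype, haplotype 1 must be coded 1 in both, and the null is treated as nested in the alternate when any two haplotypes that share a code under the alternate also share one under the null. The function name check_constrain is hypothetical.

```shell
#!/bin/sh
# check_constrain NHAP "ALT/NULL": print OK, or the first rule that fails.
check_constrain() {
  printf '%s\n' "$2" | awk -v nhap="$1" -F/ '
    NF != 2 { print "need exactly one /"; exit 1 }
    {
      na = split($1, a, ",")    # alternate codes
      nn = split($2, b, ",")    # null codes
      if (na != nhap || nn != nhap) { print "need " nhap " codes on each side"; exit 1 }
      if (a[1] != 1 || b[1] != 1) { print "haplotype 1 must be coded 1"; exit 1 }
      # Nesting: equal under the alternate must imply equal under the null.
      for (i = 1; i <= nhap; i++)
        for (j = i + 1; j <= nhap; j++)
          if (a[i] == a[j] && b[i] != b[j]) {
            print "null not nested: haplotypes " i " and " j
            exit 1
          }
      print "OK"
    }'
}

check_constrain 6 1,2,3,4,5,6/1,2,2,2,2,2
```

For example, check_constrain 6 1,1,1,1,1,1/1,2,1,1,1,1 reports that the null is not nested, since haplotypes 1 and 2 share a parameter under the alternate but not under the null.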


 
The conditional tests all involve a null hypothesis different from the simple null of no association. For example, testing whether there is an effect after controlling for haplotype 1 (i.e. the null constrains the 2nd to 6th haplotypes to have the same effect):
    whap --file dataACGT --constrain 1,2,3,4,5,6/1,2,2,2,2,2
Testing whether there is an effect after controlling for haplotype 2:
    whap --file dataACGT --constrain 1,2,3,4,5,6/1,2,1,1,1,1
Testing whether there is an effect after controlling for haplotype 3:
    whap --file dataACGT --constrain 1,2,3,4,5,6/1,1,2,1,1,1
etc. Perform this for all six haplotypes, one at a time.
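The six conditional tests follow a simple pattern, sketched below: the alternate always gives each haplotype its own code, and the null gives the haplotype being controlled for its own code while all other haplotypes share one. Haplotype 1 is the special case because, as the reference, it must keep code 1. The function name cond_hap_constrain is hypothetical, and the script prints the commands rather than running them.

```shell
#!/bin/sh
# cond_hap_constrain NHAP H: --constrain argument for "is there any effect
# after controlling for haplotype H?"
cond_hap_constrain() {
  alt=""; null=""; i=1
  while [ "$i" -le "$1" ]; do
    alt="${alt:+$alt,}$i"       # alternate: every haplotype has its own code
    if [ "$2" -eq 1 ]; then
      # Reference haplotype keeps code 1; all others share code 2.
      if [ "$i" -eq 1 ]; then c=1; else c=2; fi
    else
      # Haplotype H alone gets code 2; all others share code 1.
      if [ "$i" -eq "$2" ]; then c=2; else c=1; fi
    fi
    null="${null:+$null,}$c"
    i=$((i + 1))
  done
  printf '%s/%s\n' "$alt" "$null"
}

h=1
while [ "$h" -le 6 ]; do
  echo "whap --file dataACGT --constrain $(cond_hap_constrain 6 "$h")"
  h=$((h + 1))
done
```

The first three printed commands reproduce the haplotype 1, 2 and 3 examples above.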


 
QUESTION: Compare these results to the single SNP and haplotype specific results. What do they tell you?
The files cvACGT.* contain the same data, except with an extra variant that represents the true causal variant, which sits on a single haplotype (AACTA). The causal variant is a C/T SNP, where the T allele increases risk, located between the 2nd and 3rd SNPs of the file used previously; i.e. the full disease haplotype is AATCTA.


 
Follow-up exercise (time allowing): perform the same tests as above (omnibus, single SNP, haplotype-specific and the three classes of conditional tests) on the data files
   cvACGT.ped
   cvACGT.dat
   cvACGT.map
For example,
   whap --file cvACGT
etc. Interpret the test results in light of the fact that this file contains an extra variant that is the functional causal variant itself.
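As a starting point for the follow-up exercise, this sketch prints the omnibus, single-SNP, haplotype-specific and both SNP-conditional commands for any file stem and SNP count. It assumes cvACGT contains six SNPs (the five original SNPs plus the causal variant, as described above); check this against cvACGT.map. The haplotype-conditional --constrain tests are left out because they depend on the number of haplotypes whap reports. The function name suite_commands is hypothetical.

```shell
#!/bin/sh
# suite_commands FILESTEM NSNPS: print a battery of whap commands using only
# the options shown in this tutorial. Pipe the output to sh to run it.
suite_commands() {
  all=""; i=1
  while [ "$i" -le "$2" ]; do all="${all:+$all,}$i"; i=$((i + 1)); done

  echo "whap --file $1"                               # omnibus test
  echo "whap --file $1 --alt 1 --window --perm 100"   # every single-SNP test
  echo "whap --file $1 --hs"                          # haplotype-specific tests

  x=1
  while [ "$x" -le "$2" ]; do
    rest=""; j=1
    while [ "$j" -le "$2" ]; do
      if [ "$j" -ne "$x" ]; then rest="${rest:+$rest,}$j"; fi
      j=$((j + 1))
    done
    echo "whap --file $1 --alt $all --null $rest"   # SNP x after the others
    echo "whap --file $1 --alt $all --null $x"      # the others after SNP x
    x=$((x + 1))
  done
}

suite_commands cvACGT 6
```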

Created by Shaun Purcell; Last updated by Lori Thomas: March 2006