Epistasis
For disease-trait population-based samples, it is possible to test for
epistasis. The epistasis test can either be case-only or
case-control. All pairwise combinations of SNPs can be tested:
although this may or may not be desirable in statistical terms, it is
computationally feasible for moderate datasets using PLINK, e.g. the
4.5 billion two-locus tests generated from a 100K data set took just
over 24 hours to run, for approximately 500 individuals (with
the --fast-epistasis command). Alternatively, sets can be
specified (e.g. to test only the most significant 100 SNPs against all
other SNPs, or against themselves, etc). The output consists only of
pairwise epistatic results that pass a given significance threshold; also,
for each SNP, a summary of all the pairwise epistatic tests is given
(e.g. maximum test, proportion of tests significant at a certain
threshold, etc). To test for gene-by-environment interaction, see
either the section on stratified
analyses for disease traits, or
the section on QTL GxE for quantitative
traits.
IMPORTANT! These tests for epistasis are currently
only applicable for population-based samples, not family-based.
SNP x SNP epistasis
To test SNP x SNP epistasis for case/control population-based samples, use the command
plink --file mydata --epistasis
which will send output to the files
plink.epi.cc
plink.epi.cc.summary
where cc = case-control; for quantitative traits, cc will be replaced by qt.
The default test uses either linear or logistic regression, depending on whether the phenotype is
a quantitative or binary trait. PLINK makes a model based on allele dosage for each SNP, A
and B, and fits the model
Y ~ b0 + b1.A + b2.B + b3.AB + e
The test for interaction is based on the coefficient b3.
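To make the model concrete, the sketch below builds the predictor row for one individual from two allele dosages and evaluates the linear part of the model. The function names and the coefficient values are hypothetical, chosen only for illustration; this is not PLINK's code. It assumes each dosage is coded as 0, 1 or 2 copies of an allele, and that for a binary trait the result is read on the logit scale.

```python
def design_row(dosage_a, dosage_b):
    """Return [1, A, B, A*B] for one individual (dosages assumed coded 0/1/2)."""
    return [1, dosage_a, dosage_b, dosage_a * dosage_b]


def linear_predictor(coefs, dosage_a, dosage_b):
    """Evaluate b0 + b1.A + b2.B + b3.AB for coefficients (b0, b1, b2, b3)."""
    row = design_row(dosage_a, dosage_b)
    return sum(b * x for b, x in zip(coefs, row))


# Hypothetical coefficients; b3 is the interaction term that is tested.
coefs = (0.0, 0.1, 0.2, 0.5)
row = design_row(2, 1)
value = linear_predictor(coefs, 2, 2)
```

When b3 is zero the two SNPs contribute additively; a non-zero b3 means the effect of one SNP depends on the dosage at the other.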
Hint For disease traits only, an approximate but
faster method can be used to screen for epistasis: use
the --fast-epistasis command instead
of --epistasis. This test is based on a Z-score for
difference in SNP1-SNP2 association (odds ratio) between cases and
controls (or in cases only, in a case-only analysis). If you use this
to screen a large number of SNPs, you should probably report the more
standard logistic regression test value also. In practice, both
approaches usually give similar results, which justifies the use
of --fast-epistasis as a screening tool for a
computationally-demanding problem. Of course, given a specific (and
often extreme) threshold, --epi1, the exact above-threshold
list of SNPs will not always be the same; if you choose to use this
approach, it is probably wise to apply it to select a subset of pairs
of SNPs below a reasonably liberal --epi1 threshold to be
tested with the more standard --epistasis command.
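As a rough illustration of a Z-score for a difference in odds ratios, the sketch below compares the SNP1-SNP2 log odds ratio in cases with that in controls. It assumes each group is summarised by a 2x2 table of counts and that the variance of a log odds ratio is the usual sum of reciprocal cell counts; the function names and the counts are hypothetical, and PLINK's own implementation may differ in detail.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (log OR, variance of log OR) for the 2x2 table [[a, b], [c, d]].

    Assumes all four cells are non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return log_or, variance


def odds_ratio_difference_z(case_table, control_table):
    """Z-score for the difference in log odds ratio between cases and controls."""
    log_or_case, var_case = log_odds_ratio(*case_table)
    log_or_ctrl, var_ctrl = log_odds_ratio(*control_table)
    return (log_or_case - log_or_ctrl) / math.sqrt(var_case + var_ctrl)


# Hypothetical counts: association between the SNPs in cases, none in controls.
z = odds_ratio_difference_z((20, 10, 10, 20), (15, 15, 15, 15))
```

If both groups show the same SNP1-SNP2 association the statistic is zero, which is why it can serve as a quick screen before running the regression test.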
Important The --epistasis command is set up
for testing a potentially very large number of SNP by SNP comparisons,
most of which would not be significant or of interest. Because the output
may contain millions or billions of lines, the default is to only output
tests with p-values less than 1e-4, as specified by the --epi1
option (see below). If your dataset is much smaller and you definitely
want to see all the output, add --epi1 1 . If you do not, odds
are you'll see a blank output file except for the header (i.e. immediately
telling you that none of the tests were significant at 1e-4).
Specifying which SNPs to test
There are different modes for specifying which SNPs are tested:
ALL x ALL
plink --file mydata --epistasis
SET1 x SET1 { where epi.set contains only 1 set }
plink --file mydata --epistasis --set-test --set epi.set
SET1 x ALL { where epi.set contains only 1 set }
plink --file mydata --epistasis --set-test --set epi.set --set-by-all
SET1 x SET2 { where epi.set contains 2 sets }
plink --file mydata --epistasis --set-test --set epi.set
For the 'symmetrical' cases (ALLxALL and SET1xSET1), only unique pairs
are analysed.
For the other two cases (SET1xALL, SET1xSET2), all pairs are
analysed (e.g. PLINK will perform SNPA x SNPB as well as SNPB x SNPA, if A
and B are in both SET1 and SET2). It will not, however, try to analyse
SNPA x SNPA.
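The number of tests implied by each mode can be estimated before starting a run. The helper below is a hypothetical sketch that follows the pairing rules just described: unique pairs for the symmetrical cases, and all ordered pairs minus self-pairs otherwise. The name count_tests and its parameters are illustrative and not part of PLINK.

```python
def count_tests(n_first, n_second=None, n_shared=0):
    """Estimate the number of pairwise tests.

    n_first  -- number of SNPs in ALL (ALLxALL) or in SET1
    n_second -- None for the symmetrical cases; otherwise the number of
                SNPs in ALL (SET1xALL) or in SET2 (SET1xSET2)
    n_shared -- number of SNPs present on both sides (self-pairs are skipped)
    """
    if n_second is None:
        # Symmetrical case: only unique, unordered pairs are analysed.
        return n_first * (n_first - 1) // 2
    # Asymmetrical case: every SET1 SNP against every SNP on the other side,
    # except a SNP paired with itself.
    return n_first * n_second - n_shared


all_by_all = count_tests(100_000)             # ALL x ALL with 100,000 SNPs
set_by_all = count_tests(100, 100_000, 100)   # 100-SNP SET1 x ALL
```

The contrast between the two results shows how much restricting one side of the comparison to a small set reduces the work.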
The output
The output can be controlled via
plink --file mydata --epistasis --epi1 0.0001
which means that only results with p <= 0.0001 are recorded. (This
prevents too much output from being generated.) The output is in the form
CHR1 Chromosome of first SNP
SNP1 Identifier for first SNP
CHR2 Chromosome of second SNP
SNP2 Identifier for second SNP
OR_INT Odds ratio for interaction
Z Z score for test of odds ratio
P Asymptotic p-value
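A results file with these columns can be read into a script by header name. The sketch below assumes the file is whitespace-delimited with a single header line; the sample rows, SNP identifiers and values are invented for illustration and are not real PLINK output.

```python
import io

# Invented sample in the column layout listed above.
SAMPLE = """\
CHR1 SNP1 CHR2 SNP2 OR_INT Z P
1 snpA 2 snpB 1.85 4.21 2.6e-05
1 snpA 3 snpC 0.52 -3.95 7.8e-05
2 snpB 3 snpC 2.10 4.60 4.2e-06
"""


def read_epi(handle):
    """Parse whitespace-delimited rows into dicts keyed by the header names."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, fields))
        row["P"] = float(row["P"])
        rows.append(row)
    return rows


rows = read_epi(io.StringIO(SAMPLE))
# Order the pairs from most to least significant.
ranked = sorted(rows, key=lambda r: r["P"])
```

For a real file, replace the io.StringIO object with an opened file handle, for example open("plink.epi.cc").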
The second part of the output reports, for each SNP in SET1 (or in ALL
if no sets were specified), the number of significant epistatic tests
that SNP featured in (i.e. either with ALL other SNPs, with SET1, or
with SET2). The threshold for significance here is set by --epi2:
plink --file mydata --epistasis --epi1 0.0001 --epi2 0.05
The output in the plink.epi.cc.summary file contains the following fields:
CHR Chromosome
SNP SNP identifier
N_SIG # significant epistatic tests (p <= "--epi2" threshold)
N_TOT # of valid tests (i.e. non-zero allele counts, etc)
PROP Proportion significant of valid tests
BEST_CHISQ Highest statistic for this SNP
BEST_CHR Chromosome of best SNP
BEST_SNP SNP identifier of best SNP
This file should be interpreted as giving only a very rough idea about
the extent of epistasis and which SNPs seem to be interacting
(although, of course, this is a naive statistic as we do not take LD
into account -- i.e. PROP does not represent the number
of independent epistatic results).
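The relationship between N_SIG, N_TOT and PROP can be illustrated with a small tally over pairwise results. The sketch below assumes an ALL x ALL analysis in which each test counts towards both SNPs in the pair; the function name, SNP names and p-values are hypothetical, and this is not PLINK's own code.

```python
from collections import defaultdict


def summarise(pair_results, epi2=0.05):
    """Tally per-SNP counts from (snp_a, snp_b, p_value) tuples.

    Returns {snp: (n_sig, n_tot, prop)}, where n_sig counts tests
    with p <= epi2 and prop = n_sig / n_tot.
    """
    n_sig = defaultdict(int)
    n_tot = defaultdict(int)
    for snp_a, snp_b, p_value in pair_results:
        for snp in (snp_a, snp_b):
            n_tot[snp] += 1
            if p_value <= epi2:
                n_sig[snp] += 1
    return {snp: (n_sig[snp], n_tot[snp], n_sig[snp] / n_tot[snp])
            for snp in n_tot}


# Invented results for three SNPs.
results = [("snpA", "snpB", 0.01), ("snpA", "snpC", 0.20), ("snpB", "snpC", 0.04)]
summary = summarise(results)
```

Here snpB is significant in both of its tests, but if snpA and snpC are in LD with each other those two results are not independent, which is the caveat noted above.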
Case-only epistasis
For case-only epistatic analysis,
plink --file mydata --fast-epistasis --case-only
sends output to (co = case-only)
plink.epi.co
plink.epi.co.summary
All other options are as described above.
Currently, in case-only analysis, only pairs of SNPs that are more than
1 Mb apart, or on different chromosomes, are tested. This behavior can be
changed with the --gap option, with the distance specified in kb: for
example, to specify a gap of 5 Mb,
plink --file mydata --fast-epistasis --case-only --gap 5000
This option is important, as the case-only test for epistasis assumes
that the two SNPs are in linkage equilibrium in the general
population.
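The gap rule can be expressed as a simple predicate on a pair of SNP positions. The sketch below assumes positions are given in base pairs, that 1 kb equals 1000 bp, and that a pair must be strictly more than the gap apart to be tested; the function name is hypothetical and the exact boundary handling in PLINK is not confirmed here.

```python
def passes_gap(chr1, bp1, chr2, bp2, gap_kb=1000):
    """Return True if a SNP pair would be included in a case-only test.

    Pairs on different chromosomes always pass; pairs on the same
    chromosome pass only if more than gap_kb kilobases apart.
    """
    if chr1 != chr2:
        return True
    return abs(bp1 - bp2) > gap_kb * 1000


# Two SNPs 2 Mb apart on chromosome 1.
default_gap = passes_gap(1, 1_000_000, 1, 3_000_000)               # default 1 Mb gap
wider_gap = passes_gap(1, 1_000_000, 1, 3_000_000, gap_kb=5000)    # as with --gap 5000
```

With the default gap the pair is tested, while with the 5 Mb gap it is excluded, which reduces the risk that population-level LD between nearby SNPs is mistaken for epistasis.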
Gene-based tests of epistasis
WARNING This test is still under heavy
development and not ready for use.