PLINK: Whole genome data analysis toolset
Latest PLINK release is v1.03 (10-Jun-2008)

Whole genome association analysis toolset


Miscellaneous

This page details a collection of options and commands that did not get proper mention elsewhere.

Output modifiers

One convenient filter is
     --pfilter 1e-3
which, in this example, will only report statistics with p-values less than 1e-3.

NOTE This is operational for the basic association tests, but do not expect it to work for every method that returns a p-value.

To obtain -log10(p) values instead of p-values in the *.adjusted file (this does not change how p-values are reported in other files), add the flag
     --log10

To fix the value of lambda used for genomic control in the *.adjusted file, instead of estimating it from the data, use the option, for example
     --lambda 1.2
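The genomic control adjustment behind the *.adjusted file can be illustrated with a short Python sketch. This is not PLINK's source: it assumes every test is a 1-df chi-square test, converts each p-value to its chi-square statistic, estimates lambda (when none is given) as the median statistic divided by the median of a 1-df chi-square distribution, a common estimator assumed here, and divides each statistic by lambda before converting back to a p-value. The function names are hypothetical; passing lam plays the role of --lambda.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1
# Median of a 1-df chi-square distribution (about 0.4549): the squared 75th normal percentile.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Return the 1-df chi-square statistic whose upper-tail p-value is p (0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail normal quantile; squaring drops the sign
    return z * z


def chi2_to_p(chi2):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def genomic_control(pvalues, lam=None):
    """Return (lambda, adjusted p-values); lambda is estimated from the data if lam is None."""
    stats = [p_to_chi2(p) for p in pvalues]
    if lam is None:
        # Hypothetical estimator for this sketch: median observed / median expected statistic.
        lam = median(stats) / CHI2_1DF_MEDIAN
    # Deflate every statistic by the same factor, then convert back to p-values.
    adjusted = [chi2_to_p(s / lam) for s in stats]
    return lam, adjusted
```

For example, genomic_control(pvalues, lam=1.2) corresponds in spirit to --lambda 1.2, while genomic_control(pvalues) estimates lambda from the data.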

To add an extra set of columns to the *.adjusted file that facilitates making a Q-Q plot, add the option
     --qq-plot
This will work with either basic p-values, or with --log10 p-values.
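The extra Q-Q columns pair each observed p-value with the value expected under the null hypothesis of uniformly distributed p-values. The sketch below assumes the common plotting position i/(n+1) for the i-th smallest of n p-values; the exact formula PLINK uses for its columns is not stated on this page, and qq_points is a hypothetical name. Setting log10=True returns both axes as -log10 values, mirroring the use of --qq-plot together with --log10.

```python
import math


def qq_points(pvalues, log10=False):
    """Pair each sorted observed p-value with its expected uniform quantile.

    Returns a list of (expected, observed) tuples, smallest p-value first.
    The plotting position i / (n + 1) is an assumption of this sketch.
    """
    observed = sorted(pvalues)
    n = len(observed)
    expected = [(i + 1) / (n + 1) for i in range(n)]
    if log10:
        # Both axes on the -log10 scale, as in a typical GWAS Q-Q plot.
        return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]
    return list(zip(expected, observed))
```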

Analyses with different species

PLINK differentiates between species only in terms of how many chromosomes there are and which of them are sex-linked or haploid. Several non-human species are supported by adding one of the following flags to each analysis:
     --dog
or
     --horse
or
     --cow
or
     --sheep

Matrix of pairwise LD (genotype correlation)

Correlations based on genotype allele counts (i.e. without phasing, and for founders only) can be obtained with the commands
plink --file mydata --r

or

plink --file mydata --r2

These both create a file called
	plink.ld
with a list of R or R-squared values in it.
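The genotypic correlation behind --r and --r2 is a Pearson correlation between the allele counts (0, 1 or 2) of two SNPs across individuals, with no phasing involved. A minimal Python sketch follows; genotypic_r is a hypothetical name, the caller is assumed to pass founders only, and skipping individuals who are missing either genotype is an assumption about how missing data are handled.

```python
import math


def genotypic_r(x, y):
    """Pearson correlation between two SNPs' allele counts (0, 1 or 2 copies of one allele).

    x and y hold one value per founder, in the same order; None marks a missing
    genotype, and individuals missing either SNP are skipped (an assumption).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic SNP has no defined correlation
    return sxy / math.sqrt(sxx * syy)


snp1 = [0, 1, 2, 2, 1, 0, None]  # last individual is missing at SNP 1
snp2 = [0, 1, 2, 1, 1, 0, 2]
r = genotypic_r(snp1, snp2)
print(f"r = {r:.3f}, r-squared = {r * r:.3f}")
```

Squaring r gives the value reported by --r2; --r reports r itself, keeping its sign.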
Filtering the output
By default, several filters are imposed on which pairwise calculations are performed and reported. For example, to only analyse SNPs that are not more than 10 SNPs apart, use the option (default is 10 SNPs)
     --ld-window 10
To specify a window in kb as well (default 1 Mb), use
     --ld-window-kb 1000
To report only values above a particular r-squared threshold (this applies only when --r2, not --r, is used; default is 0.2), use
     --ld-window-r2 0.2
The default for --ld-window-r2 is set at 0.2 to reduce the size of output files when many comparisons are made: to get all pairs reported, set --ld-window-r2 to 0.
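The three window filters can be sketched as a single pass over position-sorted SNPs. In this hypothetical windowed_ld_pairs function, window, window_kb and min_r2 stand in for --ld-window, --ld-window-kb and --ld-window-r2. Reading "not more than N SNPs apart" as an index difference of at most N, and forming pairs only within a chromosome, are assumptions of the sketch; r2_fn is a placeholder for whatever computes r-squared.

```python
def windowed_ld_pairs(snps, r2_fn, window=10, window_kb=1000, min_r2=0.2):
    """Return (snp_a, snp_b, r2) for every pair passing all three window filters.

    snps: list of (snp_id, chromosome, bp_position), sorted by chromosome and position.
    r2_fn: hypothetical callback returning r-squared for two SNP IDs.
    """
    results = []
    for i, (id_a, chr_a, bp_a) in enumerate(snps):
        # Look ahead at most `window` SNPs (index difference <= window, an assumption).
        for id_b, chr_b, bp_b in snps[i + 1 : i + 1 + window]:
            if chr_b != chr_a or bp_b - bp_a > window_kb * 1000:
                break  # input is sorted, so every later SNP is also out of range
            r2 = r2_fn(id_a, id_b)
            if r2 >= min_r2:  # a threshold of 0 keeps every pair in the window
                results.append((id_a, id_b, r2))
    return results


snps = [("rs1", "1", 1000), ("rs2", "1", 2000), ("rs3", "1", 3_000_000), ("rs4", "2", 1000)]
pairs = windowed_ld_pairs(snps, lambda a, b: 0.5)
print(pairs)  # [('rs1', 'rs2', 0.5)]
```

Only rs1 and rs2 are reported: rs3 lies more than 1000 kb away from both, and rs4 is on a different chromosome.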
Obtaining LD values for a specific SNP versus all others
To obtain all LD values for a set of SNPs versus one specific SNP, use the --ld-snp command in conjunction with --r2. For example, to get a list of all values for every SNP within 1Mb of rs12345, use the command
    plink --file mydata \
          --r2 \
          --ld-snp rs12345 \
          --ld-window-kb 1000 \
          --ld-window 99999 \
          --ld-window-r2 0

Setting --ld-window to 99999 and --ld-window-r2 to 0 effectively means that output will be shown for all other SNPs within 1 Mb of rs12345.
Obtaining a matrix of LD values
Alternatively, it is possible to add the --matrix option, which creates a matrix of LD values rather than a list: in this case, all SNP pairs are calculated and reported.
Haplotype-based LD calculations
A different command, --ld, takes two SNP IDs as parameters and calculates R-squared from the four estimated haplotype frequencies. Unlike the basic --r2 command, which is based simply on the genotypic correlation, this involves phasing, but only for that one pair of SNPs. For example:
plink --file mydata --ld rs12345 rs67890

No output files are generated apart from the LOG file, which reports the estimated R-squared value:
     LD information for SNP pair [ rs12345  rs67890 ]

     r-sq = 0.944388
Again, these calculations are based only on founders.
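The haplotype-based value reported by --ld can be sketched in two steps: estimate the four haplotype frequencies from unphased genotypes, then compute r-squared as D^2 / (pA(1 - pA) pB(1 - pB)), where D = p(AB)p(ab) - p(Ab)p(aB). The sketch uses the standard two-SNP expectation-maximisation (EM) algorithm, in which only double heterozygotes have ambiguous phase. PLINK's own phasing routine is not described on this page, so treat this as an illustration rather than its implementation; all function names are hypothetical, and only founders should be passed in.

```python
def known_haplotypes(g1, g2):
    """Haplotype counts [AB, Ab, aB, ab] for one person, or None if phase is ambiguous.

    g1 counts allele A at the first SNP and g2 counts allele B at the second (0, 1 or 2).
    """
    if g1 == 1 and g2 == 1:
        return None  # double heterozygote: AB/ab or Ab/aB
    if g1 == 1:
        b = g2 // 2  # second SNP is homozygous: 1 if BB, 0 if bb
        return [b, 1 - b, b, 1 - b]
    if g1 == 2:
        return [g2, 2 - g2, 0, 0]  # both haplotypes carry A
    return [0, 0, g2, 2 - g2]  # g1 == 0: both haplotypes carry a


def em_haplotype_freqs(genotypes, max_iter=1000, tol=1e-12):
    """Estimate [p(AB), p(Ab), p(aB), p(ab)] by EM over the double heterozygotes."""
    fixed = [0.0, 0.0, 0.0, 0.0]  # haplotype counts from unambiguous individuals
    n_double_het = 0
    for g1, g2 in genotypes:
        counts = known_haplotypes(g1, g2)
        if counts is None:
            n_double_het += 1
        else:
            fixed = [f + c for f, c in zip(fixed, counts)]
    total = 2.0 * len(genotypes)  # two haplotypes per individual
    freqs = [0.25, 0.25, 0.25, 0.25]
    for _ in range(max_iter):
        cis, trans = freqs[0] * freqs[3], freqs[1] * freqs[2]
        # E-step: probability that a double heterozygote is AB/ab rather than Ab/aB.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-count haplotypes, splitting double heterozygotes by w.
        new = [
            (fixed[0] + n_double_het * w) / total,
            (fixed[1] + n_double_het * (1 - w)) / total,
            (fixed[2] + n_double_het * (1 - w)) / total,
            (fixed[3] + n_double_het * w) / total,
        ]
        done = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if done:
            break
    return freqs


def r2_from_haplotypes(p11, p12, p21, p22):
    """r-squared from haplotype frequencies AB (p11), Ab (p12), aB (p21) and ab (p22)."""
    pA = p11 + p12  # frequency of allele A at SNP 1
    pB = p11 + p21  # frequency of allele B at SNP 2
    d = p11 * p22 - p12 * p21  # LD coefficient D
    denom = pA * (1 - pA) * pB * (1 - pB)
    return d * d / denom if denom > 0 else float("nan")


# Every chromosome carries either AB or ab, so the two SNPs are in perfect LD.
genotypes = [(2, 2), (1, 1), (0, 0), (2, 2), (0, 0)]
freqs = em_haplotype_freqs(genotypes)
print("r-sq =", round(r2_from_haplotypes(*freqs), 6))
```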

Known issues

Development of PLINK is ongoing; as such, there are always likely to be features that are only partially implemented or have known problems not yet fixed. A list of known issues can be found on the warnings page:
     http://pngu.mgh.harvard.edu/purcell/plink/warnings.shtml
 
This document last modified Wednesday, 11-Jun-2008 18:25:10 EDT