Resources available for download
This page contains links to several freely-available resources, mostly
generated by other individuals. All these resources are provided "as
is", without any guarantees regarding their correctness or utility.
The Phase 2 HapMap as a PLINK fileset
The HapMap genotype data (release 22) are available here as PLINK
binary filesets. The SNPs are currently coded according to NCBI build
36 coordinates on the forward strand. Several versions are available
here, including the entire dataset (a single, very large fileset: you
will need a computer with at least 2Gb of RAM to load this file).
The filtered SNP set refers to a list of SNPs that have MAF
greater than 0.01 and genotyping rate greater than 0.95 in the 60 CEU
founders. This fileset is probably a good starting place for
imputation in samples of European descent. Filtered versions of the
other HapMap panels will be made available shortly.
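As a rough illustration of what these two thresholds mean, the sketch
below computes the minor allele frequency and genotyping rate for a
single biallelic SNP and applies the cut-offs quoted above. The function
names and the genotype encoding (two-character strings, with "00" for a
missing call) are assumptions made for this example; in practice PLINK's
own inclusion-threshold options would be the natural way to apply such
a filter.

```python
from collections import Counter

MAF_MIN = 0.01        # MAF threshold quoted in the text
CALL_RATE_MIN = 0.95  # genotyping-rate threshold quoted in the text


def snp_stats(genotypes):
    """Return (maf, call_rate) for one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG";
    "00" marks a missing call (an assumed encoding for this sketch).
    """
    called = [g for g in genotypes if g != "00"]
    call_rate = len(called) / len(genotypes) if genotypes else 0.0
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    if total == 0:
        return 0.0, call_rate
    # A monomorphic SNP carries a single allele, so its MAF is 0.
    maf = min(counts.values()) / total if len(counts) > 1 else 0.0
    return maf, call_rate


def passes_filter(genotypes):
    """True if the SNP exceeds both thresholds."""
    maf, call_rate = snp_stats(genotypes)
    return maf > MAF_MIN and call_rate > CALL_RATE_MIN
```

For example, a SNP typed in all 60 founders with 10 heterozygotes has a
MAF of 10/120 and passes, whereas a monomorphic SNP or one with 4 of 60
calls missing does not.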
Thanks to Paul de Bakker for generating these files.
| Description | File size | File name |
| Entire HapMap (270 individuals, 3.96 million SNPs) | 110M | hapmap_r22.zip |
| CEU founders (60 individuals, 3.96 million SNPs) | 49M | hapmap-ceu-all.zip |
| CEU founders (60 individuals, filtered 2.2 million SNPs) | 29M | hapmap-ceu.zip |
| CEU founders (as above, files split by chromosome, 1-22 and X) | 29M | hapmap-ceu-by-chr.zip |
Teaching materials and example dataset
A tutorial can be downloaded from here; the material is similar to the
online tutorial but slightly more involved. As it currently stands, it
is designed to first use gPLINK to perform a set of basic
tests and QC procedures and then move to standard PLINK for
more in-depth analysis.
It is designed to work on a standard modern laptop computer or
equivalent desktop. It was written for version 1.02 of PLINK, but
should remain compatible with future releases.
| Description | File size | File name |
| ZIP archive containing data | 15M | example.zip |
| ZIP archive containing teaching materials | 1.3M | teaching.zip |
You are free to use, modify or distribute these files in any way
you wish, although giving me appropriate credit for the materials
would be appreciated.
The example.zip archive contains
wgas1.ped Whole-genome SNP data example PED file
wgas1.map Corresponding MAP file
extra.ped Follow-up genotyping for a particular region
extra.map Corresponding MAP file
pop.cov Population membership variable
command-list.txt List of all commands for 2nd part of practical
The teaching.zip archive contains a PowerPoint and a Word file:
practical-1-slides.ppt
practical-2-notes.doc
These two files cover the first and second half of the tutorial
respectively. The second document assumes the first half has already
been completed (but also contains some introductory remarks concerning
the data). I will probably update the Word document to also include
the early commands covered in the PowerPoint/gPLINK part (i.e. so that
the entire practical can be performed from the command line rather
than using gPLINK). The list of commands (command-list.txt) is
included so that people can cut and paste commands rather than type them. If
using DOS, it is a good idea to first increase the window width (right-click on
the header of the DOS window, choose Properties, then Layout, and increase the
buffer and window width to around 120 characters).
Everything should be fairly self-explanatory after looking through the PowerPoint file
and Word document.
Multimarker test lists
These files, generated by Itsik Pe'er and others, facilitate the
'multi-marker predictor' approach to association testing, as described
in the manuscript:
Pe'er I, de Bakker PI, Maller J, Yelensky R, Altshuler D
& Daly MJ (2006) Evaluating and improving power in whole-genome
association studies using fixed marker sets. Nat Genet, 38(6): 605-6.
They are PLINK-formatted lists of multimarker tests selected for
Affymetrix 500K and Illumina whole genome products, based on
consideration of the CEU Phase 2 HapMap (at r-squared=0.8
threshold). One should download the appropriate file and run with
the --hap option (after ensuring that any strand issues have
been resolved).
Note These haplotypes are specified in terms
of the +ve (positive) strand relative to the HapMap. You might need to
reformat your data (using the --flip command, for instance) before you
can use these files.
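To make the strand issue concrete, here is a minimal sketch of the
complement operation that a strand flip amounts to, together with a
check for whether a SNP's alleles match a reference only after
complementing. The helper names are hypothetical and are not part of
PLINK; note that A/T and C/G SNPs read the same on both strands, so
allele comparison alone cannot resolve them.

```python
# Complement of each base; "0" (missing) is left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "0": "0"}


def flip_genotype(genotype):
    """Complement both alleles of a genotype, e.g. "AG" -> "TC"."""
    return "".join(COMPLEMENT[a] for a in genotype)


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, which look identical on either strand."""
    return COMPLEMENT[allele1] == allele2


def needs_flip(data_alleles, reference_alleles):
    """True if the data alleles match the reference only after complementing.

    Strand-ambiguous SNPs already match as given, so this returns False
    for them even though their strand cannot actually be determined.
    """
    data, ref = set(data_alleles), set(reference_alleles)
    if data == ref:
        return False
    return {COMPLEMENT[a] for a in data} == ref
```

A SNP recorded as A/G whose reference alleles are T/C would be flagged
for flipping, while an A/T SNP would need to be resolved by other means
(allele frequencies, for instance).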
Note These tables list all tags for every common HapMap
SNP, at the given r-squared threshold. The same haplotype may therefore
appear multiple times (i.e. if it tags more than 1 SNP).
Note These tables assume that all tags are present in
the final, post-quality-control dataset: if certain SNPs have been removed,
it will be better to reselect the predictors. That is, these lists should really
only be used as a first pass, for convenience.
In general, however, quite possibly an easier and better strategy is
instead to analyse the data within
an imputation context, e.g. using
the proxy association procedures rather than these fixed lists.
Gene sets
Here is a PLINK-format SET
file, containing a genome-wide set of genes (N=18272). The
co-ordinates are based on NCBI B36 assembly, dbSNP 126; a gene is
arbitrarily defined as including 50kb upstream and downstream,
although this is chosen without respect to linkage disequilibrium:
clearly there is room for a more intelligent, LD-informed mapping of
SNPs to genes here, but this file is provided in the interim as a
rough-and-ready starting point.
Download (ZIP archive):
gene-list.zip
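The gene definition described above (transcript boundaries plus 50kb on
either side) can be sketched as follows. This is only an illustration
of the windowing rule, not the procedure used to build gene-list.zip:
the gene names, coordinates and SNP identifiers in the usage note are
made up, and the output layout (set name, one SNP per line, then END)
is assumed to match the PLINK SET format.

```python
FLANK_BP = 50_000  # 50kb upstream and downstream, as described in the text


def snps_in_gene(gene, snps, flank=FLANK_BP):
    """Return IDs of SNPs falling within the gene plus its flanking window.

    gene: (name, chrom, start_bp, end_bp)
    snps: iterable of (snp_id, chrom, bp)
    """
    _, chrom, start, end = gene
    lo, hi = max(0, start - flank), end + flank
    return [sid for sid, c, bp in snps if c == chrom and lo <= bp <= hi]


def format_set_file(genes, snps):
    """Build SET-style text: set name, member SNPs, END; blank line between sets."""
    blocks = []
    for gene in genes:
        members = snps_in_gene(gene, snps)
        if members:  # skip genes with no SNPs in the window
            blocks.append("\n".join([gene[0], *members, "END"]))
    return "\n\n".join(blocks) + "\n"
```

For instance, with a hypothetical gene GENE1 spanning 100000-200000 on
chromosome 1, SNPs at positions 60000 and 250000 fall inside the window
and a SNP at 250001 does not. An LD-informed mapping would replace the
fixed flank with a boundary chosen per gene.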
This document last modified Wednesday, 11-Jun-2008 18:25:10 EDT