PLINK: Whole genome data analysis toolset plink...
Latest PLINK release is v1.03 (10-Jun-2008)

Whole genome association analysis toolset


SNP scoring routine

PLINK provides a simple means of generating scores, or profiles, for individuals based on an allelic scoring system involving one or more SNPs. One potential use would be to assign each individual a single quantitative index of genetic load, perhaps to build simple multi-SNP prediction models.

Note: This is an advanced function, intended for exploratory analyses, that is still in a beta development phase. If the point of this routine isn't clear to you, you should probably just ignore this entire feature.

Basic usage

The basic command to generate a score is the --score option, e.g.
./plink --bfile mydata --score myprofile.raw

which takes as a parameter the name of a file (here myprofile.raw) that describes the scoring system. This file consists of one or more lines, each with exactly three fields:
     SNP ID
     Reference allele
     Score (numeric)
for example
     SNPA   A    1.95
     SNPB   C    2.04
     SNPC   T   -0.98
     SNPD   A   -0.24
These scores can be based on whatever you want. One choice might be the log of the odds ratio for significantly associated SNPs, for example. Then, running the command above would generate a file
     plink.profile
with one individual per row and the fields:
     FID     Family ID
     IID     Individual ID
     PHENO   Phenotype for that individual
     CNT     Number of non-missing SNPs used for scoring
     SCORE   Total score for that individual
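
As an illustration of the log-odds-ratio choice mentioned above, the following minimal Python sketch builds the three-field lines of a scoring file. The odds ratios, the helper name score_lines and the use of single spaces as separators are assumptions made for this example only; they do not come from PLINK or from any real study.

```python
import math

# Hypothetical odds ratios per reference allele (illustrative values only).
odds_ratios = [
    ("SNPA", "A", 1.50),
    ("SNPB", "C", 0.80),
]

def score_lines(entries):
    """Format (SNP ID, reference allele, odds ratio) as 'SNP allele log(OR)' lines."""
    return [
        f"{snp} {allele} {math.log(odds_ratio):.4f}"
        for snp, allele, odds_ratio in entries
    ]

lines = score_lines(odds_ratios)
print("\n".join(lines))
```

The printed lines could be saved to a file such as myprofile.raw and passed to --score. An odds ratio below 1 gives a negative weight, as for SNPC and SNPD in the example file above.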
The score is simply the sum across SNPs of the number of reference alleles (0, 1 or 2) at that SNP multiplied by the score for that SNP; SNPs with a missing genotype do not contribute. For example,
     SNP               SNPA        SNPB        SNPC     SNPD
     Genotype          A/A         G/G         A/T      0/0
     # ref alleles      2           0           1       n/a
     Score            2*1.95  +   0*2.04  +  1*-0.98         -> 2.92
The score 2.92/3 (the average score per non-missing SNP) could then be used, e.g. as a covariate, or a predictor of disease if it is scored in a sample that is independent from the one used to generate the original scoring weights. Obviously, a score profile based on some effect size measure from a large number of SNPs will necessarily be highly correlated with the phenotype in the original sample: i.e. this in no (straightforward) way provides additional statistical evidence for associations in that sample.
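
The arithmetic of the worked example can be reproduced with a short Python sketch. This is not PLINK's own code: the function profile_score and the dictionary layout are hypothetical, and the sketch assumes that a genotype containing the allele code 0 is missing and is skipped.

```python
# Scoring system from the example file: SNP -> (reference allele, score).
weights = {
    "SNPA": ("A", 1.95),
    "SNPB": ("C", 2.04),
    "SNPC": ("T", -0.98),
    "SNPD": ("A", -0.24),
}

# One individual's genotypes; "0" marks a missing allele (assumption).
genotypes = {
    "SNPA": ("A", "A"),
    "SNPB": ("G", "G"),
    "SNPC": ("A", "T"),
    "SNPD": ("0", "0"),
}

def profile_score(genotypes, weights):
    """Return (number of non-missing SNPs used, total score)."""
    cnt = 0
    total = 0.0
    for snp, (ref_allele, weight) in weights.items():
        alleles = genotypes.get(snp)
        if alleles is None or "0" in alleles:
            continue  # missing genotype: contributes nothing
        cnt += 1
        # Number of reference alleles (0, 1 or 2) times the SNP's score.
        total += alleles.count(ref_allele) * weight
    return cnt, total

cnt, total = profile_score(genotypes, weights)
print(cnt, round(total, 2), round(total / cnt, 4))
```

Here cnt corresponds to the CNT field, total to the sum 2.92 from the table, and total / cnt to the average score per non-missing SNP.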
 
This document last modified Wednesday, 11-Jun-2008 18:14:43 EDT