PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


gPLINK


gPLINK is a freely-available, Java-based software package that:
  • is a GUI that allows construction of many common PLINK operations
  • provides a simple project management tool and analysis log
  • allows for data and computation to be on a separate server (via SSH)
  • facilitates integration with Haploview
This site provides:
  • Documentation
  • Download and installation
  • Source Code

Documentation

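In this section, we cover:
  • Overview of gPLINK
  • Local versus remote modes
  • Starting a new project
  • Configuring your project
  • Starting PLINK jobs
  • Viewing output files
  • Integration with Haploview
  • Miscellanea and known issues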

Overview of gPLINK

gPLINK is a Java program that provides a simple form interface to the more commonly-used PLINK commands (i.e. instead of using the command line options). gPLINK provides menus and dialogs to create valid PLINK commands, executes them, keeps a record of all commands run in a project, keeps track of input and output files, allows annotation of result files and facilitates integration with Haploview.

Important: Only the more common PLINK commands are included in gPLINK forms. However, it is possible to enter whole PLINK command lines via PLINK -> Create Plink Command, which is useful if the PLINK option you need has not been incorporated into gPLINK.

Alternatively, gPLINK can be used to collate previously generated PLINK analyses, organizing the results and allowing for easier browsing with Haploview. Used in this "browse-only" mode, gPLINK can provide a means of distributing analysis results to a wider set of collaborators, for example.

Please refer to the main PLINK documentation pages for a more detailed description of the different analytic options and the output file formats.

gPLINK can be used to initiate analyses from the five major domains of PLINK commands; the menu options are shown in the figure below:


gPLINK is currently considered stable; however, from time to time there may be updates to add new PLINK commands.

In gPLINK, a project corresponds to a folder, either on the local machine or on a remote machine. All output will be written to that folder. Each operation must be assigned a unique fileroot name, which is used to track operations. gPLINK keeps track of the commands run, and which files were used for input and which were created as output, storing this information in a metafile in the project folder.

gPLINK has three main panes: Folder viewer, Operation viewer and Log viewer. The Folder viewer shows a list of the files in your current project. By left-clicking in the Folder viewer, you can open files, launch Haploview, track the hierarchy of a file and edit file notations. The Operation viewer displays the operations recorded by gPLINK and their associated input and output files, as well as any operation notes. Left-clicking in the Operation viewer will pop up options to: open files, edit operation notes, unlink/link files to/from an operation, create a new PLINK operation, and delete an operation. The Log viewer shows the log file associated with the operation selected in the Operation viewer.

Local versus remote modes

You can run gPLINK in one of two modes: either local or remote.

In local mode, everything resides on the same machine: PLINK, gPLINK, Haploview, the data and all computation.

In remote mode, PLINK, the data files and all PLINK computation reside on a separate Unix/Linux machine, connected to the local machine via SSH. The user runs the two Java-based tools gPLINK and Haploview on the local machine, issuing commands to the remote machine that actually does all the work; select results files can be downloaded to the local machine for subsequent viewing, either using Haploview or any other local software.

In remote mode, gPLINK will use two project folders: one remote folder where all the original data and results are stored, and one local temporary version. Any file in the temporary folder can be deleted after the session finishes.

If the remote server is also the head node of a cluster, and if jobs are sent to the cluster with a simple prefix on a command line, then gPLINK can also send PLINK jobs off to the cluster to be performed.
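The cluster case amounts to placing a fixed prefix in front of the PLINK command line. A minimal sketch in Python, assuming a hypothetical helper `with_cluster_prefix`; the prefix `bsub -q short` is purely illustrative, and neither the helper nor the prefix is taken from gPLINK itself:

```python
import shlex

def with_cluster_prefix(plink_command, cluster_prefix=""):
    """Return the PLINK argument list, optionally preceded by a cluster prefix.

    Hypothetical helper for illustration; not part of gPLINK.
    """
    # shlex.split turns a prefix such as "bsub -q short" into separate arguments;
    # an empty prefix yields an empty list, leaving the command unchanged.
    return shlex.split(cluster_prefix) + list(plink_command)

plink_command = ["plink", "--bfile", "mydata", "--assoc", "--out", "run1"]
print(with_cluster_prefix(plink_command, "bsub -q short"))
```

Keeping the command as a list of arguments, rather than one string, avoids quoting problems when file names contain spaces.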

To use remote mode, you need:
  • A Linux/Unix server with PLINK installed
  • Access to this server via SSH (secure shell)

Starting a new project

The first step is always to create a local project folder by creating a new folder/directory (using your operating system's standard tools, as you would create any other folder/directory). If running in local mode, you will typically want to populate this folder with your datasets. If running in remote mode, this local folder will typically be empty, and it will just be used as a place to store temporary files.

On opening gPLINK, you first select which local project folder you'll be using, with the File -> Set project folder option.

Next, you indicate whether the project will be local or remote. See the tour for more details on these steps.

Configuring your project

After you have opened your project folder, a configuration dialog will pop up. Here you should set the PLINK path to point to your current copy of PLINK (plink.exe in Windows, otherwise plink). If in remote mode, you will be pointing to the copy of PLINK you wish to use on the remote server.

If you intend to use Haploview, you need to set the Haploview .jar path to where the .jar file is located. You need Haploview version 4.0 or later to integrate with gPLINK.

The Editor options allow you to pick what command you wish to call when you view input or output files. This is an advanced feature and the defaults should work fine in most cases.

Again, see the tour for more details on these steps.

Starting PLINK jobs

By this stage, you have started the project and configured gPLINK. You only need to configure gPLINK once at the start of each project.

To initiate a PLINK job, select the appropriate menu option from the PLINK menu. A dialog will be shown, in which you must:
  • Specify the binary or standard fileset to be used for input
  • Specify a unique name for the output files
If there are files with the same root and the appropriate extensions, then you can select this fileset from the top combo box; alternatively, the files can be specified separately. If you select an alternate phenotype, you must return the panel to either the binary or standard fileset panel before completing the analysis.
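To illustrate what "files with the same root and the appropriate extensions" means, here is a minimal Python sketch that groups file names by root and reports which roots form a complete fileset. It assumes the usual PLINK extensions (.bed/.bim/.fam for a binary fileset, .ped/.map for a standard one); the function `find_filesets` is hypothetical and is not how gPLINK is necessarily implemented:

```python
from collections import defaultdict
from pathlib import PurePath

# Extensions that make up each kind of PLINK fileset
BINARY_EXTS = {".bed", ".bim", ".fam"}
STANDARD_EXTS = {".ped", ".map"}

def find_filesets(filenames):
    """Map each file root to the fileset types ("binary", "standard") it completes."""
    exts_by_root = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        exts_by_root[path.stem].add(path.suffix.lower())

    filesets = {}
    for root, exts in exts_by_root.items():
        kinds = []
        if BINARY_EXTS <= exts:      # all three binary files are present
            kinds.append("binary")
        if STANDARD_EXTS <= exts:    # both standard files are present
            kinds.append("standard")
        if kinds:
            filesets[root] = kinds
    return filesets

files = ["study.bed", "study.bim", "study.fam", "pilot.ped", "pilot.map", "notes.txt"]
print(find_filesets(files))
```

Roots with only some of the required files (for example a .bed without its .bim and .fam) are not reported, which corresponds to the case where the files must be specified separately.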

Additionally, you can often:
  • Select an alternate phenotype file
  • Set filters and thresholds
  • Change other parameters relevant to the requested analysis
After clicking OK, you will be shown the corresponding PLINK command line that gPLINK has generated given your choices. You are also given the option to add a description to this operation: this can be any text that will help you track what you are doing (e.g. for when you return to the project a month later and can't remember what all those cryptic filenames mean).
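As a rough illustration of how form choices translate into a command line, the following Python sketch assembles a PLINK argument list. The helper `build_plink_command` and its parameters are hypothetical; the options used (`--bfile`, `--file`, `--pheno`, `--maf`, `--assoc`, `--out`) are standard PLINK options, but the exact command gPLINK generates for a given dialog may differ:

```python
def build_plink_command(plink_path, input_root, output_root, analysis="--assoc",
                        binary=True, pheno_file=None, maf=None):
    """Assemble a PLINK argument list from form-style choices (hypothetical helper)."""
    # --bfile reads a binary fileset, --file a standard (.ped/.map) fileset
    command = [plink_path, "--bfile" if binary else "--file", input_root]
    if pheno_file is not None:
        command += ["--pheno", pheno_file]   # alternate phenotype file
    if maf is not None:
        command += ["--maf", str(maf)]       # minor allele frequency threshold
    # --out sets the unique root name used for all output files
    command += [analysis, "--out", output_root]
    return command

print(" ".join(build_plink_command("plink", "study", "assoc1", maf=0.01)))
```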

After adding the description, you will be returned to the main window; the newly generated operation will be added to the list of operations. This does not necessarily mean that the PLINK command has finished. Depending on the size of the data and the analysis chosen, a PLINK analysis can take from seconds to days or more. The command will run in the background. You can close gPLINK and the PLINK command (or commands) will continue to run in the background. When you next open gPLINK, if the analysis has finished, gPLINK should automatically detect this and connect the output to the entry in the list of operations. Clicking on the operation will display the log file associated with that PLINK run; if the PLINK job has finished, then the log file will be complete, ending either in a fatal error message or a line saying when the analysis was finished.
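If you want to check from a script whether a job has completed, one option is to inspect the log file yourself. The sketch below is an assumption-laden illustration: it assumes that a successful run ends with a line starting "Analysis finished:" and that a fatal error is reported on a line starting "ERROR:". Verify these markers against the logs written by your PLINK version; `log_status` is a hypothetical helper, not a gPLINK function:

```python
def log_status(log_text):
    """Classify a PLINK log as "finished", "error" or "running".

    Assumes "Analysis finished:" marks completion and "ERROR:" marks a fatal
    error; both markers are assumptions to check against your own logs.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if any(line.startswith("ERROR:") for line in lines):
        return "error"
    if lines and lines[-1].startswith("Analysis finished:"):
        return "finished"
    # Neither marker found: the job is presumably still running
    return "running"

print(log_status("Reading pedigree information...\n"))
```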

Viewing output files

You can view the result files by expanding the tree for the desired operation, going to the list of output files and left-clicking on the selected entry. Alternatively you can select the file from the Folder viewer in a similar manner. You will be given a choice to view the file in the default viewer or an alternate viewer. The default viewer depends on the machine type: it will be WordPad, TextEdit or emacs for Windows, Mac and Linux/other systems respectively. The alternate viewer can be set to anything (e.g. Excel) via the Configuration menu option.

Integration with Haploview

If in the Configuration panel you pointed gPLINK to an instance of Haploview (version 4), then an Open in Haploview option will also appear. For results files, this will bring up the results-viewer panel of Haploview. You can filter and sort results here as well as merging multiple result files together. You can also generate plots of results very easily (which are interactive in the sense that if you hover the mouse over a point, it will tell you which SNP, or individual, the point represents; clicking on the point will take you to the relevant entry in the results table).

It is also possible to extract subsets of your whole genome SNP data files for viewing in Haploview (i.e. viewing data rather than results of PLINK runs). Use the Data management -> Generate fileset -> Haploview fileset option for this. Then right-clicking on either the .info or .ped file and selecting to view in Haploview will load the data into Haploview. Note: use the filters to select manageable subsets of the data for viewing in Haploview (i.e. restrict the number of SNPs you wish to include).

Miscellanea and known issues

How do I kill a PLINK process initiated by gPLINK?

When you run a PLINK command from gPLINK, a separate PLINK process is started, independent of gPLINK. This means that you can close gPLINK and your operation will run to completion. It also means that if you decide to kill your PLINK operation, you will need to do so through your operating system (for example, with kill in Linux or through Task Manager in Windows).

Manual Rescan folder option

The menu option Rescan folder checks the project folder for files created by PLINK commands. Typically this rescanning is performed automatically, for both local and remote projects, every couple of seconds or so; the interval can be changed in the Configuration dialog. This option is therefore for the impatient.
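Conceptually, a rescan compares the files currently in the project folder with those already recorded. A minimal Python sketch of that idea for a local folder, using a hypothetical `rescan` function (gPLINK's own bookkeeping, and its handling of remote folders, are not shown here):

```python
import os

def rescan(folder, known_files):
    """Return (new_files, all_files) for folder, given the set already recorded."""
    with os.scandir(folder) as entries:
        current = {entry.name for entry in entries if entry.is_file()}
    # Files present now that were not seen on the previous scan
    return sorted(current - known_files), current

# Example: scan the current directory twice; the second scan reports nothing new
new, seen = rescan(".", set())
new_again, seen = rescan(".", seen)
print(len(new), "files found on first scan;", len(new_again), "new on second scan")
```

Calling such a function on a timer gives the automatic behaviour described above, while calling it on demand corresponds to the manual menu option.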

Quirks, known issues

In no particular order:
  • All input files must have an extension (contain a period .) if they are to appear in the operation view
  • Do not attempt to use machines with different architectures to access a project on a shared network drive that is mapped to each of them. In this scenario, PLINK would be running in "local" mode.
  • Java 1.5 is required to run gPLINK.

Download and installation

You first need up-to-date versions of Java, PLINK and Haploview on your computer.

Please follow all 4 of these steps:

  1. You will need Java 1.5 installed on your machine: this is freely available from the Sun website, for all common platforms. To download Java, follow this link and select Download Now.

  2. You need PLINK version 0.99p or greater to work with gPLINK, which can be downloaded from here.

  3. A beta version of Haploview that supports gPLINK can be obtained from this page

  4. After completing the above three steps, download the latest gPLINK (version 2.050) by clicking here for the JAR file (as a zipped archive).

If you download this file, please refer to the GPL v2 license. Source code is available upon request, and will soon be posted.

If you have downloaded the ZIP file, you must first extract the contents (a single JAR file).

In Windows, double-clicking on the gPLINK.jar file should probably work.

If not (and on all other platforms), typing at the command line prompt
     java -jar gPLINK.jar
will start gPLINK (you must be in the directory where gPLINK.jar is, or specify its location explicitly; likewise, java must be in your path; please ask your system administrator if you have problems with this).
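If the command above fails, a quick way to check whether java is on your path is a short script such as the following Python sketch. The helper names `find_java` and `gplink_launch_command` are hypothetical, and the script only prints the command rather than starting gPLINK:

```python
import shutil

def find_java():
    """Return the full path of the java executable on PATH, or None if absent."""
    return shutil.which("java")

def gplink_launch_command(jar_path="gPLINK.jar", java_path="java"):
    """Build the argument list equivalent to `java -jar gPLINK.jar`."""
    return [java_path, "-jar", jar_path]

java = find_java()
if java is None:
    print("java was not found on your PATH")
else:
    print("Launch with:", " ".join(gplink_launch_command(java_path=java)))
```

Pass the full path to gPLINK.jar as `jar_path` if you are not running from the directory that contains it.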

Source Code

For interested developers the source code can be found here. Note that most users do not need the source code!
 
This document last modified Thursday, 01-Oct-2009 08:00:57 EDT