Famtypes

A simple utility for parsing PED files

Shaun Purcell, Massachusetts General Hospital

Introduction

What does Famtypes do?

Checks for data entry errors:

Filters and reformats data:


Download

Famtypes is in beta, which means we're still working on it, adding features and fixing problems as we find them. Check the warnings section for known problems, and use at your own risk...

People within the MGH network can also run Famtypes on Pnguapp, by following this internal link. Others should download a copy onto their own machine using the links below.

Famtypes runs in DOS, Linux and OS X. The source code is also available for people who want to compile it themselves. Just right-click the appropriate link and choose 'Save-As...':

Files:

Version 1.4: Fixed availability column requirement, fixed bugs in command-line version.
Version 1.3: Added available individual and family counts. Temporarily forced use of the availability column, added version and parameter notes to log file.
Version 1.2: Fixed filter 4, added check for duplicate indIDs within a family.
Version 1.1: Fixed avail, added ability to run from command line, compiles on machines tested.
Version 1.0: Known bug: required availability column, only compiled on some machines.

Installation:

Please e-mail me if you download a copy, especially if you encounter any problems using Famtypes or have suggestions.


Pedigree File Format

Each row in a pedigree file has information for a single person. The first five columns, in order, need to have a family ID number (famID), an individual ID number within the family (indID), the indID of the person's father (patID), indID of mother (matID), and gender. Below is a sample pedigree file for two nuclear families. Family 1 consists of a father (indID 1), mother (indID 2) and male child (indID 3). Family 2 consists of a father (indID 1), mother (indID 2), female child (indID 3) and male child (indID 4).
1 1 0 0 1
1 2 0 0 2
1 3 1 2 1
2 1 0 0 1
2 2 0 0 2
2 3 1 2 2
2 4 1 2 1
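
As a rough illustration of the layout described above, the following Python sketch reads the five required columns from each row. It is not part of Famtypes; the names `Person` and `parse_ped` are hypothetical, and the sketch assumes columns are separated by whitespace, as in the sample.

```python
from typing import NamedTuple


class Person(NamedTuple):
    fam_id: str
    ind_id: str
    pat_id: str
    mat_id: str
    sex: str
    extra: tuple  # any optional columns, kept unparsed


def parse_ped(text):
    """Parse whitespace-delimited pedigree rows into Person records."""
    people = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            raise ValueError(
                f"line {line_no}: expected at least 5 columns, got {len(fields)}"
            )
        people.append(Person(*fields[:5], tuple(fields[5:])))
    return people


SAMPLE = """\
1 1 0 0 1
1 2 0 0 2
1 3 1 2 1
2 1 0 0 1
2 2 0 0 2
2 3 1 2 2
2 4 1 2 1
"""

people = parse_ped(SAMPLE)
# Anyone with a non-zero patID or matID is listed as somebody's child.
children = [p for p in people if p.pat_id != "0" or p.mat_id != "0"]
print(len(people), "people,", len(children), "children")
```

IDs are kept as strings rather than integers, since family and individual IDs in the later examples (such as DUMM1) are not numeric.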

Famtypes will also ask about several optional columns. Any columns that are used should follow the five required columns in this order: affection status, availability status, phenotypes, then genotypes.

Any individual created by Famtypes will have 0 in both the affection status and availability columns if those columns exist in the pedigree file. If these columns are not being used, individuals created by Famtypes will be treated by filters as available and affected, just like the other individuals in the pedigree file.

Here, the data from the first example have been extended to include affection and availability status, 2 phenotypes and 2 genotypes.

1 1 0 0 1   0 0   0 0.0   0 0   0 0
1 2 0 0 2   1 1   1 3.0   3 4   3 4
1 3 1 2 1   2 1   0 1.5   3 3   3 3
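
To make the extended layout concrete, here is a small Python sketch that splits one such row into named parts. It is only an illustration: `split_row` is a hypothetical helper, and it assumes that both the affection and availability columns are present, that phenotypes come next, and that each genotype occupies two allele columns, as the example rows above suggest.

```python
def split_row(line, n_pheno, n_geno):
    """Split one extended pedigree row into named parts (assumed layout)."""
    fields = line.split()
    # 5 required + affection + availability + phenotypes + 2 alleles per genotype
    expected = 5 + 2 + n_pheno + 2 * n_geno
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    fam_id, ind_id, pat_id, mat_id, sex = fields[:5]
    affection, availability = fields[5:7]
    phenotypes = fields[7:7 + n_pheno]
    alleles = fields[7 + n_pheno:]
    # Pair up consecutive allele columns into genotypes.
    genotypes = [tuple(alleles[i:i + 2]) for i in range(0, len(alleles), 2)]
    return {
        "ids": (fam_id, ind_id, pat_id, mat_id),
        "sex": sex,
        "affection": affection,
        "availability": availability,
        "phenotypes": phenotypes,
        "genotypes": genotypes,
    }


row = split_row("1 2 0 0 2   1 1   1 3.0   3 4   3 4", n_pheno=2, n_geno=2)
print(row["phenotypes"], row["genotypes"])
```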


Examples

Example 1: Missing vs Unknown Parent.

In this example a mother is missing from the pedigree file. In exMissPar.ped, the mother's ID is known since the offspring list the matID as 2. In exUnknPar.ped, the offspring have matID of 0, for unknown mother.

For the missing parent, Famtypes adds in a person to match the maternal ID referenced. You can download the log of changes, and the resulting pedigree file, which is shown below:
1 1 0 0 1   0 1
1 2 0 0 2   0 0
1 3 1 2 2   0 1
1 4 1 2 1   0 1
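
The missing-parent repair can be sketched in Python as follows. This is not Famtypes' own code: `add_missing_parents` is a hypothetical name, the input rows are an assumed reconstruction of exMissPar.ped (the output above without the added mother), and new rows are simply appended, so their position may differ from what Famtypes writes.

```python
def add_missing_parents(rows):
    """Return rows plus one founder row per referenced-but-absent parent."""
    n_extra = len(rows[0]) - 5 if rows else 0
    present = {(r[0], r[1]) for r in rows}
    added = []
    for fam, _ind, pat, mat, *_ in rows:
        # Sex codes follow the samples: 1 for fathers, 2 for mothers.
        for parent_id, sex in ((pat, "1"), (mat, "2")):
            if parent_id != "0" and (fam, parent_id) not in present:
                present.add((fam, parent_id))
                # New individuals get 0 in every optional column.
                added.append([fam, parent_id, "0", "0", sex] + ["0"] * n_extra)
    return rows + added


# Assumed input: father and two offspring who both reference mother 2.
rows = [
    "1 1 0 0 1 0 1".split(),
    "1 3 1 2 2 0 1".split(),
    "1 4 1 2 1 0 1".split(),
]
fixed = add_missing_parents(rows)
for r in fixed:
    print(" ".join(r))
```

Because both offspring reference the same matID, only one mother is created for them.
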
Compare this to the result of an unknown parent. Here, both offspring list a matID of 0, for unknown mother. In this case we can't assume they both have the same mother, since they might be half-siblings. You can download the log of changes, and the resulting pedigree file, which is shown below:
MG1_2   1   0   0   1   0 1
MG1_2 DUMM1 0   0   2   0 0
MG1_2   3   1 DUMM1 2   0 1
MG2_2   1   0   0   1   0 1
MG2_2 DUMM2 0   0   2   0 0
MG2_2   4   1 DUMM2 1   0 1

Example 2: Disjoint Families

In this example, two unrelated trios are listed as part of the same family.
1 1 0 0 1
1 2 0 0 2
1 3 1 2 2
1 4 0 0 1
1 5 0 0 2
1 6 4 5 1
Famtypes splits them into two different families by adding a group number (GRP) to the front of the famID.
GRP1_1 1 0 0 1
GRP1_1 2 0 0 2
GRP1_1 3 1 2 2
GRP2_1 4 0 0 1
GRP2_1 5 0 0 2
GRP2_1 6 4 5 1
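
One way to detect disjoint families is to link every child to its listed parents and then look for connected groups. The Python sketch below does this with a union-find structure. It illustrates the idea only and is not necessarily how Famtypes implements it; `split_disjoint` is a hypothetical name, and numbering groups within each family in order of first appearance is an assumption.

```python
def split_disjoint(rows):
    """Prefix famID with GRPn_ when a family has several unconnected groups."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for fam, ind, *_ in rows:
        parent[(fam, ind)] = (fam, ind)
    # Link each person to any listed parent that exists in the same family.
    for fam, ind, pat, mat, *_ in rows:
        for p in (pat, mat):
            if p != "0" and (fam, p) in parent:
                union((fam, ind), (fam, p))

    # Number the groups within each family in order of first appearance.
    groups_by_fam = {}
    for fam, ind, *_ in rows:
        groups = groups_by_fam.setdefault(fam, {})
        root = find((fam, ind))
        if root not in groups:
            groups[root] = len(groups) + 1

    out = []
    for row in rows:
        fam, ind = row[0], row[1]
        groups = groups_by_fam[fam]
        if len(groups) > 1:
            new_fam = f"GRP{groups[find((fam, ind))]}_{fam}"
        else:
            new_fam = fam  # a fully connected family keeps its famID
        out.append([new_fam] + list(row[1:]))
    return out


rows = [line.split() for line in (
    "1 1 0 0 1", "1 2 0 0 2", "1 3 1 2 2",
    "1 4 0 0 1", "1 5 0 0 2", "1 6 4 5 1",
)]
for r in split_disjoint(rows):
    print(" ".join(r))
```

Note that two founders are only joined through a shared child, so a childless couple would be treated as two separate groups under this sketch.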

Usage

First, you need access to a copy of Famtypes. The download section gives several sets of instructions for doing this.

There are two ways to run Famtypes, either interactively or from the command line.

Run Interactively:

Run from the Command Line:

Follow the interactive instructions, but when running Famtypes, give all 9 answers to the interactive questions above as command line arguments. For example, in test.ped (shown below), there are affection status and availability status columns, but no phenotypes or genotypes.

1 1 0 0 1  1 1
1 2 0 0 2  1 1
1 3 1 2 1  1 1
To run this from the command line without treating disjoint families as siblings and without filtering, type the following.
./famtypes test.ped y y n 0 0 testOut.ped testLog.txt 1

Warnings

Known problems:

Future Developments


Citation & Contact

Famtypes was developed by

Created by Shaun Purcell : Aug 2005.

Updated by Lori Thomas : Feb 2006.