Checks that the pedigree file has the expected number of columns.
Checks that everyone listed as a parent has an entry. Creates
missing entries by assuming the missing person is a founder, unavailable, and
has unknown affection status, phenotypes and genotypes.
If only one parent is listed, creates an imaginary parent. This
ensures everyone, except founders, has two parents. Famtypes
sets these dummy entries as founders, with unknown affection status,
phenotypes and genotypes. It lists all dummy entries created in the log file.
Lists half-siblings in the log file.
Unknown genders are filled in from offspring data, where possible (if
all offspring list the parent as a single gender) and changed to male if
not (no offspring or conflicting description of parent gender). All
changes are listed in the log file.
If a family is disjoint (has people who are not related in any way),
Famtypes breaks that family into groups of people who are
related and lists the changes in the log file. (If the family contains only
individuals with no parents or offspring, the user can choose to assume
they are siblings, and Famtypes will create dummy parents for them.)
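One way to picture the disjoint-family check is as a connected-components problem: link every person to each parent they list, then collect the groups that end up linked. The sketch below only illustrates that idea and is not taken from the Famtypes source. The function name `split_disjoint` and the tuple layout are hypothetical, and it assumes "related" means connected through parent-offspring links.

```python
from collections import defaultdict


def split_disjoint(family):
    """Split ONE family into groups of connected people.

    family: list of (ind_id, pat_id, mat_id) string tuples; "0" means no
    parent listed. Returns a list of sets of ind_ids, one set per group.
    """
    # Union-find forest: every person starts as the root of their own group.
    root = {ind: ind for ind, _, _ in family}

    def find(x):
        # Walk up to the group root, shortening the path along the way.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            root[rb] = ra

    # Link each person to every listed parent that has its own entry.
    for ind, pat, mat in family:
        for parent_id in (pat, mat):
            if parent_id != "0" and parent_id in root:
                union(ind, parent_id)

    groups = defaultdict(set)
    for ind in root:
        groups[find(ind)].add(ind)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

For a family holding two unrelated trios, this returns two groups, which corresponds to the case where Famtypes would break the family apart.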
Filters and reformats data:
The user can choose to break multigenerational families into multiple
nuclear families.
The user can choose to filter the data so that only families of a
certain type are listed in the log file statistics. They can be filtered
by:
1. All families (no filter).
2. All families with both parents and at least 1 offspring
available.
3. All families with both parents and at least 1 affected
offspring available.
4. The nuclear family with the most affected offspring
in a multigenerational family, after ignoring unavailable people. (This
means only families with two available parents and at least one
available and affected offspring will be considered.)
Note: If affection status or availability columns are not provided,
all individuals will be treated as available and affected by the filters.
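The filter rules can be sketched as simple predicates on one nuclear family. This is an illustrative reading of the descriptions above, not the Famtypes implementation. The dictionary keys `available` and `affected`, the function names, and the tie-breaking rule (the first family wins) are assumptions. Turning the codes in the pedigree file into booleans is left to the caller.

```python
def _available(person):
    # A missing key means the column was not provided: treat as available.
    return person.get("available", True)


def _affected(person):
    # A missing key means the column was not provided: treat as affected.
    return person.get("affected", True)


def passes_filter(father, mother, offspring, filter_type):
    """Return True if one nuclear family passes filter 1, 2, or 3."""
    if filter_type == 1:
        return True  # no filter
    parents_ok = _available(father) and _available(mother)
    available_kids = [child for child in offspring if _available(child)]
    if filter_type == 2:
        return parents_ok and len(available_kids) >= 1
    if filter_type == 3:
        return parents_ok and any(_affected(c) for c in available_kids)
    raise ValueError("this sketch handles filters 1-3; see pick_most_affected")


def pick_most_affected(nuclear_families):
    """Filter 4 sketch: pick the nuclear family with the most affected offspring.

    nuclear_families: list of (father, mother, offspring) tuples from one
    multigenerational family. Unavailable people are ignored, so only
    families that pass filter 3 are candidates. Returns None if none qualify.
    """
    best, best_count = None, 0
    for father, mother, offspring in nuclear_families:
        if not passes_filter(father, mother, offspring, 3):
            continue
        count = sum(1 for c in offspring if _available(c) and _affected(c))
        if count > best_count:  # ties keep the earlier family (assumption)
            best, best_count = (father, mother, offspring), count
    return best
```

Because missing keys default to `True`, a family described with empty dictionaries passes every filter, which mirrors the note about files without affection or availability columns.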
Returns a cleaned-up pedigree file and a text file of statistics and
changes between the initial pedigree file and the cleaned-up version.
Download
Famtypes is in beta, which means we're still working on it,
adding features and fixing problems as we find them. Check the warnings section for known problems, and use at your
own risk...
People within the MGH network can also run Famtypes on Pnguapp,
by following this internal link.
Others should download a copy onto their own machine using the links
below.
Famtypes runs in DOS, Linux and OS X. The source code is also
available for people who want to compile it themselves. Just right-click
the appropriate link and choose 'Save-As...':
Files:
Version 1.4: Fixed availability column requirement, fixed
bugs in command-line version.
Version 1.3: Added available individual and family counts. Forced use of
availability col temporarily, added version and parameter notes to log file.
Version 1.2: Fixed filter 4, added check for duplicate indIDs within a
family.
Version 1.1: Fixed avail, added ability to run from command line, compiles
on machines tested.
Version 1.0: Known bug - required availability column, only compiled on
some machines
Please e-mail
me if you download a copy, especially if you encounter any
problems using Famtypes or have suggestions.
Pedigree File Format
Each row in a pedigree file has information for a single person. The
first five columns, in order, need to have a family ID number (famID),
an individual ID number within the family (indID), the indID of
the person's father (patID), indID of mother (matID), and gender.
family ID number (famID)
individual ID number (indID, must be unique within a family)
father's indID (patID; if founder, set patID to 0)
mother's indID (matID; if founder, set matID to 0)
gender (male = 1, female = 2)
Below is a sample pedigree file for two nuclear families. Family 1
consists of a father (indID 1), mother (indID 2) and male child (indID 3).
Family 2 consists of a father (indID 1), mother (indID 2), female child
(indID 3) and male child (indID 4).
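As an illustration, the rows below are rebuilt from the description above (they are not copied from the original sample file), and a short parser reads the first five columns. The whitespace-separated layout and the names `SAMPLE` and `parse_pedigree` are assumptions made for this sketch.

```python
# Two nuclear families, reconstructed from the prose description.
# Columns: famID indID patID matID gender (1 = male, 2 = female).
SAMPLE = """\
1 1 0 0 1
1 2 0 0 2
1 3 1 2 1
2 1 0 0 1
2 2 0 0 2
2 3 1 2 2
2 4 1 2 1
"""


def parse_pedigree(text):
    """Parse the five required columns of each non-empty row."""
    people = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: expected at least 5 columns")
        fam_id, ind_id, pat_id, mat_id, gender = fields[:5]
        people.append({
            "famID": fam_id,
            "indID": ind_id,
            "patID": pat_id,
            "matID": mat_id,
            "gender": gender,
            "extra": fields[5:],  # any optional columns, left unparsed
        })
    return people
```

Note that indIDs repeat across the two families, so a person is identified by the pair (famID, indID).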
genotype(s) (Each genotype takes 2 columns. This data is usually
encoded as letters (A,T,G,C, x=unknown) or numbers (1-4, 0=unknown), but
any encoding with two columns is fine.)
Any individual created by Famtypes will have 0 in both affection
status and availability columns if those columns exist in the pedigree
file. If these columns are not being used, individuals created by
Famtypes will be treated by filters as available and affected, just
like the other individuals in the pedigree file.
Here, the data from the first example have been extended to include affection
and availability status, 2 phenotypes and 2 genotypes.
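For a file like this, the expected number of columns follows from the layout described above: five required columns, one column each for affection status and availability when present, one column per phenotype, and two columns per genotype. The helper names below are hypothetical; this is a sketch of the arithmetic, not Famtypes code.

```python
def expected_columns(has_affection, has_availability, n_phenotypes, n_genotypes):
    """Number of columns each row should have for the given layout."""
    return (5                        # famID, indID, patID, matID, gender
            + int(has_affection)     # optional affection status column
            + int(has_availability)  # optional availability column
            + n_phenotypes           # one column per phenotype or covariate
            + 2 * n_genotypes)       # each genotype takes two columns


def bad_lines(text, expected):
    """Return (line_number, column_count) for every row with the wrong width."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        count = len(line.split())
        if count and count != expected:
            problems.append((line_no, count))
    return problems
```

With affection and availability columns, 2 phenotypes and 2 genotypes, each row should have 5 + 1 + 1 + 2 + 4 = 13 columns.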
In this example, a mother is missing from the pedigree file. In
exMissPar.ped, the mother's ID is known, since the offspring list the
matID as 2. In exUnknPar.ped, the offspring have a matID of 0, for
unknown mother.
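The difference between the two cases can be detected before running Famtypes. The sketch below reports parents that are referenced but have no entry (the exMissPar.ped situation) separately from people who list only one parent (the exUnknPar.ped situation). The rows used in the usage note are hypothetical stand-ins, not the contents of the real example files, and `find_parent_gaps` is a name invented for this illustration.

```python
def find_parent_gaps(rows):
    """Classify parent problems in a pedigree.

    rows: list of (fam_id, ind_id, pat_id, mat_id) string tuples.
    Returns (missing, one_sided):
      missing   - sorted (fam_id, parent_id, role) for IDs that are
                  referenced as a parent but have no entry of their own
      one_sided - (fam_id, ind_id, unknown_role) for people who list
                  exactly one parent, in file order
    """
    known = {(fam, ind) for fam, ind, _, _ in rows}
    missing = set()
    one_sided = []
    for fam, ind, pat, mat in rows:
        for role, parent_id in (("father", pat), ("mother", mat)):
            if parent_id != "0" and (fam, parent_id) not in known:
                missing.add((fam, parent_id, role))
        if (pat == "0") != (mat == "0"):  # exactly one parent is unknown
            one_sided.append((fam, ind, "father" if pat == "0" else "mother"))
    return sorted(missing), one_sided
```

For a father with indID 1 and two offspring who list matID 2 with no row for person 2, `missing` holds one mother and `one_sided` is empty. If the same offspring list matID 0 instead, `missing` is empty and each offspring appears in `one_sided`.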
For the missing parent, Famtypes adds in a person to match the
maternal ID referenced. You can download the log of changes, and the resulting pedigree file, which is shown below:
Compare this to the result of an unknown parent. Here, both offspring
list a matID of 0, for unknown mother. In this case we can't assume they both
have the same mother, since they might be half-siblings. You can download the
log of changes and the resulting pedigree file, which is shown
below:
First, you need access to a copy of Famtypes. The download section gives several sets of instructions for
doing this.
There are two ways to run Famtypes, either interactively or from
the command line.
Run Interactively:
Open a terminal
Change to the Famtypes directory.
(Type something like cd
user/famtypesFolder/famtypes)
Make sure the pedigree file (filename.ped) you want to run is in
this directory.
(Type ls and make sure your file is listed)
Run Famtypes
(Type ./famtypes)
Famtypes will ask the following questions about the file and how to
process it:
Name of file to analyze - filename.ped
Is there an affection status column - y
or n
Is there an availability column - y
or n
Should Famtypes assume completely disjoint families are
really siblings - y or n
Number of phenotypes and covariates beyond column 5 -
0, 3, 15, etc
Number of genotypes (1 genotype takes 2 columns) -
0, 1, 23, etc
Name of new .ped output file - newfile.ped
Name of log file/secondary output file - logfile.txt
Type of filter (see intro for options) -
1, 2, 3, or 4
Once it finishes, there should be two new files with the names
Famtypes asked for. The first is a cleaned-up version of the pedigree
file and the second is a text file with statistics and a list of changes
between the original and new pedigree files.
Run from the Command Line:
Follow the interactive instructions, but when running
Famtypes, give all 9 answers to the interactive questions above, in the
same order, as command line arguments. For example, test.ped (shown below)
has affection status and availability columns, but no phenotypes or
genotypes.
1 1 0 0 1 1 1
1 2 0 0 2 1 1
1 3 1 2 1 1 1
To run this from the command line without treating disjoint families as
siblings and without filtering (filter 1), type the following:
./famtypes test.ped y y n 0 0 testOut.ped testLog.txt 1
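When Famtypes is driven from a script, it helps to assemble the nine arguments in one place so their order cannot drift. The helper below is a hypothetical convenience wrapper, not part of Famtypes. It only builds the argument list in the order of the interactive questions and assumes the executable is at `./famtypes`, as in the example above.

```python
def build_famtypes_command(ped_file, has_affection, has_availability,
                           assume_siblings, n_phenotypes, n_genotypes,
                           out_ped, log_file, filter_type,
                           executable="./famtypes"):
    """Return the argument list for one command-line run of Famtypes."""
    if filter_type not in (1, 2, 3, 4):
        raise ValueError("filter_type must be 1, 2, 3, or 4")

    def yes_no(flag):
        return "y" if flag else "n"

    return [
        executable,
        ped_file,                  # name of file to analyze
        yes_no(has_affection),     # affection status column?
        yes_no(has_availability),  # availability column?
        yes_no(assume_siblings),   # treat disjoint families as siblings?
        str(n_phenotypes),         # phenotypes and covariates beyond column 5
        str(n_genotypes),          # genotypes (2 columns each)
        out_ped,                   # new .ped output file
        log_file,                  # log / secondary output file
        str(filter_type),          # 1 = no filter
    ]
```

Calling `build_famtypes_command("test.ped", True, True, False, 0, 0, "testOut.ped", "testLog.txt", 1)` reproduces the command shown above; the resulting list can be passed to `subprocess.run` from the Famtypes directory.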
Warnings
Known problems:
If a pedigree file lists someone as their own ancestor (parent, grandparent,
etc.), Famtypes will freeze in an infinite loop.
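Until this is fixed, a pedigree file can be screened for such loops before it is passed to Famtypes. The sketch below is an independent pre-check, not part of Famtypes, and `find_ancestry_cycles` is a name made up for this example. It walks up from each person through their listed parents and reports anyone who turns out to be their own ancestor.

```python
def find_ancestry_cycles(rows):
    """Return sorted (fam_id, ind_id) pairs for people who are their own ancestor.

    rows: list of (fam_id, ind_id, pat_id, mat_id) string tuples; "0" means
    no parent listed.
    """
    # Map each person to the parents they list within the same family.
    parents = {}
    for fam, ind, pat, mat in rows:
        parents[(fam, ind)] = [(fam, p) for p in (pat, mat) if p != "0"]

    in_cycle = []
    for start in parents:
        stack = list(parents[start])
        seen = set()  # ancestors already visited, so the walk always ends
        while stack:
            node = stack.pop()
            if node == start:
                in_cycle.append(start)
                break
            if node in seen:
                continue
            seen.add(node)
            # Parents without their own entry simply have no ancestors here.
            stack.extend(parents.get(node, []))
    return sorted(in_cycle)
```

An empty result means no one in the file is listed as their own parent, grandparent, or more distant ancestor.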
Future Developments
Citation & Contact
Famtypes was developed by
Shaun Purcell
(Institute of
Psychiatry, London, UK & Whitehead Institute, MIT, Cambridge, MA,
USA)