A model-fitting implementation of the DeFries-Fulker model for
selected twin data
Shaun Purcell and Pak Sham
Introduction |
means.gawk |
prepare.gawk |
df.mx |
df-sl.mx |
(G)AWK |
Mx
These scripts are intended to facilitate a model-fitting approach to
the DeFries-Fulker (DF) model of twin analysis, outlined in Purcell & Sham. Two gawk scripts (1)
calculate the zygosity-specific proband and population means and (2)
transform the dataset based on these means. The Mx scripts
implement the basic DF model and the sex-limited model including DZ-O twins.
Usage : gawk -f means.gawk t=2 d=1 data1.raw to
specify a threshold of 2 or above. To select probands as scoring below
t, d=-1. This will create a file called
prepdata.dat which is used by prepare.gawk. If you
wish to specify your own proband and population means a
priori, you can edit the file prepdata.dat: the file
should contain four numbers, one per line, which are the MZ population
mean, the DZ population mean, the MZ proband mean, the DZ proband
mean.
As an example, consider the following dataset (ID, zygosity, twin 1, twin 2)
1 1 5 4
2 1 6 7
3 1 7 8
4 1 8 9
5 1 9 6
6 1 4 5
7 2 4 7
8 2 5 6
9 2 6 9
10 2 7 4
11 2 8 5
12 2 9 8
The output of running the means.gawk script is shown below
Total N ind. : 24
Grand mean : 6.5
MZ ind N : 12
MZ mean : 6.5
DZ ind N : 12
DZ mean : 6.5
Ind > t : 8
Proband mean : 8.5
Cotwin mean : 7.25
MZ > t : 4
MZ proband mean : 8.5
MZ cotwin mean : 7.5
DZ > t : 4
DZ proband mean : 8.5
DZ cotwin mean : 7
Estimate of rMZ : 0.5
Estimate of rDZ : 0.25
Estimate of h2g : 0.5
which gives a simple estimate of heritability from the DF model.
Also, the file prepdata.dat will have been written to
the local directory containing the means used by the next script.
Usage : gawk -f prepare.gawk t=2 d=1 data1.raw >
data1.trz to create a transformed dataset called data1.trz.
To continue the example given above, running the prepare.gawk
script on the above data, given the same threshold is specified
will produce the transformed datafile that the Mx script requires.
1 1 -0.75 -1.25 0 0
2 1 -0.25 0.25 0 0
3 1 0.25 0.75 0 1
4 1 0.75 1.25 1 1
5 1 1.25 -0.25 1 0
6 1 -1.25 -0.75 0 0
7 2 -1.25 0.25 0 0
8 2 -0.75 -0.25 0 0
9 2 -0.25 1.25 0 1
10 2 0.25 -1.25 0 0
11 2 0.75 -0.75 1 0
12 2 1.25 0.75 1 1
Usage : mx < df.mx > results1 to perform DF analysis
under Unix/Linux. Edit df.mx to constrain the genetic
parameter and re-run the script in order to determine the difference
in -2 log-likelihood of the models.
If we were to run the Mx script on our example dataset (taking care to
make sure the correct datafile name is specified) we obtain the
following output (extracted)
MATRIX Z
This is a computed FULL matrix of order 3 by 1
[=A_C_I-(A+C)]
1
1 5.0000E-01
2 -2.3615E-06
3 5.0000E-01
Your model has 4 estimated parameters and 48 Observed statistics
-2 times log-likelihood of data >>> 58.098
Degrees of freedom >>>>>>>>>>>>>>>> 44
In other words, the Z matrix contains the parameter estimates, which agree with the earlier estimate - heritability is 50%, the remaining variance can be ascribed to nonshared environmental factors.
In such a small sample, the power to say that the heritability estimate is significantly nonzero will be low. Fixing the A matrix by editing the Mx script, the results are now :
MATRIX Z
This is a computed FULL matrix of order 3 by 1
[=A_C_I-(A+C)]
1
1 0.0000
2 0.3750
3 0.6250
Your model has 3 estimated parameters and 48 Observed statistics
-2 times log-likelihood of data >>> 58.360
Degrees of freedom >>>>>>>>>>>>>>>> 45
For 1 degree of freedom difference (45 - 44 = 1) we have a difference
in -2 times the log-likelihood of only 58.360 - 58.098 = 0.262. This
illustrates why twin studies require more than a dozen pairs.
Usage : mx < df_sl.mx > results1 to perform DF
analysis. Edit df_sl.mx to perform the tests outlined in
Purcell & Sham.
The gawk or awk utility should be available on any
Unix/Linux operating system. To download a version of gawk
for MS-DOS follow the link above.
Mx is the model-fitting package used to implement the DF model.
It can be freely downloaded for various platforms from the link above.
Purcell S & Sham PC (submitted). A model-fitting implementation of
the DeFries-Fulker model for selected twin data.
Please note: Users of certain older or newer version of MS-DOS
may encounter problems with long filenames -- rename the `.gawk`
extensions to `.gwk`. Users of Windows XP may also encounter many
problems...
Page created by Shaun
Purcell.