Genetic association models

Shaun Purcell
PNGU, CHGR, MGH, Boston, USA

The three web modules below provide a range of association tests. All modules assume a diallelic genetic marker.

The first, MODEL, is designed for case/control and/or TDT samples. Any number of samples can be analysed jointly, so the module suits either the analysis of a single sample or a meta-analysis across different samples.

The second and third modules, GxE (1) and GxE (2), are designed for case/control samples that fall into two groups: one group is exposed to some environmental risk (E+) and one group is not (E-).

GxE (1) and GxE (2) implement similar tests. The difference is that GxE (1) automatically searches over all common models to select the best-fitting one, whereas GxE (2) performs single tests of particular hypotheses.

The analysis in MODEL is similar to that in GxE (1) and GxE (2): the difference is that the latter two modules allow for a main effect of environmental exposure. If the sample has been ascertained on both disease status and environmental exposure (e.g. equal sized groups E+/D+, E+/D-, E-/D+ and E-/D-) then MODEL should be used.


Module: MODEL (Temporarily unavailable)
Design: Genotype data from 1 or more case/control samples and/or TDT samples
Analyses: All genetic models, with or without allele frequencies and/or genetic effects constrained across samples

Module: GxE (1)
Design: Case/control genotype data from exposed and unexposed environmental groups
Analyses: Automatically select the best fitting model from all possible models: allowing for gene-environment interaction, gene-environment correlation and a main effect of exposure

Module: GxE (2)
Design: Case/control genotype data from exposed and unexposed environmental groups
Analyses: Specify a specific test, same parameters as GxE (1)

Instructions

All modules analyse diallelic markers. Five basic genetic main effect models are applied. In each case, the aa genotype is the reference genotype: the association is defined by two relative risks (RR), for the Aa and AA genotypes (i.e. RR(aa)=1). All models are nested in the Gen model; the None model is nested in all other models. Likelihood ratio tests can compare nested models -- as featured in MODEL and GxE (2). The AIC fit index is used to rank-order models in GxE (1).

All models assume that Hardy-Weinberg equilibrium holds in the general population (note that different samples and environmental exposure groups are allowed to come from different populations). The allele frequency is an estimated parameter for case/control designs; it does not feature in the TDT design. Note: there may be problems if any input cell has a zero count.
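To make the parameterisation concrete, the following is a minimal Python sketch of how an allele frequency and the two relative risks could translate into expected genotype frequencies among cases. It is not the modules' own code: the function names are hypothetical, `p` is assumed to be the population frequency of the A allele, and the case frequencies follow from weighting each Hardy-Weinberg genotype frequency by its relative risk and renormalising. How the modules treat controls is not stated here, so the sketch covers cases only.

```python
def hwe_genotype_freqs(p):
    """Population genotype frequencies under Hardy-Weinberg equilibrium.

    p is the frequency of allele A (an assumed convention for this sketch).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"aa": q * q, "Aa": 2.0 * p * q, "AA": p * p}


def case_genotype_freqs(p, rr_het, rr_hom):
    """Expected genotype frequencies among cases.

    rr_het is RR(Aa) and rr_hom is RR(AA); aa is the reference, so RR(aa) = 1.
    Each population frequency is weighted by its relative risk, then the
    weights are rescaled to sum to one.
    """
    pop = hwe_genotype_freqs(p)
    weights = {
        "aa": pop["aa"] * 1.0,
        "Aa": pop["Aa"] * rr_het,
        "AA": pop["AA"] * rr_hom,
    }
    total = sum(weights.values())
    return {genotype: w / total for genotype, w in weights.items()}


if __name__ == "__main__":
    # Illustrative values only, not taken from any real data set.
    print(case_genotype_freqs(p=0.5, rr_het=2.0, rr_hom=4.0))
```

Setting both relative risks to 1 returns the population frequencies unchanged, which corresponds to no association.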

MODEL

Case/control and TDT sample data are entered in the text box: the order of the input is important -- see the notes on the input page. When there is more than one sample (case/control or TDT) then the basic models are fitted twice: once assuming that the effects are homogeneous across samples, once allowing them to differ. By default, the Equal allele frequencies check box is checked -- if this is unchecked, then models allowing for different allele frequencies will be included (when there is more than one case/control sample in the data).

The output is organised in two sections: a list of the model estimates and log-likelihoods, and a series of likelihood ratio tests. The list of models will contain one, two or four sets (depending on whether unequal effects and/or frequencies are modeled) of the five basic association models. For each individual model, the frequency (if it is a case/control sample) and two relative risks RR(Aa) and RR(AA) are given -- the line order represents the order in which the data were entered.

A potentially larger number of likelihood ratio tests are constructed, by comparing the -2LL values from the individual models. The basic tests of association are listed several times, differing by whether effects and/or frequencies are allowed to differ. If the tests of effects and/or frequencies show evidence of heterogeneity, then one should look to the tests of genetic models that allow for that heterogeneity.
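As an illustration of how such a test can be formed from two -2LL values, here is a small Python sketch. It assumes the usual large-sample result that the difference in -2LL between nested models follows a chi-square distribution, with degrees of freedom equal to the difference in the number of free parameters. The function names are hypothetical, and only 1 and 2 degrees of freedom are handled, since those have simple closed forms.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(m2ll_restricted, m2ll_general, df):
    """Compare a restricted model against the general model it is nested in.

    Both arguments are -2 log-likelihood values; the restricted model can
    never fit better, so tiny negative differences are clipped to zero.
    """
    statistic = max(m2ll_restricted - m2ll_general, 0.0)
    return statistic, chi2_upper_tail(statistic, df)


if __name__ == "__main__":
    # Made-up -2LL values for a None model and a Gen model (two extra RRs).
    stat, p_value = likelihood_ratio_test(210.4, 204.1, df=2)
    print(stat, p_value)
```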

GxE (1)

The model in GxE (1) and GxE (2) is basically similar to that in MODEL: the language used and the way the output is presented are different, reflecting the different focus. Heterogeneous effects become GxE (gene-environment interaction); heterogeneous allele frequencies become rGE (gene-environment correlation). This first module tests a wide range of GxE models.
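The AIC ranking that GxE (1) uses to order models can be sketched as follows. This is an assumption-laden illustration rather than the module's implementation: it uses the standard definition AIC = -2LL + 2k, where k is the number of free parameters, and treats the model with the lowest AIC as the best fitting. The function name, the -2LL values and the parameter counts are all made up for the example.

```python
def rank_by_aic(models):
    """Rank models from best (lowest AIC) to worst.

    models maps a model name to a (minus_2ll, n_params) pair.
    Returns a list of (name, aic) tuples in ascending AIC order.
    """
    scored = [
        (name, minus_2ll + 2.0 * n_params)
        for name, (minus_2ll, n_params) in models.items()
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical fits: the parameter counts assume one allele frequency,
    # plus two relative risks for the Gen model.
    fits = {
        "None": (210.4, 1),
        "Gen": (204.1, 3),
    }
    for name, aic in rank_by_aic(fits):
        print(name, round(aic, 2))
```

Unlike a likelihood ratio test, this ranking does not require the models to be nested, which suits a search over many candidate models.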

GxE (2)

In this second module, the analysis is basically the same as in GxE (1) except that the user specifies a single model to test.
Shaun Purcell Nov 2003