HGANALYSE procedure
Analyses data using a hierarchical or double hierarchical generalized linear model (R.W. Payne, Y. Lee, J.A. Nelder & M. Noh).
Options
Parameters
Description
HGANALYSE is one of several procedures with the prefix HG, which provide tools for fitting the hierarchical and double hierarchical generalized linear models (HGLMs and DHGLMs) defined by Lee & Nelder (1996, 2001, 2006). These models extend generalized linear models (GLMs) to include additional random terms in the linear predictor. They include generalized linear mixed models (GLMMs) as a special case, but do not constrain the additional terms to follow a Normal distribution and to have an identity link (as in the GLMM). For example, if the basic generalized linear model is a log-linear model (Poisson distribution and log link), a more appropriate assumption for the additional random terms might be a gamma distribution and a log link.
The analysis involves fitting an augmented generalized linear model to describe the mean of the distribution. This has units corresponding to the original data units, together with additional units for the effects of the random terms; see Lee & Nelder (1996). Then there are further GLMs to describe the dispersion for each random term (including the residual dispersion, phi); see Lee & Nelder (2001). In a DHGLM, some of these dispersion GLMs are themselves extended to become HGLMs by the inclusion of random terms; see Lee & Nelder (2006).
Before calling HGANALYSE, the fixed and random terms in the HGLM must be defined by the HGFIXEDMODEL and HGRANDOMMODEL procedures, respectively. The HGDRANDOMMODEL procedure can then add random terms to a dispersion GLM, so that the model becomes a DHGLM.
The variate to be analysed must be supplied by the Y parameter and, if the y-values are binomial responses, the NBINOMIAL parameter should supply the corresponding total numbers. Residuals and fitted values can be saved using the RESIDUALS and FITTEDVALUES parameters, respectively. Note that only one y-variate can be analysed at once, so any additional variates are ignored (as occurs with the MODEL directive when generalized linear models are defined).
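As a minimal sketch of this sequence, the commands below define the Poisson-gamma HGLM mentioned earlier and then analyse a single y-variate. The identifiers Count, Treat, Block, Res and Fit are hypothetical. The options shown for HGFIXEDMODEL and HGRANDOMMODEL (DISTRIBUTION, LINK, FIXED and RANDOM) are assumptions, as they are not described on this page, and should be checked against the documentation of those procedures.

```
"Hypothetical data: variate Count and factors Treat and Block, already declared."
"Option names for HGFIXEDMODEL and HGRANDOMMODEL are assumed, not taken from this page."
HGFIXEDMODEL  [DISTRIBUTION=poisson; LINK=logarithm; FIXED=Treat]
HGRANDOMMODEL [DISTRIBUTION=gamma; LINK=logarithm; RANDOM=Block]
"Fit the HGLM to one y-variate, saving residuals and fitted values."
HGANALYSE     Y=Count; RESIDUALS=Res; FITTEDVALUES=Fit
```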
The SAVE parameter allows you to save a pointer containing full details of the analysis. This can then be used to generate further output from HGDISPLAY, HGKEEP, HGPLOT or HGPREDICT. The most recent save structure is kept automatically inside GenStat to use as a default for the SAVE options of HGDISPLAY, HGKEEP, HGPLOT and HGPREDICT. So you need to save the pointer explicitly only if you want to display output from more than one analysis at a time.
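For illustration, the sketch below keeps separate save structures for two analyses, so that output from the earlier one can still be requested. It assumes that the model has already been defined by HGFIXEDMODEL and HGRANDOMMODEL; the identifiers Count1, Count2, Hgsave1 and Hgsave2 are hypothetical.

```
"Assumes the HGLM has already been defined; Count1 and Count2 are hypothetical variates."
HGANALYSE Y=Count1; SAVE=Hgsave1
HGANALYSE Y=Count2; SAVE=Hgsave2
"Without the SAVE option, HGDISPLAY would use the most recent analysis (of Count2)."
HGDISPLAY [SAVE=Hgsave1]
```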
The PRINT, SEMETHOD and DMETHOD options control printed output, almost exactly as in the HGDISPLAY procedure (which is called by HGANALYSE to produce the output). The only difference is that PRINT has additional settings: monitoring provides information about the fitting process of an ordinary HGLM, and dhgmonitoring provides information about the fitting of the HGLM for the dispersion model in a DHGLM.
The other options control various aspects of the fitting process. This involves alternating fits of the augmented GLM for the mean, given the current estimates of the dispersion parameters, and of the models that estimate the dispersion parameters. Convergence is assessed by comparing the dispersion estimates from successive fits. The MAXCYCLE option can specify two scalars: the first sets a limit on the number of alternating fits (default 99), and the second controls the number of iterations in the estimation of the mean model and of the dispersion model (default 30). The TOLERANCE option defines the criterion for convergence in the alternating fits (default 0.005). The EMETHOD option determines whether Aitken (default) or adjusted Aitken extrapolation is used in the estimation of the dispersion parameters, or you can set EMETHOD=* to use neither. The ETOLERANCE option sets an upper limit on the ratio of the changed value to the original value in the extrapolations; the default value is 7.5. The EXIT option can be set to a scalar which will be set to zero or one according to whether or not the fitting has been successful.
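A hedged sketch of how these controls might be combined is given below. It assumes that the model has already been defined, and that Count is a hypothetical y-variate; the particular limits and tolerance are arbitrary values chosen for illustration, not recommendations.

```
"Assumes the HGLM has already been defined; Count and Exitcode are hypothetical."
SCALAR    Exitcode
"Allow up to 150 alternating fits and 50 inner iterations, tighten the"
"convergence criterion, switch off extrapolation and monitor the fitting."
HGANALYSE [PRINT=monitoring; MAXCYCLE=150,50; TOLERANCE=0.001;\
          EMETHOD=*; EXIT=Exitcode] Y=Count
"Exitcode is zero or one according to whether or not the fitting succeeded."
PRINT     Exitcode
```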
By default HGANALYSE uses exact likelihood to obtain the y-variate and weights for the dispersion model. This produces estimates with less bias than the earlier method of extended quasi-likelihood (EQL). However, the LMETHOD option is provided so that EQL estimates can be obtained if required. For some models, the DLAPLACEORDER option allows the order of Laplace approximation used in the estimation of the dispersion components to be increased from the standard value (and default) of 0 to either 1 or 2. This is appropriate for generalized linear mixed models with the binomial or Poisson distributions, where use of Laplace order 0 can lead to serious downwards bias. The MLAPLACEORDER option similarly allows you to set the order of Laplace approximation used in the estimation of the mean model to 1 instead of 0.
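As an example of the case described above, the sketch below fits a binomial generalized linear mixed model with first-order Laplace approximations. The identifiers Nsuccess, Ntrial, Treat and Block are hypothetical, and the options shown for HGFIXEDMODEL and HGRANDOMMODEL are assumptions that should be checked against the documentation of those procedures.

```
"Hypothetical data: variates Nsuccess and Ntrial, factors Treat and Block."
"Option names for HGFIXEDMODEL and HGRANDOMMODEL are assumed, not taken from this page."
HGFIXEDMODEL  [DISTRIBUTION=binomial; LINK=logit; FIXED=Treat]
HGRANDOMMODEL [DISTRIBUTION=normal; LINK=identity; RANDOM=Block]
"Raise both Laplace orders from the default of 0 to 1."
HGANALYSE     [DLAPLACEORDER=1; MLAPLACEORDER=1] Y=Nsuccess; NBINOMIAL=Ntrial
```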
Options: PRINT, LMETHOD, SEMETHOD, DMETHOD, EMETHOD, MLAPLACEORDER, DLAPLACEORDER, MAXCYCLE, EXIT, TOLERANCE, ETOLERANCE.
Parameters: Y, NBINOMIAL, RESIDUALS, FITTEDVALUES, SAVE.
Method
The model is fitted using the method of Lee & Nelder (2006).
Action with RESTRICT
The Y variate can be restricted to analyse a subset of the data.
References
Lee, Y., & Nelder, J.A. (1996). Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619-678.
Lee, Y., & Nelder, J.A. (2001). Hierarchical generalized linear models: a synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika, 88, 987-1006.
Lee, Y., & Nelder, J.A. (2006). Double hierarchical generalized linear models (with discussion). Applied Statistics, 55, 1-29.