The ‘gold standard’ for freely available Structural Equation Modelling (SEM) software is often taken to be Mx. Unfortunately, Mx is difficult to use with large data sets such as those produced by fMRI, its source code is not available, and it is often much easier to program calculations in Matlab, the platform for SPM, the most popular package for analysing fMRI data. Matlab SEM code for fMRI analysis is therefore needed. In addition to Mx, an important resource for SEM is ‘SEMNET’, the SEM discussion network, whose members include some of the most experienced users of SEM, mostly in areas outwith neuroimaging. The particular example used below was discussed on SEMNET some time ago; links to that discussion are provided.

SEM remains the most frequently used method for fMRI measurement of ‘effective connectivity’, although other techniques such as ‘dynamic causal modelling’ (DCM) are becoming increasingly popular. These other techniques are not discussed here. The example below is used to introduce SEM from the beginning: starting with an Mx analysis, then considering the problem from the perspective of a non-numerical algebraic solution, and finally using Matlab in conjunction with Adaptive Simulated Annealing (ASA) via the ASAMIN ‘mex’ gateway function.

There were two motivations for this approach. Firstly, SEM calculations can be complex, and such complexity can obscure mistakes and misconceptions. The objective here was therefore to describe the simplest, most transparent SEM computational method which is at the same time completely general and practical to use with Matlab. Obfuscation is anathema. Matlab code and other files for the example are here.

The second motivation relates to the use of SEM in fMRI. There are many pitfalls with SEM, relating to issues such as ‘unidentified models’, ‘empirical under-identification’, poor fitting of the model to the data, ‘local minima’, etc. These issues are well recognised in non-neuroimaging fields such as economics and psychology, yet it is still rare to find an fMRI publication mentioning them. They are important, though, since ignoring them may render a calculation at best wrong and at worst misleading. An FAQ which clearly discusses many of the pitfalls is available. Given the effort and time that goes into obtaining fMRI data, it is very important to avoid such problems.

SEM is often used in fMRI for measuring ‘effective connectivity’: the influence of one neuronal region on another. Accurate measurement of effective connectivity is particularly important for clinical fMRI studies of schizophrenia, such as those testing the ‘disconnection hypothesis’ and associated theories of abnormal corollary discharge due to dysfunctional ‘forward models’.
It is important to note that the following example was chosen to indicate what happens with a SEM identification problem. It was not difficult to generate this problem.

1/ Path Diagrams & Reticular Action Model (RAM) Matrices

The Mx manual has a very useful Tutorial introduction. This first section is partly based on that Tutorial, but uses a different path diagram and data as an example, and considers SEM from the perspective of fMRI. SEMs are usually represented by path diagrams. There are four elements to a path diagram:

- **Circles** represent non-measured, hypothesised latent variables
- **Squares** represent the observed, measured variables
- **Single-headed arrows** indicate a causal relationship, pointing from the ‘causing’ variable towards the variable being ‘caused’
- **Double-headed arrows** indicate a covariance or correlation. A double-headed arrow that loops back to the same variable represents residual (error) variance.
Consider the following path diagram (Mx file: ds_example1.mxd):

[Path diagram figure not reproduced here.]

This path diagram represents 4 observed variables, together with their hypothesised causal paths, covariances and residual variances. The object of the SEM calculation is to estimate all these variables. In the case of an fMRI study it is possible to define ROIs using SPM and extract a ‘time-series’: e.g. in SPM99, the program ‘spm_regions.m’ extracts a representative time-course from voxel data (stored in Y.mad) in terms of the first eigenvariate of the filtered and adjusted data in all suprathreshold voxels within a spherical ROI centred on the voxel nearest to the selected location. Temporal filtering can be specified.

P1 Total Number of Measurements
The output of this process would be four time-series corresponding to the F, G, H and J regions of interest, stored in a matrix (e.g. ts(4,:)); it is then easy to calculate the covariance matrix ( cov(ts') ). By this method the following observed covariance matrix ‘obs’ could have been calculated – see the Mx data file (ds_example1.dat).
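As a minimal sketch of this step (variable names illustrative; a real analysis would use time-series extracted by spm_regions.m rather than random numbers):

```matlab
% Sketch: four ROI time-series (F, G, H, J), one per row of ts, as would
% be returned by spm_regions.m; randn is only a placeholder for real data.
nscans = 250;                  % number of fMRI volumes (illustrative)
ts = randn(4, nscans);         % ts(1,:) = F, ts(2,:) = G, etc.
obs = cov(ts');                % 4 x 4 observed covariance matrix
```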
The RAM approach to SEM is very straightforward and completely general (McArdle and McDonald 1984). It is less efficient computationally than other much more complex techniques, so the latter were often used before the easy availability of fast PCs. Mx will automatically generate the RAM matrices from the path diagram but it is necessary to know how to do this for the later Matlab code. RAM requires the specification of three matrices, ‘F’, ‘A’ and ‘S’. The key to specification is consistent labelling of the rows and columns of the matrices with the latent and observed variables. The order of variables is unimportant.
[The F, A and S matrices for the example are not reproduced here.]

Note how the rows and columns of each matrix are labelled with the latent and observed variables. The ‘A’ matrix holds the single-headed (causal) paths and the ‘S’ matrix the double-headed (covariance) paths, while in the ‘F’ (filter) matrix a ‘1’ appears wherever a row variable is the same as a column variable. In the following, ‘I’ refers to the identity matrix. The predicted covariance matrix ‘pred’ can be calculated from these matrices as

    pred = F * inv(I - A) * S * inv(I - A)' * F'

and this matrix is then compared with the actual observed matrix ‘obs’ using a variety of fitting functions, e.g. the ‘maximum likelihood’ function (‘ML’ below). The FAQ discusses such fitting equations in detail.
The ‘numerical approach’ to SEM involves specifying path values and calculating how well the predicted covariance matrix matches the observed matrix. The objective is to vary the path values such that the discrepancy quantified by the fit function is minimised.
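In Matlab the whole numerical approach fits in a few lines. The RAM matrices below are illustrative (a single latent variable driving two observed variables), not the example's actual matrices:

```matlab
% Illustrative RAM matrices: p observed and q latent variables (values are
% hypothetical, not those of the example's path diagram).
p = 2; q = 1; n = p + q;
F = [eye(p), zeros(p, q)];                   % filter: '1' where row var = column var
A = zeros(n); A(1,3) = 0.7; A(2,3) = 0.5;    % single-headed (causal) paths
S = zeros(n); S(1,1) = 0.5; S(2,2) = 0.5; S(3,3) = 1.0;  % double-headed paths
I = eye(n);
pred = F * inv(I - A) * S * inv(I - A)' * F';  % RAM predicted covariance
% Maximum likelihood discrepancy against an observed matrix 'obs'
% (illustrative values); a minimiser would vary the paths to reduce this.
obs = [1.1 0.3; 0.3 0.8];
ML = log(det(pred)) - log(det(obs)) + trace(obs / pred) - p;
```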
P3 Poor fit between Model and Data
Running Mx on the above path model and data gives output (not reproduced here) in which AIC < 0 and RMSEA < 0.05, indicating a reasonably good fit of the model to the data, which might seem OK at first glance. However, adding Options SError to the Mx script results in these comments:

    Approximate numerical estimates of standard errors and CI's.
    *** Not as good as likelihood-based or bootstrap ***
    *** Likely to be wrong if error distribution is not symmetric ***
    *** WARNING! ***
    Eigenvalue check indicates ill-conditioning
    Check your model for under-identification
    Most likely culprits are parameters specified as
    Please check your model for under-identification and note that this
    message may be caused by parameters hitting actual or effective bounds.
    More complete diagnostic data may be found in the
Mx suggests parameters ‘e’ & ‘a’ may be suspect, and does this by computing the eigenvalues and eigenvectors of the ‘hessian matrix’ (see below), using this information to assess for ‘under-identification’, another of the pitfalls of SEM. It is easy to specify an under-identified model, yet this may not be obvious. In an under-identified model at least one of the path values cannot be determined, although a SEM program will still output a (misleading) numerical value for an unidentified path.

P4 Model Under-Identification and Empirical Under-Identification
The ‘algebraic approach’ suggests that the parameters of a SEM model are considered identified if it is possible to solve covariance structure equations for the unknown parameters - specifically to ‘express parameters as independent functions of the elements of the covariance matrix’. Say then we have a SEM model with various causal and covariance unknowns (a, b, c, d) which we want to estimate. Using RAM (or other) theory, we can derive algebraic expressions for a predicted covariance matrix ‘P’ say
where the elements of P are:
A SEM program would minimise the difference between P and an observed covariance matrix ‘O’, say, by varying the estimates of a–d according to a fit function.

In contrast, the model is algebraically identified if it is possible to determine:
A detailed discussion on this method can be found elsewhere.
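For a deliberately simple hypothetical model, this algebraic approach can be sketched with Matlab's Symbolic Math Toolbox (the example itself needs the more powerful heuristics of Macsyma's SOLVER, described next). The model below, with one path a from v1 to v2, is not the example above:

```matlab
% Hypothetical two-variable model: path a from v1 to v2, var(v1) = s1,
% residual variance e on v2; o11, o12, o22 are the observed covariances.
syms a s1 e o11 o12 o22
pred = [s1,   a*s1;
        a*s1, a^2*s1 + e];
% Covariance structure equations: each predicted element equals the
% corresponding observed element.
sol = solve([pred(1,1) == o11, pred(1,2) == o12, pred(2,2) == o22], [s1, a, e]);
% sol.s1 = o11, sol.a = o12/o11, sol.e = o22 - o12^2/o11: this model is
% algebraically identified, because each parameter is an independent
% function of the elements of the observed covariance matrix.
```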
To implement this algebraic manipulation automatically, it is possible to use the computer algebra program Macsyma. Macsyma can translate simple Matlab code into Macsyma code, so a simple Matlab program can be used to calculate the algebraic form of the predicted covariance matrix. Whilst Matlab has a symbolic manipulation package based on Maple, it does not seem to have a function equivalent to Macsyma’s SOLVER (see below). By this method the predicted covariance matrix for the example was calculated and, by equating each of its elements with the corresponding element of the observed matrix, 10 simultaneous equations are defined.

The simultaneous equations can be solved in an automated manner using the Macsyma SOLVER package. This program tries to find solutions of arbitrary systems of linear or non-linear parametric equations by employing heuristic search and valuation strategies in combination with the Macsyma SOLVE and LINSOLVE commands. If the equations cannot be solved explicitly, either because no analytic solution exists or because the symbolic solution is too complex, SOLVER attempts to solve a subset and returns the non-linear kernel that could not be solved. Problems encountered during SOLVER action result in notifications and, if necessary, prompts for choosing alternative actions. By this method an automated algebraic solution of the example was attempted;
however, SOLVER reported a linear dependency: equation y9 was not used, and 4 possible solutions were found.
This indicates an identification problem, consistent with the Mx warning, despite the model appearing identified from the perspective of ‘rules of thumb’: a structure should be identified if there are “two or more factors… only two indicators for a factor… no correlated errors, each indicator loads on only one factor, and none of the variances or covariances is equal to zero”. A discussion of the identification problems with the example can be found elsewhere. These are partly due to a subtle issue arising from the way the residual variance has been specified. It is reasonably clear that:

- The signs of the variables are not fully identified
- As ‘e’ gets smaller, there is a tendency for empirical under-identification.
It is possible to compare the Mx path values with the algebraic solution:
Therefore, Mx does not seem to be consistent with a full algebraic solution (though Mx did report a problem).
The function called by asamin (SEM_func) calculates a variety of fitting functions for given input path parameters. In SEM_func(), set Nobs=250 and make sure the following is selected for fitting:
Then at the Matlab prompt define a variable, which corresponds to the Mx derived optimum path fit values:
Set:
Calling SEM_func directly should now return the value of ML ChiSq obtained by Mx:
which it does.
If only a few paths are of interest, it’s possible to use SEM_func() within a simple ‘grid search’ using multiple loops, e.g.
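The original loops are not reproduced here; a hypothetical grid search over two free paths, with the rest held fixed, might look like this (SEM_func's argument convention, a vector of path values, is assumed):

```matlab
% Hypothetical grid search over two free paths 'a' and 'b'; SEM_func is
% assumed to return the selected fit function value for a path vector.
best = Inf; best_ab = [0 0];
for a = -1:0.05:1
    for b = -1:0.05:1
        f = SEM_func([a; b]);      % fit value at this grid point
        if f < best
            best = f; best_ab = [a b];
        end
    end
end
```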
This method may sometimes be practical. However, with 9 variables to determine here, such a calculation would take too long, so a better way of finding an optimal solution is required. One possible method is Adaptive Simulated Annealing (ASA): C code written by Lester Ingber, developed to find the best global fit of a constrained non-convex cost-function over a D-dimensional space.

Simulated annealing is a generalisation of a Monte Carlo method for examining the equations of state and frozen states of n-body systems. The annealing concept is based on the way liquids freeze or metals recrystallise during annealing. A melt, initially at high temperature and disordered, is cooled so that the system at any time is approximately in thermodynamic equilibrium. As cooling proceeds, the system becomes more ordered and approaches a ‘frozen’ ground state: an adiabatic approach to the lowest energy state. If the initial temperature is too low, or cooling is done insufficiently slowly, the system may become ‘quenched’, forming defects or freezing out in metastable states (a local minimum). ASA is a particularly efficient version of simulated annealing and has many user-defined parameters.

ASAMIN is a Matlab gateway function to ASA written by Shinichi Sakata. ASAMIN and ASA have been compiled with Microsoft Visual C++ 6.0 to form the Matlab-callable ‘asamin.dll’ routine. A particularly useful feature of ASAMIN is that the ‘cost function’ for minimisation by ASA (SEM_func()) can be written in Matlab rather than C.
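The general shape of a call from Matlab is sketched below; the bounds, dimensions and parameter-type codes are illustrative assumptions and should be checked against the ASAMIN documentation:

```matlab
% Sketch of an ASAMIN call (argument details are assumptions; consult the
% ASAMIN documentation before use).
n = 9;                          % number of free path parameters
xinit = rand(n, 1) - 0.5;       % randomised starting values
xmin = -10 * ones(n, 1);        % lower bounds on the parameters
xmax =  10 * ones(n, 1);        % upper bounds on the parameters
xtype = -1 * ones(n, 1);        % -1 marks real-valued (continuous) parameters
[fstar, xstar] = asamin('minimize', 'SEM_func', xinit, xmin, xmax, xtype);
```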
Run ‘sem.m’, which fits the model 4 times with randomised starting values (in each result the first row is the estimated path values, the second the estimated standard errors). The SRMR (Tabachnick and Fidell 1996, p752) has been used for estimating the goodness of fit of the model to the data. SRMR < 0.05 indicates a good fit. Note, however, that use of the SRMR is strictly invalid here, since this equation should be used with standardised data (correlation matrices); a non-standardised version (RMR) is also available.
Minimum function value = 0.018526 ASA exit state = 0 1, SRMR = 0.011672
Minimum function value = 0.019854 ASA exit state = 0 1, SRMR = 0.012083
Minimum function value = 0.018742 ASA exit state = 0 1, SRMR = 0.011739
Minimum function value = 0.019555 ASA exit state = 0 1, SRMR = 0.011993

Consistent with the discussion on SEMNET:

- The signs of some variables flip on certain runs with different starting values, due to the sign not being identified.
- The estimated standard error of ‘e’ is consistently much larger than for the other variables.
- An identical minimum function value is not being achieved, indicating problems with fitting, perhaps in part due to under-identification, though local minima may also be important.
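One common form of the RMR/SRMR computation, over the unique (i ≤ j) elements of the residual matrix, is sketched below; ‘obs’ and ‘pred’ here are small illustrative matrices, not the example's:

```matlab
% Sketch of RMR and SRMR: root mean square of the unique residual elements
% between observed and predicted covariance matrices (illustrative values).
obs  = [1.00 0.30; 0.30 0.80];
pred = [0.99 0.35; 0.35 0.75];
res = obs - pred;
m = triu(true(size(res)));              % logical mask for unique elements
RMR = sqrt(mean(res(m).^2));            % non-standardised version
d = sqrt(diag(obs));                    % observed standard deviations
res_std = res ./ (d * d');              % residuals on the correlation scale
SRMR = sqrt(mean(res_std(m).^2));       % standardised version
```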
The above calculation did not use the maximum likelihood formula for fitting, because of another problem: run ‘sem’ with this equation selected and observe the output. ML assumes a ‘positive definite’ predicted matrix, which can be tested for by attempting a Cholesky decomposition with the following code inserted into SEM_func():
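The inserted snippet is not reproduced above; a minimal version of such a check, intended to sit inside SEM_func() where ‘pred’ and the returned fit value ‘fit’ are defined, might look like this:

```matlab
% Minimal positive-definiteness test via Cholesky decomposition: the
% second output of chol is zero if and only if pred is positive definite.
[R, notpd] = chol(pred);
if notpd ~= 0
    % Penalise heavily so the minimiser moves away from this region.
    fit = realmax;
    return
end
```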
This indicates that non-positive definite predicted matrices are being generated during attempted function minimisation, so the ML formula cannot be used. Assuming instead that a properly identified model is used, and that consistent minimum function values with unique associated path values are obtained, a set of optimum paths can be derived for each subject taking part in a study. In the case of a clinical fMRI study, these can be treated as summary statistics from a first-level analysis and entered into a second-level analysis, using e.g. t-tests to compare a group of patients with a group of controls. The result would comprise a random effects analysis.
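As a sketch of such a second-level analysis (the per-subject path estimates below are hypothetical numbers, not results from the example):

```matlab
% Hypothetical second-level (random effects) comparison of one optimum
% path value across groups, using a two-sample t-test (Statistics Toolbox).
a_controls = [0.41 0.38 0.45 0.40 0.43];    % illustrative per-subject estimates
a_patients = [0.22 0.30 0.19 0.27 0.25];
[h, pval] = ttest2(a_controls, a_patients); % h == 1: groups differ at 5% level
```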
- It seems useful to attempt an algebraic solution in the first instance, to determine if an anatomical-functional model for an fMRI study is identified. This can be done using standard computer algebra programs such as Macsyma.
- Assuming the model is identified from a theoretical perspective, a numerical solution of the SEM can then be attempted, re-running the fit multiple times and checking that consistent minima of the fitting function are obtained and that the associated path values are unique. Alternatively, the algebraic solutions can be used directly; however, empirical under-identification can still occur, and the model may fit the data poorly.
- It is important to obtain a good fit between the path model and the data, since if the fit is poor, the model does not account for the data.
P5 Miscellaneous issues
From the perspective of the general linear model (GLM), analysis of variance and covariance (ANOVA and ANCOVA) are special cases of multiple regression and correlation (Keppel and Zedeck 1989). ‘Dummy variables’ can be used as the independent variables in a regression analysis to implement a t-test or ANOVA; ANOVA itself is just a generalisation of the t-test to more than two groups. Other techniques such as Pearson’s correlation are also specific cases of the GLM. Multiple regression analysis has an unspecified number of independent variables but only one dependent variable; the generalisation to an unspecified number of dependent variables is termed canonical correlation analysis (CCA). CCA has been referred to as a general parametric significance-testing system, since simple correlation, the t-test for independent samples, ANOVA, factorial ANOVA, ANCOVA, the t-test for correlated samples, discriminant analysis and even a chi-squared test of independence are special cases of CCA (Knapp 1978). Multivariate analysis of variance (MANOVA) is also a special case of CCA, using dummy variables as the independent variables, analogous to implementing ANOVA with linear regression (Zinkgraf 1983). Multivariate analysis of covariance (MANCOVA) is similarly analogous to ANCOVA from the perspective of the multivariate GLM.

However, there is nothing in the notion of effective connectivity which requires a SEM approach; the study of causal inference is a subject in itself (Pearl 2000). If, despite such caveats, SEM is still of interest, it is best to keep the model as simple as possible, since this will often make it easier to find a model that fits (explains) the data well. Additionally, simpler models are easier to solve algebraically.

I am
interested to know if you disagree with anything, find errors in the above discussion or Matlab code (sem.m, SEM_func.m, asamin.dll), or find the code and discussion useful. Please cite [Steele J.D.

The method of image analysis described by Dr Meyer-Lindenberg (Stein et al, 2007, NeuroImage, 36, 736-45) was developed using the above code as a starting point.
Bagozzi, R. P., C. Fornell, et al. (1981). "Canonical correlation analysis as a special case of a structural relations model." Multivariate Behavioral Research.
Friston, K. (2002). "Beyond Phrenology: What Can Neuroimaging Tell Us About Distributed Circuitry?" Annu Rev Neurosci.
Good, P. (1994). Permutation Tests. London, Springer-Verlag.
Keppel, G. and S. Zedeck (1989). Data Analysis for Research Designs. Berkeley, Freeman.
Knapp, T. R. (1978). "Canonical correlation analysis: a general parametric significance-testing system." Psychological Bulletin.
Lawley, D. N. and A. E. Maxwell (1971). Factor Analysis as a Statistical Method. London, Butterworths.
McArdle, J. J. and R. P. McDonald (1984). "Some algebraic properties of the Reticular Action Model for moment structures." British Journal of Mathematical and Statistical Psychology.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, Cambridge University Press.
Tabachnick, B. G. and L. S. Fidell (1996). Using Multivariate Statistics. New York, HarperCollins Publishers Inc.
Zinkgraf, S. (1983). "Performing factorial multivariate analysis of variance using canonical correlation analysis." Educational and Psychological Measurement.
© J.D. Steele 2014 |