Motivate the use of path and CFA models; argue how they connect to other models that we covered in the course.
Calculate the number of free parameters and degrees of freedom of the proposed model.
Build a model in the R statistical environment, estimate it, and interpret the coefficients.
Criticise, modify, compare, and evaluate the fit of the proposed models.
General framework that uses various models to test relationships among variables
Other terms: covariance structure analysis, covariance structure modelling, causal modelling
Sewell Wright - "mathematical tool for drawing causal conclusions from a combination of observational data and theoretical assumptions"
Waves:
SEM is a general modelling framework that is composed of a measurement model and a structural model.
Judea Pearl - The Causal Foundations of Structural Equation Modeling
Measurement model focuses on the estimation of latent or composite variables
Structural model focuses on the estimation of relations between manifest and/or latent variables in the model (path model)
Terminology:
Manifest variables: observed/collected variables
Latent variables: inferred measures - hypothetical constructs
Endogenous variables: dependent outcomes
Exogenous variables: predictors
Focus on covariance structure instead of mean
A model that tests relationships between a set of variables, often arranged in some sort of structural form.
A common focus of the path model is the estimation of mediation between X and Y.
Previous findings show that the development of cognitive abilities in people depends on a range of factors in infancy and early childhood. General mental/cognitive abilities (e.g. reading or drawing), varied nutrition, physical exercise, and social engagement have been shown to influence the level of cognitive abilities. Based on some of these studies, researchers postulate that social engagement is a mediating factor between the behavioural factors and the development of cognitive abilities.
Representation of our hypothetical assumptions in the form of the structural equation model
Total number of parameters that we can estimate: $\frac{\text{variables} \times (\text{variables} + 1)}{2}$
Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
Variable | Nutrition | PhyExer | GMA | SocialBeh | CognitiveAb |
---|---|---|---|---|---|
Nutrition | 45.6689837 | NA | NA | NA | NA |
PhyExer | -10.1006752 | 2652.9074 | NA | NA | NA |
GMA | 0.5641485 | -249.3049 | 2478.2889 | NA | NA |
SocialBeh | -11.6168733 | 3417.8681 | -506.1066 | 9988.898 | NA |
CognitiveAb | 210.6731970 | 48916.6339 | 1254.2100 | 94358.621 | 1125746 |
How many degrees of freedom do we have without the model?
Number of observations (total number of parameters) = 15
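Empty model = variances and covariances only, no regression pathways: 3 variances and 3 covariances of the predictors plus 2 variances of the outcomes = 8 parameters
Degrees of freedom (df) = 15 - 8 = 7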
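The counting above can be reproduced in a few lines of R. This is a minimal sketch: the split of the 8 empty-model parameters into predictor variances/covariances and outcome variances is an assumption made for illustration, and all object names are hypothetical helpers.

```r
# Number of observed variables in the covariance matrix
p <- 5

# Unique variances and covariances = "observations" available to the model
n_obs <- p * (p + 1) / 2

# Assumed make-up of the empty model (illustrative split, see the note above)
k_exo <- 3                              # Nutrition, PhyExer, GMA
n_exo_par <- k_exo * (k_exo + 1) / 2    # 3 variances + 3 covariances
n_endo_var <- p - k_exo                 # variances of SocialBeh and CognitiveAb

n_empty <- n_exo_par + n_endo_var
df_empty <- n_obs - n_empty

c(observations = n_obs, empty_parameters = n_empty, df = df_empty)
```

The same three numbers (15, 8, 7) appear in the slide, so the sketch is only a way of making the bookkeeping explicit.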
Most of the time (CFA model or other software): degrees of freedom for the null model = $\frac{\text{variables} \times (\text{variables} + 1)}{2} - \text{variables}$
Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
Matrix[lower.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
Variable | Nutrition | PhyExer | GMA | SocialBeh | CognitiveAb |
---|---|---|---|---|---|
Nutrition | 45.66898 | NA | NA | NA | NA |
PhyExer | NA | 2652.907 | NA | NA | NA |
GMA | NA | NA | 2478.289 | NA | NA |
SocialBeh | NA | NA | NA | 9988.898 | NA |
CognitiveAb | NA | NA | NA | NA | 1125746 |
Free parameters = variances + covariances + regression pathways = 14
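As a rough check of the count of 14, the free parameters can be tallied from the list of regression pathways. The sketch below assumes the model has the two regressions used throughout these slides; the `paths` list and the other object names are hypothetical and exist only for this illustration.

```r
# Regression pathways of the proposed model, one entry per outcome
paths <- list(
  SocialBeh   = c("Nutrition", "PhyExer", "GMA"),
  CognitiveAb = c("SocialBeh", "Nutrition", "GMA")
)

n_paths <- sum(lengths(paths))   # regression coefficients
n_resid <- length(paths)         # one residual variance per outcome

# Exogenous variables: predictors that are never an outcome
exo <- setdiff(unlist(paths), names(paths))
n_exo_par <- length(exo) * (length(exo) + 1) / 2   # their variances + covariances

free_parameters <- n_paths + n_resid + n_exo_par

# Observations: unique elements of the covariance matrix
p <- length(union(unlist(paths), names(paths)))
df_model <- p * (p + 1) / 2 - free_parameters

c(free_parameters = free_parameters, df = df_model)
```

With 15 observations and 14 free parameters, a single degree of freedom is left for testing the model.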
modelAbility<-'
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA
'
library(lavaan) # provides sem()
fit1<-sem(modelAbility, data=Babies)
summary(fit1)
## lavaan 0.6.15 ended normally after 1 iteration
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030     1.105    0.027    0.978
##     PhyExer           1.281     0.146    8.796    0.000
##     GMA              -0.075     0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh         9.579     0.469   20.428    0.000
##     Nutrition         7.019     6.899    1.017    0.309
##     GMA               2.461     0.941    2.614    0.009
##
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809   780.053    7.071    0.000
##    .CognitiveAb  215129.001 30423.835    7.071    0.000
Chi-square test: a measure of how well the model-implied covariance matrix fits the data covariance matrix
We would prefer not to reject the null hypothesis in this case
Assumptions:
Multivariate normality
N is sufficiently large (150+)
Parameters are not at boundary or invalid (e.g. variance of zero)
With large samples it is sensitive to small misfits
Non-normality induces bias
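To make the test concrete, the p-value can be recomputed from the test statistic and degrees of freedom printed by `summary(fit1)`. This is only a sketch that copies those two numbers by hand; the 5% significance level is an assumption chosen for illustration.

```r
# Values copied from the "Model Test User Model" block of the output
chisq <- 215.236
df <- 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

# Critical value at the (assumed) 5% level
critical <- qchisq(0.95, df = df)

# TRUE means we reject the hypothesis that the model-implied
# covariance matrix equals the population covariance matrix
reject <- chisq > critical
c(p_value = p_value, critical = critical, reject = reject)
```

Here the statistic is far above the critical value, so the model is rejected, which is the opposite of what we would like to see.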
summary(fit1, fit.measures=TRUE)
## lavaan 0.6.15 ended normally after 1 iteration
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Model Test Baseline Model:
##
##   Test statistic                               438.108
##   Degrees of freedom                                 7
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.503
##   Tucker-Lewis Index (TLI)                      -2.479
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)              -1328.506
##   Loglikelihood unrestricted model (H1)      -1220.888
##
##   Akaike (AIC)                                2673.012
##   Bayesian (BIC)                              2693.853
##   Sample-size adjusted Bayesian (SABIC)       2668.587
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          1.464
##   90 Percent confidence interval - lower         1.303
##   90 Percent confidence interval - upper         1.632
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    1.000
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.080
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030     1.105    0.027    0.978
##     PhyExer           1.281     0.146    8.796    0.000
##     GMA              -0.075     0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh         9.579     0.469   20.428    0.000
##     Nutrition         7.019     6.899    1.017    0.309
##     GMA               2.461     0.941    2.614    0.009
##
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809   780.053    7.071    0.000
##    .CognitiveAb  215129.001 30423.835    7.071    0.000
TLI: a value of .95 indicates that the fitted model improves the fit by 95% relative to the null model; works OK with smaller sample sizes
CFI: same as TLI, but not very sensitive to sample size
RMSEA: based on the difference between the sample covariance matrix and the hypothesised model. If the variables are on different scales it is hard to interpret; in that case we can check the standardised root mean square residual (SRMR)
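The indices can be recomputed by hand from the user-model and baseline-model chi-square statistics printed by `summary(fit1, fit.measures=TRUE)`. The formulas below are the commonly cited definitions and are an assumption about what the software does internally; in particular, using N rather than N - 1 in the RMSEA denominator is a choice made here because it reproduces the printed value.

```r
# Values copied from the summary(fit1, fit.measures=TRUE) output
chisq_m <- 215.236; df_m <- 1   # user model
chisq_b <- 438.108; df_b <- 7   # baseline (null) model
n <- 100                        # number of observations

# CFI: 1 - (misfit of the user model / misfit of the baseline model)
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_m - df_m, chisq_b - df_b, 0)

# TLI: compares chi-square/df ratios; not bounded between 0 and 1
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * n))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

The results agree with the printed CFI (0.503), TLI (-2.479) and RMSEA (1.464) to three decimals. The negative TLI shows that, unlike the CFI, it can fall outside the 0 to 1 range when the model fits worse per degree of freedom than the baseline.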
Add/take out theoretical pathways:
modelAbility2<-'
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA+PhyExer
'
fit2<-sem(modelAbility2, data=Babies)
summary(fit2, fit.measures=TRUE)
## lavaan 0.6.15 ended normally after 1 iteration
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Model Test Baseline Model:
##
##   Test statistic                               438.108
##   Degrees of freedom                                 7
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)              -1220.888
##   Loglikelihood unrestricted model (H1)      -1220.888
##
##   Akaike (AIC)                                2459.776
##   Bayesian (BIC)                              2483.222
##   Sample-size adjusted Bayesian (SABIC)       2454.798
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value H_0: RMSEA <= 0.050                       NA
##   P-value H_0: RMSEA >= 0.080                       NA
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030     1.105    0.027    0.978
##     PhyExer           1.281     0.146    8.796    0.000
##     GMA              -0.075     0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh         5.701     0.213   26.781    0.000
##     Nutrition         8.548     2.352    3.634    0.000
##     GMA               2.814     0.321    8.764    0.000
##     PhyExer          11.390     0.413   27.577    0.000
##
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809   780.053    7.071    0.000
##    .CognitiveAb   24999.990  3535.532    7.071    0.000
lavTestLRT(fit1,fit2)
##
## Chi-Squared Difference Test
##
##      Df    AIC    BIC  Chisq Chisq diff  RMSEA Df diff Pr(>Chisq)
## fit2  0 2459.8 2483.2   0.00
## fit1  1 2673.0 2693.8 215.24     215.24 1.4637       1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
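The difference test and the information criteria can be reconstructed from the log-likelihoods and parameter counts printed in the two model summaries. The sketch below copies those numbers by hand and uses the textbook definitions of AIC and BIC; the helper functions `aic()` and `bic()` are hypothetical names for this illustration and are not part of lavaan.

```r
# Values copied from the summaries of fit1 and fit2
ll_fit1 <- -1328.506; k_fit1 <- 8   # log-likelihood, number of model parameters
ll_fit2 <- -1220.888; k_fit2 <- 9
n <- 100

# Likelihood ratio (chi-square difference) test for the nested models
chisq_diff <- 2 * (ll_fit2 - ll_fit1)
df_diff <- k_fit2 - k_fit1
p_diff <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)

# Information criteria: lower values indicate the preferred model
aic <- function(ll, k) -2 * ll + 2 * k
bic <- function(ll, k, n) -2 * ll + k * log(n)

round(c(chisq_diff = chisq_diff, df_diff = df_diff), 3)
round(c(AIC_fit1 = aic(ll_fit1, k_fit1), AIC_fit2 = aic(ll_fit2, k_fit2)), 3)
round(c(BIC_fit1 = bic(ll_fit1, k_fit1, n), BIC_fit2 = bic(ll_fit2, k_fit2, n)), 3)
```

Both criteria and the difference test point in the same direction: adding the pathway from PhyExer to CognitiveAb improves the model substantially.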
modindices(fit1, sort=TRUE)
##            lhs op         rhs     mi    epc sepc.lv sepc.all sepc.nox
## 15   SocialBeh  ~ CognitiveAb 88.379 -0.228  -0.228   -2.420   -2.420
## 16 CognitiveAb  ~     PhyExer 88.379 11.390  11.390    0.553    0.011
## 22     PhyExer  ~ CognitiveAb 82.143  0.128   0.128    2.635    2.635
## 26         GMA  ~ CognitiveAb  1.601  0.025   0.025    0.529    0.529
## 18   Nutrition  ~ CognitiveAb  1.002  0.007   0.007    1.114    1.114
## 21     PhyExer  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
## 20   Nutrition  ~         GMA  0.000  0.000   0.000    0.000    0.000
## 19   Nutrition  ~     PhyExer  0.000  0.000   0.000    0.000    0.000
## 24     PhyExer  ~         GMA  0.000  0.000   0.000    0.000    0.000
## 28         GMA  ~     PhyExer  0.000  0.000   0.000    0.000    0.000
## 23     PhyExer  ~   Nutrition  0.000  0.000   0.000    0.000    0.000
## 25         GMA  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
## 17   Nutrition  ~   SocialBeh  0.000  0.000   0.000    0.000    0.000
Direct effect (c): subgroups/cases that differ by one unit on X, but are equal on M are estimated to differ by c units on Y.
Indirect effect:
a) X -> M: cases that differ by one unit in X are estimated to differ by a units on M
b) M -> Y: cases that differ by one unit in M, but are equal on X, are estimated to differ by b units on Y
The indirect effect of X on Y through M is a product of a and b. The two cases that differ by one unit on X are estimated to differ by ab units on Y as a result of the effect of X on M which affects Y.
modelAbilityPath<-'
SocialBeh~Nutrition+a*PhyExer+GMA
CognitiveAb~b*SocialBeh+c*PhyExer+GMA
indirect := a*b
direct := c
total := indirect + direct
'
fitPath<-sem(modelAbilityPath, data=Babies)
summary(fitPath)
## lavaan 0.6.15 ended normally after 1 iteration
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                12.401
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030     1.105    0.027    0.978
##     PhyExer    (a)    1.281     0.146    8.796    0.000
##     GMA              -0.075     0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh  (b)    5.704     0.227   25.180    0.000
##     PhyExer    (c)   11.355     0.439   25.846    0.000
##     GMA               2.813     0.342    8.233    0.000
##
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809   780.053    7.071    0.000
##    .CognitiveAb   28300.616  4002.312    7.071    0.000
##
## Defined Parameters:
##                    Estimate   Std.Err  z-value  P(>|z|)
##     indirect          7.308     0.880    8.304    0.000
##     direct           11.355     0.439   25.846    0.000
##     total            18.664     0.894   20.879    0.000
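The defined parameters can be checked by hand from the labelled coefficients a, b and c. The sketch below copies the rounded estimates from the output, so small rounding differences are expected. The standard error uses the first-order delta-method (Sobel) formula under the assumption that the estimates of a and b are uncorrelated; this is an illustrative approximation, not necessarily the exact computation performed by the software.

```r
# Estimates and standard errors copied from the summary(fitPath) output
a <- 1.281; se_a <- 0.146    # PhyExer -> SocialBeh
b <- 5.704; se_b <- 0.227    # SocialBeh -> CognitiveAb
c_direct <- 11.355           # PhyExer -> CognitiveAb (direct effect)

# Indirect effect is the product of the two paths; total adds the direct path
indirect <- a * b
total <- indirect + c_direct

# First-order delta-method (Sobel) standard error,
# assuming the estimates of a and b are uncorrelated
se_indirect <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)
z_indirect <- indirect / se_indirect

round(c(indirect = indirect, total = total,
        se = se_indirect, z = z_indirect), 3)
```

In words: two cases that differ by one unit of PhyExer are estimated to differ by roughly 7.3 units of CognitiveAb through SocialBeh, on top of the direct difference of about 11.4 units.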
An interaction between predictors can be included in the same way as in a linear regression model, by using the colon (:) operator.
modelAbilityInteraction<-'
SocialBeh~Nutrition+PhyExer+GMA+PhyExer:GMA
CognitiveAb~SocialBeh+Nutrition+GMA
'
Theory: strong theoretical assumptions are needed to derive causal hypotheses, which can then be tested with the data through the specification of the model.
Data: large samples; N:p rule of 20:1; more data usually gives better estimates.
Variables are causally dependent if there is an arrow between them
They are causally independent if there are no arrows between them
X1 is causally independent from Y2 conditional on Y1
piecewiseSEM performs a test of directed separation (d-sep): it asks whether causally independent paths are significant when controlling for the variables on which the causal process is conditional.
#install.packages('piecewiseSEM')
library(piecewiseSEM)
model1<-psem(
  lm(SocialBeh~Nutrition+PhyExer+GMA, data=Babies),
  lm(CognitiveAb~SocialBeh+Nutrition+GMA, data=Babies)
)
summary(model1, .progressBar=FALSE)
##
## Structural Equation Model of model1
##
## Call:
##   SocialBeh ~ Nutrition + PhyExer + GMA
##   CognitiveAb ~ SocialBeh + Nutrition + GMA
##
##     AIC      BIC
##  229.364   255.416
##
## ---
## Tests of directed separation:
##
##                Independ.Claim Test.Type DF Crit.Value P.Value
##   CognitiveAb ~ PhyExer + ...      coef 95    26.8792       0 ***
##
## Global goodness-of-fit:
##
##   Fisher's C = 209.364 with P-value = 0 and on 2 degrees of freedom
##
## ---
## Coefficients:
##
##      Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
##     SocialBeh Nutrition   0.0300    1.1278 96     0.0266  0.9789       0.0020
##     SocialBeh   PhyExer   1.2814    0.1487 96     8.6187  0.0000       0.6604 ***
##     SocialBeh       GMA  -0.0753    0.1538 96    -0.4899  0.6253      -0.0375
##   CognitiveAb SocialBeh   9.5792    0.4786 96    20.0156  0.0000       0.9023 ***
##   CognitiveAb Nutrition   7.0193    7.0413 96     0.9969  0.3213       0.0447
##   CognitiveAb       GMA   2.4607    0.9607 96     2.5614  0.0120       0.1155   *
##
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
##
## ---
## Individual R-squared:
##
##      Response method R.squared
##     SocialBeh   none      0.44
##   CognitiveAb   none      0.81
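The logic of the d-sep test can be sketched in base R without the package. The example below uses simulated data (not the Babies data) with hypothetical variables `x`, `m` and `y`. The hypothesised model omits the path from `x` to `y`, which implies the independence claim that `y` is unrelated to `x` once `m` is controlled for. Fisher's C then combines the p-values of all such claims as C = -2 * sum(log(p)), compared against a chi-square distribution with 2 degrees of freedom per claim.

```r
set.seed(1)
n <- 200

# Simulated data: x affects y both through m and directly
x <- rnorm(n)
m <- 0.6 * x + rnorm(n)
y <- 0.5 * m + 1.0 * x + rnorm(n)

# Hypothesised model: x -> m -> y, with no direct path x -> y.
# Implied independence claim: y is independent of x, given m.
claim <- lm(y ~ m + x)
p_claim <- summary(claim)$coefficients["x", "Pr(>|t|)"]

# Fisher's C combines the p-values of all independence claims
p_values <- c(p_claim)              # only one claim in this small model
fisher_c <- -2 * sum(log(p_values))
df_c <- 2 * length(p_values)
p_c <- pchisq(fisher_c, df = df_c, lower.tail = FALSE)

c(fisher_c = fisher_c, df = df_c, p = p_c)
```

A small p-value for Fisher's C means that at least one independence claim is violated, so a missing path (here from `x` to `y`) should be reconsidered. This mirrors the output above, where the single claim involving CognitiveAb and PhyExer is rejected.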
Chapters 1 to 5 of Principles and Practice of Structural Equation Modeling by Rex B. Kline
Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach by Andrew F. Hayes
Latent Variable Modeling Using R: A Step-by-Step Guide by A. Alexander Beaujean