
Lecture 3: Path models (Structural Equation Modelling)

Dr Nemanja Vaci

1 / 46

Press record

2 / 46

Corrections from the previous lecture

3 / 46

Intended learning outcomes

Motivate the use of path and CFA models; argue how they connect to other models that we covered in the course.

Calculate the number of free parameters and degrees of freedom of the proposed model.

Build a model in the R statistical environment, estimate it, and interpret the coefficients.

Criticise, modify, compare, and evaluate the fit of the proposed models.

4 / 46

Structural equation modelling (SEM)

General framework that uses various models to test relationships among variables

Other terms: covariance structure analysis, covariance structure modelling, causal modelling

Sewall Wright - "mathematical tool for drawing causal conclusions from a combination of observational data and theoretical assumptions"

Waves:

  1. Causal modelling through path models
  2. Latent structures - factor analysis
  3. Structural causal models


SEM is a general modelling framework that is composed of a measurement model and a structural model.

5 / 46

Judea Pearl - The Causal Foundations of Structural Equation Modeling

Measurement model focuses on the estimation of latent or composite variables
Structural model focuses on the estimation of relations between manifest and/or latent variables in the model (path model)

Terminology:

Manifest variables: observed/collected variables

Latent variables: inferred measures - hypothetical constructs

  • Indicator variables: measures used to infer the latent concepts

Endogenous variables: dependent outcomes

Exogenous variables: predictors


Focus on covariance structure instead of mean

Structural part of the model (path analysis)

A model that tests relationships among a set of variables, often arranged in some sort of structural form.

A common focus of the path model is the estimation of mediation between X and Y.


6 / 46


First step: Specification of the model

Previous findings show that the development of cognitive abilities in people depends on a range of factors in infancy and early childhood. General mental/cognitive abilities (e.g. reading or drawing), varied nutrition, physical exercise, and social engagement have been shown to influence the level of cognitive abilities. Based on some of these studies, researchers postulate that social engagement is a mediating factor between the behavioural factors and the development of cognitive abilities.


7 / 46

Representation of our hypothetical assumptions in the form of the structural equation model

Can the model be estimated?

Total number of parameters that we can estimate: $\frac{\text{variables} \times (\text{variables} + 1)}{2}$
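As a quick check of this count, the sketch below computes the number of unique variances and covariances for a given number of observed variables. The helper name `n_unique_moments` is an assumption made for illustration; it is not part of the lecture code or of any package.

```r
# Number of unique elements (variances + covariances) in a p x p covariance matrix.
# n_unique_moments is a hypothetical helper name, not a lavaan function.
n_unique_moments <- function(p) {
  p * (p + 1) / 2
}

# Five observed variables, as in the Babies example
n_unique_moments(5)

# The same count from the lower triangle (including the diagonal) of a 5 x 5 matrix
sum(lower.tri(diag(5), diag = TRUE))
```

Both calls give 15 for five variables, which is the number of non-missing cells in the lower-triangular covariance matrix.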



8 / 46

Number of observations

Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
             Nutrition      PhyExer       GMA          SocialBeh    CognitiveAb
Nutrition    45.6689837     NA            NA           NA           NA
PhyExer      -10.1006752    2652.9074     NA           NA           NA
GMA          0.5641485      -249.3049     2478.2889    NA           NA
SocialBeh    -11.6168733    3417.8681     -506.1066    9988.898     NA
CognitiveAb  210.6731970    48916.6339    1254.2100    94358.621    1125746
9 / 46

How many parameters are we estimating (path model)?

How many degrees of freedom do we have without the model?


Number of observations (total number of parameters) = 15
Empty model = variances and covariances = 8 parameters
Degrees of freedom (df) = 15 - 8 = 7

10 / 46

Most of the time (CFA model or other software): degrees of freedom for the null model = $\frac{\text{variables} \times (\text{variables} + 1)}{2} - \text{variables}$

Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
Matrix[lower.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
             Nutrition    PhyExer     GMA         SocialBeh    CognitiveAb
Nutrition    45.66898     NA          NA          NA           NA
PhyExer      NA           2652.907    NA          NA           NA
GMA          NA           NA          2478.289    NA           NA
SocialBeh    NA           NA          NA          9988.898     NA
CognitiveAb  NA           NA          NA          NA           1125746
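For the five variables used here, the null-model formula can be worked through as follows. This is a sketch that assumes a null model estimating only the five variances, as in the diagonal matrix shown here.

```latex
\[
df_{\text{null}} = \frac{5 \times (5 + 1)}{2} - 5 = 15 - 5 = 10
\]
```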

How many parameters (our model)?


Free parameters = variances + covariances + regression pathways = 14

11 / 46

Second step: model identification

  1. Under-identified: more free parameters than total possible parameters

  2. Just-identified: equal number of free parameters and total possible parameters

  3. Over-identified: fewer free parameters than total possible parameters


    Parameters can either be: free, fixed or constrained
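The three cases can be sketched as a small counting function. The helper name `classify_identification` and its arguments are assumptions made for illustration. This counting rule is a necessary condition only, so it does not by itself guarantee that a model can be estimated.

```r
# Classify a model by comparing its free parameters with the number of unique
# variances and covariances. classify_identification is a hypothetical helper.
classify_identification <- function(n_variables, n_free) {
  n_total <- n_variables * (n_variables + 1) / 2
  df <- n_total - n_free
  status <- if (df < 0) {
    "under-identified"
  } else if (df == 0) {
    "just-identified"
  } else {
    "over-identified"
  }
  list(total = n_total, free = n_free, df = df, status = status)
}

# Path model from the lecture: 14 free parameters among 5 variables
classify_identification(n_variables = 5, n_free = 14)$status

# One additional regression path uses up the remaining degree of freedom
classify_identification(n_variables = 5, n_free = 15)$status
```

With 14 free parameters the model is over-identified with one degree of freedom; with 15 it becomes just-identified.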
12 / 46

Third step: estimation of the model

modelAbility<-'
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA
'
library(lavaan) # provides sem()
fit1<-sem(modelAbility, data=Babies) # estimate the path model
summary(fit1)
## lavaan 0.6-9 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 100
##
## Model Test User Model:
##
## Test statistic 215.236
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## SocialBeh ~
## Nutrition 0.030 1.105 0.027 0.978
## PhyExer 1.281 0.146 8.796 0.000
## GMA -0.075 0.151 -0.500 0.617
## CognitiveAb ~
## SocialBeh 9.579 0.469 20.428 0.000
## Nutrition 7.019 6.899 1.017 0.309
## GMA 2.461 0.941 2.614 0.009
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .SocialBeh 5515.809 780.053 7.071 0.000
## .CognitiveAb 215129.015 30423.837 7.071 0.000
13 / 46

Fourth step: model evaluation

Chi-square test: a measure of how well the model-implied covariance matrix fits the observed covariance matrix

We would prefer not to reject the null hypothesis in this case

Assumptions:
Multivariate normality
N is sufficiently large (150+)
Parameters are not at boundary or invalid (e.g. variance of zero)


With large samples, it is sensitive to small misfits
Non-normality induces bias
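The p-value of the chi-square test can be recovered from the test statistic and its degrees of freedom. The sketch below uses base R's `pchisq()` with the statistic of the first path model (215.236 on 1 df) and, for contrast, a much smaller statistic (0.755 on 1 df, the value reported for a later model in this lecture). The function and variable names are illustrative.

```r
# Upper-tail probability of a chi-square statistic: P(X >= statistic) under the null
chisq_p_value <- function(statistic, df) {
  pchisq(statistic, df = df, lower.tail = FALSE)
}

# First path model: test statistic 215.236 on 1 degree of freedom
p_babies <- chisq_p_value(215.236, df = 1)

# A much smaller statistic (0.755 on 1 df) gives a non-significant result
p_small <- chisq_p_value(0.755, df = 1)

round(c(p_babies, p_small), 3)
```

A small p-value means that the model-implied and observed covariance matrices differ by more than sampling error would suggest, which is why we would prefer not to reject the null hypothesis here.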

14 / 46

Other fit indices

summary(fit1, fit.measures=TRUE)
## lavaan 0.6-9 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 100
##
## Model Test User Model:
##
## Test statistic 215.236
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 438.108
## Degrees of freedom 7
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.503
## Tucker-Lewis Index (TLI) -2.479
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1328.506
## Loglikelihood unrestricted model (H1) -1220.888
##
## Akaike (AIC) 2673.012
## Bayesian (BIC) 2693.853
## Sample-size adjusted Bayesian (BIC) 2668.587
##
## Root Mean Square Error of Approximation:
##
## RMSEA 1.464
## 90 Percent confidence interval - lower 1.303
## 90 Percent confidence interval - upper 1.632
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.080
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## SocialBeh ~
## Nutrition 0.030 1.105 0.027 0.978
## PhyExer 1.281 0.146 8.796 0.000
## GMA -0.075 0.151 -0.500 0.617
## CognitiveAb ~
## SocialBeh 9.579 0.469 20.428 0.000
## Nutrition 7.019 6.899 1.017 0.309
## GMA 2.461 0.941 2.614 0.009
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .SocialBeh 5515.809 780.053 7.071 0.000
## .CognitiveAb 215129.015 30423.837 7.071 0.000
15 / 46

Other fit indices

16 / 46

TLI: a value of .95 indicates that the fitted model improves the fit by 95% relative to the null model; works OK with smaller sample sizes

CFI: same as TLI, but not very sensitive to sample size

RMSEA: difference between the residuals of the sample covariance matrix and the hypothesised model. If the variables are on different scales it is hard to interpret; in that case we can check the standardised root mean square residual (SRMR)
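To make these indices more concrete, the sketch below recomputes CFI, TLI and RMSEA from the chi-square statistics of the user and baseline models. The formulas are the ones commonly given for maximum likelihood estimation. The function name `fit_indices` is an assumption, the code assumes the model has at least one degree of freedom, and RMSEA is written with N in the denominator (some texts use N - 1).

```r
# Fit indices computed from the chi-square statistics of the user model (m)
# and the baseline model (b). fit_indices is a hypothetical helper name.
fit_indices <- function(chisq_m, df_m, chisq_b, df_b, n) {
  # Non-centrality estimates, truncated at zero
  d_m <- max(chisq_m - df_m, 0)
  d_b <- max(chisq_b - df_b, 0)
  cfi <- 1 - d_m / max(d_m, d_b)
  tli <- ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)
  rmsea <- sqrt(d_m / (df_m * n))
  c(CFI = cfi, TLI = tli, RMSEA = rmsea)
}

# Values taken from the summary(fit1, fit.measures=TRUE) output
round(fit_indices(chisq_m = 215.236, df_m = 1,
                  chisq_b = 438.108, df_b = 7, n = 100), 3)
```

With these inputs the function gives CFI = 0.503, TLI = -2.479 and RMSEA = 1.464, the values printed for the first model.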

Model modification

Add/take out theoretical pathways:

modelAbility2<-'
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA+PhyExer
'
fit2<-sem(modelAbility2, data=Babies)
summary(fit2, fit.measures=TRUE)
## lavaan 0.6-9 ended normally after 56 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 100
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 438.108
## Degrees of freedom 7
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1220.888
## Loglikelihood unrestricted model (H1) -1220.888
##
## Akaike (AIC) 2459.776
## Bayesian (BIC) 2483.222
## Sample-size adjusted Bayesian (BIC) 2454.798
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value RMSEA <= 0.05 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## SocialBeh ~
## Nutrition 0.030 1.105 0.027 0.978
## PhyExer 1.281 0.146 8.796 0.000
## GMA -0.075 0.151 -0.500 0.617
## CognitiveAb ~
## SocialBeh 5.701 0.213 26.781 0.000
## Nutrition 8.548 2.352 3.634 0.000
## GMA 2.814 0.321 8.764 0.000
## PhyExer 11.390 0.413 27.577 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .SocialBeh 5515.809 780.053 7.071 0.000
## .CognitiveAb 24999.990 3535.532 7.071 0.000
17 / 46

We can compare the models

lavTestLRT(fit1,fit2)
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit2 0 2459.8 2483.2 0.00
## fit1 1 2673.0 2693.8 215.24 215.24 1 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
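The quantities in this comparison can be recomputed by hand from the log-likelihoods and parameter counts printed in the two model summaries. The sketch below is only an illustration: the variable and function names are assumptions, and small differences from the printed values come from rounding of the log-likelihoods.

```r
# Log-likelihoods and numbers of parameters reported for the two models
loglik_fit1 <- -1328.506
k_fit1 <- 8
loglik_fit2 <- -1220.888
k_fit2 <- 9
n_obs <- 100

# Information criteria: smaller values indicate a better balance of fit and complexity
aic <- function(loglik, k) -2 * loglik + 2 * k
bic <- function(loglik, k, n) -2 * loglik + k * log(n)

aic_values <- c(fit1 = aic(loglik_fit1, k_fit1), fit2 = aic(loglik_fit2, k_fit2))
bic_values <- c(fit1 = bic(loglik_fit1, k_fit1, n_obs),
                fit2 = bic(loglik_fit2, k_fit2, n_obs))

# Likelihood-ratio (chi-square difference) statistic for the two nested models
lrt <- 2 * (loglik_fit2 - loglik_fit1)
lrt_p <- pchisq(lrt, df = k_fit2 - k_fit1, lower.tail = FALSE)

round(aic_values, 3)
round(bic_values, 3)
round(lrt, 3)
```

The likelihood-ratio statistic is 215.236 on 1 degree of freedom, the same value as the chi-square difference in the table.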
18 / 46

Or check modification indices

modindices(fit1, sort=TRUE)
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 17 CognitiveAb ~ PhyExer 88.379 11.390 11.390 0.553 0.011
## 16 SocialBeh ~ CognitiveAb 88.379 -0.228 -0.228 -2.420 -2.420
## 23 PhyExer ~ CognitiveAb 82.143 0.128 0.128 2.635 2.635
## 27 GMA ~ CognitiveAb 1.601 0.025 0.025 0.529 0.529
## 19 Nutrition ~ CognitiveAb 1.002 0.007 0.007 1.114 1.114
## 18 Nutrition ~ SocialBeh 0.000 0.000 0.000 0.000 0.000
## 22 PhyExer ~ SocialBeh 0.000 0.000 0.000 0.000 0.000
## 26 GMA ~ SocialBeh 0.000 0.000 0.000 0.000 0.000
## 21 Nutrition ~ GMA 0.000 0.000 0.000 0.000 0.000
## 25 PhyExer ~ GMA 0.000 0.000 0.000 0.000 0.000
## 13 PhyExer ~~ GMA 0.000 0.000 0.000 NA 0.000
## 29 GMA ~ PhyExer 0.000 0.000 0.000 0.000 0.000
## 10 Nutrition ~~ PhyExer 0.000 0.000 0.000 NA 0.000
## 24 PhyExer ~ Nutrition 0.000 0.000 0.000 0.000 0.000
## 20 Nutrition ~ PhyExer 0.000 0.000 0.000 0.000 0.000
19 / 46

Direct and indirect

Direct effect (c): subgroups/cases that differ by one unit on X, but are equal on M are estimated to differ by c units on Y.

Indirect effect:
a) X -> M: cases that differ by one unit in X are estimated to differ by a units on M
b) M -> Y: cases that differ by one unit in M, but are equal on X, are estimated to differ by b units on Y

The indirect effect of X on Y through M is the product of a and b. Two cases that differ by one unit on X are estimated to differ by ab units on Y as a result of the effect of X on M, which in turn affects Y.
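These definitions can be illustrated with two ordinary regressions. The sketch below uses simulated data (all variable names and effect sizes are hypothetical) and base R's `lm()` rather than an SEM package. It also checks that, for linear least-squares models, direct plus indirect effect equals the slope of Y regressed on X alone.

```r
set.seed(1)
n <- 500
# Simulated (hypothetical) data: X affects M, and both X and M affect Y
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.3 * x + 0.7 * m + rnorm(n)

# a: X -> M
a <- unname(coef(lm(m ~ x))["x"])

# b: M -> Y holding X constant; c_direct: X -> Y holding M constant
fit_y <- lm(y ~ x + m)
b <- unname(coef(fit_y)["m"])
c_direct <- unname(coef(fit_y)["x"])

indirect <- a * b
total <- c_direct + indirect

# Total effect estimated directly, without the mediator in the model
total_from_lm <- unname(coef(lm(y ~ x))["x"])

c(indirect = indirect, direct = c_direct, total = total, total_from_lm = total_from_lm)
```

The last two entries agree up to numerical precision, which shows how the total effect decomposes into the direct and indirect parts.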

20 / 46

Direct and indirect

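modelAbilityPath<-'
# a: PhyExer -> SocialBeh
SocialBeh~Nutrition+a*PhyExer+GMA
# b: SocialBeh -> CognitiveAb; c labels the direct Nutrition -> CognitiveAb path
# (PhyExer has no direct path to CognitiveAb in this model)
CognitiveAb~b*SocialBeh+c*Nutrition+GMA
# := defines new parameters from the labelled coefficients
indirect := a*b
direct := c
total := indirect + direct
'
fitPath<-sem(modelAbilityPath, data=Babies)
summary(fitPath)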
## lavaan 0.6-9 ended normally after 46 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 100
##
## Model Test User Model:
##
## Test statistic 215.236
## Degrees of freedom 1
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## SocialBeh ~
## Nutrition 0.030 1.105 0.027 0.978
## PhyExer (a) 1.281 0.146 8.796 0.000
## GMA -0.075 0.151 -0.500 0.617
## CognitiveAb ~
## SocialBeh (b) 9.579 0.469 20.428 0.000
## Nutrition (c) 7.019 6.899 1.017 0.309
## GMA 2.461 0.941 2.614 0.009
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .SocialBeh 5515.809 780.053 7.071 0.000
## .CognitiveAb 215129.015 30423.837 7.071 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## indirect 12.275 1.519 8.079 0.000
## direct 7.019 6.899 1.017 0.309
## total 19.294 7.074 2.727 0.006
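A standard way to obtain a standard error for a product such as a*b is the first-order delta method (the Sobel formula). The sketch below applies it to the rounded estimates printed above, assuming zero covariance between the two estimates. Because of that assumption and the rounding, the result is close to, but not identical with, the printed values.

```r
# Estimates and standard errors of paths a and b, copied from the output above
a <- 1.281
se_a <- 0.146
b <- 9.579
se_b <- 0.469

indirect <- a * b

# First-order delta-method (Sobel) standard error, assuming a and b are uncorrelated
se_indirect <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)
z_indirect <- indirect / se_indirect

round(c(estimate = indirect, se = se_indirect, z = z_indirect), 3)
```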
21 / 46

Interactions between predictors can be included, as in a linear regression model, by using the colon (:) operator.

modelAbilityInteraction<-'
SocialBeh~Nutrition+PhyExer+GMA+PhyExer:GMA
CognitiveAb~SocialBeh+Nutrition+GMA
'
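What the colon operator does can be seen in a plain linear regression. The sketch below uses simulated data (variable names and effect sizes are hypothetical) and shows that `phy:gma` gives the same coefficient as a product term computed by hand.

```r
set.seed(2)
n <- 200
# Simulated (hypothetical) predictors and an outcome with a true interaction
phy <- rnorm(n)
gma <- rnorm(n)
social <- 0.4 * phy + 0.2 * gma + 0.3 * phy * gma + rnorm(n)

# The ':' operator adds the product of the two predictors as an extra term
fit_colon <- lm(social ~ phy + gma + phy:gma)

# Equivalent model with the product computed by hand
phy_x_gma <- phy * gma
fit_manual <- lm(social ~ phy + gma + phy_x_gma)

unname(coef(fit_colon)["phy:gma"])
unname(coef(fit_manual)["phy_x_gma"])
```

Both fits return the same interaction coefficient, so the interaction is simply the slope of the product of the two predictors.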

Prerequisites

Theory: strong theoretical assumptions are needed to derive causal hypotheses that can be tested using the data and the specification of the model

Data: large samples; the N:p rule suggests a ratio of 20:1; more data usually gives better estimates.

  • We are not that interested in significance:

    a) The overall behaviour of the model is more interesting

    b) More data means a higher probability of significant results, even for weak effects

    c) Latent models are estimated by anchoring on indicator variables, different estimation can result in different patterns

22 / 46

Problems with SEM and alternatives

  1. Variables derived from the normal distribution
  2. Observations independent
  3. Large sample size
23 / 46

PiecewiseSEM

Variables are causally dependent if there is an arrow between them
They are causally independent if there are no arrows between them

X1 is causally independent from Y2 conditional on Y1

PiecewiseSEM performs a test of directional separation (d-sep) and asks whether causally independent paths are significant when controlling for variables on which causal process is conditional.

24 / 46

PiecewiseSEM

#install.packages('piecewiseSEM')
library(piecewiseSEM)
# psem() takes one fitted regression per endogenous variable
model1<-psem(lm(SocialBeh~Nutrition+PhyExer+GMA, data=Babies),
             lm(CognitiveAb~SocialBeh+Nutrition+GMA, data=Babies))
summary(model1, .progressBar=FALSE)
##
## Structural Equation Model of model1
##
## Call:
## SocialBeh ~ Nutrition + PhyExer + GMA
## CognitiveAb ~ SocialBeh + Nutrition + GMA
##
## AIC BIC
## 229.364 255.416
##
## ---
## Tests of directed separation:
##
## Independ.Claim Test.Type DF Crit.Value P.Value
## CognitiveAb ~ PhyExer + ... coef 95 26.8792 0 ***
##
## Global goodness-of-fit:
##
## Fisher's C = 209.364 with P-value = 0 and on 2 degrees of freedom
##
## ---
## Coefficients:
##
## Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
## SocialBeh Nutrition 0.0300 1.1278 96 0.0266 0.9789 0.0020
## SocialBeh PhyExer 1.2814 0.1487 96 8.6187 0.0000 0.6604 ***
## SocialBeh GMA -0.0753 0.1538 96 -0.4899 0.6253 -0.0375
## CognitiveAb SocialBeh 9.5792 0.4786 96 20.0156 0.0000 0.9023 ***
## CognitiveAb Nutrition 7.0193 7.0413 96 0.9969 0.3213 0.0447
## CognitiveAb GMA 2.4607 0.9607 96 2.5614 0.0120 0.1155 *
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
##
## ---
## Individual R-squared:
##
## Response method R.squared
## SocialBeh none 0.44
## CognitiveAb none 0.81
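The global goodness-of-fit statistic reported above, Fisher's C, combines the p-values of the k independence claims as C = -2 * sum(log(p)) and is compared with a chi-square distribution on 2k degrees of freedom. The sketch below is a minimal base R version. The function name `fisher_c` and the example p-values are hypothetical.

```r
# Fisher's C for a set of independence-claim p-values.
# fisher_c is a hypothetical helper, not a piecewiseSEM function.
fisher_c <- function(p_values) {
  c_stat <- -2 * sum(log(p_values))
  df <- 2 * length(p_values)
  c(C = c_stat, df = df, p = pchisq(c_stat, df = df, lower.tail = FALSE))
}

# Hypothetical p-values for three independence claims
fisher_c(c(0.40, 0.12, 0.75))

# With a single claim, the global p-value equals the p-value of that claim
fisher_c(0.03)
```

The model above has one independence claim, which is why the test is reported on 2 degrees of freedom.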
25 / 46

Practical aspect

26 / 46

Getting the data

NBAPath<-read.table('NBApath.txt', sep='\t', header=TRUE) # tab-separated file with a header row
27 / 46

What is in the data?

summary(NBAPath)
## TEAM PCT Player Pos
## Length:3810 Min. :0.1061 Length:3810 Length:3810
## Class :character 1st Qu.:0.3780 Class :character Class :character
## Mode :character Median :0.5000 Mode :character Mode :character
## Mean :0.4905
## 3rd Qu.:0.6098
## Max. :0.8902
## Age GP PER
## Min. :18.00 Min. : 1.00 Min. :-13.10
## 1st Qu.:23.00 1st Qu.:34.00 1st Qu.: 10.00
## Median :25.00 Median :61.00 Median : 12.80
## Mean :26.05 Mean :53.73 Mean : 12.75
## 3rd Qu.:29.00 3rd Qu.:77.00 3rd Qu.: 15.80
## Max. :43.00 Max. :82.00 Max. : 35.20
28 / 46

Correlation matrix

cor(NBAPath[,c(2,5:7)])
## PCT Age GP PER
## PCT 1.00000000 0.14304325 0.08849459 0.07720633
## Age 0.14304325 1.00000000 0.05170204 0.03598025
## GP 0.08849459 0.05170204 1.00000000 0.45360129
## PER 0.07720633 0.03598025 0.45360129 1.00000000
29 / 46

Univariate plots

par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16)
plot(density(NBAPath$PER), main='')
plot(density(NBAPath$PCT), main='')

30 / 46

Bivariate plots

par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16)
plot(NBAPath$Age, NBAPath$PER)
plot(NBAPath$GP, NBAPath$PER)

31 / 46

Specification of the model

32 / 46

Identification of the model

What is the total number of parameters that we can estimate?


What is the number of free parameters that our model is estimating?


Is our model:
a) under-identified
b) just-identified
c) over-identified

Three path coefficients
Two error variances
One independent variable variance


Just-identified model

33 / 46

Total number of parameters that we can estimate: 3*4/2 = 6
Number of free parameters: 1 variance, 2 errors, 3 regression pathways = 6
Just-identified model

Estimating the model

NBAmod1<-'
GP~b*Age
PER~a*Age+c*GP
dir := a
ind := b*c
tot := dir + ind
'
NBAfit1<-sem(NBAmod1, data=NBAPath)
summary(NBAfit1)
## lavaan 0.6-9 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 3810
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## GP ~
## Age (b) 0.315 0.098 3.196 0.001
## PER ~
## Age (a) 0.016 0.018 0.869 0.385
## GP (c) 0.093 0.003 31.333 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .GP 645.883 14.798 43.646 0.000
## .PER 21.834 0.500 43.646 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## dir 0.016 0.018 0.869 0.385
## ind 0.029 0.009 3.179 0.001
## tot 0.045 0.020 2.222 0.026
34 / 46

Explained variance - R2

When the model is just-identified, we cannot use global indices of model fit
We need to use standard measures instead

inspect(NBAfit1, 'r2')
## GP PER
## 0.003 0.206
-2*logLik(NBAfit1)
## 'log Lik.' 58025.67 (df=5)
AIC(NBAfit1)
## [1] 58035.67
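The two numbers above are linked by the usual definition of AIC, with k = 5 estimated parameters (the df reported by logLik):

```latex
\[
\mathrm{AIC} = -2\log L + 2k = 58025.67 + 2 \times 5 = 58035.67
\]
```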
35 / 46

Respecification of the model

36 / 46
NBAmod2<-'
GP~b*Age
PER~c*GP
ind := b*c
'
NBAfit2<-sem(NBAmod2, data=NBAPath)
summary(NBAfit2, fit.measures=T)
## lavaan 0.6-9 ended normally after 21 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 4
##
## Number of observations 3810
##
## Model Test User Model:
##
## Test statistic 0.755
## Degrees of freedom 1
## P-value (Chi-square) 0.385
##
## Model Test Baseline Model:
##
## Test statistic 888.633
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.001
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -29013.211
## Loglikelihood unrestricted model (H1) -29012.833
##
## Akaike (AIC) 58034.422
## Bayesian (BIC) 58059.403
## Sample-size adjusted Bayesian (BIC) 58046.693
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.041
## P-value RMSEA <= 0.05 0.987
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.005
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## GP ~
## Age (b) 0.315 0.098 3.196 0.001
## PER ~
## GP (c) 0.093 0.003 31.417 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .GP 645.883 14.798 43.646 0.000
## .PER 21.838 0.500 43.646 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ind 0.029 0.009 3.179 0.001

Model comparison

#install.packages('semTools')
library(semTools)
# store the comparison under a name that does not mask base::diff()
fitComparison<-compareFit(NBAfit1, NBAfit2)
summary(fitComparison)
## ################### Nested Model Comparison #########################
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## NBAfit1 0 58036 58067 0.000
## NBAfit2 1 58034 58059 0.755 0.755 1 0.3849
##
## ####################### Model Fit Indices ###########################
## chisq df pvalue rmsea cfi tli srmr aic bic
## NBAfit1 .000† NA .000† 1.000† 1.000 .000† 58035.667 58066.894
## NBAfit2 .755 1 .385 .000† 1.000† 1.001† .005 58034.422† 58059.403†
##
## ################## Differences in Fit Indices #######################
## df rmsea cfi tli srmr aic bic
## NBAfit2 - NBAfit1 1 0 0 0.001 0.005 -1.245 -7.49
37 / 46

Respecification of the model

38 / 46
NBAmod3<-'
GP~b*Age
PER~a*Age+c*GP
PCT~d*PER
ind1 := b*c*d
ind2 := a*d
tot := ind1 + ind2
'
NBAfit3<-sem(NBAmod3, data=NBAPath)
summary(NBAfit3, fit.measures=T)
## lavaan 0.6-9 ended normally after 30 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 7
##
## Number of observations 3810
##
## Model Test User Model:
##
## Test statistic 87.884
## Degrees of freedom 2
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 999.296
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.914
## Tucker-Lewis Index (TLI) 0.741
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -27272.876
## Loglikelihood unrestricted model (H1) -27228.934
##
## Akaike (AIC) 54559.752
## Bayesian (BIC) 54603.469
## Sample-size adjusted Bayesian (BIC) 54581.227
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.106
## 90 Percent confidence interval - lower 0.088
## 90 Percent confidence interval - upper 0.126
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.047
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## GP ~
## Age (b) 0.315 0.098 3.196 0.001
## PER ~
## Age (a) 0.016 0.018 0.869 0.385
## GP (c) 0.093 0.003 31.333 0.000
## PCT ~
## PER (d) 0.002 0.000 4.780 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .GP 645.883 14.798 43.646 0.000
## .PER 21.834 0.500 43.646 0.000
## .PCT 0.023 0.001 43.646 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ind1 0.000 0.000 2.647 0.008
## ind2 0.000 0.000 0.855 0.393
## tot 0.000 0.000 2.015 0.044

Parameter estimates

parameterestimates(NBAfit3, boot.ci.type ='bca.simple', standardized = T)
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 GP ~ Age b 0.315 0.098 3.196 0.001 0.122 0.507
## 2 PER ~ Age a 0.016 0.018 0.869 0.385 -0.020 0.051
## 3 PER ~ GP c 0.093 0.003 31.333 0.000 0.087 0.099
## 4 PCT ~ PER d 0.002 0.000 4.780 0.000 0.001 0.003
## 5 GP ~~ GP 645.883 14.798 43.646 0.000 616.879 674.887
## 6 PER ~~ PER 21.834 0.500 43.646 0.000 20.853 22.814
## 7 PCT ~~ PCT 0.023 0.001 43.646 0.000 0.022 0.025
## 8 Age ~~ Age 17.498 0.000 NA NA 17.498 17.498
## 9 ind1 := b*c*d ind1 0.000 0.000 2.647 0.008 0.000 0.000
## 10 ind2 := a*d ind2 0.000 0.000 0.855 0.393 0.000 0.000
## 11 tot := ind1+ind2 tot 0.000 0.000 2.015 0.044 0.000 0.000
## std.lv std.all std.nox
## 1 0.315 0.052 0.012
## 2 0.016 0.013 0.003
## 3 0.093 0.453 0.453
## 4 0.002 0.077 0.077
## 5 645.883 0.997 0.997
## 6 21.834 0.794 0.794
## 7 0.023 0.994 0.994
## 8 17.498 1.000 17.498
## 9 0.000 0.002 0.000
## 10 0.000 0.001 0.000
## 11 0.000 0.003 0.001
39 / 46

Bootstrapping our model

# bootstrapLavaan() is part of lavaan, so no additional package is needed
library(lavaan)
boot<-bootstrapLavaan(NBAfit3, R=1000) # 1000 bootstrap draws of the free parameters
summary(boot)
## b a c d
## Min. :0.005998 Min. :-0.04514 Min. :0.08188 Min. :0.0009444
## 1st Qu.:0.248201 1st Qu.: 0.00455 1st Qu.:0.09038 1st Qu.:0.0019188
## Median :0.317217 Median : 0.01695 Median :0.09315 Median :0.0022331
## Mean :0.315653 Mean : 0.01603 Mean :0.09319 Mean :0.0022346
## 3rd Qu.:0.382654 3rd Qu.: 0.02715 3rd Qu.:0.09577 3rd Qu.:0.0025278
## Max. :0.708704 Max. : 0.07171 Max. :0.11073 Max. :0.0037156
## GP~~GP PER~~PER PCT~~PCT
## Min. :618.1 Min. :19.59 Min. :0.02220
## 1st Qu.:638.6 1st Qu.:21.29 1st Qu.:0.02321
## Median :646.0 Median :21.84 Median :0.02350
## Mean :645.7 Mean :21.85 Mean :0.02351
## 3rd Qu.:653.1 3rd Qu.:22.38 3rd Qu.:0.02379
## Max. :676.9 Max. :25.21 Max. :0.02485
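The logic behind bootstrapping can be sketched without any SEM package: resample the rows with replacement, re-estimate the quantity of interest, and summarise the resulting distribution. The example below is a minimal illustration on simulated data (all names and effect sizes are hypothetical) that builds a percentile interval for an indirect effect. It is not the procedure used by `bootstrapLavaan()` internally.

```r
set.seed(3)
n <- 200
# Simulated (hypothetical) mediation data
dat <- data.frame(x = rnorm(n))
dat$m <- 0.5 * dat$x + rnorm(n)
dat$y <- 0.7 * dat$m + rnorm(n)

# Indirect effect a*b from two least-squares regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

# Nonparametric bootstrap: resample rows with replacement and re-estimate
n_boot <- 1000
boot_est <- replicate(n_boot, {
  idx <- sample(nrow(dat), replace = TRUE)
  indirect_effect(dat[idx, ])
})

estimate <- indirect_effect(dat)
ci <- quantile(boot_est, probs = c(0.025, 0.975))  # percentile interval
estimate
ci
```

A percentile interval does not assume that the product a*b is normally distributed, which is one reason bootstrapping is often preferred for indirect effects.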
40 / 46

Model building

41 / 46

Important aspects: theory

  • Difference between moderation, mediation and conditional process analysis
  • Exogenous and endogenous variables
  • Interpretation of the predictors
  • Calculation of free parameters and total parameters
  • Model identification: three types of identification
  • Overall fit of the model
42 / 46

Important aspects: practice

  • Building a path model: both continuous and categorical exogenous variables
  • Calculation of the direct and indirect pathways for predictors of interest
  • Adding an interaction to path model
  • Interpretation of the coefficients
  • Getting fit indices of the model
43 / 46

Literature

Chapters 1 to 5 of Principles and Practice of Structural Equation Modeling by Rex B. Kline

Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach by Andrew F. Hayes

Latent Variable Modeling Using R: A Step-by-Step Guide by A. Alexander Beaujean

44 / 46

Exercises for the next week

  1. Fill in the reflection and feedback form by Monday: https://forms.gle/ZNpui99GyYZbE4UZ7

  2. Go over the practical aspects of the lecture; try building path models in R

  3. Think about which theory in psychology you could test using path modelling (regardless of whether you have the data). Think about how easy or difficult it would be to defend the causal claims

45 / 46

Thank you for your attention

46 / 46
