Lecture 3: Path models

class: center, inverse
background-image: url("40084978.jpg")

# Lecture 3: Path models (Structural Equation Modelling)

### Dr Nemanja Vaci
---

</style>

## Press record

---

## Corrections from the previous lecture

---

## R code

[Link](https://nvaci.github.io/Lecture_3_code/Lecture3_Rcode.html)

---

## Structural equation modelling (SEM)

General framework that uses various models to test relationships among variables

Other terms: covariance structure analysis, covariance structure modelling, __causal modelling__

a) Regression model 
 - Legendre or Gauss (1805)

b) Confirmatory factor analysis 
 - Howe (1955), Anderson & Rubin (1956), Karl Jöreskog (1963) 
 - Latent factor structure - Spearman (1904, 1927) 
 
c) Path model 
 - Sewall Wright (1918, 1921) 
 
d) Structural equation modelling: confirmatory factor + path model

---

## Terminology

Manifest variables: observed/collected variables 
Latent variables: inferred measures - hypothetical constructs  
 - Indicator variables: measures used to infer the latent concepts

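Endogenous variables: variables predicted by other variables in the model (outcomes and mediators)  
Exogenous variables: variables not predicted by any other variable in the model (predictors)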

Focus on the covariance structure rather than on the means
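
A minimal base-R sketch of why the covariance matrix carries most of the information (simulated data; `x` and `y` are hypothetical variables, not from the lecture data): the slope and residual variance of a simple regression can be recovered from the sample covariances alone, and the means are only needed for the intercept.

```r
# Simulated data (hypothetical): y depends linearly on x
set.seed(1)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(n, sd = 1)

# Sample covariance matrix: the information SEM tries to reproduce
S <- cov(cbind(x, y))

# Slope and residual variance implied by the covariances alone
slope_from_cov <- S["x", "y"] / S["x", "x"]
resid_var_from_cov <- S["y", "y"] - slope_from_cov^2 * S["x", "x"]

# Compare with ordinary least squares on the raw scores
ols <- lm(y ~ x)
c(cov_based = slope_from_cov, lm = unname(coef(ols)["x"]))
```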

---

##  Graphical representation of the model

.center[
<img src="graphical.png", width = "120%"> 
]
---

## Prerequisites

Theory: strong theoretical assumptions that can be tested through the specification of the model and the data

Data: large samples; a common rule of thumb is an N:p ratio of 20:1 (about 20 observations per estimated parameter), and more data usually gives better estimates.  
 - We are not that interested in significance:  
 a) The overall behaviour of the model is more interesting  
 b) More data means a higher probability of significant results, even for weak effects  
 c) Latent models are estimated by anchoring on indicator variables, so different estimation choices can result in different patterns  
 d) Significance is not that interesting theoretically

---

## Path analysis

Moderation, Mediation and Conditional Process Analysis 
---

## Path analysis

Moderation: an association between two variables is moderated when its size or sign depends on the values of a third variable (the moderator)

.center[
<img src="moder.png", width = "80%"> 
]
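
As a sketch of what moderation means numerically (simulated data; the names `x`, `w` and `y` are hypothetical), the slope of `x` is no longer a single number but a function of the moderator, `$b_x + b_{xw} w$`, which can be evaluated at chosen values of `w`:

```r
# Simulated moderation (hypothetical variables): the effect of x on y depends on w
set.seed(2)
n <- 300
x <- rnorm(n)
w <- rnorm(n)
y <- 1 + 0.4 * x + 0.2 * w + 0.3 * x * w + rnorm(n)

# In lm(), x * w expands to x + w + x:w
mod <- lm(y ~ x * w)
b <- coef(mod)

# Conditional ("simple") slope of x at a given value of the moderator:
# b_x + b_xw * w
simple_slope <- function(w_value) unname(b["x"] + b["x:w"] * w_value)

# Slope of x at 1 SD below the mean, at the mean and 1 SD above the mean of w
w_points <- mean(w) + c(-1, 0, 1) * sd(w)
round(sapply(w_points, simple_slope), 3)
```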

---

## Path analysis

Mediation: variation in X influences variation in one of the mediators, which in turn results in variation in Y

.center[
<img src="mediat.png", width = "80%"> 
]

---

## Path analysis

Conditional process analysis: combination of mediation and moderation

.center[
<img src="CPA.png", width = "80%"> 
]
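
One common conditional process configuration is first-stage moderated mediation, where the path from X to the mediator depends on W. The conditional indirect effect is then `$(a_1 + a_3 W)\,b$`, and `$a_3 b$` is often called the index of moderated mediation. A base-R sketch with simulated data (all variable names hypothetical; other configurations, such as a moderated direct path, are also possible):

```r
# Simulated first-stage moderated mediation (hypothetical variables):
# X -> M is moderated by a binary W, and M -> Y is not
set.seed(3)
n <- 500
X <- rnorm(n)
W <- rbinom(n, 1, 0.5)
M <- 0.5 * X + 0.2 * W + 0.4 * X * W + rnorm(n)
Y <- 0.1 * X + 0.6 * M + rnorm(n)

m_model <- lm(M ~ X * W)   # first stage: a1 (X), a3 (X:W)
y_model <- lm(Y ~ X + M)   # second stage: b (M), direct effect (X)

a1 <- unname(coef(m_model)["X"])
a3 <- unname(coef(m_model)["X:W"])
b <- unname(coef(y_model)["M"])

# Indirect effect of X through M, conditional on the value of W
cond_indirect <- function(w) (a1 + a3 * w) * b
index_mod_med <- a3 * b    # change in the indirect effect per unit of W

round(c(W0 = cond_indirect(0), W1 = cond_indirect(1), index = index_mod_med), 3)
```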

---

## Different possibilities

.center[
<img src="otherMod.png", width = "70%"> 
]

---

## Comment on causality and correlation

SEM does not test causal relationships; no statistical procedure can establish causality from the data alone

Mathematical tools that help us to understand the data, extract the signal and interpret it

---

## Lavaan in R syntax

.center[
<img src="Rsyntax.png", width = "60%"> 
]

---

## Linear regression in Lavaan

```r
#install.packages('lavaan')
library(lavaan)
model1<-'
Height~1+Age # regression of Height on Age; 1 adds an intercept
'
fit1<-sem(model1, data=Babies)
```

.center[
<img src="Regression.png", width = "60%"> 
]

---

## Summary of the model
<style type="text/css">
pre {
 max-height: 300px;
 overflow-y: auto;
}

pre[class] {
 max-height: 100px;
}
</style>

```r
summary(fit1)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.064    2.251    0.024
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           57.026    1.176   48.509    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
```

---

## Regression

```r
lm1<-lm(Height~Age, data=Babies)
summary(lm1)
```

```
## 
## Call:
## lm(formula = Height ~ Age, data = Babies)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4765  -4.1601  -0.3703   3.9198  12.3842 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 57.02580    1.18751  48.021   <2e-16 ***
## Age          0.14317    0.06426   2.228   0.0282 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.283 on 98 degrees of freedom
## Multiple R-squared:  0.04821,	Adjusted R-squared:  0.0385 
## F-statistic: 4.964 on 1 and 98 DF,  p-value: 0.02817
```
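
The small differences between the two outputs come from the divisor: lavaan's ML estimate of the residual variance divides the residual sum of squares by N, whereas `lm()` divides by N minus the number of coefficients. For the slides, `$5.283^2 \times 98/100 \approx 27.35$`, which matches lavaan's `.Height` variance. A sketch with simulated data (the `Babies` data are not reproduced here), assuming lavaan's defaults for observed-variable regression (fixed exogenous covariates, expected information):

```r
# Simulated data (hypothetical), roughly on the scale of the Babies example
set.seed(4)
n <- 100
age <- runif(n, 1, 30)
height <- 57 + 0.14 * age + rnorm(n, sd = 5)

fit_ols <- lm(height ~ age)
k <- length(coef(fit_ols))   # 2 coefficients: intercept and slope
rss <- sum(resid(fit_ols)^2)

var_ols <- rss / (n - k)     # residual variance as reported by lm() (sigma^2)
var_ml <- rss / n            # ML residual variance, as reported by lavaan

# Under the assumptions above, ML standard errors are the lm() ones
# shrunk by the factor sqrt((n - k) / n)
se_ols <- sqrt(diag(vcov(fit_ols)))
se_ml <- se_ols * sqrt(var_ml / var_ols)
round(rbind(se_ols, se_ml), 4)
```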

---

## Visualisation

```r
#install.packages('tidySEM')
require('tidySEM')
graph_sem(fit1, variance_diameter=.2)
```

---

## Interactions 1

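We cannot add an interaction using the `*` sign as we would in a normal regression: in lavaan syntax, `*` attaches a label or a fixed value to a parameter, so `Age*Weight` estimates the effect of Weight and labels it `Age` (see the output below)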

```r
model2<-'
Height~1+Age*Weight
'
fit2<-sem(model2, data=Babies)
summary(fit2)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          3
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate    Std.Err  z-value  P(>|z|)
##   Height ~                                              
##     Weight   (Age)      0.004    0.001    4.209    0.000
## 
## Intercepts:
##                    Estimate    Std.Err  z-value  P(>|z|)
##    .Height             41.923    4.180   10.030    0.000
## 
## Variances:
##                    Estimate    Std.Err  z-value  P(>|z|)
##    .Height             24.412    3.452    7.071    0.000
```

---

## Interactions 2

We cannot add an interaction using the `*` sign as we would in a normal regression

We need to create a new variable that codes the interaction (the product of the two predictors)

```r
# Product terms that code the interactions (Gender dummy: Girls = 0, Boys = 1)
Babies$AgeWeight = Babies$Age * Babies$Weight
Babies$AgeGender = Babies$Age * ifelse(Babies$Gender=='Girls',0,1) 
head(Babies)
```

```
##   Age   Weight   Height Gender Crawl TummySleep PhysicalSt AgeWeight AgeGender
## 1   4 3667.525 61.47215  Girls     0          1   32.57839  14670.10         0
## 2   7 3871.738 56.05987  Girls     1          0   25.43127  27102.16         0
## 3  22 4339.391 59.28653   Boys     1          1   29.91834  95466.60        22
## 4  26 4448.422 55.17084   Boys     0          1   24.71719 115658.98        26
## 5  24 4309.178 60.26487   Boys     1          0   30.41685 103420.28        24
## 6  11 4365.727 54.55308   Boys     1          1   34.54519  48023.00        11
```

---

## Interactions 3

We cannot add an interaction using the `*` sign as we would in a normal regression

We need to create a new variable that codes the interaction (the product of the two predictors)

```r
model2<-'
Height~1+Age+Weight + AgeWeight
'
fit2<-sem(model2, data=Babies)
summary(fit2)
```

```
## lavaan 0.6-7 ended normally after 22 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate        Std.Err  z-value  P(>|z|)
##   Height ~                                                  
##     Age                     0.691    0.493    1.402    0.161
##     Weight                  0.007    0.002    2.889    0.004
##     AgeWeight              -0.000    0.000   -1.141    0.254
## 
## Intercepts:
##                    Estimate        Std.Err  z-value  P(>|z|)
##    .Height                 30.724    9.205    3.338    0.001
## 
## Variances:
##                    Estimate        Std.Err  z-value  P(>|z|)
##    .Height                 22.925    3.242    7.071    0.000
```
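
To check that the manually created product term is the same interaction that `lm()` builds from `Age*Weight`, here is a sketch with simulated data (hypothetical names; weight is assumed to be in grams, as the values in `Babies` suggest). The `AgeWeight` estimate printed as `-0.000` above is mostly a matter of scale: the product runs into the tens of thousands, and rescaling Weight from grams to kilograms would multiply that coefficient by 1000.

```r
# Simulated data (hypothetical), with weight in grams
set.seed(5)
n <- 100
d <- data.frame(age = runif(n, 1, 30), weight = rnorm(n, 4000, 400))
d$height <- 40 + 0.5 * d$age + 0.004 * d$weight + rnorm(n, sd = 5)

# Manually created product term, as on the slide
d$age_weight <- d$age * d$weight
fit_manual <- lm(height ~ age + weight + age_weight, data = d)

# lm() builds the same product internally from age * weight
fit_formula <- lm(height ~ age * weight, data = d)

cbind(manual = coef(fit_manual), formula = coef(fit_formula))
```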

---

## Theory

Development of muscles in the early months of infancy supports physical strength: as time passes, infants become physically stronger. Infants who experience stronger early development, measured through their height, also show higher levels of physical strength.

Hypothetical assumptions:  
Positive effect of age on physical strength  
The effect of age on physical strength is mediated by babies' height

---

## Specification of the model

Representation of our hypothetical assumptions in the form of the structural equation model

Let's check what our Babies think:

.center[
<img src="Image1.png", width = "80%"> 
]

---

## Representation of the model

.center[
<img src="Image2.png", width = "80%"> 
]

---

## Estimation of the model

```r
modelStrength<-'
Height~Age
PhysicalSt~Age+Height
'
fitStr1<-sem(modelStrength, data=Babies)
summary(fitStr1)
```

```
## lavaan 0.6-7 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age              -0.007    0.061   -0.120    0.905
##     Height            0.425    0.094    4.518    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.183    3.420    7.071    0.000
```

---

## Visualisation of the results

```r
require(semPlot)
semPaths(fitStr1, 'model','est', edge.label.cex = 1.1)
```

---

## What is the effect of Age?

.center[
<img src="Image2.png", width = "70%">
]

Direct effect: `$a = -0.01$` 
Indirect effect: `$b*c= 0.14 * 0.42$` 
Total effect: `$a+(b*c)=-0.01+(0.14*0.42)$`
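
With ordinary least squares on the same cases this decomposition is exact: the total effect `$a + b \times c$` equals the slope from regressing the outcome on Age alone (lavaan's ML path estimates coincide with OLS for this just-identified observed-variable model). A sketch with simulated data (hypothetical names mirroring Age, Height and PhysicalSt) to check it:

```r
# Simulated data (hypothetical names mirroring Age, Height and PhysicalSt)
set.seed(6)
n <- 100
age <- runif(n, 1, 30)
height <- 57 + 0.14 * age + rnorm(n, sd = 5)
strength <- 5 + 0.4 * height + rnorm(n, sd = 5)

b <- unname(coef(lm(height ~ age))["age"])   # Age -> Height
fit_y <- lm(strength ~ age + height)
a <- unname(coef(fit_y)["age"])              # direct effect: Age -> Strength
c_path <- unname(coef(fit_y)["height"])      # Height -> Strength

direct <- a
indirect <- b * c_path
total <- direct + indirect

# The total effect equals the slope of a simple regression of Strength on Age
round(c(total = total, simple_slope = unname(coef(lm(strength ~ age))["age"])), 4)
```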

---

## Age effects

```r
modelStrength<-'
Height~b*Age
PhysicalSt~a*Age+c*Height

##quantification of effects
dir := a
ind := b*c
tot := dir+ind
'
fitStr1<-sem(modelStrength, data=Babies)
summary(fitStr1)
```

```
## lavaan 0.6-7 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age        (b)    0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age        (a)   -0.007    0.061   -0.120    0.905
##     Height     (c)    0.425    0.094    4.518    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.183    3.420    7.071    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     dir              -0.007    0.061   -0.120    0.905
##     ind               0.061    0.030    2.015    0.044
##     tot               0.053    0.066    0.815    0.415
```
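
lavaan computes standard errors for `:=` parameters with the delta method. Because `b` and `c` are estimated in different equations and are (asymptotically) uncorrelated in this model, this should reduce to the first-order Sobel formula, which can be checked by hand from the rounded estimates above (small differences from the printed z and p come from rounding):

```r
# Rounded estimates copied from the output above
b <- 0.143
se_b <- 0.064        # Age -> Height
c_path <- 0.425
se_c <- 0.094        # Height -> PhysicalSt

ind <- b * c_path
# First-order delta method (Sobel) standard error, assuming cov(b, c) = 0
se_ind <- sqrt(c_path^2 * se_b^2 + b^2 * se_c^2)
z_ind <- ind / se_ind
p_ind <- 2 * pnorm(-abs(z_ind))

round(c(ind = ind, se = se_ind, z = z_ind, p = p_ind), 3)
```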

---

## More pathways

---

## Let's modify the model

```r
modelStrength2<-'
Height~b*Age
Weight~e*Age
PhysicalSt~a*Age+c*Height+f*Weight

##quantification of effects
dir := a
ind := b*c+e*f
tot := dir+ind
'
```

---

## Results

```r
fitStr2<-sem(modelStrength2, data=Babies)
```

```
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: some observed variances are (at least) a factor 1000 times larger than
## others; use varTable(fit) to investigate
```
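
The warning is about scale: Weight appears to be recorded in grams, so its variance is in the hundreds of thousands while the other variances are in the tens. A sketch (simulated data, hypothetical values) of the ratio the warning refers to, and of rescaling Weight to kilograms, which only changes the units of the Weight coefficients:

```r
# Simulated data (hypothetical), with Weight in grams
set.seed(8)
n <- 100
d <- data.frame(
  Age = runif(n, 1, 30),          # months
  Height = rnorm(n, 57, 5),       # cm
  Weight = rnorm(n, 4000, 475),   # grams
  PhysicalSt = rnorm(n, 30, 5)
)

# Ratio of the largest to the smallest observed variance
variance_ratio <- function(df) {
  v <- sapply(df, var)
  max(v) / min(v)
}

ratio_before <- variance_ratio(d)   # Weight dominates: well above 1000

# Rescale Weight to kilograms; its coefficients are then per kg instead of per g
d$Weight <- d$Weight / 1000
ratio_after <- variance_ratio(d)

round(c(before = ratio_before, after = ratio_after))
```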

```r
summary(fitStr2)
```

```
## lavaan 0.6-7 ended normally after 32 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          8
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                16.364
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate    Std.Err   z-value  P(>|z|)
##   Height ~                                               
##     Age        (b)      0.143     0.064    2.251    0.024
##   Weight ~                                               
##     Age        (e)      2.439     5.774    0.422    0.673
##   PhysicalSt ~                                           
##     Age        (a)     -0.005     0.061   -0.075    0.940
##     Height     (c)      0.387     0.094    4.138    0.000
##     Weight     (f)      0.001     0.001    1.032    0.302
## 
## Variances:
##                    Estimate    Std.Err   z-value  P(>|z|)
##    .Height             27.352     3.868    7.071    0.000
##    .Weight         225306.345 31863.129    7.071    0.000
##    .PhysicalSt         23.966     3.389    7.071    0.000
## 
## Defined Parameters:
##                    Estimate    Std.Err   z-value  P(>|z|)
##     dir                -0.005     0.061   -0.075    0.940
##     ind                 0.058     0.029    2.014    0.044
##     tot                 0.053     0.065    0.826    0.409
```

---

## Additional information

```r
parameterestimates(fitStr2, boot.ci.type ='bca.simple', standardized = T)
```

```
##           lhs op        rhs label        est        se      z pvalue   ci.lower
## 1      Height  ~        Age     b      0.143     0.064  2.251  0.024      0.018
## 2      Weight  ~        Age     e      2.439     5.774  0.422  0.673     -8.878
## 3  PhysicalSt  ~        Age     a     -0.005     0.061 -0.075  0.940     -0.124
## 4  PhysicalSt  ~     Height     c      0.387     0.094  4.138  0.000      0.204
## 5  PhysicalSt  ~     Weight     f      0.001     0.001  1.032  0.302     -0.001
## 6      Height ~~     Height           27.352     3.868  7.071  0.000     19.771
## 7      Weight ~~     Weight       225306.345 31863.129  7.071  0.000 162855.760
## 8  PhysicalSt ~~ PhysicalSt           23.966     3.389  7.071  0.000     17.323
## 9         Age ~~        Age           67.588     0.000     NA     NA     67.588
## 10        dir :=          a   dir     -0.005     0.061 -0.075  0.940     -0.124
## 11        ind :=    b*c+e*f   ind      0.058     0.029  2.014  0.044      0.002
## 12        tot :=    dir+ind   tot      0.053     0.065  0.826  0.409     -0.073
##      ci.upper     std.lv std.all std.nox
## 1       0.268      0.143   0.220   0.027
## 2      13.755      2.439   0.042   0.005
## 3       0.115     -0.005  -0.007  -0.001
## 4       0.571      0.387   0.389   0.389
## 5       0.003      0.001   0.095   0.095
## 6      34.933     27.352   0.952   0.952
## 7  287756.930 225306.345   0.998   0.998
## 8      30.609     23.966   0.840   0.840
## 9      67.588     67.588   1.000  67.588
## 10      0.115     -0.005  -0.007  -0.001
## 11      0.115      0.058   0.089   0.011
## 12      0.180      0.053   0.082   0.010
```
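
Because `fitStr2` was fitted with the default standard errors, the intervals above are normal-theory intervals (e.g. `$0.143 \pm 1.96 \times 0.064 \approx [0.018, 0.268]$`); bootstrap intervals require fitting the model with `se = "bootstrap"` in `sem()`. The idea behind a bootstrap interval for the indirect effect can be sketched in base R with simulated data (hypothetical names). This uses the simple percentile method, not the bias-corrected `bca.simple` method requested above:

```r
# Simulated data (hypothetical names mirroring Age, Height and PhysicalSt)
set.seed(9)
n <- 100
d <- data.frame(age = runif(n, 1, 30))
d$height <- 57 + 0.14 * d$age + rnorm(n, sd = 5)
d$strength <- 5 + 0.4 * d$height + rnorm(n, sd = 5)

# Indirect effect b * c from two OLS regressions
indirect_effect <- function(data) {
  b <- coef(lm(height ~ age, data = data))["age"]
  c_path <- coef(lm(strength ~ age + height, data = data))["height"]
  unname(b * c_path)
}

# Nonparametric bootstrap: resample cases (rows) with replacement
n_boot <- 1000
boot_ind <- replicate(n_boot, indirect_effect(d[sample(n, replace = TRUE), ]))

# Percentile 95% interval; unlike the normal-theory interval it can be asymmetric
ci <- quantile(boot_ind, c(0.025, 0.975))
round(c(estimate = indirect_effect(d), ci), 3)
```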

---

## Categorical variables: exogenous

<img src="image5.png" width="70%" style="display: block; margin: auto;" />
---

## Results: categorical predictor exogenous

```r
# Recode Gender as a 0/1 dummy (Girls = 0, Boys = 1)
Babies$Gender=ifelse(Babies$Gender=='Girls',0,1)

modelStrength3<-'
Height~Age
PhysicalSt~Age+Height+Gender
'
fitStr3<-sem(modelStrength3, data=Babies)
summary(fitStr3)
```

```
## lavaan 0.6-7 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          6
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.630
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.427
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age              -0.007    0.061   -0.116    0.907
##     Height            0.426    0.094    4.526    0.000
##     Gender            0.090    0.986    0.091    0.927
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.181    3.420    7.071    0.000
```

---

## Categorical variables: endogenous

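If the categorical variable is endogenous (predicted by other variables), we need to declare it as categorical (the `ordered` argument in lavaan) and use a different estimator, such as WLSMV; lavaan uses it by default in this case and reports it as DWLS with robust standard errors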

---

## Results: categorical variable endogenous

```r
modelStrength4<-'
Height~Age
Gender~Age
PhysicalSt~Age+Height+Gender
'
fitStr4<-sem(modelStrength4, ordered = c('Gender'),data=Babies)
summary(fitStr4)
```

```
## lavaan 0.6-7 ended normally after 32 iterations
## 
##   Estimator                                       DWLS
##   Optimization method                           NLMINB
##   Number of free parameters                         10
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                                 0.666       0.666
##   Degrees of freedom                                 1           1
##   P-value (Chi-square)                           0.415       0.415
##   Scaling correction factor                                  1.000
##   Shift parameter                                           -0.000
##        simple second-order correction                             
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.060    2.397    0.017
##   Gender ~                                            
##     Age              -0.008    0.015   -0.545    0.586
##   PhysicalSt ~                                        
##     Age              -0.009    0.071   -0.123    0.902
##     Height            0.425    0.098    4.322    0.000
##     Gender           -0.164    0.663   -0.248    0.804
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           57.026    1.172   48.649    0.000
##    .Gender            0.000                           
##    .PhysicalSt        5.678    5.801    0.979    0.328
## 
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     Gender|t1        -0.189    0.284   -0.666    0.505
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    4.513    6.061    0.000
##    .Gender            1.000                           
##    .PhysicalSt       24.156    3.551    6.803    0.000
## 
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     Gender            1.000
```

---

## Conditional process analysis

```r
modelStrengthCond<-'
Height~Age
PhysicalSt~Age+Height+AgeGender
'
```

<img src="image7.png" width="70%" style="display: block; margin: auto;" />
---
## Conditional process analysis: results

```r
fitStrCond<-sem(modelStrengthCond, data=Babies)
summary(fitStrCond)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          6
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 1.628
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.202
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age              -0.008    0.066   -0.128    0.898
##     Height            0.425    0.094    4.524    0.000
##     AgeGender         0.002    0.053    0.043    0.966
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.182    3.420    7.071    0.000
```
---

## Interpretation of the predictors

```
## lavaan 0.6-7 ended normally after 15 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age        (b)    0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age        (a)   -0.007    0.061   -0.120    0.905
##     Height     (c)    0.425    0.094    4.518    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.183    3.420    7.071    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     dir              -0.007    0.061   -0.120    0.905
##     ind               0.061    0.030    2.015    0.044
##     tot               0.053    0.066    0.815    0.415
```

Model predicts that:  
Babies who are 1 month older are on average 0.143 cm taller  
Comparing babies of the same height, a baby who is 1 month older is predicted to be on average weaker by 0.007  
Comparing babies of the same age, a baby who is 1 cm taller is predicted to be stronger by 0.425  
Indirect effect: a 1-month increase in age is associated with a 0.061 increase in strength through height (0.143 cm × 0.425 per cm)
---

## Can model be estimated?

Total number of parameters that we can estimate (distinct variances and covariances): `$\frac{p(p+1)}{2}$`, where `$p$` is the number of observed variables

.center[
<img src="Image2.png", width = "60%">
]

Three path coefficients 
Two error variances 
One independent variable variance
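
With `$p = 3$` observed variables (Age, Height and PhysicalSt), the count works out as:

`$$\frac{p(p+1)}{2} = \frac{3 \times 4}{2} = 6, \qquad \text{free parameters} = 3 + 2 + 1 = 6, \qquad df = 6 - 6 = 0$$`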

---

## Model identification

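1. Underidentified: more free parameters than distinct variances and covariances (df < 0); the model cannot be estimated  
2. Just-identified: as many free parameters as distinct variances and covariances (df = 0); the model reproduces the data perfectly, so its fit cannot be tested  
3. Overidentified: fewer free parameters than distinct variances and covariances (df > 0)  
 
Parameters can be free (estimated), fixed (set to a value, e.g. 0) or constrained (e.g. set equal to another parameter)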
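
The counting rule can be written as a small helper (a hypothetical function, not part of lavaan). It is a necessary but not sufficient condition for identification, and here it counts the variance of the exogenous variable as a free parameter, as on the previous slide:

```r
# Hypothetical helper: counting-rule check for an observed-variable path model
identification <- function(n_observed, n_free) {
  known <- n_observed * (n_observed + 1) / 2   # distinct variances and covariances
  df <- known - n_free
  status <- if (df < 0) {
    "under-identified"
  } else if (df == 0) {
    "just-identified"
  } else {
    "over-identified"
  }
  list(known = known, free = n_free, df = df, status = status)
}

# Age -> Height -> PhysicalSt: 3 paths + 2 error variances + 1 exogenous variance
identification(3, 6)$status
# Same model with the direct path Age -> PhysicalSt fixed to 0
identification(3, 5)$status
```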

---

## Fixing parameters

```r
modelStrengthFix<-'
Height~Age
PhysicalSt~0*Age+Height # 0* fixes the direct effect of Age to zero
'
fitStr1Fix<-sem(modelStrengthFix, data=Babies)
summary(fitStr1Fix)
```

```
## lavaan 0.6-7 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          4
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.014
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.905
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~                                        
##     Age               0.000                           
##     Height            0.422    0.092    4.604    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.186    3.420    7.071    0.000
```

---

## Constraining parameters

```r
modelStrengthCons<-'
Height~a*Age            # the same label a on both Age paths
PhysicalSt~a*Age+Height # constrains them to be equal
'
fitStr1Cons<-sem(modelStrengthCons, data=Babies)
summary(fitStr1Cons)
```

---

## Overall fit: Chi-square test

Measures how well the model-implied covariance matrix reproduces the sample covariance matrix  
The null hypothesis is that the model fits exactly, so here we would prefer not to reject it

Assumptions:  
Multivariate normality  
N is sufficiently large (150+)  
Parameters are not at a boundary or invalid (e.g. a variance of zero)

With large samples it is sensitive to small misfits  
Non-normality biases the test statistic 
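
The p-values reported by lavaan are upper-tail probabilities of the chi-square distribution. For the test statistics reported in this lecture (small differences come from rounding of the printed statistics):

```r
# Upper-tail probability of the chi-square distribution
chisq_p <- function(stat, df) pchisq(stat, df = df, lower.tail = FALSE)

round(c(
  fixed_direct_path = chisq_p(0.014, 1),    # direct path fixed to 0
  constrained       = chisq_p(2.882, 1),    # equal Age paths
  two_mediators     = chisq_p(16.364, 1)    # Height and Weight as mediators
), 3)
```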
---

## Overall fit: Other indices

.center[
<img src="fitInd.png", width = "60%">
]

---

## Fit indices of our model

```r
summary(fitStr1Cons, fit.measures=TRUE)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##   Number of equality constraints                     1
##                                                       
##   Number of observations                           100
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 2.882
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.090
## 
## Model Test Baseline Model:
## 
##   Test statistic                                24.179
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.911
##   Tucker-Lewis Index (TLI)                       0.733
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -609.950
##   Loglikelihood unrestricted model (H1)       -608.509
##                                                       
##   Akaike (AIC)                                1227.899
##   Bayesian (BIC)                              1238.320
##   Sample-size adjusted Bayesian (BIC)         1225.687
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.137
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.334
##   P-value RMSEA <= 0.05                          0.130
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.056
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~                                            
##     Age        (a)    0.065    0.044    1.479    0.139
##   PhysicalSt ~                                        
##     Age        (a)    0.065    0.044    1.479    0.139
##     Height            0.400    0.094    4.271    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.764    3.926    7.071    0.000
##    .PhysicalSt       24.520    3.468    7.071    0.000
```

---

## Fit indices for our model

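TLI: compares the `$\chi^2/df$` ratio of our model with that of the baseline (null) model, so it penalises complex models; values of about .95 or above are usually taken as good fit, and it can exceed 1  
CFI: compares the misfit (`$\chi^2 - df$`) of our model with that of the baseline model; a CFI of .95 means that the model reduces the baseline misfit by 95%; it is bounded between 0 and 1 and not very sensitive to sample size  
RMSEA: misfit per degree of freedom, adjusted for sample size; values below about .06 are usually considered good, but it is unstable with small samples and few degrees of freedom (here the 90% CI runs from 0 to .334)  
SRMR: average standardised difference between the sample and model-implied covariances; because it is standardised, it stays interpretable when variables are on different scales (values below about .08 are usually considered good)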
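
These indices can be recomputed from the chi-square statistics of the user and baseline models. A sketch using the standard formulas; software differs in whether RMSEA divides by N or N - 1, and dividing by N reproduces the value printed above:

```r
# Standard formulas based on the user (m) and baseline (b) chi-square tests
fit_indices <- function(chi_m, df_m, chi_b, df_b, n) {
  cfi <- 1 - max(chi_m - df_m, 0) / max(chi_b - df_b, chi_m - df_m, 0)
  tli <- (chi_b / df_b - chi_m / df_m) / (chi_b / df_b - 1)
  rmsea <- sqrt(max(chi_m - df_m, 0) / (df_m * n))   # some software uses n - 1
  c(CFI = cfi, TLI = tli, RMSEA = rmsea)
}

# Values from summary(fitStr1Cons, fit.measures = TRUE)
indices <- fit_indices(chi_m = 2.882, df_m = 1, chi_b = 24.179, df_b = 3, n = 100)
round(indices, 3)
```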

---

class: inverse, middle, center
# Practical aspect
---

## Getting the data

Influence of the media upon subsequent actions: [Link](http://finzi.psych.upenn.edu/library/psych/html/tal_or.html)

```r
NBAPath<-read.table('NBApath.txt', sep='\t', header=TRUE)
```
---

## What is in the data?

```r
summary(NBAPath)
```

```
##      TEAM                PCT            Player              Pos           
##  Length:3810        Min.   :0.1061   Length:3810        Length:3810       
##  Class :character   1st Qu.:0.3780   Class :character   Class :character  
##  Mode  :character   Median :0.5000   Mode  :character   Mode  :character  
##                     Mean   :0.4905                                        
##                     3rd Qu.:0.6098                                        
##                     Max.   :0.8902                                        
##       Age              GP             PER        
##  Min.   :18.00   Min.   : 1.00   Min.   :-13.10  
##  1st Qu.:23.00   1st Qu.:34.00   1st Qu.: 10.00  
##  Median :25.00   Median :61.00   Median : 12.80  
##  Mean   :26.05   Mean   :53.73   Mean   : 12.75  
##  3rd Qu.:29.00   3rd Qu.:77.00   3rd Qu.: 15.80  
##  Max.   :43.00   Max.   :82.00   Max.   : 35.20
```

---

## Correlation matrix

```r
cor(NBAPath[,c(2,5:7)])
```

```
##            PCT        Age         GP        PER
## PCT 1.00000000 0.14304325 0.08849459 0.07720633
## Age 0.14304325 1.00000000 0.05170204 0.03598025
## GP  0.08849459 0.05170204 1.00000000 0.45360129
## PER 0.07720633 0.03598025 0.45360129 1.00000000
```

---

## Univariate plots

```r
par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16)
plot(density(NBAPath$PER), main='')
plot(density(NBAPath$PCT), main='')
```

---

## Bivariate plots

```r
par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16)
plot(NBAPath$Age, NBAPath$PER)
plot(NBAPath$GP, NBAPath$PER)
```

---

## Specification of the model

---

## Identification of the model

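Free parameters (q):  
Three path coefficients (a, b, c)  
Two error variances (GP and PER)  
One variance of the exogenous variable (Age)  
q = 6

Number of observations, i.e. distinct variances and covariances among the v = 3 observed variables: v(v + 1)/2 = 3*4/2 = 6

Degrees of freedom: 6 - 6 = 0, so the model is just identified. Note that lavaan reports 5 free parameters: with its default `fixed.x = TRUE`, the variance of the exogenous Age is fixed at its sample value and dropped from both counts, so the degrees of freedom are still 0.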
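The counting rule can be written as a small R function (the name `model_df` is hypothetical) and applied to the three models fitted in this lecture; the results match the degrees of freedom lavaan reports (0, 1 and 2).

```r
# Degrees of freedom = number of observations - number of free parameters
# v = number of observed variables, q = number of free parameters
model_df <- function(v, q) {
  n_obs <- v * (v + 1) / 2  # distinct variances and covariances
  n_obs - q
}

# NBAmod1: 3 paths + 2 error variances + 1 exogenous variance = 6
model_df(v = 3, q = 6)  # 0: just identified
# NBAmod2: Age -> PER path removed, so q = 5
model_df(v = 3, q = 5)  # 1: over-identified
# NBAmod3: PCT added (v = 4); 4 paths + 3 error variances + 1 exogenous variance = 8
model_df(v = 4, q = 8)  # 2: over-identified
```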

---

## Estimating the model

```r
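NBAmod1<-'
# Regressions: GP on Age (path b); PER on Age (direct path a) and GP (path c)
GP~b*Age
PER~a*Age+c*GP

# Defined parameters: direct, indirect (via GP) and total effect of Age on PER
dir := a
ind := b*c
tot := dir + ind
'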
NBAfit1<-sem(NBAmod1, data=NBAPath)
summary(NBAfit1)
```

```
## lavaan 0.6-7 ended normally after 21 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##                                                       
##   Number of observations                          3810
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   GP ~                                                
##     Age        (b)    0.315    0.098    3.196    0.001
##   PER ~                                               
##     Age        (a)    0.016    0.018    0.869    0.385
##     GP         (c)    0.093    0.003   31.333    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .GP              645.883   14.798   43.646    0.000
##    .PER              21.834    0.500   43.646    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     dir               0.016    0.018    0.869    0.385
##     ind               0.029    0.009    3.179    0.001
##     tot               0.045    0.020    2.222    0.026
```
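lavaan computes the standard errors of defined parameters such as `ind := b*c` with the delta method. For a product of two coefficients this reduces to the first-order (Sobel) formula, sketched below with the rounded estimates from the output above. It assumes b and c are uncorrelated, and the rounded inputs give an SE of about 0.009 and a z of about 3.2, close to lavaan's 0.009 and 3.179.

```r
# Rounded estimates and standard errors copied from summary(NBAfit1)
b_est <- 0.315; se_b <- 0.098  # GP ~ Age (path b)
c_est <- 0.093; se_c <- 0.003  # PER ~ GP (path c)

# Indirect effect of Age on PER via GP
ind <- b_est * c_est
# First-order delta-method (Sobel) SE, assuming b and c are uncorrelated
se_ind <- sqrt(c_est^2 * se_b^2 + b_est^2 * se_c^2)
z_ind  <- ind / se_ind
p_ind  <- 2 * pnorm(-abs(z_ind))  # two-sided p-value

round(c(ind = ind, se = se_ind, z = z_ind, p = p_ind), 3)
```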

---

## Explained variance - R2

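Because the model is just identified (df = 0), it reproduces the observed covariances perfectly, so global indices of model fit are not informative  
Instead, we use standard measures such as explained variance (R2) and the log-likelihood/AIC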

```r
inspect(NBAfit1, 'r2')
```

```
##    GP   PER 
## 0.003 0.206
```

```r
-2*logLik(NBAfit1)
```

```
## 'log Lik.' 58025.67 (df=5)
```

```r
AIC(NBAfit1)
```

```
## [1] 58035.67
```
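AIC and BIC follow directly from the log-likelihood and the number of free parameters k (5 here, as in the `df=5` of the `logLik` output). The sketch below recomputes them; the function name `ic` is hypothetical. For NBAfit1 it uses the unrestricted (H1) log-likelihood printed for NBAfit2, -29012.833, because a just-identified model reproduces the saturated likelihood (consistent with -2*logLik = 58025.67 above).

```r
# Information criteria from an ML log-likelihood:
#   AIC = -2*LL + 2*k,   BIC = -2*LL + k*log(N)
ic <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

# NBAfit1: saturated log-likelihood, k = 5 free parameters, N = 3810
ic(-29012.833, k = 5, n = 3810)
# NBAfit2 (respecified model below): LL = -29013.211, k = 4
ic(-29013.211, k = 4, n = 3810)
```

The values agree (up to rounding) with the AIC and BIC columns of the model comparison table later in the lecture.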

---

## Respecification of the model

---

## Estimating the model

```r
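NBAmod2<-'
# Same model without the direct Age -> PER path (a is fixed to 0)
GP~b*Age
PER~c*GP

# Indirect effect of Age on PER via GP
ind := b*c
'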
NBAfit2<-sem(NBAmod2, data=NBAPath)
summary(NBAfit2, fit.measures=T)
```

```
## lavaan 0.6-7 ended normally after 21 iterations
## 
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 4
## 
## Number of observations 3810
## 
## Model Test User Model:
## 
## Test statistic 0.755
## Degrees of freedom 1
## P-value (Chi-square) 0.385
## 
## Model Test Baseline Model:
## 
## Test statistic 888.633
## Degrees of freedom 3
## P-value 0.000
## 
## User Model versus Baseline Model:
## 
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.001
## 
## Loglikelihood and Information Criteria:
## 
## Loglikelihood user model (H0) -29013.211
## Loglikelihood unrestricted model (H1) -29012.833
## 
## Akaike (AIC) 58034.422
## Bayesian (BIC) 58059.403
## Sample-size adjusted Bayesian (BIC) 58046.693
## 
## Root Mean Square Error of Approximation:
## 
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.041
## P-value RMSEA <= 0.05 0.987
## 
## Standardized Root Mean Square Residual:
## 
## SRMR 0.005
## 
## Parameter Estimates:
## 
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
## 
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## GP ~ 
## Age (b) 0.315 0.098 3.196 0.001
## PER ~ 
## GP (c) 0.093 0.003 31.417 0.000
## 
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .GP 645.883 14.798 43.646 0.000
## .PER 21.838 0.500 43.646 0.000
## 
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ind 0.029 0.009 3.179 0.001
```
---

## Model comparison

```r
#install.packages('semTools')
require(semTools)
diff<-compareFit(NBAfit1, NBAfit2)
summary(diff)
```

```
## ################### Nested Model Comparison #########################
## Chi-Squared Difference Test
## 
##         Df   AIC   BIC Chisq Chisq diff Df diff Pr(>Chisq)
## NBAfit1  0 58036 58067 0.000                              
## NBAfit2  1 58034 58059 0.755      0.755       1     0.3849
## 
## ####################### Model Fit Indices ###########################
##         chisq df pvalue    cfi    tli        aic        bic rmsea  srmr
## NBAfit1 .000†        NA 1.000† 1.000  58035.667  58066.894  .000† .000†
## NBAfit2 .755   1   .385 1.000† 1.001† 58034.422† 58059.403† .000† .005 
## 
## ################## Differences in Fit Indices #######################
##                   df cfi   tli    aic   bic rmsea  srmr
## NBAfit2 - NBAfit1  1   0 0.001 -1.245 -7.49     0 0.005
```
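The nested comparison is a chi-square difference (likelihood-ratio) test: NBAfit2 is NBAfit1 with path a fixed to 0, so the difference in chi-square is referred to a chi-square distribution with 1 df. The sketch below reproduces the p-value in the table from the printed statistics; lavaan's own `lavTestLRT()` (also available as `anova()`) should report the same test.

```r
# Chi-square difference test for nested models
chisq_full  <- 0;     df_full  <- 0  # NBAfit1 (just identified)
chisq_restr <- 0.755; df_restr <- 1  # NBAfit2 (a fixed to 0)

d_chisq <- chisq_restr - chisq_full
d_df    <- df_restr - df_full
p_value <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
round(p_value, 4)  # 0.3849, as in the Pr(>Chisq) column
```

A non-significant difference means that dropping the direct path does not noticeably worsen fit, and the lower AIC and BIC of NBAfit2 point the same way.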

---

## Respecification of the model

---

## Estimating the model

```r
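NBAmod3<-'
GP~b*Age
PER~a*Age+c*GP
# New outcome: PCT regressed on PER (path d)
PCT~d*PER
# Indirect effects of Age on PCT: via GP and PER (ind1) and via PER only (ind2)
ind1 := b*c*d
ind2 := a*d
# Total effect of Age on PCT (the model has no direct Age -> PCT path)
tot := ind1 + ind2
'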
NBAfit3<-sem(NBAmod3, data=NBAPath)
summary(NBAfit3, fit.measures=T)
```

```
## lavaan 0.6-7 ended normally after 30 iterations
## 
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 7
## 
## Number of observations 3810
## 
## Model Test User Model:
## 
## Test statistic 87.884
## Degrees of freedom 2
## P-value (Chi-square) 0.000
## 
## Model Test Baseline Model:
## 
## Test statistic 999.296
## Degrees of freedom 6
## P-value 0.000
## 
## User Model versus Baseline Model:
## 
## Comparative Fit Index (CFI) 0.914
## Tucker-Lewis Index (TLI) 0.741
## 
## Loglikelihood and Information Criteria:
## 
## Loglikelihood user model (H0) -27272.876
## Loglikelihood unrestricted model (H1) -27228.934
## 
## Akaike (AIC) 54559.752
## Bayesian (BIC) 54603.469
## Sample-size adjusted Bayesian (BIC) 54581.227
## 
## Root Mean Square Error of Approximation:
## 
## RMSEA 0.106
## 90 Percent confidence interval - lower 0.088
## 90 Percent confidence interval - upper 0.126
## P-value RMSEA <= 0.05 0.000
## 
## Standardized Root Mean Square Residual:
## 
## SRMR 0.047
## 
## Parameter Estimates:
## 
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
## 
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## GP ~ 
## Age (b) 0.315 0.098 3.196 0.001
## PER ~ 
## Age (a) 0.016 0.018 0.869 0.385
## GP (c) 0.093 0.003 31.333 0.000
## PCT ~ 
## PER (d) 0.002 0.000 4.780 0.000
## 
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .GP 645.883 14.798 43.646 0.000
## .PER 21.834 0.500 43.646 0.000
## .PCT 0.023 0.001 43.646 0.000
## 
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ind1 0.000 0.000 2.647 0.008
## ind2 0.000 0.000 0.855 0.393
## tot 0.000 0.000 2.015 0.044
```
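In this output the model chi-square equals twice the gap between the unrestricted (H1) and user-model (H0) log-likelihoods, i.e. a likelihood-ratio test of NBAfit3 against the saturated model. The check below uses the printed log-likelihoods.

```r
# Model chi-square as a likelihood-ratio test against the saturated model:
#   chisq = 2 * (LL_H1 - LL_H0)
ll_h0 <- -27272.876  # user model (NBAfit3)
ll_h1 <- -27228.934  # unrestricted (saturated) model
chisq <- 2 * (ll_h1 - ll_h0)
df_m  <- 2           # degrees of freedom of NBAfit3
p     <- pchisq(chisq, df = df_m, lower.tail = FALSE)
chisq  # 87.884, the "Test statistic" above
p      # far below .001, hence the printed 0.000
```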

---

## Parameter estimates

```r
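# boot.ci.type only takes effect when the model was fitted with se = "bootstrap";
# NBAfit3 uses standard SEs, so the intervals below are normal-theory (est +/- 1.96*SE)
parameterestimates(NBAfit3, boot.ci.type ='bca.simple', standardized = TRUE)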
```

```
##     lhs op       rhs label     est     se      z pvalue ci.lower ci.upper
## 1    GP  ~       Age     b   0.315  0.098  3.196  0.001    0.122    0.507
## 2   PER  ~       Age     a   0.016  0.018  0.869  0.385   -0.020    0.051
## 3   PER  ~        GP     c   0.093  0.003 31.333  0.000    0.087    0.099
## 4   PCT  ~       PER     d   0.002  0.000  4.780  0.000    0.001    0.003
## 5    GP ~~        GP       645.883 14.798 43.646  0.000  616.879  674.887
## 6   PER ~~       PER        21.834  0.500 43.646  0.000   20.853   22.814
## 7   PCT ~~       PCT         0.023  0.001 43.646  0.000    0.022    0.025
## 8   Age ~~       Age        17.498  0.000     NA     NA   17.498   17.498
## 9  ind1 :=     b*c*d  ind1   0.000  0.000  2.647  0.008    0.000    0.000
## 10 ind2 :=       a*d  ind2   0.000  0.000  0.855  0.393    0.000    0.000
## 11  tot := ind1+ind2   tot   0.000  0.000  2.015  0.044    0.000    0.000
##     std.lv std.all std.nox
## 1    0.315   0.052   0.012
## 2    0.016   0.013   0.003
## 3    0.093   0.453   0.453
## 4    0.002   0.077   0.077
## 5  645.883   0.997   0.997
## 6   21.834   0.794   0.794
## 7    0.023   0.994   0.994
## 8   17.498   1.000  17.498
## 9    0.000   0.002   0.000
## 10   0.000   0.001   0.000
## 11   0.000   0.003   0.001
```
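The defined effects are products of path coefficients. In unstandardized units they are tiny (for ind1 roughly 0.315 × 0.093 × 0.002 ≈ 0.00006, because PCT is on a proportion scale), which is why they print as 0.000. Multiplying the `std.all` paths instead reproduces the standardized indirect and total effects in the table above.

```r
# Standardized (std.all) path coefficients from the table above
b      <- 0.052  # GP ~ Age
a      <- 0.013  # PER ~ Age
c_path <- 0.453  # PER ~ GP
d      <- 0.077  # PCT ~ PER

ind1 <- b * c_path * d  # Age -> GP -> PER -> PCT
ind2 <- a * d           # Age -> PER -> PCT
round(c(ind1 = ind1, ind2 = ind2, tot = ind1 + ind2), 3)
# ind1 0.002, ind2 0.001, tot 0.003 (matching std.all for rows 9-11)
```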

---

## Model building

.center[
<img src="Loop.png", width = "50%">
]

---

## Bootstrapping our model

```r
# bootstrapLavaan() is provided by lavaan itself; no additional package is needed
# Results vary slightly between runs unless a seed is set with set.seed()
boot<-bootstrapLavaan(NBAfit3, R=1000)
summary(boot)
```

```
##        b                 a                  c                 d            
##  Min.   :0.04561   Min.   :-0.03977   Min.   :0.08148   Min.   :0.0009711  
##  1st Qu.:0.25156   1st Qu.: 0.00430   1st Qu.:0.09061   1st Qu.:0.0019390  
##  Median :0.31844   Median : 0.01583   Median :0.09330   Median :0.0022062  
##  Mean   :0.31566   Mean   : 0.01602   Mean   :0.09319   Mean   :0.0022351  
##  3rd Qu.:0.37691   3rd Qu.: 0.02838   3rd Qu.:0.09580   3rd Qu.:0.0025510  
##  Max.   :0.66078   Max.   : 0.07148   Max.   :0.10923   Max.   :0.0035743  
##      GP~~GP         PER~~PER        PCT~~PCT      
##  Min.   :611.1   Min.   :19.42   Min.   :0.02203  
##  1st Qu.:638.3   1st Qu.:21.27   1st Qu.:0.02319  
##  Median :645.1   Median :21.82   Median :0.02350  
##  Mean   :645.7   Mean   :21.85   Mean   :0.02351  
##  3rd Qu.:653.2   3rd Qu.:22.39   3rd Qu.:0.02380  
##  Max.   :677.4   Max.   :24.80   Max.   :0.02511
```
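The `bootstrapLavaan()` result holds one column of estimates per parameter, so percentile intervals can be read off its columns (for example with `apply(boot, 2, quantile, probs = c(.025, .975))`). The self-contained sketch below shows the same percentile-bootstrap logic for an indirect effect a*b, but it swaps in ordinary regressions (`lm()`) on simulated data instead of lavaan; the data-generating values are arbitrary assumptions for illustration only.

```r
set.seed(2021)

# Simulated data (assumed values): X -> M -> Y with a = 0.5 and b = 0.5
n <- 500
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.5 * m + rnorm(n)
dat <- data.frame(x, m, y)

# Indirect effect a*b estimated from two regressions
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

# Nonparametric bootstrap: resample rows with replacement and re-estimate a*b
n_boot  <- 1000
boot_ab <- replicate(n_boot, indirect(dat[sample(n, replace = TRUE), ]))

est <- indirect(dat)
ci  <- quantile(boot_ab, probs = c(0.025, 0.975))  # percentile 95% CI
est
ci
```

An interval that excludes zero is the usual bootstrap evidence for an indirect effect; percentile intervals make no normality assumption about the product a*b, whose sampling distribution is often skewed.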

---
## Important aspects: theory

- Difference between moderation, mediation and conditional process analysis 
- Exogenous and endogenous variables 
- Interpretation of the predictors 
- Calculation of free parameters and total parameters 
- Model identification: three types of identification (under-, just- and over-identified) 
- Overall fit of the model

---

## Important aspects: practice

- Building a path model: both continuous and categorical exogenous variables 
- Calculation of the direct and indirect pathways for predictors of interest 
- Adding an interaction to path model 
- Interpretation of the coefficients 
- Getting fit indices of the model

---
## Literature

Chapters 1 to 5 of Principles and Practice of Structural Equation Modeling by Rex B. Kline

Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach by Andrew F. Hayes

Latent Variable Modeling Using R: A Step-by-Step Guide by A. Alexander Beaujean

---

# Thank you for your attention