class: center, inverse
background-image: url("40084978.jpg")

# Lecture 3: Path models (Structural Equation Modelling)

### Dr Nemanja Vaci

---

<style type="text/css">
body, td {
  font-size: 15px;
}
code.r{
  font-size: 15px;
}
pre {
  font-size: 20px
}
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 80% !important;
}
</style>

## Press record

---

## Corrections from the previous lecture

---

## R code

[Link](https://nvaci.github.io/Lecture_3_code/Lecture3_Rcode.html)

---

## Structural equation modelling (SEM)

General framework that uses various models to test relationships among variables <br/>
Other terms: covariance structure analysis, covariance structure modelling, __causal modelling__<br/>

a) Regression model <br/>
- Legendre (1805) or Gauss (1809)<br/>

b) Confirmatory factor analysis<br/>
- Howe (1955), Anderson & Rubin (1956), Karl Jöreskog (1963)<br/>
- Latent factor structure - Spearman (1904, 1927) <br/>

c) Path model <br/>
- Sewall Wright (1918, 1921)<br/>

d) Structural equation modelling: confirmatory factor analysis + path model

---

## Terminology

Manifest variables: observed/collected variables <br/> <br/>
Latent variables: inferred measures - hypothetical constructs <br/>
- Indicator variables: measures used to infer the latent concepts <br/> <br/>

Endogenous variables: dependent outcomes <br/> <br/>
Exogenous variables: predictors <br/> <br/> <br/>
Focus on the covariance structure instead of the means <br/> <br/>

---

## Graphical representation of the model

.center[
<img src="graphical.png", width = "120%"> <br/>
]

---

## Prerequisites

Theory: Strong theoretical assumptions that can be tested using the data and the specification of the model <br/><br/>
Data: large samples; the N:p rule suggests a ratio of sample size to estimated parameters of about 20:1, and more data usually gives better estimates.
<br/>
- We are not that interested in significance: <br/><br/>
a) The overall behaviour of the model is more interesting<br/><br/>
b) More data means a higher probability of significant results, even for weak effects<br/><br/>
c) Latent models are estimated by anchoring on indicator variables, so different estimation choices can result in different patterns<br/><br/>
d) Significance is not that interesting theoretically<br/>

---

## Path analysis

Moderation, Mediation and Conditional Process Analysis <br/><br/>

---

## Path analysis

Moderation: An association between two variables is moderated when its size or sign depends on the values of a third variable <br/><br/>

.center[
<img src="moder.png", width = "80%"> <br/>
]

---

## Path analysis

Mediation: Variation in the X variable influences variation in one or more mediators, which in turn results in variation in Y <br/><br/>

.center[
<img src="mediat.png", width = "80%"> <br/>
]

---

## Path analysis

Conditional process analysis: combination of mediation and moderation <br/>

.center[
<img src="CPA.png", width = "80%"> <br/>
]

---

## Different possibilities

.center[
<img src="otherMod.png", width = "70%"> <br/>
]

---

## Comment on causality and correlation

SEM does not test causal relationships; there are no statistical procedures that do this <br/><br/><br/>
These are mathematical tools that help us to understand the data, extract the signal and interpret it <br/> <br/>

---

## Lavaan in R syntax

.center[
<img src="Rsyntax.png", width = "60%"> <br/>
]

---

## Linear regression in Lavaan

```r
#install.packages('lavaan')
require(lavaan)
model1<-'
Height~1+Age #regression
'
fit1<-sem(model1, data=Babies)
```

.center[
<img src="Regression.png", width = "60%"> <br/>
]

---

## Summary of the model

<style type="text/css">
pre {
  max-height: 300px;
  overflow-y: auto;
}
pre[class] {
  max-height: 100px;
}
</style>

<style type="text/css">
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}
</style>

```r
summary(fit1)
```

```
## lavaan 0.6-7 ended normally after 14
iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 3 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age 0.143 0.064 2.251 0.024 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .Height 57.026 1.176 48.509 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.352 3.868 7.071 0.000 ``` --- ## Regression ```r lm1<-lm(Height~Age, data=Babies) summary(lm1) ``` ``` ## ## Call: ## lm(formula = Height ~ Age, data = Babies) ## ## Residuals: ## Min 1Q Median 3Q Max ## -14.4765 -4.1601 -0.3703 3.9198 12.3842 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 57.02580 1.18751 48.021 <2e-16 *** ## Age 0.14317 0.06426 2.228 0.0282 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Residual standard error: 5.283 on 98 degrees of freedom ## Multiple R-squared: 0.04821, Adjusted R-squared: 0.0385 ## F-statistic: 4.964 on 1 and 98 DF, p-value: 0.02817 ``` --- ## Visualisation ```r #install.packages('tidySEM') require('tidySEM') graph_sem(fit1, variance_diameter=.2) ``` <img src="PathModel_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- ## Interactions 1 We cannot add an interaction using _*_ sign as we would have in normal regression ```r model2<-' Height~1+Age*Weight ' fit2<-sem(model2, data=Babies) summary(fit2) ``` ``` ## lavaan 0.6-7 ended normally after 14 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 3 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Weight (Age) 0.004 0.001 4.209 0.000 ## ## Intercepts: ## Estimate Std.Err z-value P(>|z|) ## .Height 41.923 4.180 10.030 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 24.412 3.452 7.071 0.000 ``` --- ## Interactions 2 We cannot add an interaction using _*_ sign as we would have in normal regression We need to create a new variable that codes the interaction ```r Babies$AgeWeight = Babies$Age * Babies$Weight Babies$AgeGender = Babies$Age * ifelse(Babies$Gender=='Girls',0,1) head(Babies) ``` ``` ## Age Weight Height Gender Crawl TummySleep PhysicalSt AgeWeight AgeGender ## 1 4 3667.525 61.47215 Girls 0 1 32.57839 14670.10 0 ## 2 7 3871.738 56.05987 Girls 1 0 25.43127 27102.16 0 ## 3 22 4339.391 59.28653 Boys 1 1 29.91834 95466.60 22 ## 4 26 4448.422 55.17084 Boys 0 1 24.71719 115658.98 26 ## 5 24 4309.178 60.26487 Boys 1 0 30.41685 103420.28 24 ## 6 11 4365.727 54.55308 Boys 1 1 34.54519 48023.00 11 ``` --- ## 
Interactions 3

We cannot add an interaction using the __*__ sign as we would in a normal regression

We need to create a new variable that codes the interaction

```r
model2<-'
Height~1+Age+Weight + AgeWeight
'
fit2<-sem(model2, data=Babies)
summary(fit2)
```

```
## lavaan 0.6-7 ended normally after 22 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age               0.691    0.493    1.402    0.161
##     Weight            0.007    0.002    2.889    0.004
##     AgeWeight        -0.000    0.000   -1.141    0.254
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           30.724    9.205    3.338    0.001
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           22.925    3.242    7.071    0.000
```

---

## Theory

Development of muscles in the early months of infancy supports physical strength: as time passes, infants become physically stronger. Infants that experience stronger early development, measured through their height, also show higher levels of physical strength.
<br/> <br/>
Hypothetical assumptions: <br/><br/>
Positive effect of age on physical strength <br/><br/>
The effect of age on physical strength is mediated by the babies' height <br/>

---

## Specification of the model

Representation of our hypothetical assumptions in the form of a structural equation model

Let's check what our Babies think:

.center[
<img src="Image1.png", width = "80%"> <br/>
]

---

## Representation of the model

.center[
<img src="Image2.png", width = "80%"> <br/>
]

---

## Estimation of the model

```r
modelStrength<-'
Height~Age
PhysicalSt~Age+Height
'
fitStr1<-sem(modelStrength, data=Babies)
summary(fitStr1)
```

```
## lavaan 0.6-7 ended normally after 15 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~
##     Age              -0.007    0.061   -0.120    0.905
##     Height            0.425    0.094    4.518    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.183    3.420    7.071    0.000
```

---

## Visualisation of the results

```r
require(semPlot)
semPaths(fitStr1, 'model','est', edge.label.cex = 1.1)
```

<img src="PathModel_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

## What is the effect of Age?
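The direct, indirect and total effects of Age can be computed by hand from the path estimates. The following is a minimal sketch, not part of the lecture code: it uses the rounded estimates from the `summary(fitStr1)` output, the variable names (`path_a`, `se_b`, and so on) are illustrative, and the standard error is a first-order delta-method (Sobel) approximation that assumes the two path estimates are uncorrelated.

```r
# Rounded estimates copied from summary(fitStr1); names are illustrative
path_a <- -0.007  # PhysicalSt ~ Age (direct effect)
path_b <-  0.143  # Height ~ Age
path_c <-  0.425  # PhysicalSt ~ Height
se_b   <-  0.064  # standard error of path_b
se_c   <-  0.094  # standard error of path_c

direct   <- path_a
indirect <- path_b * path_c
total    <- direct + indirect

# First-order delta-method (Sobel) standard error of the indirect effect,
# assuming the estimates of path_b and path_c are uncorrelated
se_indirect <- sqrt(path_b^2 * se_c^2 + path_c^2 * se_b^2)
z_indirect  <- indirect / se_indirect

round(c(direct = direct, indirect = indirect, total = total,
        se_indirect = se_indirect, z_indirect = z_indirect), 3)
```

Because the inputs are rounded to three decimals, the results can differ in the last digit from what lavaan reports for parameters defined with `:=`.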
.center[ <img src="Image2.png", width = "70%"> ] Direct effect: `\(a = -0.01\)`<br/> Indirect effect: `\(b*c= 0.14 * 0.42\)`<br/> Total effect: `\(a+(b*c)=-0.01+(0.14*0.42)\)` --- ## Age effects ```r modelStrength<-' Height~b*Age PhysicalSt~a*Age+c*Height ##quantification of effects dir := a ind := b*c tot := dir+ind ' fitStr1<-sem(modelStrength, data=Babies) summary(fitStr1) ``` ``` ## lavaan 0.6-7 ended normally after 15 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 5 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age (b) 0.143 0.064 2.251 0.024 ## PhysicalSt ~ ## Age (a) -0.007 0.061 -0.120 0.905 ## Height (c) 0.425 0.094 4.518 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.352 3.868 7.071 0.000 ## .PhysicalSt 24.183 3.420 7.071 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## dir -0.007 0.061 -0.120 0.905 ## ind 0.061 0.030 2.015 0.044 ## tot 0.053 0.066 0.815 0.415 ``` --- ## More pathways <img src="image4.png" width="70%" style="display: block; margin: auto;" /> --- ## Lets modify model ```r modelStrength2<-' Height~b*Age Weight~e*Age PhysicalSt~a*Age+c*Height+f*Weight ##quantification of effects dir := a ind := b*c+e*f tot := dir+ind ' ``` --- ## Results ```r fitStr2<-sem(modelStrength2, data=Babies) ``` ``` ## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan ## WARNING: some observed variances are (at least) a factor 1000 times larger than ## others; use varTable(fit) to investigate ``` ```r summary(fitStr2) ``` ``` ## lavaan 0.6-7 ended normally after 32 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 8 ## ## Number of observations 100 
## ## Model Test User Model: ## ## Test statistic 16.364 ## Degrees of freedom 1 ## P-value (Chi-square) 0.000 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age (b) 0.143 0.064 2.251 0.024 ## Weight ~ ## Age (e) 2.439 5.774 0.422 0.673 ## PhysicalSt ~ ## Age (a) -0.005 0.061 -0.075 0.940 ## Height (c) 0.387 0.094 4.138 0.000 ## Weight (f) 0.001 0.001 1.032 0.302 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.352 3.868 7.071 0.000 ## .Weight 225306.345 31863.129 7.071 0.000 ## .PhysicalSt 23.966 3.389 7.071 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## dir -0.005 0.061 -0.075 0.940 ## ind 0.058 0.029 2.014 0.044 ## tot 0.053 0.065 0.826 0.409 ``` --- ## Additional information ```r parameterestimates(fitStr2, boot.ci.type ='bca.simple', standardized = T) ``` ``` ## lhs op rhs label est se z pvalue ci.lower ## 1 Height ~ Age b 0.143 0.064 2.251 0.024 0.018 ## 2 Weight ~ Age e 2.439 5.774 0.422 0.673 -8.878 ## 3 PhysicalSt ~ Age a -0.005 0.061 -0.075 0.940 -0.124 ## 4 PhysicalSt ~ Height c 0.387 0.094 4.138 0.000 0.204 ## 5 PhysicalSt ~ Weight f 0.001 0.001 1.032 0.302 -0.001 ## 6 Height ~~ Height 27.352 3.868 7.071 0.000 19.771 ## 7 Weight ~~ Weight 225306.345 31863.129 7.071 0.000 162855.760 ## 8 PhysicalSt ~~ PhysicalSt 23.966 3.389 7.071 0.000 17.323 ## 9 Age ~~ Age 67.588 0.000 NA NA 67.588 ## 10 dir := a dir -0.005 0.061 -0.075 0.940 -0.124 ## 11 ind := b*c+e*f ind 0.058 0.029 2.014 0.044 0.002 ## 12 tot := dir+ind tot 0.053 0.065 0.826 0.409 -0.073 ## ci.upper std.lv std.all std.nox ## 1 0.268 0.143 0.220 0.027 ## 2 13.755 2.439 0.042 0.005 ## 3 0.115 -0.005 -0.007 -0.001 ## 4 0.571 0.387 0.389 0.389 ## 5 0.003 0.001 0.095 0.095 ## 6 34.933 27.352 0.952 0.952 ## 7 287756.930 225306.345 0.998 0.998 ## 8 30.609 23.966 0.840 0.840 ## 9 67.588 67.588 1.000 67.588 ## 10 
0.115 -0.005 -0.007 -0.001 ## 11 0.115 0.058 0.089 0.011 ## 12 0.180 0.053 0.082 0.010 ``` --- ## Categorical variables: exogenous <img src="image5.png" width="70%" style="display: block; margin: auto;" /> --- ## Results: categorical predictor exogenous ```r Babies$Gender=ifelse(Babies$Gender=='Girls',0,1) modelStrength3<-' Height~Age PhysicalSt~Age+Height+Gender ' fitStr3<-sem(modelStrength3, data=Babies) summary(fitStr3) ``` ``` ## lavaan 0.6-7 ended normally after 21 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 6 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 0.630 ## Degrees of freedom 1 ## P-value (Chi-square) 0.427 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age 0.143 0.064 2.251 0.024 ## PhysicalSt ~ ## Age -0.007 0.061 -0.116 0.907 ## Height 0.426 0.094 4.526 0.000 ## Gender 0.090 0.986 0.091 0.927 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.352 3.868 7.071 0.000 ## .PhysicalSt 24.181 3.420 7.071 0.000 ``` --- ## Categorical variables: endogenous If it is endogenous (being predicted), then we need to specify this as a categorical variable and use different estimator (WLSMV) <img src="image6.png" width="70%" style="display: block; margin: auto;" /> --- ## Results: categorical predictor 2 ```r modelStrength4<-' Height~Age Gender~Age PhysicalSt~Age+Height+Gender ' fitStr4<-sem(modelStrength4, ordered = c('Gender'),data=Babies) summary(fitStr4) ``` ``` ## lavaan 0.6-7 ended normally after 32 iterations ## ## Estimator DWLS ## Optimization method NLMINB ## Number of free parameters 10 ## ## Number of observations 100 ## ## Model Test User Model: ## Standard Robust ## Test Statistic 0.666 0.666 ## Degrees of freedom 1 1 ## P-value (Chi-square) 0.415 0.415 ## Scaling correction factor 1.000 ## Shift parameter 
-0.000
##        simple second-order correction
##
## Parameter Estimates:
##
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model        Unstructured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age               0.143    0.060    2.397    0.017
##   Gender ~
##     Age              -0.008    0.015   -0.545    0.586
##   PhysicalSt ~
##     Age              -0.009    0.071   -0.123    0.902
##     Height            0.425    0.098    4.322    0.000
##     Gender           -0.164    0.663   -0.248    0.804
##
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           57.026    1.172   48.649    0.000
##    .Gender            0.000
##    .PhysicalSt        5.678    5.801    0.979    0.328
##
## Thresholds:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     Gender|t1        -0.189    0.284   -0.666    0.505
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    4.513    6.061    0.000
##    .Gender            1.000
##    .PhysicalSt       24.156    3.551    6.803    0.000
##
## Scales y*:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     Gender            1.000
```

---

## Conditional process analysis

```r
modelStrengthCond<-'
Height~Age
PhysicalSt~Age+Height+AgeGender
'
```

<img src="image7.png" width="70%" style="display: block; margin: auto;" />

---

## Conditional process analysis: results

```r
fitStrCond<-sem(modelStrengthCond, data=Babies)
summary(fitStrCond)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          6
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 1.628
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.202
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age               0.143    0.064    2.251    0.024
##   PhysicalSt ~
##     Age              -0.008    0.066   -0.128    0.898
##     Height            0.425    0.094    4.524    0.000
##     AgeGender         0.002    0.053    0.043    0.966
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.182    3.420    7.071    0.000
```

---

## Interpretation of the predictors
```
## lavaan 0.6-7 ended normally after 15 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age        (b)    0.143    0.064    2.251    0.024
##   PhysicalSt ~
##     Age        (a)   -0.007    0.061   -0.120    0.905
##     Height     (c)    0.425    0.094    4.518    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.352    3.868    7.071    0.000
##    .PhysicalSt       24.183    3.420    7.071    0.000
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     dir              -0.007    0.061   -0.120    0.905
##     ind               0.061    0.030    2.015    0.044
##     tot               0.053    0.066    0.815    0.415
```

The model predicts that: babies who are 1 month older are on average taller by 0.143 cm <br/><br/>
Comparing babies of the same height, the model predicts that those who are 1 month older are on average weaker by 0.007 <br/>
Comparing babies of the same age, the model predicts that those who are 1 cm taller are on average stronger by 0.425 <br/>
Indirect effect: a 1-month increase in age is associated with a __b__ (0.143) cm increase in height, which in turn corresponds to an __ind__ (0.061 = b*c) increase in the strength of the babies

---

## Can the model be estimated?

Total number of parameters that we can estimate (the number of unique variances and covariances among the observed variables): `\(\frac{variables*(variables+1)}{2}\)` <br/>

.center[
<img src="Image2.png", width = "60%">
]

Three path coefficients <br/>
Two error variances <br/>
One independent variable variance <br/>

---

## Model identification

1. Underidentified: more free parameters than total possible parameters <br/> <br/>
2. Just-identified: equal number of free parameters and total possible parameters <br/><br/>
3.
Overidentified: fewer free parameters than total possible parameters <br/> <br/> <br/> Parameters can either be: free, fixed or constrained <br/> --- ## Fixing parameters ```r modelStrengthFix<-' Height~Age PhysicalSt~0 *Age+Height ' fitStr1Fix<-sem(modelStrengthFix, data=Babies) summary(fitStr1Fix) ``` ``` ## lavaan 0.6-7 ended normally after 18 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 4 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 0.014 ## Degrees of freedom 1 ## P-value (Chi-square) 0.905 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age 0.143 0.064 2.251 0.024 ## PhysicalSt ~ ## Age 0.000 ## Height 0.422 0.092 4.604 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.352 3.868 7.071 0.000 ## .PhysicalSt 24.186 3.420 7.071 0.000 ``` --- ## Constraining parameters ```r modelStrengthCons<-' Height~a*Age PhysicalSt~a*Age+Height ' fitStr1Cons<-sem(modelStrengthCons, data=Babies) summary(fitStr1Cons) ``` ``` ## lavaan 0.6-7 ended normally after 14 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 5 ## Number of equality constraints 1 ## ## Number of observations 100 ## ## Model Test User Model: ## ## Test statistic 2.882 ## Degrees of freedom 1 ## P-value (Chi-square) 0.090 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## Height ~ ## Age (a) 0.065 0.044 1.479 0.139 ## PhysicalSt ~ ## Age (a) 0.065 0.044 1.479 0.139 ## Height 0.400 0.094 4.271 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .Height 27.764 3.926 7.071 0.000 ## .PhysicalSt 24.520 3.468 7.071 0.000 ``` --- ## Overall fit: Chi-square test Measure of how well 
the model-implied covariance matrix fits the data covariance matrix <br/> <br/>
We would prefer not to reject the null hypothesis in this case <br/>

Assumptions: <br/>
Multivariate normality <br/>
N is sufficiently large (150+)<br/>
Parameters are not at a boundary or invalid (e.g. variance of zero)<br/><br/><br/>

With large samples the test is sensitive to small misfits <br/>
Non-normality induces bias <br/>

---

## Overall fit: Other indices

.center[
<img src="fitInd.png", width = "60%">
]

---

## Fit indices of our model

```r
summary(fitStr1Cons, fit.measures=TRUE)
```

```
## lavaan 0.6-7 ended normally after 14 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          5
##   Number of equality constraints                     1
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 2.882
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.090
##
## Model Test Baseline Model:
##
##   Test statistic                                24.179
##   Degrees of freedom                                 3
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.911
##   Tucker-Lewis Index (TLI)                       0.733
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)               -609.950
##   Loglikelihood unrestricted model (H1)       -608.509
##
##   Akaike (AIC)                                1227.899
##   Bayesian (BIC)                              1238.320
##   Sample-size adjusted Bayesian (BIC)         1225.687
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.137
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.334
##   P-value RMSEA <= 0.05                          0.130
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.056
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Height ~
##     Age        (a)    0.065    0.044    1.479    0.139
##   PhysicalSt ~
##     Age        (a)    0.065    0.044    1.479    0.139
##     Height            0.400    0.094    4.271    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Height           27.764    3.926    7.071    0.000
##
.PhysicalSt       24.520    3.468    7.071    0.000
```

---

## Fit indices for our model

TLI: a fit of .95 indicates that the fitted model improves the fit by 95% relative to the null model; works OK with smaller sample sizes <br/> <br/>
CFI: Same as TLI, but not very sensitive to sample size <br/> <br/>
RMSEA: misfit per degree of freedom, based on the discrepancy between the sample covariance matrix and the hypothesized model. Residual-based measures are hard to interpret when the variables are on different scales; in that case we can check the standardised root mean square residual (SRMR)<br/><br/>

---

class: inverse, middle, center

# Practical aspect

---

## Getting the data

Influence of the media upon subsequent actions: [Link](http://finzi.psych.upenn.edu/library/psych/html/tal_or.html)

```r
NBAPath<-read.table('NBApath.txt', sep='\t', header=T)
```

---

## What is in the data?

```r
summary(NBAPath)
```

```
##      TEAM                PCT            Player              Pos
##  Length:3810        Min.   :0.1061   Length:3810        Length:3810
##  Class :character   1st Qu.:0.3780   Class :character   Class :character
##  Mode  :character   Median :0.5000   Mode  :character   Mode  :character
##                     Mean   :0.4905
##                     3rd Qu.:0.6098
##                     Max.   :0.8902
##       Age              GP             PER
##  Min.   :18.00   Min.   : 1.00   Min.   :-13.10
##  1st Qu.:23.00   1st Qu.:34.00   1st Qu.: 10.00
##  Median :25.00   Median :61.00   Median : 12.80
##  Mean   :26.05   Mean   :53.73   Mean   : 12.75
##  3rd Qu.:29.00   3rd Qu.:77.00   3rd Qu.: 15.80
##  Max.   :43.00   Max.   :82.00   Max.
: 35.20 ``` --- ## Correlation matrix ```r cor(NBAPath[,c(2,5:7)]) ``` ``` ## PCT Age GP PER ## PCT 1.00000000 0.14304325 0.08849459 0.07720633 ## Age 0.14304325 1.00000000 0.05170204 0.03598025 ## GP 0.08849459 0.05170204 1.00000000 0.45360129 ## PER 0.07720633 0.03598025 0.45360129 1.00000000 ``` --- ## Univariate plots ```r par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16) plot(density(NBAPath$PER), main='') plot(density(NBAPath$PCT), main='') ``` <img src="PathModel_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" /> --- ## Bivariate plots ```r par(mfrow=c(1,2), bty='n',mar = c(5, 4, .1, .1), cex=1.1, pch=16) plot(NBAPath$Age, NBAPath$PER) plot(NBAPath$GP, NBAPath$PER) ``` <img src="PathModel_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" /> --- ## Specification of the model <img src="image8.png" width="70%" style="display: block; margin: auto;" /> --- ## Identification of the model Three path coefficients <br/> Two error variances <br/> One independent variable variance <br/><br/><br/> Number of distinct parameters that we can estimate: 3*4/2 = 6<br/><br/> Just identified model<br/> --- ## Estimating the model ```r NBAmod1<-' GP~b*Age PER~a*Age+c*GP dir := a ind := b*c tot := dir + ind ' NBAfit1<-sem(NBAmod1, data=NBAPath) summary(NBAfit1) ``` ``` ## lavaan 0.6-7 ended normally after 21 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 5 ## ## Number of observations 3810 ## ## Model Test User Model: ## ## Test statistic 0.000 ## Degrees of freedom 0 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## GP ~ ## Age (b) 0.315 0.098 3.196 0.001 ## PER ~ ## Age (a) 0.016 0.018 0.869 0.385 ## GP (c) 0.093 0.003 31.333 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .GP 645.883 14.798 43.646 0.000 ## .PER 
21.834 0.500 43.646 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## dir 0.016 0.018 0.869 0.385 ## ind 0.029 0.009 3.179 0.001 ## tot 0.045 0.020 2.222 0.026 ``` --- ## Explained variance - R2 When just identified model, we cannot use global indices of model fit <br/> We need to use standard measures <br/> ```r inspect(NBAfit1, 'r2') ``` ``` ## GP PER ## 0.003 0.206 ``` ```r -2*logLik(NBAfit1) ``` ``` ## 'log Lik.' 58025.67 (df=5) ``` ```r AIC(NBAfit1) ``` ``` ## [1] 58035.67 ``` --- ## Respecification of the model <img src="image9.png" width="70%" style="display: block; margin: auto;" /> --- ## Estimating the model ```r NBAmod2<-' GP~b*Age PER~c*GP ind := b*c ' NBAfit2<-sem(NBAmod2, data=NBAPath) summary(NBAfit2, fit.measures=T) ``` ``` ## lavaan 0.6-7 ended normally after 21 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 4 ## ## Number of observations 3810 ## ## Model Test User Model: ## ## Test statistic 0.755 ## Degrees of freedom 1 ## P-value (Chi-square) 0.385 ## ## Model Test Baseline Model: ## ## Test statistic 888.633 ## Degrees of freedom 3 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 1.000 ## Tucker-Lewis Index (TLI) 1.001 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -29013.211 ## Loglikelihood unrestricted model (H1) -29012.833 ## ## Akaike (AIC) 58034.422 ## Bayesian (BIC) 58059.403 ## Sample-size adjusted Bayesian (BIC) 58046.693 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.000 ## 90 Percent confidence interval - lower 0.000 ## 90 Percent confidence interval - upper 0.041 ## P-value RMSEA <= 0.05 0.987 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.005 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## GP ~ ## Age (b) 0.315 0.098 3.196 0.001 ## 
PER ~ ## GP (c) 0.093 0.003 31.417 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .GP 645.883 14.798 43.646 0.000 ## .PER 21.838 0.500 43.646 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ind 0.029 0.009 3.179 0.001 ``` --- ## Model comparison ```r #install.packages('semTools') require(semTools) diff<-compareFit(NBAfit1, NBAfit2) summary(diff) ``` ``` ## ################### Nested Model Comparison ######################### ## Chi-Squared Difference Test ## ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) ## NBAfit1 0 58036 58067 0.000 ## NBAfit2 1 58034 58059 0.755 0.755 1 0.3849 ## ## ####################### Model Fit Indices ########################### ## chisq df pvalue cfi tli aic bic rmsea srmr ## NBAfit1 .000† NA 1.000† 1.000 58035.667 58066.894 .000† .000† ## NBAfit2 .755 1 .385 1.000† 1.001† 58034.422† 58059.403† .000† .005 ## ## ################## Differences in Fit Indices ####################### ## df cfi tli aic bic rmsea srmr ## NBAfit2 - NBAfit1 1 0 0.001 -1.245 -7.49 0 0.005 ``` --- ## Respecification of the model <img src="image10.png" width="70%" style="display: block; margin: auto;" /> --- ## Estimating the model ```r NBAmod3<-' GP~b*Age PER~a*Age+c*GP PCT~d*PER ind1 := b*c*d ind2 := a*d tot := ind1 + ind2 ' NBAfit3<-sem(NBAmod3, data=NBAPath) summary(NBAfit3, fit.measures=T) ``` ``` ## lavaan 0.6-7 ended normally after 30 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of free parameters 7 ## ## Number of observations 3810 ## ## Model Test User Model: ## ## Test statistic 87.884 ## Degrees of freedom 2 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 999.296 ## Degrees of freedom 6 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.914 ## Tucker-Lewis Index (TLI) 0.741 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -27272.876 ## Loglikelihood unrestricted model (H1) 
-27228.934 ## ## Akaike (AIC) 54559.752 ## Bayesian (BIC) 54603.469 ## Sample-size adjusted Bayesian (BIC) 54581.227 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.106 ## 90 Percent confidence interval - lower 0.088 ## 90 Percent confidence interval - upper 0.126 ## P-value RMSEA <= 0.05 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.047 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Regressions: ## Estimate Std.Err z-value P(>|z|) ## GP ~ ## Age (b) 0.315 0.098 3.196 0.001 ## PER ~ ## Age (a) 0.016 0.018 0.869 0.385 ## GP (c) 0.093 0.003 31.333 0.000 ## PCT ~ ## PER (d) 0.002 0.000 4.780 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .GP 645.883 14.798 43.646 0.000 ## .PER 21.834 0.500 43.646 0.000 ## .PCT 0.023 0.001 43.646 0.000 ## ## Defined Parameters: ## Estimate Std.Err z-value P(>|z|) ## ind1 0.000 0.000 2.647 0.008 ## ind2 0.000 0.000 0.855 0.393 ## tot 0.000 0.000 2.015 0.044 ``` --- ## Parameter estimates ```r parameterestimates(NBAfit3, boot.ci.type ='bca.simple', standardized = T) ``` ``` ## lhs op rhs label est se z pvalue ci.lower ci.upper ## 1 GP ~ Age b 0.315 0.098 3.196 0.001 0.122 0.507 ## 2 PER ~ Age a 0.016 0.018 0.869 0.385 -0.020 0.051 ## 3 PER ~ GP c 0.093 0.003 31.333 0.000 0.087 0.099 ## 4 PCT ~ PER d 0.002 0.000 4.780 0.000 0.001 0.003 ## 5 GP ~~ GP 645.883 14.798 43.646 0.000 616.879 674.887 ## 6 PER ~~ PER 21.834 0.500 43.646 0.000 20.853 22.814 ## 7 PCT ~~ PCT 0.023 0.001 43.646 0.000 0.022 0.025 ## 8 Age ~~ Age 17.498 0.000 NA NA 17.498 17.498 ## 9 ind1 := b*c*d ind1 0.000 0.000 2.647 0.008 0.000 0.000 ## 10 ind2 := a*d ind2 0.000 0.000 0.855 0.393 0.000 0.000 ## 11 tot := ind1+ind2 tot 0.000 0.000 2.015 0.044 0.000 0.000 ## std.lv std.all std.nox ## 1 0.315 0.052 0.012 ## 2 0.016 0.013 0.003 ## 3 0.093 0.453 0.453 ## 4 0.002 0.077 0.077 ## 5 645.883 0.997 0.997 ## 6 21.834 0.794 0.794 ## 7 0.023 0.994 
0.994
## 8   17.498   1.000  17.498
## 9    0.000   0.002   0.000
## 10   0.000   0.001   0.000
## 11   0.000   0.003   0.001
```

---

## Model building

.center[
<img src="Loop.png", width = "50%">
]

---

## Bootstrapping our model

```r
#install.packages('bootstrap')
require(bootstrap)
```

```
## Loading required package: bootstrap
```

```
## Warning: package 'bootstrap' was built under R version 4.0.3
```

```r
# bootstrapLavaan() is provided by the lavaan package
boot<-bootstrapLavaan(NBAfit3, R=1000)
summary(boot)
```

```
##        b                 a                  c                 d
##  Min.   :0.04561   Min.   :-0.03977   Min.   :0.08148   Min.   :0.0009711
##  1st Qu.:0.25156   1st Qu.: 0.00430   1st Qu.:0.09061   1st Qu.:0.0019390
##  Median :0.31844   Median : 0.01583   Median :0.09330   Median :0.0022062
##  Mean   :0.31566   Mean   : 0.01602   Mean   :0.09319   Mean   :0.0022351
##  3rd Qu.:0.37691   3rd Qu.: 0.02838   3rd Qu.:0.09580   3rd Qu.:0.0025510
##  Max.   :0.66078   Max.   : 0.07148   Max.   :0.10923   Max.   :0.0035743
##      GP~~GP         PER~~PER        PCT~~PCT
##  Min.   :611.1   Min.   :19.42   Min.   :0.02203
##  1st Qu.:638.3   1st Qu.:21.27   1st Qu.:0.02319
##  Median :645.1   Median :21.82   Median :0.02350
##  Mean   :645.7   Mean   :21.85   Mean   :0.02351
##  3rd Qu.:653.2   3rd Qu.:22.39   3rd Qu.:0.02380
##  Max.   :677.4   Max.   :24.80   Max.   :0.02511
```

---

## Important aspects: theory

- Difference between moderation, mediation and conditional process analysis <br/>
- Exogenous and endogenous variables <br/>
- Interpretation of the predictors <br/>
- Calculation of free parameters and total parameters <br/>
- Model identification: three types of identification <br/>
- Overall fit of the model

---

## Important aspects: practice

- Building a path model: both continuous and categorical exogenous variables <br/>
- Calculation of the direct and indirect pathways for predictors of interest <br/>
- Adding an interaction to a path model <br/>
- Interpretation of the coefficients <br/>
- Getting fit indices of the model <br/>

---

## Literature

Chapters 1 to 5 of Principles and Practice of Structural Equation Modeling by Rex B. Kline <br/><br/>
Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach by Andrew F. Hayes <br/><br/>
Latent Variable Modeling Using R: A Step-by-Step Guide by A. Alexander Beaujean <br/><br/>

---

# Thank you for your attention