class: center, middle, inverse, title-slide

.title[
# Structural Equation Modelling (SEM)
]
.author[
### Nemanja Vaci
]
.date[
### 2023-03-23
]

---

<style type="text/css">
body, td {
  font-size: 15px;
}
code.r{
  font-size: 15px;
}
pre {
  font-size: 20px
}
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 80% !important;
}
</style>

## Press record

---

## Intended learning outcomes

Motivate the use of path and CFA models <br/>

Argue how they connect to other models that we covered in the course. <br/><br/>

Calculate the number of free parameters and degrees of freedom of the proposed model. <br/><br/>

Build a model in the R statistical environment, estimate it, and interpret the coefficients. <br/><br/>

Criticise, modify, compare, and evaluate the fit of the proposed models.

---

## Structural equation modelling (SEM)

General framework that uses various models to test relationships among variables <br/>

Other terms: covariance structure analysis, covariance structure modelling, __causal modelling__<br/>

Sewall Wright - "mathematical tool for drawing __causal__ conclusions from a combination of observational data and __theoretical assumptions__"

Waves:

1. Latent structures - factor analysis <br/>
2. Causal modelling through path models <br/>
3. Structural causal models <br/>

<br/><br/>
SEM is a general modelling framework that is composed of the measurement model and the structural model.

???
Judea Pearl - [The Causal Foundations of Structural Equation Modeling](https://ftp.cs.ucla.edu/pub/stat_ser/r370.pdf)

Measurement model focuses on the estimation of latent or composite variables <br/>
Structural model focuses on the estimation of relations between manifest and/or latent variables in the model (path model) <br/>

Terminology: <br/>
Manifest variables: observed/collected variables <br/> <br/>
Latent variables: inferred measures - hypothetical constructs <br/>
- Indicator variables: measures used to infer the latent concepts <br/> <br/>
Endogenous variables: dependent outcomes <br/> <br/>
Exogenous variables: predictors <br/> <br/> <br/>
Focus on the covariance structure instead of the means <br/> <br/>

---
class: inverse, middle, center

# Latent structures

---

## Latent space of measures

--

Principal Component Analysis (PCA) <br/><br/>
Exploratory Factor Analysis (EFA) <br/><br/>
Confirmatory Factor Analysis (CFA)

???
Differences between PCA and EFA:<br/><br/>
[Link 1](https://stats.stackexchange.com/a/95106)<br/><br/>
[Link 2](https://stats.stackexchange.com/a/288646)

---

## Exploratory factor analysis (EFA)

Multivariate statistical procedure (Spearman): understanding and accounting for variation and covariation among a set of observed variables by postulating __latent__ structures (factors)<br/><br/>

Factor: unobservable variable that influences more than one observed measure and accounts for their intercorrelation <br/><br/>

If we partial out the latent construct, the intercorrelations would be zero <br/><br/>

Factor analysis decomposes variance: __a) common variance (communality)__ and __b) unique variance__

???
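A minimal sketch of the variance decomposition named on this slide, run on simulated data rather than the lecture data. The sample size, the simulated loadings and the variable names (`x1` to `x6`) are assumptions made only for illustration; `factanal()` is the maximum-likelihood EFA function in base R's `stats` package.

```r
# Simulate six indicators driven by two latent factors (illustrative values only)
set.seed(1)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  x1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  x2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  x3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  x4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  x5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  x6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Maximum-likelihood EFA with two factors and varimax rotation
efa <- factanal(dat, factors = 2, rotation = "varimax")

# The standardised variance of each indicator splits into two parts
uniqueness  <- efa$uniquenesses   # b) unique variance
communality <- 1 - uniqueness     # a) common variance shared through the factors
round(cbind(communality, uniqueness), 2)

# Loadings: which indicators go with which factor
print(efa$loadings, cutoff = 0.3)
```

Because `factanal()` works on the correlation matrix, communality and uniqueness sum to 1 for every indicator.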
Thorough example of EFA in R: https://psu-psychology.github.io/psy-597-SEM/06_factor_models/factor_models.html#overview

---

## EFA versus CFA

Reproduce the observed relationships between measured variables with a smaller number of latent factors <br/><br/>

EFA is a data-driven approach: weak or no assumptions about the number of latent dimensions and the factor loadings (relations between indicators and factors) <br/> <br/>

CFA is a theory-driven approach: strong assumptions about both <br/><br/>

EFA is used earlier in the process of questionnaire development and construct validation

---

## Factor model

<img src="image1.png" width="90%" style="display: block; margin: auto;" />

---

## Factor or measurement model

A linear regression where the main predictor is latent or unobserved: <br/>

`$$y=\tau+\lambda*\eta+\epsilon$$`<br/><br/>
`\(y_1=\tau_1+\lambda_1*\eta+\epsilon_1\)`<br/>
`\(y_2=\tau_2+\lambda_2*\eta+\epsilon_2\)`<br/>
`\(y_3=\tau_3+\lambda_3*\eta+\epsilon_3\)`<br/><br/>
`\(\tau\)` - the item intercepts or means<br/>
`\(\lambda\)` - factor loadings - regression coefficients <br/>
`\(\epsilon\)` - error variances and covariances <br/>
`\(\eta\)` - the latent predictor of the items<br/>
`\(\psi\)` - factor variances and covariances <br/>

---

## Exploratory factor model

<img src="image2.png" width="90%" style="display: block; margin: auto;" />

---

## Confirmatory factor model

<img src="image3.png" width="90%" style="display: block; margin: auto;" />

---

## Defining latent variables

LVs are not measured directly; however, we can still infer them from the observed data. To be able to do so, we need to define their scale:

1. Marker variable: a single factor loading constrained to 1 <br/><br/>
2. Standardized latent variables: setting the variance of the latent variable to 1 (Z-score) <br/><br/>
3. Effects-coding: constraining the loadings on one LV to average 1.0, so that their sum equals the number of indicators

???
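A hedged sketch of the three scaling options in lavaan. It uses the `HolzingerSwineford1939` data that ship with lavaan, not the lecture's `Babies` data, and the helper `get_loadings()` is a hypothetical name introduced only for this example. `std.lv` and `effect.coding` are lavaan options.

```r
library(lavaan)

# Built-in example data shipped with lavaan (not the lecture data)
model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
'

# 1. Marker variable (lavaan default): first loading of each factor fixed to 1
fit_marker <- cfa(model, data = HolzingerSwineford1939)

# 2. Standardised latent variables: factor variances fixed to 1
fit_std <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

# 3. Effects coding: loadings of each factor constrained to average 1
fit_effect <- cfa(model, data = HolzingerSwineford1939, effect.coding = TRUE)

# Hypothetical helper: extract the factor loadings from a fitted model
get_loadings <- function(fit) {
  pe <- parameterEstimates(fit)
  pe[pe$op == "=~", c("lhs", "rhs", "est")]
}

get_loadings(fit_marker)
get_loadings(fit_std)
get_loadings(fit_effect)

# The scaling choice changes the metric of the estimates, not the model fit
chisq <- sapply(
  list(marker = fit_marker, std = fit_std, effect = fit_effect),
  function(f) as.numeric(fitMeasures(f, "chisq"))
)
chisq
```

All three fits should report the same chi-square, since the options only choose how the latent scale is set.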
https://www.researchgate.net/publication/255606342_A_Non-arbitrary_Method_of_Identifying_and_Scaling_Latent_Variables_in_SEM_and_MACS_Models

---

## Indicator variable and Standardizing LVs

<img src="IndVar.png" width="70%" style="display: block; margin: auto;" />

---

## Effect coding

<img src="Effect.png" width="40%" style="display: block; margin: auto;" />

---

## First step: Specification of the model

Total number of parameters that we can estimate: `\(\frac{variables*(variables+1)}{2}\)` <br/> <br/> <br/>

```r
library(magrittr)  # provides the %>% pipe used below

# vars: data frame holding the six indicator columns
Matrix<-cov(vars)
Matrix[upper.tri(Matrix)]<-NA  # keep only the unique variances and covariances
kableExtra::kable(Matrix, format = 'html', digits = 3, align='l') %>%
  kableExtra::kable_styling(font_size = 14)
```

<table class="table" style="font-size: 14px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> TimeOnTummy </th> <th style="text-align:left;"> PreciseLegMoves </th> <th style="text-align:left;"> PreciseHandMoves </th> <th style="text-align:left;"> Babbling </th> <th style="text-align:left;"> Screeching </th> <th style="text-align:left;"> VocalImitation </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> TimeOnTummy </td> <td style="text-align:left;"> 24.802 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> PreciseLegMoves </td> <td style="text-align:left;"> 8.822 </td> <td style="text-align:left;"> 22.819 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> PreciseHandMoves </td> <td style="text-align:left;"> 10.853 </td> <td style="text-align:left;"> 9.267 </td> <td style="text-align:left;"> 24.266 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA
</td> <td style="text-align:left;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> Babbling </td> <td style="text-align:left;"> 1.353 </td> <td style="text-align:left;"> 4.040 </td> <td style="text-align:left;"> 3.519 </td> <td style="text-align:left;"> 24.688 </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> Screeching </td> <td style="text-align:left;"> 0.589 </td> <td style="text-align:left;"> 3.116 </td> <td style="text-align:left;"> 1.720 </td> <td style="text-align:left;"> 8.081 </td> <td style="text-align:left;"> 21.159 </td> <td style="text-align:left;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> VocalImitation </td> <td style="text-align:left;"> 2.715 </td> <td style="text-align:left;"> 4.457 </td> <td style="text-align:left;"> 4.326 </td> <td style="text-align:left;"> 11.315 </td> <td style="text-align:left;"> 4.809 </td> <td style="text-align:left;"> 25.012 </td> </tr>
</tbody>
</table>

---

## Theory and previous results

Previous work in this area found that two __congeneric__ latent factors explain the covariances of our six indicators: a motor and a verbal latent component <br/><br/>

<img src="image4.png" width="65%" style="display: block; margin: auto;" />

---

## Estimated number of parameters

<img src="parameters.png" width="50%" style="display: block; margin: auto;" />

--

Loadings `\((\lambda)\)`: 4 parameters<br/><br/>
Residual variances `\((\epsilon)\)` : 6 parameters<br/><br/>
Factor variances and covariances `\((\psi)\)` : 3 parameters<br/><br/>
With intercepts: + 6

---

## Second step: model identification

1. Under-identified: more free parameters than total possible parameters <br/> <br/>
2. Just-identified: equal number of free parameters and total possible parameters <br/><br/>
3.
Over-identified: fewer free parameters than total possible parameters <br/> <br/> <br/>

Parameters can either be: free, fixed or constrained <br/>

---

## Third step: estimation of the model

```r
#install.packages('lavaan')
library(lavaan)

# =~ reads "is measured by": each factor is defined by three indicators
model1<-'
motor =~ TimeOnTummy + PreciseLegMoves + PreciseHandMoves
verbal =~ Babbling + Screeching + VocalImitation
'

fit1<-cfa(model1, data=Babies)
```

<style type="text/css">
pre {
  max-height: 300px;
  overflow-y: auto;
}

pre[class] {
  max-height: 100px;
}
</style>

<style type="text/css">
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}
</style>

---

## Results of the model

```r
summary(fit1)
```

```
## lavaan 0.6-12 ended normally after 74 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        13
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                 3.376
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.909
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   motor =~
##     TimeOnTummy       1.000
##     PreciseLegMovs    0.910    0.240    3.791    0.000
##     PreciseHandMvs    1.099    0.293    3.746    0.000
##   verbal =~
##     Babbling          1.000
##     Screeching        0.494    0.182    2.718    0.007
##     VocalImitation    0.716    0.246    2.906    0.004
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   motor ~~
##     verbal            3.433    1.901    1.806    0.071
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .TimeOnTummy      15.031    3.181    4.725    0.000
##    .PreciseLegMovs   14.709    2.867    5.130    0.000
##    .PreciseHandMvs   12.515    3.338    3.749    0.000
##    .Babbling          8.805    5.149    1.710    0.087
##    .Screeching       17.139    2.748    6.236    0.000
##    .VocalImitation   16.742    3.514    4.764    0.000
##     motor             9.523    3.625    2.627    0.009
##     verbal           15.635    5.947    2.629    0.009
```

---

## Results: visual

<img src="image6.png" width="90%" style="display: block; margin: auto;" />

---

## Interpretation of the coefficients: factor loadings
- When unstandardized and loaded on a single factor, the loadings are unstandardized regression coefficients: the model-predicted difference in the indicator between cases that differ by one unit on the LV <br/> <br/>
- When loaded on multiple factors, the regression coefficients become contingent on the other factors (check Lecture 1, slide 11) <br/> <br/>
- When standardized and loaded on a single factor (congeneric structure), the standardized loadings are estimated correlations between indicators and LVs <br/> <br/>
- When standardized and loaded on multiple factors, the same as the second option, only standardized (beta weights) <br/>

---
class: inverse, middle, center

# Causal model

---

## Structural part of the model (path analysis)

A model that tests relationships between a set of variables, often arranged in some sort of structural form. <br/>

A common focus of the path model is the estimation of mediation between X and Y.

???
.center[
<img src="graphical.png" width="120%">
<br/>
]

---

## First step: Specification of the model

.center[
<img src="GeneralExample.png" width="60%">
<br/>
]

???
Representation of our hypothetical assumptions in the form of the structural equation model

---

## Can the model be estimated?
Total number of parameters that we can estimate: `\(\frac{variables*(variables+1)}{2}\)` <br/> <br/> <br/>

.center[
<img src="GeneralExample.png" width="60%">
<br/>
]

---

## Number of observations

```r
Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
```

<table>
 <thead>
  <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Nutrition </th> <th style="text-align:right;"> PhyExer </th> <th style="text-align:right;"> GMA </th> <th style="text-align:right;"> SocialBeh </th> <th style="text-align:right;"> CognitiveAb </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Nutrition </td> <td style="text-align:right;"> 45.6689837 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> PhyExer </td> <td style="text-align:right;"> -10.1006752 </td> <td style="text-align:right;"> 2652.9074 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> GMA </td> <td style="text-align:right;"> 0.5641485 </td> <td style="text-align:right;"> -249.3049 </td> <td style="text-align:right;"> 2478.2889 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> SocialBeh </td> <td style="text-align:right;"> -11.6168733 </td> <td style="text-align:right;"> 3417.8681 </td> <td style="text-align:right;"> -506.1066 </td> <td style="text-align:right;"> 9988.898 </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> CognitiveAb </td> <td style="text-align:right;"> 210.6731970 </td> <td style="text-align:right;"> 48916.6339 </td> <td style="text-align:right;"> 1254.2100 </td> <td style="text-align:right;">
94358.621 </td> <td style="text-align:right;"> 1125746 </td> </tr>
</tbody>
</table>

---

## How many parameters are we estimating (path model)?

How many degrees of freedom do we have without the model?

--

.center[
<img src="ModelParameters.png" width="60%">
<br/>
]

Number of observations (total number of parameters) = 15<br/>
Empty model = variances and covariances <br/>
Degrees of freedom (df) __= 15 - 8 = 7__ <br/>

???
Most of the time (CFA model or other software): Degrees of freedom for null model = `\((\frac{variables*(variables+1)}{2}) - variables\)`

```r
Matrix<-cov(Babies[,c('Nutrition','PhyExer','GMA','SocialBeh','CognitiveAb')])
Matrix[upper.tri(Matrix)]<-NA
Matrix[lower.tri(Matrix)]<-NA
knitr::kable(Matrix, format = 'html')
```

<table>
 <thead>
  <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Nutrition </th> <th style="text-align:right;"> PhyExer </th> <th style="text-align:right;"> GMA </th> <th style="text-align:right;"> SocialBeh </th> <th style="text-align:right;"> CognitiveAb </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Nutrition </td> <td style="text-align:right;"> 45.66898 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> PhyExer </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 2652.907 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> GMA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 2478.289 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> SocialBeh </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td
style="text-align:right;"> NA </td> <td style="text-align:right;"> 9988.898 </td> <td style="text-align:right;"> NA </td> </tr>
  <tr> <td style="text-align:left;"> CognitiveAb </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 1125746 </td> </tr>
</tbody>
</table>

---

## How many parameters (our model)?

.center[
<img src="ModelParameters.png" width="60%">
<br/>
]

Free parameters = variances + covariances + regression pathways = 14

---

## Second step: model identification

1. Under-identified: more free parameters than total possible parameters <br/> <br/>
2. Just-identified: equal number of free parameters and total possible parameters <br/><br/>
3. Over-identified: fewer free parameters than total possible parameters <br/> <br/> <br/>

Parameters can either be: free, fixed or constrained <br/>

---

## Third step: estimation of the model

<style type="text/css">
pre {
  max-height: 300px;
  overflow-y: auto;
}

pre[class] {
  max-height: 100px;
}
</style>

<style type="text/css">
.scroll-100 {
  max-height: 100px;
  overflow-y: auto;
  background-color: inherit;
}
</style>

```r
modelAbility<-'
SocialBeh~Nutrition+PhyExer+GMA
CognitiveAb~SocialBeh+Nutrition+GMA
'
```

--

```r
fit1<-sem(modelAbility, data=Babies)
summary(fit1)
```

```
## lavaan 0.6-12 ended normally after 1 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030    1.105    0.027    0.978
##     PhyExer           1.281    0.146    8.796    0.000
##     GMA              -0.075    0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh         9.579
0.469   20.428    0.000
##     Nutrition         7.019    6.899    1.017    0.309
##     GMA               2.461    0.941    2.614    0.009
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809  780.053    7.071    0.000
##    .CognitiveAb  215129.001 30423.835   7.071    0.000
```

---

## Direct and indirect

.center[
<img src="simplified.png" width="60%">
]

Direct effect (c): subgroups/cases that differ by one unit on X, but are equal on M, are estimated to differ by __c__ units on Y. <br/>

Indirect effect: <br/>
a) X -> M: cases that differ by one unit on X are estimated to differ by __a__ units on M <br/>
b) M -> Y: cases that differ by one unit on M, but are equal on X, are estimated to differ by __b__ units on Y <br/><br/>

The indirect effect of X on Y through M is the product of __a__ and __b__. Two cases that differ by one unit on X are estimated to differ by __ab__ units on Y as a result of the effect of X on M, which in turn affects Y.

---

## Direct and indirect

```r
modelAbilityPath<-'
SocialBeh~Nutrition+a*PhyExer+GMA
CognitiveAb~b*SocialBeh+c*Nutrition+GMA

indirect := a*b
direct := c

total := indirect + direct
'

fitPath<-sem(modelAbilityPath, data=Babies)
summary(fitPath)
```

```
## lavaan 0.6-12 ended normally after 1 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030    1.105    0.027    0.978
##     PhyExer    (a)    1.281    0.146    8.796    0.000
##     GMA              -0.075    0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh  (b)    9.579    0.469   20.428    0.000
##     Nutrition  (c)    7.019    6.899    1.017    0.309
##     GMA               2.461    0.941    2.614    0.009
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809  780.053    7.071    0.000
##    .CognitiveAb  215129.001
30423.835    7.071    0.000
##
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     indirect         12.275    1.519    8.079    0.000
##     direct            7.019    6.899    1.017    0.309
##     total            19.294    7.074    2.727    0.006
```

???
Interactions between predictors can be included, as in the linear regression model, by using the colon (:) operator.<br/> <br/>
modelAbilityInteraction<-<br/>
SocialBeh~Nutrition+PhyExer+GMA+__PhyExer:GMA__<br/>
CognitiveAb~SocialBeh+Nutrition+GMA<br/>

---

## Step four: model evaluation

Chi-square test: a measure of how well the model-implied covariance matrix fits the observed covariance matrix <br/> <br/>

We would prefer not to reject the null hypothesis in this case <br/>

Assumptions: <br/>
Multivariate normality <br/>
N is sufficiently large (150+)<br/>
Parameters are not at a boundary or invalid (e.g. variance of zero)<br/><br/><br/>

With large samples it is sensitive to small misfits <br/>
Non-normality induces bias <br/>

---

## Other fit indices

```r
summary(fit1, fit.measures=TRUE)
```

```
## lavaan 0.6-12 ended normally after 1 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                               215.236
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Model Test Baseline Model:
##
##   Test statistic                               438.108
##   Degrees of freedom                                 7
##   P-value                                        0.000
##
## User Model versus Baseline Model:
##
##   Comparative Fit Index (CFI)                    0.503
##   Tucker-Lewis Index (TLI)                      -2.479
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)              -1328.506
##   Loglikelihood unrestricted model (H1)      -1220.888
##
##   Akaike (AIC)                                2673.012
##   Bayesian (BIC)                              2693.853
##   Sample-size adjusted Bayesian (BIC)         2668.587
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          1.464
##   90 Percent confidence interval - lower         1.303
##   90 Percent confidence interval - upper         1.632
##   P-value RMSEA <= 0.05                          0.000
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.080
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SocialBeh ~
##     Nutrition         0.030    1.105    0.027    0.978
##     PhyExer           1.281    0.146    8.796    0.000
##     GMA              -0.075    0.151   -0.500    0.617
##   CognitiveAb ~
##     SocialBeh         9.579    0.469   20.428    0.000
##     Nutrition         7.019    6.899    1.017    0.309
##     GMA               2.461    0.941    2.614    0.009
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SocialBeh      5515.809  780.053    7.071    0.000
##    .CognitiveAb  215129.001 30423.835   7.071    0.000
```

---

## Other fit indices

.center[
<img src="fitInd.png" width="60%">
]

???
TLI: a fit of .95 indicates that the fitted model improves the fit by 95% relative to the null model; works OK with smaller sample sizes <br/> <br/>
CFI: same as TLI, but not very sensitive to sample size <br/> <br/>
RMSEA: difference between the residuals of the sample covariance matrix and the hypothesized model. It is hard to interpret when variables are on different scales; in that case we can check the standardised root mean square residual (SRMR)<br/><br/>

---
class: inverse, middle, center

# Structural Equation Model

---

## SEM, finally

<img src="SEM.png" width="80%" style="display: block; margin: auto;" />

---

## Estimation of SEM

```r
model4<-'
#CFA model
motor =~ TimeOnTummy + PreciseLegMoves + PreciseHandMoves
verbal =~ Babbling + Screeching + VocalImitation

#Path model
motor ~ Age + Weight
verbal ~ Age + Weight
'

fit4<-sem(model4, data=Babies)
```

---

## Structural equation model: Results

```r
summary(fit4, standardized=TRUE)
```

```
## lavaan 0.6-12 ended normally after 81 iterations
##
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
##
##   Number of observations                           100
##
## Model Test User Model:
##
##   Test statistic                                13.018
##   Degrees of freedom                                16
##   P-value (Chi-square)                           0.671
##
## Parameter Estimates:
##
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##
## Latent Variables:
##
Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   motor =~
##     TimeOnTummy       1.000                               3.064    0.618
##     PreciseLegMovs    0.919    0.242    3.803    0.000    2.816    0.592
##     PreciseHandMvs    1.111    0.295    3.765    0.000    3.403    0.694
##   verbal =~
##     Babbling          1.000                               3.498    0.708
##     Screeching        0.583    0.189    3.089    0.002    2.040    0.446
##     VocalImitation    0.899    0.263    3.422    0.001    3.144    0.632
##
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   motor ~
##     Age              -0.016    0.045   -0.355    0.723   -0.005   -0.043
##     Weight            0.000    0.001    0.085    0.932    0.000    0.010
##   verbal ~
##     Age              -0.041    0.051   -0.803    0.422   -0.012   -0.097
##     Weight            0.002    0.001    2.108    0.035    0.001    0.263
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .motor ~~
##    .verbal            3.292    1.738    1.894    0.058    0.320    0.320
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .TimeOnTummy      15.164    3.159    4.800    0.000   15.164    0.618
##    .PreciseLegMovs   14.662    2.862    5.122    0.000   14.662    0.649
##    .PreciseHandMvs   12.440    3.327    3.740    0.000   12.440    0.518
##    .Babbling         12.204    3.740    3.263    0.001   12.204    0.499
##    .Screeching       16.784    2.717    6.177    0.000   16.784    0.801
##    .VocalImitation   14.880    3.440    4.326    0.000   14.880    0.601
##    .motor             9.372    3.577    2.620    0.009    0.998    0.998
##    .verbal           11.299    4.205    2.687    0.007    0.923    0.923
```

---

## Prerequisites

Theory: strong theoretical assumptions from which causal hypotheses can be drawn and then tested through the model specification and the data <br/><br/>

Data: large samples <br/>

- We are not that interested in significance: <br/><br/>
a) The overall behaviour of the model is more interesting<br/><br/>
b) More data means a higher probability of significant results, even for weak effects<br/><br/>
c) Latent models are estimated by anchoring on indicator variables; a different estimation can result in different patterns<br/><br/>

---

## Problems with SEM and alternatives

1. Variables derived from the normal distribution <br/>
2. Observations independent <br/>
3.
Large sample size <br/>

---

## Important aspects: theory

- Understanding the differences between Exploratory FA and Confirmatory FA <br/>
- How the linear model is defined in CFA<br/>
- Scaling of the latent variables <br/>
- Interpretation of the coefficients <br/>
- Number of free parameters versus total number of parameters <br/>
- Overall fit of the model

---

## Important aspects: practice

- Specifying and estimating a CFA model and a path model <br/>
- Scaling the LVs by using a marker variable <br/>
- Calculating the direct and indirect pathways for predictors of interest <br/>
- Building a full SEM model <br/>

---

## Literature

Confirmatory Factor Analysis for Applied Research by Timothy A. Brown <br/> <br/>

Chapter 9 of Principles and Practice of Structural Equation Modeling by Rex B. Kline <br/><br/>

Latent Variable Modeling Using R: A Step-by-Step Guide by A. Alexander Beaujean <br/><br/>

---

## Thank you for your attention
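
---

## Appendix: counting degrees of freedom

A small sketch of the counting rule used in the specification and identification steps: degrees of freedom are the unique variances and covariances, variables*(variables+1)/2, minus the free parameters. The function names `n_moments()`, `model_df()` and `identification()` are hypothetical helpers written for this example; the parameter counts are the ones given on the slides.

```r
# Unique variances and covariances among p observed variables
n_moments <- function(p) p * (p + 1) / 2

# Degrees of freedom = unique variances/covariances - free parameters
model_df <- function(p, free) n_moments(p) - free

# Identification status follows from the sign of the degrees of freedom
identification <- function(df) {
  if (df < 0) {
    "under-identified"
  } else if (df == 0) {
    "just-identified"
  } else {
    "over-identified"
  }
}

# CFA model: 6 indicators; 4 loadings + 6 residual variances + 3 factor (co)variances
cfa_df <- model_df(p = 6, free = 4 + 6 + 3)

# Path model: 5 variables; 14 free parameters (variances + covariances + pathways)
path_df <- model_df(p = 5, free = 14)

cfa_df                    # 21 - 13 = 8
identification(cfa_df)    # "over-identified"
path_df                   # 15 - 14 = 1
identification(path_df)   # "over-identified"
```

Both values agree with the degrees of freedom in the lavaan output for the CFA model (8) and the path model (1).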