This notebook is intended as an example of how to "do SEM" using different tools that are accessible in a Jupyter IPython Notebook installed with Anaconda. The notebook uses a Python3 kernel and interfaces with several R packages. Alternatives would be to run an R kernel in the notebook directly, or to use an RStudio markdown notebook and the reticulate package to access Python functionality from R.
To Do for future revisions:
Integrate instructions on using different hosted / collaborative notebook options for this tutorial:
Turn it into a RISE presentation https://rise.readthedocs.io/en/docs_hot_fixes/ (ideally also using the "Split Cells Notebook" extension for better visual organization)
Integrate other SEM tools available in R:
Compare with other tools:
For local use, I recommend jupyterlab as the interface, usually acessible as http://localhost:8888/lab
To run the code in this notebook, a local (or hosted) installation of Anaconda is required. The following installs R components available through Anaconda. We use the --yes option to avoid prompts for confirmation (impossible in the notebook) but that also means you cannot check what will be installed before it proceeds. To avoid this, you could run the installations from an Anaconda prompt or using the Anaconda Navigator.
R Studio is not strictly required, but it is a useful alternative way of using R. Also note that this assumes an Anaconda installation/environment that you have permission to change. On Windows, that means choosing the recommended "for this user only" option on install of Anaconda. Otherwise, these installations require Administrator / root rights.
!conda install --yes r-essentials
Solving environment: ...working... done # All requested packages already installed.
# !conda install --yes rstudio
Solving environment: ...working... done # All requested packages already installed.
!conda install --yes r-lavaan
Solving environment: ...working... done # All requested packages already installed.
Now we need rpy2
which provides the bridge between this python notebook and R. (Also installing tzlocal due to a current dependency bug in rpy2.)
!conda install --yes tzlocal rpy2
Solving environment: ...working... done # All requested packages already installed.
Let's enable the rpy2 extension, so that we can then execute R code with the %%R magic command at the top of a cell.
%load_ext rpy2.ipython
If you are on Windows, and you do not get text output from %%R cells showing up in the notebook, but instead in the console window where jupyter is running, this is a bug in rpy2 on Windows, and there's a workaround to capture stdout by running the following cells, see https://github.com/vitorcurtis/RWinOut
%%R
install.packages(c("R.utils"))
!curl -O "https://raw.githubusercontent.com/vitorcurtis/RWinOut/master/RWinOut.py"
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 1259 100 1259 0 0 11550 0 --:--:-- --:--:-- --:--:-- 11550
%load_ext RWinOut
The rpy2.ipython extension is already loaded. To reload it, use: %reload_ext rpy2.ipython
And finally we want to see plots directly in the notebook. The simplest way is to request inline plots.
%matplotlib inline
But assuming you are running this in a JuyterLab interface, you might want the ipympl
library to get interactive widget plots.
# We install ipympl with pip, as it is not yet readily available with conda:
!pip install ipympl
# nodejs is needed for the interactive features if using JupyterLab,
# the corresponding package for normal notebooks, widgetsnbextension, should already be installed.
!conda install --yes nodejs
# install the jupyterlab extensions:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager
!jupyter labextension install jupyter-matplotlib
Requirement already satisfied: ipympl in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (0.2.1) Requirement already satisfied: ipykernel>=4.7 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (5.1.0) Requirement already satisfied: ipywidgets>=7.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (7.4.2) Requirement already satisfied: six in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (1.11.0) Requirement already satisfied: matplotlib>=2.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipympl) (2.2.2) Requirement already satisfied: ipython>=5.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (7.1.1) Requirement already satisfied: traitlets>=4.1.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (4.3.2) Requirement already satisfied: jupyter-client in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (5.2.3) Requirement already satisfied: tornado>=4.2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipykernel>=4.7->ipympl) (5.1.1) Requirement already satisfied: widgetsnbextension~=3.4.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipywidgets>=7.0.0->ipympl) (3.4.2) Requirement already satisfied: nbformat>=4.2.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipywidgets>=7.0.0->ipympl) (4.4.0) Requirement already satisfied: numpy>=1.7.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (1.15.3) Requirement already satisfied: cycler>=0.10 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (0.10.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2.2.2) Requirement already satisfied: python-dateutil>=2.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2.7.5) Requirement already satisfied: pytz in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (2018.7) Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from matplotlib>=2.0.0->ipympl) (1.0.1) Requirement already satisfied: colorama; sys_platform == "win32" in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.4.0) Requirement already satisfied: pygments in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (2.2.0) Requirement already satisfied: setuptools>=18.5 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (40.5.0) Requirement already satisfied: decorator in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (4.3.0) Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (2.0.7) Requirement already satisfied: jedi>=0.10 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.13.1) Requirement already satisfied: pickleshare in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.7.5) Requirement already satisfied: backcall in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.1.0) Requirement already satisfied: ipython-genutils in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from traitlets>=4.1.0->ipykernel>=4.7->ipympl) (0.2.0) Requirement already satisfied: jupyter-core in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jupyter-client->ipykernel>=4.7->ipympl) (4.4.0) Requirement already satisfied: pyzmq>=13 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jupyter-client->ipykernel>=4.7->ipympl) (17.1.2) Requirement already satisfied: notebook>=4.4.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (5.7.0) Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbformat>=4.2.0->ipywidgets>=7.0.0->ipympl) (2.6.0) Requirement already satisfied: wcwidth in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.1.7) Requirement already satisfied: parso>=0.3.0 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jedi>=0.10->ipython>=5.0.0->ipykernel>=4.7->ipympl) (0.3.1) Requirement already satisfied: Send2Trash in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.5.0) Requirement already satisfied: prometheus-client in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.4.2) Requirement already satisfied: jinja2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (2.10) Requirement already satisfied: nbconvert in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (5.3.1) Requirement already satisfied: terminado>=0.8.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.8.1) Requirement already satisfied: MarkupSafe>=0.23 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from jinja2->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.0) Requirement already satisfied: pandocfilters>=1.4.1 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (1.4.2) Requirement already satisfied: bleach in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (3.0.2) Requirement already satisfied: testpath in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.4.2) Requirement already satisfied: entrypoints>=0.2.2 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.2.3) Requirement already satisfied: mistune>=0.7.4 in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.8.4) Requirement already satisfied: webencodings in c:\users\sr876\appdata\local\continuum\anaconda3\lib\site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.4.0->ipywidgets>=7.0.0->ipympl) (0.5.1) Solving environment: ...working... done # All requested packages already installed. jupyter-widgets-jupyterlab-manager-0.38.1.tgz yarn install v1.9.4 info No lockfile found. [1/5] Validating package.json... [2/5] Resolving packages... warning css-loader > cssnano > autoprefixer > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. warning css-loader > cssnano > postcss-merge-rules > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. warning css-loader > cssnano > postcss-merge-rules > caniuse-api > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. [3/5] Fetching packages... info fsevents@1.2.4: The platform "win32" is incompatible with this module. info "fsevents@1.2.4" is an optional dependency and failed compatibility check. Excluding it from installation. [4/5] Linking dependencies... warning "@jupyterlab/vdom-extension > @nteract/transform-vdom@1.1.1" has incorrect peer dependency "react@^15.6.1". [5/5] Building fresh packages... success Saved lockfile. Done in 75.75s. yarn run v1.9.4 $ webpack Hash: f1eac10162dd2d44bf4d Version: webpack 4.12.2 Time: 39111ms Built at: 2018-11-02 20:57:26 Asset Size Chunks Chunk Names 1.055322dcf6c2bb19185f.js 888 KiB 1 [emitted] 674f50d287a8c48dc19ba404d20fe713.eot 162 KiB [emitted] 912ec66d7572ff821749319396470bde.svg 434 KiB [emitted] fee66e712a8a08eef5805a46892932ad.woff 95.7 KiB [emitted] b06871f281fee6b241d60582ae9369b9.ttf 162 KiB [emitted] main.5a992934528990838951.js 54.6 KiB main [emitted] main 0.cc98107762fcc28532b3.js 4.5 KiB 0 [emitted] vega.91b98e783d16fd1b9e23.js 519 bytes vega [emitted] vega vendors~@jupyter-widgets/controls~vega.3dd933b62461edbc58d8.js 22.6 KiB vendors~@jupyter-widgets/controls~vega [emitted] vendors~@jupyter-widgets/controls~vega vendors~main.44f7c1af2649541ea6c6.js 9.41 MiB vendors~main [emitted] vendors~main vendors~vega.1e302f886dbe300cf0ef.js 2.76 MiB vendors~vega [emitted] vendors~vega af7ae505a9eed503f8b8e6982036873e.woff2 75.4 KiB [emitted] vendors~@jupyter-widgets/controls.741a7524652a40694e8a.js 266 KiB vendors~@jupyter-widgets/controls [emitted] vendors~@jupyter-widgets/controls main.5a992934528990838951.js.map 63.5 KiB main [emitted] main 0.cc98107762fcc28532b3.js.map 5.73 KiB 0 [emitted] vega.91b98e783d16fd1b9e23.js.map 251 bytes vega [emitted] vega vendors~@jupyter-widgets/controls~vega.3dd933b62461edbc58d8.js.map 15.6 KiB vendors~@jupyter-widgets/controls~vega [emitted] vendors~@jupyter-widgets/controls~vega vendors~main.44f7c1af2649541ea6c6.js.map 10.9 MiB vendors~main [emitted] vendors~main vendors~vega.1e302f886dbe300cf0ef.js.map 2.19 MiB vendors~vega [emitted] vendors~vega 1.055322dcf6c2bb19185f.js.map 1.04 MiB 1 [emitted] vendors~@jupyter-widgets/controls.741a7524652a40694e8a.js.map 303 KiB vendors~@jupyter-widgets/controls [emitted] vendors~@jupyter-widgets/controls index.html 1.53 KiB [emitted] Entrypoint main = vendors~main.44f7c1af2649541ea6c6.js vendors~main.44f7c1af2649541ea6c6.js.map main.5a992934528990838951.js main.5a992934528990838951.js.map [0] multi whatwg-fetch ./build/index.out.js 40 bytes {main} [built] [1] vertx (ignored) 15 bytes {main} [optional] [built] [4] buffer (ignored) 15 bytes {main} [optional] [built] [5] crypto (ignored) 15 bytes {main} [optional] [built] [6] readable-stream (ignored) 15 bytes {main} [built] [7] supports-color (ignored) 15 bytes {main} [built] [8] chalk (ignored) 15 bytes {main} [built] [9] fs (ignored) 15 bytes {main} [built] [10] node-fetch (ignored) 15 bytes {vega} [built] [11] fs (ignored) 15 bytes {vega} [built] [ANye] ./build/index.out.js 35.9 KiB {main} [built] [RnhZ] ./node_modules/moment/locale sync ^\.\/.*$ 2.88 KiB {main} [optional] [built] [YuTi] (webpack)/buildin/module.js 497 bytes {vendors~main} [built] [eTbV] ./node_modules/codemirror/mode sync ^\.\/.*\.js$ 2.78 KiB {0} [built] [yLpj] (webpack)/buildin/global.js 489 bytes {vendors~main} [built] + 2390 hidden modules WARNING in jquery Multiple versions of jquery found: 2.2.4 ./~/jupyter-matplotlib/~/jquery from ./~/jupyter-matplotlib\src\mpl_widget.js 3.3.1 ./~/jquery from ./~/@jupyter-widgets\base\lib\widget.js WARNING in vega-lite Multiple versions of vega-lite found: 2.5.1 ./~/vega-lite\build\src from ./~/vega-lite\build\src\compile\selection\selection.js 2.6.0 ./~/vega-lite\build from ./~/vega-lite\build\src\index.js Check how you can resolve duplicate packages: https://github.com/darrenscerri/duplicate-package-checker-webpack-plugin#resolving-duplicate-packages-in-your-bundle Child html-webpack-plugin for "index.html": 1 asset Entrypoint undefined = index.html [KTNU] ./node_modules/html-loader!./templates/partial.html 567 bytes {0} [built] [YuTi] (webpack)/buildin/module.js 497 bytes {0} [built] [aS2v] ./node_modules/html-webpack-plugin/lib/loader.js!./templates/template.html 1.22 KiB {0} [built] [yLpj] (webpack)/buildin/global.js 489 bytes {0} [built] + 1 hidden module Done in 48.33s.
Node v8.9.3 > C:\Users\sr876\AppData\Local\Continuum\anaconda3\npm.CMD pack @jupyter-widgets/jupyterlab-manager Node v8.9.3 > node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js install > node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js run build
jupyter-matplotlib-0.3.0.tgz yarn install v1.9.4 info No lockfile found. [1/5] Validating package.json... [2/5] Resolving packages... warning css-loader > cssnano > autoprefixer > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. warning css-loader > cssnano > postcss-merge-rules > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. warning css-loader > cssnano > postcss-merge-rules > caniuse-api > browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools. [3/5] Fetching packages... info fsevents@1.2.4: The platform "win32" is incompatible with this module. info "fsevents@1.2.4" is an optional dependency and failed compatibility check. Excluding it from installation. [4/5] Linking dependencies... warning "@jupyterlab/vdom-extension > @nteract/transform-vdom@1.1.1" has incorrect peer dependency "react@^15.6.1". [5/5] Building fresh packages... success Saved lockfile. Done in 107.61s. yarn run v1.9.4 $ webpack Hash: f1eac10162dd2d44bf4d Version: webpack 4.12.2 Time: 36409ms Built at: 2018-11-02 21:00:13 Asset Size Chunks Chunk Names 1.055322dcf6c2bb19185f.js 888 KiB 1 [emitted] 674f50d287a8c48dc19ba404d20fe713.eot 162 KiB [emitted] 912ec66d7572ff821749319396470bde.svg 434 KiB [emitted] fee66e712a8a08eef5805a46892932ad.woff 95.7 KiB [emitted] b06871f281fee6b241d60582ae9369b9.ttf 162 KiB [emitted] main.5a992934528990838951.js 54.6 KiB main [emitted] main 0.cc98107762fcc28532b3.js 4.5 KiB 0 [emitted] vega.91b98e783d16fd1b9e23.js 519 bytes vega [emitted] vega vendors~@jupyter-widgets/controls~vega.3dd933b62461edbc58d8.js 22.6 KiB vendors~@jupyter-widgets/controls~vega [emitted] vendors~@jupyter-widgets/controls~vega vendors~main.44f7c1af2649541ea6c6.js 9.41 MiB vendors~main [emitted] vendors~main vendors~vega.1e302f886dbe300cf0ef.js 2.76 MiB vendors~vega [emitted] vendors~vega af7ae505a9eed503f8b8e6982036873e.woff2 75.4 KiB [emitted] vendors~@jupyter-widgets/controls.741a7524652a40694e8a.js 266 KiB vendors~@jupyter-widgets/controls [emitted] vendors~@jupyter-widgets/controls main.5a992934528990838951.js.map 63.5 KiB main [emitted] main 0.cc98107762fcc28532b3.js.map 5.73 KiB 0 [emitted] vega.91b98e783d16fd1b9e23.js.map 251 bytes vega [emitted] vega vendors~@jupyter-widgets/controls~vega.3dd933b62461edbc58d8.js.map 15.6 KiB vendors~@jupyter-widgets/controls~vega [emitted] vendors~@jupyter-widgets/controls~vega vendors~main.44f7c1af2649541ea6c6.js.map 10.9 MiB vendors~main [emitted] vendors~main vendors~vega.1e302f886dbe300cf0ef.js.map 2.19 MiB vendors~vega [emitted] vendors~vega 1.055322dcf6c2bb19185f.js.map 1.04 MiB 1 [emitted] vendors~@jupyter-widgets/controls.741a7524652a40694e8a.js.map 303 KiB vendors~@jupyter-widgets/controls [emitted] vendors~@jupyter-widgets/controls index.html 1.53 KiB [emitted] Entrypoint main = vendors~main.44f7c1af2649541ea6c6.js vendors~main.44f7c1af2649541ea6c6.js.map main.5a992934528990838951.js main.5a992934528990838951.js.map [0] multi whatwg-fetch ./build/index.out.js 40 bytes {main} [built] [1] vertx (ignored) 15 bytes {main} [optional] [built] [4] buffer (ignored) 15 bytes {main} [optional] [built] [5] crypto (ignored) 15 bytes {main} [optional] [built] [6] readable-stream (ignored) 15 bytes {main} [built] [7] supports-color (ignored) 15 bytes {main} [built] [8] chalk (ignored) 15 bytes {main} [built] [9] fs (ignored) 15 bytes {main} [built] [10] node-fetch (ignored) 15 bytes {vega} [built] [11] fs (ignored) 15 bytes {vega} [built] [ANye] ./build/index.out.js 35.9 KiB {main} [built] [RnhZ] ./node_modules/moment/locale sync ^\.\/.*$ 2.88 KiB {main} [optional] [built] [YuTi] (webpack)/buildin/module.js 497 bytes {vendors~main} [built] [eTbV] ./node_modules/codemirror/mode sync ^\.\/.*\.js$ 2.78 KiB {0} [built] [yLpj] (webpack)/buildin/global.js 489 bytes {vendors~main} [built] + 2390 hidden modules WARNING in jquery Multiple versions of jquery found: 2.2.4 ./~/jupyter-matplotlib/~/jquery from ./~/jupyter-matplotlib\src\mpl_widget.js 3.3.1 ./~/jquery from ./~/@jupyter-widgets\base\lib\widget.js WARNING in vega-lite Multiple versions of vega-lite found: 2.5.1 ./~/vega-lite\build\src from ./~/vega-lite\build\src\compile\selection\selection.js 2.6.0 ./~/vega-lite\build from ./~/vega-lite\build\src\index.js Check how you can resolve duplicate packages: https://github.com/darrenscerri/duplicate-package-checker-webpack-plugin#resolving-duplicate-packages-in-your-bundle Child html-webpack-plugin for "index.html": 1 asset Entrypoint undefined = index.html [KTNU] ./node_modules/html-loader!./templates/partial.html 567 bytes {0} [built] [YuTi] (webpack)/buildin/module.js 497 bytes {0} [built] [aS2v] ./node_modules/html-webpack-plugin/lib/loader.js!./templates/template.html 1.22 KiB {0} [built] [yLpj] (webpack)/buildin/global.js 489 bytes {0} [built] + 1 hidden module Done in 46.88s.
Node v8.9.3 > C:\Users\sr876\AppData\Local\Continuum\anaconda3\npm.CMD pack jupyter-matplotlib Node v8.9.3 > node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js install > node C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\jupyterlab\staging\yarn.js run build
Now enable widget-based plots:
%matplotlib widget
Finally, since this is a long document, the following extension adds a table-of-contents sidebar to the JupyterLab interface:
!jupyter labextension install @jupyterlab/toc
Now for some R packages that are not readily available in an Anaconda default install. They might be available through the conda-forge "channel" - however, at the time of writing, I cannot recommend this, as the performance of conda install
is abysmal when using R packages from that repository.
When using a server-hosted notebook, some or all of these packages might already be installed.
%%R
install.packages(c("semPlot", "OpenMx", "semTools", "sem", "gpairs", "GGally"))
A note to avoid possible confusion: lavaan
provides a function cfa
as a convenience for confirmatory factor analysis. There is also an R package called cfa
- however, that one is not related to SEM.
First, import all the basic packages in Python, such as pandas
, numpy
, matplotlib
.
Also import seaborn
for simple high-level plots with decent looks. It complements the default plotting provided by matplotlib
. See here, for a useful brief overview of looking at data using some of seaborn's plot types: https://elitedatascience.com/python-seaborn-tutorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Next, some of the libraries that are statistics-oriented.
import statsmodels.api as sm
import statsmodels.formula.api as smf # the R-like interface for statsmodels
import statsmodels.graphics as smg
import sklearn
Now for loading R packages, we will be using.
%%R
library(lavaan)
library(semPlot)
library(OpenMx)
library(semTools)
In R, there are also several packages providing convenient high-level plots, such as generalized pairs plots.
%%R
library(GGally)
library(gpairs)
For the purpose of getting nice visual output, we will also set some defaults for plotting libraries.
defaultfigwidth, defaultfigheight = 10, 9
# set a slightly larger default size for plots. default dpi is 100
plt.rcParams['figure.figsize'] = [defaultfigwidth, defaultfigheight]
# enable seaborn's defaults for nicer plots overall:
sns.set(color_codes=True)
There is currently no built-in way to set default dimensions for plots in R. The %R magic command from the rpy2
library accepts width, height, and units parameters like this: %%R -w 10 -h 9 -u in -r 100
but it would be nice to set defaults.
Since this is python, there is a way around that using monkey-patching. Note that this is usually a Bad Idea(TM) and should be avoided if possible. It is also purely cosmetic for the purposes of this notebook, so it can be safely ignored. :)
# these are the defaults we want to set:
default_units = 'in' # inch, to make it more easily comparable to matpplotlib
default_res = 100 # dpi, same as default in matplotlib
default_width = 10
default_height = 9
# try monkey-patching a function in rpy2, so we effectively get these
# default settings for the width, height, and units arguments of the %R magic command
import rpy2
old_setup_graphics = rpy2.ipython.rmagic.RMagics.setup_graphics
def new_setup_graphics(self, args):
if getattr(args, 'units') is not None:
if args.units != default_units: # a different units argument was passed, do not apply defaults
return old_setup_graphics(self, args)
args.units = default_units
if getattr(args, 'res') is None:
args.res = default_res
if getattr(args, 'width') is None:
args.width = default_width
if getattr(args, 'height') is None:
args.height = default_height
return old_setup_graphics(self, args)
rpy2.ipython.rmagic.RMagics.setup_graphics = new_setup_graphics
We are borrowing example data from this excellent course offered at Harvard, S090A1: https://canvas.harvard.edu/courses/8737/pages/data
The actual data is from the Zambian Early Childhood Development Project. The full sample has more than 1600 Zambian six-year-olds, from a study led by Günther Fink and Stephanie Zuilkowski.
The data is in the proprietory stata format, so we first need to convert it and import it. We will use pandas.read_stata
but this could also be accomplished in R with the foreign
package. First, we write a helper function to download the data file if it is not in the current directly. Defining a function will allow us to re-use it later for other datasets.
import os.path
import urllib.request
def downloadIfMissing(filenameData, remoteLocation):
'''Check if the file exists. If not, try downloading from remoteLocation.'''
if not os.path.isfile(filenameData):
with urllib.request.urlopen(remoteLocation) as response:
with open(filenameData, 'xb') as destinationFile:
destinationFile.write(response.read())
# make sure we have the small data file available in the current directory, if not, try to download it:
filenameSmallZambiaData = "S090_InClass_Zambia.dta"
downloadIfMissing(filenameSmallZambiaData, "https://canvas.harvard.edu/courses/8737/files/1839865/download")
# read the data into a pandas dataframe
smallZambiaDF = pd.read_stata(filenameSmallZambiaData)
len(smallZambiaDF) # should return 1613
1613
# make sure we have the full measurement data file available in the current directory, if not, try to download it:
filenameMeasureZambiaData = "S090_InClass_Zambia_Measurement.dta"
downloadIfMissing(filenameMeasureZambiaData, "https://canvas.harvard.edu/courses/8737/files/1994882/download")
# read the data into a pandas dataframe
measureZambiaDF = pd.read_stata(filenameMeasureZambiaData)
len(measureZambiaDF) # should return 1623
1623
We now have the data in a dataframe, let's get an overview of what kind of data we are dealing with.
Now let's have a look at the dataframe.
smallZambiaDF.head() # equivalent to [:5] i.e. first five entries
childid | male | urban | ece | reasoning | socemo | vocab | vocabsq | wealth | books | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | Male | Urban | ECE | 3.0 | 1.05 | 18.0 | 324.0 | 3.0 | No books in home |
1 | 102 | Male | Urban | No ECE | 4.0 | 1.00 | 19.0 | 361.0 | 2.0 | No books in home |
2 | 103 | Female | Urban | No ECE | 4.0 | 1.80 | 19.0 | 361.0 | 3.0 | No books in home |
3 | 104 | Male | Urban | No ECE | 5.0 | 2.35 | 12.0 | 144.0 | 3.0 | No books in home |
4 | 105 | Male | Urban | No ECE | 5.0 | 1.50 | 25.0 | 625.0 | 2.0 | Books in home |
smallZambiaDF.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1613 entries, 0 to 1612 Data columns (total 10 columns): childid 1613 non-null int16 male 1613 non-null category urban 1613 non-null category ece 1613 non-null category reasoning 1613 non-null float32 socemo 1613 non-null float32 vocab 1613 non-null float32 vocabsq 1613 non-null float32 wealth 1613 non-null float32 books 1613 non-null category dtypes: category(4), float32(5), int16(1) memory usage: 53.9 KB
smallZambiaDF.describe(include='all', percentiles=[]) # describe categorical and numerical columns, don't bother with percentiles
childid | male | urban | ece | reasoning | socemo | vocab | vocabsq | wealth | books | |
---|---|---|---|---|---|---|---|---|---|---|
count | 1613.000000 | 1613 | 1613 | 1613 | 1613.000000 | 1613.000000 | 1613.000000 | 1613.000000 | 1613.000000 | 1613 |
unique | NaN | 2 | 2 | 2 | NaN | NaN | NaN | NaN | NaN | 2 |
top | NaN | Female | Urban | No ECE | NaN | NaN | NaN | NaN | NaN | No books in home |
freq | NaN | 810 | 814 | 1110 | NaN | NaN | NaN | NaN | NaN | 1171 |
mean | 4158.768754 | NaN | NaN | NaN | 4.451333 | 1.647789 | 21.475512 | 489.346558 | 2.957222 | NaN |
std | 2376.382896 | NaN | NaN | NaN | 2.521065 | 0.450903 | 5.307204 | 216.253510 | 1.430792 | NaN |
min | 101.000000 | NaN | NaN | NaN | 0.000000 | 0.444444 | 0.000000 | 0.000000 | 1.000000 | NaN |
50% | 4312.000000 | NaN | NaN | NaN | 4.000000 | 1.611111 | 22.000000 | 484.000000 | 3.000000 | NaN |
max | 8125.000000 | NaN | NaN | NaN | 10.000000 | 3.000000 | 30.000000 | 900.000000 | 5.000000 | NaN |
smallZambiaDF.male.describe()
count 1613 unique 2 top Female freq 810 Name: male, dtype: object
smallZambiaDF.male.value_counts()
Female 810 Male 803 Name: male, dtype: int64
smallZambiaDF.wealth.value_counts()
1.0 356 4.0 340 2.0 311 5.0 307 3.0 299 Name: wealth, dtype: int64
# visually check relations between numeric variables
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='scatter')
# we have to use an explicit height per facet-figure here, since a grid of figures doesn't follow the matplotlib default size
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='reg') # linear regressions on top of scatter
pd.scatter_matrix(smallZambiaDF)
plt.show()
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: pandas.scatter_matrix is deprecated, use pandas.plotting.scatter_matrix instead """Entry point for launching an IPython kernel.
plt.figure()
sns.swarmplot(x="male", y="socemo", data=smallZambiaDF)
plt.show()
%%R -i smallZambiaDF
pairs(smallZambiaDF)
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order. res = PandasDataFrame.from_items(items)
%%R -i smallZambiaDF
ggpairs(smallZambiaDF) # from library GGally
%%R -i smallZambiaDF
gpairs(smallZambiaDF) # from library gpairs
# Calculate correlations
corr = smallZambiaDF.corr()
# Heatmap
plt.figure()
sns.heatmap(corr)
plt.show()
To Do: implement an equivalent to ggpairs
in python, similar to scatter_matrix_all
found here: http://photo.etangkk.com/Python/blog-04.asp but using seaborn's PairGrid functionality.
What are our observed variables like? Continuous (e.g. age or height), ordinal (e.g. Likert items), or categorical (e.g. gender) or even dichotomous/binary (e.g. yes/no questions). Is validity (accurately captures the theoretical construct) and reliability (consistent results across time/rater/items) already established or just assumed?
SEM maximum likelihood estimator assumes normality of all endogenous variables, and implies they are continuous. But these assumptions can sometimes be relaxed.
TODO: how to check?
First, we do a simple regression. We can use scikit-learn
(more oriented towards machine learning, less support for detailed statistics output) or statsmodels
(more statistics oriented, with R-like interface) or R.
result = smf.ols(formula='socemo ~ ece', data=smallZambiaDF).fit()
print(result.params)
Intercept 1.613317 ece[T.ECE] 0.110536 dtype: float64
print(result.summary())
OLS Regression Results ============================================================================== Dep. Variable: socemo R-squared: 0.013 Model: OLS Adj. R-squared: 0.012 Method: Least Squares F-statistic: 21.06 Date: Sun, 18 Nov 2018 Prob (F-statistic): 4.80e-06 Time: 23:22:31 Log-Likelihood: -993.01 No. Observations: 1613 AIC: 1990. Df Residuals: 1611 BIC: 2001. Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ Intercept 1.6133 0.013 119.945 0.000 1.587 1.640 ece[T.ECE] 0.1105 0.024 4.589 0.000 0.063 0.158 ============================================================================== Omnibus: 23.699 Durbin-Watson: 0.838 Prob(Omnibus): 0.000 Jarque-Bera (JB): 21.012 Skew: 0.225 Prob(JB): 2.74e-05 Kurtosis: 2.667 Cond. No. 2.42 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
fig = smg.regressionplots.abline_plot(model_results=result)
fig.axes[0].scatter(smallZambiaDF['ece'], smallZambiaDF['socemo'])
FigureCanvasNbAgg()
<matplotlib.collections.PathCollection at 0x23e368b1cf8>
Since this is based on a categorical variable (ece), it is effectively a one-way ANOVA, so we can also print it as that.
table = sm.stats.anova_lm(result, typ=2) # Type 2 Anova DataFrame
print(table)
sum_sq df F PR(>F) ece 4.229228 1.0 21.060384 0.000005 Residual 323.511946 1611.0 NaN NaN
pd.get_dummies(smallZambiaDF, columns=['ece',]).head()
childid | male | urban | reasoning | socemo | vocab | vocabsq | wealth | books | ece_No ECE | ece_ECE | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | Male | Urban | 3.0 | 1.05 | 18.0 | 324.0 | 3.0 | No books in home | 0 | 1 |
1 | 102 | Male | Urban | 4.0 | 1.00 | 19.0 | 361.0 | 2.0 | No books in home | 1 | 0 |
2 | 103 | Female | Urban | 4.0 | 1.80 | 19.0 | 361.0 | 3.0 | No books in home | 1 | 0 |
3 | 104 | Male | Urban | 5.0 | 2.35 | 12.0 | 144.0 | 3.0 | No books in home | 1 | 0 |
4 | 105 | Male | Urban | 5.0 | 1.50 | 25.0 | 625.0 | 2.0 | Books in home | 1 | 0 |
smallZambiaDF['ece'].str.get_dummies()['ECE'].head()
0 1 1 0 2 0 3 0 4 0 Name: ECE, dtype: int64
smallZambiaDF['socemo'].shape, smallZambiaDF['ece'].str.get_dummies()[['ECE']].shape
((1613,), (1613, 1))
linreg = sklearn.linear_model.LinearRegression()
linreg.fit(smallZambiaDF['ece'].str.get_dummies()[['ECE']], smallZambiaDF['socemo'])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
print(linreg.intercept_, linreg.coef_)
1.613316750660673 [0.11053562]
sns.lmplot(x='ece_ECE', y='socemo', data=pd.get_dummies(smallZambiaDF), height=defaultfigheight)
FigureCanvasNbAgg()
<seaborn.axisgrid.FacetGrid at 0x16c688009e8>
f = plt.figure()
residaxes = sns.residplot(x='ece_ECE', y='socemo', data=pd.get_dummies(smallZambiaDF))
FigureCanvasNbAgg()
Calculating and plotting the simple linear model with R:
%%R -i smallZambiaDF
model = lm(socemo ~ ece, data = smallZambiaDF)
summary(model)
Call: lm(formula = socemo ~ ece, data = smallZambiaDF) Residuals: Min 1Q Median 3Q Max -1.16887 -0.32385 -0.02385 0.32615 1.38668 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.66858 0.01204 138.551 < 2e-16 *** ece.L 0.07816 0.01703 4.589 4.8e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.4481 on 1611 degrees of freedom Multiple R-squared: 0.0129, Adjusted R-squared: 0.01229 F-statistic: 21.06 on 1 and 1611 DF, p-value: 4.795e-06
%%R -i smallZambiaDF
plot(socemo ~ ece, data = smallZambiaDF)
abline(model)
%%R -i smallZambiaDF
xyplot(socemo ~ ece, data = smallZambiaDF)
For comparison let's use SEM to run the same linear regression model as above.
%%R -i smallZambiaDF
semfit <- sem('socemo ~ reasoning', data=smallZambiaDF, meanstructure=TRUE)
summary(semfit)
lavaan 0.6-2 ended normally after 11 iterations Optimization method NLMINB Number of free parameters 3 Number of observations 1613 Estimator ML Model Fit Test Statistic 0.000 Degrees of freedom 0 Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Standard Regressions: Estimate Std.Err z-value P(>|z|) socemo ~ reasoning -0.007 0.004 -1.672 0.094 Intercepts: Estimate Std.Err z-value P(>|z|) .socemo 1.681 0.023 73.853 0.000 Variances: Estimate Std.Err z-value P(>|z|) .socemo 0.203 0.007 28.399 0.000
dummiedSmallZambiaDF = pd.get_dummies(smallZambiaDF, dtype=np.int8)
# We use int8 instead of uint8 for the dummy variables, because R has no unsigned integers,
# so we would get warnings when passing these new variables into R.
# Further: the new dummy columns use the actual value as part of the column name,
# and therefore might have spaces in the column name.
# this is not great for specifying models, so let us replace the spaces with underscores:
dummiedSmallZambiaDF.columns = dummiedSmallZambiaDF.columns.str.replace(' ', '_')
%%R -i dummiedSmallZambiaDF
semfit <- sem('socemo ~ ece_ECE', data=dummiedSmallZambiaDF, meanstructure=TRUE)
summary(semfit, standardized=TRUE)
lavaan 0.6-2 ended normally after 13 iterations Optimization method NLMINB Number of free parameters 3 Number of observations 1613 Estimator ML Model Fit Test Statistic 0.000 Degrees of freedom 0 Minimum Function Value 0.0000000000000 Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Standard Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all socemo ~ ece_ECE 0.111 0.024 4.592 0.000 0.111 0.114 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 1.613 0.013 120.020 0.000 1.613 3.579 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 0.201 0.007 28.399 0.000 0.201 0.987
%%R -i dummiedSmallZambiaDF
semfit <- sem('socemo ~ ece_ECE', data=dummiedSmallZambiaDF, estimator='MLMV', meanstructure=TRUE)
summary(semfit, standardized=TRUE)
lavaan 0.6-2 ended normally after 13 iterations Optimization method NLMINB Number of free parameters 3 Number of observations 1613 Estimator ML Robust Model Fit Test Statistic 0.000 0.000 Degrees of freedom 0 0 Minimum Function Value 0.0000000000000 Scaling correction factor NA Shift parameter for simple second-order correction (Mplus variant) Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Robust.sem Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all socemo ~ ece_ECE 0.111 0.024 4.595 0.000 0.111 0.114 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 1.613 0.013 119.945 0.000 1.613 3.579 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 0.201 0.006 31.106 0.000 0.201 0.987
%%R -i dummiedSmallZambiaDF
semPaths(lm('socemo ~ ece_ECE', data=dummiedSmallZambiaDF), "std")
%%R -i dummiedSmallZambiaDF
bigmodel <- '
socemo ~ ece_ECE + wealth
reasoning ~ ece_ECE + wealth + books_Books_in_home
socemo ~~ reasoning
'
semfit <- sem(bigmodel, data=dummiedSmallZambiaDF, estimator="MLMV", meanstructure=TRUE)
summary(semfit, standardized=TRUE)
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order. res = PandasDataFrame.from_items(items)
lavaan 0.6-2 ended normally after 31 iterations Optimization method NLMINB Number of free parameters 10 Number of observations 1613 Estimator ML Robust Model Fit Test Statistic 6.390 6.305 Degrees of freedom 1 1 P-value (Chi-square) 0.011 0.012 Scaling correction factor 1.014 Shift parameter -0.000 for simple second-order correction (Mplus variant) Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Robust.sem Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all socemo ~ ece_ECE 0.100 0.026 3.916 0.000 0.100 0.103 wealth 0.009 0.008 1.161 0.246 0.009 0.029 reasoning ~ ece_ECE 0.741 0.151 4.922 0.000 0.741 0.136 wealth 0.208 0.046 4.502 0.000 0.208 0.118 books_Bks_n_hm 0.296 0.148 2.007 0.045 0.296 0.052 Covariances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo ~~ .reasoning -0.079 0.030 -2.677 0.007 -0.079 -0.072 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 1.589 0.025 64.749 0.000 1.589 3.525 .reasoning 3.523 0.140 25.208 0.000 3.523 1.398 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo 0.200 0.006 31.148 0.000 0.200 0.986 .reasoning 6.022 0.173 34.896 0.000 6.022 0.948
%%R
semPaths(semfit, "std")
%%R
semPaths(semfit, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=15,
groups=list(c("wealth", "ece_ECE", "books_Books_in_home"), c("reasoning", "socemo")),
color=c("lightblue", "purple"))
%%R
fitMeasures(semfit)
npar fmin 8.000 0.002 chisq df 6.390 1.000 pvalue chisq.scaled 0.011 6.305 df.scaled pvalue.scaled 1.000 0.012 chisq.scaling.factor baseline.chisq 1.014 123.037 baseline.df baseline.pvalue 7.000 0.000 baseline.chisq.scaled baseline.df.scaled 120.437 7.000 baseline.pvalue.scaled baseline.chisq.scaling.factor 0.000 1.023 cfi tli 0.954 0.675 nnfi rfi 0.675 0.636 nfi pnfi 0.948 0.135 ifi rni 0.956 0.954 cfi.scaled tli.scaled 0.953 0.673 cfi.robust tli.robust NA NA nnfi.scaled nnfi.robust 0.673 NA rfi.scaled nfi.scaled 0.634 0.948 ifi.scaled rni.scaled 0.948 0.953 rni.robust logl NA -4724.918 unrestricted.logl aic -4721.722 9465.835 bic ntotal 9508.922 1613.000 bic2 rmsea 9483.508 0.058 rmsea.ci.lower rmsea.ci.upper 0.022 0.104 rmsea.pvalue rmsea.scaled 0.302 0.057 rmsea.ci.lower.scaled rmsea.ci.upper.scaled 0.021 0.103 rmsea.pvalue.scaled rmsea.robust 0.308 NA rmsea.ci.lower.robust rmsea.ci.upper.robust NA NA rmsea.pvalue.robust rmr NA 0.004 rmr_nomean srmr 0.004 0.016 srmr_bentler srmr_bentler_nomean 0.016 0.016 srmr_bollen srmr_bollen_nomean 0.016 0.016 srmr_mplus srmr_mplus_nomean 0.016 0.016 cn_05 cn_01 970.642 1675.747 gfi agfi 0.996 0.941 pgfi mfi 0.066 0.998 ecvi 0.014
%%R
fitMeasures(semfit, c("cfi", "tli", "rmsea", "srmr"))
cfi tli rmsea srmr 0.954 0.675 0.058 0.016
Now let's try to do the same with OpenMX. Here, we need to first drop variables that we are not actually using in the model, otherwise OpenMX will complain.
smallZambiaDFForOMX = dummiedSmallZambiaDF[["wealth", "ece_ECE", "books_Books_in_home", "reasoning", "socemo"]]
%%R -i smallZambiaDFForOMX
manifests <- names(smallZambiaDFForOMX)
model <- mxModel("zambia simple", type="RAM",
manifestVars=manifests,
mxPath(from=c("wealth", "ece_ECE"), to="socemo", arrows=1, free=TRUE),
mxPath(from=c("wealth", "ece_ECE", "books_Books_in_home"), to="reasoning", arrows=1, free=TRUE),
mxPath(from="socemo", to="reasoning", arrows=2),
mxPath(from=manifests, arrows=2, values=.8), # needs starting values or it won't converge
mxPath(from="one", to=manifests),
mxData(observed=smallZambiaDFForOMX, type="raw"))
model
MxModel 'zambia simple' type : RAM $matrices : 'A', 'S', 'F', and 'M' $algebras : NULL $constraints : NULL $intervals : NULL $latentVars : none $manifestVars : 'wealth', 'ece_ECE', 'books_Books_in_home', 'reasoning', and 'socemo' $data : 1613 x 5 $data means : NA $data type: 'raw' $submodels : NULL $expectation : MxExpectationRAM $fitfunction : MxFitFunctionML $compute : NULL $independent : FALSE $options : $output : FALSE
%%R
modelRun <- mxRun(model)
%%R
mxCheckIdentification(model)
$status [1] TRUE $jacobian zambia simple.A[4,1] zambia simple.A[5,1] zambia simple.A[4,2] cov1_1 0.0 0.0 0.0 cov2_1 0.0 0.0 0.0 cov3_1 0.0 0.0 0.0 cov4_1 0.8 0.0 0.0 cov5_1 0.0 0.8 0.0 cov2_2 0.0 0.0 0.0 cov3_2 0.0 0.0 0.0 cov4_2 0.0 0.0 0.8 cov5_2 0.0 0.0 0.0 cov3_3 0.0 0.0 0.0 cov4_3 0.0 0.0 0.0 cov5_3 0.0 0.0 0.0 cov4_4 0.0 0.0 0.0 cov5_4 0.0 0.0 0.0 cov5_5 0.0 0.0 0.0 mean1 0.0 0.0 0.0 mean2 0.0 0.0 0.0 mean3 0.0 0.0 0.0 mean4 0.0 0.0 0.0 mean5 0.0 0.0 0.0 zambia simple.A[5,2] zambia simple.A[4,3] zambia simple.S[1,1] cov1_1 0.0 0.0 1 cov2_1 0.0 0.0 0 cov3_1 0.0 0.0 0 cov4_1 0.0 0.0 0 cov5_1 0.0 0.0 0 cov2_2 0.0 0.0 0 cov3_2 0.0 0.0 0 cov4_2 0.0 0.0 0 cov5_2 0.8 0.0 0 cov3_3 0.0 0.0 0 cov4_3 0.0 0.8 0 cov5_3 0.0 0.0 0 cov4_4 0.0 0.0 0 cov5_4 0.0 0.0 0 cov5_5 0.0 0.0 0 mean1 0.0 0.0 0 mean2 0.0 0.0 0 mean3 0.0 0.0 0 mean4 0.0 0.0 0 mean5 0.0 0.0 0 zambia simple.S[2,2] zambia simple.S[3,3] zambia simple.S[4,4] cov1_1 0 0 0 cov2_1 0 0 0 cov3_1 0 0 0 cov4_1 0 0 0 cov5_1 0 0 0 cov2_2 1 0 0 cov3_2 0 0 0 cov4_2 0 0 0 cov5_2 0 0 0 cov3_3 0 1 0 cov4_3 0 0 0 cov5_3 0 0 0 cov4_4 0 0 1 cov5_4 0 0 0 cov5_5 0 0 0 mean1 0 0 0 mean2 0 0 0 mean3 0 0 0 mean4 0 0 0 mean5 0 0 0 zambia simple.S[4,5] zambia simple.S[5,5] zambia simple.M[1,1] cov1_1 0 0 0 cov2_1 0 0 0 cov3_1 0 0 0 cov4_1 0 0 0 cov5_1 0 0 0 cov2_2 0 0 0 cov3_2 0 0 0 cov4_2 0 0 0 cov5_2 0 0 0 cov3_3 0 0 0 cov4_3 0 0 0 cov5_3 0 0 0 cov4_4 0 0 0 cov5_4 1 0 0 cov5_5 0 1 0 mean1 0 0 1 mean2 0 0 0 mean3 0 0 0 mean4 0 0 0 mean5 0 0 0 zambia simple.M[1,2] zambia simple.M[1,3] zambia simple.M[1,4] cov1_1 0 0 0 cov2_1 0 0 0 cov3_1 0 0 0 cov4_1 0 0 0 cov5_1 0 0 0 cov2_2 0 0 0 cov3_2 0 0 0 cov4_2 0 0 0 cov5_2 0 0 0 cov3_3 0 0 0 cov4_3 0 0 0 cov5_3 0 0 0 cov4_4 0 0 0 cov5_4 0 0 0 cov5_5 0 0 0 mean1 0 0 0 mean2 1 0 0 mean3 0 1 0 mean4 0 0 1 mean5 0 0 0 zambia simple.M[1,5] cov1_1 0 cov2_1 0 cov3_1 0 cov4_1 0 cov5_1 0 cov2_2 0 cov3_2 0 cov4_2 0 cov5_2 0 cov3_3 0 cov4_3 0 cov5_3 0 cov4_4 0 cov5_4 0 cov5_5 0 mean1 0 mean2 0 mean3 0 mean4 0 mean5 1 $non_identified_parameters [1] "None"
%%R
summary(modelRun)
Summary of zambia simple free parameters: name matrix row col 1 zambia simple.A[4,1] A reasoning wealth 2 zambia simple.A[5,1] A socemo wealth 3 zambia simple.A[4,2] A reasoning ece_ECE 4 zambia simple.A[5,2] A socemo ece_ECE 5 zambia simple.A[4,3] A reasoning books_Books_in_home 6 zambia simple.S[1,1] S wealth wealth 7 zambia simple.S[2,2] S ece_ECE ece_ECE 8 zambia simple.S[3,3] S books_Books_in_home books_Books_in_home 9 zambia simple.S[4,4] S reasoning reasoning 10 zambia simple.S[4,5] S reasoning socemo 11 zambia simple.S[5,5] S socemo socemo 12 zambia simple.M[1,1] M 1 wealth 13 zambia simple.M[1,2] M 1 ece_ECE 14 zambia simple.M[1,3] M 1 books_Books_in_home 15 zambia simple.M[1,4] M 1 reasoning 16 zambia simple.M[1,5] M 1 socemo Estimate Std.Error A 1 0.208256249 0.045763284 2 0.009257539 0.008323664 3 0.741469780 0.144594365 4 0.100492479 0.025700813 5 0.296275774 0.142460515 6 2.045906814 0.072042943 7 0.214596238 0.007556487 8 0.198934598 0.007005001 9 6.021752810 0.212049316 10 -0.079150620 0.027477928 11 0.200411594 0.007057006 12 2.957222635 0.035614481 13 0.311841230 0.011534387 14 0.274023513 0.011105512 15 3.523063821 0.141593354 16 1.589071982 0.025607614 Model Statistics: | Parameters | Degrees of Freedom | Fit (-2lnL units) Model: 16 8049 19249.93 Saturated: 20 8045 NA Independence: 10 8055 NA Number of observations/statistics: 1613/8065 Information Criteria: | df Penalty | Parameters Penalty | Sample-Size Adjusted AIC: 3151.934 19281.93 19282.28 BIC: -40198.781 19368.11 19317.28 To get additional fit indices, see help(mxRefModels) timestamp: 2018-11-20 01:04:09 Wall clock time: 0.07181001 secs optimizer: CSOLNP OpenMx version number: 2.9.9 Need help? See help(mxSummary)
%%R
semPaths(modelRun, "std")
%%R
semPaths(modelRun, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=15,
groups=list(c("wealth", "ece_ECE", "books_Books_in_home"), c("reasoning", "socemo")),
color=c("lightblue", "purple"))
There's also another package for doing SEM in R, calles sem
, but it seems older and less well supported/documented.
Research Question: Do ECE participation and book availability fully explain the relations between wealth and children's socio-emotional & reasoning skills in the Zambian context?
%%R -i dummiedSmallZambiaDF
biggermodel <- '
ece_ECE ~ wealth
books_Books_in_home ~ wealth
socemo ~ ece_ECE + books_Books_in_home
reasoning ~ ece_ECE + books_Books_in_home
socemo ~~ reasoning
'
semfit2 <- sem(biggermodel, data=dummiedSmallZambiaDF, estimator="MLR", meanstructure=TRUE)
# WLS would correspond to the adf method, but does not run to completion with this dataset
summary(semfit2, standardized=TRUE)
lavaan 0.6-2 ended normally after 35 iterations Optimization method NLMINB Number of free parameters 15 Number of observations 1613 Estimator ML Robust Model Fit Test Statistic 105.787 98.505 Degrees of freedom 3 3 P-value (Chi-square) 0.000 0.000 Scaling correction factor 1.074 for the Yuan-Bentler correction (Mplus variant) Parameter Estimates: Information Observed Observed information based on Hessian Standard Errors Robust.huber.white Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ece_ECE ~ wealth 0.114 0.008 14.928 0.000 0.114 0.351 books_Books_in_home ~ wealth 0.051 0.008 6.560 0.000 0.051 0.162 socemo ~ ece_ECE 0.093 0.025 3.721 0.000 0.093 0.096 books_Bks_n_hm 0.067 0.026 2.588 0.010 0.067 0.067 reasoning ~ ece_ECE 0.961 0.143 6.730 0.000 0.961 0.177 books_Bks_n_hm 0.320 0.148 2.159 0.031 0.320 0.057 Covariances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo ~~ .reasoning -0.076 0.030 -2.574 0.010 -0.076 -0.069 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE -0.025 0.023 -1.089 0.276 -0.025 -0.053 .books_Bks_n_hm 0.124 0.024 5.262 0.000 0.124 0.279 .socemo 1.600 0.014 112.146 0.000 1.600 3.555 .reasoning 4.064 0.076 53.429 0.000 4.064 1.616 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE 0.188 0.005 39.133 0.000 0.188 0.877 .books_Bks_n_hm 0.194 0.005 38.651 0.000 0.194 0.974 .socemo 0.200 0.006 31.190 0.000 0.200 0.986 .reasoning 6.099 0.175 34.873 0.000 6.099 0.964
%%R
fitMeasures(semfit2, c("cfi", "tli", "rmsea", "srmr"))
cfi tli rmsea srmr 0.882 0.608 0.110 0.046
%%R
semPaths(semfit2, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=15,
groups=list(c("wealth", "ece_ECE", "books_Books_in_home"), c("reasoning", "socemo")),
color=c("lightblue", "purple"))
simsem's semtools, see https://github.com/simsem/semTools/wiki/Functions
%%R
chisqSmallN(semfit)
naive.chisq k-factor adj.chisq df pvalue 0.000 0.999 0.000 0.000 1.000
bigsem, ctsem, gsem, lavaan.survey, regsem, semdiag
A caveat regarding fit measures: similar to the arbitraty cutoff for p-values indicating significance, the use of fit indices is controversial.
Usual indices:
%%R
fitMeasures(semfit, c("cfi", "tli", "rmsea", "srmr"))
cfi tli rmsea srmr 0.954 0.675 0.058 0.013
A simpler model to show mediation.
%%R -i dummiedSmallZambiaDF
simplermodel <- '
# Here we use a and b and c to keep track of the parameter estimations,
# so that we can then use it to estimate the indirect and total effects.
# ece is the supposed Mediator
ece_ECE ~ a*wealth
socemo ~ c*wealth + b*ece_ECE
# also estimate indirect effect:
indirect := a*b
# total effect:
total := c + (a*b)
'
semfit3 <- sem(simplermodel, data=dummiedSmallZambiaDF, estimator="MLR", meanstructure=TRUE)
summary(semfit3, standardized=TRUE)
lavaan 0.6-2 ended normally after 21 iterations Optimization method NLMINB Number of free parameters 7 Number of observations 1613 Estimator ML Robust Model Fit Test Statistic 0.000 0.000 Degrees of freedom 0 0 Minimum Function Value 0.0000000000000 Scaling correction factor NA for the Yuan-Bentler correction (Mplus variant) Parameter Estimates: Information Observed Observed information based on Hessian Standard Errors Robust.huber.white Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ece_ECE ~ wealth (a) 0.114 0.008 14.928 0.000 0.114 0.351 socemo ~ wealth (c) 0.009 0.008 1.166 0.244 0.009 0.029 ece_ECE (b) 0.100 0.026 3.920 0.000 0.100 0.103 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE -0.025 0.023 -1.089 0.276 -0.025 -0.053 .socemo 1.589 0.024 64.924 0.000 1.589 3.525 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE 0.188 0.005 39.133 0.000 0.188 0.877 .socemo 0.200 0.006 31.147 0.000 0.200 0.986 Defined Parameters: Estimate Std.Err z-value P(>|z|) Std.lv Std.all indirect 0.011 0.003 3.815 0.000 0.011 0.036 total 0.021 0.007 2.772 0.006 0.021 0.066
%%R
fitMeasures(semfit3, c("cfi", "tli", "rmsea", "srmr"))
cfi tli rmsea srmr 1 1 0 0
%%R
semPaths(semfit3, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=15,
groups=list(c("wealth", "ece_ECE"), c("socemo")),
color=c("lightblue", "purple"))
Other packages that should be useful for mediation analysis, for example to compare results: psych
and MBESS
, see here http://nickmichalak.com/blog_entries/2018/nrg01/nrg01.html
The semtools package is helpful for this.
%%R -i dummiedSmallZambiaDF
twostagesemfit1 <- sem.2stage(model=biggermodel, data=dummiedSmallZambiaDF, estimator="MLR", meanstructure=TRUE)
summary(twostagesemfit1, standardized=TRUE)
Chi-squared test(s) results, ADJUSTED for missing data: Browne (1984) residual-based test statistic: chisq df pvalue 97.915 3.000 0.000 Satorra-Bentler (2001) scaled test statistic: chisq.naive scaling.factor chisq.scaled df pvalue 105.787 0.001 178767.164 3.000 0.000 Chi-squared test results, UNADJUSTED for missing data: lavaan 0.6-2 ended normally after 35 iterations Optimization method NLMINB Number of free parameters 17 Number of observations 1613 Estimator ML Model Fit Test Statistic 105.787 Degrees of freedom 3 P-value (Chi-square) 0.000 Parameter Estimates, with SEs (and tests/CIs) ADJUSTED for missing data: lhs op rhs est se z pvalue 1 ece_ECE ~ wealth 0.114 0.008 14.943 0.000 2 books_Books_in_home ~ wealth 0.051 0.008 6.599 0.000 3 socemo ~ ece_ECE 0.093 0.024 3.925 0.000 4 socemo ~ books_Books_in_home 0.067 0.025 2.709 0.007 5 reasoning ~ ece_ECE 0.961 0.131 7.315 0.000 6 reasoning ~ books_Books_in_home 0.320 0.138 2.321 0.020 7 socemo ~~ reasoning -0.076 0.028 -2.706 0.007 8 ece_ECE ~~ ece_ECE 0.188 0.008 24.811 0.000 9 books_Books_in_home ~~ books_Books_in_home 0.194 0.008 24.842 0.000 10 socemo ~~ socemo 0.200 0.007 28.354 0.000 11 reasoning ~~ reasoning 6.099 0.223 27.342 0.000 12 wealth ~~ wealth 2.046 0.074 27.537 0.000 13 ece_ECE ~1 -0.025 0.025 -0.988 0.323 14 books_Books_in_home ~1 0.124 0.025 4.943 0.000 15 socemo ~1 1.600 0.016 102.991 0.000 16 reasoning ~1 4.064 0.086 47.308 0.000 17 wealth ~1 2.957 0.036 83.034 0.000 ci.lower ci.upper std.lv std.all std.nox 1 0.099 0.129 0.114 0.351 0.246 2 0.036 0.066 0.051 0.162 0.113 3 0.047 0.140 0.093 0.096 0.096 4 0.019 0.116 0.067 0.067 0.067 5 0.704 1.219 0.961 0.177 0.177 6 0.050 0.591 0.320 0.057 0.057 7 -0.131 -0.021 -0.076 -0.069 -0.069 8 0.173 0.203 0.188 0.877 0.877 9 0.178 0.209 0.194 0.974 0.974 10 0.186 0.214 0.200 0.986 0.986 11 5.662 6.537 6.099 0.964 0.964 12 1.900 2.192 2.046 1.000 2.046 13 -0.074 0.024 -0.025 -0.053 -0.053 14 0.075 0.174 0.124 0.279 0.279 15 1.570 1.631 1.600 3.555 3.555 16 3.895 4.232 4.064 1.616 1.616 17 2.887 3.027 2.957 2.067 2.957
%%R -i dummiedSmallZambiaDF
biggerermodel <- '
ece_ECE ~ wealth
books_Books_in_home ~ wealth
socemo ~ wealth + ece_ECE + books_Books_in_home
reasoning ~ wealth + ece_ECE + books_Books_in_home
socemo ~~ reasoning
'
twostagesemfit2 <- sem.2stage(model=biggerermodel, data=dummiedSmallZambiaDF, estimator="MLR", meanstructure=TRUE)
summary(twostagesemfit2, standardized=TRUE)
Chi-squared test(s) results, ADJUSTED for missing data: Browne (1984) residual-based test statistic: chisq df pvalue 77.505 1.000 0.000 Satorra-Bentler (2001) scaled test statistic: chisq.naive scaling.factor chisq.scaled df pvalue 83.543 0.001 160732.251 1.000 0.000 Chi-squared test results, UNADJUSTED for missing data: lavaan 0.6-2 ended normally after 38 iterations Optimization method NLMINB Number of free parameters 19 Number of observations 1613 Estimator ML Model Fit Test Statistic 83.543 Degrees of freedom 1 P-value (Chi-square) 0.000 Parameter Estimates, with SEs (and tests/CIs) ADJUSTED for missing data: lhs op rhs est se z pvalue 1 ece_ECE ~ wealth 0.114 0.008 15.072 0.000 2 books_Books_in_home ~ wealth 0.051 0.008 6.602 0.000 3 socemo ~ wealth 0.008 0.009 0.898 0.369 4 socemo ~ ece_ECE 0.086 0.026 3.334 0.001 5 socemo ~ books_Books_in_home 0.066 0.025 2.597 0.009 6 reasoning ~ wealth 0.209 0.047 4.472 0.000 7 reasoning ~ ece_ECE 0.747 0.141 5.305 0.000 8 reasoning ~ books_Books_in_home 0.270 0.139 1.947 0.051 9 socemo ~~ reasoning -0.079 0.027 -2.881 0.004 10 ece_ECE ~~ ece_ECE 0.188 0.008 24.832 0.000 11 books_Books_in_home ~~ books_Books_in_home 0.194 0.008 24.832 0.000 12 socemo ~~ socemo 0.200 0.007 28.399 0.000 13 reasoning ~~ reasoning 6.022 0.212 28.399 0.000 14 wealth ~~ wealth 2.046 0.072 28.399 0.000 15 ece_ECE ~1 -0.025 0.025 -0.995 0.320 16 books_Books_in_home ~1 0.124 0.025 4.945 0.000 17 socemo ~1 1.581 0.026 61.422 0.000 18 reasoning ~1 3.526 0.141 24.952 0.000 19 wealth ~1 2.957 0.036 83.034 0.000 ci.lower ci.upper std.lv std.all std.nox 1 0.099 0.129 0.114 0.351 0.246 2 0.036 0.066 0.051 0.162 0.113 3 -0.009 0.024 0.008 0.024 0.017 4 0.035 0.136 0.086 0.088 0.088 5 0.016 0.115 0.066 0.065 0.065 6 0.117 0.300 0.209 0.119 0.083 7 0.471 1.023 0.747 0.138 0.138 8 -0.002 0.542 0.270 0.048 0.048 9 -0.132 -0.025 -0.079 -0.072 -0.072 10 0.173 0.203 0.188 0.877 0.877 11 0.178 0.209 0.194 0.974 0.974 12 0.186 0.213 0.200 0.985 0.985 13 5.606 6.437 6.022 0.951 0.951 14 1.905 2.187 2.046 1.000 2.046 15 -0.073 0.024 -0.025 -0.053 -0.053 16 0.075 0.174 0.124 0.279 0.279 17 1.530 1.631 1.581 3.511 3.511 18 3.249 3.803 3.526 1.401 1.401 19 2.887 3.027 2.957 2.067 2.957
%%R
anova(twostagesemfit1, twostagesemfit2) # equivalent to "likelihood-ratio test"
Difference test for Browne (1984) residual-based statistics: chisq df pvalue 20.411 2.000 0.000 Satorra-Bentler (2001) scaled difference test: chisq.naive scaling.factor chisq.scaled DF pvalue 22.244 0.001 35434.111 2.000 0.000
%%R
miPowerFit(semfit2)
lhs op rhs group 12 wealth ~~ wealth 1 18 ece_ECE ~~ books_Books_in_home 1 19 ece_ECE ~~ socemo 1 20 ece_ECE ~~ reasoning 1 21 books_Books_in_home ~~ socemo 1 22 books_Books_in_home ~~ reasoning 1 23 ece_ECE ~ books_Books_in_home 1 24 ece_ECE ~ socemo 1 25 ece_ECE ~ reasoning 1 26 books_Books_in_home ~ ece_ECE 1 27 books_Books_in_home ~ socemo 1 28 books_Books_in_home ~ reasoning 1 30 socemo ~ wealth 1 32 reasoning ~ wealth 1 33 wealth ~ ece_ECE 1 34 wealth ~ books_Books_in_home 1 35 wealth ~ socemo 1 36 wealth ~ reasoning 1 mi epc target.epc 12 0.000000000000000000000000000003003 -0.000000000000001431 0.20459 18 81.416605542162955089224851690232754 0.042884997752392018 0.02066 19 1.544749222765177121274859928234946 -0.017232019584356661 0.02085 20 21.625988933431568028709079953841865 -0.356307742032899999 0.11651 21 1.544738483979728682626841873570811 -0.039922711700388476 0.02008 22 21.626245636034973074401932535693049 -0.825492627898089082 0.11218 23 81.416625996730118686173227615654469 0.221397869134911884 0.10386 24 0.619563202904253484959440356760751 0.053795614873202932 0.10291 25 9.314088537604437334493923117406666 -0.037919599570400378 0.01842 26 81.416636979079456182262219954282045 0.227984502302336800 0.09628 27 14.613812036635893676361774851102382 0.529474396339441045 0.09908 28 13.254150786988219579143333248794079 0.071090656370003724 0.01773 30 1.544754756539778339785584648780059 0.010424279832891337 0.03147 32 21.626012906364280752313788980245590 0.215543337637906313 0.17583 33 0.000000005867903821677934663066101 0.000135862687082167 0.30877 34 0.000000008669718146489592254175238 0.000371547803528799 0.32069 35 0.855398203706040693994339108030545 0.079649234055344187 0.31774 36 20.934768677467971542682789731770754 0.071299881188622113 0.05687 std.epc std.target.epc significant.mi high.power decision.pow 12 -0.0000000000000006995 0.1 FALSE FALSE I 18 0.2075578170512830112 0.1 TRUE TRUE EPC:M 19 -0.0826332446876676813 0.1 FALSE FALSE I 20 -0.3058234554311530795 0.1 TRUE FALSE M 21 -0.1988357948464972713 0.1 FALSE FALSE I 22 -0.7358929489158683168 0.1 TRUE FALSE M 23 0.2131657988650298707 0.1 TRUE TRUE EPC:M 24 0.0522764444171291212 0.1 FALSE FALSE I 25 -0.2058713928429005968 0.1 TRUE FALSE M 26 0.2367888435868654440 0.1 TRUE TRUE EPC:M 27 0.5343921132592205359 0.1 TRUE FALSE M 28 0.4008673589843934626 0.1 TRUE FALSE M 30 0.0331221683287136609 0.1 FALSE TRUE NM 32 0.1225841182785387951 0.1 TRUE TRUE EPC:M 33 0.0000440015864972028 0.1 FALSE FALSE I 34 0.0001158582390606253 0.1 FALSE FALSE I 35 0.0250673777159869729 0.1 FALSE TRUE NM 36 0.1253687229667201364 0.1 TRUE TRUE EPC:M se.epc lower.epc upper.epc lower.std.epc upper.std.epc decision.ci 12 0.825817 -1.358348 1.358348 -0.66393 0.66393 I 18 0.004753 0.035067 0.050703 0.16972 0.24539 M 19 0.013865 -0.040037 0.005573 -0.19199 0.02673 I 20 0.076619 -0.482335 -0.230280 -0.41399 -0.19765 M 21 0.032121 -0.092757 0.012912 -0.46198 0.06431 I 22 0.177510 -1.117470 -0.533515 -0.99618 -0.47561 M 23 0.024537 0.181039 0.261757 0.17431 0.25202 M 24 0.068345 -0.058621 0.166212 -0.05697 0.16152 I 25 0.012425 -0.058357 -0.017482 -0.31683 -0.09491 I 26 0.025267 0.186424 0.269545 0.19362 0.27995 M 27 0.138504 0.301655 0.757294 0.30446 0.76433 M 28 0.019527 0.038972 0.103210 0.21975 0.58198 M 30 0.008387 -0.003371 0.024220 -0.01071 0.07696 NM 32 0.046350 0.139305 0.291782 0.07923 0.16594 I 33 1.773612 -2.917197 2.917469 -0.94479 0.94487 I 34 3.990361 -6.563189 6.563932 -2.04657 2.04680 I 35 0.086119 -0.062003 0.221302 -0.01951 0.06965 NM 36 0.015583 0.045668 0.096932 0.08030 0.17044 I
Compare a model across groups in the data, for example male vs. female. See http://lavaan.ugent.be/tutorial/groups.html for how to specify path constraints across groups.
%%R -i dummiedSmallZambiaDF
multigroupsemfit <- sem(model=biggermodel, data=dummiedSmallZambiaDF, estimator="MLR", meanstructure=TRUE,
group="male_Male")
summary(multigroupsemfit, standardized=TRUE)
lavaan 0.6-2 ended normally after 66 iterations Optimization method NLMINB Number of free parameters 30 Number of observations per group 1 803 0 810 Estimator ML Robust Model Fit Test Statistic 112.355 104.373 Degrees of freedom 6 6 P-value (Chi-square) 0.000 0.000 Scaling correction factor 1.076 for the Yuan-Bentler correction (Mplus variant) Chi-square for each group: 1 70.945 65.905 0 41.410 38.468 Parameter Estimates: Information Observed Observed information based on Hessian Standard Errors Robust.huber.white Group 1 [1]: Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ece_ECE ~ wealth 0.118 0.011 11.214 0.000 0.118 0.372 books_Books_in_home ~ wealth 0.038 0.011 3.490 0.000 0.038 0.126 socemo ~ ece_ECE 0.076 0.036 2.104 0.035 0.076 0.078 books_Bks_n_hm 0.050 0.038 1.321 0.186 0.050 0.049 reasoning ~ ece_ECE 0.862 0.207 4.165 0.000 0.862 0.155 books_Bks_n_hm 0.353 0.220 1.604 0.109 0.353 0.061 Covariances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo ~~ .reasoning -0.079 0.042 -1.891 0.059 -0.079 -0.070 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE -0.038 0.031 -1.205 0.228 -0.038 -0.082 .books_Bks_n_hm 0.152 0.034 4.508 0.000 0.152 0.346 .socemo 1.567 0.020 78.464 0.000 1.567 3.497 .reasoning 4.155 0.110 37.657 0.000 4.155 1.622 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE 0.184 0.007 26.400 0.000 0.184 0.861 .books_Bks_n_hm 0.191 0.007 26.100 0.000 0.191 0.984 .socemo 0.199 0.009 21.330 0.000 0.199 0.991 .reasoning 6.375 0.257 24.797 0.000 6.375 0.971 Group 2 [0]: Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ece_ECE ~ wealth 0.109 0.011 9.902 0.000 0.109 0.330 books_Books_in_home ~ wealth 0.064 0.011 5.859 0.000 0.064 0.199 socemo ~ ece_ECE 0.111 0.035 3.208 0.001 0.111 0.115 books_Bks_n_hm 0.081 0.036 2.257 0.024 0.081 0.081 reasoning ~ ece_ECE 1.056 0.198 5.347 0.000 1.056 0.199 books_Bks_n_hm 0.302 0.201 1.503 0.133 0.302 0.055 Covariances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .socemo ~~ .reasoning -0.068 0.041 -1.667 0.096 -0.068 -0.064 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE -0.010 0.033 -0.320 0.749 -0.010 -0.023 .books_Bks_n_hm 0.094 0.033 2.855 0.004 0.094 0.208 .socemo 1.634 0.020 80.741 0.000 1.634 3.647 .reasoning 3.971 0.104 38.032 0.000 3.971 1.610 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .ece_ECE 0.192 0.007 28.994 0.000 0.192 0.891 .books_Bks_n_hm 0.195 0.007 28.525 0.000 0.195 0.961 .socemo 0.196 0.009 22.897 0.000 0.196 0.979 .reasoning 5.813 0.235 24.713 0.000 5.813 0.956
SEM can be used not only to relate measured variables, as above, but also to relate indicators (items, or full "scales" that combine items) to theorized "latent" variables that cannot be directly observed.
We use the extended version of the zambia dataset here which includes several more variables that are directly observed items. Initially, we'll use 5 items each for a socioemotional scale and a task-orientation scale. Each item is rated on a scale from 1 (never) to 4 (always).
measureZambiaDF.head()
childid | male | urban | reas1 | reas2 | reas3 | reas4 | reas5 | reas6 | reas7 | ... | to12 | to13 | ece | reasoning | socemo | vocab | vocabsq | taskorient | wealth | books | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 101 | Male | Urban | Correctly answered | Correctly answered | Correctly answered | Incorrect | Incorrect | Incorrect | Incorrect | ... | 2.0 | 4.0 | ECE | 3.0 | 1.05 | 18.0 | 324.0 | 3.615385 | 3.0 | No books in home |
1 | 102 | Male | Urban | Correctly answered | Incorrect | Correctly answered | Correctly answered | Incorrect | Incorrect | Correctly answered | ... | 4.0 | 4.0 | No ECE | 4.0 | 1.00 | 19.0 | 361.0 | 3.769231 | 2.0 | No books in home |
2 | 103 | Female | Urban | Correctly answered | Correctly answered | Correctly answered | Incorrect | Incorrect | Correctly answered | Incorrect | ... | 4.0 | 4.0 | No ECE | 4.0 | 1.80 | 19.0 | 361.0 | 3.769231 | 3.0 | No books in home |
3 | 104 | Male | Urban | Correctly answered | Correctly answered | Correctly answered | Incorrect | Incorrect | Incorrect | Correctly answered | ... | 4.0 | 3.0 | No ECE | 5.0 | 2.35 | 12.0 | 144.0 | 3.692308 | 3.0 | No books in home |
4 | 105 | Male | Urban | Correctly answered | Correctly answered | Correctly answered | Incorrect | Incorrect | Correctly answered | Incorrect | ... | 4.0 | 3.0 | No ECE | 5.0 | 1.50 | 25.0 | 625.0 | 3.461539 | 2.0 | Books in home |
5 rows × 84 columns
measureZambiaDF.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1623 entries, 0 to 1622 Data columns (total 84 columns): childid 1623 non-null int16 male 1623 non-null category urban 1623 non-null category reas1 1623 non-null category reas2 1623 non-null category reas3 1623 non-null category reas4 1623 non-null category reas5 1623 non-null category reas6 1623 non-null category reas7 1623 non-null category reas8 1623 non-null category reas9 1623 non-null category reas10 1623 non-null category se1 1551 non-null category se2 1461 non-null category se3 1512 non-null category se4 1587 non-null category se5 1482 non-null category se6 1539 non-null category se7 1562 non-null category se8 1567 non-null category se9 1550 non-null category se10 1569 non-null category se11 1571 non-null category se12 1516 non-null category se13 1560 non-null category se14 1532 non-null category se15 1559 non-null category se16 1476 non-null category se17 1581 non-null category se18 1484 non-null category se19 1573 non-null category se20 1569 non-null category voc1 1623 non-null float32 voc2 1623 non-null float32 voc3 1623 non-null float32 voc4 1623 non-null float32 voc5 1623 non-null float32 voc6 1623 non-null float32 voc7 1623 non-null float32 voc8 1623 non-null float32 voc9 1623 non-null float32 voc10 1623 non-null float32 voc11 1623 non-null float32 voc12 1623 non-null float32 voc13 1623 non-null float32 voc14 1623 non-null float32 voc15 1623 non-null float32 voc16 1623 non-null float32 voc17 1623 non-null float32 voc18 1623 non-null float32 voc19 1623 non-null float32 voc20 1623 non-null float32 voc21 1623 non-null float32 voc22 1623 non-null float32 voc23 1623 non-null float32 voc24 1623 non-null float32 voc25 1623 non-null float32 voc26 1623 non-null float32 voc27 1623 non-null float32 voc28 1623 non-null float32 voc29 1623 non-null float32 voc30 1623 non-null float32 to1 1589 non-null float32 to2 1585 non-null float32 to3 1573 non-null float32 to4 1587 non-null float32 to5 1588 non-null float32 to6 1591 non-null float32 to7 1563 non-null float32 to8 1572 non-null float32 to9 1552 non-null float32 to10 1551 non-null float32 to11 1547 non-null float32 to12 1548 non-null float32 to13 1566 non-null float32 ece 1623 non-null category reasoning 1623 non-null float32 socemo 1613 non-null float32 vocab 1623 non-null float32 vocabsq 1623 non-null float32 taskorient 1597 non-null float32 wealth 1623 non-null float32 books 1623 non-null category dtypes: category(34), float32(49), int16(1) memory usage: 385.5 KB
measureZambiaDF.describe(include="all")[["se1", "se2", "to1", "to2"]]
se1 | se2 | to1 | to2 | |
---|---|---|---|---|
count | 1551 | 1461 | 1589.000000 | 1585.000000 |
unique | 4 | 4 | NaN | NaN |
top | Sometimes | Sometimes | NaN | NaN |
freq | 648 | 704 | NaN | NaN |
mean | NaN | NaN | 3.111391 | 3.076341 |
std | NaN | NaN | 0.971020 | 0.950069 |
min | NaN | NaN | 1.000000 | 1.000000 |
25% | NaN | NaN | 2.000000 | 3.000000 |
50% | NaN | NaN | 3.000000 | 3.000000 |
75% | NaN | NaN | 4.000000 | 4.000000 |
max | NaN | NaN | 4.000000 | 4.000000 |
The se variables are not yet numerically coded, so we have to fix that, and prepare dummy-coding if needed.
measureZambiaDF["se1"].unique()
[Never, Always, Sometimes, Usually, NaN] Categories (4, object): [Never < Sometimes < Usually < Always]
for colnum in range(1,6):
measureZambiaDF["se" + str(colnum) + "c"] = measureZambiaDF["se" + str(colnum)].cat.codes
measureZambiaDF["se1c"].unique()
array([ 0, 3, 1, 2, -1], dtype=int64)
Let's check correlations to make sure our assumptions, i.e. that indicators within a factor are correlated, is correct.
# Calculate correlations
corr = measureZambiaDF[["se1c", "se2c", "se3c", "se4c", "se5c", "to1", "to2", "to3", "to4", "to5", "socemo", "taskorient"]].corr()
# Heatmap
plt.figure()
sns.heatmap(corr)
plt.show()
corr
se1c | se2c | se3c | se4c | se5c | to1 | to2 | to3 | to4 | to5 | socemo | taskorient | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
se1c | 1.000000 | 0.448828 | 0.397943 | 0.315406 | 0.299793 | 0.057045 | 0.055181 | 0.092964 | 0.057652 | 0.004773 | 0.529947 | 0.048885 |
se2c | 0.448828 | 1.000000 | 0.362351 | 0.215676 | 0.299680 | 0.034002 | 0.053028 | 0.043206 | -0.022127 | -0.008994 | 0.386499 | 0.006681 |
se3c | 0.397943 | 0.362351 | 1.000000 | 0.317611 | 0.332186 | 0.041432 | 0.049954 | 0.101046 | 0.055205 | 0.030859 | 0.525709 | 0.045439 |
se4c | 0.315406 | 0.215676 | 0.317611 | 1.000000 | 0.283498 | 0.033503 | 0.065899 | 0.055018 | 0.037609 | 0.045378 | 0.501649 | 0.027030 |
se5c | 0.299793 | 0.299680 | 0.332186 | 0.283498 | 1.000000 | 0.037358 | 0.061677 | 0.090891 | 0.032899 | 0.062674 | 0.521056 | 0.053759 |
to1 | 0.057045 | 0.034002 | 0.041432 | 0.033503 | 0.037358 | 1.000000 | 0.689873 | 0.579799 | 0.443605 | 0.475216 | 0.086210 | 0.776598 |
to2 | 0.055181 | 0.053028 | 0.049954 | 0.065899 | 0.061677 | 0.689873 | 1.000000 | 0.604540 | 0.448777 | 0.448261 | 0.124137 | 0.759687 |
to3 | 0.092964 | 0.043206 | 0.101046 | 0.055018 | 0.090891 | 0.579799 | 0.604540 | 1.000000 | 0.444270 | 0.430099 | 0.151745 | 0.725361 |
to4 | 0.057652 | -0.022127 | 0.055205 | 0.037609 | 0.032899 | 0.443605 | 0.448777 | 0.444270 | 1.000000 | 0.518978 | 0.106936 | 0.657686 |
to5 | 0.004773 | -0.008994 | 0.030859 | 0.045378 | 0.062674 | 0.475216 | 0.448261 | 0.430099 | 0.518978 | 1.000000 | 0.108867 | 0.673800 |
socemo | 0.529947 | 0.386499 | 0.525709 | 0.501649 | 0.521056 | 0.086210 | 0.124137 | 0.151745 | 0.106936 | 0.108867 | 1.000000 | 0.115544 |
taskorient | 0.048885 | 0.006681 | 0.045439 | 0.027030 | 0.053759 | 0.776598 | 0.759687 | 0.725361 | 0.657686 | 0.673800 | 0.115544 | 1.000000 |
%%R -i measureZambiaDF
model <- '
to_latent =~ to1 + to2 + to3 + to4 + to5
se_latent =~ se1c + se2c + se3c + se4c + se5c
'
fit <- cfa(model, data=measureZambiaDF)
summary(fit, fit.measures=TRUE)
C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se1". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se2". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se3". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se4". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se5". Fall back to string conversion. The error is: Converting pandas "Category" series to R factor is only possible when categories are strings. (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se6". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se7". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se8". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se9". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se10". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se11". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se12". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se13". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se14". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se15". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se16". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se17". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se18". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se19". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e))) C:\Users\sr876\AppData\Local\Continuum\anaconda3\lib\site-packages\rpy2-2.9.4-py3.7-win-amd64.egg\rpy2\robjects\pandas2ri.py:67: UserWarning: Error while trying to convert the column "se20". Fall back to string conversion. The error is: Error in as.character.factor(x) : malformed factor (name, str(e)))
lavaan 0.6-2 ended normally after 29 iterations Optimization method NLMINB Number of free parameters 21 Used Total Number of observations 1547 1623 Estimator ML Model Fit Test Statistic 219.773 Degrees of freedom 34 P-value (Chi-square) 0.000 Model test baseline model: Minimum Function Test Statistic 4175.115 Degrees of freedom 45 P-value 0.000 User model versus baseline model: Comparative Fit Index (CFI) 0.955 Tucker-Lewis Index (TLI) 0.940 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -19814.396 Loglikelihood unrestricted model (H1) -19704.510 Number of free parameters 21 Akaike (AIC) 39670.792 Bayesian (BIC) 39783.018 Sample-size adjusted Bayesian (BIC) 39716.306 Root Mean Square Error of Approximation: RMSEA 0.059 90 Percent Confidence Interval 0.052 0.067 P-value RMSEA <= 0.05 0.018 Standardized Root Mean Square Residual: SRMR 0.034 Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Standard Latent Variables: Estimate Std.Err z-value P(>|z|) to_latent =~ to1 1.000 to2 0.984 0.030 32.415 0.000 to3 0.864 0.030 28.936 0.000 to4 0.620 0.027 23.135 0.000 to5 0.655 0.028 23.500 0.000 se_latent =~ se1c 1.000 se2c 0.996 0.063 15.798 0.000 se3c 1.034 0.064 16.161 0.000 se4c 0.645 0.049 13.223 0.000 se5c 1.220 0.085 14.332 0.000 Covariances: Estimate Std.Err z-value P(>|z|) to_latent ~~ se_latent 0.057 0.016 3.489 0.000 Variances: Estimate Std.Err z-value P(>|z|) .to1 0.324 0.017 18.665 0.000 .to2 0.301 0.016 18.284 0.000 .to3 0.411 0.018 22.537 0.000 .to4 0.431 0.017 25.277 0.000 .to5 0.461 0.018 25.161 0.000 .se1c 0.539 0.028 19.129 0.000 .se2c 0.761 0.035 21.731 0.000 .se3c 0.712 0.034 20.787 0.000 .se4c 0.645 0.026 24.863 0.000 .se5c 1.749 0.073 23.896 0.000 to_latent 0.621 0.034 18.044 0.000 se_latent 0.388 0.034 11.426 0.000
%%R
semPaths(fit, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=5, sizeLat=10,
groups="latents",
color=c("lightblue", "purple"))
TODO: compare against models (similar to above) that use one latent factor instead of two, or that load some indicators (e.g. se4c) to both.
Let's test whether ECE partially explains the relation between wealth and children's socioemotional and task orientation skills.
# we need dummy coding again here, to use ECE
dummiedMeasureZambiaDF = pd.get_dummies(measureZambiaDF, dtype=np.int8)
# We use int8 instead of uint8 for the dummy variables, because R has no unsigned integers,
# so we would get warnings when passing these new variables into R.
# Further: the new dummy columns use the actual value as part of the column name,
# and therefore might have spaces in the column name.
# this is not great for specifying models, so let us replace the spaces with underscores:
dummiedMeasureZambiaDF.columns = dummiedMeasureZambiaDF.columns.str.replace(' ', '_')
%%R -i dummiedMeasureZambiaDF
fullsrmodel <- '
to_latent =~ to1 + to2 + to3 + to4 + to5
se_latent =~ se1c + se2c + se3c + se4c + se5c
ece_ECE ~ wealth
se_latent ~ ece_ECE + wealth
to_latent ~ ece_ECE + wealth
'
srsemfit <- sem(model=fullsrmodel, data=dummiedMeasureZambiaDF, estimator="MLR", meanstructure=TRUE)
summary(srsemfit, standardized=TRUE)
lavaan 0.6-2 ended normally after 52 iterations Optimization method NLMINB Number of free parameters 38 Used Total Number of observations 1547 1623 Estimator ML Robust Model Fit Test Statistic 242.490 224.444 Degrees of freedom 50 50 P-value (Chi-square) 0.000 0.000 Scaling correction factor 1.080 for the Yuan-Bentler correction (Mplus variant) Parameter Estimates: Information Observed Observed information based on Hessian Standard Errors Robust.huber.white Latent Variables: Estimate Std.Err z-value P(>|z|) Std.lv Std.all to_latent =~ to1 1.000 0.788 0.811 to2 0.984 0.028 35.744 0.000 0.775 0.817 to3 0.862 0.031 27.367 0.000 0.679 0.727 to4 0.621 0.033 18.963 0.000 0.490 0.598 to5 0.654 0.034 19.294 0.000 0.516 0.605 se_latent =~ se1c 1.000 0.627 0.651 se2c 0.990 0.059 16.805 0.000 0.620 0.579 se3c 1.022 0.072 14.228 0.000 0.640 0.603 se4c 0.641 0.057 11.225 0.000 0.401 0.447 se5c 1.212 0.100 12.078 0.000 0.760 0.498 Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all ece_ECE ~ wealth 0.115 0.008 14.756 0.000 0.115 0.353 se_latent ~ ece_ECE 0.125 0.044 2.855 0.004 0.199 0.093 wealth 0.028 0.014 2.013 0.044 0.045 0.064 to_latent ~ ece_ECE 0.171 0.048 3.540 0.000 0.217 0.101 wealth 0.103 0.016 6.432 0.000 0.131 0.188 Covariances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .to_latent ~~ .se_latent 0.042 0.015 2.732 0.006 0.089 0.089 Intercepts: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .to1 2.743 0.054 51.115 0.000 2.743 2.822 .to2 2.717 0.052 51.998 0.000 2.717 2.861 .to3 2.702 0.047 57.230 0.000 2.702 2.891 .to4 3.258 0.040 81.464 0.000 3.258 3.981 .to5 3.072 0.042 73.536 0.000 3.072 3.601 .se1c 1.433 0.045 31.959 0.000 1.433 1.488 .se2c 1.167 0.045 25.922 0.000 1.167 1.090 .se3c 1.376 0.047 29.391 0.000 1.376 1.296 .se4c 1.582 0.034 46.726 0.000 1.582 1.762 .se5c 1.689 0.058 29.171 0.000 1.689 1.107 .ece_ECE -0.026 0.023 -1.134 0.257 -0.026 -0.056 .to_latent 0.000 0.000 0.000 .se_latent 0.000 0.000 0.000 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .to1 0.323 0.022 14.525 0.000 0.323 0.342 .to2 0.301 0.022 13.493 0.000 0.301 0.333 .to3 0.412 0.023 17.755 0.000 0.412 0.472 .to4 0.430 0.023 19.102 0.000 0.430 0.642 .to5 0.461 0.023 20.300 0.000 0.461 0.634 .se1c 0.534 0.035 15.350 0.000 0.534 0.576 .se2c 0.762 0.048 15.995 0.000 0.762 0.665 .se3c 0.717 0.042 17.175 0.000 0.717 0.636 .se4c 0.645 0.028 23.178 0.000 0.645 0.800 .se5c 1.749 0.076 22.919 0.000 1.749 0.752 .ece_ECE 0.188 0.005 38.582 0.000 0.188 0.876 .to_latent 0.585 0.032 18.120 0.000 0.941 0.941 .se_latent 0.386 0.040 9.545 0.000 0.983 0.983
%%R
semPaths(srsemfit, "std", curvePivot=TRUE, intercepts=FALSE, layout="tree2", rotation=2, nCharNodes=9, sizeMan=5, sizeLat=10,
groups="latents",
color=c("lightblue", "purple"))
%%R
model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(model, data=HolzingerSwineford1939)
%%R
summary(fit, fit.measures=TRUE)
lavaan 0.6-2 ended normally after 35 iterations Optimization method NLMINB Number of free parameters 21 Number of observations 301 Estimator ML Model Fit Test Statistic 85.306 Degrees of freedom 24 P-value (Chi-square) 0.000 Model test baseline model: Minimum Function Test Statistic 918.852 Degrees of freedom 36 P-value 0.000 User model versus baseline model: Comparative Fit Index (CFI) 0.931 Tucker-Lewis Index (TLI) 0.896 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -3737.745 Loglikelihood unrestricted model (H1) -3695.092 Number of free parameters 21 Akaike (AIC) 7517.490 Bayesian (BIC) 7595.339 Sample-size adjusted Bayesian (BIC) 7528.739 Root Mean Square Error of Approximation: RMSEA 0.092 90 Percent Confidence Interval 0.071 0.114 P-value RMSEA <= 0.05 0.001 Standardized Root Mean Square Residual: SRMR 0.065 Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Standard Latent Variables: Estimate Std.Err z-value P(>|z|) visual =~ x1 1.000 x2 0.554 0.100 5.554 0.000 x3 0.729 0.109 6.685 0.000 textual =~ x4 1.000 x5 1.113 0.065 17.014 0.000 x6 0.926 0.055 16.703 0.000 speed =~ x7 1.000 x8 1.180 0.165 7.152 0.000 x9 1.082 0.151 7.155 0.000 Covariances: Estimate Std.Err z-value P(>|z|) visual ~~ textual 0.408 0.074 5.552 0.000 speed 0.262 0.056 4.660 0.000 textual ~~ speed 0.173 0.049 3.518 0.000 Variances: Estimate Std.Err z-value P(>|z|) .x1 0.549 0.114 4.833 0.000 .x2 1.134 0.102 11.146 0.000 .x3 0.844 0.091 9.317 0.000 .x4 0.371 0.048 7.779 0.000 .x5 0.446 0.058 7.642 0.000 .x6 0.356 0.043 8.277 0.000 .x7 0.799 0.081 9.823 0.000 .x8 0.488 0.074 6.573 0.000 .x9 0.566 0.071 8.003 0.000 visual 0.809 0.145 5.564 0.000 textual 0.979 0.112 8.737 0.000 speed 0.384 0.086 4.451 0.000
%%R
model <- '
x1 ~ y3
# measurement model
# ind60 =~ x1 + x2 + x3
# dem60 =~ y1 + y2 + y3 + y4
# dem65 =~ y5 + y6 + y7 + y8
# regressions
# dem60 ~ ind60
# dem65 ~ ind60 + dem60
'
fit <- sem(model, data=PoliticalDemocracy)
summary(fit, standardized=TRUE)
lavaan 0.6-2 ended normally after 10 iterations Optimization method NLMINB Number of free parameters 2 Number of observations 75 Estimator ML Model Fit Test Statistic 0.000 Degrees of freedom 0 Parameter Estimates: Information Expected Information saturated (h1) model Structured Standard Errors Standard Regressions: Estimate Std.Err z-value P(>|z|) Std.lv Std.all x1 ~ y3 0.073 0.024 2.999 0.003 0.073 0.327 Variances: Estimate Std.Err z-value P(>|z|) Std.lv Std.all .x1 0.473 0.077 6.124 0.000 0.473 0.893