---
title: "penTable"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{penTable}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(coreStatsNMR)
library(dplyr)
library(data.table)

set.seed(1234)
```

`penTable` is a function for creating penetration tables, with or without weights. Its inputs are a data frame (or data.table), an `index` variable identifying each unique unit, two variables for grouping the dataset (`x` and `y`), and optional weight columns (`totWeightVar` and `inGroupWeightVar`). The examples below also use the `accuracy` and `only_ns` arguments.

### Sample data

For testing, we use the `mtcars` dataset that ships with R. It contains data from the 1974 *Motor Trend* US magazine, with fuel economy and other statistics for 32 cars. Its structure looks like this:

```{r mtcars}
glimpse(mtcars)
```

Next, we assign each car a unique index and one of three random bins, then attach a random weight to each bin so the examples below have a weight column to work with.

```{r test_data, echo=TRUE}
set.seed(1234)

# One random weight per bin, centered on 1
rand_wt <- data.frame(rand_bin = 1:3,
                      rand_wt = rnorm(3, mean = 1, sd = 0.333))

# Give each car a unique index and a random bin, then attach that bin's weight
test_data <- mtcars %>%
  mutate(index = row_number(),
         rand_bin = sample(1:3, nrow(mtcars), replace = TRUE)) %>%
  merge(rand_wt, by = "rand_bin")

# data.table copy, to show that penTable accepts either class
test_dt <- data.table(test_data)
```

### Changing weights

The chunk below applies `penTable` to the test data with different weighting choices: `rand_wt` as the total weight; `rand_wt` as both the total and the in-group weight; `mpg` as the total weight; and `mpg` as the total weight with `rand_wt` as the in-group weight.

```{r penTable_0}
penTable(test_data, index = "index", x = "cyl", y = "gear",
         totWeightVar = "rand_wt") %>%
  knitr::kable()

penTable(test_dt, index = "index", x = "cyl", y = "gear",
         totWeightVar = "rand_wt", inGroupWeightVar = "rand_wt") %>%
  knitr::kable()

penTable(test_data, index = "index", x = "cyl", y = "gear",
         totWeightVar = "mpg") %>%
  knitr::kable()

penTable(test_data, index = "index", x = "cyl", y = "gear",
         totWeightVar = "mpg", inGroupWeightVar = "rand_wt",
         accuracy = 0.1) %>%
  knitr::kable()
```

### Overlap with `wtdPropTable`

Penetration and saturation are identical when each unique ID has exactly one record.
For example, in a dataset of heating equipment collected from homes where each home has only one piece of heating equipment, the share of homes with a given type of equipment equals the share of equipment records of that type. In `test_data`, every `index` value is unique, so `wtdPropTable` and `penTable` can be compared directly:

```{r}
# Every index value appears exactly once
n_distinct(test_data$index) == nrow(test_data)

wtdPropTable(
  test_data,
  x = "cyl",
  y = "gear",
  totWeightVar = "rand_wt",
  accuracy = 1
) %>%
  knitr::kable()

penTable(
  test_dt,
  index = "index",
  x = "cyl",
  y = "gear",
  totWeightVar = "rand_wt",
  accuracy = 1
) %>%
  knitr::kable()

wtdPropTable(
  test_data,
  x = "cyl",
  y = "gear",
  totWeightVar = "rand_wt",
  inGroupWeightVar = "rand_wt",
  accuracy = 1
) %>%
  knitr::kable()

penTable(
  test_dt,
  index = "index",
  x = "cyl",
  y = "gear",
  totWeightVar = "rand_wt",
  inGroupWeightVar = "rand_wt",
  accuracy = 1
) %>%
  knitr::kable()
```

### Retrieve only counts used in `penTable`

Setting `only_ns = TRUE` returns only the counts that `penTable` uses instead of the penetration table. This is helpful for QC checks and for running significance tests on penetration table results.

```{r penTable_ns}
penTable(test_data, index = "index", x = "cyl", y = "gear",
         totWeightVar = "mpg", only_ns = TRUE) %>%
  knitr::kable()
```
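To make the point from the *Overlap with `wtdPropTable`* section concrete, here is a minimal, unweighted, one-way sketch in base R. It assumes, for illustration only, that saturation is the share of records in each category and penetration is the share of unique IDs with at least one record in that category. The toy `equip` data and the `saturation()` and `penetration()` helpers are hypothetical and are not how `penTable` or `wtdPropTable` are implemented.

```r
# Hypothetical toy data: one row per piece of heating equipment.
# Home 1 has two pieces (gas and electric); every other home has one.
equip <- data.frame(
  home_id = c(1, 1, 2, 3, 4),
  fuel    = c("gas", "electric", "gas", "oil", "gas"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unweighted share of records in each category
saturation <- function(df, var) {
  tab <- prop.table(table(df[[var]]))
  setNames(as.numeric(tab), names(tab))
}

# Hypothetical helper: unweighted share of unique IDs with at least one
# record in each category
penetration <- function(df, id, var) {
  n_ids <- length(unique(df[[id]]))
  ids_per_level <- tapply(df[[id]], df[[var]],
                          function(ids) length(unique(ids)))
  setNames(as.numeric(ids_per_level) / n_ids, names(ids_per_level))
}

saturation(equip, "fuel")               # electric 0.20, gas 0.60, oil 0.20
penetration(equip, "home_id", "fuel")   # electric 0.25, gas 0.75, oil 0.25

# Keep only the first record per home, so each ID has exactly one record
single <- equip[!duplicated(equip$home_id), ]
all.equal(saturation(single, "fuel"),
          penetration(single, "home_id", "fuel"))  # TRUE
```

With several records per home, the penetration shares can sum to more than 1 (here 1.25), because home 1 counts toward both gas and electric. Once each home has a single record, the two measures coincide. `penTable` additionally handles weights and a second grouping variable, which this sketch leaves out.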