penTable

penTable is a function for creating penetration tables, with or without weights. It takes up to 5 input values: a dataframe (or data.table), an index variable, two variables for grouping the dataset, and an optional column with weights.

Sample data

For testing, we use the mtcars dataset that comes stock with R. It contains data from the 1974 Motor Trend US magazine, qith fuel economy and other statistics for a set of 32 cars. It looks something like this:

glimpse(mtcars)
#> Rows: 32
#> Columns: 11
#> $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
#> $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
#> $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
#> $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
#> $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

We add random weights to the data for testing below.

set.seed(1234)

rand_wt <- data.frame(rand_bin = 1:3,
                      rand_wt = rnorm(3, mean = 1, sd = 0.333))

test_data <- mtcars %>% 
  mutate(index = row_number(),
         rand_bin = sample(1:3, nrow(mtcars), replace = TRUE)) %>% 
  merge(rand_wt,
        by = "rand_bin")

test_dt <- data.table(test_data)

Changing weights

The code chunk below applies penTable to the test data.

penTable(test_data, index = "index", x = "cyl", y = "gear",
         totWeightVar = "rand_wt") %>%
    knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear 4 6 8 Total
n 11 7 14 32
3 9% 29% 86% 48%
4 73% 57% NA 37%
5 18% 14% 14% 16%

penTable(test_dt, 
         index = "index",
         x = "cyl",
         y = "gear", 
         totWeightVar = "rand_wt", 
         inGroupWeightVar = "rand_wt") %>%
    knitr::kable()
gear 4 6 8 Total
n 11 7 14 32
3 5% 31% 85% 48%
4 77% 55% NA 37%
5 18% 14% 15% 16%

penTable(test_data, 
         index = "index",
         x = "cyl",
         y = "gear",
         totWeightVar = "mpg") %>%
    knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear 4 6 8 Total
n 11 7 14 32
3 9% 29% 86% 38%
4 73% 57% NA 46%
5 18% 14% 14% 17%

penTable(test_data,
         index = "index",
         x = "cyl",
         y = "gear", 
         totWeightVar = "mpg",
         inGroupWeightVar = "rand_wt", 
         accuracy = 0.1) %>%
    knitr::kable()
gear 4 6 8 Total
n 11 7 14 32
3 5.4% 30.8% 84.8% 37.6%
4 77.0% 55.4% NA 45.8%
5 17.6% 13.7% 15.2% 16.6%

Overlap with wtdPropTable

There are instances where penetration and saturation are the same, i.e. when there is only one record associated with each unique ID. For example, a dataset of heating equipment collected from homes where each home only had one piece of heating equipment.

n_distinct(test_data$index) == nrow(test_data)
#> [1] TRUE

wtdPropTable(
    test_data,
    x = "cyl",
    y = "gear",
    totWeightVar = "rand_wt",
    accuracy = 1
) %>%
    knitr::kable()
#> Warning in wtdPropTable.data.frame(test_data, x = "cyl", y = "gear", totWeightVar = "rand_wt", : Using placeholder in-group weights of 1
gear 4 6 8 Total
n 11 7 14 32
3 9% 29% 86% 48%
4 73% 57% NA 37%
5 18% 14% 14% 16%

penTable(
    test_dt,
    index = "index",
    x = "cyl",
    y = "gear",
    totWeightVar = "rand_wt",
    accuracy = 1
) %>%
    knitr::kable()
#> Warning in penTable.data.frame(test_dt, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear 4 6 8 Total
n 11 7 14 32
3 9% 29% 86% 48%
4 73% 57% NA 37%
5 18% 14% 14% 16%

wtdPropTable(
    test_data,
    x = "cyl",
    y = "gear",
    totWeightVar = "rand_wt",
    inGroupWeightVar = "rand_wt",
    accuracy = 1
) %>%
    knitr::kable()
gear 4 6 8 Total
n 11 7 14 32
3 5% 31% 85% 48%
4 77% 55% NA 37%
5 18% 14% 15% 16%

penTable(
    test_dt,
    index = "index",
    x = "cyl",
    y = "gear",
    totWeightVar = "rand_wt",
    inGroupWeightVar = "rand_wt",
    accuracy = 1
) %>%
    knitr::kable()
gear 4 6 8 Total
n 11 7 14 32
3 5% 31% 85% 48%
4 77% 55% NA 37%
5 18% 14% 15% 16%

Retrieve only counts used in penTable

Helpful for QC checks and running significance testing on penetration table results.

penTable(test_data,
         index = "index",
         x = "cyl",
         y = "gear", 
         totWeightVar = "mpg",
         only_ns = TRUE) %>% 
  knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
cyl gear index_xy index_x penetration
4 4 8 11 0.7272727
4 5 2 11 0.1818182
4 3 1 11 0.0909091
6 4 4 7 0.5714286
6 3 2 7 0.2857143
6 5 1 7 0.1428571
8 3 12 14 0.8571429
8 5 2 14 0.1428571