penTable
is a function for creating penetration
tables, with or without weights. It takes up to 5 input values: a
dataframe (or data.table), an index variable, two variables for grouping
the dataset, and an optional column with weights.
For testing, we use the mtcars
dataset that comes stock
with R. It contains data from the 1974 Motor Trend US magazine, qith
fuel economy and other statistics for a set of 32 cars. It looks
something like this:
glimpse(mtcars)
#> Rows: 32
#> Columns: 11
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
#> $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
#> $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
#> $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
We add random weights to the data for testing below.
The code chunk below applies penTable
to the test
data.
penTable(test_data, index = "index", x = "cyl", y = "gear",
totWeightVar = "rand_wt") %>%
knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 9% | 29% | 86% | 48% |
4 | 73% | 57% | NA | 37% |
5 | 18% | 14% | 14% | 16% |
penTable(test_dt,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "rand_wt",
inGroupWeightVar = "rand_wt") %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 5% | 31% | 85% | 48% |
4 | 77% | 55% | NA | 37% |
5 | 18% | 14% | 15% | 16% |
penTable(test_data,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "mpg") %>%
knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 9% | 29% | 86% | 38% |
4 | 73% | 57% | NA | 46% |
5 | 18% | 14% | 14% | 17% |
penTable(test_data,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "mpg",
inGroupWeightVar = "rand_wt",
accuracy = 0.1) %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 5.4% | 30.8% | 84.8% | 37.6% |
4 | 77.0% | 55.4% | NA | 45.8% |
5 | 17.6% | 13.7% | 15.2% | 16.6% |
wtdPropTable
There are instances where penetration and saturation are the same, i.e. when there is only one record associated with each unique ID. For example, a dataset of heating equipment collected from homes where each home only had one piece of heating equipment.
n_distinct(test_data$index) == nrow(test_data)
#> [1] TRUE
wtdPropTable(
test_data,
x = "cyl",
y = "gear",
totWeightVar = "rand_wt",
accuracy = 1
) %>%
knitr::kable()
#> Warning in wtdPropTable.data.frame(test_data, x = "cyl", y = "gear", totWeightVar = "rand_wt", : Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 9% | 29% | 86% | 48% |
4 | 73% | 57% | NA | 37% |
5 | 18% | 14% | 14% | 16% |
penTable(
test_dt,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "rand_wt",
accuracy = 1
) %>%
knitr::kable()
#> Warning in penTable.data.frame(test_dt, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 9% | 29% | 86% | 48% |
4 | 73% | 57% | NA | 37% |
5 | 18% | 14% | 14% | 16% |
wtdPropTable(
test_data,
x = "cyl",
y = "gear",
totWeightVar = "rand_wt",
inGroupWeightVar = "rand_wt",
accuracy = 1
) %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 5% | 31% | 85% | 48% |
4 | 77% | 55% | NA | 37% |
5 | 18% | 14% | 15% | 16% |
penTable(
test_dt,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "rand_wt",
inGroupWeightVar = "rand_wt",
accuracy = 1
) %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 5% | 31% | 85% | 48% |
4 | 77% | 55% | NA | 37% |
5 | 18% | 14% | 15% | 16% |
penTable
Helpful for QC checks and running significance testing on penetration table results.
penTable(test_data,
index = "index",
x = "cyl",
y = "gear",
totWeightVar = "mpg",
only_ns = TRUE) %>%
knitr::kable()
#> Warning in penTable.data.frame(test_data, index = "index", x = "cyl", y = "gear", : Using placeholder in-group weights of 1
cyl | gear | index_xy | index_x | penetration |
---|---|---|---|---|
4 | 4 | 8 | 11 | 0.7272727 |
4 | 5 | 2 | 11 | 0.1818182 |
4 | 3 | 1 | 11 | 0.0909091 |
6 | 4 | 4 | 7 | 0.5714286 |
6 | 3 | 2 | 7 | 0.2857143 |
6 | 5 | 1 | 7 | 0.1428571 |
8 | 3 | 12 | 14 | 0.8571429 |
8 | 5 | 2 | 14 | 0.1428571 |