`wtdFreqTable` is a function for calculating the frequency of any number of
groups in a dataset, with or without weights. It takes up to 4 input values:
For testing, we use the `mtcars` dataset that comes stock with R. It
contains data from the 1974 Motor Trend US magazine, with fuel economy and
other statistics for a set of 32 cars. It looks something like this:
library(dplyr) # provides glimpse() and the %>% pipe used below
glimpse(mtcars)
#> Rows: 32
#> Columns: 11
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
#> $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
#> $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
#> $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
We add random weights to the data for testing below.
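
The chunk that builds the weights is not shown here. A minimal sketch, assuming uniform random weights and an arbitrary seed (both are assumptions, so the values will not match the tables below), might look like this:

```r
# Hypothetical reconstruction: the seed and the weight distribution are
# assumptions, not taken from the original chunk.
set.seed(42)

# Copy mtcars so the built-in dataset stays untouched.
test_data <- mtcars

# One strictly positive random weight per car.
test_data$rand_wt <- runif(nrow(test_data), min = 0.5, max = 1.5)

summary(test_data$rand_wt)
```

Only the names `test_data` and `rand_wt` come from the calls further down; everything else is illustrative.
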
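The code chunk below applies `wtdFreqTable` to the test data, comparing
frequencies when weighting either by miles per gallon (`mpg`) or by the
random weight variable we created (`rand_wt`). In each table the columns are
`cyl` values, the rows are `gear` values, and the `n` row holds the
unweighted count for each `cyl` group. Switching `totWeightVar` only changes
the weighted overall frequencies (in the right column); the per-group
columns follow `inGroupWeightVar`, and fall back to placeholder weights of 1,
with a warning, when it is omitted.
`wtdFreqTable` does not normalize the weights to make the total weighted
count the same as the unweighted one, hence the wonky results when weighting
by `mpg` and `hp`.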
wtdFreqTable(test_data, x = "cyl", y = "gear", totWeightVar = "mpg") %>%
knitr::kable()
#> Warning in wtdFreqTable.data.frame(test_data, x = "cyl", y = "gear", totWeightVar = "mpg"): Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32.00000 |
3 | 1 | 2 | 12 | 12.02551 |
4 | 8 | 4 | NA | 14.65360 |
5 | 2 | 1 | 2 | 5.32089 |
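
The numbers in this first table can be cross-checked in base R. The sketch below assumes, judging only from the printed output and not from the `wtdFreqTable` source, that each `Total` entry is the gear group's share of the summed `mpg` weights scaled to the 32 rows, and that the per-group cells are plain counts when the placeholder weights of 1 are used.

```r
# Per-group cells with placeholder weights of 1: a plain cross-tabulation.
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)
counts

# Assumption: Total = n * (sum of mpg within the gear group) / (sum of all mpg).
w_by_gear <- tapply(mtcars$mpg, mtcars$gear, sum)
total_col <- nrow(mtcars) * w_by_gear / sum(mtcars$mpg)

round(total_col, 5)  # 12.02551, 14.65360, 5.32089 for gear 3, 4, 5
sum(total_col)       # 32
```

Under that assumption the `Total` column sums to 32 here, so the caveat about unnormalized weights seems to bite in the per-group columns rather than in this one. That reading is an inference from the output, not a statement about the function's internals.
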
wtdFreqTable(test_data, x = "cyl", y = "gear", totWeightVar = "mpg", inGroupWeightVar = "rand_wt", accuracy = 0.01) %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 0.54 | 2.23 | 12.44 | 12.03 |
4 | 7.78 | 4.01 | NA | 14.65 |
5 | 1.78 | 0.99 | 2.23 | 5.32 |
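
The per-group columns in this table look like sums of `rand_wt` within each `gear` by `cyl` cell. A rough base R sketch of that idea follows; the seed and weight distribution are assumptions, so the values will differ from the table, and the cell-sum reading itself is a guess from the output.

```r
# Assumed weights (hypothetical seed and distribution).
set.seed(42)
test_data <- mtcars
test_data$rand_wt <- runif(nrow(test_data), min = 0.5, max = 1.5)

# Weighted cross-tabulation: sum of rand_wt in each gear x cyl cell.
cell_sums <- xtabs(rand_wt ~ gear + cyl, data = test_data)
round(cell_sums, 2)

# Column sums are weighted group sizes; they need not equal the raw n row.
colSums(cell_sums)
table(test_data$cyl)
```

Note that `xtabs` reports 0 for the empty `gear = 4`, `cyl = 8` cell, whereas the table above shows `NA` there.
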
wtdFreqTable(test_data, x = "cyl", y = "gear", totWeightVar = "rand_wt") %>%
knitr::kable()
#> Warning in wtdFreqTable.data.frame(test_data, x = "cyl", y = "gear", totWeightVar = "rand_wt"): Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32.00000 |
3 | 1 | 2 | 12 | 15.21235 |
4 | 8 | 4 | NA | 11.78520 |
5 | 2 | 1 | 2 | 5.00245 |
wtdFreqTable(test_data, x = "cyl", y = "gear", totWeightVar = "rand_wt", inGroupWeightVar = "rand_wt", accuracy = 0.01) %>%
knitr::kable()
gear | 4 | 6 | 8 | Total |
---|---|---|---|---|
n | 11 | 7 | 14 | 32 |
3 | 0.54 | 2.23 | 12.44 | 15.21 |
4 | 7.78 | 4.01 | NA | 11.79 |
5 | 1.78 | 0.99 | 2.23 | 5.00 |
wtdFreqTable(mtcars, x = "cyl", y = "gear", totWeightVar = "hp", tot.label = "Statewide") %>%
knitr::kable()
#> Warning in wtdFreqTable.data.frame(mtcars, x = "cyl", y = "gear", totWeightVar = "hp", : Using placeholder in-group weights of 1
gear | 4 | 6 | 8 | Statewide |
---|---|---|---|---|
n | 11 | 7 | 14 | 32.000000 |
3 | 1 | 2 | 12 | 18.011078 |
4 | 8 | 4 | NA | 7.321687 |
5 | 2 | 1 | 2 | 6.667235 |