makeWeights is a function for creating tables of weighting values. It
takes three inputs: a data frame (or data.table), the name of a column
with sample counts, and the name of a column with population counts.
The dataset should contain one row for each unique group requiring a
weight.
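If the raw data has one row per respondent rather than one row per group, it first has to be collapsed to group counts. The following is a minimal sketch in base R, assuming a made-up respondents data frame with hypothetical Region and Sex columns; the population counts would still need to be joined on from an external source.

```r
# Hypothetical record-level data: one row per respondent (names are made up)
respondents <- data.frame(
  Region = c("North", "North", "South", "South", "South"),
  Sex    = c("F", "M", "F", "F", "M")
)

# Cross-tabulate, then flatten to one row per Region x Sex group.
# as.data.frame() on a table stores the counts in a column named Freq.
group_counts <- as.data.frame(table(respondents), stringsAsFactors = FALSE)
```

This is the same conversion that turns the HairEyeColor table into a data frame with a Freq column. Note that combinations with no respondents would appear with Freq equal to 0.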
For testing, we use the HairEyeColor dataset that ships with R. It’s a
table containing counts (Freq, once converted to a data frame) for each
combination of three categorical variables: Hair, Eye, and Sex. It looks
like this:
HairEyeColor
#> , , Sex = Male
#>
#>        Eye
#> Hair    Brown Blue Hazel Green
#>   Black    32   11    10     3
#>   Brown    53   50    25    15
#>   Red      10   10     7     7
#>   Blond     3   30     5     8
#>
#> , , Sex = Female
#>
#>        Eye
#> Hair    Brown Blue Hazel Green
#>   Black    36    9     5     2
#>   Brown    66   34    29    14
#>   Red      16    7     7     7
#>   Blond     4   64     5     8
We first convert it to a data frame and add a randomly generated
Population column. Next, we’ll apply the makeWeights function to see
which columns it adds to our data frame.
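library(dplyr)
library(data.table)

# Convert the table to a data frame and add a random population count per row.
# The draws are unseeded, so the Population values differ between runs.
test_data <- HairEyeColor %>%
  data.frame() %>%
  mutate(Population = round(runif(n(), 10000, 20000)))
test_dt <- data.table(test_data)
test_data %>% glimpse()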
#> Rows: 32
#> Columns: 5
#> $ Hair <fct> Black, Brown, Red, Blond, Black, Brown, Red, Blond, Black, …
#> $ Eye <fct> Brown, Brown, Brown, Brown, Blue, Blue, Blue, Blue, Hazel, …
#> $ Sex <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, Male,…
#> $ Freq <dbl> 32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8, 3…
#> $ Population <dbl> 11137, 16223, 16093, 16234, 18609, 16403, 10095, 12326, 166…
The code chunk below applies makeWeights to the test data, computing the
proportion and population weights for each row, along with some optional
intermediate values for checking the calculations. Let’s look at the
first few rows:
test_weights <- makeWeights(data = test_data,
                            sampleVal = "Freq",
                            populationVal = "Population",
                            checkCols = TRUE)
knitr::kable(head(test_weights), digits = 2)
Hair | Eye | Sex | Freq | Population | sampleValWtd | populationWt | propWt | propCheck | populationCheck |
---|---|---|---|---|---|---|---|---|---|
Black | Brown | Male | 32 | 11137 | 14.23 | 348.03 | 0.44 | 14.23 | 11137 |
Brown | Brown | Male | 53 | 16223 | 20.73 | 306.09 | 0.39 | 20.73 | 16223 |
Red | Brown | Male | 10 | 16093 | 20.57 | 1609.30 | 2.06 | 20.57 | 16093 |
Blond | Brown | Male | 3 | 16234 | 20.75 | 5411.33 | 6.92 | 20.75 | 16234 |
Black | Blue | Male | 11 | 18609 | 23.78 | 1691.73 | 2.16 | 23.78 | 18609 |
Brown | Blue | Male | 50 | 16403 | 20.96 | 328.06 | 0.42 | 20.96 | 16403 |
For each row, the function calculates a proportion weight (propWt) that
scales the observed sample count (Freq) to a value (sampleValWtd)
proportional to that row’s share of the overall population. It also
produces a population weight (populationWt) that scales the observed
sample count up to that row’s population count (Population). The check
columns (propCheck and populationCheck) are intermediate values for
verifying the results; in the rows shown above they reproduce
sampleValWtd and Population, respectively.
Eventually, these checks will happen behind the scenes in a more automated fashion and will only surface if the user requests them.
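The arithmetic behind these columns can be reconstructed from the table above. The following is a minimal sketch in base R, not the package's implementation: sketch_weights is a hypothetical helper, and its formulas are inferred from the displayed values (for example, in the first row 11137 / 32 is about 348.03, matching populationWt, and 14.23 / 32 is about 0.44, matching propWt).

```r
# Hypothetical helper: the formulas are inferred from the example output,
# not taken from the makeWeights source.
sketch_weights <- function(data, sampleVal, populationVal) {
  s <- data[[sampleVal]]       # observed sample count per group
  p <- data[[populationVal]]   # population count per group

  # Total sample redistributed according to each group's population share
  data$sampleValWtd <- sum(s) * p / sum(p)
  # Multiplier taking the sample count to the group's population count
  data$populationWt <- p / s
  # Multiplier taking the sample count to its population-proportional value
  data$propWt <- data$sampleValWtd / s

  # Check columns: re-apply each weight to the sample counts
  data$propCheck <- s * data$propWt
  data$populationCheck <- s * data$populationWt
  data
}

# Toy input (made up): two groups with equal populations, unequal samples
toy <- data.frame(Group = c("A", "B"),
                  Freq = c(10, 30),
                  Population = c(500, 500))
toy_weights <- sketch_weights(toy, "Freq", "Population")
```

In this toy case both groups have the same population, so each is assigned half of the 40 sampled units (sampleValWtd of 20). Group A is under-sampled and gets a propWt of 2, while group B is over-sampled and gets a propWt of about 0.67.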