makeWeights

makeWeights is a function for creating tables of weighting values. It takes three main arguments: a data frame (or data.table), the name of the column holding sample counts, and the name of the column holding population counts. An optional fourth argument, checkCols, adds intermediate columns for checking the calculations. The data should contain one row for each group that needs a weight.
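Survey data often arrive as one row per respondent rather than one row per group. The sketch below shows one way to collapse such data into the one-row-per-group shape described above, with the sample count and population count in separate columns. The data frames, the column names `region`, `sampleN` and `populationN`, and all numbers are hypothetical and only illustrate the expected input.

```r
# Hypothetical respondent-level data: one row per survey respondent
respondents <- data.frame(
  region = c("North", "North", "South", "East", "East", "East")
)

# Collapse to one row per group; the sample count goes in its own column
counts <- as.data.frame(table(region = respondents$region),
                        responseName = "sampleN", stringsAsFactors = FALSE)

# Hypothetical population counts for the same groups
pop <- data.frame(
  region      = c("East", "North", "South"),
  populationN = c(20000, 50000, 30000)
)

# Join the two: each group now appears exactly once, with both counts
input <- merge(counts, pop, by = "region")
stopifnot(anyDuplicated(input$region) == 0)
input

# The count columns are then passed by name, as strings:
# makeWeights(data = input, sampleVal = "sampleN", populationVal = "populationN")
```

The duplicate check matters because a group listed twice would receive two separate weights.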

Sample data

For testing, we use the HairEyeColor dataset that ships with base R. It is a three-way table of counts for every combination of three categorical variables: Hair, Eye, and Sex. Converting it to a data frame puts those counts in a column called Freq. Printed, it looks like this:

HairEyeColor
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel Green
#>   Black    32   11    10     3
#>   Brown    53   50    25    15
#>   Red      10   10     7     7
#>   Blond     3   30     5     8
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel Green
#>   Black    36    9     5     2
#>   Brown    66   34    29    14
#>   Red      16    7     7     7
#>   Blond     4   64     5     8
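Because the data are stored as a table rather than a data frame, a quick look at its shape shows where the one-row-per-group structure comes from. This sketch uses only base R; `hec_long` is an illustrative name.

```r
# HairEyeColor ships with base R in the datasets package
dim(HairEyeColor)     # 4 4 2: hair colours x eye colours x sexes
length(HairEyeColor)  # 32 cells, one per Hair/Eye/Sex combination
sum(HairEyeColor)     # 592 people in total

# Converting the table gives one row per cell, with the count in Freq
hec_long <- as.data.frame(HairEyeColor)
nrow(hec_long)        # 32
names(hec_long)       # "Hair" "Eye" "Sex" "Freq"
```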

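We convert it to a data frame and add a random Population column with one value per row. Because the population values are random, the numbers below will change from run to run. We then apply makeWeights to see which columns it adds to the data frame.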

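library(dplyr)       # %>%, mutate(), n(), glimpse()
library(data.table)  # data.table()

# Flatten the table to one row per Hair/Eye/Sex cell, then add a random
# population count between 10,000 and 20,000 (values change on every run)
test_data <- HairEyeColor %>%
  data.frame() %>%
  mutate(Population = round(runif(n(), 10000, 20000)))

test_dt <- data.table(test_data)  # data.table copy; makeWeights accepts either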

test_data %>% glimpse()
#> Rows: 32
#> Columns: 5
#> $ Hair       <fct> Black, Brown, Red, Blond, Black, Brown, Red, Blond, Black, …
#> $ Eye        <fct> Brown, Brown, Brown, Brown, Blue, Blue, Blue, Blue, Hazel, …
#> $ Sex        <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, Male,…
#> $ Freq       <dbl> 32, 53, 10, 3, 11, 50, 10, 30, 10, 25, 7, 5, 3, 15, 7, 8, 3…
#> $ Population <dbl> 11137, 16223, 16093, 16234, 18609, 16403, 10095, 12326, 166…

Sample table summary

The code chunk below applies makeWeights to the test data. It returns the proportion and population weights for each row and, because checkCols = TRUE, some intermediate values for checking the calculations. Here are the first few rows:

test_weights <- makeWeights(data = test_data, 
                            sampleVal = "Freq",
                            populationVal = "Population",
                            checkCols = TRUE)

knitr::kable(head(test_weights), digits = 2)
Hair   Eye    Sex   Freq  Population  sampleValWtd  populationWt  propWt  propCheck  populationCheck
Black  Brown  Male    32       11137         14.23        348.03    0.44      14.23            11137
Brown  Brown  Male    53       16223         20.73        306.09    0.39      20.73            16223
Red    Brown  Male    10       16093         20.57       1609.30    2.06      20.57            16093
Blond  Brown  Male     3       16234         20.75       5411.33    6.92      20.75            16234
Black  Blue   Male    11       18609         23.78       1691.73    2.16      23.78            18609
Brown  Blue   Male    50       16403         20.96        328.06    0.42      20.96            16403

The function calculates a proportion weight (propWt) that transforms the observed sample count (Freq) in a row into a weighted value (sampleValWtd) proportional to that row's share of the total population. It also calculates a population weight (populationWt) that transforms the observed sample count into that row's population count (Population); for example, 32 × 348.03 ≈ 11137 in the first row. The check columns, propCheck and populationCheck, match sampleValWtd and Population in every row shown, which is what multiplying Freq by propWt and populationWt should give.
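The table values are consistent with sampleValWtd being each row's share of total population multiplied by the total sample size, with populationWt = Population / Freq and propWt = sampleValWtd / Freq. The sketch below implements those inferred formulas. It is an assumption reconstructed from the output, not makeWeights' own code, and `weight_sketch` is a hypothetical name.

```r
# Sketch of the weight columns, with formulas inferred from the output table
# (an assumption; this is not makeWeights' source code)
weight_sketch <- function(data, sampleVal, populationVal) {
  s <- data[[sampleVal]]
  p <- data[[populationVal]]
  # Row's share of total population, rescaled to the total sample size
  data$sampleValWtd <- p / sum(p) * sum(s)
  # Multiplier taking the sample count to the row's population count
  data$populationWt <- p / s
  # Multiplier taking the sample count to sampleValWtd
  data$propWt <- data$sampleValWtd / s
  data
}

# Two hypothetical groups: 100 sampled people, 1,000 in the population
toy <- data.frame(group = c("A", "B"), n = c(30, 70), pop = c(600, 400))
w <- weight_sketch(toy, sampleVal = "n", populationVal = "pop")
w
# Group A holds 60% of the population, so its weighted sample value is 60:
# propWt = 60 / 30 = 2 and populationWt = 600 / 30 = 20
```

If the inferred formulas are right, `weight_sketch(test_data, "Freq", "Population")` should reproduce the sampleValWtd, populationWt and propWt columns above, up to rounding.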

Eventually, these checks will run automatically behind the scenes and will only be shown if the user requests them.
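The vignette does not say how the automated checks will work. A minimal sketch, assuming they simply multiply each sample count by its weights and compare the results with the targets: `check_weights` and its `verbose` argument are hypothetical, not part of makeWeights. It stays silent when everything matches, prints the check columns on request, and warns when a weight fails to reproduce its target.

```r
# Hypothetical helper (not part of makeWeights): recompute the check
# columns and only show them when asked or when something is wrong
check_weights <- function(data, sampleVal, populationVal, verbose = FALSE) {
  s <- data[[sampleVal]]
  # Multiplying the sample count by each weight should recover its target
  propCheck       <- s * data$propWt
  populationCheck <- s * data$populationWt
  ok <- isTRUE(all.equal(propCheck, data$sampleValWtd)) &&
        isTRUE(all.equal(populationCheck, data[[populationVal]]))
  if (verbose || !ok) {
    print(data.frame(propCheck, populationCheck))
  }
  if (!ok) warning("weights do not reproduce their target values")
  invisible(ok)
}

# Hypothetical weighted data with correct weights
weighted <- data.frame(
  Freq = c(30, 70), Population = c(600, 400),
  sampleValWtd = c(60, 40), populationWt = c(20, 400 / 70), propWt = c(2, 40 / 70)
)
check_weights(weighted, "Freq", "Population")                  # silent, returns TRUE invisibly
check_weights(weighted, "Freq", "Population", verbose = TRUE)  # prints the check columns
```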