pairTtest

pairTtest is a function that tests every pair of groups for statistically significant differences in their means (pairwise t-tests). It requires 3 input values: a dataframe, the name of a column whose values are compared, and the name of a column that defines the groups.
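To show the underlying idea outside the package, base R's pairwise.t.test() runs the same kind of comparison. This is a sketch of comparable behaviour, not pairTtest's implementation. It assumes Welch tests (pool.sd = FALSE) with Holm adjustment, uses the built-in airquality data introduced below, and reshapes the result into one row per pair using column names borrowed from pairTtest's output.

```r
# Sketch only: Welch t-tests for every pair of months, Holm-adjusted.
# Assumption: this mirrors the idea behind pairTtest, not its exact internals.
res <- pairwise.t.test(airquality$Ozone, airquality$Month,
                       pool.sd = FALSE,           # separate variances (Welch) per pair
                       p.adjust.method = "holm")  # base R's default adjustment

# res$p.value is lower-triangular; keep the non-missing cells, one row per pair
m   <- res$p.value
idx <- which(!is.na(m), arr.ind = TRUE)
p_long <- data.frame(grp1_lbl = rownames(m)[idx[, "row"]],
                     grp2_lbl = colnames(m)[idx[, "col"]],
                     p.value  = m[idx])
p_long$sig <- p_long$p.value < 0.05   # flag pairs below alpha = 0.05
p_long
```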

Sample data

For testing, we use the airquality dataset that ships with R (in the datasets package). It contains daily air quality measurements, including ozone levels, taken in New York from May to September 1973.

library(dplyr)  # glimpse(), mutate(), filter(), select() and %>%
glimpse(airquality)
#> Rows: 153
#> Columns: 6
#> $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
#> $ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
#> $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
#> $ Temp    <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
#> $ Month   <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
#> $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
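The group sizes reported later in the results (grp1_n, grp2_n) line up with the number of non-missing Ozone values in each month, which can be counted directly in base R:

```r
# Count rows with a recorded (non-NA) ozone value in each month
n_by_month <- tapply(!is.na(airquality$Ozone), airquality$Month, sum)
n_by_month  # months 5 to 9: 26, 9, 26, 26, 29
```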

We add a random bin (rand_bin) as a placeholder categorical variable standing in for different data collection methods, along with a random weight column (weight) that the weighted tests use later. We then compare mean ozone levels across two groupings:

  • Month
  • rand_bin

This way we can see how mean ozone levels vary from month to month, and check whether they vary across the random bins (“data collection methods”):

# rand_bin and weight are random; no seed is set, so results vary between runs
test_data <- airquality %>% 
  mutate(rand_bin = as.character(sample(1:3, n(), replace = TRUE)),  # stand-in grouping
         weight = sample(c(0.5, 1, 2), n(), replace = TRUE))          # case weights

pairT <- pairTtest(test_data,
                   "Ozone", "Month",
                   alpha = 0.05, n.min = 4)

We can inspect the pairwise t-test results in table form.

pairT %>% 
  head(8) %>% 
  knitr::kable()
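| grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
|---|---|---|---|---|---|
| 6 | 5 | 9 | 26 | 1.0000000 | FALSE |
| 7 | 5 | 26 | 26 | 0.0002638 | TRUE |
| 7 | 6 | 26 | 9 | 0.0511274 | FALSE |
| 8 | 5 | 26 | 26 | 0.0001949 | TRUE |
| 8 | 6 | 26 | 9 | 0.0498733 | TRUE |
| 8 | 7 | 26 | 26 | 1.0000000 | FALSE |
| 9 | 5 | 29 | 26 | 1.0000000 | FALSE |
| 9 | 6 | 29 | 9 | 1.0000000 | FALSE |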

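A quick glance at the table above suggests that most of the statistically significant differences in ozone levels come from the middle months (July and August) differing from the months bookending the dataset (May and September). Filtering to the significant pairs makes this clearer: four of the five compare July or August with May or September, and the fifth (August vs June) only just clears alpha = 0.05 (p = 0.0499).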

filter(pairT, sig) %>%
    head(8) %>% 
    knitr::kable()
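| grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
|---|---|---|---|---|---|
| 7 | 5 | 26 | 26 | 0.0002638 | TRUE |
| 8 | 5 | 26 | 26 | 0.0001949 | TRUE |
| 8 | 6 | 26 | 9 | 0.0498733 | TRUE |
| 9 | 7 | 29 | 26 | 0.0048788 | TRUE |
| 9 | 8 | 29 | 26 | 0.0038781 | TRUE |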
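Each row of these tables summarizes one two-sample comparison. As a cross-check outside the package, base R's t.test() runs a Welch test for a single pair, here July vs May. It reports an unadjusted p-value, so it is not expected to equal the pairTtest value if pairTtest adjusts for multiple comparisons (the p-values capped at 1.0000000 suggest it does, but that is an assumption).

```r
may  <- airquality$Ozone[airquality$Month == 5]
july <- airquality$Ozone[airquality$Month == 7]

tt <- t.test(july, may)  # Welch two-sample test by default; NA values are dropped
tt$estimate              # mean ozone in July and in May
tt$p.value               # unadjusted p-value for this single pair
```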

Repeating the test with rand_bin in place of Month finds, perhaps unsurprisingly, no significant differences in mean ozone across these randomly assigned groups.

pairTtest(test_data, "Ozone", "rand_bin") %>%
  head(8) %>% 
  knitr::kable()
| grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
|---|---|---|---|---|---|
| 2 | 1 | 39 | 39 | 0.9661033 | FALSE |
| 3 | 1 | 38 | 39 | 0.9878598 | FALSE |
| 3 | 2 | 38 | 39 | 0.9878598 | FALSE |
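The rand_bin result is what we expect when group labels carry no information. A small simulation sketch makes this concrete: it repeatedly assigns random bins and records how often at least one pair is flagged at alpha = 0.05. It uses base R's pairwise.t.test() with Holm adjustment as a stand-in for pairTtest; that substitution is an assumption, since the package's adjustment method is not stated here. With an adjustment that controls the family-wise error rate, the share of flagged splits should land near or below 0.05.

```r
# Assumption: pairwise.t.test() (Welch, Holm-adjusted) stands in for pairTtest
set.seed(1)  # arbitrary seed so the sketch is reproducible
oz <- airquality$Ozone[!is.na(airquality$Ozone)]

any_sig <- replicate(200, {
  bins <- sample(1:3, length(oz), replace = TRUE)          # random labels, no real effect
  p <- pairwise.t.test(oz, bins, pool.sd = FALSE)$p.value  # Holm adjustment by default
  any(p < 0.05, na.rm = TRUE)                              # TRUE if any pair is flagged
})

mean(any_sig)  # share of random splits with at least one flagged pair
```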

Weighted tests

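The code chunk below applies wtdPairTtest, the weighted counterpart of pairTtest, to the test data, comparing mean ozone levels across months with each row weighted by the random weight column. The first call uses the function's default p-value adjustment; the second sets p.adjust.method = "none" to report unadjusted p-values.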

pairT_wtd <- wtdPairTtest(test_data,
                          "Ozone", "Month", "weight",
                          alpha = 0.05, n.min = 4)

pairT_wtd_noAdj <- wtdPairTtest(test_data,
                                "Ozone", "Month", "weight",
                                p.adjust.method = "none",
                                alpha = 0.05, n.min = 4)
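The method column in the output below reads "Two Sample Weighted T-Test (Welch)". One common way to weight a Welch test is to treat the weights as frequency weights, so each group's effective size is the sum of its weights. The sketch below implements that formulation; wtd_welch is a hypothetical helper, not code taken from wtdPairTtest, and the package may weight differently. With unit weights it reduces to base R's ordinary Welch t.test().

```r
# Hypothetical helper: weighted two-sample Welch t-test with frequency weights
# (effective n = sum of weights). A sketch, not wtdPairTtest's implementation.
wtd_welch <- function(x, y, wx, wy) {
  # drop missing values together with their weights
  okx <- !is.na(x)
  oky <- !is.na(y)
  x <- x[okx]; wx <- wx[okx]
  y <- y[oky]; wy <- wy[oky]

  # weighted size, mean and variance for one group
  summ <- function(v, w) {
    n <- sum(w)
    m <- sum(w * v) / n
    c(n = n, mean = m, var = sum(w * (v - m)^2) / (n - 1))
  }
  a <- summ(x, wx)
  b <- summ(y, wy)

  # Welch standard error and Welch-Satterthwaite degrees of freedom
  va <- a[["var"]] / a[["n"]]
  vb <- b[["var"]] / b[["n"]]
  se <- sqrt(va + vb)
  tval <- (a[["mean"]] - b[["mean"]]) / se
  df <- (va + vb)^2 / (va^2 / (a[["n"]] - 1) + vb^2 / (b[["n"]] - 1))

  list(estimate = a[["mean"]] - b[["mean"]], std.err = se,
       t.value = tval, df = df, p.value = 2 * pt(-abs(tval), df))
}

# With unit weights the result should match base R's Welch t.test()
may  <- subset(airquality, Month == 5)
july <- subset(airquality, Month == 7)
res  <- wtd_welch(may$Ozone, july$Ozone, rep(1, nrow(may)), rep(1, nrow(july)))
res$p.value
t.test(may$Ozone, july$Ozone)$p.value
```

Passing test_data$weight instead of unit weights gives a weighted comparison under the frequency-weight assumption.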

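We can inspect the weighted results in table form, first with the default adjustment and then without. Alongside p.value, the weighted output reports each group's weighted mean (estimate1, estimate2), their difference (estimate = estimate1 - estimate2), its standard error (std.err), the t statistic (t.value) and the Welch degrees of freedom (df). The group sizes grp1_n and grp2_n are not integers, which suggests they are sums of weights rather than row counts.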

pairT_wtd %>% 
  head(8) %>% 
  knitr::kable()
| grp1_lbl | grp2_lbl | grp1_n | grp2_n | estimate | estimate1 | estimate2 | std.err | t.value | df | p.value | method | sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 31.5 | 11.5 | -8.204279 | 22.31746 | 30.52174 | 7.309708 | -1.1223812 | 11.92080 | 0.8513955 | Two Sample Weighted T-Test (Welch) | FALSE |
| 5 | 7 | 31.5 | 28.0 | -40.539682 | 22.31746 | 62.85714 | 7.228363 | -5.6084179 | 36.69488 | 0.0000219 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 8 | 31.5 | 35.5 | -35.105075 | 22.31746 | 57.42254 | 7.791803 | -4.5053856 | 35.31121 | 0.0006270 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 9 | 31.5 | 36.5 | -10.545553 | 22.31746 | 32.86301 | 5.794123 | -1.8200432 | 48.99949 | 0.2994540 | Two Sample Weighted T-Test (Welch) | FALSE |
| 6 | 7 | 11.5 | 28.0 | -32.335404 | 30.52174 | 62.85714 | 9.251256 | -3.4952448 | 23.34725 | 0.0134361 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 8 | 11.5 | 35.5 | -26.900796 | 30.52174 | 57.42254 | 9.897719 | -2.7178784 | 25.09554 | 0.0586929 | Two Sample Weighted T-Test (Welch) | FALSE |
| 6 | 9 | 11.5 | 36.5 | -2.341275 | 30.52174 | 32.86301 | 8.289949 | -0.2824233 | 16.28191 | 1.0000000 | Two Sample Weighted T-Test (Welch) | FALSE |
| 7 | 8 | 28.0 | 35.5 | 5.434608 | 62.85714 | 57.42254 | 9.997649 | 0.5435885 | 49.75621 | 1.0000000 | Two Sample Weighted T-Test (Welch) | FALSE |

pairT_wtd_noAdj %>% 
  head(8) %>% 
  knitr::kable()
| grp1_lbl | grp2_lbl | grp1_n | grp2_n | estimate | estimate1 | estimate2 | std.err | t.value | df | p.value | method | sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 31.5 | 11.5 | -8.204279 | 22.31746 | 30.52174 | 7.384076 | -1.1110773 | 11.92080 | 0.2884428 | Two Sample Weighted T-Test (Welch) | FALSE |
| 5 | 7 | 31.5 | 28.0 | -40.539682 | 22.31746 | 62.85714 | 7.770947 | -5.2168264 | 36.69488 | 0.0000074 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 8 | 31.5 | 35.5 | -35.105075 | 22.31746 | 57.42254 | 7.964060 | -4.4079371 | 35.31121 | 0.0000931 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 9 | 31.5 | 36.5 | -10.545553 | 22.31746 | 32.86301 | 5.864994 | -1.7980501 | 48.99949 | 0.0783314 | Two Sample Weighted T-Test (Welch) | FALSE |
| 6 | 7 | 11.5 | 28.0 | -32.335404 | 30.52174 | 62.85714 | 9.397350 | -3.4409065 | 23.34725 | 0.0021923 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 8 | 11.5 | 35.5 | -26.900796 | 30.52174 | 57.42254 | 9.455832 | -2.8448894 | 25.09554 | 0.0087177 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 9 | 11.5 | 36.5 | -2.341275 | 30.52174 | 32.86301 | 7.926770 | -0.2953630 | 16.28191 | 0.7714499 | Two Sample Weighted T-Test (Welch) | FALSE |
| 7 | 8 | 28.0 | 35.5 | 5.434608 | 62.85714 | 57.42254 | 9.861895 | 0.5510713 | 49.75621 | 0.5840510 | Two Sample Weighted T-Test (Welch) | FALSE |
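Assuming, as the half-integer values suggest, that grp1_n and grp2_n are sums of weights over rows with a recorded Ozone value, and that estimate1 and estimate2 are weighted means, both can be recomputed per month in base R. The seed below is arbitrary, so the numbers will not match the tables above, which came from an unseeded draw.

```r
# Sketch under the assumption: grp_n = sum of weights, estimate1/2 = weighted means
set.seed(42)  # arbitrary; does not reproduce the tables above
w <- sample(c(0.5, 1, 2), nrow(airquality), replace = TRUE)  # same scheme as `weight`

has_oz <- !is.na(airquality$Ozone)  # rows a test can actually use
d <- airquality[has_oz, ]
d$w <- w[has_oz]

w_n    <- tapply(d$w, d$Month, sum)  # sum of weights per month
w_mean <- sapply(split(d, d$Month),  # weighted mean ozone per month
                 function(g) weighted.mean(g$Ozone, g$w))
rbind(w_n, w_mean)
```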

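And just the significant results, first adjusted and then unadjusted. Skipping the adjustment flags two additional pairs (June vs August and August vs September).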

filter(pairT_wtd, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8) %>% 
    knitr::kable()
| grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | method | sig |
|---|---|---|---|---|---|---|
| 5 | 7 | 31.5 | 28.0 | 0.0000219 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 8 | 31.5 | 35.5 | 0.0006270 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 7 | 11.5 | 28.0 | 0.0134361 | Two Sample Weighted T-Test (Welch) | TRUE |
| 7 | 9 | 28.0 | 36.5 | 0.0038380 | Two Sample Weighted T-Test (Welch) | TRUE |

filter(pairT_wtd_noAdj, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8) %>% 
    knitr::kable()
| grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | method | sig |
|---|---|---|---|---|---|---|
| 5 | 7 | 31.5 | 28.0 | 0.0000074 | Two Sample Weighted T-Test (Welch) | TRUE |
| 5 | 8 | 31.5 | 35.5 | 0.0000931 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 7 | 11.5 | 28.0 | 0.0021923 | Two Sample Weighted T-Test (Welch) | TRUE |
| 6 | 8 | 11.5 | 35.5 | 0.0087177 | Two Sample Weighted T-Test (Welch) | TRUE |
| 7 | 9 | 28.0 | 36.5 | 0.0007283 | Two Sample Weighted T-Test (Welch) | TRUE |
| 8 | 9 | 35.5 | 36.5 | 0.0061723 | Two Sample Weighted T-Test (Welch) | TRUE |
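The argument name p.adjust.method matches the method argument of base R's p.adjust(), which suggests, but does not confirm, that wtdPairTtest passes it through. As a sketch of what the adjustment does, the unweighted month comparisons can be adjusted by hand. A Holm adjustment never lowers a p-value, which is why fewer pairs clear alpha = 0.05 when it is applied, while method = "none" returns the inputs unchanged.

```r
# Unadjusted Welch p-values for every pair of months (base R)
raw <- pairwise.t.test(airquality$Ozone, airquality$Month,
                       pool.sd = FALSE, p.adjust.method = "none")$p.value
p_raw  <- raw[!is.na(raw)]                  # the 10 pairwise p-values
p_holm <- p.adjust(p_raw, method = "holm")  # family-wise (Holm) adjustment
p_none <- p.adjust(p_raw, method = "none")  # leaves the p-values as they are

# number of pairs below alpha = 0.05 with and without adjustment
c(unadjusted = sum(p_raw < 0.05), holm = sum(p_holm < 0.05))
```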