
pairTtest is a function for comparing group means pairwise to find statistically significant differences. It requires 3 input values: a dataframe, the name of a column holding the values whose means are compared, and the name of a column holding the groups to compare. Each pair of groups is then compared with a t-test.

Sample data

For testing, we use the airquality dataset that ships with R. It contains daily ozone measurements (along with solar radiation, wind, and temperature) taken in New York from May to September 1973.

#> Rows: 153
#> Columns: 6
#> $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
#> $ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
#> $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
#> $ Temp    <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
#> $ Month   <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
#> $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…

We add a random bin (rand_bin) as a placeholder categorical variable standing in for different data collection periods, plus a random weight column that is used in the weighted tests further down. This gives us two levels of grouping:

  • Month
  • Month plus rand_bin

This way we can look at how mean ozone levels vary from month to month and across the random bins (“data collection periods”):

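library(dplyr)  # provides %>%, mutate(), and filter()

# Add a random 3-level bin and a random observation weight to each row
test_data <- airquality %>%
  mutate(rand_bin = as.character(sample(1:3, nrow(airquality), replace = TRUE)),
         weight = sample(c(0.5, 1, 2), nrow(airquality), replace = TRUE))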
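
Because rand_bin and weight are drawn with sample(), the tables in this document will differ from run to run. A minimal sketch of making the draw reproducible with base R's set.seed(); the seed value 42 is an arbitrary assumption and is not the seed used for the output shown here:

```r
# Illustrative only: 42 is an arbitrary seed, not the one behind the tables in this document
set.seed(42)
bins_a <- as.character(sample(1:3, nrow(airquality), replace = TRUE))

# Resetting the same seed reproduces exactly the same draw
set.seed(42)
bins_b <- as.character(sample(1:3, nrow(airquality), replace = TRUE))

identical(bins_a, bins_b)
```

Calling set.seed() once before building test_data would fix both rand_bin and weight for a given session.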

pairT <- pairTtest(test_data,
                   "Ozone", "Month",
                   alpha = 0.05, n.min = 4)

We can inspect the output of pairwise t-testing in table form. Each row is one pair of groups, and sig is TRUE when p.value falls below alpha (0.05 here).

pairT %>% 
  head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  p.value    sig
6         5         9       26      1.0000000  FALSE
7         5         26      26      0.0002638  TRUE
7         6         26      9       0.0511274  FALSE
8         5         26      26      0.0001949  TRUE
8         6         26      9       0.0498733  TRUE
8         7         26      26      1.0000000  FALSE
9         5         29      26      1.0000000  FALSE
9         6         29      9       1.0000000  FALSE

Filtering to the significant rows (5 in total, listed below) suggests that most of the statistically significant differences in ozone levels (4 of the 5) come from the middle months (July and August) differing from the months bookending the dataset (May and September).

filter(pairT, sig) %>%
    head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  p.value    sig
7         5         26      26      0.0002638  TRUE
8         5         26      26      0.0001949  TRUE
8         6         26      9       0.0498733  TRUE
9         7         29      26      0.0048788  TRUE
9         8         29      26      0.0038781  TRUE
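
As a rough cross-check, base R's pairwise.t.test() runs a comparable set of pairwise tests on the same columns. This is a sketch for comparison only and says nothing about how pairTtest is implemented internally; the Bonferroni adjustment and pool.sd = FALSE (a separate Welch test per pair) are our own illustrative choices:

```r
# Cross-check with base R (stats package); not pairTtest's implementation.
# pool.sd = FALSE runs a separate Welch t-test for each pair of months;
# the Bonferroni adjustment is an illustrative choice.
base_res <- pairwise.t.test(airquality$Ozone, airquality$Month,
                            p.adjust.method = "bonferroni",
                            pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values, one cell per month pair
round(base_res$p.value, 4)

# Flag pairs below alpha = 0.05, mirroring the sig column
base_res$p.value < 0.05
```

Missing Ozone values are dropped by the underlying t.test() calls, so no explicit filtering is needed here.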

We can do the same as above using our rand_bin variable instead of the months in the dataset. Perhaps unsurprisingly, there are no significant differences in mean ozone across these randomly assigned groups.

pairTtest(test_data, "Ozone", "rand_bin") %>%
  head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  p.value    sig
2         1         39      39      0.9661033  FALSE
3         1         38      39      0.9878598  FALSE
3         2         38      39      0.9878598  FALSE

Weighted tests

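The code chunk below applies wtdPairTtest to the test data, comparing weighted mean ozone levels across Months, with each observation weighted by our randomly assigned weight column. We run it twice: once with the default p-value adjustment and once with p.adjust.method = "none".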

pairT_wtd <- wtdPairTtest(test_data,
                          "Ozone", "Month", "weight",
                          alpha = 0.05, n.min = 4)

pairT_wtd_noAdj <- wtdPairTtest(test_data,
                                "Ozone", "Month", "weight",
                                p.adjust.method = "none",
                                alpha = 0.05, n.min = 4)
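
The p.adjust.method argument shares its name with base R's p.adjust(); whether wtdPairTtest calls p.adjust() internally is an assumption on our part. As a sketch of what a multiple-comparison adjustment does to a set of p-values, using made-up raw values and the Bonferroni method purely for illustration:

```r
# Hypothetical raw p-values for three pairwise comparisons (made up for illustration)
raw_p <- c(0.01, 0.04, 0.03)

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
adj_bonf <- p.adjust(raw_p, method = "bonferroni")
adj_bonf          # 0.03 0.12 0.09

# method = "none" returns the raw p-values unchanged
adj_none <- p.adjust(raw_p, method = "none")

# With alpha = 0.05, adjustment changes which comparisons are flagged
raw_p < 0.05      # TRUE TRUE TRUE
adj_bonf < 0.05   # TRUE FALSE FALSE
```

This is why the unadjusted run can flag more pairs as significant than the adjusted one.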

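We can inspect the output of the weighted pairwise t-tests in table form, first with the default p-value adjustment and then without adjustment. Here estimate is the difference between the two group estimates (estimate1 minus estimate2).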

pairT_wtd %>% 
  head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  estimate    estimate1  estimate2  std.err   t.value     df        p.value    method                              sig
5         6         31.5    11.5    -8.204279   22.31746   30.52174   7.309708  -1.1223812  11.92080  0.8513955  Two Sample Weighted T-Test (Welch)  FALSE
5         7         31.5    28.0    -40.539682  22.31746   62.85714   7.228363  -5.6084179  36.69488  0.0000219  Two Sample Weighted T-Test (Welch)  TRUE
5         8         31.5    35.5    -35.105075  22.31746   57.42254   7.791803  -4.5053856  35.31121  0.0006270  Two Sample Weighted T-Test (Welch)  TRUE
5         9         31.5    36.5    -10.545553  22.31746   32.86301   5.794123  -1.8200432  48.99949  0.2994540  Two Sample Weighted T-Test (Welch)  FALSE
6         7         11.5    28.0    -32.335404  30.52174   62.85714   9.251256  -3.4952448  23.34725  0.0134361  Two Sample Weighted T-Test (Welch)  TRUE
6         8         11.5    35.5    -26.900796  30.52174   57.42254   9.897719  -2.7178784  25.09554  0.0586929  Two Sample Weighted T-Test (Welch)  FALSE
6         9         11.5    36.5    -2.341275   30.52174   32.86301   8.289949  -0.2824233  16.28191  1.0000000  Two Sample Weighted T-Test (Welch)  FALSE
7         8         28.0    35.5    5.434608    62.85714   57.42254   9.997649  0.5435885   49.75621  1.0000000  Two Sample Weighted T-Test (Welch)  FALSE

pairT_wtd_noAdj %>% 
  head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  estimate    estimate1  estimate2  std.err   t.value     df        p.value    method                              sig
5         6         31.5    11.5    -8.204279   22.31746   30.52174   7.384076  -1.1110773  11.92080  0.2884428  Two Sample Weighted T-Test (Welch)  FALSE
5         7         31.5    28.0    -40.539682  22.31746   62.85714   7.770947  -5.2168264  36.69488  0.0000074  Two Sample Weighted T-Test (Welch)  TRUE
5         8         31.5    35.5    -35.105075  22.31746   57.42254   7.964060  -4.4079371  35.31121  0.0000931  Two Sample Weighted T-Test (Welch)  TRUE
5         9         31.5    36.5    -10.545553  22.31746   32.86301   5.864994  -1.7980501  48.99949  0.0783314  Two Sample Weighted T-Test (Welch)  FALSE
6         7         11.5    28.0    -32.335404  30.52174   62.85714   9.397350  -3.4409065  23.34725  0.0021923  Two Sample Weighted T-Test (Welch)  TRUE
6         8         11.5    35.5    -26.900796  30.52174   57.42254   9.455832  -2.8448894  25.09554  0.0087177  Two Sample Weighted T-Test (Welch)  TRUE
6         9         11.5    36.5    -2.341275   30.52174   32.86301   7.926770  -0.2953630  16.28191  0.7714499  Two Sample Weighted T-Test (Welch)  FALSE
7         8         28.0    35.5    5.434608    62.85714   57.42254   9.861895  0.5510713   49.75621  0.5840510  Two Sample Weighted T-Test (Welch)  FALSE
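
The fractional grp1_n and grp2_n values in the tables above suggest that they are sums of weights rather than row counts, and that estimate1 and estimate2 are weighted group means. That reading is an assumption on our part. A minimal base R sketch of both quantities, with freshly drawn weights under an arbitrary seed, so the numbers will not match the tables:

```r
# Illustrative weights; the seed is arbitrary, so values will not match the tables above
set.seed(42)
aq <- airquality[!is.na(airquality$Ozone), ]   # drop rows with missing Ozone
aq$weight <- sample(c(0.5, 1, 2), nrow(aq), replace = TRUE)

# Weighted mean of Ozone per month (compare with estimate1 / estimate2)
wtd_mean <- sapply(split(aq, aq$Month),
                   function(d) weighted.mean(d$Ozone, d$weight))

# Sum of weights per month (our reading of the fractional grp1_n / grp2_n)
wtd_n <- tapply(aq$weight, aq$Month, sum)

cbind(wtd_mean, wtd_n)
```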

And just the significant results, again with and without adjustment:

filter(pairT_wtd, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  p.value    method                              sig
5         7         31.5    28.0    0.0000219  Two Sample Weighted T-Test (Welch)  TRUE
5         8         31.5    35.5    0.0006270  Two Sample Weighted T-Test (Welch)  TRUE
6         7         11.5    28.0    0.0134361  Two Sample Weighted T-Test (Welch)  TRUE
7         9         28.0    36.5    0.0038380  Two Sample Weighted T-Test (Welch)  TRUE

filter(pairT_wtd_noAdj, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8)

grp1_lbl  grp2_lbl  grp1_n  grp2_n  p.value    method                              sig
5         7         31.5    28.0    0.0000074  Two Sample Weighted T-Test (Welch)  TRUE
5         8         31.5    35.5    0.0000931  Two Sample Weighted T-Test (Welch)  TRUE
6         7         11.5    28.0    0.0021923  Two Sample Weighted T-Test (Welch)  TRUE
6         8         11.5    35.5    0.0087177  Two Sample Weighted T-Test (Welch)  TRUE
7         9         28.0    36.5    0.0007283  Two Sample Weighted T-Test (Welch)  TRUE
8         9         35.5    36.5    0.0061723  Two Sample Weighted T-Test (Welch)  TRUE