`pairTtest` is a function for comparing the means of every pair of groups for statistically significant differences. It requires three inputs: a data frame, the name of the column whose means are compared, and the name of the column that defines the groups.
For testing, we use the `airquality` dataset that ships with R. It contains daily air quality measurements in New York from May to September 1973, including ozone levels.
```r
library(dplyr)  # provides glimpse(), mutate(), n() and the %>% pipe

glimpse(airquality)
#> Rows: 153
#> Columns: 6
#> $ Ozone <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
#> $ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
#> $ Wind <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
#> $ Temp <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
#> $ Month <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
#> $ Day <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
```
We add a random bin (`rand_bin`) as a placeholder categorical variable to represent different data collection methods, plus a random `weight` column for the weighted tests later on. This way we can look at how ozone levels vary from month to month (`Month`) and across the random bins (`rand_bin`):
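```r
# sample() is random, so rand_bin, weight and any results that depend on them
# change between runs unless a seed is set first.
test_data <- airquality %>%
  mutate(rand_bin = as.character(sample(1:3, n(), replace = TRUE)),  # placeholder grouping
         weight = sample(c(0.5, 1, 2), n(), replace = TRUE))          # random observation weights

# Pairwise t-tests of Ozone between every pair of months
pairT <- pairTtest(test_data,
                   "Ozone", "Month",
                   alpha = 0.05, n.min = 4)
```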
We can inspect the output of the pairwise t-tests in table form (only the first eight of the ten month pairs are shown):
grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
---|---|---|---|---|---|
6 | 5 | 9 | 26 | 1.0000000 | FALSE |
7 | 5 | 26 | 26 | 0.0002638 | TRUE |
7 | 6 | 26 | 9 | 0.0511274 | FALSE |
8 | 5 | 26 | 26 | 0.0001949 | TRUE |
8 | 6 | 26 | 9 | 0.0498733 | TRUE |
8 | 7 | 26 | 26 | 1.0000000 | FALSE |
9 | 5 | 29 | 26 | 1.0000000 | FALSE |
9 | 6 | 29 | 9 | 1.0000000 | FALSE |
Filtering the output to just the statistically significant results (5 rows) shows that every significant difference in ozone levels pits one of the middle months (July or August) against May, June, or September, the months on either side of them. July and August do not differ significantly from each other (p = 1).
grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
---|---|---|---|---|---|
7 | 5 | 26 | 26 | 0.0002638 | TRUE |
8 | 5 | 26 | 26 | 0.0001949 | TRUE |
8 | 6 | 26 | 9 | 0.0498733 | TRUE |
9 | 7 | 29 | 26 | 0.0048788 | TRUE |
9 | 8 | 29 | 26 | 0.0038781 | TRUE |
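The p-values in these tables appear to agree with base R's `pairwise.t.test()` run with its defaults (a pooled standard deviation and Holm's p-value adjustment). Whether `pairTtest` calls it internally is an assumption, not something stated here, but the sketch below is a convenient cross-check that needs no extra packages. It also prints the mean ozone level for each month, which puts July and August well above the other months:

```r
# Base R only: pairwise.t.test() lives in the stats package, which is loaded by default.
# Rows with missing Ozone values are dropped automatically.
res <- pairwise.t.test(airquality$Ozone, airquality$Month)  # defaults: pooled SD, Holm adjustment
p <- res$p.value  # lower-triangular matrix; row and column names are month numbers

# Reshape the matrix into one row per pair of months
long <- as.data.frame(as.table(p), stringsAsFactors = FALSE)
names(long) <- c("grp1_lbl", "grp2_lbl", "p.value")
long <- long[!is.na(long$p.value), ]  # drop the empty upper triangle
long$sig <- long$p.value < 0.05
print(long)

# Mean ozone by month, ignoring missing values
month_means <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
print(round(month_means, 1))
```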
Repeating the `pairTtest` analysis with our `rand_bin` variable in place of `Month` gives, perhaps unsurprisingly, no significant differences in mean ozone levels across these randomly assigned groups:
grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | sig |
---|---|---|---|---|---|
2 | 1 | 39 | 39 | 0.9661033 | FALSE |
3 | 1 | 38 | 39 | 0.9878598 | FALSE |
3 | 2 | 38 | 39 | 0.9878598 | FALSE |
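The code chunk below applies `wtdPairTtest`, a weighted counterpart of `pairTtest` that takes the name of a weight column as an extra argument, to compare mean ozone levels across months using the random `weight` column. The second call sets `p.adjust.method = "none"` so the unadjusted p-values can be compared with the default output.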
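```r
# Weighted pairwise t-tests with the default p-value adjustment
pairT_wtd <- wtdPairTtest(test_data,
                          "Ozone", "Month", "weight",
                          alpha = 0.05, n.min = 4)

# The same tests with no p-value adjustment
pairT_wtd_noAdj <- wtdPairTtest(test_data,
                                "Ozone", "Month", "weight",
                                p.adjust.method = "none",
                                alpha = 0.05, n.min = 4)
```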
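For intuition about what a weighted pairwise comparison involves, here is a minimal sketch in base R. It assumes one common formulation of a weighted Welch t-test: weights rescaled to mean 1, weighted means and variances, and Welch-Satterthwaite degrees of freedom. `wtd_welch_t()` and `pairwise_wtd_t()` are hypothetical helpers, not part of the package, and `wtdPairTtest` may compute its standard errors differently, so these numbers will not necessarily match its output.

```r
# Hypothetical helper: a weighted Welch two-sample t-test under an assumed
# formulation (weights rescaled to mean 1); not wtdPairTtest's actual code.
wtd_welch_t <- function(x, wx, y, wy) {
  summarise_group <- function(v, w) {
    ok <- !is.na(v) & !is.na(w)
    v <- v[ok]
    w <- w[ok] / mean(w[ok])              # rescale weights so they sum to n
    n <- length(v)
    m <- sum(w * v) / sum(w)              # weighted mean
    s2 <- sum(w * (v - m)^2) / (n - 1)    # weighted variance
    list(n = n, m = m, s2 = s2)
  }
  a <- summarise_group(x, wx)
  b <- summarise_group(y, wy)
  va <- a$s2 / a$n                        # squared standard error, group 1
  vb <- b$s2 / b$n                        # squared standard error, group 2
  se <- sqrt(va + vb)
  t_val <- (a$m - b$m) / se
  df <- (va + vb)^2 / (va^2 / (a$n - 1) + vb^2 / (b$n - 1))  # Welch-Satterthwaite
  c(estimate = a$m - b$m, std.err = se, t.value = t_val, df = df,
    p.value = 2 * pt(-abs(t_val), df))    # two-sided p-value
}

# Hypothetical wrapper: test every pair of groups, then adjust the p-values.
pairwise_wtd_t <- function(value, group, weight, p.adjust.method = "holm") {
  lv <- sort(unique(group))               # sort() also drops NA groups
  pairs <- combn(lv, 2)                   # one column per pair of groups
  res <- t(apply(pairs, 2, function(pr) {
    i <- which(group == pr[1])
    j <- which(group == pr[2])
    wtd_welch_t(value[i], weight[i], value[j], weight[j])
  }))
  out <- data.frame(grp1_lbl = pairs[1, ], grp2_lbl = pairs[2, ], res)
  out$p.value <- p.adjust(out$p.value, method = p.adjust.method)
  out
}

# Usage with random weights drawn like test_data$weight (a seed makes it repeatable)
set.seed(1)
w <- sample(c(0.5, 1, 2), nrow(airquality), replace = TRUE)
wtd_res <- pairwise_wtd_t(airquality$Ozone, airquality$Month, w)
print(wtd_res)
```

With all weights equal to 1, `wtd_welch_t()` reduces to the ordinary Welch test from `t.test()`, which is a handy sanity check on the formulas.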
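We can inspect the output of the weighted pairwise t-tests in table form, starting with the default p-value adjustment (`pairT_wtd`). Here `grp1_n` and `grp2_n` appear to be sums of weights rather than raw counts, hence the fractional values.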
grp1_lbl | grp2_lbl | grp1_n | grp2_n | estimate | estimate1 | estimate2 | std.err | t.value | df | p.value | method | sig |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 31.5 | 11.5 | -8.204279 | 22.31746 | 30.52174 | 7.309708 | -1.1223812 | 11.92080 | 0.8513955 | Two Sample Weighted T-Test (Welch) | FALSE |
5 | 7 | 31.5 | 28.0 | -40.539682 | 22.31746 | 62.85714 | 7.228363 | -5.6084179 | 36.69488 | 0.0000219 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 8 | 31.5 | 35.5 | -35.105075 | 22.31746 | 57.42254 | 7.791803 | -4.5053856 | 35.31121 | 0.0006270 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 9 | 31.5 | 36.5 | -10.545553 | 22.31746 | 32.86301 | 5.794123 | -1.8200432 | 48.99949 | 0.2994540 | Two Sample Weighted T-Test (Welch) | FALSE |
6 | 7 | 11.5 | 28.0 | -32.335404 | 30.52174 | 62.85714 | 9.251256 | -3.4952448 | 23.34725 | 0.0134361 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 8 | 11.5 | 35.5 | -26.900796 | 30.52174 | 57.42254 | 9.897719 | -2.7178784 | 25.09554 | 0.0586929 | Two Sample Weighted T-Test (Welch) | FALSE |
6 | 9 | 11.5 | 36.5 | -2.341275 | 30.52174 | 32.86301 | 8.289949 | -0.2824233 | 16.28191 | 1.0000000 | Two Sample Weighted T-Test (Welch) | FALSE |
7 | 8 | 28.0 | 35.5 | 5.434608 | 62.85714 | 57.42254 | 9.997649 | 0.5435885 | 49.75621 | 1.0000000 | Two Sample Weighted T-Test (Welch) | FALSE |
And without p-value adjustment (`pairT_wtd_noAdj`):

grp1_lbl | grp2_lbl | grp1_n | grp2_n | estimate | estimate1 | estimate2 | std.err | t.value | df | p.value | method | sig |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 31.5 | 11.5 | -8.204279 | 22.31746 | 30.52174 | 7.384076 | -1.1110773 | 11.92080 | 0.2884428 | Two Sample Weighted T-Test (Welch) | FALSE |
5 | 7 | 31.5 | 28.0 | -40.539682 | 22.31746 | 62.85714 | 7.770947 | -5.2168264 | 36.69488 | 0.0000074 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 8 | 31.5 | 35.5 | -35.105075 | 22.31746 | 57.42254 | 7.964060 | -4.4079371 | 35.31121 | 0.0000931 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 9 | 31.5 | 36.5 | -10.545553 | 22.31746 | 32.86301 | 5.864994 | -1.7980501 | 48.99949 | 0.0783314 | Two Sample Weighted T-Test (Welch) | FALSE |
6 | 7 | 11.5 | 28.0 | -32.335404 | 30.52174 | 62.85714 | 9.397350 | -3.4409065 | 23.34725 | 0.0021923 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 8 | 11.5 | 35.5 | -26.900796 | 30.52174 | 57.42254 | 9.455832 | -2.8448894 | 25.09554 | 0.0087177 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 9 | 11.5 | 36.5 | -2.341275 | 30.52174 | 32.86301 | 7.926770 | -0.2953630 | 16.28191 | 0.7714499 | Two Sample Weighted T-Test (Welch) | FALSE |
7 | 8 | 28.0 | 35.5 | 5.434608 | 62.85714 | 57.42254 | 9.861895 | 0.5510713 | 49.75621 | 0.5840510 | Two Sample Weighted T-Test (Welch) | FALSE |
Filtering to just the significant results, first with the default adjustment:
grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | method | sig |
---|---|---|---|---|---|---|
5 | 7 | 31.5 | 28.0 | 0.0000219 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 8 | 31.5 | 35.5 | 0.0006270 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 7 | 11.5 | 28.0 | 0.0134361 | Two Sample Weighted T-Test (Welch) | TRUE |
7 | 9 | 28.0 | 36.5 | 0.0038380 | Two Sample Weighted T-Test (Welch) | TRUE |
Then without p-value adjustment:

grp1_lbl | grp2_lbl | grp1_n | grp2_n | p.value | method | sig |
---|---|---|---|---|---|---|
5 | 7 | 31.5 | 28.0 | 0.0000074 | Two Sample Weighted T-Test (Welch) | TRUE |
5 | 8 | 31.5 | 35.5 | 0.0000931 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 7 | 11.5 | 28.0 | 0.0021923 | Two Sample Weighted T-Test (Welch) | TRUE |
6 | 8 | 11.5 | 35.5 | 0.0087177 | Two Sample Weighted T-Test (Welch) | TRUE |
7 | 9 | 28.0 | 36.5 | 0.0007283 | Two Sample Weighted T-Test (Welch) | TRUE |
8 | 9 | 35.5 | 36.5 | 0.0061723 | Two Sample Weighted T-Test (Welch) | TRUE |
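With the default adjustment four month pairs are significant; without it, six are. Adjusting for multiple comparisons inflates each p-value so that the chance of any false positive across all ten pairs stays at or below `alpha`. Which method `wtdPairTtest` applies by default is not stated here, so the sketch below simply contrasts two standard choices from base R's `p.adjust()` on made-up p-values:

```r
# Made-up raw p-values for five hypothetical comparisons (illustration only)
raw <- c(0.001, 0.008, 0.015, 0.03, 0.2)

# Holm: sorted p-values times 5, 4, 3, 2, 1, then a running maximum, capped at 1
holm <- p.adjust(raw, method = "holm")
# Bonferroni: every p-value times the number of comparisons (5), capped at 1
bonf <- p.adjust(raw, method = "bonferroni")

print(data.frame(raw, holm, bonf,
                 sig_raw = raw < 0.05, sig_holm = holm < 0.05, sig_bonf = bonf < 0.05))
```

Here four raw p-values fall below 0.05, Holm keeps three of them and Bonferroni only two. Holm's adjusted p-values are never larger than Bonferroni's, which is why it is a common default.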