pairTtest

pairTtest is a function for comparing the means of a set of groups, pair by pair, for statistically significant differences. It requires 3 input values: a dataframe, the name of a column holding the values whose means are compared, and the name of a column holding the groups to compare.

Sample data

For testing, we use the airquality dataset that comes stock with R. It contains ozone level measurements over the course of various months.

glimpse(airquality)
#> Rows: 153
#> Columns: 6
#> $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, 18, 14, …
#> $ Solar.R <int> 190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256, 290, 27…
#> $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6, 6.9, 9…
#> $ Temp    <int> 67, 72, 74, 62, 56, 66, 65, 59, 61, 69, 74, 69, 66, 68, 58, 64…
#> $ Month   <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
#> $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
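
Ozone contains missing values (the NA entries above), so the number of usable readings differs from month to month. As a rough check, and assuming that rows with a missing Ozone value are simply dropped before testing, the per-month counts can be tallied in base R:

```r
# Count non-missing Ozone readings per month in the built-in airquality data
ozone_n <- tapply(!is.na(airquality$Ozone), airquality$Month, sum)
print(ozone_n)
```

These counts are a handy reference when reading the group-size columns in the test output.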

We add a random bin (rand_bin) as a placeholder categorical variable to represent different data collection periods, along with a random weight column that is used in the weighted tests further down.

This way we can look at how mean ozone levels vary from month to month (Month) and across the random bins (rand_bin, our “data collection periods”):

test_data <-  airquality %>% 
  mutate(rand_bin = as.character(sample(1:3, nrow(airquality), replace = TRUE)),
         weight = sample(c(0.5,1,2), nrow(.), replace = TRUE))

pairT <- pairTtest(test_data,
                   "Ozone", "Month",
                   alpha = 0.05, n.min = 4)

We can inspect the output of pairwise T-testing in table form.

pairT %>% 
  head(8) %>% 
  knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	p.value	sig
6	5	9	26	1.0000000	FALSE
7	5	26	26	0.0002638	TRUE
7	6	26	9	0.0511274	FALSE
8	5	26	26	0.0001949	TRUE
8	6	26	9	0.0498733	TRUE
8	7	26	26	1.0000000	FALSE
9	5	29	26	1.0000000	FALSE
9	6	29	9	1.0000000	FALSE

A quick glance at the significant rows (5 in total, listed below) suggests that most of the statistically significant differences in ozone levels come from the middle months (July and August) differing from the months bookending the dataset (May and September).

filter(pairT, sig) %>%
    head(8) %>% 
    knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	p.value	sig
7	5	26	26	0.0002638	TRUE
8	5	26	26	0.0001949	TRUE
8	6	26	9	0.0498733	TRUE
9	7	29	26	0.0048788	TRUE
9	8	29	26	0.0038781	TRUE
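
For comparison, base R ships stats::pairwise.t.test, which runs the same kind of all-pairs comparison. The sketch below is not pairTtest's implementation; the choice of Welch tests (pool.sd = FALSE) and of the Holm adjustment are assumptions made for illustration, so the p-values need not match the tables above.

```r
# All-pairs t-tests on ozone by month, using base R only
base_res <- with(airquality,
                 pairwise.t.test(Ozone, Month,
                                 pool.sd = FALSE,            # separate variances per pair (Welch)
                                 p.adjust.method = "holm"))  # assumed adjustment method

# Lower-triangular matrix of adjusted p-values; rows and columns are months
print(round(base_res$p.value, 4))

# Flag pairs below the 0.05 threshold used in the calls above
print(base_res$p.value < 0.05)
```

The base function returns a matrix rather than one row per pair, which is one reason a tidy, filterable table like the one above is convenient.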

Next we repeat the test using our rand_bin variable instead of the months in the dataset. Perhaps unsurprisingly, there are no significant differences in means across these randomly assigned groups.

pairTtest(test_data, "Ozone", "rand_bin") %>%
  head(8) %>% 
  knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	p.value	sig
2	1	39	39	0.9661033	FALSE
3	1	38	39	0.9878598	FALSE
3	2	38	39	0.9878598	FALSE

Weighted tests

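The code chunk below applies wtdPairTtest to the test data, comparing weighted mean ozone levels across months, with observation weights taken from the weight column. The second call sets p.adjust.method = "none" so that the p-values are reported without any multiple-comparison adjustment.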

pairT_wtd <- wtdPairTtest(test_data,
                          "Ozone", "Month", "weight",
                          alpha = 0.05, n.min = 4)

pairT_wtd_noAdj <- wtdPairTtest(test_data,
                                "Ozone", "Month", "weight",
                                p.adjust.method = "none",
                                alpha = 0.05, n.min = 4)

We can inspect the output of the weighted pairwise T-testing in table form, first with adjusted p-values and then without.

pairT_wtd %>% 
  head(8) %>% 
  knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	estimate	estimate1	estimate2	std.err	t.value	df	p.value	method	sig
5	6	31.5	11.5	-8.204279	22.31746	30.52174	7.309708	-1.1223812	11.92080	0.8513955	Two Sample Weighted T-Test (Welch)	FALSE
5	7	31.5	28.0	-40.539682	22.31746	62.85714	7.228363	-5.6084179	36.69488	0.0000219	Two Sample Weighted T-Test (Welch)	TRUE
5	8	31.5	35.5	-35.105075	22.31746	57.42254	7.791803	-4.5053856	35.31121	0.0006270	Two Sample Weighted T-Test (Welch)	TRUE
5	9	31.5	36.5	-10.545553	22.31746	32.86301	5.794123	-1.8200432	48.99949	0.2994540	Two Sample Weighted T-Test (Welch)	FALSE
6	7	11.5	28.0	-32.335404	30.52174	62.85714	9.251256	-3.4952448	23.34725	0.0134361	Two Sample Weighted T-Test (Welch)	TRUE
6	8	11.5	35.5	-26.900796	30.52174	57.42254	9.897719	-2.7178784	25.09554	0.0586929	Two Sample Weighted T-Test (Welch)	FALSE
6	9	11.5	36.5	-2.341275	30.52174	32.86301	8.289949	-0.2824233	16.28191	1.0000000	Two Sample Weighted T-Test (Welch)	FALSE
7	8	28.0	35.5	5.434608	62.85714	57.42254	9.997649	0.5435885	49.75621	1.0000000	Two Sample Weighted T-Test (Welch)	FALSE


pairT_wtd_noAdj %>% 
  head(8) %>% 
  knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	estimate	estimate1	estimate2	std.err	t.value	df	p.value	method	sig
5	6	31.5	11.5	-8.204279	22.31746	30.52174	7.384076	-1.1110773	11.92080	0.2884428	Two Sample Weighted T-Test (Welch)	FALSE
5	7	31.5	28.0	-40.539682	22.31746	62.85714	7.770947	-5.2168264	36.69488	0.0000074	Two Sample Weighted T-Test (Welch)	TRUE
5	8	31.5	35.5	-35.105075	22.31746	57.42254	7.964060	-4.4079371	35.31121	0.0000931	Two Sample Weighted T-Test (Welch)	TRUE
5	9	31.5	36.5	-10.545553	22.31746	32.86301	5.864994	-1.7980501	48.99949	0.0783314	Two Sample Weighted T-Test (Welch)	FALSE
6	7	11.5	28.0	-32.335404	30.52174	62.85714	9.397350	-3.4409065	23.34725	0.0021923	Two Sample Weighted T-Test (Welch)	TRUE
6	8	11.5	35.5	-26.900796	30.52174	57.42254	9.455832	-2.8448894	25.09554	0.0087177	Two Sample Weighted T-Test (Welch)	TRUE
6	9	11.5	36.5	-2.341275	30.52174	32.86301	7.926770	-0.2953630	16.28191	0.7714499	Two Sample Weighted T-Test (Welch)	FALSE
7	8	28.0	35.5	5.434608	62.85714	57.42254	9.861895	0.5510713	49.75621	0.5840510	Two Sample Weighted T-Test (Welch)	FALSE
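
In the weighted output, grp1_n and grp2_n are no longer whole numbers, which suggests they are sums of weights rather than row counts, and estimate1 and estimate2 look like weighted group means. The sketch below shows how such quantities can be computed in base R. That reading of the columns is an assumption, and the seed and the names d and wtd_summary are introduced here only for illustration, so the numbers will differ from the tables above.

```r
set.seed(1)  # assumption: seed chosen only to make this sketch reproducible

# Keep rows with an Ozone reading and attach random weights, as in test_data
d <- airquality[!is.na(airquality$Ozone), ]
d$weight <- sample(c(0.5, 1, 2), nrow(d), replace = TRUE)

# Sum of weights and weighted mean of Ozone for each month
by_month <- split(d, d$Month)
wtd_summary <- data.frame(
  Month    = names(by_month),
  sum_w    = vapply(by_month, function(g) sum(g$weight), numeric(1)),
  wtd_mean = vapply(by_month, function(g) weighted.mean(g$Ozone, g$weight), numeric(1))
)
print(wtd_summary)
```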

And just the significant results, again with and without adjustment:

filter(pairT_wtd, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8) %>% 
    knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	p.value	method	sig
5	7	31.5	28.0	0.0000219	Two Sample Weighted T-Test (Welch)	TRUE
5	8	31.5	35.5	0.0006270	Two Sample Weighted T-Test (Welch)	TRUE
6	7	11.5	28.0	0.0134361	Two Sample Weighted T-Test (Welch)	TRUE
7	9	28.0	36.5	0.0038380	Two Sample Weighted T-Test (Welch)	TRUE


filter(pairT_wtd_noAdj, sig) %>%
    select(-c(estimate:df)) %>% 
    head(8) %>% 
    knitr::kable()

grp1_lbl	grp2_lbl	grp1_n	grp2_n	p.value	method	sig
5	7	31.5	28.0	0.0000074	Two Sample Weighted T-Test (Welch)	TRUE
5	8	31.5	35.5	0.0000931	Two Sample Weighted T-Test (Welch)	TRUE
6	7	11.5	28.0	0.0021923	Two Sample Weighted T-Test (Welch)	TRUE
6	8	11.5	35.5	0.0087177	Two Sample Weighted T-Test (Welch)	TRUE
7	9	28.0	36.5	0.0007283	Two Sample Weighted T-Test (Welch)	TRUE
8	9	35.5	36.5	0.0061723	Two Sample Weighted T-Test (Welch)	TRUE
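
The unadjusted run flags more pairs than the adjusted one because correcting for multiple comparisons inflates each p-value. As a sketch of the mechanism, base R's p.adjust can be applied to the unadjusted p-values copied from the pairT_wtd_noAdj output above. Which method wtdPairTtest uses by default is not stated here, so Holm and Bonferroni are shown purely as examples; the results will not match the pairT_wtd table exactly, since the two runs above also report slightly different standard errors.

```r
# Unadjusted p-values for the 10 month pairs, copied from pairT_wtd_noAdj
p_raw <- c("5-6" = 0.2884428, "5-7" = 0.0000074, "5-8" = 0.0000931,
           "5-9" = 0.0783314, "6-7" = 0.0021923, "6-8" = 0.0087177,
           "6-9" = 0.7714499, "7-8" = 0.5840510, "7-9" = 0.0007283,
           "8-9" = 0.0061723)

# Two common corrections (illustrative choices, not the package's documented default)
adj <- data.frame(
  p_raw  = p_raw,
  p_holm = p.adjust(p_raw, method = "holm"),
  p_bonf = p.adjust(p_raw, method = "bonferroni")
)

# Significance at alpha = 0.05 under each scheme
adj$sig_raw  <- adj$p_raw  < 0.05
adj$sig_holm <- adj$p_holm < 0.05
adj$sig_bonf <- adj$p_bonf < 0.05
print(adj)
```

Adjusted p-values are never smaller than the raw ones, so a pair can lose significance after adjustment but cannot gain it.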
