Title: | Statistical Functions for Core Analysis Tasks at NMR |
---|---|
Description: | A set of statistical functions for use at NMR Group when completing core analysis tasks: frequency tables, cross-tabs, t-tests, proportion tests, etc. |
Authors: | Julian Ricardo [aut, cre], Jerrad Pierce [ctb], Matt Woundy [ctb] |
Maintainer: | Julian Ricardo <[email protected]> |
License: | file LICENSE |
Version: | 1.3.6-7 |
Built: | 2024-10-30 05:01:53 UTC |
Source: | https://gitlab.com/NMRgroup/corestatsnmr |
Returns a single area-weighted R-value from two vectors: one of R-values and one of the areas associated with each R-value.
aggRval(r_val, area)
r_val |
Vector of R-values |
area |
Vector of areas associated with each R-value |
A single area-weighted R-value
aggRval(c(2,5,20), c(10,10,50))
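The reference does not give the weighting formula. The following minimal sketch assumes the surfaces are combined as parallel heat-flow paths, i.e. the conductances 1/R are area-weighted and the result is inverted back to an R-value, which is the usual convention for thermal R-values. `agg_rval_sketch` is a hypothetical name, not the package's code; `aggRval` may instead compute a plain area-weighted mean, which is shown for comparison.

```r
# Hypothetical sketch of an area-weighted R-value; not the package's implementation.
agg_rval_sketch <- function(r_val, area) {
  stopifnot(length(r_val) == length(area), all(r_val > 0), all(area >= 0))
  # Parallel-path assumption: weight conductances (1/R) by area,
  # then invert back to an R-value.
  sum(area) / sum(area / r_val)
}

r <- c(2, 5, 20)
a <- c(10, 10, 50)
agg_rval_sketch(r, a)  # 70 / 9.5, about 7.37
weighted.mean(r, a)    # simple area-weighted mean, about 15.29
```

The two definitions can differ a lot when R-values vary widely, so it is worth confirming which one `aggRval` uses before relying on it.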
Calculate a confidence interval (using the normal distribution)
confInterval(x, conf_lvl = 0.9)
x |
Numerical vector |
conf_lvl |
A number between 0 and 1 giving the confidence level; defaults to 0.9 (90%) |
A dataframe summarizing the sample mean and confidence interval
confInterval(runif(100)); confInterval(runif(1e3))
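A minimal sketch of a normal-approximation interval, mean ± z × SE with z = qnorm(1 - (1 - conf_lvl) / 2). The helper name, the removal of NA values, and the output column names are assumptions; the reference only says the result summarizes the sample mean and confidence interval.

```r
# Hypothetical sketch of a normal-approximation confidence interval for the mean.
conf_interval_sketch <- function(x, conf_lvl = 0.9) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)                # standard error of the mean
  z <- qnorm(1 - (1 - conf_lvl) / 2)   # two-sided critical value
  m <- mean(x)
  data.frame(mean = m, lower = m - z * se, upper = m + z * se,
             conf_lvl = conf_lvl, n = n)
}

set.seed(1)
conf_interval_sketch(runif(100))
conf_interval_sketch(c(1, 2, 3, 4, 5), conf_lvl = 0.95)
```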
Calculate the design effect for adjusting cluster sampling sizes
designEffect(n_obs, icc)
n_obs |
number. Average number of observations per cluster (e.g. average number of lamps per home) |
icc |
number. Intraclass correlation (similarity of clustered data) |
A correction factor for sample sizes drawn from clustered units.
http://faculty.smu.edu/slstokes/stat6380/deff doc.pdf
designEffect(35, 0.75)
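The usual (Kish) design effect for clusters of average size m with intraclass correlation ρ is deff = 1 + (m - 1)ρ. Assuming `designEffect` follows this formula (the reference does not state it), the documented example works out as below; `design_effect_sketch` is a hypothetical helper.

```r
# Kish design effect for cluster samples: deff = 1 + (m - 1) * icc
design_effect_sketch <- function(n_obs, icc) {
  1 + (n_obs - 1) * icc
}

deff <- design_effect_sketch(35, 0.75)  # 1 + 34 * 0.75 = 26.5
# A simple-random-sample size of 100 would need to be inflated to:
100 * deff                              # 2650
```

With an ICC this high, observations within a cluster carry little independent information, which is why the correction factor is so large.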
Assumes the data are provided with one column for each category in the weighting scheme, followed by a column for the sample n and a column for the general population.
makeWeights(data, sampleVal, populationVal, digits = 5, checkCols = FALSE)
## S3 method for class 'data.frame'
makeWeights(data, sampleVal, populationVal, digits = 5, checkCols = FALSE)
data |
A data.frame (or data.table) to add weights to. |
sampleVal |
A string selecting the column in the data with sample counts |
populationVal |
A string selecting the column in the data with population counts |
digits |
A number of digits to use when rounding proportion weights |
checkCols |
A boolean that toggles whether to calculate checks on proportion and population (included as additional columns) |
A dataframe with population and proportion weights, as well as optional intermediate calculations.
myData <- data.frame(HairEyeColor)
myData$Population <- round(runif(nrow(myData), 10000, 20000), 0)
makeWeights(data = myData, sampleVal = "Freq", populationVal = "Population")
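The reference does not define the two weights. A common convention, assumed here, is an expansion (population) weight equal to population / sample count and a proportion weight equal to population share / sample share, which averages 1 across sampled units before rounding. The helper and its output column names (`popWeight`, `propWeight`) are hypothetical, and `checkCols` is not reproduced.

```r
# Hypothetical post-stratification weights; formulas and column names are assumptions.
make_weights_sketch <- function(data, sampleVal, populationVal, digits = 5) {
  n   <- data[[sampleVal]]
  pop <- data[[populationVal]]
  # Expansion weight: how many population units each sampled unit represents
  data$popWeight <- pop / n
  # Proportion weight: population share divided by sample share
  data$propWeight <- round((pop / sum(pop)) / (n / sum(n)), digits)
  data
}

my_data <- data.frame(HairEyeColor)
set.seed(42)
my_data$Population <- round(runif(nrow(my_data), 10000, 20000))
head(make_weights_sketch(my_data, "Freq", "Population"))
```

Cells with a sample count of 0 would produce infinite weights under this convention, so they need to be collapsed or dropped first.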
Get the mode of a vector of values
mode(x, show_all = FALSE)
x |
A vector of values to calculate the mode from |
show_all |
A boolean. If FALSE (the default), returns a single mode, or NA if there is no mode or more than one. If TRUE, returns all modes, if any exist |
The mode(s) of the supplied vector.
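Because the package exports a function named `mode`, attaching it masks base R's `mode()`, which returns an object's storage mode. The sketch below uses a distinct hypothetical name and implements the documented `show_all` behaviour, assuming "no mode" means every distinct value occurs equally often.

```r
# Hypothetical sketch of a statistical mode with the documented show_all behaviour.
mode_sketch <- function(x, show_all = FALSE) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  if (length(ux) == 0) return(NA)
  freq <- tabulate(match(x, ux))  # occurrences of each distinct value
  modes <- ux[freq == max(freq)]
  # Treat "every distinct value equally frequent" as having no mode
  if (length(ux) > 1 && length(modes) == length(ux)) return(NA)
  if (show_all) return(modes)
  if (length(modes) == 1) modes else NA
}

mode_sketch(c(1, 2, 2, 3))                      # 2
mode_sketch(c(1, 1, 2, 2, 3))                   # NA (two modes)
mode_sketch(c(1, 1, 2, 2, 3), show_all = TRUE)  # 1 2
```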
Pairwise proportion comparisons
pairPropTest( data, indexVar, valVar, grpVar, alpha = 0.1, n.min = 10, p.adjust.method = p.adjust.methods, counts = FALSE )
data |
A dataset to calculate proportions for and test for statistically significant differences. |
indexVar |
string. Selects an index column for the dataset |
valVar |
string. Selects the column containing counts of successes in data |
grpVar |
string. Selects the column containing counts of trials in data |
alpha |
number. Significance level (e.g. 0.05 for 95-pct confidence level) |
n.min |
number. Minimum counts to consider |
p.adjust.method |
string. Method for adjusting p-values. See ?p.adjust for more details. |
counts |
Boolean. Toggles whether the function returns significance results or counts (for diagnostic purposes) |
A dataframe showing p-values and statistically significant differences for the pairs of variables chosen
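Base R provides a closely related building block, `stats::pairwise.prop.test()`, which runs `prop.test()` on every pair of groups and adjusts the p-values. The sketch below assumes `pairPropTest` performs a comparison of this kind on counts of successes and trials; its handling of `indexVar`, `n.min`, and `counts` is not reproduced, and the counts are made up for illustration. In base R, the default `p.adjust.method = p.adjust.methods` resolves to `"holm"` through `match.arg()`.

```r
# Hypothetical data: successes and trials for three groups
successes <- c(A = 45, B = 30, C = 60)
trials    <- c(A = 100, B = 100, C = 100)

res <- pairwise.prop.test(successes, trials, p.adjust.method = "holm")
res$p.value          # lower-triangular matrix of adjusted p-values
alpha <- 0.1
res$p.value < alpha  # TRUE where the pair differs at the chosen level
```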
Pairwise T-test comparisons
pairTtest( data, valVar, grpVar, alpha = 0.1, n.min = 10, p.adjust.method = p.adjust.methods )
data |
A dataset in which to test for statistically significant differences. |
valVar |
string. Selects the column containing the values to compare |
grpVar |
string. Selects the column identifying the groups to compare |
alpha |
number. Significance level (e.g. 0.05 for 95-pct confidence level) |
n.min |
number. Minimum counts to consider |
p.adjust.method |
string. Method for adjusting p-values. See ?p.adjust for more details. |
A dataframe showing p-values and statistically significant differences for the pairs of variables chosen
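Similarly, `stats::pairwise.t.test()` compares every pair of groups and adjusts the p-values. This is a base R illustration on the built-in `iris` data, not `pairTtest` itself; `pool.sd = FALSE` runs a separate Welch t-test per pair, and whether `pairTtest` pools the standard deviation is not documented.

```r
# Base R pairwise t-tests of sepal length between species
res <- pairwise.t.test(iris$Sepal.Length, iris$Species,
                       p.adjust.method = "holm", pool.sd = FALSE)
res$p.value          # Holm-adjusted p-values for each pair of species
alpha <- 0.1
res$p.value < alpha  # TRUE where the pair differs at the chosen level
```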
Generating a penetration table
Generating a weighted proportion table (2-way)
penTable(data, index, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, only_ns, accuracy, normwt = TRUE, tot.label = "Total")
## S3 method for class 'data.frame'
penTable(data, index, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, only_ns = FALSE, accuracy = 1, normwt = TRUE, tot.label = "Total")
data |
A dataset to calculate weighted proportions |
index |
string. Selects an index column for the dataset |
x |
string. Selects the first variable to find proportions for |
y |
string. Selects the second variable to find proportions for |
totWeightVar |
string. A string selecting the column to weight the population |
inGroupWeightVar |
string. A string selecting the column to use for in-group weights |
only_ns |
Boolean. Toggles whether to return the penetration table or the intermediate table of n's. |
accuracy |
number. A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values. The data.frame method defaults to 1. |
normwt |
Boolean. If TRUE, normalizes weights so that the total weighted count equals the unweighted count |
tot.label |
string. A string label for totals column |
A data.frame or data.table showing a penetration table
Proportion comparisons
propTest( data, indexVar, valVar, grpVar, counts = NULL, alpha = 0.1, n.min = 10, alternative = c("two.sided", "less", "greater") )
data |
A dataset to calculate proportions for and test for statistically significant differences. |
indexVar |
string. Selects an index column for the dataset |
valVar |
string. Selects the column containing counts of successes in data |
grpVar |
string. Selects the column containing counts of trials in data |
counts |
vector. Optional vector of strings naming the columns that contain counts of successes and trials (otherwise, the function calculates counts from valVar and grpVar) |
alpha |
number. Significance level (e.g. 0.05 for 95-pct confidence level) |
n.min |
number. Minimum counts to consider |
alternative |
string. Specifies the alternative hypothesis. See ?prop.test |
A dataframe showing p-values and statistically significant differences for the chosen variables
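The `alternative` argument defers to `?prop.test`, which suggests the comparison is built on `stats::prop.test()`. Assuming so, a two-group test on counts of successes and trials looks like this; the counts are hypothetical, and `propTest`'s own column handling is not reproduced.

```r
# Hypothetical counts: 45 of 100 successes in one group, 60 of 100 in the other
successes <- c(45, 60)
trials    <- c(100, 100)

res <- prop.test(successes, trials, alternative = "two.sided")
res$estimate           # observed proportions in each group
res$p.value < 0.1      # significant at alpha = 0.1?
```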
Generate a statistical summary table, with optional grouping
statsTable(data, summVar, groupVar = NULL, stats, accuracy = NULL, totCol = TRUE, totWeightVar = NULL, inGroupWeightVar = NULL, drop0trailing = FALSE, colOrder = NULL)
## S3 method for class 'data.frame'
statsTable(data, summVar, groupVar = NULL, stats, accuracy = 1, totCol = TRUE, totWeightVar = NULL, inGroupWeightVar = NULL, drop0trailing = FALSE, colOrder = NULL)
## S3 method for class 'data.table'
statsTable(data, summVar, groupVar = NULL, stats, accuracy = 1, totCol = TRUE, totWeightVar = NULL, inGroupWeightVar = NULL, drop0trailing = FALSE, colOrder = NULL)
data |
A data.frame (or data.table) to use for statistical summary |
summVar |
A string selecting the column in 'data' to summarize |
groupVar |
A string or list of strings selecting the (optional) columns in 'data' to group by |
stats |
A list of strings selecting summary stats functions (e.g. mean, sd, sum) |
accuracy |
A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values. The generic defaults to NULL; the data.frame and data.table methods default to 1. |
totCol |
A boolean toggling whether to include a total column |
totWeightVar |
A string selecting the column to weight the population |
inGroupWeightVar |
A string selecting the column to use for in-group weights |
drop0trailing |
A boolean toggling whether to drop trailing zeros from the output (converts to strings) |
colOrder |
To be deprecated |
A data.frame with statistical summary results describing the selected variable.
library(dplyr)
statsTable(iris, summVar = "Sepal.Length", groupVar = "Species", stats = c("n", "min", "max", "weighted.mean", "median", "sd"), accuracy = 2)
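For comparison, a similar grouped summary can be built in base R with `aggregate()`; this sketch does not reproduce `statsTable`'s weighting, `totCol`, or `accuracy` rounding.

```r
# Base R grouped summary of Sepal.Length by Species (not the package's implementation)
summ <- aggregate(Sepal.Length ~ Species, data = iris,
                  FUN = function(v) c(n = length(v), min = min(v), max = max(v),
                                      mean = mean(v), median = median(v), sd = sd(v)))
summ <- do.call(data.frame, summ)  # flatten the matrix column into separate columns
summ
```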
Conducting stratified random sampling
Stratified random sampling
stratRandSample(data, group, size, select = NULL, replace = FALSE, bothSets = FALSE, keep.rownames = FALSE)
## S3 method for class 'data.frame'
stratRandSample(data, group, size, select = NULL, replace = FALSE, bothSets = FALSE, keep.rownames = NULL)
## S3 method for class 'data.table'
stratRandSample(data, group, size, select = NULL, replace = FALSE, bothSets = FALSE, keep.rownames = FALSE)
data |
A data.frame (or data.table) to use for allocating sample |
group |
string. The column(s) that represent strata |
size |
number. If less than 1, the proportion of each stratum to sample. If an integer of 1 or more, the number of samples to take from each stratum. If a vector of integers, the number of samples to take from each respective stratum; a named vector is recommended in this case |
select |
list. Named list specifying a subset of strata to use in sampling |
replace |
boolean. Toggles whether to sample with replacement |
bothSets |
boolean. Toggles whether to return a list of the sampled and unsampled portions of the data |
keep.rownames |
For data.tables only. See ?data.table. Adapted from https://gist.github.com/mrdwab/6424112 and https://gist.github.com/mrdwab/933ffeaa7a1d718bd10a |
A sample of the data passed to the function, optionally accounting for strata.
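A minimal base R sketch of the `size` semantics described above (a proportion when `size` < 1, a per-stratum count otherwise); vector sizes, `select`, and `bothSets` are not reproduced, and `strat_sample_sketch` is a hypothetical name.

```r
# Hypothetical stratified random sampler; not the package's implementation.
strat_sample_sketch <- function(data, group, size, replace = FALSE) {
  # Row indices split by stratum (one or more grouping columns)
  strata <- split(seq_len(nrow(data)), data[group], drop = TRUE)
  picked <- unlist(lapply(strata, function(idx) {
    k <- if (size < 1) round(length(idx) * size) else size
    # sample.int avoids sample()'s special case for length-1 vectors
    idx[sample.int(length(idx), k, replace = replace)]
  }), use.names = FALSE)
  data[picked, , drop = FALSE]
}

set.seed(123)
s <- strat_sample_sketch(iris, "Species", 0.1)
table(s$Species)  # 5 rows from each species
```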
Tidy a weighted chi-squared contingency table test
## S3 method for class 'wtd.chi.sq' tidy(x)
x |
An htest object, such as those created by weights::wtd.chi.sq() |
A tibble::tibble() with columns for method, coefficients, estimated values, p-value, and other statistics
Tidy a weighted t-test object
## S3 method for class 'wtd.t.test' tidy(x)
x |
An htest object, such as those created by weights::wtd.t.test() |
A tibble::tibble() with columns for method, coefficients, estimated values, p-value, and other statistics
Generating a weighted frequency table (2-way)
wtdFreqTable(data, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, accuracy = 1, normwt = TRUE, tot.label = "Statewide", colOrder = NULL)
## S3 method for class 'data.frame'
wtdFreqTable(data, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, accuracy = 1, normwt = TRUE, tot.label = "Total", colOrder = NULL)
data |
A dataset for which to calculate weighted frequencies. Currently only data.frame is supported |
x |
string. Selects the first variable to find frequencies for |
y |
string. Selects the second variable to find frequencies for |
totWeightVar |
string. A string selecting the column to weight the population |
inGroupWeightVar |
string. A string selecting the column to use for in-group weights |
accuracy |
number. A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values. Defaults to 1. |
normwt |
Boolean. If TRUE, normalizes weights so that the total weighted count equals the unweighted count |
tot.label |
string. Label for totals column |
colOrder |
vector. Vector of strings to set the order for the column given by variable x |
A data.frame showing a two-way weighted frequency table
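A minimal sketch of a two-way weighted frequency table with the `normwt` rescaling described above, assuming a single weight column, levels of y as rows, levels of x as columns, and a total column. `wtd_freq_sketch` and its arguments are hypothetical, and `inGroupWeightVar`, `accuracy`, and `colOrder` are not reproduced.

```r
# Hypothetical weighted two-way frequency table; not the package's implementation.
wtd_freq_sketch <- function(data, x, y, weight, normwt = TRUE, tot.label = "Total") {
  wt <- data[[weight]]
  # normwt: rescale weights so the weighted total equals the unweighted row count
  if (normwt) wt <- wt * length(wt) / sum(wt)
  df <- data.frame(y_level = data[[y]], x_level = data[[x]], wt = wt)
  tab <- unclass(xtabs(wt ~ y_level + x_level, data = df))  # weighted counts
  out <- cbind(tab, rowSums(tab))                           # append a total column
  colnames(out)[ncol(out)] <- tot.label
  out
}

d <- data.frame(a = c("p", "p", "q", "q"),
                b = c("u", "v", "u", "u"),
                w = c(1, 3, 2, 2))
wtd_freq_sketch(d, x = "a", y = "b", weight = "w")
```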
Weighted pairwise proportion comparisons
wtdPairPropTest( data, indexVar, valVar, grpVar, weightVar, alpha = 0.1, n.min = 10, p.adjust.method = p.adjust.methods, counts = FALSE )
data |
A dataset to calculate proportions for and test for statistically significant differences. |
indexVar |
string. Selects an index column for the dataset |
valVar |
string. Selects the column containing counts of successes in data |
grpVar |
string. Selects the column containing counts of trials in data |
weightVar |
string. Selects the column containing weights in the data |
alpha |
number. Significance level (e.g. 0.05 for 95-pct confidence level) |
n.min |
number. Minimum counts to consider |
p.adjust.method |
string. Method for adjusting p-values. See ?p.adjust for more details. |
counts |
Boolean. Toggles whether the function returns significance results or counts (for diagnostic purposes) |
A dataframe showing p-values and statistically significant differences for the pairs of variables chosen
Pairwise Weighted T-Test comparisons
wtdPairTtest( data, valVar, grpVar, weightVar, alpha = 0.1, n.min = 10, p.adjust.method = p.adjust.methods )
data |
A dataset in which to test for statistically significant differences. |
valVar |
string. Selects the column containing the values to compare |
grpVar |
string. Selects the column identifying the groups to compare |
weightVar |
string. Selects the column containing weights in the data |
alpha |
number. Significance level (e.g. 0.05 for 95-pct confidence level) |
n.min |
number. Minimum counts to consider |
p.adjust.method |
string. Method for adjusting p-values. See ?p.adjust for more details. |
A dataframe showing p-values and statistically significant differences for the pairs of variables chosen
Generating a weighted proportion table (2-way)
wtdPropTable(data, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, pct_format = TRUE, accuracy = 0.1, normwt = TRUE, tot.label = "Total", colOrder = NULL)
## S3 method for class 'data.frame'
wtdPropTable(data, x, y, totWeightVar = NULL, inGroupWeightVar = NULL, pct_format = TRUE, accuracy = 0.1, normwt = TRUE, tot.label = "Total", colOrder = NULL)
data |
A dataset to calculate weighted proportions |
x |
string. Selects the first variable to find proportions for |
y |
string. Selects the second variable to find proportions for |
totWeightVar |
string. A string selecting the column to weight the population |
inGroupWeightVar |
string. A string selecting the column to use for in-group weights |
pct_format |
boolean. If TRUE (the default), proportions are formatted as percentage strings; if FALSE, they are returned as decimals |
accuracy |
number. A number to round to. Use (e.g.) 0.01 to show 2 decimal places of precision. If NULL, uses a heuristic that should ensure breaks have the minimum number of digits needed to show the difference between adjacent values. Defaults to 0.1. |
normwt |
Boolean. If TRUE, normalizes weights so that the total weighted count equals the unweighted count |
tot.label |
string. A string label for totals column |
colOrder |
vector. A vector of strings to set the order for the column given by variable x |
A data.frame or data.table showing a two-way weighted proportion table
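Assuming a single weight column with levels of y as rows and levels of x as columns, within-column proportions follow from dividing each column of weighted counts by its total. Because proportions are ratios of weighted sums, a global rescaling such as `normwt` leaves them unchanged. The helper name, layout, and percentage formatting below are hypothetical.

```r
# Hypothetical weighted two-way proportion table; not the package's implementation.
wtd_prop_sketch <- function(data, x, y, weight, tot.label = "Total") {
  df  <- data.frame(y_level = data[[y]], x_level = data[[x]], wt = data[[weight]])
  tab <- unclass(xtabs(wt ~ y_level + x_level, data = df))  # weighted counts
  tab <- cbind(tab, rowSums(tab))                           # add a total column
  colnames(tab)[ncol(tab)] <- tot.label
  prop.table(tab, margin = 2)                               # each column sums to 1
}

d <- data.frame(region = c("N", "N", "S", "S"),
                owns   = c("yes", "no", "yes", "yes"),
                w      = c(1, 3, 2, 2))
p <- wtd_prop_sketch(d, x = "region", y = "owns", weight = "w")
# pct_format-style strings rounded to 0.1 percentage points
matrix(sprintf("%.1f%%", 100 * p), nrow = nrow(p), dimnames = dimnames(p))
```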