Statistics help / glossary

Adjusted means

See Least squares means, below.


AIC

Akaike information criterion is a measure of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Hence, AIC provides a means for model selection. https://en.wikipedia.org/wiki/Akaike_information_criterion
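
For reference, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood of the model. A minimal Python sketch, using a made-up log-likelihood for illustration:

    # Hypothetical fitted model: 3 estimated parameters, maximized log-likelihood -112.4
    k = 3
    log_likelihood = -112.4
    aic = 2 * k - 2 * log_likelihood   # 230.8; a lower AIC indicates a relatively better model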


ANOVA

ANalysis Of VAriance https://en.wikipedia.org/wiki/Analysis_of_variance


Chisq

Chi square. See Log-rank test, below.


Coefficient of variation (CV)

The coefficient of variation is equal to the standard deviation divided by the mean. It is often represented as a percentage, and conveys the amount of variability relative to the mean. For example, a CV of 5% (or 0.05) means the standard deviation is equal to 5% of the mean. There are no units associated with CV; therefore, it can be used to compare the spread of data across data sets that have different units and/or means.
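
A minimal Python sketch of the calculation (the data values are made up):

    import statistics
    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    cv = statistics.stdev(data) / statistics.mean(data)   # SD divided by mean
    print(f"CV = {cv:.1%}")   # about 2.5% of the mean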


Confidence interval / limit (CI, CL)

A confidence interval has a lower and an upper bound (the confidence limits) that specify a range of values expected, with some degree of confidence (e.g., 95%), to contain the unobservable true value of a parameter of interest.
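
As a sketch, a 95% confidence interval for a sample mean can be built from the mean, its standard error, and a critical value from the t distribution (this assumes SciPy is available; the data are made up):

    import statistics
    from scipy import stats
    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    n = len(data)
    mean = statistics.mean(data)
    sem = statistics.stdev(data) / n ** 0.5
    t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% critical value
    lower, upper = mean - t_crit * sem, mean + t_crit * sem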


Contrast

A contrast is another name for a pairwise comparison; in this context, the estimate refers to the difference in model-adjusted means between the pair of groups. See also Pairwise comparison, below.


denDF

Denominator degrees of freedom. See F statistic, below.


Degrees of freedom (DF)

Degrees of freedom refers to the number of values in a calculation that are free to vary; it represents the number of independent pieces of information used in the calculation of a statistic or parameter.


Estimate

In a pairwise comparison, the estimate is the difference in model-adjusted means between the pair of groups. See also Contrast, above.


F statistic (F, F value)

The F statistic is a ratio of two different measures of variance for a set of data. In an analysis of variance (ANOVA), the numerator is the variance accounted for by the model (also referred to as the mean square of the model, MSM), calculated as the between-groups sum of squares (SS) divided by its degrees of freedom. The denominator is the variance accounted for by the residuals/error (also referred to as the mean square of the error, MSE), calculated as the within-groups SS divided by its degrees of freedom. The numerator degrees of freedom (numDF) are based on the number of groups compared (for a one-way ANOVA, the number of groups minus 1), and the denominator degrees of freedom (denDF) are based on the number of observations within the groups (for a one-way ANOVA, the number of observations minus the number of groups).
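
A sketch of the calculation for a one-way ANOVA with made-up data (see also Sum of squares, below):

    # Three groups of observations (hypothetical values)
    groups = [[4.1, 3.9, 4.4], [5.0, 5.2, 4.8], [6.1, 5.8, 6.3]]
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    num_df = len(groups) - 1                # groups minus 1
    den_df = len(values) - len(groups)      # observations minus groups
    f_stat = (ss_between / num_df) / (ss_within / den_df)   # MSM / MSE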


Intercept

The (Y-)intercept of a (linear regression) model is a constant that represents the mean response of the dependent variable (Y) when all predictor/independent variables (Xs) are equal to zero. Additionally, including the intercept ensures that the mean of the residuals is equal to zero, which is a required assumption of the linear regression model.


Log-rank test

The log-rank test is a hypothesis test that compares the survival distributions of two or more groups. It is a nonparametric test and is appropriate when the data are right-skewed and censored. The log-rank test statistic compares estimates of the hazard functions of the groups at each observed event time. It is constructed by computing the observed and expected number of events in one of the groups at each observed event time and then summing these across all time points where an event occurs. The log-rank test is a form of Chi-square test and yields a Chi-square statistic used to calculate the significance of the test. https://en.wikipedia.org/wiki/Log-rank_test
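
In Python, the test is available in, for example, the lifelines package; the sketch below uses made-up survival times, with an event flag of 0 marking a censored subject:

    from lifelines.statistics import logrank_test
    days_a, events_a = [710, 802, 655, 910], [1, 1, 1, 0]   # hypothetical group 1
    days_b, events_b = [600, 580, 720, 690], [1, 1, 1, 1]   # hypothetical group 2
    result = logrank_test(days_a, days_b,
                          event_observed_A=events_a, event_observed_B=events_b)
    print(result.test_statistic, result.p_value)   # Chi-square statistic and its P value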


Least squares means (LSM, lsmeans, model-adjusted means)

Least squares means are arithmetic means adjusted by model term(s). They represent a better estimate of the true population mean than the unadjusted group means, and are less sensitive to missing data.


Model coefficient

Represents the difference in the predicted value of Y (the dependent variable) for each one-unit difference in X (the independent variable). For example, if we were using gender to predict weight, the coefficient would represent the difference in predicted weight between genders. Coefficients are useful because they help build predictive models.
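
A sketch of a fitted slope (the coefficient) and the constant term (see Intercept, above), using statistics.linear_regression from Python 3.10+ with made-up data:

    import statistics
    x = [6, 8, 10, 12, 14]                  # hypothetical predictor, e.g., age in weeks
    y = [18.1, 20.0, 21.8, 24.1, 25.9]      # hypothetical response, e.g., weight in g
    fit = statistics.linear_regression(x, y)
    print(fit.slope)       # coefficient: predicted change in y per one-unit change in x
    print(fit.intercept)   # predicted y when x is 0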


numDF

Numerator degrees of freedom. See F statistic.


Pairwise comparison

A pairwise comparison investigates the difference between a pair of model factor levels to determine whether it is statistically different from zero.


P value, Pr(>F)

The P value (also written as the probability of obtaining a more extreme test statistic, e.g., Pr(>F)) is the calculated probability of finding the observed, or more extreme, results when the null hypothesis of a study question is true. P values approaching zero (less than 0.001) are sometimes shown as 0 due to rounding.


Odds ratio

See Wang-Allison test.


Standard deviation (SD)

Similar to variance, the standard deviation indicates the spread of values in a data set. It is equal to the square root of the variance. The standard deviation is expressed in the same units as the mean, whereas the variance is expressed in squared units.


Standard error (SE, SEM)

Standard error is a measure of the accuracy (or variability) of an estimate. In statistical terms, it is the standard deviation derived from a sampling distribution of a given statistic. When associated with an estimate, standard error is defined as the square root of the estimated error variance of the quantity.
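
For the common case of the standard error of a sample mean, this works out to the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up data:

    import statistics
    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    sem = statistics.stdev(data) / len(data) ** 0.5   # SD / sqrt(n)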


Sum of squares (SS, Sum of Sq, RSS)

Sum of squares is a representation of variation. Sum of squares can be partitioned into variation accounted for between groups (differences between each group mean versus the grand mean) and within groups (variation of individual values versus each group mean). The summation of the SS between (also known as treatment) and within (also known as error or residual (RSS)) is referred to as the SS total. These values are used in the calculation of the test statistic that is used to convey statistical significance.
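
A sketch of the partition with made-up data, checking that the SS between plus the SS within equals the SS total:

    groups = [[4.1, 3.9, 4.4], [5.0, 5.2, 4.8], [6.1, 5.8, 6.3]]
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    assert abs(ss_total - (ss_between + ss_within)) < 1e-9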


t statistic, t ratio

The t statistic is an estimate divided by its standard error. It is a measure of how extreme a statistical estimate is (or a measure of the size of a difference relative to the variation in the data).
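
A sketch for a one-sample test of whether a mean differs from a hypothesized value (made-up data):

    import statistics
    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    estimate = statistics.mean(data) - 10.0           # difference from the hypothesized mean
    se = statistics.stdev(data) / len(data) ** 0.5    # standard error of the mean
    t = estimate / se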


Unadjusted means

Simple arithmetic averages. Contrast with Least squares means, above.


Variance

Variance is a value that represents how far the elements (or values) in a data set are from the mean. It is calculated as follows (a code sketch appears after the steps):
1. Subtract the mean from each value in the data.
2. Square each of the differences.
3. Add all of the squared differences together.
4. Divide the sum of the squares by the number of values in the data set.
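
Note that dividing by the number of values (step 4) gives the population variance; statistical software usually reports the sample variance, which divides by the number of values minus 1. A sketch of the steps with made-up data:

    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    mean = sum(data) / len(data)
    squared_diffs = [(v - mean) ** 2 for v in data]        # steps 1 and 2
    pop_variance = sum(squared_diffs) / len(data)          # steps 3 and 4
    sample_variance = sum(squared_diffs) / (len(data) - 1)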


Wang-Allison test

The Wang-Allison test is a Fisher's exact test that compares the number of subjects alive and dead beyond a specified time point (e.g., the 90th percentile of lifespan) between two sample groups. The test yields an odds ratio corresponding to the ratio of the number of subjects alive versus dead in group 1 over the number of subjects alive versus dead in group 2. Reference: Wang C., et al. (2004). Statistical methods for testing effects on "maximum lifespan". Mech. Ageing Dev., 125, 629-632.
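
A sketch of the underlying Fisher's exact test using SciPy, with made-up counts of subjects alive and dead beyond the chosen time point:

    from scipy import stats
    # Rows: group 1, group 2; columns: alive, dead beyond the cutoff (hypothetical counts)
    table = [[7, 43],
             [2, 48]]
    odds_ratio, p_value = stats.fisher_exact(table)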


Z-score

A z-score (or standard deviation score) is equal to the difference between an element (particular data value) and the mean divided by the standard deviation. It is a representation of the number of standard deviations an element is from the mean.

Z-scores are expressed in terms of standard deviations from their means. As a result, z-scores have a distribution with a mean of 0 and a standard deviation of 1.

A Z-score of zero means the value is the same as the mean. A Z-score can be positive or negative, indicating whether the value is above or below the mean and by how many standard deviations.

If the data have a normal distribution, approximately 68% of the elements have a z-score between -1 and 1; 95% have a z-score between -2 and 2; and 99.7% have a z-score between -3 and 3.
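
A minimal sketch of the calculation (made-up data):

    import statistics
    data = [9.8, 10.1, 10.4, 9.9, 10.3]
    mean, sd = statistics.mean(data), statistics.stdev(data)
    z_scores = [(v - mean) / sd for v in data]   # standard deviations from the mean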