Descriptive Statistics
  Mean
    average of all observations
    mean = (sum of all observations) / (sample size)
  Median
    the middle value of all observations when they are placed in order
    if the sample size (n) is odd
      median = the ((n+1)/2)th largest value
    if the sample size is even
      median = the average of the (n/2)th and ((n/2)+1)th largest values
  Mode
    the most commonly occurring value
    if more than one value is tied for most common, each tied value is a mode
  In decreasing order of resistance to outliers, mode > median > mean

Types of Distributions
  Normal
    aka Gaussian, bell-shaped
    for continuous variables
    mean = median = mode
  Bi-modal
    distribution has 2 humps (each being a relative mode)
    if symmetrical, mean = median
  Skewed
    positive skew
      asymmetrical with tail trailing off to the right
      mean > median > mode
    negative skew
      asymmetrical with tail trailing off to the left
      mean < median < mode
    mean is very sensitive to skew
    median is somewhat resistant to skew
    mode is very resistant to skew
  Other
    non-continuous variable types have their own distributions
      e.g., binary, categorical, ordinal, binomial, and count variables

Characteristics of the Normal Distribution
  For continuous variables
  Defined entirely by 2 parameters
    mean (µ)
    standard deviation (σ)
  A fixed percentage of all observations always falls within a given number of standard deviations of the mean
    +/- 1 standard deviation = about 68%
    +/- 2 standard deviations = about 95%
    +/- 3 standard deviations = about 99.7%

Regression to the Mean
  Phenomenon in which observations that were initially extreme tend to fall closer to the mean on later measurements
  Most observations fall near the average; therefore, extreme observations are often partly a result of "luck"
    e.g., a student performs particularly poorly on one exam but normally performs at the average level
  Has significance for study design
    e.g., patients with high blood pressure may improve after taking an experimental anti-hypertensive, but the improvement on the next measurement may be due to regression to the mean rather than the treatment
    the solution is to compare a control group with an experimental group

Measures of Variability
  Standard deviation
    a statistical measure of how close together or spread apart the data are
    if the data are closer together, the standard deviation is smaller (and vice versa)
    often designated by σ
    equation
      σ = square root[(sum of the squared differences between each data point and the mean) / n]
      dividing by n gives the population standard deviation; an estimate from a sample divides by (n - 1) instead
  Standard error
    a statistical measure that estimates how far the sample mean is likely to be from the true population mean
    helps determine confidence intervals
    equation
      standard error = standard deviation / square root of n
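
The definitions in these notes can be checked with a few lines of Python. The sketch below is only an illustration under stated assumptions: the data set is made up, and the helper names (describe, fraction_within) are hypothetical, not part of any library. It follows the formulas as the notes write them, so the standard deviation divides by n.

```python
import math
import statistics


def describe(values):
    """Summary statistics for a non-empty list of numbers, as defined in the notes."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")

    mean = sum(values) / n

    ordered = sorted(values)
    if n % 2 == 1:
        # odd n: the ((n+1)/2)th value (1-indexed) is the middle one
        median = ordered[(n + 1) // 2 - 1]
    else:
        # even n: average of the (n/2)th and ((n/2)+1)th values
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

    # multimode returns every value tied for most common (Python 3.8+)
    modes = statistics.multimode(values)

    # standard deviation as written in the notes (divides by n);
    # statistics.stdev would divide by n - 1 for a sample estimate
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)

    # standard error = standard deviation / square root of n
    se = sd / math.sqrt(n)

    return {"mean": mean, "median": median, "modes": modes, "sd": sd, "se": se}


def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


# Hypothetical observations, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
print(summary["mean"], summary["median"], summary["modes"], summary["sd"])
# mean 5.0, median 4.5, modes [4], sd 2.0

# Replace the largest value with an outlier: the mean moves, the median and mode do not.
with_outlier = describe([2, 4, 4, 4, 5, 5, 7, 90])
print(with_outlier["mean"], with_outlier["median"], with_outlier["modes"])
# mean 15.125, median 4.5, modes [4]

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 3))
```

The second call illustrates the resistance ordering from the notes: a single extreme value pulls the mean from 5.0 to 15.125 while the median and mode stay where they were. The last loop reproduces the 68%, 95%, and 99.7% figures to three decimal places.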