What is statistics?
Statistics is the field of science and math concerned with organizing, analyzing, and interpreting data. It’s often central to testing research ideas.
Why should you care? Many people learn about research findings daily through social media. For example, I recently saw this Facebook headline reporting on a publication: “Want to live longer? NIA study links fasting to longevity.” This headline might lead you to consider fasting regularly to help you live longer, but how likely is it that these results will apply to you?
It’s wonderful that science is now accessible to a broad audience, but this trend highlights the need to understand the fundamentals of how researchers draw conclusions from data before you change your lifestyle (like fasting to try to live longer). If you’re planning to be a healthcare provider, a basic understanding of stats will help you evaluate research before making recommendations to patients.
Interested? Let’s start with some stats basics!
What are data?
Data are collected measurements/observations that provide information. Researchers can collect data in many ways, like running an experiment, distributing surveys, or extracting info from public databases.
One of the first steps in stats is appreciating the gestalt of the data points to determine their distribution, or how frequently each value is observed. Many popular stats tests assume that data follow a “normal distribution” (the symmetric, bell-shaped curve). If data do not meet this assumption, the results of these commonly applied tests might be less trustworthy.
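To make “distribution” concrete, here’s a minimal Python sketch using simulated (not real) happiness ratings. It tallies how often values fall in each 10-point bin, then applies a rough rule of thumb: in a normal distribution, about 68% of values sit within 1 standard deviation (SD) of the mean and about 95% within 2 SDs. The data, bin width, and variable names are made up for illustration; in a real analysis you’d follow up with a formal normality test (e.g., Shapiro-Wilk).

```python
import random
import statistics
from collections import Counter

# Hypothetical data: 1,000 simulated happiness ratings (not from any study),
# drawn from a normal distribution with mean 50 and SD 10, then rounded.
random.seed(42)
ratings = [round(random.gauss(50, 10)) for _ in range(1000)]

# The distribution: how often values land in each 10-point bin (1 '#' per 10 people).
bins = Counter((r // 10) * 10 for r in ratings)
for start in sorted(bins):
    print(f"{start:>3}-{start + 9:<3} {'#' * (bins[start] // 10)}")

# Rough normality check: compare the share of values within 1 and 2 SDs
# of the mean to what a normal distribution predicts (~68% and ~95%).
mean = statistics.mean(ratings)
sd = statistics.stdev(ratings)
within_1sd = sum(abs(r - mean) <= sd for r in ratings) / len(ratings)
within_2sd = sum(abs(r - mean) <= 2 * sd for r in ratings) / len(ratings)
print(f"within 1 SD: {within_1sd:.1%} (normal: ~68%)")
print(f"within 2 SD: {within_2sd:.1%} (normal: ~95%)")
```

If your own data’s histogram is lopsided or the shares are far from 68% and 95%, that’s a hint the normality assumption may not hold.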
To p or not to p: that is the title of way too many stats papers (see References). So what does “p” have to do with the question of how data become “findings”?
Scientists often want to know at least 2 things when testing a research idea: 1) do groups differ, or does a relationship exist? and 2) how strong is the difference or relationship? In “frequentist” stats, the most common approach, we address question 1 using significance testing.
Say we want to know whether happiness differs between puppy lovers and cat lovers. Because we can’t realistically collect data from the entire population, we randomly sample some people from each group and test for a group difference, producing a value called a “test statistic.”
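Here’s one way a test statistic could be computed, as a sketch: Welch’s t statistic, which scales the difference between two group means by how noisy the samples are. Welch’s t is just one option chosen for illustration (the post doesn’t commit to a specific test), and the happiness ratings below are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means.

    A larger |t| means the group means are further apart relative to the
    variability (noise) within the samples.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical happiness ratings (0-10) from two small random samples.
puppy_lovers = [7, 8, 6, 9, 7, 8, 7, 6]
cat_lovers = [6, 7, 5, 7, 6, 8, 6, 5]

t = welch_t(puppy_lovers, cat_lovers)
print(f"t = {t:.2f}")
```

On its own, t is just a number; the next step is to turn it into a p-value.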
Here’s where things get tricky. Significance testing uses the test statistic to evaluate the assumption that groups do not differ, called the null hypothesis. Instead of asking “is there a difference?”, significance testing asks, “should we reject the assumption that there is no difference?”
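For each dataset, the test statistic has an associated probability value (aka “p”): the probability of getting a test statistic at least as extreme as the one observed if the null hypothesis were true. The p-value depends on several factors, including the size of the difference, the variability in the data, and the sample size. If the test statistic is extreme enough that p falls below a preset threshold (often 0.05), we reject the assumption that there is no difference. This doesn’t mean we “accept” that there is a difference.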
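The logic of a p-value can be sketched with a permutation test, swapped in here because it needs no lookup tables: if the null hypothesis is true, the “puppy” and “cat” labels are interchangeable, so we shuffle them many times and count how often a shuffled difference in means is at least as large as the one we actually observed. The ratings, the 0.05 threshold, and the function name are assumptions for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in group means.

    Under the null hypothesis the group labels don't matter, so we shuffle
    them and count how often the shuffled difference is at least as extreme
    as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(shuffled_a) / len(shuffled_a) - sum(shuffled_b) / len(shuffled_b))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts treats the observed labeling as one of the
    # possibilities, so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical happiness ratings (0-10).
puppy_lovers = [7, 8, 6, 9, 7, 8, 7, 6]
cat_lovers = [6, 7, 5, 7, 6, 8, 6, 5]

alpha = 0.05  # conventional threshold, chosen before looking at the data
p = permutation_p_value(puppy_lovers, cat_lovers)
decision = "reject" if p < alpha else "fail to reject"
print(f"p = {p:.3f}: {decision} the null hypothesis of no difference")
```

Note the wording in the output: “reject” or “fail to reject” the null hypothesis, never “accept” a difference.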
Many scientists (including me 😬) slip into “accepting” their hypothesis when discussing significance tests. Compare ✅ “Findings do not support no difference between puppy and cat lovers’ happiness” with ❌ “Findings suggest that puppy lovers are happier than cat lovers.” The first is how we should technically phrase p-value-derived results, yet the second is how they tend to be presented in research papers and the media.
Do p-values tell us anything about a relationship’s strength? No. A p-value depends on sample size as well as on the size of an effect, so with very large samples (e.g., census data, medical databases), even trivially small differences can come out “significant” while being practically flimsy. Strength (question 2 above) is better described with an effect size.
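Here’s a small sketch of the large-sample problem, assuming a simple two-group z-test with a known SD of 1. The hypothetical difference between groups stays fixed at a trivial 0.02 points (an effect size, Cohen’s d, of 0.02), but the p-value shrinks as the sample grows. The function name and numbers are illustrative only.

```python
import math
from statistics import NormalDist

def z_test_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means (z-test),
    assuming both groups share a known standard deviation `sd`."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * NormalDist().cdf(-abs(z))

# A tiny hypothetical difference: 0.02 points on a scale with SD 1.
mean_diff, sd = 0.02, 1.0
effect_size = mean_diff / sd  # Cohen's d: does not depend on sample size

for n in (100, 10_000, 100_000):
    p = z_test_p_value(mean_diff, sd, n)
    print(f"n per group = {n:>7,}: d = {effect_size:.2f}, p = {p:.2g}")
```

The effect size never changes; only the sample size does. That’s why reporting an effect size alongside p is the way to address question 2.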
Correlation vs. Causation
A correlation tests for a possible linear association between 2 factors: “linear” means the relationship follows a straight line (a given change in one factor goes with a consistent change in the other), and “association” means the 2 factors are related. Correlations can be positive (i.e., as one factor increases, the other increases) or negative (i.e., as one factor increases, the other decreases).
An example of a positive correlation is the association between the average outdoor temperature and the number of beach-related posts in your IG feed (i.e., as temperatures increase, #beachlife posts increase). An example of a negative correlation is the association between the average outdoor temperature and the number of hygge*-related posts in your IG feed (i.e., as temperatures decrease, #hyggeposts increase).
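Pearson’s correlation coefficient (r) is the most common way to put a number on a linear association. This sketch computes it from scratch on weekly numbers that mimic the IG examples above; the temperatures and post counts are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: ranges from -1 (perfect negative
    linear association) through 0 (no linear association) to +1 (perfect positive)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly data (made up for illustration).
avg_temp_c = [2, 8, 14, 20, 26, 31]
beach_posts = [1, 2, 5, 9, 14, 18]
hygge_posts = [15, 11, 7, 4, 2, 1]

r_beach = pearson_r(avg_temp_c, beach_posts)
r_hygge = pearson_r(avg_temp_c, hygge_posts)
print(f"temperature vs #beachlife: r = {r_beach:+.2f}")
print(f"temperature vs #hygge:     r = {r_hygge:+.2f}")

# r is symmetric: swapping the two factors gives the same value, so the
# coefficient itself says nothing about which factor (if either) drives the other.
print(f"#beachlife vs temperature: r = {pearson_r(beach_posts, avg_temp_c):+.2f}")
```

That symmetry is a handy reminder for the next point: a correlation alone can’t tell you direction or cause.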
Causation is whether one factor increases/decreases IN RESPONSE to an increase/decrease in another factor. Whereas correlation asks, “Are colder temperatures associated with an increase in #hygge posts?”, causation asks, “Do colder temperatures cause #hygge posts?”
In studies of human behavior & physiology, we often can’t fully determine causation. We instead rely on testing associations because the procedures needed to test causation in a human are often unethical.
Let’s say we want to know whether activity in a brain region (e.g., amygdala) causes a behavior. To fully determine causation, we need to stimulate/stop amygdala activity and measure increases/decreases in a behavior, respectively. To do so directly, we would probably need to cut into a person’s skull to alter amygdala function. You can imagine that not many people would undergo a risky procedure just for the science!
Instead, we rely on other methods: testing associations between behavior and physiology in humans (like running a correlation between neuroimaging data and behavior data), or getting a sense of causality from animal models, in which physiology can be manipulated using humane practices. Of course, each approach has its own limitations, such as whether findings from animal models apply to humans.
*Danish concept of coziness and enjoyment
Analysis of Variance
Let’s explore analysis of variance (aka ANOVA), a commonly used family of stats models. Despite the name, ANOVAs test for differences in group means by comparing sources of variance, and they come with several twists.
One-way ANOVA: Tests differences among >2 groups
One-way ANOVAs measure the ratio of variance between groups to variance within groups, helping us answer: do >2 groups differ on a factor? Say we’re interested in whether happiness differs among 3 groups after each group listened to a 2010s viral song repeatedly for 2 hrs. Group A listened to “Gangnam Style”, Group B to “Baby Shark”, and Group C to “What Does the Fox Say?” Using a one-way ANOVA, we test: does happiness after listening to a viral song differ based on song choice?
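The variance ratio behind a one-way ANOVA (the F statistic) can be computed by hand. This is a minimal sketch with invented happiness ratings for the three song groups; it returns F and its degrees of freedom but not a p-value, which would come from the F distribution (for example, `scipy.stats.f_oneway` reports both).

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group variance / within-group variance.

    Returns (F, (df_between, df_within)).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each value is from its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)      # df_between = k - 1
    ms_within = ss_within / (n_total - k)  # df_within = N - k
    return ms_between / ms_within, (k - 1, n_total - k)

# Hypothetical happiness ratings (0-10) after 2 hrs of each song.
gangnam = [7, 8, 6, 7, 8]
baby_shark = [3, 4, 2, 4, 3]
fox = [6, 5, 7, 6, 5]

f_stat, (df_b, df_w) = one_way_anova_f(gangnam, baby_shark, fox)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

An F near 1 means the group means vary about as much as you’d expect from noise alone; a much larger F suggests at least one group differs. The ANOVA doesn’t say which one, so follow-up comparisons are usually needed.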
Analysis of Covariance (ANCOVA): Tests group differences accounting for other factors
The “C” in ANCOVA signals that we are “covarying” for other factors that might influence an outcome separately from experimental factors. In our study, we might control for depression symptoms – a person with more severe depression might have blunted happiness ratings not related to song choice. Using an ANCOVA, we test: does happiness after listening to a viral song differ based on song choice when accounting for depression severity?
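One core piece of an ANCOVA is the covariate-adjusted group mean: each group’s average happiness is shifted to what it would be if that group had the overall average depression score, using a slope estimated within groups. This sketch computes only those adjusted means (not the full ANCOVA F test), assumes the depression-happiness slope is the same in every group, and uses made-up (depression, happiness) pairs.

```python
def ancova_adjusted_means(groups):
    """Covariate-adjusted group means, one building block of an ANCOVA.

    `groups` maps a group name to a list of (covariate, outcome) pairs.
    Assumes the covariate's slope is the same in every group (homogeneity
    of regression slopes), which a full ANCOVA would check.
    Returns (adjusted_means_dict, pooled_within_group_slope).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    # Pooled within-group slope of the outcome on the covariate.
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx

    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    # Shift each group's mean outcome to the overall average covariate level.
    adjusted = {
        name: mean([y for _, y in pairs]) - slope * (mean([x for x, _ in pairs]) - grand_x)
        for name, pairs in groups.items()
    }
    return adjusted, slope

# Hypothetical (depression score, happiness rating) pairs per song group.
data = {
    "Gangnam Style": [(4, 8), (10, 6), (6, 7), (12, 5), (8, 7)],
    "Baby Shark": [(14, 2), (6, 5), (10, 3), (16, 2), (12, 3)],
    "What Does the Fox Say?": [(8, 6), (4, 7), (10, 5), (6, 7), (12, 5)],
}

adjusted, slope = ancova_adjusted_means(data)
print(f"within-group slope: {slope:.3f} happiness points per depression point")
for song, adj_mean in adjusted.items():
    print(f"{song}: adjusted mean happiness = {adj_mean:.2f}")
```

In this made-up data, the Baby Shark group has the highest average depression, so its adjusted mean moves up a bit: some of its low happiness is attributed to depression rather than to the song. A full ANCOVA (e.g., a regression with song as a categorical predictor and depression as a covariate) would then test whether the adjusted means differ.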
Repeated Measures ANOVA (rmANOVA): Tests differences at 2+ time points
Repeated measures refers to data collected over 2+ time points for the same factor. Our example’s repeated measure would be happiness ratings before AND after the listening period. rmANOVA tests for changes in a factor within one group or among 2+ groups. We can test main effects (i.e., do groups differ in happiness when combining time points; does happiness change across time points when combining all groups) and, importantly, test for an interaction effect (i.e., do groups differ in how they change on a factor over time). The image above shows an interaction effect. Groups A and B do not differ on happiness before music. Group A’s happiness remains stable after 2 hrs of Gangnam Style, but Group B’s happiness significantly decreases after 2 hrs of Baby 🦈.
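For the simplest repeated-measures case (2 groups, 2 time points), the group × time interaction boils down to a question about change scores: did one group change more than the other? In that 2 × 2 mixed design, the interaction F from an rmANOVA equals the square of an independent-samples t-test on each person’s (after − before) change. The sketch below uses hypothetical before/after ratings loosely matching the pattern described above (Group A stable, Group B dropping); designs with more groups or time points need the full rmANOVA.

```python
import math

def interaction_t(group_a_pre_post, group_b_pre_post):
    """Group x time interaction for 2 groups measured at 2 time points.

    For this 2 x 2 mixed design, the interaction test is equivalent to an
    independent-samples t-test on each person's change score (post - pre);
    the rmANOVA interaction F equals this t squared.
    Returns (mean change in A, mean change in B, t).
    """
    change_a = [post - pre for pre, post in group_a_pre_post]
    change_b = [post - pre for pre, post in group_b_pre_post]
    n_a, n_b = len(change_a), len(change_b)
    mean_a, mean_b = sum(change_a) / n_a, sum(change_b) / n_b
    # Pooled variance of the change scores (Student's t, equal variances assumed).
    ss = sum((c - mean_a) ** 2 for c in change_a) + sum((c - mean_b) ** 2 for c in change_b)
    pooled_var = ss / (n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_a, mean_b, t

# Hypothetical (before, after) happiness ratings for the same people.
group_a = [(7, 7), (6, 7), (8, 8), (7, 6), (6, 6)]  # 2 hrs of Gangnam Style
group_b = [(7, 4), (6, 3), (8, 5), (7, 5), (6, 2)]  # 2 hrs of Baby Shark

mean_a, mean_b, t = interaction_t(group_a, group_b)
print(f"mean change: Group A = {mean_a:+.1f}, Group B = {mean_b:+.1f}")
print(f"interaction: t = {t:.2f}, F(1, {len(group_a) + len(group_b) - 2}) = {t ** 2:.1f}")
```

Working with change scores makes the idea of an interaction intuitive: the question isn’t whether the groups differ overall, but whether they changed differently.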
References
Wild, C. J., Utts, J. M., & Horton, N. J. (2018). What is statistics? In International Handbook of Research in Statistics Education (pp. 5-36). Springer, Cham.
Mitchell, S. J., Bernier, M., Mattison, J. A., Aon, M. A., Kaiser, T. A., Anson, R. M., ... & de Cabo, R. (2018). Daily fasting improves health and survival in male mice independent of diet composition and calories. Cell Metabolism.
Australian Bureau of Statistics.
Yale University. Statistical Topics: “The Normal Distribution.”
Buchinsky, F. J., & Chadha, N. K. (2017). To P or not to P: backing Bayesian statistics. Otolaryngology–Head and Neck Surgery, 157(6), 915-918.
Lew, M. J. (2013). To P or not to P: on the evidential nature of P-values and their place in scientific inference. arXiv preprint arXiv:1311.0081.
Christenson, P. (1995). To p or not to p. Journal of Child and Adolescent Psychiatric Nursing, 8(1), 42.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133.
Lin, M., Lucas Jr, H. C., & Shmueli, G. (2013). Research commentary—too big to fail: large samples and the p-value problem. Information Systems Research, 24(4), 906-917.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350.
Altman, D. G. Practical Statistics for Medical Research. Chapman & Hall/CRC.
New England Complex Systems Institute. Concepts: “Linear and Nonlinear.”
Swinscow, T. D. V. (1997). Statistics at Square One (9th ed., M. J. Campbell, Ed.). University of Southampton; BMJ Publishing Group.
Kim, H. Y. (2014). Analysis of variance (ANOVA) comparing means of more than two groups. Restorative Dentistry & Endodontics, 39(1), 74-77.
SPSS Tutorials. “Variance – What is it?”
Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.
Weinfurt, K. P. (2000). Repeated measures analysis: ANOVA, MANOVA, and HLM. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding MORE multivariate statistics (pp. 317-361). Washington, DC, US: American Psychological Association.