P-Value Calculator
Z-test, T-test, and Chi-Square tests
You ran a statistical test and got a number. Now you need to know what it means.
The p-value tells you whether your result is statistically significant, meaning it would be surprising if there were no real effect. It's also one of the most misunderstood numbers in all of statistics.
This calculator handles three common hypothesis tests:
- Z-tests
- T-tests (one-sample and two-sample)
- Chi-Square tests
Enter your test statistic, tell the calculator what kind of test you ran, and it gives you the p-value plus a plain-English interpretation.
What a P-Value Actually Means
Here's what a p-value is, and what it is not.
A p-value tells you: assuming the null hypothesis is true, how likely is it to see a result at least as extreme as yours?
It does NOT tell you:
- The probability that the null hypothesis is true
- The probability that your result happened by chance
- How large or important your effect is
This distinction matters. A p-value of 0.03 means "if there were really no effect, you'd see data this extreme or more extreme only 3% of the time." It does not mean "there's a 3% chance I'm wrong."
Think of it like a smoke alarm. A p-value below your threshold (usually 0.05) is the alarm going off. It tells you something worth investigating is probably happening. It doesn't tell you how bad the fire is.
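One way to see the definition in action is to simulate it. The sketch below is a minimal Python illustration (standard library only). It assumes the test statistic follows a standard normal distribution when the null hypothesis is true, as in the Z-test below. The function name `simulated_two_tailed_p` is just illustrative. The code draws many z-scores from a world where nothing is happening and counts how often they are at least as extreme as the observed one.

```python
import random

def simulated_two_tailed_p(observed_z, n_sims=200_000, seed=1):
    """Estimate a two-tailed p-value by simulating the null hypothesis.

    Assumption: under the null, the test statistic is standard normal.
    """
    rng = random.Random(seed)
    # Count simulated "no effect" results at least as extreme as the observed one.
    extreme = sum(
        1 for _ in range(n_sims) if abs(rng.gauss(0.0, 1.0)) >= abs(observed_z)
    )
    return extreme / n_sims

# Should land close to 0.0455; the exact value depends on the seed.
print(simulated_two_tailed_p(2.00))
```

The fraction it prints is the "how often would I see data this extreme if there were no effect" number. It is not the probability that the null is true.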
How the Calculation Works
The p-value is calculated from your test statistic and the probability distribution it follows under the null hypothesis.
Z-Test P-Value
Used when you have a large sample (n ≥ 30) or a known population standard deviation.
One-tailed (result in the predicted direction): p = 1 - Φ(|z|)
Two-tailed: p = 2 × (1 - Φ(|z|))
Where Φ is the standard normal CDF.
Example: You test whether a coin is fair. After 100 flips, you get 60 heads. Your z-score is 2.00.
- Two-tailed p-value: 2 × (1 - Φ(2.00)) = 2 × 0.02275 = 0.0455
- Since 0.0455 < 0.05, you reject the null hypothesis at α = 0.05. The data are evidence that the coin is biased.
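The Z-test formulas can be checked with Python's standard library, where `statistics.NormalDist().cdf` plays the role of Φ. This is a minimal sketch. The coin's z-score assumes the usual normal approximation z = (heads − n·p₀) / √(n·p₀·(1 − p₀)) with no continuity correction. The helper name `z_test_p_value` is illustrative.

```python
import math
from statistics import NormalDist

def z_test_p_value(z, tails=2):
    """P-value for a z statistic under a standard normal null distribution.

    tails=1 returns 1 - Φ(|z|), which is only valid when the result falls
    in the direction you predicted before collecting data.
    """
    upper_tail = 1.0 - NormalDist().cdf(abs(z))  # 1 - Φ(|z|)
    if tails == 1:
        return upper_tail
    if tails == 2:
        return 2.0 * upper_tail
    raise ValueError("tails must be 1 or 2")

# Coin example: 60 heads in 100 flips, null hypothesis P(heads) = 0.5.
n, heads, p0 = 100, 60, 0.5
z = (heads - n * p0) / math.sqrt(n * p0 * (1 - p0))  # (60 - 50) / 5 = 2.0
print(round(z, 2), round(z_test_p_value(z), 4))  # 2.0 0.0455
```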
T-Test P-Value
Used for smaller samples or unknown population standard deviation. The t-distribution has heavier tails than the normal, reflecting more uncertainty with less data.
One-tailed (result in the predicted direction): p = 1 - T_CDF(|t|, df)
Two-tailed: p = 2 × (1 - T_CDF(|t|, df))
The degrees of freedom (df) affect the shape of the distribution. For a one-sample t-test, df = n - 1.
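If you start from raw data rather than a t-value, the one-sample statistic is commonly computed as t = (x̄ − μ₀) / (s / √n), with df = n − 1. The sketch below assumes that standard formula. It uses hypothetical sleep data, not data from the example that follows, and the function name `one_sample_t` is illustrative.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for a one-sample t-test of H0: population mean = mu0."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1  # df = n - 1

# Hypothetical sleep hours for 5 people, testing H0: mean = 7 hours.
hours = [6.5, 7.0, 8.0, 7.5, 8.5]
t, df = one_sample_t(hours, 7.0)
print(f"t = {t:.3f}, df = {df}")  # t = 1.414, df = 4
```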
Example: You measure sleep hours for 16 people and get t = 2.45, df = 15.
- Two-tailed p-value ≈ 0.027
- Significant at α = 0.05 but not at α = 0.01.
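Python's standard library has no t-distribution CDF, so the sketch below computes T_CDF by integrating the Student's t density numerically with Simpson's rule. That approach is a swapped-in technique chosen to stay dependency-free, not what any particular calculator uses. It is accurate for p-values like those in this article, but very small p-values lose precision because of the `0.5 - area` subtraction. The function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|): Simpson's rule on [0, |t|], then symmetry about 0."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 0.5 - area_0_to_b

def t_test_p_value(t, df, tails=2):
    upper = t_upper_tail(t, df)
    if tails == 1:
        return upper
    if tails == 2:
        return 2 * upper
    raise ValueError("tails must be 1 or 2")

# Sleep example: t = 2.45 with df = 15.
print(round(t_test_p_value(2.45, 15), 3))  # 0.027
```

For comparison, treating 2.45 as a z-score would give a two-tailed p-value of about 0.014, roughly half as large. That gap is the heavier-tail effect described above.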
Chi-Square Test P-Value
Used to test whether observed frequencies differ from expected frequencies, or whether two categorical variables are independent. Always right-tailed.
p = 1 - χ²_CDF(χ², df)
Example: You survey 200 people about brand preference across 3 brands. Your chi-square statistic is 8.5, df = 2.
- p-value ≈ 0.014
- Significant at α = 0.05. The preference distribution is not uniform.
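The chi-square p-value can be sketched with the standard library by using the identity χ²_CDF(x, df) = P(df/2, x/2), where P is the regularized lower incomplete gamma function. Here P is computed from its power series. This is a minimal sketch under stated assumptions, intended for moderate statistics like the ones here. Very large statistics (in the thousands) can overflow, and tiny p-values lose precision. The function names and the observed counts in the usage line are hypothetical, not taken from the survey example.

```python
import math

def regularized_lower_gamma(s, x, tol=1e-15, max_terms=10_000):
    """P(s, x) = γ(s, x) / Γ(s), via the series x^s e^(-x) Σ x^k / (s(s+1)...(s+k)) / Γ(s)."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for k in range(1, max_terms):
        term *= x / (s + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))

def chi_square_p_value(chi2, df):
    """Right-tailed p-value: 1 - χ²_CDF(chi2, df)."""
    return 1.0 - regularized_lower_gamma(df / 2, chi2 / 2)

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Brand example: chi-square = 8.5 with df = 2.
print(round(chi_square_p_value(8.5, 2), 3))  # 0.014

# Hypothetical counts for 200 people across 3 brands, uniform expectation.
stat = chi_square_statistic([80, 70, 50], [200 / 3] * 3)
print(stat, chi_square_p_value(stat, 2))
```

With df = 2 the formula reduces to p = e^(−χ²/2), so 8.5 gives e^(−4.25) ≈ 0.014, matching the example.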
How to Interpret Your P-Value
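The significance level (α) is your threshold: the largest p-value at which you will still reject the null hypothesis. It is most often set at 0.05, and it should be chosen before you see the data. As a rough guide, p-values are commonly described like this: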
| P-Value | Common Interpretation |
|---|---|
| p < 0.001 | Very strong evidence against null |
| 0.001 ≤ p < 0.01 | Strong evidence |
| 0.01 ≤ p < 0.05 | Moderate evidence |
| 0.05 ≤ p < 0.10 | Weak evidence |
| p ≥ 0.10 | Little to no evidence |
If p ≤ α: Reject the null hypothesis. Your result is statistically significant.
If p > α: Fail to reject the null. This does not mean the null is true; it means you don't have enough evidence to rule it out.
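The decision rule and the rough labels from the table above can be written as a small helper. This is a sketch only. The function name `interpret_p_value` is hypothetical, and the labels simply restate the table's conventions.

```python
def interpret_p_value(p, alpha=0.05):
    """Apply the p <= alpha decision rule plus the table's rough evidence labels."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        label = "Very strong evidence against null"
    elif p < 0.01:
        label = "Strong evidence"
    elif p < 0.05:
        label = "Moderate evidence"
    elif p < 0.10:
        label = "Weak evidence"
    else:
        label = "Little to no evidence"
    decision = "Reject the null" if p <= alpha else "Fail to reject the null"
    return decision, label

print(interpret_p_value(0.0455))             # ('Reject the null', 'Moderate evidence')
print(interpret_p_value(0.027, alpha=0.01))  # ('Fail to reject the null', 'Moderate evidence')
```

Note that p = 0.05 exactly is rejected at α = 0.05 (because the rule is p ≤ α), even though the table files it under "Weak evidence".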
One more thing: statistical significance is not the same as practical significance. A study with 10,000 participants might get p = 0.001 for an effect so tiny it doesn't matter in practice. Always consider the effect size alongside the p-value.
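To see why sample size matters, the sketch below holds a tiny effect fixed and only changes n. It assumes a one-sample z-test with a known standard deviation, so z = d·√n, where d is the standardized difference (mean − μ₀) / σ. The effect size of 0.05 standard deviations and the function name `one_sample_z_p` are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_p(effect_size_d, n):
    """Two-tailed p-value for a one-sample z-test where z = d * sqrt(n)."""
    z = effect_size_d * math.sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny effect (0.05 standard deviations), different sample sizes.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: p = {one_sample_z_p(0.05, n):.2g}")
```

The effect never changes, but p drops from about 0.62 at n = 100 to well below 0.001 at n = 10,000. The p-value alone can't tell you whether an effect matters.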
How to Calculate P-Values in Excel
Excel has built-in functions for each test type.
Z-Test (two-tailed):
=2*(1-NORM.S.DIST(ABS(z),TRUE))
NORM.S.DIST(z, TRUE) returns the cumulative probability up to z. Subtract from 1 to get the upper tail, then multiply by 2 for two-tailed.
One-tailed Z-test:
=1-NORM.S.DIST(ABS(z),TRUE)
T-Test (two-tailed):
=T.DIST.2T(ABS(t), df)
Or for one-tailed (right tail):
=T.DIST.RT(t, df)
Note: T.DIST.2T returns a #NUM! error for negative t-values, which is why the formula wraps t in ABS. It returns the two-tailed p-value directly.
Chi-Square (right-tailed):
=CHISQ.DIST.RT(chi2, df)
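Gotcha with older Excel versions: NORM.S.DIST, T.DIST.2T, T.DIST.RT, CHISQ.DIST.RT, and T.TEST were introduced in Excel 2010. In older versions, use NORMSDIST(z) in place of NORM.S.DIST(z,TRUE), TDIST(ABS(t), df, 2) for a two-tailed t p-value (or TDIST(ABS(t), df, 1) for one tail), CHIDIST(chi2, df), and TTEST(array1, array2, tails, type). These older functions still work in current Excel too.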
Running a full T-test from raw data:
=T.TEST(array1, array2, tails, type)
Where tails is 1 or 2, and type is 1 (paired), 2 (two-sample equal variance), or 3 (two-sample unequal variance). This returns the p-value directly without you needing to calculate the t-statistic separately.
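To cross-check a T.TEST result outside Excel, the sketch below implements the two-sample, equal-variance (pooled) t-test in standard-library Python. It is intended to mirror T.TEST(array1, array2, 2, 2), assuming the pooled formula t = (x̄₁ − x̄₂) / √(s²ₚ(1/n₁ + 1/n₂)) with df = n₁ + n₂ − 2. It does not cover the paired (type 1) or unequal-variance (type 3) versions. The t p-value uses Simpson's-rule integration of the t density, a dependency-free stand-in. The function names and the data are hypothetical.

```python
import math
import statistics

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value from Student's t, via Simpson's rule on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * (0.5 - total * h / 3)

def pooled_t_test(a, b):
    """Return (t, df, two-tailed p) for an equal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df, t_two_tailed_p(t, df)

group_a = [5.1, 4.8, 6.0, 5.5, 5.9]  # hypothetical measurements
group_b = [4.2, 4.9, 4.4, 5.0, 4.1]
t, df, p = pooled_t_test(group_a, group_b)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```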
Common Mistakes to Avoid
Confusing p = 0.05 with "5% chance you're wrong." The p-value is not the probability that the null hypothesis is true. A p-value of 0.03 means data at least this extreme would occur 3% of the time if the null were true, not that there's a 97% chance your hypothesis is correct.
Using a one-tailed test to get a smaller p-value. A one-tailed test is only valid if you had a directional hypothesis before collecting data. Switching to one-tailed after seeing results because it crosses the significance threshold is p-hacking.
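Ignoring degrees of freedom for T-tests. Using the wrong df changes the p-value, sometimes substantially. For a one-sample t-test, df = n - 1. For an independent two-sample t-test that assumes equal variances, df = n₁ + n₂ - 2. The unequal-variance (Welch) version uses an approximate df that depends on both sample sizes and both sample variances.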
Treating p > 0.05 as "no effect." Failing to reject the null hypothesis doesn't prove the null is true. It just means your data didn't provide enough evidence. Underpowered studies often miss real effects.
Using a Z-test when the sample is small. For small samples (n < 30) without a known population standard deviation, use a T-test. The Z-test underestimates p-values in small samples, making results look more significant than they are.
Related Calculators
Confidence Interval Calculator: closely related to p-values. If a 95% confidence interval excludes the null value, the two-tailed p-value from the corresponding test is less than 0.05. The two approaches are complementary.
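The interval/p-value link can be checked numerically. The sketch below assumes a z-based interval and z-test that share the same standard error, which is what makes them agree exactly. A typical Wald interval uses the standard error at the estimate instead, so the match is then only approximate. The function name is hypothetical, and the coin numbers reuse the earlier example.

```python
from statistics import NormalDist

def z_interval_and_p(estimate, std_error, null_value, level=0.95):
    """Return a z-based confidence interval and the matching two-tailed p-value."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lo, hi = estimate - z_crit * std_error, estimate + z_crit * std_error
    z = (estimate - null_value) / std_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (lo, hi), p

# Coin example: 60% heads, null value 0.5, standard error 0.05 under the null.
(lo, hi), p = z_interval_and_p(0.60, 0.05, 0.50)
excludes_null = not (lo <= 0.50 <= hi)
print(excludes_null, p < 0.05)  # True True
```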
Chi-Square Test: specifically for categorical data and contingency tables. The chi-square statistic and its p-value tell you whether the association between two categorical variables is stronger than you would expect by chance if they were independent.
FAQ
What is a good p-value?
"Good" depends on your field and what's at stake. Psychology and social sciences commonly use 0.05. Medical research often uses 0.01 or stricter. Particle physics uses the much stricter "5-sigma" standard, a one-tailed p-value of roughly 0.0000003 (about 1 in 3.5 million). The right threshold is the one you decided on before running the test.
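"Sigma" thresholds convert to p-values through the normal tail: a result k standard deviations above the null has a one-tailed p-value of 1 − Φ(k). The sketch below uses `math.erfc`, which avoids the precision loss of computing 1 − Φ(k) directly for large k. The function name is hypothetical. For 5 sigma it should print about 2.87e-07.

```python
import math

def upper_tail_for_sigma(k):
    """One-tailed p-value for a result k standard deviations above the null: 1 - Φ(k)."""
    # 1 - Φ(k) = 0.5 * erfc(k / sqrt(2)); erfc stays accurate far into the tail.
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (2, 3, 5):
    print(f"{k} sigma: p = {upper_tail_for_sigma(k):.2e}")
```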
What does a p-value of 0.05 mean exactly?
It means that if the null hypothesis were true, you would see data at least this extreme about 5% of the time. It's a measure of how surprising your data is under the assumption that nothing is happening.
Can I use T.TEST in Excel instead of calculating manually?
Yes. =T.TEST(array1, array2, 2, 2) runs a two-sample t-test and returns the p-value directly. The third argument is the number of tails (1 or 2), and the fourth is the test type (1 = paired, 2 = equal variance, 3 = unequal variance).
What's the difference between one-tailed and two-tailed?
A two-tailed test looks for any difference (higher or lower). A one-tailed test looks for a difference in a specific direction. When the result falls in the predicted direction, the one-tailed p-value is half the two-tailed one. When it falls in the opposite direction, the one-tailed p-value is above 0.5. Only use a one-tailed test when your hypothesis was directional from the start.
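The direction dependence is easy to see with a z statistic. This minimal sketch uses `statistics.NormalDist` from the standard library. The function name `z_p_values` is illustrative.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (two-tailed, upper-tail, lower-tail) p-values for a z statistic."""
    phi = NormalDist().cdf(z)
    return 2 * min(phi, 1 - phi), 1 - phi, phi

two, upper, lower = z_p_values(2.0)
print(round(two, 4), round(upper, 4), round(lower, 4))  # 0.0455 0.0228 0.9772
```

With z = 2.0, a "greater than" hypothesis gets half the two-tailed p-value (0.0228). A "less than" hypothesis gets 0.9772, which is no evidence at all.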
What if my p-value is exactly 0.05?
Technically, 0.05 ≤ 0.05, so you'd reject the null at α = 0.05. In practice, a result right at the boundary warrants caution. Consider whether the finding replicates and what the effect size looks like.
Why does the Chi-Square test only have one tail?
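The chi-square statistic is never negative (it's a sum of squared differences, each divided by an expected count), and the question is whether it is unusually large. Values near zero mean the observed counts match the expected ones closely, so only the right tail counts as evidence against the null. There's no "negative chi-square" to worry about, so the test is inherently one-directional.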