Chi-Square Calculator
Goodness of Fit & Independence Test
Use the calculator above to run either a goodness-of-fit test or a test of independence.
Enter your observed and expected frequencies, pick a significance level, and click Calculate.
You’ll get the chi-square statistic, degrees of freedom, p-value, and a plain-English verdict on the null hypothesis.
How the Chi-Square Test Works
The chi-square test asks whether the differences between what you observed and what you expected are small enough to be explained by chance or large enough to suggest a real effect.
It works by measuring the gap between observed frequencies (O) and expected frequencies (E) across categories.
The larger those gaps, the bigger the chi-square statistic and the smaller the p-value. If the p-value falls below your chosen significance level (alpha), the result is statistically significant.
Goodness of Fit
Use this when you have one categorical variable and want to test whether your sample matches a specific distribution.
For example, a teacher expects her class grades to follow the distribution 30% A, 40% B, 20% C, 10% D.
After the exam, she counts the actual results and wants to know if they match expectations.
Formula:
Chi-square = Sum of [ (Observed - Expected)^2 / Expected ] across all categories
Degrees of freedom: k – 1 (where k is the number of categories)
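As a minimal sketch of the formula above, the Python below turns hypothesized proportions into expected counts and sums the per-category terms. The function name `goodness_of_fit` and the observed grade counts are hypothetical, chosen only for illustration; the 30/40/20/10 split is the teacher's distribution from the example.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi_square, degrees_of_freedom) for a goodness-of-fit test.

    observed    -- list of raw counts, one per category
    proportions -- hypothesized share of each category (should sum to 1)
    """
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    total = sum(observed)
    # Expected count = hypothesized proportion * sample size
    expected = [p * total for p in proportions]
    # Sum of (O - E)^2 / E across all categories
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1


# Hypothetical exam results for a class of 50 (counts of A, B, C, D)
observed_grades = [12, 22, 11, 5]
chi_square, df = goodness_of_fit(observed_grades, [0.30, 0.40, 0.20, 0.10])
print(round(chi_square, 3), df)
```

Note that the function takes proportions and converts them to counts itself, so the statistic is always computed on raw frequencies.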
Test for Independence
Use this when you have two categorical variables and want to test whether they are related. For example, does a customer’s preferred product color depend on their age group?
You set up a contingency table of observed frequencies. For each cell, the calculator computes the expected frequency:
Expected(i,j) = (Row i total × Column j total) / Grand total
It then applies the same chi-square formula across all cells.
Degrees of freedom: (rows – 1) × (columns – 1)
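The independence calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's actual code: the function name `independence_test` and the age-by-color counts are hypothetical.

```python
def independence_test(table):
    """Return (chi_square, degrees_of_freedom, expected) for a contingency table.

    table -- list of rows, each a list of observed counts
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected(i, j) = (row i total * column j total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical counts: rows = age groups (under 30, 30 and over),
# columns = preferred color (red, blue, green)
observed = [
    [30, 20, 10],
    [20, 30, 40],
]
chi_square, df, expected = independence_test(observed)
print(round(chi_square, 3), df)
print(expected)
```

With these made-up counts, every column total is 50, so each row's expected counts are simply the row total split three ways.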
Practical Example
A candy company claims its bags contain equal amounts of five colors: red, blue, green, yellow, and orange (20% each).
You buy a bag with 100 candies and count: Red 18, Blue 24, Green 19, Yellow 22, Orange 17.
Expected count for each: 20
Chi-square contributions:
- Red: (18-20)^2 / 20 = 0.2
- Blue: (24-20)^2 / 20 = 0.8
- Green: (19-20)^2 / 20 = 0.05
- Yellow: (22-20)^2 / 20 = 0.2
- Orange: (17-20)^2 / 20 = 0.45
Total chi-square = 1.7
Degrees of freedom = 5 – 1 = 4
P-value = 0.791
With p = 0.791, well above alpha = 0.05, you fail to reject the null hypothesis. The color distribution is consistent with the company’s claim.
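If you want to check the candy numbers by hand, here is a small Python sketch. The helper name `chi_square_sf_even_df` is made up for this example, and it relies on a closed-form right-tail formula that only holds when the degrees of freedom are even (as they are here, with df = 4). For odd df you would need a statistics library instead.

```python
import math


def chi_square_sf_even_df(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only handles even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


observed = {"Red": 18, "Blue": 24, "Green": 19, "Yellow": 22, "Orange": 17}
expected = 20  # 100 candies * 20% per color

# Per-color contribution (O - E)^2 / E
contributions = {color: (o - expected) ** 2 / expected for color, o in observed.items()}
chi_square = sum(contributions.values())
df = len(observed) - 1
p_value = chi_square_sf_even_df(chi_square, df)

print(contributions)
print(round(chi_square, 2), df, round(p_value, 3))
```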
How to Interpret Your Results
The P-Value
The p-value is the probability of getting a chi-square statistic this large (or larger) by random chance, assuming your null hypothesis is true.
- p < alpha (e.g., p < 0.05): Reject the null hypothesis. The difference between observed and expected is statistically significant.
- p >= alpha: Fail to reject the null hypothesis. The data is consistent with what you expected.
A common threshold is alpha = 0.05, but the right level depends on the context. Medical studies often use 0.01 or 0.001 to reduce the chance of a false positive.
“Fail to Reject” is Not the Same as “Accept”
A non-significant result does not prove the null hypothesis is true.
It only means the data did not provide enough evidence against it. With a small sample, real differences often go undetected.
Interpreting the Independence Test
If you reject the null hypothesis of independence, you have evidence that the two variables are associated.
The chi-square test does not tell you how strong that association is or what form it takes. For that, look at effect size measures like Cramér's V.
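Cramér's V is easy to compute once you have the chi-square statistic: divide by the sample size times the smaller of (rows – 1) and (columns – 1), then take the square root. The sketch below assumes you already have those numbers; the function name `cramers_v` and the input values are hypothetical.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi_square / (n * min(rows - 1, cols - 1)))."""
    smaller_dim = min(n_rows - 1, n_cols - 1)
    if n <= 0 or smaller_dim < 1:
        raise ValueError("need n > 0 and at least a 2x2 table")
    return math.sqrt(chi_square / (n * smaller_dim))


# Hypothetical result: chi-square of 12.5 from a 3x4 table with 200 observations
v = cramers_v(12.5, n=200, n_rows=3, n_cols=4)
print(round(v, 3))
```

V runs from 0 (no association) to 1 (perfect association), so unlike the chi-square statistic it can be compared across tables of different sizes.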
The Breakdown Table
Each row in the breakdown table shows the contribution of one category to the total chi-square statistic.
Categories with high (O-E)^2/E values are the ones driving the result. If you get a significant p-value, check this table to see where the big gaps are.
How to Do This in Excel
Excel has three chi-square functions you’ll use regularly.
CHISQ.TEST
=CHISQ.TEST(actual_range, expected_range)
This is the most direct option. Point it at your range of observed values and your range of expected values, and it returns the p-value.
For the candy example, if your observed counts are in B2:B6 and expected counts are in C2:C6:
=CHISQ.TEST(B2:B6, C2:C6)
Returns 0.7907. That’s the p-value.
Gotcha: CHISQ.TEST returns only the p-value, not the chi-square statistic itself. If you need the statistic, compute it directly with =SUMPRODUCT((B2:B6-C2:C6)^2/C2:C6), or recover it from the p-value with CHISQ.INV.RT.
Gotcha #2: The two ranges must have exactly the same dimensions, or you get a #N/A error. If your expected values are percentages (like 0.20, 0.20…), convert them to counts first by multiplying by your total sample size.
CHISQ.DIST.RT
=CHISQ.DIST.RT(x, deg_freedom)
This returns the right-tail probability for a known chi-square value. Use it when you’ve already calculated the chi-square statistic and need the p-value.
For the candy example:
=CHISQ.DIST.RT(1.7, 4)
Returns 0.7907.
CHISQ.INV.RT
=CHISQ.INV.RT(probability, deg_freedom)
This gives you the critical value for a given alpha level. If your chi-square statistic exceeds this number, you reject the null hypothesis.
=CHISQ.INV.RT(0.05, 4)
Returns 9.488. That’s the threshold at alpha = 0.05 with 4 degrees of freedom.
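To see what the critical value means, you can search for the point where the right-tail probability equals alpha. The Python sketch below does that by bisection. It is not how Excel computes the value, the names `right_tail` and `critical_value` are made up, and the right-tail formula used only works for even degrees of freedom.

```python
import math


def right_tail(x, df):
    """P(X >= x) for chi-square with even df (closed form; odd df not handled)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def critical_value(alpha, df, upper=1000.0, tol=1e-9):
    """Find x such that right_tail(x, df) is approximately alpha, by bisection."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    low, high = 0.0, upper
    while high - low > tol:
        mid = (low + high) / 2
        # The right-tail probability falls as x grows, so narrow toward alpha
        if right_tail(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(round(critical_value(0.05, 4), 3))
```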
Note on older Excel versions: These functions replaced CHITEST, CHIDIST, and CHIINV starting in Excel 2010. The old names still work in most versions but are marked as deprecated. Use the newer .DIST, .TEST, and .INV naming convention.
Common Mistakes to Avoid
Using percentages or proportions instead of counts. Chi-square needs raw frequencies (actual counts). If you enter percentages like 20%, 35%, 45%, your result will be meaningless. Convert proportions to expected counts by multiplying by the sample size.
Ignoring small expected frequencies. If any expected cell value is below 5, the chi-square approximation becomes unreliable. The calculator warns you when this happens. For 2×2 tables with small counts, use Fisher’s exact test instead.
Confusing observed with expected. For the goodness of fit test, the expected values come from your theory or hypothesis, not from a second sample. If you put two different observed samples in the test, you’re running the wrong analysis.
Using chi-square on continuous data. Chi-square is for counts of categorical outcomes. If your data is numerical (heights, temperatures, scores), either use a different test or bin the data into categories first. Keep in mind that the choice of bins affects the result.
Treating a non-significant result as proof of independence. A p-value above alpha does not mean the variables are unrelated. It means you do not have enough statistical evidence to claim they are related. With a small sample, real relationships often go undetected.
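Two of the mistakes above, entering proportions instead of counts and overlooking small expected frequencies, can be caught with a quick check before running the test. The sketch below is one possible way to do it. The function name `check_inputs` and its messages are hypothetical and are not the calculator's own warning logic.

```python
def check_inputs(observed, expected, min_expected=5):
    """Return a list of warnings about common chi-square input problems."""
    problems = []
    if len(observed) != len(expected):
        problems.append("observed and expected must have the same number of values")
    # Raw counts are non-negative whole numbers; fractions suggest proportions
    if any(o < 0 or o != int(o) for o in observed):
        problems.append("observed values should be whole-number counts, not proportions")
    if any(e <= 0 for e in expected):
        problems.append("expected values must be positive")
    elif any(e < min_expected for e in expected):
        problems.append(f"some expected counts are below {min_expected}")
    return problems


# Clean candy data: no warnings
print(check_inputs([18, 24, 19, 22, 17], [20] * 5))
# Proportions entered by mistake: two warnings
print(check_inputs([0.20, 0.35, 0.45], [0.30, 0.30, 0.40]))
```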
Related Statistical Tests
Fisher’s Exact Test is the right choice when sample sizes are small or expected frequencies drop below 5. It computes the exact probability rather than relying on the chi-square approximation.
G-Test (Likelihood Ratio Test) is an alternative to chi-square for goodness of fit and independence, based on log-likelihoods instead of squared differences. For large samples, both give similar results.
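To make the comparison concrete, the sketch below computes both statistics for the candy counts from the practical example, using the standard G formula, G = 2 × Sum of [ Observed × ln(Observed / Expected) ]. The function names are made up for illustration.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi_square_statistic(observed, expected):
    """Pearson chi-square = sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 24, 19, 22, 17]  # candy counts from the practical example
expected = [20] * 5

g = g_statistic(observed, expected)
chi_square = chi_square_statistic(observed, expected)
print(round(g, 3), round(chi_square, 3))
```

The two values land close together here, and both are compared against the same chi-square distribution with the same degrees of freedom.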
McNemar’s Test is for paired categorical data. If you surveyed the same group before and after a treatment, chi-square is the wrong tool. The chi-square test assumes observations are independent, and paired data breaks that assumption.
Frequently Asked Questions
Can the chi-square statistic be negative?
No. Each term in the formula is (O-E)^2/E, which is always zero or positive. A chi-square of zero would mean every observed frequency exactly equals its expected value, which almost never happens in real data.
What is a good chi-square value?
There is no universal “good” value. What matters is whether the chi-square statistic exceeds the critical value for your degrees of freedom and alpha level. At alpha = 0.05, a chi-square of 10 is significant with 2 degrees of freedom (critical value 5.991) but completely unremarkable with 15 (critical value 24.996).
How many observations do I need?
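A commonly cited rule is that each expected cell frequency should be at least 5. For a goodness of fit test with 5 equally likely categories, that means at least 25 total observations. For an independence test with a 3×4 table (12 cells), you’d want at least 60 total observations, and more if the row or column totals are uneven. These are minimums: when some categories are rare, the smallest expected count sets the requirement. Larger samples give more reliable results.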
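For a goodness of fit test, you can work out the minimum sample size from the rarest category: you need enough observations that even its expected count reaches 5. A minimal sketch, assuming the rule of 5 and using a hypothetical helper named `min_sample_size` (fractions are used to avoid floating-point rounding):

```python
from fractions import Fraction
import math


def min_sample_size(proportions, min_expected=5):
    """Smallest n for which every expected count n * p is at least min_expected."""
    smallest = min(proportions)
    if smallest <= 0:
        raise ValueError("every proportion must be positive")
    return math.ceil(min_expected / smallest)


# Five equally likely categories, as in the candy example
equal_five = [Fraction(1, 5)] * 5
# The teacher's grade distribution: 30% A, 40% B, 20% C, 10% D
teacher = [Fraction(30, 100), Fraction(40, 100), Fraction(20, 100), Fraction(10, 100)]

print(min_sample_size(equal_five))
print(min_sample_size(teacher))
```

The grade distribution needs more observations than its four categories alone would suggest, because the 10% category is the bottleneck.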
What is the difference between goodness of fit and the independence test?
Goodness of fit tests one variable against a known expected distribution. The independence test looks at two variables simultaneously in a contingency table to see if knowing one variable tells you anything about the other.
Does chi-square tell me which groups are different?
No. A significant chi-square tells you that at least one category differs from expected (or, for the independence test, that the two variables are associated somewhere in the table), but it does not identify where. Look at the breakdown table to find the categories or cells contributing most to the chi-square statistic.
Can I use chi-square for a 1×2 table?
You can, but with only 2 categories there is just 1 degree of freedom. For small samples in a 2-category test, some statisticians recommend applying Yates’ continuity correction, which adjusts each term of the formula to (|O-E| – 0.5)^2 / E. The correction reduces the risk of a false positive with small counts.
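Here is a short Python sketch of the correction on a 2-category test. The coin-flip counts and function names are hypothetical. The p-value helper uses the fact that with 1 degree of freedom the right-tail probability equals erfc(sqrt(x / 2)), so it does not apply to other degrees of freedom.

```python
import math


def yates_chi_square(observed, expected):
    """Chi-square with Yates' correction: sum((|O - E| - 0.5)^2 / E)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi_square):
    """Right-tail p-value for 1 degree of freedom: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical data: 20 coin flips, 14 heads and 6 tails, fair coin expected
observed = [14, 6]
expected = [10, 10]

uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
corrected = yates_chi_square(observed, expected)

print(round(uncorrected, 3), round(p_value_df1(uncorrected), 4))
print(round(corrected, 3), round(p_value_df1(corrected), 4))
```

The corrected statistic is smaller and its p-value larger, which is the conservative behavior the correction is meant to provide.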