How to Find Outliers in Excel?

When analyzing data, we usually assume that the data values fall somewhere near the mean or median, or at least closeby.

However, there might be some values that are way off from the mean/ median.

These values, also known as outliers, can skew your analysis and end up giving misleading results.

In this tutorial we will see three ways in which you can find outliers in your Excel data:

Finding Outliers in Excel using the Inter-Quartile Range

The Inter-Quartile Range (IQR) is a measure of where the beginning and end of the bulk of your data lie.

So any value that is away from this cluster of data is most probably an outlier.

Let us first understand how the Inter-Quartile Range is calculated.

In order to calculate the IQR, we need to know the first and third quartiles of the data, because the formula for calculating IQR is:

IQR = Q3 - Q1

Where,

  • Q1 is the first quartile of the data
  • Q3 is the third quartile of the data

A quartile consists of a quarter of the values in the data, when the data is sorted from the smallest to largest values.

The first quartile (Q1) consists of the lowest 25% of the data.

Similarly, the third quartile (Q3) consists of the values that are just above the median (values that are between 50% to 75% of the data). 

These quartile values can easily be calculated using the QUARTILE.INC function in Excel. The syntax for the function is as follows:

QUARTILE.INC(array, quart)

Here,

  • array is the range of cells containing your data
  • quart is a number that indicates which quartile you want to calculate.

If you want to calculate the first quartile, you need to specify the quart parameter as 1. Similarly, if you want to calculate the third quartile, you need to specify the quart parameter as 3.

Once you have the Q1, Q3 and IQR values, you can use these three to calculate the smallest and largest values of the acceptable data range (also known as the Lower bound and Upper bound respectively).

So any value that is smaller than the lower bound or larger than the upper bound are the outliers. Here’s how you can calculate the lower and upper bound values:

We can calculate the lower bound limit by multiplying the IQR value by 1.5 and then subtracting it from the Q1 value:

Lower Bound = Q1-(1.5 * IQR)

Similarly, we can calculate the upper bound limit by multiplying the IQR value by 1.5 and then adding it to the Q3 value:

Upper Bound = Q3+(1.5 * IQR)

In a nutshell, the sequence of calculations to find outliers in your data are:

Select your data

  1. Calculate Q1 and Q3 using the QUARTILE function for your data.
  2. Calculate IQR by subtracting Q1 from Q3.
  3. Calculate Lower bound by multiplying IQR by 1.5 and subtracting it from Q1.
  4. Calculate Upper bound by multiplying IQR by 1.5 and adding it to the Q3.
  5. Find the points that are smaller than the lower bound or larger than the upper bound. These points are the outliers.

Let us take an example to see how to apply the above method in Excel. Consider the following list of data values:

Data with outliers

To calculate and find outliers in this list, follow the steps below:

  1. Create a small table next to the data list as shown below:
Create a table to calculate quartiles
  1. In cell E2, type the formula to calculate the Q1 value: =QUARTILE.INC(A2:A14,1).
Lower Quartile formula
  1. In cell E3, type the formula to calculate the Q3 value: =QUARTILE.INC(A2:A14,3).
Upper Quartile formuala
  1. In cell E4, type the formula to calculate the IQR value: =E3-E2.
calculating Inter quartile  range
  1. In cell E5, type the formula to calculate the lower bound value: =E2-(1.5*E4).
Lower bound formula
  1. In cell E6, type the formula to calculate the upper bound value: =E3+(1.5*E4).
Upper bound formula
  1. Now for each data value, you can find out if it is an outlier. Type the following formula in cell B2: =OR((A2<$E$5),(A2>$E$6)). This will return a TRUE value if the data value is an outlier and FALSE otherwise.
OR formula to check if the value is an outlier or not
  1. Copy this formula to the rest of the cells in column B by double clicking on the fill handle of the cell.
TRUE values are outliers

You should now see a TRUE value next to all the outliers in your data.

Note: Alternatively you can also use conditional formatting to highlight the outliers in your data. You can use the formula of Step 7 as the condition for formatting the cells.

Also read: How To Find Interquartile Range In Excel?

Finding Outliers in Excel using the Mean and Standard Deviation

Another simple way of detecting outliers is by using the Standard deviation of the data distribution.

The standard deviation is a quantity that expresses how much the points in a distribution differ from the mean value for the distribution.

Usually we assume a value to be an outlier if it is more than 2 or 3 times the standard deviation of the distribution.

To calculate the Standard deviation of data in Excel, we can use the STDEV.S function.

This function works on a sample of data (not population) and returns the standard deviation of the data’s distribution.

The syntax for the function is:

=STDEV.S(number1,[number2],…)

Here, number1, number2, etc. are references to individual cells in a range. You can also just specify a reference to the range of cells containing your data instead.

Besides the standard deviation you will also need to know the mean of the distribution. To calculate the mean, you can use the AVERAGE function. The syntax for the function is:

=AVERAGE(number1,[number2],…)

Again, number1, number2, etc. are references to the individual cells in a range. You can also just specify a reference to the range of cells containing your data instead.

The values that are 2 standard deviations away from the mean are the outliers.

This means any value that is less than Mean-(2*Standard Deviation) or more than Mean+(2*Standard Deviation) are outliers.

Consider the same dataset again:

Data with outliers

Let us use the Mean and Standard Deviation of the data to find outliers:

  1. Create a small table next to the data list as shown below:
table next to the data
  1. In cell E2, type the formula to calculate the Mean: =AVERAGE(A2:A14).
Formula for mean
  1. In cell E3, type the formula to calculate the Standard Deviation: =STDEV.S(A2:A14).
Formula for standard deviation
  1. In cell E4, type the formula to calculate the lower bound: =E2-(2*E3).
Lower bound formula
  1. In cell E5, type the formula to calculate the upper bound: =E2+(2*E3).
Upper bound formula
  1. Now for each data value, you can find out if it is an outlier. Type the following formula in cell B2: =OR((A2<$E$4),A2>$E$5)). This will return a TRUE value if the data value is an outlier and FALSE otherwise.
checking whether the value is outlier
  1. Copy this formula to the rest of the cells in column B by double clicking on the fill handle of the cell.
Values that are outliers

You should now see a TRUE value next to all the outliers in your data.

Also read: Calculate Correlation Coefficient in Excel

Finding Outliers in Excel using the Z-Score

Another way of finding outliers is by using the Z-score value. The Z-score value gives an idea of how far a data point is from the Mean. It is also known as the Standard Score.

To calculate the Z-score, we need to know the Mean and Standard deviation of the data distribution. The formula for the Z-score is:

Z = (X - mean) / Standard Deviation

Here, X is an individual data value in the distribution.

The further away a data value’s Z-score is from zero, the more unusual it is.

A standard cut-off value for finding outliers are Z-scores of +/-3 or further from zero. So any value with a Z-score of less than -3 and more than +3 can be considered an outlier.

Again let’s consider the same dataset:

Data with outliers

Here’s how you can use the Z-score method to find outliers in the data:

  1. Create a small table next to this list as shown below:
Mean and standard deviation table
  1. In cell F2, type the formula to calculate the Mean: =AVERAGE(A2:A14).
Mean formula
  1. In cell F3, type the formula to calculate the Standard Deviation: =STDEV.S(A2:A14).
Standard Deviation formula
  1. Now for each data value, calculate the Z-score. Type the following formula in cell B2 and copy it to the rest of the cells in column B: =(A2-$F$2)/$F$3.
Z score to calculate outliers
  1. You will notice none of the values in our dataset have crossed the -3 or +3 mark. This is because these points are probably not that far from the mean. However, even a boundary z-score of +2 and -2 can be considered as quite away from the mean. So, let us consider values with a Z-score less than -2 or more than +2 as outliers.
  2. Type the following formula in cell C2: =OR((B2<-2),(B2>2)). This will return a TRUE value if the data value is an outlier and FALSE otherwise.
OR formula to find outliers in Excel
  1. Copy this formula to the rest of the cells in column C by double clicking on the fill handle of the cell.
TRUE values are outliers

You should now see a TRUE value next to all the outliers in your data.

There are many other ways to find outliers in your data in Excel. In this tutorial, we took a look at three such ways.

You can select the method that works best with your data. We hope this was helpful for you.

Other articles you may also like:

I am a huge fan of Microsoft Excel and love sharing my knowledge through articles and tutorials. I work as a business analyst and use Microsoft Excel extensively in my daily tasks. My aim is to help you unleash the full potential of Excel and become a data-slaying wizard yourself.

Leave a Comment