Calculating covariance in Excel is a useful skill when you want to measure how two variables change together.
Covariance helps you determine the relationship between these variables and make informed decisions based on the data.
While variance involves a single data set, covariance compares two data sets with each other.
What is Covariance?
Covariance is a statistical measure that helps you understand the relationship between two sets of variables.
It shows whether the variables tend to increase or decrease together.
For example, if the variables are related to weather, you can use covariance to determine if temperature and rainfall are moving in the same direction.
If the covariance between the two variables is positive, then the variables are said to be directly proportional to each other.
This means that as one variable increases, so does the other, and vice versa.
Mathematically, the covariance between two variables, A & B, is calculated as follows:
In the expression above, A and B are the variables from two data sets for which we want to calculate the covariance. A’ & B’ are the mean values, and N is the number of values.
If the covariance between two variables is negative, then the variables are inversely related to each other. This means that as one variable increases, the other decreases, and vice versa. A covariance of zero means that there is no linear relationship between the variables.
Just like variance, covariance is of two types; sample covariance and population covariance.
Sample covariance calculates the covariance of a small sample from a large data set. Population covariance calculates the covariance of entire data sets.
Sample covariance can be thought of as an estimate of the population covariance, whereas population covariance is a parameter that describes the relationship between the two variables in the entire population.
Sample covariance is less precise because it is based on a smaller sample of data.
Also read: How to Calculate Variance in Excel?
Calculating Covariance in Excel
Excel has built-in functions which can calculate both the sample as well as the population covariance. For demonstration, take a look at the sample data set below.
The above data set shows the marks obtained by students in a test and the number of hours they studied for it.
We would like to determine the relationship between the Marks and the No. of Hours Studied. We will use covariance to determine this relationship.
So in this article, we will look at different methods of calculating the sample and population covariance of the data set shown above.
Method 1: Using the COVARIANCE.S Function
In this method, we will calculate the sample covariance using the COVARIANCE.S function.
The letter ‘S’ in the name of the COVARIANCE.S function signifies that this is used for calculating sample covariance, which makes it easy to remember. As an example, we will use the example data set that was shown earlier.
Select a cell in which you would like to calculate the covariance. In this case, I have selected cell D2.
To calculate the sample covariance, the formula is as follows:
In this formula, array1 is the range of cells of the first data set. In our case, this would be the Marks starting from cell B2 to cell B15.
Likewise, array2 is the range of cells of the second data set.
In this case, this would be the No. Of Hours Studied starting from cell C2 to cell C15. The above formula will then become:
Enter the formula as shown.
You can see that Excel has calculated the sample covariance in cell D2.
As explained earlier, this value is positive, which means that the Marks and No. Of Hours Studied are directly proportional to each other.
In other words, as the No. Of Hours Studies increases, so make the Marks and vice versa.
So in this method, we have seen how to calculate the sample covariance using Excel’s built-in function.
In the following method, we will look at how to calculate the population covariance.
Also read: How to Find Z-score in Excel?
Method 2: Using the COVARIANCE.P Function
In this method, we will use the COVARIANCE.P function to calculate the population covariance.
This is quite similar to the previous function except for the letter ‘P’ in the name, which signifies that this function will calculate the population covariance.
As an example, we will use the same data set that we saw earlier and have used in Method 1.
Select a cell for calculating the population covariance. Over here, I will be selecting an empty cell D2.
To calculate the population covariance, the formula syntax is as follows:
In the syntax above, array1 and array2 are the range of cells corresponding to the two data sets for which we want to calculate the covariance.
In this case, this would be the Marks from cell B2 to cell B15 and No. of Hours Studied from cell C2 to cell C15. The formula now becomes:
Enter the formula in the formula bar as highlighted below.
You can see that Excel has calculated the population covariance in cell D2.
The value is positive, which means that the Marks and No. Of Hours Studied are directly proportional to each other.
So in this method, we have seen the use of the COVARIANCE.P function for calculating the population covariance.
In the following method, we will use an Excel Add-Ins to calculate the covariance.
Method 3: Using Excel Add-Ins
In this method, we will make use of the Analysis ToolPak add-in.
This add-in comes default with Excel but is inactive/disabled. Let’s see how to enable this add-in and consequently calculate the covariance.
Please note that the Analysis ToolPak add-in only calculates the population covariance.
One advantage of using this add-in is that you can use multiple data sets instead of only two data sets when using the covariance functions.
So instead of outputting a single value, the add-in creates a relationship matrix.
- From the ribbon, select the File tab.
- Select Options from the panel on the left.
- Inside the options window select Add-Ins.
- Click on the Go button in front of Manage.
- Tick the checkbox for Analysis ToolPak.
- Click on Ok.
- Go to the Data tab.
- Under the Analysis section, click on Data Analysis.
- From the Analysis Tools list, select Covariance.
- Click on Ok.
- Click inside the Input Range field and select all your data sets. In this case, this will be columns B and C.
- Since our Input Range includes the column headings, make sure that the checkbox for ‘Labels in First Row’ is ticked.
- Select the ‘Output Range’ option under the ‘Output options’ section. (Skip this step if the Output Range is already selected)
- Click inside the ‘Output Range field’ and select the cell on which you want the output to be displayed. I have chosen cell D2.
- Click on Ok.
You will see that the covariance relationship matrix has been created at cell D2 as shown.
You might need to adjust the column widths to see the result better.
The matrix intersection for Marks and No. Of Hours Studied shows a positive value and is the same value calculated in Method 2.
Once again, this means that the Marks and the No. Of Hours Studied are directly proportional to each other.
So in this method, we saw how to calculate the population covariance using the help of the Analysis ToolPak add-in.
While this add-in is available in Excel, it is not activated by default.
Also read: How to Remove Add-Ins from Excel?
Covariance vs. Correlation
While covariance helps you identify the direction of the relationship between two variables, it doesn’t tell you about the strength of that relationship.
This is where correlation comes in.
- Covariance: Indicates the direction of the relationship (positive or negative).
- Correlation: Measures both the direction and strength of the relationship between the variables, ranging from -1 (a strong negative relationship) to 1 (a strong positive relationship).
Since correlation is more versatile and easier to interpret, it’s often preferred in practice.
However, understanding the concept of covariance provides you with a solid foundation for learning correlation as well.
In conclusion, we have seen different methods by which we can calculate covariance in Excel.
Use Method 1 if you want to calculate the sample covariance and Method 2 if you want to calculate the population covariance.
Method 3 also calculates the population covariance but allows more than two data sets.
Use Method 2 if you want to calculate the covariance of two data sets, as it is simpler. If you have more than two data sets to compare, choose Method 3.
Other Excel articles you may also like:
- Calculate the Coefficient of Variation in Excel
- How to Calculate Mean Squared Error (MSE) in Excel?
- How to Calculate Standard Error in Excel
- How to Calculate Confidence Interval in Excel
- How to Find Percentile in Excel (PERCENTILE Function)
- How to Calculate Percentage Difference in Excel
- How to Calculate IRR with Excel