One of the most common data cleaning tasks in Excel includes removing duplicate or redundant records/rows.
While removing whole rows that are duplicate is straightforward, it’s a little trickier when you’re trying to remove entire duplicate rows based on one or more columns.
In this tutorial, we are going to look at three easy ways to remove duplicate rows based on one column in Excel
Using the ‘Remove Duplicates’ Feature
This method is quite straightforward and the most commonly used way to remove duplicate rows in Excel.
Suppose you have a dataset as shown below, and you want to remove all the duplicate records based on Column A.
In this data, you can see that I have multiple instances of ‘Sleeping Bag’ and ‘Karaoke machine’ in column A, and I only want to retain the first row of each instance and get rid of all the others.
To remove all duplicate rows from our sample dataset (shown in the figure above), follow the steps listed below:
- Select the entire dataset, along with the column headers.
- From the Data tab, under the Data Tools group select the Remove Duplicates button.
- This will open the Remove Duplicates dialog box.
- If your selection in step 1 included column headers, then make sure the ‘My data has headers’ checkbox is checked.
- Click on the Unselect All button, to uncheck all the checkboxes under ‘Columns’.
- Now check the column based on which you want to remove the duplicate rows. In our example, we want to remove duplicate rows based on Product Name, so we will simply check the box next to its heading.
- Click OK to close the Remove Duplicates dialog box.
- You should see a notification telling you how many duplicate rows were removed and how many unique rows have been retained, as shown in the image below:
- Click OK
After applying the above steps on our sample dataset, here’s how the dataset should look:
Notice from the above image that exactly one instance corresponding to each product has been retained while all other duplicate rows have been removed.
In other words, this method does not get rid of all duplicate rows. It keeps just one copy and removes all others.
Note: If you want to remove duplicates based on more than one column, you can check the boxes next to the columns you want to include in step 6.
Using VBA Code to Remove Duplicate Rows based on one Column
Suppose you have a dataset as shown below, and you want to remove all the duplicate records based on Column A.
You can do this easily using a short VBA code as well. This method is useful when you have to do this quite often and don’t want to follow many steps.
So instead, you can have the code and add it to the Quick Access Toolbar, so that you can access it with a single click.
If you are comfortable with a little coding, then you can use this method to remove duplicate rows based on a single column.
Even if you’re not that keen on coding, you can just copy-paste the following code:
Sub Delete_duplicate_rows()
Dim Rng As Range
Set Rng = Selection
Rng.RemoveDuplicates Columns:=Array(1), Header:=xlYes
End Sub
This code uses a VBA built-in command for removing duplicates in list-objects.
It takes the selected range, as well as the columns that you want to base the duplicate removal on.
In the above code, we specified this as Column 1 or the Product Names column. The code then removes all the rows from the range that contain duplicate product names.
Note: You cannot undo changes made by this VBA script, so we suggest you keep a backup copy of your dataset before running the code.
To enter the above code, copy it and paste it in your developer window.
Here’s how to do this:
- From the Developer tab, select Visual Basic.
- Once your VBA window opens, you will see all your files and folders in the Project Explorer on the left side.
- Make sure ‘ThisWorkbook’ is selected under the VBA project with the same name as your Excel workbook.
- Click Insert->Module. A new module window should open up.
- Now you can start coding. Copy the above script and paste into the module window.
- Close the VBA window.
If you can’t see the Developer ribbon, from the File menu, go to Options. Select Customize Ribbon and check the Developer option from Main Tabs. Finally, Click OK.
Your macro is now ready to use.
Note: If you want to remove duplicates based on more than one column, you can specify the column numbers in the last line of the code. So if you want to search based on columns 1 and 2, your last line would be:
Rng.RemoveDuplicates Columns:=Array(1,2), Header:=xlYes
Running the Macro
To run your macro, do the following:
- Select the range of cells that you want to work with. In our case, select A1:C9
- Select the Developer tab.
- Click on the Macros button (under the Code group).
- This will open the Macro window, where you will find the names of all the macros that you have created so far.
- Select the macro (or module) named ‘Delete_duplicate_rows’.
- Click Run.
- Click OK.
All the rows containing duplicates should now be removed.
Explanation of the Code
The above code simply took the selected range of cells and assigned it to a variable, Rng.
The RemoveDuplicates command then took this range, searched for duplicates in Column 1 (as we had specified Columns:=Array(1)), and then removed all copies of rows containing the same value in Column 1.
Note: We specified Header:=xlYes to tell Excel that the first row of our selected range contains column headings.
Using Filters and the COUNTIF Function to Remove Duplicate Rows based on one Column
You can also use a formula to help you find duplicate values in your data. This method involves two steps.
Suppose you have the same dataset as shown below, and you want to remove all the duplicate records based on Column A.
First, we will use the COUNTIF function to count the first occurrence of a product as 1, its second occurrence as 2, and so on.
Then, we will use this result to filter out those rows that occur for the second time or more.
The result is the set of all duplicate rows. We can then delete these visible rows and remove the filter to obtain rows containing only unique product names.
The COUNTIF function helps count cells in a range that satisfy a given condition. The syntax for the function is as follows:
= COUNTIF (range, condition)
Here,
- range is the range of cells containing the data you want the function to work on (or count)
- condition is the condition that you want satisfied in order to include a cell in the count.
So, if you want to find the number of times the value at cell reference A2 appears in the range A2:A9, use the function as follows:
=COUNTIF(A2:A9,A2)
But if we want to count the number of times the value appears in the range A2 to the current row, then we can use the function as follows:
=COUNTIF($A$2:$A2,A2)
Note: We locked the starting cell reference in the first parameter with ‘$’ because we don’t want this reference to change when the formula is copied to the other cells.
Here, we want the cell reference $A$2 to remain constant, irrespective of which row the formula is copied to. In this way, we can count how many times the value appears up to the current row.
This will ensure that the first occurrence of the cell value is counted as 1, the second occurrence is counted as 2, and so on.
To remove all duplicate rows from our sample dataset, follow the steps listed below:
- Create a new column with the heading ‘Count’. Next to the dataset.
- In the first cell of the column, type the formula: =COUNTIF($A$2:$A2,A2).
- Copy this formula to the rest of the cells in the column by dragging down the cell’s fill handle. This will display the number of times each product name appears in column A.
- Now we need to filter the dataset to show only the duplicate rows.Select the entire dataset, along with the column headers.
- From the Data tab, under the Sort & Filter group select the Filter button.
- You should now see arrows next to each column’s heading.
- Click on the arrow next to the ‘Count’ heading.
- From the dropdown menu that appears, select Number Filters->greater than.
- This will open the Custom Autofilter dialog box. Type ‘1’ in the input box next to “is greater than”.
- Click OK.
- You should see all the duplicate rows only.
- Delete these filtered rows by selecting them, right-clicking and choosing Delete Rows from the context menu that appears.
- Remove the filter now by clicking on the Filter button again (under the Data tab).
- You should now be left with all the unique rows of the original dataset.
You can now get rid of the Count column.
In this tutorial, we looked at three ways in which you can remove duplicate rows based on one or more columns.
You can feel free to choose the method that best suits your requirement at hand.
We hope this was helpful.
Other Excel tutorials you may also like:
- How to Find Duplicates in Excel (Conditional Formatting/ Count If/ Filter)
- How to Remove Blank Columns in Excel? (Formula + VBA)
- Duplicate Sheet in Excel (Shortcuts + VBA)
- How to Count Negative Numbers in Excel
- How to Count How Many Times a Word Appears in Excel
- How to Select Rows with Specific Text in Excel
- How to Delete Filtered Rows in Excel (with and without VBA)
- Get Unique Values from a Column in Excel
- How to Count Unique Values in Excel (Formulas)