Sometimes when you receive a dataset with duplicates you may need to group the duplicates so you can easily identify them and decide to remove or consolidate them.
This way you can clean your data and ensure you are accurate in your data analysis and reporting.
Manually identifying and grouping duplicates in Excel can be tedious, time-consuming, and error-prone. However, Excel offers several built-in functions and features that you can use to simplify the process.
In this tutorial, I will show you how to use Excel’s built-in functions and features to group duplicates in Excel.
Method #1: Use Conditional Formatting and Sort Dialog Box
You can use conditional formatting to identify duplicates in a dataset. After identifying the duplicates you can group them using the Sort feature.
Suppose you have the dataset below with duplicate entries and want to identify and group them.
You can use the steps below to perform the task.
Step #1: Use Conditional Formatting to Identify the Duplicates
You can use the steps below to identify the duplicates.
- Enter the formula below in the ‘Concatenated String’ helper column.
=A2&B2&C2&D2&E2
The formula combines or concatenates the values in each row of columns A to E of the dataset in the ‘Concatenated String’ helper column.
- Select the combined strings in the helper column.
- Click the Home tab, open the Conditional Formatting drop-down list on the Styles group, hover over the Highlight Cells Rules option, and click the Duplicate Values option on the submenu.
The above step opens the Duplicate Values dialog box.
- On the Duplicate Values dialog box, choose the formatting style you want applied to all cells with duplicates. In this case, I have chosen Light Red Fill.
Excel applies the formatting style you choose to all cells with duplicates, as shown below.
Step #2: Group the Duplicate Records
You can use the steps below to group the duplicate records.
- Select the entire dataset, excluding the header row.
- Open the Data tab and click the Sort button on the Sort & Filter group.
The above step opens the Sort dialog box.
- On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’ the default option, is selected on the Order drop-down menu.
- On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
The above step adds a second level of drop-down lists to the dialog box.
- On the second level of drop-down lists, open the Column drop-down list and select ‘Concatenated Strings,’ open the Sort On drop-down list and select ‘Cell Cor,’ open the Order drop-down list and choose whether the cells with no formatting should appear on top or at the bottom of the cells with formatting. In this case, I have chosen the ‘On Bottom’ option.
- Click OK.
The above steps group the duplicate records in the dataset as shown below.
You can delete the ‘Concatenated String’ helper column because you no longer need it.
After grouping the duplicates you can consolidate them, remove them, or extract unique records from the dataset as explained in the sections ‘Consolidate the Duplicates,’ ‘Extract Unique Records,’ and ‘Remove Duplicates’ of Method #2 below.
Also read: Remove Duplicate Rows based on one Column in Excel
Method #2: Use the SUMPRODUCT Function and Sort Dialog Box
You can use the SUMPRODUCT function that calculates the sum of the products of corresponding ranges or arrays to determine the number of duplicates for each entry in a dataset.
After determining the number of duplicates for each entry, you can use the Sort dialog box to group the duplicates. You can optionally consolidate the duplicates or copy unique records to a different location.
Suppose you have the dataset below with duplicate entries and want to identify and group them.
You can use the steps below to do the job.
Step #1: Use the SUMPRODUCT Formula to Identify the Duplicates
Enter the formula below in the ‘Number of Duplicates’ helper column to calculate the number of duplicates for each record.
=SUMPRODUCT(--(A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16))
The formula returns the number of duplicates of each record in the dataset in the helper column ‘Number of Duplicates.’
Explanation of the Formula
=SUMPRODUCT(--(A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16))
This formula compares the concatenated string of values in row 2 (A2:E2) with the concatenated string values for each row in the cell range A2:E16, counting the number of times that exact combination of values appears within the range.
Here’s how the formula works:
- A2&B2&C2&D2&E2 – This concatenates the values from cells A2, B2, C2, D2, and E2 into a single string.
- $A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16 – This concatenates the values from columns A to E for every row in the range from row 2 to row 16 into strings, row by row.
- A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16 – This checks if the concatenated string from row 2 (A2:E2) is equal to the concatenated string for each row in the range A2:E16. The result is an array of TRUE or FALSE values, where TRUE means the row matches and FALSE means it does not.
- –( … ) – The double negative converts the TRUE and FALSE values to 1s and 0s respectively. So, if a row matches, it returns 1; otherwise, it returns 0.
- SUMPRODUCT(…) – The SUMPRODUCT function sums up the 1s and 0s, effectively counting how many times the row (A2:E2) matches the rows in the range A2:E16.
Step #2: Group the Duplicates
You can group the duplicates using the steps below:
- Select the entire dataset, excluding the header row.
- Open the Data tab and click the Sort button on the Sort & Filter group.
The above step opens the Sort dialog box.
- On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’ the default option, is selected on the Order drop-down menu.
- On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
The above step adds a second level of drop-down lists to the dialog box.
- On the second level of drop-down lists, open the Column drop-down list, select ‘Number of Duplicates,’ open the Order drop-down list and select ‘Largest to Smallest.’
- Click OK.
The above steps group the duplicates in the dataset as shown below.
After grouping the duplicates, you can manually consolidate the duplicates or extract unique records from the dataset.
Consolidate the Duplicates
If the duplicate entries are not redundant you can manually consolidate them. The resultant dataset will be as shown below.
Note: For example, John Doe appears three times in the original dataset, ordering one laptop each time. I manually consolidated the three entries into one indicating he ordered three laptops. The same logic applies to all other duplicate entries.
Extract Unique Records
You can extract unique records from the dataset if the duplicate records are redundant. This way you leave the original data unaffected.
You can extract unique records from the dataset using the below steps:
- Select the entire dataset including the header row.
- Open the Data tab and click the Advanced option on the Sort & Filter group.
The above step opens the Advanced Filter dialog box.
- On the Advanced Filter dialog box select the ‘Copy to another location’ option, click the range selector button on the ‘Copy to’ box, select the starting cell of the location where you want to copy the filtered dataset, and check the ‘Unique records only’ checkbox.
- Click OK.
Excel copies the filtered dataset of unique records to the new location, as shown below.
You can delete the ‘Number of Duplicates’ helper column because you no longer need it.
Remove Duplicates
You can delete the grouped duplicate records if you are sure you won’t need them later on using the steps below.
- Select the entire dataset, excluding the header row.
- Open the Data tab and click the Remove Duplicates option on the Data Tools group.
The above step opens the Remove Duplicates dialog box.
- On the Remove Duplicates dialog box, ensure all the columns are selected and click OK.
Excel removes all duplicate records from the dataset and displays a message message box indicating how many records have been removed.
Duplicate records have been removed from the dataset as shown below.
You can delete the helper column because you no longer need it.
Also read: Get Unique Values from a Column in Excel
Method #3: Use IF and COUNTIFS Functions and Sort Dialog Box
You can use the combination of IF and COUNTIFS functions to identify duplicate records in a dataset and thereafter use the Sort dialog box to group them.
Suppose you have the dataset below with duplicate entries and want to identify and group them.
You can use the steps below to do the job.
Step #1: Use IF and COUNTIFS Functions to Identify Duplicates
Enter the formula below in the ‘Unique/Duplicate’ helper column to identify records as unique or duplicates.
=IF(COUNTIFS($A$2:$A$16,A2,$B$2:$B$16,B2,$C$2:$C$16,C2,$D$2:$D$16,D2,$E$2:$E$16,E2)=1,"Unique","Duplicate")
Explanation of the Formula
=IF(COUNTIFS($A$2:$A$16,A2,$B$2:$B$16,B2,$C$2:$C$16,C2,$D$2:$D$16,D2,$E$2:$E$16,E2)=1,"Unique","Duplicate")
This formula checks if the combination of values in a row is unique within the specified range. If the combination occurs only once in the range, it labels it as ‘Unique’; if it appears more than once, it’s labeled ‘Duplicate.’
Step #2: Group the Duplicates
You can group the duplicates using the steps below:
- Select the entire dataset, excluding the header row.
- Open the Data tab and click the Sort button on the Sort & Filter group.
The above step opens the Sort dialog box.
- On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’ the default option, is selected on the Order drop-down menu.
- On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
The above step adds a second level of drop-down lists to the dialog box.
- On the second level of drop-down lists, open the Column drop-down list, select ‘Unique/Duplicate,’ open the Order drop-down list, and select ‘A to Z.’
- Click OK.
The above steps group the duplicates in the dataset as shown below.
You can delete the ‘Unique/Duplicate’ helper column because it is no longer necessary.
After grouping the duplicates you can consolidate them, remove them, or extract unique records from the dataset as explained in the sections ‘Consolidate the Duplicates,’ ‘Extract Unique Records,’ and ‘Remove Duplicates’ of Method #2 above.
Prevent Duplicates Using Data Validation
You can use Excel’s data validation feature to prevent duplicate entries in your datasets from the start. Data validation allows you to restrict cell entries, ensuring that users enter only valid data, in this case, unique data.
You can use the steps below to apply data validation:
- Select the cell range where you want to prevent duplicates. In this case, I have selected cell range A2:A20 because I want Order IDs to be unique.
- Open the Data tab and click the Data Validation option on the Data Tools group.
The above step opens the Data Validation dialog box.
- Do the following on the Data Validation dialog box:
- Click the Settings tab.
- Uncheck the ‘Ignore blank’ checkbox not to allow empty cells.
- Open the Allow drop-down list and select ‘Custom.’
- Enter the formula =COUNTIF($A$2:$A$20, A2) = 1 on the Formula box.
- Still, on the Data Validation dialog box, do the following:
- Click the Error Alert tab.
- Open the Style drop-down list and select the icon you want on the error alert message box. In this case, I have chosen the Stop icon so that users know they cannot override the rule.
- Enter a title for the error alert, such as ‘Duplicate Entry Not Allowed.’
- Enter a message for the error alert, such as ‘This value already exists. Please enter a unique value.’
- Click OK.
- Test the data validation by trying to enter duplicate values in the Order ID column. Excel displays the error message.
I have shown you how to group duplicates in Excel using Excel’s built-in functions and features. I have also shown you how to prevent duplicate entries using data validation. I hope you found the tutorial helpful.
Other Excel articles you may also like: