How to Group Duplicates in Excel

Sometimes when you receive a dataset with duplicates you may need to group the duplicates so you can easily identify them and decide to remove or consolidate them.

This way you can clean your data and ensure you are accurate in your data analysis and reporting. 

Manually identifying and grouping duplicates in Excel can be tedious, time-consuming, and error-prone. However, Excel offers several built-in functions and features that you can use to simplify the process.

In this tutorial, I will show you how to use Excel’s built-in functions and features to group duplicates in Excel.

Method #1: Use Conditional Formatting and Sort Dialog Box

You can use conditional formatting to identify duplicates in a dataset. After identifying the duplicates you can group them using the Sort feature.

Suppose you have the dataset below with duplicate entries and want to identify and group them.

dataset to group duplicates in Excel

You can use the steps below to perform the task.

Step #1: Use Conditional Formatting to Identify the Duplicates

You can use the steps below to identify the duplicates.

  1. Enter the formula below in the ‘Concatenated String’ helper column.
=A2&B2&C2&D2&E2
Enter formula to concatenate the cells

The formula combines or concatenates the values in each row of columns A to E of the dataset in the ‘Concatenated String’ helper column.

  1. Select the combined strings in the helper column.
Select the combined cells in the helper column
  1. Click the Home tab, open the Conditional Formatting drop-down list on the Styles group, hover over the Highlight Cells Rules option, and click the Duplicate Values option on the submenu.
specify the formatting for duplicates

The above step opens the Duplicate Values dialog box.

  1. On the Duplicate Values dialog box, choose the formatting style you want applied to all cells with duplicates. In this case, I have chosen Light Red Fill.

Excel applies the formatting style you choose to all cells with duplicates, as shown below.

Duplicate cells are highlighted in the chosen color

Step #2: Group the Duplicate Records

You can use the steps below to group the duplicate records.

  1. Select the entire dataset, excluding the header row.
Select the entire data set
  1. Open the Data tab and click the Sort button on the Sort & Filter group.
Click on the sort option in the ribbon

The above step opens the Sort dialog box.

  1. On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’  the default option, is selected on the Order drop-down menu.
Select the sort by and sort order in the sort dialog box
  1. On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
Click on the add level option in the sort dialog box

The above step adds a second level of drop-down lists to the dialog box.

  1. On the second level of drop-down lists, open the Column drop-down list and select ‘Concatenated Strings,’ open the Sort On drop-down list and select ‘Cell Cor,’ open the Order drop-down list and choose whether the cells with no formatting should appear on top or at the bottom of the cells with formatting. In this case, I have chosen the ‘On Bottom’ option.
now sort by concatenated string and the color
  1. Click OK.

The above steps group the duplicate records in the dataset as shown below.

Data set is sorted and duplicates on group together

You can delete the ‘Concatenated String’ helper column because you no longer need it. 

After grouping the duplicates you can consolidate them, remove them, or extract unique records from the dataset as explained in the sections ‘Consolidate the Duplicates,’ ‘Extract Unique Records,’ and ‘Remove Duplicates’ of Method #2 below. 

Also read: Remove Duplicate Rows based on one Column in Excel

Method #2: Use the SUMPRODUCT Function and Sort Dialog Box

You can use the SUMPRODUCT function that calculates the sum of the products of corresponding ranges or arrays to determine the number of duplicates for each entry in a dataset. 

After determining the number of duplicates for each entry, you can use the Sort dialog box to group the duplicates. You can optionally consolidate the duplicates or copy unique records to a different location.

Suppose you have the dataset below with duplicate entries and want to identify and group them.

dataset to group duplicates in Excel

You can use the steps below to do the job.

Step #1: Use the SUMPRODUCT Formula to Identify the Duplicates

Enter the formula below in the ‘Number of Duplicates’ helper column to calculate the number of duplicates for each record.

=SUMPRODUCT(--(A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16))
Formula to get the number of duplicates

The formula returns the number of duplicates of each record in the dataset in the helper column ‘Number of Duplicates.’ 

Explanation of the Formula

=SUMPRODUCT(--(A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16))

This formula compares the concatenated string of values in row 2 (A2:E2)  with the concatenated string values for each row in the cell range A2:E16, counting the number of times that exact combination of values appears within the range. 

Here’s how the formula works:

  1. A2&B2&C2&D2&E2 – This concatenates the values from cells A2, B2, C2, D2, and E2 into a single string.
  2. $A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16 – This concatenates the values from columns A to E for every row in the range from row 2 to row 16 into strings, row by row.
  3. A2&B2&C2&D2&E2=$A$2:$A$16&$B$2:$B$16&$C$2:$C$16&$D$2:$D$16&$E$2:$E$16 – This checks if the concatenated string from row 2 (A2:E2) is equal to the concatenated string for each row in the range A2:E16. The result is an array of TRUE or FALSE values, where TRUE means the row matches and FALSE means it does not.
  4. –( … ) – The double negative converts the TRUE and FALSE values to 1s and 0s respectively. So, if a row matches, it returns 1; otherwise, it returns 0.
  5. SUMPRODUCT(…) – The SUMPRODUCT function sums up the 1s and 0s, effectively counting how many times the row (A2:E2) matches the rows in the range A2:E16.

Step #2: Group the Duplicates

You can group the duplicates using the steps below:

  1. Select the entire dataset, excluding the header row.
Select the entire data set
  1. Open the Data tab and click the Sort button on the Sort & Filter group.
Click on the sort icon in the data tab

The above step opens the Sort dialog box.

  1. On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’  the default option, is selected on the Order drop-down menu.
Sound my order ID and the sort order
  1. On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
Add a new sorting level

The above step adds a second level of drop-down lists to the dialog box.

  1. On the second level of drop-down lists, open the Column drop-down list, select ‘Number of Duplicates,’ open the Order drop-down list and select ‘Largest to Smallest.’
Short by the number of duplicate and largest to smallest
  1. Click OK.

The above steps group the duplicates in the dataset as shown below.

Data sorted and duplicates are grouped together

After grouping the duplicates, you can manually consolidate the duplicates or extract unique records from the dataset.

Consolidate the Duplicates

If the duplicate entries are not redundant you can manually consolidate them. The resultant dataset will be as shown below.

Date after consolidation

Note: For example, John Doe appears three times in the original dataset, ordering one laptop each time. I manually consolidated the three entries into one indicating he ordered three laptops. The same logic applies to all other duplicate entries.

Extract Unique Records

You can extract unique records from the dataset if the duplicate records are redundant. This way you leave the original data unaffected. 

You can extract unique records from the dataset using the below steps:

  1. Select the entire dataset including the header row.
Select the entire data set
  1. Open the Data tab and click the Advanced option on the Sort & Filter group.
Click on the advanced icon in the data tab in the Excel Ribbon menu

The above step opens the Advanced Filter dialog box.

  1. On the Advanced Filter dialog box select the ‘Copy to another location’ option, click the range selector button on the ‘Copy to’ box, select the starting cell of the location where you want to copy the filtered dataset, and check the ‘Unique records only’ checkbox.
select unique records only in the advanced filter dialog box
  1. Click OK.

Excel copies the filtered dataset of unique records to the new location, as shown below.

Filter data set of unique records

You can delete the ‘Number of Duplicates’ helper column because you no longer need it.

Remove Duplicates

You can delete the grouped duplicate records if you are sure you won’t need them later on using the steps below.

  1. Select the entire dataset, excluding the header row.
Select the entire data set
  1. Open the Data tab and click the Remove Duplicates option on the Data Tools group.
click the remove duplicate icon in the data tools group

The above step opens the Remove Duplicates dialog box.

  1. On the Remove Duplicates dialog box, ensure all the columns are selected and click OK.
Select all the columns in the remove duplicate dialog box

Excel removes all duplicate records from the dataset and displays a message message box indicating how many records have been removed.

Message box after removing the duplicate records

Duplicate records have been removed from the dataset as shown below.

Duplicate records have been removed

You can delete the helper column because you no longer need it.

Also read: Get Unique Values from a Column in Excel

Method #3: Use IF and COUNTIFS Functions and Sort Dialog Box

You can use the combination of IF and COUNTIFS functions to identify duplicate records in a dataset and thereafter use the Sort dialog box to group them.

Suppose you have the dataset below with duplicate entries and want to identify and group them.

dataset to group duplicates in Excel

You can use the steps below to do the job.

Step #1: Use IF and COUNTIFS Functions to Identify Duplicates

Enter the formula below in the ‘Unique/Duplicate’ helper column to identify records as unique or duplicates.

=IF(COUNTIFS($A$2:$A$16,A2,$B$2:$B$16,B2,$C$2:$C$16,C2,$D$2:$D$16,D2,$E$2:$E$16,E2)=1,"Unique","Duplicate")
formula to get weather the record is duplicate or unique

 Explanation of the Formula

=IF(COUNTIFS($A$2:$A$16,A2,$B$2:$B$16,B2,$C$2:$C$16,C2,$D$2:$D$16,D2,$E$2:$E$16,E2)=1,"Unique","Duplicate")

This formula checks if the combination of values in a row is unique within the specified range. If the combination occurs only once in the range, it labels it as ‘Unique’; if it appears more than once, it’s labeled ‘Duplicate.’

Step #2: Group the Duplicates

You can group the duplicates using the steps below:

  1. Select the entire dataset, excluding the header row.
Select the entire data set
  1. Open the Data tab and click the Sort button on the Sort & Filter group.
Click on the sort icon in the data tab

The above step opens the Sort dialog box.

  1. On the Sort dialog box, open the Column drop-down list, select ‘Order ID,’ and ensure that ‘Smallest to Largest,’  the default option, is selected on the Order drop-down menu.
sound by order ID and sort order is smallest to largest
  1. On the Sort dialog box, click the Add Level command button to add a second level of drop-down lists.
Add another level of sorting

The above step adds a second level of drop-down lists to the dialog box.

  1. On the second level of drop-down lists, open the Column drop-down list, select ‘Unique/Duplicate,’ open the Order drop-down list, and select ‘A to Z.’
Select the then by column and the sort order
  1. Click OK.

The above steps group the duplicates in the dataset as shown below.

duplicates are grouped together

You can delete the ‘Unique/Duplicate’ helper column because it is no longer necessary.

After grouping the duplicates you can consolidate them, remove them, or extract unique records from the dataset as explained in the sections ‘Consolidate the Duplicates,’ ‘Extract Unique Records,’ and ‘Remove Duplicates’ of Method #2 above.

Prevent Duplicates Using Data Validation

You can use Excel’s data validation feature to prevent duplicate entries in your datasets from the start. Data validation allows you to restrict cell entries, ensuring that users enter only valid data, in this case, unique data.

You can use the steps below to apply data validation:

  1. Select the cell range where you want to prevent duplicates. In this case, I have selected cell range A2:A20 because I want Order IDs to be unique.
Select column A
  1. Open the Data tab and click the Data Validation option on the Data Tools group.
Click on the data validation icon in the ribbon in Excel

The above step opens the Data Validation dialog box.

  1. Do the following on the Data Validation dialog box:
  1. Click the Settings tab.
  2. Uncheck the ‘Ignore blank’ checkbox not to allow empty cells.
  3. Open the Allow drop-down list and select ‘Custom.’
  4. Enter the formula =COUNTIF($A$2:$A$20, A2) = 1 on the Formula box.
Enter the formula to prevent duplicate
  1. Still, on the Data Validation dialog box, do the following:
  1. Click the Error Alert tab.
  2. Open the Style drop-down list and select the icon you want on the error alert message box. In this case, I have chosen the Stop icon so that users know they cannot override the rule.
  3. Enter a title for the error alert, such as ‘Duplicate Entry Not Allowed.’ 
  4. Enter a message for the error alert, such as ‘This value already exists. Please enter a unique value.’
Specify the error message to show when duplicate entry is done
  1. Click OK.
  2. Test the data validation by trying to enter duplicate values in the Order ID column. Excel displays the error message.
Error message when Excel identifies a duplicate entry that already exists

I have shown you how to group duplicates in Excel using Excel’s built-in functions and features. I have also shown you how to prevent duplicate entries using data validation. I hope you found the tutorial helpful.

Other Excel articles you may also like:

I am a huge fan of Microsoft Excel and love sharing my knowledge through articles and tutorials. I work as a business analyst and use Microsoft Excel extensively in my daily tasks. My aim is to help you unleash the full potential of Excel and become a data-slaying wizard yourself.

Leave a Comment