Remove Duplicates in Power Query

Duplicates in Excel datasets are often a result of data entry errors, merging datasets, or importing data from multiple sources. The duplicates cause inaccurate results in data analysis and reporting. 

You can use Power Query to remove duplicates in your dataset, ensuring it is clean, accurate, and ready for analysis and reporting.

I will show you how to remove duplicates from a single-column and multiple-column dataset using Power Query.

Use Power Query to Remove Duplicates from a One-Column Dataset

Consider the single-column dataset below with 10 records having duplicates: ‘John Smith’ and ‘Alice Johnson’ each appear twice, while ‘Sarah Taylor’ appears thrice.

Data set with duplicate items.

You want to use Power Query to remove duplicates from the dataset, ensuring it has unique values.

Here’s how to do it:

  1. Select the dataset.
Select the dataset.
  1. Open the Data tab and click the From Table/Range option on the Get & Transform Data group.
Click on the from table or range option in the get in transform category.

The above step opens the Create Table dialog box with the dataset’s reference filled in and the ‘My table has headers’ checkbox selected.

Create table dialog box.
  1. Click OK on the Create Table dialog box.

The above step converts the data range to a table (if it wasn’t a table already) and loads the data onto the Power Query Editor. 

  1. You can use either of the options below to remove duplicates from the dataset.

Option #1: Use the Remove Duplicates Option on the Home Tab

On the Power Query Editor, click the Home tab, open the Remove Rows drop-down list on the Reduce Rows group, and select the Remove Duplicates option.

Click the home tab in Power Query Editor.

Option #2: Use the Remove Duplicates Option on the Shortcut Menu 

Right-click the dataset’s header and select the Remove Duplicates option on the shortcut menu.

Right click and then choose the remove duplicates option.

The above step removes duplicates from the dataset, reducing it from the original ten values (including duplicates) to six unique values.

Duplicates have been removed in Power Query data.

You can open the View tab and select the ‘Column distribution’ option on the Data Preview group to see the count of unique values in the column.

Check the column distribution option in the View tab in Power Query Editor.

Power Query visually displays the count of unique and distinct values in the column.

Visual display of distinct and unique values in a column.
  1. On the Home tab, click the Close & Load button.
Click on the file tab and then click on close and load.

The above step closes the Power Query Editor and loads the dataset with unique values onto the destination worksheet.

Power Query data is loaded into the worksheet in Excel.

Use Power Query to Remove Duplicates from a Multiple-Column Dataset

Consider the multi-column dataset below containing duplicate records for ‘John Smith,’ ‘Alice Johnson,’ and ‘Sarah Taylor.’

Dataset will duplicate rows.

Note: In a multi-column dataset, a duplicate record is a row where all column values exactly match those in another row.

You want to use Power Query to remove duplicate records from the dataset, ensuring it has unique records.

Here’s how to do it:

  1. Follow steps 1 to 3 in the previous section titled, ‘Use Power Query to Remove Duplicates From Single-column Dataset.’

The steps above convert the data range into a table (if it wasn’t a table) and load it into the Power Query Editor with the first column preselected.

Data loaded in Power Query Editor.
  1. Press and hold down the CTRL key and click the headers of the second and third columns.
Press and hold the control key and select headers.
  1. Do step 4 in the previous section to remove duplicates.

The above step removes duplicates from the dataset leaving unique records.

Duplicate records are removed in Power Query Editor.
  1. On the Home tab, click the Close & Load button.
Click on close and load.

The above step closes the Power Query Editor and loads the dataset with unique records onto the destination worksheet.

Data is loaded into excel file.

Comparison of Excel’s and Power Query’s Remove Duplicates Options

Although you can use Excel’s built-in Remove Duplicates option to remove duplicates from a dataset, Power Query’s Remove Duplicates option is often a better choice because of the reasons in the comparison table below.

Excel’s Remove Duplicates OptionPower Query’s Remove Duplicates Option
Directly modifies the original data, risking data loss without a backup.Preserves the original data; duplicates are removed only in Power Query.
Requires manually reapplying the Remove Duplicates option after each dataset update.Once you have set up the query, it only requires refreshing when the source data is updated. Ideal for dynamic datasets.
Performance may be degraded when dealing with large datasets.Efficiently processes large datasets without freezing or slowing down Excel.
Lacks a history or log of operations, making changes harder to trace.Provides a clear, step-by-step record of all transformations in the ‘Applied Steps’ pane, making it easy to review or modify.

I have shown you how to remove duplicates from a single-column and multiple-column dataset using Power Query. I hope you found the tutorial helpful.

Other Excel articles you may also like:

I am a huge fan of Microsoft Excel and love sharing my knowledge through articles and tutorials. I work as a business analyst and use Microsoft Excel extensively in my daily tasks. My aim is to help you unleash the full potential of Excel and become a data-slaying wizard yourself.