Remove duplicate rows of data in Excel

When you work with large amounts of data, finding and removing duplicate records by hand can be very difficult. Excel's built-in Remove Duplicates tool makes the task much easier. Below are two common scenarios, with a solution for each.

Note: The information in this article applies to Excel 2019, Excel 2016, Excel 2013, Excel 2010, and Excel 2007.

Content
  1. Remove duplicate data in Excel
  2. Field names and column letters
  3. Continuous data range
  4. Example of deleting duplicate data records
  5. Open the Remove Duplicates dialog box
  6. Find identical records
  7. Find and remove overlapping records with Duplicate Remover
  8. Check one box at a time
  9. Find partially matching records

Remove duplicate data in Excel

Spreadsheet programs such as Excel are often used as databases for things like spare parts, sales records, and mailing lists.

Databases in Excel are made up of tables of data organized into sets called records. A record holds related data, such as a company name, address, and phone number, in the cells (fields) of a single row.

A common problem as a database grows is that it accumulates duplicate records, or rows of data. Duplication occurs when:

  • Entire records are entered into the database more than once. This results in two or more identical entries.

  • Multiple records have one or more fields, such as name and address, that contain the same data. These are referred to below as overlapping records.

Duplicate entries cause problems. For example, when the database information is used in a mail merge, a duplicate record can cause multiple copies of a document to be sent to the same person. Check for and remove duplicate entries regularly to avoid such problems.

It is easy to spot duplicate records in a small sample, as in the image above. But when data tables contain hundreds or thousands of records, it becomes difficult to spot duplicates, especially overlapping records.

To make this task easier, Excel has a built-in data tool called Remove Duplicates. Remove Duplicates finds and removes identical and overlapping records.

When you use the Remove Duplicates command, identical and overlapping records must be handled separately. This is because the Remove Duplicates dialog box displays the field names for the selected data table, and you choose which fields to include in the search for matching records:

  • For identical records, search all fields. Leave the check boxes next to all column or field names selected.

  • For overlapping records, leave only the check boxes next to the matching fields selected.
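The difference between the two searches can be sketched in a few lines of Python. This is only an illustration of the matching logic, not how Excel implements it. The function name remove_duplicates, the field names, and the mailing-list data are all invented for the example, and the sketch assumes that the first occurrence of each record is the one that is kept.

```python
def remove_duplicates(records, fields):
    """Keep the first record for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for record in records:
        # Only the chosen fields take part in the comparison.
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


# Hypothetical mailing list; the names, addresses, and numbers are invented.
mailing_list = [
    {"name": "J. Doe", "address": "1 Main St", "phone": "555-0100"},
    {"name": "J. Doe", "address": "1 Main St", "phone": "555-0100"},  # identical
    {"name": "J. Doe", "address": "1 Main St", "phone": "555-0199"},  # overlapping
    {"name": "K. Lee", "address": "9 Oak Ave", "phone": "555-0142"},
]

# Identical records: compare every field.
identical_removed = remove_duplicates(mailing_list, ["name", "address", "phone"])

# Overlapping records: compare only the fields that should match.
overlapping_removed = remove_duplicates(mailing_list, ["name", "address"])

print(len(identical_removed))    # 3
print(len(overlapping_removed))  # 2
```

Comparing all three fields removes only the exact copy, while comparing just name and address also removes the record that differs only in its phone number.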

Field names and column letters

The Remove Duplicates tool consists of a dialog box in which you choose the fields to compare by selecting the check boxes next to the appropriate field names or column letters.

Whether the dialog box displays field names or column letters depends on whether your data has a row of headings at the top of the data table, as shown in the image above.

If your data has headings, select the My data has headers check box. Excel then displays the names in that row as field names in the dialog box.

If your data does not have a header, the dialog box displays the corresponding column letters for the selected range of data.
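As a rough model of this behavior, the sketch below returns the labels a dialog of this kind might list for a table. It is an assumption-laden illustration: the helper names column_letter and field_labels are hypothetical, the table is assumed to start in column A, and the sample values are invented.

```python
def column_letter(index):
    """Convert a zero-based column index to a spreadsheet letter (0 -> A, 26 -> AA)."""
    letters = ""
    n = index + 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def field_labels(table, has_headers):
    """Return heading names if the first row holds headings, else column letters."""
    if has_headers:
        return [str(cell) for cell in table[0]]
    # Assumes the table starts in column A of the worksheet.
    return [column_letter(i) for i in range(len(table[0]))]


# Hypothetical table; the heading names follow the example in this article,
# but the values in the data row are invented.
table = [
    ["Student ID", "Last Name", "Initial", "Program"],
    ["ST348-245", "Thompson", "A.", "Arts"],
]

print(field_labels(table, has_headers=True))
# ['Student ID', 'Last Name', 'Initial', 'Program']
print(field_labels(table, has_headers=False))
# ['A', 'B', 'C', 'D']
```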

Continuous data range

For the Remove Duplicates tool to work correctly, the data table must be a contiguous range of data. The table must not contain empty rows or columns and, if possible, should not contain empty cells.

Avoiding gaps in a data table is good practice for data management in general, not only when you are looking for duplicate data. Other Excel tools, such as sorting and filtering, also work best when the data table is a contiguous range.
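One way to check a table for gaps before removing duplicates is sketched below. It assumes the table has been exported as a list of rows. The function name find_gaps and the sample values are hypothetical, and a cell counts as empty here if it is None or contains only whitespace.

```python
def find_gaps(table):
    """Return (empty_rows, empty_columns, other_empty_cells) as index lists."""
    def is_blank(cell):
        return cell is None or str(cell).strip() == ""

    width = max(len(row) for row in table)
    # Pad short rows so that every row has the same number of cells.
    padded = [list(row) + [None] * (width - len(row)) for row in table]

    empty_rows = [r for r, row in enumerate(padded) if all(is_blank(c) for c in row)]
    empty_cols = [c for c in range(width) if all(is_blank(row[c]) for row in padded)]
    # Blank cells that are not already covered by an empty row or column.
    empty_cells = [
        (r, c)
        for r, row in enumerate(padded)
        for c, cell in enumerate(row)
        if is_blank(cell) and r not in empty_rows and c not in empty_cols
    ]
    return empty_rows, empty_cols, empty_cells


# Hypothetical table with one empty row, one empty column, and one empty cell.
table = [
    ["Last Name", "Initial", "", "Program"],
    ["Thompson", "A.", "", "Arts"],
    ["", "", "", ""],
    ["Holt", "R.", "", ""],
]

rows, cols, cells = find_gaps(table)
print(rows)   # [2]
print(cols)   # [2]
print(cells)  # [(3, 3)]
```

Indexes are zero-based, so row 2 is the third row of the table and column 2 is its third column.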

Example of deleting duplicate data records

In the image above, the data table contains two identical entries for A. Thompson and two overlapping entries for R. Holt. In the R. Holt entries, all fields are the same except for the student number.

The steps below describe how to use the Remove Duplicates tool to:

  • Delete the second of two identical entries for A. Thompson.

  • Remove the second overlapping entry for R. Holt.

Open the Remove Duplicates dialog box

  1. Select a cell with data in the sample database.

  2. Select the Data tab.

  3. Select Remove Duplicates. Excel selects all data in the data table and opens the Remove Duplicates dialog box.

This is what you will find in the Remove Duplicates dialog box:

  • The Remove Duplicates dialog box lists any column headings or field names from the sample data.

  • The check marks next to the field names indicate which columns Excel will match when finding duplicate records.

  • When the dialog box opens, all field names are selected.

Find identical records

This example looks for identical records, so leave all column headings selected and select OK.

Here is the result:

  • The dialog box closes and is replaced by the message: 1 duplicate values found and removed; 7 unique values remain.

  • The row containing the second entry for A. Thompson is removed from the database.

  • Both R. Holt records remain because not all of their fields match. The student numbers of the two records differ, so Excel treats each one as a unique record.

Find and remove overlapping records with Duplicate Remover

Check one box at a time

In the previous example, Excel deleted the records that matched exactly in all of the selected fields. To find overlapping records, clear the check box for only one field at a time, as shown in the steps below.

Repeating the search with a different field cleared each time, such as name, age, or program, covers the possible combinations of overlapping records.
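The one-field-at-a-time approach can be sketched as a loop that ignores each field in turn and reports which records match on all the others. This is an illustration only. It reports candidates without deleting anything, the function name overlaps_ignoring_each_field is hypothetical, and apart from the student ID ST348-252 used later in this article, all sample values are invented.

```python
from collections import defaultdict


def overlaps_ignoring_each_field(records, fields):
    """For each field, list groups of records that match on every other field."""
    report = {}
    for ignored in fields:
        compared = [f for f in fields if f != ignored]
        groups = defaultdict(list)
        for position, record in enumerate(records):
            groups[tuple(record[f] for f in compared)].append(position)
        # Keep only the groups that contain more than one record.
        matches = [positions for positions in groups.values() if len(positions) > 1]
        if matches:
            report[ignored] = matches
    return report


# Hypothetical records; only the ID ST348-252 comes from the article's example.
records = [
    {"student_id": "ST348-245", "last_name": "Holt", "initial": "R.", "program": "Arts"},
    {"student_id": "ST348-252", "last_name": "Holt", "initial": "R.", "program": "Arts"},
    {"student_id": "ST348-250", "last_name": "Kim", "initial": "S.", "program": "Science"},
]
fields = ["student_id", "last_name", "initial", "program"]

report = overlaps_ignoring_each_field(records, fields)
print(report)  # {'student_id': [[0, 1]]}
```

In this sample, only ignoring the student ID reveals an overlap, between the records at positions 0 and 1. Reviewing such a report first makes it easier to decide which check box to clear before any rows are actually removed.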

Find partially matching records

  1. If necessary, select a cell with data in the data table.

  2. Select the Data tab.

  3. Select Remove Duplicates. Excel selects all data in the data table and opens the Remove Duplicates dialog box.

  4. All field names or column headings for the data table are selected.

  5. To find and remove entries that don’t match in every field, clear the check box next to each field name that Excel should ignore.

  6. In this example, clear the check box next to the Student ID column heading.

  7. Excel will now find and delete records with matching data in the Last Name, Initial, and Program fields.

  8. Select OK.

  9. The dialog box closes and is replaced by the message: 1 duplicate values found and removed; 6 unique values remain.

  10. The row with the second entry for R. Holt, with student ID ST348-252, is removed from the database.

  11. Select OK to close the message box.

The data table is now free of duplicate data.
