When you work with text data in Excel, knowing the various techniques for extracting substrings from text strings in a cell is helpful.
You may need to extract parts or substrings of the text in a cell for analysis or to create new data. For example, you may need to extract usernames from email addresses or file extensions from filenames.
Excel does not have a specialized substring function to extract parts of text strings from a cell. However, it has an array of text functions you can use in text formulas to achieve this.
In addition, Excel offers the Find and Replace and the Text to Columns features, which you can use for the same purpose.
In this tutorial, I will show you how to extract substrings from the text in a cell using text formulas, the Find and Replace feature, and the Text to Column feature.
Extract Substrings From Text Strings Using Text Formulas
You can use Excel’s text functions in formulas to extract substrings from text in cells. The text in cells can be of uniform or varying lengths.
Extract Substrings from Text Strings of Uniform Length
In this section, I will show you, through examples, how to extract substrings from text data of uniform length.
Example #1.1: Extract the First Four Characters of Product Codes
You can use the TEXTBEFORE or LEFT functions to extract the first four characters from product codes of uniform length.
Use the LEFT Function
Suppose you have a list of product codes of uniform length and want to extract the first four characters.
The formula below will do the job:
=LEFT(A2,4)
The above formula uses the LEFT function to extract the first four characters from product codes in column A by taking the first four characters.
Use the TEXTBEFORE Function (Only Available in Excel 365)
Suppose you have a list of product codes of uniform length and want to extract the first four characters.
The formula below will do the job:
=TEXTBEFORE(A2, "-")
The above formula uses the TEXTBEFORE function to extract the first four characters from the product codes in column A by taking all the characters before the first hyphen.
Also read: How to Extract Number from Text in Excel (Beginning, End, or Middle)
Example #1.2: Extract the Last Three Characters of Product Codes
You can use the RIGHT or TEXTAFTER functions to extract the last three characters of product codes.
Use the RIGHT Function
Suppose you have a list of product codes of uniform length and want to extract the last three characters.
Here’s the text formula you can utilize:
=RIGHT(A2,3)
The formula above uses the RIGHT function to extract the last three characters of the product codes by taking three characters from the right side of the codes.
Use the TEXTAFTER Function (Only Available in Excel 365)
Suppose you have a list of product codes of uniform length and want to extract the last three characters.
Here’s the text formula you can utilize:
=TEXTAFTER(A2,"-",2)
The above formula extracts the last three characters of the product codes by taking all the characters after the second instance of the hyphen (-) character.
Example #1.3: Extract the Middle Six Characters of Product Codes
You can utilize the MID function to extract the middle six characters from product codes.
Suppose you have a list of product codes of uniform length and want to extract the middle six characters.
Here’s the text formula that you can use:
=MID(A2, 6, 6)
The above formula utilizes the MID function to extract the middle six characters from product codes in column A, beginning with the sixth character.
Extract Substrings from Text Strings of Varying Lengths
Extracting substrings from text data of uniform length using the LEFT, RIGHT, and MID functions is straightforward because you know the exact position of the characters you are targetting.
Extracting substrings from text data of varying lengths is not straightforward because you do not know, ahead of time, exactly where to start the extraction.
You will often need to find specific characters to get the appropriate starting position for the extraction using Excel’s FIND function.
With the FIND function, you can get the position number of a particular character and use that character’s position in other operations.
In this section, I will show you, through examples, how to extract substrings from text strings of varying lengths.
Example #2.1: Extract File Extensions From Filenames
You can utilize the TEXTAFTER or the combination of RIGHT, LEN, and FIND functions to extract file extensions from filenames.
Use RIGHT, LEN, and FIND Functions
Suppose you have a list of filenames and need to extract the file extensions.
Here’s a formula you can use:
=RIGHT(A2, LEN(A2) - FIND(".", A2))
The formula above extracts the extensions from the filenames by extracting the substring from a cell starting from the character immediately following the first occurrence of a period (“.”) to the end of the text string.
Use the TEXTAFTER Function (Only Available in Excel 365)
Suppose you have a list of filenames and need to extract the file extensions.
Here’s a formula you can use:
=TEXTAFTER(A2,".")
The above formula extracts file extensions from the filenames by taking all the characters after the period (“.”).
Also read: Remove a Specific Character from a String in Excel
Example #2.2: Extract Last Name and First Name From Full Names
You can use the TEXTBEFORE, TEXTAFTER, or the combination of RIGHT, LEFT, and FIND functions to extract the first and last names from full names.
Use the Combination of RIGHT, LEFT, and FIND Functions
Suppose you have the list of full names in one column below and want to extract the first and last names into separate columns.
You can use the formula below to extract the first name:
=LEFT(A2, FIND(" ", A2) - 1)
The above formula extracts the first names from the full names in column A by finding the position of the first space using the FIND function and then feeding that position into the LEFT function to extract the characters up to (but not including) the space.
Note: You can adapt the above formula to extract user names from email addresses. All you need to do is replace the space character ( ” “) inside the FIND function with the ‘@’ character, as shown below.
To extract the last name, you can use the formula below:
=RIGHT(A2, LEN(A2) - FIND(" ", A2))
The above formula extracts the text after the first space (last name) in full names in column A using the steps below:
- The FIND function returns the position number of the first space in the full name.
- The LEN function returns the total number of characters in the full name.
- The position number returned by the FIND function is subtracted from the total number of characters returned by the LEN function, and this results in the total number of characters after the first space.
- The RIGHT function uses the number calculated in step 3 above to extract the last name.
Note: You can adapt the above formula to extract full domain names from email addresses. All you need to do is replace the space character ( ” “) inside the FIND function with the ‘@’ character, as shown below:
Note: The above formula extracts the full domain name, which consists of two parts: the second-level domain (SLD) name and the top-level domain (TLD) name. For example, in the ‘example.com’ domain name, ‘example’ is the second-level domain name, and the suffix ‘.com’ is the top-level domain name.
Use the TEXTBEFORE and TEXTAFTER Functions (Only Available in Excel 365)
Suppose you have the list of full names in one column below and want to extract the first and last names into separate columns.
You can use the formula below to extract the first name:
=TEXTBEFORE(A2," ")
The above formula extracts the first names from the full names in column A by taking all the characters before the space (” “) character.
Note: You can adapt the above formula to extract user names from email addresses. All you need to do is replace the space character ( ” “) inside the TEXTBEFORE function with the ‘@’ character, as shown below:
You can use the below formula to extract the last name:
=TEXTAFTER(A2," ")
The above formula extracts the last names from the full names in column A by taking all the characters after the space (” “) character.
Note: You can adapt the above formula to extract full domain names from email addresses. All you need to do is replace the space character ( ” “) inside the TEXTAFTER function with the ‘@’ character, as shown below:
Example #2.3: Extract Second Level Domain (SLD) Names From Email Addresses
Sometimes, you may need to extract a substring from the middle of a text string. For instance, extract the second-level domain (SLD) names from email addresses, such as ‘example’ from ‘alice.smith@example.com.’
To extract a substring from the middle of a text string, you must determine the number positions of the characters immediately before and after the substring.
Suppose you have the following dataset consisting of email addresses and want to extract second-level domain names.
Here’s the formula you can use for the task:
=MID(A2,FIND("@",A2)+1,FIND(".",A2,FIND("@",A2))-FIND("@",A2)-1)
The above formula extracts the second-level domain name from the email addresses.
Here’s a step-by-step explanation of how the formula works:
- FIND(“@”, A2) + 1: This portion of the formula finds the number position of the ‘@’ character and adds 1 to it to get the number position of the first character of the second-level domain name.
- FIND(“.”, A2, FIND(“@”, A2)) – FIND(“@”, A2) – 1: This portion of the formula calculates the length of the second-level domain name by subtracting the number position of the ‘@’ character from the number position of the first period after the ‘@’ character and then subtracting 1.
- MID(A2, FIND(“@”, A2) + 1, FIND(“.”, A2, FIND(“@”, A2)) – FIND(“@”, A2) – 1): The MID function extracts a substring from the email address starting at the number position found in the first step above, which is the first character of the domain name, and of the length calculated in the second step above, which is the length of the second-level domain name.
Note: You can adapt the above formula to extract the middle name from full names by replacing the ‘@’ character with the space (” “) character, as shown below:
Also read: Extract ZIP Code from Address in Excel
Extract Substrings From Text Strings Using the Text to Columns Feature
Using text formulas to extract substrings will give dynamic results that update as the target strings change.
However, if you only need static results, you can use the ‘Text to Columns’ feature to divide text strings into substrings.
Suppose you have the list of full names in one column below and want to extract the first and last names into separate columns.
Here’s how to do it:
- Select the cell range, excluding the header row.
- On the Data tab, click the Text to Columns icon on the Data Tools group.
- On the Convert Text to Columns Wizard—Step 1 of 3 dialog box, select the’ Delimited’ option and click Next.
- On the Convert Text to Columns Wizard—Step 2 of 3 dialog box, select the ‘Space’ delimiter and click Next.
Note: Excel will use the delimiter you choose to split the text string into substrings. Excel previews how the substrings will look in the ‘Data preview’ area at the bottom of the dialog box.
- On the Convert Text to Columns Wizard—Step 3 of 3 dialog box, select the appropriate data format for each column and the destination cell where you want the data and click Finish.
In this example, we have selected the default ‘General’ number format, but you can choose a different number format appropriate for your data. The default destination cell is where you have the original text strings, but you can choose a different cell (as we have done) if you want to retain the original data.
The Convert Text to Columns Wizard splits the full names into first and last names.
Extract Substrings From Text Strings Using the Find and Replace Feature
The Find and Replace feature in Excel allows you to search for specific text or values within a worksheet and replace them with different text or values. You can utilize this feature to extract substrings from text strings.
Let’s say you have a list of email addresses and want to extract usernames using the Find and Replace feature.
Here’s how to do it:
- Copy the original data so that you work on the copy and preserve the original. The Find and Replace feature changes data.
- Select the data.
- On the Home tab, open the Find & Select drop-down on the Editing group and select the ‘Replace’ option.
Alternatively, you can press CTRL + H.
- Do the following on the Find and Replace dialog box that appears:
- Enter @* on the ‘Find what’ drop-down.
- Leave the ‘Replace with’ drop-down empty.
- Click Replace All.
- Click Close.
The above steps remove all text after the ‘@’ character, including the ‘@’ character, from the email addresses, leaving only the usernames as shown below:
In the above steps, we instructed the Find and Replace feature to find all occurrences of ‘@*’ and replace them with empty strings.
The asterisk (*) wildcard character represents any sequence of characters, so the Find and Replace feature replaces all ‘@’ characters and all characters after them with empty strings.
Note: If you want to extract the full domain names from the email addresses, all the steps above will remain the same except on the Find and Replace dialog box, you will enter ‘*@’ instead of ‘@*’ on the ‘Find what’ drop-down menu:
The Find and Replace feature will, this time around, replace the ‘@’ characters and all the text before them with empty strings, leaving only the full domain names as shown below:
In this tutorial, I showed you how to extract part of text in a cell using text formulas, the Text to Columns feature, and the Find and Replace feature. I hope you found the tutorial helpful.
Other Excel articles you may also like: