Extracting numbers from a list of cells with mixed text is a common data cleaning task.
Unfortunately, there is no direct menu or function created in Excel to help us accomplish this.
In this tutorial we will look at three cases where you might have a list of mixed text, from which you might want to extract numbers:
- When the number is always at the end of the text
- When the number is always at the beginning of the text
- When the number can be anywhere in the text
We will look at three different formulas that can be used to extract the numbers in each case.
At the end of the tutorial, we will also take a look at some VBA code that you can use to accomplish the same.
Brace yourself, this might get a little complex!
Extracting a Number from Mixed Text when Number is Always at the End of the Text
Consider the following example dataset:
Here, each cell has a mix of text and numbers, with the number always appearing at the end of the text. In such cases, we will need to use a combination of nested Excel functions to extract the numbers.
The functions we will use are:
- FIND – This function searches for a character or string in another string and returns its position.
- MIN – This function returns the smallest value in a list.
- LEFT – This function extracts a given number of characters from the left side of a string.
- SUBSTITUTE – This function replaces a particular substring of a given text with another substring.
- IFERROR – This function returns an alternative result or formula if it finds an error in a given formula.
Essentially, we will be using the above functions altogether to perform the following sequence of tasks:
- Find the position of the first numeric value in the given cell
- Extract and remove the text part of the given cell (by removing everything to the left of the first numeric digit)
The formula that we will use to extract the numbers from cell A2 is as follows:
=SUBSTITUTE(A2,LEFT(A2,MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),""))-1),"")
Let us break down this formula to understand it better. We will go from the inner functions to the outer functions:
- FIND({0,1,2,3,4,5,6,7,8,9},A2)
This function tries to find the positions of all the numbers (0-9) in the cell A2. Thus it returns an arrayformula:
{#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,9,7,8,#VALUE!}
It returns a #VALUE! error for all digits except the 7th, 8th, and 9th digits because it was not able to find these numbers (0,1,2,3,4,5,9) in cell A2. It simply returns the positions of numbers 6,7 and 8 which are the 9th, 7th, and 8th characters respectively in A2.
- IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””)
Next, the IFERROR function replaces all the error elements of the array with a blank (“”). As such it returns the arrayformula:
{“”,””,””,””,””,””,9,7,8,””}
- MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))
After this the MIN function finds the array element with least value. This is basically the position of the first numeric character in A2. The function now returns the value 7.
- LEFT(A2,MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))-1)
At this point we want to extract all text characters from A2 (so that we can remove them). So we want the LEFT function to extract all the characters starting backwards from the 7-1= 6th character. Thus we use the above formula. The result we get at this point is “arnold”. Notice we subtracted 1 from the second parameter of the LEFT function.
- SUBSTITUTE(A2,LEFT(A2,MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))-1),””)
Now all that’s left to do is remove the string obtained, by replacing it with a blank. This can be easily achieved by using the SUBSTITUTE function: We finally get the numeric characters in the mixed text, which is “786”.
Once you are done entering the formula, make sure you press CTRL+SHIFT+Enter, instead of just the return key. This is because the formula involves arrays.
In a nutshell, here’s what’s happening when you break down the formula:
=SUBSTITUTE(A2,LEFT(A2,MIN(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))-1),””)
=SUBSTITUTE(A2,LEFT(A2,MIN(IFERROR({{#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!,9,7,8,#VALUE!}),””))-1),””)
=SUBSTITUTE(A2,LEFT(A2,MIN({“”,””,””,””,””,””,9,7,8,””}))-1),””)
=SUBSTITUTE(A2,”arnold”,””)
=786
When you drag the formula down to the rest of the cells, here’s the result you get:
Also read: How to Generate Random Letters in Excel?
Extracting a Number from Mixed Text when Number is Always at the Beginning of the Text
Now let us consider the case where the numbers are always at the beginning of the Text.
Consider the following example:
Here, each cell has a mix of text and numbers, with the number always appearing at the beginning of the text. In such cases, we will again need to use a combination of nested Excel functions to extract the numbers.
In addition to the functions we used in the previous formula, we are going to use two additional functions. These are:
- MAX – This function returns the largest value in a list
- RIGHT – This function extracts a given number of characters from the right side of a string
- LEN – This function finds the length of (number of characters in) a given string.
Essentially, we will be using these functions altogether to perform the following sequence of tasks:
- Find the position of the last numeric value in the given cell
- Extract and remove the text part of the given cell (by removing everything to the right of the last numeric digit)
The formula that we will use to extract the numbers from cell A2 is as follows:
=SUBSTITUTE(A2,RIGHT(A2,LEN(A2)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),""))),"")
Let us break down this formula to understand it better. We will go from the inner functions to the outer functions:
- FIND({0,1,2,3,4,5,6,7,8,9},A2)
This function tries to find the positions of all the numbers (0-9) in the cell A2. Thus it returns an arrayformula:
{#VALUE!,#VALUE!,1,#VALUE!,#VALUE!,2,#VALUE!,#VALUE!,#VALUE!,#VALUE!}
It returns a #VALUE! error for all digits except the 3rd and 6th digits because it was not able to find these numbers (0,1,3,4,6,7,8,9) in cell A2. It simply returns the positions of numbers 2 and 5 which are the 1st and 2nd characters respectively in A2.
- IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””)
Next, the IFERROR function replaces all the error elements of the array with a blank. As such it returns the array formula:
{“”,””,1,””,””,2,””,””,””,””}
- MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))
After this the MAX function finds the array element with the highest value. This is basically the position of the last numeric character in A2. The function now returns the value 2.
- LEN(A2)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))
At this point we want to extract all text characters from A2 (so that we can remove them). We need to specify how many characters we want to remove. This is obtained by computing the length of the string in A2 minus the position of the last numeric value.Thus we will get the value 8-2 = “6”.
- RIGHT(A2,LEN(A2)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””)))
We can now use the RIGHT function to extract 6 characters starting from the 2nd character onwards. The result we get at this point is “arnold”.
- SUBSTITUTE(A2,RIGHT(A2,LEN(A2)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A2),””))),””)
Now all that’s left to do is remove this string by replacing it with a blank. This is easily achieved by using the SUBSTITUTE function. We finally get the numeric characters in the mixed text, which is “25”.
Again, once you are done entering the formula, don’t forget to press CTRL+SHIFT+Enter, instead of just the return key.
In a nutshell, here’s what’s happening when you break down the formula:
=SUBSTITUTE(A7,RIGHT(A7,LEN(A7)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},A7),””))),””)
=SUBSTITUTE(A7,RIGHT(A7,LEN(A7)-MAX(IFERROR({#VALUE!,#VALUE!,1,#VALUE!,#VALUE!,2,#VALUE!,#VALUE!,#VALUE!,#VALUE!}
),””))),””)
=SUBSTITUTE(A7,RIGHT(A7,LEN(A7)-MAX({“”,””,1,””,””,2,””,””,””,””}
)),””)
=SUBSTITUTE(A7,RIGHT(A7,LEN(A7)-2),””)
=SUBSTITUTE(A7,RIGHT(A7,8-2),””)
=SUBSTITUTE(A7,”arnold”,””)
=25
When you drag the formula down to the rest of the cells, here’s the result you get:
Also read: How to Convert to Sentence Case in Excel?
Extracting a Number from Mixed Text when Number can be Anywhere in the Text
Finally let us consider the case where the numbers can be anywhere in the text, be it the beginning, end or middle part of the text.
Let’s take a look at the following example:
Here, each cell has a mix of text and numbers, where the number appears in any part of the text. In such cases, we will need to use a combination of nested Excel functions to extract the numbers.
Here are the functions that we are going to use this time:
- INDIRECT – This function simply returns a reference to a range of values.
- ROW – This function returns a row number of a reference.
- MID – This function extracts a given number of characters from the middle of a string.
- TEXTJOIN – This function combines text from multiple ranges or strings using a specified delimiter between them.
Note that the formula discussed in this section will work only in Excel version 2016 onwards as it uses the newly introduced TEXTJOIN function. If you are using an older version of Excel, then you may consider using the VBA method instead (discussed in the next section).
In this method, we will essentially be using the above functions altogether to perform the following sequence of tasks:
- Break up the given text into an array of individual characters
- Find out and remove all characters that are not numbers
- Combine the remaining characters into a full number
The formula that we will use to extract the numbers from cell A2 is as follows:
=TEXTJOIN("",TRUE,IFERROR(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)*1,""))
Let us break down this formula to understand it better. We will go from the inner functions to the outer functions:
- LEN(A2)
This function finds the length of the string in cell A2. In our case it returns 13.
- INDIRECT(“1:”&LEN(A2))
This function just returns a reference to all the rows from row 1 to row 12.
- ROW(INDIRECT(“1:”&LEN(A2)))
Now this function returns the row numbers of each of these rows. In other words, it simply returns an array of numbers 1 to 12. This function thus returns the array:
{1;2;3;4;5;6;7;8;9;10;11;12;13}
Note: We use the ROW() function instead of simply hard-coding an array of numbers 1 to 12 because we want to be able to customize this formula depending on the size of the string being worked on. This will ensure that the function adjusts itself when copied to another cell. ·
- MID(A2,ROW(INDIRECT(“1:”&LEN(A2))),1)
Next the MID function extracts the character from A2 that corresponds to each position specified in the array. In other words, it returns an array containing each character of the text in A2 as a separate element, as follows:
{“a”;”r”;”n”;”o”;”l”;”d”;”1″;”4″;”3″;”b”;”l”;”u”;”e”}
- IFERROR(MID(A2,ROW(INDIRECT(“1:”&LEN(A2))),1)*1,””)
After this, the IFERROR function replaces all the elements of the array that are non-numeric. This is because it checks if the array element can be multiplied by 1. If the element is a number, it can easily be multiplied to return the same number. But if the element is a non-numeric character, then it cannot be multiplied with 1 and thus returns an error.
The IFERROR function here specifies that if an element gives an error on multiplication, the result returned will be blank. As such it returns the arrayformula:
{“”;””;””;””;””;””;1;4;3;””;””;””;””}
- TEXTJOIN(“”,TRUE,IFERROR(MID(A2,ROW(INDIRECT(“1:”&LEN(A2))),1)*1,””))
Finally, we can simply combine the array elements together using the TEXTJOIN function. The TEXTJOIN function here combines the string characters that remain (which are the numbers only) and ignores the empty string characters. We finally get the numeric characters in the mixed text, which is “143”.
Once you are done entering the formula, don’t forget to press CTRL+SHIFT+Enter, instead of just the return key.
In a nutshell, here’s what’s happening when you break down the formula:
=TEXTJOIN(“”,TRUE,IFERROR(MID(A2,ROW(INDIRECT(“1:”&LEN(A2))),1)*1,””))
=TEXTJOIN(“”,TRUE,IFERROR(MID(A2,ROW(INDIRECT(“1:”&13)),1)*1,””))
=TEXTJOIN(“”,TRUE,IFERROR(MID(A2,{1;2;3;4;5;6;7;8;9;10;11;12;13},1)*1,””))
=TEXTJOIN(“”,TRUE,IFERROR({“a”;”r”;”n”;”o”;”l”;”d”;”1″;”4″;”3″;”b”;”l”;”u”;”e”}
*1,””))
=TEXTJOIN(“”,TRUE,IFERROR(MID(A2,{1;2;3;4;5;6;7;8;9;10;11;12;13},1)*1,””))
=TEXTJOIN(“”,TRUE, {“”;””;””;””;””;””;1;4;3;””;””;””;””})
=143
When you drag the formula down to the rest of the cells, here’s the result you get:
Also read: Add Sequential Numbers in Excel
Using VBA to Extract Number from Mixed Text in Excel
The above method works well enough in extracting numbers from anywhere in a mixed text.
However, it requires one to use the TEXTJOIN function, which is not available in older Excel versions (versions before Excel 2016).
If you’re on a version of Excel that does not support TEXTJOIN, then you can, instead, use a snippet of VBA code to get the job done.
If you have never used VBA before, don’t worry. All you need to do is copy the code below into your VBA developer window and run it on your worksheet data.
Here’s the code that you will need to copy:
'Code by Steve Scott from spreadsheetplanet.com
Function ExtractNumbers(CellRef As String)
Dim StringLength As Integer
StringLength = Len(CellRef)
For i = 1 To StringLength
If (IsNumeric(Mid(CellRef, i, 1))) Then Result = Result & Mid(CellRef, i, 1)
Next i
ExtractNumbers = Result
End Function
The above code creates a user-defined function called ExtractNumbers() that you can use in your worksheet to extract numbers from mixed text in any cell.
To get to your VBA developer window, follow these steps:
- From the Developer tab, select Visual Basic.
- Once your VBA window opens, Click Insert->Module. That’s it, you can now start coding.
Type or copy-paste the above lines of code into the module window. Your code is now ready to run.
Now, whenever you want to extract numbers from a cell, simply type the name of the function, passing the cell reference as a parameter. So to extract numbers from a cell A2, you will simply need to type the function as follows in a cell:
=ExtractNumbers(A2)
Explanation of the Code
Now let us take some time to understand how this code works.
- In this code, we defined a function named ExtractNumbers, that takes the string in the cell we want to work on. We assigned the name CellRef to this string.
Function ExtractNumbers(CellRef As String)
- We created a variable named StringLength, that will hold the length of the string, CellRef.
Dim StringLength As Integer
StringLength = Len(CellRef)
- Next, we loop through each character in the string CellRef and find out if it is a number. We use the function Mid(CellRef, i, 1) to extract a character from the string at each iteration of the loop. We also use the IsNumeric() function to find out if the extracted character is a number. Each extracted numeric character is combined together into a string called Result.
For i = 1 To StringLength
If (IsNumeric(Mid(CellRef, i, 1))) Then Result = Result & Mid(CellRef, i, 1)
Next i
- This Result is then returned by the function.
ExtractNumbers = Result
Note that since the workbook now has VBA code in it, you need to save it with .xls or .xlsm extension.
You can also choose to save this to your Personal Macro Workbook, if you think you will be needing to run this code a lot. This will allow you to run the code on any Excel workbook of yours.
Well, that was a lot!
In this tutorial, we showed you how to extract numbers from mixed text in excel.
We saw three cases where the numbers are situated in different parts of the text. We also showed you how to use VBA to get the task done quickly.
Other Excel articles you may also like:
- How to Extract URL from Hyperlinks in Excel (Using VBA Formula)
- How to Reverse a Text String in Excel (Using Formula & VBA)
- How to Add Text to the Beginning or End of all Cells in Excel
- How to Remove Text after a Specific Character in Excel (3 Easy Methods)
- How to Separate Address in Excel?
- How to Extract Text After Space Character in Excel?
- How to Find the Last Space in Text String in Excel?
- How to Remove Space before Text in Excel
- How to Extract Part of Text in a Cell in Excel
What if there are more than 2 numbers in the string and we want both numbers to be shown separately in 2 different columns?
It appears that if there is a repeated number e.g. 1626 has two 6. The formula [ =SUBSTITUTE(AJ8,RIGHT(AJ8,LEN(AJ8)-MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9},AJ8),””))),””) ] will only return one of these numbers e.g. 1626ven will return 162.
Steve,
Liked the VBA solution on Excel 2010. Thanks!
Had a case where the returned Result was seen as text so math functions didn’t work. Used =VALUE() to solve that. probably more efficient ways to do the same as needed.
To YOGENDER’s question, modifying the code to break apart additional numbers is possible. Here’s a quick modification:
Function ExtractNumbersExtended(CellRef As String)
‘ Code by Steve Scott from spreadsheetplanet.com
Dim StringLength As Integer
Dim NumberLetterBreak As Boolean
Dim Result ‘ not sure it would be good to force a type?
StringLength = Len(CellRef)
For i = 1 To StringLength
Zot = Mid(CellRef, i, 1) ‘ isolated the MID result for easier debugging
If (IsNumeric(Zot)) Then
Result = Result & Mid(CellRef, i, 1)
NumberLetterBreak = True
End If
If Not (IsNumeric(Zot)) And NumberLetterBreak Then
NumberLetterBreak = False
Result = Result & ” X ”
End If
Next i
ExtractNumbersExtended = Result
End Function
Second IF block checks if a number HAS already been encountered in the string [flag: NumberLetterBreak] and the current character IS NOT a number. It then resets the flag and adds a delimiter,” X ” , to the Result output but other methods could be used.
Output: Source:
27 X 11 X After 27 years with 11 Zots
27 X 11 X 22 X After 27 years with 11 Zots we look toward 22 more!