I’ve covered the main elements of regex patterns and the specific syntax used in Excel’s REGEXEXTRACT function. Here’s a summary:
Regex Patterns:
- `.` (dot)
- `*` (star)
- `+` (plus)
- `?` (question mark)
- `{n,m}` (curly braces)
- `[...]` (square brackets)
- `^` (caret)
- `$` (dollar sign)
- `|` (pipe)
- `(` and `)` (parentheses)
- `\` (backslash)
- `\w`, `\W`, `\s`, `\S`, `\d`, `\D`, `\b`, `\B`
- `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`
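To make a few of these symbols concrete, here is a minimal sketch of the dot, the anchors, and the backslash escape. It uses Python's `re` module as a stand-in, since Excel formulas can't be run here; the sample strings are invented, and Excel's engine may differ in edge cases.

```python
import re

# "." matches any single character except a newline (by default).
dot_matches = re.findall(r"c.t", "cat cot c-t ct")
print(dot_matches)  # ['cat', 'cot', 'c-t']

# "^" anchors the match to the start of the text, "$" to the end.
starts_with_order = bool(re.search(r"^Order", "Order 1234"))
ends_with_1234 = bool(re.search(r"1234$", "Order 1234"))
starts_with_1234 = bool(re.search(r"^1234", "Order 1234"))
print(starts_with_order, ends_with_1234, starts_with_1234)  # True True False

# A backslash makes a metacharacter literal: "\." matches a real dot.
decimals = re.findall(r"\d+\.\d+", "v1.5 and 2x3 and 10.25")
print(decimals)  # ['1.5', '10.25']
```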
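REGEXEXTRACT Function:
`REGEXEXTRACT(text, pattern, [return_mode], [case_sensitivity])`
- `text`: The text string to extract from
- `pattern`: The regex pattern to match
- `[return_mode]`: Optional; 0 (the default) returns the first match, 1 returns all matches as an array, and 2 returns the capture groups of the first match as an array
- `[case_sensitivity]`: Optional; 0 (the default) matches case-sensitively, 1 matches case-insensitively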
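To illustrate the difference between extracting a whole match and extracting capture groups, here is a rough Python sketch. The helper names `first_match`, `all_matches`, and `capture_groups` are hypothetical and are not part of Excel. They only mimic the idea using Python's `re` module, and the sample text is invented.

```python
import re

def first_match(text, pattern):
    """Return the first substring that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def all_matches(text, pattern):
    """Return every non-overlapping full match as a list."""
    return [m.group(0) for m in re.finditer(pattern, text)]

def capture_groups(text, pattern):
    """Return the capture groups of the first match as a list."""
    m = re.search(pattern, text)
    return list(m.groups()) if m else []

text = "Invoice INV-2041 dated 2024-03-15, ref INV-2042"
print(first_match(text, r"INV-\d+"))                      # INV-2041
print(all_matches(text, r"INV-\d+"))                      # ['INV-2041', 'INV-2042']
print(capture_groups(text, r"(\d{4})-(\d{2})-(\d{2})"))   # ['2024', '03', '15']
```

Group 0 is always the entire match; numbered groups correspond to the parentheses in the pattern, counted from left to right.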
With these elements and syntax, you can create a wide range of regex patterns to extract specific text from strings using the REGEXEXTRACT function in Excel.
Remember, regex is a powerful tool with a lot of depth.
The summary I provided earlier is a general overview of regex syntax, but there are some additional elements that can be used in regex patterns:
- `\` (backslash): Escapes special characters or denotes a special sequence
- `\w` (word character): Matches any letter, digit, or underscore
- `\W` (non-word character): Matches any character that’s not a letter, digit, or underscore
- `\s` (whitespace): Matches any whitespace character (space, tab, newline, etc.)
- `\S` (non-whitespace): Matches any character that’s not whitespace
- `\d` (digit): Matches any digit (0-9)
- `\D` (non-digit): Matches any character that’s not a digit
- `\b` (word boundary): Matches the position between a word character and a non-word character (or the start or end of the text)
- `\B` (non-word boundary): Matches any position that’s not a word boundary
- `(?=...)` (positive lookahead): Asserts that the pattern matches after the current position without including it in the result
- `(?!...)` (negative lookahead): Asserts that the pattern doesn’t match after the current position without including it in the result
- `(?<=...)` (positive lookbehind): Asserts that the pattern matches before the current position without including it in the result
- `(?<!...)` (negative lookbehind): Asserts that the pattern doesn’t match before the current position without including it in the result
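Here is a short sketch of how the shorthand classes, word boundaries, and lookarounds behave. It uses Python's `re` module as a stand-in for Excel's engine, which may differ in details, and the sample sentence is invented.

```python
import re

text = "Item A12 costs $30, item B7 costs $45 (was $60)."

# \d+ : runs of digits
digits = re.findall(r"\d+", text)
print(digits)  # ['12', '30', '7', '45', '60']

# \bcost\b : "cost" as a whole word; it does not match inside "costs"
whole_word = re.findall(r"\bcost\b", text)
print(whole_word)  # []

# (?<=\$)\d+ : digits preceded by "$"; the "$" itself is not returned
prices = re.findall(r"(?<=\$)\d+", text)
print(prices)  # ['30', '45', '60']

# \b[A-Z]\d+(?= costs) : a code only when " costs" follows it
codes = re.findall(r"\b[A-Z]\d+(?= costs)", text)
print(codes)  # ['A12', 'B7']

# \$\d+\b(?!\)) : a price not directly followed by ")".
# The \b stops the engine from backtracking to a partial number like "$6".
not_before_paren = re.findall(r"\$\d+\b(?!\))", text)
print(not_before_paren)  # ['$30', '$45']
```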
These additional elements can be used to create more complex and specific patterns.
Also, like most regex flavors, the one used in Excel’s REGEXEXTRACT function supports core features such as:
- Character classes (e.g., `[a-zA-Z0-9]`)
- Quantifiers (e.g., `*`, `+`, `?`, `{n,m}`)
- Grouping and capturing (e.g., `(...)`)
- Alternation (e.g., `|`)
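As a quick illustration of these four features, here is a minimal sketch using Python's `re` module as a stand-in for Excel's engine. The input strings are made up for the example.

```python
import re

# Character class + quantifier: one or more ASCII letters or digits
tokens = re.findall(r"[a-zA-Z0-9]+", "id_42: ok!")
print(tokens)  # ['id', '42', 'ok']

# {n,m}: whole numbers that are two or three digits long
runs = re.findall(r"\b\d{2,3}\b", "7 42 365 2024")
print(runs)  # ['42', '365']

# ?: the "u" is optional
spellings = re.findall(r"colou?r", "color and colour")
print(spellings)  # ['color', 'colour']

# Alternation inside a group, plus a second capture group for the number
m = re.search(r"(cat|dog)-(\d+)", "pets: dog-17, cat-3")
pair = m.groups()
print(pair)  # ('dog', '17')
```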
Excel’s REGEXEXTRACT function is available in Excel for Microsoft 365 and uses the PCRE2 regex flavor, so it also supports more advanced patterns and features, such as:
- Unicode characters and Unicode property escapes (e.g., `\p{L}` for letters)
- Explicit character classes (e.g., `[a-zA-Z0-9_]` for word characters)
- Character ranges (e.g., `[a-z]` for lowercase letters)
- Lazy quantifiers (e.g., `*?` and `+?`)
- Positive lookbehind assertions (e.g., `(?<=...)`)
- Negative lookbehind assertions (e.g., `(?<!...)`)
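The difference between greedy and lazy quantifiers is easiest to see side by side. This is a small sketch using Python's `re` module as a stand-in for Excel's engine, with invented sample strings.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">"
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? grabs as little as possible, so each tag is matched separately
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# The same idea with *? for quoted strings
quoted = re.findall(r'".*?"', 'say "hi" and "bye"')
print(quoted)  # ['"hi"', '"bye"']
```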
Some examples of patterns that can be used with the REGEXEXTRACT function include:
- `\p{L}`: Matches any Unicode letter
- `\p{N}`: Matches any Unicode numeric character (use `\p{Nd}` for decimal digits only)
- `\p{Z}`: Matches any Unicode separator character (spaces and line or paragraph separators, but not tabs or newlines)
- `[a-zA-Z0-9_]+`: Matches one or more ASCII word characters
- `(?<=abc)def`: Matches “def” only if it’s preceded by “abc”
- `(?<!abc)def`: Matches “def” only if it’s not preceded by “abc”
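These patterns can be tried out in Python as a rough stand-in for Excel. One caveat: Python's built-in `re` module does not support `\p{...}`, so the sketch below approximates the Unicode property escapes with `unicodedata.category`, whose codes start with `L`, `N`, and `Z` for letters, numbers, and separators. The sample strings are invented.

```python
import re
import unicodedata

text = "abcdef xyzdef def"

# (?<=abc)def : "def" only where "abc" comes right before it
after_abc = [m.start() for m in re.finditer(r"(?<=abc)def", text)]
print(after_abc)  # [3]

# (?<!abc)def : "def" only where "abc" does NOT come right before it
not_after_abc = [m.start() for m in re.finditer(r"(?<!abc)def", text)]
print(not_after_abc)  # [10, 14]

# [a-zA-Z0-9_]+ : runs of ASCII word characters
words = re.findall(r"[a-zA-Z0-9_]+", "user_name = value-1")
print(words)  # ['user_name', 'value', '1']

# Approximating \p{L}, \p{N}, \p{Z} with Unicode general categories
sample = "Zo\u00eb 42 \u00df"  # "Zoë 42 ß"
letters = [ch for ch in sample if unicodedata.category(ch).startswith("L")]
numbers = [ch for ch in sample if unicodedata.category(ch).startswith("N")]
separators = [ch for ch in sample if unicodedata.category(ch).startswith("Z")]
print(letters, numbers, separators)
```

The third-party `regex` package for Python does accept `\p{L}` directly, if a closer match to the Excel syntax is needed.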
Keep in mind that the specific syntax and features supported may vary depending on the version of Excel and the regex flavor being used.