Regular expressions (regex) are a compact way to describe a pattern and to match or select a sequence of characters within a larger sequence of characters.
Most Common
Email
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
IP Address
^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
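As a minimal sketch, the IPv4 pattern can be tried with Python's `re` module. The separator between octets is written as `\.` here so that it matches a literal period rather than any character. The names `OCTET`, `IPV4` and `is_ipv4` are hypothetical and chosen only for this example.

```python
import re

# One octet: 0-9, 10-99, 100-199, 200-249 or 250-255.
OCTET = r"(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
# Three "octet + literal dot" pairs followed by a final octet.
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

def is_ipv4(text: str) -> bool:
    """Return True if text is a dotted-quad IPv4 address."""
    return IPV4.match(text) is not None

print(is_ipv4("192.168.0.1"))  # True
print(is_ipv4("256.1.1.1"))    # False
```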
URL Address
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
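This expression appears to be the generic URI-splitting pattern given in RFC 3986, Appendix B. A minimal Python sketch of how its numbered groups map to URL components follows; the sample URL and the `parts` dictionary are assumptions made for illustration.

```python
import re

URL = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

m = URL.match("https://regexone.com/lesson/intro?x=1#top")

# Groups 2, 4, 5, 7 and 9 hold the components without their delimiters.
parts = {
    "scheme": m.group(2),
    "authority": m.group(4),
    "path": m.group(5),
    "query": m.group(7),
    "fragment": m.group(9),
}
print(parts)
```

Note that the pattern splits a string into components rather than validating it: every part is optional, so almost any input matches.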
International Telephone
\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|2[98654321]\d|9[8543210]|8[6421]|6[6543210]|5[87654321]|4[987654310]|3[9643210]|2[70]|7|1)\d{1,14}$
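A rough Python sketch of using the telephone pattern: because the expression has no leading `^`, `fullmatch` is used so the whole string must match. The helper name `is_intl_phone` and the sample numbers are hypothetical.

```python
import re

# Pattern from above: "+", a country calling code, then 1 to 14 digits.
PHONE = re.compile(
    r"\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|2[98654321]\d|"
    r"9[8543210]|8[6421]|6[6543210]|5[87654321]|4[987654310]|3[9643210]|2[70]|7|1)"
    r"\d{1,14}$"
)

def is_intl_phone(text: str) -> bool:
    """Return True if the whole string looks like +<country code><digits>."""
    return PHONE.fullmatch(text) is not None

print(is_intl_phone("+14155552671"))   # True
print(is_intl_phone("14155552671"))    # False, missing "+"
```

The pattern accepts digits only, so spaces, dashes and parentheses would need to be stripped before matching.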
Tutorial
A nice step-by-step tutorial is available here: https://regexone.com/
Anchors

| Term | Description |
| --- | --- |
| ^ | Start of string, or start of line in multi-line pattern |
| \A | Start of string |
| $ | End of string, or end of line in multi-line pattern |
| \Z | End of string |
| \b | Word boundary |
| \B | Not word boundary |
| \< | Start of word |
| \> | End of word |

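The anchors can be sketched in Python as follows. This assumes Python's `re` module, which has no `\<` or `\>`; `\b` covers both cases there. The sample strings are invented for the example.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the start of the string.
single_line = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
multi_line = re.findall(r"^\w+", text, re.MULTILINE)
# \A always means start of string, even in multi-line mode.
start_only = re.findall(r"\A\w+", text, re.MULTILINE)
# \b marks a word boundary, so "line" is found only as a whole word.
whole_words = re.findall(r"\bline\b", "line inline lines line")

print(single_line)   # ['first']
print(multi_line)    # ['first', 'second']
print(start_only)    # ['first']
print(whole_words)   # ['line', 'line']
```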
Quantifiers

| Term | Description |
| --- | --- |
| * | 0 or more |
| {3} | Exactly 3 |
| + | 1 or more |
| {3,} | 3 or more |
| ? | 0 or 1 |
| {3,5} | 3, 4 or 5 |

Note: Add a ? after a quantifier to make it ungreedy (lazy), for example .*? or {3,5}?.
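A short Python sketch of the difference between greedy and ungreedy quantifiers, plus the counted forms. The sample strings are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so there is one long match.
greedy = re.findall(r"<.+>", html)
# Ungreedy: .+? stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

# {3} means exactly three digits; {3,5} means three to five digits.
exact = re.findall(r"\b\d{3}\b", "7 42 123 4567")
ranged = re.findall(r"\b\d{3,5}\b", "7 42 123 4567 123456")

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(exact)   # ['123']
print(ranged)  # ['123', '4567']
```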
Escape Sequences

| Term | Description |
| --- | --- |
| \ | Escape following character |
| \Q | Begin literal sequence |
| \E | End literal sequence |

Note: “Escaping” is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.
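A minimal sketch of escaping in Python. Python's `re` module does not support `\Q...\E`, so `re.escape()` is used here as the closest equivalent for treating a whole string literally. The sample text is an assumption.

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, "." matches any character, so "5.00" also matches "5x00".
loose = re.search(r"5.00", "5x00") is not None
# Escaped with a backslash, "$" and "." are treated literally.
exact = re.search(r"\$5\.00", price).group()

# re.escape() escapes every special character in a literal string.
pattern = re.escape("(approx.)")
found = re.search(pattern, price).group()

print(loose)  # True
print(exact)  # $5.00
print(found)  # (approx.)
```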
Character Classes

| Term | Description |
| --- | --- |
| \c | Control character |
| \s | White space |
| \S | Not white space |
| \d | Digit |
| \D | Not digit |
| \w | Word character (letter, digit or underscore) |
| \W | Not word character |
| \x | Hexadecimal digit |
| \O | Octal digit |

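The shorthand classes can be sketched in Python as below. Support for the hexadecimal and octal digit shorthands varies between flavors, so this example assumes an explicit set (`[0-9A-Fa-f]`) for hex digits instead. The sample strings are invented.

```python
import re

sample = "Order 66:\tship 3 crates_now!"

digits = re.findall(r"\d+", sample)     # runs of digits
words = re.findall(r"\w+", sample)      # runs of letters, digits, underscore
spaces = re.findall(r"\s", sample)      # each whitespace character
non_words = re.findall(r"\W+", sample)  # runs of everything else

# Hexadecimal digits written as an explicit set.
hex_runs = re.findall(r"\b[0-9A-Fa-f]+\b", "ff 1A zz")

print(digits)     # ['66', '3']
print(words)      # ['Order', '66', 'ship', '3', 'crates_now']
print(hex_runs)   # ['ff', '1A']
```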
Special Characters

| Term | Description |
| --- | --- |
| \n | New line |
| \r | Carriage return |
| \t | Tab |
| \v | Vertical tab |
| \f | Form feed |
| \xxx | Octal character xxx |
| \xhh | Hex character hh |

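A small Python sketch of the special characters in use: splitting a tab-separated record with Windows-style line endings, and matching a character by its hex and octal codes. The sample data is hypothetical.

```python
import re

record = "name\tage\r\nAda\t36\r\n"

# Split into lines on \r\n, then split each non-empty line on \t.
rows = [re.split(r"\t", line) for line in re.split(r"\r\n", record) if line]

# \x41 is the hex escape for "A"; \101 is the same character in octal.
hex_match = re.findall(r"\x41", "ABA")
oct_match = re.findall(r"\101", "ABA")

print(rows)       # [['name', 'age'], ['Ada', '36']]
print(hex_match)  # ['A', 'A']
print(oct_match)  # ['A', 'A']
```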
Groups and Ranges

| Term | Description |
| --- | --- |
| . | Any character except new line (\n) |
| (a\|b) | a or b |
| (...) | Group |
| (?:...) | Passive (non-capturing) group |
| [abc] | Range (a or b or c) |
| [^abc] | Not (a or b or c) |
| [a-q] | Lower case letter from a to q |
| [A-Q] | Upper case letter from A to Q |
| [0-7] | Digit from 0 to 7 |
| \x | Backreference to group/subpattern number x (for example \1) |

Note: Ranges are inclusive.
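The grouping and range constructs can be sketched in Python like this. The sample strings and variable names are assumptions for illustration only.

```python
import re

# Alternation inside a capturing group: findall returns the group text.
pets = re.findall(r"(cat|dog)s?", "cats, dog, bird")

# A passive group (?:...) groups without creating a numbered group,
# so the name below is group 1.
name = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace").group(1)

# Ranges are inclusive: [a-q] includes both "a" and "q".
in_range = re.findall(r"[a-q]", "apqrz")
negated = re.findall(r"[^a-q]", "apqrz")

# \1 refers back to whatever group 1 matched: here, a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was was fine").group(1)

print(pets)      # ['cat', 'dog']
print(name)      # Lovelace
print(in_range)  # ['a', 'p', 'q']
print(negated)   # ['r', 'z']
print(doubled)   # was
```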
Pattern Modifiers

| Term | Description |
| --- | --- |
| g | Global match |
| i * | Case-insensitive |
| m * | Multiple lines |
| s * | Treat string as single line |
| x * | Allow comments and whitespace in pattern |
| e * | Evaluate replacement |
| U * | ‘Ungreedy’ pattern |

Note: * marks a PCRE modifier.
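Modifier syntax differs by language. As a sketch, Python passes modifiers as flags to the `re` functions: `re.IGNORECASE` for i, `re.MULTILINE` for m, `re.DOTALL` for s and `re.VERBOSE` for x. Python has no g flag because `re.findall` and `re.sub` already process every match. The sample text is invented.

```python
import re

text = "Hello\nhello\nHELLO"

# i: ignore case.
case_insensitive = re.findall(r"hello", text, re.IGNORECASE)
# m: ^ and $ match at the start and end of every line.
multi_line = re.findall(r"^hello$", text, re.MULTILINE)
# s: "." also matches a newline.
dot_all = re.search(r"Hello.hello", text, re.DOTALL) is not None
no_dot_all = re.search(r"Hello.hello", text) is not None

# x: whitespace and comments inside the pattern are ignored.
verbose = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", re.VERBOSE)
year, month = verbose.match("2024-05").groups()

print(case_insensitive)  # ['Hello', 'hello', 'HELLO']
print(multi_line)        # ['hello']
print(dot_all, no_dot_all)  # True False
print(year, month)       # 2024 05
```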
POSIX

| Term | Description |
| --- | --- |
| [:upper:] | Upper case letters |
| [:lower:] | Lower case letters |
| [:alpha:] | All letters |
| [:alnum:] | Digits and letters |
| [:digit:] | Digits |
| [:xdigit:] | Hexadecimal digits |
| [:punct:] | Punctuation |
| [:blank:] | Space and tab |
| [:space:] | All whitespace characters (space, tab, new line, etc.) |
| [:cntrl:] | Control characters |
| [:graph:] | Printed characters (excluding spaces) |
| [:print:] | Printed characters and spaces |
| [:word:] | Digits, letters and underscore |

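POSIX classes are written inside a bracket expression, for example `[[:digit:]]`. Not every engine supports them; Python's `re` module does not. The sketch below therefore assumes a hypothetical lookup table, `POSIX_ASCII`, that maps a few class names to ASCII-only equivalents. It is an illustration, not a complete or locale-aware implementation.

```python
import re

# Hypothetical table: ASCII equivalents of some POSIX classes.
POSIX_ASCII = {
    "upper": "A-Z",
    "lower": "a-z",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "blank": r" \t",
    "word": "A-Za-z0-9_",
}

def posix_class(name: str) -> str:
    """Return a bracket expression equivalent to [[:name:]] for ASCII text."""
    return "[" + POSIX_ASCII[name] + "]"

hex_runs = re.findall(posix_class("xdigit") + "+", "cafe-42-xyz")
uppers = re.findall(posix_class("upper"), "aBcD")

print(hex_runs)  # ['cafe', '42']
print(uppers)    # ['B', 'D']
```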
Assertions

| Term | Description |
| --- | --- |
| ?= | Lookahead assertion |
| ?! | Negative lookahead |
| ?<= | Lookbehind assertion |
| ?<! | Negative lookbehind |
| ?> | Once-only sub-expressions |
| ?() | Condition [if then] |
| ?()\| | Condition [if then else] |
| ?# | Comment |

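Each assertion is written inside parentheses, for example `(?=...)`, and checks the surrounding text without consuming it. A minimal Python sketch of the four lookaround forms, using an invented sample string:

```python
import re

prices = "cost: $40, weight: 40kg, discount: $5"

# Lookbehind: digits preceded by "$" (the "$" is not part of the match).
dollars = re.findall(r"(?<=\$)\d+", prices)
# Negative lookbehind: digits not preceded by "$" or by another digit.
not_dollars = re.findall(r"(?<![$\d])\d+", prices)
# Lookahead: digits followed by "kg".
kilos = re.findall(r"\d+(?=kg)", prices)
# Negative lookahead: digits not followed by "kg" or by another digit.
not_kilos = re.findall(r"\d+(?!kg|\d)", prices)

print(dollars)      # ['40', '5']
print(not_dollars)  # ['40']
print(kilos)        # ['40']
print(not_kilos)    # ['40', '5']
```

The extra `\d` in the negative forms stops the engine from backtracking to a shorter run of digits and matching part of a number.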
String Replacement

| Term | Description |
| --- | --- |
| $n | nth non-passive group |
| $2 | “xyz” in /^(abc(xyz))$/ |
| $1 | “xyz” in /^(?:abc)(xyz)$/ |
| $` | Before matched string |
| $' | After matched string |
| $+ | Last matched group |
| $& | Entire matched string |

Note: Some regex implementations use \ instead of $.
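Python is one of the implementations that use a backslash in replacements. As a sketch, `re.sub` accepts `\1` (or `\g<1>`) for the first group and `\g<0>` for the entire match. The sample strings are made up for the example.

```python
import re

# \1 and \2 in the replacement refer to the first and second groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

# \g<0> stands for the entire matched string.
quoted = re.sub(r"\d+", r"[\g<0>]", "room 101")

# Nested groups are numbered by the position of their opening parenthesis.
m = re.match(r"^(abc(xyz))$", "abcxyz")
outer, inner = m.group(1), m.group(2)

# A passive group is skipped when numbering, so "xyz" is group 1 here.
passive = re.match(r"^(?:abc)(xyz)$", "abcxyz").group(1)

print(swapped)       # Ada Lovelace
print(quoted)        # room [101]
print(outer, inner)  # abcxyz xyz
print(passive)       # xyz
```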
Acknowledgements to Dave Child