Regex

17 Dec 2020

Regular Expressions (Regex) is a quick way to selectively pattern match and select a sequence characters within a larger sequence of characters

Most Common

Email

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

IP Address

^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]).){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$

URL Address

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

International Telephone

\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|2[98654321]\d|9[8543210]|8[6421]|6[6543210]|5[87654321]|4[987654310]|3[9643210]|2[70]|7|1)\d{1,14}$

Tutorial

A nice step by step tutorial is available here: https://regexone.com/

Anchors

Term Description
^ Start of string, or start of line in multi-line pattern
\A Start of string
$ End of string, or end of line in multi-line pattern
\Z End of string
\b Word boundary
\B Not word boundary
\< Start of word
\> End of word

Quantifiers

Term Description
* 0 or more
{3} Exactly 3
+ 1 or more
{3,} 3 or more
? 0 or 1
{3,5} 3, 4 or 5

Note:Add a ? to a quantifier to make it ungreedy.

Escape Sequences

Term Description
\ Escape following character
\Q Begin literal sequence
\E End literal sequence

Note: “­Esc­api­ng” is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.

Character Classes

Term Description
\c Control character
\s White space
\S Not white space
\d Digit
\D Not digit
\w Word
\W Not word
\x Hexade­cimal digit
\O Octal digit

Special Characters

Term Description
\n New line
\r Carriage return
\t Tab
\v Vertical tab
\f Form feed
\xxx Octal character xxx
\xhh Hex character hh

Groups and Ranges

Term Description
. Any character except new line (\n)
(a\|b) a or b
(...) Group
(?:...) Passive (non-c­apt­uring) group
[abc] Range (a or b or c)
[^abc] Not (a or b or c)
[a-q] Lower case letter from a to q
[A-Q] Upper case letter from A to Q
[0-7] Digit from 0 to 7
\x Group/­sub­pattern number ­x

Note: Ranges are inclusive.

Pattern Modifiers

Term Description
g Global match
i * Case-insensitive
m * Multiple lines
s * Treat string as single line
x * Allow comments and whitespace in pattern
e * Evaluate replacement
U * ‘Ungreedy’ pattern

Note: * is a PCRE modifier

POSIX

Term Description
[:upper:] Upper case letters
[:lower:] Lower case letters
[:alpha:] All letters
[:alnum:] Digits and letters
[:digit:] Digits
[:xdigit:] Hexadecimal digits
[:punct:] Punctuation
[:blank:] Space and tab
[:space:] Blank characters
[:cntrl:] Control characters
[:graph:] Printed characters
[:print:] Printed characters and spaces
[:word:] Digits, letters and underscore

Assertions

Term Description
?= Lookahead assertion
?! Negative lookahead
?<= Lookbehind assertion
?!= or ?<! Negative lookbehind
?> Once-only sub-expressions
?() Condition [if then]
?()\| Condition [if then else]
?# Comment

String Replacement

Term Description
$n nth non-passive group
$2 “­xyz­” in /^(abc­(xy­z))$/
$1 “­xyz­” in /^(?:a­bc)­(xyz)$/
$` Before matched string
$' After matched string
$+ Last matched string
$& Entire matched string

Note: Some regex implementations use \ instead of $.

Acknowledgements to Dave Child