Regex

A regular expression (regex) is a compact pattern language for matching and selecting a sequence of characters within a larger sequence of characters.

Most Common

Email

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

IP Address

^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
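
A minimal Python sketch, assuming the standard `re` module, of validating dotted-quad addresses with this octet alternation. The dots between octets are written as `\.` so that they match only a literal dot. `OCTET`, `IPV4` and `is_ipv4` are illustrative names, not part of any library.

```python
import re

# One octet: 0-9, 10-99, 100-199, 200-249, 250-255.
OCTET = r"(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])"
# Three "octet + literal dot" repetitions, then a final octet.
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")


def is_ipv4(text: str) -> bool:
    """Return True if text looks like a dotted-quad IPv4 address."""
    return IPV4.match(text) is not None


print(is_ipv4("192.168.0.1"))  # True
print(is_ipv4("256.1.1.1"))    # False, 256 is out of range
print(is_ipv4("1a2b3c4"))      # False, separators must be literal dots
```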

URL Address

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
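
This pattern does not validate a URL; it splits one into components, and the group numbers carry the meaning. A small sketch in Python, assuming the standard `re` module (`URL_PARTS` and `split_url` are illustrative names):

```python
import re

URL_PARTS = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_url(url: str) -> dict:
    """Split a URL into its five components using the numbered groups."""
    m = URL_PARTS.match(url)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_url("https://example.com/docs/index.html?lang=en#intro")
print(parts["scheme"], parts["authority"], parts["path"])
# https example.com /docs/index.html
```

Components that are absent come back as `None`, except the path, which is an empty string.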

International Telephone

\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|2[98654321]\d|9[8543210]|8[6421]|6[6543210]|5[87654321]|4[987654310]|3[9643210]|2[70]|7|1)\d{1,14}$
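
A short Python sketch of using this pattern, assuming the standard `re` module. The first group captures what the pattern treats as the country calling code; `PHONE` and `country_code` are illustrative names.

```python
import re

PHONE = re.compile(
    r"\+(9[976]\d|8[987530]\d|6[987]\d|5[90]\d|42\d|3[875]\d|2[98654321]\d|"
    r"9[8543210]|8[6421]|6[6543210]|5[87654321]|4[987654310]|3[9643210]|2[70]|7|1)"
    r"\d{1,14}$"
)


def country_code(number: str):
    """Return the captured calling code, or None if the number does not match."""
    m = PHONE.match(number)  # match() anchors at the start of the string
    return m.group(1) if m else None


print(country_code("+14155552671"))   # 1
print(country_code("+442071838750"))  # 44
print(country_code("4155552671"))     # None, the leading + is required
```

The pattern accepts digits only, so spaces, dashes and parentheses have to be stripped before matching.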

Tutorial

A step-by-step interactive tutorial is available at https://regexone.com/

Anchors

Term	Description
`^`	Start of string, or start of line in multi-line pattern
`\A`	Start of string
`$`	End of string, or end of line in multi-line pattern
`\Z`	End of string
`\b`	Word boundary
`\B`	Not word boundary
`\<`	Start of word
`\>`	End of word
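
A quick illustration of the anchors in Python, assuming the standard `re` module. Python's `re` does not implement `\<` and `\>`; `\b` covers both cases there.

```python
import re

text = "cat concat\ncatalog cat"

# ^ matches only at the start of the string by default...
start_default = re.findall(r"^cat", text)              # ['cat']
# ...and at the start of every line in multi-line mode.
start_multi = re.findall(r"^cat", text, re.MULTILINE)  # ['cat', 'cat']
# \A always means start of string, even in multi-line mode.
start_abs = re.findall(r"\Acat", text, re.MULTILINE)   # ['cat']

# $ matches at each line end in multi-line mode; \Z only at the very end.
end_multi = re.findall(r"cat$", text, re.MULTILINE)    # ['cat', 'cat']
end_abs = re.findall(r"cat\Z", text, re.MULTILINE)     # ['cat']

# \b keeps whole words only; \B requires being inside a word.
whole_words = re.findall(r"\bcat\b", text)             # ['cat', 'cat']
inside_word = re.findall(r"\Bcat", text)               # ['cat'] (from "concat")
```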

Quantifiers

Term	Description
`*`	0 or more
`{3}`	Exactly 3
`+`	1 or more
`{3,}`	3 or more
`?`	0 or 1
`{3,5}`	3, 4 or 5

Note: Add a `?` after a quantifier to make it ungreedy (lazy), so that it matches as few characters as possible.
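
The difference between greedy and ungreedy matching is easiest to see side by side. A minimal sketch in Python, assuming the standard `re` module:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > on the line.
greedy = re.findall(r"<.+>", html)
# ['<b>bold</b> and <i>italic</i>']

# Ungreedy: .+? stops at the first > it can.
lazy = re.findall(r"<.+?>", html)
# ['<b>', '</b>', '<i>', '</i>']

# Bounded repetition: {3,5} takes up to 5 digits, {3,5}? takes only 3.
bounded = re.findall(r"\d{3,5}", "12 123 1234567")        # ['123', '12345']
bounded_lazy = re.findall(r"\d{3,5}?", "12 123 1234567")  # ['123', '123', '456']
```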

Escape Sequences

Term	Description
`\`	Escape following character
`\Q`	Begin literal sequence
`\E`	End literal sequence

Note: “Escaping” is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.
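
A small Python sketch of escaping, assuming the standard `re` module. Python's `re` does not support `\Q...\E`; `re.escape` is used here as the closest equivalent for treating a whole string literally.

```python
import re

# Unescaped, . matches any character; escaped, only a literal dot.
loose = re.findall(r"5.00", "5.00 5x00")    # ['5.00', '5x00']
strict = re.findall(r"5\.00", "5.00 5x00")  # ['5.00']

# Escape an arbitrary literal so $, ., ( and ) lose their special meaning.
literal = "$5.00 (approx.)"
pattern = re.escape(literal)
found = re.search(pattern, "Total: $5.00 (approx.)")
print(found.group(0))  # $5.00 (approx.)
```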

Character Classes

Term	Description
`\c`	Control character
`\s`	White space
`\S`	Not white space
`\d`	Digit
`\D`	Not digit
`\w`	Word
`\W`	Not word
`\x`	Hexadecimal digit
`\O`	Octal digit
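
How the shorthand classes behave on a sample line, sketched in Python with the standard `re` module. Not every class in the table exists in every flavor: Python's `re` has no shorthand for hexadecimal or octal digits, so the sketch spells the hexadecimal class out as an explicit range.

```python
import re

line = "order 66: 3 apples,\t12 pears"

digits = re.findall(r"\d+", line)     # ['66', '3', '12']
words = re.findall(r"\w+", line)      # ['order', '66', '3', 'apples', '12', 'pears']
tokens = re.findall(r"\S+", line)     # ['order', '66:', '3', 'apples,', '12', 'pears']
non_words = re.findall(r"\W+", line)  # [' ', ': ', ' ', ',\t', ' ']

# Explicit range standing in for a hexadecimal-digit class.
is_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1F") is not None  # True
```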

Special Characters

Term	Description
`\n`	New line
`\r`	Carriage return
`\t`	Tab
`\v`	Vertical tab
`\f`	Form feed
`\xxx`	Octal character xxx
`\xhh`	Hex character hh

Groups and Ranges

Term	Description
`.`	Any character except new line (`\n`)
`(a|b)`	a or b
`(...)`	Group
`(?:...)`	Passive (non-capturing) group
`[abc]`	Range (a or b or c)
`[^abc]`	Not (a or b or c)
`[a-q]`	Lower case letter from a to q
`[A-Q]`	Upper case letter from A to Q
`[0-7]`	Digit from 0 to 7
`\x`	Backreference to group/subpattern number `x` (for example `\1`)

Note: Ranges are inclusive.
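
Capturing groups, passive groups, ranges and backreferences in one place, as a minimal Python sketch using the standard `re` module:

```python
import re

# Capturing groups are numbered left to right, starting at 1.
date = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2024-03-15")
date_parts = date.groups()  # ('2024', '03', '15')

# A passive group (?:...) groups the alternation without taking a number.
title = re.match(r"(?:Mr|Ms|Dr)\. (\w+)", "Dr. Turing")
surname = title.group(1)  # 'Turing'

# Ranges are inclusive: 7 is accepted, 8 is not.
octal_runs = re.findall(r"[0-7]+", "0178 9")  # ['017']

# Negated class: everything up to the next comma.
fields = re.findall(r"[^,]+", "a,b c,d")  # ['a', 'b c', 'd']

# \1 refers back to whatever group 1 matched, here a repeated word.
doubled = re.findall(r"\b(\w+) \1\b", "it was was a very very long day")
# ['was', 'very']
```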

Pattern Modifiers

Term	Description
`g`	Global match
`i` *	Case-insensitive
`m` *	Multiple lines
`s` *	Treat string as single line
`x` *	Allow comments and whitespace in pattern
`e` *	Evaluate replacement
`U` *	‘Ungreedy’ pattern

Note: * is a PCRE modifier
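
Modifier syntax varies by implementation. In Python's `re` module, used here as an example, modifiers are passed as flags: `re.I`, `re.M`, `re.S` and `re.X` correspond to `i`, `m`, `s` and `x`, and `re.findall` plays the role of a global match. Python has no direct counterpart to `e` or `U`.

```python
import re

text = "Error: disk full\nerror: retry\nOK"

# i + m: ignore case, and let ^ match at the start of each line.
levels = re.findall(r"^error", text, re.I | re.M)  # ['Error', 'error']

# Without s, . stops at a newline; with s it crosses lines.
no_dotall = re.search(r"Error.*OK", text)          # None
dotall = re.search(r"Error.*OK", text, re.S)       # matches the whole text

# x: whitespace and # comments inside the pattern are ignored.
entry = re.compile(
    r"""
    ^(\w+)   # level
    :\s*     # separator
    (.*)$    # message
    """,
    re.X | re.M,
)
entries = entry.findall(text)
# [('Error', 'disk full'), ('error', 'retry')]
```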

POSIX

Term	Description
`[:upper:]`	Upper case letters
`[:lower:]`	Lower case letters
`[:alpha:]`	All letters
`[:alnum:]`	Digits and letters
`[:digit:]`	Digits
`[:xdigit:]`	Hexadecimal digits
`[:punct:]`	Punctuation
`[:blank:]`	Space and tab
`[:space:]`	All whitespace characters (space, tab, new line, carriage return, vertical tab, form feed)
`[:cntrl:]`	Control characters
`[:graph:]`	Printed characters
`[:print:]`	Printed characters and spaces
`[:word:]`	Digits, letters and underscore

Assertions

Term	Description
`?=`	Lookahead assertion
`?!`	Negative lookahead
`?<=`	Lookbehind assertion
`?<!`	Negative lookbehind
`?>`	Once-only sub-expressions
`?()`	Condition [if then]
`?()|`	Condition [if then else]
`?#`	Comment
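
Assertions check the surrounding text without consuming it. A minimal sketch in Python, assuming the standard `re` module, which supports the lookaround, conditional and comment forms shown here:

```python
import re

prices = "USD 100, EUR 250, 75 USD"

# Lookahead: a number that is followed by " USD".
before_usd = re.findall(r"\d+(?= USD)", prices)             # ['75']
# Negative lookahead: a whole number not followed by " USD".
not_before_usd = re.findall(r"\b\d+\b(?! USD)", prices)     # ['100', '250']
# Lookbehind: a number that is preceded by "EUR ".
after_eur = re.findall(r"(?<=EUR )\d+", prices)             # ['250']
# Negative lookbehind: a whole number not preceded by "EUR ".
not_after_eur = re.findall(r"(?<!EUR )\b\d+", prices)       # ['100', '75']

# Condition: require a closing > only if group 1 matched an opening <.
tag = re.compile(r"^(<)?\w+(?(1)>)$")
ok_plain = tag.match("tag") is not None     # True
ok_angle = tag.match("<tag>") is not None   # True
ok_broken = tag.match("<tag") is not None   # False

# Inline comment: (?#...) is ignored by the engine.
commented = re.findall(r"\d+(?#one or more digits)", "a1b22")  # ['1', '22']
```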

String Replacement

Term	Description
`$n`	nth non-passive group
`$2`	“xyz” in `/^(abc(xyz))$/`
`$1`	“xyz” in `/^(?:abc)(xyz)$/`
`` $` ``	Before matched string
`$'`	After matched string
`$+`	Last matched group
`$&`	Entire matched string

Note: Some regex implementations use \ instead of $.
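
Python's `re` module is one of the implementations that use `\` in replacements: `\1` (or `\g<1>`) is the first group and `\g<0>` is the entire match. A minimal sketch; the text before and after the match has no replacement token in Python, so it is taken from the match object instead.

```python
import re

# The two group-numbering examples from the table above.
nested = re.sub(r"^(abc(xyz))$", r"\2", "abcxyz")     # 'xyz'
passive = re.sub(r"^(?:abc)(xyz)$", r"\1", "abcxyz")  # 'xyz'

# Reorder captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-03-15")
# '15/03/2024'

# \g<0> stands for the entire matched string.
wrapped = re.sub(r"\d+", r"<\g<0>>", "a1b22")  # 'a<1>b<22>'

# Text before and after the match, via the match object.
subject = "aabbcc"
m = re.search(r"b+", subject)
before, after = subject[:m.start()], subject[m.end():]  # 'aa', 'cc'
```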

Acknowledgements to Dave Child