Regular Expressions (Regex)
A regular expression (regex) is a tool for expressing patterns in text.
- a sequence of characters that describes a search pattern to apply to a given text
- two forms:
  - basic
  - extended
- programs determine which form is supported
- differences are complex and subtle
- use cases:
  - validate data
  - find patterns in large amounts of text
  - search and replace text
  - validate email addresses
  - search for phone numbers
| Element | Purpose |
|---|---|
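| [ABC] | Character set (any one of A, B, or C) |
| [A-Z] | Range (any one character from A to Z) |
| \w | Word character (letter, digit, or underscore) |
| \d | Digit |
| \s | Whitespace character |
| ^ | Beginning of line |
| $ | End of line |
| ? | Preceding item is optional (zero or one occurrence) |
| {1,3} | Quantifier (preceding item repeated 1 to 3 times) |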
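
The elements in the table can be tried out with a short script. This is a minimal sketch that assumes Python's re module as the regex engine (the notes do not name one); shorthand classes such as \d come from Perl-style syntax and are not available in every tool. The sample text is hypothetical.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Room 42 is open"

digits = re.findall(r"\d{1,3}", text)   # runs of 1 to 3 digits
capitals = re.findall(r"[A-Z]", text)   # any single uppercase letter
vowels = re.findall(r"[aeiou]", text)   # any one character from the set
spaces = re.findall(r"\s", text)        # individual whitespace characters

starts_with_room = re.search(r"^Room", text) is not None  # anchored at the start
ends_with_open = re.search(r"open$", text) is not None    # anchored at the end

print(digits, capitals, vowels, len(spaces), starts_with_room, ends_with_open)
```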
Basic Features
Bracket Expressions
Bracket expressions utilize characters enclosed in brackets [] which match any one character within the brackets.
- e.g., b[aeiou]g matches bag, beg, big, bog, and bug
  - brackets represent a single character in the word
- using a caret ^ after the opening bracket matches against any character except the ones specified
  - e.g., b[^aeiou]g matches bbg or bAg, but not bag or beg
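
The two bracket patterns above can be checked against the same words. A small sketch, assuming Python's re module stands in for whatever tool is being used; fullmatch requires the whole word to fit the pattern.

```python
import re

# Words taken from the examples above.
words = ["bag", "beg", "big", "bog", "bug", "bbg", "bAg"]

vowel = re.compile(r"b[aeiou]g")       # exactly one lowercase vowel between b and g
non_vowel = re.compile(r"b[^aeiou]g")  # any one character except those vowels

matches_vowel = [w for w in words if vowel.fullmatch(w)]
matches_non_vowel = [w for w in words if non_vowel.fullmatch(w)]

print(matches_vowel)      # ['bag', 'beg', 'big', 'bog', 'bug']
print(matches_non_vowel)  # ['bbg', 'bAg']
```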
Range Expressions
A range expression is a variant on a bracket expression that uses a range of start and end points separated by a dash -.
- e.g., a[2-4]z matches a2z, a3z, and a4z
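
As a quick check of the range boundaries, the sketch below (Python's re module is an assumption, and the candidate strings are illustrative) shows that both end points are included and values outside them are rejected.

```python
import re

pattern = re.compile(r"a[2-4]z")  # one digit from 2 through 4 between a and z

# Illustrative candidates on both sides of the range.
candidates = ["a1z", "a2z", "a3z", "a4z", "a5z"]
matched = [c for c in candidates if pattern.fullmatch(c)]

print(matched)  # ['a2z', 'a3z', 'a4z']
```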
Any Single Character
The dot . represents any single character except a newline.
- e.g., a.z matches a2z, abz, etc.
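
The dot always consumes exactly one character, which the following sketch illustrates (assuming Python's re module with default flags; the candidates are made up for the demonstration).

```python
import re

pattern = re.compile(r"a.z")  # a, then any one character except newline, then z

# "az" has no middle character; "a\nz" has a newline in the middle.
candidates = ["a2z", "abz", "az", "a\nz"]
matched = [c for c in candidates if pattern.fullmatch(c)]

print(matched)  # ['a2z', 'abz']
```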
Start and End of Line
A text line (aka a record) consists of all the characters before the line is terminated with a newline.
- caret ^ represents the start of a line
  - when not used inside of brackets
- dollar sign $ represents the end of a line
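
To see the anchors in action, the sketch below splits some hypothetical text into lines and tests each line separately, which mirrors how line-oriented tools such as grep apply a pattern. Python's re module is an assumption here.

```python
import re

# Hypothetical log-style text; each line is one record.
text = "error: disk full\nfatal error\nerror"
lines = text.splitlines()

starts = [l for l in lines if re.search(r"^error", l)]   # line begins with "error"
ends = [l for l in lines if re.search(r"error$", l)]     # line ends with "error"
whole = [l for l in lines if re.search(r"^error$", l)]   # line is exactly "error"

print(starts)  # ['error: disk full', 'error']
print(ends)    # ['fatal error', 'error']
print(whole)   # ['error']
```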
Repetition
A full or partial regular expression may be followed by a special symbol to denote repetition of the matched item.
- an asterisk * denotes zero or more occurrences of the preceding item
  - often combined with dot as .* to specify a match with any substring
  - e.g., A.*Lincoln matches Abe Lincoln and Abraham Lincoln
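
Because the asterisk allows zero occurrences, .* can also match an empty substring. A minimal sketch, assuming Python's re module; the names beyond the two from the example are illustrative.

```python
import re

pattern = re.compile(r"A.*Lincoln")  # A, then any run of characters, then Lincoln

names = ["Abe Lincoln", "Abraham Lincoln", "ALincoln", "Mary Lincoln"]
matched = [n for n in names if pattern.search(n)]

# "ALincoln" matches because .* may match nothing at all;
# "Mary Lincoln" does not, since it contains no uppercase A.
print(matched)  # ['Abe Lincoln', 'Abraham Lincoln', 'ALincoln']
```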
Escaping
To match a special character literally, you need to escape it.
- precede the character with a backslash \
  - e.g., to match www.test.com, use www\.test\.com
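
The difference escaping makes shows up when the input contains something other than a literal dot. The sketch assumes Python's re module; re.escape is that module's helper for escaping a literal string, and the second host string is a made-up counterexample.

```python
import re

unescaped = re.compile(r"www.test.com")    # each dot matches any character
escaped = re.compile(r"www\.test\.com")    # each dot matches only a literal dot

hosts = ["www.test.com", "wwwXtestXcom"]
loose = [h for h in hosts if unescaped.fullmatch(h)]
strict = [h for h in hosts if escaped.fullmatch(h)]

print(loose)   # ['www.test.com', 'wwwXtestXcom']
print(strict)  # ['www.test.com']

# re.escape builds the escaped pattern from the literal text.
auto_escaped = re.escape("www.test.com")
```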
Extended Features
Additional Repetition Operators
A plus sign + matches one or more occurrences of the preceding item.
A question mark ? matches zero or one occurrence of the preceding item.
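
A short sketch of both operators follows. It assumes Python's re module, where + and ? behave as described here; the patterns and words are hypothetical examples, not taken from the notes.

```python
import re

plus = re.compile(r"go+gle")       # one or more "o" after the first g
optional = re.compile(r"colou?r")  # the "u" may appear once or not at all

plus_words = ["ggle", "gogle", "google", "gooogle"]
optional_words = ["color", "colour", "colouur"]

plus_matched = [w for w in plus_words if plus.fullmatch(w)]
optional_matched = [w for w in optional_words if optional.fullmatch(w)]

print(plus_matched)      # ['gogle', 'google', 'gooogle']
print(optional_matched)  # ['color', 'colour']
```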
Multiple Possible Strings
The vertical bar | separates two possible matches.
- e.g., car|truck matches either car or truck
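
The alternation can be checked the same way. This sketch assumes Python's re module; "bus" is added only as an illustrative non-match.

```python
import re

pattern = re.compile(r"car|truck")  # either alternative is accepted

vehicles = ["car", "truck", "bus"]
matched = [v for v in vehicles if pattern.fullmatch(v)]

print(matched)  # ['car', 'truck']
```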
Parentheses
Parentheses () surround subexpressions.
- often used to specify how to apply operators
- e.g., file(one|two|three)\.txt matches fileone.txt, filetwo.txt, and filethree.txt
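
The sketch below contrasts the grouped pattern with an ungrouped variant to show what the parentheses change. Python's re module is an assumption, and the ungrouped pattern and extra file names are hypothetical.

```python
import re

grouped = re.compile(r"file(one|two|three)\.txt")
# Without parentheses, the bar splits the whole pattern into
# three alternatives: "fileone", "two", and "three.txt".
ungrouped = re.compile(r"fileone|two|three\.txt")

names = ["fileone.txt", "filetwo.txt", "filethree.txt", "filefour.txt"]
matched = [n for n in names if grouped.fullmatch(n)]
print(matched)  # ['fileone.txt', 'filetwo.txt', 'filethree.txt']

bare_two_grouped = grouped.fullmatch("two") is not None      # False
bare_two_ungrouped = ungrouped.fullmatch("two") is not None  # True

# In Python, the parentheses also capture the part that matched.
captured = grouped.fullmatch("filetwo.txt").group(1)
```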
Grep with Regular Expressions
- to use an extended regular expression with grep, you need to include the -E option