Regular expressions (often shortened to “regex”) are widely used in applications that involve matching patterns in text.
The full documentation for Java’s regular expression syntax can be found in the Javadoc for the Pattern class. The tables below include a simplified subset of the full syntax.
This regular expression tutorial introduces the basics of writing Java code that uses regular expressions. Note that the focus of today’s lab is on regular expression syntax, not on writing Java code that uses regular expressions. The Java code will be provided for you.
The examples below illustrate Java-style regular expressions applied to the following string:
cat cat comic catatat catatonic rabbit caaat. mat
Read through each example and make sure you understand why the provided regex matches the indicated regions of the string.
Most individual characters match themselves.
regex literal: "m"
cat cat comic catatat catatonic rabbit caaat. mat
          ^                                   ^
Some characters (“metacharacters”) have a special meaning when they appear in a regular expression. Here is the complete list:
<([{\^-=$!|]})?*+.>
If we want to match any of these characters literally, we need to “escape” them by prefixing them with a \ character. Since \ is also the escape character in Java string literals, it must itself be doubled when the regular expression is written as a string literal:
regex literal: "\\."
cat cat comic catatat catatonic rabbit caaat. mat
                                            ^
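The escaping rule can be checked with a short program. This is only a sketch: the class name `EscapeDemo` and the helper `countMatches` are made up for illustration and are not part of the lab's provided code. It compares an unescaped period, an escaped period, and `Pattern.quote`, which builds a pattern that matches its argument literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class EscapeDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Counts the non-overlapping matches of regex in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Unescaped: "." matches each of the 49 characters in TEXT.
        System.out.println(countMatches(".", TEXT));                // 49
        // The Java literal "\\." is the two-character regex \. (a literal period).
        System.out.println(countMatches("\\.", TEXT));              // 1
        // Pattern.quote escapes its whole argument for us.
        System.out.println(countMatches(Pattern.quote("."), TEXT)); // 1
    }
}
```

Writing `"\."` with a single backslash does not compile, because `\.` is not a valid escape sequence in a Java string literal.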
Construct | Description |
---|---|
XY | X followed by Y |
X|Y | Either X or Y |
(X) | X, as a group |
regex literal: "caaat\\."
cat cat comic catatat catatonic rabbit caaat. mat
                                       ^----^
regex literal: "comic|caaat\\."
cat cat comic catatat catatonic rabbit caaat. mat
        ^---^                          ^----^
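To see where concatenation, alternation, and grouping match, one option is to print the start and end index of each match. The sketch below assumes a made-up class `AlternationDemo` with a helper `findSpans`; neither comes from the lab files. The end index is reported inclusively so that it lines up with the last `^` of each marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class AlternationDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Returns each match as "start-end:text", with an inclusive end index.
    static List<String> findSpans(String regex, String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // Matcher.end() is exclusive, so subtract 1 for the last matched index.
            spans.add(m.start() + "-" + (m.end() - 1) + ":" + m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(findSpans("caaat\\.", TEXT));        // [39-44:caaat.]
        System.out.println(findSpans("comic|caaat\\.", TEXT));  // [8-12:comic, 39-44:caaat.]
        // A group restricts the alternation to part of the pattern.
        System.out.println(findSpans("c(omic|aaat\\.)", TEXT)); // [8-12:comic, 39-44:caaat.]
    }
}
```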
Bracket notation can be used to create sets such that any character in the set will be considered a match.
Construct | Description |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z, or A through Z, inclusive (range) |
regex literal: "[cm]at"
cat cat comic catatat catatonic rabbit caaat. mat
^-^ ^-^       ^-^     ^-^                     ^-^
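The three kinds of bracket expression from the table can be compared side by side. This is a minimal sketch, assuming an illustrative class `CharClassDemo` and helper `findAll` that are not part of the lab's code; it lists the text of every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class CharClassDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Returns the text of every non-overlapping match, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Simple class: c or m, followed by "at".
        System.out.println(findAll("[cm]at", TEXT));  // [cat, cat, cat, cat, mat]
        // Negation: any character other than c or m, followed by "at".
        System.out.println(findAll("[^cm]at", TEXT)); // [tat, tat, aat]
        // Range: c, any lowercase letter, then t.
        System.out.println(findAll("c[a-z]t", TEXT)); // [cat, cat, cat, cat]
    }
}
```

Note that the negated class finds only two matches inside catatat and catatonic combined with one in caaat, because matches never overlap: once a region has been matched, the search resumes after it.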
Construct | Description |
---|---|
. | Any character |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
regex literal: "...m" (any three characters followed by m)
cat cat comic catatat catatonic rabbit caaat. mat
       ^--^                                ^--^
regex literal: "\\w" (any word character)
cat cat comic catatat catatonic rabbit caaat. mat
^^^ ^^^ ^^^^^ ^^^^^^^ ^^^^^^^^^ ^^^^^^ ^^^^^  ^^^
regex literal: "\\W" (any non-word character)
cat cat comic catatat catatonic rabbit caaat. mat
   ^   ^     ^       ^         ^      ^     ^^
regex literal: "\\s" (any whitespace character)
cat cat comic catatat catatonic rabbit caaat. mat
   ^   ^     ^       ^         ^      ^      ^
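A quick way to get a feel for the predefined classes is to test single characters with `String.matches`, which succeeds only if the whole string matches the pattern, and then count matches in the example string. The class `PredefinedClassDemo` and its `count` helper below are illustrative names, not part of the lab's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class PredefinedClassDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Counts the non-overlapping matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Single-character checks against each class.
        System.out.println("7".matches("\\d"));  // true
        System.out.println("x".matches("\\d"));  // false
        System.out.println("_".matches("\\w"));  // true: underscore is a word character
        System.out.println(".".matches("\\W"));  // true: a period is not a word character
        System.out.println("\t".matches("\\s")); // true: tab is whitespace

        // Every character in TEXT is either a word or a non-word character.
        System.out.println(count("\\w", TEXT));  // 41
        System.out.println(count("\\W", TEXT));  // 8 (seven spaces and one period)
        System.out.println(count("\\s", TEXT));  // 7
    }
}
```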
Construct | Description |
---|---|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
regex literal: "c(at)*" (c followed by zero or more copies of at)
cat cat comic catatat catatonic rabbit caaat. mat
^-^ ^-^ ^   ^ ^-----^ ^---^   ^        ^
regex literal: "c(at)+" (c followed by one or more copies of at)
cat cat comic catatat catatonic rabbit caaat. mat
^-^ ^-^       ^-----^ ^---^
regex literal: "c(at){3}" (c followed by exactly three copies of at)
cat cat comic catatat catatonic rabbit caaat. mat
              ^-----^
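The marker lines used in these examples can be generated programmatically, which is handy for checking a quantifier against your expectations. The following is a sketch under assumptions: `MarkerDemo` and `mark` are hypothetical names, and the `^---^` notation is simply copied from this handout rather than from any library.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class MarkerDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Builds a line the same length as text with ^ at the first and last
    // character of each match and - in between; a one-character match is a single ^.
    static String mark(String regex, String text) {
        char[] marks = new char[text.length()];
        Arrays.fill(marks, ' ');
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // an empty match covers no characters, so there is nothing to mark
            }
            Arrays.fill(marks, m.start(), m.end(), '-');
            marks[m.start()] = '^';
            marks[m.end() - 1] = '^';
        }
        return new String(marks);
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"c(at)*", "c(at)+", "c(at){3}"}) {
            System.out.println("regex: " + regex);
            System.out.println(TEXT);
            System.out.println(mark(regex, TEXT));
        }
    }
}
```

Because quantifiers are greedy by default, `c(at)+` marks all of catatat rather than stopping after the first cat.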
Boundary matchers restrict where a match can be made.
Construct | Description |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
regex literal: "^cat"
cat cat comic catatat catatonic rabbit caaat. mat
^-^
regex literal: "\\bcat\\b" (the word cat, but not catatonic etc.)
cat cat comic catatat catatonic rabbit caaat. mat
^-^ ^-^
regex literal: "\\b\\w*ic" (any word ending in ic)
cat cat comic catatat catatonic rabbit caaat. mat
        ^---^         ^-------^
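The effect of each boundary matcher is easiest to see by comparing a pattern with and without it. In the sketch below, `BoundaryDemo` and `findAt` are illustrative names that do not appear in the lab's code. It also shows one detail worth knowing: in Java, `^` and `$` match only at the start and end of the whole input unless the pattern is compiled with `Pattern.MULTILINE`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the lab's provided files.
class BoundaryDemo {
    static final String TEXT = "cat cat comic catatat catatonic rabbit caaat. mat";

    // Returns each match as "start:text". Takes a compiled Pattern so that
    // callers can supply flags such as Pattern.MULTILINE.
    static List<String> findAt(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAt(Pattern.compile("cat"), TEXT));         // [0:cat, 4:cat, 14:cat, 22:cat]
        System.out.println(findAt(Pattern.compile("^cat"), TEXT));        // [0:cat]
        System.out.println(findAt(Pattern.compile("\\bcat\\b"), TEXT));   // [0:cat, 4:cat]
        System.out.println(findAt(Pattern.compile("\\b\\w*ic"), TEXT));   // [8:comic, 22:catatonic]
        System.out.println(findAt(Pattern.compile("mat$"), TEXT));        // [46:mat]

        // With two lines of input, ^ matches at each line start only in MULTILINE mode.
        String twoLines = "cat\ncat";
        System.out.println(findAt(Pattern.compile("^cat"), twoLines));                    // [0:cat]
        System.out.println(findAt(Pattern.compile("^cat", Pattern.MULTILINE), twoLines)); // [0:cat, 4:cat]
    }
}
```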
Download the following files:
Open up BleakHouse.txt and numbers.txt in a text editor to get a feel for their contents.
Complete the unfinished methods in RegexExercises.java. Use SearchDriver.java to experiment with the regular expressions as you develop each one. Submit your finished version of RegexExercises.java through Autolab.