The Pattern API contains a number of useful predefined character classes, which offer convenient shorthands for commonly used regular expressions:

Predefined Character Classes

    Construct   Description
    .           Any character (may or may not match line terminators)
    \d          A digit: [0-9]
    \D          A non-digit: [^0-9]
    \s          A whitespace character: [ \t\n\x0B\f\r]
    \S          A non-whitespace character: [^\s]
    \w          A word character: [a-zA-Z_0-9]
    \W          A non-word character: [^\w]

In the table above, each construct in the left-hand column is shorthand for the character class in the right-hand column. For example, \d means a range of digits (0-9), and \w means a word character (any lowercase letter, any uppercase letter, the underscore character, or any digit). Use the predefined classes whenever possible. They make your code easier to read and eliminate errors introduced by malformed character classes.

Constructs beginning with a backslash are called escaped constructs; we previewed escaped constructs in the String Literals section, where we mentioned the use of backslash and \Q and \E for quotation. If you are using an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. For example:

    private final String REGEX = "\\d"; // a single digit

In this example \d is the regular expression; the extra backslash is required so that the code compiles. Our test harness reads the expressions directly from a file, however, so the extra backslash is unnecessary.

The following examples demonstrate the use of predefined character classes. For each case, try to predict the result before you read the output on the last line.
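Before turning to those examples, the double-backslash rule can be checked with a short program. This is a minimal sketch, not part of the tutorial's test harness: the class name EscapeDemo and the helper isSingleDigit are hypothetical names chosen for illustration. It shows that the Java string literal "\\d" holds only the two characters \d, which is what the regular expression engine receives.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In Java source, "\\d" denotes the two-character string \d.
    // The compiler consumes the first backslash; the regex engine sees \d.
    static final String REGEX = "\\d";

    // Hypothetical helper: true only if the whole input is a single digit.
    static boolean isSingleDigit(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);               // prints \d
        System.out.println(REGEX.length());      // prints 2
        System.out.println(isSingleDigit("1"));  // prints true
        System.out.println(isSingleDigit("a"));  // prints false
    }
}
```

Writing "\d" with a single backslash would not compile, because \d is not a valid escape sequence in a Java string literal.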
    Current REGEX is: .
    Current INPUT is: @
    I found the text "@" starting at index 0 and ending at index 1.

    Current REGEX is: .
    Current INPUT is: 1
    I found the text "1" starting at index 0 and ending at index 1.

    Current REGEX is: .
    Current INPUT is: a
    I found the text "a" starting at index 0 and ending at index 1.

    Current REGEX is: \d
    Current INPUT is: 1
    I found the text "1" starting at index 0 and ending at index 1.

    Current REGEX is: \d
    Current INPUT is: a
    No match found.

    Current REGEX is: \D
    Current INPUT is: 1
    No match found.

    Current REGEX is: \D
    Current INPUT is: a
    I found the text "a" starting at index 0 and ending at index 1.

    Current REGEX is: \s
    Current INPUT is:  
    I found the text " " starting at index 0 and ending at index 1.

    Current REGEX is: \s
    Current INPUT is: a
    No match found.

    Current REGEX is: \S
    Current INPUT is:  
    No match found.

    Current REGEX is: \S
    Current INPUT is: a
    I found the text "a" starting at index 0 and ending at index 1.

    Current REGEX is: \w
    Current INPUT is: a
    I found the text "a" starting at index 0 and ending at index 1.

    Current REGEX is: \w
    Current INPUT is: !
    No match found.

    Current REGEX is: \W
    Current INPUT is: a
    No match found.

    Current REGEX is: \W
    Current INPUT is: !
    I found the text "!" starting at index 0 and ending at index 1.

In the first three examples, our regular expression is simply . (the "period" or "dot" metacharacter), which indicates "any character." Therefore, the match is successful in all three cases (a randomly selected @ character, a digit, and a letter). The remaining examples each use a single regular expression construct from the Predefined Character Classes table. You can refer to this table to figure out the logic behind each match:

    \d matches all digits
    \s matches whitespace characters
    \w matches word characters

Alternatively, a capital letter means the opposite:

    \D matches non-digits
    \S matches non-whitespace characters
    \W matches non-word characters
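
The lowercase and uppercase pairs listed above can be tried out with a short program. This is a minimal sketch under stated assumptions: the class PredefinedClassDemo and the helper firstMatch are hypothetical names, and the message format is reproduced by hand to resemble the harness output; it is not the tutorial's actual test harness. It uses only Pattern.compile, Matcher.find, group, start, and end from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: reports the first match of regex in input,
    // using a message format modeled on the harness output.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return "I found the text \"" + m.group() + "\" starting at index "
                    + m.start() + " and ending at index " + m.end() + ".";
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        // Each backslash is doubled because these are Java string literals.
        String[] regexes = {"\\d", "\\D", "\\s", "\\S", "\\w", "\\W"};
        String[] inputs = {"1", "a", " ", "!"};
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " on \"" + input + "\": "
                        + firstMatch(regex, input));
            }
        }
    }
}
```

For any single-character input, exactly one construct of each pair reports a match: if \d matches, \D does not, and likewise for \s and \S, and for \w and \W.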
Copyright 1995-2004 Sun Microsystems, Inc. All rights reserved.