What Are Regular Expressions?
Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set. They can be used to search, edit, or manipulate text and data. You must learn a specific syntax to create regular expressions, one that goes beyond the normal syntax of the Java programming language. Regular expressions vary in complexity, but once you understand the basics of how they're constructed, you'll be able to decipher (or create) any regular expression.
This trail teaches the regular expression syntax supported by the java.util.regex API and presents several working examples to illustrate how the various objects interact. In the world of regular expressions, there are many different flavors to choose from, such as grep, Perl, Tcl, Python, PHP, and awk. The regular expression syntax in the java.util.regex API is most similar to that found in Perl.
How Are Regular Expressions Represented in This Package?
The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.
The last few lessons of this trail explore each class in detail. But first, you must understand how regular expressions are actually constructed. Therefore, the next section introduces a simple test harness that will be used repeatedly to explore their syntax.
- A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument; the first few lessons of this trail will teach you the required syntax.
- A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
- A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
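To see how the three classes described in the list fit together, consider the following minimal sketch. It is not the test harness used later in this trail; the class name RegexIntro and the helper methods findAll and isValid are hypothetical names chosen only for illustration. The sketch compiles a regular expression with Pattern.compile, obtains a Matcher from the resulting Pattern, and catches a PatternSyntaxException when the expression is malformed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the name and its helper methods are illustrative only.
class RegexIntro {

    // Compiles the regex and collects every substring of input that it matches.
    static List<String> findAll(String regex, String input) {
        // Pattern has no public constructor; compile is its static factory method.
        Pattern pattern = Pattern.compile(regex);
        // Matcher has no public constructor either; it is obtained from the Pattern.
        Matcher matcher = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Returns true if the regex compiles, false if it contains a syntax error.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // The exception is unchecked, so catching it is optional;
            // it is caught here only to report the outcome as a boolean.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo", "foo bar foofoo")); // prints [foo, foo, foo]
        System.out.println(isValid("a+"));                    // prints true
        System.out.println(isValid("(unclosed"));             // prints false
    }
}
```

Because PatternSyntaxException is unchecked, the compiler does not force you to handle it. Catching it is mainly useful when the regular expression comes from user input rather than from a literal in your source code.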