Regular Expressions
Regular expressions (regex or regexp) are real
linguistics, believe it or not. They describe the lowest and most
restricted level of the Chomsky Hierarchy, the regular languages, which is
also by far the most thoroughly and usefully implemented level. They are
used throughout Unix and in the filter
languages sed, awk, and Perl.
If you've ever used a *.doc wildcard to denote a set
of files, or considered constraints on variables in syntax, or taken a
class in automata
theory, you'll be familiar with some of the issues.
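To make the wildcard comparison concrete, here is a minimal Python sketch (Python is not one of the tools named above; it is used here only for illustration). It assumes the *.doc wildcard means "any characters, then a literal .doc at the end of the name", writes that as a regular expression by hand, and checks it against the translation produced by the standard fnmatch module. The file names are made up for the example.

```python
import fnmatch
import re

# Hand-written regular expression for the wildcard *.doc:
# any run of characters, a literal dot, then "doc" at the very end.
DOC_REGEX = re.compile(r".*\.doc\Z", re.DOTALL)


def glob_to_regex(pattern):
    """Translate a shell-style wildcard into a compiled regular expression."""
    return re.compile(fnmatch.translate(pattern))


# Hypothetical file names, chosen only to exercise the pattern.
names = ["thesis.doc", "notes.txt", "doc", "a.docx"]

by_hand = [n for n in names if DOC_REGEX.match(n)]
by_translation = [n for n in names if glob_to_regex("*.doc").match(n)]

print(by_hand)
print(by_translation)
```

Both lists should agree. The wildcard is a small, fixed notation, while the regular expression spells out each piece (the "any characters" part and the escaped dot) explicitly.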
Formalized by the logician Stephen Kleene as a method for specifying
syntactic structure precisely, regular expressions have been a part of
Unix for a quarter-century, and have found their way into many other
places. There are, for instance, thousands of search engines on
the Web that allow users to employ regular expressions in formulating
their queries.
To see just how many, try the following query (which is not a
canonical regular expression, by the way); it will return the beginning
of a long list of Web pages that contain both the word "submit" and the
phrase "regular expression" somewhere on them, most of which will
be search engines.
You don't really need Unix to find regular expressions useful. For example,
BBEdit Lite is a downloadable Macintosh editor (with special attachments
for editing HTML) that implements regular expressions in a big way.
BBEdit Lite will scan the contents of every text file on a Macintosh hard
disk in seconds (or a CD-ROM in minutes), and report or display all
files containing a match for a regular expression.
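The same kind of multi-file search can be sketched in a few lines of Python. This is only an illustration of the idea, not how the editor itself works: the function name files_matching, the choice to look only at *.txt files, and the UTF-8 decoding are all assumptions made for the example.

```python
import re
from pathlib import Path


def files_matching(root, pattern):
    """Return the text files under root whose contents match pattern."""
    regex = re.compile(pattern)
    hits = []
    # Assumption: "text file" means a file whose name ends in .txt.
    for path in sorted(Path(root).rglob("*.txt")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # Skip files that cannot be read.
        if regex.search(text):
            hits.append(path)
    return hits
```

Calling files_matching(".", r"reg(ular )?ex") would list every .txt file below the current directory that contains either "regex" or "regular ex".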