Regular Expressions

Regular expressions (regex or regexp) are real linguistics, believe it or not. They are the lowest level of the Chomsky Hierarchy (but by far the most thoroughly and usefully implemented one), as used in Unix and in the filter languages sed, awk, and perl. If you've ever used a *.doc wildcard to denote a range of files, considered constraints on variables in syntax, or taken a class in automata theory, you'll be familiar with some of the issues.

Formalized by the logician Stephen Kleene as a method for specifying syntactic structure precisely, regular expressions have been a part of Unix for a quarter-century and have found their way into many other places. There are, for instance, thousands of search engines on the Web that allow users to employ regular expressions in formulating their queries. To see just how many, try the following query (which is not a canonical regular expression, by the way); it will return the beginning of a long list of Web pages on which the word "submit" and the phrase "regular expression" both appear, and these will largely be search engines.

You don't really need Unix to find regular expressions useful. For example, BBeditlite is a downloadable Macintosh editor (with special attachments for editing HTML) that implements regular expressions in a big way. BBeditlite will scan the contents of every text file on a Macintosh hard disk in seconds (or a CD-ROM in minutes), and report or display all files containing a match for a regular expression.
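Both the *.doc wildcard and the editor's whole-disk search can be sketched in a few lines of Python. This is a minimal illustration built on Python's standard `re`, `fnmatch`, and `pathlib` modules, not a description of how sed, awk, perl, or BBeditlite actually work; the helper names `matches_both` and `files_containing` are hypothetical.

```python
import fnmatch
import re
from pathlib import Path

# The shell wildcard "*.doc" versus a regular expression for the same names:
# "." is any character, "*" (the Kleene star) repeats it zero or more times,
# "\." is a literal dot, and "$" anchors the match at the end of the name.
WILDCARD = "*.doc"
DOC_REGEX = re.compile(r".*\.doc$")


def matches_both(name):
    """Return (wildcard result, regex result) for one file name."""
    return fnmatch.fnmatchcase(name, WILDCARD), DOC_REGEX.match(name) is not None


def files_containing(root, pattern):
    """Grep-like sketch (hypothetical helper): list files under root whose
    text contains a match for pattern.

    Unreadable files are skipped; bytes that are not valid UTF-8 are
    replaced rather than raising an error.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue
        if regex.search(text):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for name in ["report.doc", "notes.txt", "old.doc.bak", "draft.DOC"]:
        print(name, matches_both(name))
```

For example, `files_containing(".", r"regular expressions?")` would list the files under the current directory that mention "regular expression" or "regular expressions". Both `fnmatchcase` and the regex are case-sensitive here, so "draft.DOC" matches neither.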
Printed resources

There are many printed references on regular expressions; the topic has to be covered, and covered clearly, in every serious Unix book. The best single source of printed information (on Unix, on regular expressions, and on their use in filters) for sophisticated beginners remains the first four chapters of Brian W. Kernighan and Rob Pike's classic The UNIX Programming Environment, which treat much the same topics as this chapter. This is pretty condensed stuff, but it's admirably clear; Kernighan is not only the k in awk and one of the creators of UNIX, but also one of the best writers in information science.

This is also evident in Kernighan's other books written with other distinguished authors, including the standard reference grammar of the C programming language by Kernighan and Ritchie, in various editions. His books with P. J. Plauger have been enormously influential. These include their book on style in programming (intended in the spirit of Strunk & White's The Elements of Style) and the Old and New Testaments of the Software Tools philosophy. In the original Software Tools, the programs are written in the now-obsolete RATFOR language; in the later Software Tools in Pascal, they are rewritten in Pascal, pace Kernighan's criticisms of that language.

O'Reilly has recently issued a book specifically on regular expressions, called, regularly enough, Mastering Regular Expressions. Its author, Jeffrey Friedl, maintains a home page for the book with many useful links. The book goes into great detail on the dialect differences that have arisen among implementations, and devotes considerable attention to Perl regular expressions, which are extended beyond the canonical ones.
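One way Perl-style regular expressions go beyond the canonical ones is the backreference, which requires a later part of the text to repeat exactly what an earlier group matched. The sketch below uses Python's `re` module as a stand-in for Perl (Python's syntax borrows heavily from Perl's, though the two dialects are not identical); the function name `doubled_words` is hypothetical.

```python
import re

# (\w+) captures a word and \1 demands that same text again after whitespace.
# A canonical (Kleene) regular expression cannot say "the same word twice"
# for words of unbounded length; backreferences are an extension beyond it.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text):
    """Return each doubled word found in text, in order of appearance."""
    return [m.group(1) for m in DOUBLED.finditer(text)]


if __name__ == "__main__":
    print(doubled_words("This is the the best of of all worlds."))
```

Run on the sample sentence, the function reports "the" and "of". The `\b` anchors keep it from flagging pairs such as "This is", where one word merely ends with the next.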