Unix for Linguists: Regular Expressions

Using Computers in Linguistics: A Practical Guide

The Unix™ Language Family

Online Appendix: Regular Expressions

Glossary entries: Webcom, UNIXhelp, The Jargon File

A regex tutorial with examples

A searching tip from Christopher Lane

Regular Expressions at U Mass

Regular Expressions

Regular expressions (regex or regexp) are real linguistics, believe it or not. They describe the regular languages, the lowest (but by far the most thoroughly and usefully implemented) level of the Chomsky Hierarchy, and they are used throughout Unix and in the filter languages sed, awk, and Perl.

If you've ever used a *.doc wildcard to denote a range of files, or considered constraints on variables in syntax, or taken a class in automata theory, you'll be familiar with some of the issues.
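As a small illustration of how the familiar wildcard relates to a regular expression, the sketch below selects the same files twice, once with a shell-style pattern and once with a regex. It uses Python's standard fnmatch and re modules, not a Unix shell, and the file names are invented for the example.

```python
import fnmatch
import re

# Hypothetical file names, made up for this illustration.
filenames = ["thesis.doc", "notes.txt", "doc.pdf", "a.doc.bak"]

# Shell-style wildcard: "*" stands for any run of characters.
by_wildcard = [f for f in filenames if fnmatch.fnmatchcase(f, "*.doc")]

# The corresponding regular expression: ".*" means "any run of
# characters", and the literal dot has to be escaped as "\.".
doc_regex = re.compile(r".*\.doc")
by_regex = [f for f in filenames if doc_regex.fullmatch(f)]

print(by_wildcard)
print(by_regex)
```

Both lists contain only thesis.doc. The point of the comparison is that the wildcard "*" and the regex "*" are different things: in a regex, "*" repeats whatever precedes it, so the wildcard "*" corresponds to ".*".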

Formalized by the logician Stephen Kleene as a method for specifying syntactic structure precisely, regular expressions have been a part of Unix for a quarter-century, and have found their way into many other places. There are, for instance, thousands of search engines on the Web that allow users to employ regular expressions in formulating their queries.
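The formal system needs only three operations: alternation, concatenation, and repetition (the "Kleene star"). A minimal sketch, assuming Python's re module as the implementation and a toy language of "ba" and "da" syllables invented for the example:

```python
import re

# Alternation (ba|da), concatenation (one group followed by another),
# and the Kleene star (...)* together describe a toy "language":
# one or more syllables, each either "ba" or "da".
syllables = re.compile(r"(ba|da)(ba|da)*")

def in_language(s):
    """True if the whole string s belongs to the toy language."""
    return syllables.fullmatch(s) is not None

for word in ["ba", "dadaba", "bad", ""]:
    print(repr(word), in_language(word))
```

Here "ba" and "dadaba" are accepted, while "bad" and the empty string are rejected. The name in_language is hypothetical, chosen only to echo the formal-language reading of a regular expression.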

To see just how many, try the following query (which is not a canonical regular expression, by the way); it will return the beginning of a long list of Web pages on which both the word "submit" and the phrase "regular expression" appear. Most of them will be search engines.

You don't really need Unix to find regular expressions useful. For example, BBEdit Lite is a downloadable Macintosh editor (with special attachments for editing HTML) that implements regular expressions in a big way. BBEdit Lite will scan the contents of every text file on a Macintosh hard disk in seconds (or a CD-ROM in minutes), and report or display all files containing a match for a regular expression.
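The same kind of multi-file search can be sketched in a few lines on any system. This is not how BBEdit Lite works internally; it is only an illustration using Python's standard library, and the function name files_containing is hypothetical.

```python
import os
import re

def files_containing(root, pattern):
    """Return the sorted paths of files under root with a line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are dropped so that stray binary
                # files do not stop the scan.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if any(regex.search(line) for line in fh):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

For instance, files_containing(".", r"colou?r") would list every file under the current directory that mentions either "color" or "colour".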

Brian W. Kernighan and ...
... Rob Pike, The Unix Programming Environment
... Dennis Ritchie, The C Programming Language
... P. J. Plauger, The Elements of Programming Style

O'Reilly: Mastering Regular Expressions; Author's homepage

Patterns in Perl
Regexps in Perl
Perl Regular Expressions

Printed resources

There are many printed references on regular expressions; the topic has to be covered, and covered clearly, in every serious Unix book. The best single source of printed information for sophisticated beginners (on Unix, on regular expressions, and on their use in filters) remains the first four chapters of Brian W. Kernighan and Rob Pike's classic The UNIX Programming Environment, which treat much the same topics as this chapter. This is pretty condensed stuff, but it's admirably clear; Kernighan is not only the k in awk, and one of the creators of UNIX, but also one of the best writers in information science.

This is also evident in Kernighan's other books, written in collaboration with other distinguished authors, including the standard reference grammar of the C programming language by Kernighan and Ritchie, in various editions. His books with P. J. Plauger are enormously influential. These include their book on style in programming, The Elements of Programming Style (intended in the spirit of Strunk & White's The Elements of Style), and the Old and New Testaments of the Software Tools philosophy. In the original Software Tools, the programs are in the RATFOR language, now obsolete; in the later Software Tools in Pascal, they are rewritten in Pascal, pace Kernighan's criticisms of Pascal.

O'Reilly has recently issued a book specifically on regular expressions, called, regularly enough, Mastering Regular Expressions. The author, Jeffrey Friedl, maintains his own home page for the book with many useful links.

The book goes into great detail on the dialect differences that have arisen in different implementations, and devotes considerable attention to Perl regular expressions, which are enhanced beyond the canonical.
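One common enhancement beyond the canonical is the backreference, which requires a later part of the match to repeat exactly what an earlier group captured; no canonical regular expression can do this in general. The sketch below uses Python's re module rather than Perl (an assumption made for consistency of illustration; Perl's syntax for this pattern is essentially the same), and the sample sentence is invented.

```python
import re

# \b(\w+) captures a whole word; \s+\1\b then demands the same word
# again. The backreference \1 is the extension beyond canonical
# regular expressions.
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

text = "It is is clear that the the parser stalled."
repeats = [m.group(1) for m in doubled.finditer(text)]
print(repeats)  # ['is', 'the']
```

A pattern like this is handy for catching accidentally doubled words in a draft or a corpus.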


Last change October 19, 1998 John Lawler
