sed, awk, and perl

sed, awk, and perl are some of the Unix utilities that implement regular expressions, mostly in tasks requiring pattern matching and substitution. They are widely used for data manipulation, searching, and general programming. While they were originally developed for and are integrated into Unix, they have been ported to nearly every other computing environment, including PCs.

sed is a stream editor, which follows commands just like an interactive editor, but is designed to run in batch mode, performing repetitive search-and-replace commands untouched by human hand. It deals with individual characters and thus is more useful for phonological manipulation than for large-scale textual analysis. It is cryptic, though no more so than, say, Turkish Vowel Harmony.

awk (named after its authors: Aho, Weinberger, and Kernighan) is a text-oriented pattern-matching language that is at its best and most powerful when coping with large amounts of moderately structured data. For instance, one can perform text analysis on Usenet posts using awk. It is less cryptic than sed, and works at the level of words rather than characters. It can do anything that sed can, but sed is faster and simpler for what it does. awk exists in several dialects, including nawk ('new awk'), with a richer command set, and gawk ('GNU awk'), part of the GNU operating system from the Free Software Foundation.

Both awk and sed exist on every Unix system; consult the local man pages for details of your specific implementation. They are also available for most microcomputer systems, including DOS. Both are line-oriented, and both have limitations, despite their utility.

Perl, by contrast, is a full-featured programming language designed for handling text; it will do everything sed and awk can and plenty more besides. The script that runs The Chomskybot is written in Perl, and so are most of the CGI scripts that drive search engines and other Web programs.
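
To make the contrast between sed and awk concrete, here is a minimal sketch in shell. The sample data and the function names sed_demo and awk_demo are invented for illustration; they stand in for a file of "author topic kind" lines, not for any real Usenet archive. The sed command applies one substitution to every line in batch mode, while the awk command splits each line into words and tallies the first one.

```shell
#!/bin/sh
# Hypothetical sample data: one "author topic kind" record per line.
sample='alice syntax question
bob phonology question
alice semantics reply'

# sed: run the same search-and-replace on every line, with no interaction.
# s/question/query/ replaces the first match found on each line.
sed_demo() {
    printf '%s\n' "$sample" | sed 's/question/query/'
}

# awk: split each line into whitespace-separated fields ($1, $2, ...)
# and count how many lines each author (the first field) appears on.
# The output is sorted because awk's "for (a in count)" order is unspecified.
awk_demo() {
    printf '%s\n' "$sample" |
        awk '{ count[$1]++ } END { for (a in count) print a, count[a] }' |
        sort
}

sed_demo
awk_demo
```

The sed half never looks at word boundaries; it only rewrites matching characters. The awk half treats each line as a record of fields, which is why it suits moderately structured data such as message headers.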