14. Formatting Text

Methods and tools for changing the arrangement or presentation of text are often useful for preparing text for printing. This chapter discusses ways of changing the spacing of text and setting up pages, of underlining and sorting and reversing text, and of numbering lines of text.

14.1 Spacing Text  Change the spacing in text.
14.2 Paginating Text  Paginating text.
14.3 Underlining Text  Underlining text.
14.4 Sorting Text  Sorting text.
14.5 Numbering Lines of Text  Numbering text.
14.6 Reversing Text  Reversing text.


14.1 Spacing Text

These recipes are for changing the spacing of text--the whitespace that exists between words, lines, and paragraphs.

The filters described in this section send output to standard output by default; to save their output to a file, use shell redirection (see section Redirecting Output to a File).

14.1.1 Eliminating Extra Spaces in Text  Making the whitespace the same.
14.1.2 Single-Spacing Text  Single-spacing text.
14.1.3 Double-Spacing Text  Double-spacing text.
14.1.4 Triple-Spacing Text  Triple-spacing text.
14.1.5 Adding Line Breaks to Text  Putting line breaks in text.
14.1.6 Adding Margins to Text  Putting margins in text.
14.1.7 Swapping Tab and Space Characters  Swapping tab and space characters.


14.1.1 Eliminating Extra Spaces in Text

To eliminate extra whitespace within lines of text, use the fmt filter; to eliminate extra whitespace between lines of text, use cat.

Use fmt with the `-u' option to output text with "uniform spacing," where the space between words is reduced to one space character and the space between sentences is reduced to two space characters.

  • To output the file `term-paper' with uniform spacing, type:

     
    $ fmt -u term-paper RET
    

Use cat with the `-s' option to "squeeze" multiple adjacent blank lines into one.

  • To output the file `term-paper' with multiple blank lines output as only one blank line, type:

     
    $ cat -s term-paper RET
    

You can combine both of these commands to output text with multiple adjacent blank lines squeezed into one and with uniform spacing between words. The following example sends the output of the combined commands to less so that it can be perused on the screen.

  • To peruse the text file `term-paper' with multiple blank lines squeezed into one and with uniform spacing between words, type:

     
    $ cat -s term-paper | fmt -u | less RET
    

Notice that in this example, both fmt and less worked on their standard input instead of on a file--the standard output of cat (the contents of `term-paper' with extra blank lines squeezed out) was passed to the standard input of fmt, and its standard output (the space-squeezed `term-paper', now with uniform spacing) was sent to the standard input of less, which displayed it on the screen.
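
The pipeline above can be tried on a small sample file. The following is a minimal sketch, assuming a throwaway file created in a temporary directory (the file contents are invented for illustration); it stores the result in a variable instead of sending it to less.

```shell
# Build a small sample file with uneven spacing and extra blank lines.
tmp=$(mktemp -d)
printf 'First   line   here.\n\n\n\nSecond    line.\n' > "$tmp/term-paper"

# Squeeze runs of blank lines into one, then make the word spacing uniform.
result=$(cat -s "$tmp/term-paper" | fmt -u)
printf '%s\n' "$result"

rm -r "$tmp"
```

The output should contain the two sentences separated by a single blank line, with one space between words.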


14.1.2 Single-Spacing Text

There are many methods for single-spacing text. To remove all empty lines from text output, use grep with the regular expression `.', which matches any character, and therefore matches any line that isn't empty (see section Regular Expressions--Matching Text Patterns). You can then redirect this output to a file, or pipe it to other commands; the original file is not altered.

  • To output all non-empty lines from the file `term-paper', type:

     
    $ grep . term-paper RET
    

This command outputs all lines that are not empty--so lines containing only non-printing characters, such as spaces and tabs, will still be output.

To remove from the output all empty lines, and all lines that consist of only space characters, use `[^ ]' as the regexp to search for; it matches any line that contains at least one character other than a space. But this regexp will still output lines that contain only tab characters; to remove from the output all empty lines and lines that contain only a combination of tab or space characters, use `[^[:space:]]' as the regexp to search for. It uses the special predefined `[:space:]' regexp class, which matches any kind of space character at all, including tabs.

  • To output only the lines from the file `term-paper' that contain more than just space characters, type:

     
    $ grep '[^ ]' term-paper RET
    

  • To output only the lines from the file `term-paper' that contain more than just space or tab characters, type:

     
    $ grep '[^[:space:]]' term-paper RET
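
To see how the three regexps differ, the following sketch counts the lines each one keeps. It assumes a four-line sample file (invented for illustration) that holds one line of text, one empty line, one line of spaces, and one line with a single tab; each pattern here is a single bracket expression, so it matches any line with at least one qualifying character.

```shell
tmp=$(mktemp -d)
# Four lines: text, empty, spaces only, a tab only.
printf 'text\n\n   \n\t\n' > "$tmp/sample"

# `.' keeps every non-empty line, including whitespace-only lines.
any_char=$(grep -c . "$tmp/sample")
# `[^ ]' also drops lines made only of spaces, but keeps the tab-only line.
non_space=$(grep -c '[^ ]' "$tmp/sample")
# `[^[:space:]]' drops every whitespace-only line.
non_blank=$(grep -c '[^[:space:]]' "$tmp/sample")

echo "$any_char $non_space $non_blank"
rm -r "$tmp"
```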
    

If a file is already double-spaced, where all even lines are blank, you can remove those lines from the output by using sed with the `n;d' expression.

  • To output only the odd lines from file `term-paper', type:

     
    $ sed 'n;d' term-paper RET
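
As a small illustration of the `n;d' expression, the sketch below feeds sed a double-spaced sample (invented text) on standard input: `n' prints the current line and reads the next one, and `d' deletes that next line.

```shell
# Double-spaced input: every even line is blank.
odd_lines=$(printf 'one\n\ntwo\n\nthree\n\n' | sed 'n;d')
printf '%s\n' "$odd_lines"
```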
    


14.1.3 Double-Spacing Text

To double-space text, where one blank line is inserted between each line in the original text, use the pr tool with the `-d' option. By default, pr paginates text and puts a header at the top of each page with the current date, time, and page number; give the `-t' option to omit this header.

  • To double-space the file `term-paper' and write the output to the file `term-paper.print', type:

     
    $ pr -d -t term-paper > term-paper.print RET
    

To send the output directly to the printer for printing, you would pipe the output to lpr:

     
    $ pr -d -t term-paper | lpr RET

NOTE: The pr ("print") tool is a text pre-formatter, often used to paginate and otherwise prepare text files for printing; there is more discussion on the use of this tool in Paginating Text.
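
A quick way to check the effect of `pr -d -t' is to double-space a few sample lines and then strip the blank lines again. This is only a sketch on invented input read from standard input; the exact number of trailing blank lines is not asserted here.

```shell
# Double-space three sample lines.
doubled=$(printf 'one\ntwo\nthree\n' | pr -d -t)
printf '%s\n' "$doubled"

# Removing the blank lines again should recover the original three lines.
restored=$(printf '%s\n' "$doubled" | grep '[^[:space:]]')
```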


14.1.4 Triple-Spacing Text

To triple-space text, where two blank lines are inserted between each line of the original text, use sed with the `G;G' expression.

  • To triple-space the file `term-paper' and write the output to the file `term-paper.print', type:

     
    $ sed 'G;G' term-paper > term-paper.print RET
    

The `G' expression appends one blank line to each line of sed's output; separate several `G' commands with `;' to append more than one blank line (you must quote the expression, because the semicolon (`;') has meaning to the shell--see Passing Special Characters to Commands). Add further `G' commands to space the text even more widely.

  • To quadruple-space the file `term-paper', and write the output to the file `term-paper.print', type:

     
    $ sed 'G;G;G' term-paper > term-paper.print RET
    

The usage of sed is described in Editing Streams of Text.
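
The effect of repeating `G' can be checked by counting output lines. In this sketch, a two-line sample (invented input) is passed through sed on standard input; each `G' adds one blank line after every input line.

```shell
# Two input lines, one G each: 2 text lines + 2 blank lines.
double_count=$(printf 'one\ntwo\n' | sed 'G' | wc -l | tr -d ' ')
# Two input lines, two Gs each: 2 text lines + 4 blank lines.
triple_count=$(printf 'one\ntwo\n' | sed 'G;G' | wc -l | tr -d ' ')
echo "$double_count $triple_count"
```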


14.1.5 Adding Line Breaks to Text

Sometimes a file will not have line breaks at the end of each line (this commonly happens during file conversions between operating systems). To add line breaks to a file that does not have them, use the text formatter fmt. It outputs text with lines arranged up to a specified width; if no width is specified, it formats text up to a width of 75 characters per line.

  • To output the file `term-paper' with lines up to 75 characters long, type:

     
    $ fmt term-paper RET
    

Use the `-w' option to specify the maximum line width.

  • To output the file `term-paper' with lines up to 80 characters long, type:

     
    $ fmt -w 80 term-paper RET
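
The following sketch shows fmt breaking one long line at a chosen width. The input is a generated line of 40 short words (sample data only), and a width of 30 is used so the effect is easy to see; awk then reports the longest output line.

```shell
# One long line of 40 short words, with no line breaks inside it.
long_line=$(yes word | head -n 40 | tr '\n' ' ')

# Re-wrap it so that no output line is longer than 30 characters.
wrapped=$(printf '%s\n' "$long_line" | fmt -w 30)
printf '%s\n' "$wrapped"

# Measure the longest line and count the lines that fmt produced.
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max }')
line_count=$(printf '%s\n' "$wrapped" | wc -l | tr -d ' ')
```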
    


14.1.6 Adding Margins to Text

Giving text an extra left margin is especially good when you want to print a copy and punch holes in it for use with a three-ring binder.

To output a text file with a larger left margin, use pr with the file name as an argument; give the `-t' option (to disable headers and footers), and, as an argument to the `-o' option, give the number of spaces to offset the text. Add the number of spaces to the page width (whose default is 72) and specify this new width as an argument to the `-w' option.

  • To output the file `owners-manual' with a five-space (or five-column) margin to a new file, `owners-manual.pr', type:

     
    $ pr -t -o 5 -w 77 owners-manual > owners-manual.pr RET
    

This command is almost always used for printing, so the output is usually just piped to lpr instead of saved to a file. Many text documents have a width of 80 and not 72 columns; if you are printing such a document and need to keep the 80 columns across the page, specify a new width of 85. If your printer can only print 80 columns of text, specify a width of 80; the text will be reformatted to 75 columns after the 5-column margin.

  • To print the file `owners-manual' with a 5-column margin and 80 columns of text, type:
     
    $ pr -t -o 5 -w 85 owners-manual | lpr RET
    

  • To print the file `owners-manual' with a 5-column margin and 75 columns of text, type:

     
    $ pr -t -o 5 -w 80 owners-manual | lpr RET
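
To preview the margin before printing, the same options can be run on a couple of sample lines (invented text) with the result kept in a variable rather than piped to lpr. This sketch assumes pr pads the margin with space characters, which is its behavior when no tab-related options are given.

```shell
# Add a five-column left margin to two sample lines.
indented=$(printf 'Chapter 1\nGetting started\n' | pr -t -o 5 -w 77)
printf '%s\n' "$indented"

# The first output line, for inspection.
first_line=$(printf '%s\n' "$indented" | head -n 1)
```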
    


14.1.7 Swapping Tab and Space Characters

Use the expand and unexpand tools to swap tab characters for space characters, and to swap space characters with tabs, respectively.

Both tools take a file name as an argument and write changes to the standard output; if no files are specified, they work on the standard input.

To convert tab characters to spaces, use expand. To convert only the initial or leading tabs on each line, give the `-i' option; the default action is to convert all tabs.

  • To convert all tab characters to spaces in file `list', and write the output to `list2', type:
     
    $ expand list > list2 RET
    

  • To convert only initial tab characters to spaces in file `list', and write the output to the standard output, type:

     
    $ expand -i list RET
    

To convert multiple space characters to tabs, use unexpand. By default, it only converts leading spaces into tabs, counting eight space characters for each tab. Use the `-a' option to specify that all instances of eight space characters be converted to tabs.

  • To convert every eight leading space characters to tabs in file `list2', and write the output to `list', type:
     
    $ unexpand list2 > list RET
    

  • To convert all occurrences of eight space characters to tabs in file `list2', and write the output to the standard output, type:

     
    $ unexpand -a list2 RET
    

To specify the number of spaces to convert to a tab, give that number as an argument to the `-t' option.

  • To convert every leading space character to a tab character in `list2', and write the output to the standard output, type:

     
    $ unexpand -t 1 list2 RET
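
A short round trip shows both tools at work on standard input. This sketch uses invented one-line samples and relies on the default tab stops at every eighth column.

```shell
# The tab after "a" reaches the next tab stop (column 8), so it becomes 7 spaces.
expanded=$(printf 'a\tb\n' | expand)

# Eight leading spaces are converted back into a single tab.
restored=$(printf '        x\n' | unexpand)

printf '%s\n%s\n' "$expanded" "$restored"
```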
    


14.2 Paginating Text

The formfeed character, ASCII C-l or octal code 014, is the delimiter used to paginate text. When you send text with a formfeed character to the printer, the current page being printed is ejected and a new page begins--thus, you can paginate a text file by inserting formfeed characters at a place where you want a page break to occur.

To insert formfeed characters in a text file, use the pr filter.

Give the `-f' option to omit the footer and separate pages of output with the formfeed character, and use `-h ""' to output a blank header (otherwise, the current date and time, file name, and current page number are output at the top of each page).

  • To paginate the file `listings' and write the output to a file called `listings.page', type:

     
    $ pr -f -h "" listings > listings.page RET
    

By default, pr outputs pages of 66 lines each. You can specify the page length as an argument to the `-l' option.

  • To paginate the file `listings' with 43-line pages, and write the output to a file called `listings.page', type:

     
    $ pr -f -h "" -l 43 listings > listings.page RET
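
One way to confirm that pr has inserted page breaks is to count the formfeed characters in its output. The sketch below is an illustration only: it generates 100 sample lines in place of a real `listings' file and uses a short 20-line page so that several pages result; the exact count depends on how many text lines pr fits on each page.

```shell
# Generate 100 sample lines, paginate them into 20-line pages with a blank
# header, keep only the formfeed characters, and count them.
page_breaks=$(awk 'BEGIN { for (i = 1; i <= 100; i++) print "line", i }' |
    pr -f -h "" -l 20 |
    tr -cd '\f' |
    wc -c | tr -d ' ')
echo "formfeeds inserted: $page_breaks"
```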
    

NOTE: If a page has more lines than a printer can fit on a physical sheet of paper, the printer will automatically break the text at that line as well as at the places in the text where there are formfeed characters.

You can paginate text in Emacs by manually inserting formfeed characters where you want them--see Inserting Special Characters in Emacs.

14.2.1 Placing Headers on Each Page  Putting headers on a page.
14.2.2 Placing Text in Columns  Putting text in columns.
14.2.3 Options Available When Paginating Text  More options for pagination.


14.2.1 Placing Headers on Each Page

The pr tool is a general-purpose page formatter and print-preparation utility. By default, pr outputs text in pages of 66 lines each, with headers at the top of each page containing the date and time, file name, and page number, and footers containing five blank lines.

  • To print the file `duchess' with the default pr preparation, type:

     
    $ pr duchess | lpr RET
    


14.2.2 Placing Text in Columns

You can also use pr to put text in columns--give the number of columns to output as an option, preceded by a hyphen (for example, `-4' for four columns). Use the `-t' option to omit the printing of the default headers and footers.

  • To print the file `news.update' in four columns with no headers or footers, type:

     
    $ pr -4 -t news.update | lpr RET
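
To preview column output without printing, the same options can be applied to a short list on standard input. This sketch uses the numbers 1 through 8 as sample data; pr fills the columns downward, so the eight input lines should come out as fewer, wider rows.

```shell
# Arrange eight one-word lines in four columns, with no header or footer.
columns=$(printf '%s\n' 1 2 3 4 5 6 7 8 | pr -4 -t)
printf '%s\n' "$columns"

# Count the output rows and the words that survived the rearrangement.
rows=$(printf '%s\n' "$columns" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$columns" | wc -w | tr -d ' ')
```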
    


14.2.3 Options Available When Paginating Text

The following table describes some of pr's options; see the pr info for a complete description of its capabilities (see section Using the GNU Info System).

OPTION        DESCRIPTION
+first:last   Specify the first and last page to process; the last page can be omitted, so +7 begins processing with the seventh page and continues until the end of the file is reached.
-column       Specify the number of columns to output text in, making all columns fit the page width.
-a            Print columns across instead of down.
-c            Output control characters in hat notation and print all other unprintable characters in "octal backslash" notation.
-d            Specify double-spaced output.
-f            Separate pages of output with a formfeed character instead of a footer of blank lines (63 lines of text per 66-line page instead of 56).
-h header     Specify the header to use instead of the default; specify -h "" for a blank header.
-l length     Specify the page length to be length lines (default 66). If page length is less than 11, headers and footers are omitted and existing form feeds are ignored.
-m            Use when specifying multiple files; this option merges and outputs them in parallel, one per column.
-o spaces     Set the number of spaces to use in the left margin (default 0).
-t            Omit the header and footer on each page, but retain existing formfeeds.
-T            Omit the header and footer on each page, as well as existing formfeeds.
-v            Output non-printing characters in "octal backslash" notation.
-w width      Specify the page width to use, in characters (default 72).

NOTE: It's also common to use pr to change the spacing of text (see section Spacing Text).


14.3 Underlining Text

In the days of typewriters, text that was meant to be set in an italicized font was denoted by underlining the text with underscore characters; now, it's common practice to denote an italicized word in plain text by typing an underscore character, `_', just before and after a word in a text file, like `_this_'.

Some text markup languages use different methods for denoting italics; for example, in TeX files, italicized text is often denoted with braces and the `\it' command, like `{\it this}'. (LaTeX files use the same format, but `\emph' is often used in place of `\it'.)

You can convert one form to the other by using the Emacs replace-regexp function and specifying the text to be replaced as a regexp (see section Regular Expressions--Matching Text Patterns).

  • To replace plaintext-style italics with TeX `\it' commands, type:

     
    M-x replace-regexp RET
    _\([^_]+\)_ RET
    {\\it \1} RET
    

  • To replace TeX-style italics with plaintext _underscores_, type:

     
    M-x replace-regexp RET
    {\\it \([^}]+\)} RET
    _\1_ RET
    

Both examples above used the special symbol `\1' in the replacement text; it stands for the text matched by the first `\( ... \)' construct in the regexp. See Info file `emacs-e20.info', node `Regexps' for more information on regexp syntax in Emacs.
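
The same two conversions can also be done outside Emacs. The following sketch swaps in sed as an alternative tool (it is not the Emacs method described above) and runs on an invented sample sentence; in the sed replacement text, `\\' produces a single backslash and `\1' stands for the text matched by the first group.

```shell
# Plaintext-style italics to TeX: _word_ becomes {\it word}.
tex=$(printf 'an _important_ word\n' | sed 's/_\([^_][^_]*\)_/{\\it \1}/g')

# TeX-style italics back to plaintext: {\it word} becomes _word_.
plain=$(printf '%s\n' "$tex" | sed 's/{\\it \([^}][^}]*\)}/_\1_/g')

printf '%s\n%s\n' "$tex" "$plain"
```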

To put a literal underline under text, you need to use a text editor to insert a C-h character followed by an underscore (`_') immediately after each character you want to underline; you can insert the C-h in Emacs with the C-q function (see section Inserting Special Characters in Emacs).

When a text file contains these literal underlines, use the ul tool to output the file so that it is viewable by the terminal you are using; this is also useful for printing (pipe the output of ul to lpr).

  • To output the file `term-paper' so that you can view underbars, type:

     
    $ ul term-paper RET
    

To output such text without the backspace character, C-h, in the output, use col with the `-u' option.

  • To output the file `term-paper' with all backspace characters stripped out, type:

     
    $ col -u term-paper RET
    


14.4 Sorting Text

You can sort a list in a text file with sort. By default, it outputs text in ascending alphabetical order; use the `-r' option to reverse the sort and output text in descending alphabetical order.

For example, suppose a file `provinces' contains the following:

     
    Shantung
    Honan
    Szechwan
    Hunan
    Kiangsu
    Kwangtung
    Fukien

  • To sort the file `provinces' and output all lines in ascending order, type:
     
    $ sort provinces RET
    Fukien
    Honan
    Hunan
    Kiangsu
    Kwangtung
    Shantung
    Szechwan
    $
    

  • To sort the file `provinces' and output all lines in descending order, type:

     
    $ sort -r provinces RET
    Szechwan
    Shantung
    Kwangtung
    Kiangsu
    Hunan
    Honan
    Fukien
    $
    

The following table describes some of sort's options.

OPTION    DESCRIPTION
-b        Ignore leading blanks on each line when sorting.
-d        Sort in "phone directory" order, considering only letters, digits, and blanks.
-f        When sorting, fold lowercase letters into their uppercase equivalent, so that differences in case are ignored.
-i        Ignore non-printing characters when sorting.
-n        Sort numerically instead of by character value.
-o file   Write output to file instead of standard output.
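
The difference between the default sort and the `-n' option shows up as soon as the input contains numbers of different lengths. This sketch sorts three sample numbers both ways; LC_ALL=C is set here only to make the character-by-character ordering predictable.

```shell
# Sorted by character value: "100" comes before "9" because "1" sorts before "9".
by_char=$(printf '10\n9\n100\n' | LC_ALL=C sort)

# Sorted numerically.
by_number=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)

printf '%s\n--\n%s\n' "$by_char" "$by_number"
```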


14.5 Numbering Lines of Text

There are several ways to number lines of text.

One way to do it is to use the nl ("number lines") tool. Its default action is to write its input (either the file names given as an argument, or the standard input) to the standard output, with an indentation and all non-empty lines preceded with line numbers.

  • To peruse the file `report' with each line of the file preceded by line numbers, type:

     
    $ nl report | less RET
    

You can set the numbering style with the `-b' option followed by an argument. The following table lists the possible arguments and describes the numbering style they select.

ARGUMENT  NUMBERING STYLE
a         Number all lines.
t         Number only non-blank lines. This is the default.
n         Do not number lines.
pregexp   Number only lines that match the regular expression regexp, which is given immediately after the `p' (see section Regular Expressions--Matching Text Patterns).

The default is for line numbers to start with one, and increment by one. Set the initial line number by giving an argument to the `-v' option, and set the increment by giving an argument to the `-i' option.

  • To output the file `report' with each line of the file preceded by line numbers, starting with the number two and counting by fours, type:
     
    $ nl -v 2 -i 4 report RET
    

  • To number only the lines of the file `cantos' that begin with a period (`.'), starting numbering at zero and using a numbering increment of five, and to write the output to `cantos.numbered', type:

     
    $ nl -i 5 -v 0 -b p'^\.' cantos > cantos.numbered RET
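
The `-v' and `-i' options can be checked on a few sample lines. In this sketch (invented input on standard input), awk extracts just the line numbers that nl writes and paste joins them onto one line.

```shell
# Number three lines, starting at 2 and counting by fours.
numbers=$(printf 'alpha\nbeta\ngamma\n' | nl -v 2 -i 4 | awk '{ print $1 }' | paste -s -d ' ' -)
echo "$numbers"
```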
    

Another way to number lines is to use cat with one of the following two options: the `-n' option numbers each line of its input text, while the `-b' option only numbers non-blank lines.

  • To peruse the text file `report' with each line of the file numbered, type:
     
    $ cat -n report | less RET
    

  • To peruse the text file `report' with each non-blank line of the file numbered, type:

     
    $ cat -b report | less RET
    

In the preceding examples, output from cat is piped to less for perusal; the original file is not altered.

To take an input file, number its lines, and then write the line-numbered version to a new file, redirect the standard output of the cat command to the new file.

  • To write a line-numbered version of file `report' to file `report.lines', type:

     
    $ cat -n report > report.lines RET
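
The two cat options differ only in how they treat blank lines. The sketch below numbers a three-line sample (invented input, with a blank line in the middle) both ways and picks out the number given to the last line.

```shell
# -n numbers every line, so the last of three lines is number 3.
last_with_n=$(printf 'a\n\nb\n' | cat -n | awk 'END { print $1 }')

# -b skips the blank line, so the last line is number 2.
last_with_b=$(printf 'a\n\nb\n' | cat -b | awk 'END { print $1 }')

echo "$last_with_n $last_with_b"
```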
    


14.6 Reversing Text

The tac command is similar to cat, but it outputs text in reverse order. There is another difference--tac works on records (sections of text delimited by a separator string) instead of lines of text. Its default separator string is the linebreak character, so by default tac outputs files in line-for-line reverse order.

  • To output the file `prizes' in line-for-line reverse order, type:

     
    $ tac prizes RET 
    

Specify a different separator with the `-s' option. This is often useful when specifying non-printing characters such as formfeeds. To specify such a character, use the ANSI-C method of quoting (see section Passing Special Characters to Commands).

  • To output `prizes' in page-for-page reverse order, type:
     
    $ tac -s $'\f' prizes RET 
    

The preceding example uses the formfeed, or page break, character as the delimiter, and so it outputs the file `prizes' in page-for-page reverse order, with the last page output first.
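
Both kinds of reversal can be tried on small samples. This sketch uses invented input; it builds the formfeed with printf so that it also works in shells without ANSI-C quoting, ends each sample "page" with a formfeed, and translates the formfeeds to `|' afterward so the result is easy to read.

```shell
# Line-for-line reversal with the default separator.
reversed=$(printf 'first\nsecond\nthird\n' | tac)

# Page-for-page reversal, using the formfeed as the separator.
ff=$(printf '\f')
pages=$(printf 'page one\fpage two\fpage three\f' | tac -s "$ff" | tr '\f' '|')

printf '%s\n%s\n' "$reversed" "$pages"
```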

Use the `-r' option to use a regular expression for the separator string (see section Regular Expressions--Matching Text Patterns). You can build regular expressions to output text in word-for-word and character-for-character reverse order:

  • To output `prizes' in word-for-word reverse order, type:
     
    $ tac -r -s '[^a-zA-Z0-9-]' prizes RET
    

  • To output `prizes' in character-for-character reverse order, type:
     
    $ tac -r -s '.\| RET
    ' prizes RET
    

To reverse the characters on each line, use rev.

  • To output `prizes' with the characters on each line reversed, type:

     
    $ rev prizes RET
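
Unlike tac, rev keeps the lines in their original order and reverses only the characters within each line, as this small sketch on invented input shows.

```shell
# Each line is mirrored; the order of the lines is unchanged.
mirrored=$(printf 'stressed\nlive\n' | rev)
printf '%s\n' "$mirrored"
```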
    

