Наши партнеры








Книги по Linux (с отзывами читателей)

Библиотека сайта rus-linux.net

5.1. Markup: A General Overview

A markup language is a system for marking or tagging a document to define the structure of the document. You may add tags to your document to define which parts of your document are paragraphs, titles, sections, glossary items (the list goes on!). There are many markup languages in use today. XHTML and HTML will be familiar to those who author web documents. The LDP uses a markup language known as DocBook. Each of these markup languages uses its own "controlled vocabulary" to describe documents. For example: in XHTML a paragraph would be marked up with the tagset <p></p> while in DocBook a paragraph would be marked up with <para></para>. The tagsets are defined in a quasi dictionary known as a Document Type Definition (DTD).

Markup languages also follow a set of rules on how a document can be assembled. The rules are either SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language). These rules are essentially the "grammar" of a document's markup. SGML and XML are very similiar. XML is a sub-set of SGML, but XML requires more precise use of the tags when marking up a document. The LDP accepts both SGML and XML documents, but prefers XML.

There are three components to an XML/SGML document which is read by a person.

  • Content. As a TLDP author it is good to remember that this is the most important piece. Many authors will write the content first and add their markup later. Content may include both plain text and graphics. This is the only part that is required of LDP authors!

  • Markup. To describe the structure of a document a controlled vocabulary is added on top of the content. It is used to distinguish different kinds of content: paragraphs, lists, tables, warnings (and so on). The markup must also conform to either SGML or XML rules. If you are not comfortable adding markup to your document, a TLDP volunteer will do it for you.

  • Transformation. Finally the document is transformed from DocBook to PDF, HTML, PostScript for display in digital or paper form. This transformation is controlled through the Document Style Semantics and Specification Language (DSSSL). The DSSSL tells the program doing the transformation how to convert the raw markup into something that a human can read. The LDP uses a series of scripts to automate these transformations. You are not required to transform your own documents.

NoteContent, markup and transformations
 

Steve Champeon does a great job of explaining how content, markup languages, and transformations all fit together in his article The Secret Life of Markup. Although he is writing from an HTML perspective, the ideas are relevant and there is an example of DocBook markup.