HTML As a Document Format

Rick Aster

You’ve probably heard that HTML is the page-description language of the World Wide Web, the language that most Web pages are written in. Now there is another reason to be interested in HTML. In ODS, the Output Delivery System introduced in SAS version 7, HTML is the file format for formatted output documents from SAS procs and data steps.

Using HTML

The most obvious use of the HTML destination of ODS is to produce Web pages of SAS output. These Web pages can be formatted with tables, fonts, colors, graphics, and links. Less obvious, but probably more important for most users, is the possibility of using the ODS HTML output to put SAS output into other kinds of documents, such as word processing documents. HTML is the only formatted document file format that ODS produces in version 7, and it is the most versatile markup file format supported in more recent versions of SAS. For that reason, you’ll want to create output in HTML format regardless of the kind of document the SAS output is ultimately going into.

Many years ago, Rhena Seidman and I wrote a chapter in the SAS Institute book Reporting From the Field that describes an elaborate process for producing Microsoft Word tables from SAS data. Even using a simplified example, it took us 17 pages to describe what you had to do. I'm happy to say that process is no longer necessary. Instead, you can use the PRINT proc (or any other proc) and ODS options to create an HTML file containing the table. You can open that HTML file directly in many document-oriented programs, including Microsoft Word, apply any necessary formatting, and save it as a native document. Putting SAS output tables into a word processing document is now just as easy as creating an old-fashioned "printout" directly from the SAS session.

If you need to produce the document in another application that cannot read HTML files directly, you can, in all likelihood, copy and paste the table from the ODS HTML output to the other application. Although not quite as direct, this is still much easier than what was possible in the past.

A Very Brief Introduction to HTML

If you are using HTML for formatted output from SAS programs, you may occasionally want to open up the files and deal with the HTML code directly, either to make changes in it or to extract a part of a file. Fortunately, you only need to know a little about HTML to do this.

HTML is a markup language. That simply means that an HTML file contains text amid tags that describe how the text is used. An HTML tag contains a code, such as p for paragraph, between angle brackets, like this: <p>. A tag marks the beginning of an HTML object. A tag can also include options, in the same form of syntax as statement options in a SAS program. For example, <p align=center> is a paragraph tag with an option that indicates that the paragraph should be centered. Most objects also have an end tag, which marks the end of the object. The code is the same, but it is preceded by a slash, like this: </p>.
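To see the pieces of a tag picked apart, here is a minimal sketch in Python, which is not part of SAS and is used here only for illustration. It relies on the standard html.parser module, which refers to tag options as "attributes." The class name TagLister and the sample text are hypothetical, chosen for this example.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical helper: record start tags, options, text, and end tags in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (option name, value) pairs, e.g. [("align", "center")]
        self.events.append(("start", tag, dict(attrs)))

    def handle_data(self, data):
        # Ignore runs of pure whitespace between tags
        if data.strip():
            self.events.append(("text", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


lister = TagLister()
lister.feed("<p align=center>Quarterly Sales</p>")  # sample text, made up for this sketch
lister.close()

for event in lister.events:
    print(event)
```

The three events that come out correspond to the start tag with its option, the text of the paragraph, and the end tag.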

The two essential objects in any HTML file are the head and body. The head is found at the beginning of the file and describes the page in general. It contains the title of the page, which appears in the title bar of the window when the page is displayed. The body follows the head and contains the contents of the Web page. Within the body, you’ll probably find a table, which usually contains the real substance of the file. The tag codes for these objects are head, title, body, and table, so they’re easy to remember. The tag for a table row is <tr>, and the cells of a table are usually <td> objects. Everything is contained in one big <html> object. So the overall structure of the file is something like this:

<html>
<head>
<title>Title of the page</title>
. . . other head objects . . .
</head>
<body . . . page formatting options . . .>
. . .
<table . . . table formatting options . . .>
<tr . . . row formatting options . . . >
<td . . . cell formatting options . . . >Text of first cell</td>
<td . . . cell formatting options . . . >Text of second cell</td>
. . . more cells . . .
</tr>
. . . more rows . . .
</table>
. . .
</body>
</html>

As long as you pay attention to the tags along with the text, you’ll find that it’s easy to make simple corrections directly in the HTML file, such as correcting the spelling of a word, deleting a footnote line, or changing the order of the rows in a table.
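Before editing a file by hand, it can help to list the rows and cells it contains. The following is a rough sketch in Python (again, not a SAS feature) that reads the <tr> and <td> tags described above and collects the cell text row by row. The class name TableReader and the sample file contents are hypothetical. The sketch assumes a single table that is not nested inside another, which may not hold for every file ODS produces.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Hypothetical helper: collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def _end_cell(self):
        if self._cell is not None:
            # Join the pieces and collapse line breaks and repeated spaces
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # a new row also ends a row left without an end tag
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # a new cell also ends a cell left without an end tag
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Sample file contents, made up for this sketch; the first row omits its end tags
sample = """<html>
<head><title>Class List</title></head>
<body>
<table border=1>
<tr><td>Name<td>Age
<tr><td>Alice</td><td>13</td></tr>
<tr><td>Barbara</td><td>14</td></tr>
</table>
</body>
</html>"""

reader = TableReader()
reader.feed(sample)
reader.close()

for row in reader.rows:
    print(row)
```

Each printed list is one table row, in the order the rows appear in the file. The sketch treats the end tags for cells and rows as optional because browsers accept tables written either way.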