When I created Global Statements Dictionary, the dictionary of SAS words and computer terms that appears on the Global Statements web site, I wrote a SAS program to create the initial empty web pages. The pages contained titles, headers, and each page’s links to the previous and next pages.
dalpha.sas: This program reads a large text sample and generates a list of the first two letters of words. This list determines the pages that are included in the dictionary.
dpage.sas: This program generates the HTML files for the web pages.
alpha.txt: This is an excerpt of a sample output file from dalpha.sas. This file would be manually edited and used as input to dpage.sas.
a.html (code): This is an example of the generated HTML code.
a.html (web page): This shows what the generated web page could look like with an appropriate CSS file for formatting.
Global Statements Dictionary was envisioned as a computer dictionary for SAS programmers, combining computer terms of interest with the SAS words that make up a major part of SAS programs. After I had planned the scope of the dictionary, I designed its presentation. Part of that design was the decision to break the dictionary into web pages based on the first two letters of each word. For example, the entry for access appears on the AC page, which is the file ac.html.
My next two questions were: what pages do I need, and what is an easy way to create them? The two SAS programs in this project answer these questions.
The dictionary would include pages only for the two-letter sequences that begin actual words. There would also be a page for every initial letter. In theory, you could create this list by first compiling a list of all SAS words and all words used in discussions of SAS work. In practice, you can get a good approximation by extracting a word list from a large sample of carefully selected text. The ideal text for this purpose is SAS-related writing that contains few abbreviated names. A well-chosen sample of about 1 million words could be enough to identify all the two-letter prefixes the dictionary needs.
If the text sample is placed in a single text file, the program dalpha.sas reads the text and creates a list of the prefixes of its words. It writes the list to the file alpha.txt. After any manual additions and deletions, this file can be used as input for the second program, dpage.sas.
In actual practice, for my text data, I used the list of SAS words from Professional SAS Programmer’s Pocket Reference and a very small text sample of perhaps 20,000 words. I then attempted to fill in any missing prefixes manually. The process I followed is reconstructed in dalpha.sas.
Ordinarily, a set of web pages is created from a template, a starting document that contains the common elements of the pages. That approach was insufficient for Global Statements Dictionary. In particular, an automated method was needed to create the links from each page to the preceding and following pages, and from each letter page to all the pages that begin with that letter. For example, the A page would contain links to the AB, AC, AD, etc., pages, and the AB page would link back to A and forward to AC. The program dpage.sas writes pages that include these links.
HTML is ASCII text, so it is easy to write a DATA step that writes HTML code. The challenge in dpage.sas is arranging the data so that all the elements that appear on one page are brought together. The PROC TRANSPOSE step brings together the two-letter prefixes associated with each initial letter. Three SET statements are arranged slightly out of sync so that they identify the current, preceding, and following pages.
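The way dpage.sas brings together the elements of each page can be illustrated with a short Python sketch. This sketch assumes that the page names arrive as a sorted list such as a, ab, ac, b, ba. Shifting that list by one position in each direction gives each page its preceding and following neighbors, much as the out-of-sync SET statements do. Grouping by initial letter collects the two-letter pages for each letter page, which is the job PROC TRANSPOSE does in the SAS program. The function name build_links and its tuple layout are hypothetical.

```python
from itertools import groupby


def build_links(pages):
    """Return {page: (previous, next, children)} for a sorted list of pages.

    Sketch only (not the SAS code): the two shifted copies of the list play
    the role of the out-of-sync SET statements, and the grouping by initial
    letter plays the role of PROC TRANSPOSE. Assumes every initial letter
    has its own one-letter page in the list.
    """
    previous = [None] + pages[:-1]   # page before each page (None at start)
    following = pages[1:] + [None]   # page after each page (None at end)

    # Two-letter pages collected under their initial letter; groupby works
    # here because sorting keeps pages with the same first letter together.
    children = {
        letter: [p for p in group if len(p) == 2]
        for letter, group in groupby(pages, key=lambda p: p[0])
    }

    links = {}
    for prev, page, nxt in zip(previous, pages, following):
        # Only one-letter pages list the pages that start with their letter.
        kids = children[page] if len(page) == 1 else []
        links[page] = (prev, nxt, kids)
    return links
```

For example, build_links(["a", "ab", "ac", "b", "ba"])["a"] gives (None, "ab", ["ab", "ac"]). In other words, the A page has no predecessor, links forward to AB, and lists AB and AC. Because every element of a page ends up in one record, the step that writes a page needs only that record.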
The revised version of the program shown here is updated to reflect the HTML 4.0 formatting used in Global Statements Dictionary until 2005. It is also somewhat optimized, although the performance gains matter little because the program runs in a matter of seconds.
The output of the program is a sequence of text files that contain HTML coding. The output is shown here both as HTML code and as the resulting web page reflecting the effects of the Global Statements style sheet. For a large-scale demonstration of the output, visit Global Statements Dictionary at http://www.globalstatements.com/d.
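Because HTML is plain text, writing each page comes down to assembling strings and saving them to a file. The Python sketch below generates a bare page from a page name, its neighbors, and its child pages. The markup, the title wording, and the helper names page_link, render_page, and write_pages are illustrative assumptions. They do not reproduce the actual HTML 4.0 template or the style sheet of Global Statements Dictionary.

```python
from html import escape
from pathlib import Path


def page_link(page):
    """Anchor tag for a page, e.g. 'ac' -> ac.html (assumed file naming)."""
    return f'<a href="{page}.html">{escape(page.upper())}</a>'


def render_page(page, prev, nxt, children):
    """Build the HTML text of one initially empty dictionary page.

    prev and nxt are neighboring page names (None at either end of the
    sequence); children lists the two-letter pages under a letter page.
    The markup and title text are hypothetical.
    """
    title = escape(page.upper())
    parts = [
        "<html>",
        "<head>",
        f"<title>Global Statements Dictionary: {title}</title>",
        "</head>",
        "<body>",
        f"<h1>{title}</h1>",
    ]
    if children:
        # A letter page links to every two-letter page under it.
        parts.append("<p>" + " ".join(page_link(c) for c in children) + "</p>")
    nav = []
    if prev:
        nav.append("Previous: " + page_link(prev))
    if nxt:
        nav.append("Next: " + page_link(nxt))
    if nav:
        parts.append("<p>" + " | ".join(nav) + "</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts) + "\n"


def write_pages(links, directory="."):
    """Write <page>.html for each entry of {page: (prev, next, children)}."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for page, (prev, nxt, children) in links.items():
        html_text = render_page(page, prev, nxt, children)
        (out / f"{page}.html").write_text(html_text, encoding="ascii")
```

For example, render_page("ab", "a", "ac", []) produces a page titled AB, with a previous link to a.html and a next link to ac.html. Keeping the rendering separate from the link computation means that changing the page template does not affect how the pages are ordered.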