The Character Set Project

Displaying Character Encoding Tables

Rick Aster

You can use a short DATA step program to generate a table that displays a character set. The programs presented here create character encoding tables for the ASCII and EBCDIC character sets.


Program Files

asciitable.sas: This program generates all 256 byte values and uses them to create a character encoding table that shows how SAS understands the ASCII character set in your SAS environment.

ebcdictable.sas: Virtually the same program with “ASCII” changed to “EBCDIC”, this program generates a character encoding table that shows how SAS understands the EBCDIC character set in your SAS environment.

Output Files

ASCII character encoding table: This is an example of output for asciitable.sas. Your output could differ depending on your environment and settings.

EBCDIC character encoding table: This is an example of output for ebcdictable.sas. Your output could differ depending on your environment and settings.


Description

A character set maps numeric byte values to specific characters so that text can be displayed. The character encoding tables generated by these programs show each character of a character set next to its hexadecimal value.
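To make the idea of a character set concrete, here is a minimal sketch in Python rather than SAS. It is an independent illustration, not part of the programs described here. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC; your SAS environment may use a different EBCDIC code page.

```python
# The same character has a different byte value in each character set.
ascii_a = "A".encode("ascii")    # byte value of A in ASCII
ebcdic_a = "A".encode("cp037")   # byte value of A in EBCDIC (code page 037, an assumption)

print(ascii_a.hex().upper())     # hexadecimal value of A in ASCII
print(ebcdic_a.hex().upper())    # hexadecimal value of A in EBCDIC

# Conversely, one byte value can stand for different characters.
same_byte = bytes([0xC1])
as_ebcdic = same_byte.decode("cp037")    # read the byte as EBCDIC
as_latin1 = same_byte.decode("latin-1")  # read the byte as an extended ASCII (Latin-1) character
```

The Latin-1 codec is used only because it assigns a character to every byte value; the extended characters in your own environment may be different.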

The structure of the two programs is identical. The DATA step demonstrates the looping logic that can be used to create a table. The first DO loop creates a header row for the table. The two nested DO loops create the data cells that appear in the table. In this example, the table is stored in a SAS data set, but it could just as easily be written directly to an ODS table by using the FILE and PUT statements in place of the OUTPUT statement.
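The looping logic described above can be sketched in Python as follows. This is not the SAS program itself, only an outline of the same structure: one loop for the header row, then two nested loops for the data cells. The function name `build_table` and the default `latin-1` encoding are assumptions made for the illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"

def build_table(encoding="latin-1"):
    """Return 17 rows: a header row followed by one row per high hex digit."""
    rows = []
    # First loop: header row with a blank corner cell and the 16 column digits.
    header = [" "]
    for col in range(16):
        header.append(HEX_DIGITS[col])
    rows.append(header)
    # Nested loops: 16 rows of 16 cells, one cell per byte value.
    for row in range(16):
        cells = [HEX_DIGITS[row]]  # row label in the first column
        for col in range(16):
            value = row * 16 + col
            cells.append(bytes([value]).decode(encoding, errors="replace"))
        rows.append(cells)
    return rows

table = build_table()
```

Each row is collected in a list here, which plays roughly the role of writing one observation with the OUTPUT statement. The cells are not printed in this sketch because the control characters in the first rows would need to be filtered first.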

The PROC REPORT step uses the NOHEADER option to omit the column headings, which would otherwise show the variable names, from the output. With no COLUMN or DEFINE statements, the default is to show all variables from the SAS data set as table columns.

The output table for each encoding relates hexadecimal values to characters. To form the hexadecimal value, combine the row digit with the column digit. For example, row E, column 4, shows the character for the hexadecimal value E4.
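The row and column arithmetic can be checked with a short Python sketch. The helper name `cell_value` is hypothetical, and the `latin-1` and `cp037` codecs are assumptions standing in for the extended ASCII and EBCDIC character sets of a particular environment.

```python
def cell_value(row_digit, col_digit):
    """Combine a row digit and a column digit into one byte value."""
    return int(row_digit + col_digit, 16)

value = cell_value("E", "4")     # hexadecimal E4
row, col = divmod(value, 16)     # split the value back into row and column numbers

latin1_char = bytes([value]).decode("latin-1")  # character at E4 in Latin-1
ebcdic_char = bytes([value]).decode("cp037")    # character at E4 in EBCDIC code page 037
```

In other words, the row digit is the high-order hexadecimal digit of the byte and the column digit is the low-order digit, so the byte value is 16 times the row number plus the column number.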

One point is potentially confusing: the EBCDIC logical not character, ¬, has been rendered as the ASCII caret character, ^.
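The two characters are distinct, as this small Python sketch suggests. It assumes Python's `cp037` codec as the EBCDIC code page; other EBCDIC code pages place these characters at different positions.

```python
# In code page 037 the logical not sign and the caret have separate byte values.
not_sign_byte = "\u00ac".encode("cp037")  # logical not sign
caret_byte = "^".encode("cp037")          # caret (circumflex accent)

# The logical not sign has no ASCII byte value at all.
try:
    "\u00ac".encode("ascii")
    not_in_ascii = True
except UnicodeEncodeError:
    not_in_ascii = False
```

Because ASCII has no logical not sign, a display that is limited to ASCII has to substitute some other character for it.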

The output includes characters that are not part of the formal definition of the ASCII and EBCDIC character sets. These “extended” characters, especially those in rows 8 and 9 of the ASCII table and rows 0–3 of the EBCDIC table, may differ in your environment. They are also likely to change when you move text data from one program to another, or even when you print what you see on the screen. The ASCII definition does not include any of the characters shown beyond hexadecimal 7E. Control characters, which are part of each character set, might not display at all, or might display in various ways depending on the program you use to view the text output. I have removed some control characters and replaced others with spaces in the output shown here, but there is still a significant chance that the output will display or print incorrectly because of the non-ASCII characters it includes. To see the actual output, run the SAS programs in your SAS environment.
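One way to make such a table safer to display is to replace control characters with spaces before printing. The Python sketch below shows the general idea; it is not the procedure used to prepare the output shown here. The function names and the `latin-1` default are assumptions for the illustration.

```python
def displayable(value, encoding="latin-1"):
    """Return the character for a byte value, or a space if it is not printable."""
    ch = bytes([value]).decode(encoding, errors="replace")
    # str.isprintable() is False for control characters such as NUL, TAB, and DEL.
    return ch if ch.isprintable() else " "

def is_formal_ascii(value):
    """True for byte values in the 7-bit ASCII range, 00 through 7F."""
    return value <= 0x7F

# Row 0 of the ASCII table holds only control characters, so it becomes all spaces.
safe_row = "".join(displayable(col) for col in range(16))
```

Note that Python's idea of a printable character follows Unicode categories, so it may blank out a few more positions (for example, the no-break space) than a given terminal or printer would require.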