The End of the Line: Options

Rick Aster

Some text data files are predictable and easy to work with: every field is in the same place in every record, or line, and every record is the same length. Others are not so simple. In these files, the lengths of the fields, or of the records themselves, can vary from one record to the next. That raises the possibility that an INPUT or PUT statement reaches the end of a record where you do not expect it, perhaps right in the middle of the list of terms it is processing.

Fortunately, SAS has options that let you determine how the INPUT or PUT statement will respond to this situation. These are statement options you can use in the INFILE statement for an input text file or in the FILE statement for an output text file.

There are three basic alternatives: process the record as far as it goes, skipping the part of the record that is not there; continue onto the next record; or create an error condition and stop the DATA step.

Alternative 1: Right Up to the Edge

The easiest action to picture is to stop processing as soon as you reach the end of the record. For a PUT statement, this means writing as many variables as fit in the record and simply not writing the rest. This is the action you get when you use the DROPOVER option in the FILE statement.

With the DROPOVER option, the data that the PUT statement writes may change when the line size changes. With a longer line size, the PUT statement writes more variables in the record.
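A minimal sketch of DROPOVER follows. It assumes a hypothetical output file name, `report.txt`, and uses the SASHELP.CLASS sample data set. With a line size of 20, the first three formatted values occupy columns 1 through 19, so WEIGHT should not fit and should be left off every line.

```sas
/* Sketch: report.txt is a hypothetical output file name. */
data _null_;
  set sashelp.class;
  /* LINESIZE=20 limits each output line to 20 columns.          */
  /* DROPOVER discards any data item that does not fit the line. */
  file 'report.txt' linesize=20 dropover;
  /* NAME uses columns 1-10, AGE 11-13, HEIGHT 14-19;             */
  /* WEIGHT would need columns 20-25, so it is dropped.           */
  put name $10. age 3. height 6.1 weight 6.1;
run;
```

Raising LINESIZE= to 25 or more would let WEIGHT appear as well.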

The corresponding option for an input file is MISSOVER. With this option in the INFILE statement, the INPUT statement assigns missing values to any variables it does not find because it reaches the end of the record first. This is the appropriate option whenever one or more fields at the end of the record are sometimes absent.
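Here is a short sketch of MISSOVER with in-stream data; the data set and variable names are made up for illustration. Bob's record has only two scores, so TEST3 should be missing for him. Without MISSOVER, the default FLOWOVER behavior would send the INPUT statement on to Cy's line to look for TEST3.

```sas
data scores;
  /* MISSOVER: if a record ends before all variables are read, */
  /* the remaining variables are set to missing.               */
  infile datalines missover;
  input name $ test1 test2 test3;
  datalines;
Ann 90 85 77
Bob 88 92
Cy 75 80 91
;
run;
```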

The TRUNCOVER option is a subtle variation on the MISSOVER option. Use the TRUNCOVER option when a record ends in a variable-length field, that is, a field that is sometimes shorter than its maximum length. With TRUNCOVER, the INPUT statement reads whatever part of the field is present at the end of the record. With MISSOVER, a column or formatted input field that is cut short by the end of the record is set to missing instead. This difference matters, for example, if the data file has been edited with a text editor that removes trailing spaces.
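The sketch below contrasts the two options on a variable-length last field. It assumes a hypothetical external file, `cities.txt`, whose trailing blanks have been removed. It reads from an external file rather than in-stream data lines because in-stream lines are typically padded with blanks to a fixed length, which would hide the difference.

```sas
/* Sketch: cities.txt is a hypothetical file with lines such as */
/*   001 Boston                                                  */
/*   002 San Francisco                                           */
/* Trailing blanks have been removed, so CITY varies in length.  */
data cities;
  /* TRUNCOVER keeps the partial CITY value at the end of a short */
  /* record. With MISSOVER instead, a CITY value that ends before */
  /* column 24 would be set to missing.                           */
  infile 'cities.txt' truncover;
  input id 1-3 city $ 5-24;
run;
```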

Alternative 2: Over the Edge

Sometimes a file is designed so that data continues on the next line whenever it does not fit on a single line. The FLOWOVER option, which you can use in either the INFILE or FILE statement, is designed for this kind of file. With this option, when the INPUT statement reaches the end of a line, it looks for the next variable at the beginning of the next line. FLOWOVER is the default for both statements, so this is the behavior you get if you specify none of these options.
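As a small illustration of FLOWOVER on input, with made-up data: Ann's values for X and Y are on the first line, and her Z value continues on the second line. The next DATA step iteration then starts fresh with Bob's line, so the result should be two observations.

```sas
data points;
  /* FLOWOVER is the default; it is written out here for clarity. */
  /* When a record runs out, INPUT continues on the next record.  */
  infile datalines flowover;
  input name $ x y z;
  datalines;
Ann 1 2
3
Bob 4 5 6
;
run;
```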

Similarly, with this option, the PUT statement uses as many lines as it needs to write all of its variables. If you change the line size of the file, that may change the number of lines a PUT statement uses.
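For comparison with DROPOVER, here is the same kind of PUT statement with FLOWOVER, again assuming a hypothetical output file, `report.txt`, and the SASHELP.CLASS sample data. Because WEIGHT does not fit within 20 columns, it should be written on a second line, so each observation should take two lines.

```sas
/* Sketch: report.txt is a hypothetical output file name. */
data _null_;
  set sashelp.class;
  /* With FLOWOVER (the default), a data item that does not fit */
  /* on the current line is written on the next line instead.   */
  file 'report.txt' linesize=20 flowover;
  put name $10. age 3. height 6.1 weight 6.1;
run;
```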

For an input file, the SCANOVER option is almost the same as the FLOWOVER option. It differs only in the way it treats the scanning pointer control, @'text'. With the SCANOVER option, the scanning pointer control can search through as many records as it takes to find the text it is looking for. Use the SCANOVER option when the text you are scanning for might appear on any later record rather than only on the current one. For example, you might use the SCANOVER option to read an XML file, using the scanning pointer control to find specific tags in the file.
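A rough sketch of the XML case follows, assuming a hypothetical file, `catalog.xml`, in which each `<title>` tag and its text sit on one line, with any number of other lines in between. It is not a general XML reader. Because each DATA step iteration starts on a new record, it picks up at most one title per line.

```sas
/* Sketch: catalog.xml is a hypothetical file; each <title>...</title> */
/* pair is assumed to be on a single line.                             */
data titles;
  /* SCANOVER lets @'<title>' search forward across records until the */
  /* tag is found. DLM='<' ends the list-input value at the next '<', */
  /* which is the start of the closing </title> tag.                   */
  infile 'catalog.xml' scanover dlm='<';
  input @'<title>' title :$60.;
run;
```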

Alternative 3: Crash

If a file and a program have been carefully designed to work together, reaching the end of a record unexpectedly might indicate a problem with the file. In that case, you might want to stop the DATA step immediately, with error messages, so that you can find out what went wrong. That is the purpose of the STOPOVER option. For an output file, the STOPOVER option creates an error condition whenever there is not enough room in the record to write all the variables. For an input file, it creates an error condition whenever a record ends before the INPUT statement has found values for all of its variables.
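A minimal STOPOVER sketch with made-up in-stream data: the second record lacks a value for Z, so the DATA step should stop there with an error and never read the third record.

```sas
data complete;
  /* STOPOVER: if a record ends before all variables are read, */
  /* SAS sets _ERROR_ to 1 and stops the DATA step.            */
  infile datalines stopover;
  input id x y z;
  datalines;
1 10 20 30
2 40 50
3 60 70 80
;
run;
```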