SAS® Decommissioning

Issues and Strategies for Technology Transition When Decommissioning a SAS Installation

Decommissioning a SAS installation is not necessarily as simple as erasing the storage volumes and reassigning the servers. To the extent that an organization relies on SAS for recurring processing or as a repository for data, careful planning is required to create a smooth transition to replacement software and services.

The Larger Context

The expiration of a SAS installation is part of the plan from the beginning. When a SAS site is installed, the software license file shows an exact date when the license expires. The license can be replaced with a new license that has a later expiration, but eventually for any site, there comes a day when the license expires. Besides the SAS license, there can be other software licenses involved, most often in connection with a data warehouse that the site depends on for analytics data. Removing the data warehouse or a part of it can be the occasion that prompts a replacement platform for processes built in SAS.

There can be many other reasons why an organization might be decommissioning SAS. These can include the consolidation of operations after a merger or acquisition, a security overhaul that mandates a streamlined approach to enterprise software, the regular turnover of data center hardware, a data center migration to a new location, a gradual transition of users over time from SAS to other platforms, or staffing cuts that remove SAS talent from the organization. Even if the goal is simply to replace a SAS server with a newer and bigger SAS server, the same considerations and cautions apply.

The Planning Imperative

In a SAS decommissioning process, the same as in many other contexts, a smooth transition depends on the work that is done in advance. Planning is the key to success.

SAS is known for doing a great many things well. The functionality of SAS almost always expands beyond what is written in the plan that brought SAS into an organization. Therefore, past plans cannot be used as a guide to understand what functionality SAS is providing to the organization currently. The first step, then, is to find out what valuable and essential functionality is coming from the SAS platform so that those functions can be replaced.

In a typical large enterprise, the ideal time to start planning for a SAS decommissioning is two years before it happens. This allows plenty of time to determine the uses of SAS, plan adequate replacements, replace code with new code, and run replacement processes in parallel for a short period to ensure that the new results are consistent with the results that the SAS platform was providing.

If two years is ideal, it must also be said that any planning at all is better than none. If events allow for only one day of planning, for example, that may be sufficient to save all SAS code and disable job schedules to avoid the cascade of error conditions from failed jobs. One month of planning is enough to allow a bumpy kind of transition, in which affected business users can determine which processes will be unavailable and start to design workarounds for them. These are not ideal scenarios, but they are preferable to the crisis that is likely to result from an unplanned and uncontrolled SAS expiration event.

The Wisdom of Users

There is no substitute for the collective knowledge of the users in identifying the valuable functionality obtained from the SAS platform so that replacement sources and processes can be planned. Identifying the full set of users can be trickier than you would expect, though. One way to identify users is to look at the accounts authorized to access the SAS platform or server and those that have actually accessed it in a recent period. When identifying active users, a lookback period of 399 days (57 weeks) may be necessary to pick up users who access SAS only for annual reporting.
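As a minimal sketch of the lookback filter, assuming access records have already been pulled from server or authentication logs into (account, access date) pairs, the check might look like this in Python. The record format, the sample accounts, and the `active_accounts` function are hypothetical.

```python
from datetime import date, timedelta

# A 399-day (57-week) lookback catches users who touch SAS only for annual reporting.
LOOKBACK = timedelta(days=399)

def active_accounts(access_records, as_of):
    """Return accounts with at least one access inside the lookback window.

    access_records: iterable of (account_id, access_date) pairs, a hypothetical
    format assumed to be extracted from server or authentication logs.
    """
    cutoff = as_of - LOOKBACK
    return sorted({account for account, when in access_records if when >= cutoff})

records = [
    ("jsmith", date(2024, 3, 1)),
    ("svc_eguide", date(2024, 4, 15)),  # functional ID used by an intermediate tool
    ("annual_rpt", date(2023, 4, 10)),  # user who runs only annual reports
    ("departed", date(2022, 1, 10)),
]
print(active_accounts(records, as_of=date(2024, 4, 22)))
```

Functional IDs such as the hypothetical `svc_eguide` stay in the result, which is the point: they lead to the users behind the intermediate tools.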

In a list of active accounts, pay special attention to functional IDs. These accounts may belong to intermediate tools, with individual users able to access SAS processing by way of their access to those tools. Intermediate tools could include SAS Enterprise Guide and SAS Add-In for Microsoft Office.

Also consider users whose only connection to SAS is their use of the outputs of SAS processing. These outputs could be reports, database tables, or data feeds, to cite three common examples.

Users may already know which areas of SAS processing would be difficult or impossible to fully duplicate with any replacement tool. These outputs can be noted as potential trouble spots in the transition. Along the way, users can also indicate which compromises in functionality are acceptable and which would result in adverse business consequences.

In the early stages of planning, users provide essential insight into the priorities of the transition. Later, users may need training in the replacement tools and may need time to develop replacements for their own code and processes in the replacement platforms.

Tracing From Scheduled Jobs

The most critical SAS deliverables are likely to be found by tracing from any job scheduler that references the SAS server. A job in a job scheduler typically references a shell script, so the commands in the script must be examined to find out what SAS code is being run in each script. Tracing from the scheduler to the SAS code shows what code is actually running on a regular schedule. Usually there are descriptions in both the scheduler and the code to explain the purpose of each job. The scheduler also shows the frequency of each job, which might be daily, weekly, or monthly, and might be contingent on the results of another job, such as a data load.
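Part of this tracing can be automated once the scripts are collected. The following is a minimal Python sketch, assuming each scheduled script calls SAS directly, either as `sas program.sas` or with the `-sysin` option. The function name `sas_programs_in_script` and all paths are hypothetical. Scripts that build the program path from variables, call wrapper scripts, or chain commands with `&&` still need to be read by hand.

```python
import shlex

def sas_programs_in_script(script_text):
    """List .sas program paths passed to a SAS command in a shell script.

    Assumption: the command is a bare "sas" or a full path ending in "/sas".
    Variables, wrappers, and line continuations are not resolved.
    """
    programs = []
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and the shebang line
        tokens = shlex.split(line, comments=True)
        if tokens and (tokens[0] == "sas" or tokens[0].endswith("/sas")):
            # Keep arguments that name SAS programs; -log and other options are ignored.
            programs.extend(t for t in tokens[1:] if t.lower().endswith(".sas"))
    return programs

script = """#!/bin/sh
# Nightly load, runs after the warehouse refresh
/opt/sas/SASFoundation/9.4/sas -sysin /prod/code/load_sales.sas -log /prod/logs/load_sales.log
sas /prod/code/daily_report.sas
echo done
"""
print(sas_programs_in_script(script))
```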

Details From the Code

The most important SAS code is likely to be found by tracing from the job scheduler, but there are other ways of tracing code. Under best practices, code resides both on the SAS server itself and in a source code repository such as Git, so both places can be scanned for SAS code files.

A caution, though, is that the body of SAS code may contain large volumes of inactive code. Tracing from the scheduler and obtaining the advice of users are the main ways of identifying actual active code.
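A minimal Python sketch of that scan and comparison, assuming the code directories can be read as ordinary directory trees and that the list of scheduled programs was traced from the scheduler. The function names are hypothetical. Matching on resolved paths links a scheduled program only to its server copy, so a local clone of the repository would be compared separately, for example by file name. Files in the unscheduled group are only candidates for inactive code, to be confirmed with users.

```python
from pathlib import Path

def find_sas_files(*roots):
    """Collect .sas program files under each root directory (extension matched case-insensitively)."""
    found = set()
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() == ".sas":
                found.add(path.resolve())
    return found

def split_by_activity(all_files, scheduled_programs):
    """Split files into (scheduled, unscheduled) lists of resolved paths."""
    scheduled = {Path(p).resolve() for p in scheduled_programs}
    return sorted(all_files & scheduled), sorted(all_files - scheduled)
```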

While many people may have opinions and theories about what the code does, the only way to find the specifics is by source code analysis: reading the SAS code to identify the inputs, the outputs, and the nature of the processing. For any processing that will be converted to another platform, the SAS code can serve as the definitive guide to current functionality that is to be duplicated or imitated.
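A pattern scan can produce a first draft of the inputs and outputs to check while reading. The Python sketch below is deliberately crude, assuming simple DATA, SET, and MERGE statements: it does not parse SAS, and it misses macro logic, PROC SQL tables, and OUT= options. The function names are hypothetical.

```python
import re

# Crude first-pass patterns; results must be checked by reading the code.
DATA_STMT = re.compile(r"^\s*data\s+([^;]+);", re.IGNORECASE | re.MULTILINE)
SET_STMT = re.compile(r"^\s*(?:set|merge)\s+([^;]+);", re.IGNORECASE | re.MULTILINE)

def dataset_names(statement_body):
    """Extract data set names from the body of a DATA, SET, or MERGE statement."""
    body = statement_body
    while True:  # remove data set options such as (keep=id) or (where=(amt>0)), innermost first
        stripped = re.sub(r"\([^()]*\)", " ", body)
        if stripped == body:
            break
        body = stripped
    body = body.split("/")[0]  # ignore statement options after a slash
    names = [token.lower() for token in body.split() if "=" not in token]
    return [name for name in names if name != "_null_"]

def inputs_and_outputs(code):
    """Return (inputs, outputs) as sorted lists of data set names."""
    outputs = {n for m in DATA_STMT.finditer(code) for n in dataset_names(m.group(1))}
    inputs = {n for m in SET_STMT.finditer(code) for n in dataset_names(m.group(1))}
    return sorted(inputs), sorted(outputs)

code = """
data work.orders_clean (drop=tmp);
  set staging.orders(where=(amt>0)) end=eof;
run;
data _null_;
  set work.orders_clean;
run;
"""
print(inputs_and_outputs(code))
```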

Once there is a list of SAS code, it is possible to categorize it and select replacement platforms. SAS code that provides an ETL function (extract, transform, and load), for example, might be replaced by comparable code in a dedicated ETL tool. SAS code that consists mostly of embedded SQL code might be moved to a database query tool. Or SAS code that consists primarily of analytics and reporting might be targeted for replacement with an analytics tool.
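For a large inventory, a keyword count can suggest a first category for each program before a person reviews it. This Python sketch is a triage aid under stated assumptions: the keyword lists and category names are illustrative choices, not a standard classification, and `suggest_category` is a hypothetical name.

```python
import re

# Rough signals for each category; the keyword lists are illustrative assumptions.
SIGNALS = {
    "sql": [r"\bproc\s+sql\b"],
    "etl": [r"^\s*data\s+\w", r"\binfile\b", r"\bproc\s+import\b", r"\bproc\s+append\b"],
    "analytics/reporting": [r"\bproc\s+(?:report|tabulate|means|freq|reg|logistic|sgplot)\b"],
}

def suggest_category(code):
    """Return (category with the most matches or 'unclassified', match counts per category)."""
    counts = {
        category: sum(len(re.findall(p, code, re.IGNORECASE | re.MULTILINE)) for p in patterns)
        for category, patterns in SIGNALS.items()
    }
    best = max(counts, key=counts.get)
    return (best if counts[best] > 0 else "unclassified"), counts

code = """
proc sql;
  create table work.summary as select region, sum(amt) as total from sales group by region;
quit;
proc report data=work.summary; run;
proc sql; drop table work.tmp; quit;
"""
print(suggest_category(code))
```

Returning the counts along with the label lets a reviewer see mixed programs, such as SQL plus reporting, that may need to be split across two replacement platforms.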

In the early stages of planning, the objective is to compile a list of the code. The largest body of work in a SAS decommissioning process is likely to be replacing SAS code with comparable functionality coded for another platform. For best compatibility, developers can directly compare the results of the new code with the results of the old code in order to identify gaps and narrow those gaps as much as possible.
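A minimal Python sketch of such a comparison, assuming both the SAS output and the replacement output can be exported to CSV with the same column names and a shared key column. The key name `id`, the sample data, and `compare_outputs` are hypothetical.

```python
import csv
import io

def compare_outputs(old_csv, new_csv, key):
    """Compare two CSV texts row by row on a key column.

    Returns keys only in the old output, keys only in the new output, and
    {key: {column: (old, new)}} for rows whose values differ. Values are
    compared as text, so formatting differences such as "10" vs "10.0"
    also show up as gaps to review.
    """
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    diffs = {}
    for k in sorted(old.keys() & new.keys()):
        changed = {c: (old[k].get(c), new[k].get(c))
                   for c in old[k].keys() | new[k].keys()
                   if old[k].get(c) != new[k].get(c)}
        if changed:
            diffs[k] = changed
    return only_old, only_new, diffs

sas_out = "id,region,total\n1,East,100\n2,West,250\n3,North,75\n"
new_out = "id,region,total\n1,East,100\n2,West,249\n4,South,60\n"
only_old, only_new, diffs = compare_outputs(sas_out, new_out, key="id")
# only_old == ['3'], only_new == ['4'], diffs == {'2': {'total': ('250', '249')}}
```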

Code Cleanup

The idea of code cleanup might be counterintuitive when the ultimate goal is to shut down a platform, but when there is plenty of lead time, a SAS code cleanup can reduce the total cost of a transition. This kind of cleanup provides a substantial benefit in operating and maintaining processes on the SAS platform, then provides an even bigger benefit as processes are migrated to a new server or platform. There are three main areas to target in SAS code cleanup in advance of a possible mass code transition, and these apply even if the nature and timing of the transition are not yet known.

External file references. SAS programs reference the physical file names of external files and directories, such as input and output data feeds, SAS data libraries, and output documents. To reduce the cost of analysis, maintenance, and conversion, collect all physical file names in an easy-to-find place near the top of the code. Remove physical file names from all other parts of the code.
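To check a program against this convention, a scan can report quoted physical paths that still appear below the header. This Python sketch assumes paths are written as quoted strings containing a slash or backslash in LIBNAME, FILENAME, INFILE, or FILE statements; paths built from macro variables or given in other statements (such as %INCLUDE or ODS) are not detected. The `header_lines` limit and the function name are hypothetical.

```python
import re

# Quoted paths in LIBNAME, FILENAME, INFILE, and FILE statements.
# The path must contain / or \, so options like dlm=',' are not mistaken for paths.
PATH_STMT = re.compile(
    r"""^[ \t]*(libname|filename|infile|file)\b[^;]*?(['"])([^'";\n]*[/\\][^'";\n]*)\2""",
    re.IGNORECASE | re.MULTILINE,
)

def misplaced_paths(code, header_lines=20):
    """Return (line_number, statement, path) for quoted paths found below the header block."""
    results = []
    for m in PATH_STMT.finditer(code):
        line_no = code.count("\n", 0, m.start()) + 1
        if line_no > header_lines:
            results.append((line_no, m.group(1).lower(), m.group(3)))
    return results

code = """libname raw '/data/raw';
filename usfeed "/in/us_feed.csv";

data work.x;
  infile usfeed dlm=',' firstobs=2;
  input id amt;
run;
data _null_;
  file '/out/summary.txt';
  put 'done';
run;
"""
print(misplaced_paths(code, header_lines=3))
```

Here the LIBNAME and FILENAME statements sit in the header, and the INFILE statement already refers to a fileref, so only the FILE statement on line 9 is reported.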

Redundant code. More so than in modular and object-oriented coding environments, SAS environments are prone to collect minor variations of the same code. For example, the program file to receive a U.S. data feed might be 98 percent identical to the program file to receive the corresponding U.K. data feed. As another example, the daily and monthly versions of the same report might be substantially the same code in two separate program files. Combine the duplicate program files where that is easily done and will reduce the burden of maintenance, or make notes about the places where code is duplicated. The result is a smaller body of code that must be replaced by new code for the new platform. The code cleanup in SAS is an easy step that can substantially reduce the scale of the conversion effort.
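Near-duplicate program files can be found with a similarity ratio before anyone reads them side by side. This Python sketch uses the standard library's `difflib.SequenceMatcher` on lines; the 0.9 threshold, the sample programs, and `similar_programs` are hypothetical. Pairwise comparison grows with the square of the number of files, which is workable for hundreds of programs but not for tens of thousands.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar_programs(programs, threshold=0.9):
    """Return (name_a, name_b, ratio) for program pairs whose line similarity meets the threshold.

    programs: dict of program name -> source text. Lines are compared exactly,
    so whitespace-only changes count as differences unless normalized first.
    """
    pairs = []
    for (a, code_a), (b, code_b) in combinations(sorted(programs.items()), 2):
        ratio = SequenceMatcher(None, code_a.splitlines(), code_b.splitlines()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

common = "\n".join(f"%put NOTE: step {i};" for i in range(20))
programs = {
    "us_feed.sas": "filename feed '/in/us.csv';\n" + common,
    "uk_feed.sas": "filename feed '/in/uk.csv';\n" + common,
    "report.sas": "proc report data=work.x; run;",
}
print(similar_programs(programs))
```

The U.S. and U.K. programs differ only in the file reference, so they score about 0.95 and are flagged as candidates to combine.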

Nonstandard SQL code. SQL code embedded in SAS code is the part of the code that is most likely to be revised and migrated to another environment, but in the process, any SAS-only SQL operators and function names will have to be rewritten. Most of this work can be done in the SAS environment, without needing to wait to find out what the target environment might be. For example, replace the SAS-specific IS MISSING operator with the portable IS NULL operator, or replace uses of SAS functions such as FIND or INDEX that only test whether a value contains a given substring with the standard LIKE operator.
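Finding these constructs is a search problem that can run over every embedded SQL block. This Python sketch flags two PROC SQL extensions, IS MISSING and the CALCULATED keyword, and mechanically rewrites IS [NOT] MISSING as IS [NOT] NULL, which PROC SQL treats as equivalent. The pattern list is illustrative, not exhaustive; the function names are hypothetical; and matches inside comments or string literals are not excluded. CALCULATED has no one-word standard equivalent, so it is only flagged for manual rewriting.

```python
import re

# A few PROC SQL constructs that other databases generally do not accept (illustrative list).
SAS_ONLY_SQL = {
    "IS MISSING": re.compile(r"\bis\s+(not\s+)?missing\b", re.IGNORECASE),
    "CALCULATED": re.compile(r"\bcalculated\b", re.IGNORECASE),
}

def flag_sas_sql(sql):
    """Return the names of SAS-specific constructs found in an SQL string."""
    return [name for name, pattern in SAS_ONLY_SQL.items() if pattern.search(sql)]

def to_is_null(sql):
    """Rewrite IS [NOT] MISSING as lowercase is [not] null; the rest of the text is unchanged."""
    return SAS_ONLY_SQL["IS MISSING"].sub(
        lambda m: "is not null" if m.group(1) else "is null", sql)

sql = ("select id, amt * 2 as dbl, calculated dbl + 1 as dbl_plus "
       "from work.t where region is not missing")
print(flag_sas_sql(sql))
print(to_is_null(sql))
```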

Data Export

SAS code is also the best guide to the input data, outputs, and SAS data in SAS processing. It is rare that all of this is accurately documented, so it is necessary to analyze the SAS code to create lists of all data involved in SAS processing.

SAS code can create enormous numbers of intermediate data sets, which may or may not be of interest. Users and file system statistics can provide a guide to which files are used regularly. What looks like an intermediate file in one SAS program might be a required input for another SAS program, so it is necessary to cross-reference SAS code in this way to find out how one program depends on another.
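Once the data sets each program reads and writes have been listed, the cross-reference is a small lookup. This Python sketch assumes those lists are available as dictionaries of two-level names; `program_dependencies` and the sample programs are hypothetical. It skips WORK data sets, including one-level names, which go to WORK by default, because they disappear when a SAS session ends and so cannot link separately scheduled programs.

```python
def program_dependencies(reads, writes):
    """Map each program to the sorted list of programs whose outputs it reads.

    reads / writes: dict of program name -> set of data set names.
    """
    producers = {}
    for program, datasets in writes.items():
        for ds in datasets:
            name = ds.lower()
            if "." not in name or name.startswith("work."):
                continue  # WORK data sets do not persist between sessions
            producers.setdefault(name, set()).add(program)
    return {
        program: sorted({p for ds in datasets for p in producers.get(ds.lower(), ()) if p != program})
        for program, datasets in reads.items()
    }

writes = {
    "load_orders.sas": {"stage.orders", "work.tmp"},
    "summarize.sas": {"mart.order_summary"},
}
reads = {
    "load_orders.sas": {"raw.orders"},
    "summarize.sas": {"stage.orders", "work.tmp"},
    "monthly_report.sas": {"mart.order_summary"},
}
print(program_dependencies(reads, writes))
```

In this example `stage.orders` looks like an intermediate file of the load program, but the result shows the summary program depends on it.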

Sometimes, though not always, the transition requires an export of selected SAS data for use in a platform other than SAS. The export requires custom SAS code to write the data to a documented data file format. Most commonly, the interface file is a CSV file.

The same issues found in a data export from any platform can also be found in SAS data. A few of the most common issues involve ID numbers, dates, and illegal text characters. ID numbers may need to be converted to text in a certain way as part of the export in order to be received correctly. Dates may need to be formatted according to standard date formats (such as ISO 8601) or in the preferred format of the target platform. “Illegal” text characters — whatever characters would cause errors in the target platform — are most likely to occur in data elements that contain online text. These characters are best removed, encoded, replaced, or masked in SAS prior to the export.
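The three rules can be stated precisely in a few lines. In the transition itself they would be applied in the SAS export code; the Python sketch below shows the same transformations and could also serve as a check on the receiving side. It assumes dates arrive as raw SAS date values (days counted from January 1, 1960), that the target wants zero-padded text IDs of a fixed width (the width of 9 is hypothetical), and that control characters are the illegal set; the function names are hypothetical too.

```python
import csv
import io
import re
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from January 1, 1960
# Runs of control characters; which characters are "illegal" depends on the target platform.
ILLEGAL = re.compile(r"[\x00-\x1f\x7f]+")

def clean_record(record_id, sas_date, text, id_width=9):
    """Prepare one row for a CSV interface file (id_width is a hypothetical target width)."""
    return [
        str(int(record_id)).zfill(id_width),  # numeric ID written as fixed-width text
        "" if sas_date is None else (SAS_EPOCH + timedelta(days=int(sas_date))).isoformat(),
        ILLEGAL.sub(" ", text).strip(),  # each run of control characters becomes one space
    ]

def write_interface_file(rows):
    """Write a header and cleaned rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "event_date", "comment"])
    for row in rows:
        writer.writerow(clean_record(*row))
    return buffer.getvalue()

print(write_interface_file([(4217.0, 23488, "Called back\r\nre: order"), (12, None, "ok")]))
```

SAS date value 23488 is April 22, 2024, which the export writes as the ISO 8601 text `2024-04-22`.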

Data Deletion

Code analysis will have identified the location of SAS data. Keeping track of these locations makes it possible to ensure that any potentially sensitive SAS data is deleted in a timely manner after SAS is decommissioned.
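The recorded locations can then be rechecked after deletion. This Python sketch assumes the locations are directories taken from the code analysis, such as LIBNAME paths, and looks for files with the SAS data set, index, and catalog extensions (`.sas7bdat`, `.sas7bndx`, `.sas7bcat`). The function name is hypothetical, it deletes nothing, and an empty result shows only that these locations are clean, not that no copies exist elsewhere.

```python
from pathlib import Path

SAS_DATA_SUFFIXES = {".sas7bdat", ".sas7bndx", ".sas7bcat"}

def remaining_sas_data(locations):
    """List SAS data, index, and catalog files still present under the recorded locations."""
    remaining = []
    for location in locations:
        root = Path(location)
        if not root.exists():
            continue  # location already removed
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in SAS_DATA_SUFFIXES:
                remaining.append(str(path))
    return sorted(remaining)
```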

Sticking to the Timeline

Anything that needs to be obtained from SAS should be obtained well in advance of turning SAS off. The SAS code is the most important thing to have, and fortunately it is often the easiest thing to find and extract. When data exports are required, they must be completed while SAS is still functioning. Careful planning is required so that any late data can be exported and verified during the transition period.
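One simple form of verification records a row count and a checksum for each exported file while SAS is still available to redo the export. This Python sketch assumes a UTF-8 CSV file with one header row, and assumes the expected count comes from the SAS side (for example, the observation count of the source data set). The function name is hypothetical.

```python
import csv
import hashlib

def export_fingerprint(path, expected_rows=None):
    """Return (data_row_count, sha256_hex) for an exported CSV file.

    Raises ValueError when expected_rows is given and does not match,
    so the export can be redone while SAS is still running.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    with open(path, newline="", encoding="utf-8") as f:
        rows = max(sum(1 for _ in csv.reader(f)) - 1, 0)  # minus the header row
    if expected_rows is not None and rows != expected_rows:
        raise ValueError(f"{path}: expected {expected_rows} rows, found {rows}")
    return rows, digest.hexdigest()
```

Storing the checksum with the file makes it possible to show later that the data loaded into the replacement platform is the data that left SAS.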

In a hypothetical transition timeline that calls for SAS processing to stop on June 30, a SAS code freeze might go into effect as of December 31 and final SAS code copied out. Testing in parallel between SAS and replacement platforms might be scheduled for January, February, and March, with the goal of removing SAS processes from the job scheduler, exporting the last SAS data (if needed) and cutting over to replacement platforms early in April. This implies that all replacement platforms are in place and verified as of January 1, and replacement code has been written and shown to be functional by that date. This schedule allows the months of April, May, and June for any late remediation in the replacement code or exported SAS data.
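The milestone dates in this example follow from the stop date by simple month arithmetic. This Python sketch assumes the stop date falls on the last day of a month and uses the offsets from the example: a code freeze six months before the stop, three months of parallel testing, then cutover, with "early in April" taken as April 1. The year 2025 and the function names are hypothetical.

```python
from datetime import date, timedelta

def month_start(year, month):
    """First day of a month; month may fall outside 1..12 and roll into adjacent years."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1)

def transition_plan(stop_date):
    """Milestones for a SAS stop date on the last day of a month, using the example's offsets."""
    y, m = stop_date.year, stop_date.month
    parallel_start = month_start(y, m - 5)  # replacement platforms and code verified by this date
    cutover = month_start(y, m - 2)         # SAS jobs removed from the scheduler
    return {
        "code_freeze": parallel_start - timedelta(days=1),
        "parallel_testing": (parallel_start, cutover - timedelta(days=1)),
        "cutover": cutover,
        "remediation": (cutover, stop_date),
        "sas_stop": stop_date,
    }

for milestone, when in transition_plan(date(2025, 6, 30)).items():
    print(milestone, when)
```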

A shorter transition timeline might be chosen if budgets are smaller, or a longer period if a seamless transition is essential.

After the SAS server is withdrawn from service, one of the last questions is what to save from the server. In general, the SAS code should be preserved for as long as it is potentially relevant. Audit requirements might dictate preserving some SAS log files and other files that serve as a record of processing, but such files typically need to be retained only for a set period of time. A backup copy of the software installation is useful to have if a change of plans results in SAS services being restored. On the other hand, data security rules are likely to require that most or all SAS data is deleted.

Draft 2024-04-22
