A Program to Compare Two SAS Format Catalogs & The SAS Data Set Characterization Utility by Michael A. Raithel
A Program to Compare Two SAS Format Catalogs
SAS programming professionals are sometimes faced with the task of determining the differences between two SAS format catalogs. Perhaps they received an updated format catalog from a collaborating organization, or maybe a colleague updated a format catalog to reflect changes in the underlying data. Either way, how can programmers tell which catalog entries and value/label pairs have been modified? If the two catalogs being compared are relatively small, then the tried-and-true method of outputting each of them via the FMTLIB option of PROC FORMAT and then manually comparing the listings may suffice. However, this method is laborious and error-prone when there are many formats and format value/label pairs.
This paper presents a SAS program that compares two SAS format catalogs and reports the differences between them. It identifies mismatches in the format name, start value, end value, and label between the two catalogs being compared. Because the comparisons are done programmatically, this method eliminates tedious manual reviews and directly identifies all differences. Readers can immediately begin using this program to compare their own SAS format catalogs.
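The following is a minimal sketch of one way such a comparison could be set up; it is not the program presented in the paper. It assumes two hypothetical librefs, OLDLIB and NEWLIB, each holding a catalog named FORMATS, and it relies on the CNTLOUT= option of PROC FORMAT to unload each catalog into a data set with one row per value range. The WORK data set names and the STATUS variable are likewise illustrative. If the START, END, or LABEL lengths differ between the two unloaded data sets, they may need to be harmonized before the merge.

```sas
/* Unload each format catalog into a data set (one row per value range). */
/* OLDLIB and NEWLIB are hypothetical librefs assigned elsewhere.        */
proc format library=oldlib.formats
            cntlout=work.old_fmts(keep=fmtname type start end label);
run;

proc format library=newlib.formats
            cntlout=work.new_fmts(keep=fmtname type start end label);
run;

/* Sort both data sets by the keys that identify a value range. */
proc sort data=work.old_fmts; by fmtname type start end; run;
proc sort data=work.new_fmts; by fmtname type start end; run;

/* Keep only the ranges that are missing from one side or whose labels differ. */
data work.fmt_diffs;
   length status $12;
   merge work.old_fmts(in=in_old rename=(label=old_label))
         work.new_fmts(in=in_new rename=(label=new_label));
   by fmtname type start end;
   if in_old and not in_new then status = 'OLD ONLY';
   else if in_new and not in_old then status = 'NEW ONLY';
   else if old_label ne new_label then status = 'LABEL DIFF';
   else delete;   /* identical in both catalogs */
run;

proc print data=work.fmt_diffs noobs;
   var fmtname type start end old_label new_label status;
run;
```

In this sketch, a format that exists in only one catalog shows up as a block of OLD ONLY or NEW ONLY rows, and a changed start or end value appears as an OLD ONLY row paired with a NEW ONLY row for the same format name.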
The SAS Data Set Characterization Utility
Most SAS programmers reach for two tools when they first receive a new SAS data set: PROC CONTENTS and PROC MEANS. They use PROC CONTENTS to review the data set’s metadata: the physical attributes of the variables such as name, label, type, and length. They use PROC MEANS to determine the basic arithmetic characteristics of the numeric variables, such as the minimum, maximum, and mean values. Doing this involves running two different SAS procedures, combing through two separate SAS-generated reports, and correlating the information about specific variables between the disparate reports.
The SAS Data Set Characterization Utility generates a single report file that contains the best of both the CONTENTS and the MEANS procedures. It produces an Excel file with a single row of consolidated metrics for each variable found in the SAS data set. The variable’s metrics include its key metadata attributes and, for numeric variables, its basic statistical properties. Additionally, the report contains the number of missing values for both character and numeric variables. Consequently, SAS programmers can use this utility to determine both the composition and the characteristics of a new SAS data set from a single amalgamated report.
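As a rough illustration of the idea, and not the utility itself, the sketch below combines PROC CONTENTS metadata with PROC MEANS statistics into one row per variable and writes the result to an Excel file. It assumes SAS 9.4 (for ODS EXCEL and the STACKODSOUTPUT option), uses SASHELP.CLASS as a stand-in input, and writes to a hypothetical file named characterization.xlsx. The WORK data set names are illustrative, and missing-value counts for character variables are left out to keep the example short.

```sas
/* Metadata: one row per variable (TYPE is 1 for numeric, 2 for character). */
proc contents data=sashelp.class noprint
              out=work.meta(keep=name type length label varnum);
run;

/* Statistics for the numeric variables, stacked as one row per variable. */
/* ODS EXCLUDE ALL suppresses the printed report; ODS OUTPUT still runs.  */
ods exclude all;
proc means data=sashelp.class n nmiss min max mean stackodsoutput;
   var _numeric_;
   ods output summary=work.stats;
run;
ods exclude none;

/* Join metadata and statistics on the variable name. */
proc sql;
   create table work.characterization as
   select m.varnum,
          m.name,
          case m.type when 1 then 'Numeric' else 'Character' end as vartype,
          m.length,
          m.label,
          s.N, s.NMiss, s.Min, s.Max, s.Mean
   from work.meta as m
        left join work.stats as s
        on upcase(m.name) = upcase(s.Variable)
   order by m.varnum;
quit;

/* Write the consolidated report to Excel (hypothetical output file name). */
ods excel file="characterization.xlsx";
proc print data=work.characterization noobs;
run;
ods excel close;
```

Character variables appear in the result with missing statistics because the left join keeps every row from the metadata table, which mirrors the one-row-per-variable layout described above.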