I read SAS

… and sometimes I dream it!

Friday Quiz

I came across the following teaser through LinkedIn, from Amadeus Software, a UK-based SAS Consultancy:

Imagine you work in a department that collects a large volume of data each month.

To analyse the data and feed valuable insight back to your business, you run an analysis procedure which requires that all the data from the current year are available in a single table.

There are no issues in the first few months of the year, however your business is growing and ever more data are accumulated throughout the year.  Recently you have found the cumulative data set is too big for the disk you have available.

A colleague mentioned that an Amadeus Consultant (naturally) had recommended considering something called a SAS View.

The following program illustrates how you have been appending the data together (albeit a very small sample of the problem!).

* Build the problem;
%macro month;
%do m = 1 %to 12;
data mon&m.;
  do customer = 1 to 20;
%mend month;

* This works, but is really slow to build and 
  takes up too much disk space;
data all_months;
set  mon:;

Modify the data step so that ALL_MONTHS requires negligible disk space.  Given both a View and Table of the same name cannot exist in the same library, extend the code so that it defends itself against this problem.

Finally; what are the advantages and disadvantages of your solution?

Always one for some Friday fun, I propose the solution below:

/* User parameters */
%let lib = WORK;
%let dsname = ALL_MONTHS;
/* Message format for log feedback */
proc format;
   value dsexist
      low – 0 = ‘Nothing to drop’
      1 = “Table &lib..&dsname will be dropped.”
      2 = “View &lib..&dsname will be dropped.”
      3 – high = “Unable to drop &lib..&dsname because it is not a table or view”
/* Check if dataset exists */
/* Note: considered using a metadata view such  */
/* as dictionary.tables, or sashelp.vtable, but */
/* this was less performant, especially as the  */
/* session referenced remote libraries          */
data _null_;
   array memtypes(4) $ (‘DATA’, ‘VIEW’, ‘CATALOG’, ‘ACCESS’);
   _rc = 0;
   /* See if the dataset exists and find its type */
   do i = 1 to dim(memtypes);
      _rc + (2 ** (i – 1)) * exist(“&lib..&dsname”, memtypes(i));
   /* Generate sql drop expression to execute in next step */
   if _rc in (1, 2) then  
      call symput(‘sql_drop’,
         ‘drop ‘ ||
         ifc(_rc eq 1, ‘table’, ‘view’) ||
         ” &lib..&dsname”);
      call symput(‘sql_drop’, ”);
   /* Store an error code to a macro variable */
   call symput(‘err_code’, compress(ifn(_rc gt 2, 1, 0)));
   /* Put a note on the log */
   put ‘NOTE: ‘ _rc dsexist.;
/* This will execute the sql in sql_drop or do */
/* nothing if sql_drop is an empty string      */
proc sql noprint;
/* Use %sysfunc() to check for an error code       */
/* If the error code is 1 – an error has occured   */
/* and the data step becomes:                      */
/* —                                             */
/* data _null_;                                    */
/*    stop;                                        */    
/* run;                                            */
/* —                                             */
/* else the data step creates a view :             */
/* —                                             */
/* data &lib..&dsname / view=&lib..&dsname;        */
/*    set  mon:;                                   */
/* run;                                            */
/* —                                             */
data           %sysfunc(ifc(“&err_code” = “1”,
         &lib..&dsname / view=&lib..&dsname   
               %sysfunc(ifc(“&err_code” = “1”,
         set  mon:  

Source: A view to a kill

At the risk of going overboard my justification for this approach and considerations involved in coming to it include:

General considerations:

– views cannot be executed on a different platform to the one on which they are created: in a client server architecture a Windows client couldn’t access a Unix view created on a Linux sever, for example (a remote session running on the server would be able to access the view since the remote session would be running on the same platform;

– a view reflects the state of the data at run time, rather than at build time – is this desirable;

– a view is reliant on the stability and availability of the mon: datasets on which it depends;

– a view saves disk space, but is likely to increase the amount of processing required of the system since the code must run each time the view is accessed;

– could the storage or perfomance of the input datasets be improved – could they in turn be a view, could they be aggregated, is their page size optimized, should they be indexed or sorted;

– as usage of a view becomes business as usual consider reviewing its use

   – if users always summarize or sort the view, consider adapting the view to reflect this, or present a view (or table) that produces an aggregation of the data;

   – if joining/merging, rather than concatenation, is required, or is a next step then consider adding this into the view, a hash table lookup may be easiest;

– if the mon: datasets are not in the work library consider defining the source library within the DATA Step view:

   data <out> / view=<out>;

      _rc = libname(<lib>, <path>, <options);

      do until (_done);

         set <lib>.mon: end=_done;




   if we really wanted to we could

      > test for the existence of source libref using the libref() function,

      > capture the path() referenced by any such library,

      > clear the libref – eg use libref() again

      > assign the libref we want using libname()

      > add an end= variable, as above, either test for this using an if or do until loop

      > clear libref and reassign orginal library spec when end is reached


Specific considerations:

– I found using the exist() function to be quicker than scanning dictionary.tables or sashelp.vtable, validate in target environment – likely to be the case, especially where a large number of remote libraries is defined in an autoexec, etc;

– the DATA Step view could be made more complex for performance reasons, e.g. a macro generated code with DoW loops, but there is a cost attached to making it harder to read and maintain;

– the complexity in the early stages of the code allows it to be more robust, and should avoid the appearance of ERRORs in the log, as would be seen with something like:

      proc sql; drop table all_months; quit;

      proc sql; drop view all_months; quit;

– the format is probably overkill and could be replaced with conditional logic – I think the underlying idea of giving legible feedback is sound;

– the SQL Proc and the %sysfunc DATA Step both demonstrate different ways of defensively coding SAS to do nothing;

– limited parameterization by macro variables gives some flexibility and reusability, but could be taken further by exposing more parameters (eg input ds name, number of input ds, etc) or wrapping in a macro;

– the memtypes array gives more scalability than hard coding conditional logic, a format may have been more justified here;

Phew! In the words of Oscar Wilde, Winston Churchill or somesuch wit, “I’m sorry for not writing a shorter response, but I didn’t have the time.”

I look forward to seeing other solutions (go on, send yours in!) once the deadline has expired, but would be interested in your thoughts: how would you tackle this differently, what did I miss, what could I have done better?

SGF 2011

You may have noticed that I’ve not been posting as often as you/I might have hoped. I blame the pressure of work. Thanks, though to a young prince and his princess-to-be, I have the leisure of a double-bank-holiday weekend, …, and I noticed that SGF 2011 papers are available to read online. How I missed this I’m still not sure, but they’re there, or here.

For someone who reads SAS, this is ambrosia. I’m just off to read some papers, I may be a while …, but if you’ve got some favourites let me know!

Inserting a row

I came across an interesting discussion in a LinkedIn SAS group about how to insert a row of data into an existing dataset before row n. Apparently the question is a typical interview question, and I can see that it could provoke discussion about point,  sorting,  setting.

Some of the respondents felt that it was a bad question – it may well be, but the scenario is not so implausible that it is worth disregarding. I stumbled across a gotcha in formulating a solution using hash tables.

I decided that I would use sashelp.shoes and would create a subset of records marked with the row before which it should be inserted; because the before value would be the key in a hash table I wanted each to be distinct.

I first wrote splitter code as:

data myshoes(drop=_:) allshoes(drop=before _:);

length before 8;
declare hash ins();
_rc = ins.definekey(‘before’);
_rc = ins.definedone();
set sashelp.shoes;
insert = 1;
call missing(_i_);
if (ranuni(-1) lt 0.1) then do;

do _i_ = 1 by 1;

before = int(ranuni(-1)*300) + 1;
_rc = ins.check(key:before);
if (_rc ne 0) then leave;

_rc = ins.add();
output myshoes;

insert = 0;
output allshoes;


This looked right, but I kept getting duplicates in the output myshoes. Why when I was using a hash table to store the values I’d already used? Then it dawned on me that the loop structure of the DATA Step meant that before each record was read from sashelp.shoes the hash table was being re-initialized. Doh!

Read more of this post

Informats: overlapping ranges

Running this code below gave me error info: the two ranges overlap.

proc format;
   invalue xxx

But if I substituted any number value say -1000 for “low”, then everything seemed find.

Could you explain why? Thanks a lot.

Source: Post to SAS-L

Ref: Proc Format


An informat converts character data (a text string) to a number. Informats are widely used when processing text files, especially when converting date or date time literals to SAS’s date form. The input() function is used to return the result of applying a specified informat to a given input:

data _null_;
   mydate = input('20JAN2011', date9.);
   /* Numeric value */
   put mydate=;
   /* Formatted value */
   put mydate= :date9.;

will write the following output:


Response to post


HIGHT is not a proc format keyword.

Notes: Proc Format allows you to specify a range of values. In addition to explicit values a range could end with high, such a range would encompass any value higher than the range’s starting value. The minimum low works in a similar fashion.


I agree about your spelling issue. And, I just learned something: I cannot recalling seeing the use of _same_ before now.

Notes: According to the SAS online documentation

_same_ prevents the informat from converting the raw data as any other value. For example, the following GROUP. informat converts values 01 through 20 and assigns the numbers 1 through 20 as the result. All other values are assigned a missing value.


But [snip] changing to “-1000- highT=_same_”, as below, will not lead [to] the “overlap” error?

proc format;
        invalue xxx
                -1000-highT = _same_
                "N" = .;

_same_ I learned from Cody’s, he shows how to read mixed types of values for one variable.

Notes: Here, “N” is outside of the range -1000-high, so the error does not arise.


I would guess that hight evaluates to ‘missing’, which happens to also be numerically a very low number. “N” is also numeric missing (this is not a character informat). So you have (smallest possible number) -> (missing, which is pretty low) = _SAME_ “N” (missing) = something else.

However, if you now structure it as -1000 -> missing (which is not higher than -1000) = _SAME_ “N” (missing) = .

I hypothesize that SAS allows -1000 – . to be effectively -1000 to -1000, and thus it’s not overlapping.

v) [in response to iii]

I, too, am interested in finding an explanation. The proc apparently isn’t treating ‘hight’ in the way that I thought it might. For example, the following produces the same acceptable result and I, for one, have no idea why it does:

 proc format; 
   invalue xxx 

vi) [in response to iv]

Sounds quite reasonable. The following works as expected unless you uncomment the line that tries to recode -1000 to 5:

proc format;
           invalue xxx
      /* -1000=5 */

Notes: Here the question “does SAS treat a range that terminates in an unknown variable as a point range” is tested. A higher and lower value than the point in question are tested, as well as the mapping for “N” and an arbitrary value. The results do not raise an overlap error, although uncommenting the fifth line will reproduce the error.


Create an CNTLOUT= data set and examine the values of start and end.

Notes: The cntlout= option of the Proc Format statement directs SAS to generate a dataset (with a name specified after the equals sign) which includes the parameters SAS has used to construct the format. If the hypothesis about point ranges is correct, we would expect to see an observation in the output from cntlout which has  equal start and end values.

I tried this, using the format provided in (iii), and the first observation produced has start and end values of -1000, confirming the hypothesis suggested in (iv).

I read SAS

I read SASWelcome to my first post on ireadsas.wordpress.com.

Please check out the page about ireadsas first: it outlines the vision, values, and ground rules for the site.

In particular your attention is drawn to the disclaimer.

Until there is more content on the site, hopefully you can use the page to help you guess whether this blog might be for you.

If you are interested, then links to the right of this page should allow you to subscribe to the blog, either by using the RSS feed, or by signing up to receive an email when new content is posted.

Specifically, if you’ve come here looking for a blog about Scandinavian airlines, lost luggage and flight delays you may be disappointed. Similarly you may find that the British special forces, their covert operations and weapon systems feature not at all.

Whatever your particular interests, please feel free to browse the site and comment on what you see – I’d like to know what you think and will try to use constructive criticism to improve the site.