Stripping Non-ASCII Characters within Macro

Hit once more by a pesky en-dash issue (likely related to transcoding between SAS and SQL Server), I discovered today that there is no ‘in-built’ way to remove non-ASCII (or extended-ASCII) characters within SAS.

There is a great SUGI paper about this topic (here), but its approach requires a data step. If you need this as a macro capability, the extract below should save you some fiddling around.

  %macro ascii();
    %local i asciichars;
    /* adjust here to include any additional chars */
    %do i=32 %to 126;
      %let asciichars=&asciichars%qsysfunc(byte(&i));
    %end;
    %str(&asciichars)
  %mend;

  /* store in macvar for efficiency */
  %let ascii_chars=%ascii();
  %put &=ascii_chars;

  /**
   * Example usage within macro language
   */
  %put %sysfunc(compress(my – endash,&ascii_chars,k));

  /**
   * Example usage within data step
   */
  data _null_;
    str="goodbye •–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶· nasties" ;
    asciichars=symget('ascii_chars');
    out=compress(str,asciichars,'k');
    put out=;
  run;

The main gotchas were as follows:

  • The characters in byte(3,4,5,12,13) do funny things in macro code (open code recursion, etc.)
  • It is not advisable to reference rank() above 127, as this extended set can vary from country to country (the byte # may not be the same as the rank #)
  • The 32-126 range includes both the double quote (34) and the apostrophe / single quote (39), so they need to be handled appropriately!
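
To illustrate the quote gotcha, here is a minimal sketch of one way to pass text containing an unmatched apostrophe through the same compress() call. The macro variable raw_text is a hypothetical name invented for this example, and the block assumes ascii_chars has already been created as in the extract above. It has not been tested across SAS versions or session encodings, so treat it as a starting point only.

```sas
/* raw_text is a hypothetical input: %' masks the unmatched apostrophe */
%let raw_text=%str(it%'s a – dash);

/* macro language: the masked apostrophe is passed to compress() as plain text */
%put %sysfunc(compress(&raw_text,&ascii_chars,k));

/* data step: symget() fetches both values at run time, so neither value
   has to be wrapped in a quoted literal such as "&ascii_chars" */
data _null_;
  length str out $200;
  str=symget('raw_text');
  out=compress(str,symget('ascii_chars'),'k');
  put out=;
run;
```

Using symget() rather than a quoted "&ascii_chars" literal matters here because the keep-list holds both quote characters, so wrapping it in either kind of quote would end the literal early.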