Hit once more by a pesky en-dash issue (likely related to transcoding between SAS and SQL Server), I discovered today that there is no built-in way to remove non-ASCII (or extended-ASCII) characters within SAS.

There is a great SUGI paper on this topic (here), but the approach requires the use of a data step. If you need this as a macro capability, the extract below should save you some fiddling around.

  %macro ascii();
    %local i asciichars;
    /* adjust here to include any additional chars */
    %do i=32 %to 126;
      %let asciichars=&asciichars%qsysfunc(byte(&i));
    %end;
    %str(&asciichars)
  %mend;

  /* store in macvar for efficiency */
  %let ascii_chars=%ascii();
  %put &=ascii_chars;
  /**
   * Example usage within macro language
   */
  %put %sysfunc(compress(my – endash,&ascii_chars,k ));
  /**
   * Example usage within data step
   */
  data _null_;
    str="goodbye •–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶· nasties";
    asciichars=symget('ascii_chars');
    out=compress(str,asciichars,'k');
    put out=;
  run;
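
If a data step is acceptable anyway, the keep-list can probably be built without the macro loop. The following is a minimal sketch, not the approach from the SUGI paper: it assumes the COLLATE function returns the characters from byte 32 to byte 126 in an ASCII session, and the variable names are made up for illustration.

```sas
data _null_;
  /* 126 - 32 + 1 = 95 characters in the printable ASCII range */
  length asciichars $95 out $100;
  /* assumption: COLLATE(32,126) returns bytes 32 through 126 in order */
  asciichars=collate(32,126);
  str="my – endash";
  /* 'k' keeps only the listed characters instead of removing them */
  out=compress(str,asciichars,'k');
  put out=;
run;
```

This keeps the list out of the macro language altogether, so the macro-quoting concerns do not arise, at the cost of needing a data step.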

The main gotchas were as follows:

  • The characters in byte(3,4,5,12,13) do funny things in macro (open code recursion etc)
  • It is not advisable to reference characters with a rank() above 127, as this extended set can vary from country to country (the byte # may not be the same as the rank #)
  • The 32-126 range includes both the double quote (34) and the apostrophe / single quote (39), and thus they need to be handled appropriately!
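
One possible way to sidestep the quote gotcha, as a sketch under stated assumptions rather than a definitive fix: build a second keep-list that skips byte 34 and byte 39. The macro name `ascii_noquotes` and the macro variable `safe_chars` are hypothetical, introduced here only for illustration. Note that this also strips quotes from the data itself, which may or may not be what you want.

```sas
%macro ascii_noquotes();
  %local i asciichars;
  %do i=32 %to 126;
    /* skip 34 (double quote) and 39 (apostrophe / single quote) */
    %if &i ne 34 and &i ne 39 %then
      %let asciichars=&asciichars%qsysfunc(byte(&i));
  %end;
  %str(&asciichars)
%mend;

/* hypothetical macro variable name */
%let safe_chars=%ascii_noquotes();

data _null_;
  str="it's an en – dash";
  /* symget fetches the list at run time, so it is never tokenised as code */
  out=compress(str,symget('safe_chars'),'k');
  put out=;
run;
```

Fetching the list with symget, as the original data step example does, seems the safer habit whichever list is used, since the value never has to sit inside a quoted literal.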