Reproducible random number generation in multiple SAS data steps

I am struggling to figure out the best way to generate random numbers reproducibly using multiple SAS data steps. To do it in one data step is straightforward: just use CALL STREAMINIT at the start of the data step. However, if I then use a second data step, I can't figure out any way to continue the sequence of random numbers. If I don't use CALL STREAMINIT at all in the second data step, then the random numbers in the second data step are not reproducible. If I use CALL STREAMINIT with the same seed, I get the same random numbers as in the first data step. The only thing I can think of is to use CALL STREAMINIT with a different seed in each data step. Somehow that seems less satisfactory to me than using just one long random number sequence starting with the first data step. So for example I could do something like this:

    %macro myrandom;
      %do i = 1 %to 10;
        data dataset&i;
          call streaminit(&i);
          [do stuff involving random numbers]
        run;
      %end;
    %mend;

But somehow using a predictable sequence of seeds seems like cheating. Should I be worried about that? Is that actually a perfectly acceptable way of doing it, or is there a better way?

asked Dec 14, 2016 at 15:46 by Adam Jacobs

3 Answers

Here is my attempt at this:

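    %macro dataset_rand(_num, _rows);
      /* Step 1: draw every random number from one stream in a single data step */
      data dataset;
        call streaminit(123);  /* seed once, before the loop */
        do i = 0 to &_rows - 1;
          c = rand("UNIFORM");
          varnum = mod(i, &_num.) + 1;  /* target dataset, cycling through 1 to _num */
          output;
        end;
      run;

      /* Step 2: split the rows across dataset1 ... dataset&_num */
      data %do i = 1 %to &_num.; dataset&i. %end; ;
        set dataset;
        %do j = 1 %to &_num;
          if varnum = &j. then output dataset&j.;
        %end;
      run;
    %mend;

    %dataset_rand(10, 100);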

Here I ran one step to create every single row with a single random variable and another variable that will be used to assign it to a dataset.

The inputs are _num and _rows, which let you choose how many tables to create and how many rows to generate in total, so the example (10,100) creates 10 tables of 10 rows, with dataset1 holding the 1st, 11th, ..., 91st members of the random sequence.

That said I don't know of any reason why 10 datasets with 10 seeds, would be any better or worse than 1 dataset with 1 seed split into 10.

answered Dec 14, 2016 at 17:00 by William Hudson

"That said I don't know of any reason why 10 datasets with 10 seeds, would be any better or worse than 1 dataset with 1 seed split into 10." Maybe it isn't. Maybe the idea I had in the first place of just using a new seed each time is as good as anything else. I was wondering if I'd maybe overlooked something simple, but it doesn't look like I have. Thank you.

Commented Dec 16, 2016 at 15:06

Using RANUNI or similar (the 'old' random number streams), you would use call ranuni to accomplish this. This lets you save the seed for the next round, and then you could call symputx that value to the next datastep and re-start the same stream. That's because the output value for one pseudorandom value is a direct variation on the seed for the next in that algorithm.
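As a rough sketch of that approach (the macro variable name `ranseed`, the data set names `step1` and `step2`, and the starting seed 12345 are all made up for illustration), carrying the seed from one data step to the next might look like this:

```sas
/* Hypothetical starting seed; any positive integer below 2**31 - 1 */
%let ranseed = 12345;

data step1;
  seed = &ranseed;
  do i = 1 to 5;
    call ranuni(seed, x);          /* CALL RANUNI updates seed on every call */
    output;
  end;
  call symputx('ranseed', seed);   /* save the final seed for the next step */
  drop seed i;
run;

data step2;
  seed = &ranseed;                 /* resumes where step1 left off */
  do i = 1 to 5;
    call ranuni(seed, x);
    output;
  end;
  call symputx('ranseed', seed);   /* keep the chain going for later steps */
  drop seed i;
run;
```

If this works as intended, concatenating step1 and step2 should give the same ten values as a single data step that starts from the same seed and calls CALL RANUNI ten times.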

However, using RAND , the seed is more complicated (it's not really just one value, after the first number was called). From the documentation:

The RAND function is started with a single seed. However, the state of the process cannot be captured by a single seed. You cannot stop and restart the generator from its stopping point.

This is of course a simplification (obviously SAS is capable of doing so, it just doesn't open up the right hooks for you to do so, presumably as it's not as straightforward as call ranuni is).

What you can do, though, is use the macro language, depending on exactly what you're trying to do. Using %syscall and %sysfunc , you can get a single stream that goes across data steps.

However, one caveat: it doesn't look like you can ever reset it. From documentation on Seed Values:

When the RANUNI function is called through the macro language by using %SYSFUNC, one pseudo-random number stream is created. You cannot change the seed value unless you close SAS and start a new SAS session. The %SYSFUNC macro produces the same pseudo-random number stream as the DATA steps that generated the data sets A, B, and C for the first macro invocation only. Any subsequent macro calls produce a continuation of the single stream.

This is specific to the ranuni family, but it looks like it is also true for the rand family.

So, start up a new session of SAS, and run this:

    %macro get_rands(seed=0, n=, var=, randtype=Uniform, randargs=);
      %local i;
      %syscall streaminit(seed);
      %do i = 1 %to &n;
        &var. = %sysfunc(rand(&randtype. &randargs.));
        output;
      %end;
    %mend get_rands;

    data first;
      %get_rands(seed=7, n=10, var=x);
    run;

    data second;
      %get_rands(n=10, var=x);
    run;

    data whole;
      call streaminit(7);
      do _i = 1 to 20;
        x = rand('Uniform');
        output;
      end;
    run;

But don't make the mistake of running it twice in one session.

Otherwise, your best bet is to generate your random numbers once, then use them in multiple data steps. If you use BY groups, it's easy to manage things this way. If you have specific questions about how to implement your project in this way, let us know in a new question.
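A minimal sketch of the "generate once, use many times" idea, assuming three later steps that each need ten uniform draws (the data set names `all_rands`, `use1` and `use2`, the variables `step`, `draw` and `x`, and the seed 7 are all hypothetical):

```sas
/* Draw every random number from one stream in one data step */
data all_rands;
  call streaminit(7);
  do step = 1 to 3;          /* which later data step the draw belongs to */
    do draw = 1 to 10;
      x = rand('Uniform');
      output;
    end;
  end;
run;

/* Each later step reads only its own slice of the single sequence */
data use1;
  set all_rands(where=(step = 1));
  /* ... work with x here ... */
run;

data use2;
  set all_rands(where=(step = 2));
  /* ... work with x here ... */
run;
```

Because the whole sequence comes from a single CALL STREAMINIT, rerunning the program reproduces every slice, and the `step` variable can also serve as a BY variable if the later processing is done in one step.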