I was watching Gary Fredericks’ talk at Clojure West on splitting streams of random numbers. He has a solution to the problem that works for him. I will do something a little more general, and show along the line that every random stream of objects can be split including what he calls linear random streams. There is no need to fiddle with the PRNG’s to make them split-able.
As per his talk, random sequence here means deterministic pseudo-random sequence: given an initial seed, the pseudo-random sequence will be the same as long as one uses the same PRNG.
Given a stream of objects stream
, a split (the
mathematical name for the operation is `de-shuffle’ by the way) would
create two streams stream-1
and stream-2
which
are mutually disjoint and ordered. This means first that objects in both
stream-1
and stream-2
are chosen from
stream
by preserving their order. In other words, if
a
appears before b
in stream
and
if both appear in the same stream stream-i
then
a
should appear before b
also in
stream-i
. Mutual disjoint-ness condition states that there
is no overlap in stream-1
and stream-2
in
terms of the indices of the objects chosen: every object x
in stream
is either in stream-1
or in
stream-2
but not both.
I will use stateful random number generators. I will need the state later for splitting.
(defun make-a-number-stream (seed m)
(cons (seed-random-state seed)
(lambda (rng) (values (random m rng) rng))))
MAKE-A-NUMBER-STREAM
Here is the take
function:
(defun take (stream n)
(loop repeat n collect (funcall (cdr stream) (car stream))))
TAKE
Let me test it:
(let ((str (make-a-number-stream 0 15)))
(take str 20))
(12 5 0 3 11 3 7 9 3 5 2 4 7 6 8 8 12 10 1 6)
If I call the same random stream (another stream created with the same seed) I would get the same sequence:
(let ((str (make-a-number-stream 0 15)))
(take str 20))
(12 5 0 3 11 3 7 9 3 5 2 4 7 6 8 8 12 10 1 6)
but the results would be different if I use a different seed
(let ((str (make-a-number-stream 1 15)))
(take str 20))
(5 11 12 8 9 11 5 0 0 1 12 7 13 12 6 9 2 4 14 5)
In order to split a stream, I am going to need an additional random stream of binary values of 0’s and 1’s. Then the original stream will be split into two using this new stream:
(defun filter (str seed val)
(let* ((rng (seed-random-state seed))
(cut (lambda () (= val (random 2 rng)))))
(cons (seed-random-state (car str))
(lambda (rng)
(do ((x (funcall (cdr str) rng)
(funcall (cdr str) rng)))
((funcall cut) x))))))
FILTER
(defun split-stream (stream seed)
(values
(filter stream seed 1)
(filter stream seed 0)))
SPLIT-STREAM
Remember the last example of the previous section? Let us split that. Aaand, voila!
(let ((str (make-a-number-stream 1 15)))
(multiple-value-bind (left right)
(split-stream str 10)
(list (take left 10) (take right 10))))
((5 11 8 11 5 0 1 7 13 9) (12 9 0 12 12 6 2 4 14 5))
Garry explained to me that performance-wise what I did is horrible, and does not scale at all for the applications he has in mind. It is essentially a filtering procedure, and after say \(n\) split, each split stream would be \(2^n\) times slower than the original which, as anyone can imagine, is bad. But then again, what I did is very general and that level of generality comes with a price. This reminds me an Alan Perlis quote “LISP programmers know the value of everything and the cost of nothing.”