reservoirSampling<T> function

List<T> reservoirSampling<T>(
  1. Iterable<T> items,
  2. int N,
  3. Random random
)

Picks N unique elements from an Iterable using reservoir sampling. should be O(N).

'unique' meaning no single position will be selected more than once. If the incoming collection has repeated values, then you may have repeated values in the sample.

The basic algorithm is described here https://en.wikipedia.org/wiki/Reservoir_sampling

One change in this implementation is shuffling the results before returning. If the sample is relatively big compared to the # of items, then order within the sample is almost identical to the original list -- by shuffling the result the result will look less ordered.

for example,

   picking 5 items from [1,2,3,4,5,6,7,8,9]
   could end up like    [1,6,3,4,5]

Implementation

List<T> reservoirSampling<T>(Iterable<T> items, int N, Random random) {
  final List<T> reservoir = List<T>.filled(N, items.first);

  var ind = 0;
  final it = items.iterator;
  // first N items, just add to reservoir
  while (it.moveNext() && ind < N) {
    reservoir[ind] = it.current;
    ind++;
  }
  while (it.moveNext()) {
    // pick random index from 0 to ind
    final j = random.nextInt(ind + 1);
    // if randomly picked index is smaller than N,
    // then replace the element present at the index
    // with new element from iterable
    if (j < N) {
      reservoir[j] = it.current;
    }
    ind++;
  }
  // if N is relatively 'big' compared to the size of the collection, then
  // the items are practically in-order as they are in the list.
  reservoir.shuffle(random);
  return reservoir;
}