@bixb0012
Last active July 31, 2019 17:47
Python: Data-Stream Sampling
#!python
# Background https://www.springer.com/cda/content/document/cda_downloaddocument/9783540286073-c2.pdf
# Example 1 adapted from https://stackoverflow.com/questions/12581437/python-random-sample-with-a-generator-iterable-iterator/
#
# Reference: 1) https://docs.python.org/3/library/functions.html
# 2) https://docs.python.org/3/library/itertools.html
# 3) https://docs.python.org/3/library/random.html
from itertools import islice
from random import randint
stream = ...  # Placeholder: data stream that implements __iter__
sample_size = ...  # Placeholder: size of sample to select from data stream
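# Example 1: Reservoir Sampling
iterator = iter(stream)
# Fill the reservoir with the first sample_size items
sample = list(islice(iterator, sample_size))
# Item i (0-based) replaces a random slot with probability sample_size / (i + 1)
for i, v in enumerate(iterator, sample_size):
    r = randint(0, i)  # randint is inclusive on both ends
    if r < sample_size:
        sample[r] = v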
#
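# A minimal sketch of Example 1 wrapped in a reusable function, so it can be
# run against a concrete stream. The name reservoir_sample and the generator
# of squares are illustrative assumptions, not part of the original snippet.

```python
from itertools import islice
from random import randint


def reservoir_sample(stream, sample_size):
    """Return up to sample_size items chosen uniformly at random from stream."""
    iterator = iter(stream)
    # Fill the reservoir with the first sample_size items.
    sample = list(islice(iterator, sample_size))
    # Item i (0-based) replaces a random slot with probability sample_size / (i + 1).
    for i, v in enumerate(iterator, sample_size):
        r = randint(0, i)  # inclusive on both ends
        if r < sample_size:
            sample[r] = v
    return sample


# Usage: draw 5 values from a generator in a single pass, without building a list.
picked = reservoir_sample((n * n for n in range(10_000)), 5)
print(picked)
```

# If the stream holds fewer than sample_size items, this sketch returns all of
# them rather than raising an error.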