Often it is useful to have access to the median value for fields of a data stream since they are more robust with respect to outliers. The median is defined as the value of a dataset such that, when sorted, 50% of the data is smaller than the value and 50% of the data is larger then the value. Ordinarily this is difficult to calculate on a stream because it requires the collection and sorting of all data.
The median of a data stream can be approximated with a technique called stochastic averaging. To approximate the median value of a data stream one could use the following approach:
Given the current estimate of the median M. If the next observed value in the stream is larger than M, increase the current estimate by r (= the learning rate). If it is smaller, decrease the estimate by r. When M is close to the median, it increases as often as it decreases, and therefore it stabilizes.
This approach was taken from the book "Real-time Analytics -