bgvo/skiping

Last active August 29, 2015 14:08

Getting incrementing ranges of data (over a million docs)

skiping

	olds = Item.order_by(id: 1).skip(new_batch*10000).limit(10000).not_in(id: set)

	olds.each do |doc|
	...
	end

kuadrosx commented Oct 31, 2014

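Using skip is a bad idea in any database: the server still has to walk past every skipped document, so each batch gets slower than the one before. If you have a time field, you should use it to paginate instead. Also, $nin can be slow, especially if "set" is big, so maybe you should add a field that records whether the Item was processed: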

olds = Item.where(processed: false, :t.gt => last_time).order_by(t: 1)

olds.each do |doc|
    ...
    doc.set(:processed => true)
    last_time = doc.t
end
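To make the difference concrete, here is a minimal pure-Ruby sketch that imitates both approaches on an in-memory array, with no database involved. The `Doc` struct and the helpers `build_items`, `skip_page`, `range_page` and `process_all` are hypothetical names made up for this illustration, and it assumes the `t` values are unique.

```ruby
# In-memory stand-in for an Item document (hypothetical, not a Mongoid model).
Doc = Struct.new(:id, :t, :processed)

# Build n unprocessed docs with unique, increasing timestamps.
def build_items(n)
  (1..n).map { |i| Doc.new(i, 1000 + i, false) }
end

# Offset paging: every call has to step over page * size docs first.
def skip_page(items, page, size)
  items.sort_by(&:t).drop(page * size).first(size)
end

# Range paging: start right after the last timestamp already seen.
def range_page(items, last_time, size)
  items.select { |d| !d.processed && d.t > last_time }
       .sort_by(&:t)
       .first(size)
end

# Walk the whole set in batches, marking each doc as processed.
def process_all(items, size)
  last_time = 0
  seen = []
  loop do
    batch = range_page(items, last_time, size)
    break if batch.empty?

    batch.each do |doc|
      doc.processed = true
      seen << doc.id
      last_time = doc.t
    end
  end
  seen
end
```

Calling `process_all(build_items(25), 10)` visits every doc once, in `t` order. If several docs can share the same `t`, a plain "greater than" comparison may skip some of them at a batch boundary, so a tie-breaker such as the id would be needed.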

If that is not enough and you need more speed, you should use Moped directly, and also use the no_timeout option to avoid cursor timeouts during long loops:

# sort ascending so last_time keeps advancing, as in the Mongoid version above
olds = Item.collection.find(processed: false, :t => {:$gt => last_time}).sort(t: 1)
olds.each do |doc|
    ...
    Item.collection.find(_id: doc['_id']).update(:$set => {:processed => true})
    last_time = doc['t']
end

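Don't forget to add indexes on t and processed. Since the query filters on processed and then ranges and sorts on t, a compound index on both fields is probably the best fit.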
