-
-
Save swayson/84fc86da20db89b56eac to your computer and use it in GitHub Desktop.
SELECT * FROM table | |
WHERE _ROWID_ >= (abs(random()) % (SELECT max(_ROWID_) FROM table)) | |
LIMIT 1 |
Since there may be holes in the rowid sequence, there is no warranty it will always return something, even non-empty tables.
This picks one random row in the table by taking the first row in a random range.
Here is my proposal for random sampling multiple rows in an efficient way:
https://gist.github.com/alecco/9976dab8fda8256ed403054ed0a65d7b
If the full sample is of k rows (say 100k) and the underlying table has n rows (e.g. 100m).
For each of the k rows, this approach will pick a random rowid in the range of valid rowids. But since there might be wholes (e.g. deleted rows) it has to go find the first valid rowid (the >= part). So it's doing k searches in the B-Tree of rowids of the table.
My proposal traverses sequentially the whole of the rowids and samples that to j size (say 1m out of 100m), then sorts and picks k out of that. You can fine-tune the sampling j to get closer to k. Or at least fit in L2/L3 cache.
As long as k is not trivially small, sequential traversal of the rowid B-Tree should be much faster. YMMV
Wow, this is an old gist I forgot about π
This picks one random row in the table by taking the first row in a random range.
Here is my proposal for random sampling multiple rows in an efficient way:
https://gist.github.com/alecco/9976dab8fda8256ed403054ed0a65d7b
Looks great and thanks for the detailed description on the implementation. Think this is great learning material and will help folks π
Why is
ROWID
enclosed in β_β ?