from rtree import index
from random import random
from datetime import datetime

timer = datetime.now()

# Create 10,000,000 random numbers between 0 and 1
rands = [random() for i in range(10000000)]

# Function required to bulk load the random points into the index
# Looping over and calling insert is orders of magnitude slower than this method
def generator_function():
    for i, coord in enumerate(rands):
        # Each item is (id, (minx, miny, maxx, maxy), stored object)
        yield (i, (coord, coord+1, coord, coord+1), coord)

# Add points
tree = index.Index(generator_function())

print((datetime.now()-timer).seconds)  # Seconds elapsed once the points are added
print(list(tree.nearest((rands[50], rands[50], rands[50], rands[50]), 3)))
print((datetime.now()-timer).seconds)  # Total seconds elapsed after the nearest-3 query
@Tasneem-gh This was just a speed test of the generator performance, nothing more. I suggest you look into the rtree and Python file io documentation if you are interested in serializing and deserializing data.
@acrosby
Thank you for replying
Actually, I am using a different rtree library to build the index, but I thought of using the generator function to speed up the build, because it currently takes around an hour to index 1 million data records.
So, serializing data allows reading from a file and I can use the generator function along with that?
@Tasneem-gh That would depend on whether your rtree library accepts a generator as input. If it does, you could read a text file of coordinates line by line in one generator, then wrap it in a second generator that processes each line and yields the result. Here is some info that may be helpful: https://realpython.com/introduction-to-python-generators/
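A rough sketch of that generator-inside-a-generator idea, in case it helps. The file layout (one `x,y` pair per line) and the names `read_coords` and `index_items` are assumptions made up for illustration, not anything from the gist or from rtree. Only the yielded tuple shape `(id, bounds, obj)` follows the gist above.

```python
import os
import tempfile


def read_coords(path):
    """Yield one (x, y) pair of floats per non-empty line of a text file.

    Assumes a hypothetical format of "x,y" on each line.
    """
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            x, y = line.split(",")
            yield float(x), float(y)


def index_items(path):
    """Wrap read_coords and yield (id, bounds, obj) tuples for bulk loading."""
    for i, (x, y) in enumerate(read_coords(path)):
        # A point is stored as a degenerate box: (minx, miny, maxx, maxy)
        yield (i, (x, y, x, y), None)


# Small demonstration using a temporary file instead of real data
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0.1,0.2\n0.5,0.5\n\n0.9,0.3\n")
    demo_path = tmp.name

items = list(index_items(demo_path))
os.remove(demo_path)
print(items)

# With the rtree package installed, the generator could be passed straight
# to the index, as in the gist ("coords.txt" is a placeholder path):
# from rtree import index
# tree = index.Index(index_items("coords.txt"))
```

Because both functions are generators, only one line of the file is held in memory at a time, so the same pattern should work for files much larger than the demo. Whether it speeds up a different rtree library depends on that library accepting an iterable for bulk loading.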
@acrosby Thanks for the hint. Will try that
How can we use the generator function to read data from a file, instead of randomly generated data?