@jaz303
Last active August 29, 2015 14:23

So I have some time series data - potentially lots of it: 10-100M+ items.

The schema is roughly:

  • timestamp: datetime of event
  • value: float value of event
  • source_id: the "thing" that generated this data
  • zone_id: the container in which source_id is located

Data will be inserted into the store in arbitrary order.

I need to be able to efficiently work out, at a given point in time, the latest-recorded/last-known value for every source_id within a given zone_id. So for the data below, queried at time=8:

timestamp | value | source_id | zone_id
---------------------------------------
10        | 5     | 1         | 2
8         | 3     | 1         | 2
1         | 6     | 2         | 2
3         | 4     | 1         | 2
5         | 3     | 3         | 2
2         | 2     | 2         | 2

...the results would be:

source_id | value
-----------------
1         | 3
2         | 2
3         | 3
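The query above can be sketched as a single scan: for each source_id in the zone, keep the row with the greatest timestamp that is still <= the query time. A minimal in-memory sketch (function and variable names are illustrative, not from any particular library):

```python
def last_known_values(rows, zone_id, at_time):
    """For each source_id in zone_id, return the value of its
    latest row with timestamp <= at_time."""
    latest = {}  # source_id -> (timestamp, value)
    for ts, value, source, zone in rows:
        if zone != zone_id or ts > at_time:
            continue  # wrong zone, or recorded after the query time
        if source not in latest or ts > latest[source][0]:
            latest[source] = (ts, value)
    return {source: value for source, (ts, value) in latest.items()}

# The sample data from above, as (timestamp, value, source_id, zone_id):
rows = [
    (10, 5, 1, 2),
    (8, 3, 1, 2),
    (1, 6, 2, 2),
    (3, 4, 1, 2),
    (5, 3, 3, 2),
    (2, 2, 2, 2),
]
print(last_known_values(rows, zone_id=2, at_time=8))  # {1: 3, 2: 2, 3: 3}
```

Note the row at timestamp 10 is skipped because it post-dates the query time, so source 1 resolves to its timestamp-8 value.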

I'm trying to determine the most appropriate storage engine for performing this query. At the moment I'm using PostgreSQL with window queries but it's fairly slow. This doesn't seem like a particularly hard problem to solve, so does anybody know of a better technology I could try?
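For what it's worth, the window query can often be rewritten as a greatest-timestamp-per-source aggregate joined back to the table, which planners tend to handle better than a window function over the whole set. A sketch of that SQL shape, run against SQLite purely for illustration (the `events` table and column names are assumed from the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE events (
    timestamp INTEGER, value REAL, source_id INTEGER, zone_id INTEGER)""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (10, 5, 1, 2), (8, 3, 1, 2), (1, 6, 2, 2),
    (3, 4, 1, 2), (5, 3, 3, 2), (2, 2, 2, 2),
])

# For each source_id in the zone, find its max timestamp <= the query
# time, then join back to fetch the value recorded at that timestamp.
query = """
SELECT e.source_id, e.value
FROM events e
JOIN (SELECT source_id, MAX(timestamp) AS ts
      FROM events
      WHERE zone_id = ? AND timestamp <= ?
      GROUP BY source_id) latest
  ON e.source_id = latest.source_id AND e.timestamp = latest.ts
WHERE e.zone_id = ?
ORDER BY e.source_id
"""
print(conn.execute(query, (2, 8, 2)).fetchall())  # [(1, 3.0), (2, 2.0), (3, 3.0)]
```

In PostgreSQL itself the same result can also be expressed with `SELECT DISTINCT ON (source_id) ... ORDER BY source_id, timestamp DESC`, which with an index on `(zone_id, source_id, timestamp)` typically avoids scanning the full series.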
