discussion on #postgresql where people are trying to help me out with the candle data query
--- Log opened Fri May 03 21:44:08 2013
21:44 --> jadams [[email protected]] has joined #postgresql
21:44 --- Users 663 nicks [0 ops, 0 halfops, 0 voices, 663 normal]
21:44 --- Channel #postgresql was synced in 2 seconds
21:44 < jadams> hey hey
21:44 < supplicant> hello
21:45 <-- trbs2 [~trbs@2001:470:d2ad:1:4a5b:39ff:fe7d:1624] has quit (Remote host closed the connection)
21:46 < jadams> are there any freelance dbas around these parts? Once I've spent another day or so on a query I might could use a dba to help make it go fast :D
21:46 <-- nicholasf_ [[email protected]] has quit (Remote host closed the connection)
21:46 < jadams> at present, I've got this working and it's ok-ish https://gist.github.com/knewter/5515761
21:46 <-- Kassandry [~kassandry@70.89.177-157-BusName-smpls.oregon.hfc.comcastbusiness.net] has quit (Quit: Leaving.)
21:47 < jadams> not great though. The data's what I want and it's flexible, but it takes like 10 seconds for the query to respond. I need to get it substantially faster. Part of it's that I'm using an sql aggregate function (first/last) and I could use a C extension for those. But removing those two aggregates doesn't make the query any faster really
21:48 < supplicant> pastebin the explain
21:48 < supplicant> at paste.depesz.com
21:48 < jadams> okie dokie
21:48 < supplicant> I suspect five_minute_intervals is rather large
21:49 --> lchiessi [[email protected]] has joined #postgresql
21:49 < RhodiumToad> replacing the aggregates with window functions might help
21:50 <-- ramr [[email protected]] has quit (Quit: Leaving.)
21:50 < RhodiumToad> the five_minute_intervals thing seems extremely dubious.
21:50 <-- lchiessi [[email protected]] has quit (Client Quit)
21:50 < jadams> I keep getting a 'sorry something went wrong' when I try to paste to paste.depesz.com
21:51 < supplicant> I'm fairly certain that converting your date into a unix timestamp, then dividing by 300 and grouping by that result will probably be faster
21:51 < RhodiumToad> explain.depesz.com
21:51 < supplicant> yeah, my bad on that one
21:51 < jadams> https://gist.github.com/knewter/5515761
21:51 < jadams> ahh sorry
21:52 < jadams> supplicant: I actually have it as a unix timestamp
21:52 < jadams> in the date field
21:52 < jadams> the date_as_datetime field is just a conversion of that I do on insert in my client
21:52 < jadams> because I (foolishly?) thought it would make things faster to use dates
21:53 < jadams> http://explain.depesz.com/s/Mw3Q
21:53 < RhodiumToad> are you confusing date and timestamp?
21:53 < jadams> no, it's called date but it's a timestamp
21:53 < jadams> I only did that because it mimics the remote service data i'm storing
21:53 < jadams> they call the column date :-\
21:53 < RhodiumToad> oh, that plan is going to suck so hard
21:53 < jadams> it is an epoch integer
21:54 < jadams> RhodiumToad: did you see the explain I pasted?
21:54 < RhodiumToad> in pg, you can convert epoch integers to and from timestamp with time zone without loss of information
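
An annotation on that lossless round-trip, as a sketch (the epoch value is just the log-open time above, used for illustration; the rendered timestamp depends on the session's `TimeZone` setting):

```sql
-- epoch integer -> timestamptz and back, without loss
SELECT to_timestamp(1367617448) AS as_timestamptz,   -- 2013-05-03 21:44:08+00 when TimeZone = 'UTC'
       extract(epoch FROM to_timestamp(1367617448))::int AS back_to_epoch;  -- 1367617448
```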
21:55 < RhodiumToad> yes, that's what I was referring to as sucking :-)
21:55 < jadams> ah...I don't even know how to read it to see why it's obvious it sucks
21:55 < RhodiumToad> an explain analyze would be better
21:55 < jadams> can do
21:56 <-- lamneth_ [[email protected]] has left #postgresql ( )
21:57 < RhodiumToad> anyway, rather than join against a cte for the 5 minute thing, just calculate it
21:57 < jadams> RhodiumToad: http://explain.depesz.com/s/Mii
21:58 < supplicant> this plan will also go faster with more work_mem
21:58 < RhodiumToad> not all that much
21:59 < supplicant> this is also true
21:59 < RhodiumToad> again, Amdahl's Law
22:00 < RhodiumToad> so did I understand correctly that you already have a t.date column with a unixtime?
22:00 < jadams> you did
22:01 --> mmitchell [[email protected]] has joined #postgresql
22:01 < RhodiumToad> so your starting point should be:
22:01 <-- daemonkeeper [~Arno@debian/developer/arno] has quit (Ping timeout: 245 seconds)
22:01 < RhodiumToad> select ... from trades t group by t.date/300;
22:01 <-- booo [[email protected]] has quit (Read error: Operation timed out)
22:01 < RhodiumToad> the only question mark would be how best to do the first/last stuff
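
As an annotation, that starting point fleshed out into a full bucket query (a sketch; the `price` and `amount` column names are assumptions based on the queries later in the log, and the first/last pieces are left out here). Integer division floors each epoch into its 5-minute bucket:

```sql
SELECT (t.date/300)*300       AS start_time,  -- bucket start as an epoch
       (t.date/300)*300 + 300 AS end_time,
       min(t.price)           AS low,
       max(t.price)           AS high,
       sum(t.amount)          AS volume
FROM trades t
GROUP BY t.date/300
ORDER BY 1;
```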
22:02 < jadams> there's a c extension for the aggregate first/last functions
22:02 < jadams> I can take the first/last bits out for now
22:02 < RhodiumToad> probably not a good idea; ORDER BY for aggs has a nontrivial penalty
22:02 < jadams> aha
22:03 --> _AnK [[email protected]] has joined #postgresql
22:03 * RhodiumToad wrote that, knows how it works
22:03 <-- mmitchell [[email protected]] has quit (Remote host closed the connection)
22:04 < RhodiumToad> so there's a window function solution, and also a self-join solution
22:04 --> nichola__ [[email protected]] has joined #postgresql
22:04 < RhodiumToad> I would probably try both before considering C aggregates
22:04 <-- dcramer_ [[email protected]] has quit (Quit: dcramer_)
22:05 --> daemonkeeper [~Arno@debian/developer/arno] has joined #postgresql
22:05 --> booo [[email protected]] has joined #postgresql
22:05 < jadams> ok. I'm going to try to keep this log in my working notes on this stuff so I can reference back to it later
22:06 --- nichola__ is now known as nicholasf_
22:07 --- nicholasf_ is now known as nicholasf
22:10 < jadams> RhodiumToad: so if I do the group by like that, how do I get the start/end time of the interval?
22:10 < jadams> NEVERMIND
22:10 < jadams> I will hush, easy enough to solve
22:12 < jadams> so the first/last bits aren't TERRIBLY egregious with the t.date/300
22:14 < RhodiumToad> select start_time, start_time+300, sum(amount) as volume, min(price) as low, max(price) as high, max(first_price) as first, max(last_price) as last from (select date/300 as start_time, price, first_value(price) over w as first_price, last_value(price) over w as last_price from trades window w as (partition by date/300 order by date rows between unbounded preceding and unbounded following)) s group by start_time order by start_time
22:15 < RhodiumToad> or something of that ilk
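
The one-liner above, reflowed as an annotation for readability. Note one fix: as typed, the inner select does not pass `amount` through, so the outer `sum(amount)` would fail; it is added here:

```sql
SELECT start_time,
       start_time + 300,
       sum(amount)      AS volume,
       min(price)       AS low,
       max(price)       AS high,
       max(first_price) AS first,
       max(last_price)  AS last
FROM (SELECT date/300 AS start_time,
             price,
             amount,                               -- needed by the outer sum()
             first_value(price) OVER w AS first_price,
             last_value(price)  OVER w AS last_price
      FROM trades
      WINDOW w AS (PARTITION BY date/300
                   ORDER BY date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) s
GROUP BY start_time
ORDER BY start_time;
```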
22:15 <-- tureba [[email protected]] has quit (Quit: leaving)
22:15 --> MmikeDOMA [[email protected]] has joined #postgresql
22:17 --> phil_ [[email protected]] has joined #postgresql
22:18 --> tureba [[email protected]] has joined #postgresql
22:19 <-- Mmike [[email protected]] has quit (Ping timeout: 255 seconds)
22:19 --> dmishe [[email protected]] has joined #postgresql
22:20 <-- blaa [~alvaro@unaffiliated/blaa] has quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
22:20 <-- _Tenchi_ [[email protected]] has quit (Ping timeout: 264 seconds)
22:24 < jadams> RhodiumToad: that's a very fast query, and on the surface looks like it does the right thing (haven't tried to verify the data as correct yet). Output here (and I tweaked it a little) https://gist.github.com/knewter/5515990
22:24 --- phil_ is now known as _Tenchi_
22:25 < jadams> RhodiumToad: my only question is, how can I determine what time range a given result was over?
22:25 < jadams> I have some values for start_time and end_time, but they aren't epochs because they were divided by 300
22:25 <-- cartan [[email protected]] has quit (Quit: Zzzzzzzzzz)
22:25 < RhodiumToad> multiply them by 300 again
22:25 < jadams> fair enough...
22:25 < RhodiumToad> and use to_timestamp(start_time*300) if you want a timestamptz
22:25 < RhodiumToad> what does the explain analyze look like?
22:27 < jadams> I'll grab it in a sec for you. Just pasted the query and output here: https://gist.github.com/knewter/5515990
22:27 < jadams> the start_time and end_time aren't five minutes apart...
22:27 < jadams> let me paste the explain analyze
22:27 < RhodiumToad> oh, sorry
22:27 < RhodiumToad> start_time*300, start_time*300+300
22:28 < jadams> http://explain.depesz.com/s/FDR
22:28 < jadams> RhodiumToad: aha, thanks...derp
22:28 < RhodiumToad> the start_time+300 in the original should have been start_time+1
22:29 --> eggyknap_ [[email protected]] has joined #postgresql
22:29 < RhodiumToad> it's better if you can sort by the original raw start_time rather than the timestamptz version of it
22:29 < RhodiumToad> saves a sort step
22:29 <-- eggyknap [~jtolley@unaffiliated/eggyknap] has quit (Quit: leaving)
22:30 --- eggyknap_ is now known as eggyknap
22:30 <-- eggyknap [[email protected]] has quit (Changing host)
22:30 --> eggyknap [~jtolley@unaffiliated/eggyknap] has joined #postgresql
22:30 < RhodiumToad> anyway, two orders of magnitude speedup, not bad going
22:30 --> drale2k [[email protected]] has joined #postgresql
22:31 < jadams> here I fixed the start_time replacement (didn't mean to override the start_time with a tz): http://explain.depesz.com/s/0KC
22:32 < RhodiumToad> that's the query plan I was aiming for
22:32 < jadams> awesome. The query's here https://gist.github.com/knewter/5515990
22:32 <-- dmishe [[email protected]] has quit (Quit: ["Textual IRC Client: www.textualapp.com"])
22:33 --> mk3548208 [[email protected]] has joined #postgresql
22:33 < jadams> RhodiumToad: many many thanks on this sir.
22:34 < RhodiumToad> however, you may have noticed that rows are missing from the output where there's no data in the interval
22:34 < jadams> yeah
22:34 < jadams> I'm fairly sure the graphing library can just take care of that
22:34 < jadams> it's time aware
22:34 < jadams> I knew that was a consequence of losing the temp table
22:35 --> an [[email protected]] has joined #postgresql
22:35 < RhodiumToad> it's possible to fill in the gaps if need be (there's about two different ways of doing so)
22:35 --> Ja [[email protected]] has joined #postgresql
22:35 <-- Ja [[email protected]] has left #postgresql ( )
22:35 --- an is now known as Guest83986
22:35 < RhodiumToad> one is to do an outer join against a generated time series _after_ the grouping (much more efficient to do it after than before)
22:36 < RhodiumToad> the other is a little trick with window functions and generate_series in the select list
22:37 < RhodiumToad> the outer join one is probably easier to understand:
22:37 < RhodiumToad> select * from (select ... group by start_time) s right join generate_series(...) g(start_time) using (start_time) order by start_time;
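
That sketch as an annotation, filled in against the bucket query (the `generate_series` bounds are illustrative placeholders for whatever epoch range is being charted, here one day of 5-minute buckets; empty buckets come back with NULL aggregates):

```sql
SELECT start_time*300 AS bucket_start,       -- USING merges g's and s's start_time
       s.volume, s.low, s.high
FROM (SELECT date/300   AS start_time,
             sum(amount) AS volume,
             min(price)  AS low,
             max(price)  AS high
      FROM trades
      GROUP BY date/300) s
RIGHT JOIN generate_series(1367539200/300,           -- placeholder range start
                           1367625600/300 - 1) g(start_time)
      USING (start_time)
ORDER BY start_time;
```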
22:41 --> josh__ [[email protected]] has joined #postgresql
22:41 --- josh__ is now known as agliodbs
22:43 <-- Guest83986 [[email protected]] has quit (Remote host closed the connection)
22:46 <-- _Tenchi_ [[email protected]] has quit (Ping timeout: 248 seconds)