Skip to content

Instantly share code, notes, and snippets.

@kidd
Last active January 18, 2025 13:00
Show Gist options
  • Save kidd/97cfcd7fb42fc2b03c8c812559bb01d2 to your computer and use it in GitHub Desktop.
Save kidd/97cfcd7fb42fc2b03c8c812559bb01d2 to your computer and use it in GitHub Desktop.
arg_max example in duckdb
D create table test as SELECT i as i ,random() as r, i%4 as gr from generate_series(1,100) t(i);
D select * from test;
┌───────┬──────────────────────┬───────┐
│ i │ r │ gr │
│ int64 │ double │ int64 │
├───────┼──────────────────────┼───────┤
│ 1 │ 0.14984029697949075 │ 1 │
│ 2 │ 0.20274345139105418 │ 2 │
│ 3 │ 0.07478489612107744 │ 3 │
│ 4 │ 0.782259951294926 │ 0 │
│ 5 │ 0.2284793616804479 │ 1 │
│ 6 │ 0.041695246948323966 │ 2 │
│ 7 │ 0.7573347926971816 │ 3 │
│ 8 │ 0.73962961294214 │ 0 │
│ 9 │ 0.7779207988125088 │ 1 │
│ 10 │ 0.9389072358466003 │ 2 │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 91 │ 0.6364273847165581 │ 3 │
│ 92 │ 0.18678856715252357 │ 0 │
│ 93 │ 0.6967327018959291 │ 1 │
│ 94 │ 0.9226525455998845 │ 2 │
│ 95 │ 0.8744859222496129 │ 3 │
│ 96 │ 0.868602423665254 │ 0 │
│ 97 │ 0.2760435713632134 │ 1 │
│ 98 │ 0.03305368778134084 │ 2 │
│ 99 │ 0.6663320953180856 │ 3 │
│ 100 │ 0.36892018499060536 │ 0 │
├───────┴──────────────────────┴───────┤
│ 100 rows (20 shown) 3 columns │
└──────────────────────────────────────┘
D SELECT gr,max(r) from test group by all; -- did you notice this group by all? super cool!
┌───────┬────────────────────┐
│ gr │ max(r) │
│ int64 │ double │
├───────┼────────────────────┤
│ 0 │ 0.9624206805514219 │
│ 1 │ 0.9922432776056797 │
│ 2 │ 0.9643812561324754 │
│ 3 │ 0.9202900638152589 │
└───────┴────────────────────┘
-- How could we get the id of the one that has the max r?
-- https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group/7630564#7630564.
-- not easy. But in some cases, arg_max can just do the job
D SELECT gr,max(r),arg_max(i,r) from test group by gr;
┌───────┬────────────────────┬───────────────┐
│ gr │ max(r) │ arg_max(i, r) │
│ int64 │ double │ int64 │
├───────┼────────────────────┼───────────────┤
│ 0 │ 0.9624206805514219 │ 60 │
│ 1 │ 0.9922432776056797 │ 61 │
│ 2 │ 0.9643812561324754 │ 22 │
│ 3 │ 0.9202900638152589 │ 23 │
└───────┴────────────────────┴───────────────┘
@matsonj
Copy link

matsonj commented Jan 16, 2025

@ANelson82 the advantage here is that you can sort the partitions only, not the entire list. QUALIFY forces you sort the entire list, arg_max does not.

@ANelson82
Copy link

That makes sense. Appreciate the explination @matsonj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment