@AshyIsMe
Created January 22, 2022 01:01
Simple SQL style filtering in J
NB. `select a,b,c,d where a=5 and b>7 order by c desc limit 10`
NB. TODO: read data from parquet with https://github.com/AshyIsMe/JArrow
NB. Something like: 'a b c d'=: 'a';'b';'c';'d' readParquet 'foo.parquet'
a=:?1e6$10 NB. 1e6 random ints, 0 to 9 inclusive
b=:?1e6$10
c=:?1e6$10
d=:?1e6$10
NB. select a,b,c,d where a=5 and b>7 order by c desc limit 10
10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)
NB. J code in English (J evaluates right to left):
NB. First 10 items of the sorted df: 10 {. df
NB. sorted descending (dyadic \:) by column index 2, i.e. the third column, c: \: 2{"1
NB. where df is the rows matching a=5 and b>7: ((a=5) *. b>7)
NB. copied from a,b,c,d joined into a table: # (a,.b,.c,.d)
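NB. A tiny hand-checkable run of the same query (a sketch; a2, b2, c2, d2 and t2
NB. are illustrative names and data, not part of the original benchmark):
a2=: 5 5 1 5 5
b2=: 8 9 9 3 8
c2=: 2 7 4 9 5
d2=: 0 1 2 3 4
NB. Rows 0, 1 and 4 pass the filter; sorted descending by c they are c=7, c=5, c=2.
2 {. t2 \: 2{"1 t2=:((a2=5) *. b2>7) # (a2,.b2,.c2,.d2)
NB. 5 9 7 1
NB. 5 8 5 4
NB. Caveat: {. pads with zero rows when fewer rows match than were asked for.
NB. Capping the count should avoid the padding: (10 <. #df) {. df \: 2{"1 df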
NB. Dumb benchmarks:
timespacex 'a=:?1e6$10 [ b=:?1e6$10 [ c=:?1e6$10 [ d=:?1e6$10'
NB. Seconds Bytes
NB. 0.032309 4.19471e7
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
NB. Seconds Bytes
NB. 0.025436 6.71119e7
timespacex 'a=:?1e8$10 [ b=:?1e8$10 [ c=:?1e8$10 [ d=:?1e8$10'
NB. Seconds Bytes
NB. 3.79311 5.36871e9
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
NB. Seconds Bytes
NB. 4.79104 8.58994e9
NB. 1e9-length columns need 8 bytes * 4 columns * 1e9 items = 32GB of RAM, plus extra for computation. A 16GB M1 MacBook Air can't hold that.
timespacex 'a=:?1e9$10 [ b=:?1e9$10 [ c=:?1e9$10 [ d=:?1e9$10'
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
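NB. Rough memory arithmetic for the 1e9 case, assuming 8-byte integers (64-bit J):
8 * 4 * 1e9 NB. bytes for the four columns alone: 3.2e10
(8 * 4 * 1e9) % 2^30 NB. the same in GiB, about 29.8
NB. A possible lower-memory variant (a sketch, not benchmarked here; m and df2 are
NB. illustrative names): filter each column first, then stitch only the survivors,
NB. so the full n-by-4 table is never built. It should give the same rows as df.
m=: (a=5) *. b>7
df2=: (m#a) ,. (m#b) ,. (m#c) ,. m#d
10 {. df2 \: 2{"1 df2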