Gist AshyIsMe/b4a8dbbd94f3c322b999a4e51b71bea3, created January 22, 2022 01:01
Simple SQL style filtering in J
NB. `select a,b,c,d where a=5 and b>7 order by c desc limit 10`
NB. TODO: read data from Parquet with https://github.com/AshyIsMe/JArrow
NB. Something like: 'a b c d'=: 'a';'b';'c';'d' readParquet 'foo.parquet'
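NB. Optional, not in the original gist: seed the random number generator
NB. so that reruns draw the same columns. 9!:1 sets the seed; 16807 is an
NB. arbitrary value picked for this sketch.
9!:1 ] 16807   NB. ] keeps 16807 from being read as part of the 9!:1 operand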
a=:?1e6$10 NB. 1e6 random ints, 0 to 9 inclusive
b=:?1e6$10
c=:?1e6$10
d=:?1e6$10
NB. select a,b,c,d where a=5 and b>7 order by c desc limit 10
10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)
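NB. J code in English (J executes right to left, so df=: is assigned before the df on the left is used):
NB. First 10 rows of df:                            10 {. df
NB. sorted descending (dyadic \:, sort down) by c:  \: 2{"1   (2 is c's 0-based column index)
NB. where df is the rows matching a=5 and b>7:      ((a=5) *. b>7)
NB. copied (#) from a,b,c,d stitched into a table:  # (a,.b,.c,.d)
NB. Note: if fewer than 10 rows match, 10 {. pads the result with rows of zeros.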
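NB. A small walk-through of the same pipeline, one named step at a time, on a
NB. made-up 5-row table. All names here are illustrative only and are not
NB. part of the original gist.
ta=: 5 5 1 5 5
tb=: 9 8 9 2 8
tc=: 3 7 4 1 2
td=: 0 1 2 3 4
mask=: (ta=5) *. tb>7          NB. WHERE a=5 and b>7: 1 1 0 0 1
rows=: mask # ta,.tb,.tc,.td   NB. the 3 matching rows as a 3x4 table
cvals=: 2{"1 rows              NB. their c column: 3 7 2
sorted=: rows \: cvals         NB. ORDER BY c DESC: c values become 7 3 2
top2=: 2 {. sorted             NB. LIMIT 2: rows 5 8 7 1 and 5 9 3 0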
NB. Dumb benchmarks (timespacex reports seconds, then bytes used):
timespacex 'a=:?1e6$10 [ b=:?1e6$10 [ c=:?1e6$10 [ d=:?1e6$10'
NB. Seconds   Bytes
NB. 0.032309  4.19471e7
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
NB. Seconds   Bytes
NB. 0.025436  6.71119e7
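NB. A variant that may use less memory (a sketch, not benchmarked here): apply
NB. the mask to each column before stitching. Only the ~2% of rows that match
NB. (a=5 is 1/10 of the data, b>7 is 2/10) are then laid out as a table,
NB. instead of all 1e6 rows of a,.b,.c,.d. filterFirst is a hypothetical
NB. name; it reads the global columns a, b, c, d and ignores its argument.
filterFirst=: 3 : 0
  m=. (a=5) *. b>7                     NB. boolean WHERE mask
  t=. (m#a) ,. (m#b) ,. (m#c) ,. m#d   NB. copy matching items per column, then stitch
  10 {. t \: 2{"1 t                    NB. ORDER BY c DESC LIMIT 10, as in the original
)
NB. Time it the same way as above: timespacex 'filterFirst 0'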
timespacex 'a=:?1e8$10 [ b=:?1e8$10 [ c=:?1e8$10 [ d=:?1e8$10'
NB. Seconds   Bytes
NB. 3.79311   5.36871e9
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
NB. Seconds   Bytes
NB. 4.79104   8.58994e9
NB. 1e9-length columns need 4 columns * 8 bytes * 1e9 = 32GB of RAM, plus extra for computation... a 16GB M1 MacBook Air ain't a jacuzzi...
timespacex 'a=:?1e9$10 [ b=:?1e9$10 [ c=:?1e9$10 [ d=:?1e9$10'
timespacex '10 {. df \: 2{"1 df=:((a=5) *. b>7) # (a,.b,.c,.d)'
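NB. One way around the RAM limit (a sketch, not from the original gist):
NB. process the rows in chunks, keep each chunk's top 10 matches, then run the
NB. same ORDER BY/LIMIT over those candidates. Any row in the overall top 10 is
NB. also in its own chunk's top 10, so nothing is lost. Here each chunk is
NB. freshly generated random data; reading chunks of a real file instead is
NB. left open. queryChunk and queryInChunks are hypothetical names, and the
NB. chunk size 1e7 below is an arbitrary choice.
queryChunk=: 3 : 0
  NB. y rows of four random columns, assigned to local names a b c d
  'a b c d'=. ? (4,y) $ 10
  t=. ((a=5) *. b>7) # a,.b,.c,.d
  NB. top 10 by c desc; 10 <. #t stops {. from padding with zero rows
  (10 <. #t) {. t \: 2{"1 t
)
queryInChunks=: 4 : 0
  NB. x chunks of y rows each; cand collects every chunk's top rows
  cand=. 0 4 $ 0
  for. i. x do.
    cand=. cand , queryChunk y
  end.
  (10 <. #cand) {. cand \: 2{"1 cand
)
NB. e.g. 1e9 rows as 100 chunks of 1e7: timespacex '100 queryInChunks 1e7'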