To test we create a file much larger than we’d read in with a typical AoC question - let’s mimic day 11 of 2021 (also valid for day 09 too) available here.
shuf -i 1000000000-9999999999 -n 10000000 -o big_test.txt
With a file this big, my original solution runs out of memory due to garbage collection issues. The problem here was misunderstanding eager comprehensions - the nested map should be part of the list comprehension itself not applied afterwards!
A good link about comprehensions in Scheme for reference.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(λ (p)
(list->array 2 (map (λ (str)
(map (compose string->number string) (string->list str)))
(list-ec (:port line p read-line) line))))))
So first change is to bring the nested map inside the comprehension, in the original we just return variable line
which is the result of repeatedly applying read-line
to port
p
until an eof
received. Instead we apply a scheme expression to line
below, converting it to a list and then mapping over this list to covert to numbers.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(λ (p)
(list->array 2 (list-ec (:port line p read-line)
(map (compose string->number string)
(string->list line)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt") % cumulative self time seconds seconds procedure 30.11 13.66 13.60 string 15.40 6.95 6.95 %read-line 15.26 6.89 6.89 string->number 11.82 5.33 5.33 ice-9/boot-9.scm:797:8 8.44 3.81 3.81 string->list 8.17 168.91 3.69 srfi/srfi-1.scm:584:5:map1 4.25 1.92 1.92 ice-9/boot-9.scm:798:28 2.16 1.19 0.98 ice-9/boot-9.scm:790:0:compose 1.22 45.12 0.55 <current input>:133:4 1.15 1.10 0.52 srfi/srfi-1.scm:580:2:map 0.68 0.30 0.30 procedure? 0.61 0.27 0.27 list? 0.41 7.13 0.18 ice-9/rdelim.scm:193:0:read-line 0.20 0.09 0.09 %after-gc-thunk 0.07 0.03 0.03 reverse 0.07 0.03 0.03 list->array 0.00 45.15 0.00 ice-9/ports.scm:438:0:call-with-input-file 0.00 15.58 0.00 anon #x1bb051c 0.00 0.09 0.00 anon #x1bb0590 --- Sample count: 1481 Total time: 45.145951082 seconds (19.292058234 seconds in GC) scheme@(day11)>
SRFI-42 comes with a comprehension specifically for iterating over strings, which means we can get rid of the string->list
, by nesting one comprehension inside another.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(λ (p)
(list->array 2 (list-ec (:port line p read-line)
(list-ec (:string ch line)
((compose string->number string) ch)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt") % cumulative self time seconds seconds procedure 21.37 9.05 9.05 string 16.47 9.23 6.98 ice-9/boot-9.scm:790:0:compose 14.58 6.18 6.18 %read-line 13.93 5.90 5.90 string->number 11.75 42.34 4.98 <current input>:164:4 10.98 4.65 4.65 ice-9/boot-9.scm:797:8 5.37 2.28 2.28 ice-9/boot-9.scm:798:28 4.37 1.85 1.85 reverse 0.89 6.55 0.38 ice-9/rdelim.scm:193:0:read-line 0.24 0.10 0.10 <current input>:165:6:reverse@@srfi/srfi-42 0.06 0.03 0.03 list->array 0.00 42.36 0.00 ice-9/ports.scm:438:0:call-with-input-file 0.00 11.33 0.00 anon #x1bb051c --- Sample count: 1694 Total time: 42.364101649 seconds (14.487588887 seconds in GC) scheme@(day11)>
This is maybe slightly faster, but the string
constructor is still costing us, it would be better if we could deal directly with the chars rather than converting back to strings.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(lambda (p)
(list->array 2 (list-ec (:port line p read-line)
(list-ec (:string ch line)
(if (char-numeric? ch))
(- (char->integer ch) 48)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt") % cumulative self time seconds seconds procedure 42.32 7.24 7.24 %read-line 30.15 17.08 5.16 <current input>:269:4 16.10 2.76 2.76 reverse 9.18 1.57 1.57 char-numeric? 1.12 7.44 0.19 ice-9/rdelim.scm:193:0:read-line 0.94 0.16 0.16 <current input>:275:6:reverse@@srfi/srfi-42 0.19 0.03 0.03 list->array 0.00 17.12 0.00 ice-9/ports.scm:438:0:call-with-input-file --- Sample count: 534 Total time: 17.115120396 seconds (7.343203862 seconds in GC) scheme@(day11)>
You might save a few seconds by removing the numeric check too if you don’t care about checking your inputs:
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(lambda (p)
(list->array 2 (list-ec (:port line p read-line)
(list-ec (:string ch line)
(- (char->integer ch) 48)))))))
scheme@(day11)> ,pr (parse-input "big_test.txt") % cumulative self time seconds seconds procedure 51.00 6.45 6.45 %read-line 28.95 12.62 3.66 <current input>:172:4 16.93 2.14 2.14 reverse 2.23 6.73 0.28 ice-9/rdelim.scm:193:0:read-line 0.67 0.08 0.08 <current input>:178:6:reverse@@srfi/srfi-42 0.22 0.03 0.03 list->array 0.00 12.65 0.00 ice-9/ports.scm:438:0:call-with-input-file --- Sample count: 449 Total time: 12.648481982 seconds (4.129522935 seconds in GC) scheme@(day11)>
Whilst the use of nested list-ec
is certainly neat, it means that bad input is either incorrectly processed or at best filtered out using an if qualifer inside the comprehension. Let’s keep the char
approach but produce a proper error using a lambda instead.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(lambda (p)
(list->array 2 (list-ec (:port line p read-line)
(map (λ (ch) (if (char-numeric? ch)
(- (char->integer ch) 48)
(error "NaN!")))
(string->list line)))))))
This seems to be slightly slower than using list-ec, note that if we keep the map but remove the if
check completely the result remains similar to but slightly slower than equivalent test above using list-ec without an if
qualifier.
scheme@(day11)> ,pr (parse-input "big_test.txt") % cumulative self time seconds seconds procedure 38.56 6.99 6.99 %read-line 18.31 3.32 3.32 string->list 16.37 35.80 2.97 srfi/srfi-1.scm:584:5:map1 8.63 3.10 1.56 <current input>:234:35 8.45 1.53 1.53 char-numeric? 2.29 18.11 0.42 <current input>:229:4 2.29 0.96 0.42 srfi/srfi-1.scm:580:2:map 1.76 7.31 0.32 ice-9/rdelim.scm:193:0:read-line 1.76 0.32 0.32 list? 1.23 0.22 0.22 procedure? 0.18 0.03 0.03 reverse 0.18 0.03 0.03 list->array 0.00 18.14 0.00 ice-9/ports.scm:438:0:call-with-input-file --- Sample count: 568 Total time: 18.137448602 seconds (8.826047527 seconds in GC) scheme@(day11)>
Finally you’re perhaps thinking do you need both list-ec
calls - the anwser is yes. Just like Python list comprehensions when can chain the :port
and :string
qualifiers together but we then get a flat list, for example.
(define (parse-input filename)
"Read file of numbers into 2D array."
(call-with-input-file filename
(lambda (p)
(list-ec (:port line p read-line)
(:string ch line)
(- (char->integer ch) 48)))))
Note that in this example you probably wouldn’t bother to use read-line
as you could read-string
to get the whole file as one string.