Skip to content

Instantly share code, notes, and snippets.

@Pitometsu
Last active May 9, 2023 02:17
Show Gist options
  • Save Pitometsu/bb1fdf07393e999a7daf8491e11d47a6 to your computer and use it in GitHub Desktop.
Save Pitometsu/bb1fdf07393e999a7daf8491e11d47a6 to your computer and use it in GitHub Desktop.
A test assignment on the Common Lisp position at KeepIT.
#! /usr/bin/env sh
#|
exec nix-shell --show-trace --pure -Q -p clisp --run "clisp -q -q $0 ${1+"$@"}"
exit
|#
;; Supposed to be O(letters + total-word * log unique-words) time,
;; and slightly less then O(unique-words) space.
;; For slower but much more memory efficient version
;; alphabetical trees could be useful.
;;
;; Current solution pretty imperative
;; in attempt to be minimalistic
;; and utilize default library only.
;;
(defun counts-file (file)
"Return a number of unique distinct words in FILE."
(hash-table-count
(with-open-file (channel file)
(loop with dictionary = (make-hash-table)
and word = (make-string-output-stream)
for char = (read-char channel nil)
if char do
(if (alphanumericp char) (write-char char word)
(let ((new-word (get-output-stream-string word)))
(unless (string= "" new-word)
(setf (gethash new-word dictionary) nil))))
else return dictionary))))
(progn
(format T "Unique distinct word found: ~D"
(counts-file (format NIL "~{~A~^ ~}" *ARGS*)))
(quit))
#! /usr/bin/env sh
#|
exec nix-shell --show-trace --pure -Q -p clisp --run "clisp -q -q $0 ${1+"$@"}"
exit
|#
;; 'remove-duplicates once would be faster, but more space-consuming.
;; No error handling according to YAGNI
(defun counts-file (file)
"Return a number of unique distinct words in FILE."
(length
(with-open-file (channel file)
(do ((result nil (adjoin item result))
(item (read channel nil 'eos)
(read channel nil 'eos)))
((eq item 'eos) result)))))
(progn
(format T "Unique distinct word found: ~D"
(counts-file (format NIL "~{~A~^ ~}" *ARGS*)))
(quit))
@Pitometsu
Copy link
Author

Pitometsu commented Apr 28, 2023

Setup:

curl https://nixos.org/nix/install | sh
. ~/.nix-profile/etc/profile.d/nix.sh

Usage:

wget "https://gist.githubusercontent.com/Pitometsu/bb1fdf07393e999a7daf8491e11d47a6/raw/0efe0a123e57b9698611932bfe8da1f2ba3808c4/KeepitAssignment.lisp"
chmod a+x ./KeepitAssignment.lisp
echo "horse a horse and a dog a pig" > ./example.txt
./KeepitAssignment.lisp ./example.txt
  • Result:
Unique distinct word found: 5

@Pitometsu
Copy link
Author

Implementation details: current implementation rely on CL symbols naming since it utilizes not the real set collection, but the list's set functionality, and so parse file words accordingly. Current memory consumption is linear to the memory needed for the list of unique words of the file plus the biggest word size for processing. It could be improved if a real set collection would be used, based on hashes or alphabetic tree structure. Speed can be improved in case of increasing memory consumption by loading the whole file at once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment