Last active
May 9, 2023 02:17
-
-
Save Pitometsu/bb1fdf07393e999a7daf8491e11d47a6 to your computer and use it in GitHub Desktop.
A test assignment on the Common Lisp position at KeepIT.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#! /usr/bin/env sh | |
#| | |
exec nix-shell --show-trace --pure -Q -p clisp --run "clisp -q -q $0 ${1+"$@"}" | |
exit | |
|# | |
;; Supposed to be O(letters + total-word * log unique-words) time, | |
;; and slightly less then O(unique-words) space. | |
;; For slower but much more memory efficient version | |
;; alphabetical trees could be useful. | |
;; | |
;; Current solution pretty imperative | |
;; in attempt to be minimalistic | |
;; and utilize default library only. | |
;; | |
(defun counts-file (file) | |
"Return a number of unique distinct words in FILE." | |
(hash-table-count | |
(with-open-file (channel file) | |
(loop with dictionary = (make-hash-table) | |
and word = (make-string-output-stream) | |
for char = (read-char channel nil) | |
if char do | |
(if (alphanumericp char) (write-char char word) | |
(let ((new-word (get-output-stream-string word))) | |
(unless (string= "" new-word) | |
(setf (gethash new-word dictionary) nil)))) | |
else return dictionary)))) | |
(progn | |
(format T "Unique distinct word found: ~D" | |
(counts-file (format NIL "~{~A~^ ~}" *ARGS*))) | |
(quit)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#! /usr/bin/env sh | |
#| | |
exec nix-shell --show-trace --pure -Q -p clisp --run "clisp -q -q $0 ${1+"$@"}" | |
exit | |
|# | |
;; 'remove-duplicates once would be faster, but more space-consuming. | |
;; No error handling according to YAGNI | |
(defun counts-file (file) | |
"Return a number of unique distinct words in FILE." | |
(length | |
(with-open-file (channel file) | |
(do ((result nil (adjoin item result)) | |
(item (read channel nil 'eos) | |
(read channel nil 'eos))) | |
((eq item 'eos) result))))) | |
(progn | |
(format T "Unique distinct word found: ~D" | |
(counts-file (format NIL "~{~A~^ ~}" *ARGS*))) | |
(quit)) |
Implementation details: current implementation rely on CL symbols naming since it utilizes not the real set collection, but the list's set functionality, and so parse file words accordingly. Current memory consumption is linear to the memory needed for the list of unique words of the file plus the biggest word size for processing. It could be improved if a real set collection would be used, based on hashes or alphabetic tree structure. Speed can be improved in case of increasing memory consumption by loading the whole file at once.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Setup:
Nix
:Usage: