@takagi
takagi / cifar_fp16
Last active June 10, 2019 06:54
Comparison of Chainer's CIFAR example in FP32 mode and in FP16 mode
$ CHAINER_DTYPE=float16 python train_cifar.py -d 0
Device: @cupy:0
# Minibatch-size: 64
# epoch: 300
Using CIFAR10 dataset.
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           2.32253     2.12119               0.175971       0.192178                  23.2918
2           1.7554      1.87858               0.304497       0.302747                  49.5816
3           1.46396     1.61379               0.450664       0.416202                  75.7544
takagi / dot.lisp
Last active December 13, 2016 17:43
(defun make-dvec (input-dimension initial-element)
  (make-array input-dimension :element-type 'double-float :initial-element initial-element))
(defmacro dovec (vec var &body body)
  `(loop for ,var fixnum from 0 to (1- (length ,vec)) do ,@body))
(defun dot (x y)
  (declare (type (simple-array double-float) x y)
           (optimize (speed 3) (safety 0)))
  (let ((result 0.0d0))
https://github.com/takagi/cl-cuda/tree/issue/49.symbol-macro
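・The time-consuming steps are update-density and update-force, which together account for over 90% of the total run time.
・Taking the algorithm as given, is there room to speed things up at the level of how the GPU is used?
 → Memory accesses do not appear to have much locality.
 → Global memory access is the rate-limiting step, so is there nothing more that can be done?
・How should grids and blocks be assigned?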
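On the grid and block question above, one common starting point (not necessarily the best choice for this workload) is one thread per element with a fixed block size, rounding the number of blocks up so that every element is covered. The sketch below assumes a one-dimensional launch. `launch-dimensions` and the default of 256 threads per block are hypothetical, chosen for illustration, and are not part of cl-cuda.

```lisp
;; Hypothetical helper, not part of cl-cuda: computes 1-D launch dimensions
;; for N work items, one thread per item.
(defun launch-dimensions (n &optional (threads-per-block 256))
  "Return two values: grid and block dimension lists covering N items."
  (let ((blocks (ceiling n threads-per-block))) ; round up to cover every item
    (values (list blocks 1 1)
            (list threads-per-block 1 1))))

;; Example: 10000 particles with the assumed default block size.
(multiple-value-bind (grid-dim block-dim) (launch-dimensions 10000)
  (format t "grid-dim: ~A, block-dim: ~A~%" grid-dim block-dim))
```

Because the block count is rounded up, the last block usually contains threads whose index is past the end of the data, so the kernel has to check its index against N before touching memory. The two lists are in the three-element form that a cl-cuda kernel call is assumed to accept through its `:grid-dim` and `:block-dim` keywords.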
takagi / flexi-streams.lisp
Created September 9, 2015 14:03
flexi-streams's external format
(with-open-file (in "/Users/mtakagi/Desktop/bin" :direction :input
                    :element-type 'unsigned-byte)
  (let ((buffer (make-array 256 :element-type 'unsigned-byte)))
    (read-sequence buffer in)
    (flexi-streams:octets-to-string buffer :external-format :utf-8)))
takagi / tsuru.lisp
Last active August 29, 2015 14:17
Tsuru Capital recruiting test code sample.
;;;
;;; Fundamental WORD/INT types and readers
;;;
(deftype word8 ()
  `(unsigned-byte 8))
(deftype word16 ()
  `(unsigned-byte 16))
takagi / read.lisp
Created March 26, 2015 11:51
Comparing efficiency of READ-BYTE with READ-SEQUENCE.
(require :sb-sprof)
(defun test-read-byte0 ()
  (with-open-file (in "data" :direction :input
                      :element-type '(unsigned-byte 8))
    (loop repeat (* 4 1024 1024)
          do (read-byte in))))
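For comparison with the byte-at-a-time reader above, a READ-SEQUENCE counterpart might look like the following. This is a minimal sketch, not the gist's own code: the name `test-read-sequence0`, the `path` argument, and the 4096-byte chunk size are assumptions made for illustration. Unlike `test-read-byte0`, it reads to the end of the file rather than a fixed number of bytes.

```lisp
;; Hypothetical counterpart to test-read-byte0: reads the file in chunks
;; with READ-SEQUENCE and returns the total number of bytes read.
(defun test-read-sequence0 (&optional (path "data") (chunk-size 4096))
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    (let ((buffer (make-array chunk-size :element-type '(unsigned-byte 8))))
      ;; READ-SEQUENCE returns the index of the first element not updated,
      ;; so a value below CHUNK-SIZE means end of file was reached.
      (loop for n = (read-sequence buffer in)
            sum n
            until (< n chunk-size)))))
```

Each call fills a whole buffer, so the per-call stream overhead is paid once per chunk instead of once per byte, which is the difference the profiling comparison is meant to expose.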
(defun profile-read-byte0 ()
  (sb-sprof:with-profiling (:max-samples 100