Kenta Murata mrkn

time	temp	headache
0	38.5	1
20	38.4	1
25	38.3	1
30	38.1	1
35	37.9	1
40	37.6	1
45	37.6	1
50	37.3	1
55	37.1	0
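Here is a minimal sketch of reading the tab-separated table above in plain Ruby, with no extra gems. The source does not state units for `time` or `temp`, so the sketch treats them as plain numbers. The hash-per-row layout and the question it answers (the first reading where `headache` is 0) are illustrative choices and are not part of the original.

```ruby
# The time/temp/headache table above, embedded as tab-separated text.
TSV = <<~DATA
  time\ttemp\theadache
  0\t38.5\t1
  20\t38.4\t1
  25\t38.3\t1
  30\t38.1\t1
  35\t37.9\t1
  40\t37.6\t1
  45\t37.6\t1
  50\t37.3\t1
  55\t37.1\t0
DATA

# Skip the header line and convert each row into typed fields.
rows = TSV.lines.drop(1).map do |line|
  time, temp, headache = line.chomp.split("\t")
  { time: Integer(time), temp: Float(temp), headache: headache == "1" }
end

# First reading where the headache flag is 0.
relief = rows.find { |r| !r[:headache] }
puts "headache flag drops to 0 at time=#{relief[:time]}, temp=#{relief[:temp]}"
```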

RubyData Tokyo Workshop 2017.10.26

We are re-running the RubyData Workshop from RubyKaigi 2017, this time in Tokyo. It is a chance to try data science in Ruby firsthand. We look forward to seeing you there.

Venue and time

Date and time: Thursday, October 26, 14:00-16:30
Venue: Speee, Inc. (seminar room, Kurosaki Building 5F, 4-1-4 Roppongi, Minato-ku, Tokyo)

	===== LIMIT=1000 =====
	Calculating -------------------------------------
	Mysql2Test.test_pluck_by_arrow(n) 64.717M bytes - 100.000 times
	Mysql2Test.test_pluck(n) 154.227M bytes - 100.000 times

	Comparison:
	Mysql2Test.test_pluck_by_arrow(n): 64716800.0 bytes
	Mysql2Test.test_pluck(n): 154226688.0 bytes - 2.38x larger
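The derived figures in the comparison can be recomputed from the raw byte counts. The short Ruby check below assumes the report's "M bytes" means decimal megabytes (10^6 bytes). The figures are consistent with that reading, since 64716800 / 2^20 would give about 61.7 rather than 64.717.

```ruby
# Raw byte counts copied from the LIMIT=1000 comparison above.
arrow_bytes = 64_716_800   # Mysql2Test.test_pluck_by_arrow(n)
pluck_bytes = 154_226_688  # Mysql2Test.test_pluck(n)

# Assumption: "M bytes" is decimal megabytes, shown to 3 decimal places.
arrow_mb = (arrow_bytes / 1e6).round(3)
pluck_mb = (pluck_bytes / 1e6).round(3)

# How many times more memory test_pluck allocated, to 2 decimal places.
ratio = (pluck_bytes.to_f / arrow_bytes).round(2)

puts format("%.3fM vs %.3fM bytes: %.2fx larger", pluck_mb, arrow_mb, ratio)
# prints "154.227M vs 64.717M bytes: 2.38x larger"
```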

	===== LIMIT=2000 =====

	julia> function dot(a, b)
	           s = zero(eltype(a))
	           for i in 1:lastindex(a)
	               s += a[i] * b[i]
	           end
	           return s
	       end
	dot (generic function with 1 method)

	julia> a = ones(100000); b = ones(100000);
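For comparison with the Julia loop above, here is the same naive dot product in plain Ruby. It sketches the algorithm only and is not a benchmark. It assumes Float arrays, mirroring `ones(100000)`, which creates Float64 values. It also adds a length check that the Julia version does not have.

```ruby
# Naive dot product: accumulate a[i] * b[i] over every index, like the Julia loop.
def dot(a, b)
  raise ArgumentError, "length mismatch" unless a.size == b.size
  s = 0.0 # assumes Float elements, like zero(eltype(a)) for Float64 arrays
  a.each_index { |i| s += a[i] * b[i] }
  s
end

a = Array.new(100_000, 1.0)
b = Array.new(100_000, 1.0)
puts dot(a, b) # prints 100000.0
```

A more idiomatic Ruby alternative is `a.zip(b).sum { |x, y| x * y }`. Note that `Array#sum` uses compensated (Kahan-Babuska) summation for floats, so on less friendly inputs its result can differ slightly from the plain loop.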

	```
	$ python
	Python 3.6.4 (default, Apr 3 2018, 09:35:44)
	[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
	Type "help", "copyright", "credits" or "license" for more information.
	>>> from mxnet.gluon.parameter import ParameterDict
	>>> pd1 = ParameterDict()
	>>> pd2 = ParameterDict()
	>>> pd1.get('a')
	Parameter a (shape=None, dtype=<class 'numpy.float32'>)

	compiling arrow-nmatrix.c
	arrow-nmatrix.c: In function ‘garrow_type_to_nmatrix_dtype’:
	arrow-nmatrix.c:57:8: error: ‘GARROW_TYPE_BOOL’ undeclared (first use in this function)
	  case GARROW_TYPE_BOOL:
	       ^
	arrow-nmatrix.c:57:8: note: each undeclared identifier is reported only once for each function it appears in
	arrow-nmatrix.c:34:3: warning: enumeration value ‘GARROW_TYPE_BOOLEAN’ not handled in switch [-Wswitch-enum]
	  switch (arrow_type) {
	  ^
	arrow-nmatrix.c: In function ‘nmatrix_dtype_to_garrow_data_type’:

	require 'benchmark'

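	LEN = 10000
	TRY = 1000

	# Two 100-character strings that differ only in the last character
	s0 = 'x' * 100
	s1 = 'x' * 99 + 'y'

	cases = {
	  # The integers 0...LEN in random order
	  shuffle: Array.new(LEN) { |i| i }.shuffle,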

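	# Ruby and data science: how we got here

	- Long ago (around the Ruby 1.6 days), Ruby had a numpy-like numerical array library called NArray, and people used it for linear algebra.
	- Some time after NArray's development went inactive, John Woods created NMatrix, a library inspired by NArray.
	- John founded SciRuby and began steady work to grow Ruby's collection of scientific computing libraries.
	- SciRuby had some momentum (?) at first but gradually went quiet. It runs GSoC projects every year, but the ideas proposed each year lacked a long-term vision and had no continuity. As a result, the libraries were poor in quality, and few of them could honestly be called practical.
	- Meanwhile, Ruby was increasingly left out of the data science boom.
	- Around 2015, Tanaka-san, the original author of NArray, returned and launched a new project called Ruby Numo, releasing a new NArray.
	- In 2016, I began developing PyCall out of a sense of crisis that "at this rate, Ruby will never be practical for data science."
	- In 2017, I released the first stable version of PyCall. By letting Python do the heavy lifting, it gave Ruby the bare minimum needed to be usable for data science.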

	mrkn-mbp15-late2016:homebrew-red-data-tools mrkn$ brew install apache-arrow-glib.rb
	==> Downloading https://www.apache.org/dyn/closer.cgi?path=arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz
	Already downloaded: /Users/mrkn/Library/Caches/Homebrew/apache-arrow-glib-0.7.0.tar.gz
	"/opt/brew/lib/pkgconfig:/opt/brew/opt/jemalloc/lib/pkgconfig:/opt/brew/opt/apache-arrow/lib/pkgconfig:/opt/brew/opt/glib/lib/pkgconfig:/opt/brew/opt/gobject-introspection/lib/pkgconfig"
	==> ./configure --prefix=/opt/brew/Cellar/apache-arrow-glib/0.7.0 CC=clang
	Last 15 lines from /Users/mrkn/Library/Logs/Homebrew/apache-arrow-glib/01.configure:
	checking for ld used by clang++... /Applications/Xcode_8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld
	checking if the linker (/Applications/Xcode_8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) is GNU ld... no
	checking whether the clang++ linker (/Applications/Xcode_8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld) s

	require 'pycall'

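	# ssd_keras is expected to be checked out next to the directory containing this script;
	# appending it to Python's sys.path lets PyCall import its modules
	SSD_KERAS_DIR = File.expand_path('../../ssd_keras', __FILE__)
	PyCall.import_module('sys').path.append(SSD_KERAS_DIR)
	# The pics/ directory inside the ssd_keras checkout
	PICS_DIR = File.join(SSD_KERAS_DIR, 'pics')

	# Python modules loaded through PyCall
	np = PyCall.import_module('numpy')
	imagenet_utils = PyCall.import_module('keras.applications.imagenet_utils')
	image = PyCall.import_module('keras.preprocessing.image')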