Last active
September 4, 2022 05:34
Wine quality
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "93423409-ccf2-4145-962b-09f7f879604c", | |
"metadata": {}, | |
"source": [ | |
"## winequality\n", | |
"\n", | |
"Trying out the [daru + rbplotly + statsample demo](https://github.com/sciruby-jp/ruby-datascience-examples/blob/master/datasciencerb.ipynb) with RedAmber." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"id": "48dced14-cefd-469d-a8f0-6a60e597d127", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{:RedAmber=>\"0.2.0\", :Arrow=>\"9.0.0\"}" | |
] | |
}, | |
"execution_count": 63, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"require 'red_amber'\n", | |
"{RedAmber: RedAmber::VERSION, Arrow: Arrow::VERSION}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "bf5f8203-034d-4610-ab43-f6c7bb7b5090", | |
"metadata": {}, | |
"source": [ | |
"### Loading the CSV\n", | |
"\n", | |
"Read the CSV directly from a URL." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "a8a9fa91-c846-4613-83d5-32f158ab66c1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <6497 x 13 vectors> <table><tr><th>type</th><th>fixed acidity</th><th>volatile acidity</th><th>citric acid</th><th>residual sugar</th><th>chlorides</th><th>free sulfur dioxide</th><th>total sulfur dioxide</th><th>density</th><th>pH</th><th>sulphates</th><th>alcohol</th><th>quality</th></tr><tr><td>red</td><td>7.4</td><td>0.7</td><td>0.0</td><td>1.9</td><td>0.076</td><td>11.0</td><td>34.0</td><td>0.9978</td><td>3.51</td><td>0.56</td><td>9.4</td><td>5</td></tr><tr><td>red</td><td>7.8</td><td>0.88</td><td>0.0</td><td>2.6</td><td>0.098</td><td>25.0</td><td>67.0</td><td>0.9968</td><td>3.2</td><td>0.68</td><td>9.8</td><td>5</td></tr><tr><td>red</td><td>7.8</td><td>0.76</td><td>0.04</td><td>2.3</td><td>0.092</td><td>15.0</td><td>54.0</td><td>0.997</td><td>3.26</td><td>0.65</td><td>9.8</td><td>5</td></tr><tr><td>red</td><td>11.2</td><td>0.28</td><td>0.56</td><td>1.9</td><td>0.075</td><td>17.0</td><td>60.0</td><td>0.998</td><td>3.16</td><td>0.58</td><td>9.8</td><td>6</td></tr><tr><td colspan='13'>⋮</td></tr><tr><td>white</td><td>6.5</td><td>0.24</td><td>0.19</td><td>1.2</td><td>0.041</td><td>30.0</td><td>111.0</td><td>0.99254</td><td>2.99</td><td>0.46</td><td>9.4</td><td>6</td></tr><tr><td>white</td><td>5.5</td><td>0.29</td><td>0.3</td><td>1.1</td><td>0.022</td><td>20.0</td><td>110.0</td><td>0.98869</td><td>3.34</td><td>0.38</td><td>12.8</td><td>7</td></tr><tr><td>white</td><td>6.0</td><td>0.21</td><td>0.38</td><td>0.8</td><td>0.02</td><td>22.0</td><td>98.0</td><td>0.98941</td><td>3.26</td><td>0.32</td><td>11.8</td><td>6</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 6497 x 13 Vectors, 0x000000000000f884>\n", | |
" type fixed acidity volatile acidity citric acid residual sugar ... quality\n", | |
" <string> <double> <double> <double> <double> ... <int64>\n", | |
" 1 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" 2 red 7.8 0.88 0.0 2.6 ... 5\n", | |
" 3 red 7.8 0.76 0.04 2.3 ... 5\n", | |
" 4 red 11.2 0.28 0.56 1.9 ... 6\n", | |
" 5 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" : : : : : : ... :\n", | |
"6495 white 6.5 0.24 0.19 1.2 ... 6\n", | |
"6496 white 5.5 0.29 0.3 1.1 ... 7\n", | |
"6497 white 6.0 0.21 0.38 0.8 ... 6\n" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"uri = URI('https://raw.githubusercontent.com/sciruby-jp/ruby-datascience-examples/master/winequality-both.csv')\n", | |
"wine = RedAmber::DataFrame.load(uri)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "acec0db1-5093-44fe-b197-7649318aef8d", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" type fixed acidity volatile acidity citric acid residual sugar ... quality\n", | |
" <string> <double> <double> <double> <double> ... <int64>\n", | |
" 1 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" 2 red 7.8 0.88 0.0 2.6 ... 5\n", | |
" 3 red 7.8 0.76 0.04 2.3 ... 5\n", | |
" 4 red 11.2 0.28 0.56 1.9 ... 6\n", | |
" 5 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" : : : : : : ... :\n", | |
"6495 white 6.5 0.24 0.19 1.2 ... 6\n", | |
"6496 white 5.5 0.29 0.3 1.1 ... 7\n", | |
"6497 white 6.0 0.21 0.38 0.8 ... 6\n" | |
] | |
} | |
], | |
"source": [ | |
"puts wine" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "074ff267-671d-410a-a4f7-7ec5dfafcab4", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"RedAmber::DataFrame : 6497 x 13 Vectors\n", | |
"Vectors : 12 numeric, 1 string\n", | |
"# key type level data_preview\n", | |
"1 :type string 2 {\"red\"=>1599, \"white\"=>4898}\n", | |
"2 :\"fixed acidity\" double 106 [7.4, 7.8, 7.8, 11.2, 7.4, ... ]\n", | |
"3 :\"volatile acidity\" double 187 [0.7, 0.88, 0.76, 0.28, 0.7, ... ]\n", | |
"4 :\"citric acid\" double 89 [0.0, 0.0, 0.04, 0.56, 0.0, ... ]\n", | |
"5 :\"residual sugar\" double 316 [1.9, 2.6, 2.3, 1.9, 1.9, ... ]\n", | |
"6 :chlorides double 214 [0.076, 0.098, 0.092, 0.075, 0.076, ... ]\n", | |
"7 :\"free sulfur dioxide\" double 135 [11.0, 25.0, 15.0, 17.0, 11.0, ... ]\n", | |
"8 :\"total sulfur dioxide\" double 276 [34.0, 67.0, 54.0, 60.0, 34.0, ... ]\n", | |
"9 :density double 998 [0.9978, 0.9968, 0.997, 0.998, 0.9978, ... ]\n", | |
"10 :pH double 108 [3.51, 3.2, 3.26, 3.16, 3.51, ... ]\n", | |
"11 :sulphates double 111 [0.56, 0.68, 0.65, 0.58, 0.56, ... ]\n", | |
"12 :alcohol double 111 [9.4, 9.8, 9.8, 9.8, 9.4, ... ]\n", | |
"13 :quality int64 7 [5, 5, 5, 6, 5, ... ]\n" | |
] | |
} | |
], | |
"source": [ | |
"wine.tdr(13)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "48d1d323-9339-4ca8-aae3-9fba6ce0f59b", | |
"metadata": {}, | |
"source": [ | |
"### Splitting red and white wines" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"id": "a0678fc2-df0c-4b03-95fe-ab2fec197016", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"#<RedAmber::Vector(:string, size=2):0x000000000001c55c>\n", | |
"[\"red\", \"white\"]\n" | |
] | |
}, | |
"execution_count": 70, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"wine[:type].uniq" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"id": "f31dc0ee-9458-4bbf-839a-fb95b6051592", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[#<RedAmber::DataFrame : 1599 x 13 Vectors, 0x000000000001c570>\n", | |
" type fixed acidity volatile acidity citric acid residual sugar ... quality\n", | |
" <string> <double> <double> <double> <double> ... <int64>\n", | |
" 1 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" 2 red 7.8 0.88 0.0 2.6 ... 5\n", | |
" 3 red 7.8 0.76 0.04 2.3 ... 5\n", | |
" 4 red 11.2 0.28 0.56 1.9 ... 6\n", | |
" 5 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" : : : : : : ... :\n", | |
"1597 red 6.3 0.51 0.13 2.3 ... 6\n", | |
"1598 red 5.9 0.65 0.12 2.0 ... 5\n", | |
"1599 red 6.0 0.31 0.47 3.6 ... 6\n", | |
", #<RedAmber::DataFrame : 4898 x 13 Vectors, 0x000000000001c584>\n", | |
" type fixed acidity volatile acidity citric acid residual sugar ... quality\n", | |
" <string> <double> <double> <double> <double> ... <int64>\n", | |
" 1 white 7.0 0.27 0.36 20.7 ... 6\n", | |
" 2 white 6.3 0.3 0.34 1.6 ... 6\n", | |
" 3 white 8.1 0.28 0.4 6.9 ... 6\n", | |
" 4 white 7.2 0.23 0.32 8.5 ... 6\n", | |
" 5 white 7.2 0.23 0.32 8.5 ... 6\n", | |
" : : : : : : ... :\n", | |
"4896 white 6.5 0.24 0.19 1.2 ... 6\n", | |
"4897 white 5.5 0.29 0.3 1.1 ... 7\n", | |
"4898 white 6.0 0.21 0.38 0.8 ... 6\n", | |
"]" | |
] | |
}, | |
"execution_count": 71, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"reds, whites = [\"red\", \"white\"].map { |type| wine[wine[:type] == type] }" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e35bb7bc-b955-4500-9714-73b4edde1864", | |
"metadata": {}, | |
"source": [ | |
"### Drawing a histogram of quality" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "95136b66-d8f7-463c-92d1-33e5899c6428", | |
"metadata": {}, | |
"source": [ | |
"RedAmber's policy is to leave visualization to other plotting libraries, so this step is skipped (^_^)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"id": "62af0b82-cc3f-441a-a28e-36a2ed83a325", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <2 x 2 vectors> <table><tr><th>type</th><th>mean(quality)</th></tr><tr><td>red</td><td>5.6360225140712945</td></tr><tr><td>white</td><td>5.87790935075541</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001c598>\n", | |
" type mean(quality)\n", | |
" <string> <double>\n", | |
"1 red 5.64\n", | |
"2 white 5.88\n" | |
] | |
}, | |
"execution_count": 72, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Is there a difference in quality?\n", | |
"wine.group(:type).mean(:quality)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5155f09d-eec5-4c1b-b818-af123b1c562a", | |
"metadata": {}, | |
"source": [ | |
"Add some quick-and-dirty methods for computing correlation coefficients." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 56, | |
"id": "7f0a265b-b666-46a5-9f43-e68f5f8bf3b5", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"RedAmber::DataFrame" | |
] | |
}, | |
"execution_count": 56, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"module VectorCorrelation\n", | |
"  # Covariance (mean of the products of deviations)\n", | |
"  def covariance(other)\n", | |
"    ((self - self.mean) * (other - other.mean)).mean\n", | |
"  end\n", | |
"\n", | |
"  # Correlation coefficient\n", | |
"  def correlation_coeff(other)\n", | |
"    covariance(other) / self.stddev / other.stddev\n", | |
"  end\n", | |
"end\n", | |
"\n", | |
"module DataFrameCorrelation\n", | |
"  # Compute the correlation matrix\n", | |
"  def corr\n", | |
"    df = pick { vectors.map(&:numeric?) } # keep only the numeric vectors\n", | |
"\n", | |
"    RedAmber::DataFrame.new(\n", | |
"      df.keys.map do |key|\n", | |
"        [key, df.vectors.map { |vector| vector.correlation_coeff(df[key]) }]\n", | |
"      end\n", | |
"    )\n", | |
"  end\n", | |
"end\n", | |
"\n", | |
"RedAmber::Vector.prepend VectorCorrelation\n", | |
"RedAmber::DataFrame.prepend DataFrameCorrelation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 57, | |
"id": "51f71f0d-c983-46a5-9b6f-e79627dc67a1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <12 x 12 vectors> <table><tr><th>fixed acidity</th><th>volatile acidity</th><th>citric acid</th><th>residual sugar</th><th>chlorides</th><th>free sulfur dioxide</th><th>total sulfur dioxide</th><th>density</th><th>pH</th><th>sulphates</th><th>alcohol</th><th>quality</th></tr><tr><td>1.0</td><td>0.21900825635099677</td><td>0.3244357254472982</td><td>-0.1119812810782357</td><td>0.2981947717027353</td><td>-0.282735428369563</td><td>-0.3290539012952197</td><td>0.4589099822804343</td><td>-0.25270046831623044</td><td>0.2995677443824938</td><td>-0.09545152256332895</td><td>-0.07674320790961987</td></tr><tr><td>0.21900825635099677</td><td>1.0</td><td>-0.37798131705526183</td><td>-0.1960111743476545</td><td>0.3771242764338632</td><td>-0.35255730641340494</td><td>-0.4144761946507164</td><td>0.27129564785117954</td><td>0.2614544027422559</td><td>0.22598367974107134</td><td>-0.03764038583468096</td><td>-0.26569947761146784</td></tr><tr><td>0.3244357254472982</td><td>-0.3779813170552618</td><td>1.0000000000000002</td><td>0.14245122598675877</td><td>0.038998014089851846</td><td>0.13312580951823172</td><td>0.1952419759814532</td><td>0.09615392906417064</td><td>-0.32980819113172216</td><td>0.056197300134971914</td><td>-0.010493492173379231</td><td>0.08553171718367848</td></tr><tr><td>-0.1119812810782357</td><td>-0.1960111743476545</td><td>0.14245122598675877</td><td>1.0</td><td>-0.1289404999032683</td><td>0.4028706400566608</td><td>0.4954815870066483</td><td>0.5525169502934877</td><td>-0.26731983687681155</td><td>-0.18592740529018426</td><td>-0.35941477081599993</td><td>-0.03698048458576945</td></tr><tr><td colspan='12'>⋮</td></tr><tr><td>0.29956774438249373</td><td>0.22598367974107134</td><td>0.056197300134971914</td><td>-0.18592740529018426</td><td>0.39559330654732616</td><td>-0.18845724880121537</td><td>-0.27572681991620596</td><td>0.259478495345752</td><td>0.19212340657115354</td><td>1.0000000000000002</td><td>-0.0030291949442553586</td><td>0.03848544587651445</td></tr><tr><td>-0.09545152256332894</td><td>-0.03764038583468096</td><td>-0.010493492173379231</td><td>-0.35941477081599993</td><td>-0.256915579972914</td><td>-0.17983843488934126</td><td>-0.26573963910716003</td><td>-0.6867454216813397</td><td>0.12124846709464608</td><td>-0.003029194944255359</td><td>1.0</td><td>0.4443185200075176</td></tr><tr><td>-0.07674320790961987</td><td>-0.26569947761146784</td><td>0.08553171718367848</td><td>-0.03698048458576945</td><td>-0.20066550043510206</td><td>0.05546305861663267</td><td>-0.0413854538556088</td><td>-0.30585790606941415</td><td>0.01950570371443586</td><td>0.03848544587651445</td><td>0.4443185200075176</td><td>1.0</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 12 x 12 Vectors, 0x000000000001c4bc>\n", | |
" fixed acidity volatile acidity citric acid residual sugar chlorides ... quality\n", | |
" <double> <double> <double> <double> <double> ... <double>\n", | |
" 1 1.0 0.22 0.32 -0.11 0.3 ... -0.08\n", | |
" 2 0.22 1.0 -0.38 -0.2 0.38 ... -0.27\n", | |
" 3 0.32 -0.38 1.0 0.14 0.04 ... 0.09\n", | |
" 4 -0.11 -0.2 0.14 1.0 -0.13 ... -0.04\n", | |
" 5 0.3 0.38 0.04 -0.13 1.0 ... -0.2\n", | |
" : : : : : : ... :\n", | |
"10 0.3 0.23 0.06 -0.19 0.4 ... 0.04\n", | |
"11 -0.1 -0.04 -0.01 -0.36 -0.26 ... 0.44\n", | |
"12 -0.08 -0.27 0.09 -0.04 -0.2 ... 1.0\n" | |
] | |
}, | |
"execution_count": 57, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"wine.corr" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"id": "e7c7e373-4eed-4ce5-be28-31312870c926", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" user system total real\n", | |
" 0.270285 0.003969 0.274254 ( 0.289007)\n" | |
] | |
} | |
], | |
"source": [ | |
"require 'benchmark'\n", | |
"\n", | |
"Benchmark.bm do |x|\n", | |
"  x.report { wine.corr }\n", | |
"end; nil" | |
] | |
}, | |
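{ | |
"cell_type": "markdown", | |
"id": "d396b228-d094-4abc-90e8-b9f5c4e4a961", | |
"metadata": {}, | |
"source": [ | |
"Considering that the computation is driven from Ruby and that both the upper and lower triangles of the matrix are computed, this is not that slow. Hooray for Red Arrow." | |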
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5b318984-e763-4840-9581-d8c5d6878c22", | |
"metadata": {}, | |
"source": [ | |
"### Usability\n", | |
"\n", | |
"Keep in mind that Arrow data is immutable, and take care with how the `Vector#&` operator is handled." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 76, | |
"id": "9986a823-e97c-454a-918b-8bbb48ed5fc7", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"#<RedAmber::Vector(:boolean, size=6497):0x000000000001c5c0>\n", | |
"[false, false, false, true, false, false, false, true, true, false, false, ... ]\n" | |
] | |
}, | |
"execution_count": 76, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Parentheses are required here because & binds more tightly than >= and <=\n", | |
"filter = (wine[:quality] >= 6) & (wine[:alcohol] <= 10)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 73, | |
"id": "5cd52975-be17-4fdd-ac33-27fb806c36cd", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <6497 x 13 vectors> <table><tr><th>type</th><th>fixed acidity</th><th>volatile acidity</th><th>citric acid</th><th>residual sugar</th><th>chlorides</th><th>free sulfur dioxide</th><th>total sulfur dioxide</th><th>density</th><th>pH</th><th>sulphates</th><th>alcohol</th><th>quality</th></tr><tr><td>red</td><td>7.4</td><td>0.7</td><td>0.0</td><td>1.9</td><td>0.076</td><td>11.0</td><td>34.0</td><td>0.9978</td><td>3</td><td>0.56</td><td>9.4</td><td>5</td></tr><tr><td>red</td><td>7.8</td><td>0.88</td><td>0.0</td><td>2.6</td><td>0.098</td><td>25.0</td><td>67.0</td><td>0.9968</td><td>3</td><td>0.68</td><td>9.8</td><td>5</td></tr><tr><td>red</td><td>7.8</td><td>0.76</td><td>0.04</td><td>2.3</td><td>0.092</td><td>15.0</td><td>54.0</td><td>0.997</td><td>3</td><td>0.65</td><td>9.8</td><td>5</td></tr><tr><td>red</td><td>11.2</td><td>0.28</td><td>0.56</td><td>1.9</td><td>0.075</td><td>17.0</td><td>60.0</td><td>0.998</td><td>0</td><td>0.58</td><td>9.8</td><td>6</td></tr><tr><td colspan='13'>⋮</td></tr><tr><td>white</td><td>6.5</td><td>0.24</td><td>0.19</td><td>1.2</td><td>0.041</td><td>30.0</td><td>111.0</td><td>0.99254</td><td>0</td><td>0.46</td><td>9.4</td><td>6</td></tr><tr><td>white</td><td>5.5</td><td>0.29</td><td>0.3</td><td>1.1</td><td>0.022</td><td>20.0</td><td>110.0</td><td>0.98869</td><td>3</td><td>0.38</td><td>12.8</td><td>7</td></tr><tr><td>white</td><td>6.0</td><td>0.21</td><td>0.38</td><td>0.8</td><td>0.02</td><td>22.0</td><td>98.0</td><td>0.98941</td><td>3</td><td>0.32</td><td>11.8</td><td>6</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 6497 x 13 Vectors, 0x000000000001c5ac>\n", | |
" type fixed acidity volatile acidity citric acid residual sugar ... quality\n", | |
" <string> <double> <double> <double> <double> ... <int64>\n", | |
" 1 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" 2 red 7.8 0.88 0.0 2.6 ... 5\n", | |
" 3 red 7.8 0.76 0.04 2.3 ... 5\n", | |
" 4 red 11.2 0.28 0.56 1.9 ... 6\n", | |
" 5 red 7.4 0.7 0.0 1.9 ... 5\n", | |
" : : : : : : ... :\n", | |
"6495 white 6.5 0.24 0.19 1.2 ... 6\n", | |
"6496 white 5.5 0.29 0.3 1.1 ... 7\n", | |
"6497 white 6.0 0.21 0.38 0.8 ... 6\n" | |
] | |
}, | |
"execution_count": 73, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Replace values in the :pH column where filter is true\n", | |
"wine2 = wine.assign do\n", | |
"  { pH: wine[:pH].replace(filter, 0) }\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 75, | |
"id": "50c62894-5af4-452c-bc2b-a2a521a1aa05", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{3=>4935, 0=>1228, 2=>332, 4=>2}" | |
] | |
}, | |
"execution_count": 75, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# How many rows ended up with pH = 0?\n", | |
"wine2[:pH].tally" | |
] | |
}, | |
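{ | |
"cell_type": "markdown", | |
"id": "10e71d3c-9a9f-495b-8ab2-522aa75583fe", | |
"metadata": {}, | |
"source": [ | |
"## Comparison with Daru and Pandas\n", | |
"\n", | |
"To be honest, I have not used Daru much (really?), so I cannot compare features in detail. Still, I think the most important thing in RedAmber is boolean filtering along both rows and columns, as done above.\n", | |
"\n", | |
"In RedAmber, columns are selected and removed with `pick` and `drop`, and rows are selected and removed with `slice` and `remove`; `[]` also works on both rows and columns.\n", | |
"These methods accept either an array of booleans or an array of indices or keys. An index ultimately boils down to the corresponding boolean array, and booleans are also advantageous for the implementation because they fit vectorized operations, so I would argue that boolean filtering gives a consistent way to operate on a data frame." | |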
] | |
}, | |
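{ | |
"cell_type": "markdown", | |
"id": "0d6bf507-89ac-4bf3-9e5f-ed1ac3283d43", | |
"metadata": {}, | |
"source": [ | |
"RedAmber's classes have grown very large. This follows from aiming for clarity for users, from a design policy of actively using Ruby's collection classes, and from the idea that functionality can eventually be moved into the underlying Red Arrow so that the data frame library itself becomes slim. I would welcome feedback on this." | |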
] | |
}, | |
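{ | |
"cell_type": "markdown", | |
"id": "85fde615-3ec4-4fad-a668-10db2806515e", | |
"metadata": {}, | |
"source": [ | |
"In my limited experience, I used R and Pandas for a while (including via PyCall). But I always wanted to do data science in a Ruby-like way, and RedAmber is the result of doing what I could with the help of the wonderful Red Arrow library. I am currently rewriting data processing examples written in other languages with RedAmber, adding and improving features along the way, and I do feel there is a real pleasure in being able to write this in Ruby.\n", | |
"\n", | |
"Code examples organized by feature, along with some examples that follow parts of the Pandas Cookbook, are available below.\n", | |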
"https://github.com/heronshoes/red_amber/blob/master/doc/examples_of_red_amber.ipynb" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Ruby 3.1.1", | |
"language": "ruby", | |
"name": "ruby" | |
}, | |
"language_info": { | |
"file_extension": ".rb", | |
"mimetype": "application/x-ruby", | |
"name": "ruby", | |
"version": "3.1.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
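The notebook remarks that `corr` evaluates both the upper and lower triangles of the correlation matrix. As a rough illustration of the arithmetic only, here is a minimal plain-Ruby sketch that uses Arrays in place of RedAmber vectors, fills just the upper triangle, and mirrors it into the lower one. The helper names (`mean`, `covariance`, `stddev`, `correlation_coeff`, `corr_matrix`) are hypothetical and are not part of RedAmber or Red Arrow; the sample values are the first five `alcohol` and `quality` rows shown in the notebook output.

```ruby
# Plain-Ruby stand-ins for the notebook's Vector helpers (hypothetical names).
def mean(xs)
  xs.sum.to_f / xs.size
end

# Population covariance: mean of the products of deviations (divides by n).
def covariance(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  mean(xs.zip(ys).map { |x, y| (x - mx) * (y - my) })
end

# Population standard deviation, so that it is consistent with covariance above.
def stddev(xs)
  Math.sqrt(covariance(xs, xs))
end

# Correlation coefficient of two equally sized numeric arrays.
def correlation_coeff(xs, ys)
  covariance(xs, ys) / (stddev(xs) * stddev(ys))
end

# Correlation matrix for a Hash of column name => Array of numbers.
# Only the upper triangle is computed; the lower triangle is mirrored.
# The diagonal is set to 1.0 by definition (assumes no column is constant).
def corr_matrix(columns)
  keys = columns.keys
  matrix = Array.new(keys.size) { Array.new(keys.size, 1.0) }
  keys.each_with_index do |key_i, i|
    ((i + 1)...keys.size).each do |j|
      r = correlation_coeff(columns[key_i], columns[keys[j]])
      matrix[i][j] = r
      matrix[j][i] = r
    end
  end
  matrix
end

# First five rows of two columns from the notebook output.
sample = {
  alcohol: [9.4, 9.8, 9.8, 9.8, 9.4],
  quality: [5, 5, 5, 6, 5]
}
corr_matrix(sample).each { |row| p row }
```

Mirroring roughly halves the number of coefficient evaluations and makes the result exactly symmetric. Whether that matters for a 12 x 12 matrix is a judgment call, since the notebook's timing is already small.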