Last active
June 1, 2023 07:55
-
-
Save heronshoes/5345629993b456123d6064c5e53d135e to your computer and use it in GitHub Desktop.
前のレコードに依存する表データのRedAmberでの処理方法
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "0f431f59-a036-40ef-ba89-01e3ac701053", | |
"metadata": {}, | |
"source": [ | |
"## 前のレコードに依存する表データのRedAmberでの処理方法\n", | |
"\n", | |
"これは 5/31に開催された [すごい広島523 .rb with Python オンライン](https://great-h.connpass.com/event/283652/) で登場した話題のフォローです。\n", | |
"\n", | |
"[MySQL のユーザー定義変数と RDB のココロ](https://www.slideshare.net/tsudaa/my-sqlrdb) にあるようなデータの処理を\n", | |
"RedAmberのデータフレームでやったらどうなるか試してみました。\n", | |
"\n", | |
"データフレームなので特に変数も何も使い放題ですが、縛りとしてループを使うのではなくベクトル演算を使ったやり方を考えました。" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "08b245dd-e5d8-4236-97af-48ddcceff64c", | |
"metadata": {}, | |
"source": [ | |
"### ライブラリの読み込み" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "386e40ff-71b5-4efa-8224-4ab9c719265f", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{:RedAmber=>\"0.5.0\", :Arrow=>\"12.0.0\"}" | |
] | |
}, | |
"execution_count": 1, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"require 'red_amber'\n", | |
"{ RedAmber: RedAmber::VERSION, Arrow: Arrow::VERSION }" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4869dc28-2da4-47f0-86c7-70c3f72bd3d8", | |
"metadata": { | |
"tags": [] | |
}, | |
"source": [ | |
"### 表データの読み込み" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "c8609107-cbd7-4aad-ad98-6c80edcd0a21", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <5 x 3 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f7e4>\n", | |
" 日付 借入額 返済額\n", | |
" <string> <int64> <int64>\n", | |
"0 2023/06/01 20000 0\n", | |
"1 2023/06/02 0 5000\n", | |
"2 2023/06/03 0 4000\n", | |
"3 2023/06/04 0 3000\n", | |
"4 2023/06/05 10000 2000\n" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dataframe =\n", | |
" RedAmber::DataFrame.load(Arrow::Buffer.new(<<~CSV), format: :csv)\n", | |
" 日付,借入額,返済額\n", | |
" 2023/06/01,20000,0\n", | |
" 2023/06/02,0,5000\n", | |
" 2023/06/03,0,4000\n", | |
" 2023/06/04,0,3000\n", | |
" 2023/06/05,10000,2000\n", | |
" CSV" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "101b4b68-e27a-4983-9a98-9a5703f5af14", | |
"metadata": {}, | |
"source": [ | |
"### 残額を計算し、新しい列に展開する(解答)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"id": "51734ff9-afbf-444d-a8d6-80f11a6a5664", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th><th>差引</th><th>利息</th><th>残額</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td><td>20000</td><td>200</td><td>20200</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td><td>15200</td><td>152</td><td>15352</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td><td>11352</td><td>114</td><td>11466</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td><td>8466</td><td>85</td><td>8551</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td><td>16550</td><td>166</td><td>16716</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f910>\n", | |
" 日付 借入額 返済額 差引 利息 残額\n", | |
" <string> <int64> <int64> <double> <double> <double>\n", | |
"0 2023/06/01 20000 0 20000.0 200.0 20200.0\n", | |
"1 2023/06/02 0 5000 15200.0 152.0 15352.0\n", | |
"2 2023/06/03 0 4000 11352.0 114.0 11466.0\n", | |
"3 2023/06/04 0 3000 8466.0 85.0 8551.0\n", | |
"4 2023/06/05 10000 2000 16550.0 166.0 16716.0\n" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dataframe\n", | |
" .assign do\n", | |
" v = 借入額 - 返済額\n", | |
" 残額 =\n", | |
" size.times.with_object([]) do |_, a|\n", | |
" v = v * 1.01\n", | |
" a << v\n", | |
" v = v.shift.fill_nil(0)\n", | |
" end\n", | |
" .reduce(&:+)\n", | |
" .ceil\n", | |
" 利息 = (残額 - 残額 / 1.01).ceil\n", | |
" {\n", | |
" '差引': 残額 - 利息,\n", | |
" '利息': 利息,\n", | |
" '残額': 残額,\n", | |
" }\n", | |
" end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "36d1e66a-ec75-40b0-928d-995c2825ad23", | |
"metadata": {}, | |
"source": [ | |
"このやり方は、下記のように1.01倍してシフトしたベクトルを作り、それらを足し合わせる操作を行っています。\n", | |
"\n", | |
"式としてはこれであっているのですが、全く直観的な処理ではないです。" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "4a9e34e1-9e62-4f19-ac2b-45df87136b5f", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"RedAmber::DataFrame <5 x 8 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th><th>first</th><th>second</th><th>third</th><th>fourth</th><th>fifth</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td><td>20200</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td><td>-5050</td><td>20402</td><td>0</td><td>0</td><td>0</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td><td>-4040</td><td>-5100.5</td><td>20606</td><td>0</td><td>0</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td><td>-3030</td><td>-4080.4</td><td>-5151.50</td><td>20812.1</td><td>0</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td><td>8080</td><td>-3060.3</td><td>-4121.2</td><td>-5203.02</td><td>21020.2</td></tr></table>" | |
], | |
"text/plain": [ | |
"#<RedAmber::DataFrame : 5 x 8 Vectors, 0x000000000000f80c>\n", | |
" 日付 借入額 返済額 first second third fourth fifth\n", | |
" <string> <int64> <int64> <double> <double> <double> <double> <double>\n", | |
"0 2023/06/01 20000 0 20200.0 0.0 0.0 0.0 0.0\n", | |
"1 2023/06/02 0 5000 -5050.0 20402.0 0.0 0.0 0.0\n", | |
"2 2023/06/03 0 4000 -4040.0 -5100.5 20606.02 0.0 0.0\n", | |
"3 2023/06/04 0 3000 -3030.0 -4080.4 -5151.51 20812.08 0.0\n", | |
"4 2023/06/05 10000 2000 8080.0 -3060.3 -4121.2 -5203.02 21020.2\n" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dataframe\n", | |
" .assign(:first, :second, :third, :fourth, :fifth) do\n", | |
" v = 借入額 - 返済額\n", | |
" size.times.with_object([]) do |_, a|\n", | |
" v = v * 1.01\n", | |
" a << v\n", | |
" v = v.shift.fill_nil(0)\n", | |
" end\n", | |
" end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "6070910e-824e-4715-abd1-7f33abf1f74b", | |
"metadata": {}, | |
"source": [ | |
"一応ベクトル化された処理で書くことはできました。\n", | |
"前のレコードを参照して加えるような操作は、ベクトル的には列のシフト (`#shift`)を使えば良いことが分かりました。\n", | |
"\n", | |
"他のデータフレームやSQLでは、`lag`が相当すると思います。\n", | |
"\n", | |
"(行数-1)回シフトするので、本当に効率的な処理なのかどうかはわかりません。" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Ruby 3.2.1", | |
"language": "ruby", | |
"name": "ruby" | |
}, | |
"language_info": { | |
"file_extension": ".rb", | |
"mimetype": "application/x-ruby", | |
"name": "ruby", | |
"version": "3.2.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment