Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save heronshoes/5345629993b456123d6064c5e53d135e to your computer and use it in GitHub Desktop.
Save heronshoes/5345629993b456123d6064c5e53d135e to your computer and use it in GitHub Desktop.
前のレコードに依存する表データのRedAmberでの処理方法
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "0f431f59-a036-40ef-ba89-01e3ac701053",
"metadata": {},
"source": [
"## 前のレコードに依存する表データのRedAmberでの処理方法\n",
"\n",
"これは 5/31に開催された [すごい広島523 .rb with Python オンライン](https://great-h.connpass.com/event/283652/) で登場した話題のフォローです。\n",
"\n",
"[MySQL のユーザー定義変数と RDB のココロ](https://www.slideshare.net/tsudaa/my-sqlrdb) にあるようなデータの処理を\n",
"RedAmberのデータフレームでやったらどうなるか試してみました。\n",
"\n",
"データフレームなので特に変数も何も使い放題ですが、縛りとしてループを使うのではなくベクトル演算を使ったやり方を考えました。"
]
},
{
"cell_type": "markdown",
"id": "08b245dd-e5d8-4236-97af-48ddcceff64c",
"metadata": {},
"source": [
"### ライブラリの読み込み"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "386e40ff-71b5-4efa-8224-4ab9c719265f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{:RedAmber=>\"0.5.0\", :Arrow=>\"12.0.0\"}"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"require 'red_amber'\n",
"{ RedAmber: RedAmber::VERSION, Arrow: Arrow::VERSION }"
]
},
{
"cell_type": "markdown",
"id": "4869dc28-2da4-47f0-86c7-70c3f72bd3d8",
"metadata": {
"tags": []
},
"source": [
"### 表データの読み込み"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c8609107-cbd7-4aad-ad98-6c80edcd0a21",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"RedAmber::DataFrame <5 x 3 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td></tr></table>"
],
"text/plain": [
"#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f7e4>\n",
" 日付 借入額 返済額\n",
" <string> <int64> <int64>\n",
"0 2023/06/01 20000 0\n",
"1 2023/06/02 0 5000\n",
"2 2023/06/03 0 4000\n",
"3 2023/06/04 0 3000\n",
"4 2023/06/05 10000 2000\n"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataframe =\n",
" RedAmber::DataFrame.load(Arrow::Buffer.new(<<~CSV), format: :csv)\n",
" 日付,借入額,返済額\n",
" 2023/06/01,20000,0\n",
" 2023/06/02,0,5000\n",
" 2023/06/03,0,4000\n",
" 2023/06/04,0,3000\n",
" 2023/06/05,10000,2000\n",
" CSV"
]
},
{
"cell_type": "markdown",
"id": "101b4b68-e27a-4983-9a98-9a5703f5af14",
"metadata": {},
"source": [
"### 残額を計算し、新しい列に展開する(解答)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "51734ff9-afbf-444d-a8d6-80f11a6a5664",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"RedAmber::DataFrame <5 x 6 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th><th>差引</th><th>利息</th><th>残額</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td><td>20000</td><td>200</td><td>20200</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td><td>15200</td><td>152</td><td>15352</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td><td>11352</td><td>114</td><td>11466</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td><td>8466</td><td>85</td><td>8551</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td><td>16550</td><td>166</td><td>16716</td></tr></table>"
],
"text/plain": [
"#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f910>\n",
" 日付 借入額 返済額 差引 利息 残額\n",
" <string> <int64> <int64> <double> <double> <double>\n",
"0 2023/06/01 20000 0 20000.0 200.0 20200.0\n",
"1 2023/06/02 0 5000 15200.0 152.0 15352.0\n",
"2 2023/06/03 0 4000 11352.0 114.0 11466.0\n",
"3 2023/06/04 0 3000 8466.0 85.0 8551.0\n",
"4 2023/06/05 10000 2000 16550.0 166.0 16716.0\n"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataframe\n",
" .assign do\n",
" v = 借入額 - 返済額\n",
" 残額 =\n",
" size.times.with_object([]) do |_, a|\n",
" v = v * 1.01\n",
" a << v\n",
" v = v.shift.fill_nil(0)\n",
" end\n",
" .reduce(&:+)\n",
" .ceil\n",
" 利息 = (残額 - 残額 / 1.01).ceil\n",
" {\n",
" '差引': 残額 - 利息,\n",
" '利息': 利息,\n",
" '残額': 残額,\n",
" }\n",
" end"
]
},
{
"cell_type": "markdown",
"id": "36d1e66a-ec75-40b0-928d-995c2825ad23",
"metadata": {},
"source": [
"このやり方は、下記のように1.01倍してシフトしたベクトルを作り、それらを足し合わせる操作を行っています。\n",
"\n",
"式としてはこれであっているのですが、全く直観的な処理ではないです。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4a9e34e1-9e62-4f19-ac2b-45df87136b5f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"RedAmber::DataFrame <5 x 8 vectors> <table><tr><th>日付</th><th>借入額</th><th>返済額</th><th>first</th><th>second</th><th>third</th><th>fourth</th><th>fifth</th></tr><tr><td>2023/06/01</td><td>20000</td><td>0</td><td>20200</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>2023/06/02</td><td>0</td><td>5000</td><td>-5050</td><td>20402</td><td>0</td><td>0</td><td>0</td></tr><tr><td>2023/06/03</td><td>0</td><td>4000</td><td>-4040</td><td>-5100.5</td><td>20606</td><td>0</td><td>0</td></tr><tr><td>2023/06/04</td><td>0</td><td>3000</td><td>-3030</td><td>-4080.4</td><td>-5151.50</td><td>20812.1</td><td>0</td></tr><tr><td>2023/06/05</td><td>10000</td><td>2000</td><td>8080</td><td>-3060.3</td><td>-4121.2</td><td>-5203.02</td><td>21020.2</td></tr></table>"
],
"text/plain": [
"#<RedAmber::DataFrame : 5 x 8 Vectors, 0x000000000000f80c>\n",
" 日付 借入額 返済額 first second third fourth fifth\n",
" <string> <int64> <int64> <double> <double> <double> <double> <double>\n",
"0 2023/06/01 20000 0 20200.0 0.0 0.0 0.0 0.0\n",
"1 2023/06/02 0 5000 -5050.0 20402.0 0.0 0.0 0.0\n",
"2 2023/06/03 0 4000 -4040.0 -5100.5 20606.02 0.0 0.0\n",
"3 2023/06/04 0 3000 -3030.0 -4080.4 -5151.51 20812.08 0.0\n",
"4 2023/06/05 10000 2000 8080.0 -3060.3 -4121.2 -5203.02 21020.2\n"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataframe\n",
" .assign(:first, :second, :third, :fourth, :fifth) do\n",
" v = 借入額 - 返済額\n",
" size.times.with_object([]) do |_, a|\n",
" v = v * 1.01\n",
" a << v\n",
" v = v.shift.fill_nil(0)\n",
" end\n",
" end"
]
},
{
"cell_type": "markdown",
"id": "6070910e-824e-4715-abd1-7f33abf1f74b",
"metadata": {},
"source": [
"一応ベクトル化された処理で書くことはできました。\n",
"前のレコードを参照して加えるような操作は、ベクトル的には列のシフト (`#shift`)を使えば良いことが分かりました。\n",
"\n",
"他のデータフレームやSQLでは、`lag`が相当すると思います。\n",
"\n",
"(行数-1)回シフトするので、本当に効率的な処理なのかどうかはわかりません。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Ruby 3.2.1",
"language": "ruby",
"name": "ruby"
},
"language_info": {
"file_extension": ".rb",
"mimetype": "application/x-ruby",
"name": "ruby",
"version": "3.2.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment