Created
May 3, 2022 18:04
-
-
Save dmbates/5b99dcbc3c3cc19b19a7c09a315e1fe3 to your computer and use it in GitHub Desktop.
Notebook related to MixedModels.jl issue 611
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "raw", | |
"id": "52407268", | |
"metadata": {}, | |
"source": [ | |
"---\n", | |
"title: Issue611\n", | |
"author: Douglas Bates\n", | |
"date: today\n", | |
"date-format: iso\n", | |
"execute:\n", | |
" keep-ipynb: true\n", | |
"---" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8a5dbdf2", | |
"metadata": {}, | |
"source": [ | |
"## Load packages and data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "8f6b34c7", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"using CSV, MixedModels, ProgressMeter, Tables\n", | |
"ProgressMeter.ijulia_behavior(:clear);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7ca7944e", | |
"metadata": {}, | |
"source": [ | |
"Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor," | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "b344592e", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"2110-element Vector{Float64}:\n", | |
" 274.388611\n", | |
" 1015.990089\n", | |
" 2197.288998\n", | |
" 3367.868333\n", | |
" 2415.089921\n", | |
" 2980.472642\n", | |
" 1621.304086\n", | |
" 1452.363879\n", | |
" 1612.956475\n", | |
" 1836.63395\n", | |
" 2548.059624\n", | |
" 3075.149043\n", | |
" 2846.338504\n", | |
" ⋮\n", | |
" 90.037501\n", | |
" 64.327124\n", | |
" 29.44757\n", | |
" 29.084692\n", | |
" 27.125033\n", | |
" 409.445973\n", | |
" 12.593273\n", | |
" 431.874531\n", | |
" 37.276392\n", | |
" 45.440296\n", | |
" 75.674115\n", | |
" 18.573065" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dat = CSV.read(\"./data/model_ready_data_small.csv\", columntable)\n", | |
"wts = CSV.read(\"./data/weights.csv\", columntable).weight_vector" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3289eed6", | |
"metadata": {}, | |
"source": [ | |
"Also, assign a `contrasts` specification of `Grouping` to the grouping factors" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "89442b55", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"contrasts = Dict(\n", | |
" :brand_category_channel_type => Grouping(),\n", | |
" :channel_type_sub_brand => Grouping(),\n", | |
");" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7111184e", | |
"metadata": {}, | |
"source": [ | |
"With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.\n", | |
"In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.\n", | |
"\n", | |
"For example," | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "142f98f1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"2110-element Vector{String}:\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" \"B00\"\n", | |
" ⋮\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"\n", | |
" \"B15\"" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "cbf43ef0", | |
"metadata": {}, | |
"source": [ | |
"Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.\n", | |
"\n", | |
"## Initial model fits\n", | |
"\n", | |
"Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.\n", | |
"I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "d9a1e689", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"\r", | |
"\u001b[32mMinimizing 17 \t Time: 0:00:00 (23.36 ms/it)\u001b[39m\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/latex": [ | |
"\\begin{tabular}\n", | |
"{l | r | r | r | r | r}\n", | |
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n", | |
"\\hline\n", | |
"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 \\\\\n", | |
"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & \\\\\n", | |
"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & \\\\\n", | |
"Residual & 33.0010 & & & & \\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"| | Est. | SE | z | p | σ_channel_type_sub_brand |\n", | |
"|:----------- | -------:| ------:| ------:| ------:| ------------------------:|\n", | |
"| (Intercept) | -0.0016 | 0.0072 | -0.22 | 0.8254 | 0.0000 |\n", | |
"| ppi | -0.8022 | 0.0718 | -11.18 | <1e-28 | |\n", | |
"| base_price | -0.6277 | 0.0725 | -8.66 | <1e-17 | |\n", | |
"| Residual | 33.0010 | | | | |\n" | |
], | |
"text/plain": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev.\n", | |
"channel_type_sub_brand (Intercept) 0.000 0.000\n", | |
"Residual 1089.065 33.001\n", | |
" Number of obs: 2110; levels of grouping factors: 81\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"─────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"─────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n", | |
"ppi -0.802217 0.0717856 -11.18 <1e-28\n", | |
"base_price -0.627739 0.0724693 -8.66 <1e-17\n", | |
"─────────────────────────────────────────────────────" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"m1 = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" (1|channel_type_sub_brand)\n", | |
" )\n", | |
" fit(MixedModel, form, dat; contrasts, wts)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "06c5e97c", | |
"metadata": {}, | |
"source": [ | |
"As you can see, it converges on the boundary with a variance component of zero." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "099da5eb", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev.\n", | |
"channel_type_sub_brand (Intercept) 0.000 0.000\n", | |
"Residual 1089.065 33.001\n", | |
" Number of obs: 2110; levels of grouping factors: 81\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"─────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"─────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n", | |
"ppi -0.802217 0.0717856 -11.18 <1e-28\n", | |
"base_price -0.627739 0.0724693 -8.66 <1e-17\n", | |
"─────────────────────────────────────────────────────\n" | |
] | |
} | |
], | |
"source": [ | |
"println(m1)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "ca38e2db", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"\r", | |
"\u001b[32mMinimizing 58 \t Time: 0:00:00 ( 7.66 ms/it)\u001b[39m\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/latex": [ | |
"\\begin{tabular}\n", | |
"{l | r | r | r | r | r}\n", | |
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n", | |
"\\hline\n", | |
"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & 0.0000 \\\\\n", | |
"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 \\\\\n", | |
"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & \\\\\n", | |
"Residual & 31.4902 & & & & \\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"| | Est. | SE | z | p | σ_channel_type_sub_brand |\n", | |
"|:----------- | -------:| ------:| -----:| ------:| ------------------------:|\n", | |
"| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 | 0.0000 |\n", | |
"| ppi | -0.3071 | 0.2650 | -1.16 | 0.2465 | 1.3776 |\n", | |
"| base_price | -0.6212 | 0.0695 | -8.94 | <1e-18 | |\n", | |
"| Residual | 31.4902 | | | | |\n" | |
], | |
"text/plain": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev. Corr.\n", | |
"channel_type_sub_brand (Intercept) 0.00000 0.00000\n", | |
" ppi 1.89789 1.37764 . \n", | |
"Residual 991.63246 31.49020\n", | |
" Number of obs: 2110; levels of grouping factors: 81\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n", | |
"ppi -0.3071 0.264971 -1.16 0.2465\n", | |
"base_price -0.621192 0.0695056 -8.94 <1e-18\n", | |
"────────────────────────────────────────────────────" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"m2 = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" zerocorr(1 + ppi | channel_type_sub_brand)\n", | |
" )\n", | |
" fit(MixedModel, form, dat; contrasts, wts)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "f6006725", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2211.8904 4423.7809 4433.7809 4433.8094 4462.0531\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev. \n", | |
"channel_type_sub_brand ppi 1.89789 1.37764\n", | |
"Residual 991.63246 31.49020\n", | |
" Number of obs: 2110; levels of grouping factors: 81\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n", | |
"ppi -0.3071 0.264971 -1.16 0.2465\n", | |
"base_price -0.621192 0.0695056 -8.94 <1e-18\n", | |
"────────────────────────────────────────────────────\n" | |
] | |
} | |
], | |
"source": [ | |
"m2a = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" (0 + ppi | channel_type_sub_brand)\n", | |
" )\n", | |
" fit(MixedModel, form, dat; contrasts, wts)\n", | |
"end\n", | |
"println(m2a)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "450c3c31", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev. Corr.\n", | |
"channel_type_sub_brand (Intercept) 0.00000 0.00000\n", | |
" ppi 1.89789 1.37764 . \n", | |
"Residual 991.63246 31.49020\n", | |
" Number of obs: 2110; levels of grouping factors: 81\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n", | |
"ppi -0.3071 0.264971 -1.16 0.2465\n", | |
"base_price -0.621192 0.0695056 -8.94 <1e-18\n", | |
"────────────────────────────────────────────────────\n" | |
] | |
} | |
], | |
"source": [ | |
"println(m2)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "a2043b8a", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"\r", | |
"\u001b[32mMinimizing 34 \t Time: 0:00:00 (16.69 ms/it)\u001b[39m\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/latex": [ | |
"\\begin{tabular}\n", | |
"{l | r | r | r | r | r | r}\n", | |
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n", | |
"\\hline\n", | |
"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 & 0.0000 \\\\\n", | |
"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & & \\\\\n", | |
"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & & \\\\\n", | |
"Residual & 33.0010 & & & & & \\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"| | Est. | SE | z | p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n", | |
"|:----------- | -------:| ------:| ------:| ------:| ------------------------:| -----------------------------:|\n", | |
"| (Intercept) | -0.0016 | 0.0072 | -0.22 | 0.8254 | 0.0000 | 0.0000 |\n", | |
"| ppi | -0.8022 | 0.0718 | -11.18 | <1e-28 | | |\n", | |
"| base_price | -0.6277 | 0.0725 | -8.66 | <1e-17 | | |\n", | |
"| Residual | 33.0010 | | | | | |\n" | |
], | |
"text/plain": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand) + (1 | brand_category_channel_type)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2279.1400 4558.2800 4570.2800 4570.3199 4604.2066\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev.\n", | |
"channel_type_sub_brand (Intercept) 0.000 0.000\n", | |
"brand_category_channel_type (Intercept) 0.000 0.000\n", | |
"Residual 1089.065 33.001\n", | |
" Number of obs: 2110; levels of grouping factors: 81, 16\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"─────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"─────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n", | |
"ppi -0.802217 0.0717856 -11.18 <1e-28\n", | |
"base_price -0.627739 0.0724693 -8.66 <1e-17\n", | |
"─────────────────────────────────────────────────────" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"m3 = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" (1 | channel_type_sub_brand) +\n", | |
" (1 | brand_category_channel_type)\n", | |
" )\n", | |
" fit(MixedModel, form, dat; contrasts, wts)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "b87b114a", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"┌ Warning: Cholesky factorization failed\n", | |
"└ @ Main In[12]:12\n" | |
] | |
} | |
], | |
"source": [ | |
"m4 = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" (1 | channel_type_sub_brand) +\n", | |
" zerocorr(1 + ppi | brand_category_channel_type)\n", | |
" )\n", | |
" LinearMixedModel(form, dat; contrasts, wts)\n", | |
"end\n", | |
"try\n", | |
" updateL!(m4)\n", | |
"catch\n", | |
" @warn \"Cholesky factorization failed\"\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "473d9216", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/latex": [ | |
"\\begin{tabular}\n", | |
"{l | r | r | r | r | r | r}\n", | |
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n", | |
"\\hline\n", | |
"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & & \\\\\n", | |
"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 & 0.0000 \\\\\n", | |
"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & & \\\\\n", | |
"Residual & 31.4902 & & & & & \\\\\n", | |
"\\end{tabular}\n" | |
], | |
"text/markdown": [ | |
"| | Est. | SE | z | p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n", | |
"|:----------- | -------:| ------:| -----:| ------:| ------------------------:| -----------------------------:|\n", | |
"| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 | | |\n", | |
"| ppi | -0.3071 | 0.2650 | -1.16 | 0.2465 | 1.3776 | 0.0000 |\n", | |
"| base_price | -0.6212 | 0.0695 | -8.94 | <1e-18 | | |\n", | |
"| Residual | 31.4902 | | | | | |\n" | |
], | |
"text/plain": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev. \n", | |
"channel_type_sub_brand ppi 1.89789 1.37764\n", | |
"brand_category_channel_type ppi 0.00000 0.00000\n", | |
"Residual 991.63246 31.49020\n", | |
" Number of obs: 2110; levels of grouping factors: 81, 16\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n", | |
"ppi -0.307101 0.264971 -1.16 0.2465\n", | |
"base_price -0.621192 0.0695056 -8.94 <1e-18\n", | |
"────────────────────────────────────────────────────" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"m5 = let\n", | |
" form = @formula(\n", | |
" volume ~ 1 + ppi + base_price +\n", | |
" (0 + ppi | channel_type_sub_brand) +\n", | |
" (0 + ppi | brand_category_channel_type)\n", | |
" )\n", | |
" fit(MixedModel, form, dat; contrasts, wts)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "bd45c758", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Linear mixed model fit by maximum likelihood\n", | |
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n", | |
" logLik -2 logLik AIC AICc BIC \n", | |
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n", | |
"\n", | |
"Variance components:\n", | |
" Column Variance Std.Dev. \n", | |
"channel_type_sub_brand ppi 1.89789 1.37764\n", | |
"brand_category_channel_type ppi 0.00000 0.00000\n", | |
"Residual 991.63246 31.49020\n", | |
" Number of obs: 2110; levels of grouping factors: 81, 16\n", | |
"\n", | |
" Fixed-effects parameters:\n", | |
"────────────────────────────────────────────────────\n", | |
" Coef. Std. Error z Pr(>|z|)\n", | |
"────────────────────────────────────────────────────\n", | |
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n", | |
"ppi -0.307101 0.264971 -1.16 0.2465\n", | |
"base_price -0.621192 0.0695056 -8.94 <1e-18\n", | |
"────────────────────────────────────────────────────\n" | |
] | |
} | |
], | |
"source": [ | |
"println(m5)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c27042ae", | |
"metadata": {}, | |
"source": [ | |
"So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Julia 1.7.2", | |
"language": "julia", | |
"name": "julia-1.7" | |
}, | |
"language_info": { | |
"file_extension": ".jl", | |
"mimetype": "application/julia", | |
"name": "julia", | |
"version": "1.7.2" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Issue611" | |
author: "Douglas Bates" | |
jupyter: julia-1.7 | |
date: today | |
date-format: iso | |
execute: | |
keep-ipynb: true | |
--- | |
## Load packages and data | |
```{julia} | |
using CSV, MixedModels, ProgressMeter, Tables | |
ProgressMeter.ijulia_behavior(:clear); | |
``` | |
Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor, | |
```{julia} | |
dat = CSV.read("./data/model_ready_data_small.csv", columntable) | |
wts = CSV.read("./data/weights.csv", columntable).weight_vector | |
``` | |
Also, assign a `contrasts` specification of `Grouping` to the grouping factors | |
```{julia} | |
contrasts = Dict( | |
:brand_category_channel_type => Grouping(), | |
:channel_type_sub_brand => Grouping(), | |
); | |
``` | |
With the `Grouping` contrast there is no need to change from an Integer to a Categorical type. | |
In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array. | |
For example, | |
```{julia} | |
brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0')) | |
``` | |
Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations. | |
## Initial model fits | |
Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce. | |
I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace. | |
```{julia} | |
m1 = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
(1|channel_type_sub_brand) | |
) | |
fit(MixedModel, form, dat; contrasts, wts) | |
end | |
``` | |
As you can see, it converges on the boundary with a variance component of zero. | |
```{julia} | |
println(m1) | |
``` | |
```{julia} | |
m2 = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
zerocorr(1 + ppi | channel_type_sub_brand) | |
) | |
fit(MixedModel, form, dat; contrasts, wts) | |
end | |
``` | |
```{julia} | |
m2a = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
(0 + ppi | channel_type_sub_brand) | |
) | |
fit(MixedModel, form, dat; contrasts, wts) | |
end | |
println(m2a) | |
``` | |
```{julia} | |
println(m2) | |
``` | |
```{julia} | |
m3 = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
(1 | channel_type_sub_brand) + | |
(1 | brand_category_channel_type) | |
) | |
fit(MixedModel, form, dat; contrasts, wts) | |
end | |
``` | |
```{julia} | |
m4 = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
(1 | channel_type_sub_brand) + | |
zerocorr(1 + ppi | brand_category_channel_type) | |
) | |
LinearMixedModel(form, dat; contrasts, wts) | |
end | |
try | |
updateL!(m4) | |
catch | |
@warn "Cholesky factorization failed" | |
end | |
``` | |
```{julia} | |
m5 = let | |
form = @formula( | |
volume ~ 1 + ppi + base_price + | |
(0 + ppi | channel_type_sub_brand) + | |
(0 + ppi | brand_category_channel_type) | |
) | |
fit(MixedModel, form, dat; contrasts, wts) | |
end | |
``` | |
```{julia} | |
println(m5) | |
``` | |
So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[deps] | |
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" | |
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597" | |
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc" | |
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" | |
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316" | |
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca" | |
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" | |
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment