Skip to content

Instantly share code, notes, and snippets.

@dmbates
Created May 3, 2022 18:04
Show Gist options
  • Save dmbates/5b99dcbc3c3cc19b19a7c09a315e1fe3 to your computer and use it in GitHub Desktop.
Save dmbates/5b99dcbc3c3cc19b19a7c09a315e1fe3 to your computer and use it in GitHub Desktop.
Notebook related to MixedModels.jl issue 611
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "raw",
"id": "52407268",
"metadata": {},
"source": [
"---\n",
"title: Issue611\n",
"author: Douglas Bates\n",
"date: today\n",
"date-format: iso\n",
"execute:\n",
" keep-ipynb: true\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "8a5dbdf2",
"metadata": {},
"source": [
"## Load packages and data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8f6b34c7",
"metadata": {},
"outputs": [],
"source": [
"using CSV, MixedModels, ProgressMeter, Tables\n",
"ProgressMeter.ijulia_behavior(:clear);"
]
},
{
"cell_type": "markdown",
"id": "7ca7944e",
"metadata": {},
"source": [
"Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b344592e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2110-element Vector{Float64}:\n",
" 274.388611\n",
" 1015.990089\n",
" 2197.288998\n",
" 3367.868333\n",
" 2415.089921\n",
" 2980.472642\n",
" 1621.304086\n",
" 1452.363879\n",
" 1612.956475\n",
" 1836.63395\n",
" 2548.059624\n",
" 3075.149043\n",
" 2846.338504\n",
" ⋮\n",
" 90.037501\n",
" 64.327124\n",
" 29.44757\n",
" 29.084692\n",
" 27.125033\n",
" 409.445973\n",
" 12.593273\n",
" 431.874531\n",
" 37.276392\n",
" 45.440296\n",
" 75.674115\n",
" 18.573065"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat = CSV.read(\"./data/model_ready_data_small.csv\", columntable)\n",
"wts = CSV.read(\"./data/weights.csv\", columntable).weight_vector"
]
},
{
"cell_type": "markdown",
"id": "3289eed6",
"metadata": {},
"source": [
"Also, assign a `contrasts` specification of `Grouping` to the grouping factors"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "89442b55",
"metadata": {},
"outputs": [],
"source": [
"contrasts = Dict(\n",
" :brand_category_channel_type => Grouping(),\n",
" :channel_type_sub_brand => Grouping(),\n",
");"
]
},
{
"cell_type": "markdown",
"id": "7111184e",
"metadata": {},
"source": [
"With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.\n",
"In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.\n",
"\n",
"For example,"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "142f98f1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2110-element Vector{String}:\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" \"B00\"\n",
" ⋮\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\"\n",
" \"B15\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))"
]
},
{
"cell_type": "markdown",
"id": "cbf43ef0",
"metadata": {},
"source": [
"Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.\n",
"\n",
"## Initial model fits\n",
"\n",
"Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.\n",
"I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d9a1e689",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"\u001b[32mMinimizing 17 \t Time: 0:00:00 (23.36 ms/it)\u001b[39m\n"
]
},
{
"data": {
"text/latex": [
"\\begin{tabular}\n",
"{l | r | r | r | r | r}\n",
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
"\\hline\n",
"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 \\\\\n",
"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & \\\\\n",
"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & \\\\\n",
"Residual & 33.0010 & & & & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"| | Est. | SE | z | p | σ_channel_type_sub_brand |\n",
"|:----------- | -------:| ------:| ------:| ------:| ------------------------:|\n",
"| (Intercept) | -0.0016 | 0.0072 | -0.22 | 0.8254 | 0.0000 |\n",
"| ppi | -0.8022 | 0.0718 | -11.18 | <1e-28 | |\n",
"| base_price | -0.6277 | 0.0725 | -8.66 | <1e-17 | |\n",
"| Residual | 33.0010 | | | | |\n"
],
"text/plain": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev.\n",
"channel_type_sub_brand (Intercept) 0.000 0.000\n",
"Residual 1089.065 33.001\n",
" Number of obs: 2110; levels of grouping factors: 81\n",
"\n",
" Fixed-effects parameters:\n",
"─────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"─────────────────────────────────────────────────────\n",
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
"─────────────────────────────────────────────────────"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1 = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" (1|channel_type_sub_brand)\n",
" )\n",
" fit(MixedModel, form, dat; contrasts, wts)\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "06c5e97c",
"metadata": {},
"source": [
"As you can see, it converges on the boundary with a variance component of zero."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "099da5eb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2279.1400 4558.2800 4568.2800 4568.3085 4596.5522\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev.\n",
"channel_type_sub_brand (Intercept) 0.000 0.000\n",
"Residual 1089.065 33.001\n",
" Number of obs: 2110; levels of grouping factors: 81\n",
"\n",
" Fixed-effects parameters:\n",
"─────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"─────────────────────────────────────────────────────\n",
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
"─────────────────────────────────────────────────────\n"
]
}
],
"source": [
"println(m1)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ca38e2db",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"\u001b[32mMinimizing 58 \t Time: 0:00:00 ( 7.66 ms/it)\u001b[39m\n"
]
},
{
"data": {
"text/latex": [
"\\begin{tabular}\n",
"{l | r | r | r | r | r}\n",
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand \\\\\n",
"\\hline\n",
"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & 0.0000 \\\\\n",
"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 \\\\\n",
"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & \\\\\n",
"Residual & 31.4902 & & & & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"| | Est. | SE | z | p | σ_channel_type_sub_brand |\n",
"|:----------- | -------:| ------:| -----:| ------:| ------------------------:|\n",
"| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 | 0.0000 |\n",
"| ppi | -0.3071 | 0.2650 | -1.16 | 0.2465 | 1.3776 |\n",
"| base_price | -0.6212 | 0.0695 | -8.94 | <1e-18 | |\n",
"| Residual | 31.4902 | | | | |\n"
],
"text/plain": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev. Corr.\n",
"channel_type_sub_brand (Intercept) 0.00000 0.00000\n",
" ppi 1.89789 1.37764 . \n",
"Residual 991.63246 31.49020\n",
" Number of obs: 2110; levels of grouping factors: 81\n",
"\n",
" Fixed-effects parameters:\n",
"────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"────────────────────────────────────────────────────\n",
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
"ppi -0.3071 0.264971 -1.16 0.2465\n",
"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
"────────────────────────────────────────────────────"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m2 = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" zerocorr(1 + ppi | channel_type_sub_brand)\n",
" )\n",
" fit(MixedModel, form, dat; contrasts, wts)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f6006725",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2211.8904 4423.7809 4433.7809 4433.8094 4462.0531\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev. \n",
"channel_type_sub_brand ppi 1.89789 1.37764\n",
"Residual 991.63246 31.49020\n",
" Number of obs: 2110; levels of grouping factors: 81\n",
"\n",
" Fixed-effects parameters:\n",
"────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"────────────────────────────────────────────────────\n",
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
"ppi -0.3071 0.264971 -1.16 0.2465\n",
"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
"────────────────────────────────────────────────────\n"
]
}
],
"source": [
"m2a = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" (0 + ppi | channel_type_sub_brand)\n",
" )\n",
" fit(MixedModel, form, dat; contrasts, wts)\n",
"end\n",
"println(m2a)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "450c3c31",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + MixedModels.ZeroCorr((1 + ppi | channel_type_sub_brand))\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev. Corr.\n",
"channel_type_sub_brand (Intercept) 0.00000 0.00000\n",
" ppi 1.89789 1.37764 . \n",
"Residual 991.63246 31.49020\n",
" Number of obs: 2110; levels of grouping factors: 81\n",
"\n",
" Fixed-effects parameters:\n",
"────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"────────────────────────────────────────────────────\n",
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
"ppi -0.3071 0.264971 -1.16 0.2465\n",
"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
"────────────────────────────────────────────────────\n"
]
}
],
"source": [
"println(m2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a2043b8a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"\u001b[32mMinimizing 34 \t Time: 0:00:00 (16.69 ms/it)\u001b[39m\n"
]
},
{
"data": {
"text/latex": [
"\\begin{tabular}\n",
"{l | r | r | r | r | r | r}\n",
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
"\\hline\n",
"(Intercept) & -0.0016 & 0.0072 & -0.22 & 0.8254 & 0.0000 & 0.0000 \\\\\n",
"ppi & -0.8022 & 0.0718 & -11.18 & <1e-28 & & \\\\\n",
"base\\_price & -0.6277 & 0.0725 & -8.66 & <1e-17 & & \\\\\n",
"Residual & 33.0010 & & & & & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"| | Est. | SE | z | p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n",
"|:----------- | -------:| ------:| ------:| ------:| ------------------------:| -----------------------------:|\n",
"| (Intercept) | -0.0016 | 0.0072 | -0.22 | 0.8254 | 0.0000 | 0.0000 |\n",
"| ppi | -0.8022 | 0.0718 | -11.18 | <1e-28 | | |\n",
"| base_price | -0.6277 | 0.0725 | -8.66 | <1e-17 | | |\n",
"| Residual | 33.0010 | | | | | |\n"
],
"text/plain": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (1 | channel_type_sub_brand) + (1 | brand_category_channel_type)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2279.1400 4558.2800 4570.2800 4570.3199 4604.2066\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev.\n",
"channel_type_sub_brand (Intercept) 0.000 0.000\n",
"brand_category_channel_type (Intercept) 0.000 0.000\n",
"Residual 1089.065 33.001\n",
" Number of obs: 2110; levels of grouping factors: 81, 16\n",
"\n",
" Fixed-effects parameters:\n",
"─────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"─────────────────────────────────────────────────────\n",
"(Intercept) -0.0015897 0.00720718 -0.22 0.8254\n",
"ppi -0.802217 0.0717856 -11.18 <1e-28\n",
"base_price -0.627739 0.0724693 -8.66 <1e-17\n",
"─────────────────────────────────────────────────────"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m3 = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" (1 | channel_type_sub_brand) +\n",
" (1 | brand_category_channel_type)\n",
" )\n",
" fit(MixedModel, form, dat; contrasts, wts)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b87b114a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"┌ Warning: Cholesky factorization failed\n",
"└ @ Main In[12]:12\n"
]
}
],
"source": [
"m4 = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" (1 | channel_type_sub_brand) +\n",
" zerocorr(1 + ppi | brand_category_channel_type)\n",
" )\n",
" LinearMixedModel(form, dat; contrasts, wts)\n",
"end\n",
"try\n",
" updateL!(m4)\n",
"catch\n",
" @warn \"Cholesky factorization failed\"\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "473d9216",
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}\n",
"{l | r | r | r | r | r | r}\n",
" & Est. & SE & z & p & σ\\_channel\\_type\\_sub\\_brand & σ\\_brand\\_category\\_channel\\_type \\\\\n",
"\\hline\n",
"(Intercept) & -0.0012 & 0.0069 & -0.17 & 0.8673 & & \\\\\n",
"ppi & -0.3071 & 0.2650 & -1.16 & 0.2465 & 1.3776 & 0.0000 \\\\\n",
"base\\_price & -0.6212 & 0.0695 & -8.94 & <1e-18 & & \\\\\n",
"Residual & 31.4902 & & & & & \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"| | Est. | SE | z | p | σ_channel_type_sub_brand | σ_brand_category_channel_type |\n",
"|:----------- | -------:| ------:| -----:| ------:| ------------------------:| -----------------------------:|\n",
"| (Intercept) | -0.0012 | 0.0069 | -0.17 | 0.8673 | | |\n",
"| ppi | -0.3071 | 0.2650 | -1.16 | 0.2465 | 1.3776 | 0.0000 |\n",
"| base_price | -0.6212 | 0.0695 | -8.94 | <1e-18 | | |\n",
"| Residual | 31.4902 | | | | | |\n"
],
"text/plain": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev. \n",
"channel_type_sub_brand ppi 1.89789 1.37764\n",
"brand_category_channel_type ppi 0.00000 0.00000\n",
"Residual 991.63246 31.49020\n",
" Number of obs: 2110; levels of grouping factors: 81, 16\n",
"\n",
" Fixed-effects parameters:\n",
"────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"────────────────────────────────────────────────────\n",
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
"ppi -0.307101 0.264971 -1.16 0.2465\n",
"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
"────────────────────────────────────────────────────"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m5 = let\n",
" form = @formula(\n",
" volume ~ 1 + ppi + base_price +\n",
" (0 + ppi | channel_type_sub_brand) +\n",
" (0 + ppi | brand_category_channel_type)\n",
" )\n",
" fit(MixedModel, form, dat; contrasts, wts)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "bd45c758",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear mixed model fit by maximum likelihood\n",
" volume ~ 1 + ppi + base_price + (0 + ppi | channel_type_sub_brand) + (0 + ppi | brand_category_channel_type)\n",
" logLik -2 logLik AIC AICc BIC \n",
" -2211.8904 4423.7809 4435.7809 4435.8208 4469.7075\n",
"\n",
"Variance components:\n",
" Column Variance Std.Dev. \n",
"channel_type_sub_brand ppi 1.89789 1.37764\n",
"brand_category_channel_type ppi 0.00000 0.00000\n",
"Residual 991.63246 31.49020\n",
" Number of obs: 2110; levels of grouping factors: 81, 16\n",
"\n",
" Fixed-effects parameters:\n",
"────────────────────────────────────────────────────\n",
" Coef. Std. Error z Pr(>|z|)\n",
"────────────────────────────────────────────────────\n",
"(Intercept) -0.0011511 0.0068892 -0.17 0.8673\n",
"ppi -0.307101 0.264971 -1.16 0.2465\n",
"base_price -0.621192 0.0695056 -8.94 <1e-18\n",
"────────────────────────────────────────────────────\n"
]
}
],
"source": [
"println(m5)"
]
},
{
"cell_type": "markdown",
"id": "c27042ae",
"metadata": {},
"source": [
"So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.7.2",
"language": "julia",
"name": "julia-1.7"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
---
title: "Issue611"
author: "Douglas Bates"
jupyter: julia-1.7
date: today
date-format: iso
execute:
keep-ipynb: true
---
## Load packages and data
```{julia}
using CSV, MixedModels, ProgressMeter, Tables
ProgressMeter.ijulia_behavior(:clear);
```
Recent developments in Julia make it convenient to name the weights vector `wts`, which is the name of the named argument in the `LinearMixedModel` constructor,
```{julia}
dat = CSV.read("./data/model_ready_data_small.csv", columntable)
wts = CSV.read("./data/weights.csv", columntable).weight_vector
```
Also, assign a `contrasts` specification of `Grouping` to the grouping factors
```{julia}
contrasts = Dict(
:brand_category_channel_type => Grouping(),
:channel_type_sub_brand => Grouping(),
);
```
With the `Grouping` contrast there is no need to change from an Integer to a Categorical type.
In general I would recommend prepending a character to the number in the CSV file so that the column is read as a String or a `PooledArray`, which is like a Categorical array.
For example,
```{julia}
brand = string.('B', lpad.(dat.brand_category_channel_type, 2, '0'))
```
Recently I do any permanent storage of data frames as arrow files (see [Arrow.jl](https://github.com/apache/arrow-julia)) because they retain the metadata and you don't need to go through the conversion from CSV to an internal representation followed by transformations.
## Initial model fits
Before going to vector-valued random effects I would try just random intercepts to see what kinds of fits they produce.
I write this now as a let block so that I can define the formula but it doesn't clutter up the namespace.
```{julia}
m1 = let
form = @formula(
volume ~ 1 + ppi + base_price +
(1|channel_type_sub_brand)
)
fit(MixedModel, form, dat; contrasts, wts)
end
```
As you can see, it converges on the boundary with a variance component of zero.
```{julia}
println(m1)
```
```{julia}
m2 = let
form = @formula(
volume ~ 1 + ppi + base_price +
zerocorr(1 + ppi | channel_type_sub_brand)
)
fit(MixedModel, form, dat; contrasts, wts)
end
```
```{julia}
m2a = let
form = @formula(
volume ~ 1 + ppi + base_price +
(0 + ppi | channel_type_sub_brand)
)
fit(MixedModel, form, dat; contrasts, wts)
end
println(m2a)
```
```{julia}
println(m2)
```
```{julia}
m3 = let
form = @formula(
volume ~ 1 + ppi + base_price +
(1 | channel_type_sub_brand) +
(1 | brand_category_channel_type)
)
fit(MixedModel, form, dat; contrasts, wts)
end
```
```{julia}
m4 = let
form = @formula(
volume ~ 1 + ppi + base_price +
(1 | channel_type_sub_brand) +
zerocorr(1 + ppi | brand_category_channel_type)
)
LinearMixedModel(form, dat; contrasts, wts)
end
try
updateL!(m4)
catch
@warn "Cholesky factorization failed"
end
```
```{julia}
m5 = let
form = @formula(
volume ~ 1 + ppi + base_price +
(0 + ppi | channel_type_sub_brand) +
(0 + ppi | brand_category_channel_type)
)
fit(MixedModel, form, dat; contrasts, wts)
end
```
```{julia}
println(m5)
```
So the model seems to come down to a random effect for the slope w.r.t. `ppi` by `channel_type_sub_brand`, model `m2a`, and not much else.
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment