Kyle F Butts kylebutts

This code illustrates the problem of post-selection inference, following the review article "Post-Selection Inference".

library(tidyverse)
library(fixest)

Data-generating process:

$X = (X_1, X_2, X_3)'$ is multivariate normal with a non-diagonal covariance matrix
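
A minimal sketch of such a draw in base R, using a Cholesky factor. The covariance matrix `Sigma`, the sample size `n`, and the seed are placeholder assumptions for illustration, not the values used in the original code:

```r
set.seed(1)

n <- 10000

# Hypothetical covariance matrix with non-zero off-diagonal entries
Sigma <- matrix(
  c(1.0, 0.5, 0.2,
    0.5, 1.0, 0.3,
    0.2, 0.3, 1.0),
  nrow = 3, byrow = TRUE
)

# chol() returns an upper-triangular R with t(R) %*% R equal to Sigma
R <- chol(Sigma)

# Rows of Z are iid N(0, I_3), so rows of X = Z %*% R are N(0, Sigma)
Z <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
X <- Z %*% R
colnames(X) <- c("X1", "X2", "X3")

# The sample covariance should be close to Sigma
round(cov(X), 2)
```

Because the off-diagonal entries of `Sigma` are non-zero, the three regressors are correlated with one another.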

I love knitr::spin() and code cells for my dev experience. However, there are a few edge cases that make the function insufficient.

#' roxygen-style documentation would create problems since it would be interpreted as markdown.
With ark, jupytext-style notebooks become advantageous, so supporting # %% [markdown] cells would be beneficial.
If # %% appears at the start of a line inside a string, it would cause problems (unlikely, but still).

This function streamlines this process by rewriting the code from first principles:

The function parses the source code using tree-sitter (see the history for a version using R's parse function).
The code is then iterated over line by line, using a state machine to parse everything properly.
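
A rough sketch of the line-by-line state machine idea, not the actual implementation. It skips the tree-sitter step entirely and matches cell markers with regular expressions, so it does not handle the marker-inside-a-string edge case. The helper name `split_cells` and the two states are assumptions made for illustration:

```r
# Hypothetical helper: split a script into cells by scanning it line by line.
# The state is either "code" or "markdown" and only changes at a cell marker.
split_cells <- function(lines) {
  cells <- list()
  state <- "code"
  buffer <- character(0) # lines collected for the current cell

  for (line in lines) {
    is_md_marker <- grepl("^# %% \\[markdown\\][[:space:]]*$", line)
    is_code_marker <- grepl("^# %%[[:space:]]*$", line)

    if (is_md_marker || is_code_marker) {
      # A marker closes the current cell (if non-empty) and switches state
      if (length(buffer) > 0) {
        cells[[length(cells) + 1]] <- list(type = state, lines = buffer)
      }
      buffer <- character(0)
      state <- if (is_md_marker) "markdown" else "code"
    } else if (state == "markdown") {
      # Markdown cells are stored as comments, so strip the leading "# "
      buffer <- c(buffer, sub("^# ?", "", line))
    } else {
      buffer <- c(buffer, line)
    }
  }

  # Close the final cell
  if (length(buffer) > 0) {
    cells[[length(cells) + 1]] <- list(type = state, lines = buffer)
  }
  cells
}

# Usage on a small jupytext-style script
script <- c(
  "# %% [markdown]",
  "# ---",
  "# format: gfm",
  "# ---",
  "# %%",
  "library(fixest)",
  "x <- 1"
)
cells <- split_cells(script)
```

Here `cells` holds one markdown cell followed by one code cell. Keeping the state explicit means a `#'` roxygen line inside a code cell is passed through untouched rather than being treated as markdown.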

You can see a demo in temp.R, which I've intentionally written to highlight difficulties that knitr::spin has.

library(fixest)
library(sandwich)

est <- feols(mpg ~ hp + i(cyl) | am, mtcars, vcov = "hc1")
vcov_bs <- sandwich::vcovBS(est, R = 500)

### attach new vcov to est
est_bs <- summary(est, vcov = vcov_bs)
###

# %% 
library(wooldridge)
library(dqrng)
library(collapse)
#> collapse 2.0.10, see ?`collapse-package` or ?`collapse-documentation`
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#>

	# %%
	library(tidyverse)
	library(fixest)
	library(binsreg)
	library(patchwork)

	set.seed(20240829)
	x <- runif(500)
	w <- rnorm(n = 500, mean = x, sd = 1)
	y <- sin(x) + w * 2 + rnorm(500, mean = 0, sd = 1)

	% https://brand.uark.edu/graphic-identity/official-colors.php
	\definecolor{cardinal_red}{HTML}{9D2235}
	\definecolor{apple_blossom}{HTML}{FFFFFF}
	\definecolor{quartz}{HTML}{F2F2F4}
	\definecolor{gray_squirrel}{HTML}{C7C8CA}
	\definecolor{spoofers_stone}{HTML}{424242}
	\definecolor{black_whetstone}{HTML}{000000}
	\definecolor{diana_butterfly}{HTML}{2B5269}
	\definecolor{ozark_mountains}{HTML}{3F7F7F}
	\definecolor{birdsfoot_violet}{HTML}{2F1332}

	#' Create matrix from vectors for rows, columns, and values
	#'
	#' Note the rows and columns can be anything (e.g. strings containing fips).
	#' Internally, these are efficiently created to an index using
	#' `indexthis::to_index` (with values 1, ..., n_unique).
	#'
	#' @param i Vector used for the row indices.
	#' @param j Vector used for the column indices.
	#' @param x Vector used for the values.
	#' @param names Logical. If column and row names should be used. These

	# %% [markdown]
	# ---
	# format: gfm
	# ---
	# %%
	using StatsBase, Statistics
	using Random

	# %%
	"""

	library(tidyverse)

	# %%
	simulation <- function(n, p, k) {
	  trials = purrr::map_dbl(1:100000, function(b) {
	    # Take n shots and record whether each basket is made
	    shots = as.numeric(runif(n) < p)

	    # Observe streaks
	    hot_hand_shot_results = c()

	library(htmltools)

	#* @serializer html
	#* @get /
	base = function() {
	  html <- tags$html(
	    tags$head(
	      tags$script(src='https://unpkg.com/htmx.org@1.9.10/dist/htmx.js')
	    ),
	    tags$body(