@bskinn
Last active December 30, 2017 17:09
Old text file version of completed and pending/potential tasks for excel-mregress
-- TO DO --
xxx Strip all Optional arguments, to ensure specificity of behavior?
===THESE MAY BE REDUNDANT===
Parameterized predictor transforms, eventually to be optimized over, to find, e.g., the optimal exponent for a nonlinear dependence.
Workflow mini-language? Multi-step transforms from source (predictor transform->residual analysis->...)
Arbitrary, functional-form definitions of transforms? (<!predname!>, e.g.)
Combinatorial parameter cluster model selection (e.g., run model selection for all subsets of size m out of these n things,
and report comparative performance).
Implement some (optional?) relative datasource reference storage, so that rgns are more portable from machine to machine.
Given a regression against a set S of N predictors, containing one specific predictor P of interest, implement quick means for (at minimum) plotting (ordinate) the residuals of a fit against the N-1 predictors (S ∩ !P) versus (abscissa) the predictor P. This would give an *approximate* visualization of the actual fitting value of P. HOWEVER: removing P from the fit DOES CHANGE the fitting behavior over the (S ∩ !P) predictors, because that reduced fit no longer accounts for the effects of P as it participates in tandem with the (S ∩ !P) other predictors!
===END===
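A minimal sketch of the "all subsets of size m out of n" idea from the combinatorial model selection note above. It is written in Python purely for illustration (the project itself appears to be Excel-based), and the names `subsets_of_size`, `rank_subsets`, and the caller-supplied `score` function are hypothetical, not part of excel-mregress.

```python
from itertools import combinations


def subsets_of_size(predictors, m):
    """Return every m-element subset of the predictor names, preserving input order."""
    return [list(combo) for combo in combinations(predictors, m)]


def rank_subsets(predictors, m, score):
    """Score each subset with a caller-supplied function and sort ascending.

    `score` is a placeholder for whatever criterion is used to compare
    models (for example an information criterion); lower is treated as better.
    """
    scored = [(score(subset), subset) for subset in subsets_of_size(predictors, m)]
    # Sort on the score only, so ties never fall back to comparing the lists.
    return sorted(scored, key=lambda pair: pair[0])
```

The number of candidate models grows as n-choose-m, so this is only practical for small predictor sets.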
*** Implement capability to prompt for re-selection of source workbook when regression is moved
xxx *** Implement ability to select where to place model selection results worksheet
xxx *** PLOTTING: Implement option to include an identity line?
xxx Only really relevant for 'fitted response vs response' plots
Implement ability to prevent predictors from being downselected in automatic model selection?
*** Re-work / streamline model selection code to use .addFilter, .delFilter
"Generalize 'XDataCheck' .... not sure what I meant when writing this?"
Implement retaining form positioning at least for Main Menu and Plot forms, maybe Filter Predictors, too.
Add # of outliers at a given alpha (set alpha? tie to plot form? one or more standard alphas?)
to the main form rgn summary?
It seems that sometimes it's desirable to clear the filters when Editing; sometimes not. Need to
enable selective filter clear(?)
ABSOLUTELY need to harden against formula-error cases (collinearity, etc. - unless/until such
things are dealt with on a deeper level)
Consider a more elegant check/correction in reg.modifyRegression for the prior charting status,
perhaps basically just bumping abscissa and/or ordinate to a default if either/both is presently
set to a predictor. Current always-force-to-default-charting is strongly cautious; who knows
if this caution is warranted?
Check first cell of each range of loaded source and compare to stored prior value
as a relatively robust check against changes to the source book
If change(s) detected, prompt user and recommend confirming source references
are correct
Will probably have to handle differently in different load circumstances?
OR -- could hash the dataset somehow
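The "hash the dataset" alternative above could look roughly like the following. This is a sketch in Python under stated assumptions: the source data has already been read into a list of rows, and the function names `dataset_fingerprint` and `source_changed` are made up for illustration.

```python
import hashlib


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest over every cell value, row by row."""
    digest = hashlib.sha256()
    for row in rows:
        for value in row:
            digest.update(repr(value).encode("utf-8"))
            digest.update(b"\x1f")  # separator between cells
        digest.update(b"\x1e")  # separator between rows
    return digest.hexdigest()


def source_changed(rows, stored_fingerprint):
    """True if the current data no longer matches the stored fingerprint."""
    return dataset_fingerprint(rows) != stored_fingerprint
```

Compared with checking only the first cell of each range, a full hash catches changes anywhere in the data, at the cost of reading every cell on load.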
Robustify config section of Reg load to proofread against empty cells or invalid values
-- Apply defaults if bad settings are detected
Automate and robustify placement of plot trendline datalabel to avoid covering data points
Add v[ii] as a plottable variable?
Add checkbox option for whether to include the linfit datalabel
#### ADD CHECK TO ENSURE DELETING A REGRESSION DOESN'T PULL THE SOURCE OUT FROM UNDER
#### A CURRENTLY OPEN REGRESSION! Add to delete and close functions
Possibly add a check and notification to the Edit and Filter functions, to advise user
that the underlying data of an open Reg has been changed if the Edited/Filtered reg
is the source for that other Reg
Change initial folder of fd on NameEntry form if already defined within the Regression
-- OR, consider adding a 'currentBrowseFolder' to RegressMain: Remember last browsed folder
for opening regressions on the main form
Custom sigfigs on charts -- linfit datapoint, normalization factors in axis labels
Tab order on the Chart form
Plots of, or against, unfiltered predictors?
Plots including vs excluding filtered points
Would want to color based on filtered/unfiltered points
Wouldn't be able to plot fit parameters (residuals, etc.) for filtered points, though, since they wouldn't exist
for the filtered-out points
Have to trap for if a Regression returns bad numbers / errors / etc.
***Implement GUI filtering of data points
***Automated polynomials/interactions capability
Partial analysis: Auto-gen of new Regression that calculates residuals against a given predictor for the response and all other *active* predictors
(Y(xi) vs {Xj!=i(xi)})
***Automatic generation of factored datasets -- new SOURCE books with residuals of the response and of
a subset of predictors against another subset of predictors. (ability to see on a chart how the response
actually depends on a subset of parameters (incl. a single parameter), with the effects of other predictors
'factored out')
***'bubble plots' -- color-coded residual/Cook-distance as a function of two variables
(two predictors, one predictor+response?)
***Filter/unfilter outliers determined by studentized analysis, or in a range of Cook's distances, or such
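For reference, the quantities an outlier filter like this would threshold on can be sketched as below. To stay short, the sketch assumes the simplest case of one predictor plus a constant, uses the standard textbook formulas for leverage, internally studentized residuals, and Cook's distance, and is written in Python for illustration only. The name `outlier_diagnostics` is hypothetical.

```python
import math


def outlier_diagnostics(x, y):
    """Return (leverage, studentized residual, Cook's distance) per point.

    Assumes a straight-line fit y = a + b*x, so p = 2 fitted parameters.
    Does not guard against a perfect fit (zero residual variance) or a
    leverage of exactly 1, both of which would divide by zero.
    """
    n = len(x)
    p = 2  # intercept + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate

    results = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx      # leverage (hat matrix diagonal)
        r = e / math.sqrt(s2 * (1.0 - h))          # internally studentized residual
        cook = (r * r / p) * (h / (1.0 - h))       # Cook's distance
        results.append((h, r, cook))
    return results
```

A filter could then select points whose studentized residual exceeds a chosen critical value, or whose Cook's distance falls inside a chosen range.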
Covariances? Separate sheet generation, only if called for.
Probably an entire new menu
What are cov's diagnostic for? Collinear predictors?
Collinearity diagnostics? (adapt 'CheckXData' function into the form code)
Proof predictor and response data ranges to ensure all values contained are numeric, every time source
defined or refreshed
***Check for whether source book exists when loading Regressions; implement graceful handling if absent
Do Range.CurrentRegion checking to ensure result sheet is properly structured/sized
Hide the internal working sheets?
***Some manner of interactive plot exploration would be really nice (point values, case #'s, whatever)
- Check box to show case numbers or something?
Q-Q plots for comparing response, predictors, residuals, etc. to various distributions (normal & Weibull at least)
--Chi-squared test is a formal statistical test for normality -- include on these? Should be
~straightforward to calculate when generating the Q-Q plot
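As a rough illustration of the Q-Q plot item above, the points for a normal Q-Q plot can be generated as follows. This Python sketch assumes the (i - 0.5)/n plotting positions, which is one common convention among several, and fits the reference normal distribution to the sample mean and standard deviation. The name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    n = len(values)
    ordered = sorted(values)
    reference = NormalDist(mean(values), stdev(values))
    # enumerate() is zero-based, so (i + 0.5) / n equals (rank - 0.5) / n.
    return [(reference.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]
```

Points lying close to the identity line suggest the data are approximately normal; a Weibull version would need a different reference distribution and is not shown here.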
-- CUSTOM RUN-TIME ERRORS --
1801 - object not initialized
-- TENTATIVE/UNCERTAIN --
Sort data by any of the above? Not showing explicit data/results columns very much.
May run into problems with collinear predictors at some point...
Coerce unique names for predictors; should hopefully allow for unambiguous naming of, e.g., chart tabs to facilitate auto-reload of previously generated analyses.
Include ability to leave data as a link to source workbook, instead of
copying as values-only?
Risky -- much more prone to breakage; probably better to just require
rebuilding of the regression book
[as needed] Robustify repop of workbooks list after opening of new/closing of old, as required
[as needed] Robustify error handling in actually generating the regression workbook
-- Currently in development --
-- COMPLETED --
[DONE] Fixed bugs in model selection code when a reduced model contains only one predictor
[DONE] Fixed bug when model reduces to one predictor; accidentally was testing number of points instead
of number of predictors.
[DONE] Added omitted definitions of series names on model selection chart
[DONE] Added priority recall of currently associated save folder for a regression when re-defining it, falling back
to the last-used application-level folder if missing/invalid
[DONE] Added retention of last-used folder memory for opening source data workbooks
[DONE] Added 'AIC' model selection button to the code enable/disable helper function
***Below completed as of v1.1.0***
[DONE] Implemented retention of last-used folder when opening/creating regressions
[DONE] Automated model selection -- construction of analysis based upon minimization of AIC
Forward and backward; full factorial if model is small enough... how big is too big?
Some sort of criterion for summarily dropping a predictor? Threshold on beta p-value?
[DONE] Fixed edge case glitch when numerical precision limits result in diag(V) being >= 1
[DONE] Added helper routine for manual checking of linear independence of datasets
[DONE] Implemented basics of corrected AIC from Hu, 2007
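For orientation, a commonly quoted least-squares form of AIC and its small-sample correction is sketched below in Python. This is an assumption about the general shape of the calculation, not a transcription of the cited reference or of the project's own code; in particular, conventions differ on whether the parameter count k includes the error variance. The function names are hypothetical.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def aicc_least_squares(rss, n, k):
    """Corrected AIC: AIC + 2k(k + 1) / (n - k - 1).

    The correction is undefined unless n > k + 1.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_least_squares(rss, n, k) + (2 * k * (k + 1)) / (n - k - 1)
```

The correction term shrinks toward zero as n grows relative to k, so AICc and AIC agree for large datasets.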
***Below were completed as of v1.0.0***
[DONE] Different plot sizes for different uses
- large, how it is now
- medium, suitable for documents or small images in presentations, still has axis labels &c
- small, no axis labels, only min/max of axis ranges; for thumbnails
Thumbnails still have multiple tick labels in initial implementation
[DONE] Fiddle with Enter key behavior on the NameEntry form
[DONE] Highlight outliers in some fashion, either in data or on plots
--Customizable alpha
[DONE] Add various charting capabilities for acting upon generated workbook
Plot any of these versus any other:
predictor, response, fit-response, residual, studentized residual, t-stat, Cook distance.
Also, residuals (or anything) vs sequence number
[DONE] Include student-t for identifying potential outliers
[DONE] Will have to figure out how to reversibly filter data points and predictors
Some sort of unique ID on the data points would probably help... row number of original data?
Apply filter (delete rows/columns in filtered sheet) in reverse order for easy indexing
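The reverse-order note above can be illustrated with a short Python sketch (the name `apply_row_filter` is hypothetical). Deleting from the highest index downward means each deletion leaves the positions of the rows still to be removed unchanged, so the original row numbers can be used directly.

```python
def apply_row_filter(rows, filtered_indices):
    """Return a copy of `rows` with the given zero-based indices removed."""
    kept = list(rows)  # work on a copy so the source data stays intact
    for index in sorted(set(filtered_indices), reverse=True):
        del kept[index]
    return kept
```

Deleting in ascending order instead would shift every later row up by one after each removal, so the stored indices would need adjusting as the loop runs.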
[DONE] Add capability to have multiple regression workbooks open at once, with
switching between them (likely means new form)
[DONE] Will have to have a unified 'recreate regression' function w/in the class in order to
have the revised results propagate after data points or predictors are filtered out
[DONE] Add ability to close workbooks
[DONE] Add ability to include/exclude constant from the regression
[DONE] Implement R^2 calculation
[DONE] Implement stderr calculations for beta parameters
[DONE] IMPLEMENT names for predictors...
[DONE] Add capability to automatically reconstruct the necessary references, etc.
for a saved regression workbook
- Recall source data
- Recall regression workbook (just SAVED!)
[DONE] F-statistics